Today I resolved a three-week-long saga of SIP hardphone registration problems against an Asterisk box. The cause was so obscure that I feel posting the solution will be a favor to the world.
The setup is an Asterisk 1.6 server sitting in a /24 subnet. Four Fortigate 50B firewall appliances are in use, with point-to-point IPSec VPN tunnels connecting three remote sites back to the main site, which houses the Asterisk box used for SIP registration and PBX/outcalling operations. Each site has its own /24 subnet, and network communication works perfectly.
We use Aastra desk phones, specifically the x480i line. We have two older 480i phones, one of which has been working perfectly fine over an existing IPSec tunnel for a few years. The rest are from the newer 9480i line, which runs a completely different OS and firmware but has the same form factor as the older models.
Recently I added a few remote extensions, and strangely enough I could not get these particular endpoints to register with the Asterisk server. The initial REGISTER message from the Aastra would make it to the Asterisk server, the Asterisk server would respond with the “401 Unauthorized” message, and it would die there. No further communication.
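To make the stall point concrete, here is a minimal sketch of the usual SIP digest registration exchange and a helper that reports which message is missing. The names `EXPECTED_FLOW` and `next_missing_step` are made up for illustration, and the flow is the standard challenge/response pattern rather than anything specific to Asterisk or Aastra.

```python
# Standard digest-authenticated registration, as seen on the wire.
EXPECTED_FLOW = [
    "REGISTER",          # phone -> server, no credentials yet
    "401 Unauthorized",  # server -> phone, carries the digest challenge
    "REGISTER",          # phone -> server, now with an Authorization header
    "200 OK",            # server -> phone, registration accepted
]

def next_missing_step(seen):
    """Return the first expected message not yet observed, or None if complete."""
    for index, expected in enumerate(EXPECTED_FLOW):
        if index >= len(seen) or seen[index] != expected:
            return expected
    return None

# From the phone's side only the first REGISTER was ever seen.
print(next_missing_step(["REGISTER"]))
```

In the situation described above, the phone never gets past the second step, so it has no challenge to answer and the exchange simply stops.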
Initially I blamed UDP fragmentation, which is a common SIP problem. However, further analysis showed the packets to be quite small already (only 500 bytes). Reluctantly, I switched the endpoints to TCP for all SIP communication, and the problem disappeared.
However, doing this only resulted in a different issue. SIP over TCP is still experimental in Asterisk 1.6, and I found that out in a hurry. Within a few days I ran into a chan_sip.so deadlock, particularly around the TLS code. This deadlock would bring down the entire PBX, terminating all calls immediately. Definitely not production-ready, so back to SIP over UDP I went.
After doing some low-level packet sniffing on the wire today, I found that the “401 Unauthorized” message was never making it past the local interface on the Asterisk-facing Fortigate. This flew in the face of what I thought was happening: my initial impression was that the REGISTER message from the Aastra endpoint was too large with the authentication information, and was therefore fragmenting and getting thrown out. Cue head scratching.
Finally, out of curiosity, I started Googling around for Fortigate/IPSec/SIP issues, somewhat doubtful that was the actual cause (remember, we’d had a remote 480i working for years already), and ran across this post at Fonality.
Just for kicks, I made the changes suggested in the thread, namely:
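A rough way to sanity-check the fragmentation theory is to add up the headers and compare against the MTU. The sketch below assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header, an 8-byte UDP header, and a placeholder figure of 73 bytes for IPSec overhead. The real tunnel overhead depends on the cipher and mode in use, so treat that number as an assumption, not a measurement.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes

def would_fragment(sip_bytes, mtu=1500, tunnel_overhead=73):
    """Return True if a UDP datagram carrying sip_bytes of SIP payload
    would exceed the MTU once the tunnel overhead is added.

    tunnel_overhead is an assumed placeholder for IPSec encapsulation.
    """
    packet_size = sip_bytes + UDP_HEADER + IPV4_HEADER + tunnel_overhead
    return packet_size > mtu

print(would_fragment(500))   # False
print(would_fragment(1450))  # True
```

With a message of around 500 bytes there is plenty of headroom under these assumptions, which is why fragmentation stopped looking like a good explanation.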
    config system settings
    set sip-helper disable
    set sip-nat-trace disable
    end
    config system session-helper
    show
(look for the SIP entry, and):
    delete <num>
    end
where <num> is the SIP entry number. I rebooted both sides, and lo and behold, the phones registered!
It turns out the Fortigate does fixup on the SIP protocol for NAT purposes, but does it rather stupidly. It decides whether traffic is SIP by looking at the UDP protocol number and the port, and if those match, the Fortigate paints some special sauce on the packet, which breaks the outbound “401” messages. Good going, guys.
Which leads to the question: why was the older Aastra 480i working all this time? My only guess is that there were some SIP protocol changes between the OS and firmware revisions of the 480i and 9480i, and that the earlier revision snuck past the Fortigate’s markup attempts. But this is just a hunch.
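As an illustration of that kind of matching, here is a small sketch of a helper that selects traffic purely by protocol number and port. It is a hypothetical model of the behavior described above, not Fortinet's actual logic, and the function name and the default port of 5060 are my own assumptions.

```python
UDP_PROTOCOL = 17  # IANA protocol number for UDP
TCP_PROTOCOL = 6   # IANA protocol number for TCP
SIP_PORT = 5060    # default SIP port (assumed to be what the helper is bound to)

def helper_would_match(protocol, port):
    """Model a helper that picks traffic by protocol number and port alone.

    Nothing here looks at the payload, so anything on UDP/5060 is treated
    as SIP and becomes a candidate for rewriting.
    """
    return protocol == UDP_PROTOCOL and port == SIP_PORT

print(helper_would_match(UDP_PROTOCOL, 5060))  # True
print(helper_would_match(TCP_PROTOCOL, 5060))  # False
```

A match this coarse would possibly also explain why the phones registered fine over TCP, since that traffic would never be selected in the first place, though that part is a guess.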