Here’s an interesting problem I came across at work that took me several weeks to figure out. I’d like to share the issue, a possible explanation for why it happens, and a workaround for it.
To give you an idea of the situation: this customer wants to call services where you have to enter a long code (usually 10 digits) using DTMF. The order of the DTMF digits matters, and that’s where he ran into trouble.
Whenever he used one particular provider for the outgoing call, the DTMF digits arrived in random order after the first 4 or 5 correct ones. So he sent 1234567890 from his phone (as DTMF) and the service received something like 1234598076.
I tested this with several other trunk providers and not a single one had the issue, so my logical conclusion for the customer was that the problem was on the trunk provider’s side. We also tried different DTMF modes, but only RFC2833 worked, so that option was ruled out and there was nothing more we could change for DTMF within Asterisk.
But after 2 weeks the support ticket came back to us. The provider had done some testing and, although they said the DTMF digits were received and forwarded in the correct order, they simply bounced the ticket back claiming ‘it is a timing issue’. Despite the fact that we had at least 9 other trunk providers without any issue, it became our problem again and we had to fix it for the customer.
The first tests were the easy ones, like checking our NTP synchronisation (a timing issue could imply our clock was out of sync), but no luck there.
Then we took Asterisk out of the path (that is, no call recording and no call forwarding), and that seemed to fix the issue. But, as expected, the customer actually wants call recording and call forwarding, so disabling those was not an option.
Oddly enough, a (CentOS) FreePBX install worked, and exactly the same Asterisk version on Debian also worked just fine, even on very slow hardware. For a day or two it looked like an Asterisk performance issue, though that wouldn’t explain why it ran just fine on a slow Debian machine.
After much debugging, reading tcpdump captures and Asterisk VERBOSE messages, I noticed on my slow Debian machine that the Asterisk Dial line included the “A” option, which streams 42 ms of silence to the other party to make sure the RTP port opens when the machine is behind NAT. On that Debian machine, however, the silence file was missing. It was missing on the FreePBX machine as well. Only on the system with the DTMF issue did the file actually exist.
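To get a feel for how much traffic that 42 ms file actually puts on the wire, here is a small Python sketch of the arithmetic. It assumes G.711 A-law (8000 samples per second, one byte per sample), a 20 ms packetisation time, and that the tail of the file goes out as one short final packet. I haven’t verified how Asterisk really frames the end of a file, and the helper name is made up for illustration.

```python
SAMPLE_RATE_HZ = 8000   # G.711 sample rate
BYTES_PER_SAMPLE = 1    # G.711 A-law: one byte per sample
ALAW_SILENCE = 0xD5     # A-law code for a zero-amplitude sample


def silence_packets(duration_ms: int, ptime_ms: int = 20) -> list[bytes]:
    """Hypothetical helper: split duration_ms of A-law silence into RTP
    payloads of ptime_ms each (20 ms assumed); the last one may be shorter."""
    total = duration_ms * SAMPLE_RATE_HZ // 1000 * BYTES_PER_SAMPLE
    per_packet = ptime_ms * SAMPLE_RATE_HZ // 1000 * BYTES_PER_SAMPLE
    payloads = []
    while total > 0:
        size = min(per_packet, total)
        payloads.append(bytes([ALAW_SILENCE]) * size)
        total -= size
    return payloads


print([len(p) for p in silence_packets(42)])  # [160, 160, 16]
```

So under these assumptions the whole file amounts to only three small audio packets (336 bytes of payload).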
As soon as I removed the 42 ms of audio, the DTMF problems were gone. So why not just remove the 42 ms of silence, right? Well, we can, but only if none of our PBX machines are behind NAT, and since IPv6 is still not common, NAT is not going away any time soon (common phones don’t support IPv6, so why bother using it in Asterisk, right?). Removing the file and then doing a blind transfer between 2 SIP peers means the router doesn’t know where the RTP is coming from (or going to), which results in no audio. We did find a workaround for that by setting “progressinband” to “yes” in sip.conf, but that creates other problems, for example when transferring between 2 different trunks (and RTP servers).
Back to the original problem: we now have this ‘workaround’ of removing the 42 ms of silence and setting progressinband to yes, but it’s not something I would recommend on systems with multiple trunks.
So why is this trunk provider the only one with DTMF problems in the first place? Well, I think I know why. In this setup DTMF (RFC2833) is sent as RTP payload type 101, while G.711 A-law audio is RTP payload type 8. My guess is that this trunk provider doesn’t handle receiving payload type 8 packets while it is in the middle of receiving payload type 101. Oddly enough, Asterisk seems to send pieces of audio right after the DTMF codes. I still don’t know why, or what exactly this audio is (perhaps the 42 ms of audio?). Anyway, if you count both RTP payload types together as one stream, a DTMF digit could be considered late while the next one has already been sent, which would mean the order changes on the other side of this trunk provider. My other possible explanation is that the provider somehow expects in-band DTMF because of the payload type 8 packets right after the 101 ones. I can’t tell, since I can’t log in to their configuration.
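If you want to check the digit order in a tcpdump capture yourself, here is a minimal Python sketch that decodes the RTP header and RFC 2833 telephone-event payloads and rebuilds the digit string. It assumes the telephone-event payload type is 101 (it is negotiated dynamically in SDP; 101 is just what this setup used), that the packets belong to a single RTP stream, and that you have already pulled the raw RTP packets (the UDP payloads) out of the capture. The function names are my own, not from any library. Digits are ordered by the RTP timestamp of each event, since all packets of one event carry the timestamp of that event’s start, so comparing the result with the arrival order shows whether digits are being reordered.

```python
import struct
from dataclasses import dataclass

# Telephone-event codes 0-15 map to these keys (RFC 2833 / RFC 4733).
EVENT_KEYS = "0123456789*#ABCD"


@dataclass
class RtpPacket:
    payload_type: int
    marker: bool
    seq: int
    timestamp: int
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse an RTP packet (RFC 3550): fixed header, CSRCs, extension, padding."""
    if len(data) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)                     # skip CSRC identifiers
    if b0 & 0x10:                                     # X bit: skip header extension
        (ext_words,) = struct.unpack("!H", data[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data) - (data[-1] if b0 & 0x20 else 0)  # P bit: last byte = pad count
    return RtpPacket(b1 & 0x7F, bool(b1 & 0x80), seq, ts, data[offset:end])


def decode_event(payload: bytes) -> tuple[str, bool, int]:
    """Decode a telephone-event payload into (key, end-of-event bit, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    key = EVENT_KEYS[event] if event < len(EVENT_KEYS) else f"<{event}>"
    return key, bool(flags & 0x80), duration


def dtmf_digits(packets: list[RtpPacket], event_pt: int = 101) -> str:
    """Rebuild the digit string from one RTP stream.

    Repeated and end-of-event packets share the event's start timestamp, so
    they are collapsed per timestamp, and digits are ordered by that timestamp
    rather than by arrival. Audio packets (e.g. PT 8, PCMA) are ignored.
    Timestamp wrap-around is not handled in this sketch.
    """
    starts: dict[int, str] = {}
    for pkt in packets:
        if pkt.payload_type == event_pt and len(pkt.payload) >= 4:
            starts.setdefault(pkt.timestamp, decode_event(pkt.payload)[0])
    return "".join(starts[ts] for ts in sorted(starts))
```

Printing the keys in sequence-number order next to the output of `dtmf_digits` is a quick way to see where in the path the reordering happens.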
To my few readers: I’d love to hear your opinion on this.