VoIP Protocols

The mechanism for carrying a VoIP connection generally involves a series of signaling transactions between the endpoints (and gateways in between), culminating in two persistent media streams (one for each direction) that carry the actual conversation. There are several protocols in existence to handle this. In this section, we will discuss some that are important to VoIP in general and to Asterisk specifically.

IAX (The “Inter-Asterisk eXchange” Protocol)

If you claim to be one of the folks in the know when it comes to Asterisk, your test will come when you have to pronounce the name of this protocol. It would seem that you should say “eye-ay-ex,” but this hardly rolls off the tongue very well.[202] Fortunately, the proper pronunciation is in fact “eeks.”[203] IAX is an open protocol, meaning that anyone can download and develop for it.[204]

In Asterisk, IAX is supported by the chan_iax2.so module.


The IAX protocol was developed by Digium for the purpose of communicating with other Asterisk servers (hence the Inter-Asterisk eXchange protocol). It is very important to note that IAX is not at all limited to Asterisk. The standard is open for anyone to use, and it is supported by many other open source telecom projects, as well as by several hardware vendors. IAX is a transport protocol (much like SIP) that uses a single UDP port (4569) for both the channel signaling and media streams. As discussed later in this appendix, this makes it easier to manage when behind NATed firewalls.

IAX also has the unique ability to trunk multiple sessions into one dataflow, which can result in a tremendous bandwidth advantage when sending a lot of simultaneous channels to a remote box. Trunking allows multiple media streams to be represented with a single datagram header, which lowers the overhead associated with individual channels. This helps to lower latency and reduce the processing power and bandwidth required, allowing the protocol to scale much more easily with a large number of active channels between endpoints. If you have a large quantity of IP calls to pass between two endpoints, you should take a close look at IAX trunking.


Since IAX was optimized for voice, it has received some criticism for not better supporting video—but in fact, IAX holds the potential to carry pretty much any media stream desired. Because it is an open protocol, future media types are certain to be incorporated as the community desires them.

Security considerations

IAX includes the ability to authenticate in three ways: plain text, MD5 hashing, and RSA key exchange. This, of course, does nothing to encrypt the media path or headers between endpoints. Many solutions involve using a Virtual Private Network (VPN) appliance or software to encrypt the stream in another layer of technology, which requires the endpoints to pre-establish a method of configuring and opening these tunnels. However, IAX is now also able to encrypt the streams between endpoints with dynamic key exchange at call setup (using the configuration option encryption=aes128), allowing the use of automatic key rollover.


The IAX2 protocol was deliberately designed to work from behind devices performing NAT. The use of a single UDP port for both signaling and transmission of media also keeps the number of holes required in your firewall to a minimum. These considerations have helped make IAX one of the easiest protocols (if not the easiest) to implement in secure networks.


The Session Initiation Protocol (SIP) has taken the telecommunications industry by storm. SIP has pretty much dethroned the once-mighty H.323 as the VoIP protocol of choice—certainly at the endpoints of the network. The premise of SIP is that each end of a connection is a peer; the protocol negotiates capabilities between them. What makes SIP compelling is that it is a relatively simple protocol, with a syntax similar to that of other familiar protocols such as HTTP and SMTP. SIP is supported in Asterisk with the chan_sip.so module.[205]


SIP was originally submitted to the Internet Engineering Task Force (IETF) in February 1996 as “draft-ietf-mmusic-sip-00.” The initial draft looked nothing like the SIP we know today and contained only a single request type: a call setup request. In March 1999, after 11 revisions, SIP RFC 2543 was born.

At first, SIP was all but ignored, as H.323 was considered the protocol of choice for VoIP transport negotiation. However, as the buzz grew, SIP began to gain in popularity, and while there may be a lot of different factors that accelerated its growth, we’d like to think that a large part of its success is due to its freely available specification.

SIP is an application-layer signaling protocol that uses the well-known port 5060 for communications. SIP can be transported with either the UDP or TCP transport-layer protocols. Asterisk does not currently have a TCP implementation for transporting SIP messages, but it is possible that future versions may support it (and patches to the code base are gladly accepted). SIP is used to “establish, modify, and terminate multimedia sessions such as Internet telephony calls.”[206]

SIP does not transport media (i.e., voice) between endpoints. Instead, the Real-time Transport Protocol (RTP) is used for this purpose. RTP uses high-numbered, unprivileged ports in Asterisk (10,000 through 20,000, by default).

A common topology to illustrate SIP and RTP, commonly referred to as the “SIP trapezoid,” is shown in Figure B.1, “The SIP trapezoid”. When Alice wants to call Bob, Alice’s phone contacts her proxy server, and the proxy tries to find Bob (often connecting through his proxy). Once the phones have started the call, they communicate directly with each other (if possible), so that the data doesn’t have to tie up the resources of the proxy.

Figure B.1. The SIP trapezoid

The SIP trapezoid

SIP was not the first, and is not the only, VoIP protocol in use today (others include H.323, MGCP, IAX, and so on), but currently it seems to have the most momentum with hardware vendors. The advantages of the SIP protocol lie in its wide acceptance and architectural flexibility (and, we used to say, simplicity!).


SIP has earned its place as the protocol that justified VoIP. All new user and enterprise products are expected to support SIP, and any existing products will now be a tough sell unless a migration path to SIP is offered. SIP is widely expected to deliver far more than VoIP capabilities, including the ability to transmit video, music, and any type of real-time multimedia. While its use as a ubiquitous general-purpose media transport mechanism seems doubtful, SIP is unarguably poised to deliver the majority of new voice applications for the next few years.

Security considerations

SIP uses a challenge/response system to authenticate users. An initial INVITE is sent to the proxy with which the end device wishes to communicate. The proxy then sends back a 407 Proxy Authorization Request message, which contains a random set of characters referred to as a nonce. This nonce is used along with the password to generate an MD5 hash, which is then sent back in the subsequent INVITE. Assuming the MD5 hash matches the one that the proxy generated, the client is then authenticated.

Denial of service (DoS) attacks are probably the most common type of attack on VoIP communications. A DoS attack can occur when a large number of invalid INVITE requests are sent to a proxy server in an attempt to overwhelm the system. These attacks are relatively simple to implement, and their effects on the users of the system are immediate. SIP has several methods of minimizing the effects of DoS attacks, but ultimately they are impossible to prevent.

SIP implements a scheme to guarantee that a secure, encrypted transport mechanism (namely Transport Layer Security, or TLS) is used to establish communication between the caller and the domain of the callee. Beyond that, the request is sent securely to the end device, based upon the local security policies of the network. Note that the encryption of the media (that is, the RTP stream) is beyond the scope of SIP itself and must be dealt with separately.

More information regarding SIP security considerations, including registration hijacking, server impersonation, and session teardown, can be found in Section 26 of SIP RFC 3261.


Probably the biggest technical hurdle SIP has to conquer is the challenge of carrying out transactions across a NAT layer. Because SIP encapsulates addressing information in its data frames, and NAT happens at a lower network layer, the addressing information is not automatically modified, and thus the media streams will not have the correct addressing information needed to complete the connection when NAT is in place. In addition to this, the firewalls normally integrated with NAT will not consider the incoming media stream to be part of the SIP transaction, and will block the connection. Newer firewalls and session border controllers (SBCs) are SIP-aware, but this is still considered a shortcoming in this protocol, and it causes no end of trouble to network professionals needing to connect SIP endpoints using existing network infrastructure.


This International Telecommunication Union (ITU) protocol was originally designed to provide an IP transport mechanism for videoconferencing. It has become the standard in IP-based video-conferencing equipment, and it briefly enjoyed fame as a VoIP protocol as well. While there is much heated debate over whether SIP or H.323 (or IAX) will come to dominate the VoIP protocol world, in Asterisk, H.323 has largely been deprecated in favor of IAX and SIP. H.323 has not enjoyed much success among users and enterprises, although it might still be the most widely used VoIP protocol among carriers.

The three versions of H.323 supported in Asterisk are handled by the modules chan_h323.so (supplied with Asterisk), chan_oh323.so (available as a free addon), and chan_ooh323.so (supplied in asterisk-addons).


You have probably used H.323 without even knowing it—Microsoft’s NetMeeting client is arguably the most widely deployed H.323 client.


H.323 was developed by the ITU in May 1996 as a means to transmit voice, video, data, and fax communications across an IP-based network while maintaining connectivity with the PSTN. Since that time, H.323 has gone through several versions and annexes (which add functionality to the protocol), allowing it to operate in pure VoIP networks and more widely distributed networks.


The future of H.323 is a subject of debate. If the media is any measure, it doesn’t look good for H.323; it hardly ever gets mentioned (certainly not with the regularity of SIP). H.323 is often regarded as technically superior to SIP, but, that sort of thing is seldom the deciding factor in whether a technology enjoys success. One of the factors that makes H.323 unpopular is its complexity (although many argue that the once-simple SIP is starting to suffer from the same problem).

H.323 still carries by far the majority of worldwide carrier VoIP traffic, but as people become less and less dependent on traditional carriers for their telecom needs, the future of H.323 becomes more difficult to predict with any certainty. While H.323 may not be the protocol of choice for new implementations, we can certainly expect to have to deal with H.323 interoperability issues for some time to come.

Security considerations

H.323 is a relatively secure protocol and does not require many security considerations beyond those that are common to any network communicating with the Internet. Since H.323 uses the RTP protocol for media communications, it does not natively support encrypted media paths. The use of a VPN or other encrypted tunnel between endpoints is the most common way of securely encapsulating communications. Of course, this has the disadvantage of requiring the establishment of these secure tunnels between endpoints, which may not always be convenient (or even possible). As VoIP becomes used more often to communicate with financial institutions such as banks, we’re likely to require extensions to the most commonly used VoIP protocols to natively support strong encryption methods.

H.323 and NAT

The H.323 standard uses the Internet Engineering Task Force (IETF) RTP protocol to transport media between endpoints. Because of this, H.323 has the same issues as SIP when dealing with network topologies involving NAT. The easiest method is to simply forward the appropriate ports through your NAT device to the internal client.

To receive calls, you will always need to forward TCP port 1720 to the client. In addition, you will need to forward the UDP ports for the RTP media and RTCP control streams (see the manual for your device for the port range it requires). Older clients, such as Microsoft NetMeeting, will also require TCP ports forwarded for H.245 tunneling (again, see your client’s manual for the port number range).

If you have a number of clients behind the NAT device, you will need to use a gatekeeper running in proxy mode. The gatekeeper will require an interface attached to the private IP subnet and the public Internet. Your H.323 client on the private IP subnet will then register to the gatekeeper, which will proxy calls on the clients’ behalf. Note that any external clients that wish to call you will also be required to register with the proxy server.

At this time, Asterisk can’t act as an H.323 gatekeeper. You’ll have to use a separate application, such as the open source OpenH323 Gatekeeper (http://www.gnugk.org), for this purpose.


The Media Gateway Control Protocol (MGCP) also comes to us from the IETF. While MGCP deployment is more widespread than one might think, it is quickly losing ground to protocols such as SIP and IAX. Still, Asterisk loves protocols, so naturally it has rudimentary support for it.

MGCP is defined in RFC 3435.[207] It was designed to make the end devices (such as phones) as simple as possible, and have all the call logic and processing handled by media gateways and call agents. Unlike SIP, MGCP uses a centralized model. MGCP phones cannot directly call other MGCP phones; they must always go through some type of controller.

Asterisk supports MGCP through the chan_mgcp.so module, and the endpoints are defined in the configuration file mgcp.conf. Since Asterisk provides only basic call agent services, it cannot emulate an MGCP phone (to register to another MGCP controller as a user agent, for example).

If you have some MGCP phones lying around, you will be able to use them with Asterisk. If you are planning to put MGCP phones into production on an Asterisk system, keep in mind that the community has moved on to more popular protocols, and you will therefore need to budget your software support needs accordingly. If possible (for example, with Cisco phones), you should upgrade MGCP phones to SIP.

Proprietary Protocols

Finally, let’s take a look at two proprietary protocols that are supported in Asterisk.


The Skinny Client Control Protocol (SCCP) is proprietary to Cisco VoIP equipment. It is the default protocol for endpoints on a Cisco Call Manager PBX.[208] Skinny is supported in Asterisk, but if you are connecting Cisco phones to Asterisk, it is generally recommended that you obtain SIP images for any phones that support this and connect via SIP instead.


Asterisk’s Support for Nortel’s proprietary VoIP protocol, UNISTIM, makes it the first PBX in history to natively support proprietary IP terminals from the two biggest players in VoIP: Nortel and Cisco. UNISTIM support is totally experimental and does not yet work well enough to put it into production, but the fact that somebody has taken the trouble to implement it demonstrates the power of the Asterisk platform.

[202] It sounds like the name of a Dutch football team.

[203] Go ahead. Say it. That sounds much better, doesn’t it?

[204] Officially, the current version is IAX2 (officially standardized by the IETF in RFC 5456), but all support for IAX1 has been dropped, so whether you say “IAX” or “IAX2,” it is expected that you are talking about version 2.

[205] Having just called SIP simple, it should be noted that it is by no means lightweight. It has been said that if one were to read all of the IETF RFCs that are relevant to SIP, one would have more than 3,000 pages of reading to do. SIP is quickly earning a reputation for being far too bloated, but that does nothing to lessen its popularity.

[206] RFC 3261, “SIP: Session Initiation Protocol,” p. 9, Section 2.

[207] RFC 3435 obsoletes RFC 2705.

[208] Cisco has recently announced that it will be migrating toward SIP in its future products.