Report IP Telephony

Helsinki University of Technology Networking Laboratory
Teknillinen Korkeakoulu Tietoverkkolaboratorio

Espoo 2001
Report 2/2001
IP TELEPHONY PROTOCOLS, ARCHITECTURES AND ISSUES

Raimo Kantola (editor)
TEKNILLINEN KORKEAKOULU
TEKNISKA HÖGSKOLAN
HELSINKI UNIVERSITY OF TECHNOLOGY
Helsinki University of Technology

Department of Electrical and Communications Engineering
Networking Laboratory
Teknillinen Korkeakoulu
Sähkö- ja tietoliikennetekniikan osasto
Tietoverkkolaboratorio
Distributor:
P.O.Box 3000
FIN-02015 HUT
Tel. +358-0-451 2461
Fax +358-0-451 2474
ISBN 951-22-5452-2
ISSN 1458-0322
Otamedia Oy
Espoo 2001
Abstract
This report is a result of a post graduate seminar on IP Telephony (Course S38.130 Spring
2001). The report first gives an overview of IP Telephony and then proceeds to discuss
quality of voice in an IP network, voice coding, the IP Telephony protocols, the service
architecture and the potential service technologies. The protocols include signaling, transport
and routing information protocols. The papers are based on literature study including
materials of the 3GPP, the students' own research work and the students' own measurements. Of
particular interest are issues in applying IP Telephony in the 3rd generation mobile
networks. These issues are discussed in several papers. Of particular interest is also the
paper on choosing the transport protocol for SIP – it contains ideas that, we believe, have
not been published before.
115 pgs
Preface
This report is a result of a post graduate seminar on IP Telephony. The papers appearing in
the report were mainly prepared by the students during the Spring term 2001 and presented
in the seminar itself that took place in Otaniemi, Espoo, Finland on April 6-7, 2001. After
the seminar, based on comments, the students continued to improve their papers and finally
the qualifying papers were selected by the editor.
The seminar was structured around the book on IP Telephony by Hersent et al. Additional
material includes RFCs and Internet Drafts related to IP Telephony and IP Voice and the
materials of the 3G Partnership Project. A list of references is at the end of each paper. More
information on the seminar is available on the course www-page at
http://www.tct.hut.fi/opetus/s38130/k01/index.shtml.
Contents
Abstract
Preface
Overall picture of IP telephony
H.323 Protocol Suite
Voice Quality in IP Telephony
Voice in Packets: RTP, RTCP, Header Compression, Playout Algorithms, Terminal Requirements and Implementations
Voice Coding in 3G Networks
Session Initiation Protocol (SIP)
A transport protocol for SIP
Session Initiation Protocol in 3G
SIP Service Architecture
IP TELEPHONY SERVICES IMPLEMENTATION
MASTER SLAVE PROTOCOL
Network dimensioning for voice over IP
TRIP, ENUM and Number Portability
Overall picture of IP telephony
Ilkka Peräläinen
The Emergency response center authority
ilkka.peralainen@112.fi

Abstract

The main trend in telecommunications during the last five years, convergence, has led to the development of multimedia services in packet switched networks, e.g. the Internet. After a short experimental phase the H.323 standard has been laid down as a basis for multimedia services and applications including IP telephony. The presently installed IP telephony systems use the H.323 protocol stack. Due to the complexity, albeit flexibility, of H.323 the IETF is now finishing new rival standards, SIP and MGCP, which are claimed to offer better functionality and simpler implementation. Interoperability problems have so far hindered a breakthrough of IP telephony. The driving force in telephony network convergence, cheaper calls, has not yet compensated for the technical deficiencies.

1 Introduction

The term IP telephony is older than Voice over IP. IP telephony earlier meant the use of telephones or hybrid equipment and PBXs over IP, using gateways to overcome the barriers between various networks. Voice over IP points to a world of carrying voice over IP networks without necessarily needing any separate telephone-like equipment or PBXs. Software phones in PCs are an example of these new implementations. Today the two terms are more or less synonyms, or IP telephony is regarded as a subset of VoIP.

1.1 A short history

Only ten years ago the Internet was something totally different from what it is today. Its use was restricted mainly to universities and research institutes. Its interface was text based and FTP was the main tool for exchanging information, alongside email and chat.

The first revolution occurred in 1993 with the World Wide Web. The colourful new user interface appealed to thousands and thousands of new users, and emerging search engines helped the users to find interesting new sites.

In 1996 the first attempts were made to build an Internet telephony gateway. It consisted of a modem with speakerphone capabilities. The modem could only dial the destination number. At that time some sound board drivers were capable of simultaneous play and record (full-duplex), but they lacked a telephone interface. The sound board line-in jack had to be wired to the modem microphone and the modem speaker to the sound board line-out jack.

Some software was needed, and the telephony freeware of those days, VAT, came to help. By adding some code to interface with the modem, a crude one-line gateway prototype was invented. The potential of this primitive invention was huge. The development in the voice realm of the Internet has since been immensely rapid and it has made a real contribution to the much advertised convergence of telecommunications.

However, the new possibilities also created new problems: the Internet at that time was not ready for real time applications.

Nevertheless, IP telephony is growing very fast and it is estimated that by the year 2002 nearly 20 % of U.S. phone traffic will be carried over data networks.

With the World Wide Web the Internet had got its face; now it was getting a voice.

1.2 The overall situation now

The beginning of IP telephony has been lucky in that widely accepted standards emerged at an early stage. Almost all present implementations support the H.323 protocol family.

Standards should make it easy for the equipment of various vendors to interoperate. Unfortunately this has not been the case so far in IP telephony. On the contrary, the equipment and service implementations have mostly been proprietary in that the vendors have chosen a subset of the large and complex H.323 protocol stack that has met their immediate requirements. If you have bought an IP telephony system from one vendor you have been stuck with buying all future equipment from the same vendor. This lack of interoperability has been the major impediment to the wider deployment of H.323. For this reason the fastest growth in VoIP will probably occur in enterprise networks, where a uniform system and equipment base is easier to achieve. [3]

The capabilities negotiation phase could at least to some extent solve this problem, but unfortunately even it is often not implemented completely.

This interoperability drawback is now luckily fading. The IP telephony manufacturers are more and more often claiming that their hardware will interoperate with other vendors' systems. The International Multimedia Teleconferencing Consortium IMTC has been set up
with the primary goal of ensuring that various vendors' products and services will interoperate.

Today the standardization situation is, however, not at all clear. To overcome the drawbacks of the cumbersome and difficult to implement yet flexible H.323 protocol family, the IETF has created new protocols like the Session Initiation Protocol SIP and the Media Gateway Control Protocol MGCP, which offer much more functionality than H.323 to VoIP.

SIP is simpler, it scales better and it leverages the existing DNS system instead of having created its own separate hierarchy of name services. By including a client's communication features within the invite request, SIP negotiates these features and capabilities of the call within a single transaction. The call setup delay can be as low as 100 ms depending on the network.

Thus the biggest question in VoIP today is which one of the standards will prevail. H.323 is now widely accepted and deployed, but many vendors have also announced support for the newcomer protocols. At this transitional stage we will probably see systems which support both protocol families.

This paper is restricted to presenting an overview of the presently prevailing technology, which in any case has laid the foundation of IP telephony, and leaves the deeper presentation and comparison of the new standards to other presentations. The functionalities presented here in the context of H.323 are not all H.323 dependent but general to VoIP, and thus have to be developed in the newer protocols also.

1.3 Characteristics of IP telephony

The characteristics of IP telephony are quite complex, especially compared to streaming video, where large buffers can be used to compensate for the imperfections of the Internet regarding real time applications.

The main issues of IP telephony to be dealt with include:
• The human ear's perception of echo and delay
• The voice compression and packetization techniques
• Silence suppression and comfort noise generation
• The Internet shortcomings for packetized voice: delay, jitter and packet loss
• The corresponding remedies: buffering, redundancy, time stamps and differentiated services
• Telephone signalling protocols and various call types

2 H.323

H.323 is an ITU-T standard that was first developed for multimedia (voice, video and data) conferencing over LANs and later extended to cover Voice over IP. This multimedia origin is partly the reason for its claimed complexity for mere VoIP. Its first version H.323v1 was accomplished in 1996 and the second version v2 was ready by 1998. It includes both point-to-point and multipoint connections.

H.323 is one of ITU-T's mutually compliant videoconferencing standards. The others are:
• H.310 for broadband ISDN (B-ISDN)
• H.320 for narrowband ISDN
• H.321 for ATM
• H.322 for LANs with guaranteed QoS
• H.324 for public switched telephone networks (PSTN)

Clients of H.323 are able to communicate with clients of the other above mentioned networks.

The H.323 standard does not assume any QoS in the network.

2.1 Components of H.323

2.1.1 Terminal

Terminals are the LAN client endpoints providing real time two way communications. They have to support the H.245, Q.931, Registration Admission Status RAS and Real Time Transport RTP protocols.

An H.323 terminal can communicate with another H.323 terminal, an H.323 gateway or an MCU.

2.1.2 Gateway

An H.323 gateway endpoint is the interface between the Internet and the PSTN or some other network. It communicates in real time mode between H.323 terminals on the IP network and other ITU terminals on a switched network, or to another H.323 gateway. The H.323 gateway is optional and thus is not needed in a homogeneous network.

Gateways perform the translation between differing transmission formats, for example from H.225 to H.221. They can also translate between audio and video codecs. In one single LAN the gateway is not needed, as the terminals in this case can communicate directly. The communication to other networks is done via gateways using the H.245 and Q.931 protocols.

2.1.3 Gatekeeper

The gatekeeper is the vital - yet optional - central managing point in its zone. When a gatekeeper is used, all endpoints in its zone (terminals, gateways and MCUs) have to be registered with it. It supports the endpoints of its zone by
• Address translation from an alias, such as an email address or a telephone number, to a transport address using a translation table, which it updates by registration messages
• Admission control denying or accepting access based on e.g. call authorization or source and destination addresses.
• Call signalling either by processing the signalling itself or with the endpoints. It may alternatively connect a call signalling channel between the endpoints and let them do the signalling directly.
• Call authorization using the H.225 signalling. The gatekeeper can reject calls due to time period or particular terminal access restrictions
• Bandwidth management, matching the number of calls to the bandwidth available
• Call management, optionally maintaining a list of ongoing H.323 calls for e.g. bandwidth management purposes
• Routing all calls originating or terminating in its zone. This feature enables billing and security. Rerouting to another gateway in case of bandwidth shortage is also included in this option, and it helps in developing mobile addressing, call forwarding and voice mail diversion services.

2.1.4 Multipoint Control Unit

The Multipoint Control Unit network endpoint makes it possible for three or more terminals and gateways to participate in a multipoint conference. The MCU consists of a mandatory Multipoint Controller MC and optional Multipoint Processors MP.

The MCU is an independent logical unit, but it can be combined into a terminal, a gateway or a gatekeeper.

The MC determines the common capabilities of the terminals by using the H.245 protocol, while the MP does the multiplexing of audio, video and data streams under the control of the MC.

In addition the MCU can determine whether to unicast or multicast the audio and video streams depending on the capability of the network and the topology of the multipoint conference.

In a centralized multimedia conference each terminal establishes a point-to-point connection with the MCU, which then sends the mixed media streams to each endpoint. In the decentralized model the MC manages the communication compatibility but the terminals multicast and mix the streams.

2.2 The H.323 protocol stack

The audio, video and registration packets of H.323 use the unreliable UDP protocol, while the data and control packets are transported by the reliable TCP protocol.

2.2.1 H.225 Call signalling

The call signalling channel is used to carry the H.225 control messages. In networks where a gatekeeper does not exist, the calls are signalled directly between endpoints using call signalling transport addresses. In this case it is assumed that the calling party knows the address of the called party.

If there is a gatekeeper in the network, the calling party and the gatekeeper exchange the initial admission message using the gatekeeper's RAS channel transport address.

Call signalling messages can be passed in two ways:
• In Gatekeeper routed call signalling the signalling messages are routed between the endpoints via the gatekeeper
• In Direct endpoint call signalling the endpoints exchange the messages directly

After the call signalling is completed the H.245 Control channel is established. When Gatekeeper routed call signalling is used, there are two ways to route the H.245 Control channel. Either the control channel is established directly between the endpoints or via the gatekeeper.

Figure 1: The H.323 protocol stack

2.2.2 H.245 Media and Conference control

After an H.323 call is established, H.245 negotiates and establishes all the media channels carried by RTP/RTCP.

The functions of H.245 are
• Determining master and slave. H.245 appoints an MC, which is in charge of central control in case a call is extended to a conference
• Capability negotiation. H.245 negotiates compatible settings between the endpoints after the call establishment. Renegotiation can take place at any time during the call
• Media channel control, by which separate logical channels for audio, video and data can be opened or closed after the endpoints have agreed on capabilities. Audio and video channels are unidirectional while data channels are bidirectional
• Flow control messages provide feedback in case of communication problems
• Conference control keeps the endpoints mutually aware in a conference situation. A media flow model between the endpoints is also established

2.2.3 H.225 RAS Registration Admission Status

RAS defines communications between the endpoints and the gatekeeper (in case one exists) by unreliable transport, i.e. UDP.

RAS communications include
• Gatekeeper discovery is used by the endpoints to find their gatekeeper: endpoints multicast gatekeeper requests to find the gatekeeper transport address
• Endpoint registration is compulsory in case a gatekeeper exists in the network. The gatekeeper must know all the aliases and transport addresses of all the endpoints in its zone
• Endpoint location. A gatekeeper locates an endpoint with a specific transport address, for example to update its address database

2.2.4 H.248 Implementors' Guide

The newcomer in the H.323 protocol family is H.248. It is an enhancement of the centralized master-slave type MGCP, the Media Gateway Control Protocol. H.248 was developed in co-operation with the IETF, which calls it MEGACO.

One reason for the poor interoperability between various implementations of H.323 has been attributed to the lack of an implementation guide. This problem is now being solved by the IETF Megaco project.

2.2.5 RTP

The Real Time Transport Protocol RTP and RTCP are both developed by the IETF. They transport the audio, video and data packets of real time media over packet switched networks. They are annexed to the H.323 protocol.

The main tasks of RTP are packet sequencing for detecting packet losses, adjusting to changing bandwidth conditions by payload identification, frame identification, source identification and intramedia synchronization to compensate for the varying delay jitter of the stream packets.

2.2.6 RTCP

The Real Time Transport Control Protocol works in conjunction with RTP. In an RTP session participants periodically send RTCP packets to obtain information about QoS, session quitting, participant identification (email addresses, telephone numbers etc.) and intermedia synchronization.

2.2.7 Q.931

The main purpose of Q.931 is call signalling and setting up the call.

3 Enhancements to H.323

A major drawback - especially compared to the fast SIP protocol - in the first H.323 version was the long call setup time. One message round trip is needed for
• ARQ/ACF sequence
• Setup connect sequence
• H.245 capabilities exchange
• H.245 master slave procedure
• Setup of each logical channel

In addition a TCP connection has to be set up for the Q.931 and H.245 channels, and each TCP connection also needs an extra round trip for the TCP window synchronization. In a WAN environment one round trip can take 100 ms, which ends up in an unacceptably long setup delay, especially when the gatekeeper routed model is used.

In a congested switched circuit network SCN, where a call cannot be set up, the network local exchange tries to send the caller a 'your call cannot be connected' message. No connect is sent because the network informs the caller and not the endpoint.

Voice messages can be sent in version v1 only after media channels have been established by first sending a connect message.

Figure 2: H.323 call sequence

There is an ITU-T Mobility Ad Hoc Group working on mobile H.323 standardization.

3.1 Faster procedures

The Fast connect procedure was invented to overcome the above mentioned deficiencies. Fast connect solves the problems by
• Enabling uni- or bidirectional messages immediately after the Q.931 setup message
• Allowing a basic bidirectional audio only communication immediately after the connect message has been received
• Improving setup delays

An endpoint that uses the Fast connect procedure informs the called party of all the media channels it is prepared to receive or offers to send. This information is carried in the new fastStart parameter of the user to user Setup message. The description includes the codecs used, the receiving ports etc. This allows the early receiving of network prompts and also improves the setup delay.

The Fast connect procedure has been added as a core feature in the ETSI TIPHON project, because it resolves the interworking problem with the SCN.

Fast connect makes it possible to build simple limited capacity terminals that need only a minor part of the H.245 protocol.
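
The round trips listed at the start of Section 3 can be tallied in a few lines. The following is only a rough sketch: the function and constant names are illustrative, and the example call assumes an audio-only call with one logical channel per direction, which the text does not state. The 100 ms round-trip time is the WAN figure quoted above.

```python
# Tally of H.323v1 call setup round trips (sketch; all names are illustrative).
WAN_RTT_MS = 100  # one WAN round trip, the figure quoted in the text

# Signalling steps that each cost one message round trip.
FIXED_STEPS = (
    "ARQ/ACF sequence",
    "Setup/Connect sequence",
    "H.245 capabilities exchange",
    "H.245 master/slave procedure",
)


def setup_round_trips(logical_channels, tcp_connections=2):
    """Round trips needed before media can flow in H.323v1.

    logical_channels: one round trip per logical channel opened.
    tcp_connections: one extra round trip per TCP connection
    (one for Q.931 and one for H.245, hence the default of 2).
    """
    return len(FIXED_STEPS) + logical_channels + tcp_connections


def setup_delay_ms(round_trips, rtt_ms=WAN_RTT_MS):
    """Total setup delay if every round trip costs rtt_ms."""
    return round_trips * rtt_ms


if __name__ == "__main__":
    # Assumption: audio-only call, one logical channel in each direction.
    trips = setup_round_trips(logical_channels=2)
    print(trips, "round trips,", setup_delay_ms(trips), "ms")
```

Under these assumptions the tally is 8 round trips, or about 800 ms at 100 ms per round trip, which illustrates why the v1 setup time was considered unacceptable.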
H.323v2 offers another solution with H.245 tunneling, where H.245 messages are encapsulated in Q.931 messages, reducing the TCP connections to one. When H.245 tunneling is used, the Q.931 channel must remain open for the duration of the call. The tunneling method can also clear the network generated messages problem and will thus probably replace the Fast connect procedure.

The above described procedures are fixes to H.323v1 problems rather than a simplification of the protocol.

The use of TCP causes at least one unnecessary SYN/ACK round trip. If the Setup message exceeds the maximum transfer unit MTU size, two or more TCP segments must be used. Most TCP implementations are network friendly, mandating a slow start, where the first TCP segment has to be acknowledged before the rest can be sent.

A remedy to this problem is a special H.323v3 mode that will use UDP instead of or simultaneously with TCP signalling.

3.2 Conferencing with H.323

A multipoint control unit MCU masters a multipoint conference. It consists of one multipoint controller MC and optionally one or more multipoint processors MPs.

3.2.1 Multipoint controller MC

The MC decides
• who is allowed to participate
• how new participants are introduced to an ongoing conference
• how the participants synchronize their operation
• who is allowed to broadcast media etc.

A gatekeeper or a terminal possessing sufficient resources can include MC functionality in it and even mix media locally to a limited extent.

3.2.2 Multipoint processor MP

When several participants of a multipoint conference are simultaneously sending audio, video or data, there has to exist a network element that can mix or switch the incoming media streams. The endpoint terminals seldom have the capacity to do this. This mixer/switch element is called the MP.

When video is sent, the MP might choose the pictures of the latest speaker. When audio is the content, the MP could sum the voices of the potentially simultaneous speakers.

In a centralized conference the MP mixes and switches the media streams, whereas in a decentralized conference the terminals send their streams directly to all other participating terminals.

3.2.3 H.332

The conference type where all participants retain a full H.245 control connection with the MCU is called 'tightly coupled'. This type is resource intensive and it is obvious that it will not scale to numerous participants.

The solution to large conferences is H.332. A large conference mostly has a panel of active speakers (5 to 10) and a large, more or less passive audience, of which one speaker at a time can propose a question or a comment to the panelists.

H.332 keeps 'tightly coupled' conference connections with the panelists and a multicast RTP/RTCP conference with the passive listeners. The listeners have to know especially the codec and the UDP port used. H.332 uses the IETF Session Description Protocol SDP to encode this information.

Due to the large number of participants in a panel conference, a constraint must be set: the codec should remain stable. No new participant should have the possibility to change the codec, as this would mean new negotiations for all the others.

If a listener wants to speak, he must use the regular join procedure to attain the right to speak his mind.

3.3 Directories and numbering

Most home IP telephony users are connected to the Internet by a dial-up link, where the IP address is allocated on demand and is thus not static. In the early stages the users of IP telephony software contacted a server with a preconfigured IP address.

H.323 makes this kind of solution obsolete. A terminal has to register with a gatekeeper using a RAS message, which contains all the necessary information, especially the current IP address, to contact the terminal by using an alias.

At present the Internet Domain Name System DNS is used to resolve the IP address when an alias name is known. The DNS servers make up an addressing network, where an address can be resolved by querying the proper DNS servers top down until one is found which has detailed information of the endpoint in question. In addition to alias/IP address pairs a DNS database has much more information. It can hold information on the gatekeepers of its domain in ras://-type txt records. Once the gatekeeper is found, the caller knows to which transport address he shall send the setup message.

An important issue today for international IP calls from a PSTN network is the lack of a global IP telephony prefix. The solution has to scale to allow a large number of users. The global prefix should tell the caller's network that the call to be set up is an IP call and should thus be routed to a home gatekeeper, which knows the location of the called party and can then resolve the phone address to a call signalling address.

It is clear that an IP call should be routed via an IP network, avoiding the use of the PSTN.

Several proposals have been made to define an IP telephony country code. The standardization process is not yet completed.

For example the use of DNS works well when IP address classes are used, but in the case of the ever
more popular classless interdomain routing CIDR, the reverse address resolution is supported only by few servers and is thus not applicable.

3.4 H.323 security H.235

The aim of H.235 is to provide privacy and authentication to all protocols using H.245, including H.323.

Even without H.235, H.323 calls are more difficult to listen in on than ordinary telephone lines, which can be wiretapped. To break into an H.323 call you have to implement the codec algorithm.

With H.235 IP telephony becomes much safer than the PSTN. The caller can even hide the telephone number of the endpoint it is trying to reach. However, H.235 is not yet widely deployed.

The first purpose of security was to secure the media channels so that no outsider could listen to the ongoing call. Soon it turned out that users most of all did not want to be charged for calls they did not make and wanted no one to be able to monitor the called phone numbers. Providers wanted to authorize calls when they were set up, not when media or control channels were established. So the signalling channel also had to be authenticated and secured.

The network elements that have to know the contents of the H.225 and H.245 messages naturally need to be trusted by the endpoints. This authentication can be carried out by Transport Layer Security TLS or a challenge response exchange using some certificate. H.323 does not specify the contents of the certificates, but provides a way to exchange them and verify the identities of the callers. The identity can be verified by several methods. A time stamp prevents replay attacks.

H.323 does not ensure privacy on the RAS link between an endpoint and a gatekeeper, but it does provide authentication.

The call signalling channel H.225 can be secured by TLS or IPSEC.

The control channel H.245 security method is negotiated in the call signalling channel during the initial setup process before any other H.245 messages are sent. Various methods are accepted to initiate the secure channel.

After the H.245 channel is secured, the terminals negotiate the media channel encryption method by capability exchange. A new capability is defined for each codec and encryption mode pair.

Many encryption algorithms can be utilized, e.g. DES, Diffie-Hellman and RSA.

4 Codecs

The implementation of codecs is well developed and does not create any interoperability problems.

Audio codecs   Title and date
G.711          Pulse code modulation of voice frequencies at 56 or 64 kbps (11/88)
G.722          7 kHz audio coding at 64 kbps (11/88)
G.723.1        Dual rate multimedia speech coders at 5.3 and 6.3 kbps (03/96)
G.726          Speech coding at 16, 24, 32 or 40 kbps using ADPCM to encode a G.711 bit stream
G.728          Speech coding at 16 kbps using low-delay code excited linear prediction (09/92)
G.729          Speech coding at 8 kbps using conjugate-structure algebraic-code-excited linear prediction (03/96)
Video codecs
H.261          Audiovisual video codecs at p * 64 kbps, where p = 1 – 30 (03/96)
H.263          Low bit rate video coding (02/96)

Table 2: Audio and video codecs used with H.323

The mandatory speech codec is G.711, which is a popular codec in telephony networks. It is not, however, quite suitable for Internet communication, where the subscriber loop bandwidths are much smaller. Today most H.323 terminals use G.723.1, which is much more efficient, using only approximately one tenth of the G.711 bandwidth. G.723.1 uses 6.3 kbps bandwidth for continuous speech. When the call is wrapped in IP packets the additional packet headers increase the bandwidth needed to 17 kbps. When silence suppression is used the net bandwidth reduces back to ca. 6.7 kbps, which is about one tenth of the bandwidth of G.711. If IP header compression is used the saving is even greater. The G.728 and G.729 codecs are used for high quality audio, also with very low bandwidth requirements.

Due to the burstiness and bandwidth hungriness of video communication, efficient compression and decompression techniques are of utmost importance. H.323 specifies two video codecs, namely H.261 and H.263. Other codecs can also be used in case both endpoints support them.

Both the above mentioned video codecs use the discrete cosine transform DCT, H.261 with quantization and motion compensation and H.263 with motion estimation and prediction.

5 Applications and services

The vision of H.323 is interoperability between packet and circuit switched networks. H.323 also promises new value added services to the customers using circuit switched networks. These goals have not yet been achieved. Lower operational costs alone are not a good enough reason to switch to a new technology.

Several Internet telephony service providers ITSPs have met the expectations of good services in North America and Europe, but global interoperability is still a big problem. Furthermore the features and
6
quality of service are often inferior to plain old telephone services (POTS).
The main reasons for not meeting the quality goals are the poor interoperability of the endpoints of various vendors, especially gateways, and the limited scalability of H.323 communications. [3]

5.1 The architecture of H.323
The architecture of a protocol lays the foundation for the services and applications that can be built on it. The architectural model of H.323 differs essentially from that of the switched PSTN in that the PSTN is centralized while H.323 is decentralized.
The architectural model of H.323 is peer-to-peer, the protocol design is based on the ISO QSIG standard and the services can be built using a multi-tier approach. Use of QSIG reduces the complexity of interacting with the circuit switched PSTN networks that also use QSIG. The multi-tier model allows complex services to be built out of building blocks of simple services.

5.2 H.450 Supplementary services
The supplementary services of H.323 rely on the H.450 series of recommendations. Its key elements are a protocol based on QSIG, peer-to-peer signalling and a multi-tier approach to building services. [4]
The H.323 architecture uses high level application programming interfaces (APIs), so that software vendors do not have to work with low level implementation details, which reduces interoperability risks.

5.2.1 H.450 based on ISO QSIG
The installed base of private telecommunications networks that use QSIG is wide, and thus the use of QSIG in H.450 greatly helps the interworking with that base. The migration from PBX networks to H.323 multimedia networks is simplified as well. Simpler gateways are one more advantage of using QSIG as a common standard.

5.2.2 Based on peer-to-peer signalling
In this respect the H.323 network differs essentially from a circuit switched network. As in the Internet, in H.450 the intelligence resides in the end and edge devices and the network simply routes the packets. The end device can be a PC or any IP phone and the edge device is a PBX or a consumer gateway at the home location. The state of the calls is also distributed in the end and edge devices.
In the traditional circuit switch model the intelligence and the state of the call reside in the network. The ends and edges are simple phones that run a stimulus-response protocol.
In H.450 new services can be installed in the ends and edges like software packages in a PC. Any software house can develop services to this standard and sell them directly to the end user. This simplicity and straightforwardness in deployment will certainly stimulate the growth of a service building software industry. It should be remembered that the potential incompatibility of services in end and edge devices will be catered for by the capabilities negotiation process.
In the switch model new services are installed in the switch and may require upgrades in other parts of the network before they are available for the customer. Moreover, the switch is not at all as open to packages of 'outside' vendors. Yet it has to be admitted that in the central model the deployment of new services can be simpler. On the other hand the switch is a single point of failure, while a software PBX can be embedded in each desktop phone. In this respect the distributed model is more fault tolerant than the switch model.

5.2.3 The multi-tier approach
The modular nature of the multi-tier approach enables the creation of basic services out of building blocks of primitives. Compound services can then be created by utilizing two or more basic services. Finally applications can be built by using compound services.
Simple services are for example:
• Multiple call handling
• Call transfer
• Call forwarding
• Call park and pickup
• Call waiting
• Message waiting indication
• N-way conference
Examples of compound services include:
• Consultation transfer
• Conference out of consultation
Consultation transfer uses call hold, multiple calls and call transfer. Conference out of consultation consists of call hold, multiple calls and n-way conference.
In Consultation transfer the user can perform three operations:
1. Put a multimedia call on hold and retrieve it later
2. Call another person and optionally alternate between the two calls, or
3. Transfer the call
In Conference out of consultation the user also has three options:
1. Put a multimedia call on hold and retrieve it later
2. Call another person and optionally alternate between the two calls
3. Merge the calls in one conference call

6 Application examples

6.1 Call center integration
A call center gateway lets Web surfers with properly equipped multimedia PCs (typically with the right browser plug-in) connect to an existing Automatic Call Distributor (ACD) with Internet phone technology. This illustrates one of the major advantages of IP
telephony: its ability to combine voice and data on a single line.
The main advantage that IP telephony brings to call centers is skill based routing. An incoming call can be directed to a call taker who, for example, can speak the same language as the caller or is a specialist in the field the caller wants help with. The call can also be directed to a personal adviser.
Emergency services provide another example of an architectural conflict since, for example, IP addresses have no correlation with geographic location.

6.2 IEPS
As another example of an application of IP telephony, in the broad sense of the term application, this paper presents the basic requirements that IP telephony should take into account to support the International emergency preparedness scheme (IEPS).
The ITU-T recommendation E.106 for emergency communications was first defined for PSTN and ISDN networks, but it was soon realised that this scheme had to be extended to cope with the next generation networks, i.e. the Internet and especially IP telephony. In this regard the ITU-T Study group 16 is developing a new recommendation for International emergency multimedia service (IEMS) as an extension to E.106, to provide for enhanced emergency services over Internet based networks in the future.
The IEPS is needed when a crisis situation causes abnormal telecommunication requirements for governmental, military and civil authorities and other essential users of the PSTN. It allows authorized users to access the international telephone service while the service is restricted due to damage, congestion and/or other faults. [6]

6.2.1 Overall functional requirements
The primary goal of IEPS is to support crisis management arrangements by increasing the ability of the essential users to communicate via the PSTN, ISDN, Public land mobile networks (PLMN) or IP telephony. [6]
The basic requirements include:
• International and national preference schemes are independent yet compatible: one could be activated when the other does not need to be activated
• National preference scheme users may not get access to the international scheme, but authorized users of the international scheme have to be able to use the national preference scheme
• In some national schemes IEPS features may be enabled permanently
• Calls originated by IEPS users should be given priority in the networks involved when IEPS is enabled
• There must not be any conflict between preference for a call from an essential user and a call priority for a non-essential user to an emergency service
• If call restrictions to certain specific destinations (countries or areas) have been set when IEPS is activated, these restrictions should not apply to IEPS users
• IEPS calls should be marked from end to end

6.2.2 Established Telecommunication services
The essential features of E.106 for the IEPS in the well established circuit switched PSTN and ISDN networks include
• Priority dial tone
• Priority call setup, including priority queuing schemes
• Exemption from restrictive management controls
In the United States the Government emergency telecommunications service (GETS) uses the High probability of completion (HPC) marking in SS7 signalling for emergency calls. It should be noted that HPC does not include pre-emption of existing calls. In the U.S. alternate carrier routing (ACR) is used in GETS in case some inter-exchange carrier is not available. GETS uses a non-geographic toll free universal access number.
Some countries use IEPS access lines where all calls have a priority, while in some other countries priority is applied on a per call basis only.

6.2.3 Next generation networks
The IEPS requirements of E.106 should also be fulfilled by newly emerging next generation networks, especially the Internet. Packet switching technology provides a clearly different operational environment compared to the traditional circuit switched networks. Thus new aspects have to be considered, but there also emerges the possibility of new innovative services based on the new features of packet switched networks.
Examples of the new features are
• Quality, grade and class of service
• The flexibility of the emerging object oriented and distributed technologies
For IP telephony an IEPS indicator similar to that of the HPC has to be defined, but the IP indicator has to be applied throughout the call.
There is extensive work going on in the international, national and regional standardization bodies to define the next generation networks. It is of utmost importance that they now start the work on the adaptation of IEPS. [7]

6.2.4 Quality
The quality of video in the Internet is poor and the audio quality is not high either, especially compared to the PSTN. Because H.323 is a higher layer protocol, it can utilize the quality mechanisms of lower level protocols like IntServ/RSVP and DiffServ. In fact the development of QoS in the Internet is a result of the introduction of multimedia services to the Internet.
QoS features have been built into all modern LAN equipment, although some critics say that enough ever cheaper bandwidth will cater for the new multimedia services and QoS will not be necessary.
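As a minimal sketch of how an application might hand its media packets to a DiffServ network, the following Python fragment marks a UDP socket with a DiffServ code point. The function names and the choice of the Expedited Forwarding code point (46) for voice are assumptions made for illustration; nothing here is prescribed by H.323, and the marking only has an effect if the routers along the path are configured to honour it.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, often used for voice (assumed choice)


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the 8-bit IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Return a UDP socket whose outgoing packets carry the given DSCP.

    Hypothetical helper: assumes a platform that exposes IP_TOS (e.g. Linux).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

An RTP sender would then use the socket returned by open_marked_udp_socket() for its media stream, while signalling traffic could stay on an unmarked socket.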
The codecs in use today squeeze an IP call to only about one tenth of the bandwidth of a traditional PSTN call, and better codecs are on their way.

7 Market trends

7.1 The operator market changes
The new IP telephony technology has a marked influence on the telecommunications market. First, the liberalization of telephone market legislation in the European Union has given birth to new companies, both operators and service providers. The ease of building new telephony services relying on the new IP telephony technology contributes to this trend.
In the second phase we will very probably see a consolidation period where newcomers with unsustainable cash flows merge with profitable players or leave the hardening business by going under.
Next, a restructuring of the market will likely see mergers of smaller companies who cannot make the necessary investments that a new technology, however flexible and promising it is, inevitably requires (Figure 3) [9].
Figure 3: The operator market development

7.2 The European landscape
Unified Europe does not, at least yet, mean unified services. The main reasons are
• Current infrastructure is dispersed, difficult to deliver seamless service throughout the continent
• Back-office language and currency challenges
• Relatively new competitive environment forcing traditional players to evolve
• Consolidation and restructuring
• New opportunities and new players
The EU is anyway focusing intensively on the telecommunications field and unification will increase in the coming years.

7.3 The revenue share
As carriers move to packet networks, voice and new applications pay the bills. As Figure 4 shows, the relative amount of pure data communication in the telecommunication networks compared to telephony will grow very fast in the next few years. Yet the overwhelming majority of the revenue will continue to come from telephone calls and services. [10]
Figure 4: Voice and data (two panels, BITS and REVENUE, each comparing voice and data for 1997 to 2003; Natural MicroSystems, source: Probe Research, 2001)

7.4 VoIP growth obstacles
According to Telia, one of the biggest European VoIP operators, the process of making equipment compatible has been much too slow. Vendors have lacked incentives to make their equipment compatible. Luckily there is progress to be seen, as more and more vendors are claiming to make their equipment interoperate with the equipment of the bigger vendors like Cisco Systems.
Another obstacle is the price of the equipment. Carrier class VoIP equipment is still too expensive compared to traditional switching equipment. As an example, building a switched E1 connection costs $1500, while a VoIP E1 costs more than $10000. VoIP is thus today 6 to 8 times more expensive than POTS.
For these reasons the international voice carrier margins are very low, which may to a certain extent slow down the development of the IP telephony business. [11]

List of acronyms
ACR: Alternate carrier routing
API: Application programming interface
CIF: Common intermediate format
Codec: Compression/decompression
DCT: Discrete cosine transform
DiffServ: Differentiated services
DTMF: Dual-tone multi-frequency
GETS: Government emergency telecommunications service
HPC: High probability of completion
IEMS: International emergency multimedia service
IEPS: International emergency preparedness scheme
IETF: Internet Engineering Task Force
IMTC: International multimedia teleconferencing consortium
IntServ: Integrated services
IP: Internet protocol
ISDN: Integrated services digital network
ISO: International Organization for Standardization
ITSP: Internet telephony service provider
ITU-T: International Telecommunication Union, Telecommunications Sector
MC: Multipoint controller
MCU: Multipoint control unit
MP: Multipoint processor
MCS: Multipoint communication service
MGCP: Media gateway control protocol
OSI: Open systems interconnection
PBX: Private branch exchange
PLMN: Public land mobile network
POTS: Plain old telephone services
PSTN: Public switched telephone network
QSIG: D-channel signalling protocol at Q reference point for PBX networking
RAS: Registration, admission, status
RFC: Request for comments
RSVP: Resource reservation protocol
RTCP: Real-time transport control protocol
RTP: Real-time transport protocol
SCN: Switched circuit network
SDP: Session description protocol
SIP: Session initiation protocol
TCP: Transmission control protocol
TIPHON: Telecommunications and Internet protocol harmonization over networks
TLS: Transport layer security
UDP: User datagram protocol

References
[1] Arora, Rakesh: Voice over IP: Protocols and standards, Network Magazine, 23rd of November 1999, http://www.cis.ohio-state.edu/~jain/cis788-99/voip_protocols/index.html
[2] Hersent Olivier, Gurle David, Petit Jean-Pierre: IP telephony, Addison Wesley Britain 2000, ISBN 0-201-61910-5
[3] Karim, Asim: H.323 and associated protocols, 26th of November 1999, http://www.cis.ohio-state.edu/~jain/cis788-99/h323/index.html
[4] Kumar, Vincent: Supplementary services in the H.323 IP multimedia telephony network, IEEE Communications magazine, July 1999
[5] Carlberg Ken et al: Framework for supporting IEPS in IP telephony, <draft-carlberg-ieps-framework-00.txt>, Network working group, November 2000
[6] Network working group, IETF: Description of an International emergency preparedness scheme (IEPS), <draft-itu-t-ieps-description-00.txt>, 20th February 2001
[7] Folts Hal: Functional requirements for priority services to support critical communications, TIPHON 17, Temporary document 116, Document for discussion
[8] White paper on IP telephony, A road map to supporting GETS in IP networks, Prepared under contract no. DCA 100-99-F-4413 Data item no. C002, Science applications international corporation, 27th of April 2000
[9] Indovino, Lisa, deltathree: Show me the VoIP deployment - European service providers, Spring 2001 Voice on the net conference presentation
[10] Chase, Jack, Natural MicroSystems: Convergence of GSM, IP and VoIP, Spring 2001 Voice on the net conference presentation
[11] Dahlgren, Paul, Telia international: Show me the VoIP deployments, Spring 2001 Voice on the net conference presentation
H.323 Protocol Suite
Guoyou He
ghe@cc.hut.fi

Abstract
Multimedia communication has affected various areas of people's lives. Correspondingly, numerous multimedia standards, communication technologies and networks from different vendors are evolving rapidly. The H.323 protocol suite specified by the ITU is a main technology for real-time communication of audio, video, and data over packet switched networks. It also specifies the interoperability between packet switched networks and circuit switched networks. In this paper, the history of the H.323 protocol suite and the architecture of an H.323 system are reviewed first. Then the signaling and connection procedures of H.323 systems are presented. Finally we discuss the new features in H.323 version 4 and the features that are under development or to be specified for the future releases of H.323.

1 Introduction
At the present time, numerous multimedia applications and services are available. These applications and services include video and audio, synthesized video, audio and text, as well as interactivity. This multimedia information can be used for videoconferencing, telephony, video games, home shopping, video on demand, audio on demand and the like. This rapidly advancing multimedia technology is continuously spawning new products and applications, and their emergence has a significant impact on a large number of people from all walks of life. This important and constantly evolving area comprises a number of technologies, which include multimedia computers, compression and multimedia networks as well as the transport mechanisms for these networks. The standards and technology for multimedia and multimedia communication are evolving at a prodigious pace. Videoconferencing provides for audiovisual communication as well as document sharing, including text, tables and images. The video and audio information must be compressed prior to entering a communication network and decompressed when leaving it. A hardware or software codec can be used for compression and decompression of video and audio information. Multimedia communication can be established between different equipment as point-to-point communication or multipoint communication.
To provide interoperability for equipment from multiple vendors, standards have been established for POTS, ISDN, PSTN, and computer networks. For example, H.320 specifies the standards for ISDN videoconferencing; H.310, H.321, and H.322 specify the visual terminal for networks that guarantee quality of service (QoS); H.324 specifies videoconferencing over POTS modem connections; and the T.120 standards provide the specifications for real-time data and audiographics conferencing [2]. In the following, the H.323 protocol suite for audiovisual communication is discussed in detail.

2 What is H.323 suite
H.323 is a standard developed by the ITU. It specifies packet-based multimedia communications systems across networks which might not provide any QoS guarantees. The H.323 suite is a family of standards that includes many other ITU standards, as shown in Table 1 [11].

Table 1: H.323 standards
Network                                    Non-guaranteed bandwidth packet-switched networks (e.g. IP)
Video                                      H.261, H.263
Audio                                      G.711, G.722, G.728, G.723, G.729
Call Signalling and media packetisation    H.225.0
Call Control                               H.245
Multipoint                                 H.323
Data                                       T.120

The H.323 standard is a principal technology for the transmission of real-time audio, video, and data communication over packet-based networks. It provides both multipoint and point-to-point sessions. H.323 defines the components, protocols, and procedures providing multimedia communication over packet-based networks, which include Inter-Networks (including the Internet), Local Area Networks, Enterprise Area Networks, Metropolitan Area Networks, and Intra-Networks [12]. Packet based networks also include point-to-point connections or dial up connections over the GSTN or ISDN which can use an underlying packet based transport. H.323 can be
used in a variety of mechanisms, which include audio and video (video telephony); audio only (IP telephony); audio, video and data; video and data; multipoint-multimedia communications.
The H.323 standard is part of the H.32X family of recommendations specified by ITU-T. The other recommendations of the family, which define multimedia communication services over different networks, are shown in Table 2 [11].

Table 2: H.32X recommendations
ITU standard    Network
H.320           ISDNs
H.321, H.310    B-ISDNs
H.324           SCNs
H.323           Non-guaranteed bandwidth packet switched networks
H.322           LANs that provide guaranteed QoS

Interoperability with other multimedia networks is one of the primary goals in the development of the H.323 standard.

3 H.323 Version Suites
Since the first version of H.323 was approved in 1996, it has had 4 versions up to the approval of H.323 Version 4 on November 17, 2000 [7].

3.1 H.323 Version 1
"Visual telephone systems and equipment for local area networks which provide a non-guaranteed quality of service" was published in 1996 [4] and was designed for local area networks. The first thing companies tried to do was to use H.323 in WANs, large private VoIP networks, and the Internet. It worked very well. Recognizing the fact that H.323 was much more than a LAN protocol, the title was changed in H.323 Version 2 in 1998 [10].

3.2 H.323 Version 2
H.323 "Packet-based multimedia communications systems" was approved in January of 1998. It brought in H.235 Security (Authentication of participant, Integrity of data, Encryption, and digital signature), Fast Connect, Supplementary Services (H.450.1 Signaling protocol, H.450.2 Call Transfer, and H.450.3 Call Diversion), Integration of data conferencing with T.120, and Scalability features (Alternate Gatekeepers, Time to Live, and Pre-granted ARQs) [9][10].

3.3 H.323 Version 3
H.323 version 3 was approved on September 30, 1999. It introduced a few modest improvements, mostly geared for better PSTN integration and scalability. However, H.323 has progressed substantially, mostly in the form of new Annexes to H.323 and H.225.0 that add considerable value to the overall H.323 system architecture [8].

3.4 H.323 Version 4
Many new enhancements have been introduced into H.323 Version 4, which was approved on November 17, 2000. It contains enhancements in a number of important areas, including reliability, scalability, and flexibility. New features help facilitate more scalable Gateway and MCU solutions to meet the growing market requirements [7][10].

4 H.323 Architecture
The H.323 standard specifies the elements, protocols, and procedures providing multimedia communication over packet-based networks (see Figure 1 [10]).
Figure 1: H.323 architecture
An H.323 system provides point-to-point or multipoint multimedia communication services. It has four main elements: terminals, gateways, multipoint control units (MCUs), and gatekeepers [12]. Terminals, gateways, and MCUs are also called endpoints.

4.1 Terminals
Terminals include Video I/O equipment, Audio I/O equipment, User Data Applications, and System Control User Interfaces. Terminals can be used for real-time bidirectional multimedia communications. An H.323 terminal can either be a PC or a stand-alone device running H.323 and multimedia applications. It supports audio, video and data communications. An H.323 terminal plays a key role in IP telephony due to its basic service of audio communications. Interworking with other multimedia networks is the primary goal of H.323. The H.323 terminals are also compatible with the terminals on the networks given in Table 2 [4][13].

4.2 Gateways
Gateways connect H.323 networks to other networks, including the PSTN, ISDN, H.320 systems, etc. The connectivity of dissimilar networks is achieved by translating protocols for call setup and release, converting media format between different networks
[4][12]. An example of a Gateway, which connects an H.323 system to the PSTN, is given in Figure 2 [3].
Figure 2: H.323/PSTN Gateway

4.3 MCUs
MCUs are responsible for managing multipoint conferences of three or more H.323 terminals. A two-terminal point-to-point conference can be expanded to a multipoint conference. The MCU consists of a mandatory multipoint controller (MC) and an optional multipoint processor (MP). The MC supports the negotiation of capabilities with all terminals in order to ensure a common level of communications. It can also control the resources in the multicast operation. The MP is the central processor of the voice, video, and data streams for a multipoint conference [13].
The MCU may (or may not) control three types of multipoint conference (Figure 3 [4]).
Centralized multipoint conference
All participating terminals communicate with the MCU point-to-point. The MC manages the conference, and the MP receives, processes, and sends the voice, video, or data streams to and from the participating terminals.
Decentralized multipoint conference
The MCU is not involved in this operation. Rather the terminals communicate directly with each other through their own MCs. If necessary, the terminals take the responsibility for summing the received audio streams and selecting the received video signals for display.
Figure 3: Multipoint conference
Hybrid multipoint conference
This conference is a mix of the centralized and decentralized modes. The MCU keeps the operations transparent to the terminals (see Figure 4 [4]).
Figure 4: Hybrid multipoint conference

4.4 Gatekeepers
Gatekeepers are used for admission control and address resolution. A gatekeeper may allow calls to be placed directly between endpoints or it may route the call signaling through itself. A gatekeeper is also responsible for the services of bandwidth control, accounting, and billing. A single gatekeeper manages a collection of Terminals, Gateways, and MCUs forming a zone. A zone is a logical association of these components and may span multiple LANs [4] (Figure 5 [3]).
Figure 5: H.323 zone

5 The H.323 Protocol Stack
The H.323 suite consists of a set of standards. H.323 cites the use of the others shown in Figure 6 [13].
For audio applications, the minimum requirement is the support of recommendation G.711 (64 kbps channel). Other voice codec standards cited by H.323 are G.722 (48, 56, and 64 kbps channels), G.723 (5.3 and 6.3 kbps channels), G.728 (16 kbps channel), and G.729 (8 kbps channel) [4].
H.245, the control protocol for multimedia communication, is used during an initial handshake between the machines to determine the audio encoding algorithm, terminal capabilities, and media channels. The terminals should be capable of sending and receiving different audio streams. After H.245 has completed the agreements on the terminals' capabilities and media channels, H.225, the call signaling and setup protocol, is used to format the audio stream.
H.261 is a video coding standard. It was designed for data rates which are multiples of 64 kbps. H.261 supports two resolutions, QCIF (Quarter CIF) and CIF (Common Intermediate Format). If video is supported, the H.323 terminals must code and decode the video
streams in accordance with H.261 QCIF. Options are available, but they must use the H.261 or H.263 specifications. The coding algorithm of H.263 is similar to that used by H.261, but with some improvements and changes to improve performance and error recovery. H.263 supports five resolutions: QCIF, CIF, SQCIF (Sub-QCIF), 4CIF, and 16CIF.
Data support is through T.120, and the various control, signaling, and maintenance operations are provided by H.245, Q.931, and the Gatekeeper specification.
The audio and video packets must be encapsulated into the Real-time Transport Protocol (RTP) and carried on a UDP socket pair between the sender and the receiver. The Real-Time Control Protocol (RTCP) is used to assess the quality of the sessions and connections as well as to provide feedback information among the communication parties. The data and support packets can operate over TCP or UDP [4][13].
Figure 6: H.323 protocol stack (diagram labels: Audio I/O with audio codecs G.711, G.722, G.723, G.728, G.729; Video I/O with video codecs H.261, H.263; Data applications with T.120; System Control User Interface with H.225 call control, H.225 RAS control and H.245 control; RTP/RTCP; UDP, UDP or TCP; IP; layers 2 and 1 vary)

6 Call Signaling
Call signaling comprises the messages and procedures used to establish a call, request changes in bandwidth of the call, get status of the endpoints in the call, and disconnect the call [4].

6.1 Addresses
In an H.323 system, each entity has at least one Network Address (e.g. an IP address). This address uniquely identifies the H.323 entity on the network. Some entities may share a Network address (e.g. a terminal and a co-located MC). For each Network address, each H.323 entity may have several Transport layer Service Access Point (TSAP) identifiers. These TSAP identifiers allow multiplexing of several channels sharing the same Network address. An endpoint may also have one or more alias addresses associated with it. An alias address may represent the endpoint or it may represent conferences that the endpoint is hosting. The alias addresses provide an alternate method of addressing the endpoint [4].

6.2 Registration, Admission and Status (RAS)
The RAS channel is used between H.323 endpoints and gatekeepers for gatekeeper discovery, endpoint registration, endpoint location, and admission control. The RAS messages are carried on a RAS channel that is unreliable. Hence, RAS message exchange may be associated with timeouts and retry counts.
Gatekeeper discovery
Gatekeeper discovery is the process an endpoint uses to determine which Gatekeeper to register with. Gatekeeper discovery can be done statically or dynamically. In static discovery, the endpoint knows the transport address of its gatekeeper a priori. In the dynamic method of gatekeeper discovery, the endpoint multicasts a GRQ message on the gatekeeper's discovery multicast address. One or more gatekeepers may respond with a GCF message [4].
Figure 7: H.323 - Gatekeeper discovery (the Endpoint sends GRQ; the Gatekeeper answers GCF/GRJ)
Endpoint registration
Endpoint registration is the process by which an endpoint joins a Zone and informs the Gatekeeper of its Transport Address and alias address. All endpoints register with a gatekeeper as part of their configuration process. Registration occurs before any calls are attempted and occurs periodically as necessary [4] (see Figure 8).
Figure 8: H.323 Endpoint registration (registration: RRQ, RCF/RRJ; endpoint initiated Unregister Request: URQ, UCF/URJ; gatekeeper initiated Unregister Request: URQ, UCF)
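Because the RAS exchanges described above (GRQ/GCF, RRQ/RCF, URQ/UCF) run over an unreliable channel, an endpoint has to repeat a request when no reply arrives in time. The following Python fragment is a minimal sketch of such a timeout-and-retry loop. The function and class names, the 3 second timeout and the limit of 3 attempts are all assumptions chosen for illustration, and the toy channel stands in for a real UDP transport.

```python
from typing import Callable, List, Optional


def ras_request(send: Callable[[str], None],
                receive: Callable[[float], Optional[str]],
                request: str,
                timeout_s: float = 3.0,
                max_attempts: int = 3) -> Optional[str]:
    """Send a RAS request and wait for a reply, retrying on timeout.

    `send` transmits one message; `receive` waits up to `timeout_s` seconds
    and returns the reply name, or None if nothing arrived in time.
    Both callables are hypothetical stand-ins for a real transport.
    """
    for _ in range(max_attempts):
        send(request)
        reply = receive(timeout_s)
        if reply is not None:
            return reply      # e.g. "GCF" in answer to "GRQ"
    return None               # every attempt timed out


class LossyChannel:
    """Toy channel that drops the first `drops` requests, then confirms."""

    REPLIES = {"GRQ": "GCF", "RRQ": "RCF", "URQ": "UCF"}

    def __init__(self, drops: int) -> None:
        self.drops = drops
        self.sent: List[str] = []
        self._pending: Optional[str] = None

    def send(self, message: str) -> None:
        self.sent.append(message)
        if self.drops > 0:
            self.drops -= 1   # request lost on the unreliable channel
            self._pending = None
        else:
            self._pending = self.REPLIES[message]

    def receive(self, timeout_s: float) -> Optional[str]:
        # The toy channel answers immediately, so the timeout is not used here.
        reply, self._pending = self._pending, None
        return reply
```

With a channel that loses the first two requests, ras_request(channel.send, channel.receive, "GRQ") sends GRQ three times before the confirmation gets through; with fewer attempts allowed than losses, it gives up and returns None.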
Endpoint location
Endpoint location is a process by which the transport address of an endpoint is determined, given its alias name or E.164 address [4].

Other Controls
The RAS channel is also used for other controls, such as admission control, to restrict the entry of an endpoint into a zone; bandwidth change, to modify the call bandwidth during a call; and disengagement control, to disassociate an endpoint from a gatekeeper and its zone [4].

6.3 H.225 Call Signaling and H.245 Control Signaling

H.225 Call signaling
The H.225 call signaling is used to set up connections between H.323 endpoints, over which the real-time data can be transported. The call signaling channel is a reliable channel, which is used to carry H.225 call control messages (H.225 adopts a subset of Q.931 messages and elements). For example, H.225 protocol messages are carried over TCP in an IP based H.323 network [4].
In networks that do not contain a Gatekeeper, call signaling messages are passed directly between the calling and called endpoints. This is called direct call signaling. In networks that do contain a Gatekeeper, the H.225 messages are exchanged either directly between the endpoints or between the endpoints after being routed through the gatekeeper. The latter is called gatekeeper-routed signaling. The method is chosen by the gatekeeper during the RAS admission message exchange.

Gatekeeper-Routed Call Signaling
The admission messages are exchanged between endpoints and the gatekeeper on RAS channels. The gatekeeper receives the call-signaling messages on the call-signaling channel from one endpoint and routes them to the other endpoint on the call-signaling channel of the other endpoint (see Figure 9) [4].

Figure 9: H.323-Gatekeeper routed call signaling (Endpoint 1, Gatekeeper cloud, Endpoint 2; messages: 1 ARQ, 2 ACF/ARJ, 3 Setup, 4 Setup, 5 ARQ, 6 ACF/ARJ, 7 Connect, 8 Connect; RAS channel and call signalling channel messages)

Direct Call Signaling
During the admission confirmation, the gatekeeper indicates that the endpoints can exchange call-signaling messages directly. The endpoints exchange the call signaling on the call-signaling channel (see Figure 10) [4].

Figure 10: H.323-Direct endpoint call signaling (Endpoint 1, Gatekeeper cloud, Endpoint 2; messages: 1 ARQ, 2 ACF/ARJ, 3 Setup, 4 ARQ, 5 ACF/ARJ, 6 Connect; RAS channel and call signalling channel messages)

H.245 Control Signaling
When Gatekeeper routed call signaling is used, there are two methods to route the H.245 channel. In the first method, the H.245 control channel is established directly between the endpoints (see Figure 11). In the second method, the H.245 control channel is routed between the endpoints through the Gatekeeper (see Figure 12). The second method allows the Gatekeeper to redirect the H.245 Control channel to an MC when an ad hoc conference switches from a point-to-point conference to a multipoint conference. The choice between the two methods is made by the Gatekeeper.
When direct endpoint call signaling is used, the H.245 control channel can only be connected directly between the endpoints [4].

Figure 11: H.323 – H.245 control channel connection between endpoints (messages: 1 ARQ, 2 ACF/ARJ, 3 Setup, 4 Setup, 5 ARQ, 6 ACF/ARJ, 7 Connect, 8 Connect, 9 H.245 Channel; RAS channel, call signalling channel and H.245 control channel messages)

Figure 12: H.323 – Gatekeeper routed H.245 control (messages: 1 ARQ, 2 ACF/ARJ, 3 Setup, 4 Setup, 5 ARQ, 6 ACF/ARJ, 7 Connect, 8 Connect, 9 H.245 Channel, 10 H.245 Channel; RAS channel, call signalling channel and H.245 control channel messages)
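
The difference between the two H.225 call signaling models of Figures 9 and 10 can be sketched as message sequences. The message names and their order follow the figure legends; the sender and receiver of each message are not spelled out in the text and are filled in here as an assumption based on the usual reading of these figures. The function name and tuple layout are hypothetical.

```python
EP1, GK, EP2 = "Endpoint 1", "Gatekeeper", "Endpoint 2"
RAS, CALL = "RAS channel", "call signalling channel"


def call_setup_flow(gatekeeper_routed):
    """Return (message, channel, sender, receiver) tuples in sending order."""
    # The calling endpoint first asks the gatekeeper for admission.
    flow = [("ARQ", RAS, EP1, GK), ("ACF/ARJ", RAS, GK, EP1)]
    if gatekeeper_routed:
        # Setup is relayed by the gatekeeper (Figure 9, messages 3 and 4).
        flow += [("Setup", CALL, EP1, GK), ("Setup", CALL, GK, EP2)]
    else:
        # Setup goes straight to the called endpoint (Figure 10, message 3).
        flow += [("Setup", CALL, EP1, EP2)]
    # The called endpoint asks for admission as well.
    flow += [("ARQ", RAS, EP2, GK), ("ACF/ARJ", RAS, GK, EP2)]
    if gatekeeper_routed:
        flow += [("Connect", CALL, EP2, GK), ("Connect", CALL, GK, EP1)]
    else:
        flow += [("Connect", CALL, EP2, EP1)]
    return flow


for number, (message, channel, sender, receiver) in enumerate(call_setup_flow(False), 1):
    print(number, message, f"({channel}): {sender} -> {receiver}")
```

In both models the RAS exchange is identical; only the call signalling messages change their path, which is why the routed model needs eight messages where the direct model needs six.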
7 Connection Procedures
The connection procedures of H.323 systems consist of the following steps: Call setup, Initial communication and capability exchange, Establishment of audiovisual communication, Call services, and Call termination. This section uses an example network, which contains two endpoints connected to a gatekeeper, to illustrate all of the connection steps.

7.1 Step A: Call setup
Call setup can take place in all of the following cases:
all combinations of Direct Routed Call signaling (DRC)/Gatekeeper Routed Call signaling (GRC), same or different Gatekeepers;
Fast connect procedures;
call forwarding using facility (restarts the procedure);
and setting up conferences [6].
Figure 13 illustrates the call setup process with the example of both endpoints registered to the same Gatekeeper. It assumes direct call signaling [12].

Figure 13: Call Setup (Endpoint 1, Gatekeeper, Endpoint 2; ARQ (1), ACF/ARJ (2), Setup (3), Call proceeding (4), ARQ (5), ACF/ARJ (6), Alerting (7), Connect (8); RAS messages and call signalling messages)

7.2 Step B: Initial communication and capability exchange
This step includes the procedures of Capability exchange, Master/Slave determination, and H.245 tunneling [4].
Once both sides have exchanged call setup messages from step A, the endpoints shall establish the H.245 Control Channel. The procedures of H.245 are used over the H.245 Control Channel for the capability exchange and to open the media channels.
The H.245 Master-slave determination procedures are used to resolve conflicts between two endpoints which can both be the MC for a conference, or between two endpoints which are attempting to open a bidirectional channel. Figure 14 is an example of H.323 control signaling flows.

Figure 14: H.323 Control Signaling Flows (Endpoint 1, Gatekeeper, Endpoint 2; TerminalCapabilitySet(9), TerminalCapabilitySetAck(10), TerminalCapabilitySet(11), TerminalCapabilitySetAck(12), OpenLogicalChannel(13), OpenLogicalChannelAck(14), OpenLogicalChannel(15), OpenLogicalChannelAck(16); H.245 messages)

7.3 Step C: Establishment of audiovisual communication
Following the exchange of capabilities, master-slave determination, and opening of the logical channels for the various information streams, the audio and video streams, which are transmitted in the logical channels set up in H.245, are transported over dynamic Transport layer Service Access Point (TSAP) Identifiers using an unreliable protocol. Data communications, which are transmitted in the logical channels set up in H.245, are transported using a reliable protocol. Figure 15 is an example illustrating the H.323 media stream and media control flows [4][11].

Figure 15: Media Stream and Media Control Flows (Endpoint 1, Gatekeeper, Endpoint 2; RTP Media Stream(17), RTP Media Stream(18), RTCP Messages(19), RTCP Messages(20))

7.4 Step D: Call services
Call services include Bandwidth change, Status Information request for management, Conference expansion, multicast cascading, and H.450 Supplementary Services [4].
Bandwidth changes
Call bandwidth is initially established and approved by the Gatekeeper during the admission exchange. At any time during a conference, the endpoints or the Gatekeeper may request an increase or decrease in the call bandwidth. An example of a bandwidth change is given in Figure 16.

Figure 16: H.323 – Bandwidth Change (BRQ(21), BCF/BRJ(22), CloseLogicalChannel(23), OpenLogicalChannel(24), BRQ(25), BCF/BRJ(26), OpenLogicalChAck(27); RAS messages and H.245 messages)

Status
The status procedures allow the gatekeeper to determine the working status (on, off or failed) of the endpoints. The Gatekeeper may use the H.225 Information Request (IRQ)/Information Request Response (IRR) messages to poll the endpoints periodically.

Conference expansion
Conference expansion is the procedure for expanding a point-to-point conference involving an MC to a multipoint conference. First, a point-to-point conference is created between two endpoints. At least one endpoint or the gatekeeper must contain an MC. Once the conference has been created, it may be expanded to a multipoint conference in two ways: any endpoint in the conference may invite another endpoint into the conference through the MC, or an endpoint may join the existing conference by calling an endpoint in the conference. Figures 17 and 18 [4] illustrate the H.245 Control Channel topology for the Direct Call Signaling model and the Gatekeeper routed Call Signaling model.

Figure 17: Direct Call Signaling model (Endpoint 1, Endpoint 2, Endpoint 3, MC)

Figure 18: Gatekeeper routed Call Signaling model (MC, Gatekeeper, Endpoint 2)

Multicast cascading
In multicast cascading, a call is established between entities containing MCs and the H.245 Control Channel is opened; the active MC (determined by the Master/Slave procedure) may then activate the MC in a connected entity. Once the cascaded conference is established, either the master or the slave MCs may invite other endpoints into the conference. There is only one master MC in a conference. A slave MC can only be cascaded to a master MC.

H.450 Supplementary services
The H.450 supplementary services are optional in H.323 systems. These services include call forward, call hold, call waiting, message waiting indication, name identification, etc.

7.5 Step E: Call termination
A call can be terminated by any endpoint when the video, audio, or data transmissions have ended. Correspondingly, all logical channels for video, audio, or data are closed. Terminating a call does not necessarily terminate a conference; a conference can be terminated by the MC. Figure 19 [12] illustrates the call release procedure.

Figure 19: H.323 Call Release (End Session Command(28), End Session Command(29), Release Complete (30), DRQ(31), DRQ(32), DCF(33), DCF(34); H.225 signaling messages, RAS messages and H.245 messages)

8 New Features of H.323 Version 4
H.323 Version 4 was approved on November 17, 2000. It contains enhancements in a number of important areas, including scalability, reliability, flexibility, services, "must have" features, and a generic extensibility framework [7][10][1].
8.1 Scalability, Reliability, and Flexibility
H.323 Version 4 enhances the scalability of H.323 systems in areas including Gateway Decomposition with H.248, Additive Registrations, Alternate Gatekeepers, and Endpoint Capacity Reporting.

Gateway Decomposition
Traditional Gateways were designed so that both media and call control were handled in the same box. Recognizing the need to build larger, more scalable gateway solutions for carriers, the ITU-T worked jointly with the IETF to produce Recommendation H.248, which describes the protocol between the Media Gateway Controller (MGC) and the Media Gateway (MG). H.323 version 4 supports the decomposition of a Gateway into a Media Gateway Controller (MGC) and Media Gateways (MG).
The decomposed Gateway separates the MGC function and the MG function. Multiple MGs may exist to allow the decomposed Gateway to scale to support much more capacity than a composite Gateway. The communication between the MGC and the MGs is done through H.248 (see Figure 20 [11]).

Figure 20: Decomposition Gateway

Alternate Gatekeepers
The architecture of alternate Gatekeepers is shown in Figure 21 [11]. By using Alternate Gatekeepers, endpoints can continue functioning when the communication between the endpoints and one or more Gatekeepers fails. This increases reliability and helps to avoid losing calls.

Figure 21: Alternate Gatekeepers

Endpoint Capacity reporting
H.323 endpoints report capacity to Gatekeepers. By utilizing endpoint capacity reporting, Gatekeepers may select the endpoint that is best capable of handling the call. This is very useful for large scale deployments of Gateways, and it greatly increases availability (see Figure 22 [11]).

Figure 22: Endpoint Dispatcher
* GK selects the GW with the most capacity.
* H.323 terminals report capacity in absolute terms, not in percentages.

8.2 Services
One of the most important features of a VoIP protocol is its ability to provide services to the service provider and end users. H.323 has a rich set of mechanisms to provide supplementary services. Version 4 introduces a few more supplementary services to strengthen the protocol in this regard. These services mainly include HTTP-based Service Control, Stimulus-based Control, and Call completion [1][10][7].

HTTP-based Service Control
H.323 version 4 specifies a means of providing HTTP-based control for H.323 devices. With HTTP-based control, service providers have the ability to display web pages to the user with meaningful content that ties into the H.323 systems. In essence, it is a third party call control mechanism that utilizes a separate HTTP connection for control.

Stimulus-based Control
H.323 version 4 provides a new "stimulus-based" control mechanism. With this mechanism, an H.323 device may communicate with a feature server to provide the user with various services. The H.323 endpoint may possess some intelligence, but some intelligence may reside only in the feature server or in multiple feature servers. The features may be numerous. New features may be added to the feature servers without the delay caused by a standardization procedure.
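
The gateway selection noted under Figure 22 (the GK selects the GW with the most capacity, reported in absolute terms) can be sketched in a few lines. This is a hypothetical illustration: the function name, the gateway names and the capacity unit (number of additional calls) are assumptions, not part of H.323.

```python
def select_gateway(reported_capacity):
    """Pick the gateway with the most remaining capacity.

    reported_capacity maps a gateway name to the absolute capacity it last
    reported (assumed here to be a number of additional calls).
    Returns None when no gateway has reported.
    """
    if not reported_capacity:
        return None
    return max(reported_capacity, key=reported_capacity.get)


reports = {"gw-a": 12, "gw-b": 30, "gw-c": 7}
print(select_gateway(reports))  # gw-b
```

Reporting in absolute terms matters for this comparison: a small gateway that is 50% free may still have fewer free channels than a large one that is 20% free.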
Call completion
This is a new H.450 supplementary service, which provides a standard means of allowing calls to complete when the user is busy or there is no answer.

8.3 "Must Have" Features
The features included are listed below [7][10][5]:

Usage Information Reporting
To help provide accurate billing information, the Gatekeeper can request the endpoint to report usage information to the Gatekeeper at various times: at the beginning of the call, during the call, and at the end of the call.

Caller Identification
H.323 Version 4 contains complete information for providing caller identification services with H.323.

Tones and Announcements
H.323 version 4 details the procedure for indicating the presence of in-band tones and announcements. Such tones and announcements are often heard when the destination number is incorrect or unreachable.
In addition to in-band tones and announcements, the Gatekeeper may signal an endpoint to play specific announcements at various times: pre-call, mid-call, or end-call.

Alias Mapping
When routing calls, a telephone number in the IP world may not be sufficient for proper routing into the SCN. In addition, a service provider may want to use the same Gateways to provide Voice Virtual Private Networks, but this requires some intelligence in a device to perform proper mapping. With Version 4, a Gateway, for example, can indicate that it can perform alias mapping at either the ingress or egress side of a call. This will reduce the number of malformed numbers, as well as provide a means for providing Voice Virtual Private Network (VVPN) services.

Better Bandwidth Management (multicast)
Prior to H.323 Version 4, an endpoint could request much more bandwidth than it actually needed and thus waste network resources. With Version 4, it is mandatory that an endpoint make a bandwidth request with a lower value if the endpoint is in fact using less bandwidth than it had initially indicated in the ARQ.
In addition, managing bandwidth for multicast sessions has been nearly impossible: unless the Gatekeeper routed the H.245 signaling and carefully monitored the media channels that were opened, it could not determine whether two endpoints that request bandwidth are actually requesting bandwidth for a multicast session or for a unicast session. This becomes a much bigger issue when many people are participating in a multipoint multicast conference. With Version 4, specific details about the media channels are conveyed to the Gatekeeper in Information Request Response (IRR) messages (if the Gatekeeper requests them), so that the Gatekeeper can better control bandwidth utilization.

Fax Enhancements
Version 4 of H.323 allows an endpoint to initiate a voice call and then switch to fax at some point. It allows an IP-based fax device to operate in a similar manner as today's PSTN fax devices. Version 4 was also enhanced to allow TCP to be used for carrying fax data. Previously, UDP was the only real option for carrying fax data.

Tunneling other protocols
H.323 is often used to inter-work between two circuit networks. To provide better inter-working, Version 4 provides a mechanism whereby QSIG (Signaling between the Q reference points) and ISUP may be tunneled essentially without translation. H.323 may act as a transparent tunnel for those non-H.323 signaling protocols (see Figure 23 [5]).

Figure 23: H.323 – QSIG tunneling example (QSIG signalling, Composite Gateway, MGC, MG, media flow)

H.323 specific URL
Version 4 introduced the URL scheme "h323". The H.323 URL allows entities to access users and services in a consistent manner. The form of the H.323 URL is "h323:user@host", where "user" is a user or service and "host" might be the Gatekeeper that can translate the URL into a call signaling address.

Call Credit-related capabilities
H.323 v4 provides the means of communicating available funds, or for the Gateway to control early call termination based on available funds, for prepaid IP telephony. H.323 v4 adds these features to the RAS protocol.

Multiplexing audio and video
One weakness with the current usage of RTP is the difficulty in synchronizing the separate audio and video streams. Version 4 now includes an optional procedure which allows both video and audio to be multiplexed in a single stream. This will assist endpoints in synchronizing video and audio.

DTMF Relay via RTP
H.323 version 4 allows an endpoint to utilize RFC 2833 "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals" to send and receive DTMF digits.
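
As an illustration of the H.323 URL form "h323:user@host" described in this section, the following sketch splits such a URL into its two parts. It handles only the simple form quoted in the text; the function name and the example host are hypothetical, and any further URL syntax is deliberately not covered.

```python
def parse_h323_url(url):
    """Split a URL of the form h323:user@host into (user, host)."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "h323":
        raise ValueError("not an h323 URL: " + url)
    user, at, host = rest.partition("@")
    if not at or not user or not host:
        raise ValueError("expected h323:user@host, got: " + url)
    return user, host


user, host = parse_h323_url("h323:alice@gk.example.com")
print(user, host)  # alice gk.example.com
```

In the scenario of the text, "host" would name the Gatekeeper that is then asked to translate the user part into a call signaling address.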
8.4 Further Features under Development for H.323
The ITU-T is working, or is going to work, on some further enhanced features of H.323, which include the Generic Extensibility Framework, Protocol Inter-working, Mobility, and Robustness [1][10].

Generic Extensibility Framework
The Generic Extensibility Framework (GEF) introduces new means by which H.323 may be further enhanced or extended with optional features without requiring changes to the current ASN.1 syntax.

Inter-working or integrating with other protocols
Inter-working or integration with newly developed protocols may need to be developed. These protocols include SIP, H.248/Megaco, and Bearer Independent Call Control (BICC).
SIP is gaining in popularity as a VoIP protocol. H.248/Megaco may find its way into many "media gateway" devices, ranging from residential gateways to large-scale service provider gateways. The Bearer Independent Call Control (BICC) protocol will compete with both H.323 and SIP for a place in the service provider network. Making H.323 work with these protocols is also important.

Mobility
Mobility includes terminal mobility, user mobility, and service mobility. To implement mobility in H.323, the functions of mobility management need to be defined, which include the Home Location Function (HLF), Visitor Location Function (VLF), Authentication Function (AuF), and Inter-working Function (IWF) (see Figure 24 [1]).

Figure 24: H.323 - Mobility

Robustness
Robustness is under development; it requires refining the architecture for recovery from crashes. Currently two architectures are proposed: one for small scale systems and one for large scale systems.
In small scale systems, the architecture makes each element responsible for detecting failures of the others. If one element in the system fails, the others can switch to the backup element. Some state information of the failed element then needs to be provided. For large scale systems, the architecture is very complex and still needs to be specified.

9 Comments on H.323
H.323 is a very complex system with all kinds of features for multimedia communications, but not every part of H.323 has to be implemented when building a powerful and useful system. Multimedia over IP, in itself, has a certain amount of complexity associated with it. As a result, a system that implements inter-working between different multimedia systems with various features and services is inevitably complex. The complexity of an H.323 system exists for a reason, and the reason may become even more evident as video, audio, and data conferencing become more prevalent [10].
H.323 allows the use of multiple codecs. There is a good reason for using each of the codecs in the systems.
Gatekeepers are optional in an H.323 system. They provide a consistent means for H.323 endpoints to perform address resolution, and may perform inter-working between simple H.323 (SET) devices and more protocol-complete H.323 entities. Gatekeepers can act as a platform from which powerful new IP-based services can be built and provided.
H.323 is scalable. Service providers can deploy H.323 networks on a small scale or a large scale depending on the expected features and services.
H.323 is a proven technology used in large networks. It has excellent integration with the PSTN.
Multimedia conferencing shows the real potential of H.323 in multimedia communication.
Many equipment manufacturers, software vendors, and service providers have built products and services supporting H.323. This greatly supports the success of H.323.
With new technologies constantly arriving, for example BICC, H.323 is under considerable pressure to keep its place in the service provider network.

10 Conclusions
As just presented, H.323 is organized around four major facilities: (a) terminals, (b) Gateways (which can perform protocol conversion), (c) Gatekeeper (bandwidth manager), and (d) multipoint control units
(MCUs), responsible for multicasting. The H.323 standard is a principal technology for the transmission of real-time audio, video, and data communication over packet-based networks. It provides both multipoint and point-to-point sessions. One of the primary goals of developing the H.323 standards is to provide interoperability between packet switched networks and other multimedia networks. H.323 is a rich and complex specification. Version 4 in particular is a powerful system for multimedia communication. It contains enhancements in a number of important areas, including scalability, reliability, flexibility, supplementary services, and new features. Future releases are expected to be even more powerful. In particular, inter-working or integration with other newly developed protocols will strengthen its position in the multimedia communication area. Mobility will greatly increase the flexibility of using H.323 systems in the fields of terminal mobility, user mobility, and service mobility. Of course, mobility will also greatly increase the complexity of the H.323 system.
Even though H.323 is a powerful system for multimedia communication, it has faced great competition from some newly developed protocols, such as SIP, H.248/Megaco, and BICC. Reducing the complexity of H.323 and simplifying its usage will hopefully help it keep its leading position in the fast changing multimedia communication world.

Acronyms
ACF/ARJ – Admission Confirm/Reject
ARQ – Admission Request
AuF – Authentication Function
BCF/BRJ – Bandwidth Confirm/Reject
BICC – Bearer Independent Call Control
B-ISDN – Broadband ISDN
BRQ – Bandwidth Request
CIF – Common Intermediate Format
DCF/DRJ – Disengage Confirm/Reject
DRC – Direct Routed Call signaling
DRQ – Disengage Request
DTMF – Dual-Tone Multi-Frequency
GCF/GRJ – Gatekeeper Confirm/Reject
GEF – Generic Extensibility Framework
GK – Gatekeeper
GQOS – Guaranteed Quality of Service
GRC – Gatekeeper Routed Call signaling
GRQ – Gatekeeper Request
GSTN – General Switched Telephone Network
HLF – Home Location Function
IRQ – Information Request
IRR – Information Request Response
ISDN – Integrated Services Digital Network
ISUP – ISDN User Part
ITU – International Telecommunication Union
IWF – Inter-working Function
MC – Multi-point Controller
MCU – Multi-point Control Unit
MG – Media Gateway
MGC – Media Gateway Controller
MP – Multi-point Processor
N-ISDN – Narrow-band ISDN
PISN – Private Integrated Services Network
POTS – Plain Old Telephone Service
PSN – Packet Switched Network
PSTN – Public Switched Telephone Network
QCIF – Quarter Common Intermediate Format
QoS – Quality of Service
QSIG – Signaling between the Q reference points
RAS – Registration, Admission and Status
RCF/RRJ – Registration Confirm/Reject
RRQ – Registration Request
RTCP – RTP Control Protocol
RTP – Real-time Transport Protocol
SCN – Switched Circuit Network
SIP – Session Initiation Protocol
SQCIF – Sub Quarter Common Intermediate Format
TCP – Transmission Control Protocol
TSAP – Transport Service Access Point
UCF/URJ – Unregistration Confirm/Reject
UDP – User Datagram Protocol
URQ – Unregistration Request
VLF – Visitor Location Function
VoIP – Voice over Internet Protocol
VVPN – Voice Virtual Private Network

References
[1] Boaz Michaely: H.323 Overview, November 2000. http://www.packetizer.com/iptel/h323/papers/
[2] Chan-Hwa Wu and J. David Irwin: Emerging Multimedia Computer Communication Technologies, Prentice Hall, 1998, ISBN 0-13-079967-X.
[3] Databeam Corporation: A Primer on the H.323 Series Standard, 1999. http://www.packetizer.com/iptel/h323/primer/
[4] ITU-T: Recommendation H.323, 1998.
[5] ITU-T: Recommendation H.323, 2000.
[6] Olivier Hersent, David Gurle & Jean-Pierre Petit: IP Telephony: Packet-based multimedia communications systems, Pearson Education Limited 2000, ISBN 0-201-61910-5.
[7] Packetizer: H.323 Version 4 – Overview, 2001. http://www.packetizer.com/iptel/h323/whatsnew_v4.html
[8] Packetizer: H.323 Version 3 – Overview, 2001. http://www.packetizer.com/iptel/h323/whatsnew_v3.html
[9] Packetizer: H.323 Version 2 – Overview, 2001. http://www.packetizer.com/iptel/h323/whatsnew_v2.html
[10] Paul E. Jones: H.323 Past, Present and Future, January 2001. http://www.packetizer.com/iptel/h323/papers/
[11] Phillips Omnicom Training: Voice Over IP Training Material, 2000.
[12] Trillium: H.323, 2000. http://www.iec.org/tutorials/h323/
[13] Uyless D. Black: Voice Over IP, Prentice Hall PTR 2000, ISBN 0-13-022463-4.
Voice Quality in IP Telephony
Vesa Kosonen
P.O.Box 3000, FIN-02015 HUT, FINLAND
vesa.kosonen@hut.fi

Abstract
This paper was presented at the licentiate course in the Networking Laboratory of Helsinki University of Technology in April 2001. The topic of the course was 'IP Telephony'.
In this paper we will study voice quality in IP telephony. We will look at the causes of impairments along the end-to-end path and how to recover from them. We will also introduce common criteria for measuring voice quality. Some results based on measurements in our laboratory with different commercial VoIP phones will also be included.

1 Introduction
The improvement of voice quality has been one of the main interests in the telephony industry since the invention of the telephone in 1876. Delay and echo in particular have caused most problems. Nowadays the quality of SCN phones is very good. A codec using the G.711 standard with 8 kHz sampling frequency gives a MOS value of 4.2, while the theoretical maximum value is 5.0. By contrast, the voice quality of IP telephony is still far from the quality of SCN phones. To the surprise of many, however, IP telephony draws the attention of an ever increasing number of people. Telephone companies have also realized the potential of IP telephony, especially the threats that lie ahead of them.

The first applications of IP telephony were programs that made it possible to call anywhere in the world that had Internet access. One could use a personal computer to call another personal computer with the same program without ever entering the SCN. Because calling with this new invention was not as handy as with a normal phone, only a few people liked to use it even though it meant free calls. Voice quality varied from bad to moderate. Later on, new telephone operators appeared on the market offering low price overseas telephone services based on IP telephony. Now you could make a call from your own SCN telephone. Even though voice quality was only moderate, many people became interested. The same thing happened earlier with mobile telephones: mobility was favored despite lower voice quality.

Since then, new IP telephony solutions have come to market. Companies in particular benefit from the new technology: it allows cheap internal calls, and the same infrastructure can be used for data and voice. This is especially tempting since the amount of voice traffic is decreasing compared to data traffic.

2 End-to-End Route of a Voice Call

2.1 Scenarios from ETSI/TIPHON
ETSI/TIPHON has defined four different scenarios for making an IP call (see Appendix A). Scenario 0 defines a call from an IP network to another IP network. Scenario 1, on the other hand, defines a call from an IP network to the SCN. Scenario 2 is the opposite of scenario 1: a call from the SCN to an IP network. Finally, scenario 3 defines a call between two SCNs using the Internet between them.

2.2 Path of a Voice Call
We will use scenario 0 to show the route of an end-to-end voice call (Fig. 1). The analog speech of a caller is first transformed to digital bits. This is called A/D transformation and it is done by taking samples from the speech and quantising them. The bitstream is then encoded. Encoding, or speech coding, is the process of transforming digitized speech into a form that can be efficiently transported over the network. The reverse function of encoding is decoding, which is performed at the receiving end [2]. After encoding, the bits are framed. The size of the frame depends on the codec used. E.g. the G.723.1 codec uses 30 ms frames. Several frames are grouped together and packetized by adding the RTP+UDP+IP headers (12+8+20=40 bytes). Now the packets are ready to be sent to the Internet.

At the other end the header information is removed. While travelling through the Internet some delay is always introduced. The delay is not the same for all packets, which causes variation in delay, also called interarrival jitter. Packets might also be lost. A jitter buffer is used to correct those impairments. After a playout time the packets are deframed, decoded and transformed back to analog voice.
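
The packetization step of Section 2.2 can be made concrete with a small calculation. The 40-byte RTP+UDP+IP header and the 30 ms G.723.1 frame come from the text; the 6.3 kbit/s rate is one of the G.723.1 bit rates, while the number of frames per packet and the function names are assumptions chosen only for illustration.

```python
import math

HEADER_BYTES = 12 + 8 + 20  # RTP + UDP + IP headers, as in Section 2.2


def frame_payload_bytes(bitrate_bps, frame_ms):
    """Bytes needed for one codec frame, rounded up to whole bytes."""
    return math.ceil(bitrate_bps * frame_ms / 8000)


def packet_stats(bitrate_bps, frame_ms, frames_per_packet):
    """Packet size, header share and resulting IP-level bit rate."""
    payload = frame_payload_bytes(bitrate_bps, frame_ms) * frames_per_packet
    packet = payload + HEADER_BYTES
    packet_interval_ms = frame_ms * frames_per_packet
    return {
        "payload_bytes": payload,
        "packet_bytes": packet,
        "header_share": HEADER_BYTES / packet,
        "ip_bitrate_bps": packet * 8 * 1000 / packet_interval_ms,
    }


# G.723.1 at 6.3 kbit/s with 30 ms frames, one or two frames per packet.
for frames in (1, 2):
    print(frames, packet_stats(6300, 30, frames))
```

With one frame per packet the headers make up more than half of each packet; grouping more frames lowers that share but adds one frame time of packetization delay per extra frame, which is the trade-off behind the choice of how many frames to group.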
Figure 1. End-to-End route of voice in an IP-to-IP call [2]. (Sending side: Audio, A/D, Encoding, Framing, Packetization; Internet; receiving side: Depacketization, Jitter buffer, Deframing, Decoding, D/A, Audio)

3 Causes of Impairments and how to Improve Voice Quality
There are several factors that cause impairments to the end-to-end voice quality of IP telephony. We will consider five of them, namely delay, jitter, packet loss, echo and bandwidth usage. They occur either at the terminal or along the transmission path, or in both.

3.1 The Operating System
As we speak, the sound card samples the microphone signals and accumulates them into a memory buffer. When the buffer is full, the sound card tells the operating system with an interrupt signal that it can retrieve the buffer. There is a limit to how many interrupts an operating system allows. For instance, Windows allows an interrupt no more often than every 60 ms. This means that the buffer collects speech samples in chunks of at least 60 ms, which is the minimum delay introduced [3].

To avoid this problem some vendors use real-time operating systems, which allow as many interrupts as needed. Another way is to perform all real-time functions using dedicated hardware and perform only the control functions from the non real-time operating system [3].

3.2 Encoder/Decoder
Typically in a telephone conversation there are periods of silence. These silence periods do not contain useful information and are cut off. This is done with the help of a voice activity detector (VAD), which cuts the silence periods and sends only silence insertion descriptor (SID) frames. The other end adds the silence into the speech. This is a very efficient way of saving bandwidth since the estimated proportion of silence is close to 50% [2]. A bitstream also always contains redundant information that can be removed before sending, e.g. information that can be forecast by extrapolation or that is repeated at certain intervals. To save bandwidth further, the bitstream is also compressed, both payload and headers (from 40 bytes to 2 or 4 bytes).

The bit rate of a typical G.711 codec is 64 kbit/s. The quality is good but the problem is the high bandwidth usage. Codecs with lower bit rates have been developed. E.g. the G.723.1 codec has a bit rate of only 5.3/6.3 kbit/s. Yet the voice quality is almost the same (MOS value 3.7/3.9).

3.3 Audio equipment
There are two kinds of echo, namely talker echo and listener echo. Talker echo means that the speaker hears his/her own voice, but delayed and attenuated. It can be caused by electric (hybrid) echo or by acoustic echo picked up at the listener side. If the talker's echo is reflected twice, the listener will hear the talker's voice twice: a loud signal first, then an attenuated and much delayed one. This is called listener echo [3].

An IP phone can be either a separate microphone and loudspeakers, a personal computer together with a headset, or it may look like an ordinary SCN phone. When a separate microphone and loudspeakers are used, there will be acoustic echo, which can be eliminated with the help of an acoustic echo canceller (AEC). Typical values for acoustic echo attenuation are 10-15 dB in loudspeaker phones, 35-40 dB in hands-free phones and 35-40 dB in phones with a good quality handset [3]. If the two other types of phones are used then acoustic echo doesn't exist.

3.4 An IP/ISDN Call
When a call is made between an IP phone and an ISDN phone there is a need to use a gateway. The gateway makes the mapping from the IP network to the ISDN network and vice versa. This introduces some delay.

3.5 An IP/PSTN Call
If a call is made from an IP phone to a PSTN phone we also need to use a gateway. In addition to that PSTN
23
network will also introduce electric echo, which is caused by the
2-wire/4-wire transformation. The phones at the user side use only two
wires but the network uses four wires, so the 2-wire/4-wire transformation
has to be performed. The electric echo can be eliminated with an electrical
echo canceller (EEC), which should be positioned as close to the user as
possible [3].

3.6 Jitter Buffer
The jitter buffer is used to eliminate the impairments caused by the
transmission path. The Real-time Transport Protocol (RTP) was developed to
handle the situations that packets may encounter as they travel through the
Internet. If packets are lost, codecs try to hide that. This procedure is
called 'error concealment'. Lost packets are replaced by interpolating from
the previous packet [2]. This prevents gaps in the speech. Packets that
arrive in the wrong order or are delayed are resequenced with the help of
the RTP time stamp and sequence number. The size of the jitter buffer can
be adjusted, and it is a trade-off between delay and voice quality. If the
jitter buffer is long it has time to wait for delayed packets, but in that
case it introduces more delay. If the buffer is kept short to minimize
delay, not all packets arrive in due time, and that causes gaps in the
speech.

4 Methods to Assess Voice Quality

4.1 Mean Opinion Score
In order to compare the voice quality of different telephony systems we
need some common criteria. One possibility is to assess voice quality
subjectively with the help of the MOS (Mean Opinion Score) scale. Voice
quality is given values between 1 and 5. Table 1 below shows the MOS values
of the most common ITU-T standardized codecs.

Table 1. MOS values of the most common ITU-T standardized speech codecs [3].

Standard    Bitrate (in kbit/s)    MOS value
G.711       64                     4.2
G.726       32                     4.0
G.728       16                     4.0
G.729       8                      4.0
G.723.1     6.3/5.3                3.9/3.7

4.2 E-model
The E-model (ITU-T standard G.107) was originally developed by ETSI. It is
a computational tool to assess end-to-end voice quality. It was developed
for network planners, to help ensure that users will be satisfied with
end-to-end transmission performance while avoiding over-engineering of
networks [4]. The model estimates the conversational quality from mouth to
ear as perceived by the user at the receive side, both as listener and
talker. The primary output from the model is the "Rating Factor" R. The
model combines the effect of several impairment factors instead of
considering them separately [4]. The Rating Factor R is defined as follows:

R = R_0 - I_s - I_d - I_e + A    [4]

Where
- R_0 represents the basic signal-to-noise ratio, including noise sources
  such as circuit noise and room noise
- I_s is the combination of all impairments that occur simultaneously with
  the voice signal, such as quantization distortion or too loud a side tone
- I_d represents impairments caused by delay, including impairments caused
  by talker and listener echo or by loss of interactivity
- I_e represents the impairments caused by the use of special equipment,
  such as low bit rate codecs, or by e.g. packet loss [4]
- A is the advantage factor, which expresses the decrease in the rating R
  that a user is willing to tolerate, i.e. how much lower voice quality is
  accepted; e.g. the A factor for mobile telephony is 10 [5] and for
  multi-hop satellite connections A is 20 [4].

The values of the rating factor R lie between 0 and 100, where R = 0 means
extremely bad quality and R = 100 means very high quality. The values of R
can also be compared with MOS values and user satisfaction as shown in
Table 2. The lower limit of each R range is included but not the upper
limit.

Table 2. Comparing R, MOS and user satisfaction according to [4], [6]

R-value     R-value        MOS-value        User Satisfaction
            (attribute)    (lower limit)
90 - 100    Best           4.34             Very satisfied
80 - 90     High           4.03             Satisfied
70 - 80     Medium         3.60             Some dissat.
60 - 70     Low            3.10             Many dissat.
50 - 60     Poor           2.58             Nearly all dissat.

PSTN quality is an example of a desirable level of voice quality. It is
described as “good intelligibility, good speaker identification,
naturalness, only minor disturbing impairments” [7]. For a call to be
considered to be of PSTN quality, rating factor values R >= 70 should be
reached. The G.107 standard lists the default values, which
are recommended to be used for all parameters that don't vary during the
calculation. If only default values are used, the calculation results in a
very high quality with a rating factor of R = 93.2 [4]. In that case (and
if echo is perfectly controlled, that is echo loss = ∞) a call retains its
quality up to a mouth-to-ear delay of 150 ms. Delay values even up to
400 ms are still within the limits of PSTN quality [5].

5 Measurements

5.1 A Practical Test on Delay

To get an idea of the end-to-end delay in different telephony systems you
can perform the following rough, indicative test. Make a call with an ISDN
phone to another ISDN phone. Start counting. When the other person hears
you say 'one' he should say 'two'. When you hear the other person say 'two'
you should say 'three' and so on. Count to fifty and note the time elapsed.
You should repeat the test several times. Then make another call with your
mobile phone to another mobile phone and repeat the same procedure. Compare
the times [3].

Table 3 lists the results of these measurements, including the same
measurements done in our laboratory environment. It shows that the
end-to-end delay in the ISDN network is lower than in the GSM network. On
the other hand, the Selsius/Cisco phone seems to have a lower delay than
NetMeeting.

Table 3. A Practical Test on Delay

                       Time it takes to count to fifty
ISDN                   32 s
GSM                    42 s
Selsius/Cisco phone    40 s
NetMeeting             55 s

5.2 The Measuring Environment

We have measured some commercial VoIP phones (Selsius/Cisco IP phone and
Microsoft's NetMeeting). The phones were connected to a 10 Mbit/s Ethernet
LAN (Fig. 2).

[Figure 2 labels: Selsius/Cisco IP phone, virtual Selsius/Cisco IP phone,
NetMeeting, Selsius server, DNA-323 analyzer, hub]
Figure 2. Measurement arrangements.

We used Dummynet software to simulate different real-life situations by
altering its parameters, such as delay, bandwidth and packet loss. The
packets were captured and analyzed by the DNA-323 analyzer software. Before
introducing some of the results we will explain the key concepts, namely
packet spacing difference (D(i)) and jitter (J).

Packet spacing difference is defined as the spacing between consecutive
received packets minus the spacing between the corresponding consecutive
sent packets (Fig. 3).

D(i) = (R_i - R_{i-1}) - (S_i - S_{i-1})    [8]

When delay is constant the value of D(i) is zero. But when delay varies,
the spacing of the packets at the receiving end varies, too. The parameter
that describes this difference is called delay variance or jitter J. There
is a connection between D(i) and J as follows:

J_n = J_{n-1} + 1/16 * (|D(i)| - J_{n-1})    [8]

[Figure 3 labels: sender (S1 ... S6), receiver arrival (R1 ... R6),
playout (P1 ... P6)]
Figure 3. Synchronization in Jitter Buffer.

Jitter gives the size of the jitter buffer that is needed to synchronize
the packets before they are played out. The playout time depends on the
position of the first packet as follows:
P_n = P_{n-1} + (S_i - S_{i-1}), for n > 1    [8]

Where
- R_i is the arrival time of the received packet
- S_i is the generation time of the packet
- P_n is the playout time of the packet

5.3 Some Measuring Results

In Appendix B you will find some measuring results for the Selsius/Cisco
phone and NetMeeting. We measured the packet spacing difference and jitter
without any load (Dummynet parameters: bandwidth = 10 Mbit/s, packet
loss = 0 and delay = 0 ms) and then with load by altering the parameters
separately.

5.3.1 Measurements without load
Without load the voice quality of the Selsius/Cisco phone (Fig. 4 in
Appendix B) was good and there was no noticeable delay either. The only
inconvenience was the clipping of voice when not talking, but it faded out
if there was some background noise. The voice quality was nevertheless
lower than SCN quality. The summary of statistics is shown in Table 4.

The voice quality of NetMeeting (Fig. 5 in Appendix B), on the other hand,
was considerably worse than that of the Selsius/Cisco phone. There was a
clearly noticeable delay and the tone of the voice was softer. The graphs
of packet spacing difference and jitter are quite different. The graph of
packet spacing difference has two peaks. This is due to variance in delay
and packet loss.

Table 4. Statistics of D and J without load

            Selsius    Selsius    NetM.      NetM.
            D [ms]     J [ms]     D [ms]     J [ms]
Average     0.0156     0.3986     0.021      20.026
St.Dev.     0.6694     0.1775     21.041     1.9179
Var.        0.4480     0.0315     442.703    3.6785

5.3.2 Measurements with load
Table 5 shows the effects of packet loss on voice quality. See also
figures 6 and 7 in Appendix C.

Table 5. Effects of introducing packet loss.

Packet loss    Selsius/Cisco                      NetMeeting
20 %           Small crackings                    Small crackings
25 %           Gaps in speech                     Gaps in speech
30 %           It took a few seconds to           Big gaps in speech
               connect, more gaps
35 %           Severe gaps in speech, the         Speech difficult to
               connection was cut                 understand

As can be seen from the figures, the Selsius phone seems to be more robust
than NetMeeting. The shape of the packet spacing difference curve of the
Selsius phone remains the same even when load was applied, whereas the same
curve of NetMeeting has changed considerably.

Table 6 shows how decreasing bandwidth affects voice quality. See also
figures 8 and 9 in Appendix D.

Table 6. Effects of changing bandwidth.

Bandwidth    Selsius/Cisco                  NetMeeting
80 kbit/s    Noticeable decrease in         Noticeable decrease in
             quality                        quality
60 kbit/s    Difficult to understand        Difficult to understand
50 kbit/s    Nearly impossible to           More difficult to
             understand                     understand
20 kbit/s    ----                           Nearly impossible to
                                            understand

References
[1] ETSI, Telecommunications and Internet Protocol Harmonization Over
Network (TIPHON); End to end Quality of Service in TIPHON Systems; Part 1:
General Aspects of Quality of Service (QoS), France, 2000 (TR 101 329-1,
V 3.1.1 (2000-07)).
[2] Selin, J.: Media Management in IP Telephony Systems, Master Thesis of
the Networking Laboratory, Helsinki University of Technology, Espoo,
Feb. 2001.
[3] Hersent O et al, IP Telephony: Packet-based multimedia communications
systems, Great Britain, 2000, ISBN 0-201-61910-5.
[4] ITU-T Recommendation G.107, The E model, a computational model for use
in transmission planning, 2000.
[5] Janssen J et al, Delay and Distortion Bounds for Packetized Voice Calls
of Traditional PSTN Quality, Proceedings of the 1st IP-Telephony Workshop
(IPTel 2000), Berlin, 2000.
[6] ETSI, Telecommunications and Internet Protocol Harmonization Over
Network (TIPHON); End to end Quality of Service in TIPHON Systems; Part 2:
Definition of Quality of Service (QoS) Classes, France, 2000 (TR 101 329-2,
V 1.1.1 (2000-07)).
[7] ITU-T Recommendation G.113, Transmission Impairments, 1996.
[8] Yletyinen, T.: The Quality of Voice over IP, Master Thesis of the
Laboratory of Telecommunication Technology, Helsinki University of
Technology, March 1998.
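As an illustration of Section 4.2, the rating factor and its relation to
Table 2 can be sketched in a few lines of Python. This is a minimal sketch
and not part of the measurements in this paper. The function names are
hypothetical, the input values in the usage note are purely illustrative,
and the R-to-MOS conversion is the one given in ITU-T G.107 [4], which the
text above does not reproduce; it is included here only because it matches
the lower-limit MOS values listed in Table 2.

```python
def rating_factor(r0, i_s, i_d, i_e, a=0.0):
    """R = R0 - Is - Id - Ie + A, as defined in Section 4.2."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Estimated MOS for a rating factor R (conversion assumed from G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Lower limits of the R ranges and the user satisfaction labels of Table 2.
TABLE_2 = [
    (90, "Very satisfied"),
    (80, "Satisfied"),
    (70, "Some dissatisfied"),
    (60, "Many dissatisfied"),
    (50, "Nearly all dissatisfied"),
]


def user_satisfaction(r):
    """Return the Table 2 label for R, or None below the range of the table."""
    for lower_limit, label in TABLE_2:
        if r >= lower_limit:
            return label
    return None
```

For example, with the made-up inputs rating_factor(94.0, 1.0, 5.0, 15.0)
the rating is 73.0, which falls in the "Medium" row of Table 2 and just
above the R >= 70 limit quoted for PSTN quality.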
Appendix A. Scenarios of End-to-End Voice Call by ETSI/TIPHON [1]
[Figure: four call scenarios drawn from the elements TIPHON Terminal,
IP Access, IP Network, IFW and SCN network]
Scenario 0: Call from IP network to IP network
Scenario 1: Call from IP Network to SCN (call initiated from IP Network to
SCN)
Scenario 2: Call from SCN to IP Network (call initiated from SCN to
IP Network)
Scenario 3: Call from SCN to SCN over IP Network
Appendix B. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with no load
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 4. Selsius/Cisco IP phone with no restrictions (Bandwidth = 10 Mbit/s, Delay = 0 ms, Packet loss = 0%)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and
bottom right is the histogram of J [ms].
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 5. NetMeeting program with no restrictions (Bandwidth = 10 Mbit/s, Delay = 0 ms, Packet loss = 0%)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and
bottom right is the histogram of J [ms].
Appendix C. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with packet loss 25 %
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 6. Selsius/Cisco IP phone with Packet loss = 25 % (Bandwidth = 10 Mbit/s, Delay = 0 ms)
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 7. NetMeeting program with Packet loss = 25 % (Bandwidth = 10 Mbit/s, Delay = 0 ms)
Appendix D. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with bandwidth 80 kbit/s
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 8. Selsius/Cisco IP phone with Bandwidth = 80 kbit/s (Delay = 0 ms, Packet loss = 0 %)
[Plot: D and J [ms] versus packet number (upper panel); histograms of
D [ms] and J [ms], number of packets (lower panels)]
Figure 9. NetMeeting program with Bandwidth = 80 kbit/s (Delay=0 ms, Packet loss=0%)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and
bottom right is the histogram of J [ms].
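
The quantities plotted in Appendices B to D follow directly from the
definitions in Section 5.2. The Python sketch below shows one way to
compute D(i), the running jitter J and the playout times P from lists of
send and arrival times. It is only an illustration under stated
assumptions: the function names, the starting value J = 0 and the choice
of the first playout time are hypothetical, and this is not the DNA-323
analyzer code used for the measurements.

```python
def packet_spacing_differences(send_times, arrival_times):
    """D(i) = (R_i - R_{i-1}) - (S_i - S_{i-1}) for i = 1 .. n-1."""
    return [
        (arrival_times[i] - arrival_times[i - 1])
        - (send_times[i] - send_times[i - 1])
        for i in range(1, len(send_times))
    ]


def running_jitter(differences, initial=0.0):
    """J_n = J_{n-1} + (|D(i)| - J_{n-1}) / 16, starting from `initial`."""
    jitter = initial
    series = []
    for d in differences:
        jitter += (abs(d) - jitter) / 16
        series.append(jitter)
    return series


def playout_times(send_times, first_playout):
    """P_n = P_{n-1} + (S_n - S_{n-1}); P_1 is chosen by the receiver."""
    times = [first_playout]
    for i in range(1, len(send_times)):
        times.append(times[-1] + (send_times[i] - send_times[i - 1]))
    return times
```

With send times [0, 20, 40, 60] ms and arrival times [5, 25, 50, 65] ms,
the third packet arrives 5 ms late, so D is [0, 5, -5] and the jitter
estimate rises slowly instead of jumping. The playout schedule keeps the
sender's 20 ms spacing regardless of the arrival pattern.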
Voice in Packets: RTP, RTCP, Header Compression, Playout
Algorithms, Terminal Requirements and Implementations
Jani Lakkakorpi
Nokia Research Center
P.O. Box 407
FIN-00045 NOKIA GROUP
Finland
jani.lakkakorpi@nokia.com
Abstract

The RTP/RTCP protocol suite provides the means for sending packetized voice
by introducing time stamps and sequence numbers in packet headers. Playout
buffering is needed to re-synchronize the received voice stream. In this
paper, a new adaptive playout delay adjustment algorithm is introduced.

A major problem with Voice over IP (VoIP) packets, especially on
low-bandwidth links, is that they include a lot of overhead. The solution
is header compression, which is done on a link-by-link basis.

All terminals that support real-time interactive voice should have enough
processing power. The computational requirements of voice codecs usually
increase with the voice compression ratio.

1 Introduction

During an average telephone conversation, each party usually talks only
35 per cent of the time. Most of the techniques that are used today to
transform voice into data have the ability to detect silent periods. This
allows asynchronous voice transmission and statistical multiplexing
[Her00].

With statistical multiplexing, bandwidth can be used more efficiently.
However, statistical multiplexing also introduces some uncertainty in the
network. This uncertainty is variation in delay, also known as jitter. It
needs to be corrected by the receiving side by adding some playout delay in
order to restore the original packet spacing. Otherwise the speech would
sound incomprehensible.

The Real-Time Transport Protocol (RTP) gives us the means to re-synchronize
a voice stream. Each RTP packet is equipped with a timestamp and a sequence
number. Chapter 2 describes RTP and its companion protocol, the RTP Control
Protocol (RTCP). Chapter 3 addresses the problem of RTP header compression;
the overhead can be really large, as we will see. Chapter 4 introduces some
playout algorithms developed for packetized voice, and chapter 5 gives a
short overview of the terminal requirements and implementations for Voice
over IP.

2 RTP and RTCP

The Real-Time Transport Protocol (RTP) provides end-to-end transport
functions suitable for applications that transmit real-time data, such as
audio or video, over multicast or unicast networks. RTP does not provide
any Quality of Service (QoS) guarantees; it is only responsible for
synchronizing the received packets, within a single stream or across two
streams. To achieve this, RTP packets are equipped with timestamps and
sequence numbers.

RTP is designed to work together with the RTP Control Protocol (RTCP) to
get feedback on the quality of data transmission and information about
participants in the on-going session. Request For Comments (RFC) 1889 (and
its later revisions) includes complete descriptions of these protocols and
their uses [Sch96a].

2.1 RTP

The first twelve octets in an RTP header (the first three rows in Figure 1)
are included in all RTP packets, while the list of Contributing Source
identifiers (CSRC) is present only when inserted by a mixer (see
Section 2.2).

• Version (V, 2 bits)
This field identifies the RTP version. The current version, defined in
RFC 1889, is two.
• Padding (P, 1 bit)
If the padding bit is set, the packet contains one or more padding octets
at the end of the payload. The last octet of the payload contains the
number of padding octets.

  V | P | X | CC | M | PT | Sequence Number
  Timestamp
  Synchronization Source (SSRC) Identifier
  Contributing Source (CSRC) Identifiers
  …
  Profile-specific Extensions
Figure 1. RTP Header Format

• Extension (X, 1 bit)
If the extension bit is set, the fixed RTP header and possible CSRCs are
followed by extensions that use the format defined in RFC 1889.

• CSRC count (CC, 4 bits)
The CSRC count contains the number of Contributing Source identifiers that
follow the fixed header. This number is usually zero [Her00].

• Marker (M, 1 bit)
The interpretation of the marker bit is defined by the RTP profile. The
marker bit is intended for marking significant events, such as frame
boundaries, in the packet stream. H.225.0, for example, says that for audio
codings supporting silence suppression, the marker bit must be set to one
in the first packet of each talkspurt after a silence period [Her00].

• Payload Type (PT, 7 bits)
This field identifies the format of the RTP payload, and determines its
interpretation by the application. A profile specifies a default static
mapping of payload type codes to payload formats. An initial set of default
mappings for audio and video is specified in RFC 1890 [Sch96b].

• Sequence Number (16 bits)
The sequence number starts from a random value and is incremented by one
for each RTP packet sent. The sequence number is used by the receiver to
detect packet losses and to restore the packet sequence.

• Timestamp (32 bits)
The timestamp reflects the sampling instant of the first payload octet. The
clock frequency is defined for each payload type, and the clock is
initialized with a random value [Her00].

• SSRC (32 bits)
The SSRC field identifies the synchronization source. This identifier is
chosen randomly, with the intent that no two synchronization sources within
the same RTP session have the same SSRC identifier.

• CSRC list (0 to 15 items, 32 bits each)
The CSRC list identifies the contributing sources for the payload contained
in this packet. The number of identifiers is given by the CC field. At most
15 sources can be identified. CSRC identifiers are inserted by mixers,
using the SSRC identifiers of the contributing sources.

2.2 Mixers and Translators

RTP-level relays called mixers re-synchronize incoming packets in order to
reconstruct the packet spacing generated by the sender, mix these
reconstructed streams into a single stream, possibly translate the
encoding, and then forward the packet stream. These packets might be
unicast to a single recipient or multicast to multiple recipients. The RTP
header includes the means for mixers to identify the sources that
contributed to a mixed packet so that correct talker indication, for
example, can be provided at the receiver.

Another type of RTP-level relay, the translator, just takes a stream and
passes it through. A translator can be used, for example, in a situation
where the receiver is behind a firewall.

2.3 RTCP

The RTP Control Protocol (RTCP) is based on the periodic transmission of
control packets to all participants in the session. The underlying protocol
must provide multiplexing of data and control packets, for example, by
using separate port numbers with UDP (User Datagram Protocol). RTP is
usually assigned an even UDP port, and RTCP the next odd UDP port.
RTCP performs three mandatory functions:
1. The primary function is to provide feedback on the quality of the data
   distribution. This function is performed through sender and receiver
   reports.

2. RTCP carries a persistent transport-level identifier for an RTP source,
   called Canonical Name (CNAME). Since the SSRC identifier may change, all
   receivers require the CNAME to keep track of each participant.

3. Since the first two functions require that all participants in a session
   send RTCP packets, the RTCP packet rate must be controlled in order to
   scale up to a large number of participants. Each participant can
   independently observe the number of other participants and thus control
   its RTCP packet rate. The maximum rate at which a participant can send
   RTCP reports is one per five seconds.

It is recommended that translators and mixers combine individual RTCP
packets from multiple sources into compound packets whenever feasible.

RFC 1889 defines several RTCP packet types to carry control information:

• SR (Sender Report) contains transmission and reception statistics for
  active senders
• RR (Receiver Report) contains reception statistics for participants that
  are not active senders
• SDES (Source Description Items) describes various parameters about the
  source, including the CNAME
• BYE packet is sent by a participant when leaving the session
• APP: application specific functions

Each RTCP packet begins with a fixed part, followed by structured elements
that may be of variable length according to the packet type, but always end
on a 32-bit boundary.

2.3.1 Sender and Receiver Reports

RTP receivers provide reception quality feedback using RTCP report packets
of two types. The only difference between sender reports (SR) and receiver
reports (RR), besides the packet type code, is that the sender report
includes a 20-byte sender information section (highlighted in Figure 2). A
sender report is issued if the participant has sent at least one RTP packet
during the last report period; otherwise a receiver report is issued.

Both the sender report and the receiver report include reception report
blocks, one for each of the synchronization sources from which this
participant has received RTP data packets since the last report. Reports
are not issued for contributing sources listed in the CSRC list. Each
reception report block provides statistics about the data received from the
particular source indicated in that block.

  V | P | RC | PT=SR=200 | Length
  SSRC of Sender
  NTP Timestamp, Most Significant Word
  NTP Timestamp, Least Significant Word
  RTP Timestamp
  Sender's Packet Count
  Sender's Octet Count
  SSRC_n (SSRC of nth Source)
  Fraction Lost | Cumulative Number of Packets Lost
  Extended Highest Sequence Number Received
  Interarrival Jitter
  Last SR (LSR)
  Delay Since Last SR (DLSR)
  …
  Profile-specific Extensions
Figure 2. RTCP Sender Report

The first section of the sender report, called the header, is 8 octets
long.

• Version (V, 2 bits)
Identifies the current version (which is the same for RTCP and RTP
packets). The version defined in RFC 1889 is two.

• Padding (P, 1 bit)
If the padding bit is set, this RTCP packet contains some additional
padding octets at the end which are not part of the control information.
The last
octet of the packet contains the number of these padding octets.

• Reception Report Count (RC, 5 bits)
The number of reception report blocks contained in this packet. A value of
zero is valid.

• Packet Type (PT, 8 bits)
Contains the constant 200 to identify this packet as an RTCP sender report.

• Length (16 bits)
Length of this RTCP packet in 32-bit words minus one. (Includes the header
and any padding.)

• SSRC (32 bits)
Synchronization source identifier for the originator of this sender report.

The second section, sender information, is 20 octets long (five rows in
Figure 2) and is present in all sender reports. It summarizes the data
transmissions from this sender.

• NTP Timestamp (64 bits)
Indicates the wallclock time when this report was sent.

• RTP Timestamp (32 bits)
Corresponds to the same time as the NTP timestamp, but in the same units,
and with the same random offset, as the RTP timestamps in data packets.
This correspondence may be used for intra- and inter-media synchronization
for sources whose NTP timestamps are synchronized.

• Sender's Packet Count (32 bits)
The total number of RTP data packets transmitted by the sender since
starting transmission up until the time this sender report was generated.
The count is reset if the sender changes its SSRC identifier.

• Sender's Octet Count (32 bits)
The total number of payload octets (not including header or padding)
transmitted in RTP data packets by the sender since the start of
transmission up until the time this sender report was generated. The count
is reset if the sender changes its SSRC identifier. This field can be used
to estimate the average payload data rate.

The third section contains reception report blocks. The number of these
blocks depends on the number of other sources that this sender has been
listening to since the last report.

• SSRC_n (Source Identifier, 32 bits)
The SSRC identifier of the source that we are reporting about.

• Fraction Lost (8 bits)
The fraction of RTP data packets from source SSRC_n that were lost since
the previous sender or receiver report was sent. If the loss is negative
due to duplicates, the fraction lost is set to zero.

• Cumulative Number of Packets Lost (24 bits)
The total number of lost packets from source SSRC_n since the beginning of
reception. This figure is defined to be the number of packets expected
minus the number of packets actually received. The number of packets
received also includes late and duplicate packets. Thus packets that arrive
late are not counted as lost, and the loss may be negative if there are
duplicates. The number of packets expected is defined to be the last
extended highest sequence number received minus the initial sequence number
received. This may be calculated as shown in RFC 1889 [Sch96a].

• Extended Highest Sequence Number Received (32 bits)
The low 16 bits contain the highest sequence number received in an RTP
packet from source SSRC_n, and the most significant 16 bits extend that
sequence number with the corresponding count of sequence number cycles.

• Interarrival Jitter (32 bits)
An estimate of the variance of the RTP packet interarrival time, measured
in timestamp units and expressed as an unsigned integer.

Interarrival jitter is calculated from the difference in the relative
transit time of two packets. The relative transit time is the difference
between the packet's RTP timestamp and the receiver's clock at the time of
arrival, measured in the same units. If S_i is the RTP timestamp of packet
i and R_i is the time of arrival of packet i (in RTP timestamp units), the
difference in packet spacing for the two packets, i and j, can be expressed
as:
D( i , j ) = ( R j − Ri ) − ( S j − Si ) . Header compression is based on the simple idea that
since most of the data packet overhead is constant for a
given stream, it is possible to negotiate a shorter index
Interarrival jitter is updated each time when a for those constants (e.g. source and destination IP
packet is received from source SSRC_n (using the addresses and ports) when the stream is set up [Her00].
difference in packet spacing for that packet and the Other (variable) values can be reconstructed at the
previous packet) according to the following receiving end. To put it short: the sending host replaces
formula: the large RTP/UDP/IP header to a small index, and the
receiving host reverses this operation. An RTP/UDP/IP
header compression mechanism for low-speed links is
J = J + ( D(i −1,i ) − J ) / 16 . described in RFC 2508. In many cases, all three
headers can be compressed to 2-4 bytes. The
When the reception report is issued, the current compression is done on a link-by-link basis [Cas99].
value of J is sampled.
3.1 RFC 2508

• Last SR Timestamp (LSR, 32 bits)
The compression algorithm proposed in RFC 2508
The middle 32 bits of the NTP timestamp of the draws heavily upon the design of TCP/IP header
most recent RTCP sender report from source SSRC_n. If no sender report has been received yet, the field is set to zero.

• Delay Since Last SR (DLSR, 32 bits)

The delay (expressed in units of 1/65536 seconds) since the last sender report arrived. Together with the last SR timestamp, the sender of this last SR can use it to compute the round trip time. If no SR packet has been received yet from SSRC_n, the DLSR field is set to zero.

2.3.2 Receiver Report RTCP Packet

The receiver report (RR) has the same format as the sender report, except that the packet type field contains the constant 201 and the five words of sender information are omitted (these are the NTP and RTP timestamps and the sender's packet and octet counts).

3 Header Compression

The high overhead of RTP/UDP/IP packets is a challenging issue, especially on slow links [Her00]. For example, a popular video-conferencing application, Microsoft NetMeeting [Net01], uses the G.723.1 voice codec, where a frame of 24 bytes is sent every 30 ms. This produces a data rate of 6.3 kbit/s. Since the RTP/UDP/IP headers add at least 40 bytes of overhead, and the link layer adds some bytes as well (PPP+HDLC add four bytes), the resulting bit rate will be over 18 kbit/s. A common trade-off for reducing the bit rate is to put several frames in a single packet. However, this can raise the conversational delay to a level that is far too high for most users. The overhead issue can also be solved by using header compression [Cas99].

compression, which is described in RFC 1144 [Jac90].

3.1.1 The Basic Idea

In TCP/IP header compression, it has been observed that about half of the header bytes remain constant over the duration of the connection. After the header has been sent uncompressed once, the constant fields can be excluded from the following compressed headers. Headers can be further compressed with the help of differential coding on the changing fields.

In RTP header compression, some of the aforementioned techniques can be applied, but the major gain comes from the fact that although several fields change in every packet, the difference from packet to packet is often constant. If the compressor and decompressor maintain both the uncompressed header and the first-order differences, the only information that must be conveyed is an indication that the second-order difference was zero. If that is the case, the decompressor can reconstruct the original header without any loss of information by adding the first-order differences to the uncompressed header as each compressed packet is received.

3.1.2 Header Compression for RTP/UDP/IP Packets

In the IPv4 header, only the total length, packet ID, and header checksum fields typically change. The total length can be excluded, because it is provided by the link layer. Since the RFC 2508 compression scheme depends upon the link layer to provide good error detection, the header checksum may also be excluded. In order to maintain lossless compression, changes in the packet ID are transmitted. The packet ID is usually incremented by one for each packet. In the IPv6 base header, neither packet ID nor header checksum exists, and only the payload length field changes.
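The idea of conveying only "the second-order difference was zero" can be illustrated with a minimal sketch. It is not the RFC 2508 wire format: the `Context` layout, the function names and the sample header values are assumptions made for illustration, only the RTP sequence number and timestamp are modelled, and counter wrap-around is ignored.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Session context kept by both compressor and decompressor (hypothetical layout)."""
    seq: int        # last RTP sequence number
    ts: int         # last RTP timestamp
    d_seq: int = 0  # first-order difference of the sequence number
    d_ts: int = 0   # first-order difference of the timestamp


def compress(ctx, seq, ts):
    """Return None if the second-order differences are zero, else the new first-order differences."""
    d_seq, d_ts = seq - ctx.seq, ts - ctx.ts
    changed = (d_seq, d_ts) != (ctx.d_seq, ctx.d_ts)
    ctx.seq, ctx.ts, ctx.d_seq, ctx.d_ts = seq, ts, d_seq, d_ts
    return (d_seq, d_ts) if changed else None


def decompress(ctx, update):
    """Rebuild the header fields by adding the stored first-order differences."""
    if update is not None:
        ctx.d_seq, ctx.d_ts = update
    ctx.seq += ctx.d_seq
    ctx.ts += ctx.d_ts
    return ctx.seq, ctx.ts


def demo():
    # Assumed example: 240 samples per packet, with one silence gap before packet 104.
    headers = [(100, 0), (101, 240), (102, 480), (103, 720), (104, 1680), (105, 1920)]
    tx = Context(*headers[0])  # the first header is assumed to be sent uncompressed
    rx = Context(*headers[0])
    updates = []
    for seq, ts in headers[1:]:
        update = compress(tx, seq, ts)
        assert decompress(rx, update) == (seq, ts)  # lossless reconstruction
        updates.append(update)
    return updates
```

In this sketch `None` stands for a compressed header that carries no difference information at all; differences are sent only when the spacing of the packets changes.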
In the UDP header, the length field is redundant with the IP total length field and the length indicated by the link layer. The UDP checksum field will be zero if the source does not generate UDP checksums. Otherwise, the checksum must be sent intact in order to preserve lossless compression.

In most RTP headers, only the sequence number and timestamp change from packet to packet. If packets are not lost or misordered, the sequence number is incremented by one for each packet. For audio packets of constant duration, the timestamp is incremented by the number of sample periods conveyed in each packet.

If the second-order differences of the sequence number and timestamp fields are zero, the next packet header can be constructed from the previous header by adding the first-order differences (that are stored in the session context along with the uncompressed header) for these fields.

The marker bit is set on the first packet of an audio talkspurt. If it were treated as a constant field, such that each change would require sending the full RTP header, the compression would become quite inefficient. Because of this, one bit in the compressed header is reserved for the marker bit.

3.2 Other Proposals

A number of other RTP/UDP/IP header compression mechanisms have emerged after RFC 2508. They all should perform slightly better than the mechanism described in RFC 2508. Some of the most recent proposals are Ericsson's ROCCO [Lar00] and Nokia's ACE [Khi00].

4 Playout Algorithms

4.1 Playout Delay

In most packet audio applications, packets are buffered at the receiving host in order to compensate for variable network delay. The receiver buffer sizes can be constant or adaptively adjusted. Keeping the delay as small as possible while avoiding excessive packet losses is not an easy task. The results of [Ram94] indicate that an adaptive algorithm, which explicitly adjusts to the sharp, spike-like increases in packet delay, can achieve a lower rate of lost packets.

Adaptive playout delay can be either per-talkspurt or per-packet based. In the former approach, playout delay remains constant throughout the talkspurt and the adjustments are done between talkspurts. The latter approach introduces gaps in speech, and thus it is not recommended for VoIP [Yle97].

Three different playout delay adjustment algorithms for packetized voice are presented in [Moo98]. The paper is focused on the tradeoff between packet playout delay and packet playout loss. The authors present an adaptive delay adjustment algorithm that tracks the network delay of recently received packets and maintains delay percentile information.

Some playout delay adjustment algorithms assume that the sender and receiver clocks are synchronized, but in [Moo98] this is not the case. The propagation delay is removed from the end-to-end delay by subtracting out the minimum of the measured end-to-end delays. Thus it is possible to concentrate on the variable delay component.

In the following, we present a similar, although somewhat simpler, algorithm for adaptive playout delay adjustment. This algorithm does not assume synchronization of the sender and receiver clocks.

The waiting time in the playout buffer is calculated with the following algorithm:

All packets are played out at:

PlayAt_i = ReceivedAt_i + Twait_i

For the first packet of the connection, the playout delay is constant (given by the user):

Twait_0 = twait

For other packets, the waiting time is calculated as follows:

Twait_i = (TStamp_i - TStamp_{i-1}) - (ReceivedAt_i - PlayAt_{i-1})

If the result is negative, the packet is discarded.

Whenever the playout delay is adjusted, it will be the maximum of the initial playout delay and the current playout delay minus the minimum Twait of the latest measurement period.

The following events will trigger the playout delay adjustment:

• If N or more packets among the last M packets (measurement period) arrive late, playout delay is adjusted upwards when the next talkspurt arrives.

• Similarly, if M successive packets have all been received in time, playout delay is adjusted downwards before the next talkspurt.

Table 2 shows some simulation results for constant and adaptive playout delay. Network delay was simply modeled with an exponential distribution with a mean of 30 milliseconds. Parameters used were: N = 2, M = 100, twait = 100 ms. Simulation duration was 200 seconds.
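The adjustment rule above can be tried out with a small simulation sketch. It is one reading of the algorithm, not the simulator used for Table 2, and it is not expected to reproduce the table's figures. The 20 ms packet interval, the fixed talkspurt length `spurt_len`, the random seed and all function and parameter names are assumptions. The playout schedule is anchored to the first packet, so that Twait_0 = twait.

```python
import random
from collections import deque


def simulate(adaptive, packets=10_000, interval=0.020, twait=0.100,
             mean_delay=0.030, n_late=2, m_window=100, spurt_len=50, seed=1):
    """Return (loss ratio, mean playout delay in seconds) for one simulated stream."""
    rng = random.Random(seed)
    base = rng.expovariate(1.0 / mean_delay)  # network delay of the first packet
    delay = twait                             # current playout delay
    window = deque(maxlen=m_window)           # Twait of the last M packets
    lost = 0
    total_delay = 0.0
    for i in range(packets):
        # Adjust only at talkspurt boundaries (assumed every spurt_len packets).
        if adaptive and i % spurt_len == 0:
            late = sum(1 for w in window if w < 0)
            all_in_time = len(window) == m_window and late == 0
            if late >= n_late or all_in_time:
                # max(initial delay, current delay minus minimum Twait of the period)
                delay = max(twait, delay - min(window))
                window.clear()
        sent = i * interval
        net = base if i == 0 else rng.expovariate(1.0 / mean_delay)
        received = sent + net
        play_at = sent + base + delay         # schedule fixed within a talkspurt
        wait = play_at - received             # Twait_i
        window.append(wait)
        if wait < 0:                          # arrived after its playout time
            lost += 1
        total_delay += delay
    return lost / packets, total_delay / packets
```

Calling `simulate(False)` and `simulate(True)` gives the loss ratio and mean playout delay for the constant and adaptive cases under these assumptions; the absolute numbers depend on the assumed talkspurt structure and seed.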
Table 2: Constant vs. adaptive playout delay

Playout delay   Packet loss ratio   Mean     Min.     Max.
Constant        3.7%                100 ms   100 ms   100 ms
Adaptive        1.2%                150 ms   100 ms   240 ms

Simulation results show that there is a clear tradeoff between playout delay and playout loss. If we had selected a larger playout delay in the constant playout delay case, the packet loss ratio would have been smaller. If the variation of network delay is unknown, it can be very hard to set the constant playout delay.

Simulation results also show that an upper bound for the adaptive playout delay is probably needed, because end-to-end delays longer than 400 milliseconds are not acceptable for voice [ITU00].

Figure 3 illustrates the simulated sequence of sent, received and synchronized VoIP packets, while Figure 4 illustrates the changes in playout delay.

[Figure 3. Sent, Received and Synchronized Packets. Plot of sequence number against time [s]; series: Sent, Received, Synchronized.]

4.2 Synchronization Delay and Clock Drift

The audio device driver is the link between the operating system and the hardware. Implementation of the device driver is a crucial factor in audio device performance, because the driver can introduce unnecessary delay in both directions. If the audio device requests data at fixed intervals that are not synchronized with the reception of incoming packets, an additional delay of half of the audio block duration is introduced [Sel01].

Clock drift means that the sampling and playout rates of the audio devices do not match. If the clocks at each end drift in different directions, buffer underruns occur at one end, and increased delay is experienced at the other end. Buffer underrun occurs when the receiver does not have anything to play. Modifying the playout rate by adding or dropping samples before the frame is transferred to the audio device can compensate for clock drift [Sel01].

[Figure 4. Adaptive Playout Delay. Plot of delay (axis labelled "Delay [ms]") against time [s]; series: End-to-End Delay, Network Delay.]

5 VoIP Terminal Requirements and Commercial Implementations

According to [Kos98], PCs (or other terminals) that support real time interactive voice must have considerable processing power. The computational requirements of voice codecs increase with the voice compression ratio.

Microsoft NetMeeting [Net01] is a popular remote conferencing tool. The hardware requirements that have to be met in order to use the data, audio, and video features of NetMeeting are as follows:

• For Windows 95 or Windows 98: Pentium 90 processor with 16 MB of RAM
• For Windows NT: a Pentium 90 processor with 24 MB of RAM
• 56,000 bps or faster Internet connection
• Sound card with microphone and speakers
• Video capture card or camera that provides a Video for Windows capture driver
VocalTec Internet Phone Lite is mainly targeted for voice only connections. It has the following system requirements [Voc01]:

• Windows 95 or Windows NT 4.0 or higher
• Pentium 75 processor or higher
• 14,400 bps or faster Internet connection
• Sound card with microphone and speakers

6 Conclusions

RTP/RTCP protocol suite provides the means for real-time communication over IP by introducing time stamps and sequence numbers.

The amount of overhead in VoIP packets is a serious issue, especially on low-bandwidth links. This problem can be alleviated by header compression, which is done on link-by-link basis.

In order to make transmitted voice stream human understandable, playout buffering is needed to re-synchronize the stream. A new adaptive playout delay adjustment algorithm has been introduced. The operation of the algorithm is basically that if certain criteria are met, the delay between the sender and the receiver can be adaptively adjusted upwards or downwards.

Terminals that support real time interactive voice must have considerable processing power. The computational requirements of voice codecs increase with the voice compression ratio.

References

[Cas99] S. Casner, V. Jacobson: Compressing IP/UDP/RTP Headers for Low-Speed Serial Links (Request for Comments: 2508), February 1999.

[Her00] Olivier Hersent, David Gurle, Jean-Pierre Petit: IP Telephony, Packet based multimedia communications systems, Addison Wesley, 2000, ISBN 0-201-61910-5.

[ITU00] ITU-T Recommendation G.114 (05/00): One-way transmission time.

[Jac90] V. Jacobson: TCP/IP Compression for Low-Speed Serial Links (Request for Comments: 1144), February 1990.

[Khi00] Khiem Le, Christopher Clanton, Zhigang Liu, Haihong Zheng: Adaptive Header ComprEssion (ACE) for Real-Time Multimedia (Internet Draft, Expired 24.11.2000), 24.5.2000.

[Kos98] T. J. Kostas, M. S. Borella, I. Sidhu, G. M. Schuster, J. Grabiec, J. Mahler: Real-time Voice over Packet-switched Networks, IEEE Network, Volume: 12, Issue: 1, 1998.

[Lar00] Lars-Åke Larzon, Hans Hannu, Lars-Erik Jonsson, Krister Svanbro: Efficient Transport of Voice over IP over Cellular links, Proceedings of IEEE Globecom 2000.

[Moo98] Sue B. Moon, Jim Kurose, and Don Towsley: Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms, ACM/Springer Multimedia Systems, Vol. 6, pp. 17-28, January 1998.

[Net01] Microsoft Netmeeting: http://www.microsoft.com/windows/netmeeting/, 12.2.2001.

[Ram94] Ramachandran Ramjee, Jim Kurose, Don Towsley, Henning Schulzrinne: Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks, Proceedings of IEEE Infocom '94, Montreal, Canada, April 1994.

[Sch96a] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson: RTP: A Transport Protocol for Real-Time Applications (Request for Comments: 1889), January 1996.

[Sch96b] H. Schulzrinne: RTP Profile for Audio and Video Conferences with Minimal Control (Request for Comments: 1890), January 1996.

[Sel01] Jari Selin: Media Management in IP Telephony Systems, Master's Thesis, Helsinki University of Technology, Networking Laboratory, February 2001.

[Voc01] VocalTec Internet Phone Lite: http://www.vocaltec.com/iptelephony/products/iplite_vea_sp/iplite_overview.htm, 12.2.2001.

[Yle97] Tomi Yletyinen: An Introduction to Protocols for Real-Time Communications in Packet Switched Networks, Laboratory of Telecommunications Technology, Helsinki University of Technology, 1997.
Voice Coding in 3G Networks
Tommi Koistinen
Signal Processing Systems
Nokia Networks
Tommi.Koistinen@nokia.com
Abstract

The 3G networks will introduce several new additions to the basic speech service. The adaptive wideband speech codec will enhance the naturalness of speech, and the transcoder free operation will remove unnecessary encodings that would otherwise degrade the speech quality. The speech processing on the network side in the 3GPP reference architecture model is focused around two network elements, namely the Media Gateway (MG) and the Media Resource Functions (MRF) unit. However, as the speech applications utilize the network more or less in transparent end-to-end mode, the characteristics and speech enhancement capabilities of mobile terminals will finally determine the perceived overall speech quality.

1 Introduction

Voice compression techniques have been utilized in digital telecommunication networks for decades (the G.711 standard [1] dates back to 1972). The G.711 standard presents a coding technique that operates at a rate of 64 kbit/s and is widely used in all digital switched telephone networks. But where does the exact rate of 64 kbit/s come from?

The most essential frequency range for the human speech production system (that is, the glottis and the vocal tract) and for the auditory system happens to be between 300 and 3400 Hz. According to the sampling theorem, to reproduce the original signal after sampling we must use a sampling rate that is at least double the highest frequency of the desired band. If the sampling rate is lower, the reproduced signal will be distorted by image frequencies of the original signal. Speech in telecommunication networks is commonly sampled at 8 kHz to obey this law.

The number of bits per sample that is used to quantize the analog signal is a compromise between the quantisation noise that is introduced and the quality of the original signal. If the input signal is already band limited to 300-3400 Hz, there is no point in using a 24-bit converter. Commonly 13 bits per sample is seen to be a practical value for restricted voice band quantisation. Uniform quantisation, however, is not the most efficient quantisation method.

The main idea behind the G.711 standard is to use a logarithmic quantizer, which results in the same signal-to-noise ratio (SNR) with only 8 bits per sample compared to the original 13 bits per sample.

This is achieved by allocating more quantisation steps to lower amplitude levels, which in fact are the most important to perceived overall speech quality. The drawback is that the logarithmic scale will result in a reduced SNR in the area of high-powered input signals, but fortunately the effect of this is insignificant with speech signals.

As a result we can multiply 8000 samples per second (that came from the sampling theorem) by 8 bits per sample (that resulted from the logarithmic quantisation) to get the final bit stream of 64 kbit/s.

The compression ratio of the G.711 standard can be seen to be 1.625:1 (13:8). And all compression is usually good. Transferring more telephone calls with less transmission equipment means money for the operator, and as a result several more advanced compression techniques have been developed.

Speech coding techniques in general can be divided into waveform coders (e.g. G.711, G.726, G.722) and analysis-by-synthesis type coders (e.g. G.723, G.729, GSM FR). The waveform coders operate in the time domain and they are based on a sample-by-sample approach that utilizes the correlation between speech samples. Analysis-by-synthesis coders try to imitate the human speech production system by a simplified model of a source (glottis) and a filter (vocal tract) that shapes the output speech spectrum on a frame basis (typically a frame size of 10-30 ms is used). A short introduction to details of both basic techniques (and their intermediate
versions; hybrids) is presented in [2] on pages 270-287.

The waveform coders are mainly used to compress speech on transmission links, for example, on PCM trunks between two switching centers. The compression ratios range from 2:1 to 4:1 and quite high speech quality can be maintained.

The analysis-by-synthesis coders were mainly introduced together with digital mobile networks (the GSM Full Rate codec [3] dates back to 1988). As the frequency band in the radio interface between a mobile terminal and a base station is restricted (and regulated), compression techniques are a meaningful way to save money in that interface. A typical full rate channel (16 kbps) utilizes a compression ratio of 4:1. A half rate channel (8 kbps) is half of that and it operates at a compression ratio of 8:1. Lossy compression always has some effect on speech quality, and more compression usually means less quality. The G.711 standard is a common reference point for "real" speech codecs, and e.g. the GSM Enhanced Full Rate codec [4] almost reaches the quality of G.711.

The frame based handling that is natural to analysis-by-synthesis coders is also in line with the characteristics of packet based transmission techniques (IP, ATM) that are becoming quite common not only in core networks (or backbone) but also as building blocks of radio access networks.

This article will discuss the voice coding and user plane issues particularly in 3G networks. The first chapter presented the basic reasons and means for speech coding in general. The second chapter will review the basic 3G network architecture models. The most important 3G network elements that provide speech related processing are discussed in chapters four and five. The sixth chapter will discuss the issues related to tandeming of speech codecs and finally the seventh chapter will conclude the presentation.

2 Network Architectures

This chapter will present the basic 3G network evolution according to 3GPP (Third Generation Partnership Project [5]) reference architectures. 3GPP has scheduled its work to releases of R99, R4 and R5 and so on. In the following, the basic reference architecture model of each release is briefly described, emphasizing the voice coding and user plane issues.

Release 99

The basic architecture of an R99 compatible network is shown in Figure 1. The IP packet data from UTRAN (Universal Terrestrial Radio Access Network, that is basically base stations and Radio Network Controllers (RNC)) goes through the Iu-PS interface to the 3G SGSN. Voice data goes through the Iu-CS interface to the 3G Mobile Switching Center (MSC), which converts the Adaptive Multirate (AMR) coded speech to G.711 format and vice versa for the PSTN network. The circuit switched speech is transferred in packet mode (ATM/AAL2) from UTRAN (from the Radio Network Controller) to the 3G MSC, but the codec level packet mode speech is not yet originated from the terminal.

[Figure 1. 3GPP Reference Architecture of Release 99. Elements shown: MT, UTRAN, SGSN, GGSN, HLR, 3G MSC with Transcoder; interfaces Iu-PS and Iu-CS; Multimedia IP networks and PSTN/legacy networks.]

Release R4

The next step, taken with release 4 (formerly known as Release 2000), is to separate the signaling and the user data in the Iu-CS interface. The signaling now goes to the MSC Server, and the transcoder is separated as a standalone media gateway. Figure 2 presents the R4 architecture with a clear separation into a packet side and a circuit switched side. The media gateway in the PSTN interface converts the AMR coded speech to G.711. Speech goes in packet mode from UTRAN to the PSTN interface.

[Figure 2. 3GPP Reference Architecture of Release 4. Elements shown: MT, UTRAN, SGSN, GGSN, HSS/CSCF, two MGWs, two MSC Servers; interfaces Iu-PS, Iu-CS user data and Iu-CS control; Multimedia IP networks and PSTN/legacy networks.]

The final architecture model, also called the All-IP network [6], moves also speech to full end-to-end
packet mode. The IP packets that are generated in a mobile terminal go as such either to another IP terminal or to the MGW from the GGSN. The architecture is presented in Figure 3. A new network entity is also introduced, namely the Multimedia Resource Functions (MRF) unit, which mainly implements conferencing services for the IP based calls.

[Figure 3. All-IP reference architecture. Elements shown: MT, UTRAN, SGSN, GGSN, HSS/CSCF, MRF, MGW; interface Iu-PS; Multimedia IP networks and PSTN/legacy networks.]

Of course, the different phases of 3GPP releases may coexist at the same time depending on operators' needs.

3 Media Gateway

In 2G networks (like GSM) the speech related functionalities have been implemented around the transcoder unit (TRAU). The basic task of the transcoder has been speech encoding and decoding of narrowband codecs like the GSM Full Rate (FR), Enhanced Full Rate (EFR) or Half Rate (HR) codecs. Some extra features like noise cancellation or acoustic echo cancellation are also offered by 2G transcoders. The Mobile Switching Center has additionally offered tone and DTMF generators, echo cancellers, fax and modem pools and announcement and conferencing services. Control mechanisms for these functionalities have usually been proprietary. In 3G networks, all of these functions must be offered by the Media Gateway, which is controlled by the Media Gateway Controller (MGC) with the standard H.248 control protocol [7].

An example (and quite full) set of functions that a Media Gateway could implement is:
• support for several interfaces (A-interface for 2G and Iu-interface for 3G) and for several transmission protocols (ATM, IP, TDM)
• support for several codecs including the Adaptive Multirate (AMR) codec and forthcoming wideband codecs
• electric and acoustic echo cancellation
• announcement services
• DTMF and call progress tone generation and detection
• support for fax/modem/data protocols
• support for Tandem Free Operation (TFO) and Transcoder Free Operation (TrFO)
• bad frame handling
• IP protocol handling (RTP/RTCP, encryption, QoS support)

Some functions, especially the conferencing service and possible speech enhancement services, are basically thought to be provided by the Multimedia Resource Functions (MRF) unit, but they may optionally be added to the Media Gateway responsibilities.

A lot of signal processing (DSP) power is required to provide the Media Gateway's functions. Typically, one DSP chip may process 4-16 channels, and on one processor card there might be 8-32 DSPs, which totals 32-512 channels per processor card.

4 Media Resource Functions

The Multimedia Resource Functions (MRF) unit according to the 3GPP standard shall provide the audio/video conferencing services for the All-IP network. The basic requirement is to support several speech codecs in order to be able to sum up the conference for each party. As it is impossible with today's technology to sum up signals in the parameter domain, all signals must first be decoded for linear domain processing. The summed signals are then encoded again for each party.

The 3GPP work on the MRF entity has not progressed further than the conferencing requirement. However, the MRF entity is a natural place also for other speech enhancement services. It should be remembered that most of the calls in an All-IP network stay inside the core network and do not go to a Media Gateway at all (see figure 4).

[Figure 4. MRF unit as a network side speech enhancement server. Elements shown: two MT/UTRAN pairs, SGSN, GGSN, MRF, IP terminal, Multimedia IP networks.]

Calls between mobile IP terminals are transferred in coded format end-to-end and if any speech enhancement services are desired to be provided on the network side, the MRF entity could do the
necessary operations (as it already has to support all coding formats for the conferencing service). The other option is that all speech enhancement services are provided by the mobile terminals.

A set of speech enhancements that the MRF entity could provide is:
• Noise suppression
• Gain (volume) control
• Acoustic echo cancellation

It should also be mentioned that the Media Gateway and the Multimedia Resource Functions unit are logical entities only, and physically they may be co-located in the same device.

5 Tandem Avoidance

5.1 Tandem Free Operation (TFO)

Every time voice is encoded or decoded the speech quality will degrade a little bit. Thus, as few conversions as possible are desired. The basic 2G mobile-to-mobile call suffers from tandem coding, which means that separate speech coding happens in both radio interfaces and between the transcoders voice goes in 64 kbps G.711 format. In general two encodings in clear speech conditions are no problem, but more than two encodings, especially in bad line conditions, cause severe degradations.

To overcome this kind of quality problem ETSI has specified the so-called Tandem Free Operation (TFO) [8], which establishes a sub channel (of 16 or 8 kbps) inside the 64 kbps G.711 stream for the encoded speech. The transcoders must also support the TFO feature, as they must omit the final decoding and pass the encoded parameters forward as such. An end-to-end connection (of 16 or 8 kbps) can now be formed with only one encoding (in the originating mobile) and only one decoding (in the receiving mobile). Figures 5 and 6 present the cases without TFO and with TFO in operation.

[Figure 5. No Tandem Free Operation. Elements shown: MS, BSS, Transcoder (64↔16), MSC, PSTN, MSC, Transcoder (64↔16), BSS, MS; 64 kbps between the transcoders.]

[Figure 6. Tandem Free Operation is utilised. Elements shown: MS, BSS, Transcoder (16↔16), MSC, PSTN, MSC, Transcoder (16↔16), BSS, MS; link between the transcoders labelled 48(16) kbps.]

TFO is based on inband procedures, which means that no outband signaling is used to form a TFO connection. In practice, the TFO connection establishment starts with a negotiation phase where certain TFO protocol messages are exchanged between the transcoders to agree on the codecs used. If the other end does not support TFO, it will not acknowledge the negotiation, and the TFO capable transcoder will also start to encode and decode the 64 kbps stream as in figure 5.

5.2 Transcoder Free Operation (TrFO)

For the 3G networks a slightly different approach is taken to tandem avoidance. Firstly, outband signaling is used for codec negotiation, and if the codecs match there is no need for the transcoders at all. This operation is called Transcoder Free Operation (TrFO) [9].

TrFO is relevant mainly for the MSC Server concept and for intersystem compatibility, as in the All-IP network calls are by nature of TrFO type. Figure 7 presents a basic call where outband signaling travels from one MSC Server to another until the whole link is negotiated. If a common codec can be agreed, no transcoding resources are reserved from the intermediate media gateways.

[Figure 7. A basic TrFO call. Elements shown: MT, UTRAN, two MGWs, GSM BSS, two MSC Servers, PSTN/legacy networks; codec negotiation labels "AMR?", "AMR?", "EFR!" and "AMR".]
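The outcome of the TrFO negotiation can be sketched as a search for a codec that every node on the path supports. This is only an illustration of the decision described above; it does not model the actual outband signaling messages of [9], and the function names and codec lists are hypothetical.

```python
def negotiate(codec_lists):
    """Return the first codec of the originating side that every node supports, else None.

    codec_lists is a list of per-node codec lists, ordered from the
    originating side to the terminating side (hypothetical representation).
    """
    first, *rest = codec_lists
    for codec in first:  # the originating side's order expresses its preference
        if all(codec in other for other in rest):
            return codec
    return None


def transcoding_needed(codec_lists):
    """Transcoders must be reserved only when no common codec exists."""
    return negotiate(codec_lists) is None
```

For example, `negotiate([["AMR", "EFR"], ["AMR", "EFR"], ["EFR"]])` selects `"EFR"`, so no transcoding resources would be reserved, whereas `negotiate([["AMR"], ["EFR"]])` finds no common codec and the call would need transcoders.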
6 Adaptive Speech Coding

The traditional GSM speech codecs operate in the radio interface at a fixed source rate with a fixed level of error protection (e.g. the Full Rate codec with framing overhead consumes 16 kbps and error protection adds 6.8 kbps, resulting in a 22.8 kbps gross bit rate over the air). The codec itself does not have means (except the bad frame handling mechanism) to adapt to changing radio conditions.

For this reason, ETSI (and later 3GPP) has asked for new adaptive coding schemes that could select the optimum channel mode (full rate or half rate) and the optimum codec mode (speech rates) based on the radio conditions. As a result, the Adaptive Multirate (AMR) codec [10,11] has now been standardized as an additional codec for the GSM system and as the only mandatory codec (thus far) for the 3G system. The two most important design targets for the AMR codec were:
• improved speech quality in both half-rate and full-rate modes by means of codec mode adaptation, i.e. varying the balance between speech and channel coding for the same gross bit-rate.
• ability to trade speech quality and capacity smoothly and flexibly by a combination of channel and codec mode adaptation; this can be controlled by the network operator on a cell by cell basis.

The AMR codec consists of 2 channel modes (full rate (FR) and half rate (HR)) and 8 codec modes that are presented in table 1. The ninth mode is for discontinuous transmission (DTX), meaning that during silence only silence description (SID) frames are periodically sent to the other end. All modes operate on a 20 ms frame basis.

Codec mode    Source codec bit-rate    Channel mode
AMR_12.20     12.20 kbit/s             FR
AMR_10.20     10.20 kbit/s             FR
AMR_7.95      7.95 kbit/s              FR / HR
AMR_7.40      7.40 kbit/s              FR / HR
AMR_6.70      6.70 kbit/s              FR / HR
AMR_5.90      5.90 kbit/s              FR / HR
AMR_5.15      5.15 kbit/s              FR / HR
AMR_4.75      4.75 kbit/s              FR / HR
AMR_SID       1.80 kbit/s              FR / HR

Table 1. 8+1 different AMR modes.

The choice between the full rate and the half rate channel mode can be made off-line based on the capacity requirements of the operator. The selection of the codec mode is made continuously by the radio resource management. Basically, as a lower AMR mode is selected, more bits from the gross bit rate are freed for the channel coding and error protection. Even though a very low codec bit rate is used, the high error protection keeps the overall speech quality sufficiently high. Figure 8 shows the reasoning for the mode selection. To follow the optimum quality curve (MOS = Mean Opinion Score of speech quality) against a decreasing carrier-to-interference ratio (C/I), the AMR mode that is used must be changed accordingly.

[Figure 8. Different AMR modes have different quality curves. Plot of MOS against C/I; curves: Mode 1, Mode 2, Mode 3.]

It should however be noted that in the 3G radio interface the power control mechanism (fast power control and outer loop power control) is used to keep the optimum speech quality by adjusting the transmit power of a mobile terminal and the base station. The adaptiveness of AMR in fact does not bring such benefits for 3G as it does for the 2G radio interface.

Principles of the AMR encoder

The AMR codec is based on the Code-Excited Linear Predictive (CELP) coding model that imitates the glottis and the vocal tract by an excitation signal and a linear prediction synthesis filter (Figure 9).

[Figure 9. The CELP model. Block diagram: the adaptive codebook output v(n) scaled by gain gp and the fixed codebook output c(n) scaled by gain gc are added to form the excitation u(n), which is fed through the LP synthesis filter 1/A(z) and post-filtering to give the output speech.]
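The signal flow of the CELP model in Figure 9 can be sketched in a few lines. This is a simplified illustration, not the AMR decoder: the function name and the example coefficient values are assumptions, the filter convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is assumed, and post-filtering is left out.

```python
def celp_synthesize(v, c, gp, gc, a):
    """Synthesize speech samples from codebook vectors (simplified sketch).

    v, c : adaptive and fixed codebook vectors of equal length
    gp, gc : adaptive (pitch) and fixed codebook gains
    a : LP coefficients a_1..a_p of A(z) = 1 + sum_k a_k z^-k (assumed convention)
    """
    s = []
    for n in range(len(v)):
        u = gp * v[n] + gc * c[n]   # excitation u(n): sum of the two scaled vectors
        acc = u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]  # all-pole LP synthesis filter 1/A(z)
        s.append(acc)
    return s
```

With a single assumed coefficient `a = [-0.5]`, a unit impulse from the adaptive codebook (`v = [1.0, 0, 0]`, `gp = 1.0`, `gc = 0.0`) decays as 1.0, 0.5, 0.25, which shows how the synthesis filter shapes the excitation.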
The excitation signal at the input of the short-term LP synthesis filter is
constructed by adding two excitation vectors from the adaptive and fixed
codebooks. The speech is synthesized by feeding the two properly chosen
vectors from these codebooks through the short-term synthesis filter. The
optimum excitation sequence in a codebook is chosen using an
analysis-by-synthesis search procedure.

The AMR coder operates on speech frames of 20 ms, corresponding to 160
samples at the sampling frequency of 8000 samples/s. For each frame of 160
speech samples, the speech signal is analysed to extract the parameters of
the CELP model (LP filter coefficients, adaptive and fixed codebooks'
indices and gains). These parameters are encoded and transmitted. At the
decoder, these parameters are decoded and speech is synthesized by filtering
the reconstructed excitation signal through the LP synthesis filter.

Table 2 shows the resulting parameters from the encoder operating in the
12.2 kbps mode. The LP analysis is performed twice per 20 ms frame,
resulting in 2 sets of line spectrum pairs (LSP). The adaptive codebook
(pitch delay and gain) and the fixed codebook are found for 4 subframes of
5 ms each. The total number of bits is 244 per frame.

Parameter      1st   2nd   3rd   4th   Total
2 LSP sets                                38
Pitch delay      9     6     9     6      30
Pitch gain       4     4     4     4      16
Fixed code      35    35    35    35     140
Fixed gain       5     5     5     5      20
Total                                    244

Table 2. Encoder output.

RTP payload specification for AMR codec

In the 3GPP Release 99 architecture the AMR codec payload is packed in the
Radio Network Controller in IuUP protocol frames [12] that are carried as
such to the transcoder in the 3G MSC. The specified frame format for the AMR
codec is restricted to the Iu interface.

In the All-IP model (figure 3) the AMR payload data travels all the way from
the mobile terminal through UTRAN and the core network either to a media
gateway or to another IP terminal. The GGSN will output the application
level protocols, which in this case are the RTP (Real-time Transport
Protocol) frames carrying the AMR payloads. So, concerning IP Telephony, the
RTP payload specification for the AMR codec [13] has grown in importance, as
AMR is the codec that should converge the traditional IP Telephony with the
mobile IP Telephony. The RTP for AMR specification includes the following
extra features:
• codec mode request procedure
• robust sorting of payload bits
• bad frame indication
• compound payloads
• CRC calculation

The specification is still under finalisation in IETF.

7 Wideband Speech Coding

The 300-3400 Hz speech band frequency range has been used for decades in all
telephony applications. As the range is heavily restricted, all non-speech
signals, like music, are degraded badly when forced to go through this
narrow frequency pipe. Even speech contains plenty of information above
3400 Hz that affects the naturalness of speech.

Basically, the existing terminals that conform to this traditional frequency
band have been one barrier in front of wideband speech. A second reason has
been that more bandwidth is needed to transfer the highest quality wideband
signals.

However, as the difference in quality between narrowband and wideband speech
is so clear, it is inevitable that more wideband applications will be
introduced in the near future. Wideband speech coding can easily be seen as
the next fundamental improvement in speech quality for mobile
telecommunication systems. 3GPP has understood this and wideband AMR
specifications are already getting ready.

The principles of wideband AMR [14] are copied from the narrowband AMR. The
frequency band, as a difference, is extended in both directions, and it is
now from 50 Hz to 7000 Hz. The resulting speech quality exceeds the wireline
quality of narrowband G.711. Figure 10 shows an illustrative graph comparing
the speech quality of EFR, AMR-NB and AMR-WB in a 16 kbps full-rate
channel [15].

[Figure 10: subjective speech quality (Unacceptable, Poor, Good, Very good,
Excellent) of AMR-WB, AMR-NB and EFR plotted against the
carrier-to-interference ratio (error-free, 13, 10, 7 and 4 dB).]

Figure 10. AMR-WB vs. AMR-NB and EFR.

As can be seen, in error-free conditions AMR-WB is superior to AMR-NB and
EFR (which is the highest quality GSM codec at the moment). Even in very bad
conditions AMR-WB can maintain high quality, far above the fixed rate GSM
codecs. The nine modes of AMR-WB (plus one mode for DTX) are presented in
table 3.

AMR-WB is specified for the GSM full rate radio traffic channel, for the
future GSM EDGE (GERAN) and for the 3G (UTRAN) radio channel. The 3GPP
specifications for a wideband AMR codec (AMR-WB) are expected to be
finalized in March 2001.

Codec mode       Source codec bit-rate
AMR-WB_23.85     23.85 kbit/s
AMR-WB_23.05     23.05 kbit/s
AMR-WB_19.85     19.85 kbit/s
AMR-WB_18.25     18.25 kbit/s
AMR-WB_15.85     15.85 kbit/s
AMR-WB_14.25     14.25 kbit/s
AMR-WB_12.65     12.65 kbit/s
AMR-WB_8.85       8.85 kbit/s
AMR-WB_6.6        6.6 kbit/s
AMR-WB_SID        1.75 kbit/s

Table 3. The 9 different AMR-WB modes.

8 Conclusion

Packet data services have been advertised to be the major application of
future 3G networks. However, the voice services are also strongly enhanced
with new wideband codecs that can adapt to network conditions. Also the
transcoder free operation and the new speech enhancement services will make
speech quality better, even to a level never experienced before.

This article has mainly focused on the application level. Good network
conditions (low delay, no packets lost due to congestion) are also a
starting point for superior application level speech quality. Media gateways
shall support the network level QoS mechanisms (like DiffServ) that are used
to optimize and prioritise the real-time and the non-real-time traffic (see
for example [16]).

In the past, speech service has been closely tied at the technical level to
the providing network. Within All-IP networks speech service will also be
lifted more and more up to the user level. End-to-end user applications will
not even see the underlying transport network, and the overall speech
quality that is perceived will heavily depend on the characteristics and
features of the All-IP terminals.

As the speech service will also include more choices of codecs, bandwidth
and speech enhancements, there shall be an opportunity to differentiate the
pricing of these features. The user may in the future have means to select
the speech quality that he or she is willing to pay for.

References

[1] ITU-T G.711; Pulse Code Modulation (PCM) of Voice Frequencies. 1972.
[2] Hersent O, Gurle D, Petit J-P. IP Telephony. Packet-based multimedia
    communications system. Addison Wesley, 2000.
[3] GSM 06.10; Full Rate Speech; Transcoding.
[4] GSM 06.60; Enhanced Full Rate Speech; Transcoding.
[5] Third Generation Partnership Project (3GPP) www.3gpp.org
[6] 3GPP/TR 23.922; Architecture for an All-IP network, v1.0.0, October
    1999.
[7] ITU-T H.248; Gateway Control Protocol, June 2000.
[8] GSM 08.62; Inband Tandem Free Operation (TFO) of Speech Codecs; Service
    Description; Stage, v8.0.1, August 2000.
[9] 3GPP/TS 23.153; Out of Band Transcoder Control – Stage 2, v2.0.3,
    October 2000.
[10] 3GPP/TS 26.071; AMR Speech Codec; General Description, v3.0.1, August
    1999.
[11] 3GPP/TR 26.975; Performance Characterization of AMR Speech Codec,
    v1.1.0, January 2000.
[12] 3GPP/TS 25.415; UTRAN Iu Interface User Plane Protocols, v3.5.0,
    December 2000.
[13] IETF Internet Draft: RTP Payload Format and File Storage Format for AMR
    Audio, v0.5, February 2001.
[14] 3GPP/TR 26.901; AMR Wideband Speech Codec; Feasibility Study Report,
    v4.0.1, April 2000.
[15] Advance – Information from Nokia Research Center. Number 1, 2001.
[16] Ferguson P, Huston G. Quality of Service; Delivering QoS on the Internet
    and in Corporate Networks. Wiley 1998.
Session Initiation Protocol (SIP)
Jouni Soitinaho
Jouni.Soitinaho@nokia.com

Abstract

This paper describes the basic characteristics of the SIP protocol and
especially its extension mechanism. Several Internet draft specifications
are studied in order to get an overall picture of the maturity of the
protocol. Some interesting application areas are examined to demonstrate how
the SIP protocol suite can be used in a wider context.

1 Introduction

SIP is a simple but extendable signaling protocol for setting up, modifying
and shutting down communication sessions between two or more participants.
One or more media, or even no media at all, can be transmitted in the
session context. SIP is independent of the actual media, and the route of
the media can be different from the route of the signaling messages. SIP can
also invite participants to an IP multicast session.

SIP is part of the IETF multimedia architecture and it is designed to
cooperate with several other protocols, which is a fundamental principle of
the SIP design. Other protocols include, for example, RTP and RTCP for media
transport, RTSP for controlling streaming and SDP for describing the
capabilities of the participants. Limiting the SIP protocol to the
controlling of the session state is also more likely to keep it simple and
easy to implement.

Another fundamental aspect of the SIP design is the easy way it can be
extended with additional capabilities. Actually, the basic protocol
specification defines a rather limited signaling protocol. It is missing
several capabilities needed by real life applications. Several general
extensions are currently being defined and some of these are expected to be
included in the basic standard after reaching the required stability.

SIP was first developed within the Multiparty Multimedia Session Control
(MMUSIC) working group and then continued in the SIP working group. Active
communication with MMUSIC is important since the Session Description
Protocol (SDP) is developed by MMUSIC. The working group also has a close
relationship with the IP telephony (iptel) working group, whose Call
Processing Language (CPL) relates to many features of SIP, and with the PSTN
and Internet Internetworking (pint) working group, whose specification is
based on SIP. The Distributed Call Signaling Group (DCS) is giving input to
SIP for distributed telephony services. Recently it was decided to split the
SIP working group in two: the SIP WG will concentrate on the basic protocol
and general extensions, and the SIPPING WG will concentrate on applications
and generate input to the SIP WG.

Besides the activities undertaken within the IETF, the 3GPP technical
specification groups are currently investigating SIP. Since SIP was chosen
as the signaling protocol for the IP multimedia subsystem of the 3G network,
3GPP will set new requirements for the protocol.

The basic SIP protocol is defined in RFC2543, which is currently in the
"proposed" state. The corresponding Internet draft document [1] contains
many updates and is the reference document for describing the basic protocol
in the next section. Some of the current development activities are
discussed in section three. Finally, a few application areas of SIP are
studied in section four before the conclusions in the last section.

2 Basic Protocol

2.1 Characteristics

The basic features of SIP:

• Locating user: determination of the end system to be used for
  communication;
• Determining user capabilities: determination of the media and media
  parameters to be used;
• Determining user availability: determination of the willingness of the
  called party to engage in communications;
• Setting up the call: "ringing", establishment of call parameters at both
  called and calling party;
• Controlling the call: including transfer and termination of calls.

Main technical properties and some implications of SIP:

• Text-based (ISO 10646 in UTF-8 encoding), similar to HTTP: Easy to learn,
  implement, debug and extend. Causes extra overhead, which is not a serious
  drawback for a signaling protocol. Header names can be abbreviated.
• Recommended transport protocol is UDP: It is not meant to send large
  amounts of data.
• Application level routing based on Request-URI: The signaling path through
  SIP proxies is controlled by the protocol itself, not by the underlying
  network. Requires routing implementation in SIP proxies.
• Independence of the session it initiates and terminates (capability
  descriptions, transport protocol, etc.): Cooperates with different
  protocols, which can be developed independently. It is not a conference
  control protocol (floor control, voting, etc.) but it can be used to
  introduce one.
• Supports multicasting for signaling and media but no multicast address or
  any other network resource allocation.
• Support for stateless, efficient and "forward" compatible proxies
  (re-INVITE carries state, ignore the body, ignore extension methods).

2.2 Operations

Protocol operations of SIP:

• INVITE initiates session establishment
• ACK confirms successful session establishment
• OPTIONS requests capabilities
• BYE terminates the session
• CANCEL cancels a pending session establishment
• REGISTER binds a permanent SIP URL to a temporary SIP URL for the current
  location.

The following diagram demonstrates SIP protocol operations for user
registration and session handling.

[Figure 5: message sequence between UAC (User1@host1), Proxy/Registrar,
Location Server and UAS (User2@host2). Registration: REGISTER, Location
update/OK, 200 OK. Session setup: INVITE, Location query/Reply, INVITE,
1-way media transmission, 200 OK on both hops, ACK on both hops, 2-way media
transmission. Session termination: BYE on both hops, 200 OK on both hops.]

Figure 5. An example of SIP protocol operations.

2.3 Network elements

SIP has been designed for IP networking. The protocol makes use of standard
elements like DNS and DHCP servers, firewalls, NATs and proxies. Special
support in DNS and DHCP servers is not needed but it makes the protocol
operations more efficient. The SIP protocol is implemented by the user agent
client (UAC) and server (UAS), redirect servers, proxies and registrars.
Registrars and location servers maintain the mapping between the user's
permanent address and current physical addresses.

The SIP specification does not actually define the network architecture.
However, the logical elements and their relationships can be determined
based on the protocol specification. The following figure demonstrates an
example of inter-domain session setup. Both UAC and UAS are located in their
home domains. Thin lines represent SIP signaling messages, thick lines
represent media transmission and dotted lines represent non-SIP protocols.

[Figure 6: elements shown for Domain A and Domain B: UAC, Outbound Proxy,
DNS, two Firewall/NATs, Proxy/Registrar, Location Server and UAS. Legend:
SIP protocol, Non-SIP protocol, Media flow.]

Figure 6. Logical network elements involved in an inter-domain session
setup.

In this scenario UAC composes an INVITE message in order to set up a call
with UAS. The message contains the session data in its headers and media
descriptions in the body in SDP format [2]. INVITE is sent to
Outbound Proxy whose address may have been configured in UAC using DHCP.
Outbound Proxy uses DNS to resolve the recipient's address. It also controls
Firewall/NAT to open the ports for media transmission. Domain B has
configured all the incoming requests to go to Proxy/Registrar, which
controls the Firewall/NAT of Domain B. Proxy/Registrar queries the current
location of UAS from Location Server and forwards the message to UAS. In an
intra-domain call a redirect server could be used instead of a proxy in
Domain B to return the current location of UAS, who could then be contacted
directly by UAC without having any proxy involved in the communications.

Since the request carried the media descriptions of UAC and since the
corresponding ports were opened in the firewalls, media can immediately flow
back from UAS to UAC. The signaling response is routed along the same path
as the request and it carries the media descriptions of UAS. UAC can now
send media to UAS. Finally UAC has to send an ACK message to UAS to
acknowledge the successful session establishment.

2.4 Addressing and routing

SIP uses e-mail like addresses for users but it also includes the protocol
keyword in the SIP URL. SIP URLs are used to identify the originator (From),
current destination (Request-URI), final destination (To) and redirection
address (Contact).

Two formats exist:

• sip:user@host
  when UA exists, e.g. From and To fields in INVITE
• sip:host
  when no UA exists, e.g. Request-URI in REGISTER

Including the protocol keyword in the URL allows a SIP server to use the
Contact header to redirect a call to a web page or to a mail server, for
example. This facilitates integration of audio and video applications with
other multimedia applications.

Routing of SIP messages is included in the protocol itself since finding the
user is one of the primary functions of SIP. The host part of the SIP URL
indicates the next hop for a request. Even if clients could send the request
directly to this address, in practice they are typically forced to go
through a proxy for security or address translation reasons.

Furthermore, two headers have a central role in routing SIP messages:

• Via header indicates the request path taken so far. It prevents looping
  and is used for routing the response back along the same path as the
  request has traveled. Proxies must add a "received" parameter to the
  top-most Via header if the field contains a different address than the
  sender's source address. This feature supports NAT servers. Proxies can
  also forward the request as multicast by adding a "maddr" parameter to the
  Via field.
• Route header is used for routing all requests of a call leg along the same
  path, which was recorded in the Record-Route header during the first
  request. This is to guarantee that stateful proxies will receive all the
  subsequent messages that affect the call state.

SIP proxies can also fork the incoming request to several outgoing requests
in order to accelerate the processing of the INVITE method. The forking can
create several simultaneous unicast INVITEs to the potential locations or
one multicast INVITE to a restricted subnetwork. Even if forking is an
efficient mechanism, it is a potential source of difficult problems and
needs special attention during implementation.

2.5 Registering

A client uses the REGISTER method to bind its permanent address to one or
more physical addresses where the client can be reached. The request is sent
to the registrar, which is typically co-located with a proxy server.
Alternatively the request can be sent to the well-known SIP multicast
address "sip.mcast.net".

The REGISTER method is also ideally suited for configuration and exchange of
application layer data between a user agent and its proxy. This may produce
modest amounts of data exchanges. However, because of the infrequency of
such exchanges and their typical limitation to one hop, this is acceptable
if TCP is used.

The most important fields for the REGISTER method:

• Request-URI names the domain of the registrar. The user part must be
  empty.
• To indicates the user to be registered
• From indicates the user responsible for the registration (typically equal
  to To header value)
• Contact (optional) indicates the address(es) of the user's current
  location. The list of current locations can be queried by leaving the
  Contact header empty in the REGISTER request. An optional expires
  parameter indicates the expiration time of the particular registration. By
  giving the wildcard address "*" in a single Contact header a client can
  remove all the registrations. By giving zero as the value for the expires
  parameter a client can remove the corresponding registration.
• Expires tells the default value for expiration unless the corresponding
  parameter is present in the Contact header. If neither one is present, a
  default value of one hour is used.

It is particularly important that the REGISTER requestor is authenticated.

2.6 SIP Security

Security must be addressed at several levels. At the network level the
security is based on regular firewalls and NATs since SIP is designed for IP
networking. Controlling the firewall with a SIP proxy is an essential
enhancement for the standard IP security mechanisms.

At the protocol level both the media security and signaling security must be
addressed. Media encryption is specified in the message body with SDP [2].

Signaling security includes user authentication and encryption of the
signaling messages. User authentication is based on the HTTP authentication
mechanism [3] with minor modifications as specified in [1]. Besides the
"Basic" and "Digest" authentication schemes SIP also supports stronger
authentication with the "PGP" scheme [4]. It is based on public key
cryptography, which requires the client to sign the request with the private
key and the server to verify the signature with the public key. It is
recommended to authenticate the REGISTER requestor with the PGP scheme
instead of the other schemes.

SIP also supports PGP encryption of the signaling messages. By setting the
"Encryption" header to the "PGP" scheme all following headers can be
encrypted as well as the message body. Note that sending the media
encryption key in the body requires the message body to be encrypted. Note
also that there are special considerations for the encryption of the Via
header since it is used by the proxies.

Obviously, the standard IPSec protocol can be used for IP level encryption.

2.7 Expandability

In order to keep the basic protocol compact SIP provides the protocol
designers with means for extending its capabilities. Protocol elements that
can be extended without a change in the protocol version include:

• Methods
• Entity headers
• Response codes
• Option tags

In addition to the SIP extensions the session description (SDP) can be
extended to contain new attributes and values for the session.

Several definitions in the protocol set the limits for the extensions. First
of all, proxy and redirect servers treat all methods other than INVITE,
CANCEL and ACK in the same way by forwarding them. User agent servers and
registrars respond with the "501 Not Implemented" response code for request
methods they do not support.

SIP servers and proxies ignore header fields that are not defined in the
specification [1] and that they do not understand, i.e. they treat them as
entity headers. General headers, request headers and response headers are
extended only in combination with a change in the protocol version.
Furthermore, stateless proxies are required to recognize only the values
defined in the basic protocol. They will forward new values without actions.
Session stateful proxies need to support the extension if it can change the
call state in a way that is meaningful for the proxy.

SIP applications are not required to understand all registered response
codes. They must treat any unrecognized response code as being equivalent to
the x00 response code of that class, with the exception that an unrecognized
response must not be cached.

Option tags are unique identifiers used to designate new extensions for SIP.
These tags are set in the Require, Proxy-Require, Supported and Unsupported
header fields to communicate the signaling capabilities between UACs, UASs
and proxies. The extension creator must either prefix the option with the
reverse
domain name or register the new option with the Internet Assigned Numbers
Authority (IANA).

Clients can always call the OPTIONS method for explicitly querying the
capabilities of the server and proxies lying on the path.

Since there are multiple ways to define a SIP extension, special attention
needs to be paid to the semantic compliance with the basic protocol. An
informational Internet draft sets the guidelines for writing a SIP
extension [5].

3 Protocol Extensions

About 30 extension drafts can be found on
http://www.cs.columbia.edu/~hgs/sip/drafts_base.html.
Some of these add reliability or functionality missing in the basic protocol
for supporting real time services like VoIP. Examples of these are "reliable
provisional responses", "resource management" and "INFO method". Some
extensions add functionality for implementing existing PBX services, like
call transfer. Examples are "call control-transfer" and "caller identity and
privacy". Some extensions add new functionality for enabling new types of
services, like presence based instant messaging. Examples are "event
notification" and "caller preferences". Finally some extensions add
resilience to the basic protocol for implementing reliable and scalable
networks. Examples are "session timer" and "distributed call state".

3.1 Reliable provisional responses

When run over UDP, SIP does not guarantee that provisional responses (1xx)
are delivered reliably, or in order. However, many applications like
gateways, wireless phones and call queuing systems make use of the
provisional responses to drive state machinery. This is especially true for
the 180 Ringing provisional response, which maps to the Q.931 ALERTING
message.

The Internet draft document [6] specifies an extension to SIP for providing
reliable provisional response messages ("100rel"). When a server generates a
provisional response which is to be delivered reliably, it places a random
initial value for the sequence number (RSeq). The response is then
retransmitted with an exponential backoff like a final response to INVITE.

The client uses a new method (PRACK) for acknowledging the provisional
response. Unlike ACK, which is end-to-end, PRACK is a normal SIP message,
like BYE. Its reliability is ensured hop-by-hop through each stateful proxy.
PRACK has its own response and therefore existing proxy servers need no
modifications. A new header (RAck) in the PRACK message indicates the
sequence number of the provisional response which is being acknowledged.

The following diagram demonstrates how the support and need for reliable
provisional response is negotiated and implemented.

[Figure 7: message sequence between UAC and UAS:
  UAC to UAS: INVITE sip:uas@host SIP/2.0 (Supported: 100rel)
  UAS to UAC: SIP/2.0 180 Ringing (Require: 100rel, RSeq: 776655);
              retransmission algorithm starts
  UAC to UAS: PRACK sip:uas@host SIP/2.0 (RAck: 776655 1 INVITE);
              retransmission algorithm starts
  (retransmission of 180); retransmission algorithm stops
  (retransmission of PRACK)
  UAS to UAC: SIP/2.0 200 OK (for PRACK); retransmission algorithm stops]

Figure 7. Reliable provisional response.

3.2 Resource Management

In order to become a successful service, Internet telephony must meet the
quality expectations based on the existing telephony services. This implies
that the resources must be reserved beforehand for each call. Cooperation is
therefore needed between call signaling, which controls access to telephony
specific services, and resource management, which controls access to
network-layer resources.

The Internet draft document [10] discusses how network QoS and security
establishment can be made a precondition to sessions initiated by SIP, and
described by SDP. These preconditions require that the participant reserve
network resources or establish a secure media channel before continuing with
the session. In practical terms the "phone won't ring" until the
preconditions are met. The draft proposes new attributes for SDP:

• "a=qos:" strength-tag SP direction-tag
• "a=secure:" SP strength-tag SP direction-tag

where the strength can have values "mandatory", "optional", "success" and
"failure" and the direction can have values "send", "recv" and "sendrecv".

The document also proposes a new method for SIP. The COMET method is used to
confirm the completion of all preconditions by the session originator. The
following diagram presents the message flow for a single-media session setup
with a "mandatory" quality-of-service "sendrecv" precondition, where both
the UAC and UAS can only perform a single-direction ("send") resource
reservation.

   UAC               SIP-Proxy(s)                UAS
    |      INVITE         |                       |
    |-------------------->|---------------------->|
    |                     |                       |
    |     183 w/SDP       |      183 w/SDP        |
    |<--------------------|<----------------------|
    |                                             |
    |                   PRACK                     |
    |-------------------------------------------->|
    |             200 OK (of PRACK)               |
    |<--------------------------------------------|
    |  Reservation                   Reservation  |
    |  ===========>                 <===========  |
    |                                             |
    |                   COMET                     |
    |-------------------------------------------->|
    |             200 OK (of COMET)               |
    |<--------------------------------------------|
    |                                             |
    |                SIP-Proxy(s)    User Alerted |
    |    180 Ringing      |     180 Ringing       |
    |<--------------------|<----------------------|
    |                                             |
    |                   PRACK                     |
    |-------------------------------------------->|
    |             200 OK (of PRACK)               |
    |<--------------------------------------------|
    |                                             |
    |                SIP-Proxy(s)    User Picks-Up|
    |                     |            the phone  |
    |      200 OK         |       200 OK          |
    |<--------------------|<----------------------|
    |                                             |
    |                    ACK                      |
    |-------------------------------------------->|

Figure 8. Resource management signaling.

The session originator (UAC) prepares an SDP message body for the INVITE
describing the desired QoS and security preconditions for each media flow,
and the desired direction "sendrecv." This SDP is included in the INVITE
message sent through the proxies, and includes an entry "a=qos:mandatory
sendrecv." The recipient of the INVITE (UAS) returns a 183-Session-Progress
provisional response containing SDP, along with the qos/secure attribute for
each stream having a precondition. The UAS now attempts to reserve the qos
resources and establish the security associations. The 183-Session-Progress
is received by the UAC, and the UAC requests the resources needed in its
"send" direction, and establishes the security associations.

The diagram also demonstrates the usage of the PRACK and COMET methods for
confirming the responses and resource allocations respectively.

3.3 INFO method

The SIP INVITE method can be called one or more times during the established
session (re-INVITE) to change the properties of media flows or to update the
SIP session timer. However, there is no general-purpose mechanism to carry
session control information along the SIP signaling path during the session.

RFC2976 [14] defines the INFO method for communicating mid-session
information during the call. It is not used to change the state of the
session but it provides means for exchanging additional information between
the peers. One example of such session control information is ISUP and ISDN
signaling messages used to control telephony call services.

The information can be conveyed either in the header of the INFO message or
as part of the message body. The definition of the message body and/or
message headers used to carry the mid-session information is outside the
scope of the RFC. However, consideration should be given to the size of
message bodies since they can be fragmented while carried over a UDP bearer.

3.4 Call Control - Transfer

The basic SIP protocol does not support any of the multiple ways a call can
be transferred to a third party. In an "unattended transfer" the transferor
is not participating in the call simultaneously with the transferee and
transfer target, whereas in an "attended transfer" the three actors
participate in the call simultaneously (ad-hoc conference). In a
"consultation hold transfer" the transferor establishes and terminates a
second call with the transfer target before performing the actual transfer.

The Internet draft document [11] proposes a SIP extension, which can be
used, for example, to implement traditional unattended and consultation hold
transfers. The attended transfer is not drafted yet since the call control
framework has not addressed
conferencing. The following figure presents the message sequence of
unattended transfer with consultation hold.

[Figure 9: message sequence between Transferor, Transferee and Transfer
Target, in order: INVITE/200/ACK; INVITE(hold)/200/ACK (call put on hold);
INVITE/200/ACK (consultation); BYE/200; REFER/202 Accepted; INVITE/200/ACK;
NOTIFY/200; BYE/200 (call terminated); BYE/200.]

Figure 9. Unattended call transfer with consultation on hold.

The new REFER method indicates that the recipient (Request-URI) should
contact a third party identified by the contact information (Refer-To). Once
the transferee knows whether the transfer succeeded or failed, it notifies
the transferor by sending a "refer" event using the NOTIFY mechanism as if
the REFER message had established a subscription.

3.5 Caller Identity and Privacy

In order for SIP to be a viable alternative to the current PSTN, it must
support certain telephony services including Calling Identity Delivery,
Calling Identity Delivery Blocking, as well as the ability to trace the
originator of a call. While SIP can support each of these services
independently, certain combinations cannot be supported. The issue of IP
address privacy for both the caller and callee needs to be addressed as
well.

The Internet draft document [12] specifies two extensions to SIP that allow
the parties to be identified by a trusted intermediary while still being
able to maintain their privacy. A new general header, Remote-Party-ID,
identifies each party. Different types of party information can be provided,
e.g. calling or called party, and for each type of party, different types of
identity information, e.g. subscriber or terminal, can be provided. Another
new general header, Anonymity, is also defined for hiding the IP addresses
from the other parties.

3.6 Caller preferences

When a SIP server receives a request, there are at least three parties who
have an interest and each of which should have the means for expressing its
policy:

• The administrator of the server, whose directives can be programmed in the
  server.
• The callee, whose directives can be expressed most easily through a script
  written in the call processing language (CPL)
• The caller, who doesn't have obvious ways to express the preferences
  within the SIP server.

The Internet draft document [9] specifies extension mechanisms by which the
caller can provide its preferences for processing a request. These
preferences include the ability to select which URIs a request gets proxied
or redirected to, and to specify certain request handling directives in
proxies and redirect servers. It does so by defining three new request
headers, Accept-Contact, Reject-Contact and Request-Disposition. The
extension also defines new parameters for the Contact header that describe
attributes of a UA at a specified URI.

3.7 Event Notification

The ability to request asynchronous notification of events is useful in many
types of services. Examples include automatic callback services (based on
terminal state events), buddy lists (based on user presence events), message
waiting indications (based on mailbox state change events), and PINT status
(based on call state events).

The Internet draft document [13] proposes a framework by which notification
of events can be ordered. The draft can't be used directly, i.e. it doesn't
specify any event types and it must be extended by other specifications
(event packages). In object-oriented terminology, this is an abstract base
class which must be derived into an instantiatable class by further
extensions.

The extension is based on two new methods, SUBSCRIBE and NOTIFY, and a new
header "Event" together with the "Expires" header. Neither SUBSCRIBE nor
NOTIFY necessitates the use of the "Require" or "Proxy-Require" header and
no extension token is defined for the "Supported" header. Clients may probe
for the support of SUBSCRIBE and NOTIFY using the OPTIONS method.

There is no separate media transmission between the
subscriber and notifier as in a normal SIP session. The
message body of the NOTIFY request carries the
actual notification.
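
The relation between the abstract notification framework and a concrete event package can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the class names, the "mailbox" event name and the simplified message layout are hypothetical and are not taken from the draft [13], and treating Expires: 0 as the removal of a subscription is assumed by analogy with REGISTER.

```python
from abc import ABC, abstractmethod


class EventPackage(ABC):
    """The framework itself: defines no event type and cannot be instantiated."""

    @property
    @abstractmethod
    def event_name(self) -> str:
        """Value carried in the Event header."""

    @abstractmethod
    def render_state(self, state) -> str:
        """Turn the current state into a NOTIFY message body."""


class MailboxPackage(EventPackage):
    """Hypothetical package for message waiting indications."""

    event_name = "mailbox"  # hypothetical event name, not taken from any draft

    def render_state(self, state) -> str:
        return f"new-messages: {state}"


class Notifier:
    """Keeps subscriptions and produces simplified NOTIFY messages."""

    def __init__(self, package: EventPackage, state):
        self.package = package
        self.state = state
        self.expiry = {}  # subscriber URI -> absolute expiry time in seconds

    def subscribe(self, subscriber: str, expires: int, now: int) -> None:
        # Assumption: Expires: 0 removes the subscription, as with REGISTER.
        if expires == 0:
            self.expiry.pop(subscriber, None)
        else:
            self.expiry[subscriber] = now + expires

    def state_changed(self, new_state, now: int) -> list:
        """Return one simplified NOTIFY per subscription that is still valid."""
        self.state = new_state
        body = self.package.render_state(new_state)
        return [
            f"NOTIFY {uri}\nEvent: {self.package.event_name}\n\n{body}"
            for uri, until in self.expiry.items()
            if until > now
        ]


notifier = Notifier(MailboxPackage(), state=0)
notifier.subscribe("sip:alice@example.com", expires=3600, now=0)
for message in notifier.state_changed(2, now=60):
    print(message)
```

The notification travels in the message body only; no media session is set up, which mirrors the description above. A real implementation would also need the remaining SIP headers and transaction handling, which are omitted here.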
Removing and refreshing subscriptions are performed in the same way as
for the REGISTER method. Usage of the message body in a SUBSCRIBE
request is left up to the concrete extensions. It may be used to
filter events and to set thresholds for them.

The basic scenario of a notification session is presented in the
following figure. Note that, according to the SIP principle, proxies
need no additional behavior to support the SUBSCRIBE and NOTIFY
methods, but they can act as subscribers and notifiers.

    Subscriber                                    Notifier
        |-------------- SUBSCRIBE ----------------->|
        |<----------------- 200 --------------------|
        |                                           |  Generate immediate
        |<--------------- NOTIFY -------------------|  state response
        |------------------ 200 ------------------->|
        |                                           |  Generate state change
        |<--------------- NOTIFY -------------------|  event
        |------------------ 200 ------------------->|
        |---- SUBSCRIBE Expires: 0 (unsubscribe) -->|
        |<----------------- 200 --------------------|

Figure 10. Event notification messages.

This extension is not targeted to very frequent notifications. The
interval must be minutes instead of seconds. For better performance
and for simplifying the subscriber implementation, the new state after
the event must be notified in addition to the event itself. Nor is the
extension intended for transferring large amounts of data, since the
preferred transport protocol is UDP. Therefore this extension is not
fully in line with the SIP extension guidelines.

3.8 Session timer
SIP does not currently define a keepalive mechanism. The result is
that call stateful proxies are not always able to determine whether a
call is still active or not. For instance, when a user agent fails to
send a BYE message at the end of a session, or the BYE message gets
lost due to network problems, a call stateful proxy will not know when
the session has ended.

Knowing this is especially important for proxies controlling firewalls
or NATs or performing billing tasks. Holes and address bindings are
dynamically created in firewalls and NATs to allow the media for the
session to flow. These settings represent state which must eventually
be removed.

The Internet draft document [7] specifies the session timer extension
("timer") for solving the problem and improving the reliability of the
basic SIP protocol. In the original INVITE message the UAC, the UAS
and the proxies communicate their support for the extension and assign
the party (UAC or UAS) responsible for sending the re-INVITEs. If the
UAC supports the extension it sets "timer" in the Supported header,
and if it wants to turn the extension on it sets the refresh interval
in the Session-Expires header. The UAC will then be responsible for
sending the re-INVITEs. A proxy may adjust the refresh interval to a
smaller value and also require (Proxy-Require) the UAS to send the
re-INVITEs in case the UAC does not support the extension. If a
re-INVITE is not received before the refresh interval passes, the
session is considered terminated, and call stateful proxies can
release the session.

Note that using INVITE as the refresh method, as opposed to a new
method, allows sessions to be recovered after a crash and restart of
one of the UAs.

3.9 Distributed Call State
Many types of services require proxies to retain call state.
Unfortunately, maintaining call state presents problems. It introduces
scalability problems and makes fallback and load balancing more
complex.

The extension proposed in the Internet draft document [8] allows
proxies to encapsulate any state information they desire into a
header, called the State header. The header is sent to the user agents
and reflected back in subsequent messages.

The idea is similar to the use of cookies with HTTP user agent clients
and proxies. In essence, it allows proxies to behave as stateful
proxies while still being stateless.

4 Applications

4.1 Call centers
There are multiple ways to implement a SIP based call center where
more than one operator can provide the same service for incoming
requests. In a very simple model a redirect server is used together
with the registrar to redirect the calls to a free operator according
to a round robin algorithm, for example. The server can use the
Contact header with the maddr parameter to instruct the caller to send
the next
INVITE with the same Request-URI but connect to the host indicated by
the maddr parameter.

This is a very limited solution since the redirect server has no
automatic means to record the state of the operators. Of course, they
could send a re-REGISTER message whenever they are free for a new
call, but this does not match the semantics of the REGISTER message.
In fact, SIP provides a better way of implementing the application.

When a SIP proxy is used instead of a redirect server, the state of
each call can be maintained by listening to the SIP messages. The
address of the proxy is published externally and no direct connections
to the operator addresses are allowed through the firewall. The proxy
includes itself in the message path using Record-Route and Via headers
in order to get the CANCEL and BYE requests as well as all the
responses. When a new call arrives the proxy selects the operator
based on its own call state information and information in the
registrar.

Sending the INVITE message using IP multicast can speed up the search
for an operator. Free operators generate a response within a random
time interval. Since all operators will hear the first response, they
can drop the request without responding. If no operator is free, the
proxy retries until one is free or the client terminates the call by
sending a CANCEL request, to which the proxy responds. The proxy
generates all call statistics.

If the network does not support IP multicast, yet another option is to
fork the request in the proxy into simultaneous requests to the
current locations of the free operators. In this case the cancellation
of the other INVITE messages needs to be performed by the proxy as
soon as the first operator responds.

4.2 Presence and Instant Messaging

Presence is considered a promising application area in all-IP
networks. When combined with instant messaging it creates a lot of
opportunities for application developers. A new working group, called
SIMPLE (SIP for Instant Messaging and Presence Leveraging Extensions),
has been established in the IETF for developing specifications in this
area. 3GPP is also considering presence as one service for the IP
Multimedia (IM) subsystem.

Presence is defined as a user's reachability, capabilities and
willingness to communicate with other users. A presence application
obviously has to provide the means to deliver this information to
other users. A lot of room exists for differentiating applications
from each other. For example, intelligent filters for exposing the
presence and accepting calls can be built based, for example, on the
user's location and the caller's identity.

Instant messaging (IM) is defined as the exchange of content between a
set of participants in real time, like in IRC. The content is mainly
small textual messages but it can also contain pictures or audio or
video clips. The main difference from email is the real-time nature,
which requires all the parties to be online.

It is very important to keep presence and IM separate from each other
even if these are mixed in the existing, proprietary solutions. The
separation enables independent development of the two protocols. This
is important also because of the existing IM applications (multiplayer
online games).

SIMPLE bases its work on the existing SIP and extension drafts. The
foundation of using SIP for the presence and IM protocols derives from
two factors: the SIP registrars already hold some information about
the user's presence, and SIP networks already route messages from the
user to the proxy that can access this information [15,16]. Extending
SIP to this area is a rather small step in terms of protocol
operations, but semantically it is a bigger step.

The presence extension is an instantiation of the abstract
notification extension. A new event package, named "presence", is
defined for this purpose. The body of the NOTIFY message contains a
presence document. An XML data format and a MIME type will be defined
for the document. The following figure shows the logical elements for
SIP presence.

[Diagram labels: SIP Presence System; UA1, UA2, UA3; Proxy/Registrar;
Presence User Agent; Presence Agent; REGISTER, SUBSCRIBE, NOTIFY; SIP
protocol; SIP/CPIM Gateway; CPIM Presence System; Non-SIP protocol]

Figure 11. Logical network elements for SIP presence.
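
The statement that the body of a NOTIFY message carries a presence document can be illustrated with a small sketch. Since the XML format and the MIME type were still to be defined, the element names, the MIME type and the example URIs below are placeholders invented for this illustration, and the message omits the other headers a real SIP request requires (Via, From, To, Call-ID, CSeq).

```python
import xml.etree.ElementTree as ET

# Placeholder MIME type: the text only says that one "will be defined".
HYPOTHETICAL_MIME_TYPE = "application/x-example-presence+xml"


def presence_document(entity: str, reachable: bool, note: str = "") -> str:
    """Build a made-up XML presence document for one user."""
    root = ET.Element("presence", {"entity": entity})
    ET.SubElement(root, "status").text = "open" if reachable else "closed"
    if note:
        ET.SubElement(root, "note").text = note
    return ET.tostring(root, encoding="unicode")


def build_notify(subscriber_uri: str, entity: str, reachable: bool,
                 note: str = "") -> str:
    """Wrap the presence document in a simplified NOTIFY request."""
    body = presence_document(entity, reachable, note)
    lines = [
        f"NOTIFY {subscriber_uri} SIP/2.0",
        "Event: presence",
        f"Content-Type: {HYPOTHETICAL_MIME_TYPE}",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_notify("sip:ua1@example.com", "sip:ua2@example.com",
                   True, "in a meeting"))
```

Only the Event header value "presence" comes from the text; everything else in the sketch is an assumption made to obtain a complete, runnable example.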
The presence agent (PA) is capable of storing the subscriptions and
generating notifications based on the events. The presence user agent
(PUA) updates presence information.

Authorization is a critical component of a presence protocol.
Authorization can be pushed to the server ahead of time or, more
typically, determined at the time of subscription. Since this is not
covered by the basic SIP protocol, an Internet draft [17] proposes a
new method (QAUTH) for querying the authorization from the
subscription authorizer (e.g. the PUA). This draft appears to be
controversial, however.

The IM protocol extensions are defined in the Internet draft [18].
When a user wishes to send an instant message to another, the sender
issues a SIP request using the new MESSAGE method. The request URI can
be in the format of an "im:" URL or a normal SIP URL. The body of the
request contains the message to be delivered. Provisional and final
responses will be returned to the sender as with any other SIP
request. The following diagram shows two message exchanges between two
users.

    User1 -> Proxy:  MESSAGE im:user2@domain.com SIP/2.0
                     From: im:user1@domain.com
                     To: im:user2@domain.com
                     Contact: sip:user1@user1pc.domain.com

    Proxy -> User2:  MESSAGE sip:user2@domain.com SIP/2.0
                     From: im:user1@domain.com
                     To: im:user2@domain.com
                     Contact: sip:user1@user1pc.domain.com

    User2 -> Proxy:  SIP/2.0 200 OK
                     From: im:user1@domain.com
                     To: im:user2@domain.com;tag=ab8asdasd9

    Proxy -> User1:  SIP/2.0 200 OK

    User2 -> User1:  MESSAGE sip:user1@user1pc.domain.com
                     From: im:user2@domain.com;tag=ab8asd9
                     To: im:user1@domain.com

    User1 -> User2:  SIP/2.0 200 OK

Figure 12. Instant messaging between users in the same domain.

The proxy looks up the registration database for the binding from the
im address to the sip address of User2 and forwards the message to the
current location. The response traverses the same path. Based on the
Contact header of the message, User2 can send the second message
directly to User1's current location because the proxy added no
Record-Route header to the first message. The From and To headers are
reversed, however.

The specifications for presence and instant messaging are still rather
incomplete. This is indicated by the long list of open issues listed
in the drafts.

The semantic difference between the presence and IM protocols and the
basic SIP protocol is in the type of session they create. The presence
protocol creates a passive session which is used asynchronously for
notifying the subscriber over the signaling channel, without any media
channel. Establishment and termination of the session are done
differently from the basic protocol. IM does not create a session at
all, a point currently under discussion in the working group.
Surrounding the related MESSAGE requests with INVITE and BYE requests
would be consistent with the basic protocol.

5 Conclusion

Simplicity is a key characteristic of SIP. It facilitates
interoperable clients, servers and proxies coming from independent
vendors. Sharing a lot of similarities with HTTP makes SIP rather easy
to understand for a large developer community.

Expandability is another key characteristic. Being built into the
basic protocol, it provides the means for extending the protocol
capabilities. Network elements can dynamically negotiate their
capabilities. The basic protocol specification can concentrate on its
primary function.

Supporting different protocols for different purposes is yet another
key characteristic of SIP. This keeps the development of SIP
independent of that of other protocols and makes the overall adoption
of SIP more likely.

A lot of SIP related development activities are going on in the IETF
(over 70 drafts). This is evidence of its potential on the one hand
but also of its immaturity on the other. The potential is demonstrated
by the application examples presented in this paper. The immaturity
for IP telephony is demonstrated by the large number of suggested
extensions described in this paper that are fundamental for this area.

Many extensions seem to be very useful and easy to specify at first
sight. However, they may not share the semantics of the basic protocol
and should not be defined as a SIP extension. The ability of the IETF
to respond to the needs and at the same time control the specification
work will be tested in the near future.

The slowness of the IETF process is indicated also by its inability to
promote the basic SIP specification to "draft" state after being in
"proposed" state for over two years. At the same time 3GPP is stating
its requirements for SIP in the IP multimedia subsystem of 3G. If
these requirements are not included in the IETF specifications, the
risk of SIP fragmentation may be realized.

References

[1] Internet Draft, SIP WG,
    Handley/Schulzrinne/Schooler/Rosenberg: SIP: Session Initiation
    Protocol, November 24, 2000, rfc2543bis-02.txt
[2] RFC2327, Network WG, M. Handley, V. Jacobson: SDP: Session
    Description Protocol, April 1998,
    http://www.ietf.org/rfc/rfc2327.txt
[3] RFC2617, Network WG, J. Franks, et al.: HTTP Authentication: Basic
    and Digest Access Authentication, June 1999,
    http://www.ietf.org/rfc/rfc2617.txt
[4] RFC2440, Network WG, J. Callas, et al.: OpenPGP Message Format,
    November 1998,
[5] Internet Draft, SIP WG, J. Rosenberg, H. Schulzrinne: Guidelines
    for Authors of SIP Extensions, March 5, 2001, guidelines-02.txt
[6] Internet Draft, SIP WG, J. Rosenberg, H. Schulzrinne: Reliability
    of Provisional Responses in SIP, March 2, 2001, 100rel-03.txt
[7] Internet Draft, SIP WG, S. Donovan, J. Rosenberg: The SIP Session
    Timer, November 22, 2000, session-timer-04.txt
[8] Internet Draft, SIP WG, W. Marshall, et al.: SIP Extensions for
    supporting Distributed Call State, February, 2001,
    http://www.ietf.org/internet-drafts/draft-ietf-sip-state-01.txt
[9] Internet Draft, SIP WG, Schulzrinne/Rosenberg: SIP Caller
    Preferences and Callee Capabilities, November 24, 2000,
    http://www.ietf.org/internet-drafts/draft-ietf-sip-callerprefs-03.txt
[10] Internet Draft, SIP WG, W. Marshall, et al.: Integration of
    Resource Management and SIP, February, 2001,
    http://www.ietf.org/internet-drafts/draft-ietf-sip-manyfolks-resource-01
[11] Internet Draft, R. Sparks: SIP Call Control – Transfer,
    February 26, 2001,
    http://www.ietf.org/internet-drafts/draft-ietf-sip-cc-transfer-04.txt
[12] Internet Draft, SIP WG, W. Marshall, et al.: SIP Extensions for
    Caller Identity and Privacy, February, 2001,
    http://www.ietf.org/internet-drafts/draft-ietf-sip-privacy-01.txt
[13] Adam Roach: Event Notification in SIP, Internet Draft,
    February 2001,
    http://www.ietf.org/internet-drafts/draft-roach-sip-subscribe-notify-03.txt
[14] RFC2976, Network WG, S. Donovan: The SIP INFO Method,
    October 2000,
[15] RFC2778, Network WG, M. Day, J. Rosenberg, H. Sugano: A Model for
    Presence and Instant Messaging, February 2000,
    http://www.faqs.org/rfcs/rfc2778.html
[16] Internet Draft, SIMPLE WG, Rosenberg et al.: SIP Extensions for
    Presence, March 2, 2001,
    http://www.cs.columbia.edu/sip/drafts/draft-rosenberg-impp-presence-01.txt
[17] Internet Draft, IMPP WG, Jonathan Rosenberg et al.: SIP
    Extensions for Presence Authorization, June 15, 2000,
    rosenberg-impp-qauth-00.txt
[18] Internet Draft, J. Rosenberg, et al.: SIP Extensions for Instant
    Messaging, February 28, 2001, rosenberg-impp-im-01.txt
[19] 14 2000,
    http://www.softarmor.com/sipwg/teams/sipt/index.html
[20] Ericsson: Best Current Practice for ISUP to SIP mapping, IETF,
    September 2000,
    http://www.softarmor.com/sipwg/teams/sipt/index.html
[21] Phillips Omnicom: Voice over IP, Phillips Omnicom, July 2000.
    HERTS SG1 1EL – UK
[22] Srinivas Sreemanthula et al.: 'RT Hard Handoff Concept for All-IP
    System, version V1.0.2, and IPMN project.
A transport protocol for SIP

Gonzalo Camarillo
Advanced Signalling Research Lab., Ericsson, Finland
Gonzalo.Camarillo@ericsson.com

Henning Schulzrinne
Department of Computer Science, Columbia University, USA
hgs@cs.columbia.edu

Raimo Kantola
Networking Laboratory, Helsinki University of Technology, Finland
Raimo.Kantola@hut.fi

Abstract
Current SIP implementations typically use TCP or UDP as a transport
protocol. The differences between SIP over UDP and SIP over TCP have
already been analyzed and are relatively well-known. However, so far
there have been no SIP implementations that use SCTP as a transport.
This paper analyzes the advantages that can be derived from the use of
SCTP as a transport for SIP. It shows that, while SCTP is an excellent
transport protocol for high levels of traffic, its performance
decreases when the number of SIP transactions transmitted in parallel
decreases.

1 Introduction
The Session Initiation Protocol (SIP) is an application-layer protocol
for creating, modifying and terminating sessions. SIP [1] is designed
in a modular way so that it is independent of the type of session
established and of the lower-layer transport protocol used. Its
modularity is one of the most important strengths of SIP. It makes SIP
flexible and easy to extend with new features.

The SIP specification describes how the protocol operates over TCP [2]
and over UDP [3]. The two transport protocols have different
characteristics and provide a particular SIP application with
different services. TCP provides reliable in-order transfer of bytes
while UDP ensures neither reliability nor in-order delivery. Both UDP
and TCP present certain advantages and disadvantages, and both of them
also present certain limitations regarding signalling transport.

The limitations present in TCP and UDP for transporting signalling
traffic led to the design of a new transport protocol within the IETF.
The SIGTRAN working group developed the Stream Control Transmission
Protocol (SCTP). SCTP [4] was first intended to transport telephony
signalling over an unreliable network such as an IP network. However,
the protocol has been designed so that SCTP can be used as a
general-purpose transport protocol.

There have already been attempts to define SIP operation on top of
SCTP [5]. However, although there are already implementations of
telephony signalling protocols such as ISUP on top of SCTP, so far
there has not been any implementation of SIP over SCTP that could show
the gains that SCTP might achieve. This document discusses advantages
and disadvantages derived from the use of SCTP as a transport protocol
for SIP.

The remainder of this document is organized as follows. Sections 2 and
3 describe SIP operation on top of TCP and UDP respectively. Pros and
cons of each protocol are analyzed. Section 4 provides an introduction
to SCTP. Section 5 analyzes advantages and disadvantages of using SCTP
as a transport for SIP and finally section 6 outlines some
conclusions.

2 SIP over TCP
The natural choice to transport a signalling protocol whose messages
have to be reliably delivered to the destination seems to be a
reliable transport protocol. Since the most widespread reliable
transport protocol is TCP, it would not have been surprising if SIP
had been designed to run only over TCP. Besides, SIP is based on HTTP
[6], which uses TCP as a transport.

However, TCP presents some limitations regarding signalling transport.
Therefore, SIP was designed to be independent of the transport
protocol. This way, SIP can also run over UDP, overcoming some of
TCP's limitations. At present, UDP is the most widespread transport
for SIP.

2.1 TCP limitations
TCP was designed to transport large amounts of data between two
end-points. Once a connection is established, TCP implements flow
control and error correction based on the dynamic behavior of the
end-to-end traffic. However, signalling traffic does not
consist of large amounts of data. Signalling traffic usually consists
of small bursts of information. TCP's flow control mechanisms are not
designed for such a traffic pattern, and therefore do not perform as
well as might be expected.

Fast retransmit algorithm
When a large bulk of data is being transmitted by TCP, ack messages
from the receiver are continuously received indicating which segments
have been successfully received. The receiver sends duplicate acks
when out-of-order segments arrive. Thus, arrival of duplicate acks
indicates that a segment was lost. Therefore, the sender retransmits
it without waiting for a timeout. This mechanism is referred to as
fast retransmit and it is used together with the fast recovery
algorithm.

    Sender                          Receiver
      |----------- 1:257 ------------>|
      |<---------- ack 257 -----------|
      |----------- 257:513 ----X      |   (lost)
      |----------- 513:769 ---------->|
      |<---------- ack 257 -----------|   (duplicate ack)
      |----------- 257:513 ---------->|   (retransmission)

Figure 1 : Fast retransmit

Note that the sender in figure 1 retransmits the missing segment upon
reception of a single duplicate ack. This flow has been simplified. A
typical implementation waits until three duplicate acks are received
before retransmitting a segment.

Figure 1 shows how TCP behaves when large bulks of data are
transmitted. Retransmissions are usually triggered by duplicate acks
rather than timeouts. This is the reason why TCP timeouts are
relatively high, in the order of 1.5 seconds. This allows using the
fast retransmit algorithm before a timeout occurs.

However, SIP messages are relatively small, in the order of 500 bytes.
A SIP message usually fits into a TCP segment. So, if a TCP segment
that contains a SIP message gets lost, TCP will not receive any
duplicate acks, since it is not sending any more data. Therefore, TCP
will have to wait for a timeout in order to retransmit the missing
segment. This results in an overly conservative retransmission policy
when TCP transports SIP signalling.

    Sender                          Receiver
      |----------- 1:513 ------X      |   (lost)
      |                               |
      |   1.5 secs                    |
      |                               |
      |----------- 1:513 ------------>|   (retransmission)
      |<---------- ack 513 -----------|

Figure 2 : TCP timeout

TCP connection establishment
TCP performs a three-way handshake before any user data can be
transmitted between both ends. In a long-lived connection, the
connection establishment time is negligible compared to the whole
connection duration. However, signalling traffic is delay sensitive.
If a SIP UAC wants to send an INVITE over TCP it will have to wait
until the TCP connection is established before sending the INVITE.

    Sender                          Receiver
      |----------- SYN -------------->|
      |<---------- SYN ---------------|
      |----------- ack -------------->|   Established

Figure 3 : TCP three-way handshake

The receiver of figure 3 will not pass any data to the application
until it reaches the "established" state. This overhead is not
acceptable when the user is expecting an answer to his INVITE.

TCP implements a special timer for connection establishment. When a
SYN gets lost, a typical implementation retransmits it after 6
seconds. Therefore, a single packet loss enormously increases the
connection establishment delay introduced by TCP.

2.2 Multiple SIP sessions
One straightforward attempt to resolve both issues previously
described consists of bundling several SIP sessions into a single TCP
connection. With a high number of SIP sessions the TCP connection
transports data continuously so that packet losses are detected by
receiving duplicate acks rather than by timeouts. This increases the
performance of TCP and reduces the delay introduced to SIP messages.

Another advantage of bundling SIP sessions is that the first SIP
message of a new session does not have to wait for a new TCP
connection to be established before being transmitted. Since the TCP
connection is already
established, SIP messages belonging to a new SIP session are not
affected by any additional delay. They can be sent immediately.

A SIP UAC usually handles a single SIP session, but proxies in the
network have several ongoing SIP sessions between them at the same
time. Therefore, proxies handling a high number of SIP sessions can
typically take advantage of bundling SIP sessions. Another example
where bundling can be performed is between a large gateway towards the
PSTN and its outbound proxy.

Byte stream service
However, TCP presents an important limitation regarding bundling of
sessions. TCP provides ordered delivery of a stream of bytes. When TCP
is used to transmit messages it preserves the order in which the
messages were sent by the sender. This property causes interaction
problems between different SIP sessions carried on a single TCP
connection.

    Sender                          Receiver
      |----------- 1:513 ------X      |   (lost)
      |----------- 513:1025 --------->|
      |<---------- ack 1 -------------|
      |----------- 1:513 ------------>|   TCP delivers 1:1025

Figure 4 : TCP provides ordered delivery

The sender of figure 4 sends two INVITEs that belong to different
sessions using the same TCP connection. The segment carrying the first
INVITE gets lost (1:513), but the segment carrying the second INVITE
arrives properly at the receiver (513:1025). However, since TCP
provides ordered delivery, it will not deliver the second INVITE to
the application until it has delivered the first INVITE. Therefore,
the second INVITE is delayed until the first INVITE is retransmitted.
The consequence is that a particular SIP session might suffer delay
without having experienced any packet loss, as shown in figure 4.

3 SIP over UDP
Transporting SIP over UDP overcomes some of the problems associated
with TCP. UDP is a connectionless protocol. Thus, it does not perform
any kind of connection establishment before sending data. Therefore, a
particular INVITE will be sent encapsulated in a UDP packet without
any establishment delay introduced by the transport protocol.

Since UDP does not provide reliable transport, reliable delivery is
achieved through application level retransmissions. The SIP
application retransmits a particular SIP message when the
retransmission timer expires. This retransmission timer is shorter
than in TCP. Its default value is 0.5 seconds. Therefore, the
retransmission policy of SIP when it runs over UDP is more aggressive
than when it runs over TCP.

    Sender                          Receiver
      |----------- INVITE -----X      |   (lost)
      |                               |
      |   0.5 secs                    |
      |                               |
      |----------- INVITE ----------->|   (retransmission)

Figure 5 : SIP retransmission policy using UDP

SIP can afford to have a more aggressive retransmission policy over
UDP than TCP because it transmits a small number of small messages.
SIP therefore assumes that retransmitting these messages more often
than TCP would is not going to congest the network.

Therefore, when a single or a small number of SIP sessions are
handled, UDP is a better choice than TCP. However, UDP, as opposed to
TCP, does not hide retransmissions from the application layer. Thus, a
SIP application using UDP has to store more state information than
when TCP is used, although this does not represent an important issue
for most applications.

3.1 Multiple SIP sessions

When there are multiple SIP sessions between two proxies they can be
bundled in a single TCP connection to take advantage of the congestion
control mechanisms built into TCP. Losses are detected earlier and
thus performance improves.

However, when UDP is used, the same retransmission timers apply to
every session. This can lead to poorer performance and even to network
congestion, since UDP does not provide congestion information to the
application and by default SIP uses a more aggressive retransmission
policy than TCP.

Therefore, for proxies handling a large number of sessions, the choice
between UDP and TCP is not clear. TCP presents the previously
described head of the line blocking issue and UDP does not implement
any congestion control mechanism. The choice between TCP and UDP
depends on how the network is loaded at a certain moment and on the
RTT between sender and receiver.
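
The head of the line blocking effect can be made concrete with a small calculation. The following Python sketch is a deliberately simplified model, not a simulation of either protocol: it reuses the timer values quoted in the text (1.5 seconds for a TCP timeout, 0.5 seconds for the SIP retransmission timer over UDP), while the one-way network delay of 50 ms and all function names are assumptions made for the example. Congestion control and repeated losses are ignored.

```python
TCP_TIMEOUT_MS = 1500    # "in the order of 1.5 seconds" in the text
SIP_UDP_TIMER_MS = 500   # default SIP retransmission timer quoted in the text
ONE_WAY_DELAY_MS = 50    # assumed network delay, not taken from the text


def arrival_ms(sent_ms, lost_once, retransmit_after_ms):
    """Arrival time of a message that is lost at most once."""
    if lost_once:
        sent_ms += retransmit_after_ms
    return sent_ms + ONE_WAY_DELAY_MS


def in_order_delivery(arrivals):
    """Byte stream behaviour: message i is delivered only after 0..i arrived.

    `arrivals` lists arrival times in sending order.
    """
    delivered, latest = [], 0
    for t in arrivals:
        latest = max(latest, t)
        delivered.append(latest)
    return delivered


def independent_delivery(arrivals):
    """Datagram behaviour: each message is delivered as soon as it arrives."""
    return list(arrivals)


# Two INVITEs of different sessions sent back to back; the first is lost once.
tcp = in_order_delivery([
    arrival_ms(0, True, TCP_TIMEOUT_MS),
    arrival_ms(0, False, TCP_TIMEOUT_MS),
])
udp = independent_delivery([
    arrival_ms(0, True, SIP_UDP_TIMER_MS),
    arrival_ms(0, False, SIP_UDP_TIMER_MS),
])
print(tcp)  # [1550, 1550]
print(udp)  # [550, 50]
```

Under these assumptions the second INVITE, which was never lost, is delivered after 1550 ms over the shared TCP connection but after 50 ms over UDP. The model says nothing about the congestion that aggressive UDP retransmissions may cause, which is the other side of the trade-off described above.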
4 SCTP
The Stream Control Transmission Protocol (SCTP) is intended to resolve
the issues derived from the use of TCP and UDP when there are multiple
SIP sessions between sender and receiver. SCTP [4] also provides a
certain level of fault tolerance through multihoming.

4.1 SCTP connection establishment
SCTP is a connection oriented transport protocol. In SCTP terminology,
a connection is referred to as an association. An association is
established through a four-way handshake in which the last two
messages can already carry user data.

    Sender                          Receiver
      |----------- INIT ------------->|
      |<---------- INIT ACK ----------|
      |----------- COOKIE ECHO ------>|
      |<---------- COOKIE ACK --------|

Figure 6 : SCTP four-way handshake

In this handshake the end points exchange one or multiple IP addresses
or host names. One destination address will be marked as the primary.
The rest of them will be used in case the primary destination becomes
unavailable. This feature, known as multihoming, allows an SCTP
association to survive network failures. The data is just sent to
another destination address in case of failure.

The four-way handshake also provides a certain level of protection
against resource attacks. The receiver, upon reception of an INIT
message, sends back a cookie in the INIT ACK. The receiver does not
allocate any resources for this SCTP association until it receives the
same cookie in the COOKIE ECHO message. This way, resources are
allocated only once it is ensured that the party sending the INIT
message is really willing to establish an SCTP association.

4.2 Multiple streams within an association

SCTP provides multiplexing/demultiplexing capabilities within an
association. A single association can contain several streams. Each
stream is identified by its stream id. During the four-way handshake
the number of streams in both directions is negotiated.

An association can contain several types of streams. The base SCTP
specification [4] defines two services: reliable ordered delivery and
reliable unordered delivery. However, there are extensions [7] that
provide an unreliable delivery service.

It is important to note that a particular service is provided on a
per-stream basis. Therefore, one stream within an association might be
an ordered stream while another is unordered.

4.3 Flow and congestion control per association

Even if an association contains several streams, SCTP performs flow
and congestion control per association. This allows the behavior of
all the traffic within the association to be used as input for the
flow control mechanisms, which are effectively very similar to the
ones used by TCP.

For instance, the fast retransmit algorithm can be used effectively
without waiting for timeouts in order to retransmit data. Figure 7
shows how stream demultiplexing and flow control work together in an
example.

    Sender                                        Receiver
      |-- TSN=1, Stream id=0, Stream seq=0 --------->|  Chunk delivered
      |                                              |  for stream id=0
      |-- TSN=2, Stream id=1, Stream seq=0 ---X      |  (lost)
      |-- TSN=3, Stream id=0, Stream seq=1 --------->|  Chunk delivered
      |                                              |  for stream id=0
      |<------------- SACK TSN=1 --------------------|
      |-- TSN=2, Stream id=1, Stream seq=0 --------->|  (retransmission)

Figure 7 : Multiple streams within an association

The association of figure 7 consists of two ordered streams (stream
id=0 and stream id=1). SCTP implements a general sequence number space
(Transmission Sequence Number) and a sequence number space per stream.
The general TSN is used to perform flow control and packet loss
recovery and the stream sequence numbers are used to deliver
individual streams.

When the message with TSN=3 arrives at the receiver, the receiver
knows that TSN=2 is lost. However, it also knows that TSN=3 is the
next packet of stream id=0 (Stream seq=1). Therefore, it delivers the
packet to the application without waiting to receive TSN=2. In the
SACK (Selective ACK) the receiver reports that TSN=2 was missing.

Therefore, losses in one stream do not introduce delay on other
streams. Besides, since the whole association is used to perform flow
control, the sender detects that TSN=2 got lost thanks to the SACK
sent upon
-61-
reception of TSN=3, that belongs to a different stream. Unordered service for final responses
This way, SCTP does not have to wait for a timeout to In order to overcome this problem SIP final responses
retransmit TSN=2. can be sent using the SCTP unordered service. SCTP
allows to send unordered messages within an ordered
So, SCTP combines good features of both TCP and stream. Therefore, all SIP messages within a SIP
UDP. It bundles streams to take advantage of flow session are still sent using the same stream, but
control mechanisms and delivers separately packets messages carrying final responses are sent with the
belonging to different streams. SCTP unordered flag set.
Note that a receiver performs demultiplexing of

5 SIP over SCTP incoming SIP messages based on the Call-ID of the SIP
It seems clear that proxies that handle multiple SIP message rather than on the SCTP stream id. Stream ids
sessions between them can obtain a better performance are used here to solve the head of the line blocking
using an SCTP association than using TCP or UDP. If problem. They are not intended to provide further
each SIP session is sent over an ordered stream, SIP demupltiplexing.
messages can take advantage of flow control without
being delayed by lost messages from other sessions. General unordered service
The method just described would be the most efficient
However, even when multiple ordered streams are way of transporting SIP over SCTP. However, there is
used, it is still possible that messages are delayed by a simpler mechanism that behaves nearly as well and
other messages belonging to the same SIP session. The simplifies implementations. It consists of sending all
example of figure 8 shows how the loss of a SIP traffic using the SCTP unordered service. When all
provisional response can delay the delivery of the final SIP messages are sent with the unordered flag set
response which was successfully received. SCTP delivers any message received immediately,
independently of which stream the message belongs to.
Sender Receiver Thus, SIP entities can perfectly use the same stream id
100 Trying for all SIP sessions.
TSN=1
Stream id=0 This mechanism is simpler because an implementation
Stream seq=0 does not have to ensure that SIP messages belonging to
a particular SIP session are always sent using the same
180 Ringing stream id. Implementation that are not willing to
TSN=2 perform stream id management should use this
Stream id=0 mechanism.
Stream seq=1
200 OK An example of such an implementation is a proxy that
TSN=3 does not store state information about SIP transactions
Stream id=0
Stream seq=2
(stateless) but has SCTP associations continuously
open to send SIP messages to certain common
SACK
destinations.
TSN=1
Note that the use of a hash of the Call-ID of a SIP
180 Ringing message module the number of SCTP streams available
Delivery of TSN=2 in order to choose the outgoing stream id for the
180 and 200 Stream id=0 message has some limitations. Although with a high
Stream seq=1 number of available streams it is not likely no happen,
a system using this method might end up sending
Figure 8 : SIP over ordered SCTP streams requests with different Call-IDs using the same stream
In figure 8 all SIP responses are sent over an ordered id. This would result in the head of the line blocking
SCTP stream (stream id=0). Therefore, SCTP delivers problem previously mentioned.
messages to the application in order within the stream.
Since the provisional response “180 Ringing” got lost, These two methods have the advantage of interworking
SCTP cannot deliver the final response “200 OK” to together. Any receiver is able to receive traffic from
the application. SCTP waits until TSN=2 arrives before senders using any of both mechanisms.
delivering both responses.
5.1 Differences between both methods
The only difference between both methods is that
sending just final responses with the SCTP unordered
flag set avoids re-ordering of requests and provisional
responses in the parts of the path where SCTP is used.
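The delivery behaviour described in sections 4.2 and 5 can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the names `Chunk`, `ToyReceiver`, `replay`, `figure7` and `figure8` are hypothetical, the model ignores congestion control, timers and bundling, and the stream assignment of TSN=1 in the figure 7 scenario is assumed rather than taken from the text. It is not an SCTP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tsn: int              # association-wide Transmission Sequence Number
    stream_id: int
    stream_seq: int       # per-stream sequence number; ignored when unordered
    payload: str
    unordered: bool = False


class ToyReceiver:
    """Hypothetical receiver model: in-order delivery per stream, SACK over TSNs."""

    def __init__(self):
        self.cum_tsn = 0      # highest TSN below which nothing is missing
        self.seen = set()     # all TSNs received so far
        self.next_seq = {}    # stream id -> next stream seq to deliver
        self.waiting = {}     # stream id -> {stream seq: payload}
        self.delivered = []   # payloads handed to the application, in order

    def receive(self, chunk):
        if chunk.tsn not in self.seen:        # ignore duplicates
            self.seen.add(chunk.tsn)
            while self.cum_tsn + 1 in self.seen:
                self.cum_tsn += 1
            if chunk.unordered:
                # Unordered flag set: hand over immediately.
                self.delivered.append(chunk.payload)
            else:
                self._deliver_in_order(chunk)
        return self.sack()

    def _deliver_in_order(self, chunk):
        queue = self.waiting.setdefault(chunk.stream_id, {})
        queue[chunk.stream_seq] = chunk.payload
        expected = self.next_seq.get(chunk.stream_id, 0)
        while expected in queue:              # release everything now in sequence
            self.delivered.append(queue.pop(expected))
            expected += 1
        self.next_seq[chunk.stream_id] = expected

    def sack(self):
        """Return (cumulative TSN, missing TSNs below the highest TSN seen)."""
        top = max(self.seen)
        gaps = [t for t in range(self.cum_tsn + 1, top) if t not in self.seen]
        return self.cum_tsn, gaps


def replay(chunks, lost_tsn):
    """Deliver every chunk except lost_tsn, then deliver its retransmission."""
    rx, sack = ToyReceiver(), None
    for c in chunks:
        if c.tsn != lost_tsn:
            sack = rx.receive(c)
    before = list(rx.delivered)               # delivered before the retransmission
    for c in chunks:
        if c.tsn == lost_tsn:
            rx.receive(c)
    return before, sack, list(rx.delivered)


def figure7():
    # Two ordered streams; TSN=2 (stream id=1) is lost on first transmission.
    return replay([Chunk(1, 0, 0, "s0-msg0"),
                   Chunk(2, 1, 0, "s1-msg0"),
                   Chunk(3, 0, 1, "s0-msg1")], lost_tsn=2)


def figure8(final_unordered):
    # One stream; "180 Ringing" (TSN=2) is lost on first transmission.
    return replay([Chunk(1, 0, 0, "100 Trying"),
                   Chunk(2, 0, 1, "180 Ringing"),
                   Chunk(3, 0, 2, "200 OK", unordered=final_unordered)],
                  lost_tsn=2)


if __name__ == "__main__":
    print(figure7())
    print(figure8(final_unordered=False))
    print(figure8(final_unordered=True))
```

In the figure 7 scenario the loss on stream id=1 does not hold back stream id=0, while the SACK still reports the gap. In the figure 8 scenario the final response is held back until the retransmission arrives unless it is sent with the unordered flag set.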
However, there are just a few scenarios where this can happen.

Provisional responses

Provisional responses are sent unreliably by SIP. SIP systems do not rely on provisional responses to drive any protocol state machine. Therefore, receiving provisional responses out of order does not represent a problem for SIP UAs.

When a SIP UA is interested in provisional responses it uses the extension defined in [9]. Then, provisional responses are transmitted reliably. [9] recommends that SIP servers sending provisional responses do not send subsequent responses until the previous one has been acknowledged with a PRACK. Thus, using ordered or unordered SCTP to transport provisional responses does not make a difference, since the SIP layer ensures that they are received in order.

    Client -> Server: INVITE
    Server -> Client: 182 Two in the Queue
    Client -> Server: PRACK
    Server -> Client: 200 OK
    Server -> Client: 182 One in the Queue
    Client -> Server: PRACK
    Server -> Client: 200 OK
    Server -> Client: 180 Ringing
    Client -> Server: PRACK
    Server -> Client: 200 OK

Figure 9 : SIP ensures in order delivery

Requests

The behavior of a SIP entity sending requests is similar to the one described for reliable provisional responses. A SIP client does not send a request until the previous transaction has completed. There are two exceptions to this rule, but in general the transport used for requests (ordered or unordered) does not make a difference either.

The only two cases in which a SIP client sends overlapping requests are an INVITE followed by a CANCEL and an INVITE followed by a BYE. Note that other methods such as COMET or PRACK are only sent after a response to the INVITE has been received. Note also that CANCEL can terminate any request other than CANCEL and ACK. However, since non-INVITE requests are answered immediately by the server, CANCEL is typically used only for INVITE requests.

These two situations are the only ones where the two uses of SCTP described previously differ. If the requests are sent unordered, a CANCEL or a BYE might overtake the INVITE sent before. Ordered SCTP ensures that they arrive in the same order as they were sent. However, this is only ensured in the part of the path where ordered SCTP is used. If another transport protocol such as UDP is used in another part of the path, reordering can still happen. Therefore, even systems using ordered SCTP have to be prepared to handle out of order CANCELs and BYEs. Figure 10 shows how a system using ordered SCTP might still receive out of order requests.

[Figure 10 diagram: INVITE and BYE messages exchanged between Proxy 1, Proxy 2 and Proxy 3; the labels "UDP" and "Ordered transport" mark the transports used on the path.]

Figure 10 : INVITE followed by a BYE

Even if ordered SCTP streams are used, a SIP entity has to be prepared to receive a BYE before an INVITE. A “481 Transaction Does Not Exist” will be sent as response to the BYE.

Therefore, the only difference between sending all the SIP traffic with the SCTP unordered flag set and sending just final responses with this flag is that the likelihood of receiving a BYE or a CANCEL before an INVITE decreases using the latter method, although it might still happen.

5.2 Other strengths of SCTP

The previous section described how SCTP behaves like a TCP connection without the head of the line blocking problem. Besides resolving this problem, SCTP has some other strengths that SIP can take advantage of.

Message based

SCTP is a message-oriented protocol, as opposed to TCP, which is stream oriented. SCTP delivers messages while TCP delivers a stream of bytes. This makes it possible for SCTP to provide unordered delivery of SIP messages. In TCP this concept would not make any sense, since delivering unordered bytes would be useless for an application.

Message-oriented protocols such as SCTP or UDP allow implementing simpler parsers. When these transport protocols deliver a message to the application it contains a single SIP message. In order to parse a SIP message received over TCP it is necessary to implement application level boundaries such as the SIP Content-Length header.

Transport-layer fragmentation

However, although both SCTP and UDP are message-oriented transport protocols, SCTP has an advantage over UDP. SCTP implements transport-level fragmentation while UDP does not. If a SIP message inside a UDP packet is larger than the path MTU the packet will be fragmented at the IP layer.

IP-layer fragmentation presents several problems. The likelihood of packet loss increases, and firewall and NAT traversal may become impossible. Fragments after the first do not carry the UDP header, which contains the source and the destination port numbers of the UDP packet. Therefore, network devices that need to examine port numbers will simply discard them.

SCTP implements transport-layer fragmentation. Messages larger than the path MTU are transported in different SCTP chunks. Every chunk carries complete transport information, and thus the problems derived from IP fragmentation are avoided. The chunks are reassembled at the destination and delivered to the application as a single message.

Currently fragmentation does not represent a serious problem for SIP, since SIP messages are usually smaller than the path MTU. However, new session description protocols or new SIP extensions might increase the size of SIP messages. SCTP fragmentation would then represent an important advantage.

Bundling of chunks

Figure 11 shows the format of an SCTP packet. It contains a common header and several chunks. Unless fragmentation is performed, a chunk contains an application-level message.

    | Common header | Chunk 1 | [...] | Chunk n |

Figure 11 : SCTP message format

Therefore, a single SCTP packet can carry several SIP messages that belong to different sessions. Bundling SCTP chunks decreases the number of packets sent through the network. This avoids certain congestion problems in IP routers and typically achieves better performance than sending several individual packets.

Multihoming

SCTP provides several source and destination addresses within an association. They are intended to provide alternative paths to be used in case of network failures. This feature increases the reliability of an association.

Multiple destination addresses are not intended to provide a load balancing mechanism. SCTP marks one address as the primary, and all the traffic is routed to that address until it fails. Other mechanisms such as DNS SRV [8] records might be used to provide load balancing. SCTP multihoming just provides a fail-over mechanism.

5.3 A single SIP session over SCTP

It is clear that SIP entities that handle a large amount of SIP traffic between them can take advantage of SCTP and all its features. However, the advantages of SCTP are not evident when a single SIP session (or a small number of them) is transported. In this scenario SCTP shares some of the problems of TCP. SCTP association establishment delays the delivery of the first INVITE, and once the association is established, SCTP timeouts are more conservative than the ones used by SIP over UDP. The initial value for the SCTP retransmission timer is 3 seconds, and even when RTT measurements are performed its minimum value is 1 second.

    Sender -> Receiver: INVITE
    (3 secs)
    Sender -> Receiver: INVITE

Figure 12 : SCTP’s initial retransmission timer

The only advantage of SCTP over UDP in a scenario with a low level of SIP traffic is the transport-layer fragmentation provided by SCTP, since multihoming can be achieved using UDP in conjunction with DNS SRV records.

6 Conclusions

The best transport protocol for SIP depends on the amount of SIP traffic that a particular SIP entity handles. For SIP entities that handle a large amount of SIP traffic between them, such as proxies and large SIP gateways, SCTP is the best choice. SCTP bundles together several SIP sessions into a single SCTP association and then performs flow and congestion control per association. This way, packet losses are detected before retransmission timers expire, leading to an increase in the overall performance. Among all the possible services provided by SCTP, unordered delivery and ordered delivery with unordered final responses are the ones that suit SIP best.
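To make the timer comparison of section 5.3 concrete, the sketch below computes when successive copies of a message would be sent if every copy were lost and the timer doubled after each expiry. The 3 second initial SCTP timer is taken from the text. The 0.5 second initial interval for SIP over UDP is an assumption (the default T1 of RFC 2543), as is the simple doubling model; the function name `transmission_times` is hypothetical.

```python
def transmission_times(initial_rto, attempts, max_rto=None):
    """Times (in seconds) at which a message is sent when every copy is lost.

    Assumes plain exponential back-off: the timer doubles after each expiry,
    optionally capped at max_rto.
    """
    times, now, rto = [], 0.0, float(initial_rto)
    for _ in range(attempts):
        times.append(now)
        now += rto
        rto *= 2
        if max_rto is not None:
            rto = min(rto, max_rto)
    return times


SCTP_INITIAL_RTO = 3.0   # from the text: initial SCTP retransmission timer
SIP_UDP_T1 = 0.5         # assumption: default T1 of RFC 2543

if __name__ == "__main__":
    print("SCTP   :", transmission_times(SCTP_INITIAL_RTO, 4))
    print("SIP/UDP:", transmission_times(SIP_UDP_T1, 4))
```

Under these assumptions a SIP client over UDP has already sent three copies of the request by the time SCTP retransmits for the first time.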
However, SIP entities that handle a small number of SIP sessions, such as the SIP UA of an individual user, cannot take advantage of the flow control provided by SCTP. When a small number of SIP messages are transported over SCTP, packet losses are detected by timeouts. This leads to an overly conservative retransmission policy, since timers in SCTP are not designed for situations where the traffic load is very low. Therefore, for small SIP entities UDP is the best choice. UDP does not introduce any connection establishment time, and SIP over UDP retransmits lost packets more aggressively than SCTP. However, since SIP applications using UDP do not perform any congestion control other than implementing a back-off retransmission timer, the use of UDP is not recommended for high volumes of SIP traffic.

While TCP is an excellent protocol for transferring large amounts of data such as files or the contents of a particular web page, it presents important limitations regarding signalling transport. Therefore, depending on the SIP entity, UDP or SCTP are better choices to transport SIP signalling.

Acronyms

ACK: Acknowledgement
DNS: Domain Name System
HTTP: HyperText Transfer Protocol
IP: Internet Protocol
ISDN: Integrated Services Digital Network
ISUP: ISDN User Part Protocol
MTU: Maximum Transmission Unit
NAT: Network Address Translator
PRACK: Provisional ACK
PSTN: Public Switched Telephone Network
RTT: Round Trip Time
SACK: Selective ACK
SCTP: Stream Control Transmission Protocol
SIGTRAN: Signalling Transport
SIP: Session Initiation Protocol
SYN: Synchronize sequence numbers flag
TCP: Transmission Control Protocol
TSN: Transmission Sequence Number
UA: User Agent
UAC: User Agent Client
UDP: User Datagram Protocol

References

[1] Handley M., Schulzrinne H., Schooler E., Rosenberg J., “SIP: Session Initiation Protocol”, RFC 2543. IETF. March 1999.

[2] Postel J., “Transmission Control Protocol”, RFC 793. IETF. September 1981.

[3] Postel J., “User Datagram Protocol”, RFC 768. IETF. August 1980.

[4] Stewart R., Xie Q., Morneault K., Sharp C., Schwarzbauer H., Taylor T., Rytina I., Kalla M., Zhang L., Paxson V., “Stream Control Transmission Protocol”, RFC 2960. IETF. October 2000.

[5] Rosenberg J., Schulzrinne H., “SCTP as a Transport for SIP”, draft-rosenberg-sip-sctp-00.txt. IETF. June 2000. Work in progress.

[6] Fielding R., Gettys J., Mogul J., Frystyk H., Berners-Lee T., “Hypertext Transfer Protocol -- HTTP/1.1”, RFC 2068. IETF. January 1997.

[7] Xie Q., Stewart R., Sharp C., Rytina I., “SCTP Unreliable Data Mode Extension”, draft-ietf-sigtran-usctp-01.txt. IETF. February 2001. Work in progress.

[8] Gulbrandsen A., Vixie P., Esibov L., “A DNS RR for specifying the location of services (DNS SRV)”, RFC 2782. IETF. February 2000.

[9] Rosenberg J., Schulzrinne H., “Reliability of Provisional Responses in SIP”, draft-ietf-sip-100rel-03.txt. IETF. March 2001. Work in progress.
Session Initiation Protocol in 3G

Tuomo Sipilä
Nokia Research Center, Helsinki, Finland
tuomo.sipila@nokia.com

Abstract

This article gives an overview of the 3GPP IP Multimedia subsystem and how the SIP protocol is used with it. The changes to SIP required by 3G are also identified. In addition, some key problems with the SIP protocol are identified with regard to 3G mobile networks.

1 Introduction

The 3rd Generation Partnership Project (3GPP) started the development of IP based multimedia services for the 3rd generation (3G) mobile networks, known as UMTS in Europe and IMT-2000 in Japan, during autumn 1999. The initiative and pressure for the work came through an organisation called 3G.IP, which is a group formed to push IP based ideas to 3G networks. The target was to standardise the required enhancements for the 3G network so that IP telephony and multimedia can be provided with user perceived quality equal to that of the current mobile network services. Another requirement was that the future 3G network could function fully based on packet and IP connections without the traditional circuit switched domain. Essentially, IP multimedia would in the future also provide via IP a wider and more flexible service set than the current networks. In spring 2000 the IETF defined Session Initiation Protocol (SIP) was selected as the base protocol that shall provide the IP Multimedia sessions to the mobile terminals (UE). This decision tied the 3GPP solution and work into co-operation with the IETF. Already during 2000 it became evident that the specification work would take longer than expected, since it requires the specification of a completely new network subsystem with all the required mobile functions. Therefore the specification target was set to the end of 2001, when the 3GPP Release 5 specifications should be finalised.

2 3GPP Rel5 system architecture

The 3GPP Release 5 architecture is illustrated in figure 1. The 3GPP mobile system consists of the following network domains:
- Radio Access Network Domain (RAN) consists of the physical entities which manage the resources of the radio access network, and provides the user with a mechanism to access the core network. It can be either WCDMA based UTRAN or GSM/EDGE based GERAN
- Circuit Switched Core Network Domain (CS CN) comprises all core network elements for provision of circuit switched services
- Packet Switched Core Network Subsystem (PS CN) comprises all core network elements for provision of PS connectivity services, i.e. guaranteeing the IP flow to and from the mobile terminal (UE)
- IP Multimedia Core Network Subsystem (IMSS) contains all the network elements that are used to provide the IP based multimedia services
- Service Subsystem comprises all elements providing capabilities to support operator specific services (e.g. IN and OSA)

Figure 1: 3GPP Release 5 network architecture and domains (not all elements are shown)

One of the fundamental principles of the 3GPP network architecture is to maintain to a large extent the independence of the network domains and subsystems so that independent evolution is possible. This means that IP multimedia is seen to a large extent as a transparent service through the Radio Access and Packet Core Network domains. Other names for this are layered approach and access independence.

The IP Multimedia Core Network Subsystem (IMSS) and its functionality are the main focus of this article. To make the system functionality more understandable, the PS CN subsystem functionality is briefly described. The IMSS linkage with the service subsystem that provides the open service generation is also briefly mentioned.

2.1 PS CN Subsystem

The main functions of the PS CN subsystem are to establish and maintain the connection between the terminal and the GGSN, to route the IP packets in both directions and to do charging. The Packet Switched Core Network subsystem consists of the following GPRS based network elements and functions [4]:
- Serving GPRS Support Node (SGSN), which maintains the subscription data (identities and addresses) and follows the location of the terminal within the network
- Gateway GPRS Support Node (GGSN), which maintains the subscription information and allocated IP addresses and follows the SGSN under which the terminal is.

The PS CN subsystem is connected to the IMSS via the Go and Gi interfaces, which are located in the GGSN. The Gi interface is the one that is also used for standard Internet access, and it is relatively transparent. The Go interface is used for policy control between the IM Subsystem and the GGSN and packet core. The reason for policy control is to allow the operators to limit the utilisation of the best 3G packet QoS classes to their own IP Multimedia services.

The IP connections between the terminal (UE) and the GGSN are provided by PDP contexts. At PDP context establishment the QoS profile to be used and the terminal IP address are allocated.

3 IP Multimedia Subsystem

3.1 Requirements

The 3GPP TS 22.228 [6] specifies the following main service requirements for the IP Multimedia Subsystem:
- at least equal end-to-end QoS for voice as in circuit switched (AMR codec based) wireless systems
- equal privacy, security or authentication as in GPRS and circuit switched services
- QoS negotiation possibility for IP sessions and media components by both ends
- access independence, i.e. the IP Multimedia network and protocols evolve independently of radio access (WCDMA, EDGE/GSM/GPRS, WLAN etc.)
- applications shall not be standardised
- IP policy control possible, i.e. the operators shall have the means to control which IP flows use the real-time QoS bearers
- automated roaming with the services in home and visited network
- hide the operator network topology from users and home/visited network. The network topology is regarded as a key competitive factor between operators
- the resources shall be made available before the destination alerts
- identification of the entities with either SIP URL or E.164 number
- procedures for incoming and outgoing calls, emergency calls, presentation of originator identity, negotiation, accepting or rejecting incoming sessions, suspending, resuming or modifying the sessions
- user shall have the choice to select which session components to reject or accept

3.2 Architecture

The current architecture of the IP Multimedia subsystem, showing the functions (March 2001), is in figure 2. Note that several of the illustrated functions can be merged into real network elements. The functions and their purposes are clarified in the following subsections. It should be noted that the standardisation of the system is still ongoing, so changes can be expected.

[Figure 2 diagram: the functions P-CSCF, I-CSCF, S-CSCF, BGCF, MGCF, MGW, MRF, T-SGW, R-SGW, HSS, SCP and GGSN, the reference points Cx, Gc, Gi, Go, Mc, Mg, Mh, Mi, Mj, Mk, Mm, Ms, Mw and Sc, and connections to applications and services, the legacy mobile signalling network, external IP networks and other IMS networks, and PSTN/legacy/external networks. The Gi interface from GGSN to external networks is not shown in the figure.]

Figure 2: IP Multimedia Subsystem

3.3 HSS

Home Subscriber Server (HSS) is a combination of the currently existing UMTS/GSM HLR and the register functions needed for the IP Multimedia Subsystem. HSS will provide the following functions:
- user identification, numbering and addressing information
- user security information: network access control information for authentication and authorisation
- user location information at inter-system level; HSS handles the user registration, and stores inter-system location information, etc.
- the user profile (services, service specific information…) [3]
3.4 P-CSCF

Proxy Call State Control Function (P-CSCF) performs the following functions:
- is the first contact point for the UE within the IM CN subsystem; forwards the registration to the I-CSCF to find the S-CSCF and after that forwards the SIP messages between the UE and the I-CSCF/S-CSCF
- behaves like a proxy in RFC 2543 [9], i.e. accepts requests and services them internally or forwards them, possibly after translation
- may also behave like an RFC 2543 [9] user agent, i.e. in abnormal conditions it may terminate and independently generate SIP transactions
- is discovered using DHCP during registration, or the address is sent with PDP context activation
- may modify the URI of outgoing requests according to the local operator rules (e.g. perform number analysis, detect local service numbers)
- detects and forwards emergency calls to the local S-CSCF
- generates charging information
- maintains a security association between itself and the UE, and also provides security towards the S-CSCF
- provides the policy control function (PCF)
- authorisation of bearer resources and QoS management; security issues are currently open in standardisation [3].

3.5 I-CSCF

Interrogating Call State Control Function (I-CSCF) performs the following functions:
- is the contact point within an operator’s network for all connections destined to a subscriber of that network operator, or a roaming subscriber currently located within that network operator’s service area. It can be regarded as a kind of firewall between the external IMSS and the operator’s internal IMSS network. There may be multiple I-CSCFs within an operator’s network
- assigns an S-CSCF to a user performing SIP registration
- routes a SIP request received from another network towards the S-CSCF
- obtains from the HSS the address of the S-CSCF
- charging and resource utilisation
- in performing the above functions the operator may use the I-CSCF to hide the configuration, capacity and topology of its network from the outside
- additional functions related to inter-operator security are for further study

3.6 S-CSCF

Serving Call State Control Function (S-CSCF) performs the following functions:
- performs the session control services for the terminal. Within an operator’s network, different S-CSCFs may have different functionality
- maintains session state and has the session control for the registered endpoint's sessions
- acts like a Registrar as defined in RFC 2543 [9], i.e. it accepts Register requests and makes its information available through the location server (e.g. HSS)
- may also behave as a proxy or as a user agent as defined by RFC 2543 [9]
- interacts with Services Platforms for the support of services
- obtains the address of the destination I-CSCF based on the dialled number or SIP URL
- on behalf of a UE forwards the SIP requests or responses to a P-CSCF, or to an I-CSCF if an I-CSCF is used in the path in the roaming case
- generates charging information
- security issues are currently open in standardisation [3]

3.7 MGCF

Media Gateway Control Function (MGCF) provides the following functions:
- protocol conversion between ISUP and SIP
- routes incoming calls to the appropriate CSCF
- controls MGW resources [3]

3.8 MGW

Media Gateway (MGW) provides the following functions:
- transcoding between PSTN and 3G voice codecs
- termination of SCN bearer channels
- termination of RTP streams [3]

3.9 T-SGW

Transport Signalling Gateway (T-SGW) provides the following functions:
- maps call related signalling from/to PSTN/PLMN on an IP bearer
- provides PSTN/PLMN <-> IP transport level address mapping [3]

3.10 MRF

Multimedia Resource Function (MRF) provides the following functions:
- performs multiparty call and multimedia conferencing functions [3]

3.11 BGCF

The S-CSCF, possibly in conjunction with an application server, shall determine that the session should be forwarded to the PSTN. The S-CSCF will forward the Invite information flow to the Breakout Gateway Control Function (BGCF) in the same network.

The BGCF selects the network in which the interworking should occur based on local policy. If the BGCF determines that the interworking should occur in the same network, then the BGCF selects the MGCF which will perform the interworking; otherwise the BGCF forwards the Invite information flow to the BGCF in the selected network. The MGCF will perform the interworking to the PSTN and control the MGW for the media conversions.

3.12 IMSS functionality

Figure 3 shows the call model in the mobile to mobile call case when both callee and caller are roaming. In the roaming case, i.e. when a user roams to a network that is outside his home network, the IP multimedia services are provided by the S-CSCF in the home network. The P-CSCF in the visited network forwards the service request to the home network. However, in some cases some services can be provided directly via the visited network, i.e. by the P-CSCF. The P-CSCF is needed in the home network to allow for network flexibility, because S-CSCFs may contain different services, and in the roaming case also to allow the visited operator to handle the call and provide local services. The local services can be an emergency call or other localised services, such as services related to the geographical location of the user or local numbering plans. The I-CSCF acts as a protective firewall between the home and visited networks. Notice that the true physical elements may contain one or several of the CSCF functions [6].

[Figure 3 diagram: User A, A's visited network and A's home network, and User B, B's home network and B's visited network, with P-CSCF, I-CSCF and S-CSCF functions on the path. I-CSCFs are labelled "Optional" or "Required on registration, optional on session establish".]

Figure 3: Call model in roaming case [5]

4 SIP protocol in 3GPP Rel5

4.1 SIP in IMSS

SIP and SDP have been selected as the protocols for some of the IP Multimedia Subsystem interfaces, and IPv6 as the only solution for all of them. As shown in figure 4, the basic SIP (RFC 2543 [9]) has been selected as the main protocol on the following interfaces:
- Gm: P-CSCF - UE
- Mw: P-CSCF - S-CSCF and P-CSCF - I-CSCF
- Mm: S/I-CSCF - external IP networks & other IMS networks
- Mg: S-CSCF - BGCF
- Mk: BGCF - external IP networks & other IMS networks

[Figure 4 diagram: the elements UE, P-CSCF, I-CSCF, S-CSCF, BGCF, MGCF, MGW, SLF, HSS and AS, with the reference points Gm, Mw, Mg, Mc, Dx and Cx; the links that carry SIP are marked.]

Figure 4: SIP protocol in IM SS [5]

Eventually there may be differences in the SIP procedures of the Gm and Mw reference points. This implies that there is a difference between the UNI and NNI interfaces [3].

The following procedures have been defined for the 3GPP IM subsystem in [3]:
- Local P-CSCF discovery: either using DHCP or carrying the address in the PDP context
- S-CSCF assignment and cancel
- S-CSCF registration
- S-CSCF re-registration
- S-CSCF de-registration (UE or network initiated)
- Call establishment procedures, separated for
  - Mobile origination; roaming, home and PSTN
  - Mobile termination; roaming, home and PSTN
  - S-CSCF/MGCF - S-CSCF/MGCF; between and within operators, PSTN in the same and different network
- Routing information interrogation
- Session release
- Session hold and resume
- Anonymous session establishment
- Codec and media flow negotiation (initial and changes)
- Called ID procedures
- Session redirect
- Session transfer

4.2 SIP in Service SS

The service subsystem and its connections to the IM subsystem are shown in figure 5. The S-CSCF interfaces the application development servers with SIP+ protocols. The SIP application server can reside either outside or within the operator's network [3]. The OSA capability server and CAMEL refer to already standardised 3G and GSM based service generation elements.
69
[Figure 5 diagram: HSS, S-CSCF, SIP Application Server, OSA Service Capability
Server (SCS), OSA Application Server, IM SSF and CAMEL Service Environment;
interface labels Cx, SIP+, OSA API, MAP and CAP]

Figure 5: Service Subsystem connections with IMSS

SIP+ is used to interface the Application servers on the following interfaces:
- S-CSCF - SIP Application server
- S-CSCF - CAMEL Server
- S-CSCF - OSA Service Server

The plus sign implies here that the standard SIP may need to be modified. The
modifications have not been identified yet [3].

4.3 The 3GPP Release 5 IMSS procedures

The PS CN Subsystem is strongly linked with the IP Multimedia Subsystem, since
it shall provide the bearer through the radio and the packet core network for
the IP Multimedia Signalling (SIP) and also for the IP media streams. Thus
co-operation is required on some level. Special key topics are:
- Handling of mobile terminated calls
- Bearer reservation before alerting

4.4 Mobile terminated calls

For mobile terminated calls the options are:
1) have network initiated PDP context activation
2) provide an always-on PDP context.

Network initiated PDP context activation is currently discussed in 3GPP in the
context of push services. The problem with network initiated context activation
is that dynamic IP addressing cannot be used without enhancements to the
network. The discussions are still open and a solution for the address
allocation is being sought [10].

For the second option, i.e. using a signalling PDP context, there are two
alternative methods for providing the P-CSCF address to the terminal: either
during the PDP context activation or after it with DHCP procedures. The latter
case requires that, once the IP address of the P-CSCF has been found, either
the PDP context is modified so that the GGSN can filter the SIP traffic to the
correct PDP flow, or a new PDP context is established for SIP with the correct
filter information and the old one is released. At the moment both options for
the CSCF discovery are available in the specifications [1][3].

4.5 Bearer reservation before alerting

For the session flow (user plane traffic) a secondary PDP context with
different QoS requirements is activated. The timing of signalling PDP context
establishment, secondary PDP context establishment, SIP connection negotiation
and callee alerting has to be synchronised. This is needed to avoid alerting
before the resources are available and to find the fastest call establishment
solution. This can be resolved with a 2-phase call setup. The procedures are
shown in Figure 6. For simplicity only the call originating part is shown. It
should also be noted that the PDP context activation and the radio access
bearer setups are not shown. At the receiving end the PDP context is
established after message 19: 200 OK [1][3][4].

[Figure 6 diagram: message sequence between UE#1 and P-CSCF (Visited Network)
and I-CSCF (Firewall) and S-CSCF (Home Network). Messages in numbered order:
1 INVITE; 2 100 Trying; 3-4 INVITE; 5-6 100 Trying; 7 Service Control;
8 INVITE; 9 100 Trying; 10-12 183 SDP; 13 Authorize QoS Resources; 14 183 SDP;
15-18 PRACK; 19-22 200 OK; 23 Resource Reservation; 24-27 COMET; 28-31 200 OK;
32-35 180 Ringing; 36-39 PRACK; 40-44 200 OK; 45 Service Control;
46-48 200 OK; 49-52 ACK]

Figure 6: The 3GPP SIP 2-phase call setup [4]

5 3GPP SIP requirements

In its specifications 3GPP refers to IETF specifications, and the target has
been to minimise the changes. 3GPP is currently dependent on completion of the
following SIP WG items [5]:
- draft-ietf-sip-rfc2543bis: SIP: Session Initiation Protocol
- draft-sip-manyfolks-resource: Integration of resource management and SIP
- draft-ietf-sip-100rel: Reliability of Provisional Responses in SIP
- draft-ietf-sip-privacy: SIP extensions for caller identity and privacy
- draft-ietf-sip-call-auth: SIP extensions for media authorization
- draft-roach-sip-subscribe-notify

3GPP has found that a major part of the features are already provided by the
SIP protocol, and thus very few candidates for 3GPP originated enhancements
have been identified [7]. The following SIP enhancements have been recognised
so far [1][8]:
- addition of a routing PATH header to the SIP messages to record the
  signalling path from P-CSCF to S-CSCF
- location information in the INVITE message to carry the location of the
  terminal (for instance Cell ID)
- emergency call type, needed to indicate the type of emergency call, i.e.
  whether it is police, ambulance etc.
- filtering of routing information in the IM SS before the SIP message is sent
  to the terminal, to hide the network topology from the terminal
- refresh mechanism inside IM SS
- Network-initiated de-registration
- 183 Session Progress provisional response for INVITE to ensure that the
  alerting is not generated before the PDP contexts for the session are
  activated
- Reliability of provisional responses: PRACK method to acknowledge the 183
  message
- Usage of session timers to keep the SIP session alive
- Indication of resource reservation status: COMET method
- Security for privacy
- Extensions for caller preferences and callee capabilities
- Media authorisation token

Discussions between 3GPP and IETF on the changes are currently ongoing.

6 Problems and open issues

The following problems can be identified in the 3GPP IP Multimedia Subsystem:
- architecture complexity, i.e. with several functions there will be several
  interfaces; implementations may differ from vendor to vendor, thus the
  multivendor cases may become challenging
- call establishment delay problems due to the signalling taking place on
  multiple levels (RAN, PS CN, IMSS). Calculations based on Figure 6 show that
  establishing a call takes 6 round trip times (RTT) end to end on the SIP
  level. In addition to that there are the PDP context reservations, which
  take one round trip time between UE and GGSN.
- guarantees of QoS: several elements and several IP based interfaces, and in
  addition the packet radio, are included in the path, while the requirements
  are at the same level as for current GSM circuit voice calls
- lengthy standardisation time: the more issues there are to standardise, the
  more opinions there are and the more time it will take
- suitability of the SIP protocol for the radio interface, i.e. it is a
  character based protocol with long signalling messages and requires certain
  transport quality
- IETF and 3GPP standardisation co-operation: the operations and the behaviour
  are different in IETF and 3GPP
- Terminal complexity: the terminals become more and more complex with several
  protocol stacks, only to provide services very similar to those of today.
  SIP has to provide a true revolution in applications and services

7 Conclusions

The major identified differences between SIP in IETF and in 3GPP are as
follows:
1. The architecture of the IMSS is defined based on the 3G model (home and
   visited); messages always run via the S-CSCF.
2. Registration is mandatory.
3. The CSCFs interrogate the SIP and SDP flows, either actively modifying the
   messages or reading the data; also the I-CSCF hides the names of the CSCFs
   behind it.
4. Codec negotiations in 3GPP do not allow different codecs in different
   directions.
5. In 3G networks there is a separation of the UNI and NNI interfaces.
6. Due to radio and packet core functionality there are some change proposals
   to SIP and SDP.
7. Due to the P-CSCF - S-CSCF interface and the 3G roaming mode there are some
   requirements on the SIP and SDP protocols.
8. In 3G, SIP is also used to interface the application development elements,
   which set requirements for the SIP and SDP protocols.

Despite the above mentioned differences it seems that the SIP protocol is
suitable for the needs of the UMTS network. The identified problems can be
overcome, and some of them are political or architectural in nature, so they
are more choices than problems. The current work in 3GPP is still unfinished
and the discussion with IETF has only just started. It is likely that 3GPP
Release 5 will contain some specifications on SIP and the IMSS architecture,
but by the end of 2001 they will probably not be mature enough to guarantee a
fully functioning network. One major advantage is that the SIP changes so far
required by 3GPP are not extensive, thus SIP can probably be tailored for 3GPP.
However, since the specification work for a new subsystem is relatively large,
it can be expected that the specification work will continue during 2002 and
beyond. When SIP and the IMSS have been finalised for the UMTS network, the
real-time packet services can provide the operators with a true way to
differentiate from each other and thus generate longed-for revenues.
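
The call establishment delay estimate of Section 6 (6 SIP round trips end to
end, plus one round trip between UE and GGSN per PDP context reservation) can
be made concrete with a small calculation. The sketch below is only
illustrative: the RTT values are invented, not measured, and it assumes that
the PDP context reservations add to the SIP signalling time rather than
overlapping with it.

```python
def call_setup_delay(sip_rtt_s, pdp_rtt_s, sip_round_trips=6, pdp_reservations=1):
    """Estimate call establishment delay in seconds.

    sip_rtt_s        -- assumed end-to-end SIP round trip time (seconds)
    pdp_rtt_s        -- assumed UE-GGSN round trip time (seconds)
    sip_round_trips  -- SIP round trips per call setup (6 according to Section 6)
    pdp_reservations -- number of PDP context reservations (assumption)
    """
    return sip_round_trips * sip_rtt_s + pdp_reservations * pdp_rtt_s


if __name__ == "__main__":
    # Hypothetical figures: 0.5 s end to end on SIP level, 0.25 s UE-GGSN.
    print(f"{call_setup_delay(0.5, 0.25):.2f} s")
```

With these invented values the SIP round trips dominate the total, which is
why the number of end-to-end round trips matters so much in the 2-phase setup.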
References
[1] Ahvonen, Kati: Master's Thesis: IP telephony
signalling in a UMTS All IP network, Helsinki
University of Technology, 24.11.2000
[2] 3G TS 23.002 version 5.1.0 Network Architecture
(Release 5)
[3] 3G TS 23.228 version 2.0.0 IP Multimedia (IM)
Subsystem - Stage 2
Subsystem - Stage 3
[5] Drage, Keith: 3GPP and SIP. Presentation in IETF #50 (March 18-23, 2001).
http://www.softarmor.com/sipwg/meets/ietf50/slides/drage-3gpp-sip.ppt
[6] 3G TS 22.228 V5.0.0 (2001-01) Service requirements for the IP Multimedia
Core Network Subsystem (Stage 1) (Release 5)
[7] Meeting minutes of 3GPP TSG-CN/SA SIP ad-hoc, February 12-14
http://www.3gpp.org/ftp/TSG_CN/WG1_mm-cc-sm/SIP_meetings/CN1_SA2_03_(New%20Jersey)/Report/NewJersey0102.zip
[8] Tdoc N1-010233; 3GPP TSG-SA WG2 / TSG-CN WG1 SIP ad-hoc meeting,
13-15 February, 2001, New Jersey, USA; Nokia: Feedback from IETF's interim SIP
WG meeting held on week #6
http://www.3gpp.org/ftp/TSG_CN/WG1_mm-cc-sm/SIP_meetings/CN1_SA2_03_(New%20Jersey)/Tdocs/N1-010233%20.zip
[9] RFC 2543 SIP: Session Initiation Protocol, March 1999.
http://www.ietf.org/rfc/rfc2543.txt
[10] 3GPP TR 23.874 V1.3.0 (2000-11) Feasibility study of architecture for
push service (Release 4)
http://www.3gpp.org/ftp/Specs/Latest_drafts/23874-130.ZIP
SIP Service Architecture
Markus Isomäki
Senior Research Engineer
Nokia Research Center
Markus.isomaki@nokia.com

Abstract

Session Initiation Protocol (SIP) is an application layer signaling protocol
for creating and modifying multimedia sessions. In this paper an overview of
SIP based service architecture is presented. This includes an introduction to
SIP Application Servers, which implement the "service logic". Service
programming methods such as Call Processing Language are briefly described, as
well as service building blocks such as Third Party Call Control and call
transfer. In order to draw everything together, an example of an
"autoconferencing" service is provided. Finally the 3GPP Service Control
architecture is described to point out how the principles could be applied in
future mobile networks.

1 Introduction

Session Initiation Protocol (SIP) is an application layer signaling protocol
for creating, modifying and terminating multimedia sessions with two or more
participants [1]. These sessions include IP telephony and video calls,
multimedia conferences and media distribution. SIP supports user mobility by
proxying and redirecting requests to the user's current registered location.

The protocol is still under heavy development and innovation in the Internet
Engineering Task Force (IETF). The first version of the protocol was published
as RFC 2543 already in March 1999, but since then the effort has only
intensified. The current work includes refinements to the base protocol (known
as rfc2543bis), as well as a large number of extensions for various purposes.
Some of the extensions are merely ways of using existing SIP, such as third
party call control. Others are true extensions offering new capabilities, such
as a session transfer method and a generic event subscription, notification
and messaging framework.

Besides IETF, several standardization organizations have selected SIP as a
cornerstone in their architecture models. The most notable of these is perhaps
the Third Generation Partnership Project (3GPP), which decided to use SIP as
the session control protocol in the future releases of Third Generation
cellular network specifications.

In addition to traditional call control and supplementary services, SIP can be
used to build more advanced session related services by adding more
intelligence to SIP servers and User Agents. The services cover those currently
provided by Intelligent Networks (IN) in the PSTN, as well as completely new
service types. By combining SIP with protocols such as HTTP, VoiceXML, RTSP and
SMTP, a full-blown multimedia infrastructure combining voice, video, web,
e-mail and instant message communication can be created.

In this paper an overview of SIP based service architecture is presented.
Chapter 2 presents a generic SIP service model that works as a basis for
further discussion. Chapter 3 introduces Application Servers and provides
insight into their functionality. Chapter 4 briefly describes some useful
service building blocks that have been developed for SIP. Chapter 5 lists
different possible tools for programming Application Servers. Chapter 6 draws
everything together by going through a complex service example. Finally,
Chapter 7 introduces the 3GPP service control architecture and Chapter 8 points
out the major conclusions of the paper.

2 Service Model

The service architecture depends to a large degree on the routing and security
model of the service providers. Figure 1 depicts a generic SIP service model
with roaming support, with P denoting a Proxy server and AS denoting an
Application Server. Roaming and mobility are emphasized in the model, as most
users and terminals are expected to be mobile in any future network, and a
model excluding them would soon be obsolete.

[Figure 1 diagram: user A, proxies P1 (A's Visited Domain), P2 (A's Home
Domain), P3 (B's Home Domain) and P4 (B's Visited Domain), and user B in a
chain; Application Servers AS1 and AS2 drawn above the proxy chain]

Figure 1. SIP Domains and Elements.

Each user has a home domain with which he shares a security association, based
either on a shared secret or on PKI. The user registers to one of the
registrars in his home domain, and the home domain should also be aware of his
services for both originating and terminating requests. Note that the SIP home
domain does not have to have anything to do with the IP-layer home network,
where e.g. the Mobile IP Home Agent resides.

All incoming requests to the user are first routed to his home domain, where
registration information is used to route the requests to the user himself.
Outgoing requests can bypass home domain servers as far as basic routing is
concerned, but if some "originating" session services are required, they
should also traverse the home domain. It is the responsibility of the home
domain to execute both "terminating" and "originating" request services for
the user. Usually the overall control is left to a specific proxy, which is
always on the signaling path of both incoming and outgoing requests. Sometimes
this proxy is called a "feature proxy"; in 3GPP terminology it is called the
Serving Call State Control Function (S-CSCF). In the Figure P2 and P3 denote
user A's and user B's feature proxies, respectively.

In order to implement more complex services, the feature proxy can use specific
Application Servers (AS1 and AS2 in the Figure) by routing certain requests to
them. Application Servers can be either specialized for one task (number
translation, conferencing) or they can do several tasks. In SIP nomenclature
Application Servers can be either proxies, redirect servers or back-to-back
User Agents. More details of Application Servers are provided in Chapter 3. In
order to implement services Application Servers can interact with e.g. the
Intelligent Network, or use other means such as those introduced in Chapter 5.
They can also utilize the mechanisms described in Chapter 4 as building blocks
for services. It is usually Application Servers that interact with other
protocols such as HTTP or RTSP.

If the user is roaming outside his home domain, he may need to use a local
outbound proxy in the visited domain. The outbound proxy may authenticate the
user using some AAA protocol such as Diameter to make a query to the home
domain's AAA server. The outbound proxy may also help the user with NAT or
firewall traversal and authorize network layer QoS resources for the user's
media flows. In the Figure P1 and P4 denote local outbound proxies for users A
and B, respectively. If the user is able to bypass the local outbound proxy, he
is effectively in his home domain. Outbound proxies are usually not intended
for service control, but they might be useful in offering local or location
based services for the visiting user. These include emergency call services.

So, when user A issues a request to user B, his terminal first sends it to P1.
P1 routes the request to A's home network, eventually to P2. P2 controls user
A's originating services and can use various Application Servers to execute
them. After that, if no specific action is taken by the service logic, the
request is routed to the recipient's, that is B's, home domain, to P3. P3
controls B's terminating services, and utilizes Application Servers to achieve
them. After that, if that is what the service logic tells, the request is
routed to P4 and finally to B. B's responses and the further negotiation drive
the service machinery in both A's and B's domains in those proxies that decided
to stay on the signaling path during the initial request.

It should be noted that User Agents or terminals themselves can do services,
such as forwarding, screening or special ringing tones. However, terminals may
be out of network coverage or otherwise disconnected, so everything cannot be
left to them. Otherwise basically every element on the signaling path can do a
"service".

3 Application Server Mechanisms and Components

In SIP there are basically four different types of servers in addition to the
User Agent software running in terminals:

• Registrar is the server that handles registrations and maintains state from
  which the user's current contact point and preferences can be determined
  (actually, part of this state can be stored in external elements, the
  location server and the presence server). The Registrar is always in the
  home domain.
• Proxy forwards the incoming request further based on some internal logic.
  Proxies can be transaction stateless or stateful. In some cases they can be
  even session stateful, which of course reduces their scalability. There can
  be proxies for different purposes:
    • Core Routing proxy
    • Gateway controlling proxy
    • Firewall controlling proxy
    • QoS controlling proxy
    • "Feature proxy" or "Regional Routing proxy", whose duty is to orchestrate
      the service routing and execution as explained in Chapter 2.
• Redirect Server does not forward the request further, but returns it towards
  its originator with redirection information. Redirect servers are very easy
  to implement and scalable, and thus they are powerful tools for simple
  services.
• Back-to-Back User Agent is the term used for any "proxy-like" element in the
  network which does something more than a proxy is supposed to do. This
  includes e.g. issuing requests to ongoing sessions "in the middle" or
  modifying SDP parameters.

SIP Application Server (APSE) is a vague term which is not defined anywhere in
the official specifications. Basically an Application Server can be a Proxy, a
Redirect Server or a Back-to-Back User Agent. If a simple change in the
Request-URI is all that a service requires, redirection is the way to go.
However, if changes in other headers, call state monitoring or acting upon
responses is desired, it takes at least a proxy. An APSE acting as a proxy
differs from a common proxy only in how much intelligence or programmability it
has. Obviously there is no official distinction between the two. If proxy
functionality is not enough, a Back-to-Back UA is needed.

It may be useful to let one APSE handle only one specific task and to treat
APSEs as components from which complete services can be built. In addition to
routing and control APSEs, other special media handling components are also
needed in the network:

• Media Server can play announcements or stream other audio or video content
  to the user. It can also be used to collect DTMF "digits". A Media Server is
  needed for e.g. announcements, voicemail and interactive voice response. Such
  a server can be controlled by SIP (session creation and termination), RTSP
  (media control) and HTTP/VoiceXML (interactive voice response dialogues).
• Conferencing Server or bridge is able to mix media coming from different
  parties together to implement a multi-party conference. SIP is used as the
  control protocol; tighter control can be obtained using a special conference
  control protocol.
• Presence Server obtains information on users' communication state and
  preferences or announcements ("I'm eating") and can convey this information
  to interested parties. More details are given in the next Chapter.
• Text-to-Speech Server is able to translate a text stream to speech and
  possibly even vice versa.
• Messaging Server is able to issue instant messages to users.
• Web Server and E-mail Server can be used to enhance services by alternative
  communication methods. For example interactive voice response can be replaced
  by web-based dialogues. Web pages can contain embedded SIP or mailto URLs and
  SIP messages can contain HTTP URLs, so for example an initial voice session
  initiation can be redirected to fetch an HTML/XML document.

Complex services can be built by combining the capabilities of different
servers. It is feasible to treat the servers as resources or service components
addressable by either a SIP, RTSP or HTTP URL, as proposed in [2]. In that way
the servers that use other servers do not need to know the internal details of
what they are using. This is the opposite approach compared to device control
protocols such as MGCP or Megaco, where the controller has tight control over
the slave and needs to understand the slave's internal structure to some
extent. MGCP and Megaco can still be used to separate the media part from the
control part.

4 Service Building Blocks

This Chapter introduces some of the building blocks that APSEs and User Agents
can use to construct services.

Record-Routing is one of the key features in SIP protocol design and it is very
useful to APSEs, as it allows them to choose whether to remain on the signaling
path of a certain session after the first transaction has been completed.
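
The effect of Record-Routing on the request path of Chapter 2 can be
illustrated with a toy model. The sketch below is not a SIP implementation:
the Proxy class is hypothetical, and which proxies choose to record-route is
an arbitrary assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    name: str
    record_route: bool  # True if the proxy wants to stay on the signaling path


def initial_path(proxies):
    """The initial request visits every proxy in order."""
    return [p.name for p in proxies]


def subsequent_path(proxies):
    """Later requests of the session visit only the proxies that record-routed."""
    return [p.name for p in proxies if p.record_route]


if __name__ == "__main__":
    # Assumed choice: only the feature proxies P2 and P3 stay on the path.
    chain = [Proxy("P1", False), Proxy("P2", True),
             Proxy("P3", True), Proxy("P4", False)]
    print("initial request:   ", initial_path(chain))
    print("subsequent request:", subsequent_path(chain))
```

The point of the design is that each element decides for itself: an APSE that
only needs to act on the first request can drop off the path and stay scalable.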

Forking is another nice feature of SIP, which allows for "parallel searches" in
order to save time. Unfortunately a complete forking solution is only possible
for INVITE requests.

Event Notification is a SIP extension to provide a generic and extensible
framework by which SIP nodes can request notification from remote nodes
indicating that certain events have occurred [3]. The framework is based on two
new methods, SUBSCRIBE and NOTIFY.

Using SUBSCRIBE, SIP User Agents can subscribe to the state of any resource in
the network that is addressable with a SIP URL. When the state changes, a
NOTIFY with relevant information is sent to each of the subscribers.

It is not intended that SIP SUBSCRIBE and NOTIFY would be used for all types
and classes of events. However, events related to session or user registration
state fall well within the intended scope. In fact, by extending the user's
registration state to include so called Presence information to which other
users can subscribe, innovative new service types are made possible.

A typical SUBSCRIBE-NOTIFY message flow is depicted in Figure 2 [3]. If
SUBSCRIBEs are Record-Routed, NOTIFYs follow the same (reverse) path.

[Figure 2 diagram: message flow between Subscriber and Resource: SUBSCRIBE /
200 OK, then on each state change NOTIFY / 200 OK]

Figure 2. SUBSCRIBE and NOTIFY.

Third Party Call Control allows a third party to set up a session between two
User Agents by issuing both of them a separate INVITE request and relaying the
session descriptions (SDPs) from one to the other. Thus, media will flow
directly between the two UAs. The scenario is depicted in Figure 3.

[Figure 3 diagram: controller C sends an INVITE to each of UA1 and UA2; MEDIA
flows directly between UA1 and UA2]

Figure 3. Third Party Call Control.

Third Party Call Control is suitable e.g. for situations where an Application
Server wishes to create a session between a user and a Media Server or wants to
invite a user to a centralized conference. The APSE can terminate the session
when it so wishes, but it does not have to cope with the media. Third Party
Call Control does not require any SIP extensions, as it should not make any
difference to a UA in terms of processing incoming sessions. Third Party Call
Control is explained in detail in [4].

REFER is a new SIP method for implementing various types of call or session
transfer. Figure 4 depicts a common use of REFER, namely unattended transfer.
Three parties are involved: Transferor (party initiating the transfer),
Transferee (party being transferred) and Transfer Target (party to whom the
transfer is intended). First the Transferee calls the Transferor, and a session
between them is established. After they have e.g. talked for a while, the
Transferor decides that it is time to initiate the transfer. It first puts the
Transferee on hold by issuing a re-INVITE with special SDP parameters. After
that it issues a REFER request to the Transferee, indicating in the Refer-To
header the address of the Transfer Target. The Transferee accepts the request
(202) and, using the information in the REFER, issues an INVITE to the Transfer
Target.

If the INVITE is successful, the Transferee and the Transfer Target now have an
ongoing session. After that the Transferee sends the Transferor a NOTIFY to
indicate the success of the transfer. The Transferor is then able to terminate
the original session. Thus, the session has been transferred. If the transfer
were not successful, the Transferor could resume the original session by
issuing a re-INVITE instead of a BYE.
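
The unattended transfer described above can be written down as a simple
message list. This is a sketch only: responses such as 200 OK and ACK are
omitted for brevity, and the failure-case NOTIFY is an assumption, since the
text only states that a NOTIFY reports success and that the Transferor may
resume the session with a re-INVITE.

```python
def unattended_transfer(target_accepts):
    """Return the transfer as a list of (sender, receiver, message) tuples."""
    flow = [
        ("Transferee", "Transferor", "INVITE"),            # original session
        ("Transferor", "Transferee", "re-INVITE (hold)"),  # put Transferee on hold
        ("Transferor", "Transferee", "REFER"),             # Refer-To: Transfer Target
        ("Transferee", "Transferor", "202 Accepted"),
        ("Transferee", "Transfer Target", "INVITE"),       # triggered by the REFER
    ]
    if target_accepts:
        flow.append(("Transferee", "Transferor", "NOTIFY (success)"))
        flow.append(("Transferor", "Transferee", "BYE"))   # end original session
    else:
        # Assumed failure handling: report failure, then resume the old session.
        flow.append(("Transferee", "Transferor", "NOTIFY (failure)"))
        flow.append(("Transferor", "Transferee", "re-INVITE (resume)"))
    return flow


if __name__ == "__main__":
    for sender, receiver, message in unattended_transfer(target_accepts=True):
        print(f"{sender:>15} -> {receiver:<15} {message}")
```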

[Figure 4 diagram: message flow between Transferor, Transferee and Transfer
Target, in order: INVITE/200 OK/ACK; INVITE (hold)/200 OK/ACK; REFER;
202 Accepted; INVITE/200 OK/ACK; NOTIFY (200 OK); 200 OK; BYE/200 OK;
BYE/200 OK]

Figure 4. Unattended transfer with REFER method.

According to current specifications, REFER can occur outside of a call-leg,
thus it can basically initiate sessions from scratch. The Refer-To header can
contain URLs other than SIP URLs, thus it would be possible to initiate also
other than SIP-based forms of communication. REFER is suitable for similar
types of scenarios as Third Party Call Control, although both have their
advantages. REFER details can be found in [5].

Messaging is a new form of communication supported by SIP and is currently
under development. Messaging can be done either with the MESSAGE method or by
opening a messaging session with INVITE. Messaging is a useful tool between two
users, but also between an APSE and a user. MESSAGE is specified in [6].

Caller Preferences and Callee Capabilities can be expressed easily in SIP by
header parameters. Examples include indication of supported media types, spoken
languages or the type of device the user currently has.

5 Service Creation Tools

After talking about Application Servers and service building blocks it is
useful to look briefly at how the services can actually be programmed. As an
APSE is usually a proxy or redirect server, the task is to find suitable tools
to program them to handle incoming SIP messages in a desired way.

There are currently a lot of tools available for the task:

• Call Processing Language (CPL) is an XML-based language that can be used to
  describe and control Internet telephony services. CPL is not tied to any
  signaling protocol, but the expectation is that either SIP or H.323 is used.
  CPL is designed so that it is powerful enough to describe a large number of
  services and features, but it is limited in power so that it can run safely
  in Internet telephony servers [7]. Thus, even end-users themselves could
  program CPL scripts and upload them to servers without being able to do any
  severe harm. The goal of the language definition has been that it would be
  possible to generate it with graphical tools. Nothing prevents running CPL
  directly in the User Agent. CPL should become an IETF standards-track RFC
  during spring 2001.
• SIP Common Gateway Interface (CGI) is equivalent to the popular HTTP CGI, the
  difference being the protocol under control. SIP CGI makes the contents of
  SIP message headers directly accessible to external programs, in a
  programming language independent way.
• SIP Servlets or SIPlets are equivalent to Java Servlets used in Web
  programming. They provide a certain class library for developers to access
  and control the SIP stack. Servlets usually offer better performance than CGI
  scripts due to their more advanced handling of processes. Being tied to Java,
  Servlets share the advantages and drawbacks of the language.
• SIP JAIN is another Java-based programming interface to control the SIP
  stack.

Of the four presented mechanisms, the three latter ones seem to be suitable for
the same type of tasks and are thus competing with each other. CPL has a
somewhat different scope, as it has certain limitations. One of the trade-offs
in the control interface is how much the developer has to understand SIP.
Low-level interfaces such as CGI require certain knowledge of SIP messages and
allow efficient and pinpointed control. On the other hand high-level interfaces
such as CPL do not require SIP expertise, but lack efficiency and advanced
features. There are already Application Server products (or at least
prototypes) available supporting at least CPL, SIP CGI and Servlets.

Intelligent Networks can also be used to provide service control for SIP
proxies. This can be achieved by mapping the SIP state machine to the Basic
Call State Machine (BCSM) and controlling it by the INAP or CAP protocols. In
this architecture the SIP proxy plays the role of a "Soft" Service Switching
Function (SSF) which interacts with an external Service Control Function (SCF).
While this model is suitable for bringing "legacy" telephony services to a SIP
network, it lacks the capabilities and flexibility required for integration
with other protocols and services, and the most advanced features of SIP are
not utilized. Open Services Architecture (OSA) is another telephone network
oriented service control platform, which can be used to control SIP services.
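
To give a feel for the kind of restricted, tree-shaped service logic that a
language such as CPL is meant to express, the following sketch evaluates a
small decision tree for an incoming call. It is written in Python rather than
CPL, and the node layout, field names and addresses are hypothetical and do not
reproduce CPL syntax.

```python
def run_script(node, call):
    """Walk a decision tree until a leaf carrying an "action" is reached.

    Switch nodes compare one field of the call with a value and branch on the
    result. The tree is finite and has no loops, so evaluation terminates.
    """
    while "action" not in node:
        matched = call.get(node["field"]) == node["equals"]
        node = node["then"] if matched else node["otherwise"]
    return node["action"], node.get("target")


# Hypothetical screening service: reject one caller, send calls to voicemail
# when the callee is busy, otherwise proxy the call to the user.
SCREENING_SCRIPT = {
    "field": "caller",
    "equals": "sip:annoying@example.com",
    "then": {"action": "reject"},
    "otherwise": {
        "field": "callee_busy",
        "equals": True,
        "then": {"action": "redirect", "target": "sip:voicemail@example.com"},
        "otherwise": {"action": "proxy", "target": "sip:user@example.com"},
    },
}

if __name__ == "__main__":
    call = {"caller": "sip:bob@example.com", "callee_busy": True}
    print(run_script(SCREENING_SCRIPT, call))
```

Because the script is pure data with a fixed set of node types, a server could
accept it from an end-user with far less risk than arbitrary CGI or servlet
code, which is the trade-off the comparison above describes.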

6 Example Service

At this point it is useful to go through an example service to illustrate how
different APSEs could interoperate using the building blocks defined in
Chapter 4.

An "Autoconferencing" service is used as an example. The service works so that
a user is able to schedule an audio or video conference to start when all
required participants indicate that they are available for conferencing.
Figure 5 depicts the needed components: a Controller for orchestrating the
service, a Presence Server for obtaining users' availability, Messaging and
Media Servers to send announcements and finally a Conferencing Server to bridge
the conference participants together.

[Figure 5 diagram: Controller (Contr.) connected to the Messaging Server,
Presence Server, Media Server and Conferencing Server, with the Terminals
below]

Figure 5. Autoconferencing Service.

First, a user fills in a web page order for the conference listing all the
required participants and sends it to the Controller Server using standard
HTTP/HTML. The Controller then subscribes to presence information of all
participants with SIP SUBSCRIBE and starts to receive NOTIFYs when their
presence state changes. When all participants seem to be available for
conferencing, the Controller Server either sends them an instant message using
the Messaging Server or invites them to a session with the Media Server using
Third Party Call Control. The purpose of these actions is to get an
acknowledgement from the participants on their willingness to join the
conference. The message can contain push buttons or links to achieve this,
while the Media Server can use IVR dialogues to get the users' opinions. The
IVR dialogue can be controlled by the Controller Server.

In the end the Controller receives a "yes" or "no" answer from each
participant. If everything is proceeding well, the Controller now invites each
participant to a session with the Conference Server using Third Party Call
Control. This can be preceded by some kind of authentication scheme using a
web page or Media Server IVR. The Conference Server ties all participants
together based on the Request-URI in the INVITE it receives. Thus, all
participants are in the same conference. When new participants join, the
Controller can invite the Media Server to the conference to play short
announcements like: "Bob just joined the conference".

All interaction between different servers happens by exchanging standard SIP
and HTTP messages. Inside Media and Conferencing servers, MGCP or Megaco could
be used to separate media and control processing from each other.

7 3GPP Service Architecture

3GPP is currently defining its service control architecture for the Release 5
IP Multimedia Subsystem (IMS). The specifications are to be completed by the
end of year 2001. IMS is based on the SIP protocol with minor 3GPP specific
modifications. In IMS each incoming and outgoing request is routed via the
subscriber's home network through an element called the Serving Call State
Control Function (S-CSCF).

The S-CSCF plays the roles of registrar, feature proxy and in some cases also
back-to-back user agent. It has to route the incoming and outgoing requests to
the correct SIP Application Servers and other external service platforms
according to the subscriber's service profile. Besides SIP APSEs, CAMEL and OSA
service control are also supported in Release 5.

The 3GPP Release 5 service architecture is depicted in Figure 6 [8]. The S-CSCF
routes incoming and outgoing requests to APSEs and other service platforms
using a protocol called "SIP+". SIP+ requirements and definition work has just
started in 3GPP, so the only known fact about it is that it should resemble SIP
as much as possible. Mapping to CAMEL (CAP) and OSA does not happen in the
S-CSCF but rather in the specialized gateway elements shown in the Figure.
78
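The profile-driven routing role of the S-CSCF described in Chapter 7 can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: 3GPP had not yet defined SIP+ or the format of the service profile, so the `ServiceRule` fields, the platform labels and the matching on method and direction are hypothetical. The three `kind` values mirror the three interfaces of the architecture: a SIP Application Server, the OSA Service Capability Server and the IM SSF towards CAMEL.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceRule:
    """One hypothetical entry of a subscriber's service profile."""
    method: str      # SIP method the rule applies to, e.g. "INVITE"
    direction: str   # "incoming" or "outgoing", seen from the subscriber
    platform: str    # illustrative name of the service platform
    kind: str        # "sip-apse", "osa-scs" or "im-ssf" (assumed labels)


def platforms_for(profile: List[ServiceRule], method: str,
                  direction: str) -> List[ServiceRule]:
    """Return, in profile order, the platforms a request should visit."""
    return [rule for rule in profile
            if rule.method == method and rule.direction == direction]


# Illustrative profile; the service names are made up.
profile = [
    ServiceRule("INVITE", "incoming", "call-screening", "sip-apse"),
    ServiceRule("INVITE", "incoming", "prepaid", "im-ssf"),
    ServiceRule("MESSAGE", "outgoing", "messaging", "osa-scs"),
]

for rule in platforms_for(profile, "INVITE", "incoming"):
    print(rule.platform, "via", rule.kind)
```

Keeping the rules in an ordered list is one simple way to make the order of service invocation explicit, which is exactly the interoperation question the text identifies as open.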
[Figure 6 diagram labels: SIP Application Server, OSA Application Server, OSA API, OSA Service Capability Server (SCS), HSS, Cx, S-CSCF, SIP+, IM SSF, MAP, CAP, CAMEL Service Environment]
Figure 6. S-CSCF and Service Control interfaces.

3GPP IMS Application Servers should be quite similar to the standard SIP APSEs described in this paper. Some differences may arise if SIP+ turns out to be much different from SIP. The most challenging problem is to define how different APSEs and service platforms interoperate and how the S-CSCF makes service routing decisions.

8 Conclusions

SIP and related protocols such as HTTP, RTSP and SMTP provide powerful machinery for implementing services that integrate different forms of communication. In SIP, services are provided by specialized Application Servers, which are actually proxy or registrar servers with extended intelligence.

The intelligence can be brought to the servers with various methods including Call Processing Language (CPL), SIP CGI and SIP Servlets. Even the traditional Intelligent Network can be used for some purposes.

Application Servers utilize standard SIP features such as Record-Routing and make use of extensions such as Third Party Call Control, SUBSCRIBE and NOTIFY, REFER and SIP messaging. HTTP is also useful in carrying out simple transactions. All resources can be addressed by SIP, RTSP, HTTP or mailto: URLs, which can be carried in protocol headers or be embedded e.g. in HTML/XML documents. By making different APSE components play together, complex services such as the "autoconferencing" presented in Chapter 6 can be accomplished.

3GPP is currently in the process of defining a SIP-based Service Control architecture for UMTS. The main challenges in the process are how services can be made to work together and how service-specific routing of requests should be orchestrated.

References

[1] M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg, "SIP: Session Initiation Protocol", Internet-Draft (work in progress), November 2000, http://search.ietf.org/internet-drafts/draft-ietf-sip-rfc2543bis-02.txt

[2] Jonathan Rosenberg, "An Application Server Component Architecture for SIP", Internet-Draft (work in progress), March 2001, http://search.ietf.org/internet-drafts/draft-rosenberg-sip-app-components-01.txt

[3] Adam Roach, "Event Notification in SIP", Internet-Draft (work in progress), February 2001, http://search.ietf.org/internet-drafts/draft-roach-sip-subscribe-notify-03.txt

[4] Jonathan Rosenberg, Jon Peterson, Henning Schulzrinne, Gonzalo Camarillo, "Third Party Call Control in SIP", Internet-Draft (work in progress), March 2001, http://search.ietf.org/internet-drafts/draft-rosenberg-sip-3pcc-02.txt

[5] Robert Sparks, "SIP Call Control - Transfer", Internet-Draft (work in progress), February 2001, http://search.ietf.org/internet-drafts/draft-ietf-sip-cc-transfer-04.txt

[6] Jonathan Rosenberg et al., "SIP Extensions for Instant Messaging", Internet-Draft (work in progress), February 2001, http://search.ietf.org/internet-drafts/draft-rosenberg-impp-im-01.txt

[7] Jonathan Lennox, Henning Schulzrinne, "CPL: A Language for User Control of Internet Telephony Services", Internet-Draft (work in progress), November 2000, http://search.ietf.org/internet-drafts/draft-ietf-iptel-cpl-04.txt

Subsystem - Stage 2
IP TELEPHONY SERVICES IMPLEMENTATION
Eero Vaarnas
eero.vaarnas@iki.fi

Abstract

There is a wide variety of tools, both traditional, PSTN-like (Public Switched Telephone Network) and web-oriented, for implementing services in IP telephony. There are so many alternatives for service creation that only some of them are described here. The scope of this document is the all-IP environment, where many of the paradigms come from the World Wide Web (WWW). Some of the techniques are more or less standardized, like Call Processing Language (CPL), SIP-CGI (SIP Common Gateway Interface) and SIP Servlet API (SIP Servlet Application Program Interface).

CPL is a simple scripting language with a rapid implementation cycle but limited capabilities. It is independent of the signalling protocol. SIP-CGI is a more powerful interface for executing arbitrary programs in a SIP proxy server. The interface is language independent, but the process handling causes some overhead. SIP Servlet API is a similar technique to SIP-CGI. It is designed using Java, so it is platform independent. All services run on the same Java Virtual Machine (JVM), so the overhead of process generation is eliminated. There are also H.323-based services, but their major disadvantage is interoperability problems.

1 Introduction

IP Telephony protocols are in a quite mature state. There are some competing and/or overlapping standards, but the overall picture is pretty clear. It seems more and more likely that SIP (Session Initiation Protocol [1]) is going to be the signalling protocol of All-IP multimedia sessions, including voice. SIP is a text-based, HTTP-like (HyperText Transfer Protocol) protocol standardized by the IETF. It is simple yet easy to extend. Of course H.323 from ITU-T [2] will also have its own role because of its current installed base, mainly in corporate use. However, H.323 has its difficulties, such as scalability and interoperability.

PSTN interoperability can also be handled with a limited number of protocols. In media gateway control, there are practically two protocols, MGCP (Media Gateway Control Protocol) and Megaco/H.248. Megaco/H.248 can be seen, if not directly as an extension, as the successor of MGCP. Both can possibly also be used directly in dumb IP terminals. ISUP (ISDN User Part) and similar signalling over IP networks can be done quite straightforwardly, either by mapping ISUP messages to SIP, H.225/H.245 or similar, or by tunneling them transparently using e.g. BICC (Bearer Independent Call Control) or SIP-T (SIP for Telephones). Media transmission is merely a matter of standard codecs and packetization.

In service creation there are more decisions to be made. In the PSTN many services have been implemented using Intelligent Networks (IN). IN is controlled by the operator and typically users activate services using DTMF (Dual Tone MultiFrequency) tones. New kinds of service creation paradigms come from the World Wide Web, where users can control the services more freely and user interfaces are more intuitive.

There are some interfaces that can be used to integrate IN services into the IP telephony environment. With, for example, JAIN (Java Advanced Intelligent Networks, Java APIs for Integrated Networks) and/or Parlay, Intelligent Networks could be utilized from the IP environment. IN connectivity is an important issue, but it isn't considered here.

The emphasis of this document is on services implemented totally in the IP environment. Most of the new techniques, especially the SIP-based ones, borrow slightly from techniques already used in the WWW. The idea is that the more open the architecture is, the easier it is for third parties and even users themselves to create new services.

Four service implementation techniques are presented here: CPL, SIP-CGI, SIP Servlet API and H.323 services. The first three of them work conceptually quite similarly. The server has some default mechanism for handling requests, which is used for normal signalling operation. By some means the server decides which messages are handled by the default processing and which are sent to the service interface. Then the service interface can perform signalling or other operations and/or pass the message back to the default processing. The H.323 services introduced later on form an exception. They are more similar to traditional PSTN services.

2 Call Processing Language

CPL (Call Processing Language) [3] is an XML-based (eXtensible Markup Language) markup language that can be used to describe telephony services. It describes
the logical behavior of the signalling server; in principle it isn't tied to any specific protocol.

Like XML, CPL is based on tags that are hierarchically arranged according to the information that they contain. The tags are traversed according to the hierarchy and the rules they contain. Eventually the traversal ends and the action specified by the script is executed. In some cases the action remains unspecified, so some default policy is resumed.

2.1 Structure of CPL

CPL is specified as an XML DTD (Document Type Definition). It is going to have a public identifier in XML (-//IETF//DTD RFCxxxx CPL 1.0//EN) and a corresponding MIME (Multipurpose Internet Mail Extensions) type. Only an overview of the structure is given here; the complete DTD can be seen in [3] and the XML specification in [4].

After the standard XML headers, a CPL script is enclosed between the tags <cpl> and </cpl>. The script itself consists of nodes and outputs, arranged hierarchically in a nested structure. Nodes and outputs can be thought of as states and transitions, respectively (for a tree representation, cf. 2.2). The structure is represented by nested start and end tag pairs, so both nodes and outputs can simply be referred to as tags. Tags can have parameters that describe their exact behavior.

At the top level, there can be four kinds of tags: ancillary, subaction, outgoing and incoming. The subaction tag is used to describe repeated structures to achieve modularity and to avoid redundancy. The implementation is under the subaction tag with the id parameter as an identifier. One or more references to the implementation can be made using the sub tag with the desired subaction identifier as the ref parameter. The outgoing and incoming tags are top level actions, similar to subactions in their implementation structure. The ancillary tag contains information that is not part of any operation, but possibly necessary for some CPL extension.

The actual node-output structure of the script is inside the action tags, i.e. subaction, outgoing and incoming. There are four categories of CPL nodes: switches, which represent choices a CPL script can make; location modifiers, which add or remove locations from the set of destinations; signalling operations, which cause signalling events in the underlying protocol; and non-signalling operations, which trigger behavior that does not affect the underlying protocol.

2.1.1 Switches

Switches represent choices a CPL script can make, based on either attributes of the original call request or items independent of the call. The attributes are represented by variables, depending on the switch type. A switch has a list of output tags, which are traversed, and the first matching output is selected. If the variable doesn't exist, the optional not-present tag can be chosen instead. If none of the outputs match (including not-present), the optional output otherwise is chosen. There are four types of switches: address-switch, string-switch, time-switch and priority-switch.

The address-switch makes decisions according to addresses. With the field parameter either the origin, destination, or original-destination of the request can be chosen. Moreover, the optional subfield parameter can be used to access the address-type, user, host, port, tel, or display (display name) of the selected address. In the address output it can be checked whether the address is an exact match (is), contains a substring of the argument (contains, for display only) or is in a subdomain of the argument (subdomain-of, for host and tel only). The address-switch is essentially independent of the signalling protocol. The specific meaning of the entire address depends on the protocol, and additional subfield values may be defined for protocol-specific values.

The string-switch allows a CPL script to make decisions based on free-form strings present in a request. The field parameter selects either subject, organization, user-agent (program or device name that made the request), language or display. The string output checks whether the selected string is an exact match (is) or contains a substring of the argument (contains). String switches are dependent on the signalling protocol being used.

The time-switch handles requests according to the time and/or date the script is being executed. It uses a subset of the iCalendar standard [5], which allows CPL scripts to be generated automatically from calendar books. It also allows us to re-use the extensive existing work specifying calendar entries such as time intervals and repeated events. The parameters tzid (time zone identifier) or tzurl (time zone URL) select the current time zone, and the time output matches calendar entries such as starting or ending times (dtstart, dtend), days of the week (byday) and frequencies (freq). Time switches are independent of the underlying signalling protocol.

With the priority-switch it is possible to consider priorities specified for the requests. Priority switches take no parameters. The priority output can be used to match against priorities less than, greater than or equal to the argument. The priorities are emergency, urgent, normal, and non-urgent. The priority switches are dependent on the underlying signalling protocol.

2.1.2 Location modifiers

The set of locations to which a call is to be directed is not given as node parameters. Instead, it is stored as an implicit global variable throughout the execution of a processing action (and its subactions). Location modifiers add, retrieve or filter the set of locations.
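Returning briefly to the switches of Section 2.1.1, the rule that the first matching output wins, with not-present and otherwise as fallbacks, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of any CPL implementation: the function names, the tuple representation of outputs and the case-insensitive host comparison are all choices made here, and only the host form of subdomain-of is covered.

```python
from typing import Optional, Sequence, Tuple

# (match kind, argument, name of the node to continue with)
Output = Tuple[str, Optional[str], str]


def is_subdomain_of(host: str, domain: str) -> bool:
    """True if host equals domain or lies below it (assumed case-insensitive)."""
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def select_output(value: Optional[str],
                  outputs: Sequence[Output]) -> Optional[str]:
    """Return the node of the first matching output.

    value is None when the variable does not exist in the request.
    None is returned when nothing matches, i.e. the default policy applies.
    """
    for kind, argument, next_node in outputs:
        if kind == "otherwise":
            return next_node
        if kind == "not-present":
            if value is None:
                return next_node
        elif value is not None and argument is not None:
            if kind == "is" and value == argument:
                return next_node
            if kind == "contains" and argument in value:
                return next_node
            if kind == "subdomain-of" and is_subdomain_of(value, argument):
                return next_node
    return None


# Outputs of an address-switch on the host part of the caller's address.
outputs = [("subdomain-of", "example.com", "proxy"),
           ("otherwise", None, "voicemail")]
print(select_output("sales.example.com", outputs))
print(select_output("example.org", outputs))
```

The dot-prefixed suffix test keeps a host such as badexample.com from being treated as part of example.com, which a plain suffix comparison would get wrong.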
There are three types of location nodes defined. Explicit locations add literally-specified locations to the current location set; location lookups obtain locations from some outside source; and location filters remove locations from the set, based on some specified criteria.

The explicit location node has three node parameters. The mandatory url parameter's value is the URL of the address to add to the location set. The optional clear parameter specifies whether the location set should be cleared before adding the new location to it. The optional priority parameter specifies a priority for the location. There are no outputs; the next node follows directly. Explicit location nodes are dependent on the underlying signalling protocol.

Locations can also be specified through external means, through the use of location lookups. The lookup node initiates lookups according to the source parameter. With the optional parameters, one can use or ignore caller preferences fields or clear the location set before adding. The outputs are success, notfound, and failure; one of them is selected depending on the result of the lookup.

The remove-location node is used to filter the location set. Filtering is done based on the location parameter and caller preferences param-value pairs. There are no outputs; the next node follows directly. The meaning of the parameters is signalling-protocol dependent.

2.1.3 Signalling operations

Signalling operation nodes cause signalling events in the underlying signalling protocol. Three signalling operations are defined: proxy, redirect, and reject.

The proxy node causes the request to be forwarded on to the currently specified set of locations. With the corresponding parameters, a timeout can be set, the server can be forced to recurse to subsequent redirection responses, and the ordering of the location set traversal can be set to parallel, sequential, or first-only.

The redirect node causes the server to direct the calling party to attempt to place its call to the currently specified set of locations. The redirection can be set permanent; otherwise it is considered temporary. Redirect immediately terminates execution of the CPL script, so this node has no outputs and no next node. The specific behavior the redirect node invokes is dependent on the underlying signalling protocol involved, though its semantics are generally applicable.

The reject node causes the server to reject the request, with a status code and possibly a reason. Similarly to redirect, rejection terminates the execution, and the specific behavior depends on the signalling protocol.

2.1.4 Non-signalling operations

With non-signalling operations, it is possible to invoke operations independently of the telephony signalling. If supported, mail can be sent, log files can be generated, and other operations can also be added as so-called extensions.

2.2 Tree representation of CPL

For illustrative purposes, CPL scripts can be represented as trees. Graphical editors might also utilize the tree representation. Node tags represent nodes of the tree, and output tags are edges between them. Figure 1 shows an example CPL script from [3]. It is converted into a tree in Figure 2.

1: <?xml version="1.0" ?>
2: <!DOCTYPE cpl
3:   PUBLIC "-//IETF//DTD RFCxxxx CPL 1.0//EN"
4:   "cpl.dtd">

5: <cpl>
6:   <subaction id="voicemail">
7:     <location
8:       url="sip:jones@voicemail.example.com">
9:       <redirect />
10:     </location>
11:   </subaction>

12:   <incoming>
13:     <address-switch field="origin"
14:       subfield="host">
15:       <address subdomain-of="example.com">
16:         <location url="sip:jones@example.com">
17:           <proxy timeout="10">
18:             <busy> <sub ref="voicemail" />
19:             </busy>
20:             <noanswer> <sub ref="voicemail" />
21:             </noanswer>
22:             <failure> <sub ref="voicemail" />
23:             </failure>
24:           </proxy>
25:         </location>
26:       </address>
27:       <otherwise>
28:         <sub ref="voicemail" />
29:       </otherwise>
30:     </address-switch>
31:   </incoming>
32: </cpl>

Figure 1 Example CPL script

Let us have a brief look at the example script (the graphical representation can also be followed and compared to the script structure). In lines 6-11 there is an example of a subaction. It defines a redirection to the user's voicemail. This is accomplished by adding the address of the voicemail to the location set (lines 7-8) and then activating the redirection (line 9). Lines 12-31 describe how incoming calls are handled. The address switch in lines 13-30 selects the host part of the caller's address. If the caller is from the same domain as the owner of the script (line 15), the call is considered urgent and it is let through. Again, this is done in two stages: first the address is added to the location set (line 16), then the actual proxy behavior is activated (line 17). All the unsuccessful cases are directed to the
voicemail (lines 18-23). The voicemail is implemented as a reference to the previously defined subaction. Unimportant calls also go to the voicemail (lines 27-29).

incoming
  address-switch (field: origin, subfield: host)
    subdomain-of: example.com
      location (url: sip:jones@example.com)
        proxy (timeout: 10)
          busy, noanswer, failure -> subaction voicemail
    otherwise -> subaction voicemail

subaction (id: voicemail)
  location (url: sip:jones@voicemail.example.com)
    redirect

Figure 2 Tree representation of the example script

2.3 General feasibility of CPL

CPL is a simple but powerful tool for IP telephony service implementation. It concentrates on basic call control functions, but it is possible to create extensions, some of them already available, for different kinds of advanced services. Of course CPL isn't a programming language, so constructions like loops aren't possible and all the features must actually be implemented outside the scripts.

CPL is based on XML, which is a widely accepted industry standard. This, along with its general simplicity, provides a good starting point for its utilization. First of all, people already familiar with XML can easily adopt CPL. Even with minimal knowledge of XML it is possible to start writing CPL scripts. It is also possible to generate scripts automatically. Generation could be based on simple, standard text-processing languages. From other types of XML documents, XSLT (Extensible Stylesheet Language Transformations) could apparently be used.

Because of its tree representation, CPL (and XML) can also be expressed and edited graphically. With GUI (Graphical User Interface) based editors, people not so familiar with the syntax can also create and edit services. Users could upload their own CPL scripts using SIP registration messages, HTML forms, FTP, or whatever method seems proper.

Things like scalability, stability and security depend much on the implementation of the CPL server. However, because of the limited expressive power of the language, these problems are more easily treated. Scripts can be exhaustively validated upon their uploading, so in principle malicious or erroneous code can be eliminated. The lack of loops and other more complex programming structures also makes CPL scripts potentially more compact.

CPL execution is already implemented in at least a few SIP proxy servers [6]. There are also plenty of XML editors available and recently even some specialized CPL editors. Some service creation environments are based on automatic CPL generation.

3 Common Gateway Interface for SIP

SIP-CGI (Common Gateway Interface for SIP) [7] is an interface for running arbitrary programs from a SIP proxy server or similar software. Since SIP borrows a lot from HTTP, the CGI interface is also adopted. Of course, the technical specification is different, but the basic idea is similar to HTTP-CGI.

When the server decides to invoke a SIP-CGI script, it executes it as a normal process in the underlying operating system. It then uses standard input and output (stdin, stdout) and environment variables to exchange information with the process. Script status throughout invocations is maintained with special tokens.

3.1 Input and metadata

The header fields of the received SIP message (with some exceptions, such as potentially sensitive authorization information) are passed to the script as metavariables. In practice, metavariables are represented by the operating system environment variables. Each SIP header field name is converted to upper case, has all occurrences of "-" replaced by "_", and has SIP_ prepended to form the metavariable name. For example, the Contact header would be represented by the SIP_CONTACT metavariable. The values of the header fields are converted to fit the requirements of the environment variables. Similar transformations are applied for other protocols.

There are some additional metavariables that are passed to the script. Some of them are derived from the
header fields or even match the values of the fields. This redundancy lets the script distinguish between information from the original header fields and information synthesized by the server.

The type of the message can be seen from the metavariables REQUEST_METHOD and RESPONSE_STATUS. If REQUEST_METHOD is defined, the message was a request and the method (INVITE, BYE, OPTIONS, CANCEL, REGISTER or ACK) is stored in the metavariable. REQUEST_URI is the intended recipient of the request. REGISTRATIONS contains a list of the current locations the server has registered for the recipient (REQUEST_URI).

For responses, RESPONSE_STATUS is the numeric code of the response and RESPONSE_REASON is the string describing the status. For example, the response SIP/2.0 404 Not Found contains the protocol version, status code and reason phrase, respectively.

REQUEST_TOKEN and RESPONSE_TOKEN are used to match requests and responses. SCRIPT_COOKIE can be used to store state information across invocations within the same transaction.

REMOTE_ADDR and REMOTE_HOST give the IP address and DNS name, respectively, of the client that sent the message to the server. REMOTE_IDENT can be used to supply identity information with the Identification Protocol, but it isn't too widely used.

The AUTH_TYPE metavariable determines the authorization method, if any. Authentication methods comply with the SIP/2.0 specification. Currently the options are Basic, Digest or PGP. REMOTE_USER identifies the user to be authenticated.

CONTENT_LENGTH and CONTENT_TYPE describe the message body. The content type can be any registered MIME type, as stated in [1]. The actual message body can be read from stdin.

Some additional information about the server and the outside world is provided in some special metavariables. The SERVER_NAME metavariable is set to the name of the server. The SERVER_PROTOCOL metavariable is set to the name and revision of the protocol with which the message arrived, e.g. SIP/2.0. The SERVER_SOFTWARE metavariable is set to the product name and version of the server software handling the message. GATEWAY_INTERFACE is the version of SIP-CGI used, e.g. SIP-CGI/1.1. Servers and CGI implementations can check their compatibility based on the information provided. SERVER_PORT is the port on which the message was received.

3.2 Output

The output (stdout) consists of any number of messages determining the desired actions of the server. The messages are like arbitrary SIP messages, possibly containing some additional information as special CGI header fields. The status line can be replaced by CGI actions, and is thus referred to as the action line. The messages are separated by double line feeds, in the same way as in a UDP packet that carries multiple requests or responses. It is intended that all the actions are performed, but the server can choose which actions it will perform. An example of a SIP-CGI output can be seen in Figure 3. It is explained in the following sections.

1: SIP/2.0 100 Trying
2:
3: CGI-PROXY-REQUEST sip:user@host SIP/2.0
4: Contact: sip:server@domain
5: CGI-Remove: Subject
6:
7: CGI-AGAIN yes SIP/2.0
8:
9: CGI-SET-COOKIE abcd1234 SIP/2.0

Figure 3 Example SIP-CGI output

3.2.1 Action lines

If the action line is a normal status line, a normal SIP response is generated according to the status code. CGI header fields (and possibly some others) are discarded and missing fields are filled in according to the original message, if needed. For example, line 1 in Figure 3 would generate a provisional response to the request being processed.

The action line CGI-PROXY-REQUEST causes the server to forward a request to the specified SIP URI. The message to be sent depends on the triggering point: if the script is triggered by a request, the triggering request is forwarded; if it is triggered by a response, the initial request of the transaction is sent. The initial request can only be known by a stateful server. The request can be supplemented with the header fields possibly contained in the CGI output. The message body can be inserted, substituted or deleted. However, message integrity must be maintained. An example use of CGI-PROXY-REQUEST can be seen in Figure 3, lines 3-5. It forwards the request to sip:user@host, adds a Contact header and removes the Subject (cf. 3.2.2 for details).

CGI-FORWARD-RESPONSE causes the server to forward a response on to its appropriate final destination. The same rules apply for accompanying SIP headers and message bodies as for CGI-PROXY-REQUEST. The RESPONSE_TOKEN metavariable can be set.

CGI-SET-COOKIE sets the SCRIPT_COOKIE metavariable to store information across invocations (Figure 3, line 9).

CGI-AGAIN determines whether the script will be invoked for subsequent requests and responses for this transaction. If it won't, the default action is performed for all later invocations. The default action also results if the script doesn't generate any new messages. Line 7 in Figure 3 instructs the script to be invoked again.

3.2.2 CGI Header Fields

CGI header fields pass additional instructions or information to the server. They resemble syntactically
SIP header fields, but their names all begin with CGI-. used depending on the platform, but their usage is
The SIP server strips all CGI header fields from any invisible to the CGI interface. Because of its similarity
message before sending it. to HTTP-CGI, SIP-CGI will be easy to adopt for
To assist in matching responses to proxied requests, experienced web programmers. However, CGI
the script can place a CGI-Request-Token CGI programming is getting a bit “old-fashioned”.
header in a CGI-PROXY-REQUEST or a new request.
This header contains a token, opaque to the server.
When a response to this request arrives, the token is 4 SIP Servlet API
passed back to the script as a meta-header. This allows SIP Servlet API is an interface for Java programs
scripts to fork (send to multiple locations in parallel) a which control the processing of SIP messages.
proxy request, and correlate which response Similarly to SIP-CGI and HTTP-CGI, the basic idea of
corresponds to which branch of the request. SIP Servlet API is from HTTP Servlet API. Currently
The CGI-Remove header allows the script to remove there is no single standard for SIP Servlet API. Here,
SIP headers from the outgoing request or response. we describe the first one of the proposals [8]. The rest
The value of this header is a comma-separated list of of the proposals [9] are either extensions to the first
SIP headers. If the headers exist in the message, they one or competing drafts.
are removed before sending, for example line 5 in The API is based on Java interface definitions.
Figure 3 removes the subject, if it exists. It is illegal to Any server/servlet that implements the appropriate
try to remove a header that is inserted elsewhere in the interfaces can be used together. The server and the
script. servlets communicate through the API and the state of
the servlets is maintained by the JVM (Java Virtual
3.3 General feasibility of SIP-CGI Machine).
SIP-CGI is an interface that provides practically The interface for all SIP servlets to be implemented is
endless possibilities in service creation within the SIP SipServlet. After instantiation (creation of a new
architecture. Since CGI scripts can be whatever object in Java), servlets are initialized and eventually
programs, it is possible to perform any kind of “cleaned” with init and destroy methods,
operations or access external services. This can be respectively. Their main function is to pass
considered as a weakness also: If the programs are configuration information and handle the allocation
extensively complex, they can cause severe overloading of the system. Access to local
file systems or similar resources can also be misused. This is why care should be taken
when considering third-party implementations in CGI. Even though uploading scripts is
straightforward, it is impossible to verify the functionality of the code. Therefore it
is not advisable to let third-party developers or service users freely create new CGI
programs. Of course, with proper supervision and access restrictions it is possible to
expose CGI programmability to a limited number of people or organizations.

CGI scripts can be written in any programming language available for the platform in
use. There are many powerful scripting languages, such as Perl and various shell
scripts, that can be used for simple specialized tasks. When more complex operations are
needed, actual programming languages can be used. The variety of languages can cause
portability problems: in order to implement the service on a different platform, the
compiler or interpreter for the implementation language must be available. Even if the
language is implemented on the new platform, dialect variations can break the
functionality.

One more disadvantage of SIP-CGI is that every invocation of a script generates a new
process. This is quite resource consuming in most operating systems. Thus, a large
number of simultaneous service users can cause overloading.

There are some proxy/application servers with SIP-CGI support available [6].
Programming tools can be

and deallocation of needed resources.

The SipServlet interface has methods for different types of messages: gotRequest for
requests and gotResponse for responses. In its abstract implementation class,
SipServletAdapter, gotRequest dispatches requests according to their subtypes. These are
handled in the methods doInvite, doAck, doOptions, doBye, doCancel and doRegister. When
the server decides that some servlet is responsible for handling a message, it calls the
appropriate method. The methods return boolean values depending on the success. If false
is returned, the server should apply its default processing to the message.

The work distribution between servlets is based on transactions. When a servlet is
registered as a listener to a transaction, it receives all messages related to that
transaction. Initially, the server is responsible for this registration. Servlets can
register to further transactions and remove registrations via the SipTransaction
interface.

SipMessage and its sub-interfaces SipRequest and SipResponse represent messages. A new
request in a SipTransaction can be initiated with its method createRequest. A response
to a SipRequest can be created with its method createResponse. The method send is used
to send messages. A request needs a next hop address, whereas responses are routed
according to their Via
fields. Servlets can have different authorizations to generate messages.

Servlets can inspect and modify the messages with certain restrictions. The body of the
message can be accessed through the methods getContent and setContent. Header fields can
be inspected with the methods getHeaderNames, getHeaders and getHeader. The method
setHeader is used to modify the headers, excluding the so-called system headers that are
managed by the SIP stack.

As in SIP-CGI, requests and responses can be tied together with tokens. Sending a
request returns a request token that servlets can match against similar tokens contained
in responses. This can be used, for example, in forking requests to different
destinations in parallel.

Current registrations of the users can be accessed through the interface
ContactDatabase. Servlets can inspect (getContacts), substitute (setContacts), add
(addContact) or remove (removeContact) registrations. Despite its name, ContactDatabase
doesn’t have to be a database: its internal implementation is hidden and it provides
only generic contact information.

SipURL represents SIP URLs in the destination of the messages, user addresses etc. With
additional information such as display name, URLs can be stored in the SipAddress
interface. SipAddress represents the values of the From and To headers. Contact is an
extended version of SipAddress, including expiration information and similar
information. Contact represents values of the Contact header and individual entries in
the ContactDatabase.

Besides the message manipulations and database access, the server can set other
restrictions for sensitive operations such as file system or network access. For
untrusted code, a so-called servlet sandbox or similar models can be used. The idea of
the sandbox model is to restrict the set of operations that can be performed. If
feasible, even the bytecode of newly installed servlets can be analyzed to ensure that
they don’t contain buggy or malicious code such as endless loops.

Figure 4 is an example SIP Servlet from [8]. To understand it completely the reader
should be familiar with the Java API specification [10], but the following brief
explanation can be followed even without prior knowledge of Java. The servlet implements
an unconditional call reject. As a service it isn’t interesting, but it serves as an
example of servlet programming.

The example servlet extends SipServletAdapter (line 4), which means that by default it
doesn’t react to any messages. Only INVITE requests are processed (lines 20-25). They
are answered with a generic response (lines 21-23), with the status code and reason
phrase (line 22) stored in the servlet instance (lines 5-6). Customized codes and
reasons may have been set (lines 11-14) during the initialization (lines 8-18);
otherwise the default one (line 16) is used. The servlet returns true, which means that
no default message processing is needed.

     1: import org.ietf.sip.*;
     2:
     3: public class RejectServlet
     4:     extends SipServletAdapter {
     5:   protected int statusCode;
     6:   protected String reasonPhrase;
     7:
     8:   public void init(ServletConfig config) {
     9:     super.init(config);
    10:     try {
    11:       statusCode = Integer.parseInt(
    12:         getInitParameter("status-code"));
    13:       reasonPhrase =
    14:         getInitParameter("reason-phrase");
    15:     } catch (Exception e) {
    16:       statusCode = SC_INTERNAL_SERVER_ERROR;
    17:     }
    18:   }
    19:
    20:   public boolean doInvite(SipRequest req) {
    21:     SipResponse res = req.createResponse();
    22:     res.setStatus(statusCode, reasonPhrase);
    23:     res.send();
    24:     return true;
    25:   }
    26: }

Figure 4 Example SIP Servlet

4.1 General feasibility of SIP Servlet API

In its expressive power, the SIP Servlet API is quite similar to SIP-CGI. As independent
programs, servlets can carry out any kind of tasks needed for the service. However,
there are some key differences between these two techniques. Mainly they are the same as
those between HTTP-CGI and the HTTP Servlet API.

The Java Virtual Machine is running as long as the servlet engine is up. This saves
resources, since it is not necessary to generate a new process for every servlet
invocation. Once the servlet is instantiated, its methods can be called over and over
again. The state information is also conserved in the servlets themselves, so no
external mechanism is needed for distributing it.

The tight connection to the server has other advantages as well. As stated above,
messages and even the database are represented through the API. This makes access to
them “handier”. It is more convenient and safer to handle headers, database fields etc.
when they are readily parsed by the server. It is also easier to control the access when
it is done explicitly through the interface. In addition, different kinds of
sandbox-like environments can be used.

The SIP Servlet API (like practically anything written in Java) is platform independent.
Unfortunately it is tied to the Java language, so obviously some flexibility is lost.
Some operations are more suitable to be performed with a scripting language like Perl
than with a general-purpose language like Java. If it is necessary or more efficient to
use scripting languages, some of them can be run natively in Java. There are packages
for Perl, regular expressions and many other tools. External scripts can also be invoked
as system processes from Java (even CGI can be run from a servlet), but that
should be avoided because it effectively destroys the original idea of tight
integration.

There are some proxy/application servers with SIP Servlet API support available [6].
Java itself is widely adopted, with many development environments to choose from.
Because of their similarity to HTTP Servlets, SIP Servlets will be easy to adopt for
experienced web programmers.

5 H.323 services

5.1 H.450-based services

Originally H.323 was intended to handle only basic call control signalling [11]. The
first solution to enable advanced services on top of H.323 was the ITU-T specification
series H.450. Its idea was to specify individual supplementary services similar to
current PSTN services.

The protocol for all H.450-based services is defined in H.450.1. It is derived from the
QSIG protocol used between private branch exchanges (PBX), so it can be seen as a
protocol for IP PBX services. One large difference to the PSTN model is that most of the
service logic is in the terminal equipment (TE). Since the services are visible in the
protocol and the TEs execute the services, both endpoints must understand the logic of
the service to be used. This is a major disadvantage, because services will work
completely correctly only if all the TEs have the same release of H.323.

The actual services are defined in H.450.2 and up. The current version (H.323 v. 4)
includes H.450.2 to H.450.12: H.450.2 for call transfer, H.450.3 for call diversion
(forwarding, deflection), H.450.4 for call hold, H.450.5 for call park and pickup,
H.450.6 for message waiting indication, H.450.7 for call waiting, H.450.8 for name
identification, H.450.9 for call completion, H.450.10 for call offer, H.450.11 for call
intrusion, and H.450.12 for the common information additional network feature.

5.2 Non-H.450-based services

H.450-based services are a bit cumbersome to deploy. All the services are specified by
ITU-T and often all the TEs must support the same version of H.323. Another solution is
to separate the service logic from the TEs and implement the services in the gatekeeper.
Routing related services in particular could be offered by the gatekeeper.

So far gatekeeper services have been proprietary implementations. There has been some
discussion on whether IN should be integrated with gatekeepers. Other alternatives,
maybe similar to CGI or Servlets, could also be developed. Since CPL is independent of
the signalling protocol, CPL servers could also be implemented in an H.323 environment.

5.3 General feasibility of H.323 services

It can be seen that H.323 is largely based on PSTN-like models. The most significant
service implementation proposals are based on PBX and possibly IN technologies.

It is worth considering whether conventional models should be used in IP telephony
service implementation. It is clear that, for example, IN based services must be
accessible from the IP environment, but it is a completely different issue to reproduce
the implementation mechanisms. There are already standards like JAIN for integrating IP
telephony systems to IN. Services that are developed purely for the new environment
should provide some real added value utilizing the new possibilities.

Many vendors and carriers have already made significant investments in H.323. Equipment
and software have been at the commercial stage for quite a while. However, on the
services side the progress has been a lot slower. Apart from H.450 services and the
proprietary implementations, there haven’t been many service implementation
capabilities.

6 Example service architecture

The interfaces presented in chapters 2-4 are typically implemented within a SIP proxy
server. Other SIP signalling server types can also host services, and the system can
also be referred to as an application server. A more precise description of the overall
architecture can be found in [12]. Figure 5 depicts an example of the internal
architecture of the application server.

[Figure 5: layered diagram. From top to bottom: CPL scripts; CPL; CPL servlet, SIP
servlets and SIP-CGI scripts; SIP Servlet API and SIP-CGI interfaces; SIP
proxy/application server core.]

Figure 5 Example service architecture

In the example architecture, both servlets and CGI scripts communicate directly with the
signalling server through their respective interfaces. CPL scripts are handled by a
servlet specialized in that task. CPL support could also be implemented directly in the
signalling server or through CGI scripts. In general, this is only a reference
architecture; application servers or similar components can be realized in various ways.

7 Conclusions

As for signalling and media transmission, IP telephony isn’t going to change much. In
the long term,
of course operation costs will be reduced, because it won’t be necessary to maintain two
separate networks. Issues like signalling delays and voice quality are going to stay
pretty much the same (if they degrade, users will complain). Of course more advanced
codecs and other improvements are being developed, but generally there isn’t much to do.

The part that is going to change most radically is the services. The existing services
in the PSTN and the WWW can be combined. Some examples of the combination are
click-to-dial, Unified Messaging (UM) and different kinds of information services.
Completely new kinds of services will also emerge. The tools used to implement these
services are going to be numerous, which can be seen already from the variety of service
implementation techniques used in the WWW. Some of them have already been adopted in IP
telephony. CGI and servlets are being standardized for SIP, and components like Java
Beans are widely used in service creation environments. Just wait for the IP telephony
equivalents of ASP, JSP, JavaScript, VBScript, VRML, FutureSplash, Shockwave and others
to appear.

Just as anyone can run a web server today, in the future communications services could
be distributed among individuals. There is a project similar to Apache starting to
implement an open source SIP proxy server with CGI and servlets. It could be downloaded
and installed by anyone, and services could be developed as in a kind of “home-made
telephone exchange”. Of course carrier grade communications services will have their own
role regardless of the new, more open solutions. How exactly the transition is going to
happen is still to be seen.

References

[1] Schulzrinne, Henning et al: SIP: Session Initiation Protocol, IETF, March 1999 -
    April 2001, http://www.ietf.org/rfc/rfc2543.txt,
    http://search.ietf.org/internet-drafts/draft-ietf-sip-rfc2543bis-02.txt
[2] ITU-T Recommendation H.323, Packet-Based Multimedia Communications Systems, since
    1996
[3] Lennox, Jonathan; Schulzrinne, Henning: CPL: A Language for User Control of Internet
    Telephony Services, IETF, November 14 2000,
    http://search.ietf.org/internet-drafts/draft-ietf-iptel-cpl-04.txt
[4] Bray, T. et al: Extensible markup language (XML) 1.0 (second edition), W3C, October
    2000
[5] Dawson, F; Stenerson, D.: Internet Calendaring and Scheduling Core Object
    Specification (iCalendar), IETF, November 1998
[6] Schulzrinne, Henning: SIP Implementations, Columbia University, ongoing work,
    http://www.cs.columbia.edu/~hgs/sip/implementations.html
[7] Lennox, Jonathan et al: Common Gateway Interface for SIP, IETF, January 2001,
    http://www.ietf.org/rfc/rfc3050.txt
[8] Kristensen, Anders; Byttner, Anders: The SIP Servlet API, IETF, September 1999,
    http://www.cs.columbia.edu/~hgs/sip/drafts/draft-kristensen-sip-servlet-00.txt
[9] Schulzrinne, Henning: SIP Drafts: APIs and Programming Environments, Columbia
    University, ongoing work, http://www.cs.columbia.edu/~hgs/sip/drafts_api.html
[10] Java 2 Platform, Standard Edition, v 1.3 API Specification, Sun Microsystems,
    1993-2000, http://java.sun.com/j2se/1.3/docs/api/index.html
[11] Liu, Hong; Mouchtaris, Petros: Voice over IP Signalling: H.323 and Beyond, IEEE
    Communications Magazine, October 2000
[12] Isomäki, Markus: SIP Service Architecture, Helsinki University of Technology, May
    2001
MASTER SLAVE PROTOCOL
Sunesh Kumra
Nokia Networks
Takimo 1, Pitajanmaki
Helsinki
Sunesh.Kumra@nokia.com

Abstract

This article explains how the MGCP and MEGACO protocols work. A brief introduction
about how MGCP was born is given and then the various messages in the MGCP protocol are
explained. A couple of scenarios are then presented where we see how the protocol
actually works. This is followed by a brief look at the other variant of this Master
Slave protocol, called Megaco. The conclusion of the paper is then presented. Appendix A
contains the glossary of terms used in this article, while Appendix B contains the
notations used to explain MGCP Messages. Appendix C contains some interesting comments
made at the VON conference.

1 Introduction

In 1998 some R&D departments started to realize that H.323v1 was not satisfying some
very important requirements from the carriers. Lack of mature products, lack of some
features in H.323v1, lack of marketing efforts in favor of H.323v2 and time to market
issues pushed the incumbent vendors to react against H.323 and propose alternative
protocols to address the needs of large-scale phone-to-phone deployments. In mid 1998,
the important RFI (Request for Information) and RFP (Request for Proposal) for building
large VoIP networks were sent to vendors.

The first proposal came from Bellcore (now Telcordia) and Cisco by the name of SGCP
(Simple Gateway Control Protocol). The second proposal came from ITU-T SG16, ETSI TIPHON
and IETF by the name of IPDC (Internet Protocol Device Control).

It was not long before the forces behind these two protocols realized that by unifying
their efforts they could get bigger consensus and foster the adoption of their position.
Bellcore and Level3 played a key role in merging these protocols into one, the MGCP
(Media Gateway Control Protocol).

Some time later another protocol by the name of MEGACO was introduced. Megaco is now a
coordinated standard between IETF (MEGACO) and the ITU (H.248).

In both MGCP and MEGACO/H.248 the two main components are the Media Gateway and the
Media Gateway Controller.

Media Gateways are low intelligence distributed devices, which terminate lines/trunks
and provide translation of POTS voice/fax signals for IP transport.

The Media Gateway Controller provides centralized intelligence for
a) Total control over Media Gateways
b) Call admission and billing
c) Signaling interface to PSTN
d) Translation for other protocols, e.g. SIP and H.323.

2 MGCP

MGCP is designed to interface a media gateway controller and a media gateway. MGCP is a
text-based protocol and supports a centralized call model. MGCP is a master slave
protocol. MGCP assumes a call control architecture where the call control "intelligence"
is outside the gateways and handled by external call control elements.

In principle MGCP is very close to the proprietary protocols of the switch manufacturers
that convey information back and forth between call control points and service switching
points. This principle in the context of MGCP clearly places the intelligence in the
physically separate entity, the media gateway controller, and not in the hardware
endpoint, the media converter. But unlike the switch architecture as specified in IN
documents, where the call control remains close to the actual hardware endpoints, in the
MGCP architecture the call control functionality is no longer attached to the media
part.

MGCP assumes that these call control elements, or Call Agents, will synchronize with
each other to send coherent commands to the gateways under their control. MGCP does not
define a mechanism for synchronizing Call Agents. MGCP is, in essence, a master/slave
protocol, where the gateways are expected to execute commands sent by the Call Agents.

MGCP allows a combination of commands to be sent in one PDU; this combination reduces
the number of messages necessary to establish a call. However, MGCP still requires at
least 11 round trips to establish a phone to phone call.

MGCP has seamless PSTN integration. Many existing Internet Telephony solutions require
two-stage dialing, where a gateway number must be dialed prior to dialing the actual
destination number. This is cumbersome for the end-user. However, if gateways are made
dumb then they will be inexpensive enough
for the end-users to buy and place in their home. This avoids the need for two-stage
dialing since the user's telephone will already be connected to the gateway!

MGCP assumes a connection model where the basic constructs are endpoints and
connections. Endpoints are sources or sinks of data and could be physical or virtual. An
example of a physical endpoint is an interface on a gateway that terminates a trunk
connected to a PSTN switch. An example of a virtual endpoint is an audio source in an
audio-content server.

Connections may be either point to point or multipoint. A point to point connection is
an association between two endpoints with the purpose of transmitting data between these
endpoints. Once this association is established for both endpoints, data transfer
between these endpoints can take place. A multipoint connection is established by
connecting the endpoint to a multipoint session.

Connections can be established over several types of bearer networks:
• Transmission of audio packets using RTP and UDP over a TCP/IP network.
• Transmission of audio packets using AAL2, or another adaptation layer, over an ATM
  network.
• Transmission of packets over an internal connection, for example the TDM backplane or
  the interconnection bus of a gateway.

2.1 Telephony Gateway

A telephony gateway is a network element that provides conversion between the audio
signals carried on telephone circuits and data packets carried over the Internet or over
other packet networks. Examples of gateways are:
• Trunking gateways, that interface between the telephone network and a Voice over IP
  network. Such gateways typically manage a large number of digital circuits.
• Voice over ATM gateways, which operate much the same way as voice over IP trunking
  gateways, except that they interface to an ATM network.
• Residential gateways, that provide a traditional analog (RJ11) interface to a Voice
  over IP network. Examples of residential gateways include cable modem/cable set-top
  boxes, xDSL devices and broad-band wireless devices.
• Access gateways, that provide a traditional analog (RJ11) or digital PBX interface to
  a Voice over IP network. Examples of access gateways include small-scale voice over IP
  gateways.
• Business gateways, that provide a traditional digital PBX interface or an integrated
  "soft PBX" interface to a Voice over IP network.
• Network Access Servers that can attach a "modem" to a telephone circuit and provide
  data access to the Internet. It is expected that, in the future, the same gateways
  will combine Voice over IP services and Network Access services.
• Circuit switches, or packet switches, which can offer a control interface to an
  external call control element.

Note: The examples of gateways given above are just a functional classification of
gateways. It is possible that two or more of the gateways explained above are present in
the same physical gateway.

2.2 Calls and Connections

Connections are created by the call agent on each endpoint that will be involved in the
"call." Each connection will be designated locally by a connection identifier, and will
be characterized by connection attributes.

When the two endpoints are located on gateways that are managed by the same call agent,
the creation is done via the following three steps:
1. The call agent asks the first gateway to "create a connection" on the first endpoint
   (Step 1 in Figure 1). The gateway allocates resources to that connection, and
   responds to the command by providing a "session description" (Step 2). The session
   description contains the information necessary for a third party to send packets
   towards the newly created connection, such as IP address, UDP port, and packetization
   parameters.
2. The call agent then asks the second gateway to "create a connection" on the second
   endpoint (Step 3). The command carries the "session description" provided by the
   first gateway. The gateway allocates resources to that connection, and responds to
   the command by providing its own "session description" (Step 4).
3. The call agent uses a "modify connection" command to provide this second "session
   description" to the first endpoint (Step 5). Once this is done, communication can
   proceed in both directions.

[Figure 1: the Media Gateway Controller exchanges messages 1, 2 and 5 with Media
Gateway 1 and messages 3 and 4 with Media Gateway 2; media flows between the two
gateways, which serve Endpoint 1 and Endpoint 2.]

Figure 1: Call Setup
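
As an illustration only, the three-step exchange of section 2.2 can be modelled in a few lines of Java. This is a toy sketch, not an MGCP implementation: the names CallSetupSketch, Gateway, SessionDescription and setUpCall, as well as the addresses and ports, are invented for the example, and no real protocol messages are encoded. CRCX and MDCX in the log lines are the RFC 2705 verb codes for CreateConnection and ModifyConnection.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the three-step connection setup; every name here is hypothetical. */
class CallSetupSketch {

    /** Minimal stand-in for a session description (address and port only). */
    static final class SessionDescription {
        final String ipAddress;
        final int udpPort;

        SessionDescription(String ipAddress, int udpPort) {
            this.ipAddress = ipAddress;
            this.udpPort = udpPort;
        }

        @Override
        public String toString() {
            return ipAddress + ":" + udpPort;
        }
    }

    /** A gateway endpoint that only remembers where it should send media. */
    static final class Gateway {
        private final SessionDescription local;
        private SessionDescription remote; // null until the peer is known

        Gateway(String ipAddress, int udpPort) {
            this.local = new SessionDescription(ipAddress, udpPort);
        }

        /** "Create a connection": optionally learn the peer, answer with own description. */
        SessionDescription createConnection(SessionDescription remoteOrNull) {
            this.remote = remoteOrNull;
            return local;
        }

        /** "Modify connection": learn or replace the peer's description. */
        void modifyConnection(SessionDescription newRemote) {
            this.remote = newRemote;
        }

        String sendsTo() {
            return remote == null ? "nowhere" : remote.toString();
        }
    }

    /** Call agent logic for steps 1 to 5 of Figure 1; returns a log of the commands. */
    static List<String> setUpCall(Gateway first, Gateway second) {
        List<String> log = new ArrayList<>();
        // Steps 1 and 2: the first gateway has no peer yet and returns its description.
        SessionDescription sd1 = first.createConnection(null);
        log.add("CRCX to gateway 1 returned " + sd1);
        // Steps 3 and 4: the second gateway receives sd1 and returns its own description.
        SessionDescription sd2 = second.createConnection(sd1);
        log.add("CRCX to gateway 2 returned " + sd2);
        // Step 5: the first gateway finally learns where to send its media.
        first.modifyConnection(sd2);
        log.add("MDCX to gateway 1 carried " + sd2);
        return log;
    }

    public static void main(String[] args) {
        Gateway gw1 = new Gateway("192.0.2.1", 3456);
        Gateway gw2 = new Gateway("192.0.2.2", 1296);
        for (String line : setUpCall(gw1, gw2)) {
            System.out.println(line);
        }
        System.out.println("gateway 1 sends to " + gw1.sendsTo());
        System.out.println("gateway 2 sends to " + gw2.sendsTo());
    }
}
```

After the first two commands only the second gateway knows its peer, which is why the sketch needs the final modify step before media can flow in both directions.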
When the two endpoints are located on gateways that are managed by different call
agents, these two call agents shall exchange information through a call-agent to
call-agent signaling protocol, in order to synchronize the creation of the connection on
the two endpoints.

Once established, the connection parameters can be modified at any time by a "modify
connection" command. The call agent may for example instruct the gateway to change the
compression algorithm used on a connection, or to modify the IP address and UDP port to
which data should be sent, if a connection is "redirected."

The call agent removes a connection by sending to the gateway a "delete connection"
command. The gateway may also, under some circumstances, inform the call agent that a
connection could not be sustained.

2.3 Usage of SDP

The Call Agent uses the MGCP to provision the gateways with the description of
connection parameters such as IP addresses, UDP port and RTP profiles. These
descriptions will follow the conventions delineated in the Session Description Protocol,
which is now an IETF proposed standard, documented in RFC 2327.

SDP allows for description of multimedia conferences. This version limits SDP usage to
the setting of audio circuits and data access circuits. The initial session descriptions
contain the description of exactly one media, of type "audio" for audio connections,
"nas" for data access.

2.4 High Availability and Load Balancing in MGCP

Call Agents are identified by their domain name, not their network addresses, and
several addresses can be associated with a domain name. In a typical configuration, the
MG sends Notifications to the CA. After trying to contact the CA for some configurable
number of times and not getting any response back, it starts contacting the other
(back-up) CA within the same domain name.

If a CA is overloaded, it can inform the MG about this by changing the Notified Entity
with the MG to a new CA. Therefore, when the MG has to deliver the next Notification, it
does so to the new CA.

2.5 MGCP Commands

The table below lists the various MGCP Commands. CA denotes the Call Agent and GW
denotes the Gateway. CA --> GW means that the command is sent from CA to GW.

Table 3: MGCP Commands

Sr no.  Command                 Command flow
1       CreateConnection        CA --> GW
2       ModifyConnection        CA --> GW
3       DeleteConnection        CA --> GW
4       NotificationRequest     CA --> GW
5       Notify                  CA <-- GW
6       AuditEndpoint           CA --> GW
7       AuditConnection         CA --> GW
8       RestartInProgress       CA <-- GW
9       EndpointConfiguration   CA --> GW

We shall now look into the individual MGCP Commands. Every command is represented by a
few parameters; details on what those parameters are can be found in Appendix B. For
more information on how a command is represented, check RFC 2705.

2.5.1 Endpoint Configuration

The EndpointConfiguration command is used to specify the encoding of the signals that
will be received by the endpoint. For example, in certain international telephony
configurations, some calls will carry mu-law encoded audio signals, while others will
use A-law. The Call Agent will use the EndpointConfiguration command to pass this
information to the gateway.

Command is represented by:
    ReturnCode
    EndpointConfiguration(EndpointId,
                          BearerInformation)

2.5.2 Notification Request

The NotificationRequest command is used to request the gateway to send notifications
upon the occurrence of specified events in an endpoint. For example, a notification may
be requested for when a gateway detects that an endpoint is receiving tones associated
with fax communication.

One of the nice features of this command is the association of actions with each of the
events. Using this facility, the communication and processing of information between the
two entities can be optimized. Each event is associated with an action, which can be:
• Notify the event immediately, together with the accumulated list of observed events.
• Accumulate the event in an event buffer, but don't notify yet.
• Accumulate according to Digit Map.

Command is represented by:
    ReturnCode
    NotificationRequest(EndpointId,
                        [NotifiedEntity,]
                        [RequestedEvents,]
                        RequestIdentifier,
                        [DigitMap,]
                        [SignalRequests,]
                        [QuarantineHandling,]
                        [DetectEvents,]
                        [encapsulated EndpointConfiguration])

2.5.3 Create Connection

This command is used to create a connection between two endpoints. In addition to the
necessary parameters that enable a media gateway to create a connection, the
localConnectionOptions parameter provides features for quality of service, security, and
network related QOS.

Command is represented by:
    ReturnCode,
    ConnectionId,
    [SpecificEndPointId,]
    [LocalConnectionDescriptor,]
    [SecondEndPointId,]
    [SecondConnectionId]
    CreateConnection(CallId,
                     EndpointId,
                     [NotifiedEntity,]
                     [LocalConnectionOptions,]
                     Mode,
                     [{RemoteConnectionDescriptor | SecondEndpointId},]
                     [Encapsulated NotificationRequest,]
                     [Encapsulated EndpointConfiguration])

2.5.4 Modify Connection

This command is used to modify the characteristics of a gateway's "view" of a
connection. This "view" of the call includes both the local connection descriptors as
well as the remote connection descriptor.

Command is represented by:
    ReturnCode,
    [LocalConnectionDescriptor]
    ModifyConnection(CallId,
                     EndpointId,
                     ConnectionId,
                     [NotifiedEntity,]
                     [LocalConnectionOptions,]
                     [Mode,]
                     [RemoteConnectionDescriptor,]
                     [Encapsulated NotificationRequest,]
                     [Encapsulated EndpointConfiguration])

2.5.5 Delete Connection

This command is used to terminate a connection. As a side effect, it collects statistics
on the execution of the connection. If more than one gateway is involved, the call agent
will send the DeleteConnection command to each of the media gateways. It is also
possible for the Call Agent to delete multiple connections at the same time, using for
example wildcard options.

Command is represented by:
    ReturnCode,
    Connection-parameters
    DeleteConnection(CallId,
                     EndpointId,
                     ConnectionId,
                     [Encapsulated NotificationRequest,]
                     [Encapsulated EndpointConfiguration])

In some circumstances, a gateway may have to clear a connection, for example because it
has lost the resource associated with the connection, or because it has detected that
the endpoint is no longer capable of or willing to send or receive voice. The gateway
terminates the connection by using a variant of the DeleteConnection command.

2.5.6 Audit Endpoint

The AuditEndPoint command can be used by the Call Agent to find out the status of a
given endpoint. This feature has been inherited from the switch environment.

Command is represented by:
    ReturnCode,
    EndPointIdList | {
        [RequestedEvents,]
        [DigitMap,]
        [SignalRequests,]
        [RequestIdentifier,]
        [NotifiedEntity,]
        [ConnectionIdentifiers,]
        [DetectEvents,]
        [ObservedEvents,]
        [EventStates,]
        [BearerInformation,]
        [RestartReason,]
        [RestartDelay,]
        [ReasonCode,]
        [Capabilities] }
    AuditEndPoint(EndpointId,
                  [RequestedInfo])

2.5.7 Audit Connection

The AuditConnection command can be used by the Call Agent to retrieve the parameters
attached to a connection.

Command is represented by:
    ReturnCode,
    [CallId,]
    [NotifiedEntity,]
    [LocalConnectionOptions,]
    [Mode,]
    [RemoteConnectionDescriptor,]
    [LocalConnectionDescriptor,]
    [ConnectionParameters]
    AuditConnection(EndpointId,
                    ConnectionId,
                    RequestedInfo)

2.5.8 Restart in Progress

The RestartInProgress command is used by the gateway to signal that an endpoint, or a
group of endpoints, is taken in or out of service.

Command is represented by:
    ReturnCode,
    [NotifiedEntity]
    RestartInProgress(EndPointId,
                      RestartMethod,
                      [RestartDelay,]
                      [Reason-code])

3 Protocol At Work

We shall now see how MGCP works with the help of two examples.

3.1 MGCP in all IP Network

Let us now see how MGCP works in the case of an all IP network. In the figure RGW =
Residential Gateway, CA = Call Agent and EP = Endpoint.

For the sake of discussion, it is assumed that the two EPs, which want to talk with each
other, are under the control of the same CA.

In the figure below the solid lines denote the signalling path and the dashed line
denotes the media flow. The RGW, CA and database are all part of the IP Network.

[Figure 2: diagram showing the CA and Database, RGW A and RGW B, and EP1 and EP2.]

Figure 2: MGCP in all IP Network

1 CA directs RGW A to look for an off-hook event and report it, by sending a
  Notification Request to RGW A.
2 EP1 goes off-hook; RGW A detects this and sends a Notification to the CA.
3 CA looks for the service associated with the off-hook event and asks RGW A to collect
  the digits and play dial tone to EP1.
4 RGW A accumulates the digits and sends a Notification to CA.
5 CA sends a Notification Request to RGW A to stop collecting digits and look for an
  on-hook event.
6 CA seizes the incoming circuit (asks the RGW to create a call context) and then sends
  the Create Connection command to RGW A.
7 RGW A sends back its session description (SDP) to the CA.
8 CA finds the IP address that serves the dialed number for EP2 from the database.
9 After CA knows the IP address of RGW B, it sends the Create Connection command to it.
10 RGW B responds by sending back its SDP.
11 CA now sends the SDP from RGW B to RGW A in the Modify Connection command. At this
  point two legs of the call are established in half duplex mode.
12 CA instructs RGW B to start ringing by sending a Notification Request.
13 CA notifies EP1 that EP2 is ringing.
14 EP2 answers the call and RGW B sends the CA a Notification that EP2 is answering the
  call.
15 CA sends a Notification Request to RGW A to stop ringing.
16 CA sends Modify Connection to RGW A to change the communication mode from half duplex
  to full duplex.
17 EP1 and EP2 are now talking!

3.2 MGCP in PSTN - IP Network

This is the case where User A is in the PSTN network and wants to call an IP phone.

As before, the solid lines denote the signaling path and the dotted lines denote the
media path.

Note that SG and TGW are on the edge of the IP cloud. SG interfaces the IP world with
the SS7 network, and TGW interfaces it with the PSTN.

[Figure 3: diagram showing the CA and Database, SG, STP, SSP, TGW, RGW, EP1 and EP2.]

Figure 3: MGCP in PSTN-IP Network

1 EP1, which is in the PSTN world, dials the number of EP2.
2 This number reaches SSP through EP1's local exchange.
3 SSP issues IAM (the ISUP Initial Address Message) to the CA, which is in the IP world.
  This IAM reaches SG via STP. SG is connected to the IP world on one side and the SS7
  world on the other.
The SG converts the ISUP on SS7 to ISUP on IP and sends the message to the CA.
4 CA finds the IP address that serves the dialed number for EP2 from the database.
5 CA now sends the Create Connection command to the TGW to connect to the incoming trunk using the CIC. The TGW returns the SDP of the connection.
6 CA seizes the incoming trunk (asks the RGW to create a call context) and reserves the outgoing trunk by sending Create Connection to the RGW, passing the SDP of the TGW.
7 CA now sends Modify Connection to the TGW.
8 CA requests the RGW to ring the called line by sending a Notification Request to the RGW.
9 When the CA receives the Ack from the RGW, it issues an ACM to the SG.
10 The SG forwards the ACM (the ISUP Address Complete Message) to the SSP.
11 EP2 goes off-hook and the RGW notifies the CA by sending a Notification.
12 Now the voice channel has to be turned into full duplex mode. The CA does this by sending the Modify Connection command to the TGW.
13 CA then sends the answer message to the SG. The STP forwards this message to the SSP.
14 EP1 and EP2 are now talking!

4 MEGACO

MEGACO is used between elements of a physically decomposed multimedia gateway, i.e. a Media Gateway and a Media Gateway Controller. Megaco meets the requirements for a media gateway control protocol as defined in RFC 2805.

4.1 Connection Model

The connection model for the protocol describes the logical entities, or objects, within the Media Gateway that can be controlled by the Media Gateway Controller.
The main abstractions used in the connection model are Terminations and Contexts.
A Termination sources and/or sinks one or more streams. In a multimedia conference, a Termination can be multimedia and source or sink multiple media streams. The media stream parameters, as well as modem and bearer parameters, are encapsulated within the Termination.
A Context is an association between a collection of Terminations. There is a special type of Context, the null Context, which contains all Terminations that are not associated with any other Termination. For instance, in a decomposed access gateway, all idle lines are represented by Terminations in the null Context.
The following figure is a graphical depiction of these concepts.
The empty dashed box in each context represents the logical association of Terminations implied by the Context.

[Figure 4: Connection Model. A Media Gateway containing several Contexts, each associating Terminations such as RTP Streams and SCN Bearer Channels.]

The protocol provides commands for manipulating the logical entities of the protocol connection model, Contexts and Terminations.
Commands provide control at the finest level of granularity supported by the protocol. For example, Commands exist to add Terminations to a Context, modify Terminations, subtract Terminations from a Context, and audit properties of Contexts or Terminations. Commands provide for complete control of the properties of Contexts and Terminations. This includes specifying which events a Termination is to report, which signals/actions are to be applied to a Termination and specifying the topology of a Context (who hears/sees whom). Most commands are for the specific use of the Media Gateway Controller as command initiator in controlling Media Gateways as command responders. The exceptions are the Notify and ServiceChange commands.
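The connection model can be illustrated with a small sketch. The following Python code is only a toy model of Contexts as sets of Termination names, not a Megaco implementation. The class name MediaGateway, the method names and the "null" label are assumptions made for this illustration, as is the choice to delete a Context that a Move leaves empty.

```python
from dataclasses import dataclass, field

NULL_CONTEXT = "null"  # assumption: label used in this sketch for the null Context


@dataclass
class MediaGateway:
    """Toy model: each Context is a set of Termination names."""
    contexts: dict = field(default_factory=lambda: {NULL_CONTEXT: set()})
    next_context_id: int = 1

    def register(self, termination):
        # A Termination not associated with any other lives in the null Context.
        self.contexts[NULL_CONTEXT].add(termination)

    def context_of(self, termination):
        for context_id, members in self.contexts.items():
            if termination in members:
                return context_id
        raise KeyError(f"unknown termination: {termination}")

    def add(self, termination, context_id=None):
        # Add a Termination to a Context; without a Context id a new one is created.
        if self.context_of(termination) != NULL_CONTEXT:
            raise ValueError("termination is already in a Context")
        if context_id is None:
            context_id = self.next_context_id
            self.next_context_id += 1
            self.contexts[context_id] = set()
        elif context_id == NULL_CONTEXT or context_id not in self.contexts:
            raise KeyError(f"no such Context: {context_id}")
        self.contexts[NULL_CONTEXT].remove(termination)
        self.contexts[context_id].add(termination)
        return context_id

    def subtract(self, termination):
        # Take a Termination out of its Context; an emptied Context is deleted.
        context_id = self.context_of(termination)
        if context_id == NULL_CONTEXT:
            raise ValueError("termination is not in a Context")
        self.contexts[context_id].remove(termination)
        if not self.contexts[context_id]:
            del self.contexts[context_id]
        self.contexts[NULL_CONTEXT].add(termination)

    def move(self, termination, target_id):
        # Move a Termination directly from one Context to another.
        source_id = self.context_of(termination)
        if NULL_CONTEXT in (source_id, target_id) or target_id not in self.contexts:
            raise KeyError("move needs two existing, non-null Contexts")
        self.contexts[source_id].remove(termination)
        self.contexts[target_id].add(termination)
        if not self.contexts[source_id]:
            del self.contexts[source_id]  # assumption of this sketch


gateway = MediaGateway()
for name in ("line1", "line2", "rtp1"):
    gateway.register(name)          # all idle Terminations start in the null Context
call = gateway.add("line1")         # first Add creates the Context
gateway.add("rtp1", call)           # second Add joins the same Context
print(gateway.contexts)
```

In this sketch an idle line simply sits in the null Context until an Add associates it with another Termination, which mirrors the description of idle lines in a decomposed access gateway.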
4.2 MEGACO Commands

The Megaco commands are the following.
• Add. The Add command adds a Termination to a Context. The Add command on the first Termination in a Context is used to create a Context.
• Modify. The Modify command modifies the properties, events and signals of a Termination.
• Subtract. The Subtract command disconnects a Termination from its Context and returns statistics on the Termination's participation in the Context. The Subtract command on the last Termination in a Context deletes the Context.
• Move. The Move command atomically moves a Termination to another Context.
• AuditValue. The AuditValue command returns the current state of properties, events, signals and statistics of Terminations.
• AuditCapabilities. The AuditCapabilities command returns all the possible values for Termination properties, events and signals allowed by the Media Gateway.
• Notify. The Notify command allows the Media Gateway to inform the Media Gateway Controller of the occurrence of events in the Media Gateway.
• ServiceChange. The ServiceChange command allows the Media Gateway to notify the Media Gateway Controller that a Termination or group of Terminations is about to be taken out of service or has just been returned to service. ServiceChange is also used by the MG to announce its availability to an MGC (registration), and to notify the MGC of an impending or completed restart of the MG. The MGC may announce a handover to the MG by sending it a ServiceChange command. The MGC may also use ServiceChange to instruct the MG to take a Termination or group of Terminations in or out of service.

5 Comparison between MGCP and MEGACO

Now that we have had a brief look at the two protocols, let us make a comparison between them.

Table 2: Comparison

Features              MEGACO                                     MGCP
Server                Media Gateway Controller                   Call Agent
Call Representative   Terminations within a call context         Endpoints with Connections
Call Types            Any combination of multimedia and          Point to point and multipoint audio
                      conferencing
Coding                Text or binary                             Text
Internet Protocol     TCP or UDP                                 UDP
Evolution             Formal extension process defined within    Less structured process, managed by
                      the IETF and the ITU                       industry consortia

6 Conclusion

With Megaco you can do everything that you could have done with MGCP, and more. Megaco will be the primary protocol for Media Gateway Control in the future. MGCP is being tested in many networks today and should soon be in commercial operation, but the popularity of Megaco is rising fast. Since MGCP will be deployed soon, it is likely to stay for some time. However, the networks that appear maybe a year from now will likely use Megaco for Media Gateway Control. So I see that MGCP and Megaco will co-exist for some years before we mainly have Megaco for Media Gateway Control.

References

[1] Request for Comments 2705: Media Gateway Control Protocol
[2] JAIN MGCP API, version 0.9.
[3] IP Telephony: Packet-based multimedia communications systems
[4] www.pulver.com
Appendix A
Glossary

Terms   Meaning
STP     Signaling Transfer Points
SP      Signaling Point
ISUP    ISDN User Part
SSP     Service Switching Points
SCP     Service Control Points
TGW     Trunk Gateway
RGW     Residential Gateway
EP      Endpoint
MGCP    Media Gateway Control Protocol
CA      Call Agent
MG      Media Gateway
SG      Signaling Gateway
JAIN    Java APIs for Integrated Networks
SGCP    Simple Gateway Control Protocol
RFI     Request for Information
RFP     Request for Proposal
SS7     Signaling System No. 7
PSTN    Public Switched Telephone Network
IN      Intelligent Network
UDP     User Datagram Protocol
ITU     International Telecommunication Union
IETF    Internet Engineering Task Force
IP      Internet Protocol
IAM     Initial Address Message
CIC     Circuit Identification Code
ACM     Address Complete Message
VON     Voice on Net

Appendix B

• ReturnCode: ReturnCode is a parameter returned by the gateway. It indicates the outcome of the command and consists of an integer number optionally followed by commentary.
• EndpointId: EndpointId is the name of the endpoint in the gateway where the command executes.
• BearerInformation: BearerInformation is a parameter defining the coding of the data received from the line side.
• NotifiedEntity: NotifiedEntity specifies where the notifications should be sent. When this parameter is absent, the notifications should be sent to the originator of the NotificationRequest.
• RequestedEvents: RequestedEvents is a list of events that the gateway is requested to detect and report. Such events include, for example, fax tones, continuity tones, or on-hook transition. An action is associated with each event.
• RequestIdentifier: RequestIdentifier is used to correlate the request with the notifications that it triggers.
• DigitMap: DigitMap allows the Call Agent to provision the gateways with a digit map according to which digits will be accumulated. If this parameter is absent, the previously defined value is retained.
• SignalRequests: SignalRequests is a parameter that contains the set of signals that the gateway is asked to apply to the endpoint, such as, for example, ringing or continuity tones. Signals are identified by their name, which is an event name, and may be qualified by parameters.
• QuarantineHandling: The QuarantineHandling parameter specifies the handling of "quarantine" events, i.e. events that have been detected by the gateway before the arrival of the NotificationRequest command, but have not yet been notified to the Call Agent.
• DetectEvents: DetectEvents specifies a list of events that the gateway is requested to detect during the quarantine period.
• ConnectionId: ConnectionId uniquely identifies the connection within one endpoint.
• SpecificEndpointId: The SpecificEndpointId parameter identifies the responding endpoint when returned from a CreateConnection command.
• LocalConnectionDescriptor: LocalConnectionDescriptor is a session description that contains information about addresses and RTP ports, as defined in SDP.
• SecondEndpointId: When a SecondEndpointId is returned from a CreateConnection command, the command really creates two connections that can be manipulated separately through ModifyConnection and DeleteConnection commands.
• SecondConnectionId: When this is returned from a CreateConnection, it identifies the second connection.
• LocalConnectionOptions: LocalConnectionOptions is a parameter used by the Call Agent to direct the handling of the connection by the gateway. Some of the fields contained in LocalConnectionOptions are: Encoding Method, Packetization period, Bandwidth, Type of Service, Usage of echo cancellation and so on.
• Mode: Mode indicates the mode of operation for this side of the connection. The modes are "send", "receive", "send/receive", "conference", "data", "inactive", "loopback", "continuity test", "network loop back" or "network continuity test."
• DetectEvents: DetectEvents is the list of events that are currently detected in quarantine mode.
• RestartMethod: The RestartMethod parameter specifies the type of restart of the endpoint. The methods include "graceful" and "forced".
• RestartDelay: The parameter is expressed as a number of seconds. If the number is absent, the delay value should be considered null.
• Capabilities: The capabilities for the endpoint are similar to the LocalConnectionOptions parameter and include event packages and connection modes.
Appendix C
Listed below are some interesting comments from speakers at the VON conference.
• In 1998 there were more than 1 trillion minutes of
POTS usage.
• The US market for telephony services is about $250 billion and the global telecom service market is about $800 billion.
• The cross-over for the wide-area data traffic
exceeding voice traffic is happening about now,
but voice revenues are much greater than the data
revenues.
• By 2004, 5% to 20% of long distance calls will be
VoIP.
• Circuit switching will be dead by 2005.
• Voice will be only 1% of the total global network
traffic by 2008.
• The worldwide market for IP Telephony will grow
from $480 million in 1999 to $19 billion in
2004.
Network dimensioning for voice over IP
Tuomo Hakala
Oy Datatie Ab
tuomo.hakala@datatie.fi

Abstract
This article concentrates on the issues of network dimensioning for voice over IP (VoIP). The network being dimensioned is an IP network between VoIP user devices. First, a short introduction to VoIP in general is given. Second, the issues in network dimensioning for VoIP are identified. Third, the bandwidth requirements of VoIP are calculated. Fourth, basic approaches to Quality of Service are discussed, and finally conclusions are drawn.

1 Introduction

VoIP represents the best opportunity so far for voice and data convergence and it is now one of the fastest-growing industries [12]. An IP network with mixed voice and data makes network management easier than managing separate voice and data networks. A VoIP call uses less bandwidth than a circuit-switched call. VoIP makes new services possible.

IP networks, like the current Internet, offering only best-effort service, cannot satisfy the Quality of Service (QoS) requirements of VoIP. This is primarily because of the variable queuing delays and packet loss during network congestion [12].

The end-to-end Quality of Service of VoIP is composed of factors related to the network and factors related to the applications. Factors related to the network are [13] [1]:

• Network delay

• Network jitter

• Network packet loss and desequencing

Factors related to the applications are:

• Overall packet loss

• Jitter buffers

• Codec performance

• Overall delay

Currently there are several approaches to improve the audio quality of VoIP [7]:

• Integrated Services (IntServ) is a stateful approach where resources are reserved in the network before data starts to flow along the reserved path. [8]

• Differentiated Services (DiffServ) is a stateless approach where real-time traffic is marked to get preferred treatment in the network. [9]

• Forward Error Correction (FEC) algorithms reduce the impact of data loss by sending redundant data along with the audio data. The redundant data helps to reconstruct lost data. [10] [14]

• Loss Concealment algorithms try to reduce the impact of data loss by replacing the lost audio with an approximation. [11]

Forward Error Correction and Loss Concealment algorithms are methods used in the VoIP user devices. IntServ and DiffServ are methods used in the IP network.

2 The issues

During an average conversation, each party usually talks only about 35 percent of the time. Most of the techniques used to transform voice into data, the codecs, have the ability to detect silences. With this voice activity detection, data is transmitted only when needed. When several conversations are multiplexed on a single transmission line, statistical multiplexing can be used, which leads to more efficient use of bandwidth.

When a VoIP packet is transferred through an IP network, it will experience delay that is caused by:

• Transmission delay between the nodes, which depends on the frame size and the transmission speed

• Queuing delay in the nodes because of buffering
• Switching and processing delay in the nodes, the time to switch a frame from an input port to an output port

• Propagation delay, which depends on the characteristics of the transmission media and the distance between the nodes

The use of statistical multiplexing means that the delay of sent packets within a conversation will vary. This varying delay is called jitter. The jitter must be minimized in the network and the remaining jitter needs to be corrected by the receiving side using jitter buffers to make the original speech intelligible. Jitter buffers increase the overall delay.

Several technologies enable the use of statistical multiplexing and mixing of voice and data on the same transmission lines. Such technologies are voice over Frame Relay, voice over ATM (Asynchronous Transfer Mode) and VoIP. VoIP is the most flexible technology because it does not require virtual channels to be set up between the parties having a conversation. Also, VoIP scales in terms of connectivity much better than Frame Relay or ATM.

In IP networks, routers are the devices that execute the statistical multiplexing functionality. IP packets belonging to the same conversation may use different routes having different delays and therefore they may be received in a different order than that in which they were sent. This is called desequencing.

When an overflow of the buffers in the network nodes occurs because of network congestion, there will be some packet loss, which must be handled by the receiving side. It makes no sense to resend part of the speech because of the overall delay. [1]

The bandwidth required by VoIP must be calculated considering the bandwidth requirements of a single conversation and the number of conversations on each link in the network. Acceptable packet loss and the buffering capacity of the nodes in the network must be considered as well. Delay and jitter must be minimized in the network.

The receiving side must take care of the remaining network jitter and the desequencing of packets. The Real-time Transport Protocol (RTP) was designed to allow the receiver to do the correction [4]. RTP includes:

• Information on the type of data transported

• Timestamps

• Sequence numbers

Real-Time Control Protocol (RTCP) allows the conveyance of feedback on the quality of the transmission and it can also carry information on the identity of the participants [4]. RTP and RTCP are mostly used on top of User Datagram Protocol (UDP) [3], which provides port numbers and a checksum. The use of UDP also enables the use of IP multicast, i.e. sending packets to IP multicast addresses. This means that an RTP stream generated by a single source can be received by several destinations. [1]

3 Bandwidth requirements

3.1 The number of calls per link

When bandwidth needs to be reserved for voice in an IP network designed for both voice and data, information needs to be gathered in order to know who phones where, how often and how long. When an existing circuit switched telephone network is planned to be realized by using VoIP, this information can be derived from existing statistics. When a VoIP network is planned for a new implementation and no statistics are available, calculations of the number of calls can be done using the Erlang model [1].

An optimal route on the network is chosen for each of the calls considering the cost of each link per unit of bandwidth. After this, it is possible to calculate the number of simultaneous calls, or conversations, on each link in the network at any given time. The maximum number of simultaneous calls is used as the value of N in the following discussion.

There is some signaling traffic in VoIP before and after a call, but the amount of signaling traffic per voice channel is negligible compared to the voice traffic itself and therefore it is not considered in the following discussion.

3.2 The number of active voice channels in one direction of a link

During a voice conversation, the proportion of active one-way speech intervals of the whole time of the conversation is called the activity rate a. An average value is usually 0,35 because the parties of a conversation normally think a while before they speak. To be on the safe side, a=0,5 should be used in the calculations. This allows each of the two parties of a conversation to use a half of the time of the conversation to speak.

The probability P(I) of having exactly I active voice channels, out of N conversations, in one direction of a link can be expressed with the binomial distribution:

$$P(I) = \frac{N!}{I!\,(N-I)!}\, a^{I}\, (1-a)^{N-I} \qquad (1)$$
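As an illustration, the binomial probability of equation (1) can be evaluated directly. The following is a minimal Python sketch; the function name p_active is an assumption made for this example, and the direct floating-point evaluation is only meant for the values of N used in this paper.

```python
from math import comb


def p_active(i, n, a=0.5):
    """Probability that exactly i of n conversations are active in one direction."""
    # comb(n, i) is the binomial coefficient N! / (I! (N-I)!) of equation (1).
    return comb(n, i) * a**i * (1 - a) ** (n - i)


# Distribution for N=5 conversations and activity rate a=0.5.
for i in range(6):
    print(i, p_active(i, 5))
```

Summing p_active(i, n) over all i from 0 to n gives 1, which is a quick sanity check on the formula.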
Figures 1 to 5 show this probability P(I) with activity rate a=0,5 and different values of N.

[Figure 1: The probability of having I active voice channels in one direction of a link when the number of conversations is N=5 and the activity rate is a=0,5. Axes: I = the number of active one way voice channels; P(I) and Cumulative.]

[Figure 2: The probability of having I active voice channels in one direction of a link when the number of conversations is N=10 and the activity rate is a=0,5. Axes: I = the number of active one way voice channels; P(I) and Cumulative.]

[Figure 3: The probability of having I active voice channels in one direction of a link when the number of conversations is N=30 and the activity rate is a=0,5. Axes: I = the number of active one way voice channels; P(I) and Cumulative.]

[Figure 4: The probability of having I active voice channels in one direction of a link when the number of conversations is N=100 and the activity rate is a=0,5. Axes: I = the number of active one way voice channels; P(I) and Cumulative.]

[Figure 5: The probability of having I active voice channels in one direction of a link when the number of conversations is N=1000 and the activity rate is a=0,5. Axes: I = the number of active one way voice channels; P(I) and Cumulative.]

The cumulative graphs in Figures 1 to 5 show the probability of having at most I voice channels active out of N at the same time in one direction of a link. This can be used in link sizing. As an example, if we know that the maximum number of conversations on a link is 1000 and we want to be 99% sure that all simultaneously active voice channels get all of their packets through the link, we size the link for the bandwidth of 536 times the bandwidth required by a single voice channel.

Table 1 shows I, the maximum number of active voice channels in one direction of a link with various values of N and with the probability of 99%. For N=5, the 97% value is shown. It can be seen that when N increases, the 99% value of the maximum I gets closer and closer to a*N. This means that, as a rule of thumb, with a small number of conversations the link must be sized as if all conversations were active at the same time, and with a large number of conversations the link can be sized as if only a bit more than a half of the conversations were active at the same time.
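The link sizing rule above can be checked numerically by summing the binomial probabilities. The sketch below is a hedged illustration in Python; the function names cumulative and channels_needed are assumptions made for this example. Note that a strict threshold of 0,99 can give a value one higher than a cumulative probability that merely rounds to 99%: for N=10 the cumulative probability at I=8 is 1013/1024, about 0,989, so the strict threshold is only reached at I=9.

```python
from math import comb


def cumulative(i_max, n, a=0.5):
    """Probability that at most i_max of n conversations are active at once."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(i_max + 1))


def channels_needed(n, target, a=0.5):
    """Smallest I whose cumulative probability reaches the target."""
    total = 0.0
    for i in range(n + 1):
        total += comb(n, i) * a**i * (1 - a) ** (n - i)
        if total >= target:
            return i
    return n


# Cumulative probability for the (N, I) pairs discussed in the text.
for n, i in [(5, 4), (10, 8), (30, 21), (100, 61), (1000, 536)]:
    print(f"N={n:5d}  I={i:4d}  cumulative={cumulative(i, n):.4f}")
```

The link bandwidth is then the chosen I multiplied by the bandwidth of a single active voice stream.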
Table 1: I, the maximum number of active voice channels out of N in one direction of a link, with the probability of 99% and a=0,5. For N=5, the 97% value is shown.

N       I       Probability
5       4       97%
10      8       99%
30      21      99%
100     61      99%
1000    536     99%

3.3 Buffering

Buffering in the network increases jitter and therefore reduces interactivity. It is good practice to dimension VoIP links assuming no buffering in the network. This leads to some overprovisioning for slow links, but this overhead can be used by non real-time traffic in an IP network designed for both voice and data [1]. In an IP network with mixed voice and data, the bandwidth requirements of VoIP are small compared to the bandwidth used for data in today's IP networks.

3.4 Link sizing

Table 2 shows the transport header overhead of IPv4, UDP and RTP.

Table 2: Transport header overhead

Protocol                                  Overhead (octets)
IPv4 (Internet Protocol version 4) [2]    20
UDP (User Datagram Protocol) [3]          8
RTP (Real-time Transport Protocol) [4]    12

Table 3 shows the header overhead with the following level 2 technologies: Frame Relay, PPP (Point-to-Point Protocol), POS (Packet over SONET/SDH), ATM/AAL5 with LLC/SNAP (Asynchronous Transfer Mode, ATM Adaptation Layer 5, Logical Link Control/Subnetwork Access Protocol).

Table 3: Total header overhead

Level 2 framing                  Frame Relay   PPP   POS   ATM, AAL5, LLC/SNAP (Note)
Level 2 header (octets)          2             8     16    8+8
IPv4+UDP+RTP headers (octets)    40            40    40    40
Header overhead (octets)         42            48    56    56

Note: With ATM, add AAL5 padding octets to get multiples of 48 and add 5 octets for every 48 octets to get the 53 octet cell size.

Table 4 shows the frame frequencies of three codecs. The frame frequency can be used to calculate the total bandwidth of a single active voice stream when the total header overhead is known.

Table 4: Frame frequencies of three codecs

Codec                    G.723.1 (5,3 kbit/s)   G.723.1 (6,3 kbit/s)   G.729 (8 kbit/s)
Payload size (octets)    20                     24                     10
Sample (ms)              30                     30                     10
Frame frequency (1/s)    33,125                 32,8125                100

Table 5 shows the total level 2 frame size when one frame contains a single voice sample, taking into account the note to Table 3. Including more than one voice sample in one packet would cause additional packetization delay and therefore it is not recommended.

Table 5: Total level 2 frame size (octets)

Codec          G.723.1 (5,3 kbit/s)   G.723.1 (6,3 kbit/s)   G.729 (8 kbit/s)
Frame Relay    62                     66                     52
PPP            68                     72                     58
POS            76                     80                     66
ATM            106                    106                    106

Table 6 shows the total bandwidth of a single active voice stream, computed from the frame frequencies in Table 4 and the frame sizes in Table 5. It shows that a G.729 coded voice stream run over an ATM link requires more bandwidth than 64 kbit/s, which is the bandwidth required by a G.711 codec run over a TDM link.

Table 6: Total required bandwidth of a single active voice stream (kbit/s)

Codec          G.723.1 (5,3 kbit/s)   G.723.1 (6,3 kbit/s)   G.729 (8 kbit/s)
Frame Relay    16,430                 17,325                 41,600
PPP            18,020                 18,900                 46,400
POS            20,140                 21,000                 52,800
ATM            28,090                 27,825                 84,800

Table 7 shows the maximum number of simultaneously active voice streams in one direction with zero packet loss on a SDH/STM-1 link with POS and ATM/AAL5. The available bandwidth on a SDH/STM-1 link is 149,76 Mbit/s from the 155 Mbit/s link speed.
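The frame sizes of Table 5 and the per-stream bandwidths of Table 6 follow from Tables 2 to 4 by simple arithmetic. The following Python sketch reproduces that arithmetic under the assumptions stated in the tables: one voice sample per packet, frame frequency derived from the codec bit rate and payload size as in Table 4, and ATM cells filled by padding. The function and constant names are assumptions made for this illustration.

```python
import math

IP_UDP_RTP = 20 + 8 + 12                               # octets, Table 2
L2_HEADER = {"Frame Relay": 2, "PPP": 8, "POS": 16}    # octets, Table 3
ATM_LLC_SNAP_AND_AAL5 = 8 + 8                          # octets, Table 3

# codec name -> (payload octets, codec bit rate in bit/s), Table 4
CODECS = {
    "G.723.1 (5,3 kbit/s)": (20, 5300),
    "G.723.1 (6,3 kbit/s)": (24, 6300),
    "G.729 (8 kbit/s)": (10, 8000),
}


def frame_frequency(payload, bit_rate):
    """Frames per second, derived from bit rate and payload size as in Table 4."""
    return bit_rate / (payload * 8)


def frame_size(framing, payload):
    """Total level 2 frame size in octets for one voice sample per packet."""
    if framing == "ATM":
        pdu = payload + IP_UDP_RTP + ATM_LLC_SNAP_AND_AAL5
        cells = math.ceil(pdu / 48)   # pad up to a multiple of 48 octets
        return cells * 53             # every 48 octets travel in a 53 octet cell
    return payload + IP_UDP_RTP + L2_HEADER[framing]


def stream_bandwidth_kbit(framing, codec):
    """Bandwidth of a single active voice stream in kbit/s."""
    payload, bit_rate = CODECS[codec]
    return frame_size(framing, payload) * 8 * frame_frequency(payload, bit_rate) / 1000


for framing in ("Frame Relay", "PPP", "POS", "ATM"):
    row = [f"{stream_bandwidth_kbit(framing, codec):.3f}" for codec in CODECS]
    print(f"{framing:12s}", row)
```

Dividing the available link bandwidth by the per-stream bandwidth gives the number of simultaneously active voice streams a link can carry.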
Table 7: Maximum number of simultaneously active voice streams in one direction with zero packet loss on a SDH/STM-1 link with POS and ATM/AAL5 (149,76 Mbit/s available from the 155 Mbit/s link speed)

Codec        G.723.1 (5,3 kbit/s)   G.723.1 (6,3 kbit/s)   G.729 (8 kbit/s)
POS          7 436                  7 131                  2 836
ATM, AAL5    5 331                  5 382                  1 766

Table 8 shows the maximum number of simultaneously active voice streams in one direction with zero packet loss on a 64k TDM link with Frame Relay and PPP.

Table 8: Maximum number of simultaneously active voice streams in one direction with zero packet loss on a 64k TDM link with Frame Relay and PPP.

Codec          G.723.1 (5,3 kbit/s)   G.723.1 (6,3 kbit/s)   G.729 (8 kbit/s)
Frame Relay    4                      4                      2
PPP            4                      3                      1

4 Delay, jitter and packet loss

When the bandwidth required by VoIP is calculated for a low packet loss assuming no buffering in the network, the network delay and jitter are minimized. In an IP network with mixed voice and data traffic, some mechanism must be used to ensure that the bandwidth calculated for VoIP is not used by other real-time traffic or non real-time traffic. When the calculations for VoIP are done for a low packet loss in the network, it must be ensured that the buffers in the network nodes are not filled with packets of other traffic types, which would cause VoIP packets to be dropped. Also, when the calculations for VoIP assume that there is no buffering in the network nodes, because buffering would increase delay and jitter, it must be ensured that VoIP packets are sent first to the outgoing link even when packets of other traffic types are waiting in the buffers.

First of all, VoIP traffic must be differentiated from other traffic types in the network so that it can be treated better. The nodes in the IP network, the routers, can differentiate traffic according to source and destination IP addresses, protocol type, port numbers and by the Differentiated Services (DS) field. The DS field means the type of service (TOS) byte in IPv4 and the traffic class byte in IPv6.

There are two basic approaches in an IP network with mixed voice and data traffic that can be used to improve the quality of VoIP [7]:

• Integrated Services (IntServ) is a stateful approach where resources are reserved in the network before data starts to flow along the reserved path. [8]

• Differentiated Services (DiffServ) is a stateless approach where real-time traffic is marked to get preferred treatment in the network. [9] [5]

4.1 Integrated Services (IntServ)

The IntServ model proposes two service classes in addition to best-effort service: guaranteed service and controlled-load service. Guaranteed service is for applications requiring a fixed delay bound. Controlled-load service is for applications requiring reliable and enhanced best-effort service. [12]

IntServ requires that resources are explicitly managed for each real-time application. Routers must reserve resources (e.g. bandwidth and buffer space) in order to provide specific QoS for each packet flow. This requires flow-specific states in the routers. [12]

The four components of IntServ are:

• Flow specification - Flowspec describes the characteristics of the flow and it has two separate parts, Tspec (describes the flow's traffic characteristics) and Rspec (specifies the service requested from the network)

• Signaling protocol - e.g. Resource ReSerVation Protocol (RSVP) [6]

• Admission control routine - determines whether a request for resources can be granted.

• Packet classifier and scheduler - packets entering a router are classified and put in the appropriate queue and then scheduled accordingly.

4.2 Differentiated Services (DiffServ)

In the DiffServ model, traffic entering an IP network is classified, marked, policed and shaped at the edge of the network.

The packets are then assigned to different behavior aggregates (BA). Each BA is identified by a single DiffServ CodePoint (DSCP). Users request a specific performance level per packet by marking the DiffServ field of each packet with a specific DSCP value, which specifies the Per-Hop-Behavior (PHB) within the provider's network. Packets are forwarded within the core of the network according to the PHB.

The four components of DiffServ are [12]:
• Services - The characteristics of packet transmission in one direction over a path in a network are defined by a service. DiffServ can be provided by two approaches:
   • Quantitative DiffServ - QoS is specified in deterministic or statistical quantitative terms of throughput, delay, jitter and/or loss.
   • Priority based DiffServ - Services are specified in terms of a relative priority of access to network resources.
• Conditioning Functions and PHB - A user and a service provider must have a service level agreement (SLA) in place that specifies the supported service classes and the amount of traffic allowed in each class. Individual packets have DiffServ (DS) fields that indicate the desired service, and these DS fields can be marked at hosts, at the access router or at the edge router in the service provider network. Packets are classified, policed and possibly shaped at the ingress of the service provider network according to the rules derived from the SLA. Between domains (service provider networks), DS fields may be re-marked, if so defined in the SLA between the two service providers. These traffic control functions at hosts, access routers or edge routers are generically called traffic conditioning. Per-hop behaviors (PHBs) are defined to allocate buffer and bandwidth resources at each node among traffic streams. A PHB is applied to a DiffServ behavior aggregate at a DiffServ-compliant node.
• DS CodePoint - The DS field is the type of service (TOS) field in IPv4 and the traffic class byte in IPv6. Six bits of this DS field are used as a codepoint (DSCP) to select the PHB for a packet at each node.
• A node mechanism for achieving PHB - Buffer management and packet scheduling mechanisms are used in nodes to achieve a certain PHB. PHBs are defined as behavior characteristics relevant to service provisioning policies instead of particular implementation mechanisms. Various implementation mechanisms may be suitable for a particular PHB group.

5 Conclusions

The issues to be considered in network dimensioning for VoIP are bandwidth, delay, jitter, desequencing and packet loss.

With a small number of conversations the link bandwidth must be sized as if all conversations were active at the same time, whereas with a large number of conversations the link can be sized as if only a bit more than half of the conversations were active at the same time.

Buffering in the network increases jitter and therefore reduces interactivity. It is good practice to dimension VoIP links assuming no buffering in the network. When the bandwidth required by VoIP is calculated for a low packet loss and no buffering is assumed in the network, the network delay and jitter are minimized. The receiving side must correct the remaining network jitter and the desequencing of packets.

In an IP network with mixed voice and data traffic, some mechanism must be used to ensure that the bandwidth calculated for VoIP is not used by other real-time traffic or non-real-time traffic. There are two basic approaches to achieve this: Integrated Services (IntServ) and Differentiated Services (DiffServ). IntServ is a stateful approach where resources are reserved in the network before data starts to flow along the reserved path. DiffServ is a stateless approach where real-time traffic is marked to get preferred treatment in the network.

References

[1] Hersent, Olivier; Gurle, David; Petit, Jean-Pierre: IP Telephony, Packet-based multimedia communications systems; Great Britain, 2000, www.awl.com/cseng/, ISBN 0-201-61910-5
[2] Postel, Jon: Internet Protocol, RFC 791, September 1981
[3] Postel, Jon: User Datagram Protocol, RFC 768, 28 August 1980
[4] Schulzrinne, Henning; Casner, Stephen L.; Frederick, Ron; Jacobson, Van: RTP: A Transport Protocol for Real-Time Applications, RFC 1889, January 1996
[5] Blake, Steven; Black, David L.; Carlson, Mark A.; Davies, Elwyn; Wang, Zheng; Weiss, Walter: An Architecture for Differentiated Services, RFC 2475, December 1998
[6] Mankin, A.; Baker, Fred; Braden, Bob; Bradner, Scott; O'Dell, Michael; Romanow, Allyn; Weinrib, Abel; Zhang, Lixia: Resource ReSerVation Protocol (RSVP), Version 1 Applicability Statement, Some Guidelines on Deployment, RFC 2208, September 1997
[7] Trends in the Internet Telephony, http://www.fokus.gmd.de/research/cc/glone/projects/ipt/ (11 March 2001)
[8] IETF Integrated Services (IntServ) Working Group charter, http://www.ietf.org/html.charters/intserv-charter.html (11 March 2001)
[9] IETF Differentiated Services (DiffServ) Working Group charter, http://www.ietf.org/html.charters/diffserv-charter.html (11 March 2001)
[10] Speech Property-Based FEC (SPB-FEC), http://www.fokus.gmd.de/research/cc/glone/products/voice/spb-fec/ (11 March 2001)
[11] Adaptive Packetization / Concealment (AP/C), http://www.fokus.gmd.de/research/cc/glone/products/voice/apc/ (11 March 2001)
[12] Li, Bo; Hamdi, Mounir; Jiang, Dongyi; Cao, Xi-Ren: QoS-Enabled Voice Support in the Next-Generation Internet: Issues, Existing Approaches and Challenges; IEEE Communications Magazine, April 2000
[13] TELECOMMUNICATIONS AND INTERNET PROTOCOL HARMONIZATION OVER NETWORKS ETSI PROJECT – TIPHON, http://webapp.etsi.org/tbhomepage/TBDetails.asp?TB_ID=291&TB_NAME=TIPHON (12 March 2001)
[14] Padhye, Chinmay; Christensen, Kenneth J.; Moreno, Wilfrido: A New Adaptive FEC Loss Control Algorithm for Voice Over IP Applications; IEEE 2000
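
As an illustration of the DS CodePoint component listed in the DiffServ discussion of this paper, the following minimal Python sketch shows how the six-bit DSCP relates to the eight-bit DS byte (the TOS byte in IPv4, the traffic class byte in IPv6). The function names are hypothetical, and the use of the Expedited Forwarding codepoint (decimal 46) as the example marking for voice is an assumption; the paper itself does not prescribe a particular codepoint.

```python
# Sketch only: helper names are hypothetical and not taken from the paper.

# Assumption: Expedited Forwarding (binary 101110) is used as the voice marking.
DSCP_EF = 46


def dscp_from_ds_byte(ds_byte: int) -> int:
    """Return the six-bit DSCP carried in the upper six bits of the DS byte."""
    if not 0 <= ds_byte <= 0xFF:
        raise ValueError("DS byte must fit in 8 bits")
    return ds_byte >> 2


def ds_byte_from_dscp(dscp: int, low_bits: int = 0) -> int:
    """Build a DS byte from a DSCP; the two low-order bits are not part of the codepoint."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= low_bits <= 0x03:
        raise ValueError("low_bits must fit in 2 bits")
    return (dscp << 2) | low_bits


if __name__ == "__main__":
    ds = ds_byte_from_dscp(DSCP_EF)
    print(f"DS byte for DSCP {DSCP_EF}: 0x{ds:02X}")
    print(f"DSCP recovered from DS byte: {dscp_from_ds_byte(ds)}")
```

A node that selects the PHB only needs the result of dscp_from_ds_byte; the remaining two bits of the byte do not influence the choice. On platforms that expose the IP_TOS socket option, the value returned by ds_byte_from_dscp is the byte an application would typically supply to mark its own packets, although whether the marking is honoured depends on the conditioning rules at the network ingress.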
TRIP, ENUM and Number Portability
Nicklas Beijar
Networking Laboratory, Helsinki University of Technology
P.O. Box 3000, FIN-02015 HUT, Finland
Nicklas.Beijar@hut.fi
Abstract

This paper describes the problem of locating terminals using E.164 numbers, and the problem of selecting a suitable gateway for calls from an IP telephony network to the public switched telephone network (PSTN). Generally, these are the problems of mapping the name of a destination into an address, and of finding the best route to the destination in a combined IP and PSTN network. Number portability is closely related to these problems. Due to number portability the address of a destination is changed without changing its name. Number portability may also change the optimal route to a destination.

Two protocols are being developed by the Internet Engineering Task Force (IETF) to solve these problems. The Telephony Routing over IP (TRIP) protocol solves the gateway location problem by distributing routing information between entities on the IP network. The tElephone NUmber Mapping (ENUM) provides a solution to the terminal location problem based on DNS. ENUM maps an E.164 number into a URI, which is used to locate the end point. Both protocols aim to add a part to the architecture in order to make a global hybrid PSTN-IP network possible. They also aim to enable number portability in the IP network and between the two types of networks.

In this paper, we will introduce the concept of terminal and gateway location. We describe how the current protocols locate terminals and gateways, and what the problems with the current solutions are. The TRIP and ENUM protocols are presented in detail and scenarios based on the protocols are described. Solutions to number portability are presented and some problems are discussed.

Keywords: Voice over IP, IP Telephony, TRIP, ENUM, Number portability, Routing, Address mapping

1 Introduction

When IP telephony was introduced, it was mainly used in small private networks, which were connected to the public switched telephone network (PSTN) through a single gateway. As IP telephony matured, a vision of global public IP telephony became popular. In this scenario the IP and PSTN networks are interconnected with a large number of publicly available gateways. In order to make connectivity between all IP and PSTN terminals possible, the problem of terminal and gateway location must be solved. In these problems, addressing is central. Number portability allows users to change operators and locations without changing the telephone number. To smooth the transition to an IP-based telephone network, number portability is also required between PSTN and IP networks.

The main signaling protocols for IP telephony are Session Initiation Protocol (SIP) [1] and H.323 [2]. The architectures that they define are similar, although different names are used for the network elements. Calls can be established between IP telephony terminals directly, but usually the call setup signaling passes through a gatekeeper (in H.323) or signaling server (in SIP). The elements have similar functions in the two signaling protocols, so in this paper we will use the name signaling server for both. Some important functions of the signaling server are address translation and location of the destination terminal. For calls between an IP terminal and a terminal on the PSTN, a gateway is used to convert signaling and code the voice stream between the circuit switched and the packet network.

To identify the destination of a call on the PSTN, the caller dials the receiver’s telephone number in E.164 format [3]. The telephone number is analyzed digit by digit to locate the path through switches in the PSTN towards the destination. Although the number dialed by the user traditionally is used for routing, a different address is used for routing in many cases.

The European Telecommunications Standards Institute (ETSI) defines the concept of names and addresses as follows: A name is a combination of alpha, numeric or symbols that is used to identify end-users. An address is a string or combination of digits and symbols which identifies the specific termination points of a connection/session and is used for routing. [4]

The main difference between these is that a name is an identifier for the end user, while an address is a locator. An address should typically have some form of structure that allows aggregation for routing purposes. In the Internet, the name is a domain name. The
domain name is mapped into an IP address, which is used for routing. In the telephone network, E.164 numbers have traditionally been used as both names and addresses. However, due to number portability their roles have been separated. The number that the user dials, which can be regarded as a name, is then mapped into a routing number, which is an address. The dialed number is usually referred to as a directory number. It is also worth noting that in many cases entities that functionally are names are called addresses.

To transform the name into an address some type of mapping method is needed. For the mapping of host names into IP addresses, the Domain Name System (DNS) [15],[16] is used. DNS is a distributed directory service based on DNS servers. Each server knows the mapping of a range of hosts, or the address of a server that has more detailed information. The parts of the domain names are analyzed in hierarchical order and the mapping request is forwarded to more specific DNS servers until the mapping can be completed.

2 The current situation

Today IP telephony is mainly used in two situations: either as a private branch of the PSTN within an organization or for calls between terminals on the Internet. The first case involves gateways, which connect the private IP telephony network with the public switched telephone network. The second case does not usually involve gateways, since it would require publicly available gateways and the existence of a billing system. Additionally IP telephony is used to inexpensively transport long distance calls between PSTN callers through the IP network.

2.1 Locating the destination

In most of today’s IP telephony applications, the IP telephony network acts as a branch of the PSTN. The gateway together with the signaling server (or gatekeeper) works like a PBX from the viewpoint of the PSTN. The called E.164 numbers are translated to IP addresses by the signaling server. Calls to and from external numbers are routed through the gateway. In such small networks with only a few gateways, the number translation and gateway selection can be performed by a single signaling server. The mappings are usually configured in the signaling server. The users do not necessarily notice that IP telephony is used, since they use E.164 numbers as usual. These types of networks are, however, very limited in size and cannot be considered in a larger deployment of IP telephony. In this paper, we will mainly discuss the use of larger IP telephony networks.

For calls over the public Internet, the situation is more complicated. IP terminals are located using their IP address or host name. The signaling protocols allow various formats of addresses to be used by the users. Users prefer to use E.164 or e-mail type addresses that are familiar from traditional telephony or e-mail, respectively. To set up a call, the name of the destination is mapped to an IP address by a signaling server. The signaling server can be manually configured with the mappings for its local terminals. More usually, however, the terminals must register with the signaling server. Based on registration the mapping is created. The server maintains a database of mappings for its registered clients.

The SIP architecture also includes a network element named location server. The location servers store the mappings on behalf of the signaling servers. A location server may be used by a number of signaling servers. The location server may also be integrated with a signaling server. In this way we can generalize to say that the location server stores the mapping, even though the location server and signaling server in some cases are the same element. In case of separate servers, the information is accessed with some directory access protocol.

In a public IP telephony network there is a large number of signaling servers. There is currently no method to distribute the mappings between different servers. Because of this lack of distribution method, mappings can only be used for calls between phones registered to the same server. Thus, E.164 numbers do not work for calls between terminals registered to different signaling servers or location servers. Note that calls are always possible when the address is given as an IP address.

The SIP protocol also supports the use of names given as Uniform Resource Locators (URL) [18]. The URL specifies the user, the host and other parameters. This “user@host” format can be handled in a similar way as email addresses are handled by SMTP [5]. An IP address of a signaling server for the domain is located using DNS. Thus, the host name does not have to be a complete host name. The call can be further forwarded by proxy or redirect servers. [1]

The most popular H.323 client, Microsoft NetMeeting, uses directory servers to locate users. These directories are proprietary solutions, named ULS and ILS. Similar solutions are used by many other clients on the market. Still, there are significant drawbacks in this type of solution. The directories have a limited capacity and they do not exchange information with each other. Furthermore, many of them use non-standardized access protocols. [7]

2.2 Locating the gateway

For calls to the PSTN a gateway must be used. Today, the gateways are in most cases manually configured
into the signaling server. A signaling server has a set of available gateways to use for external calls. For private internal IP telephony networks, external numbers are usually recognized by a preceding “0”.

In SIP the call can be set up using a gateway specified in the URL. The destination is then given as “number@gateway”. This requires the user to know that the destination is on the PSTN and also which gateway should be used. If the gateway is down or if all lines are busy, the user must manually select another gateway. Another method is to let the signaling server choose the gateway, so that calls can be made by giving only a number. The server selects one from its list of available gateways. The H.323 protocol works in much the same way.

2.3 Number portability

Number portability allows a user to change service providers, location or service type without changing the telephone number. Service provider portability is mandatory in many countries. The introduction of IP telephony adds a new type of number portability: between different network types. [8]

Today number portability is only implemented on the PSTN. The implementation of number portability differs in different countries. Common to all implementations is that the directory number dialed by the customer is mapped to either a routing number or a routing prefix. A routing number is a hierarchical routing address, which can be digit-analyzed to reach the correct country, network provider, end-office switch and subscriber line. A routing prefix forms a routing number by adding some digits in front of the directory number. The routing number replaces the hierarchy that is lost, since the directory number space becomes flat due to number portability.

Most number portability solutions utilize Intelligent Network (IN) functions. In these solutions, the mapping database is stored in the Service Data Functions (SDF) elements. Depending on the implementation, the call may have to pass through the previous operator’s network before it reaches the current destination network. [8]

At present, there are no specific solutions for number portability across the network types. A number that is moved to the IP network can be handled in a normal way in the number portability solutions on the PSTN. The number portability databases are updated with a routing number that directs calls to a gateway. Limited number portability can be implemented in an IP telephony network using redirect and proxy servers. Calls to a moved number are forwarded to the new destination by the previous signaling server. However, this feature corresponds more to call forwarding than to number portability. The forwarding works with both calls to IP terminals and calls to PSTN destinations.

3 Problem description

3.1 Naming

Traditionally E.164 numbers have been used on the telephone network and e-mail type addresses of format “user@domain” on the Internet. The signaling protocols SIP and H.323 allow using multiple types of names, including both the above methods as well as IP addresses. For Internet users, who have a keyboard available, textual names are preferred since they are easy to remember and deduce.

However, the problem arises when the networks are interconnected. Callers on the PSTN have no keyboard and a scheme for entering characters using a numeric keypad would be too complicated. This limits the PSTN users to entering numeric names. Consequently, an IP terminal must have a telephone number to be accessible from the PSTN. The problem was recognized by TIPHON, which chose to equip IP terminals with an E.164 number. For calls within the IP network, other types of addressing can be used. Unfortunately, this would require the user to know what type of network the destination is on. When IP telephony is largely deployed, customers do not necessarily even know the underlying technology of their own connection.

As we saw in section 2.1, E.164 numbers can currently only be used between hosts registered to the same signaling server. Using some proprietary protocol, mappings can be distributed between smaller groups of servers, but there is no protocol for global distribution.

3.2 Problem categories

The name entered by the user, usually given as an E.164 number, must be mapped to at least one routing address. For calls from IP telephony terminals to other IP telephony terminals, the host address of the destination must be found. For calls over the network boundary to the PSTN, a gateway must be located. Also in the opposite direction, from the PSTN to the IP network, a gateway must be selected.

The TRIP framework [1] divides the problem into three subproblems:
1. Given a phone number corresponding to a specific host on the IP network, determine the IP address of the host. This is required for calls from the PSTN to the IP network, but also for calls within the IP network if E.164 numbers are used. The mapping may be variable, for example if DHCP is used.
2. Given a phone number corresponding to a terminal on the PSTN, determine the IP address of a gateway capable of completing calls to that phone. The choice is influenced by a number of factors, such as policies, location, availability and features. This is called the gateway location problem.
3. Given a phone number corresponding to a user of a terminal on the PSTN, determine the IP address of an IP terminal owned by the same user. This type of mapping may be used if the PC serves as an interface for the phone, for example for delivering a message to the PC when the phone rings.

For calls from the PSTN to the IP network, the selection of gateway is performed using normal routing in the switched circuit network, which is static. In the longer term, it would also be necessary to dynamically select a gateway for these calls. This gives us a fourth subproblem.

3.3 The address mapping problem

To establish a call to a terminal on an IP network, the destination IP address must be known. Alternatively the terminal can be identified by a host name, which is translated to an IP address by DNS. As terminals are equipped with an E.164 number, a new mapping is required: from an E.164 name to an IP address. The address mapping problem usually refers to the task of locating terminals on the IP network.

When the switched circuit network and IP telephony networks are interconnected, new call scenarios arise. Since the originating network and destination network can be of two types, there are four basic call scenarios: PSTN-PSTN, PSTN-IP, IP-PSTN and IP-IP. When calls are set up, the first task is to determine the type of the destination network. A mapping from E.164 name to network type is required.

The required mappings could be solved with some type of directory. At a minimum, the mapping from E.164 number to network type and IP address must be supported. The directory must be scalable to store large amounts of mappings, possibly for all telephones in the world. It must be capable of replying to a high rate of lookups, one for each call that is set up. In practice, the directory must therefore be distributed. The directory must also propagate updates rather quickly when the information changes.

Additionally the mapping is expected to be used with several different services. In addition to voice calls, the IP network allows for video conferencing and e-mail among others. Some method of locating the available contact modes and services is desired.

3.4 Routing and number portability

For economic or quality related reasons a transit network of a different type can be used, giving two more call scenarios: PSTN-IP-PSTN and IP-PSTN-IP. Even when only two network types are used, the transit network must be selected. It is usually more cost effective to hand over calls to IP destinations to the IP network near the origination point. On the other hand, the voice quality is better if the call uses the PSTN for most of the path. Possibly the caller could choose whether to route the calls via IP or PSTN using carrier selection mechanisms. Typically this would imply the use of a prefix to select the carrier. [23]

The call can thus propagate through several network types. Each time the call goes from one network type to another, it has to pass a gateway where the media stream is converted. The conversions cause delay and jitter, which decrease the quality. Therefore, unnecessary media conversions should be avoided. It would be good to know the type of the destination network already in the originating network.

With number portability numbers may move from one provider’s network to another, and even between network types. If a number belonging to a number block of a PSTN operator moves to an IP network, calls from IP subscribers may unnecessarily be routed through the PSTN.

3.5 The gateway location problem

As the usage of IP telephony grows and the number of gateways increases, the management of gateways and routes between the IP and PSTN networks becomes increasingly complex. In a situation where the IP network approaches the size of the PSTN, a large part of the calls will pass through one or even several gateways on their path. For calls from the IP network to the PSTN, the caller must locate a gateway that is able to complete calls to the desired destination. There may be several available gateways, and selecting the most suitable one is a nontrivial process.

Currently the gateway must be selected by the user or by the signaling servers. The selection and configuration of gateways to use involves manual work. The list of available gateways must be configured into the signaling servers and updated when new gateways become available. Additionally, gateways may become blocked when all lines are in use. The signaling server does not know which gateways are accessible.

Connectivity to the PSTN means that every gateway is able to connect to nearly any terminal on the PSTN. The number of available gateways can thus be very large. The selection of which gateway to use is influenced by a number of factors. Firstly, the location of the gateway is important. For example, there is no
reason to use a gateway in a country far away to connect parties in the same city. To minimize usage of resources it is important that the gateway is near the path between the parties.

Secondly, business relationships are important. The gateway service involves costs when calls are completed to PSTN destinations. Gateway providers, in most cases, want to charge for using their gateways. Because of this, the usage of gateways may be restricted to the groups of users that have some type of established relationship with the gateway provider. The end user will probably not pay for the gateway service directly. Instead, the end user may have a relationship with an IP telephony service provider (ITSP). The ITSP may have its own gateways or use the gateways of a separate gateway provider. All these policies and relationships influence the selection of a gateway.

Additionally, the end user may have requirements on the gateway. The end user may prefer a certain provider or require a specific feature. The caller may use a specific signaling protocol or media codec that is supported by only some gateways.

Keeping in mind that the gateway capacity is also limited, it is obvious that an automatic method for gateway selection is required. Since the selection is largely driven by policies, some type of global directory of gateways is not suitable. Instead, a protocol for exchanging gateway information between the providers would be a better solution.

4 Telephony Routing over IP

To solve the gateway selection problem, the Internet Engineering Task Force (IETF) working group IP Telephony (IPTEL) began working on a protocol for distributing gateway information between gateway providers and IP telephony providers. The protocol was first called Gateway Location Protocol (GLP) but after finding the problem larger than merely locating gateways, the protocol was renamed to Telephony Routing over IP (TRIP). The most important documents of the work are the TRIP framework [6] and the draft protocol specification [9].

The working group found that a global directory for gateway information is not feasible. The selection of gateway is in large part driven by the policies of the parties along the path of the call. Gateway information is exchanged between the providers and, depending on policies, made available locally and propagated to other providers. The providers create their own databases of reachable phone numbers and the routes towards them. These databases can be different for each provider.

TRIP is modeled after the Border Gateway Protocol 4 (BGP-4) [10]. Like BGP-4, TRIP is an inter-domain routing protocol driven by policies. The nodes of TRIP are the location servers (LS), which exchange information with other location servers. The information includes reachability information about telephony destinations, the routes towards these destinations and properties of the gateways connecting the PSTN and IP network.

TRIP uses the concept of Internet Telephony Administrative Domains (ITAD) in a similar way as BGP-4 uses autonomous systems. The location servers that are administered by a single provider form an ITAD. The ITAD may contain zero or more gateways. The border of the ITAD does not have to correspond to the border of an autonomous system. The main function of TRIP is to distribute information between ITADs, but TRIP also contains functions for intra-domain synchronization of routing information. It is not required that all ITADs in the world are connected. Groups of ITADs can be formed that exchange information with TRIP.

TRIP connects location servers with administratively created peer relationships. The location server forwards the information received from one peer to the other peers. In this way the location servers in one ITAD learn about gateways in the other ITADs. The location server selects the routes to use in its own domain, and the routes to forward to neighboring domains, according to its local policies. The information can be modified according to the policies before it is forwarded. In this way, the provider can control the type of calls passing through the domain.

The location servers collect information and use it to reply to queries about routes to destinations. The query protocol is not defined by TRIP. Any directory access protocol can be used, for example LDAP [11].

4.1 Operation of TRIP

The TRIP protocol, the structure and operation of a node, and the implementation details are specified in the TRIP specification draft [9].

TRIP location servers process three types of routes:

1. External routes received from external peers.
2. Internal routes received from another location server in the same ITAD.
3. Local routes which are locally configured or received from another routing protocol.

The routes are stored in the Telephony Routing Information Base (TRIB), whose structure is depicted in Figure 1. The TRIB consists of four distinct parts.
1. The Adj-TRIBs-In store routing information that has been learned from other peers. These routes are
the unprocessed routes that are given as input to the decision process. Routes learned from internal location servers and from external location servers are stored in separate Adj-TRIBs-In.
2. The Ext-TRIB stores the preferred route to each destination, as selected by the route selection algorithm.
3. The Loc-TRIB contains the routes selected by applying the local policies to the routes in the internal peers’ Adj-TRIBs-In and Ext-TRIB.
4. The Adj-TRIBs-Out store the routes selected for advertisement to external peers.

[Figure 1: Structure of a TRIP node. Elements shown: Adj-TRIBs-In (internal LSs), Adj-TRIBs-In (external peers), Local Routes, Ext-TRIB, Decision Process, Local TRIB, Adj-TRIBs-Out.]

TRIP uses the same state machine and the same messages as BGP-4. The messages are the OPEN message for establishing peer connection and exchanging capability information, the UPDATE message for exchanging route information, the NOTIFICATION message for informing about error conditions, and finally the KEEPALIVE message for ensuring that the peer node is running.

The routing information is transmitted in attributes of the TRIP messages. The specification includes a set of mandatory well-known attributes. In addition to the well-known and mandatory attributes, optional attributes can be added to allow for expansion. Gateways have many properties that may need to be advertised, so the expected large number of expansion attributes must be handled correctly. An attribute flag indicates how a location server handles an attribute that it does not recognize. The flag can take a combination of the values optional, transitive, dependent, partial and link-state encapsulated.

The specification [9] defines the basic set of attributes shown in Table 1. Additional attributes are defined in separate drafts. An authentication attribute is defined in [12] and a service code attribute is defined in [13].

Table 1: The basic set of TRIP attributes

Name                 Description
Withdrawn routes     List of telephone numbers that are no longer available.
Reachable routes     List of reachable telephone numbers.
Next hop server      The next signaling server on the path towards the destination.
Advertisement path   The path that the route advertisement has traveled.
Routed path          The path that the signaling messages will travel.
Atomic aggregate     Indicates that the signaling may traverse ITADs not listed in the routed path attribute.
Local preference     The intra-domain preference of the location server.
Multi exit disc      The inter-domain preference of the route if several links are used.
Communities          For grouping destinations in groups with similar properties.
ITAD topology        For advertising the ITAD topology to other servers in the same ITAD.
Authentication       Authentication of selected attributes.

The advertisements represent routes toward a gateway through a number of signaling servers. A route must at least contain the following attributes: withdrawn routes, reachable routes, next hop server, advertisement path and routed path. For an advertised route, the withdrawn routes attribute is empty. The reachable routes attribute contains the list of telephone number ranges belonging to this route, and the corresponding application protocol. The next hop server is the next server that signaling messages are sent to. For the final hop, it contains the address of the gateway. The advertisement path is the path that this advertisement has traveled through and the routed path is the path for the signaling. These paths are lists of ITADs. They are mainly used by the policy to select routes containing, or not containing, specific ITADs.

4.2 TRIP for gateways

The TRIP framework [9] leaves open the question of how the location servers learn about the gateways. Usually the register message of SIP has been suggested. However, the draft [14] points out the weaknesses of using the register message and suggests that a subset of TRIP could be used to export routing information from gateways and soft switches to location servers. TRIP manages the needed information transfer and keep-alives more efficiently than other protocols and can better describe the gateway properties. Two new attributes are proposed: circuit capacity for informing about the number of free PSTN circuits, and DSP capacity for informing about the amount of available DSP resources. Because of their dynamic nature, these
are only transmitted to the location server that manages the gateway, and are not propagated.

A more lightweight version of TRIP can be used in the gateways. Since the gateway does not need to learn about other gateways, it operates in send-only mode. Nor does it need to create any call routing databases. This stripped-down version, called TRIP-GW, is still interoperable with normal TRIP nodes. Nevertheless, due to scalability problems it is recommended that location servers peering with gateways run a separate TRIP instance for TRIP-GW peers.

5 Telephone Number Mapping

While TRIP carries routes to destinations on the PSTN, a method for locating terminals on the IP network is still required. This problem is simpler than the gateway location problem, since less information is needed to describe a terminal than to describe a gateway. TRIP could also be used for this purpose, but its complexity is not needed. A simpler directory can be used. It has been suggested that the Domain Name System (DNS) [15], [16] could be used. An IETF working group called ENUM (tElephone NUmber Mapping) was established to specify the number mapping procedures.

DNS is used to map domain names into IP addresses. By constructing a domain name from the E.164 number, the DNS system can be used to map telephone numbers into IP addresses. More generally, the result of an ENUM lookup is a Uniform Resource Identifier (URI) [18], which contains the signaling protocol and the host name. An additional DNS lookup is thus required to map the host name to an IP address. The procedure is described in RFC 2916 [17], the main document specifying the ENUM service.

ENUM uses the domain “e164.arpa” to store the mapping. Numbers are converted to domain names using the scheme defined in [17]. The E.164 number must be in its full form, including the country code. All characters and symbols are removed, so that only the digits remain. Dots are put between the digits. The order of the digits is reversed and the string “.e164.arpa” is added to the end. This procedure will map, for example, the number +358-9-4515303 into the host name “3.0.3.5.1.5.4.9.8.5.3.e164.arpa”.

DNS stores information in different types of records. The Naming Authority Pointer (NAPTR) record [19] is used for identifying available ways to contact a node with a given name. It can also be used to identify what services exist for a specific domain name. The fields of the NAPTR record are shown in Table 2. ENUM defines a new service named “E.164 to URI”, which maps one E.164 number to a list of URIs. The mnemonic of the service is “E2U”. ENUM can be used in conjunction with several application protocols, and can, for example, map a telephone number to an email address.

Table 2: Fields of the NAPTR record

Name          Description
Order         The order in which records are processed if a response includes several records.
Preference    The order in which records are processed if the records have the same order value.
Service       The resolution protocol and resolution service that will be available if the rewrite of the regexp or replacement field is applied.
Flags         Modifiers for how the next DNS lookup is performed.
Regexp        Used for the rewrite rules.
Replacement   Used for the rewrite rules.

Figure 2 shows some example NAPTR records with the E2U service. These records describe a telephone number that is preferably contacted by SIP and secondly by either SMTP or using the “tel” URI scheme [20]. The result of the rewrite of the NAPTR record is a URL, as indicated by the “u” flag. SIP and SMTP use their own resolution methods. In the case of SIP, the result is a SIP URI, which is resolved as described in [1]. In the case of the “tel” scheme, the procedure is restarted with a new E.164 number.

$ORIGIN 3.0.3.5.1.5.4.9.8.5.3.e164.arpa.
  IN NAPTR 10 10 "u" "sip+E2U" "!^.*$!sip:nbeijar@tct.hut.fi!" .
  IN NAPTR 100 10 "u" "mailto+E2U" "!^.*$!mailto:nbeijar@tct.hut.fi!" .
  IN NAPTR 100 10 "u" "tel+E2U" "!^.*$!tel:+35894515303!" .

Figure 2: Example NAPTR records

The draft [21] describes a telephone number directory service based on ENUM. The model is divided into four levels.

The first level is a mapping of the telephone number delegation tree into authorities, to which the number has been delegated. The hierarchical structure of DNS is used, and the mapping may involve one or several DNS queries, which are transparent from the user’s point of view. The delegation maps the hierarchy of the
E.164 number to the DNS hierarchy, using the country codes, area codes and other parts of the number. The first level mapping uses name server (NS) resource records in DNS.

The second level is the delegation from the authority, to which the number has been delegated, to the service registrar. The registrar maintains the set of service records for a given telephone number. Since there may be several service providers for a given number, the registrar's role is to manage service registrations and arbitrate conflicts between service providers. The second level uses the DNAME and CNAME records of DNS to provide redirection from the designated authority to the service registrar. The delegated authority and the service registrar can be the same entity, which is anticipated especially in the early stages of ENUM deployment.

The third level is the set of service records, which indicate what services are available for a specific telephone number. There can be multiple records for the same service, indicating competitive or redundant service providers. NAPTR records are used at the third level. The response to a client’s query is a set of NAPTR records, and the client is responsible for selecting the service to use for the intended action. A URI is obtained by rewriting the query using the rewrite rule. The URI can point to an LDAP directory server, an H.323 gatekeeper, a SIP signaling server or a specific end point address.

Finally, a fourth level can be provided if necessary. This level provides specific attributes for the services that are only known by the provider of the service. Such attributes can be needed for placing calls, routing messages or validating capabilities. The attributes can be obtained through a SIP query to a signaling server or an LDAP query to a directory server. The level is service specific and dynamic, and should therefore be possible with minimal coordination between the directories of competing providers.

6 Scenarios

In this section, some scenarios based on ENUM and TRIP are presented. First the different types of resource records used by ENUM are presented through an example. Then two call setup situations are analyzed. The draft [22] describes how ENUM can be used in different call setup situations where interworking between the PSTN and IP-based networks is necessary. By additionally using the TRIP framework [4] and the ENUM model [17, 21], we construct examples of how the protocols are used. Since no final call setup procedure has been defined, these examples only represent one possible approach for interworking between the networks.

6.1 Call setup using ENUM

To illustrate the use of ENUM, we will study a call setup situation, where the DNS records of Figure 3 are used. The figure shows the DNS configuration for the top level delegations, the national delegations, a service provider and a service registrar.

Sample top level delegations from ITU:
  3.3.e164.arpa    IN NS ns.FR.phone.net.  ;France
  8.5.3.e164.arpa  IN NS ns.FI.phone.net.  ;Finland

Sample national delegations:
  5.4.9.8.5.3.e164.arpa.  IN NS ns.ServiceProviderX.net.

Sample service provider’s configuration:
  1.5.4.9.8.5.3.e164.arpa.  DNAME 1.5.4.9.8.5.2.ns.hut.fi.

Sample service registrar configuration:
  *.1.5.4.9.8.5.2.ns.hut.fi.
    IN NAPTR 100 10 "u" "ldap+E2U" \
      "$!ldap://ldap.hut.fi/cn=\1!" .

Figure 3: Configuration of DNS records

The described service enables an end-user to discover the various methods by which the recipient can be reached. The service is hosted by the recipient’s corporation.

When a call is set up to the telephone number +35894515303, the number is first translated into the domain name “3.0.3.5.1.5.4.9.8.5.3.e164.arpa” according to the ENUM rules. Using the NS records in the top-level and national authorities’ databases, the service provider is located. In this example, the number block +3589451xxxx is delegated to the service provider. The service provider provides a non-terminal redirection pointer to the corporation, which is the service registrar for the number block +35894515xxx. The query for the reachme service returns the NAPTR record. The client then applies the regular expression and gets an LDAP URI of “LDAP://ldap.hut.fi/cn=35894515303”. The client uses LDAP with the reachme schema to determine the available communications methods.

6.2 Calls from PSTN to IP-based network

The call setup scenario for a call from the PSTN to an IP-based network is depicted in Figure 4. The originating customer, who resides on the PSTN, dials an E.164 number. The PSTN operator forwards the call to an appropriate gateway. The selection of gateway depends on several factors. This is a gateway location problem similar to that on the IP network, but there are currently no corresponding solutions like TRIP. The draft [22] leaves the question open.
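
The ENUM name construction of Section 5 and the rewrite step used in Section 6.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an ENUM resolver: no DNS query is made, the function names `enum_domain` and `apply_naptr_regexp` are hypothetical, and the LDAP rewrite pattern is assumed for illustration so that it yields the URI quoted in Section 6.1.

```python
import re


def enum_domain(e164_number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name: keep only digits, reverse them,
    separate them with dots and append the suffix."""
    digits = [ch for ch in e164_number if ch.isdigit()]
    return ".".join(reversed(digits)) + "." + suffix


def apply_naptr_regexp(regexp_field: str, number: str) -> str:
    """Apply a NAPTR regexp field of the form
    <delim>pattern<delim>replacement<delim>flags to the full E.164 number."""
    delim = regexp_field[0]
    _, pattern, replacement, flags = regexp_field.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, replacement, number, count=1, flags=re_flags)


number = "+358-9-4515303"
domain = enum_domain(number)

# The rewrite rule is applied to the number in its full form, "+" plus digits.
full_number = "+" + "".join(ch for ch in number if ch.isdigit())

# Rewrite rule taken from the SIP record of Figure 2.
sip_uri = apply_naptr_regexp("!^.*$!sip:nbeijar@tct.hut.fi!", full_number)

# Assumed rewrite rule (hypothetical pattern) that copies the digits into an LDAP URI.
ldap_uri = apply_naptr_regexp(r"!^\+(.*)$!ldap://ldap.hut.fi/cn=\1!", full_number)

print(domain)
print(sip_uri)
print(ldap_uri)
```

A real client would additionally query DNS for the NAPTR records of `domain` and choose among them by the order and preference fields before applying the selected rewrite rule.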
Figure 4: Call from PSTN to IP-based network (elements: POTS phone, PSTN, gateway, DNS server, SIP server, SIP client, IP-based network; voice path and signaling path)

The gateway, which contains ENUM functionality, looks up the number in DNS. The dialed number is mapped into a URI. If necessary, the country and area codes are added to the number by the gateway. The DNS returns any service records that are associated with that name. The record may contain a SIP URI such as “sip:nbeijar@sipserver.hut.fi”. The gateway makes a DNS query for the host, in this case “sipserver.hut.fi”, to get the IP address of the signaling server. The SIP call can then be established to the user agent of the given user.

6.3 Calls from IP-based network to PSTN

A call setup scenario for a call from an IP-based network to the PSTN is illustrated in Figure 5. The originating customer dials an E.164 number. Since a customer may dial a local number or a national number, the client must be capable of supplying any missing digits. Here the caller uses a SIP client, but any other signaling protocol can be used. The client must contain ENUM functionality. A DNS request is constructed from the dialed digits according to the ENUM specification.

When the client looks up the name in DNS, the DNS returns any NAPTR service records associated with that name. Since the destination is on the PSTN, the query only returns one record containing the URI in the “tel” format. For example, the URI “tel:+35894515303” might be returned. The SIP client initiates an INVITE to the SIP server using the given URI. The SIP server queries a location server with LDAP or any other front end protocol suggested in the TRIP framework [4]. The location server has learned about available gateways using TRIP. The location server returns the IP address of a suitable gateway and the call is routed to the gateway by the SIP server. The gateway then completes the call through the PSTN.

Figure 5: Call from IP-based network to PSTN (elements: SIP client, IP-based network, SIP server, location server (LS), DNS, gateway, PSTN, POTS phone; voice path and signaling path)

7 Solutions for number portability

Number portability requires a mapping between name and address. Generally, numbers can be moved by changing the mapping. The described protocols TRIP and ENUM both provide a mapping between name (telephone number) and address (URI, next step signaling server or gateway).

7.1 ENUM and number portability

ENUM provides a solution for number portability for numbers moving within the IP network and to the PSTN. This allows users to change IP service providers without having to change their telephone number [22]. The directory service solution defined in [21] describes number portability on three of the conceptual levels of ENUM.

If the number is delegated to another authority, the corresponding update is performed in ENUM by changing the name server records to point to the new authority. The number is thus moved to another service provider or to a portability authority.

The service registrar can be reassigned, for example if the customer also wants to change the service registrar in conjunction with the change of service provider. The DNAME or CNAME records are then updated to point to the new service registrar. New service specific NAPTR records are created by the new service registrar.

Most frequently the movement of a number can be accomplished by changing the NAPTR records. This happens when a specific service is moved from one provider to another, for example when switching telephony providers. The service registrar coordinates the deletion of the old records and insertion of records for the new service provider.
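
The NAPTR-level move described in the last paragraph can be sketched in Python as follows. This is an illustrative model only: the registrar's zone is a plain dictionary rather than a DNS server, the names `Naptr` and `port_service` are hypothetical, and `newprovider.example` is an invented provider domain. The initial records are those of Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Naptr:
    order: int
    preference: int
    flags: str
    service: str
    regexp: str


def port_service(zone, name, service, new_records):
    """Delete the old records of one service under `name`, insert the new
    provider's records, and keep the set sorted by order, then preference."""
    kept = [r for r in zone.get(name, []) if r.service != service]
    zone[name] = sorted(kept + list(new_records),
                        key=lambda r: (r.order, r.preference))
    return zone[name]


NAME = "3.0.3.5.1.5.4.9.8.5.3.e164.arpa."

# Toy registrar zone holding the three records of Figure 2.
zone = {
    NAME: [
        Naptr(10, 10, "u", "sip+E2U", "!^.*$!sip:nbeijar@tct.hut.fi!"),
        Naptr(100, 10, "u", "mailto+E2U", "!^.*$!mailto:nbeijar@tct.hut.fi!"),
        Naptr(100, 10, "u", "tel+E2U", "!^.*$!tel:+35894515303!"),
    ]
}

# The customer moves SIP telephony to a hypothetical new provider;
# the mail and "tel" records are left untouched.
ported = port_service(
    zone, NAME, "sip+E2U",
    [Naptr(10, 10, "u", "sip+E2U", "!^.*$!sip:nbeijar@newprovider.example!")],
)

for record in ported:
    print(record.order, record.preference, record.service, record.regexp)
```

The telephone number, and therefore the domain name, stays the same; only the record for the moved service changes, which is why this is the most common form of porting in ENUM.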
7.2 Interworking and number portability

ENUM also solves number portability for hybrid PSTN-IP networks. The draft [22] separates three scenarios:
1. The number moves within the PSTN.
2. The number moves between PSTN and IP.
3. The number moves within the IP network.
For each scenario, the call setup procedure from both the PSTN and the IP network is described.

For calls originating from the PSTN, the first scenario is already handled by today’s number portability. The second scenario is solved by changing the number portability mapping to direct the call to a gateway. The third scenario is solved by ENUM as was described.

For calls established on the IP network, the first scenario may lead to inefficient routing. As the number moves within the PSTN, the most suitable gateway probably changes. As a result, the DNS information must be updated. It is still not defined how this is done, and how it can be automated. Alternatively, if the DNS contains routing addresses (such as LRN) for PSTN destinations, these must be updated to point to the new operator and new gateway. Otherwise calls may be routed to the wrong operator. If the gateway does not have routing addresses available, an IN query must be performed by the gateway or at a later stage.

In scenario 2 for IP originated calls, it would be enough to update the type of URI returned by DNS. A “tel:”-based URI would be replaced by a URI for a SIP or H.323 terminal, or vice versa. The third scenario is also solved by updating ENUM information.

The question of whether to store routing or directory numbers of PSTN terminals in DNS has been discussed in IETF working groups. It is also unclear how to know which terminals reside on the PSTN. In current plans, mainly mappings for IP terminals are stored in DNS. It is assumed that calls to unknown E.164 numbers are routed to the PSTN. This may create unnecessary traffic and gateway blocking due to wrongly dialed numbers.

7.3 CTRIP

As we saw in the first two scenarios, both the mappings in ENUM and in the IN databases on the PSTN must be updated in some cases. The question of how the update is coordinated and how the information is transferred is still unresolved. Moreover, the information of TRIP must be updated in some cases. There is still no solution for coordinating information in ENUM, TRIP and the IN databases.

It seems to be necessary to automate the distribution of numbering information between the network types and between the protocols. To solve the problem, a counterpart to TRIP is being developed in the Networking Laboratory at Helsinki University of Technology. The suggested protocol, named Circuit Telephony Routing Information Protocol (CTRIP), automates the distribution of routing information between operators and network elements. Information is exchanged with other protocols in Numbering Gateways. [24]

8 Conclusions

Although the signaling protocols provide basic mechanisms for locating terminals and gateways, new protocols are required for distributing routing information in order to make a global IP-based telephone network possible. TRIP provides a solution for the gateway location problem by distributing information about gateways and reachable PSTN destinations between location servers. ENUM defines a directory of name to address mappings, which is used to locate terminals on the IP network. Both are based on tried solutions: TRIP is based on BGP-4 and ENUM uses the existing DNS system.

Number portability is generally implemented by modifying the mapping between name and address. In the PSTN the Intelligent Network implements the mapping functions. On the IP network the mappings of ENUM and TRIP can be modified to realize number portability.

When the two network technologies, PSTN and IP, are interconnected, new problems arise. The information in ENUM, TRIP and IN must be kept updated to avoid wrongly or inefficiently routed calls. Currently the update is performed manually, and the process is uncoordinated between service providers. This becomes a burden, especially when number portability causes increased update frequencies. The risk of wrong and incompatible information is also high. An automated approach for synchronizing information between the protocols is needed.

The protocols are still under development. The basic ENUM specification has reached the RFC standards track stage but TRIP is still an Internet draft. No commercial implementations are available. The protocols’ suitability for IP telephony in real networks is still not verified.

Yet, the need for standardized protocols for distributing IP telephony routes is high and the future for TRIP and ENUM seems promising. The signaling protocols alone cannot be used to form a global IP-based network, and the described protocols provide the required solution. However, as we have seen, some new parts are required to make the architecture complete.
References

[1] Handley, M., Schulzrinne, H., Schooler, E., Rosenberg, J.: SIP: Session Initiation Protocol, March 1999, IETF RFC 2543

[2] International Telecommunications Union Telecommunication Standardization Sector, Study group 16: Packet-based multimedia communications systems, February 1998, ITU-T Recommendation H.323

[3] International Telecommunications Union Telecommunication Standardization Sector: The international public telecommunication numbering plan, Geneva, May 1997, ITU-T Recommendation E.164

[4] European Telecommunications Standards Institute: The Procedure for Determining IP Addresses for Routeing Packets on Interconnected IP Networks that support Public Telephony, DTR 4006, 2000

[5] Postel, Jonathan: Simple Mail Transfer Protocol, August 1982, IETF RFC 821

[6] Rosenberg, J., Schulzrinne, H.: A Framework for Telephony Routing over IP, June 2000, IETF RFC 2871

[7] Mensola, Sami: IP-verkon kommunikaatiopalveluiden hallinta, November 1998, Master’s Thesis

[8] Foster, Mark, McGarry, Tom, Yu, James: Number Portability in the GSTN: An Overview, March 2000, draft-foster-e164-gstn-np-00.txt

[9] Rosenberg, J., Salama, H., Squire, M.: Telephony Routing over IP (TRIP), November 2000, draft-ietf-iptel-trip-04.txt

[10] Rekhter, Y., Li, T.: Border Gateway Protocol 4 (BGP-4), March 1995, IETF RFC 1771

[11] Yeong, W., Howes, T., Kille, S.: Lightweight Directory Access Protocol, March 1995, IETF RFC 1777

[12] Rosenberg, J., Salama, H.: Authentication Attribute for TRIP, December 2000, draft-ietf-iptel-trip-authen-00.txt

[13] Peterson, J.: The ServiceCode Attribute for TRIP, November 2000, draft-jfp-trip-servicecodes-00.txt

[14] Rosenberg, J., Salama, H.: Usage of TRIP in Gateways for Exporting Phone Routes, July 2000, draft-rs-trip-gw-01.txt

[15] Mockapetris, P.: Domain names – concepts and facilities, November 1987, IETF RFC 1034

[16] Mockapetris, P.: Domain names – implementation and specification, November 1987, IETF RFC 1035

[17] Faltstrom, P.: E.164 number and DNS, September 2000, IETF RFC 2916

[18] Berners-Lee, T., Fielding, R.T., Masinter, L.: Uniform Resource Identifiers (URI): Generic Syntax, August 1998, IETF RFC 2396

[19] Mealling, M., Daniel, R.: The Naming Authority Pointer (NAPTR) DNS Resource Record, September 2000, IETF RFC 2915

[20] Vaha-Sipila, A.: URLs for Telephone Calls, April 2000, IETF RFC 2806

[21] Brown, A.: ENUM Service Provisioning: Principles of Operation, October 2000, draft-ietf-enum-operation-01.txt

[22] Lind, S.: ENUM Call Flows for VoIP Interworking, November 2000, draft-lind-enum-callflows-01.txt

[23] Rosbotham, Paul: WG4 FAQ, TIPHON temporary document (discussion)

[24] Kantola, Raimo, Costa Requena, Jose, Beijar, Nicklas: Interoperable routing for IN and IP telephony, Computer Networks, Volume 35, Issue 5, April 2001
