Report IP Telephony
TEKNILLINEN KORKEAKOULU
TEKNISKA HÖGSKOLAN
HELSINKI UNIVERSITY OF TECHNOLOGY
Helsinki University of Technology Networking Laboratory
Teknillinen Korkeakoulu Tietoverkkolaboratorio
Espoo 2001
Report 2/2001
Teknillinen Korkeakoulu
Sähkö- ja tietoliikennetekniikan osasto
Tietoverkkolaboratorio
Distributor:
Helsinki University of Technology
Networking Laboratory
P.O.Box 3000
FIN-02015 HUT
Tel. +358-0-451 2461
Fax +358-0-451 2474
ISBN 951-22-5452-2
ISSN 1458-0322
Otamedia Oy
Espoo 2001
Abstract
This report is the result of a postgraduate seminar on IP Telephony (Course S38.130, Spring
2001). The report first gives an overview of IP Telephony and then proceeds to discuss
quality of voice in an IP network, voice coding, the IP Telephony protocols, the service
architecture and the potential service technologies. The protocols include signaling, transport
and routing information protocols. The papers are based on literature study, including
materials of the 3GPP, the students' own research work and the students' own measurements. Of
particular interest are issues in applying IP Telephony in the 3rd generation mobile
networks. These issues are discussed in several papers. Of particular interest is also the
paper on choosing the transport protocol for SIP – it contains ideas that, we believe, have
not been published before.
115 pgs
Preface
This report is the result of a postgraduate seminar on IP Telephony. The papers appearing in
the report were mainly prepared by the students during the Spring term 2001 and presented
in the seminar itself, which took place in Otaniemi, Espoo, Finland on April 6-7, 2001. After
the seminar the students continued to improve their papers based on the comments received, and
finally the qualifying papers were selected by the editor.
Contents
Abstract
Preface
Overall picture of IP telephony
H.323 Protocol Suite
Voice Quality in IP Telephony
Voice in Packets: RTP, RTCP, Header Compression, Playout Algorithms, Terminal Requirements and Implementations
Voice Coding in 3G Networks
Session Initiation Protocol (SIP)
A transport protocol for SIP
Session Initiation Protocol in 3G
SIP Service Architecture
IP TELEPHONY SERVICES IMPLEMENTATION
MASTER SLAVE PROTOCOL
Network dimensioning for voice over IP
TRIP, ENUM and Number Portability
Overall picture of IP telephony
Ilkka Peräläinen
The Emergency response center authority
ilkka.peralainen@112.fi
with the primary goal of ensuring that various vendors' products and services will interoperate.
Today the standardization situation is, however, not at all clear. To overcome the drawbacks of the cumbersome and difficult to implement yet flexible H.323 protocol family, the IETF has created new protocols like the Session initiation protocol SIP and the Media gateway control protocol MGCP, which offer much more functionality than H.323 to VoIP.
SIP is simpler, it scales better and it leverages the existing DNS system instead of creating its own separate hierarchy of name services. By including a client's communication features within the invite request, SIP negotiates these features and the capabilities of the call within a single transaction. The call setup delay can be as low as 100 ms depending on the network.
Thus the biggest question in VoIP today is which one of the standards will prevail. H.323 is now widely accepted and deployed, but many vendors have also announced support for the newcomer protocols. At this transitional stage we will probably see systems which support both protocol families.
This paper is restricted to presenting an overview of the presently prevailing technology, which in any case has laid the foundation of IP telephony, and leaves the deeper presentation and comparison of the new standards to other presentations. The functionalities presented here in the context of H.323 are not all H.323 dependent, but general to VoIP, and thus have to be developed in the newer protocols also.
1.3 Characteristics of IP telephony
The characteristics of IP telephony are quite complex, especially compared to streaming video, where large buffers can be used to compensate for the imperfections of the Internet regarding real time applications.
The main issues of IP telephony to be dealt with include:
• The human ear's perception of echo and delay
• The voice compression and packetization techniques
• Silence suppression and comfort noise generation
• The Internet shortcomings for packetized voice: delay, jitter and packet loss
• The corresponding remedies: buffering, redundancy, time stamps and differentiated services
• Telephone signalling protocols and various call types
2 H.323
H.323 is an ITU-T standard that was first developed for multimedia (voice, video and data) conferencing over LANs and later extended to cover Voice over IP. This multimedia origin is partly the reason for its claimed complexity for mere VoIP. Its first version H.323v1 was accomplished in 1996 and the second version v2 was ready by 1998. It includes both point-to-point and multipoint connections.
H.323 is one of ITU-T's mutually compliant videoconferencing standards. The others are:
• H.310 for broadband ISDN (B-ISDN)
• H.320 for narrowband ISDN
• H.321 for ATM
• H.322 for LANs with guaranteed QoS
• H.324 for public switched telephone networks (PSTN)
Clients of H.323 are able to communicate with clients of the other above mentioned networks.
The H.323 standard does not assume any QoS in the network.
2.1 Components of H.323
2.1.1 Terminal
Terminals are the LAN client endpoints providing real time two way communications. They have to support the H.245, Q.931, Registration Admission Status RAS and Real Time Transport RTP protocols.
An H.323 terminal can communicate with another H.323 terminal, an H.323 gateway or an MCU.
2.1.2 Gateway
An H.323 gateway endpoint is the interface between the Internet and the PSTN or some other network. It communicates in real time mode between H.323 terminals on the IP network and other ITU terminals on a switched network, or with another H.323 gateway. The H.323 gateway is optional and thus is not needed in a homogeneous network.
Gateways perform the translation between differing transmission formats, for example from H.225 to H.221. They can also translate between audio and video codecs. Within a single LAN the gateway is not needed, as the terminals in this case can communicate directly. The communication to other networks is done via gateways using the H.245 and Q.931 protocols.
2.1.3 Gatekeeper
The gatekeeper is the vital - yet optional - central managing point in its zone. When a gatekeeper is used, all endpoints in its zone (terminals, gateways and MCUs) have to be registered with it. It supports the endpoints of its zone by
• Address translation from an alias, such as an email address or a telephone number, to a transport address using a translation table, which it updates by registration messages
• Admission control denying or accepting access based on e.g. call authorization or source and destination addresses.
• Call signalling either by processing the signalling itself or with the endpoints. It may alternatively connect a call signalling channel between the endpoints and let them do the signalling directly.
• Call authorization using the H.225 signalling. The gatekeeper can reject calls due to time period or particular terminal access restrictions
• Bandwidth management, matching the number of calls to the bandwidth available
• Call management, optionally maintaining a list of ongoing H.323 calls for e.g. bandwidth management purposes
• Routing all calls originating or terminating in its zone. This feature enables billing and security. Rerouting to another gateway in case of bandwidth shortage is also included in this option and it helps in developing mobile addressing, call forwarding and voice mail diversion services.
2.1.4 Multipoint Control Unit
The Multipoint Control Unit network endpoint makes it possible for three or more terminals and gateways to participate in a multipoint conference. The MCU consists of a mandatory Multipoint Controller MC and optional Multipoint Processors MP.
The MCU is an independent logical unit, but it can be combined into a terminal, a gateway or a gatekeeper.
The MC determines the common capabilities of the terminals by using the H.245 protocol, while the MP does the multiplexing of audio, video and data streams under the control of the MC.
In addition the MCU can determine whether to unicast or multicast the audio and video streams depending on the capability of the network and the topology of the multipoint conference.
In a centralized multimedia conference each terminal establishes a point-to-point connection with the MCU, which then sends the mixed media streams to each endpoint. In the decentralized model the MC manages the communication compatibility but the terminals multicast and mix the streams.
2.2 The H.323 protocol stack
The audio, video and registration packets of H.323 use the unreliable UDP protocol, while the data and control packets are transported by the reliable TCP protocol.
2.2.1 H.225 Call signalling
The call signalling channel is used to carry the H.225 control messages. In networks where a gatekeeper does not exist, the calls are signalled directly between endpoints using Call signalling transport addresses. In this case it is assumed that the calling party knows the address of the called party.
If there is a gatekeeper in the network, the calling party and the gatekeeper exchange the initial admission message using the gatekeeper's RAS channel transport address.
Call signalling messages can be passed in two ways
• In Gatekeeper routed call signalling the signalling messages are routed between the endpoints via the gatekeeper
• In Direct endpoint call signalling the endpoints exchange the messages directly
After the call signalling is completed the H.245 Control channel is established. When Gatekeeper routed call signalling is used, there are two ways to route the H.245 Control channel. Either the control channel is established directly between the endpoints or via the gatekeeper.
Figure 1: The H.323 protocol stack
2.2.2 H.245 Media and Conference control
After an H.323 call is established, H.245 negotiates and establishes all the media channels carried by RTP/RTCP.
The functions of H.245 are
• Determining master and slave. H.245 appoints an MC, which is in charge of central control in case a call is extended to a conference
• H.245 negotiates compatible settings between the endpoints after the call establishment. Renegotiation can take place anytime during the call
• Media channel control by which separate logical channels for audio, video and data can be opened or closed after the endpoints have agreed on capabilities. Audio and video channels are unidirectional while data channels are bidirectional
• Flow control messages provide feedback in case of communication problems
• Conference control keeps the endpoints mutually aware in a conference situation. A media flow model between the endpoints is also established
2.2.3 H.225 RAS Registration Admission Status
RAS defines communications between the endpoints and the gatekeeper (in case one exists) by unreliable transport i.e. UDP.
RAS communications include
• Gatekeeper discovery is used by the endpoints to find their gatekeeper: endpoints multicast gatekeeper requests to find the gatekeeper transport address
• Endpoint registration is compulsory in case a gatekeeper exists in the network. The gatekeeper must know all the aliases and transport addresses of all the endpoints in its zone
• Endpoint location. A gatekeeper locates an endpoint with a specific transport address to update its address database for example
2.2.4 H.248 Implementors' Guide
The newcomer in the H.323 protocol family is the H.248. It is an enhancement of the centralized master slave type MGCP, Media gateway control protocol. H.248 was developed in co-operation with IETF, which calls it MEGACO.
One reason for the poor interoperability between various implementations of H.323 has been attributed to the lack of an implementation guide. This problem is now being solved by the IETF Megaco project.
2.2.5 RTP
The Real time transport protocol RTP and RTCP are both developed by the IETF. They transport the audio, video and data packets of real time media over packet switched networks. They are annexed in the H.323 protocol.
The main tasks of RTP are packet sequencing for detecting packet losses, adjusting to changing bandwidth conditions by payload identification, frame identification, source identification and intramedia synchronization to compensate for the varying delay jitter of the stream packets.
2.2.6 RTCP
The Real time transport control protocol works in conjunction with the RTP. In an RTP session participants periodically send RTCP packets to obtain information about QoS, session quitting, participant identification (email addresses, telephone numbers etc.) and intermedia synchronization.
2.2.7 Q.931
The main purpose of Q.931 is call signalling and setting up the call.
3 Enhancements to H.323
A major drawback - especially compared to the fast SIP protocol - in the first H.323 version was the long call setup time. One message round trip is needed for
• ARQ/ACF sequence
• Setup connect sequence
• H.245 capabilities exchange
• H.245 master slave procedure
• Setup of each logical channel
In addition a TCP connection has to be set up for the Q.931 and H.245 channels, and each TCP connection also needs an extra round trip for the TCP window synchronization. In a WAN environment one round trip can take 100 ms, which ends up in an unacceptably long setup delay, especially when the gatekeeper routed model is used.
In a congested switched circuit network SCN, where a call cannot be set up, the network local exchange tries to send the caller a 'your call can not be connected' message. No connect is sent because the network informs the caller and not the endpoint.
Voice messages can be sent in version v1 only after media channels have been established by first sending a connect message.
Figure 2: H.323 call sequence
There is an ITU-T Mobility Ad Hoc Group working on mobile H.323 standardization.
3.1 Faster procedures
The Fast connect procedure was invented to overcome the above mentioned deficiencies. Fast connect solves the problems by
• Enabling uni- or bi-directional messages immediately after the Q.931 setup message
• Allowing a basic bi-directional audio only communication immediately after the connect message has been received
• Improving setup delays
An endpoint that uses the Fast connect procedure informs the called party of all the media channels it is prepared to receive or offers to send. This information is carried in the new fastStart parameter of the user to user Setup message. The description includes the codecs used and the receiving ports etc. This allows the early receiving of network prompts and also improves the setup delay.
The Fast connect procedure has been added as a core feature in the ETSI TIPHON project, because it resolves the interworking problem with the SCN.
Fast connect makes it possible to build simple limited capacity terminals that need only a minor part of the H.245 protocol.
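The idea behind the fastStart parameter can be illustrated with a small sketch. It is not the H.323 message encoding: the class names, fields, alias and port numbers below are hypothetical and chosen only for illustration. It assumes that the Setup message lists the channels its sender is prepared to receive (codec and receiving port) and that the receiving side simply accepts the first offer whose codec it supports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChannelOffer:
    """One media channel the sender of Setup is prepared to receive (hypothetical shape)."""
    codec: str          # e.g. "G.723.1"
    receive_port: int   # UDP port on which the sender expects the media


@dataclass(frozen=True)
class SetupMessage:
    """Simplified stand-in for a Setup message carrying fastStart offers."""
    caller_alias: str
    fast_start: List[ChannelOffer]


def accept_fast_start(setup: SetupMessage, supported: List[str]) -> Optional[ChannelOffer]:
    """Return the first offered channel whose codec the receiver supports, else None."""
    for offer in setup.fast_start:
        if offer.codec in supported:
            return offer
    return None


# Illustrative values only.
setup = SetupMessage(
    caller_alias="alice@example.com",
    fast_start=[ChannelOffer("G.723.1", 5004), ChannelOffer("G.711", 5006)],
)
chosen = accept_fast_start(setup, supported=["G.711", "G.729"])
print(chosen)
```

Because the offers travel inside the Setup message itself, the receiver in this sketch can pick a channel without a separate capabilities exchange, which is the effect the text attributes to Fast connect.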
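The round trips itemised in section 3 for an H.323v1 call setup can be added up into a rough delay estimate. The sketch below only restates that count. It assumes that the round trips happen strictly one after another, that a two-way audio call opens two logical channels (one per direction), and that one round trip takes the 100 ms quoted for a WAN. The function names are illustrative.

```python
def v1_setup_round_trips(logical_channels: int, tcp_connections: int = 2) -> int:
    """Count round trips for an H.323v1 call setup as itemised in section 3.

    One round trip each for ARQ/ACF, Setup/Connect, the H.245 capabilities
    exchange and the H.245 master slave procedure, one per logical channel,
    plus one per TCP connection (Q.931 and H.245 by default).
    """
    fixed = 4
    return fixed + logical_channels + tcp_connections


def setup_delay_ms(round_trips: int, rtt_ms: int = 100) -> int:
    """Total delay if every round trip costs rtt_ms and none overlap."""
    return round_trips * rtt_ms


# Assumption: two unidirectional audio channels, opened one after the other.
trips = v1_setup_round_trips(logical_channels=2)
print(trips, setup_delay_ms(trips))  # prints: 8 800
```

Under these assumptions the estimate is 0.8 seconds before any speech can flow, and it grows further if the messages are also relayed through a gatekeeper.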
H.323v2 offers another solution with H.245 tunneling, where H.245 messages are encapsulated in Q.931 messages, reducing the TCP connections to one. When H.245 tunneling is used, the Q.931 channel must remain open for the duration of the call. The Tunneling method can also clear the network generated messages problem and will thus probably replace the Fast connect procedure.
The above described procedures are fixes to H.323v1 problems rather than a simplification of the protocol.
The use of TCP causes at least one unnecessary SYN/ACK round trip. If the Setup message exceeds the maximum transfer unit MTU size, two or more TCP segments must be used. Most TCP implementations are network friendly, mandating a slow start, where the first TCP segment has to be acknowledged before the rest can be sent.
A remedy to this problem is a special H.323v3 mode that will use UDP instead of or simultaneously with TCP signalling.
3.2 Conferencing with H.323
A multipoint control unit MCU masters a multipoint conference. It consists of one multipoint controller MC and optionally one or more multipoint processors MPs.
3.2.1 Multipoint controller MC
The MC decides
• who is allowed to participate
• how new participants are introduced to an ongoing conference
• how the participants synchronize their operation
• who is allowed to broadcast media etc.
A gatekeeper or a terminal possessing sufficient resources can include MC functionality in it and even mix media locally to a limited extent.
3.2.2 Multipoint processor MP
When several participants of a multipoint conference are simultaneously sending audio, video or data, there has to exist a network element that can mix or switch the incoming media streams. The endpoint terminals seldom have the capacity to do this. This mixer/switch element is called the MP.
When video is sent, the MP might choose the picture of the latest speaker. When audio is the content, the MP could sum the voices of the potentially simultaneous speakers.
In a centralized conference the MP mixes and switches the media streams, whereas in a decentralized conference the terminals send their streams directly to all other participating terminals.
3.2.3 H.332
The conference type where all participants retain a full H.245 control connection with the MCU is called 'tightly coupled'. This type is resource intensive and it is obvious that it will not scale to numerous participants.
The solution to large conferences is the H.332. A large conference mostly has a panel of active speakers (5 to 10) and a large, more or less passive audience, of which one speaker at a time can propose a question or a comment to the panelists.
The H.332 keeps 'tightly coupled' conference connections with the panelists and a multicast RTP/RTCP conference with the passive listeners. The listeners have to know especially the codec and the UDP port used. H.332 uses the IETF Session description protocol SDP to encode this information.
Due to the large number of participants in a panel conference, a constraint must be set: the codec should remain stable. No new participant should have the possibility to change the codec as this would mean new negotiations for all the others.
If a listener wants to speak, he must use the regular join procedure to attain the right to speak his mind.
3.3 Directories and numbering
Most home IP telephony users are connected to the Internet by a dial-up link, where the IP address is allocated on demand and is thus not static. In the early stages the users of IP telephony software contacted a server with a preconfigured IP address.
H.323 makes this kind of solution obsolete. A terminal has to register with a gatekeeper using a RAS message, which contains all the necessary information, especially the current IP address needed to contact the terminal by using an alias.
At present the Internet Domain name system DNS is used to resolve the IP address when an alias name is known. The DNS servers make up an addressing network, where an address can be resolved by querying the proper DNS servers top down until one is found which has detailed information on the endpoint in question. In addition to alias/IP address pairs a DNS database has much more information. It can hold information on the gatekeepers of its domain in ras://-type txt records. Once the gatekeeper is found, the caller knows to which transport address he shall send the setup message.
An important issue today for international IP calls from a PSTN network is the lack of a global IP telephony prefix. The solution has to scale to allow a large number of users. The global prefix should tell the IP caller's network that the call to be set up is an IP call and should thus be routed to a home gatekeeper, which knows the location of the called party and can then resolve the phone address to a call signalling address.
It is clear that an IP call should be routed via an IP network avoiding the use of PSTN.
Several proposals have been made to define an IP telephony country code. The standardization process is not yet completed.
For example the use of DNS works well when IP address classes are used, but in the case of the ever
more popular classless interdomain routing CIDR, the reverse address resolution is supported only by few servers and is thus not applicable.
3.4 H.323 security H.235
The aim of H.235 is to provide privacy and authentication to all protocols using H.245, including H.323. Even without H.235, H.323 calls are more difficult to listen to than ordinary telephone lines, which can be wiretapped. To break into an H.323 call you have to implement the codec algorithm.
With H.235 IP telephony becomes much safer than PSTN. The caller can even hide the telephone number of the endpoint it is trying to reach. However, the H.235 is not yet widely deployed.
The first purpose of security was to secure the media channels so that no outsider could listen to the ongoing call. Soon it turned out that users most of all did not want to be charged for calls they did not make and did not want anyone to monitor the called phone numbers. Providers wanted to authorize calls when they were set up, not when media or control channels were established. So the signalling channel also had to be authenticated and secured.
The network elements that have to know the contents of the H.225 and H.245 messages naturally need to be trusted by the endpoints. This authentication can be carried out by Transport layer security TLS or a challenge response exchange using some certificate. H.323 does not specify the contents of the certificates, but provides a way to exchange them and verify the identities of the callers. The identity can be verified by several methods. A time stamp prevents replay attacks.
H.323 does not ensure privacy on the RAS link between an endpoint and a gatekeeper, but it does provide authentication.
The call signalling channel H.225 can be secured by TLS or IPSEC.
The control channel H.245 security method is negotiated in the call signalling channel during the initial set up process before any other H.245 messages are sent. Various methods are accepted to initiate the secure channel.
After the H.245 channel is secured, the terminals negotiate the media channel encryption method by capability exchange. A new capability is defined for each codec and encryption mode pair.
Many encryption algorithms can be utilized e.g. DES, Diffie-Hellman and RSA.
4 Codecs
The implementation of codecs is well developed and does not create any interoperability problems.
Audio codecs    Title and date
G.711           Pulse code modulation of voice frequencies at 56 or 64 kbps (11/88)
G.722           7 kHz audio coding at 64 kbps (11/88)
G.723.1         Dual rate multimedia speech coders at 5.3 and 6.3 kbps (03/96)
G.726           Speech coding at 16, 24, 32 or 40 kbps using ADPCM to encode a G.711 bit stream
G.728           Speech coding at 16 kbps using low-delay code excited linear prediction (09/92)
G.729           Speech coding at 8 kbps using conjugate-structure algebraic-code-excited linear prediction (03/96)
Video codecs    Title and date
H.261           Audiovisual video codecs at p * 64 kbps, where p = 1 – 30 (03/96)
H.263           Low bit rate video coding (02/96)
Table 2: Audio and video codecs used with H.323
The mandatory speech codec is the G.711, which is a popular codec in telephony networks. It is, however, not quite suitable for Internet communication, where the subscriber loop bandwidths are much smaller. Today most H.323 terminals use G.723.1, which is much more efficient, using only approximately one tenth of the G.711 bandwidth. The G.723.1 uses 6.3 kbps bandwidth for continuous speech. When the call is wrapped in IP packets the additional packet headers increase the bandwidth needed to 17 kbps. When silence suppression is used the net bandwidth reduces back to ca. 6.7 kbps, which is ca. one tenth of the bandwidth of G.711. If IP header compression is used the ratio is even greater. The G.728 and G.729 codecs are used for high quality audio, also with very low bandwidth requirements.
Due to the burstiness and bandwidth hunger of video communication, efficient compression and decompression techniques are of utmost importance. H.323 specifies two video codecs, namely the H.261 and the H.263. Other codecs can also be used in case both endpoints support them.
Both the above mentioned video codecs use the discrete cosine transform DCT, H.261 with quantization and motion compensation and H.263 with motion estimation and prediction.
5 Applications and services
The vision of H.323 is interoperability between packet and circuit switched networks. H.323 also promises new value added services to the customers using circuit switched networks. These goals have not yet been achieved. Lower operational costs alone are not a good enough reason to switch to a new technology.
Several Internet telephony service providers ITSPs have met the expectations of good services in North America and Europe, but the global interoperability is still a big problem. Furthermore the features and
quality of service are often inferior to plain old telephone services POTS.
The main reasons for not meeting the quality goals are the poor interoperability of the endpoints, especially gateways, of various vendors and the limited scalability of H.323 communications. [3]
5.1 The architecture of H.323
The architecture of a protocol lays the foundation for the services and applications that can be built on it. The architectural model of H.323 differs essentially from that of the switched PSTN in that while PSTN is centralized, H.323 is decentralized.
The architectural model of H.323 is peer-to-peer, the protocol design is based on the ISO QSIG standard and the services can be built using a multi-tier approach. Use of the QSIG reduces the complexity of interacting with the circuit switched PSTN networks that also use QSIG. The multi-tier model allows complex services to be built of building blocks of simple services.
5.2 H.450 Supplementary services
The supplementary services of H.323 rely on the H.450 series of recommendations. Its key elements are a protocol based on the QSIG, peer-to-peer signalling and a multi-tier approach to building services. [4]
H.323 architecture uses high level Application programming interfaces APIs, so that software vendors do not have to work with low level implementation details; this decreases interoperability risks.
5.2.1 H.450 based on ISO QSIG
The installed base of private telecommunications networks that use QSIG is wide and thus the use of QSIG in H.450 greatly helps the interworking with that base. The migration from PBX networks to H.323 multimedia networks is simplified as well. Simpler gateways are one more advantage of using a common standard, the QSIG.
5.2.2 Based on peer-to-peer signalling
In this respect the H.323 network differs essentially from a circuit switched network. Like in the Internet, in H.450 the intelligence resides in the end and edge devices and the network simply routes the packets. The end device can be a PC or any IP phone and the edge device is a PBX or a consumer gateway at the home location. The state of the calls is also distributed in the end and edge devices.
In the traditional circuit switch model the intelligence and the state of the call reside in the network. The ends and edges are simple phones that run a stimulus-response protocol.
In H.450 new services can be installed in the ends and edges like software packages in a PC. Any software house can develop services to this standard and sell them directly to the end-user. This simplicity and straightforwardness in deployment will certainly stimulate the growth of a service building software industry. It should be remembered that the potential incompatibility of services in end and edge devices will be catered for by the capabilities negotiation process.
In the switch model new services are installed in the switch and may result in upgrades in other parts of the network before they are available for the customer. The switch is moreover not at all so open to packages of 'outside' vendors. Yet it has to be admitted that in the central model the deployment of new services can be simpler. On the other hand the switch is a single point of failure while a software PBX can be embedded in each desktop phone. In this respect the distributed model is more fault tolerant than the switch model.
5.2.3 The multi-tier approach
The modular nature of the multi-tier approach enables the creation of basic services out of building blocks of primitives. Compound services can then be created by utilizing two or more basic services. Finally applications can be built by using compound services.
Simple services are for example:
• Multiple call handling
• Call transfer
• Call forwarding
• Call park and pickup
• Call waiting
• Message waiting indication
• N-way conference
Examples of compound services include:
• Consultation transfer
• Conference out of consultation
Consultation transfer uses call hold, multiple calls and call transfer. Conference out of consultation consists of call hold, multiple calls and n-way conference.
In Consultation transfer the user can perform three operations:
1. Put a multimedia call on hold and retrieve it later
2. Call another person and optionally alternate between the two calls, or
3. Transfer the call
In Conference out of consultation the user also has three options:
1. Put a multimedia call on hold and retrieve it later
2. Call another person and optionally alternate between the two calls
3. Merge the calls in one conference call
6 Application examples
6.1 Call center integration
A call center gateway lets Web surfers with properly equipped multimedia PCs (typically with the right browser plug-in) connect to an existing Automatic Call Distributor (ACD) with Internet phone technology. This illustrates one of the major advantages of IP
telephony, its ability to combine voice and data on a single line.
The main advantage that IP telephony brings to call centers is skill-based routing. An incoming call can be directed to a call taker who, for example, speaks the same language as the caller or is a specialist in the field in which the caller wants help. The call can also be directed to a personal adviser.
Emergency services provide another example of an architectural conflict since, for example, IP addresses have no correlation with geographic location.
6.2 IEPS
As another example of an application of IP telephony, in the broad sense of the term application, this paper presents the basic requirements that IP telephony should take into account to support the International emergency preparedness scheme (IEPS).
The ITU-T recommendation E.106 for emergency communications was first defined for PSTN and ISDN networks, but it was soon realised that this scheme had to be extended to cope with the next generation networks, i.e. the Internet and especially IP telephony. In this regard the ITU-T Study group 16 is developing a new recommendation for an International emergency multimedia service (IEMS) as an extension to E.106, to provide for enhanced emergency services over Internet based networks in the future.
The IEPS is needed when there is a crisis situation which causes abnormal telecommunication requirements for governmental, military and civil authorities and other essential users of the PSTN. It allows authorized users to access the International telephone service while the service is restricted due to damage, congestion and/or other faults. [6]
6.2.1 Overall functional requirements
The primary goal of IEPS is to support crisis management arrangements by increasing the ability of the essential users to communicate via the PSTN, ISDN, Public land mobile networks (PLMN) or IP telephony. [6]
The basic requirements include:
• International and national preference schemes are independent yet compatible: one could be activated when the other does not need to be activated
• National preference scheme users may not get access to the international scheme, but authorized users of the international scheme have to be able to use the national preference scheme
• In some national schemes IEPS features may be enabled permanently
• Calls originated by IEPS users should be given priority in the networks involved when IEPS is enabled
• There must not be any conflict between preference for a call from an essential user and a call priority for a non-essential user to an emergency service
• If call restrictions to certain specific destinations (countries or areas) have been set when IEPS is activated, these restrictions should not apply to IEPS users
• IEPS calls should be marked from end to end
6.2.2 Established Telecommunication services
The essential features of E.106 for the IEPS in the well established circuit switched PSTN and ISDN networks include
• Priority dial tone
• Priority call setup, including priority queuing schemes
• Exemption from restrictive management controls
In the United States the Government emergency telecommunications service (GETS) uses the High probability of completion (HPC) in SS7 signalling for marking emergency calls. It should be noted that HPC does not include pre-emption of existing calls. In the U.S. alternate carrier routing (ACR) is used in the GETS in case some inter exchange carrier is not available. GETS uses a non-geographic toll free universal access number.
Some countries use IEPS access lines where all calls have a priority, while in some other countries priority is applied on a per call basis only.
6.2.3 Next generation networks
The IEPS requirements of E.106 should also be fulfilled by newly emerging next generation networks, especially the Internet. The packet switching technology provides a clearly different operational environment compared to the traditional circuit switched networks. Thus new aspects have to be considered, but there also emerges the possibility of new innovative services based on the new features of packet switched networks.
Examples of the new features are
• Quality, grade and class of service
• The flexibility of the emerging object oriented and distributed technologies
For IP telephony an IEPS indicator similar to that of the HPC has to be defined, but the IP indicator has to be applied throughout the call.
There is extensive work going on in the international, national and regional standardization bodies to define the next generation networks. It is of utmost importance that they now start the work on the adaptation of IEPS. [7]
6.2.4 Quality
The quality of video in the Internet is poor and the audio quality is not high either, especially compared to the PSTN. Because H.323 is a higher layer protocol, it can utilize the quality mechanisms of lower level protocols like IntServ/RSVP and DiffServ. In fact the development of QoS in the Internet is a result of the introduction of multimedia services to the Internet.
QoS features have been built into all modern LAN equipment, although some critics say that enough ever cheaper bandwidth will cater for the new multimedia services and QoS will not be necessary.
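As a small illustration of how an application could lean on DiffServ, the sketch below computes the IP header byte that carries a DiffServ code point and, optionally, asks the IP stack to mark a socket with it. The choice of code point 46 (Expedited Forwarding) for voice and the function names are assumptions made for this example; they are not prescribed by H.323, and whether a network honours the marking depends on how that network is configured.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, assumed here as the voice class


def dscp_to_tos(dscp: int) -> int:
    """Return the former TOS byte carrying a 6-bit DiffServ code point."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    # The DSCP occupies the upper six bits; the lower two bits are left at zero.
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int = EF_DSCP) -> None:
    """Ask the IP stack to mark outgoing packets (on platforms exposing IP_TOS)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

A media sender would call `mark_socket` once on its UDP socket before sending RTP packets; the conversion function alone is enough to check which byte value a given class maps to.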
The codecs in use today squeeze an IP call to only about one tenth of the bandwidth of a traditional PSTN call, and better codecs are on their way.
7 Market trends
7.1 The operator market changes
The new IP telephony technology has a remarkable influence on the telecommunications market. First, the liberalisation of telephone market legislation in the European Union has given birth to new companies, both operators and service providers. The ease of building new telephony services on the new IP telephony technology contributes to this trend.
In the second phase we will very probably see a consolidation period in which newcomers with unsustainable cash flows merge with profitable players or leave the hardening business campaign by going under.
Next a restructuring of the market will likely see
necessary investments that a new technology, however flexible and promising it is, inevitably requires.
Figure 3. [9]
7.3 The revenue share
As carriers move to packet networks, voice and new applications pay the bills. As Figure 4 shows, the relative amount of pure data communication in the telecommunication networks compared to telephony will grow very fast in the next few years. Yet the overwhelming majority of the revenue will continue to come from telephone calls and services. [10]
Figure 4: Voice and data (two panels, bits and revenue, for voice and data, 1997 to 2003; Natural MicroSystems, source: Probe Research, 2001)
IETF: Internet Engineering Task Force
IMTC: International multimedia teleconferencing consortium
IntServ: Integrated services
IP: Internet protocol
ISDN: Integrated services digital network
ISO: International Organization for Standardization
ITSP: Internet telephony service provider
ITU-T: International Telecommunication Union – Telecommunications Sector
MC: Multipoint controller
MCU: Multipoint control unit
MP: Multipoint processor
MCS: Multipoint communication service
MGCP: Media gateway control protocol
OSI: Open systems interconnection
PBX: Private branch exchange
PLMN: Public land mobile network
POTS: Plain old telephone services
PSTN: Public switched telephone network
QSIG: D-channel signalling protocol at Q reference point for PBX networking
RAS: Registration, admission, status
RFC: Request for comments
RSVP: Resource reservation protocol
RTCP: Real-time transport control protocol
RTP: Real-time transport protocol
SCN: Switched circuit network
SDP: Session description protocol
SIP: Session initiation protocol
TCP: Transmission control protocol
TIPHON: Telecommunications and Internet protocol harmonization over networks
TLS: Transport layer security
UDP: User datagram protocol
References
[1] Arora, Rakesh: Voice over IP: Protocols and standards, Network Magazine, 23rd of November 1999, http://www.cis.ohio-state.edu/~jain/cis788-99/voip_protocols/index.html
[2] Hersent Olivier, Gurle David, Petit Jean-Pierre: IP telephony, Addison Wesley Britain 2000, ISBN 0-201-61910-5
[3] Karim, Asim: H.323 and associated protocols, 26th of November 1999, http://www.cis.ohio-state.edu/~jain/cis788-99/h323/index.html
[4] Kumar, Vincent: Supplementary services in the H.323 IP multimedia telephony network, IEEE Communications magazine, July 1999
[5] Carlberg Ken et al: Framework for supporting IEPS in IP telephony, <draft-carlberg-ieps-framework-00.txt>, Network working group, November 2000
[6] Network working group, IETF: Description of an International emergency preparedness scheme (IEPS), <draft-itu-t-ieps-description-00.txt>, 20th February 2001
[7] Folts Hal: Functional requirements for priority services to support critical communications, TIPHON 17, Temporary document 116, Document for discussion
[8] White paper on IP telephony, A road map to supporting GETS in IP networks, Prepared under contract no. DCA 100-99-F-4413 Data item no. C002, Science applications international corporation, 27th of April 2000
[9] Indovino, Lisa, deltathree: Show me the VoIP deployment - European service providers, Spring 2001 Voice on the net conference presentation
[10] Chase, Jack, Natural MicroSystems: Convergence of GSM, IP and VoIP, Spring 2001 Voice on the net conference presentation
[11] Dahlgren, Paul, Telia international: Show me the VoIP deployments, Spring 2001 Voice on the net conference presentation
H.323 Protocol Suite
Guoyou He
Helsinki University of Technology
ghe@cc.hut.fi
used in a variety of mechanisms, which include audio and video (video telephony); audio only (IP telephony); audio, video and data; video and data; and multipoint multimedia communications.
The H.323 standard is part of the H.32X family of recommendations specified by ITU-T. The other recommendations of the family, which define multimedia communication services over different networks, are shown in Table 2 [11].
3.4 H.323 Version 4
Many new enhancements have been introduced in H.323 Version 4, which was approved on November 17, 2000. It contains enhancements in a number of important areas, including reliability, scalability, and flexibility. New features help facilitate more scalable Gateway and MCU solutions to meet growing market requirements [7][10].
[4][12]. An example of a Gateway that connects an H.323 system to the PSTN is given in Figure 2 [3].
streams in accordance with H.261 QCIF. Options are available, but they must use the H.261 or H.263 specifications. The coding algorithm of H.263 is similar to that used by H.261, but with some improvements and changes that improve performance and error recovery. H.263 supports five resolutions: QCIF, CIF, SQCIF (Sub-QCIF), 4CIF, and 16CIF. Data support is through T.120, and the various control, signaling, and maintenance operations are provided by H.245, Q.931, and the Gatekeeper specification.
The audio and video packets must be encapsulated into the Real-time Transport Protocol (RTP) and carried on a UDP socket pair between the sender and the receiver. The Real-Time Control Protocol (RTCP) is used to assess the quality of the sessions and connections as well as to provide feedback information among the communicating parties. The data and support packets can operate over TCP or UDP [4][13].
[Figure 6 (diagram): audio codecs G.711, G.722, G.723, G.728 and G.729, and video codecs H.261 and H.263, over RTP/RTCP; data interface T.120; system control consisting of call control (H.225), RAS control (H.225) and H.245 control; transport over UDP, or UDP or TCP; IP; layers L_2 and L_1 vary.]
Figure 6: H.323 protocol stack
6 Call Signaling
Call signaling comprises the messages and procedures used to establish a call, request changes in the bandwidth of the call, get the status of the endpoints in the call, and disconnect the call [4].
6.1 Addresses
In an H.323 system, each entity has at least one Network Address (e.g. an IP address). This address uniquely
H.323 entity may have several Transport layer Service Access Point (TSAP) identifiers. These TSAP
6.2 Registration, Admission and Status (RAS)
The RAS channel is used between H.323 endpoints and gatekeepers for gatekeeper discovery, endpoint registration, endpoint location, and admission control. The RAS messages are carried on a RAS channel that is unreliable. Hence, RAS message exchange may be associated with timeouts and retry counts.
Gatekeeper discovery
Gatekeeper discovery is the process an endpoint uses to determine which Gatekeeper to register with. Gatekeeper discovery can be done statically or dynamically. In static discovery, the endpoint knows the transport address of its gatekeeper a priori. In the dynamic method of gatekeeper discovery, the endpoint multicasts a GRQ message to the gatekeepers' discovery multicast address. One or more gatekeepers may respond with a GCF message [4].
[Figure 7 (diagram): reply message GCF/GRJ.]
Figure 7: H.323 - Gatekeeper discovery
Endpoint registration
Endpoint registration is the process by which an endpoint joins a Zone and informs the Gatekeeper of its Transport Address and alias address. All endpoints register with a gatekeeper as part of their configuration process. Registration occurs before any calls are attempted and is repeated periodically as necessary [4] (see Figure 8).
[Figure 8 (diagram): messages between Endpoint and Gatekeeper: RRQ, RCF/RRJ; endpoint-initiated unregister request: URQ, UCF/URJ; gatekeeper-initiated unregister request: UCF.]
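Because the RAS channel is unreliable, an endpoint has to repeat a request that receives no answer. The sketch below illustrates the timeout and retry idea for gatekeeper discovery and registration. The `send` callable is a hypothetical transport hook, the simulated channel replaces a real gatekeeper, and the default timeout and retry count are assumptions for the example rather than values taken from the recommendation.

```python
from typing import Callable, Optional


def ras_request(send: Callable[[str, float], Optional[str]],
                message: str,
                timeout_s: float = 3.0,
                retries: int = 2) -> Optional[str]:
    """Send a RAS message, repeating it when no reply arrives in time.

    `send` is a hypothetical transport hook: it transmits the message and
    returns the reply, or None if nothing arrived within `timeout_s`.
    """
    for _attempt in range(1 + retries):
        reply = send(message, timeout_s)
        if reply is not None:
            return reply
    return None  # give up after the initial attempt plus all retries


def lossy_channel(drop_first: int) -> Callable[[str, float], Optional[str]]:
    """Simulated gatekeeper that ignores the first `drop_first` requests."""
    state = {"seen": 0}

    def send(message: str, timeout_s: float) -> Optional[str]:
        state["seen"] += 1
        if state["seen"] <= drop_first:
            return None  # the request or its reply was lost
        # Confirm the two requests discussed above; reject anything else.
        return {"GRQ": "GCF", "RRQ": "RCF"}.get(message, "reject")

    return send
```

With one lost datagram, `ras_request(lossy_channel(1), "GRQ")` still obtains a confirmation on the second attempt; if every attempt is lost, the function returns `None` and the endpoint can fall back to another gatekeeper or report the failure.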
Endpoint location
Endpoint location is the process by which the transport address of an endpoint is determined from its alias name or E.164 address [4].
Other Controls
The RAS channel is also used for other controls, such as admission control, to restrict the entry of an endpoint into a zone; bandwidth change, to modify the call bandwidth during a call; and disengagement control, to disassociate an endpoint from a gatekeeper and its zone [4].
6.3 H.225 Call Signaling and H.245 Control Signaling
H.225 Call signaling
The H.225 call signaling is used to set up connections between H.323 endpoints, over which the real-time data can be transported. The call signaling channel is a reliable channel, which is used to carry H.225 call control messages (H.225 adopted a subset of Q.931 messages and elements). For example, H.225 protocol messages are carried over TCP in an IP based H.323 network [4].
In networks that do not contain a Gatekeeper, call signaling messages are passed directly between the calling and called endpoints. This is called direct call signaling. In networks that do contain a Gatekeeper, the H.225 messages are exchanged either directly between the endpoints or between the endpoints after being routed through the gatekeeper. The latter is called gatekeeper-routed signaling. The method chosen is decided by the gatekeeper during RAS-admission
call-signaling channel from one endpoint and routes them to the other endpoint on the call-signaling
signaling on the call-signaling channel (see Figure 10) [4].
[Figure 10 (diagram): Endpoint 1, Endpoint 2 and the Gatekeeper cloud; messages 1 ARQ, 2 ACF/ARJ, 3 Setup, 4 ARQ, 5 ACF/ARJ, 6 Connect; legend: call signalling channel messages and RAS channel messages.]
Figure 10: H.323-Direct endpoint call signaling
H.245 Control Signaling
When Gatekeeper routed call signaling is used, there are two methods to route the H.245 channel. In the first method, the H.245 control channel is established directly between the endpoints (see Figure 11). In the second method, the H.245 control channel is routed between the endpoints through the Gatekeeper (see Figure 12). This method allows the Gatekeeper to redirect the H.245 Control channel to an MC when an ad hoc multipoint conference switches from a point-to-point conference to a multipoint conference. This choice is made by the Gatekeeper.
When direct endpoint call signaling is used, the H.245 control channel can only be connected directly between the endpoints [4].
[Figure (diagram): Gatekeeper cloud; legend: call signalling channel messages and RAS channel messages.]
7 Connection Procedures
The connection procedures of H.323 system communication consist of the following steps: call setup, initial communication and capability exchange, establishment of audiovisual communication, call services, and call termination. This section uses an example network, in which two endpoints are connected to a gatekeeper, to illustrate all of the connection steps.
7.1 Step A: Call setup
Call setup can take place in all of the following cases:
all combinations of Direct Routed Call signaling (DRC) and Gatekeeper Routed Call signaling (GRC), with the same or different Gatekeepers;
Fast connect procedures;
call forwarding using facility (restarts the procedure);
and setting up conferences [6].
Figure 13 illustrates the call setup process with the example of both endpoints registered to the same Gatekeeper. It assumes direct call signaling [12].
[Figure 13 (diagram): message sequence between Endpoint 1, Gatekeeper and Endpoint 2: ARQ (1), ACF/ARJ (2), Setup (3), Call proceeding (4), ARQ (5), ACF/ARJ (6), Alerting (7), Connect (8); legend: RAS messages.]
[Figure 14 (diagram): message sequence between Endpoint 1, Gatekeeper and Endpoint 2: TerminalCapabilitySet (9), TerminalCapabilitySetAck (10), TerminalCapabilitySet (11), TerminalCapabilitySetAck (12), OpenLogicalChannel (13), OpenLogicalChannelAck (14), OpenLogicalChannel (15), OpenLogicalChannelAck (16); legend: H.245 message.]
Figure 14: H.323 Control Signaling Flows
7.3 Step C: Establishment of audiovisual communication
Following the exchange of capabilities, master-slave determination, and opening of the logical channels for the various information streams, the audio and video streams, which are transmitted in the logical channels set up in H.245, are transported over dynamic Transport layer Service Access Point (TSAP) identifiers using an unreliable protocol. Data communications, which are transmitted in the logical channels set up in H.245, are transported using a reliable protocol. Figure 15 is an example illustrating the H.323 media stream and media control flows [4][11].
Bandwidth changes
Call bandwidth is initially established and approved by the Gatekeeper during the admission exchange. At any
[Figure (diagram): message sequence between Endpoint 1, Gatekeeper and Endpoint 2: BRQ (21), BCF/BRJ (22), CloseLogicalChannel (23), OpenLogicalChannel (24), BRQ (25), BCF/BRJ (26), OpenLogicalChAck (27); legend: RAS messages and H.245 messages.]
[Figure 17 (diagram): Endpoint 1, Endpoint 2, Endpoint 3 and MC.]
Figure 17: Direct Call Signaling model
[Figure (diagram): Endpoint 1, Endpoint 3, MC and Gatekeeper.]
Multicast cascading
In multicast cascading, once a call is established between entities containing MCs and the H.245 Control Channel is opened, the active MC (determined by the master/slave procedure) may activate the MC in a connected entity. Once the cascade conference is established, either the master or the slave MCs may invite other endpoints into the conference. There is only one master MC in a conference. A slave MC can only be cascaded to a master MC.
H.450 Supplementary services
The H.450 supplementary services are optional in H.323 systems. These services include call forward, call hold, call waiting, message waiting indication, and name identification.
Figure 19: H.323 Call Release
8 New Feature of H.323 Version 4
H.323 Version 4 was approved on November 17, 2000. It contains enhancements in a number of important areas, including scalability, reliability, flexibility, services, "must have" features, and a generic extensibility framework [7][10][1].
8.1 Scalability, Reliability, and Flexibility
H.323 Version 4 enhances the scalability of H.323 systems in areas including Gateway Decomposition with H.248, Additive Registrations, Alternate Gatekeepers, and Endpoint Capacity Reporting.
Gateway Decomposition
Traditional Gateways were designed so that both media and call control were handled in the same box. Recognizing the need to build larger, more scalable gateway solutions for carriers, the ITU-T worked jointly with the IETF to produce Recommendation H.248, which describes the protocol between the Media Gateway Controller (MGC) and the Media Gateway (MG). H.323 version 4 supports the decomposition of a Gateway into a Media Gateway Controller (MGC) and a Media Gateway (MG).
The decomposed Gateway separates the MGC function and the MG function. Multiple MGs may exist, which allows the decomposed Gateway to scale to much more capacity than a composite Gateway. The communication between the MGC and the MGs is done through H.248 (see Figure 20 [11]).
Figure 20: Decomposition Gateway
Alternate Gatekeepers
The architecture of alternate Gatekeepers is shown in Figure 21 [11]. By using Alternate Gatekeepers, endpoints can continue functioning when the communication between the endpoints and one or more Gatekeepers fails. This increases reliability and helps keep calls from being lost.
Figure 21: Alternate Gatekeepers
Endpoint Capacity reporting
H.323 endpoints report capacity to Gatekeepers. By utilizing endpoint capacity reporting, Gatekeepers may select the endpoint that is best capable of handling the call. This is very useful for large scale deployments of Gateways and greatly increases availability (see Figure 22 [11]).
[Figure 22 (diagram) notes: the GK selects the GW with the most capacity; H.323 terminals report capacity in absolute terms, not in percentages.]
8.2 Services
One of the most important features of a VoIP protocol is its ability to provide services to the service provider and end users. H.323 has a rich set of mechanisms to provide supplementary services. Version 4 introduces a few more supplementary services to strengthen the protocol in this regard. These services mainly include HTTP-based Service Control, Stimulus-based Control, and Call completion [1][10][7].
HTTP-based Service Control
H.323 version 4 specifies a means of providing HTTP-based control for H.323 devices. With HTTP-based control, service providers have the ability to display web pages to the user with meaningful content that ties into the H.323 systems. In essence, it is a third party call control mechanism that utilizes a separate HTTP connection for control.
Stimulus-based Control
H.323 version 4 provides a new "stimulus-based" control mechanism. With this mechanism, an H.323 device may communicate with a feature server to provide the user with various services. The H.323 endpoint may possess some intelligence, but some intelligence may reside only in the feature server or in multiple feature servers. The features may be numerous. Any new features may be added to the feature servers without the delay of a standardization procedure.
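The selection rule noted for Endpoint Capacity Reporting in section 8.1 (the Gatekeeper picks the Gateway with the most capacity, reported in absolute terms) can be sketched as follows. The `GatewayReport` record, its field names and the gateway names are hypothetical; the actual RAS message fields that carry capacity are not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GatewayReport:
    name: str             # hypothetical gateway identifier
    available_calls: int  # capacity reported in absolute terms


def select_gateway(reports: Iterable[GatewayReport]) -> Optional[GatewayReport]:
    """Pick the gateway reporting the most remaining capacity, if any has room."""
    candidates = [r for r in reports if r.available_calls > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.available_calls)
```

Reporting absolute numbers rather than percentages matters for this rule: a large Gateway that is 90 percent busy may still have more free channels than a small one that is idle.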
Call completion
This is a new H.450 supplementary service, which provides a standard means of allowing calls to complete when the user is either busy or there is no answer.
8.3 “Must Have” Features
The features included are listed below [7][10][5]:
Fax Enhancements
Version 4 of H.323 allows an endpoint to initiate a voice call and then switch to fax at some point. It allows an IP-based fax device to operate in a similar manner as today's PSTN fax devices. Version 4 was also enhanced to utilize TCP for carrying fax data. Previously, UDP was the only real option for carrying fax data.
Usage Information Reporting
To help provide accurate billing information, the Gatekeeper can request the endpoint to report usage information to the Gatekeeper at various times: at the beginning of the call, during the call, and at the end of the call.
Caller Identification
H.323 Version 4 contains complete information for providing caller identification services with H.323.
Tones and Announcements
H.323 version 4 details the procedure for indicating the presence of in-band tones and announcements. Such tones and announcements are often heard when the destination number is incorrect or unreachable.
In addition to in-band tones and announcements, the Gatekeeper may signal an endpoint to play specific announcements at various times: pre-call, mid-call, or end-call.
Alias Mapping
When routing calls, a telephone number in the IP world may not be sufficient for proper routing into the SCN. In addition, a service provider might like to use the same Gateways to provide Voice Virtual Private Networks, but would need some intelligence in a device to perform proper mapping. With Version 4, a Gateway, for example, can indicate that it can perform alias mapping at either the ingress or the egress side of a call. This will reduce the number of malformed numbers, as well as provide a means for providing Voice Virtual Private Network (VVPN) services.
Tunneling other protocols
H.323 is often used to inter-work between two circuit networks. To provide better inter-working, Version 4 provides a mechanism whereby QSIG (signaling between the Q reference points) and ISUP may be tunneled essentially without translation. H.323 may act as a transparent tunnel for those non-H.323 signaling protocols (see Figure 23 [5]).
[Figure 23 (diagram): labels include Composite Gateway, MGC, MG, QSIG Signalling and Media Flow.]
Figure 23: H.323 – QSIG tunneling example
H.323 specific URL
Version 4 introduced the URL scheme "h323". The H.323 URL allows entities to access users and services in a consistent manner. The form of the H.323 URL is "h323:user@host", where "user" is a user or service and "host" might be the Gatekeeper that can translate the URL into a call signaling address.
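To make the URL form concrete, the following sketch splits a URL of the simple form "h323:user@host" into its two parts. It handles only the form named in the text; any further parts that the full URL syntax may allow are outside its scope, and the class and function names are chosen for this example.

```python
from typing import NamedTuple


class H323Url(NamedTuple):
    user: str
    host: str


def parse_h323_url(url: str) -> H323Url:
    """Split a URL of the simple form h323:user@host into its two parts."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "h323":
        raise ValueError("not an h323 URL")
    user, at, host = rest.partition("@")
    if not at or not user or not host:
        raise ValueError("expected h323:user@host")
    return H323Url(user, host)
```

An endpoint could then hand the `host` part to its address resolution step (for example, treating it as the Gatekeeper to query) and pass the `user` part as the alias to be translated.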
Better Bandwidth Management (multicast)
Prior to H.323 Version 4, an endpoint could request much more bandwidth than it actually needed and thus waste network resources. With Version 4, it is mandatory that an endpoint make bandwidth requests with a lower value if the endpoint is indeed using less bandwidth than it had initially indicated in the ARQ.
In addition, managing bandwidth for multicast sessions has been nearly impossible: unless the Gatekeeper routed the H.245 signaling and carefully monitored the media channels that were opened, it could not determine whether two endpoints that request bandwidth are actually requesting bandwidth for a multicast session or a unicast session. This becomes a much bigger issue when many people are participating in a multipoint multicast conference. With Version 4, specific details about the media channels are conveyed to the Gatekeeper in Information Request Response (IRR) messages (if the Gatekeeper requests them), so that the Gatekeeper can better control bandwidth utilization.
Call Credit-related capabilities
H.323 v4 provides the means of communicating available funds, or for the Gateway to control early call termination based on available funds, for prepaid IP telephony. H.323 v4 adds these features to the RAS protocol.
Multiplexing audio and video
One weakness with the current usage of RTP is the difficulty in synchronizing the separate audio and video streams. Version 4 now includes an optional procedure which allows both video and audio to be multiplexed in a single stream. This will assist endpoints in synchronizing video and audio.
DTMF Relay via RTP
H.323 version 4 allows an endpoint to utilize RFC 2833, “RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals”, to send and receive DTMF digits.
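As an illustration of the DTMF relay, the sketch below builds the four-byte telephone-event payload that RFC 2833 defines for one DTMF digit. Only the payload is produced; the surrounding RTP header, the payload type negotiation and the retransmission of end packets are left out, and the default volume value is an assumption for the example.

```python
import struct

# Event codes for DTMF digits as used by RFC 2833 telephone-events.
DTMF_EVENTS = {ch: i for i, ch in enumerate("0123456789*#ABCD")}


def encode_dtmf_event(digit: str, duration: int, volume: int = 10,
                      end: bool = False) -> bytes:
    """Build the 4-byte telephone-event payload for one DTMF digit.

    Layout: event (8 bits), E bit, R bit (zero), volume (6 bits),
    duration (16 bits, in RTP timestamp units).
    """
    event = DTMF_EVENTS[digit.upper()]
    if not 0 <= volume <= 63 or not 0 <= duration <= 0xFFFF:
        raise ValueError("volume or duration out of range")
    flags_volume = (0x80 if end else 0) | volume
    return struct.pack("!BBH", event, flags_volume, duration)
```

A sender would typically emit several such payloads for one key press, with growing duration values, and set `end=True` on the final ones so that the receiver knows the digit has finished.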
8.4 Further Features under development for H.323
ITU-T is working, or is going to work, on some further enhanced features of H.323, which include the Generic Extensibility Framework, Protocol Inter-working, Mobility, and Robustness [1][10].
Generic Extensibility Framework
The Generic Extensibility Framework (GEF) introduces new means by which H.323 may be further enhanced or extended with optional features without requiring changes to the current ASN.1 syntax.
Inter-working or integrating with other protocols
Inter-working or integration with newly developed protocols may need to be developed. These protocols include SIP, H.248/Megaco, and Bearer Independent Call Control (BICC).
SIP is gaining in popularity as a VoIP protocol. H.248/Megaco may find its way into many “media gateway” devices, ranging from residential gateways to large-scale service provider gateways. The Bearer Independent Call Control (BICC) protocol will compete with both H.323 and SIP for a place in the service provider network. Making H.323 work with them is also important.
Mobility
Mobility includes terminal mobility, user mobility, and service mobility. To implement mobility in H.323, the functions of mobility management need to be defined, which include the Home Location Function (HLF), Visitor Location Function (VLF), Authentication Function (AuF), and Inter-working Function (IWF) (see Figure 24 [1]).
Figure 24: H.323 - Mobility
Robustness
Robustness is under development; it requires refining the architecture for recovery from crashes. Currently two architectures are proposed: one for small scale systems and one for large scale systems.
In small scale systems, the architecture makes each element responsible for detecting failures of the others. If one element in the system fails, the others can go to the backup element. Some state information of the failed element then needs to be provided. For large scale systems, the architecture is very complex and still needs to be specified.
9 Comments on H.323
H.323 is a very complex system with all kinds of features for multimedia communications, but not every part of H.323 has to be implemented when building a powerful and useful system. Multimedia over IP, in itself, has a certain amount of complexity associated with it. As a result, a system that implements inter-working between different multimedia systems with various features and services is inevitably complex. The complexity in an H.323 system exists for a reason, and the reason may become even more evident as video, audio, and data conferencing become more prevalent [10].
H.323 allows the use of multiple codecs. In the systems, there is a good reason for using each of the codecs.
Gatekeepers are optional in an H.323 system. They provide a consistent means for H.323 endpoints to perform address resolution, and may perform inter-working between simple H.323 (set devices) and more protocol-complete H.323 entities. Gatekeepers can act as a platform from which powerful new IP-based services can be built and provided.
H.323 is scalable. Service providers can deploy H.323 networks on a small scale or a large scale depending on the expected features and services.
H.323 is a proven technology used in large networks. It has excellent integration with the PSTN.
Multimedia conferencing shows the real potential of H.323 in multimedia communication.
Many equipment manufacturers, software vendors, and service providers have built products and services supporting H.323. This greatly supports the success of H.323.
With new technologies constantly arriving, for example BICC, H.323 is under great pressure to keep its place in the service provider network.
10 Conclusions
As just presented, H.323 is organized around four major facilities: (a) terminals, (b) Gateways (which can perform protocol conversion), (c) Gatekeeper (bandwidth manager), and (d) multipoint control units
(MCUs), responsible for multicasting. The H.323 standard is a principal technology for the transmission of real-time audio, video, and data communication over packet-based networks. It provides both multipoint and point-to-point sessions. One of the primary goals of developing the H.323 standards is to provide interoperability between packet switched networks and other multimedia networks. H.323 is a rich and complex specification. Version 4 in particular is a powerful system for multimedia communication. It contains enhancements in a number of important areas, including scalability, reliability, flexibility, supplementary services, and new features. Future releases will be even more powerful. In particular, inter-working or integration with other newly developed protocols will strengthen its position in the multimedia communication area. Mobility will greatly increase the flexibility of using H.323 systems in the fields of terminal mobility, user mobility, and service mobility. Of course, mobility will also greatly increase the complexity of the H.323 system.
Even though H.323 is a powerful system for multimedia communication, it has faced great competition from some newly developed protocols, such as SIP, H.248/Megaco, and BICC. Reducing the complexity of H.323 and simplifying its usage will hopefully improve its leading position in the fast changing multimedia communication world.
Acronyms
ACF/ARJ – Admission Confirm/Reject
ARQ – Admission Request
AuF – Authentication Function
BCF/BRJ – Bandwidth Confirm/Reject
BICC – Bearer Independent Call Control
B-ISDN – Broadband ISDN
BRQ – Bandwidth Request
CIF – Common Intermediate Format
DCF/DRJ – Disengage Confirm/Reject
DRC – Direct Routed Call signaling
DRQ – Disengage Request
DTMF – Dual-Tone Multi-Frequency
MG – Media Gateway
MGC – Media Gateway Controller
MP – Multi-point Processor
N-ISDN – Narrow-band ISDN
PISN – Private Integrated Services Network
POTS – Plain Old Telephone Service
PSN – Packet Switched Network
PSTN – Public Switched Telephone Network
QCIF – Quarter Common Intermediate Format
QoS – Quality of Service
QSIG – Signaling between the Q reference points
RAS – Registration, Admission and Status
RCF/RRJ – Registration Confirm/Reject
RRQ – Registration Request
RTCP – Real-Time Control Protocol
RTP – Real-time Transport Protocol
SCN – Switched Circuit Network
SIP – Session Initiation Protocol
SQCIF – Sub Quarter Common Intermediate Format
TCP – Transmission Control Protocol
TSAP – Transport Service Access Point
UCF/URJ – Unregistration Confirm/Reject
UDP – User Datagram Protocol
URQ – Unregistration Request
VLF – Visitor Location Function
VoIP – Voice over Internet Protocol
VVPN – Voice Virtual Private Network
References
Boaz Michaely: H.323 Overview, November 2000. http://www.packetizer.com/iptel/h323/papers/
Chan-Hwa Wu and J. David Irvin: Emerging Multimedia Computer Communication Technologies, Prentice Hall, 1998, ISBN 0-13-079967-X.
Databeam Corporation: A Primer on the H.323 Series Standard, 1999. http://www.packetizer.com/iptel/h323/primer/
ITU-T: Recommendation H.323, 1998.
ITU-T: Recommendation H.323, 2000.
Olivier Hersent, David Gurle & Jean-Pierre Petit: IP Telephony Packet-based multimedia communications systems, Pearson Education Limited 2000, ISBN 0-201-61910-5.
Packetizer: H.323 Version 4 – Overview, 2001.
GCF/GRJ – Gatekeeper Confirm/Reject http://www.packetizer.com/iptel/h323/whatsnew_v4.html
GEF – Generic Extensibility Framework Packetizer: H.323 Version 3 – Overview, 2001.
GK – Gatekeeper http://www.packetizer.com/iptel/h323/whatsnew_v3.html
GQOS – Guaranteed Quality of Service Packetizer: H.323 Version 2 – Overview, 2001.
GRQ – Gatekeeper Request http://www.packetizer.com/iptel/h323/whatsnew_v2.html
GSTN – General Switched Telephone Network Paul E. Jones: H.323 Past, Present and Future, January 2001.
GRC – Gatekeeper Routed Call signaling http://www.packetizer.com/iptel/h323/papers/
GRQ – Gatekeeper Request Phillips Omnicom Training: Voice Over IP Training
HLF – Home Location Function Material, 2000.
Trillium: H.323, 2000. http://www.iec.org/tutorials/h323/
IRR – Information Request Response
Uyless D. Black: Voice Over IP, Prentice Hall PTR 2000,
IRQ – Information Request ISBN 0-13-022463-4.
ISDN – Integrated Services Digital Network
ISUP – ISDN User Part
ITU – International Telecommunication Union
IWF – Inter-working Function
MC – Multi-point Controller
MCU – Multi-point Control Unit
21
Voice Quality in IP Telephony
Vesa Kosonen
Networking Laboratory
Helsinki University of Technology
P.O.Box 3000, FIN-02015 HUT, FINLAND
vesa.kosonen@hut.fi
[Figure: voice processing chain. Sending side: Audio, A/D, Encoding, Framing. Receiving side: Jitter buffer, Deframing, Decoding, D/A, Audio.]

3.2 Encoder/Decoder
Typically in a telephone conversation there are periods of silence. These silence periods don't contain intelligible information and are cut off. This is done with the help of a voice activity detector (VAD), which cuts the silence periods and sends only silence insertion descriptor (SID) frames. The other end adds the silence back into the speech. This is a very efficient way of saving bandwidth since the estimated share of silence is close to 50% [2]. In a bitstream there is also always redundant information that can be removed before sending, e.g. information that can be predicted by extrapolation or that is repeated at certain intervals. To save bandwidth further the bitstream is also compressed, both payload and headers (40 bytes → 2 or 4 bytes).
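The silence suppression described above can be illustrated with a minimal sketch. It is not the VAD of any standardized codec: the energy threshold, the frame contents and the names `frame_energy` and `vad_filter` are assumptions made for illustration only. Active frames are passed on, and a single SID marker is emitted at the start of each silence period.

```python
from typing import List, Optional, Tuple

def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in frame) / len(frame)

def vad_filter(frames: List[List[int]],
               threshold: float = 1000.0) -> List[Tuple[str, Optional[List[int]]]]:
    """Keep active frames; replace each run of silent frames by one SID marker.

    The threshold is a hypothetical value, not taken from any standard.
    """
    out: List[Tuple[str, Optional[List[int]]]] = []
    in_silence = False
    for frame in frames:
        if frame_energy(frame) >= threshold:
            out.append(("SPEECH", frame))
            in_silence = False
        elif not in_silence:
            out.append(("SID", None))  # sent once, at the start of the silence period
            in_silence = True
    return out

# Toy input: one loud frame, two quiet frames, one loud frame (160 samples each).
loud = [1000] * 160
quiet = [3] * 160
result = vad_filter([loud, quiet, quiet, loud])
kinds = [kind for kind, _ in result]
```

In this toy input two of the four frames are silent, and only three items are transmitted, which mirrors the bandwidth saving argument made in the text.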
network will also introduce electric echo, which is caused by the 2-wire/4-wire transformation. The phones at the user side use only two wires but the network uses four wires. Thus the 2-wire/4-wire transformation has to be performed. The electric echo can be eliminated with an electrical echo canceller (EEC), which should be positioned as close to the user as possible [3].

3.6 Jitter Buffer
The jitter buffer is used to eliminate the impairments caused by the transmission path. The Real-time Transport Protocol (RTP) was developed to handle the situations that may occur to the packets as they travel through the Internet. If packets are lost, codecs try to hide that. This procedure is called 'error concealment'. Lost packets are replaced by interpolating from the previous packet [2]. This prevents gaps in the speech. The packets that arrive in the wrong order or are delayed are resequenced with the help of the RTP timestamp and sequence number. The size of the jitter buffer can be adjusted and it is a trade-off between delay and voice quality. If the jitter buffer is long it has time to wait for delayed packets, but in that case it will introduce more delay. If the buffer is kept short to minimize delay, not all packets will arrive in time, and that causes gaps in the speech.

4 Methods to Assess Voice Quality

4.1 Mean Opinion Score
In order to be able to compare the voice quality of different telephony systems we need some common criteria. One possibility is to assess voice quality subjectively with the help of the MOS (Mean Opinion Score) scale. Voice quality is given values between 1 and 5. Table 1 below shows the MOS values of the most common ITU-T standardized codecs.

Table 1. MOS values of the most common ITU-T standardized speech codecs [3].

Standard   Bitrate (kbit/s)   MOS value
G.711      64                 4.2
G.726      32                 4.0
G.728      16                 4.0
G.729      8                  4.0
G.723.1    6.3/5.3            3.9/3.7

4.2 E-model
The E-model (ITU-T standard G.107) was originally developed by ETSI. It is a computational tool to assess end-to-end voice quality. It was developed for the use of network planners to help to ensure that users will be satisfied with end-to-end transmission performance while avoiding over-engineering of networks [4]. The model estimates the conversational quality from mouth to ear as perceived by the user at the receive side, both as listener and talker. The primary output from the model is the "Rating Factor" R. The model combines the effect of several impairment factors instead of considering them separately [4]. The Rating Factor R is defined as follows:

R = R0 - Is - Id - Ie + A [4]

Where
- R0 represents the basic signal-to-noise ratio, including noise sources such as circuit noise and room noise
- Is is the combination of all impairments that occur simultaneously with the voice signal, such as quantization distortion or too loud a side tone
- Id represents impairments caused by delay, including impairments caused by talker and listener echo or by loss of interactivity
- Ie represents the impairments caused by the use of special equipment, such as low bit rate codecs, or by e.g. packet loss [4]
- A is the advantage factor, which expresses how much lower a rating R (i.e. lower voice quality) a user is willing to tolerate, e.g. the A factor for mobile telephony is 10 [5] and for multi-hop satellite connections A is 20 [4].

The values of the rating factor R can lie between 0 and 100, where R=0 means an extremely bad quality and R=100 means a very high quality. The values of R can also be compared with MOS values and user satisfaction as shown in Table 2. The lower limit of each R range is included but not the upper limit.

Table 2. Comparing R, MOS and user satisfaction according to [4], [6]

R-value    R-value (attribute)   MOS value (lower limit)   User satisfaction
90 - 100   Best                  4.34                      Very satisfied
80 - 90    High                  4.03                      Satisfied
70 - 80    Medium                3.60                      Some dissatisfied
60 - 70    Low                   3.10                      Many dissatisfied
50 - 60    Poor                  2.58                      Nearly all dissatisfied

PSTN quality is an example of a desirable level of voice quality. It is described as "good intelligibility, good speaker identification, naturalness, only minor disturbing impairments" [7]. If a call is considered to be of PSTN quality then rating factor values R ≥ 70 should be reached. The G.107 standard lists the default values, which
are recommended to be used for all parameters that don't vary during the calculation. If only default values are used, the calculation results in a very high quality with a rating factor of R = 93.2 [4]. In that case (and if echo is perfectly controlled, that is echo loss = ∞) a call retains its quality up to a mouth-to-ear delay of 150 ms. Delay values even up to 400 ms are still within the limits of PSTN quality [5].

5 Measurements

D(i) = (R_i - R_{i-1}) - (S_i - S_{i-1}) [8]

When delay is constant the value of D(i) is zero. But when delay varies the spacing of the packets at the

[Figure labels: Selsius/Cisco IP phone, Hub]
P_n = P_{n-1} + (S_n - S_{n-1}), for n > 1 [8]

remains the same even when load was applied, whereas the same curve of NetMeeting has changed considerably.
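The quantities D and J plotted in Appendices B to D can be computed from a packet capture along the following lines. This is a sketch under stated assumptions, not the measurement tool used for the report: R_i is taken to be the arrival time and S_i the sender timestamp of packet i (both in ms), and J is assumed to be the running jitter estimate of RFC 1889, J = J + (|D| - J)/16. The function names and the example timestamps are hypothetical.

```python
from typing import List, Sequence

def spacing_differences(send_ts: Sequence[float],
                        recv_ts: Sequence[float]) -> List[float]:
    """D(i) = (R_i - R_{i-1}) - (S_i - S_{i-1}) for every packet after the first."""
    return [
        (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        for i in range(1, len(send_ts))
    ]

def running_jitter(d_values: Sequence[float]) -> List[float]:
    """Running jitter estimate, assuming the RFC 1889 update rule."""
    jitter = 0.0
    history = []
    for d in d_values:
        jitter += (abs(d) - jitter) / 16.0
        history.append(jitter)
    return history

# Hypothetical capture: packets sent every 20 ms, the third one delayed by 5 ms.
send_ms = [0, 20, 40, 60]
recv_ms = [5, 25, 50, 65]
d = spacing_differences(send_ms, recv_ms)
j = running_jitter(d)
```

With a constant network delay every D(i) is zero and J stays at zero; a single delayed packet produces one positive and one negative D value, which is the symmetric pattern visible in the D histograms.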
Appendix A. Scenarios of End-to-End Voice Call by ETSI/TIPHON [1]
[Figure: TIPHON end-to-end call scenarios. Labels recoverable from the diagrams: TIPHON Terminal, IP Access, IP Network, IFW, SCN network. One scenario is titled "Call initiated from IP Network to SCN".]
Appendix B. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with no load
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 4. Selsius/Cisco IP phone with no restrictions (Bandwidth = 10 Mbit/s, Delay = 0 ms, Packet loss = 0%)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 5. NetMeeting program with no restrictions (Bandwidth = 10 Mbit/s, Delay = 0 ms, Packet loss = 0%)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
Appendix C. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with packet loss 25 %
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 6. Selsius/Cisco IP phone with Packet loss = 25 % (Bandwidth = 10 Mbit/s, Delay = 0 ms)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 7. NetMeeting program with Packet loss = 25 % (Bandwidth = 10 Mbit/s, Delay = 0 ms)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
Appendix D. Measuring Packet Spacing Difference and Jitter on Selsius/Cisco IP
phone and NetMeeting program with bandwidth 80 kbit/s
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 8. Selsius/Cisco IP phone with Bandwidth = 80 kbit/s (Delay = 0 ms, Packet loss = 0 %)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
[Plots omitted: D, J [ms] versus packet number (top); number of packets versus D [ms] (bottom left) and versus J [ms] (bottom right).]
Figure 9. NetMeeting program with Bandwidth = 80 kbit/s (Delay = 0 ms, Packet loss = 0 %)
Upper picture shows the measured D and J as the packets were captured, bottom left is the histogram of D [ms], and bottom right is the histogram of J [ms].
Voice in Packets: RTP, RTCP, Header Compression, Playout
Algorithms, Terminal Requirements and Implementations
Jani Lakkakorpi
Nokia Research Center
P.O. Box 407
FIN-00045 NOKIA GROUP
Finland
jani.lakkakorpi@nokia.com
• Padding (P, 1 bit)
If the padding bit is set, the packet contains one or more padding octets at the end of the payload. The last octet of the payload contains the number of padding octets.

V | P | X | CC | M | PT | Sequence Number
Timestamp
Synchronization Source (SSRC) Identifier
Contributing Source (CSRC) Identifiers
…

Figure 1. RTP Header Format

• Extension (X, 1 bit)
If the extension bit is set, the fixed RTP header and possible CSRCs are followed by extensions that use the format defined in RFC 1889.

The sequence number is used by the receiver to detect packet losses and to restore packet sequence.

• Timestamp (32 bits)
The timestamp reflects the sampling instant of the first payload octet. The clock frequency is defined for each payload type, and the clock is initialized with a random value [Her00].

The SSRC field identifies the synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session have the same SSRC identifier.

The CSRC list identifies the contributing sources for the payload contained in this packet. The number of identifiers is given by the CC field. At most 15 sources can be identified. CSRC identifiers are inserted by mixers, using the SSRC identifiers of the contributing sources.
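As an illustration of the header layout in Figure 1, the following sketch unpacks the fixed 12-octet RTP header and the CSRC list from a received packet. The field widths follow RFC 1889 (V 2 bits, P 1, X 1, CC 4, M 1, PT 7, sequence number 16, timestamp 32, SSRC 32). The function name `parse_rtp_header`, the dictionary keys and the example values are assumptions made for illustration; header extensions and padding removal are not handled.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and the CSRC list (no extension handling)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least the 12-octet fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    header_length = 12 + 4 * csrc_count
    if len(packet) < header_length:
        raise ValueError("packet is shorter than its CSRC list")
    csrc = list(struct.unpack("!%dI" % csrc_count, packet[12:header_length]))
    return {
        "version": first >> 6,
        "padding": (first >> 5) & 1,
        "extension": (first >> 4) & 1,
        "csrc_count": csrc_count,
        "marker": second >> 7,
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "header_length": header_length,
    }

# Hypothetical packet: version 2, one CSRC, marker bit set, payload type 0.
example = struct.pack("!BBHIII", 0x81, 0x80, 0x1234, 160,
                      0xDEADBEEF, 0x01020304) + b"payload"
header = parse_rtp_header(example)
```

The payload starts at offset `header["header_length"]`; a receiver would use `sequence_number` to detect loss and reordering, and `timestamp` to schedule playout.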
1. The primary function is to provide feedback on the quality of the data distribution. This function is performed through sender and receiver reports.

2. RTCP carries a persistent transport-level identifier for an RTP source, called Canonical Name (CNAME). Since the SSRC identifier may change, all receivers require the CNAME to keep track of each participant.

3. Since the first two functions require that all participants in a session send RTCP packets, the RTCP packet rate must be controlled in order to scale up to a large number of participants. Each participant can independently observe the number of other participants and thus control its RTCP packet rate. The maximum rate at which a participant can send RTCP reports is one per five seconds.

RFC 1889 defines several RTCP packet types to carry control information:

• SR (Sender Report) contains transmission and reception statistics for active senders

Both the sender report and the receiver report include reception report blocks, one for each of the synchronization sources from which this participant has received RTP data packets since the last report. Reports are not issued for contributing sources listed in the CSRC list. Each reception report block provides statistics about the data received from the particular source indicated in that block.

[Figure 2, sender report layout; rows recoverable from the figure:]
V | P | RC | PT=SR=200 | Length
SSRC of Sender
NTP Timestamp, Most Significant Word
NTP Timestamp, Least Significant Word
RTP Timestamp
SSRC_n (SSRC of nth Source)
Fraction Lost | Cumulative Number of Packets Lost
Extended Highest Sequence Number Received
octet of the packet contains the number of these padding octets.

• Reception Report Count (RC, 5 bits)
The number of reception report blocks contained in this packet. A value of zero is valid.

• Packet Type (PT, 8 bits)
Contains the constant 200 to identify this packet as an RTCP sender report.

• Length (16 bits)
Length of this RTCP packet in 32-bit words minus one. (Includes the header and any padding.)

• SSRC (32 bits)
Synchronization source identifier for the originator of this sender report.

The second section, sender information, is 20 octets (five rows in Figure 2) long and it is present in all sender reports. It summarizes the data transmissions from this sender.

• NTP Timestamp (64 bits)
Indicates the wallclock time when this report was sent.

• RTP Timestamp (32 bits)
Corresponds to the same time as the NTP timestamp, but in the same units, and with the same random offset as the RTP timestamps in data packets. This correspondence may be used for intra- and inter-media synchronization for sources whose NTP timestamps are synchronized.

• Sender's Packet Count (32 bits)
The total number of RTP data packets transmitted by the sender since starting transmission up until the time this sender report was generated. The count is reset if the sender changes its SSRC identifier.

• Sender's Octet Count (32 bits)
The total number of payload octets (not including header or padding) transmitted in RTP data packets by the sender since the start of transmission up until the time this sender report was generated. The count is reset if the sender changes its SSRC identifier. This field can be used to estimate the average payload data rate.

The third section contains reception report blocks. The number of these blocks depends on the number of other sources that this sender has been listening to since the last report.

• SSRC_n (Source Identifier, 32 bits)
The SSRC identifier of the source that we are reporting about.

• Fraction Lost (8 bits)
The fraction of RTP data packets from source SSRC_n that were lost since the previous sender or receiver report was sent. If the loss is negative due to duplicates, the fraction lost is set to zero.

• Cumulative Number of Packets Lost (24 bits)
The total number of lost packets from source SSRC_n since the beginning of reception. This figure is defined to be the number of packets expected minus the number of packets actually received. The number of packets received also includes late and duplicate packets. Thus packets that arrive late are not counted as lost, and the loss may be negative if there are duplicates. The number of packets expected is defined to be the last extended highest sequence number received minus the initial sequence number received. This may be calculated as shown in RFC 1889 [Sch96a].

• Extended Highest Sequence Number Received (32 bits)
The low 16 bits contain the highest sequence number received in an RTP packet from source SSRC_n, and the most significant 16 bits extend that sequence number with the corresponding count of sequence number cycles.

• Interarrival Jitter (32 bits)
An estimation of the variance of the RTP packet interarrival time measured in timestamp units and expressed as an unsigned integer.

Interarrival jitter can be calculated as a difference in the relative transit time for two packets. The relative transit time is the difference between the packet's RTP timestamp and the receiver's clock at the time of arrival, measured in the same units. If S_i is the RTP timestamp of packet i and R_i is the time of arrival of packet i (in RTP timestamp units), the difference in packet spacing for the two packets, i and j, can be expressed as:
D(i, j) = (R_j - R_i) - (S_j - S_i).

Interarrival jitter is updated each time a packet is received from source SSRC_n (using the difference in packet spacing for that packet and the previous packet) according to the following formula:

J = J + (|D(i-1, i)| - J) / 16.

When the reception report is issued, the current value of J is sampled.

A common trade-off for reducing the bit rate is to put several frames in a single packet. However, this can raise the conversational delay to a level that is far too high for most users. The overhead issue can also be solved by using header compression [Cas99].

Header compression is based on the simple idea that since most of the data packet overhead is constant for a given stream, it is possible to negotiate a shorter index for those constants (e.g. source and destination IP addresses and ports) when the stream is set up [Her00]. Other (variable) values can be reconstructed at the receiving end. In short: the sending host replaces the large RTP/UDP/IP header with a small index, and the receiving host reverses this operation. An RTP/UDP/IP header compression mechanism for low-speed links is described in RFC 2508. In many cases, all three headers can be compressed to 2-4 bytes. The compression is done on a link-by-link basis [Cas99].

In order to maintain lossless compression, changes in the packet ID are transmitted. The packet ID is usually incremented by one for each packet. In the IPv6 base header, neither packet ID nor header checksum exists, and only the payload length field changes.
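To give a feeling for the numbers involved in the overhead discussion above, the following sketch compares the on-the-wire bit rate of a voice stream with full and with compressed headers. The 40-byte figure is the IPv4 (20) + UDP (8) + RTP (12) header without options; the 20-byte payload per 20 ms frame (an 8 kbit/s codec) is an assumed example, and the function name `stream_bitrate` is hypothetical. Link-layer framing is ignored.

```python
FULL_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header, no options

def stream_bitrate(payload_bytes: int, header_bytes: int,
                   frame_ms: float, frames_per_packet: int = 1) -> float:
    """Bit rate in bit/s of a stream carrying `frames_per_packet` frames per packet."""
    packet_bytes = frames_per_packet * payload_bytes + header_bytes
    packets_per_second = 1000.0 / (frame_ms * frames_per_packet)
    return packet_bytes * 8 * packets_per_second

# Assumed example codec: 20 bytes of payload every 20 ms (8 kbit/s of speech).
uncompressed = stream_bitrate(20, FULL_HEADER_BYTES, 20)                  # full headers
compressed_2 = stream_bitrate(20, 2, 20)                                  # 2-byte header
compressed_4 = stream_bitrate(20, 4, 20)                                  # 4-byte header
two_frames = stream_bitrate(20, FULL_HEADER_BYTES, 20, frames_per_packet=2)
```

Under these assumptions the uncompressed stream needs 24 kbit/s for 8 kbit/s of speech, while a 2 to 4 byte compressed header brings it down to 8.8 to 9.6 kbit/s. Packing two frames per packet only reaches 16 kbit/s and adds one frame time (20 ms) of packetization delay, which is the trade-off mentioned in the text.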
In the UDP header, the length field is redundant with the IP total length field and the length indicated by the link layer. The UDP checksum field will be zero in the case where the source does not generate any UDP checksums. Otherwise, the checksum must be sent intact in order to preserve lossless compression.

In most RTP headers, only the sequence number and timestamp change from packet to packet. If packets are not lost or misordered, the sequence number is incremented by one for each packet. For audio packets of constant duration, the timestamp is incremented by the number of sample periods conveyed in each packet.

If the second-order differences of the sequence number and timestamp fields are zero, the next packet header can be constructed from the previous header by adding the first-order differences (that are stored in the session context along with the uncompressed header) for these fields.

The marker bit is set on the first packet of an audio talkspurt. If it were treated as a constant field, such that each change would require sending the full RTP header, the compression would become quite inefficient. Because of this, one bit in the compressed header is reserved for the marker bit.

In most packet audio applications, packets are buffered at the receiving host in order to compensate for variable network delay. The receiver buffer sizes can be constant or adaptively adjusted. Keeping the delay as small as possible and avoiding excessive packet losses at the same time is not an easy task. The results of [Ram94] indicate that an adaptive algorithm, which explicitly adjusts to the sharp, spike-like increases in packet delay, can achieve a lower rate of lost packets.

Adaptive playout delay can be either per-talkspurt or per-packet based. In the former approach, playout delay remains constant throughout the talkspurt and the adjustments are done between talkspurts. The latter approach introduces gaps in speech, and thus it is not recommended for VoIP [Yle97].

Three different playout delay adjustment algorithms for packetized voice are presented in [Moo98]. The paper is focused on the tradeoff between packet playout delay and packet playout loss. The authors present an adaptive delay adjustment algorithm that tracks the network delay of recently received packets and maintains delay percentile information.

Some playout delay adjustment algorithms assume that the sender and receiver clocks are synchronized, but in [Moo98] this is not the case. The propagation delay is removed from end-to-end delay by subtracting out the minimum of measured end-to-end delays. Thus it is possible to concentrate on the variable delay component.

In the following section, we present a similar, although somewhat simpler, algorithm for adaptive playout delay adjustment. This algorithm does not assume synchronization of the sender and receiver clocks.

Waiting time in the playout buffer is calculated with the following algorithm:

All packets are played out at:

PlayAt_i = ReceivedAt_i + T_wait,i

The following events will trigger the playout delay adjustment:

• If N or more packets among the last M packets (measurement period) arrive late, playout delay is adjusted upwards when the next talkspurt arrives.

• Similarly, if M successive packets have all been received in time, playout delay is adjusted downwards before the next talkspurt.

Table 2 shows some simulation results for constant and adaptive playout delay. Network delay was simply modeled with an exponential distribution with a mean of 30 milliseconds. Parameters used were: N = 2, M = 100, twait = 100 ms. Simulation duration was 200 seconds.
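The adjustment rules above can be tried out with a small simulation. The sketch below is not the simulator used for Table 2 and does not claim to reproduce its figures. Several details are assumptions, because the text does not specify them: a packet counts as late when its network delay exceeds the current playout delay, adjustments take effect immediately instead of at the next talkspurt, the step size is 10 ms, the playout delay never drops below its initial value, and one packet is sent every 20 ms (10000 packets in 200 seconds).

```python
import random
from typing import List, Tuple

def simulate_playout(num_packets: int = 10000, mean_delay_ms: float = 30.0,
                     t_wait_ms: float = 100.0, n_late: int = 2, m_window: int = 100,
                     step_ms: float = 10.0, adaptive: bool = True,
                     seed: int = 1) -> Tuple[float, List[float]]:
    """Return (packet loss ratio, playout delay used for each packet)."""
    rng = random.Random(seed)
    playout = t_wait_ms
    recent: List[bool] = []   # lateness flags of the last m_window packets
    on_time_run = 0           # successive packets received in time
    lost = 0
    used: List[float] = []
    for _ in range(num_packets):
        network_delay = rng.expovariate(1.0 / mean_delay_ms)
        late = network_delay > playout      # assumed definition of "late"
        lost += int(late)
        used.append(playout)
        if not adaptive:
            continue
        recent.append(late)
        if len(recent) > m_window:
            recent.pop(0)
        on_time_run = 0 if late else on_time_run + 1
        if sum(recent) >= n_late:           # N or more late among the last M
            playout += step_ms
            recent.clear()
            on_time_run = 0
        elif on_time_run >= m_window:       # M successive packets in time
            playout = max(t_wait_ms, playout - step_ms)
            on_time_run = 0
    return lost / num_packets, used

constant_loss, constant_delays = simulate_playout(adaptive=False)
adaptive_loss, adaptive_delays = simulate_playout(adaptive=True)
```

For the constant case the loss ratio should be close to exp(-100/30), about 3.6 %, which is the probability that an exponentially distributed delay with a 30 ms mean exceeds 100 ms. Since both runs use the same seed and the adaptive delay never goes below the constant one, the adaptive run cannot lose more packets than the constant run, at the price of a larger mean playout delay.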
Table 2: Constant vs. adaptive playout delay

Playout delay   Packet loss ratio   Mean     Min.     Max.
Constant        3.7%                100 ms   100 ms   100 ms
Adaptive        1.2%                150 ms   100 ms   240 ms

Simulation results show that there is a clear tradeoff between playout delay and playout loss. If we had selected a larger playout delay in the constant playout delay case, packet loss ratio would have been smaller. If the variation of network delay is unknown, it can be very hard to set the constant playout delay.

Simulation results also show that an upper bound for adaptive playout delay is probably needed, because end-to-end delays longer than 400 milliseconds are not

[Plot omitted: adaptive playout delay, Delay [ms] versus Sequence Number.]
Figure 4. Adaptive Playout Delay

end, and increased delay is experienced at the other end. Buffer underrun occurs when the receiver does not have anything to play. Modifying the playout rate by adding or dropping samples before the frame is transferred to the audio device can compensate for clock drift [Sel01].
VocalTec Internet Phone Lite is mainly targeted for voice-only connections. It has the following system requirements [Voc01]:

• Windows 95 or Windows NT 4.0 or higher
• Pentium 75 processor or higher
• 14,400 bps or faster Internet connection
• Sound card with microphone and speakers

ComprEssion (ACE) for Real-Time Multimedia (Internet Draft, Expired 24.11.2000), 24.5.2000.
[Kos98] T. J. Kostas, M. S. Borella, I. Sidhu, G. M. Schuster, J. Grabiec, J. Mahler: Real-time Voice over Packet-switched Networks, IEEE Network, Volume: 12, Issue: 1, 1998.
[Lar00] Lars-Åke Larzon, Hans Hannu, Lars-Erik Jonsson, Krister Svanbro: Efficient Transport of Voice over IP over Cellular links, Proceedings of IEEE Globecom 2000.
Voice Coding in 3G Networks
Tommi Koistinen
Signal Processing Systems
Nokia Networks
Tommi.Koistinen@nokia.com
versions; hybrids) is presented in [2] on pages 270-287.

The waveform coders are mainly used to compress speech on transmission links, for example, on PCM trunks between two switching centers. The compression ratios range from 2:1 to 4:1 and quite high speech quality can be maintained.

The analysis-by-synthesis types of coders were mainly introduced together with digital mobile networks (GSM Full Rate codec [3] dates back to 1988). As the frequency band in the radio interface between a mobile terminal and a base station is restricted (and regulated), compression techniques are a meaningful way to save money in that interface. A typical full rate channel (16 kbps) utilizes a compression rate of 4:1. A half rate channel (8 kbps) is half of that and it operates at a compression rate of 8:1. Lossy compression always has some effect on speech quality and more compression usually means less quality. The G.711 standard is a common reference point for "real" speech codecs and e.g. the GSM Enhanced Full Rate codec [4] almost reaches the quality of G.711.

2 Network Architectures

This chapter will present the basic 3G network evolution according to 3GPP (Third Generation Partnership Project [5]) reference architectures. 3GPP has scheduled its work to releases of R99, R4 and R5 and so on. In the following the basic reference architecture model of each release is shortly described emphasizing the voice coding and user plane issues.

Release 99
The basic architecture of an R99 compatible network is shown in Figure 1. The IP packet data from UTRAN (Universal Terrestrial Radio Access Network, that is basically base stations and Radio Network Controllers (RNC)) goes through the Iu-PS interface to the 3G SGSN. Voice data goes through the Iu-CS interface to the 3G Mobile Switching Center (MSC) that converts the Adaptive Multirate (AMR) coded speech to G.711 format and vice versa for the PSTN network. The circuit switched speech is transferred in packet mode (ATM/AAL2) from UTRAN (from the Radio Network Controller) to the 3G MSC but the codec level packet mode speech is not yet originated from the terminal.

[Figure 1; labels recoverable from the diagram: MT, UTRAN, Iu-PS, SGSN, GGSN, Multimedia IP networks, HLR, Iu-CS, 3G MSC, Transcoder, PSTN/legacy networks.]

[Figure 2; labels recoverable from the diagram: MT, UTRAN, Iu-CS user data, MGW, MGW, PSTN/legacy networks, Iu-CS control, MSC Server, MSC Server.]
Figure 2. 3GPP Reference Architecture of Release 4.

The final architecture model, also called the All-IP network [6], also moves speech to full end-to-end
packet mode. The IP packets that are generated in a mobile terminal go as such either to another IP terminal or to the MGW from the GGSN. The architecture is presented in Figure 3. A new network entity is also introduced, namely the Multimedia Resource Functions (MRF) unit that mainly implements conferencing services for the IP based calls.

[Figure 3; labels recoverable from the diagram: HSS/CSCF, SGSN, GGSN, Iu-PS, Multimedia IP networks, MT, UTRAN, MRF, MGW, PSTN/legacy networks.]
Figure 3. All-IP reference architecture.

Of course, the different phases of 3GPP releases may coexist at the same time depending on operators' needs.

3 Media Gateway
In 2G networks (like GSM) the speech related functionalities have been implemented around the transcoder unit (TRAU). The basic task of the transcoder has been speech encoding and decoding of narrowband codecs like GSM Full Rate (FR), Enhanced Full Rate (EFR) or Half Rate (HR) codecs. Some extra features like noise cancellation or acoustic echo cancellation are also offered by 2G transcoders. The Mobile Switching Center has then additionally offered tone and DTMF generators, echo cancellers, fax and modem pools and announcement and conferencing services. Control mechanisms for these functionalities have usually been proprietary. In 3G networks, all of these functions must be offered by the Media Gateway that is controlled by the Media Gateway Controller (MGC) with the standard H.248 control protocol [7].

• DTMF and call progress tone generation and detection
• support for fax/modem/data protocols
• support for Tandem Free Operation (TFO) and Transcoder Free Operation (TrFO)
• bad frame handling
• IP protocol handling (RTP/RTCP, encryption, QoS support)

Some functions, especially the conferencing service and possible speech enhancement services, are basically thought to be provided by the Multimedia Resource Functions (MRF) unit, but they may optionally be added to the Media Gateway responsibilities.

A lot of signal processing (DSP) power is required to provide the Media Gateway's functions. Typically, one DSP chip may process 4-16 channels, and on one processor card there might be 8-32 DSPs, which totals 32-512 channels per processor card.

4 Media Resource Functions
The Multimedia Resource Functions (MRF) unit according to the 3GPP standard shall provide the audio/video conferencing services for the All-IP network. The basic requirement is to support several speech codecs to be able to sum up the conference for each party. As it is impossible for today's technology to sum up signals in the parameter domain, all signals must first be decoded for linear domain processing. The summed signals are then encoded again for each party.

The 3GPP work on the MRF entity has not progressed further than the conferencing requirement. However, the MRF entity is a natural place also for other speech enhancement services. It should be remembered that most of the calls in an All-IP network stay inside the core network and do not go to the Media Gateway at all (see figure 4).

[Figure 4; labels recoverable from the diagram: MT, UTRAN, MRF, IP terminal.]
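The decode, sum and re-encode requirement described for the MRF in Section 4 can be sketched as follows. This is an illustration only, not the 3GPP procedure: the "codec" is a stand-in that packs 16-bit linear samples, each party is assumed to receive the sum of all other parties (a mix-minus arrangement, which the text does not prescribe), and the names `mix_conference`, `toy_decode` and `toy_encode` are hypothetical.

```python
import struct
from typing import Callable, Dict, List

Pcm = List[int]

def toy_decode(frame: bytes) -> Pcm:
    """Stand-in decoder: the 'coded' frame is just 16-bit little-endian PCM."""
    return list(struct.unpack("<%dh" % (len(frame) // 2), frame))

def toy_encode(samples: Pcm) -> bytes:
    """Stand-in encoder matching toy_decode."""
    return struct.pack("<%dh" % len(samples), *samples)

def clip16(value: int) -> int:
    """Limit a summed sample to the 16-bit linear range."""
    return max(-32768, min(32767, value))

def mix_conference(frames: Dict[str, bytes],
                   decoders: Dict[str, Callable[[bytes], Pcm]],
                   encoders: Dict[str, Callable[[Pcm], bytes]]) -> Dict[str, bytes]:
    """Decode every party to the linear domain, sum, and re-encode per party."""
    linear = {party: decoders[party](frame) for party, frame in frames.items()}
    length = min(len(samples) for samples in linear.values())
    total = [sum(samples[i] for samples in linear.values()) for i in range(length)]
    mixed = {}
    for party, own in linear.items():
        # Assumed mix-minus: a party does not hear its own signal.
        others = [clip16(total[i] - own[i]) for i in range(length)]
        mixed[party] = encoders[party](others)
    return mixed

parties = {"a": toy_encode([100, 200]), "b": toy_encode([10, 20]), "c": toy_encode([1, 2])}
codec_in = {name: toy_decode for name in parties}
codec_out = {name: toy_encode for name in parties}
result = mix_conference(parties, codec_in, codec_out)
```

In a real MRF each party could use a different speech codec, which is why the decoder and encoder are looked up per party; the summation itself always happens on linear samples.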
necessary operations (as it already has to support all coding formats for the conferencing service). The other option is that all speech enhancement services shall be provided by mobile terminals.

A set of speech enhancements that the MRF entity could provide is:
• Noise suppression
• Gain (volume) control
• Acoustic echo cancellation

It should also be mentioned that the Media Gateway and the Multimedia Resource Functions unit are logical entities only and physically they may be co-located in the same device.

5 Tandem Avoidance

5.1 Tandem Free Operation (TFO)
Every time voice is encoded or decoded the speech quality will degrade a little bit. Thus, as few conversions as possible are desired. The basic 2G mobile-to-mobile call suffers from tandem coding, which means that separate speech coding happens in both radio interfaces and between the transcoders voice goes in 64 kbps G.711 format. In general two encodings in clear speech conditions are no problem, but more than two encodings, especially in bad line conditions, cause severe degradations.

To overcome this kind of quality problem ETSI has specified so called Tandem Free Operation (TFO) [8] that establishes a sub channel (of 16 or 8 kbps) inside the 64 kbps G.711 stream for the encoded speech. Also the transcoders must support the TFO feature as they must omit the final decoding and pass the encoded parameters forward as such.
An end-to-end connection (of 16 or 8 kbps) can now be formed with only one encoding (in the originating mobile) and only one decoding (in the receiving mobile). Figures 5 and 6 present the cases without TFO and with TFO in operation.

[Figure 5 diagram, caption not recovered: MS, BSS, Transcoder (64↔16), MSC, PSTN at 64 kbps, MSC, Transcoder (64↔16), BSS, MS]

[Figure 6 diagram: MS, BSS, Transcoder (16↔16), MSC, PSTN at 48(16) kbps, MSC, Transcoder (16↔16), BSS, MS]
Figure 6. Tandem Free Operation is utilised.

TFO is based on inband procedures, which means that no outband signaling is used to form a TFO connection. In practice, the TFO connection establishment starts with a negotiation phase where certain TFO protocol messages are exchanged between transcoders to agree on the used codecs. If the other end doesn’t support TFO it will not acknowledge the negotiation and also the TFO capable transcoder will start to encode and decode the 64 kbps as in figure 5.

5.2 Transcoder Free Operation (TrFO)
For the 3G networks a slightly different approach is taken considering tandem avoidance. Firstly, outband signaling is used for codec negotiation and if codecs match there is no need for the transcoders at all. This operation is called Transcoder Free Operation (TrFO) [9].

TrFO is relevant mainly for the MSC Server concept and for intersystem compatibility as in the All-IP network calls are by nature of TrFO type. Figure 7 presents a basic call where outband signaling travels from one MSC Server to another until the whole link is negotiated. If a common codec can be agreed no transcoding resources are reserved from the intermediate media gateways.

[Figure 7 diagram: MT, UTRAN, MGW, MGW, GSM BSS; two MSC Servers; PSTN/legacy networks; codec negotiation labels "AMR?", "AMR?", "AMR EFR!"]
Figure 7. A basic TrFO call.
6 Adaptive Speech Coding
The traditional GSM speech codecs operate in the radio interface at a fixed source rate with a fixed level of error protection (e.g. the Full Rate codec with framing overhead consumes 16 kbps and error protection adds 6.8 kbps, resulting in a 22.8 kbps gross bit rate over the air). The codec itself does not have means (except the bad frame handling mechanism) to adapt to changing radio conditions.
For this reason, ETSI (and later 3GPP) has asked for new adaptive coding schemes that could select the optimum channel mode (full rate or half rate) and the optimum codec mode (speech rates) based on the radio conditions. As a result, the Adaptive Multirate (AMR) codec [10,11] has now been standardized as an additional codec for the GSM system and as the only mandatory codec (thus far) for the 3G system. Two most important design targets for the AMR codec were:

[Table residue, one row only: AMR_4.75, 4.75 kbit/s, FR / HR]

The choice between the full rate and the half rate channel mode can be made off-line based on the capacity requirements of the operator. The selection of the codec mode happens continuously by the radio resource management. Basically, as a lower AMR mode is selected, more bits from the gross bit rate are freed for the channel coding and error protection. Even though we use a very low codec bit rate, the high error protection keeps the overall speech quality sufficiently high. Figure 8 shows the reasoning for the mode selection. To follow the optimum quality curve (MOS = Mean Opinion Score of speech quality) against decreasing signal-to-noise ratio (C/I) the AMR mode that is used must be changed accordingly.

[Figure 8 residue: MOS axis with curves for Mode 1, Mode 2 and Mode 3; caption not recovered]

[Figure 9 diagram labels: u(n), fixed codebook, 1/A(z), post-filtering, s(n), s'(n); accents garbled in extraction]
Figure 9. The CELP model.
The excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure.
The AMR coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency of 8 000 sample/s. At each 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
Table 2 shows the resulting parameters from the encoder operating in 12.2 kbps mode. The LP analysis is performed twice per 20ms frame resulting 2 sets of line spectrum pairs (LSP). The adaptive codebook (pitch delay and gain) and the fixed codebook are found for 4 subframes of 5 ms each. Total number of bits is 244 per frame.

will output the application level protocols, that in this case, are the RTP (Real-time Transport Protocol) frames carrying the AMR payloads. So, concerning IP Telephony the RTP payload specification for AMR codec [13] has grown in importance as AMR is the codec that should converge the traditional IP Telephony with the mobile IP Telephony. The RTP for AMR specification includes the following extra features:
• codec mode request procedure
• robust sorting of payload bits
• bad frame indication
• compound payloads
• CRC calculation

The specification is still under finalisation in IETF.

7 Wideband Speech Coding

The 300-3400Hz speech band frequency range has been used for decades in all telephony applications. As the range is heavily restricted all non-speech signals, like music, are degraded badly when forced to go through this narrow frequency pipe. Even speech contains plenty of information above 3400 Hz that affects the naturalness of speech.
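The frame arithmetic quoted for the narrowband AMR coder in Section 6 (20 ms frames, 8 000 sample/s, 4 subframes, 244 bits per frame in the 12.2 kbps mode) can be checked with a few lines. This is only a sketch; the function and constant names are illustrative and do not come from the AMR specifications.

```python
SAMPLE_RATE_HZ = 8000        # narrowband sampling frequency from the text
FRAME_MS = 20                # AMR speech frame length
SUBFRAMES_PER_FRAME = 4      # subframes of 5 ms each
BITS_PER_FRAME_12_2 = 244    # total bits per frame in the 12.2 kbps mode


def samples_per_frame(rate_hz: int, frame_ms: int) -> int:
    """Number of speech samples covered by one frame."""
    return rate_hz * frame_ms // 1000


def bit_rate_kbps(bits_per_frame: int, frame_ms: int) -> float:
    """Source bit rate; bits per millisecond equals kbit/s."""
    return bits_per_frame / frame_ms


frame_samples = samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS)
subframe_samples = frame_samples // SUBFRAMES_PER_FRAME
rate = bit_rate_kbps(BITS_PER_FRAME_12_2, FRAME_MS)
```

Here `frame_samples` corresponds to the 160 samples per frame mentioned above, each 5 ms subframe covers a quarter of them, and 244 bits every 20 ms gives the 12.2 kbps source rate.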
[Figure 10 chart residue: subjective speech quality (Excellent, Very good, Good, Poor, Unacceptable) against carrier-to-interference ratio in dB (Error-free, 13, 10, 7, 4); curves for AMR-WB, AMR-NB and EFR]
Figure 10. AMR-WB vs. AMR-NB and EFR.

As can be seen, in error-free conditions the AMR-WB is superior over AMR-NB or EFR (that is the highest quality GSM codec at the moment). Even in very bad conditions AMR-WB can maintain high quality far above fixed rate GSM codecs. The nine modes of AMR-WB (plus one mode for DTX) are presented in table 3.

AMR-WB is specified for GSM full rate radio traffic channel, for future GSM EDGE (GERAN) and for the 3G (UTRAN) radio channel. The 3GPP specifications for a wideband AMR codec (AMR-WB) are expected to be finalized in March 2001.

Codec mode       Source codec bit-rate
AMR-WB_23.85     23.85 kbit/s
AMR-WB_23.05     23.05 kbit/s
AMR-WB_19.85     19.85 kbit/s
AMR-WB_18.25     18.25 kbit/s
AMR-WB_15.85     15.85 kbit/s
AMR-WB_14.25     14.25 kbit/s
AMR-WB_12.65     12.65 kbit/s
AMR-WB_8.85      8.85 kbit/s
AMR-WB_6.6       6.6 kbit/s
AMR-WB_SID       1.75 kbit/s

Table 3. 9 Different AMR-WB modes.

8 Conclusion
[...] services will make speech quality better, even to a level never experienced before.

This article has mainly focused on the application level. Good network conditions (low delay, no lost packets due to congestion) are a starting point also for superior application level speech quality. Media gateways shall support the network level QoS mechanisms (like DiffServ) that are used to optimize and prioritise the real-time and the non-real-time traffic (see for example [16]).

In the past, speech service has been closely tied on technical level to the providing network. Within All-IP networks also speech service will be lifted more and more up to user-level. End-to-end user applications will not even see the underlying transport network and the overall speech quality that is perceived will heavily depend on the characteristics and features of the All-IP terminals.

As also the speech service will include more choices of used codecs, used bandwidth and used speech enhancements there shall be opportunity to differentiate the pricing of these features. The user may in the future have means to select the speech quality that he or she is willing to pay for.

References
[1] ITU-T G.711; Pulse Code Modulation (PCM) of Voice Frequencies. 1972.
[2] Hersent O, Gurle D, Petit J-P. IP Telephony. Packet-based multimedia communications system. Addison Wesley, 2000.
[3] GSM 06.10; Full Rate Speech; Transcoding.
[4] GSM 06.60; Enhanced Full Rate Speech; Transcoding.
[5] Third Generation Partnership Project (3GPP) www.3gpp.org
[6] 3GPP/TR 23.922; Architecture for an All-IP network, v1.0.0, October 1999.
[9] 3GPP/TS 23.153; Out of Band Transcoder
Control – Stage 2, v2.0.3, October 2000.
Session Initiation Protocol (SIP)
Jouni Soitinaho
Jouni.Soitinaho@nokia.com
• Controlling the call: including transfer and termination of calls.

Main technical properties and some implications of SIP:
• [...] and extend. Causes extra overhead, which is not a serious drawback for a signaling protocol. Header names can be abbreviated.
• Recommended transport protocol is UDP: It is not meant to send large amounts of data.
• Application level routing based on Request-URI: The signaling path through SIP proxies is controlled by the protocol itself, not by the underlying network. Requires routing implementation in SIP proxies.
• Independence of the session it initiates and terminates (capability descriptions, transport protocol, etc.): Cooperates with different protocols, which can be developed independently. It is not a conference control protocol (floor control, voting, etc.) but it can be used to introduce one.
• Supports multicasting for signaling and media but no multicast address or any other network resource allocation.
• Support for stateless, efficient and "forward" compatible proxies (re-INVITE carries state, ignore the body, ignore extension methods).

2.2 Operations

Protocol operations of SIP:

• INVITE initiates session establishment
• ACK confirms successful session establishment
• OPTIONS requests capabilities
• BYE terminates the session
• CANCEL cancels a pending session establishment

[Figure 5 diagram: UAC (User1@host1), Proxy/Registrar, Location Server, UAS (User2@host2). Messages in order: REGISTER, Location update/OK, 200 OK; INVITE, Location query/Reply, INVITE; 2-way media transmission; BYE, BYE; 200 OK, 200 OK]
Figure 5. An example of SIP protocol operations.

2.3 Network elements

SIP has been designed for IP networking. The protocol makes use of standard elements like DNS and DHCP servers, firewalls, NATs and proxies. Special support in DNS and DHCP servers is not needed but it makes the protocol operations more efficient. The SIP protocol is implemented by the user agent client (UAC) and server (UAS), redirect servers, proxies and registrars. Registrars and location servers maintain the mapping between the user's permanent address and current physical addresses.

The SIP specification does not actually define the network architecture. However, the logical elements and their relationships can be determined based on the protocol specification. The following figure demonstrates an example of inter-domain session setup. Both UAC and UAS are located in their home domains. Thin lines represent SIP signaling messages, thick lines represent media transmission and dotted lines represent non-SIP protocols.

[Figure 6 residue: Domain A, Domain B, DNS, Location Server; caption not recovered]
Outbound Proxy whose address may have been configured in UAC using DHCP. Outbound Proxy uses DNS to resolve the recipient's address. It also controls Firewall/NAT to open the ports for media transmission. Domain B has configured all the incoming requests to go to Proxy/Registrar that controls Firewall/NAT of Domain B. Proxy/Registrar queries the current location of UAS from Location Server and forwards the message to UAS. In an intra-domain call a redirect server could be used instead of a proxy in Domain B to return the current location of UAS who could then be contacted directly by UAC without having any proxy involved in the communications.

Since the request carried the media descriptions of UAC and since the corresponding ports were opened in firewalls media can immediately flow back from UAS to UAC. The signaling response is routed along the same path as the request and it carries the media descriptions of UAS. UAC can now send media to UAS. Finally UAC has to send ACK message to UAS for acknowledging the successful session establishment.

2.4 Addressing and routing
SIP uses e-mail like addresses for users but it also includes the protocol keyword in the SIP URL. SIP URLs are used to identify the originator (From), current destination (Request-URI), final destination (To) and redirection address (Contact).

Two formats exist:

• sip:user@host
  when UA exists, e.g. From and To fields in INVITE
• sip:host
  when no UA exists, e.g. Request-URI in REGISTER

Including the protocol keyword in the URL allows SIP server use the Contact-header to redirect a call to a web page or to a mail server, for example. This facilitates integration of audio and video applications with other multimedia applications.

Routing of SIP messages is included in the protocol itself since finding the user is one of the primary functions of SIP. The host part of the SIP URL indicates the next hop for a request. Even if clients could send the request directly to this address in practice they are typically forced to go through a proxy for security or address translation reasons.

Furthermore two headers are in central position for routing SIP messages:

• Via header indicates the request path taken so far. It prevents looping and is used for routing the response back the same path as request has traveled. Proxies must add "received" parameter in the top-most Via header if the field contains different address than the sender's source address. This feature supports NAT servers. Proxies can also forward the request as multicast by adding "maddr" parameter in the Via field.
• Route header is used for routing all requests of a call leg along the same path, which was recorded in the Record-Route header during the first request. This is to guarantee that stateful proxies will receive all the subsequent messages that affect the call state.

SIP proxies can also fork the incoming request to several outgoing requests in order to accelerate the processing of INVITE method. The forking can create several simultaneous unicast INVITEs to the potential locations or one multicast INVITE to a restricted subnetwork. Even if forking is an efficient mechanism it is a potential source of difficult problems and needs to be paid special attention during implementation.

2.5 Registering
A client uses REGISTER method to bind its permanent address to one or more physical addresses where the client can be reached. The request is sent to the registrar, which is typically co-located with a proxy server. Alternatively the request can be sent to the well-known SIP multicast address "sip.mcast.net".

REGISTER method is also ideally suited for configuration and exchange of application layer data between a user agent and its proxy. This may produce modest amounts of data exchanges. However, because of the infrequency of such exchanges and their typical limitation to one-hop this is acceptable if TCP is used.

The most important fields for the REGISTER method:

• Request-URI names the domain of the registrar. user part must be empty.
• To indicates the user to be registered
• From indicates the user responsible for the registration (typically equal to To header value)
• Contact (optional) indicates the address(es) of the user's current location. List of current locations can be queried by leaving the Contact header empty in the REGISTER request. An optional expires parameter indicates the expiration time of the particular registration. By giving the wildcard address "*" in a single contact header a client can remove all the registrations. By giving zero as the value for the expires parameter a client can remove the corresponding registration.
• Expires tell the default value for expiration unless the corresponding parameter is present in the Contact header. If neither one is present default value of one hour is used.

Obviously, standard IPSec protocol can be used for IP level encryption.

2.7 Expandability
In order to keep the basic protocol compact SIP provides the protocol designers with means for extending its capabilities. Protocol elements that can be extended without change in the protocol version include:

• Methods
• Entity headers
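Returning to the REGISTER fields of Section 2.5, the expiry rules (the per-contact expires parameter, the Expires header as default, one hour when neither is present, zero to remove one binding, "*" to remove all, and an empty Contact list as a query) can be sketched as a small in-memory registrar. The class and method names are hypothetical, and the sketch follows the simplified rules as stated in this paper rather than the full SIP specification.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_EXPIRES_S = 3600  # one hour, used when nothing else is given


class Registrar:
    """Toy binding table: permanent address -> {contact address: lifetime}."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Dict[str, int]] = {}

    def register(
        self,
        to_user: str,
        contacts: List[Tuple[str, Optional[int]]],
        expires_header: Optional[int] = None,
    ) -> Dict[str, int]:
        """Apply one REGISTER and return the user's current bindings.

        contacts holds (address, expires parameter or None) pairs; an
        empty list only queries the current locations.
        """
        current = self.bindings.setdefault(to_user, {})
        for address, expires_param in contacts:
            if expires_param is not None:
                lifetime = expires_param       # Contact parameter wins
            elif expires_header is not None:
                lifetime = expires_header      # Expires header is the default
            else:
                lifetime = DEFAULT_EXPIRES_S
            if address == "*":
                current.clear()                # wildcard removes everything
            elif lifetime == 0:
                current.pop(address, None)     # zero removes this binding
            else:
                current[address] = lifetime
        return dict(current)
```

A registrar built this way would return the full binding list on every request, so a client sending no Contact at all simply reads back its current locations.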
domain name or register the new option with the Internet Assigned Numbers Authority (IANA).

Clients can always call the OPTIONS method for explicitly querying the capabilities of the server and proxies lying on the path.

Since there are multiple ways to define a SIP extension special attention needs to be paid on the semantic compliance with the basic protocol. An informational Internet draft sets the guidelines for writing a SIP extension [5].

3 Protocol Extensions
[...] like VoIP. Examples of these are "reliable provisional responses", "resource management" and "INFO method". Some extensions add functionality for implementing existing PBX services, like call transfer. Examples are "call control-transfer" and "caller identity and privacy". Some extensions add new functionality for enabling new type of services, like presence based instant messaging. Examples are "event notification" and "caller preferences". Finally some extensions add resilience to the basic protocol for implementing reliable and scalable networks. Examples are "session timer" and "distributed call state".

3.1 Reliable provisional responses
When run over UDP, SIP does not guarantee that provisional responses (1xx) are delivered reliably, or in order. However, many applications like gateways wireless phones and call queuing systems make use of the provisional responses to drive state machinery. This is especially true for the 180 Ringing provisional response, which maps to the Q.931 ALERTING message.

The Internet draft document [6] specifies an extension to SIP for providing reliable provisional response messages ("100rel"). When a server generates a provisional response which is to be delivered reliably, it places a random initial value for the sequence number (RSeq). The response is then retransmitted with an exponential backoff like a final response to INVITE.

The client uses a new method (PRACK) for acknowledging the provisional response. Unlike ACK, which is end-to-end, PRACK is a normal SIP message, like BYE. Its reliability is ensured hop-by-hop through each stateful proxy. PRACK has its own response and therefore existing proxy servers need no modifications. A new header (RAck) in the PRACK message indicates the sequence number of the provisional response, which is being acknowledged.

The following diagram demonstrates how the support and need for reliable provisional response is negotiated and implemented.

[Figure 7 diagram labels, UAC and UAS: "INVITE sip:uas@host SIP/2.0" with "Supported: 100rel"; "RAck: 776655 1 INVITE"; "Retransmission algorithm starts (retransmission of 180)"; "Retransmission algorithm stops"; "(retransmission of PRACK)"; "SIP/2.0 200 OK (for PRACK)"; "Retransmission algorithm stops"]
Figure 7. Reliable provisional response.

3.2 Resource Management
In order to become a successful service Internet telephony must meet the quality expectations based on the existing telephony services. This implies that the resources must be reserved beforehand for each call. Cooperation is therefore needed between call signaling, which controls access to telephony specific services, and resource management, which controls access to network-layer resources

The Internet draft document [10] discusses how network QoS and security establishment can be made a precondition to sessions initiated by SIP, and described by SDP. These preconditions require that the participant reserve network resources or establish a secure media channel before continuing with the session. In practical terms the "phone won't ring" until the preconditions are met. The draft proposes new attributes for SDP:

• "a=qos:" strength-tag SP direction-tag
• "a=secure:" SP strength-tag SP direction-tag

where the strength can have values "mandatory", "optional", "success" and "failure" and the direction can have values "send", "recv" and "sendrecv".

The document also proposes a new method to SIP. The COMET method is used to confirm the completion of all preconditions by the session originator. The following diagram presents the message flow for a single-media session setup with a "mandatory" quality-of-service "sendrecv" precondition, where both the UAC and UAS can only perform a single-direction ("send") resource reservation.

     UAC              SIP-Proxy(s)              UAS
      |      INVITE        |                     |
      |------------------->|-------------------->|
      |                    |                     |
      |     183 w/SDP      |      183 w/SDP      |
      |<-------------------|<--------------------|
      |                                          |
      |                  PRACK                   |
      |----------------------------------------->|
      |            200 OK (of PRACK)             |
      |<-----------------------------------------|
      |   Reservation            Reservation     |
      |   ===========>           <===========    |
      |                                          |
      |                  COMET                   |
      |----------------------------------------->|
      |            200 OK (of COMET)             |
      |<-----------------------------------------|
      |                                          |
      |               SIP-Proxy(s)          User Alerted
      |                    |                     |
      |    180 Ringing     |     180 Ringing     |
      |<-------------------|<--------------------|
      |                                          |
      |                  PRACK                   |
      |----------------------------------------->|
      |            200 OK (of PRACK)             |
      |<-----------------------------------------|
      |                                          |
      |               SIP-Proxy(s)     User Picks-Up the phone
      |                    |                     |
      |      200 OK        |       200 OK        |
      |<-------------------|<--------------------|
      |                                          |
      |                   ACK                    |
      |----------------------------------------->|

Figure 8. Resource management signaling.

The session originator (UAC) prepares an SDP message body for the INVITE describing the desired QoS and security preconditions for each media flow, and the desired direction "sendrecv." This SDP is included in the INVITE message sent through the proxies, and includes an entry "a=qos:mandatory sendrecv." The recipient of the INVITE (UAS), returns a 183-Session-Progress provisional response containing SDP, along with the qos/secure attribute for each stream having a precondition. The UAS now attempts to reserve the qos resources and establish the security associations. The 183-Session-Progress is received by the UAC, and the UAC requests the resources needed in its "send" direction, and establishes the security associations.

The diagram also demonstrates the usage of PRACK and COMET methods for confirming the responses and resource allocations respectively.

3.3 INFO method
The SIP INVITE method can be called one or more times during the established session (re-INVITE) to change the properties of media flows or to update the SIP session timer. However, there is no general-purpose mechanism to carry session control information along the SIP signaling path during the session.

RFC2976 [14] defines the INFO method for communicating mid-session information during the call. It is not used to change the state of the session but it provides means for exchanging additional information between the peers. One example of such session control information is ISUP and ISDN signaling messages used to control telephony call services.

The information can be conveyed either in the header of the INFO message or as part of the message body. The definition of the message body and/or message headers used to carry the mid-session information is outside the scope of this document. However, consideration should be taken on the size of message bodies since they can be fragmented while carried over the UDP bearer.

3.4 Call Control - Transfer
The basic SIP protocol does not support any of the multiple ways a call can be transferred to a third party. In an "unattended transfer" the transferor is not participating in the call simultaneously with the transferee and transfer target whereas in an "attended transfer" the three actors participate in the call simultaneously (ad-hoc conference). In a "consultation hold transfer" the transferor establishes and terminates a second call with the transfer target before performing the actual transfer.

The Internet draft document [11] proposes a SIP extension, which can be used, for example, to implement traditional unattended and consultation hold transfers. The attended transfer is not drafted yet since the call control framework has not addressed
conferencing. The following figure presents the message sequence of unattended transfer with consultation hold.

[Figure 9 diagram: Transferor, Transferee, Transfer Target. Exchanges in order: INVITE/200/ACK; INVITE(hold)/200/ACK (call put on hold); INVITE/200/ACK (consultation); BYE/200; REFER/202 Accepted; INVITE/200/ACK; NOTIFY/200; BYE/200 (call terminated); BYE/200]
Figure 9. Unattended call transfer with consultation on hold.

The new REFER method indicates that the recipient (Request-URI) should contact a third party identified by the contact information (Refer-To). Once the transferee knows whether the transfer succeeded or failed it notifies the transferor by sending "refer" event using the NOTIFY mechanism as if the REFER message had established a subscription.

3.5 Caller Identity and Privacy
In order for SIP to be a viable alternative to the current PSTN, it must support certain telephony services including Calling Identity Delivery, Calling Identity Delivery Blocking, as well as the ability to trace the originator of a call. While SIP can support each of these services independently, certain combinations cannot be supported. The issue of IP address privacy for both the caller and callee needs to be addressed as well.

The Internet draft document [12] specifies two extensions to SIP that allow the parties to be identified by a trusted intermediary while still being able to maintain their privacy. A new general header, Remote-Party-ID, identifies each party. Different types of party information can be provided, e.g. calling, or called party, and for each type of party, different types of identity information, e.g. subscriber, or terminal, can be provided. Another new general header, Anonymity, is also defined for hiding the IP addresses from the other parties.

3.6 Caller preferences
When a SIP server receives a request, there are at least three parties who have an interest and each of which should have the means for expressing its policy:
• The administrator of the server, whose directives can be programmed in the server.
• The callee, whose directives can be expressed most easily through a script written in the call processing language (CPL)
• The caller, who doesn't have obvious ways to express the preferences within the SIP server.

The Internet draft document [9] specifies an extension mechanisms by which the caller can provide its preferences for processing a request. These preferences include the ability to select which URIs a request gets proxied or redirected to, and to specify certain request handling directives in proxies and redirect servers. It does so by defining three new request headers, Accept-Contact, Reject-Contact and Request-Disposition. The extension also defines new parameters for the Contact header that describe attributes of a UA at a specified URI.

3.7 Event Notification
The ability to request asynchronous notification of events is useful in many types of services. Examples include automatic callback services (based on terminal state events), buddy lists (based on user presence events), message waiting indications (based on mailbox state change events), and PINT status (based on call state events).

The Internet draft document [13] proposes a framework by which notification of events can be ordered. The draft can't be used directly, i.e. it doesn't specify any event types and it must be extended by other specifications (event packages). In object-oriented terminology, this is an abstract base class which must be derived into an instantiatable class by further extensions.

The extension is based on two new methods: SUBSCRIBE and NOTIFY and a new header "Event" together with the "Expires" header. Neither SUBSCRIBE nor NOTIFY necessitates the use of "Require" or "Proxy-Require" header and no extension token is defined for "Supported" header. Clients may probe for the support of SUBSCRIBE and NOTIFY using the OPTIONS method.

There is no separate media transmission between the subscriber and notifier as in normal SIP session. The message body of the NOTIFY method is to carry the actual notification.
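The abstract-base-class analogy used above for the event notification framework can be made concrete with a small sketch. Everything named here (`EventPackage`, `PresencePackage`, the callback signature, the "presence" event name) is hypothetical and only models the idea: the framework handles subscriptions and notifications, while a derived event package supplies the event type and state. Treating a zero expiry as removal and sending an immediate notification on subscription are assumptions modelled on the REGISTER-like behaviour the paper describes.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

NotifyCallback = Callable[[str, str], None]  # (event name, state) -> None


class EventPackage(ABC):
    """Framework part: subscription handling without any event type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, NotifyCallback] = {}

    @abstractmethod
    def event_name(self) -> str:
        """Value a concrete package would place in the Event header."""

    @abstractmethod
    def current_state(self) -> str:
        """State carried in the body of a notification."""

    def subscribe(self, subscriber: str, on_notify: NotifyCallback,
                  expires_s: int) -> None:
        if expires_s == 0:
            # Assumption: zero expiry removes the subscription.
            self._subscribers.pop(subscriber, None)
            return
        self._subscribers[subscriber] = on_notify
        # Assumption: the notifier answers with an immediate notification.
        on_notify(self.event_name(), self.current_state())

    def notify_all(self) -> None:
        for on_notify in list(self._subscribers.values()):
            on_notify(self.event_name(), self.current_state())


class PresencePackage(EventPackage):
    """Hypothetical concrete event package for user presence."""

    def __init__(self) -> None:
        super().__init__()
        self.state = "offline"

    def event_name(self) -> str:
        return "presence"

    def current_state(self) -> str:
        return self.state

    def set_state(self, state: str) -> None:
        self.state = state
        self.notify_all()
```

Trying to instantiate `EventPackage` directly fails, which mirrors the statement that the draft cannot be used without an event package built on top of it.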
Removing and refreshing subscriptions are performed in the same way as for REGISTER method. Usage of the message body in SUBSCRIBE request is left up to the concrete extensions. It may be used to filter and set thresholds for the events.

The basic scenario of a notification session is presented in the following figure. Note that according to the SIP principle proxies need no additional behavior to support SUBSCRIBE and NOTIFY methods but they can act as subscribers and notifiers.

[Figure diagram, Subscriber and Notifier: SUBSCRIBE, 200, "Generate immediate state response", NOTIFY, 200; caption not recovered]

The Internet draft document [7] specifies the session timer extension ("timer") for solving the problem and improving the reliability of the basic SIP protocol. UAC, UAS and proxies communicate the support for the extension and assign the responsible party (UAC or UAS) for sending the re-INVITEs in the original INVITE message. If UAC supports the extension it sets "timer" in the Supported header and if it wants to turn the extension on it sets the refresh interval in Session-Expires header. UAC will then be responsible for sending the re-INVITEs. A proxy may adjust the refresh interval to a smaller value and also require (Proxy-Require) UAS to send the re-INVITEs in case UAC does not support the extension. If a re-INVITE is not received before the refresh interval passes, the session is considered terminated, and call stateful proxies can release the session.
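The session timer rules described for the "timer" extension (a proxy may only lower the refresh interval, the UAC refreshes when it supports the extension and the UAS otherwise, and a session without a re-INVITE inside the interval is considered terminated) can be summarised in a short sketch. The function names and the use of plain seconds are illustrative assumptions, not part of the draft.

```python
from typing import Iterable


def negotiated_interval(requested_s: int, proxy_limits_s: Iterable[int]) -> int:
    """Refresh interval after every proxy had the chance to lower it."""
    return min([requested_s, *proxy_limits_s])


def refresher(uac_supports_timer: bool) -> str:
    """Party responsible for sending the re-INVITEs."""
    return "UAC" if uac_supports_timer else "UAS"


def session_expired(last_refresh_s: float, now_s: float, interval_s: int) -> bool:
    """True once a full refresh interval passed without a re-INVITE."""
    return now_s - last_refresh_s >= interval_s
```

For example, a UAC asking for 1800 seconds through proxies that allow at most 3600 and 600 seconds would end up with a 600 second interval, and a call stateful proxy could release the session once `session_expired` becomes true.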
INVITE with the same Request-URI but connect to the host indicated by the maddr parameter.

Presence is defined as user's reachability, capabilities

based, for example, on user's location and caller's identity.

[Figure residue: CPIM Presence System; legend: SIP protocol, Non-SIP protocol]
The presence agent (PA) is capable of storing the subscriptions and generating notifications based on the events. The presence user agent (PUA) updates presence information.

Authorization is a critical component of a presence protocol. Authorization can be pushed to the server ahead of time or, more typically, determined at the time of subscription. Since this is not covered by the basic SIP protocol, an Internet draft [17] proposes a new method (QAUTH) for querying the authorization from the subscription authorizer (e.g. PUA). This draft seems to be open to debate, however.

The IM protocol extensions are defined in the Internet draft [18]. When a user wishes to send an instant message to another, the sender issues a SIP request using the new MESSAGE method. The request URI can be either an "im:" URL or a normal SIP URL. The body of the request contains the message to be delivered. Provisional and final responses will be returned to the sender as with any other SIP request. The following diagram shows two message exchanges between two users.

User1 -> Proxy:
    MESSAGE im:user2@domain.com SIP/2.0
    From: im:user1@domain.com
    To: im:user2@domain.com
    Contact: sip:user1@user1pc.domain.com

Proxy -> User2:
    MESSAGE sip:user2@domain.com SIP/2.0
    From: im:user1@domain.com
    To: im:user2@domain.com
    Contact: sip:user1@user1pc.domain.com

User2 -> Proxy:
    SIP/2.0 200 OK
    From: im:user1@domain.com
    To: im:user2@domain.com;tag=ab8asdasd9
    Contact: sip:user2@user2pc.domain.com

Proxy -> User1:
    SIP/2.0 200 OK

User2 -> User1:
    MESSAGE sip:user1@user1pc.domain.com
    From: im:user2@domain.com;tag=ab8asd9
    To: im:user1@domain.com
    Contact: sip:user2@user2pc.domain.com

User1 -> User2:
    SIP/2.0 200 OK

Figure 12. Instant messaging between users in the same domain.

Proxy looks up the registration database for the binding from the im address to the sip address of User2 and forwards the message to the current location. The response traverses the same path. Based on the Contact header of the message, User2 can send the second message directly to User1's current location because Proxy added no Record-Route header in the first message. The From and To headers are reversed, however.

The specifications for presence and instant messaging are still rather insufficient. This is indicated by the long list of open issues listed in the drafts.

The semantic difference between the presence and IM protocols and the basic SIP protocol is in the type of session they create. The presence protocol creates a passive session which is used asynchronously for notifying the subscriber using the signaling channel without any media channel. Establishment and termination of the session are done differently from the basic protocol. IM does not create a session at all, which is currently discussed in the working group. Surrounding the related MESSAGE requests with INVITE and BYE requests would be consistent with the basic protocol.

5 Conclusion

Simplicity is a key characteristic of SIP. It facilitates interoperable clients, servers and proxies coming from independent vendors. Sharing a lot of similarities with HTTP makes the understanding of SIP rather easy for a large developer community.

Expandability is another key characteristic. Being inbuilt in the basic protocol it provides the means for extending the protocol capabilities. Network elements can dynamically negotiate their capabilities. The basic protocol specification can concentrate on its primary function.

Supporting different protocols for different purposes is yet another key characteristic of SIP. This facilitates protocol development independence between SIP and other protocols and makes the overall adoption of SIP more likely.

A lot of SIP related development activities are going on in IETF (over 70 drafts). This is evidence of its potential on one hand but evidence of its immaturity on the other hand. The potential is demonstrated by the application examples presented in this paper. The immaturity for IP telephony is demonstrated by the large number of suggested extensions described in this paper that are fundamental for this area.

Many extensions seem to be very useful and easy to specify at first sight. However, they may not share the semantics of the basic protocol and should not be defined as a SIP extension. The ability of IETF to respond to the needs and at the same time control the specification work will be tested in the near future.
"draft" state after being in "proposed" state over two years. At the same time 3GPP is stating its requirements for SIP in the IP multimedia subsystem of 3G. If these requirements are not included in the IETF specifications the risk of SIP fragmentation may come true.

References
[1] Internet Draft, SIP WG, Handley/Schulzrinne/Schooler/Rosenberg: SIP: Session Initiation Protocol, November 24, 2000, http://www.ietf.org/internet-drafts/draft-ietf-sip-rfc2543bis-02.txt
[2] RFC2327, Network WG, M. Handley, V. Jacobson: SDP: Session Description Protocol, April 1998, http://www.ietf.org/rfc/rfc2327.txt
[3] RFC2617, Network WG, J. Franks, et al: HTTP Authentication: Basic and Digest Access Authentication, June 1999, http://www.ietf.org/rfc/rfc2617.txt
[4] RFC2440, Network WG, J. Callas, et al: OpenPGP Message Format, November 1998, http://www.ietf.org/rfc/rfc2440.txt
[5] Internet Draft, SIP WG, J. Rosenberg, H. Schulzrinne: Guidelines for Authors of SIP Extensions, March 5, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-guidelines-02.txt
[6] Internet Draft, SIP WG, J. Rosenberg, H. Schulzrinne: Reliability of Provisional Responses in SIP, March 2, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-100rel-03.txt
[7] Internet Draft, SIP WG, S. Donovan, J. Rosenberg: The SIP Session Timer, November 22, 2000, http://www.ietf.org/internet-drafts/draft-ietf-sip-session-timer-04.txt
[8] Internet Draft, SIP WG, W. Marshall, et al: SIP Extensions for supporting Distributed Call State, February, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-state-01.txt
[9] Internet Draft, SIP WG, Schulzrinne/Rosenberg: SIP Caller Preferences and Callee Capabilities, November 24, 2000, http://www.ietf.org/internet-drafts/draft-ietf-sip-callerprefs-03.txt
[10] Internet Draft, SIP WG, W. Marshall, et al: Integration of Resource Management and SIP, February, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-manyfolks-resource-01
[11] Internet Draft, R. Sparks: SIP Call Control – Transfer, February 26, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-cc-transfer-04.txt
[12] Internet Draft, SIP WG, W. Marshall, et al: SIP Extensions for Caller Identity and Privacy, February, 2001, http://www.ietf.org/internet-drafts/draft-ietf-sip-privacy-01.txt
[13] Adam Roach: Event Notification in SIP, Internet Draft, February 2001, http://www.ietf.org/internet-drafts/draft-roach-sip-subscribe-notify-03.txt
[14] RFC2976, Network WG, S. Donovan: The SIP INFO Method, October 2000, http://www.ietf.org/rfc/rfc2976.txt
[15] RFC2778, Network WG, M. Day, J. Rosenberg, H. Sugano: A Model for Presence and Instant Messaging, February 2000, http://www.faqs.org/rfcs/rfc2778.html
[16] Internet Draft, SIMPLE WG, Rosenberg et al: SIP Extensions for Presence, March 2, 2001, http://www.cs.columbia.edu/sip/drafts/draft-rosenberg-impp-presence-01.txt
[17] Internet Draft, IMPP WG, Jonathan Rosenberg et al: SIP Extensions for Presence Authorization, June 15, 2000, http://www.cs.columbia.edu/sip/drafts/draft-rosenberg-impp-qauth-00.txt
[18] Internet-Draft, J. Rosenberg, et al: SIP Extensions for Instant Messaging, February 28, 2001, http://www.cs.columbia.edu/sip/drafts/draft-rosenberg-impp-im-01.txt
[19] 14 2000, http://www.softarmor.com/sipwg/teams/sipt/index.html
[20] Ericsson: Best Current Practice for ISUP to SIP mapping, IETF, September 2000, http://www.softarmor.com/sipwg/teams/sipt/index.html
[21] Phillips Omnicom: Voice over IP, Phillips Omnicom, July 2000. HERTS SG1 1EL – UK
[22] Srinivas sreemanthula etc: 'RT Hard Handoff Concept for All-IP System, version V1.0.2, and IPMN project.
A transport protocol for SIP
to-end traffic. However, signalling traffic does not consist of large amounts of data. Signalling traffic usually consists of small bursts of information. TCP's flow control mechanisms are not designed for such a traffic pattern, and therefore do not perform as well as might be expected.

Fast retransmit algorithm
When a large bulk of data is being transmitted by TCP, ack messages from the receiver are continuously received indicating which segments have been successfully received. The receiver sends duplicate acks when out-of-order segments arrive. Thus, arrival of duplicate acks indicates that a segment was lost. Therefore, the sender retransmits it without waiting for a timeout. This mechanism is referred to as fast retransmit and it is used together with the fast recovery algorithm.

Sender                  Receiver
   |------ 1:257 -------->
   |<----- ack 257 -------|
   |------ 257:513 ------>
   |------ 513:769 ------>
   |<----- ack 257 -------|
   |------ 257:513 ------>

Sender                  Receiver
   |------ 1:513 -------->
   |     (1.5 secs)
   |------ 1:513 -------->
   |<----- ack 513 -------|

Figure 2 : TCP timeout

TCP connection establishment
TCP performs a three-way handshake before any user data can be transmitted between both ends. In a long-lived connection, the connection establishment time is negligible compared to the whole connection duration. However, signalling traffic is delay sensitive. If a SIP UAC wants to send an INVITE over TCP it will have to wait until the TCP connection is established before sending the INVITE.

Sender                  Receiver
   SYN
   SYN
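As a rough worked example of the connection establishment cost described above, the following Python sketch estimates when the first INVITE reaches the peer. It is an idealised model under assumptions that are not in the text: no packet loss, a symmetric path whose one-way delay is half the round-trip time, and an INVITE that leaves together with the final ACK of the handshake. The function name `invite_arrival_delay` is hypothetical.

```python
def invite_arrival_delay(rtt: float, transport: str) -> float:
    """Idealised time in seconds until the first INVITE reaches the peer."""
    one_way = rtt / 2  # assumption: symmetric path
    if transport == "udp":
        # No connection setup: the INVITE is sent immediately.
        return one_way
    if transport == "tcp":
        # SYN (one way) + SYN/ACK (one way) + ACK with the INVITE (one way).
        return 3 * one_way
    raise ValueError(f"unsupported transport: {transport}")
```

With a round-trip time of 200 ms this gives about 100 ms over UDP and about 300 ms over TCP, before any retransmission is taken into account.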
established SIP messages belonging to a new SIP session are not affected by any additional delay. They can be sent immediately.

A SIP UAC usually handles a single SIP session, but proxies in the network have several ongoing SIP sessions between them at the same time. Therefore, proxies handling a high number of SIP sessions can typically take advantage of bundling SIP sessions. Another example where bundling can be performed is between a large gateway towards the PSTN and its outbound proxy.

Byte stream service
However, TCP presents an important limitation regarding bundling of sessions. TCP provides ordered delivery of a stream of bytes. When TCP is used to transmit messages it preserves the order in which the messages were sent by the sender. This property causes interaction problems between different SIP sessions carried on a single TCP connection.

Sender                  Receiver
   |------ 1:513 -------->
   |------ 513:1025 ----->
   |<----- ack 1 ---------|
   |------ 1:513 -------->    TCP delivers 1:1025

Figure 4 : TCP provides ordered delivery

Since UDP does not provide reliable transport, reliable delivery is achieved through application level retransmissions. The SIP application retransmits a particular SIP message when the retransmission timer expires. This retransmission timer is shorter than in TCP. Its default value is 0.5 seconds. Therefore, the retransmission policy of SIP when it runs over UDP is more aggressive than when it runs over TCP.

Sender                  Receiver
   |------ INVITE ------->
   |     (0.5 secs)
   |------ INVITE ------->

Figure 5 : SIP retransmission policy using UDP

SIP can afford to have a more aggressive retransmission policy over UDP than TCP because it transmits a small number of small messages. Therefore, SIP assumes that retransmitting these messages more often than TCP would is not going to congest the network.

Therefore, when a single or a small number of SIP sessions are handled, UDP is a better choice than TCP. However, UDP, as opposed to TCP, does not hide retransmissions from the application layer. Thus, although a SIP application using UDP has to store more state information than when TCP is used, this does not represent an important issue for most of the applications.
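The application-level retransmission timer described above can be sketched in Python as a simple schedule. Only the 0.5 second initial value comes from the text; the doubling of the interval, the 4 second cap and the number of retransmissions are assumptions chosen for illustration, and `retransmission_times` is a hypothetical helper rather than part of any SIP library.

```python
def retransmission_times(initial: float = 0.5, cap: float = 4.0,
                         count: int = 6) -> list:
    """Return the times (seconds after the first send) of each retransmission.

    The interval starts at `initial` and doubles after every retransmission
    until it reaches `cap` (both the doubling and the cap are assumptions).
    """
    times = []
    elapsed = 0.0
    interval = initial
    for _ in range(count):
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, cap)
    return times
```

Under these assumptions the first retransmission leaves after 0.5 seconds, well before the 1.5 second TCP timeout shown in Figure 2, while the back-off keeps a lossy path from being flooded.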
4 SCTP

The Stream Control Transmission Protocol (SCTP) is intended to resolve the issues derived from the use of TCP and UDP when there are multiple SIP sessions between sender and receiver. SCTP [4] also provides a certain level of fault tolerance through multihoming.

4.1 SCTP connection establishment
SCTP is a connection oriented transport protocol. In SCTP terminology, a connection is referred to as an association. An association is established through a four-way handshake in which the last two messages can already carry user data.

Sender                  Receiver
   |------- INIT -------->|
   |<----- INIT ACK ------|
   |---- COOKIE ECHO ---->|
   |<---- COOKIE ACK -----|

Figure 6 : SCTP four-way handshake

In this handshake end users exchange one or multiple IP addresses or host names. One destination address will be marked as the primary. The rest of them will be used in case the primary destination becomes unavailable. This feature, known as multihoming, allows an SCTP connection to survive network failures. The data is just sent to another destination address in case of failure.

The four-way handshake also provides a certain level of protection against resource attacks. The receiver, upon reception of an INIT message, sends back a cookie in the INIT ACK. The receiver does not allocate any resources for this SCTP association until it receives the same cookie in the COOKIE ECHO message. This way, resources are allocated only when it is ensured that the party sending the INIT message is really willing to establish an SCTP association.

4.2 Multiple streams within an association

SCTP provides multiplexing/demultiplexing capabilities within an association. A single association can contain several streams. Each stream is identified by its stream id. During the four-way handshake the number of streams in both directions is negotiated.

An association can contain several types of streams. The base SCTP specification [4] defines two services: reliable ordered delivery and reliable unordered delivery. However, there are extensions [7] that provide an unreliable delivery service.

It is important to note that a particular service is provided on a per-stream basis. Therefore, one stream within an association might be an ordered stream while another is unordered.

4.3 Flow and congestion control per association

Even if an association contains several streams, SCTP performs flow and congestion control per association. This allows the behavior of all the traffic within the association to be used as input for the flow control mechanisms, which are effectively very similar to the ones used by TCP.

For instance, the fast retransmit algorithm can be used effectively without waiting for timeouts in order to retransmit data. Figure 7 shows how stream demultiplexing and flow control work together in an example.

Sender                                        Receiver
   |--- TSN=1, Stream id=0, Stream seq=0 --->    Chunk delivered for stream id=0
   |--- TSN=2, Stream id=1, Stream seq=0 --->
   |--- TSN=3, Stream id=0, Stream seq=1 --->    Chunk delivered for stream id=0
   |<-- SACK TSN=1 --------------------------
   |--- TSN=2, Stream id=1, Stream seq=0 --->

Figure 7 : Multiple streams within an association

The association of figure 7 consists of two ordered streams (stream id=0 and stream id=1). SCTP implements a general sequence number space (Transmission Sequence Number) and a sequence number space per stream. The general TSN is used to perform flow control and packet loss recovery and the stream sequence numbers are used to deliver individual streams.

When the message with TSN=3 arrives at the receiver, the receiver knows that TSN=2 is lost. However, it also knows that TSN=3 is the next packet of stream id=0 (Stream seq=1). Therefore, it delivers the packet to the application without waiting to receive TSN=2. In the SACK (Selective ACK) the receiver reports that TSN=2 was missing.

Therefore, losses in one stream do not introduce delay on other streams. Besides, since the whole association is used to perform flow control, the sender detects that TSN=2 got lost thanks to the SACK sent upon
reception of TSN=3, which belongs to a different stream. This way, SCTP does not have to wait for a timeout to retransmit TSN=2.

So, SCTP combines good features of both TCP and UDP. It bundles streams to take advantage of flow control mechanisms and delivers packets belonging to different streams separately.

Unordered service for final responses
In order to overcome this problem SIP final responses can be sent using the SCTP unordered service. SCTP allows unordered messages to be sent within an ordered stream. Therefore, all SIP messages within a SIP session are still sent using the same stream, but messages carrying final responses are sent with the SCTP unordered flag set.
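The rule described above, one stream per SIP session with the unordered flag set only for final responses, can be sketched in Python as follows. This is an illustrative sketch, not an SCTP API: the helper names are hypothetical, mapping a session to a stream by hashing its Call-ID is an assumption, and a final response is taken to be any SIP response with a status code of 200 or higher.

```python
import zlib


def is_final_response(start_line: str) -> bool:
    """True if the SIP start line is a response with status code >= 200."""
    parts = start_line.split()
    if len(parts) >= 2 and parts[0].startswith("SIP/") and parts[1].isdigit():
        return int(parts[1]) >= 200
    return False  # requests and provisional (1xx) responses


def sctp_send_options(call_id: str, start_line: str, num_streams: int):
    """Return (stream_id, unordered) for one SIP message.

    Every message of a session maps to the same stream (assumption: CRC32 of
    the Call-ID modulo the negotiated number of streams).
    """
    stream_id = zlib.crc32(call_id.encode("utf-8")) % num_streams
    return stream_id, is_final_response(start_line)
```

An INVITE, a 180 response and a 200 response sharing a Call-ID all map to the same stream, and only the 200 is marked unordered.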
However, there are just a few scenarios where this can happen.

Provisional responses
Provisional responses are sent unreliably by SIP. SIP systems do not rely on provisional responses to drive any protocol state machine. Therefore, receiving out of order provisional responses does not represent a problem for SIP UAs.

When a SIP UA is interested in provisional responses it uses the extension defined in [9]. Then, provisional responses are transmitted reliably. [9] recommends that SIP servers sending provisional responses do not send subsequent responses until the previous one has been acknowledged with a PRACK. Thus, using ordered or unordered SCTP to transport provisional responses does not make a difference, since the SIP layer ensures that they are received in order.

Client                          Server
   |--------- INVITE ------------->|
   |<--- 182 Two in the Queue -----|
   |--------- PRACK -------------->|
   |<-------- 200 OK --------------|

These two situations are the only ones where both uses of SCTP described previously differ. If the requests are sent unordered, a CANCEL or a BYE might overtake the INVITE sent before. Ordered SCTP ensures that they arrive in the same order as they were sent. However, this is only ensured in the part of the path where ordered SCTP is used. If another transport protocol such as UDP is used in another part of the path, reordering can still happen. Therefore, even systems using ordered SCTP have to be prepared to handle out of order CANCELs and BYEs. Figure 10 shows how a system using ordered SCTP might still receive out of order requests.

[Figure 10 residue: Proxy 1, Proxy 2 and Proxy 3 exchanging INVITE and BYE requests; one hop uses UDP, the other an ordered transport]
transport protocols deliver a message to the application it contains a single SIP message. In order to parse a SIP message received over TCP it is necessary to implement application level boundaries such as the SIP Content-Length header.

Transport-layer fragmentation
However, although both SCTP and UDP are message-oriented transport protocols, SCTP has an advantage over UDP. SCTP implements transport-level fragmentation while UDP does not. If a SIP message inside a UDP packet is larger than the path MTU the packet will be fragmented at the IP layer.

IP-layer fragmentation presents several problems. The likelihood of having packet losses increases, and firewall and NAT traversal becomes impossible. The fragments of the UDP packet do not carry the UDP header, which contains the source and the destination port number of the UDP packet. Therefore, network devices that need to examine port numbers will simply discard the packets.

SCTP implements transport-layer fragmentation. Messages larger than the path MTU are transported in different SCTP chunks. Every chunk carries complete transport information, and thus, problems derived from IP fragmentation are avoided. Different chunks are reassembled at the destination and delivered to the [...]

| Common header | Chunk 1 | [...] | Chunk n |

Figure 11 : SCTP message format

Therefore, a single SCTP packet can carry several SIP messages that belong to different sessions. Bundling SCTP chunks decreases the number of packets sent through the network. This avoids certain congestion problems in IP routers and typically achieves a better performance than sending various individual packets.

Multihoming
SCTP provides several source and destination addresses within an association. They are intended to provide alternative paths to be used in case of network failures. This feature increases the reliability of an association.

Multiple destination addresses are not intended to provide a load balancing mechanism. SCTP marks one address as the primary, and all the traffic is routed to that address until it fails. Other mechanisms such as DNS SRV [8] records might be used to provide load balancing. SCTP multihoming just provides a failover mechanism.

5.3 A single SIP session over SCTP
It is clear that SIP entities that handle a large amount of SIP traffic between them can take advantage of SCTP and all its features. However, SCTP advantages are not evident when a single SIP session (or a small number of them) is transported. In this scenario SCTP shares some problems that TCP has. SCTP association establishment delays the delivery of the first INVITE, and once the association is established, SCTP timeouts are more conservative than the ones used by SIP over UDP. The initial value for the SCTP retransmission timer is 3 seconds and even when RTT measurements are performed its minimum value is 1 second.

[Figure residue: Sender, Receiver, INVITE, 3 secs]

6 Conclusions

The best transport protocol for SIP depends on the amount of SIP traffic that a particular SIP entity handles. SIP entities that handle a large amount of SIP traffic between them, such as proxies and large SIP gateways, have in SCTP their best choice. SCTP bundles together several SIP sessions into a single SCTP association and then performs flow and congestion control per association. This way, packet losses are detected before retransmission timers expire, leading to an increase in the overall performance. Among all the possible services provided by SCTP, unordered delivery and ordered delivery with unordered final responses are the ones that suit SIP better.
However, SIP entities that handle a small number of SIP sessions, such as the SIP UA of an individual user, cannot take advantage of the flow control provided by SCTP. When a small number of SIP messages are transported over SCTP, packet losses are detected by timeouts. This leads to an overly conservative retransmission policy, since timers in SCTP are not designed for situations where the traffic load is very low. Therefore, small SIP entities have in UDP their best choice. UDP does not introduce any connection establishment time and retransmits lost packets in a more aggressive way than SCTP. However, since SIP applications using UDP do not perform any congestion control other than implementing a back-off retransmission timer, the use of UDP is not recommended for high volumes of SIP traffic.

While TCP is an excellent protocol for transferring large amounts of data such as files or the contents of a particular web page, it presents important limitations regarding signalling transport. Therefore, depending on the SIP entity, UDP or SCTP are better choices to transport SIP signalling.

Acronyms
ACK: Acknowledgement
DNS: Domain Name System
HTTP: HyperText Transfer Protocol
IP: Internet Protocol
ISDN: Integrated Services Digital Network
ISUP: ISDN User Part Protocol
MTU: Maximum Transmission Unit
NAT: Network Address Translator
PRACK: Provisional ACK
PSTN: Public Switched Telephone Network
RTT: Round Trip Time
SACK: Selective ACK
SCTP: Stream Control Transmission Protocol
SIGTRAN: Signalling Transport
SIP: Session Initiation Protocol
SYN: Synchronize sequence numbers flag
TCP: Transmission Control Protocol
TSN: Transmission Sequence Number
UA: User Agent
UAC: User Agent Client
UDP: User Datagram Protocol

References
[1] Handley M., Schulzrinne H., Schooler E., Rosenberg J., “SIP: Session Initiation Protocol”, RFC 2543. IETF. March 1999.
[3] Postel J, “User Datagram Protocol”, RFC 768. IETF. August 1980.
[4] Stewart R., Xie Q., Morneault K., Sharp C., Schwarzbauer H., Taylor T., Rytina I., Kalla M., Zhang L., Paxson V., “Stream Control Transmission Protocol”, RFC 2960. IETF. October 2000
[5] Rosenberg J, Schulzrinne H., “SCTP as a Transport for SIP”, draft-rosenberg-sip-sctp-00.txt. IETF. June 2000. Work in progress.
[6] Fielding R., Gettys J., Mogul J., Frystyk H., Berners-Lee T., “Hypertext Transfer Protocol -- HTTP/1.1”, RFC 2068. IETF. January 1997.
[7] Xie Q., Stewart R., Sharp C., Rytina I., “SCTP Unreliable Data Mode Extension”, draft-ietf-sigtran-usctp-01.txt. IETF. February 2001. Work in progress.
[8] Gulbrandsen A., Vixie P., Esibov L., “A DNS RR for specifying the location of services (DNS SRV)”, RFC 2782. IETF. February 2000.
[9] Rosenberg J., Schulzrinne H., “Reliability of Provisional Responses in SIP”, draft-ietf-sip-100rel-03.txt. IETF. March 2001. Work in progress.
Session Initiation Protocol in 3G
Tuomo Sipilä
Nokia Research Center, Helsinki, Finland
tuomo.sipila@nokia.com
making the system functionality more understandable the PS CN subsystem functionality is briefly illustrated. Also the IMSS linkage with the service subsystem that provides the open service generation is briefly mentioned.

2.1 PS CN Subsystem
The main functions of the PS CN subsystem are to establish and maintain the connection between the terminal and the GGSN, route the IP packets in both directions and do charging. The Packet Switched Core Network subsystem consists of the following GPRS based network elements and functions [4]:
- Serving GPRS Support Node, which maintains the subscription data (identities and addresses) and follows the location of the terminal within the network
- Gateway GPRS Support Node, which maintains the subscription information and allocated IP addresses, and follows the SGSN under which the terminal is.

The PS CN subsystem is connected to the IMSS via Go and Gi interfaces that are located in the GGSN. The Gi interface is the one that is also used for standard Internet access and it is relatively transparent. The Go interface is used for policy control between IM Subsystem and GGSN and packet core. The reason for policy control is to allow the operators to limit the utilisation of the best 3G packet QoS classes to their own IP Multimedia services.

The IP connections between terminal (UE) and GGSN [...] address are allocated.

3 IP Multimedia Subsystem

- hide the operator network topology from users and home/visited network. The network topology is regarded as a key competitive factor between operators
- the resources shall be made available before the destination alerts
- identification of the entities with either SIP URL or E.164 number
- procedures for incoming and outgoing calls, emergency calls, presentation of originator identity, negotiation, accepting or rejecting incoming sessions, suspending, resuming or modifying the sessions
- user shall have the choice to select which session components to reject or accept

3.2 Architecture
The current architecture of the IP Multimedia subsystem showing the functions (March 2001) is in figure 2. Note that several of the illustrated functions can be merged into real network elements. The functions and their purposes are clarified in the following subsections. It should be noted that the standardisation for the system is still ongoing so changes can be expected.

[Figure 2 residue: Applications & Services, legacy mobile signalling network, external IP networks and other IMS networks, PSTN/Legacy/External; functions SCP, R-SGW, CSCF, S-CSCF, P-CSCF, MRF, BGCF, HSS, GGSN, MGCF, T-SGW, MGW; reference points Sc, Ms, Mh, Mc, Mm, Cx, Mw, Go, Mg, Mj, Gi]
3.4 P-CSCF

Proxy Call State Control Function (P-CSCF) performs the following functions:
- Is the first contact point for UE within IM CN subsystem, forwards the registration to the I-CSCF to find the S-CSCF and after that forwards the SIP messages between UE and I-CSCF/S-CSCF
- Behaves like a proxy in RFC 2543 [9], i.e. accepts requests and services them internally or forwards them, possibly after translation
- may behave also like a RFC 2543 [9] User agent, i.e. in abnormal conditions it may terminate and independently generate SIP transactions
- is discovered using DHCP during registration or the address is sent with PDP context activation
- may modify the URI of outgoing requests according to the local operator rules (e.g. perform number analysis, detect local service numbers)
- detect and forward emergency calls to local S-CSCF
- generation of charging information
- maintains security association between itself and UE, also provides security towards S-CSCF
- provides the policy control function (PCF)
- authorisation of bearer resources, QoS management and security issues are currently open in standardisation [3].

3.5 I-CSCF
Interrogating Call State Control Function (I-CSCF) performs the following functions:
- is the contact point within an operator's network for all connections destined to a subscriber of that network operator, or a roaming subscriber currently located within that network operator's service area. It can be regarded as a kind of firewall between the external IMSS and the operator's internal IMSS network. There may be multiple I-CSCFs within an operator's network
- Assigns a S-CSCF to a user performing SIP registration
- Route a SIP request received from another network towards the S-CSCF
- Obtains from HSS the Address of the S-CSCF
- charging and resource utilisation
- in performing the above functions the operator may use I-CSCF to hide the configuration, capacity, and topology of its network from the outside
- additional functions related to inter-operator security are for further study

3.6 S-CSCF

Serving Call State Control Function (S-CSCF) performs the following functions:
- performs the session control services for the terminal. Within an operator's network, different S-CSCFs may have different functionality
- It maintains session state and has the session control for the registered endpoint's sessions
- Acts like a Registrar defined in the RFC2543 [9], i.e. it accepts Register requests and makes its information available through the location server (e.g. HSS)
- may also behave as a proxy or as a user agent as defined by RFC 2543 [9]
- Interacts with Services Platforms for the support of Services
- obtain the address of the destination I-CSCF based on the dialled number or SIP URL
- on behalf of a UE forward the SIP requests or responses to a P-CSCF or an I-CSCF if an I-CSCF is used in the path in the roaming case
- generates charging information
- Security issues are currently open in standardisation [3]

3.7 MGCF
Media Gateway Control Function (MGCF) provides the following functions:
- protocol conversion between ISUP and SIP
- routes incoming calls to appropriate CSCF
- controls MGW resources [3]

3.8 MGW
Media Gateway (MGW) provides the following functions:
- Transcoding between PSTN and 3G voice codecs
- Termination of SCN bearer channels
- Termination of RTP streams [3]

3.9 T-SGW
Transport Signalling Gateway provides the following functions:
- Maps call related signalling from/to PSTN/PLMN on an IP bearer
- Provides PSTN/PLMN <-> IP transport level address mapping [3]

3.10 MRF
Multimedia Resource Function provides the following functions:
- Performs multiparty call and multimedia conferencing functions [3]

3.11 BGCF

The S-CSCF, possibly in conjunction with an application server, shall determine that the session should be forwarded to the PSTN. The S-CSCF will forward the Invite information flow to the Breakout Gateway control function (BGCF) in the same network.
The BGCF selects the network in which the interworking should occur based on local policy. If the BGCF determines that the interworking should occur in the same network, then the BGCF selects the MGCF which will perform the interworking, otherwise the
BGCF forwards the Invite information flow to the [...]

[...] case when both callee and caller are roaming. In the [...] P-CSCF in the visited network forwards the service request to the home network. However in some cases some services can be provided directly via the visited network i.e. by the P-CSCF. The P-CSCF is needed in the home network to allow for the network flexibility because S-CSCFs may contain different services and also in the roaming case to allow the visited operator to handle the call and provide local services. The local services can be an emergency call or other localised services such as services related to geographical location of the user or local numbering plans. I-CSCF is acting as a protective firewall between home and visited networks. Notice that the true physical elements may contain one or several of the CSCF functions [6].

[Figure 3 residue: User A, A's visited network (P-CSCF), A's home network (I-CSCF, S-CSCF), optional I-CSCFs, B's home network (I-CSCF, S-CSCF), B's visited network (P-CSCF), User B; home network elements labelled "Required on registration, optional on session establish"]

Figure 3: Call model in roaming case [5]

4 SIP protocol in 3GPP Rel5

4.1 SIP in IMSS
SIP and SDP have been selected as the protocols for some, and IPv6 as the only solution for all, of the IP Multimedia Subsystem interfaces.
As shown by the figure 4 the basic SIP (RFC2543 [9]) has been selected as the main protocol on the following interfaces:
- Gm: P-CSCF – UE
- Mw: P-CSCF – S-CSCF and P-CSCF – I-CSCF
- Mm: S/I-CSCF – external IP networks & other IMS networks
- Mg: S-CSCF – BGCF
- Mk: BGCF – external IP networks & other IMS networks

[Figure 4 residue: SLF, HSS, AS, BGCF, MGCF]

Figure 4: SIP protocol in IM SS [5]

Eventually there may be differences in the SIP procedures of Gm and Mw reference points. This implies that there is a difference in UNI and NNI interfaces [3].

The following procedures have been defined for the 3GPP IM subsystem in [3]:

- Local P-CSCF discovery: Either using DHCP or carrying address in the PDP context
- S-CSCF assignment and cancel
- S-CSCF registration
- S-CSCF re-registration
- S-CSCF de-registration (UE or network initiated)
- Call establishment procedures separated for
  - Mobile origination; roaming, home and PSTN
  - Mobile termination; roaming, home and PSTN
  - S-CSCF/MGCF – S-CSCF/MGCF; between and within operators, PSTN in the same and different network
- Routing information interrogation
- Session release
- Session hold and resume
- Anonymous session establishment
- Codec and media flow negotiation (Initial and changes)
- Called ID procedures
- Session redirect
- Session Transfer

4.2 SIP in Service SS
The service subsystem and its connections to the IM subsystem are shown in the figure 5. The S-CSCF interfaces the application development servers with SIP+ protocols. The SIP application server can reside either outside or within the operator's network [3]. The OSA capability server and Camel refer to already standardised 3G and GSM based service generation elements.
69
[Figure 5: service subsystem and its connections to the IM subsystem; labels in the figure: SIP Application Server, S-CSCF, HSS, OSA Service Capability Server (SCS), OSA Application Server, IM SSF; interface labels: SIP+, Cx, OSA API, MAP]

Multimedia since it shall provide the bearer through radio and the packet core network for the IP Multimedia Signalling (SIP) and also for the IP media
is sought [10].

For the second option, i.e. using a signalling PDP context, there are two alternative methods for providing the P-CSCF address to the terminal: either during the PDP context activation or after it with DHCP procedures. The latter case requires either that the PDP context is modified after the IP address of the P-CSCF has been found, so that the GGSN can filter the SIP traffic to the correct PDP flow, or that a new PDP context is established for SIP with the correct filter information and the old one is released. At the moment both options for the CSCF discovery are available in the specifications [1][3].

4.5 Bearer reservation before alerting

For the session flow (user plane traffic) a secondary PDP context with different QoS requirements is activated. Timing synchronisation has to be sought between signalling PDP context establishment, secondary PDP context establishment, SIP connection negotiation and callee alerting. This is needed to avoid

Figure 6: The 3GPP SIP 2-phase call setup [4]

5 3GPP SIP requirements

3GPP refers in its specifications to IETF specifications, and the target has been to minimise the changes. 3GPP is currently dependent on completion of the following SIP WG items [5]:
- draft-ietf-sip-rfc2543bis: SIP: Session Initiation Protocol
- draft-sip-manyfolks-resource: Integration of resource management and SIP
- draft-ietf-sip-100rel: Reliability of Provisional Responses in SIP
- draft-ietf-sip-privacy: SIP extensions for caller identity and privacy
- draft-ietf-sip-call-auth: SIP extensions for media authorization
- draft-roach-sip-subscribe-notify

3GPP has found that a major part of the features are already provided by the SIP protocol, and thus very few candidates for 3GPP originated enhancements have been identified [7]. The following SIP enhancements have been recognised so far [1][8]:
- addition of a routing PATH header to the SIP messages to record the signalling path from P-CSCF to S-CSCF
- location information in the INVITE message to carry the location of the terminal (for instance Cell ID)
- emergency call type, needed to indicate the type of emergency call, i.e. police, ambulance etc.
- filtering of routing information in the IM SS before the SIP message is sent to the terminal, to hide the network topology from the terminal
- refresh mechanism inside IM SS
- Network-initiated de-registration
- 183 Session Progress provisional response for INVITE, to ensure that alerting is not generated before the PDP contexts for the session are activated
- Reliability of provisional responses – PRACK method to acknowledge the 183 message
- Usage of session timers to keep the SIP session alive
- Indication of resource reservation status – COMET method
- Security for privacy
- Extensions for caller preferences and callee capabilities
- Media authorisation token

Discussions on the changes are currently ongoing between 3GPP and IETF.

6 Problems and open issues

The following problems can be identified in the 3GPP IP Multimedia Subsystem:
- architecture complexity: with several functions there will be several interfaces, and implementations may differ from vendor to vendor, thus the multivendor cases may become challenging
- call establishment delay problems due to the signalling taking place on multiple levels (RAN, PS CN, IMSS). Based on figure 6, establishing a call takes 6 round trip times (RTT) end to end on the SIP level. In addition to that there are the PDP context reservations, which take one round trip time between UE and GGSN.
- guarantees of QoS: several elements and several IP based interfaces, and in addition the packet radio, are included in the path, while the requirements are at the same level as for current GSM circuit voice calls
- lengthy standardisation time: the more issues there are to standardise, the more opinions there are and the more time it will take
- suitability of the SIP protocol for the radio interface: it is a character based protocol with long signalling messages and requires a certain transport quality
- IETF and 3GPP standardisation co-operation: the operations and the behaviour are different in IETF and 3GPP
- Terminal complexity: the terminals become more and more complex with several protocol stacks, only to provide services very similar to today's. SIP has to provide a true revolution in applications and services

7 Conclusions

The major identified differences between SIP in IETF and in 3GPP are as follows:
1. the architecture of the IMSS is defined based on the 3G model (home and visited); messages always run via the S-CSCF
2. Registration is mandatory
3. The CSCFs interrogate the SIP and SDP flows, either actively modifying the messages or reading the data; also the I-CSCF hides the names of the CSCFs behind it
4. Codec negotiations in 3GPP do not allow different codecs in different directions
5. in 3G networks there is a separation of the UNI and NNI interfaces
6. due to radio and packet core functionality there are some change proposals to SIP and SDP
7. due to the P-CSCF – S-CSCF interface and the 3G roaming mode there are some requirements for the SIP and SDP protocols
8. in 3G, SIP is also used to interface the application development elements, which set requirements for the SIP and SDP protocols

Despite the above mentioned differences, it seems that the SIP protocol is suitable for the needs of the UMTS network. The identified problems can be overcome, and some of them are of a political or architectural nature and are thus more choices than problems. The current work in 3GPP is still unfinished and the discussion with IETF has only just started. It is likely that 3GPP Release 5 will contain some specifications on SIP and the IMSS architecture, but by the end of 2001 they will probably not be mature enough to guarantee a fully functioning network. One major advantage is that the SIP changes so far required by 3GPP are not extensive, so SIP can probably be tailored for 3GPP. However, since the specification work for a new subsystem is relatively large, it can be expected that the specification work will continue during 2002 and beyond. When SIP and the IMSS have been finalised for the UMTS network, real-time packet services can give operators a true way to differentiate from each other and thus generate longed-for revenues.
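As a rough illustration of the call establishment delay noted in chapter 6, the round-trip counts quoted there (six end-to-end SIP round trips plus one PDP context round trip between UE and GGSN) can be turned into a simple estimate. The sketch below is in Python; the function name and the RTT values are assumptions made only to show the arithmetic and are not measurements or figures from the specifications.

```python
SIP_ROUND_TRIPS = 6   # end-to-end SIP round trips, as counted from figure 6
PDP_ROUND_TRIPS = 1   # PDP context reservation between UE and GGSN


def call_setup_delay(rtt_sip_e2e_s, rtt_pdp_s):
    """Estimate the signalling part of the call setup delay in seconds.

    Processing time in the network elements and in the terminals is
    deliberately ignored, so the result is a lower bound only.
    """
    return SIP_ROUND_TRIPS * rtt_sip_e2e_s + PDP_ROUND_TRIPS * rtt_pdp_s


if __name__ == "__main__":
    # Hypothetical round trip times, chosen only for illustration.
    delay = call_setup_delay(rtt_sip_e2e_s=0.5, rtt_pdp_s=0.25)
    print(f"estimated signalling delay: {delay:.2f} s")
```

The estimate grows linearly with the end-to-end round trip time, which is why long SIP messages over the radio interface weigh so heavily on the total.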
References
[1] Ahvonen, Kati: Master's Thesis: IP telephony signalling in a UMTS All IP network, Helsinki University of Technology, 24.11.2000
[2] 3G TS 23.002 version 5.1.0 Network Architecture (Release 5)
[3] 3G TS 23.228 version 2.0.0 IP Multimedia (IM) Subsystem - Stage 2
[4] 3G TS 24.228 version 2.0.0 IP Multimedia (IM) Subsystem - Stage 3
[5] Drage, Keith: 3GPP and SIP. Presentation in IETF #50 (March 18-23, 2001). http://www.softarmor.com/sipwg/meets/ietf50/slides/drage-3gpp-sip.ppt
[6] 3G TS 22.228 V5.0.0 (2001-01) Service requirements for the IP Multimedia Core Network Subsystem (Stage 1) (Release 5)
[7] Meeting minutes of 3GPP TSG-CN/SA SIP ad-hoc, February 12-14. http://www.3gpp.org/ftp/TSG_CN/WG1_mm-cc-sm/SIP_meetings/CN1_SA2_03_(New%20Jersey)/Report/NewJersey0102.zip
[8] Tdoc N1-010233; 3GPP TSG-SA WG2 / TSG-CN WG1 SIP ad-hoc meeting, 13-15 February, 2001, New Jersey, USA; Nokia: Feedback from IETF's interim SIP WG meeting held on week #6. http://www.3gpp.org/ftp/TSG_CN/WG1_mm-cc-sm/SIP_meetings/CN1_SA2_03_(New%20Jersey)/Tdocs/N1-010233%20.zip
[9] RFC 2543 SIP: Session Initiation Protocol, March 1999. http://www.ietf.org/rfc/rfc2543.txt
[10] 3GPP TR 23.874 V1.3.0 (2000-11) Feasibility study of architecture for push service (Release 4). http://www.3gpp.org/ftp/Specs/Latest_drafts/23874-130.ZIP
SIP Service Architecture
Markus Isomäki
Senior Research Engineer
Nokia Research Center
Markus.isomaki@nokia.com
who interact with other protocols such as HTTP or
RTSP.
even session stateful, which of course reduces their scalability. There can be proxies for different purposes:
  • Core Routing proxy
  • Gateway controlling proxy
  • Firewall controlling proxy
  • QoS controlling proxy
  • "Feature proxy" or "Regional Routing proxy", whose duty is to orchestrate the service routing and execution as explained in Chapter 2.
• Redirect Server does not forward the request further, but returns it towards its originator with redirection information. Redirect servers are very easy to implement and scalable, and thus they are powerful tools for simple services.
• Back-to-Back User Agent is the term used for any "proxy-like" element in the network which does something more than a proxy is supposed to do. This includes e.g. issuing requests to ongoing sessions "in the middle" or modifying SDP parameters.

SIP Application Server (APSE) is a vague term which is not defined anywhere in the official specifications. Basically an Application Server can be a Proxy, Redirect Server or Back-to-Back User Agent. If a simple change in the Request-URI is all that a service requires, redirection is the way to go. However, if changes in other headers, call state monitoring or acting upon responses is desired, it takes at least a proxy. An APSE acting as a proxy differs from a common proxy only in how much intelligence or programmability it has. Obviously there is no official distinction between the two. If proxy functionality is not enough, a Back-to-Back UA is needed.

It may be useful to let one APSE handle only one specific task and to treat them as components from which complete services can be built. In addition to routing and control APSEs, other special media handling components are also needed in the network:
• Conferencing Server or bridge is able to mix media coming from different parties together to implement a multi-party conference. SIP is used as a control protocol; tighter control can also be obtained using a special conference control protocol.
• Presence Server obtains information on users' communication state and preferences or announcements ("I'm eating") and can convey this information to interested parties. More details in the next Chapter.
• Text-to-Speech Server is able to translate a text stream to speech and possibly even vice versa.
• Messaging Server is able to issue instant messages to users.
• Web Server and E-mail Server can be used to enhance services by alternative communication methods. For example, interactive voice response can be replaced by web-based dialogues. Web pages can contain embedded SIP or mailto URLs and SIP messages can contain HTTP URLs, so for example an initial voice session initiation can be redirected to fetch an HTML/XML document.

Complex services can be built by combining the capabilities of different servers. It is feasible to treat the servers as resources or service components addressable by either SIP, RTSP or HTTP URL, as proposed in [2]. In that way the servers that use other servers do not need to know the internal details of what they are using. This is the opposite approach to device control protocols such as MGCP or Megaco, where the controller has tight control over the slave and needs to understand the slave's internal structure to some extent. MGCP and Megaco can still be used to separate the media part from the control part.

4 Service Building Blocks
Forking is another nice feature of SIP, which allows for "parallel searches" in order to save time. Unfortunately a complete forking solution is only possible for INVITE requests.
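The idea of a parallel search can be pictured with a small simulation. The following Python sketch does not send any SIP messages; it only models several branches that are tried at the same time, takes the first one that answers and abandons the rest. The contact addresses, delays and function names are hypothetical and exist only for this illustration.

```python
import asyncio


async def try_branch(contact, delay_s, answers):
    """Simulate one forked branch; no real SIP message is sent."""
    await asyncio.sleep(delay_s)
    if not answers:
        raise ConnectionError(f"{contact} did not answer")
    return contact


async def fork(branches):
    """Try every contact in parallel and return the first one that answers."""
    pending = {asyncio.create_task(try_branch(*b)) for b in branches}
    winner = None
    while pending and winner is None:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            # A failed branch is simply skipped; the search goes on.
            if task.exception() is None and winner is None:
                winner = task.result()
    # Abandon the branches that are still outstanding.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return winner


# Hypothetical contacts: (address, simulated delay in seconds, answers?)
BRANCHES = [
    ("sip:bob@office.example.com", 0.01, False),
    ("sip:bob@mobile.example.com", 0.05, True),
    ("sip:bob@home.example.com", 0.20, True),
]

if __name__ == "__main__":
    print(asyncio.run(fork(BRANCHES)))
```

With a sequential search the same result would take the sum of the delays of the branches tried; in the parallel case the user is reached after the delay of the fastest answering branch.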
Figure 4. Unattended transfer with REFER method. (Parties: Transferor, Transferee, Transfer Target. Messages in order: INVITE/200 OK/ACK; INVITE (hold)/200 OK/ACK; REFER; 202 Accepted; INVITE/200 OK/ACK; NOTIFY (200 OK); 200 OK; BYE/200 OK; BYE/200 OK.)

According to current specifications, REFER can occur outside of a call-leg, thus it can basically initiate sessions from scratch. The Refer-To header can contain URLs other than SIP URLs, thus it would also be possible to initiate other than SIP-based forms of communication. REFER is suitable for similar types of scenarios as Third Party Call Control, although both have their advantages. REFER details can be found in [5].

Messaging is a new form of communication supported by SIP and is currently under development. Messaging can be done either with the MESSAGE method or by opening a messaging session with INVITE. Messaging is a useful tool between two users, but also between an APSE and a user. MESSAGE is specified in [6].

Caller Preferences and Callee Capabilities can be expressed easily in SIP by header parameters. Examples include indication of supported media types, spoken languages or the type of device the user currently has.

5 Service Creation Tools

After talking about Application Servers and service building blocks it is useful to look briefly at how the services can actually be programmed. As an APSE is usually a proxy or redirect server, the task is to find suitable tools to program them to handle incoming SIP messages in a desired way.

There are currently a lot of tools available for the task:
• Call Processing Language (CPL) is an XML-based language that can be used to describe and control Internet telephony services. CPL is not tied to any signaling protocol, but the expectation is that either SIP or H.323 is used. CPL is designed so that it is powerful enough to describe a large number of services and features, but it is limited in power so that it can run safely in Internet telephony servers [7]. Thus, even end-users themselves could program CPL scripts and download them to servers without being able to do any severe harm. The goal of the language definition has been that it would be possible to generate it with graphical tools. Nothing prevents running CPL directly in the User Agent. CPL should become an IETF standards-track RFC during spring 2001.
• SIP Common Gateway Interface (CGI) is equivalent to the popular HTTP CGI, the difference being the protocol under control. SIP CGI makes the contents of SIP message headers directly accessible to external programs, in a programming language independent way.
• SIP Servlets or SIPlets are equivalent to Java Servlets used in Web programming. They provide a certain class library for developers to access and control the SIP stack. Servlets usually offer better performance than CGI scripts due to their more advanced handling of processes. Being tied to Java, Servlets share the advantages and drawbacks of the language.
• SIP JAIN is another Java-based programming interface to control the SIP stack.

Of the four presented mechanisms, the latter three seem to be suitable for the same type of tasks and are thus competing with each other. CPL has a somewhat different scope, as it has certain limitations. One of the trade-offs in the control interface is how much the developer has to understand SIP. Low-level interfaces such as CGI require certain knowledge of SIP messages and allow efficient and pinpointed control. On the other hand, high-level interfaces such as CPL do not require SIP expertise, but lack efficiency and advanced features. There are already Application Server products (or at least prototypes) available supporting at least CPL, SIP CGI and Servlets.

Intelligent Networks can also be used to provide service control for SIP proxies. This can be achieved by mapping the SIP state machine to the Basic Call State Machine (BCSM) and controlling it by the INAP or CAP protocols. In this architecture the SIP proxy plays the role of a "Soft" Service Switching Function (SSF) which interacts with an external Service Control Function (SCF). While this model is suitable for bringing "legacy" telephony services to a SIP network, it lacks the capabilities and flexibility required for integration with other protocols and services, and the most advanced features of SIP are not utilized. Open Services Architecture (OSA) is another telephone network oriented service control platform, which can be used to control SIP services.
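To make the low-level end of this trade-off concrete, the sketch below shows in Python what "understanding SIP messages" means for the service programmer: the program reads the request line and the headers and then chooses between proxying and redirecting. This is not the SIP CGI, Servlet or JAIN interface; the function names, the addresses and the rule based on the Subject header are assumptions made up for this illustration, and header folding, repeated headers and compact header names are ignored.

```python
def parse_sip_request(raw):
    """Split a SIP request into (method, request_uri, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]          # ignore any message body
    lines = head.split("\r\n")
    method, request_uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")    # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, headers


def decide(method, request_uri, headers):
    """Hypothetical service logic returning (action, target)."""
    if method != "INVITE":
        return "proxy", request_uri             # pass other requests through
    if headers.get("subject", "").lower().startswith("urgent"):
        return "proxy", "sip:jones@mobile.example.com"
    return "redirect", "sip:jones@voicemail.example.com"


# Hypothetical incoming request.
RAW = (
    "INVITE sip:jones@example.com SIP/2.0\r\n"
    "From: <sip:smith@example.com>\r\n"
    "To: <sip:jones@example.com>\r\n"
    "Subject: Urgent: build broken\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(decide(*parse_sip_request(RAW)))
```

A high-level language such as CPL would express the same rule without the script author ever seeing the message syntax, at the cost of being limited to the decisions the language provides.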
6 Example Service

At this point it is useful to go through an example service to illustrate how different APSEs could interoperate using the building blocks defined in Chapter 4.

"Autoconferencing" service is used as an example. The service works so that a user is able to schedule an audio or video conference to start when all required participants indicate that they are available for conferencing. Figure 5 depicts the needed components: Controller for orchestrating the service, Presence Server for obtaining users' availability, Messaging and Media Servers to send announcements and finally a Conferencing Server to bridge the conference participants together.

Figure 5. Autoconferencing Service.

First, a user fills in a web page order for the conference, listing all the required participants, and sends it to the Controller Server using standard HTTP/HTML. The Controller then subscribes to the presence information of all participants with SIP SUBSCRIBE and starts to receive NOTIFYs when their presence state changes. When all participants seem to be available for conferencing, the Controller Server either sends them an instant message using the Messaging Server or invites them to a session with the Media Server using Third Party Call Control. The purpose of these actions is to get an acknowledgement from the participants of their willingness to join the conference. The message can contain push buttons or links to achieve this, while the Media Server can use IVR dialogues to get the users' opinion. The IVR dialogue can be controlled by the Controller Server.

using Third Party Call Control. This can be preceded with some kind of authentication scheme using a web page or Media Server IVR. The Conference Server ties all participants together based on the Request-URI in the INVITE it receives. Thus, all participants are in the same conference. When new participants join, the Controller can invite the Media Server to the conference to play short announcements like: "Bob just joined the conference".

All interaction between different servers happens by exchanging standard SIP and HTTP messages. Inside the Media and Conferencing servers, MGCP or Megaco could be used to separate media and control processing from each other.

7 3GPP Service Architecture

3GPP is currently defining its service control architecture for the Release 5 IP Multimedia Subsystem (IMS). The specifications are to be completed by the end of year 2001. IMS is based on the SIP protocol with minor 3GPP specific modifications. In IMS each incoming and outgoing request is routed via the subscriber's home network through an element called Serving Call State Control Function (S-CSCF).

The S-CSCF plays the roles of registrar, feature proxy and in some cases also back-to-back user agent. It has to route the incoming and outgoing requests to the correct SIP Application Servers and other external service platforms according to the subscriber's service profile. Besides SIP APSEs, CAMEL and OSA service control are also supported in Release 5.

The 3GPP Release 5 service architecture is depicted in Figure 6 [8]. The S-CSCF routes incoming and outgoing requests to APSEs and other service platforms using a protocol called "SIP+". SIP+ requirements and definition work has just started in 3GPP, so the only known fact about it is that it should resemble SIP as much as possible. Mapping to CAMEL (CAP) and OSA does not happen in the S-CSCF but rather in the specialized gateway elements shown in the Figure.
[Figure 6: 3GPP Release 5 service architecture [8]; labels in the figure: SIP Application Server, S-CSCF, HSS, OSA Service Capability Server (SCS), OSA Application Server, IM SSF; interface labels: SIP+, Cx, OSA API]

3GPP IMS Application Servers should be quite similar to the standard SIP APSEs described in this paper. Some differences may arise if SIP+ turns out to be much different from SIP. The most challenging problem is to define how different APSEs and service platforms interoperate and how the S-CSCF makes service routing decisions.

8 Conclusions

SIP and related protocols such as HTTP, RTSP and SMTP provide a powerful machinery to implement services that integrate different forms of communication. In SIP, services are provided by specialized Application Servers, which are actually proxy or registrar servers with extended intelligence.

be made work together and how service specific routing of requests should be orchestrated.

References
[2] Jonathan Rosenberg, "An Application Server Component Architecture for SIP", Internet-Draft (work in progress), March 2001, http://search.ietf.org/internet-drafts/draft-rosenberg-sip-app-components-01.txt
[3] Adam Roach, "Event Notification in SIP", Internet-Draft (work in progress), February 2001, http://search.ietf.org/internet-drafts/draft-roach-sip-subscribe-notify-03.txt
[4] Jonathan Rosenberg, Jon Peterson, Henning Schulzrinne, Gonzalo Camarillo, "Third Party Call Control in SIP", Internet-Draft (work in progress), March 2001, http://search.ietf.org/internet-drafts/draft-rosenberg-sip-3pcc-02.txt
[5] Robert Sparks, "SIP Call Control – Transfer", Internet-Draft (work in progress), February 2001, http://search.ietf.org/internet-drafts/draft-ietf-sip-cc-transfer-04.txt
IP TELEPHONY SERVICES IMPLEMENTATION
Eero Vaarnas
eero.vaarnas@iki.fi
the logical behavior of the signalling server, in principle it isn't tied to any specific protocol.

Like XML, CPL is based on tags that are hierarchically arranged according to the information that they contain. The tags are traversed according to the hierarchy and the rules they contain. Eventually the traversal ends and the action specified by the script is executed. In some cases the action remains unspecified, so some default policy is applied.

2.1 Structure of CPL

CPL is specified as an XML DTD (Document Type Definition). It is going to have a public identifier in XML (-//IETF//DTD RFCxxxx CPL 1.0//EN) and a corresponding MIME (Multipurpose Internet Mail Extensions) type. Only an overview of the structure is given here; the complete DTD can be seen in [3] and the XML specification in [4].

After the standard XML headers, a CPL script is enclosed between the tags <cpl> and </cpl>. The script itself consists of nodes and outputs, arranged hierarchically in a nested structure. Nodes and outputs can be thought of as states and transitions, respectively (for a tree representation, cf. 2.2). The structure is represented by nested start and end tag pairs, so both nodes and outputs can simply be referred to as tags. Tags can have parameters that describe their exact behavior.

At the top level, there can be four kinds of tags: ancillary, subaction, outgoing and incoming. The subaction tag is used to describe repeated structures to achieve modularity and to avoid redundancy. The implementation is under the subaction tag with the id parameter as an identifier. One or more references to the implementation can be made using the sub tag with the desired subaction identifier as the ref parameter. The outgoing and incoming tags are top level actions, similar to subactions in their implementation structure. The ancillary tag contains information that is not part of any operation, but possibly necessary for some CPL extension.

The actual node-output structure of the script is inside the action tags, i.e. subaction, outgoing and incoming. There are four categories of CPL nodes: switches, which represent choices a CPL script can make; location modifiers, which add or remove locations from the set of destinations; signalling operations, which cause signalling events in the underlying protocol; and non-signalling operations, which trigger behavior which does not affect the underlying protocol.

2.1.1 Switches

Switches represent choices a CPL script can make, based on either attributes of the original call request or items independent of the call. The attributes are represented by variables, depending on the switch type. A switch has a list of output tags, which are traversed, and the first matching output is selected. If the variable doesn't exist, the optional not-present tag can be chosen instead. If none of the outputs match (including not-present), the optional output otherwise is chosen. There are four types of switches: address-switch, string-switch, time-switch and priority-switch.

The address-switch makes decisions according to addresses. With the field parameter either the origin, destination, or original-destination of the request can be chosen. Moreover, the optional subfield parameter can be used to access the address-type, user, host, port, tel, or display (display name) of the selected address. In the address output it can be checked whether the address is an exact match (is), contains a substring of the argument (contains, for display only) or is in a subdomain of the argument (subdomain-of, for host and tel only). The address-switch is essentially independent of the signalling protocol. The specific meaning of the entire address depends on the protocol, and additional subfield values may be defined for protocol-specific values.

The string-switch allows a CPL script to make decisions based on free-form strings present in a request. The field parameter selects either subject, organization, user-agent (program or device name that made the request), language or display. The string output checks if the selected string is an exact match or contains a substring of the argument. String switches are dependent on the signalling protocol being used.

The time-switch handles requests according to the time and/or date the script is being executed. It uses a subset of the iCalendar standard [5], which allows CPL scripts to be generated automatically from calendar books. It also allows us to re-use the extensive existing work specifying calendar entries such as time intervals and repeated events. The parameters tzid (time zone identifier) or tzurl (time zone url) select the current time zone, and the time output matches calendar entries such as starting or ending times (dtstart, dtend), days of the week (byday) and frequencies (freq). Time switches are independent of the underlying signalling protocol.

With the priority-switch it is possible to consider priorities specified for the requests. Priority switches take no parameters. The priority output can be used to match against less than (less), greater than (greater) or equal to (equal) the argument. The priorities are emergency, urgent, normal, and non-urgent. The priority switches are dependent on the underlying signalling protocol.

2.1.2 Location modifiers

The set of locations to which a call is to be directed is not given as node parameters. Instead, it is stored as an implicit global variable throughout the execution of a processing action (and its subactions). Location modifiers add, retrieve or filter the set of locations.
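The idea of an implicit location set can be pictured with a few lines of Python. This is only a sketch under simplifying assumptions: a real CPL script is a nested tree rather than the flat list used here, the function name run_action is made up for this illustration, and the node names (location, remove-location, proxy, redirect) are used in the sense in which this chapter describes them. The point shown is that the signalling node receives no address parameter of its own; it reads the set that earlier nodes have built.

```python
def run_action(nodes):
    """Run a flat list of (node_name, argument) pairs against one location set."""
    locations = []                        # the implicit location set of this action
    for name, arg in nodes:
        if name == "location":            # explicit location: add a URL to the set
            if arg not in locations:
                locations.append(arg)
        elif name == "remove-location":   # location filter: drop a URL from the set
            locations = [url for url in locations if url != arg]
        elif name in ("proxy", "redirect"):
            # Signalling operations carry no address; they use the current set.
            return name, list(locations)
        else:
            raise ValueError(f"unknown node: {name}")
    return None, list(locations)          # no signalling operation was reached


if __name__ == "__main__":
    # Hypothetical script: two locations are added, one is filtered out again.
    script = [
        ("location", "sip:jones@example.com"),
        ("location", "sip:jones@voicemail.example.com"),
        ("remove-location", "sip:jones@example.com"),
        ("proxy", None),
    ]
    print(run_action(script))
```

Keeping the set outside the node parameters is what lets a subaction, a lookup and a filter cooperate without passing addresses to each other explicitly.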
There are three types of location nodes defined. 2.1.4 Non-signalling operations
Explicit locations add literally-specified locations to With non-signalling operations, it is possible to invoke
the current location set; location lookups obtain operations independently of the telephony signalling. If
locations from some outside source; and location filters supported, mail can be sent, log files can be
remove locations from the set, based on some specified generated, and also other operations can be added as so
criteria. called extensions.
The explicit location node has three node
parameters. The mandatory url parameter's value is 2.2 Tree representation of CPL
the URL of the address to add to the location set. The For illustrative purposes, CPL scripts can be
optional clear parameter specifies whether the represented as trees. Also graphical editors might
location set should be cleared before adding the new utilize the tree representation. Node tags represent
location to it. The optional priority parameter nodes of the tree, output tags are edges between them.
specifies a priority for the location. There are no In Figure 1 is an example CPL script from [3]. It is
outputs, next node follows directly. Explicit location converted into a tree in Figure 2.
nodes are dependent on the underlying signalling
protocol. 1: <?xml version="1.0" ?>
Locations can also be specified up through external 2: <!DOCTYPE cpl
3: PUBLIC "-//IETF//DTD RFCxxxx CPL 1.0//EN"
means, through the use of location lookups. The 4: "cpl.dtd">
lookup node initiates lookups according to the
source parameter. With the optional parameters, one 5: <cpl>
6: <subaction id="voicemail">
can use or ignore caller preferences fields or 7: <location
clear the location set before adding. The outputs are 8: url="sip:jones@voicemail.example.com">
success, notfound, and failure, one of them is 9: <redirect />
10: </location>
selected depending on the result of the lookup. 11: </subaction>
The remove-location is used to filter the
location set. Filtering is done based on the location 12: <incoming>
13: <address-switch field="origin"
parameter and caller preferences param-value pairs. There are no outputs; the next node follows directly. The meaning of the parameters is signalling-protocol dependent.
2.1.3 Signalling operations
Signalling operation nodes cause signalling events in the underlying signalling protocol. Three signalling operations are defined: proxy, redirect, and reject.
The proxy node causes the request to be forwarded on to the currently specified set of locations. With the corresponding parameters, a timeout can be set, the server can be forced to recurse to subsequent redirection responses, and the ordering of the location set traversal can be set to parallel, sequential, or first-only.
The redirect node causes the server to direct the calling party to attempt to place its call to the currently specified set of locations. The redirection can be marked permanent; otherwise it is considered temporary. Redirect immediately terminates execution of the CPL script, so this node has no outputs and no next node. The specific behavior the redirect node invokes depends on the underlying signalling protocol, though its semantics are generally applicable.
The reject node causes the server to reject the request, with a status code and possibly a reason. Similarly to redirect, rejection terminates the execution, and the specific behavior depends on the signalling protocol.
14: subfield="host">
15: <address subdomain-of="example.com">
16: <location url="sip:jones@example.com">
17: <proxy timeout="10">
18: <busy> <sub ref="voicemail" />
19: </busy>
20: <noanswer> <sub ref="voicemail" />
21: </noanswer>
22: <failure> <sub ref="voicemail" />
23: </failure>
24: </proxy>
25: </location>
26: </address>
27: <otherwise>
28: <sub ref="voicemail" />
29: </otherwise>
30: </address-switch>
31: </incoming>
32: </cpl>
Figure 1 Example CPL script
Let us have a brief look at the example script (the graphical representation can also be followed and compared to the script structure). Lines 6-11 contain an example of a subaction. It defines a redirection to the user's voicemail. This is accomplished by adding the address of the voicemail to the location set (lines 7-8) and then activating the redirection (line 9). Lines 12-31 describe how incoming calls are handled. The address switch in lines 13-30 selects the host part of the caller's address. If the caller is from the same domain as the owner of the script (line 15), the call is considered urgent and it is let through. Again, this is done in two stages: first the address is added to the location set (line 16), then the actual proxy behavior is activated (line 17). All the unsuccessful cases are directed to the
voicemail (lines 18-23). The voicemail is implemented as a reference to the previously defined subaction. Unimportant calls also go to the voicemail (lines 27-29).
[Figure 2: tree diagram. incoming leads to address-switch (field: origin, subfield: host). The branch subdomain-of: example.com leads to location (url: sip:jones@example.com) and proxy (timeout: 10), whose busy, noanswer and failure outputs lead to the voicemail subaction. The otherwise branch also leads to the voicemail subaction. The subaction (id: voicemail) consists of location (url: sip:jones@voicemail.example.com) followed by redirect.]
Figure 2 Tree representation of the example script
2.3 General feasibility of CPL
CPL is a simple but powerful tool for IP telephony service implementation. It concentrates on basic call control functions, but it is possible to create extensions (some of them already available) for different kinds of advanced services. Of course CPL isn't a programming language, so constructions like loops aren't possible and all the features must actually be implemented outside the scripts.
CPL is based on XML, which is a widely accepted industry standard. This, along with its general simplicity, provides a good starting point for its utilization. First of all, people already familiar with XML can easily adopt CPL. Even with minimal knowledge of XML it is possible to start writing CPL scripts. It is also possible to generate scripts automatically. Generation could be based on simple, standard text-processing languages. From other types of XML documents, scripts could be generated with XSLT (Extensible Stylesheet Language Transformations).
Because of its tree representation, CPL (like XML in general) can also be expressed and edited graphically. With GUI (Graphical User Interface) based editors, people who are not so familiar with the syntax can also create and edit services. Users could upload their own CPL scripts using SIP registration messages, HTML forms, FTP, or whatever method seems proper.
Things like scalability, stability and security depend much on the implementation of the CPL server. However, because of the limited expressive power of the language, these problems are more easily treated. Scripts can be exhaustively validated when they are uploaded, so in principle malicious or erroneous code can be eliminated. The lack of loops and other more complex programming structures also makes CPL scripts potentially more compact.
CPL execution is already implemented in at least a few SIP proxy servers [6]. There are also plenty of XML editors available and recently even some specialized CPL editors. Some service creation environments are based on automatic CPL generation.
3 Common Gateway Interface for SIP
SIP-CGI (Common Gateway Interface for SIP) [7] is an interface for running arbitrary programs from a SIP proxy server or similar software. Since SIP borrows a lot from HTTP, the CGI interface is adopted as well. Of course, the technical specification is different, but the basic idea is similar to HTTP-CGI.
When the server decides to invoke a SIP-CGI script, it executes it as a normal process in the underlying operating system. It then uses standard input and output (stdin, stdout) and environment variables to exchange information with the process. Script state across invocations is maintained with special tokens.
3.1 Input and metadata
The header fields of the received SIP message (with some exceptions, such as potentially sensitive authorization information) are passed to the script as metavariables. In practice, metavariables are represented by operating system environment variables. Each SIP header field name is converted to upper case, has all occurrences of "-" replaced by "_", and has SIP_ prepended to form the metavariable name. For example, the Contact header would be represented by the SIP_CONTACT metavariable. The values of the header fields are converted to fit the requirements of the environment variables. Similar transformations are applied for other protocols.
There are some additional metavariables that are passed to the script. Some of them are derived from the
header fields or even match the values of the fields. This redundancy lets the script distinguish between information from the original header fields and information synthesized by the server.
The type of the message is seen from the metavariables REQUEST_METHOD and RESPONSE_STATUS. If REQUEST_METHOD is defined, the message was a request and the method (INVITE, BYE, OPTIONS, CANCEL, REGISTER or ACK) is stored in the metavariable. REQUEST_URI is the intended recipient of the request. REGISTRATIONS contains a list of the current locations the server has registered for the recipient (REQUEST_URI).
For responses, RESPONSE_STATUS is the numeric code of the response and RESPONSE_REASON is the string describing the status. For example, the response SIP/2.0 404 Not Found contains the protocol version, status code and reason phrase, respectively.
REQUEST_TOKEN and RESPONSE_TOKEN are used to match requests and responses. SCRIPT_COOKIE can be used to store state information across invocations within the same transaction.
REMOTE_ADDR and REMOTE_HOST give the IP address and DNS name, respectively, of the client that sent the message to the server. REMOTE_IDENT can be used to supply identity information obtained with the Identification Protocol, but it isn't widely used.
The AUTH_TYPE metavariable gives the authentication method, if any. Authentication methods comply with the SIP/2.0 specification. Currently the options are Basic, Digest or PGP. REMOTE_USER identifies the user to be authenticated.
CONTENT_LENGTH and CONTENT_TYPE describe the message body. The content type can be any registered MIME type, as stated in [1]. The actual message body can be read from stdin.
Some additional information about the server and the outside world is provided in special metavariables. The SERVER_NAME metavariable is set to the name of the server. The SERVER_PROTOCOL metavariable is set to the name and revision of the protocol with which the message arrived, e.g. SIP/2.0. The SERVER_SOFTWARE metavariable is set to the product name and version of the server software handling the message. GATEWAY_INTERFACE is the version of SIP-CGI used, e.g. SIP-CGI/1.1. Servers and CGI implementations can check their compatibility based on the information provided. SERVER_PORT is the port on which the message was received.
3.2 Output
The output (stdout) consists of any number of messages determining the desired actions of the server. The messages are like arbitrary SIP messages, possibly containing some additional information as special CGI header fields. The status line can be replaced by a CGI action, and it is therefore referred to as the action line. The messages are separated by double line feeds, in the same way as in a UDP packet that carries multiple requests or responses. It is intended that all the actions are performed, but the server can choose which actions it will perform. An example of SIP-CGI output can be seen in Figure 3. It is explained in the following sections.
1: SIP/2.0 100 Trying
2:
3: CGI-PROXY-REQUEST sip:user@host SIP/2.0
4: Contact: sip:server@domain
5: CGI-Remove: Subject
6:
7: CGI-AGAIN yes SIP/2.0
8:
9: CGI-SET-COOKIE abcd1234 SIP/2.0
Figure 3 Example SIP-CGI output
3.2.1 Action lines
If the action line is a normal status line, a normal SIP response is generated according to the status code. CGI header fields (and possibly some others) are discarded and missing fields are filled in from the original message, if needed. For example, line 1 in Figure 3 would generate a provisional response to the request being processed.
The action line CGI-PROXY-REQUEST causes the server to forward a request to the specified SIP URI. The message to be sent depends on the triggering point: if the script is triggered by a request, the triggering request is forwarded; if it is triggered by a response, the initial request of the transaction is sent. The initial request can only be known by a stateful server. The request can be supplemented with the header fields possibly contained in the CGI output. The message body can be inserted, substituted or deleted. However, message integrity must be maintained. An example use of CGI-PROXY-REQUEST can be seen in Figure 3, lines 3-5. It forwards the request to sip:user@host, adds a Contact header and removes the Subject header (see Section 3.2.2 for details).
CGI-FORWARD-RESPONSE causes the server to forward a response on to its appropriate final destination. The same rules apply for accompanying SIP headers and message bodies as for CGI-PROXY-REQUEST. The RESPONSE_TOKEN metavariable can be set.
CGI-SET-COOKIE sets the SCRIPT_COOKIE metavariable to store information across invocations (Figure 3, line 9).
CGI-AGAIN determines whether the script will be invoked for subsequent requests and responses for this transaction. If it won't, the default action is performed for all later invocations. The default action is also performed if the script doesn't generate any new messages. Line 7 in Figure 3 instructs the server to invoke the script again.
3.2.2 CGI Header Fields
CGI header fields pass additional instructions or information to the server. They resemble syntactically
SIP header fields, but their names all begin with CGI-. The SIP server strips all CGI header fields from any message before sending it.
To assist in matching responses to proxied requests, the script can place a CGI-Request-Token CGI header in a CGI-PROXY-REQUEST or a new request. This header contains a token that is opaque to the server. When a response to this request arrives, the token is passed back to the script as a meta-header. This allows scripts to fork a proxy request (send it to multiple locations in parallel) and correlate each response with the corresponding branch of the request.
The CGI-Remove header allows the script to remove SIP headers from the outgoing request or response. The value of this header is a comma-separated list of SIP headers. If the headers exist in the message, they are removed before sending. For example, line 5 in Figure 3 removes the Subject header, if it exists. It is illegal to try to remove a header that is inserted elsewhere in the script.
3.3 General feasibility of SIP-CGI
SIP-CGI is an interface that provides practically endless possibilities in service creation within the SIP architecture. Since CGI scripts can be arbitrary programs, it is possible to perform any kind of operation or access external services. This can also be considered a weakness: if the programs are excessively complex, they can cause severe overloading of the system. Access to local file systems or similar resources can also be misused. This is why care should be taken when considering third party implementations in CGI. Even though scripts can be uploaded straightforwardly, it is impossible to verify the functionality of the code. Therefore it is not advisable to let third party developers or service users freely create new CGI programs. Of course, with proper supervision and access restrictions it is possible to expose CGI programmability to a limited number of people or organizations.
CGI scripts can be written in any programming language available for the platform in use. There are many powerful scripting languages, such as Perl and various shell languages, that can be used for simple specialized tasks. When more complex operations are needed, general-purpose programming languages can be used. There can be portability problems concerning the variety of languages: in order to implement the service on a different platform, the compiler or interpreter for the implementation language must be available. Even if the language is implemented on the new platform, dialect variations can break the functionality.
One more disadvantage of SIP-CGI is that every invocation of a script generates a new process. This is quite resource consuming in most operating systems. Thus, a large number of simultaneous service users can cause overloading.
There are some proxy/application servers with SIP-CGI support available [6]. Programming tools can be used depending on the platform, but their usage is invisible to the CGI interface. Because of its similarity to HTTP-CGI, SIP-CGI will be easy to adopt for experienced web programmers. However, CGI programming is getting a bit "old-fashioned".
4 SIP Servlet API
SIP Servlet API is an interface for Java programs which control the processing of SIP messages. Similarly to SIP-CGI and HTTP-CGI, the basic idea of SIP Servlet API comes from HTTP Servlet API. Currently there is no single standard for SIP Servlet API. Here, we describe the first one of the proposals [8]. The rest of the proposals [9] are either extensions to the first one or competing drafts.
The API is based on Java interface definitions. Any server and servlet that implement the appropriate interfaces can be used together. The server and the servlets communicate through the API and the state of the servlets is maintained by the JVM (Java Virtual Machine).
The interface that all SIP servlets must implement is SipServlet. After instantiation (creation of a new object in Java), servlets are initialized and eventually "cleaned" with the init and destroy methods, respectively. Their main function is to pass configuration information and handle the allocation and deallocation of needed resources.
The SipServlet interface has methods for different types of messages: gotRequest for requests and gotResponse for responses. In its abstract implementation class, SipServletAdapter, gotRequest dispatches requests according to their method. Their handling is implemented in the methods doInvite, doAck, doOptions, doBye, doCancel and doRegister. When the server decides that some servlet is responsible for handling a message, it calls the appropriate method. The methods return a boolean value indicating success. If false is returned, the server should apply its default processing to the message.
The work distribution between servlets is based on transactions. When a servlet is registered as a listener to a transaction, it receives all messages related to that transaction. Initially, the server is responsible for this registration. Servlets can register to further transactions and remove registrations via the SipTransaction interface.
SipMessage and its sub-interfaces SipRequest and SipResponse represent messages. A new request in a SipTransaction can be initiated with its method createRequest. A response to a SipRequest can be created with its method createResponse. The method send is used to send messages. A request needs a next hop address, whereas responses are routed according to their Via
fields. Servlets can have different authorizations to generate messages.
Servlets can inspect and modify the messages with certain restrictions. The body of the message can be accessed through the methods getContent and setContent. Header fields can be inspected with the methods getHeaderNames, getHeaders and getHeader. The method setHeader is used to modify the headers, excluding so-called system headers that are managed by the SIP stack.
Similarly to SIP-CGI, requests and responses can be tied together with tokens. Sending a request returns a request token that can be used by servlets to match against similar tokens contained in responses. This can be used, for example, in forking requests to different destinations in parallel.
Current registrations of the users can be accessed through the interface ContactDatabase. Servlets can inspect (getContacts), substitute (setContacts), add (addContact) or remove (removeContact) registrations. Despite its name, ContactDatabase doesn't have to be a database: its internal implementation is hidden and it provides only generic contact information.
SipURL represents SIP URLs in the destination of the messages, user addresses etc. With additional information such as a display name, URLs can be stored in the SipAddress interface. SipAddress represents the values of From and To headers. Contact is an extended version of SipAddress, including expiration and similar information. Contact represents values of the Contact header and individual entries in the ContactDatabase.
Besides the message manipulations and database access, the server can set other restrictions for sensitive operations such as file system or network access. For untrusted code, the so-called servlet sandbox or similar models can be used. The idea of the sandbox model is to restrict the set of operations that can be performed. If feasible, even the bytecode of newly installed servlets can be analyzed to ensure that they don't contain buggy or malicious code such as endless loops.
Figure 4 is an example SIP Servlet from [8]. To understand it completely the reader should be familiar with the Java API specification [10], but the following brief explanation can be followed even without prior knowledge of Java. The servlet implements an unconditional call reject. As a service it isn't interesting, but it serves as an example of servlet programming.
The example servlet extends SipServletAdapter (line 4), which means that by default it doesn't react to any messages. Only INVITE requests are processed (lines 20-25). They are answered with a generic response (lines 21-23), with the status code and reason phrase (line 22) stored in the servlet instance (lines 5-6). Customized codes and reasons may have been set (lines 11-14) during the initialization (lines 8-18); otherwise the default one (line 16) is used. The servlet returns true, which means that no default message processing is needed.
1: import org.ietf.sip.*;
2:
3: public class RejectServlet
4:     extends SipServletAdapter {
5:   protected int statusCode;
6:   protected String reasonPhrase;
7:
8:   public void init(ServletConfig config) {
9:     super.init(config);
10:     try {
11:       statusCode = Integer.parseInt(
12:         getInitParameter("status-code"));
13:       reasonPhrase =
14:         getInitParameter("reason-phrase");
15:     } catch (Exception _) {
16:       statusCode = SC_INTERNAL_SERVER_ERROR;
17:     }
18:   }
19:
20:   public boolean doInvite(SipRequest req) {
21:     SipResponse res = req.createResponse();
22:     res.setStatus(statusCode, reasonPhrase);
23:     res.send();
24:     return true;
25:   }
26: }
Figure 4 Example SIP Servlet
4.1 General feasibility of SIP Servlet API
In its expressive power, SIP Servlet API is quite similar to SIP-CGI. As independent programs, servlets can carry out any kind of task needed for the service. However, there are some key differences between these two techniques. They are mainly the same as those between HTTP-CGI and HTTP Servlet API.
The Java Virtual Machine is running as long as the servlet engine is up. This saves resources, since it is not necessary to generate a new process for every servlet invocation. Once the servlet is instantiated, its methods can be called over and over again. The state information is also kept in the servlets themselves, so no external mechanism is needed for distributing it.
The tight connection to the server has other advantages as well. As stated above, messages and even the database are represented through the API. This makes access to them "handier". It is more convenient and safer to handle headers, database fields etc. when they are readily parsed by the server. It is also easier to control the access when it is done explicitly through the interface. In addition, different kinds of sandbox-like environments can be used.
SIP Servlet API (like practically anything written in Java) is platform independent. Unfortunately it is tied to the Java language, so some flexibility is lost. Some operations are more suitable to be performed with a scripting language like Perl than with a general-purpose language like Java. If it is necessary or more efficient to use scripting languages, some of them can be run natively in Java. There are packages for Perl, regular expressions and many other tools. External scripts can also be invoked as system processes from Java (even CGI can be run from a servlet), but that
should be avoided because it effectively destroys the original idea of tight integration.
There are some proxy/application servers with SIP Servlet API support available [6]. Java itself is widely adopted, with many development environments to choose from. Because of their similarity to HTTP Servlets, SIP Servlets will be easy to adopt for experienced web programmers.
5 H.323 services
5.1 H.450-based services
Originally H.323 was intended to handle only basic call control signalling [11]. The first solution to enable advanced services on top of H.323 was the ITU-T specification series H.450. Its idea was to specify individual supplementary services similar to current PSTN services.
The protocol for all H.450-based services is defined in H.450.1. It is derived from the QSIG protocol used between private branch exchanges (PBX), so it can be seen as a protocol for IP PBX services. One large difference from the PSTN model is that most of the service logic is in the terminal equipment (TE). Since the services are visible in the protocol and the TEs execute the services, both endpoints must understand the logic of the service to be used. This is a major disadvantage, because services will work completely correctly only if all the TEs have the same release of H.323.
The actual services are defined in H.450.2 and up. The current version (H.323 v. 4) includes H.450.2 to H.450.12: H.450.2 for call transfer, H.450.3 for call diversion (forwarding, deflection), H.450.4 for call hold, H.450.5 for call park and pickup, H.450.6 for message waiting indication, H.450.7 for call waiting, H.450.8 for name identification, H.450.9 for call completion, H.450.10 for call offer, H.450.11 for call intrusion, and H.450.12 for the common information additional network feature.
5.2 Non-H.450-based services
H.450-based services are a bit cumbersome to deploy. All the services are specified by ITU-T and often all the TEs must support the same version of H.323. Another solution is to separate the service logic from the TEs and implement the services in the gatekeeper. Routing related services in particular could be offered by the gatekeeper.
So far gatekeeper services have been proprietary implementations. There has been some discussion on whether IN should be integrated with gatekeepers. Other alternatives (maybe similar to CGI or Servlets) could also be developed. Since CPL is independent of the signalling protocol, CPL servers could also be implemented in an H.323 environment.
5.3 General feasibility of H.323 services
It can be seen that H.323 is largely based on PSTN-like models. The most significant service implementation proposals are based on PBX and possibly IN technologies.
It is worth considering whether conventional models should be used in IP telephony service implementation. It is clear that, for example, IN based services must be accessible from the IP environment, but it is a completely different issue to reproduce the implementation mechanisms. There are already standards like JAIN for integrating IP telephony systems with IN. Services that are developed purely for the new environment should provide some real added value utilizing the new possibilities.
Many vendors and carriers have already made significant investments in H.323. Equipment and software have been at a commercial stage for quite some time. However, on the services side the progress has been a lot slower. Apart from H.450 services and the proprietary implementations, there haven't been many service implementation capabilities.
6 Example service architecture
The interfaces presented in chapters 2-4 are typically implemented within a SIP proxy server. Other SIP signalling server types can also host services, and the system can also be referred to as an application server. A more precise description of the overall architecture can be found in [12]. Figure 5 depicts an example of the internal architecture of the application server.
[Figure 5: layered diagram. CPL scripts run on a CPL servlet; the CPL servlet and other SIP servlets sit on the SIP Servlet API; SIP-CGI scripts sit on the SIP-CGI interface; both interfaces sit on the SIP proxy/application server core.]
Figure 5 Example service architecture
In the example architecture, both servlets and CGI scripts communicate directly with the signalling server through their respective interfaces. CPL scripts are handled by a servlet specialized in that task. CPL support could also be implemented directly in the signalling server or through CGI scripts. In general, this is only a reference architecture; application servers or similar components can be realized in various ways.
7 Conclusions
As far as signalling and media transmission are concerned, IP telephony isn't going to change much. In the long term,
of course operation costs will reduce, because it won't be necessary to maintain two separate networks. Issues like signalling delays and voice quality are going to stay pretty much the same (if they degrade, users will complain). Of course more advanced codecs and other improvements are being developed, but generally there isn't much to do.
The part that is going to change most radically is the services. The existing services in the PSTN and the WWW can be combined. Some examples of the combination are click-to-dial, Unified Messaging (UM) and different kinds of information services. Completely new kinds of services will also emerge. The tools used to implement these services are going to be numerous, which can already be seen from the variety of service implementation techniques used in the WWW. Some of them have already been adopted in IP telephony. CGI and servlets are being standardized for SIP, and components like Java Beans are widely used in service creation environments. Just wait for the IP telephony equivalents of ASP, JSP, JavaScript, VBScript, VRML, FutureSplash, Shockwave and others to appear.
Just as anyone can now run a web server, in the future communications services could be distributed among individuals. There is a project similar to Apache starting to implement an open source SIP proxy server with CGI and servlets. It could be downloaded and installed by anyone, and services could be developed as in a kind of "home-made telephone exchange". Of course carrier grade communications services will have their own role regardless of the new, more open solutions. How exactly the transition is going to happen is still to be seen.
References
[1] Schulzrinne, Henning et al: SIP: Session Initiation Protocol, IETF, March 1999 - April 2001, http://www.ietf.org/rfc/rfc2543.txt, http://search.ietf.org/internet-drafts/draft-ietf-sip-rfc2543bis-02.txt
[2] ITU-T Recommendation H.323, Packet-Based Multimedia Communications Systems, since 1996
[3] Lennox, Jonathan; Schulzrinne, Henning: CPL: A Language for User Control of Internet Telephony Services, IETF, November 14 2000, http://search.ietf.org/internet-drafts/draft-ietf-iptel-cpl-04.txt
[4] Bray, T. et al: Extensible markup language (XML) 1.0 (second edition), W3C, October 2000
[5] Dawson, F; Stenerson, D.: Internet Calendaring and Scheduling Core Object Specification (iCalendar), IETF, November 1998, http://www.ietf.org/rfc/rfc2445.txt
[6] Schulzrinne, Henning: SIP Implementations, Columbia University, ongoing work, http://www.cs.columbia.edu/~hgs/sip/implementations.html
[7] Lennox, Jonathan et al: Common Gateway Interface for SIP, IETF, January 2001, http://www.ietf.org/rfc/rfc3050.txt
[8] Kristensen, Anders; Byttner, Anders: The SIP Servlet API, IETF, September 1999, http://www.cs.columbia.edu/~hgs/sip/drafts/draft-kristensen-sip-servlet-00.txt
[9] Schulzrinne, Henning: SIP Drafts: APIs and Programming Environments, Columbia University, ongoing work, http://www.cs.columbia.edu/~hgs/sip/drafts_api.html
[10] Java 2 Platform, Standard Edition, v 1.3 API Specification, Sun Microsystems, 1993-2000, http://java.sun.com/j2se/1.3/docs/api/index.html
[11] Liu, Hong; Mouchtaris, Petros: Voice over IP Signalling: H.323 and Beyond, IEEE Communications Magazine, October 2000
[12] Isomäki, Markus: SIP Service Architecture, Helsinki University of Technology, May 2001
MASTER SLAVE PROTOCOL
Sunesh Kumra
Nokia Networks
Takimo 1, Pitajanmaki
Helsinki
Sunesh.Kumra@nokia.com
for the end-users to buy and place in their home. This avoids the need for two-stage dialing since the user's telephone will already be connected to the gateway.
MGCP assumes a connection model where the basic constructs are endpoints and connections. Endpoints are sources or sinks of data and can be physical or virtual. An example of a physical endpoint is an interface on a gateway that terminates a trunk connected to a PSTN switch. An example of a virtual endpoint is an audio source in an audio-content server.
Connections may be either point to point or multipoint. A point to point connection is an association between two endpoints with the purpose of transmitting data between these endpoints. Once this association is established for both endpoints, data transfer between these endpoints can take place. A multipoint connection is established by connecting the endpoint to a multipoint session.
Connections can be established over several types of bearer networks:
• Transmission of audio packets using RTP and UDP over a TCP/IP network.
• Transmission of audio packets using AAL2, or another adaptation layer, over an ATM network.
• Transmission of packets over an internal connection, for example the TDM backplane or the interconnection bus of a gateway.
2.1 Telephony Gateway
A telephony gateway is a network element that provides conversion between the audio signals carried on telephone circuits and data packets carried over the Internet or over other packet networks. Examples of gateways are:
• Trunking gateways, which interface between the telephone network and a Voice over IP network. Such gateways typically manage a large number of digital circuits.
• Voice over ATM gateways, which operate much the same way as Voice over IP trunking gateways, except that they interface to an ATM network.
• Residential gateways, which provide a traditional analog (RJ11) interface to a Voice over IP network. Examples of residential gateways include cable modem/cable set-top boxes, xDSL devices and broadband wireless devices.
• Access gateways, which provide a traditional analog (RJ11) or digital PBX interface to a Voice over IP network. Examples of access gateways include small-scale Voice over IP gateways.
• Business gateways, which provide a traditional digital PBX interface or an integrated "soft PBX" interface to a Voice over IP network.
• Network Access Servers, which can attach a "modem" to a telephone circuit and provide data access to the Internet. It is expected that, in the future, the same gateways will combine Voice over IP services and Network Access services.
• Circuit switches, or packet switches, which can offer a control interface to an external call control element.
Note: The examples of gateways given above are just a functional classification. Two or more of the gateway types explained above may be present in the same physical gateway.
2.2 Calls and Connections
Connections are created by the call agent on each endpoint that will be involved in the "call." Each connection will be designated locally by a connection identifier, and will be characterized by connection attributes.
When the two endpoints are located on gateways that are managed by the same call agent, the creation is done via the following three steps:
1. The call agent asks the first gateway to "create a connection" on the first endpoint (Step 1 in Figure 1). The gateway allocates resources to that connection, and responds to the command by providing a "session description" (Step 2). The session description contains the information necessary for a third party to send packets towards the newly created connection, such as the IP address, UDP port, and packetization parameters.
2. The call agent then asks the second gateway to "create a connection" on the second endpoint (Step 3). The command carries the "session description" provided by the first gateway. The gateway allocates resources to that connection, and responds to the command by providing its own "session description" (Step 4).
3. The call agent uses a "modify connection" command to provide this second "session description" to the first endpoint (Step 5). Once this is done, communication can proceed in both directions.
[Figure 1: a Media Gateway Controller exchanges messages 1-5 with Media Gateway 1 (Endpoint 1) and Media Gateway 2 (Endpoint 2); media flows between the two gateways.]
Figure 1: Call Setup
90
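The three steps above can be sketched in a few lines of Python. This is a toy model, not MGCP: the classes, method names, port numbers, addresses and endpoint names are all invented for illustration, and no messages are actually sent. It only shows how the two session descriptions travel between the gateways via the call agent.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class SessionDescription:
    """What a third party needs in order to send packets to a connection."""
    ip: str
    udp_port: int


class Gateway:
    """Hypothetical gateway: allocates a port per connection, remembers the peer."""

    def __init__(self, ip: str):
        self.ip = ip
        self._ports = count(5004, 2)   # arbitrary illustrative port range
        self._ids = count(1)
        self.connections = {}          # conn_id -> {"endpoint", "local", "remote"}

    def create_connection(self, endpoint: str,
                          remote: Optional[SessionDescription] = None):
        conn_id = next(self._ids)
        local = SessionDescription(self.ip, next(self._ports))
        self.connections[conn_id] = {"endpoint": endpoint,
                                     "local": local, "remote": remote}
        return conn_id, local          # the response carries the session description

    def modify_connection(self, conn_id: int, remote: SessionDescription):
        self.connections[conn_id]["remote"] = remote


def setup_call(gw1: Gateway, ep1: str, gw2: Gateway, ep2: str):
    """Call agent logic when both gateways are managed by the same call agent."""
    conn1, sd1 = gw1.create_connection(ep1)              # steps 1 and 2
    conn2, sd2 = gw2.create_connection(ep2, remote=sd1)  # steps 3 and 4
    gw1.modify_connection(conn1, remote=sd2)             # step 5
    return conn1, conn2


gw_a, gw_b = Gateway("192.0.2.1"), Gateway("192.0.2.2")
c1, c2 = setup_call(gw_a, "endpoint-1", gw_b, "endpoint-2")
```

After `setup_call` returns, each gateway holds the other side's session description, which is the condition the text gives for communication in both directions.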
When the two endpoints are located on gateways that are managed by different call agents, these two call agents shall exchange information through a call-agent to call-agent signaling protocol, in order to synchronize the creation of the connection on the two endpoints.

Once established, the connection parameters can be modified at any time by a "modify connection" command. The call agent may for example instruct the gateway to change the compression algorithm used on a connection, or to modify the IP address and UDP port to which data should be sent, if a connection is "redirected."

The call agent removes a connection by sending to the gateway a "delete connection" command. The gateway may also, under some circumstances, inform the call agent that a connection could not be sustained.

2.3 Usage of SDP
The Call Agent uses MGCP to provision the gateways with the description of connection parameters such as IP addresses, UDP ports and RTP profiles. These descriptions follow the conventions delineated in the Session Description Protocol, which is now an IETF proposed standard, documented in RFC 2327. SDP allows for description of multimedia conferences. This version limits SDP usage to the setting of audio circuits and data access circuits. The initial session descriptions contain the description of exactly one media, of type "audio" for audio connections and "nas" for data access.

2.4 High Availability and Load Balancing in MGCP
Call Agents are identified by their domain name, not their network addresses, and several addresses can be associated with a domain name. In a typical configuration, the MG sends Notifications to the CA. After trying to contact the CA a configurable number of times without getting any response, it starts contacting the other (back-up) CA within the same domain name.
If a CA is overloaded, it can inform the MG of this by changing the Notified Entity at the MG to a new CA. When the MG has to deliver the next Notification, it then does so to the new CA.

2.5 MGCP Commands
The table below lists the various MGCP commands. CA denotes the Call Agent and GW denotes the Gateway. CA --> GW means that the command is sent from CA to GW.

Table 3: MGCP Commands

Sr no.  Command                Command flow
1       CreateConnection       CA --> GW
2       ModifyConnection       CA --> GW
3       DeleteConnection       CA --> GW
4       NotificationRequest    CA --> GW
5       Notify                 CA <-- GW
6       AuditEndpoint          CA --> GW
7       AuditConnection        CA --> GW
8       RestartInProgress      CA <-- GW
9       EndpointConfiguration  CA --> GW

We shall now look into the individual MGCP commands. Every command is represented by a few parameters; details on those parameters can be found in Appendix B. For more information on how a command is represented, see RFC 2705.

2.5.1 Endpoint Configuration
The EndpointConfiguration command is used to specify the encoding of the signals that will be received by the endpoint. For example, in certain international telephony configurations, some calls will carry mu-law encoded audio signals, while others will use A-law. The Call Agent will use the EndpointConfiguration command to pass this information to the gateway.

Command is represented by:
    ReturnCode
    <-- EndpointConfiguration(EndpointId,
                              BearerInformation)

2.5.2 Notification Request
The NotificationRequest command is used to request the gateway to send notifications upon the occurrence of specified events in an endpoint. For example, a notification may be requested for when a gateway detects that an endpoint is receiving tones associated with fax communication.
One of the nice features of this command is the association of actions with each of the events. Using this facility, the communication and processing of information between the two entities can be optimized. Each event is associated with an action, which can be:
• Notify the event immediately, together with the accumulated list of observed events.
• Accumulate the event in an event buffer, but don't notify yet.
• Accumulate according to Digit Map.

Command is represented by:
    ReturnCode
    <-- NotificationRequest(EndpointId,
                            [NotifiedEntity,]
                            [RequestedEvents,]
                            RequestIdentifier,
                            [DigitMap,]
                            [SignalRequests,]
                            [QuarantineHandling,]
                            [DetectEvents,]
                            [encapsulated EndpointConfiguration])

2.5.3 Create Connection
This command is used to create a connection between two endpoints. In addition to the necessary parameters that enable a media gateway to create a connection, the
localConnectionOptions parameter provides features for quality of service, security, and network-related QoS.

Command is represented by:
    ReturnCode,
    ConnectionId,
    [SpecificEndPointId,]
    [LocalConnectionDescriptor,]
    [SecondEndPointId,]
    [SecondConnectionId]
    <-- CreateConnection(CallId,
                         EndpointId,
                         [NotifiedEntity,]
                         [LocalConnectionOptions,]
                         Mode,
                         [{RemoteConnectionDescriptor | SecondEndpointId},]
                         [Encapsulated NotificationRequest,]
                         [Encapsulated EndpointConfiguration])

2.5.4 Modify Connection
This command is used to modify the characteristics of a gateway's "view" of a connection. This "view" of the call includes both the local connection descriptor and the remote connection descriptor.

Command is represented by:
    ReturnCode,
    [LocalConnectionDescriptor]
    <-- ModifyConnection(CallId,
                         EndpointId,
                         ConnectionId,
                         [NotifiedEntity,]
                         [LocalConnectionOptions,]
                         [Mode,]
                         [RemoteConnectionDescriptor,]
                         [Encapsulated NotificationRequest,]
                         [Encapsulated EndpointConfiguration])

2.5.5 Delete Connection
This command is used to terminate a connection. As a side effect, it collects statistics on the execution of the connection. If more than one gateway is involved, the call agent will send the Delete Connection command to each of the media gateways. It is also possible for the Call Agent to delete multiple connections at the same time, using for example wildcard options.

Command is represented by:
    ReturnCode,
    Connection-parameters
    <-- DeleteConnection(CallId,
                         EndpointId,
                         ConnectionId,
                         [Encapsulated NotificationRequest,]
                         [Encapsulated EndpointConfiguration])

In some circumstances, a gateway may have to clear a connection, for example because it has lost the resource associated with the connection, or because it has detected that the endpoint is no longer capable of or willing to send or receive voice. The gateway terminates the connection by using a variant of the DeleteConnection command.

2.5.6 Audit Endpoint
The AuditEndpoint command can be used by the Call Agent to find out the status of a given endpoint. This feature has been inherited from the switch environment.

Command is represented by:
    ReturnCode,
    EndPointIdList | {
        [RequestedEvents,]
        [DigitMap,]
        [SignalRequests,]
        [RequestIdentifier,]
        [NotifiedEntity,]
        [ConnectionIdentifiers,]
        [DetectEvents,]
        [ObservedEvents,]
        [EventStates,]
        [BearerInformation,]
        [RestartReason,]
        [RestartDelay,]
        [ReasonCode,]
        [Capabilities]}
    <-- AuditEndPoint(EndpointId,
                      [RequestedInfo])

2.5.7 Audit Connection
The AuditConnection command can be used by the Call Agent to retrieve the parameters attached to a connection.

Command is represented by:
    ReturnCode,
    [CallId,]
    [NotifiedEntity,]
    [LocalConnectionOptions,]
    [Mode,]
    [RemoteConnectionDescriptor,]
    [LocalConnectionDescriptor,]
    [ConnectionParameters]
    <-- AuditConnection(EndpointId,
                        ConnectionId,
                        RequestedInfo)

2.5.8 Restart in Progress
The RestartInProgress command is used by the gateway to signal that an endpoint, or a group of endpoints, is taken in or out of service.

Command is represented by:
    ReturnCode,
    [NotifiedEntity]
    <-- RestartInProgress(EndPointId,
                          RestartMethod,
                          [RestartDelay,]
                          [Reason-code])

3 Protocol At Work
We shall now see how MGCP works with the help of two examples.

3.1 MGCP in all IP Network
Let us now see how MGCP works in the case of an all-IP network. In the figure, RGW = Residential Gateway, CA = Call Agent and EP = Endpoint.
For the sake of discussion, it is assumed that the two EPs that want to talk with each other are under the control of the same CA.
In the figure below the solid lines denote the signalling path and the dashed line denotes the media flow. The RGW, CA and database are all part of the IP network.

Figure 2: MGCP in all IP Network (CA with Database; RGW A serving EP1 and RGW B serving EP2)

1 CA directs RGW A to look for an off-hook event and report it, by sending a Notification Request to RGW A.
2 EP1 goes off-hook; this is detected by RGW A, and a Notification is sent to the CA.
3 CA looks for the service associated with the off-hook event and asks RGW A to collect the digits and play dial tone to EP1.
4 RGW A accumulates the digits and sends a Notification to CA.
5 CA sends a Notification Request to RGW A to stop collecting digits and look for an on-hook event.
6 CA seizes the incoming circuit (asks the RGW to create a call context) and then sends the Create Connection command to RGW A.
7 RGW A sends back its session description (SDP, Session Description Protocol) to the CA.
8 CA finds the IP address that serves the dialed number for EP2 from the database.
9 Once CA knows the IP address of RGW B, it sends the Create Connection command to it.
10 RGW B responds by sending back its SDP.
11 CA now sends the SDP from RGW B to RGW A in the Modify Connection command. At this point the two legs of the call are established in half duplex mode.
12 CA instructs RGW B to start ringing by sending a Notification Request.
13 CA notifies EP1 that EP2 is ringing.
14 EP2 answers the call and RGW B sends the CA a Notification that EP2 is answering the call.
15 CA sends a Notification Request to RGW A to stop ringing.
16 CA sends Modify Connection to RGW A to change the communication mode from half duplex to full duplex.
17 EP1 and EP2 are now talking!

3.2 MGCP in PSTN - IP Network
This is the case where user A is in the PSTN network and wants to call an IP phone.
As before, the solid lines denote the signaling path and the dotted lines denote the media path.
Note that SG and TGW are on the edge of the IP cloud. They interface with the IP world on one side and with the SS7 and PSTN world, respectively, on the other.

Figure 3: MGCP in PSTN-IP Network (CA with Database; SG connected to STP; TGW connected to SSP; EP1 behind the SSP; RGW serving EP2)

1 EP1, which is in the PSTN world, dials the number of EP2.
2 This number reaches the SSP through EP1's local exchange.
3 SSP issues an IAM (the ISUP Initial Address Message) to the CA, which is in the IP world. This IAM reaches SG via STP. SG is connected to the IP world on one side and the SS7 world on the other.
SG converts the ISUP on SS7 to ISUP on IP and sends the message to CA.
4 CA finds the IP address that serves the dialed number for EP2 from the database.
5 CA now sends the Create Connection command to the TGW to connect to the incoming trunk using the CIC. TGW returns the SDP of the connection.
6 CA seizes the incoming trunk (asks the RGW to create a call context) and reserves the outgoing trunk by sending Create Connection to the RGW, passing the SDP of the TGW.
7 CA now sends Modify Connection to the TGW.
8 CA requests the RGW to ring the called line by sending a Notification Request to the RGW.
9 When the CA receives the Ack from the RGW, it issues an ACM to the SG.
10 The SG forwards the ACM (the ISUP Address Complete Message) to the SSP.
11 EP2 goes off-hook, and the RGW notifies the CA by sending a Notify.
12 Now the voice channel has to be turned into full duplex mode; the CA does this by sending the Modify Connection command to the TGW.
13 CA then sends the answer message to the SG, and the STP forwards this message to the SSP.
14 EP1 and EP2 are now talking!

4 MEGACO
MEGACO is used between elements of a physically decomposed multimedia gateway, i.e. a Media Gateway and a Media Gateway Controller. Megaco meets the requirements for a media gateway control protocol as defined in RFC 2805.

The protocol provides commands for manipulating the logical entities of the protocol connection model, Contexts and Terminations. Commands provide control at the finest level of granularity supported by the protocol. For example, commands exist to add Terminations to a Context, modify Terminations, subtract Terminations from a Context, and audit properties of Contexts or Terminations. Commands provide for complete control of the properties of Contexts and Terminations. This includes specifying which events a Termination is to report, which signals/actions are to be applied to a Termination, and specifying the topology of a Context (who hears/sees whom). Most commands are for the specific use of the Media Gateway Controller as command initiator in controlling Media Gateways as command responders. The exceptions are the Notify and ServiceChange commands.

[Figure: a Media Gateway containing a Context that joins an RTP Stream Termination with two SCN Bearer Channel Terminations]
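The connection model can be pictured with a small in-memory sketch. It is not Megaco itself: the class and method names are invented for illustration, termination names such as "SCN/1" are placeholders, and only the Context bookkeeping is modelled (a Context is created by the first Add and deleted by the last Subtract, as the command descriptions in section 4.2 state).

```python
class MediaGateway:
    """Hypothetical model of Contexts as sets of Termination names."""

    def __init__(self):
        self.contexts = {}       # context id -> set of termination names
        self._next_context = 1

    def add(self, termination, context_id=None):
        """Add a Termination; with no Context given, a new Context is created."""
        if context_id is None:
            context_id = self._next_context
            self._next_context += 1
            self.contexts[context_id] = set()
        self.contexts[context_id].add(termination)
        return context_id

    def subtract(self, termination, context_id):
        """Remove a Termination; the last Subtract deletes the Context."""
        members = self.contexts[context_id]
        members.remove(termination)
        if not members:
            del self.contexts[context_id]

    def move(self, termination, src, dst):
        """Move a Termination from one existing Context to another."""
        if dst not in self.contexts:
            raise KeyError(dst)
        if src == dst:
            return
        self.subtract(termination, src)
        self.contexts[dst].add(termination)


mg = MediaGateway()
ctx = mg.add("SCN/1")            # first Add creates the Context
mg.add("RTP/1", ctx)             # second Termination joins it
other = mg.add("SCN/2")          # a second, separate Context
mg.move("SCN/1", ctx, other)     # SCN/1 now sits in the other Context
mg.subtract("RTP/1", ctx)        # last Termination leaves: Context is deleted
```

Properties, events, signals and topology, which the protocol also controls, are deliberately left out of this sketch.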
4.2 MEGACO Commands
Following are the various Megaco commands.
• Add. The Add command adds a Termination to a Context. The Add command on the first Termination in a Context is used to create a Context.
• Modify. The Modify command modifies the properties, events and signals of a Termination.
• Subtract. The Subtract command disconnects a Termination from its Context and returns statistics on the Termination's participation in the Context. The Subtract command on the last Termination in a Context deletes the Context.
• Move. The Move command atomically moves a Termination to another Context.
• AuditValue. The AuditValue command returns the current state of properties, events, signals and statistics of Terminations.
• AuditCapabilities. The AuditCapabilities command returns all the possible values for Termination properties, events and signals allowed by the Media Gateway.
• Notify. The Notify command allows the Media Gateway to inform the Media Gateway Controller of the occurrence of events in the Media Gateway.
• ServiceChange. The ServiceChange command allows the Media Gateway to notify the Media Gateway Controller that a Termination or group of Terminations is about to be taken out of service or has just been returned to service. ServiceChange is also used by the MG to announce its availability to an MGC (registration), and to notify the MGC of an impending or completed restart of the MG. The MGC may announce a handover to the MG by sending it a ServiceChange command. The MGC may also use ServiceChange to instruct the MG to take a Termination or group of Terminations in or out of service.

Internet Protocol    TCP or UDP                    UDP
Evolution process    Formal extension process      Less structured process,
                     defined within the IETF       managed by industry
                     and the ITU                   consortia

6 Conclusion
With Megaco you can do everything that you could have done with MGCP, and more. Megaco will be the primary protocol for Media Gateway Control in the future. MGCP is being tested in many networks today and should soon be operational commercially, but the popularity of Megaco is rising fast. Since MGCP will be deployed soon, it is likely to stay for some time. However, the networks that appear maybe a year from now will likely use Megaco for Media Gateway Control. So I see that MGCP and Megaco will co-exist for some years, before we mainly have Megaco for Media Gateway Control.

References
[1] Request for Comments 2705: Media Gateway Control Protocol
[2] JAIN MGCP API, version 0.9.
[3] IP Telephony: Packet-based multimedia communications systems
[4] www.pulver.com
Appendix A
Glossary

Terms   Meaning
STP     Signaling Transfer Points
SP      Signaling Point
ISUP    ISDN User Part
SSP     Service Switching Points
SCP     Service Control Points
TGW     Trunk Gateway
RGW     Residential Gateway
EP      Endpoint
MGCP    Media Gateway Control Protocol
CA      Call Agent
MG      Media Gateway
SG      Signaling Gateway
JAIN    Java APIs for Integrated Networks
SGCP    Simple Gateway Control Protocol
RFI     Request for Information
RFP     Request for Proposal
SS7     Signaling System No. 7
PSTN    Public Switched Telephone Network
IN      Intelligent Network
UDP     User Datagram Protocol
ITU     International Telecommunication Union
IETF    Internet Engineering Task Force
IP      Internet Protocol
IAM     Initial Address Message
CIC     Circuit Identification Code
ACM     Address Complete Message
VON     Voice on Net

Appendix B
• ReturnCode: ReturnCode is a parameter returned by the gateway. It indicates the outcome of the command and consists of an integer number optionally followed by commentary.
• EndpointId: EndpointId is the name of the endpoint in the gateway where the command executes.
• BearerInformation: BearerInformation is a parameter defining the coding of the data received from the line side.
• NotifiedEntity: NotifiedEntity specifies where the notifications should be sent. When this parameter is absent, the notifications should be sent to the originator of the NotificationRequest.
• RequestedEvents: RequestedEvents is a list of events that the gateway is requested to detect and report. Such events include, for example, fax tones, continuity tones, or on-hook transition. Each event is associated with an action.
• RequestIdentifier: RequestIdentifier is used to correlate the request with the notifications that it triggers.
• DigitMap: DigitMap allows the Call Agent to provision the gateways with a digit map according to which digits will be accumulated. If this parameter is absent, the previously defined value is retained.
• SignalRequests: SignalRequests is a parameter that contains the set of signals that the gateway is asked to apply to the endpoint, such as ringing or continuity tones. Signals are identified by their name, which is an event name, and may be qualified by parameters.
• QuarantineHandling: The QuarantineHandling parameter specifies the handling of "quarantine" events, i.e. events that have been detected by the gateway before the arrival of the NotificationRequest command, but have not yet been notified to the Call Agent.
• DetectEvents: DetectEvents specifies a list of events that the gateway is requested to detect during the quarantine period.
• ConnectionId: ConnectionId uniquely identifies the connection within one endpoint.
• SpecificEndpointId: The SpecificEndPointId parameter identifies the responding endpoint when returned from a CreateConnection command.
• LocalConnectionDescriptor: LocalConnectionDescriptor is a session description that contains information about addresses and RTP ports, as defined in SDP.
• SecondEndpointId: When a SecondEndpointId is returned from a CreateConnection command, the command really creates two connections that can be manipulated separately through ModifyConnection and DeleteConnection commands.
• SecondConnectionId: When this is returned from a CreateConnection, it identifies the second connection.
• LocalConnectionOptions: LocalConnectionOptions is a parameter used by the Call Agent to direct the handling of the connection by the gateway. Some of the fields contained in LocalConnectionOptions are: encoding method, packetization period, bandwidth, type of service, usage of echo cancellation and so on.
• Mode: Mode indicates the mode of operation for this side of the connection. The modes are "send", "receive", "send/receive", "conference", "data", "inactive", "loopback", "continuity test", "network loop back" or "network continuity test."
• DetectEvents: DetectEvents is the list of events that are currently detected in quarantine mode.
• RestartMethod: The RestartMethod parameter specifies the type of restart of the endpoint. The methods include "graceful" and "forced".
• RestartDelay: The parameter is expressed as a number of seconds. If the number is absent, the delay value should be considered null.
• Capabilities: The capabilities for the endpoint are similar to the LocalConnectionOptions parameter and include event packages and connection modes.
Appendix C
Below are some interesting comments from speakers at the VON conference.
• In 1998 there were more than 1 trillion minutes of
POTS usage.
• The US market for telephony services is about $250 billion and the global telecom service market is about $800 billion.
• The cross-over for the wide-area data traffic
exceeding voice traffic is happening about now,
but voice revenues are much greater than the data
revenues.
• By 2004, 5% to 20% of long distance calls will be
VoIP.
• Circuit switching will be dead by 2005.
• Voice will be only 1% of the total global network
traffic by 2008.
• The worldwide market for IP Telephony will grow
from $480 million in 1999 to $19 billion in
2004.
Network dimensioning for voice over IP
Tuomo Hakala
Oy Datatie Ab
tuomo.hakala@datatie.fi
• Overall delay
Factors related to the applications are:
• Overall packet loss
• Jitter buffers

When a VoIP packet is transferred through an IP network, it will experience delay that is caused by:
• Transmission delay between the nodes, which depends on the frame size and the transmission speed
• Switching and processing delay in the nodes, i.e. the time to switch a frame from an input port to an output port
• Propagation delay, which depends on the characteristics of the transmission media and the distance between the nodes

the identity of the participants [4]. RTP and RTCP are mostly used on top of User Datagram Protocol (UDP) [3], which provides the use of a port number and a checksum. The use of UDP also enables the use of IP multicast, i.e. sending packets to IP multicast addresses. This means that an RTP stream generated by a single source can be received by several destinations. [1]
Figures 1 to 5 show this probability P(I) with activity rate a=0,5 and different values of N. Each figure plots P(I) and the cumulative probability against I, the number of active one way voice channels.

Figure 1: The probability of having I active voice channels in one direction of a link when the number of conversations is N=5 and the activity rate is a=0,5.

Figure 2: The probability of having I active voice channels in one direction of a link when the number of conversations is N=10 and the activity rate is a=0,5.

[Figure 3: plot of P(I) and cumulative probability; caption not recoverable]

Figure 4: The probability of having I active voice channels in one direction of a link when the number of conversations is N=100 and the activity rate is a=0,5.

Figure 5: The probability of having I active voice channels in one direction of a link when the number of conversations is N=1000 and the activity rate is a=0,5.

The cumulative graphs in the figures 1 to 5 show the probability of having maximum I voice channels active out of N at the same time in one direction of a link. This can be used in link sizing. As an example, if we know that the maximum number of conversations on a link is 1000 and we want to be 99% sure that all simultaneously active voice channels get all of their packets through the link, we size the link for the bandwidth of 536 times the bandwidth required by a single voice channel.
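The cumulative curves can be reproduced numerically. The sketch below assumes that each of the N conversations is active in a given direction independently with probability a, so that I follows a binomial distribution; that assumption is inferred from the wording "activity rate" and is not stated in the surviving text. The function names are illustrative.

```python
from math import comb


def p_active(i: int, n: int, a: float = 0.5) -> float:
    """P(I = i): exactly i of n one-way voice channels are active."""
    return comb(n, i) * a**i * (1 - a)**(n - i)


def cumulative(i: int, n: int, a: float = 0.5) -> float:
    """Probability that at most i of the n channels are active."""
    return sum(p_active(k, n, a) for k in range(i + 1))


def channels_for(target: float, n: int, a: float = 0.5) -> int:
    """Smallest I whose cumulative probability reaches the target."""
    total = 0.0
    for i in range(n + 1):
        total += p_active(i, n, a)
        if total >= target:
            return i
    return n
```

Under this assumption `cumulative(4, 5)` is 31/32 = 0.96875, in line with the 97% given for N=5 in Table 1, and `cumulative(8, 10)` is 1013/1024, about 0.989, which rounds to the 99% given for N=10.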
Table 1 shows I, the maximum number of active voice channels in one direction of a link with various values
Table 1: I, the maximum number of active voice channels out of N in one direction of a link, with the probability of 99% and a=0,5. For N=5, the 97% value is shown.

N       I       Probability
5       4       97%
10      8       99%
30      21      99%
100     61      99%
1000    536     99%

3.3 Buffering
Buffering in the network increases jitter and therefore reduces interactivity. It is good practice to dimension VoIP links assuming no buffering in the network. This leads to some overprovisioning for slow links, but this overhead can be used by non real-time traffic in an IP network designed for both voice and data [1]. In an IP network with mixed voice and data the bandwidth requirements of VoIP are small compared to the bandwidth used for data in today's IP networks.

3.4 Link sizing
Table 2 shows the transport header overhead of IPv4, UDP and RTP.

Table 2: Transport header overhead

Protocol                                   Overhead (octets)
IPv4 (Internet Protocol version 4) [2]     20
UDP (User Datagram Protocol) [3]           8
RTP (Real-time Transport Protocol) [4]     12

Table 3 shows the header overhead with the following level 2 technologies: Frame Relay, PPP (Point-to-Point Protocol), POS (Packet over SONET/SDH), ATM/AAL5 with LLC/SNAP (Asynchronous Transfer Mode, ATM Adaptation Layer 5, Logical Link Control/Subnetwork Access Protocol).

Table 3: Total header overhead

Level 2 framing                   Frame Relay   PPP   POS   ATM, AAL5, LLC/SNAP (Note)
Level 2 header (octets)           2             8     16    8+8
IPv4+UDP+RTP headers (octets)     40            40    40    40
Header overhead (octets)          42            48    56    56

Note: With ATM, add AAL5 padding octets to get multiples of 48 and add 5 octets for every 48 octets to get the 53 octet cell size.

Table 4 shows the frame frequencies of three codecs. The frame frequency can be used to calculate the total bandwidth of a single active voice stream when the total header overhead is known.

Table 4: Frame frequencies of three codecs

Codec                    G.723.1        G.723.1        G.729
                         (5,3 kbit/s)   (6,3 kbit/s)   (8 kbit/s)
Payload size (octets)    20             24             10
Sample (ms)              30             30             10
Frame frequency (1/s)    33,125         32,8125        100

Table 5 shows the total level 2 frame size when one frame contains a single voice sample, taking the note to Table 3 into account. Including more than one voice sample in one packet would cause additional packetization delay and is therefore not recommended.

Table 5: Total level 2 frame size (octets)

Codec          G.723.1        G.723.1        G.729
               (5,3 kbit/s)   (6,3 kbit/s)   (8 kbit/s)
Frame Relay    62             66             52
PPP            68             72             58
POS            76             80             66
ATM            106            106            106

Table 6 shows the total bandwidth of a single active voice stream, obtained from the frame frequencies in Table 4 and the frame sizes calculated in Table 5. It shows that a G.729 coded voice stream run over an ATM link requires more bandwidth than the 64 kbit/s required by a G.711 codec run over a TDM link.

Table 6: Total required bandwidth of a single active voice stream (kbit/s)

Codec          G.723.1        G.723.1        G.729
               (5,3 kbit/s)   (6,3 kbit/s)   (8 kbit/s)
Frame Relay    16,430         17,325         41,600
PPP            18,020         18,900         46,400
POS            20,140         21,000         52,800
ATM            28,090         27,825         84,800

Table 7 shows the maximum number of simultaneously active voice streams in one direction with zero packet loss on a SDH/STM-1 link with POS and ATM/AAL5. The available bandwidth on a SDH/STM-1 link is 149,76 Mbit/s from the 155 Mbit/s link speed.
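The arithmetic behind Tables 4 to 6 can be sketched as follows. The header sizes and codec figures are taken from Tables 2 to 4; the function and dictionary names are illustrative, and the ATM branch follows the note to Table 3 (pad to a multiple of 48 octets, then 53 octets per cell).

```python
import math

IP_UDP_RTP = 20 + 8 + 12   # octets, Table 2

# Level 2 header in octets, Table 3
L2_HEADER = {"Frame Relay": 2, "PPP": 8, "POS": 16, "ATM": 8 + 8}

# codec -> (bit rate in bit/s, payload octets per frame), Table 4
CODECS = {
    "G.723.1 (5,3 kbit/s)": (5300, 20),
    "G.723.1 (6,3 kbit/s)": (6300, 24),
    "G.729 (8 kbit/s)": (8000, 10),
}


def frame_frequency(bit_rate: int, payload: int) -> float:
    """Frames per second needed to carry the codec bit rate."""
    return bit_rate / (payload * 8)


def frame_size(l2: str, payload: int) -> int:
    """Total level 2 frame size in octets for one voice sample."""
    size = payload + IP_UDP_RTP + L2_HEADER[l2]
    if l2 == "ATM":
        cells = math.ceil(size / 48)   # AAL5 padding to whole cells
        size = cells * 53              # 5-octet header per 48-octet cell payload
    return size


def stream_bandwidth(l2: str, codec: str) -> float:
    """Bandwidth of one active voice stream in bit/s."""
    bit_rate, payload = CODECS[codec]
    return frame_size(l2, payload) * 8 * frame_frequency(bit_rate, payload)
```

For example, G.729 over Frame Relay gives a 52-octet frame at 100 frames per second, i.e. 41 600 bit/s, which is the 41,600 kbit/s entry of Table 6.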
Table 7: Maximum number of simultaneously active voice streams in one direction with zero packet loss on a SDH/STM-1 link with POS and ATM/AAL5 (149,76 Mbit/s available from the 155 Mbit/s link speed)

Codec        G.723.1        G.723.1        G.729
             (5,3 kbit/s)   (6,3 kbit/s)   (8 kbit/s)
POS          7 436          7 131          2 836
ATM, AAL5    5 331          5 382          1 766

Table 8 shows the maximum number of simultaneously active voice streams in one direction with zero packet loss on a 64k TDM link with Frame Relay and PPP.

Table 8: Maximum number of simultaneously active voice streams in one direction with zero packet loss on a 64k TDM link with Frame Relay and PPP.

Codec          G.723.1        G.723.1        G.729
               (5,3 kbit/s)   (6,3 kbit/s)   (8 kbit/s)
Frame Relay    4              4              2
PPP            4              3              1

• Integrated Services (IntServ) is a stateful approach where resources are reserved in the network before data starts to flow along the reserved path. [8]
• Differentiated Services (DiffServ) is a stateless approach where real-time traffic is marked to get preferred treatment in the network. [9] [5]

4.1 Integrated Services (IntServ)
The IntServ model proposes two service classes in addition to best-effort service: guaranteed service and controlled-load service. Guaranteed service is for applications requiring a fixed delay bound. Controlled-load service is for applications requiring reliable and enhanced best-effort service. [12]

IntServ requires that resources are explicitly managed for each real-time application. Routers must reserve resources (e.g. bandwidth and buffer space) in order to provide specific QoS for each packet flow. This requires flow-specific states in the routers. [12]

The four components of IntServ are:
• Services - Characteristics of packet transmission active at the same time and with a large number of
in one direction over a path in a network are conversations the link can be sized as only a bit more
defined by a service. DiffServ can be provided by than a half of the conversations were active at the same
two approaches: time.
• Quantitative DiffServ - QoS is specified in Buffering in the network increases jitter and therefore
deterministically or statistically quantitative reduces interactivity. It is good practice to dimension
terms of throughput, delay, jitter and/or loss. VoIP links assuming no buffering in the network.
• Priority based DiffServ - Services are When the bandwidth required by VoIP is calculated for
specified in terms of a relative priority of a low packet loss and no buffering is assumed in the
access to network resources.

• Conditioning Functions and PHB - A user and a service provider must have a service level agreement (SLA) in place that specifies the supported service classes and the amount of traffic allowed in each class. Individual packets have DiffServ (DS) fields that indicate the desired service, and these DS fields can be marked at hosts, at the access router or at the edge router in the service provider network. Packets are classified, policed and possibly shaped at the ingress of the service provider network according to the rules derived from the SLA. Between domains (service provider networks), DS fields may be re-marked if so defined in the SLA between the two service providers. These traffic control functions at hosts, access routers or edge routers are generically called traffic conditioning. Per-hop behaviors (PHBs) are defined to allocate buffer and bandwidth resources at each node among traffic streams. A PHB is applied to a DiffServ behavior aggregate at a DiffServ-compliant node.
• DS CodePoint - The DS field is the type of service (TOS) field in IPv4 and the traffic class byte in IPv6. Six bits of this DS field are used as a codepoint (DSCP) to select the PHB for a packet at each node.
• A node mechanism for achieving PHB - Buffer management and packet scheduling mechanisms are used in nodes to achieve a certain PHB. PHBs are defined as behavior characteristics relevant to service provisioning policies instead of particular implementation mechanisms. Various implementation mechanisms may be suitable for a particular PHB group.

network, the network delay and jitter are minimized. The receiving side must correct the remaining network jitter and the desequencing of packets.

In an IP network with mixed voice and data traffic, some mechanism must be used to ensure that the bandwidth calculated for VoIP is not used by other real-time traffic or non-real-time traffic. There are two basic approaches to achieve this: Integrated Services (IntServ) and Differentiated Services (DiffServ). IntServ is a stateful approach where resources are reserved in the network before data starts to flow along the reserved path. DiffServ is a stateless approach where real-time traffic is marked to get preferred treatment in the network.

References

[1] Hersent, Olivier; Gurle, David; Petit, Jean-Pierre: IP Telephony, Packet-based multimedia communications systems; Great Britain, 2000, www.awl.com/cseng/, ISBN 0-201-61910-5

[2] Postel, Jon: Internet Protocol, RFC 791, September 1981

[3] Postel, Jon: User Datagram Protocol, RFC 768, 28 August 1980

[4] Schulzrinne, Henning; Casner, Stephen L.; Frederick, Ron; Jacobson, Van: RTP: A Transport Protocol for Real-Time Applications, RFC 1889, January 1996

[5] Blake, Steven; Black, David L.; Carlson, Mark A.; Davies, Elwyn; Wang, Zheng; Weiss, Walter: An Architecture for Differentiated Services, RFC 2475, December 1998
[7] Trends in the Internet Telephony, http://www.fokus.gmd.de/research/cc/glone/projects/ipt/ (11 March 2001)
TRIP, ENUM and Number Portability
Nicklas Beijar
Networking Laboratory, Helsinki University of Technology
P.O. Box 3000, FIN-02015 HUT, Finland
Nicklas.Beijar@hut.fi
domain name is mapped into an IP address, which is used for routing. In the telephone network, E.164 numbers have traditionally been used as both names and addresses. However, due to number portability their roles have been separated. The number that the user dials, which can be regarded as a name, is then mapped into a routing number, which is an address. The dialed number is usually referred to as a directory number. It is also worth noting that in many cases entities that functionally are names are called addresses.

To transform the name into an address, some type of mapping method is needed. For the mapping of host names into IP addresses, the Domain Name System (DNS) [15], [16] is used. DNS is a distributed directory service based on DNS servers. Each server knows the mapping of a range of hosts, or the address of a server that has more detailed information. The parts of the domain names are analyzed in hierarchical order and the mapping request is forwarded to more specific DNS servers until the mapping can be completed.

address or host name. The signaling protocols allow various formats of addresses to be used by the users. Users prefer to use E.164 or e-mail type addresses that are familiar from traditional telephony or e-mail, respectively. To set up a call, the name of the destination is mapped to an IP address by a signaling server. The signaling server can be manually configured with the mappings for its local terminals. More usually, however, the terminals must register with the signaling server. The mapping is created based on the registration. The server maintains a database of mappings for its registered clients.

The SIP architecture also includes a network element named location server. The location servers store the mappings on behalf of the signaling servers. A location server may be used by a number of signaling servers. The location server may also be integrated with a signaling server. In this way we can generalize and say that the location server stores the mapping, even though the location server and signaling server in some cases are the same element. In the case of separate servers, the information is accessed with some directory access protocol.
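
The registration-based mapping described above can be illustrated with a minimal sketch. The class name `LocationServer`, its method names, and the example name and address are assumptions made for illustration only; they are not defined by SIP or H.323, and a real server would also handle expiry and authentication.

```python
from typing import Dict, Optional


class LocationServer:
    """Toy name-to-address store that signaling servers could share."""

    def __init__(self) -> None:
        # Maps a user-visible name (E.164 number or e-mail style name)
        # to the IP address of the registered terminal.
        self._mappings: Dict[str, str] = {}

    def register(self, name: str, ip_address: str) -> None:
        """Create or refresh the mapping when a terminal registers."""
        self._mappings[name] = ip_address

    def unregister(self, name: str) -> None:
        """Remove the mapping; unknown names are ignored."""
        self._mappings.pop(name, None)

    def lookup(self, name: str) -> Optional[str]:
        """Return the registered IP address, or None if the name is unknown."""
        return self._mappings.get(name)


server = LocationServer()
server.register("user@example.com", "192.0.2.10")  # hypothetical terminal
print(server.lookup("user@example.com"))  # 192.0.2.10
print(server.lookup("+35894515303"))      # None, since nothing is registered
```

Keeping the store behind a small lookup interface mirrors the point made in the text: the signaling server does not need to care whether the mappings live in the same element or in a separate location server.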
into the signaling server. A signaling server has a set of available gateways to use for external calls. For private internal IP telephony networks, external numbers are usually recognized by a preceding “0”.

In SIP the call can be set up using a gateway specified in the URL. The destination is then given as “number@gateway”. This requires the user to know that the destination is on the PSTN and also which gateway should be used. If the gateway is down or if all lines are busy, the user must manually select another gateway. Another method is to let the signaling server choose the gateway, so that calls can be made by giving only a number. The server selects one from its list of available gateways. The H.323 protocol works in much the same way.

2.3 Number portability

Number portability allows a user to change service provider, location or service type without changing the telephone number. Service provider portability is mandatory in many countries. The introduction of IP telephony adds a new type of number portability: between different network types. [8]

Today number portability is only implemented on the PSTN. The implementation of number portability differs between countries. Common to all implementations is that the directory number dialed by the customer is mapped to either a routing number or a routing prefix. A routing number is a hierarchical routing address, which can be digit-analyzed to reach the correct country, network provider, end-office switch and subscriber line. A routing prefix forms a routing number by adding some digits in front of the directory number. The routing number replaces the hierarchy that is lost, since the directory number space becomes flat due to number portability.

both calls to IP terminals and calls to PSTN destinations.

3 Problem description

3.1 Naming

Traditionally E.164 numbers have been used on the telephone network and e-mail type addresses of the format “user@domain” on the Internet. The signaling protocols SIP and H.323 allow using multiple types of names, including both of the above methods as well as IP addresses. For Internet users, who have a keyboard available, textual names are preferred since they are easy to remember and deduce.

However, a problem arises when the networks are interconnected. Callers on the PSTN have no keyboard, and a scheme for entering characters using a numeric keypad would be too complicated. This limits the PSTN users to entering numeric names. Consequently, an IP terminal must have a telephone number to be accessible from the PSTN. The problem was recognized by TIPHON, which chose to equip IP terminals with an E.164 number. For calls within the IP network, other types of addressing can be used. Unfortunately, this would require the user to know on what type of network the destination is. When IP telephony is widely deployed, customers do not necessarily even know the underlying technology of their own connection.

As we saw in section 2.1, E.164 numbers can currently only be used between hosts registered to the same signaling server. Using some proprietary protocol, mappings can be distributed between smaller groups of servers, but there is no protocol for global distribution.
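
To make the routing-prefix scheme of section 2.3 concrete, the following sketch forms a routing number by putting a prefix in front of the directory number. The table contents, the prefix values and the function name `routing_number` are invented for illustration. The fallback for numbers that are not in the table (routing on the directory number itself) is also an assumption, since real implementations differ between countries.

```python
from typing import Dict

# Hypothetical portability database: directory number -> routing prefix.
# Both the numbers and the prefixes are made up for this example.
PORTED_NUMBERS: Dict[str, str] = {
    "0945112345": "1D001",
    "0945167890": "1D002",
}


def routing_number(directory_number: str, ported: Dict[str, str]) -> str:
    """Return the number used for routing a call to directory_number.

    A ported number gets its routing prefix added in front of the
    directory number. A number without an entry is assumed (for this
    sketch only) to be routed on the directory number as dialed.
    """
    prefix = ported.get(directory_number, "")
    return prefix + directory_number


print(routing_number("0945112345", PORTED_NUMBERS))  # 1D0010945112345
print(routing_number("0912345678", PORTED_NUMBERS))  # 0912345678
```

The prefix restores a hierarchy that switches can digit-analyze, while the dialed directory number stays unchanged for the user.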
2. Given a phone number corresponding to a terminal on the PSTN, determine the IP address of a gateway capable of completing calls to that phone. The choice is influenced by a number of factors, such as policies, location, availability and features. This is called the gateway location problem.
3. Given a phone number corresponding to a user of a terminal on the PSTN, determine the IP address of an IP terminal owned by the same user. This type of mapping may be used if the PC serves as an interface for the phone, for example for delivering a message to the PC when the phone rings.

For calls from the PSTN to the IP network, the selection of gateway is performed using normal routing in the switched circuit network, which is static. In the longer term, it would also be necessary to dynamically select a gateway for these calls. This gives us a fourth subproblem.

3.3 The address mapping problem

To establish a call to a terminal on an IP network, the destination IP address must be known. Alternatively the terminal can be identified by a host name, which is translated to an IP address by DNS. As terminals are equipped with an E.164 number, a new mapping is required: from an E.164 name to an IP address. The address mapping problem usually refers to the task of locating terminals on the IP network.

When the switched circuit network and IP telephony networks are interconnected, new call scenarios arise. Since the originating network and destination network can be of two types, there are four basic call scenarios: PSTN-PSTN, PSTN-IP, IP-PSTN and IP-IP. When calls are set up, the first task is to determine the type of the destination network. A mapping from E.164 name to network type is required.

The required mappings could be solved with some type of directory. At a minimum, the mapping from E.164 number to network type and IP address must be supported. The directory must be scalable to store large amounts of mappings, possibly for all telephones in the world. It must be capable of replying to a high rate of lookups, one for each call that is set up. In practice, the directory must therefore be distributed. The directory must also propagate updates rather quickly when the information changes.

Additionally the mapping is expected to be used with several different services. In addition to voice calls, the IP network allows for video conferencing and e-mail among others. Some method of locating the available contact modes and services is desired.

3.4 Routing and number portability

For economic or quality related reasons a transit network of a different type can be used, giving two more call scenarios: PSTN-IP-PSTN and IP-PSTN-IP. Even when only two network types are used, the transit network must be selected. It is usually more cost-effective to hand over calls to IP destinations to the IP network near the origination point. On the other hand, the voice quality is better if the call uses the PSTN for most of the path. Possibly the caller could choose whether to route the calls via IP or PSTN using carrier selection mechanisms. Typically this would imply the use of a prefix to select the carrier. [23]

The call can thus propagate through several network types. Each time the call goes from one network type to another, it has to pass a gateway where the media stream is converted. The conversions cause delay and jitter, which decrease the quality. Therefore, unnecessary media conversions should be avoided. It would be good to know the type of the destination network already in the originating network.

With number portability, numbers may move from one provider’s network to another, and even between network types. If a number belonging to a number block of a PSTN operator moves to an IP network, calls from IP subscribers may unnecessarily be routed through the PSTN.

3.5 The gateway location problem

As the usage of IP telephony grows and the number of gateways increases, the management of gateways and routes between the IP and PSTN networks becomes increasingly complex. In a situation where the IP network approaches the size of the PSTN, a large part of the calls will pass through one or even several gateways on their path. For calls from the IP network to the PSTN, the caller must locate a gateway that is able to complete calls to the desired destination. There may be several available gateways, and selecting the most suitable one is a nontrivial process.

Currently the gateway must be selected by the user or by the signaling servers. The selection and configuration of gateways involves manual work. The list of available gateways must be configured into the signaling servers and updated when new gateways become available. Additionally, gateways may become blocked when all lines are in use. The signaling server does not know which gateways are accessible.

Connectivity to the PSTN means that every gateway is able to connect to nearly any terminal on the PSTN. The number of available gateways can thus be very large. The selection of which gateway to use is influenced by a number of factors. Firstly, the location of the gateway is important. For example, there is no
reason to use a gateway in a country far away to connect parties in the same city. To minimize usage of resources it is important that the gateway is near the path between the parties.

Secondly, business relationships are important. The gateway service involves costs when calls are completed to PSTN destinations. Gateway providers, in most cases, want to charge for using their gateways. Because of this, the usage of gateways may be restricted to the groups of users that have some type of established relationship with the gateway provider. The end user will probably not pay for the gateway service directly. Instead, the end user may have a relationship with an IP telephony service provider (ITSP). The ITSP may have its own gateways or use the gateways of a separate gateway provider. All these policies and relationships influence the selection of gateway.

Additionally, the end user may have requirements on the gateway. The end user may prefer a certain provider or require a specific feature. The caller may use a specific signaling protocol or media codec that is supported by only some gateways.

Keeping in mind that the gateway capacity is also limited, it is obvious that an automatic method for gateway selection is required. Since the selection is largely driven by policies, a global directory of gateways is not suitable. Instead, a protocol for exchanging gateway information between the providers would be a better solution.

4 Telephony Routing over IP

To solve the gateway selection problem, the Internet Engineering Task Force (IETF) working group IP Telephony (IPTEL) began working on a protocol for distributing gateway information between gateway providers and IP telephony providers. The protocol was first called Gateway Location Protocol (GLP), but after finding the problem larger than merely locating gateways, the protocol was renamed Telephony Routing over IP (TRIP). The most important documents of the work are the TRIP framework [6] and the draft protocol specification [9].

The working group found that a global directory for gateway information is not feasible. The selection of gateway is in large part driven by the policies of the parties along the path of the call. Gateway information is exchanged between the providers and, depending on policies, made available locally and propagated to other providers. The providers create their own databases of reachable phone numbers and the routes towards them. These databases can be different for each provider.

TRIP is modeled after the Border Gateway Protocol 4 (BGP-4) [10]. Like BGP-4, TRIP is an inter-domain routing protocol driven by policies. The nodes of TRIP are the location servers (LS), which exchange information with other location servers. The information includes reachability information about telephony destinations, the routes towards these destinations and properties of the gateways connecting the PSTN and the IP network.

TRIP uses the concept of Internet Telephony Administrative Domains (ITAD) in a similar way as BGP-4 uses autonomous systems. The location servers that are administered by a single provider form an ITAD. The ITAD may contain zero or more gateways. The border of the ITAD does not have to correspond to the border of an autonomous system. The main function of TRIP is to distribute information between ITADs, but TRIP also contains functions for intra-domain synchronization of routing information. It is not required that all ITADs in the world are connected. Groups of ITADs can be formed that exchange information with TRIP.

TRIP connects location servers with administratively created peer relationships. The location server forwards the information received from one peer to the other peers. In this way the location servers in one ITAD learn about gateways in the other ITADs. The location server selects the routes to use in its own domain, and the routes to forward to neighboring domains, according to its local policies. The information can be modified according to the policies before it is forwarded. In this way, the provider can control the type of calls passing through the domain.

The location servers collect information and use it to reply to queries about routes to destinations. The query protocol is not defined by TRIP. Any directory access protocol can be used, for example LDAP [11].

4.1 Operation of TRIP

The TRIP protocol, the structure and operation of a node, and the implementation details are specified in the TRIP specification draft [9].

TRIP location servers process three types of routes:

1. External routes received from external peers.
2. Internal routes received from another location server in the same ITAD.
3. Local routes which are locally configured or received from another routing protocol.

The routes are stored in the Telephony Routing Information Base (TRIB), whose structure is depicted in Figure 1. The TRIB consists of four distinct parts.

1. The Adj-TRIBs-In store routing information that has been learned from other peers. These routes are
the unprocessed routes that are given as input to the decision process. Routes learned from internal location servers and from external location servers are stored in separate Adj-TRIBs-In.
2. The Ext-TRIB stores the preferred route to each destination, as selected by the route selection algorithm.
3. The Loc-TRIB contains the routes selected by applying the local policies to the routes in the internal peers’ Adj-TRIBs-In and Ext-TRIB.
4. The Adj-TRIBs-Out store the routes selected for advertisement to external peers.

Figure 1: Structure of a TRIP node (diagram elements: Adj-TRIBs-In (internal LSs), Adj-TRIBs-In (external peers), Local Routes, Decision Process, Ext-TRIB, Local TRIB, Adj-TRIBs-Out)

TRIP uses the same state machine and the same messages as BGP-4. The messages are the OPEN message for establishing a peer connection and exchanging capability information, the UPDATE message for exchanging route information, the NOTIFICATION message for informing about error conditions, and finally the KEEPALIVE message for ensuring that the peer node is running.

The routing information is transmitted in attributes of the TRIP messages. The specification includes a set of mandatory well-known attributes. In addition to the well-known and mandatory attributes, optional attributes can be added to allow for expansion.

Gateways have many properties that may need to be advertised, so the expected large number of expansion attributes must be handled correctly. An attribute flag indicates how a location server handles an attribute that it does not recognize. The flag can take a combination of the values optional, transitive, dependent, partial and link-state encapsulated.

The specification [9] defines the basic set of attributes shown in Table 1. Additional attributes are defined in separate drafts. An authentication attribute is defined in [12] and a service code attribute is defined in [13].

Table 1: The basic set of TRIP attributes

Name                 Description
Withdrawn routes     List of telephone numbers that are no longer available.
Reachable routes     List of reachable telephone numbers.
Next hop server      The next signaling server on the path towards the destination.
Advertisement path   The path that the route advertisement has traveled.
Routed path          The path that the signaling messages will travel.
Atomic aggregate     Indicates that the signaling may traverse ITADs not listed in the routed path attribute.
Local preference     The intra-domain preference of the location server.
Multi exit disc      The inter-domain preference of the route if several links are used.
Communities          For grouping destinations in groups with similar properties.
ITAD topology        For advertising the ITAD topology to other servers in the same ITAD.
Authentication       Authentication of selected attributes.

The advertisements represent routes toward a gateway through a number of signaling servers. A route must at least contain the following attributes: withdrawn routes, reachable routes, next hop server, advertisement path and routed path. For an advertised route, the withdrawn routes attribute is empty. The reachable routes attribute contains the list of telephone number ranges belonging to this route, and the corresponding application protocol. The next hop server is the next server that signaling messages are sent to. For the final hop, it contains the address of the gateway. The advertisement path is the path that this advertisement has traveled through and the routed path is the path for the signaling. These paths are lists of ITADs. They are mainly used by the policy to select routes containing, or not containing, specific ITADs.

4.2 TRIP for gateways

The TRIP framework [9] leaves open the question of how the location servers learn about the gateways. Usually the register message of SIP has been suggested. However, the draft [14] points out the weaknesses of using the register message and suggests that a subset of TRIP could be used to export routing information from gateways and soft switches to location servers. TRIP manages the needed information transfer and keep-alives more efficiently than other protocols and can better describe the gateway properties. Two new attributes are proposed: circuit capacity for informing about the number of free PSTN circuits, and DSP capacity for informing about the amount of available DSP resources. Because of their dynamic nature, these
are only transmitted to the location server that manages the gateway, and are not propagated.

A more lightweight version of TRIP can be used in the gateways. Since the gateway does not need to learn about other gateways, it operates in send-only mode. Nor does it need to create any call routing databases. This stripped-down version, called TRIP-GW, is still interoperable with normal TRIP nodes. Nevertheless, due to scalability problems it is recommended that location servers peering with gateways run a separate TRIP instance for TRIP-GW peers.

5 Telephone Number Mapping

While TRIP is carrying routes to destinations on the PSTN, a method for locating terminals on the IP network is still required. This problem is simpler than the gateway location problem, since the amount of information describing a terminal is less than the information about a gateway. TRIP could also be used for this purpose, but its complexity is not needed. A simpler directory can be used. It has been suggested that the Domain Name System (DNS) [15], [16] could be used. An IETF working group called ENUM (tElephone NUmber Mapping) was established to specify the number mapping procedures.

DNS is used to map domain names into IP addresses. By constructing a domain name from the E.164 number, the DNS system can be used to map telephone numbers into IP addresses. More generally, the result of an ENUM lookup is a Uniform Resource Identifier (URI) [18], which contains the signaling protocol and the host name. An additional DNS lookup is thus required to map the host name to an IP address. The procedure is described in RFC 2916 [17], the main document specifying the ENUM service.

ENUM uses the domain “e164.arpa” to store the mapping. Numbers are converted to domain names using the scheme defined in [17]. The E.164 number must be in its full form, including the country code. All characters and symbols are removed so that only the digits remain. Dots are put between the digits. The order of the digits is reversed and the string “.e164.arpa” is added to the end. This procedure will map, for example, the number +358-9-4515303 into the host name “3.0.3.5.1.5.4.9.8.5.3.e164.arpa”.

DNS stores information in different types of records. The Naming Authority Pointer (NAPTR) record [19] is used for identifying available ways to contact a node with a given name. It can also be used to identify what services exist for a specific domain name. The fields of the NAPTR record are shown in Table 2. ENUM defines a new service named “E.164 to URI”, which maps one E.164 number to a list of URIs. The mnemonic of the service is “E2U”. ENUM can be used in conjunction with several application protocols, and can, for example, map a telephone number to an e-mail address.

Table 2: Fields of the NAPTR record

Name          Description
Order         The order in which records are processed if a response includes several records.
Preference    The order in which records are processed if the records have the same order value.
Service       The resolution protocol and resolution service that will be available if the rewrite of the regexp or replacement field is applied.
Flags         Modifiers for how the next DNS lookup is performed.
Regexp        Used for the rewrite rules.
Replacement   Used for the rewrite rules.

Figure 2 shows some example NAPTR records with the E2U service. These records describe a telephone number that is preferably contacted by SIP and secondly by either SMTP or using the “tel” URI scheme [20]. The result of the rewrite of the NAPTR record is a URL, as indicated by the “u” flag. SIP and SMTP use their own resolution methods. In the case of SIP, the result is a SIP URI, which is resolved as described in [1]. In the case of the “tel” scheme, the procedure is restarted with a new E.164 number.

  $ORIGIN 3.0.3.5.1.5.4.9.8.5.3.e164.arpa.
    IN NAPTR 10 10 "u" "sip+E2U" "!^.*$!sip:nbeijar@tct.hut.fi!" .
    IN NAPTR 100 10 "u" "mailto+E2U" "!^.*$!mailto:nbeijar@tct.hut.fi!" .
    IN NAPTR 100 10 "u" "tel+E2U" "!^.*$!tel:+35894515303!" .

Figure 2: Example NAPTR records

The draft [21] describes a telephone number directory service based on ENUM. The model is divided into four levels.

The first level is a mapping of the telephone number delegation tree into authorities, to which the number has been delegated. The hierarchical structure of DNS is used, and the mapping may involve one or several DNS queries, which are transparent from the user’s point of view. The delegation maps the hierarchy of the
E.164 number to the DNS hierarchy, using the country codes, area codes and other parts of the number. The first level mapping uses name server (NS) resource records in DNS.

The second level is the delegation from the authority, to which the number has been delegated, to the service registrar. The registrar maintains the set of service records for a given telephone number. Since there may be several service providers for a given number, the registrar has the role of managing service registrations and arbitrating conflicts between service providers. The

6.1 Call setup using ENUM

To illustrate the use of ENUM, we will study a call setup situation, where the DNS records of Figure 3 are used. The figure shows the DNS configuration for the top level delegations, the national delegations, a service provider and a service registrar.

  Sample top level delegations from ITU:
  3.3.e164.arpa   IN NS ns.FR.phone.net. ;France
  8.5.3.e164.arpa IN NS ns.FI.phone.net. ;Finland
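
The ENUM procedure of section 5 (building the “e164.arpa” name and applying NAPTR rewrite rules) can be sketched as follows. This is an illustration under stated assumptions, not an implementation of RFC 2916: the names `e164_to_enum_domain`, `Naptr`, `rewrite` and `resolve` are hypothetical, no DNS query is performed, the records are simply the ones from Figure 2 typed in by hand, and the regexp field is split naively on its delimiter, so escaped delimiters and flags are not handled. Records are processed in ascending order and then ascending preference.

```python
import re
from typing import List, NamedTuple, Tuple


def e164_to_enum_domain(number: str) -> str:
    """Build the e164.arpa domain name for a full E.164 number."""
    digits = [ch for ch in number if ch.isdigit()]  # drop "+", "-", spaces
    return ".".join(reversed(digits)) + ".e164.arpa"


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str  # <delim>pattern<delim>replacement<delim>


def rewrite(record: Naptr, number: str) -> str:
    """Apply the record's rewrite rule to the number and return the URI."""
    delimiter = record.regexp[0]
    _, pattern, replacement, _ = record.regexp.split(delimiter)
    return re.sub(pattern, replacement, number, count=1)


def resolve(records: List[Naptr], number: str) -> List[Tuple[str, str]]:
    """Return (service, URI) pairs: lowest order first, then lowest preference."""
    ranked = sorted(records, key=lambda r: (r.order, r.preference))
    return [(r.service, rewrite(r, number)) for r in ranked]


# The three records of Figure 2, entered manually for the example.
records = [
    Naptr(100, 10, "u", "mailto+E2U", "!^.*$!mailto:nbeijar@tct.hut.fi!"),
    Naptr(10, 10, "u", "sip+E2U", "!^.*$!sip:nbeijar@tct.hut.fi!"),
    Naptr(100, 10, "u", "tel+E2U", "!^.*$!tel:+35894515303!"),
]

print(e164_to_enum_domain("+358-9-4515303"))  # 3.0.3.5.1.5.4.9.8.5.3.e164.arpa
for service, uri in resolve(records, "+35894515303"):
    print(service, uri)  # the sip+E2U record comes first
```

Sorting on the (order, preference) pair reproduces the preference described for Figure 2: SIP is tried first, and the mailto and tel records share the second rank.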
similar to that on the IP network, but there are currently
no corresponding solutions like TRIP. The draft [22]
leaves the question open.
[Figure: diagram with the elements POTS Phone, PSTN, Gateway, SIP Server, SIP Client, IP-based Network, LS and DNS; legend: Voice path, Signaling path]
7.2 Interworking and number portability

ENUM also solves number portability for hybrid PSTN-IP networks. The draft [22] separates three scenarios:

1. The number moves within the PSTN.
2. The number moves between PSTN and IP.
3. The number moves within the IP network.

For each scenario, the call setup procedure from both PSTN and the IP network is described.

Networking Laboratory at Helsinki University of Technology. The suggested protocol, named Circuit Telephony Routing Information Protocol (CTRIP), automates the distribution of routing information between operators and network elements. Information is exchanged with other protocols in Numbering Gateways. [24]
References

[1] Handley, M., Schulzrinne, H., Schooler, E., Rosenberg, J.: SIP: Session Initiation Protocol, March 1999, IETF RFC 2543

[2] International Telecommunications Union Telecommunication Standardization Sector, Study group 16: Packet-based multimedia communications systems, February 1998, ITU-T Recommendation H.323

[3] International Telecommunications Union Telecommunication Standardization Sector: The international public telecommunication numbering plan, Geneva, May 1997, ITU-T Recommendation E.164

[4] European Telecommunications Standards Institute: The Procedure for Determining IP Addresses for Routeing Packets on Interconnected IP Networks that support Public Telephony, DTR 4006, 2000

[5] Postel, Jonathan: Simple Mail Transfer Protocol, August 1982, IETF RFC 821

[6] Rosenberg, J., Schulzrinne, H.: A Framework for Telephony Routing over IP, June 2000, IETF RFC 2871

[7] Mensola, Sami: IP-verkon kommunikaatiopalveluiden hallinta, November 1998, Master’s Thesis

[8] Foster, Mark, McGarry, Tom, Yu, James: Number Portability in the GSTN: An Overview, March 2000, draft-foster-e164-gstn-np-00.txt

[15] Mockapetris, P.: Domain names - concepts and facilities, November 1987, IETF RFC 1034

[16] Mockapetris, P.: Domain names - implementation and specification, November 1987, IETF RFC 1035

[17] Faltstrom, P.: E.164 number and DNS, September 2000, IETF RFC 2916

[18] Berners-Lee, T., Fielding, R.T., Masinter, L.: Uniform Resource Identifiers (URI): Generic Syntax, August 1998, IETF RFC 2396

[19] Mealling, M., Daniel, R.: The Naming Authority Pointer (NAPTR) DNS Resource Record, September 2000, IETF RFC 2915

[20] Vaha-Sipila, A.: URLs for Telephone Calls, April 2000, IETF RFC 2806

[21] Brown, A.: ENUM Service Provisioning: Principles of Operation, October 2000, draft-ietf-enum-operation-01.txt

[22] Lind, S.: ENUM Call Flows for VoIP Interworking, November 2000, draft-line-enum-callflows-01.txt

[23] Rosbotham, Paul: WG4 FAQ, TIPHON temporary document (discussion)

[24] Raimo Kantola, Jose Costa Requena, Nicklas Beijar: Interoperable routing for IN and IP telephony, Computer Networks, Volume 35, Issue 5, April 2001