A Key Agreement Protocol For P2P VoIP Applications
Authorized licensed use limited to: Istanbul Universitesi-Cerrahpasa. Downloaded on June 02,2024 at 21:34:03 UTC from IEEE Xplore. Restrictions apply.
SIP are described; section 4 briefly describes the implementation of the key agreement protocol within a real SIP UA; finally, section 5 draws some conclusions.

2. CURRENT SECURITY ASSOCIATION AND KEY AGREEMENT PROTOCOLS

An important aspect in the setup and maintenance of an SA between two (or more) peers is the key-agreement method. The key-agreement procedure is a mechanism whereby two or more parties can agree on a key (hereafter called master key) in such a way that each of them influences the outcome. If properly done, this prevents undesired third parties from forcing a key choice on the agreeing parties. Protocols that are useful in practice also do not reveal to any eavesdropping party which key has been agreed upon.

The main approaches to key agreement are:
1) pre-shared key derived – the master key is derived from a pre-shared key; usually the derived key is also influenced by random values generated and exchanged by both parties, in order to avoid reusing previously generated keys. The main problem of this approach is that it requires a prior relationship (the pre-shared key) between the two parties;
2) secret key or public key encryption – the master key (or a pre-master key) is generated by one peer and transmitted to the other party encrypted with a shared secret key or with the recipient's public key. The main problem is that both parties still have to agree in advance on a pre-shared secret key or on the receiver's public key. In the latter case the initiator needs to possess the responder's public key, the responder's digital certificate, or the certificate of the CA that signed the responder's certificate;
3) Diffie-Hellman – the master key is generated through the Diffie-Hellman (DH) exponential key exchange, in which two parties jointly exponentiate a generator with random numbers, in such a way that an eavesdropper cannot feasibly compute the key. The main disadvantage of this approach is that, by itself, it does not prevent a third party from tricking both parties into agreeing on two different keys, each shared with the attacker. This is a man-in-the-middle (MITM) attack, and it can be prevented by authenticating the DH exchange. In turn, this usually requires a pre-shared secret or a public/private key pair in order to compute and verify a MAC (message authentication code) or a digital signature.

The first two methods also have the disadvantage that, if the private key or the shared secret used to derive or decrypt the master key is compromised in the future, the master key will be compromised too.

On the contrary, the third method does not suffer from this vulnerability, since the generated key is completely fresh and is not directly derived from a previous key. The property that a key will not be compromised if one of the long-term secrets is compromised in the future is known as Perfect Forward Secrecy (PFS). Diffie-Hellman key exchange does provide PFS, and for this reason it has been chosen for our key agreement protocol, as defined in section 3.

There are also other methods and variants for key agreement in the literature. An example is the family of password-authenticated key exchange (PAKE) methods, which combine a key agreement procedure, still based on asymmetric cryptography, with knowledge of a shared password. However, in the rest of the paper we will not consider PAKE algorithms, since their application scenario is exactly the same as that of a password-authenticated DH exchange.

Currently there are several protocols that provide security for end-to-end communications and that include proper mechanisms for SA establishment and key agreement. In the following sub-sections the main security protocols are briefly described, focusing specifically on their key agreement methods.

2.1 TLS

Transport Layer Security (TLS) [5] is a standard transport-level protocol that provides security and data integrity for connection-oriented, TCP-based end-to-end communications. A TLS communication involves three phases:
- security algorithms negotiation,
- key exchange and authentication,
- secure data communication with symmetric cipher encryption and message authentication.

During the first phase, the client and server negotiate a cipher suite, which includes the key exchange and authentication algorithms. These are typically public key algorithms, although a pre-shared key (PSK) can also be used. For the key exchange, the following methods are supported: RSA, DH, ECDH (Elliptic Curve DH), SRP (Secure Remote Password protocol, a PAKE), and PSK. For peer and key authentication: RSA, DSA, or ECDSA (Elliptic Curve DSA).

2.2 IKE

The IPSec (IP Security) architecture [6] specifies two things. The first is a security protocol for securing IP packet-based communications, actually two different protocols: AH (Authentication Header) and ESP (Encapsulating Security Payload). The second is a protocol for setting up an IPSec SA, named IKE (Internet Key Exchange) [7]. IKE uses a DH key exchange to set up a shared session secret, from which cryptographic keys are derived. Public key techniques or, alternatively, a pre-shared key are used to protect (authenticate) the DH exchange and to mutually authenticate the communicating parties.

2.3 MIKEY

Multimedia Internet KEYing (MIKEY) [2] is a key management protocol intended for use with real-time applications. It can specifically be used to set up encryption
keys for multimedia sessions that can be secured using SRTP [8] or other session-level security protocols. It uses binary messages that can be encapsulated in any session description format or session initiation protocol.

MIKEY supports different methods for setting up a master key (called TGK, TEK Generation Key, where TEK is the Traffic-Encrypting Key):
- Pre-Shared Key: symmetric encryption is used to exchange the master key; it requires that an individual key has previously been shared with every other peer;
- Public-Key: the master key is exchanged using public key encryption; it requires knowledge of the responder's public key or certificate, or the use of a PKI;
- Diffie-Hellman: a DH key exchange is used to set up the master key; it requires more computational resources than the previous methods, but has the advantage of providing perfect forward secrecy; if it is not authenticated, it is vulnerable to MITM attacks; otherwise it requires certificates and RSA signatures, or a shared secret and HMAC authentication.

2.4 ZRTP

ZRTP [1] is a cryptographic key-agreement protocol specifically designed to negotiate an encryption key for securing VoIP or multimedia sessions using SRTP. ZRTP performs a DH key exchange during call setup, in-band in the Real-time Transport Protocol (RTP) media stream, which has been established using some other signaling protocol such as the Session Initiation Protocol (SIP).

One of ZRTP's characteristics is that it does not rely on SIP or other signaling protocols for key management, since it performs key management over the RTP media streams.

ZRTP does not require prior shared secrets, nor does it rely on a PKI or on CAs. Instead, the ephemeral DH keys are authenticated by means of a Short Authentication String (SAS), which is essentially a cryptographic hash of the two DH values. More precisely, the two end users verbally compare the SAS values displayed at both ends. If the values match, then with high probability the DH exchange succeeded and no MITM attack has been performed; otherwise, a MITM wiretapper is present.

3. NEW MULTIMEDIA KEY AGREEMENT PROTOCOL

In this section a new key agreement protocol for securing multimedia sessions established through the standard Session Initiation Protocol (SIP) is described. The objective of the proposed protocol is to securely establish a master key between two multimedia SIP UAs that may or may not have already communicated with each other.

The proposed protocol has been designed in such a way that it does not rely on any PKI, user certificates or public keys, or pre-shared secret keys. When a new multimedia session is initiated between two UAs, a new master key is created through DH. In order to protect this key against MITM attacks, the DH exchange is authenticated by having the users read aloud a Short Authentication String (SAS), generated by hashing the master key, as in ZRTP. In fact, SAS authentication is used only the first time the two UAs initiate a session. Subsequent session invitations (more precisely, the corresponding master keys) are authenticated with the previously established master keys.

The proposed approach is similar to the one used by ZRTP; however, the two methods differ in several respects, in particular: i) in the information exchanged during the key setup, and ii) in the way such information is encapsulated and exchanged. ZRTP establishes a new master key directly at the media level, using RTP as the transport for the key negotiation. The proposed solution instead uses MIKEY as the negotiation protocol, suitably encapsulated within the SIP messages used for session setup.

In order to support a fully authenticated DH exchange, the MIKEY protocol has been extended with a 3-way handshake (MIKEY originally supported DH in a 2-way request/response transaction). The new offer/answer/confirm handshake between the initiator (the caller) and the responder (the callee) is depicted in figure 1. The standard MIKEY message format was not modified, whereas the available payloads have been extended in order to support new fields and value types.

  SIP INVITE [MIKEY OFFER]
  SIP 200 OK [MIKEY ANSWER]
  SIP ACK [MIKEY CONFIRM]
  SRTP flows
  SIP BYE
  SIP 200 OK
Figure 1 – SIP/MIKEY session setup.

The initiator starts by sending a MIKEY offer message that includes the following items. The first is a MIKEY header (HDR). The second is the identities of both the initiator (IDi) and the responder (IDr). The third is a random value (RANDi). The fourth is the list of offered encryption and hash algorithms (SAi). The fifth is the DH part of the initiator (DHi). The last is a list RS = (rs1, rs2, rs3, rs4, rs5) of HMACs of the last five retained secrets, i.e., the last five previously established master keys, calculated as follows:

rsj = HMAC(MKj, “Retained Secret”), for j = 1, ..., 5

If no session has previously been set up between the two UAs, the list is empty and SAS verification is required in order to authenticate the DH exchange.

When the responder receives the MIKEY offer message, it checks whether the retained secrets match the list stored on its own side: if they do, the list is also used to
authenticate the DH exchange; otherwise, SAS authentication is used.

The responder calculates the new master key MK0 by hashing the result of the DH computation (DHres) concatenated with other values, as follows:

MK0 = hash(DHres || IDi || IDr || RANDi || RANDr || MK1 || ... || MK5)

Here RANDr is the responder's random value and MK1, ..., MK5 are the previously established master keys. The responder then replies with a MIKEY answer message similar to the offer, in which the RS list is replaced by an HMAC_r field. HMAC_r is calculated over the MIKEY answer message without the HMAC_r field:

HMAC_r = HMAC(MK0, MIKEY answer)

If the SAS is required, it is calculated as the 32 most significant bits of:

sas_hash = HMAC(MK0, “SAS”)

When the initiator receives the MIKEY answer message, it checks the correctness of HMAC_r. If the check succeeds, it sends a MIKEY confirm message including an HMAC_i field, which authenticates the original offer and confirms the correctness of the new master key:

HMAC_i = HMAC(MK0, MIKEY offer)

The main motivations for integrating key management with SIP session setup were:
- the possibility of negotiating, at the beginning of a session, the security credentials for any desired media flow, in the same way as is done for other media parameters (media types, codecs, transport ports, etc.);
- the possibility of reusing the same established master key for any subsequent media flow (no re-key negotiation is needed).

The MIKEY messages are exchanged within SIP messages through a newly defined header field named “Security-Association”, similar to the one defined in [9]. This new header field is inserted into the SIP INVITE (carrying the MIKEY offer), 200 OK (MIKEY answer) and ACK (MIKEY confirm) messages, in order to implement the complete key management protocol.

The content of the new header field depends on the SIP message: in the INVITE message it includes the value “mikey offer=” followed by the base64 encoding of the MIKEY offer message. The same holds for the 200 OK and ACK messages, the only difference being the initial parameters “answer=” and “confirm=”, respectively. Figure 3 shows an INVITE message with a MIKEY offer.
architecture as defined by RFC 3261 [3]. Both JavaSE and JavaME (J2ME/CLDC1.1/MIDP2.0) versions are available.

We then integrated the key agreement protocol with the standard SIP UA (mjUA) provided by the MjSip project. The SIP stack and UA have been suitably extended in order to support key agreement during SIP session setup and SAS authentication by the users' voices. Figure 4 shows a screenshot of the new UA with the key management extensions.

Figure 4 – Screenshot of the developed secure mjUA.

5. CONCLUSION

generated Short Authentication String. The key agreement is performed during session setup with SIP, and the exchange is carried out through MIKEY, suitably encapsulated within new SIP header fields. The proposed protocol has been implemented and integrated in a SIP UA based on the open-source MjSip SIP stack, and publicly released as open source at [10].

As for security considerations, it should be noted that the proposed key agreement mechanism (like ZRTP) is theoretically subject to the following voice forgery attacks:
- Bill Clinton attack: one or both parties have a well-known voice that can easily be forged during SAS authentication;
- 6-month attack: assuming that the two parties will not remember each other's voices over a long period, the attacker can perform SAS authentication without any voice forgery ability;
- Court reporter attack: possible if neither party knows the other's voice; in this case the attacker can simply use their own voice to trick SAS authentication.