TurboEPC: Leveraging Dataplane Programmability to Accelerate the Mobile Packet Core


Rinku Shah, Vikas Kumar, Mythili Vutukuru, Purushottam Kulkarni
Department of Computer Science & Engineering
Indian Institute of Technology Bombay
rinku@cse.iitb.ac.in, vikask@iitb.ac.in, {mythili,puru}@cse.iitb.ac.in

ABSTRACT
Recent architectures of the mobile packet core advocate the separation of the control and dataplane components, with all signaling messages being processed by the control plane entities. This paper presents the design, implementation, and evaluation of TurboEPC, a redesign of the mobile packet core that revisits the division of work between the control and data planes. In TurboEPC, the control plane offloads a small amount of user state to programmable dataplane switches, using which the switches can correctly process a subset of signaling messages within the dataplane itself. The messages that are offloaded to the dataplane in TurboEPC constitute a significant fraction of the total signaling traffic in the packet core, and handling these messages on dataplane switches closer to the end-user improves both control plane processing throughput and latency. We implemented the TurboEPC design using P4-based software and hardware switches. The TurboEPC hardware prototype shows throughput and latency improvements by up to 102× and 98% respectively when the switch hardware stores the state of 65K concurrent users, and 22× and 97% respectively when the switch CPU is busy forwarding dataplane traffic at linerate, over the traditional EPC.

CCS CONCEPTS
• Networks → In-network processing; Programmable networks; Mobile networks.

KEYWORDS
LTE-EPC, cellular networks, programmable networks, in-network compute, smartNIC

ACM Reference Format:
Rinku Shah, Vikas Kumar, Mythili Vutukuru, Purushottam Kulkarni. 2020. TurboEPC: Leveraging Dataplane Programmability to Accelerate the Mobile Packet Core. In Symposium on SDN Research (SOSR '20), March 3, 2020, San Jose, CA, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3373360.3380839

1 INTRODUCTION
Software Defined Networking (SDN) is a network design paradigm that advocates the separation of the control plane of a network element, which makes the decision on how to handle network traffic, from the dataplane that does the actual packet forwarding. With SDN, the more complex control plane functionality can be logically centralized and implemented in agile software controllers, while the dataplane can be implemented in simpler, efficient forwarding switches. This principle of separating the control and dataplane has also been applied to the design of the mobile packet core, and is referred to as Control and User Plane Separation (CUPS) [2] by the telecom community. The mobile packet core, e.g., the 4G LTE EPC (Long Term Evolution Evolved Packet Core), consists of networking elements that perform control plane functionality such as authenticating users and setting up data sessions, and dataplane functionality of forwarding user traffic from the wireless radio network to external data networks (§2). With recent releases of 4G (and the new 5G standards) espousing the CUPS principle, the dataplane can move closer to the end users, enabling applications that require low latency data forwarding (e.g., self-driving cars).

EPC procedure    | Number of transactions/sec
Attach           | 9K
Detach           | 9K
S1 release       | 300K
Service request  | 285K
Handover         | 45K
Network load when total subscribers in the core = 1M
Table 1: Sample EPC load statistics [43, 55].

In this paper, we revisit the boundary between the control plane and dataplane in the CUPS-based design of the mobile packet core. Our work is motivated by two observations pertaining to the signaling traffic in the core. First, signaling traffic is growing rapidly [8, 42], fueled by smartphones, IoT devices and other end-user equipment that connect frequently to the network in short bursts. In fact, the signaling load in LTE is 50% higher than that of 2G/3G networks [42]. This high signaling load puts undue pressure on the packet core, making it difficult for operators to meet the signaling traffic SLAs [29]. Second, the signaling procedures can be classified into two types based on their frequency (see Table 1) and nature of processing. A small percentage of the signaling traffic consists of procedures like the attach procedure (1–2% of total traffic, as per [43, 55]) that is executed when a user connects to the mobile network for the first time, or the handover procedure (~5%) that is executed when the user moves across regions of the mobile network. A significant fraction of the signaling traffic (~63–90%) is made up of procedures like the S1 release that is invoked to release the forwarding state of the user when the user session goes idle for a brief period, and the service request procedure that restores the user's forwarding state when the user becomes active again. Further, these two classes of signaling procedures also require access to different types of state during their processing. Attaching a user to the network entails authenticating the user using a network-wide subscriber database, and setting up the forwarding path of the user under mobility requires access to the global network topology. However, frequent signaling procedures like the S1 release and service request access only the user context of a single subscriber, and not network-wide global state.
The key idea of our work is that we can improve control plane performance of the mobile packet core by offloading a subset of the control plane procedures like the S1 release and service request from the control plane onto the dataplane switches. Our idea is inspired by recent advances in dataplane technologies, where dataplane switches are evolving from fixed function hardware towards programmable components that can forward traffic at line rate while being highly customizable [41, 53]. Since the S1 release and service request procedures only access and modify user-specific context, the handling of these procedures can be embedded into the packet processing pipeline of programmable dataplane switches, provided the required user context is made available in the switches. Offloading these frequent signaling procedures to the dataplane switches improves both control plane throughput (by utilizing spare switch capacity for handling signaling traffic) and latency (by handling signaling traffic closer to the end user at the switches). We will use the term offloadable procedures to describe signaling procedures in the mobile control plane that can be easily offloaded to programmable dataplane switches.

This paper describes TurboEPC (§3), a redesign of the LTE EPC mobile packet core, where offloadable control plane procedures are handled at programmable switches in the dataplane for better control plane performance. TurboEPC modifies the processing of the non-offloadable messages (like the attach procedure) in the control plane in such a way that a copy of the user-specific context information that is generated/modified during such procedures is pushed to dataplane switches closer to the end user. This user context is stored in the switches along with the forwarding state needed for dataplane processing, and is used to process offloadable signaling messages within the dataplane switch itself.

There are several challenges in realizing this idea. First, the control plane state stored in switches may be modified locally by the offloaded signaling messages, causing it to diverge from the "master copy" in the centralized control plane. This inconsistency in state may impact the correctness of the processing of other non-offloadable signaling messages in the control plane, but synchronizing the two copies of the state continuously will erode the performance gains of the offload itself. TurboEPC overcomes this challenge by synchronizing the offloaded control plane state with its master copy only when such state is required for the processing of some non-offloadable message, and piggy-backs this state onto the said non-offloadable message itself. Second, dataplane switches have limited memory, and the contexts of millions of active users [55, 57] cannot possibly be accommodated within a single switch. To overcome this challenge, TurboEPC partitions user context across multiple switches as per operator policy, thereby increasing the probability that the user context can be stored within the dataplane. Third, switch failures can lead to loss of the latest version of the user context stored in switches, and result in inconsistencies in the control plane state of the user. TurboEPC overcomes this challenge by replicating user context across switches and implements a failover mechanism to tackle switch failures.

We implemented TurboEPC (§4) over a simplified mobile packet core consisting of the ONOS [31] SDN controller in the control plane and P4-based bmv2 software switches [58] in the dataplane. The control and dataplane components communicate using P4Runtime [59]. Our P4-based dataplane switch pipeline was also ported to P4-programmable Netronome Agilio CX smartNICs [53], helping us realize an implementation of TurboEPC on programmable hardware. Evaluation of our prototype (§5) shows that the software prototype of TurboEPC improves LTE EPC control plane throughput by 2.3× and reduces latency by 90% over the traditional CUPS-based EPC design over realistic traffic mixes, by utilizing spare dataplane switch capacity for signaling message processing. Our hardware prototype shows throughput and latency improvements by up to 102× and 98% respectively when the switch hardware stores the state of 65K concurrent users, and 22× and 97% respectively when the switch CPU is busy forwarding dataplane traffic at linerate. While prior work has proposed several optimizations to the mobile packet core architecture (§6), to the best of our knowledge, we are the first to show that the control plane of mobile data networks can be accelerated by offloading signaling procedures on to programmable dataplane switches. Further, while we have evaluated our ideas over the 4G core, we believe that our contributions apply to the future 5G packet core [17] as well, because the design of the 5G dataplane maps closely to that in the CUPS-based 4G EPC. To summarize, the key contributions of this paper are:
• TurboEPC, a redesigned mobile packet core that offloads a significant fraction of signaling procedures from the control plane to programmable dataplanes, thereby improving performance.
• An implementation of TurboEPC over P4-based programmable software/hardware switches, to demonstrate the feasibility of our design.
• A quantification of the performance gains of TurboEPC over the traditional CUPS-based EPC design.

2 BACKGROUND & MOTIVATION
Figure 1: The mobile packet core.

Mobile packet core architecture. The core of a mobile network connects the radio access network, consisting of user equipments (UEs) and the base stations (eNBs), with other packet data networks, including the Internet. Figure 1(a) shows the architecture of the traditional 4G packet core, also called the LTE EPC (Long Term Evolution Evolved Packet Core). The main components of the EPC are the control plane Mobility Management Entity (MME) that handles all signaling procedures, and the dataplane Serving and Packet Gateways (SGW and PGW) that forward user traffic. The SGW and PGW also participate in control plane procedures pertaining to establishing and tearing down user forwarding paths. In order to enable independent scaling of the control and dataplane logic in the S/P-GWs, later releases of 4G LTE espoused the Control and User Plane Separation (CUPS) principle. Figure 1(b) shows the LTE EPC architecture with CUPS; the S/P-GWs are separated into control and dataplane entities, which communicate using a standardized protocol called PFCP (Packet Forwarding Control Protocol [2]). The upcoming 5G standard fully embraces the CUPS principle, as shown in Figure 1(c). In the 5G core, the Access and Mobility Management Function (AMF), Session Management Function (SMF), and other components handle signaling traffic in the control plane, while the User Plane Function (UPF) forwards traffic in the dataplane. The control and dataplane components once again communicate via PFCP. We base our discussion of TurboEPC in the rest of the paper on the CUPS-based EPC architecture shown in Figure 1(b). We assume that the MME and the control plane components of the S/P-GWs are implemented atop an SDN controller, and the dataplane of the S/P-GWs is implemented in SDN switches. Our ideas easily generalize to the 5G architecture, as well as other CUPS-based EPC implementations, e.g., if the control plane components were to be standalone applications.

Figure 2: LTE EPC procedures: (a) Attach procedure; (b) S1 release procedure; (c) Service request procedure.

LTE EPC procedures. Figure 2 briefly illustrates a subset of the LTE EPC control plane procedures. When a UE connects to a LTE network for the first time, the initial message sent from the UE via the eNB triggers the attach procedure in the core. During this procedure, the UE and the network mutually authenticate each other, by using the user state stored in the Home Subscriber Subsystem (HSS), and establish security keys for use during future communication. Finally, the MME sets up the state required to forward user traffic through the core at the SGW and PGW that are on the path from the user to the external packet data network. The detach procedure reverses the processing of the attach procedure.

In the dataplane, user data packets are tunneled through the S/P-GWs using the GPRS Tunneling Protocol (GTP). The GTP header consists of Tunnel Endpoint Identifiers (TEIDs) that uniquely identify the path of a user's traffic through the core, and the S/P-GWs in the core network route dataplane traffic based on the TEID values. Separate TEIDs are generated for each of the two links on the datapath (eNB-SGW and SGW-PGW) and for each of the two directions of traffic (uplink and downlink). When a user's IP data packet arrives from the wireless network at the eNB, it is encapsulated into a GTP packet, which is then transmitted over UDP/IP, first between the eNB and the SGW, and then between the SGW and PGW. The egress PGW strips the GTP header before forwarding the user's data to external networks.
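To make the tunneling concrete, the following is a minimal P4_16 sketch of a simplified GTP-U header and an encapsulation action, assuming the v1model architecture; the identifiers (gtpu_t, encap_gtpu, meta.payload_length) are illustrative assumptions and not taken from the TurboEPC source.

    // GTP-U header (simplified, 8 bytes): the TEID identifies the user's tunnel.
    header gtpu_t {
        bit<3>  version;      // GTP version (1 for GTP-U)
        bit<1>  pt;           // protocol type
        bit<1>  reserved;
        bit<3>  flags;        // E, S, PN flags collapsed into one field here
        bit<8>  msg_type;     // 0xFF = G-PDU (encapsulated user data)
        bit<16> msg_length;
        bit<32> teid;         // Tunnel Endpoint Identifier
    }

    // Encapsulation sketch: push the GTP-U header and set the TEID chosen by
    // the control plane for this user, link, and direction.
    action encap_gtpu(bit<32> teid) {
        hdr.gtpu.setValid();
        hdr.gtpu.version    = 1;
        hdr.gtpu.pt         = 1;
        hdr.gtpu.reserved   = 0;
        hdr.gtpu.flags      = 0;
        hdr.gtpu.msg_type   = 0xFF;                 // G-PDU
        hdr.gtpu.msg_length = meta.payload_length;  // length of the user packet
        hdr.gtpu.teid       = teid;
        // Outer IPv4/UDP headers (UDP destination port 2152) would be pushed
        // and populated here as well; elided for brevity.
    }

A per-link forwarding table would invoke such an action with the TEID and tunnel endpoints installed by the control plane, which is why GTP forwarding maps naturally onto match-action pipelines.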
If the UE goes idle without sending data for a certain duration (usually 10-30 seconds [21]), an S1 release procedure (Figure 2(b)) is invoked. During this procedure, the uplink/downlink forwarding rules for the user are deleted from the eNB, downlink forwarding rules are deleted from the SGW, and the connection state of the user changes to idle. Later, when the UE becomes active again, the UE initiates a service request procedure (Figure 2(c)) to restore the forwarding state that was released during the idle period at the dataplane gateways. The user state at the MME also changes back to being actively connected. When a UE moves from one network location to another, it triggers a handover procedure in the core. The handover procedure involves, among other things, releasing the user's forwarding context in the old dataplane gateways, and setting up the user's forwarding context along the new path. Note that the core network performs several other procedures beyond those discussed here; however, this description suffices to understand our work.


State | Description | Example | Network-wide or local
Security keys | Used for user authentication, authorization, anonymity, confidentiality | KASME, CK, IK, AV, KNASenc, KNASint | network-wide
Permanent identifiers | Identifies the user globally | IMSI, MSIN | network-wide
Temporary identifiers | Temporary identity for security | GUTI, TMSI | per-user
IP address | Identifies the user | UE IP address | network-wide
Registration management state | Indicates if the user is registered to the network | ECM-DEREGISTERED, ECM-REGISTERED | network-wide
Connection management state | Indicates if the user is currently idle or connected | ECM-IDLE, ECM-CONNECTED | per-user
User location | Tracks the current user location | Tracking Area (TA), TAI | per-user
Forwarding state | Used for routing data traffic | Tunnel end-point identifiers (TEID) | per-user
Policy/QoS state | Determines policies & QoS values | GBR, MBR | per-user
Table 2: Classification of LTE EPC state.

Message | Security keys | Permanent identifiers | Temporary identifiers | IP address | Registration management state | Connection management state | User location | Forwarding state | Policy/QoS | Frequency (%)
Attach req | r+w | r | r+w | r+w | r+w | r+w | r+w | r+w | r+w | 0.5–1
Detach req | — | r | r+w | r+w | r+w | r+w | r+w | r+w | — | 0.5–1
Service req | — | — | r+w | r | — | r+w | — | r+w | — | 30–46
S1 release | — | — | r+w | r | — | r+w | — | r+w | — | 30–46
Handover req | r+w | r | r+w | r+w | r+w | r+w | r+w | r+w | r+w | 4–5
Table 3: Classification of LTE EPC messages.

Motivation for TurboEPC. Table 2 shows the various components of the per-user state, or user context, that is accessed by LTE procedures [16]. One key contribution of our work is to identify parts of the user context that have network-wide scope (marked in the last column of the table). A piece of user context has network-wide scope if it is derived from, or depends on, network-wide information. For example, the security keys of the user or the IP address are derived from information that is located in the centralized HSS database, and hence have network-wide scope. On the other hand, the connection state of a user (whether connected or idle) is only changed based on local events at the eNB (whether the radio link is active or not), and hence has local scope.

Next, Table 3 shows the various user states that are accessed during the processing of each LTE EPC procedure, along with the relative frequencies of each procedure. The scope of each of these states is as classified in Table 2. We see from this table that the S1 release and service request procedures modify only the connection management state (from ECM-CONNECTED to ECM-IDLE and vice versa), forwarding state (GTP tunnel identifiers), and temporary user identifiers, none of which have network-wide scope. Therefore, if we offload this subset of per-user state to dataplane switches closer to the eNB edge, the S1 release and service request procedures can be processed within the dataplane itself, without being forwarded all the way to the centralized controller. How do we identify which location on the edge to offload this state to? Note that a given user is only connected to one eNB at a time, and any changes in user location are notified to the core via suitable signaling messages (e.g., handover). Therefore, it is safe to offload some parts of the user context to the edge close to the current eNB, without worrying about concurrent access to this state from other network locations. The offload of the S1 release and the service request procedures to the edge is particularly useful because of the high proportion of these messages in the already high LTE signaling traffic [8, 42, 43, 55]. Further, the latency targets for these signaling messages in future networks [1, 20, 28] are as low as 1ms. Therefore, if we process these high frequency signaling messages at the edge closer to the user, we can more easily achieve these stringent latency bounds, and protect the core from high signaling load.

Figure 3: TurboEPC Design: (a) Traditional CUPS-based EPC; (b) TurboEPC.

3 TURBOEPC DESIGN
We begin with an overview of TurboEPC's basic design (§3.1) and then describe design features related to scalability (§3.2) and fault tolerance (§3.3).

3.1 Overview
The key idea of TurboEPC is to offload a subset of the user context, and a subset of LTE EPC procedures to the edge, on to dataplane switches closer to the eNB, so that the throughput and latency of processing such messages can be improved. We define an offloadable state as that which is accessed/modified by local events at the edge, and is never accessed/modified concurrently from multiple network locations. A particular LTE EPC procedure is offloadable if all the states that are needed to process the message are also offloadable. We will refer to messages that are not offloadable as non-offloadable messages. While this paper applies the concepts of offloading state and control plane messages to the EPC architecture, our ideas can be generalized to other systems where these definitions apply as well.


Figure 3 compares the CUPS-based traditional EPC design with TurboEPC. In the traditional CUPS-based EPC design (Figure 3(a)), the MME, SGW-C, and PGW-C components are implemented within a root SDN controller in the control plane, while the dataplane processing is performed in dataplane switches (SGW-D & PGW-D). The eNB forwards all control plane traffic to the root controller, which processes these messages and installs forwarding state at the S/P-GW switches. All control plane state, including the per-user context, is maintained only in the control plane. In contrast, in the TurboEPC design (shown in Figure 3(b)), the eNB forwards offloadable messages (e.g., S1 release and service request) to the dataplane S/P-GW switches (we assume that the eNB is capable of analyzing the header of a signaling message to determine if it is offloadable or not). To enable the processing of offloadable signaling messages in the dataplane, the root controller in TurboEPC pushes a subset of the generated/modified per-user context into the dataplane switches after the completion of every non-offloadable signaling message processing. The user context that is pushed to the dataplane consists of a mapping between the UE identifier and the following subset of information pertaining to the user: the tunnel identifiers (TEIDs) and the UE connection state (connected/idle). This user context is stored in dataplane switch data structures, much like the forwarding state, and consumes an additional ≈64 bytes of memory over and above the ≈32 bytes of forwarding state in our prototype (as shown in Table 4).

Switch | User state (in bytes) | Forwarding state (in bytes)
eNB | 0 | 32
SGW | 64 | 28
PGW | 0 | 19
Table 4: Size of state stored at TurboEPC switches.
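For concreteness, the offloaded per-user context described above can be pictured as a small fixed-size record; the following P4_16 struct is an illustrative sketch, with field names and widths chosen by us rather than taken from the TurboEPC prototype.

    // Offloaded per-user context (sketch): on the order of 64 bytes per user,
    // keyed by a UE identifier carried in the signaling message headers.
    struct ue_context_t {
        bit<32> teid_enb_uplink;     // eNB-SGW tunnel, uplink direction
        bit<32> teid_enb_downlink;   // eNB-SGW tunnel, downlink direction
        bit<32> teid_sgw_uplink;     // SGW-PGW tunnel, uplink direction
        bit<32> teid_sgw_downlink;   // SGW-PGW tunnel, downlink direction
        bit<32> guti;                // temporary identifier (simplified width)
        bit<1>  ecm_connected;       // 1 = ECM-CONNECTED, 0 = ECM-IDLE
    }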
Offloadable signaling messages that arrive at the edge dataplane switches (close to the eNB) are processed within the switch dataplane itself, by accessing and modifying the offloaded per-user context. For example, the S1 release request processing requires the TurboEPC switch dataplane to delete the uplink/downlink TEIDs at the eNB and the downlink TEID at the SGW, change the user connection state to idle, and update the GUTI if required. Because these offloadable messages reach the switch at least a few tens of seconds (idle timeout) after the context is pushed by the root controller, the state offload does not cause any additional delays while waiting for state to be synchronized. If the signaling message requires a reply to be sent back to the user, the reply is generated and sent by the switch dataplane as well.

Note that, after the user context has been modified by the offloadable signaling messages within the switch data structures, the latest copy of this state resides only in the dataplane. TurboEPC does not synchronize this state back to the root after every modification to the offloaded state, because doing so nullifies the performance gains due to offload in the first place. Instead, TurboEPC lazily synchronizes this state with its master copy at the root controller only when required. That is, all future offloadable messages will access the latest copy of the offloaded state within the dataplane itself, and non-offloadable messages that do not depend on this offloaded state will be directly forwarded to the root by the eNB. However, there are some non-offloadable messages in EPC (e.g., some types of handover messages) that require access to both the latest offloaded user context in the dataplane as well as the non-offloaded state stored in the root. These messages are first sent to the dataplane switches by the eNB, and any processing of the message that can be done without access to global state is performed at the switch itself. Next, the message is forwarded from the dataplane switch to the root controller, with a copy of the modified user context (that is subsequently invalidated at the switch) appended to the packet, in order to correctly complete the rest of the processing at the root.

We acknowledge that TurboEPC introduces a small amount of overhead during the processing of non-offloadable handover messages, since we need to piggyback the user context from the switch to the root controller, as described above. This overhead may be acceptable in current networks, because the handover messages comprise only 4–5% [43, 55] of all signaling traffic. However, the handover traffic can increase in future networks, e.g., with small cells in 5G. We plan to revisit our handover processing to reduce overhead in such use cases as part of our future work.

This basic design of TurboEPC faces two significant challenges. (i) A typical mobile core must handle millions of active connected users [55, 57], while switch memory is usually limited. For example, recent high-end programmable switches have a few tens of MB of memory available to store tables [41], which means that a single switch can only store user context information for a few 100K users in our current design. In fact, the Netronome programmable NIC hardware used in our prototype implementation [54] could only store user context information for 65K users. Therefore, it is unlikely that a single dataplane switch can accommodate the contexts of all users connected to an eNB. (ii) The latest version of the modified user context stored at the switches may be lost in the case of switch failures, making the UE's view and the network's view of the user's context inconsistent. We now describe how TurboEPC overcomes these challenges.

3.2 Partitioning for Scalability
In order to overcome single switch memory limitations, and maximize handling of offloadable messages at the dataplane, TurboEPC relies on multiple programmable switches in the core network. TurboEPC partitions the user context required to handle offloadable messages amongst multiple dataplane switches along the path from the eNB to the S/P-GW (possibly including the S/P-GW itself). Further, if the dataplane switches cannot accommodate all user contexts even with partitioning, some subset of the user contexts can be retained in the root controller itself. With this design, any given dataplane switch stores the contexts of only a subset of the users, and handles the offloadable signaling messages pertaining to only those users. The switches over which the partitioning of user context state is done can be connected to each other in one of two ways, as we describe below.


Figure 4: Scalability and fault tolerance in TurboEPC.

Series design. In the series design shown in Figure 4(a), the contexts of a set of users traversing a certain eNB to S/P-GW path in the network are split amongst a series of programmable switches placed along the path. When an offloadable control plane message arrives at one of the switches in the series, it looks up the user context tables to check if the state of the incoming packet's user exists on the switch. If it exists (a hit), the switch processes the signaling message as discussed in §3.1. If the user context is not found (a miss), the packet is forwarded to the next switch in the series until the last switch is reached. If the user context is not found even at the last switch, the message is forwarded to the root controller, and is processed like in the traditional EPC.
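The series lookup logic can be sketched as the following P4_16 apply block, assuming a v1model pipeline in which the metadata flags and the forwarding actions named below are installed and configured by the root controller; all identifiers are illustrative rather than the actual TurboEPC code.

    // Series design (sketch): handle the signaling message locally on a hit,
    // otherwise pass it along the chain, and to the root after the last switch.
    apply {
        if (meta.is_offloadable_signaling == 1) {
            if (user_context.apply().hit) {
                // Offloadable message (e.g., S1 release / service request)
                // processed against the locally stored user context.
                process_offloaded_message();
            } else if (meta.is_last_switch == 1) {
                send_to_root_controller();   // fall back to the traditional EPC path
            } else {
                forward_to_next_switch();    // continue along the series
            }
        } else {
            gtp_forwarding.apply();          // regular dataplane traffic
        }
    }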
Parallel design. Figure 4(b) depicts a parallel design, where the user context is distributed amongst programmable switches located on multiple parallel network paths between the eNB and the S/P-GW in the network. The difference from the series design is that the eNB now needs to maintain information on how the user contexts are partitioned along multiple paths, and must forward offloadable messages of a certain user along the correct path that has the user's state. This entails the extra step of parsing the signaling message header to identify the user, and an additional table lookup to identify the path to send the message on, at the eNB. Offloadable signaling messages that do not find the necessary user context at the switches on any of the parallel paths are forwarded to the root. While the series design leads to simpler forwarding rules at the eNB, the parallel design lends itself well to load balancing across network paths. Note that, while our current implementation supports only the simple series and parallel designs described above, a network could employ a combination of series and parallel designs, where user contexts are partitioned across multiple parallel paths from the eNB to the S/P-GWs, and are further split amongst multiple switches on each parallel path. Across all designs, the root controller installs suitable rules at all switches, to enable forwarding of signaling messages towards the switch that can handle them. §5 compares the performance of both designs, and evaluates the impact of partitioning state on TurboEPC performance.

Partitioning user context. The question of how best to partition user contexts across multiple programmable switches in a large network depends upon many factors, including the number of active users, the size of the core network, the capacity of the programmable switches, and the routing and traffic engineering policies employed within the network, and is beyond the scope of this paper. Another interesting question that we defer to future work is deciding which users should be handled at which switches. With the advent of new use cases such as vehicular automation, IoT, smart sensors, and AR/VR in next generation networks, it is becoming important to provide ultra-low latency and ultra-high reliability in processing signaling traffic of some users. Subscribers who require low latency for their frequent signaling requests, but are not highly mobile (e.g., smart sensors), are ideal candidates to offload to the dataplane. It is also conceivable that an operator would wish to offload the contexts of premium subscribers. TurboEPC can support any such operator-desired placement policy.

3.3 Replication for Fault Tolerance
In TurboEPC, a subset of the user context is pushed into the dataplane switches during the attach procedure. This context is then modified in the dataplane tables during the processing of subsequent offloadable signaling messages. For example, the S1 release message changes the connection state in the context from connected to idle. In the case of a switch failure, such modifications could be lost, leaving the UE in an inconsistent state. For example, a UE might believe it is idle while a stale copy of the user context at the root controller might indicate that the user is actively connected.

To be resilient to such failure scenarios, TurboEPC stores the user context at one primary dataplane switch, and another secondary switch. During the processing of non-offloadable messages such as the attach procedure, the root controller pushes the user context to the primary as well as the secondary switch of that user. The root also sets up forwarding paths such that offloadable signaling messages of a user are directed to the primary switch of the user. Upon processing an offloadable message, the primary switch first synchronously replicates the updated user context at the secondary switch, before generating a response to the signaling message back to the user, as shown in Figure 4(c). Our current implementation uses simple synchronous state replication from the primary to one other secondary switch, and is not resilient to failures of both the primary and secondary switches in quick succession. We plan to evolve our design for replication across multiple secondary switches as part of future work, using techniques from recent research [24].


If a primary switch fails before replication completes, no response is sent to the user, the user will retry the signaling message, and will be redirected to a new switch after the network repairs the failure. If the primary switch fails after successful replication, the SDN controller will be notified of the failure in the normal course of events, e.g., in order to repair network routes, and the TurboEPC application installs forwarding rules to route subsequent offloadable messages of the user to the secondary switch. The root controller also synchronizes itself with the latest copy of the user context from the now primary (former secondary) switch, and repopulates this context at another new secondary switch. Users served by the failed switch may see a temporary disruption in offloadable message responses (along with a disruption in dataplane forwarding) during the time of failure recovery, and we evaluate the impact of such disruptions in §5.

4 IMPLEMENTATION
Figure 5: TurboEPC implementation.

We implemented simplified versions of the CUPS-based traditional EPC and TurboEPC in order to evaluate our ideas. We have built our prototype by extending the SDN based EPC implementation available at [47] & [22]. Our implementation supports a basic set of procedures: attach, detach, handover, S1 release, and service request in the control plane, and GTP-based data forwarding. While our implementation of these procedures is based on the 3GPP standards, complete standards compliance was not our goal, and is not critical to our evaluation. The source code of TurboEPC is available at [48]. Figure 5 shows the various components of our implementation. A load generator emulates control and dataplane traffic from multiple UEs to the core, a simplified eNB switch implements only the wired interface to the core, and a sink consumes the traffic generated by the load generator. The load generator is a multi-threaded raw-sockets based program of 5.3K lines, that generates EPC signaling messages and TCP data traffic. The load generator can emulate traffic from a configurable number of concurrent UEs. Further, the emulated traffic mix (i.e., the relative proportions of the various signaling and dataplane messages) is also configurable.

The control plane components of the packet core (MME, SGW-C, PGW-C) are implemented within an SDN controller. The dataplane switches (eNB, SGW-D, PGW-D) are implemented as P4-based packet processing pipelines in approximately 3K lines of P4 code. While the dataplane performs only GTP-based forwarding in the traditional CUPS-based EPC prototype, it also performs additional processing of offloadable signaling messages in TurboEPC. We have compiled our TurboEPC P4 code to run on two targets: the bmv2 simple_switch_grpc [58] software switch target, and the Netronome CX 2x10GbE [54] smartNIC hardware target. We now describe these hardware and software switches.

TurboEPC software switch. In the software switch based TurboEPC prototype, the SDN application that forms the EPC control plane is implemented in the ONOS controller [31] in 10K lines of Java code. The offloadable message processing is implemented within a local ONOS controller that is co-located with the P4-based software dataplane switches. This local controller configures and modifies the P4 software switch tables that contain the offloaded state. We use P4Runtime [59] as the communication protocol between the ONOS controller and the P4 software switch. However, the current P4Runtime v1.0.0 does not support multiple controllers (e.g., local and root controllers) configuring the same dataplane switch. Therefore, we built custom support for this feature by modifying the proto/server package of the P4Runtime [59] to send/receive packets to/from multiple controllers. While our initial implementation simply broadcast control plane messages to all the controllers, this resulted in unnecessary message processing overhead at the controllers. Therefore, we further modified the P4Runtime agent at the bmv2 switch and the ONOS controller to enable a switch to identify the specific controller where the control packet should be forwarded to. This optimization required significant code changes but also improved performance.

TurboEPC hardware switch. Our hardware-based TurboEPC switch did not integrate with the ONOS SDN controller used in the software prototype, due to limitations of the control-dataplane communication mechanisms in the programmable hardware we used. Therefore, we used a separate Python based controller as the root controller in our hardware TurboEPC prototype. Another difference with the software switch is in how offloadable messages are processed. The software prototype stores offloaded user context and forwarding state in switch tables, and the local controller is invoked to modify these tables when processing offloadable messages. However, this local controller can consume the limited switch CPU available in hardware switches. Therefore, the hardware prototype stores offloaded state not in switch tables but in switch register arrays, which are distinct from switch tables. While a switch table can only be modified from the root/local control plane, a register can be modified by P4 code running within the dataplane itself. Therefore, we modified our design so that the switch tables only store a pointer from the user identifier to this register state, and not the actual state itself. The root controller takes care of maintaining the free and used slots in the register arrays of the switches, and creates the table entries that map from user identifiers (which are either available in packet headers, or can be derived from the packet headers) to register array indices when the user context is first created during the attach procedure. After the entries are created, offloadable messages that change the offloaded state do not need to invoke the switch control plane (which consumes switch CPU) to modify the tables, but can fetch the register index from the table and directly modify the registers from within the dataplane itself.
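The split between control-plane-managed tables and dataplane-writable registers can be sketched in P4_16 as follows, assuming the v1model register extern; the identifiers, widths, and 65536-entry sizing are our illustrative assumptions rather than the exact TurboEPC code.

    // Register arrays hold the mutable per-user state; the match-action table
    // only maps a user identifier to a slot index, so offloadable messages can
    // update the state without involving the switch control plane.
    register<bit<1>>(65536)  ecm_state;       // 1 = connected, 0 = idle
    register<bit<32>>(65536) teid_downlink;   // per-user downlink TEID

    action set_ctx_index(bit<32> idx) {
        meta.ctx_index = idx;                 // slot allocated by the root controller
    }

    table user_context_index {
        key            = { hdr.sig.ue_id : exact; }  // UE id from the signaling header
        actions        = { set_ctx_index; NoAction; }
        size           = 65536;
        default_action = NoAction();
    }

    // Example: an S1 release handled in the dataplane by rewriting registers,
    // with no table modification and hence no switch CPU involvement.
    action handle_s1_release() {
        ecm_state.write(meta.ctx_index, 1w0);        // mark the UE idle
        teid_downlink.write(meta.ctx_index, 32w0);   // invalidate downlink forwarding
    }

The key design point is that only the one-time mapping from user identifier to register index goes through the control plane (at attach time); every subsequent offloadable update is a register write executed at packet-processing speed.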


TurboEPC packet processing pipeline. We now briefly describe the P4-based packet processing pipeline in the TurboEPC dataplane switches (Figure 6). Incoming packets in an EPC switch are first run through a message redirection table that matches on various header fields to identify if the incoming message is a signaling message, and if yes, where it should be forwarded to. This table is populated by the root controller to enable correct redirection of non-offloadable signaling messages to the root, and offloadable messages to the switch that has the particular user's context. Packets that do not match the message redirection table continue along the pipeline, and are matched through multiple GTP forwarding tables for GTP-based dataplane forwarding.
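A minimal P4_16 sketch of this first stage, assuming a v1model ingress control; the key fields, action names, and table size are illustrative placeholders, not the actual TurboEPC table definition.

    // Signaling messages are matched first; anything that does not match the
    // redirection table falls through to GTP-based dataplane forwarding.
    table message_redirect {
        key = {
            hdr.udp.dstPort  : exact;     // distinguish signaling from GTP-U traffic
            hdr.sig.msg_type : ternary;   // offloadable vs. non-offloadable procedure
            hdr.sig.ue_id    : ternary;   // which switch holds this user's context
        }
        actions        = { send_to_root; send_to_context_switch; process_locally; NoAction; }
        size           = 1024;
        default_action = NoAction();
    }

    apply {
        if (!message_redirect.apply().hit) {
            gtp_uplink.apply();           // GTP forwarding tables for data traffic
            gtp_downlink.apply();
        }
    }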

Figure 6: Packet processing pipeline in TurboEPC.

Offloadable signaling messages destined to the current switch are first run through the user context table to find any existing offloaded user context. The signaling message is processed by modifying or deleting the user context and/or GTP forwarding state stored on the switch. The switch data structures are either updated by the local controller (software prototype) or within the dataplane itself (hardware prototype). After message processing, the packet may be forwarded to the secondary switch for state replication. On successful replication (within the dataplane), the secondary switch generates the response packet for the user, and forwards it to the primary switch as an acknowledgement for successful state replication. The primary switch dataplane forwards the response packet back to the user, indicating successful execution of the signaling message. If the signaling message processing could not complete at the switch (e.g., the user context is not found, or the handover message requires further processing at the root), the packet is forwarded to the root controller for further processing. In the case of the series design (when the switch is not the last in the chain), if the user context is not found, the message is forwarded to the next switch on the path.
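To illustrate the replication step, the following P4_16 actions sketch one way the primary and secondary switches could exchange state within the dataplane, assuming an illustrative replication header (hdr.repl) and the v1model registers sketched earlier; this is our sketch of the mechanism, not the TurboEPC source.

    // Primary switch (sketch): after applying the update locally, rewrite the
    // request into a replication message addressed to the secondary switch.
    action replicate_to_secondary(bit<9> secondary_port) {
        hdr.repl.setValid();
        hdr.repl.ue_id     = hdr.sig.ue_id;
        hdr.repl.ecm_state = meta.new_ecm_state;
        hdr.repl.teid_down = meta.new_teid_downlink;
        standard_metadata.egress_spec = secondary_port;
    }

    // Secondary switch (sketch): install the replicated context, then turn the
    // packet into the signaling response (response fields elided), returned to
    // the UE via the primary as an implicit acknowledgement.
    action apply_replica_and_respond(bit<9> primary_port) {
        ecm_state.write(meta.ctx_index, hdr.repl.ecm_state);
        teid_downlink.write(meta.ctx_index, hdr.repl.teid_down);
        hdr.repl.setInvalid();
        standard_metadata.egress_spec = primary_port;
    }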
5 EVALUATION
We now evaluate the TurboEPC software and hardware switch prototypes, and quantify the performance gains over the traditional CUPS-based EPC.

5.1 TurboEPC software prototype
We first evaluate the TurboEPC prototype implemented on P4-based software switches. We primarily aim to evaluate the benefits of our TurboEPC design as compared to the traditional EPC design. Further, we also seek to demonstrate the correctness and efficacy of the various mechanisms for scalability and fault tolerance in our design.

Setup. The components in our evaluation setup include the load generator, a sink node, the ONOS v1.13 SDN controller, and multiple P4-based programmable bmv2 software switches (simple_switch_grpc) for the eNB, SGW, and PGW components of LTE EPC. We use multiple "forwarding chains" of load generators and switches in the dataplane, to generate enough load to saturate the root SDN controller. All components run on Ubuntu 16.04 hosted over separate LXC containers to ensure isolation. The root controller container is hosted on an Intel Xeon E5-2697@2.6GHz (24GB RAM) server, and the rest are hosted on an Intel Xeon E5-2670@2.3GHz (64GB RAM) server. The root/local controllers, and all P4 software switches, are allocated 1 CPU core and 4GB RAM each. Our load generator is a closed loop load generator which emulates multiple concurrent UEs generating signaling and dataplane traffic. The number of concurrent emulated UEs in our load generator is tuned to saturate the control plane capacity (root or local or both) of the system in all experiments, and is varied between 4 and 100.

Parameters and metrics. We generate different workload scenarios by varying the mix of offloadable (S1 release and service request) and non-offloadable (attach, detach, and handover) signaling messages in the control plane traffic generated by the load generator. Table 5 shows the relative proportions of the various signaling messages in the traffic mixes used, along with a typical traffic mix found in real user traffic [43]. All results reported are averaged over three runs of an experiment conducted for 300 seconds, unless mentioned otherwise. The performance metrics measured are the average control plane throughput (number of control plane messages processed/sec) and the average response latency of control plane requests, as measured at the load generator over the duration of the experiment.

Traffic Mix | Attach, Detach % | S1 release, Service request % | Handover %
Att-1 | 1 | 99 | 0
Att-5 | 5 | 95 | 0
Att-10 | 10 | 90 | 0
Att-50 | 50 | 50 | 0
HO-5 | 10 | 85 | 5
Typical [43] | 1–2 | 63–94 | 5
Table 5: LTE-EPC traffic mix used for experiments.

TurboEPC vs. Traditional EPC. We first quantify the performance gains of the basic TurboEPC design as compared to the traditional EPC design. In this set of experiments, we assume (and ensure) that all UE context state fits in the memory of a single switch. We also do not perform any replication of dataplane state for fault tolerance. As we are interested in measuring maximum control plane capacity, our load generator does not generate any dataplane traffic. Figures 7 and 8 show the control plane throughput and latency respectively of the traditional EPC and TurboEPC, for various traffic mixes of Table 5. As can be seen, the performance gains of TurboEPC over traditional EPC are higher for traffic mixes with a greater fraction of offloadable messages. For example, for the typical traffic mix, we observe that TurboEPC improves control plane throughput by 2.3× over traditional EPC, while control plane latency is reduced by 90%. Further, we note that the root controller was fully saturated in the traditional EPC experiments, while CPU utilization was under 20% with TurboEPC, because most signaling traffic was processed using dataplane switch CPU. However, when the traffic consists of a high proportion of non-offloadable messages (e.g., mix Att-50, which is unrealistic), TurboEPC has slightly lower throughput than traditional EPC, because it incurs an additional overhead of pushing user context to the dataplane switches during the processing of non-offloadable messages. In summary, we expect TurboEPC to deliver significant performance gains over the traditional EPC over realistic traffic mixes which contain a high proportion of offloadable signaling messages.


Figure 7: TurboEPC vs. traditional EPC: Throughput.
Figure 8: TurboEPC vs. traditional EPC: Latency.
Figure 9: Throughput with varying distance to core, and varying number of dataplane switches.
Figure 10: Latency with varying distance to core, and varying number of dataplane switches.

The performance gains of TurboEPC are more pronounced when the distance between the "edge" and "core" of the network increases, and with increasing number of switches that can process offloadable messages in the dataplane, both of which are likely in real-life settings. Figures 9 and 10 show the performance of TurboEPC as a function of the distance to the root controller (emulated by adding delay to all communications to the root) and the number of forwarding chains of dataplane switches. We see from the figures that TurboEPC with 4 chains provides 4×–5× throughput over traditional EPC. We also observe that TurboEPC latency does not increase with the distance to the core network, and the latency is reduced by two orders of magnitude compared to traditional EPC when the round trip latency to the core is greater than 5ms.

Design | Attach, Detach | S1 release, Service request | Handover
RTT to the core is less than 1ms
Centralized | 10.72 | 10.28 | 17.38
TurboEPC | 10.98 | 1.44 | 18.36
RTT to the core is 10ms
Centralized | 200 | 38 | 549
TurboEPC | 205 | 2.4 | 580
Table 6: Average end-to-end latency for LTE-EPC (in ms).

While TurboEPC improves average control plane performance, it can (and does) degrade performance for some specific non-offloadable messages. For example, as discussed in §3.1, processing non-offloadable messages like the attach request incurs the extra cost of pushing offloaded user context to dataplane switches. Similarly, handover message processing incurs a higher overhead with TurboEPC because we need to piggyback the offloaded state and synchronize it with the root. Table 6 shows the average processing latency of various individual signaling messages in TurboEPC and the traditional EPC, in the setup with a single forwarding chain. The generated load followed the typical traffic distribution as shown in Table 5. Table 6 shows the latency results for two scenarios: (i) when the EPC core is close to the edge (RTT < 1ms), and (ii) when the EPC core is far from the edge (RTT = 10ms). We see that the processing latency reduces by up to 86–94% for offloadable messages like S1 release and service request, but increases by 2–5% for non-offloadable messages like attach requests and handovers. Because offloadable messages form a significant fraction of signaling traffic, TurboEPC improves the overall control plane performance of the mobile packet core, even though a small fraction of signaling messages may see slightly degraded performance.

Figure 11: Series vs. parallel partitioning.
Figure 12: TurboEPC throughput during failover.
Figure 13: TurboEPC latency during failover.
Series vs. parallel partitioning. Next, we perform experiments with the series vs. parallel state partitioning design variants of the TurboEPC software switch prototypes, to evaluate the performance impact of the additional complexity of these designs. This experiment was performed with traffic mix Att-1 of Table 5 (1% attach-detach requests), and results for other traffic mixes were similar. We use multiple (up to 3) TurboEPC switches in series and parallel configurations, and partition 100 active users uniformly over these switches. Besides these 100 users, our load generator also generates traffic on behalf of an additional 20 users whose contexts were not stored in the dataplane switches, to emulate the scenario where all contexts cannot be accommodated in the dataplane. Figure 11 shows the average control plane throughput and latency of the TurboEPC-Series(n) and TurboEPC-Parallel(n) designs, for varying number of switches n in series and parallel, both when the context of the users is found within one of the switches (hit) and when it is not (miss). We see from the figure that the TurboEPC throughput scales well when an additional switch becomes available to process offloadable signaling messages. The scaling is imperfect when there are 3 switches in series or parallel, because the eNB switch became the bottleneck in these scenarios. This eNB bottleneck is more pronounced in the case of the parallel design, because the eNB does extra work to look up the switch that has the user's context in the parallel design. We hope to tune the eNB software switch to ameliorate this bottleneck in the future.

While the throughput increases with extra TurboEPC switches, the control plane latency also increases due to extra hop traversals and extra table lookups, as compared to the basic TurboEPC design. This impact on latency is more pronounced in the series designs, where each switch adds an extra hop to latency. However, even with 3 switches in series or parallel, TurboEPC latency is still lower than that of the traditional EPC. We also see from the figure that the miss latency of offloadable message processing is worse than the message processing latency of the traditional EPC, because the messages undergo multiple table lookups within the dataplane before eventually ending up at the root controller.

TurboEPC fault tolerance. Next, we evaluate the fault tolerance of the TurboEPC design, by simulating a failure of the primary switch in the middle of an experiment and observing the recovery. Figure 12 shows the average throughput and Figure 13 shows the average latency of the fault-tolerant TurboEPC for an experiment of duration 1200 seconds, where the primary switch was triggered to fail after 600 seconds. Also shown in the graphs are the throughput and latency values of the basic TurboEPC without any fault tolerance for reference. We see that the throughput of the basic TurboEPC is 40% higher and the latency is 33% lower than the fault tolerant design due to the lack of the overhead of replication. After the failure of the primary switch, we found that the root controller takes about 15 seconds to detect the primary switch failure, ∼2 ms to push rules to the eNB that would route incoming packets to the secondary switch, and ∼30 ms to restart offloadable signaling message processing at the secondary switch. During this recovery period, we observed ∼200 signaling message retransmissions, but all signaling messages were eventually correctly handled by TurboEPC after the failure.

5.2 TurboEPC hardware prototype
We now evaluate our hardware-based TurboEPC prototype, built using the P4-programmable Netronome Agilio smartNIC [54].

Setup. The TurboEPC hardware setup was hosted on three Intel 106
TurboEPC (hardware) throughput
105

Xeon E5-2670@2.3GHz (128GB RAM) servers, each connected to Traditional-EPC throughput


TurboEPC (hardware) latency
one Netronome Agilio CX 2x10GbE smartNIC. The three servers 105 Traditional-EPC latency
104

Average throughput (req/s)


hosted the single chain of the load generator+eNB, SGW, and

End to end latency (µs)


104
PGW+sink respectively. A python based controller is hosted on
103
the SGW switch, and served as the root control plane.
103
Parameters and Metrics. Our load generator generated a mix
of offloadable/non-offloadable signaling messages and dataplane 102
102
traffic (using iperf3) in the experiments. The smartNIC hardware
could accommodate the user contexts of 65K users within the switch 101
101
hardware tables, and the load generator generated traffic for up to
65K users in all experiments. The maximum forwarding capacity of
100 100
our smartNICs (without any TurboEPC changes) was measured at 100 1K 5K 10K 20K 30K 40K 50K 60K 65K
Number of LTE-EPC users (user context stored on NIC)
8 Gbps, so our load generator also limited its maximum dataplane
traffic rate to 8 Gbps. All experiments were run for 300 seconds, Figure 14: TurboEPC throughput vs. Number of Users
and we report the maximum throughput and latency of processing 6 5
offloadable signaling messages in the hardware prototype.
Figure 14: TurboEPC throughput vs. Number of Users. (The plot shows average throughput (req/s) and end-to-end latency (µs) for TurboEPC (hardware) and Traditional-EPC, as the number of LTE-EPC users whose context is stored on the NIC varies from 100 to 65K.)

Capacity of TurboEPC hardware switch. First, we measure the maximum control plane capacity of our hardware TurboEPC switch, without any interfering dataplane traffic. Figure 14 shows the offloadable message processing throughput and latency of a single TurboEPC hardware switch. We evaluate the maximum throughput with the smartNIC loaded with state for 100 to 65K users, and found that the throughput does not vary as we add the state of more users to the smartNIC. Also shown are the throughput and latency numbers for the traditional CUPS-based EPC (RTT to the root < 1ms) for reference. We see from the figure that our TurboEPC hardware switch can successfully serve up to 65K users, while providing 102× higher throughput and 98% lower latency than the traditional EPC.

Figure 15: TurboEPC throughput with data traffic interference. (The plot shows average throughput (req/s) and end-to-end latency (µs) for TurboEPC (hardware) and Traditional-EPC, as the data traffic rate at the switch varies from no data to 8Gbps.)

Performance with dataplane traffic. TurboEPC improves control plane throughput over the traditional EPC by leveraging the extra capacity at dataplane switches for offloadable signaling message processing. However, the performance gains of TurboEPC may be lower if the switch is busy forwarding dataplane traffic. We now measure the impact of this dataplane cross traffic on the control plane throughput of TurboEPC. We pump increasing amounts of dataplane traffic through our TurboEPC hardware switch (with state for 65K users) and measure the maximum rate at which the switch can process offloadable signaling messages while simultaneously forwarding data traffic. Figure 15 shows the signaling message throughput and latency as a function of the dataplane traffic forwarded by the TurboEPC hardware dataplane switch. We see from the figure that as the data traffic rate increases, the offloadable signaling message throughput decreases, and the response latency varies between 100µs and 180µs. The throughput and latency of the traditional EPC (RTT to the root < 1ms) are also shown for reference. We observe that when the switch is idle, the hardware-based TurboEPC throughput is 102× higher, and the latency 98% lower, as compared to the traditional EPC. However, even when the switch is forwarding data at line rate (8Gbps), we observe the throughput to be 22× higher and the latency 97% lower than the traditional EPC, confirming our intuition that spare switch CPU can be used for handling offloaded signaling traffic. As part of future work, we plan to explore an adaptive offload design that offloads signaling message processing to the dataplane only when switches have spare processing capacity, an idea we have explored in our prior work [49].
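As a rough illustration of what such an adaptive policy could look like, the sketch below shows a root-controller loop that offloads or recalls per-user state based on switch CPU headroom. It is a hedged sketch, not the TurboEPC or Cuttlefish [49] implementation: the switch-facing calls (get_cpu_utilization, offload_user_state, recall_user_state) and the utilization thresholds are hypothetical placeholders.

```python
# Hedged sketch of an adaptive offload loop at the root controller.
# All switch-facing calls below are hypothetical placeholders, not a real
# TurboEPC or P4Runtime API.
import time

CPU_HIGH = 0.70  # assumed threshold above which the switch is considered busy
CPU_LOW = 0.50   # hysteresis: re-offload only once utilization drops below this

def get_cpu_utilization(switch: str) -> float:
    """Hypothetical: return the switch CPU utilization in [0, 1]."""
    raise NotImplementedError

def offload_user_state(switch: str) -> None:
    """Hypothetical: push user contexts so offloadable messages are handled on-switch."""
    raise NotImplementedError

def recall_user_state(switch: str) -> None:
    """Hypothetical: withdraw offloaded state so signaling goes to the root controller."""
    raise NotImplementedError

def adaptive_offload_loop(switch: str, poll_interval: float = 1.0) -> None:
    offloaded = True
    while True:
        util = get_cpu_utilization(switch)
        if offloaded and util > CPU_HIGH:
            recall_user_state(switch)    # switch is busy forwarding data traffic
            offloaded = False
        elif not offloaded and util < CPU_LOW:
            offload_user_state(switch)   # spare capacity is back; offload again
            offloaded = True
        time.sleep(poll_interval)
```

The two thresholds add hysteresis so that state is not repeatedly pushed and recalled when utilization hovers around a single cutoff; any real policy would tune these values to the switch target.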


6 RELATED WORK
Optimizations to the Packet Core. Prior work includes several proposals that redesign the mobile packet core to achieve a diverse set of goals. SoftCell [23] accelerates 4G dataplane forwarding by offloading the packet route installation task to the edge switch, and further minimizes the forwarding table size by aggregating flow rules within the switch. While this work primarily focuses on optimizing dataplane processing, TurboEPC accelerates the control plane by offloading signaling message processing to the edge switch. CleanG [37], PEPC [45], SCALE [5], DMME [4], MMLite [39], MobileStream [10], DPCM [33], and other similar proposals [34, 44, 46] optimize 4G/5G control plane processing, much like TurboEPC. CleanG [37] and PEPC [45] refactor the EPC control plane processing to reduce the overhead of state transfer across components. SCALE [5] proposes a distributed design of the control plane, and horizontally scales the EPC control plane by distributing signaling load across multiple replicas. MMLite [39] proposes a stateless, scalable MME design by storing user-specific state in shared memory. MobileStream [10] decomposes the traditionally monolithic control plane components and proposes the use of a streaming framework for scalability. DPCM [33] modifies the EPC protocol by reducing the number of messages exchanged and by starting dataplane forwarding before completion of control plane processing. While these proposals advocate optimized architectures of the EPC control plane, none of them revisit the boundary between the EPC control and dataplanes. TurboEPC, on the other hand, revisits the split of functionality between the control plane software and dataplane switches, and proposes a refactoring of the mobile core with the goal of offloading a subset of control plane processing to programmable dataplane switches closer to the end user. Therefore, this body of work is orthogonal and complementary to ours, and TurboEPC can leverage these control plane optimizations for the processing of non-offloadable messages at the root controller.

Programmable Dataplanes. While the first wave of SDN research decoupled the control plane from the dataplane and made the control plane highly programmable, the second wave of SDN research has made even the dataplanes highly programmable, realizing the true vision of software-defined networking. Today, dataplanes can be customized using P4 [6], a programming language to define packet processing pipelines. These software-defined dataplanes can then be compiled to run on diverse targets, e.g., software switches [50, 58], hardware programmable switches guaranteed to work at line rate [7, 11, 41, 51], FPGAs [60], and smart programmable NICs [54]. Further, these programmable dataplanes can be configured from software SDN controllers using standard protocols [35, 59]. Programmable dataplanes have enabled a variety of new applications within the dataplane, e.g., in-band network telemetry (INT) [27], traffic engineering [52], load balancing [12, 36], consensus [14, 15], traffic monitoring [40], key-value stores [25, 32], congestion control [26], and GTP header processing [3, 9]. Molero et al. [38] demonstrate the possibility of accelerating control plane functions such as failure detection and notification via offload to programmable dataplanes. TurboEPC takes this line of work one step further, and proposes the offload of frequent and simple signaling procedures to programmable switches.

Control Plane Scalability. With the SDN paradigm, a logically centralized control plane can potentially become a performance bottleneck, and prior work has identified two broad approaches to solve this control plane scalability challenge. Some SDN controllers [30, 56, 61] use the technique of horizontal scaling, where the incoming control plane traffic is distributed amongst multiple homogeneous SDN controllers that cooperate to maintain a consistent view of the common global network-wide state using standard consensus protocols. In contrast, other SDN controllers [13, 18, 19, 49, 62] use hierarchical scaling to offload control plane functionality to lower levels of “local” SDN controllers that perform different functions. Our work is inspired by hierarchical SDN controllers but is quite different from them: we apply the idea of offloading computation from SDN controllers to dataplane switches in the CUPS-based mobile packet core.

7 CONCLUSION
This paper described TurboEPC, a mobile packet core design where a subset of signaling messages is offloaded to programmable dataplane switches in order to improve control plane performance. TurboEPC dataplane switches store a small amount of control plane state in switch tables, and use this state to process some of the more frequent signaling messages at switches closer to the edge. We implemented TurboEPC on P4-based software switches and programmable hardware, and demonstrated that offloading signaling messages to the dataplane significantly improves control plane throughput and latency.

ACKNOWLEDGEMENTS
We thank our shepherd Sonia Fahmy and the anonymous reviewers for their insightful feedback.

REFERENCES
[1] 3GPP. 2017. 5G 3GPP specifications. https://www.3gpp.org/ftp/Specs/archive/23_series/23.502/
[2] 3GPP. 2017. Control and User Plane Separation. http://www.3gpp.org/cups
[3] Ashkan Aghdai et al. 2018. Transparent Edge Gateway for Mobile Networks. In IEEE 26th International Conference on Network Protocols (ICNP).
[4] X. An, F. Pianese, I. Widjaja, and U. G. Acer. 2012. DMME: A distributed LTE mobility management entity. Bell Labs Technical Journal 17, 2 (2012), 97–120.
[5] Arijit Banerjee, Rajesh Mahindra, Karthik Sundaresan, Sneha Kasera, Kobus Van der Merwe, and Sampath Rangarajan. 2015. Scaling the LTE Control-plane for Future Mobile Access. In Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies.
[6] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. SIGCOMM Computer Communication Review 44 (2014).
[7] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM Conference.
[8] Gabriel Brown. 2012. On Signalling Storm. Retrieved November 10, 2018 from https://blog.3g4g.co.uk/2012/06/on-signalling-storm-ltews.html
[9] Carmelo Cascone and Uyen Chau. 2018. Offloading VNFs to programmable switches using P4. In ONS North America.
[10] Junguk Cho, Ryan Stutsman, and Jacobus Van der Merwe. 2018. MobileStream: A Scalable, Programmable and Evolvable Mobile Core Control Plane Platform. In Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies.
[11] Sharad Chole, Andy Fingerhut, Sha Ma, Anirudh Sivaraman, Shay Vargaftik, Alon Berger, Gal Mendelson, Mohammad Alizadeh, Shang-Tse Chuang, Isaac Keslassy, Ariel Orda, and Tom Edsall. 2017. dRMT: Disaggregated Programmable Switching. In Proceedings of the ACM SIGCOMM Conference.
[12] Eyal Cidon, Sean Choi, Sachin Katti, and Nick McKeown. 2017. AppSwitch: Application-layer Load Balancing Within a Software Switch. In Proceedings of the APNet.
[13] Andrew R. Curtis et al. 2011. DevoFlow: Scaling Flow Management for High-performance Networks. In Proceedings of the ACM SIGCOMM.
[14] Huynh Tu Dang et al. 2018. Consensus for Non-Volatile Main Memory. In IEEE 26th International Conference on Network Protocols (ICNP).
[15] Huynh Tu Dang, Daniele Sciascia, Marco Canini, Fernando Pedone, and Robert Soule. 2015. NetPaxos: Consensus at Network Speed. In Proceedings of the ACM SIGCOMM SoSR.
[16] ETSI. 2017. The Evolved Packet Core. http://www.3gpp.org/technologies/keywords-acronyms/100-the-evolved-packet-core
[17] ETSI. 2018. 5G standards specification (23.501). https://www.etsi.org/deliver/etsi_ts/123500_123599/123501/15.02.00_60/ts_123501v150200p.pdf
[18] Luyuan Fang, Fabio Chiussi, Deepak Bansal, Vijay Gill, Tony Lin, Jeff Cox, and Gary Ratterree. 2015. Hierarchical SDN for the hyper-scale, hyper-elastic data center and cloud. In Proceedings of the SoSR.
[19] Soheil Hassas Yeganeh and Yashar Ganjali. 2012. Kandoo: A Framework for Efficient and Scalable Offloading of Control Applications. In Proceedings of the HotSDN.
[20] R. E. Hattachi. 2015. Next Generation Mobile Networks, NGMN. https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf
[21] Open Air Interface. 2016. EPC: S1 release. https://gitlab.eurecom.fr/oai/openairinterface5g/issues/16
[22] Aman Jain, Sunny Lohani, and Mythili Vutukuru. 2016. Opensource SDN LTE EPC. https://github.com/networkedsystemsIITB/SDN_LTE_EPC
[23] Xin Jin, Li Erran Li, Laurent Vanbever, and Jennifer Rexford. 2013. SoftCell: Scalable and Flexible Cellular Core Network Architecture. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies.


[24] Xin Jin, Xiaozhou Li, Haoyu Zhang, Nate Foster, Jeongkeun Lee, Robert Soule, Changhoon Kim, and Ion Stoica. 2018. NetChain: Scale-Free Sub-RTT Coordination. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18).
[25] Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soule, Jeongkeun Lee, Nate Foster, Changhoon Kim, and Ion Stoica. 2017. NetCache: Balancing Key-Value Stores with Fast In-Network Caching. In Proceedings of the SOSP.
[26] Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. HULA: Scalable Load Balancing Using Programmable Data Planes. In Proceedings of the SoSR.
[27] Changhoon Kim, Anirudh Sivaraman, Naga Katta, Antonin Bas, Advait Dixit, and Lawrence J Wobker. 2015. In-band network telemetry via programmable dataplanes. In ACM SIGCOMM.
[28] Dr. Kim. 2017. 5G stats. https://techneconomyblog.com/tag/economics/
[29] P. Kiss, A. Reale, C. J. Ferrari, and Z. Istenes. 2018. Deployment of IoT applications on 5G edge. In IEEE International Conference on Future IoT Technologies.
[30] Teemu Koponen et al. 2010. Onix: A Distributed Control Platform for Large-scale Production Networks. In Proceedings of the OSDI.
[31] Open Networking Lab. 2017. ONOS SDN controller. https://github.com/opennetworkinglab/onos
[32] Bojie Li, Zhenyuan Ruan, Wencong Xiao, Yuanwei Lu, Yongqiang Xiong, Andrew Putnam, Enhong Chen, and Lintao Zhang. 2017. KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC. In Proceedings of the SOSP.
[33] Yuanjie Li, Zengwen Yuan, and Chunyi Peng. 2017. A control-plane perspective on reducing data access latency in LTE networks. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking.
[34] Heikki Lindholm et al. 2015. State Space Analysis to Refactor the Mobile Core. In Proceedings of the AllThingsCellular.
[35] Nick McKeown et al. 2008. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review 38, 2 (2008).
[36] Rui Miao, Hongyi Zeng, Changhoon Kim, Jeongkeun Lee, and Minlan Yu. 2017. SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs. In Proceedings of the ACM SIGCOMM Conference.
[37] Ali Mohammadkhan, KK Ramakrishnan, Ashok Sunder Rajan, and Christian Maciocco. 2016. CleanG: A Clean-Slate EPC Architecture and ControlPlane Protocol for Next Generation Cellular Networks. In Proceedings of the 2016 ACM Workshop on Cloud-Assisted Networking.
[38] Edgar Costa Molero, Stefano Vissicchio, and Laurent Vanbever. 2018. Hardware-Accelerated Network Control Planes. In Proceedings of the 17th ACM Workshop on Hot Topics in Networks (HotNets).
[39] Vasudevan Nagendra, Arani Bhattacharya, Anshul Gandhi, and Samir R. Das. 2019. MMLite: A Scalable and Resource Efficient Control Plane for Next Generation Cellular Packet Core. In Proceedings of the 2019 ACM Symposium on SDN Research.
[40] Srinivas Narayana, Anirudh Sivaraman, Vikram Nathan, Prateesh Goyal, Venkat Arun, Mohammad Alizadeh, Vimalkumar Jeyakumar, and Changhoon Kim. 2017. Language-Directed Hardware Design for Network Performance Monitoring. In Proceedings of the ACM SIGCOMM Conference.
[41] Barefoot Networks. 2018. NoviWare 400.5 for Barefoot Tofino chipset. https://noviflow.com/wp-content/uploads/NoviWare-Tofino-Datasheet.pdf
[42] Nokia Siemens Networks. 2012. Signaling is growing 50% faster than data traffic. https://docplayer.net/6278117-Signaling-is-growing-50-faster-than-data-traffic.html
[43] David Nowoswiat. 2013. Managing LTE Core Network Signaling Traffic. https://www.nokia.com/en_int/blog/managing-lte-core-network-signaling-traffic
[44] M. Pozza, A. Rao, A. Bujari, H. Flinck, C. E. Palazzi, and S. Tarkoma. 2017. A refactoring approach for optimizing mobile networks. In 2017 IEEE International Conference on Communications (ICC).
[45] Zafar Ayyub Qazi, Melvin Walls, Aurojit Panda, Vyas Sekar, Sylvia Ratnasamy, and Scott Shenker. 2017. A High Performance Packet Core for Next Generation Cellular Networks. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication.
[46] M. T. Raza, D. Kim, K. Kim, S. Lu, and M. Gerla. 2017. Rethinking LTE network functions virtualization. In IEEE 25th International Conference on Network Protocols (ICNP).
[47] Rinku Shah. 2018. Cuttlefish open source project. https://github.com/networkedsystemsIITB/cuttlefish
[48] Rinku Shah, Vikas Kumar, Mythili Vutukuru, and Purushottam Kulkarni. 2015. TurboEPC github code. https://github.com/rinku-shah/turboepc
[49] Rinku Shah, Mythili Vutukuru, and Purushottam Kulkarni. 2018. Cuttlefish: Hierarchical SDN Controllers with Adaptive Offload. In IEEE 26th International Conference on Network Protocols (ICNP).
[50] Muhammad Shahbaz, Sean Choi, Ben Pfaff, Changhoon Kim, Nick Feamster, Nick McKeown, and Jennifer Rexford. 2016. PISCES: A Programmable, Protocol-Independent Software Switch. In Proceedings of the ACM SIGCOMM Conference (SIGCOMM).
[51] Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet Transactions: High-Level Programming for Line-Rate Switches. In Proceedings of the ACM SIGCOMM Conference.
[52] Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, S. Muthukrishnan, and Jennifer Rexford. 2017. Heavy-Hitter Detection Entirely in the Data Plane. In Proceedings of the SoSR.
[53] Netronome Systems. 2017. vEPC Acceleration Using Agilio SmartNICs. https://www.netronome.com/media/documents/SB_vEPC.pdf
[54] Netronome Systems. 2018. Agilio CX SmartNIC. https://www.netronome.com/m/documents/PB_NFP-4000.pdf
[55] Sami Tabbane. 2016. Core network and transmission dimensioning. https://www.itu.int/en/ITU-D/Regional-Presence/AsiaPacific/SiteAssets/Pages/Events/2016/Aug-WBB-Iran/Wirelessbroadband/core%20network%20dimensioning.pdf
[56] Amin Tootoonchian and Yashar Ganjali. 2010. HyperFlow: A Distributed Control Plane for OpenFlow. In Proceedings of the INM/WREN.
[57] TRAI. 2017. Highlights of Telecom Subscription Data. https://main.trai.gov.in/sites/default/files/PR_60_TSD_Jun_170817.pdf
[58] P4 working group. 2017. Behavioral-model. https://github.com/p4lang/behavioral-model/tree/master/targets/simple_switch_grpc
[59] P4 working group. 2018. P4Runtime. https://github.com/p4lang/PI
[60] Xilinx. 2018. Xilinx FPGA. https://www.xilinx.com/products/silicon-devices/fpga.html
[61] Soheil Hassas Yeganeh and Yashar Ganjali. 2016. Beehive: Simple Distributed Programming in Software-Defined Networks. In Proceedings of the SoSR.
[62] Minlan Yu et al. 2010. Scalable Flow-based Networking with DIFANE. In Proceedings of the ACM SIGCOMM.

