TurboEPC: Leveraging Dataplane Programmability to Accelerate the Mobile Packet Core
Rinku Shah, Vikas Kumar, Mythili Vutukuru, Purushottam Kulkarni
ABSTRACT
Recent architectures of the mobile packet core advocate the separation of the control and dataplane components, with all signaling messages being processed by the control plane entities. This paper presents the design, implementation, and evaluation of TurboEPC, a redesign of the mobile packet core that revisits the division of work between the control and data planes. In TurboEPC, the control plane offloads a small amount of user state to programmable dataplane switches, using which the switches can correctly process a subset of signaling messages within the dataplane itself. The messages that are offloaded to the dataplane in TurboEPC constitute a significant fraction of the total signaling traffic in the packet core, and handling these messages on dataplane switches closer to the end-user improves both control plane processing throughput and latency. We implemented the TurboEPC design using P4-based software and hardware switches. The TurboEPC hardware prototype shows throughput and latency improvements by up to 102× and 98% respectively when the switch hardware stores the state of 65K concurrent users, and 22× and 97% respectively when the switch CPU is busy forwarding dataplane traffic at linerate, over the traditional EPC.

CCS CONCEPTS
• Networks → In-network processing; Programmable networks; Mobile networks.

KEYWORDS
LTE-EPC, cellular networks, programmable networks, in-network compute, smartNIC

ACM Reference Format:
Rinku Shah, Vikas Kumar, Mythili Vutukuru, Purushottam Kulkarni. 2020. TurboEPC: Leveraging Dataplane Programmability to Accelerate the Mobile Packet Core. In Symposium on SDN Research (SOSR '20), March 3, 2020, San Jose, CA, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3373360.3380839

1 INTRODUCTION
Software Defined Networking (SDN) is a network design paradigm that advocates the separation of the control plane of a network element, which makes the decision on how to handle network traffic, from the dataplane that does the actual packet forwarding. With SDN, the more complex control plane functionality can be logically centralized and implemented in agile software controllers, while the dataplane can be implemented in simpler, efficient forwarding switches. This principle of separating the control and dataplane has also been applied to the design of the mobile packet core, and is referred to as Control and User Plane Separation (CUPS) [2] by the telecom community. The mobile packet core, e.g., the 4G LTE EPC (Long Term Evolution Evolved Packet Core), consists of networking elements that perform control plane functionality such as authenticating users and setting up data sessions, and dataplane functionality of forwarding user traffic from the wireless radio network to external data networks (§2). With recent releases of 4G (and the new 5G standards) espousing the CUPS principle, the dataplane can move closer to the end users, enabling applications that require low latency data forwarding (e.g., self-driving cars).

EPC procedure | Number of transactions/sec
Attach | 9K
Detach | 9K
S1 release | 300K
Service request | 285K
Handover | 45K
(Network load when total subscribers in the core = 1M)
Table 1: Sample EPC load statistics [43, 55].

In this paper, we revisit the boundary between the control plane and dataplane in the CUPS-based design of the mobile packet core. Our work is motivated by two observations pertaining to the signaling traffic in the core. First, signaling traffic is growing rapidly [8, 42], fueled by smartphones, IoT devices and other end-user equipment that connect frequently to the network in short bursts. In fact, the signaling load in LTE is 50% higher than that of 2G/3G networks [42]. This high signaling load puts undue pressure on the packet core, making it difficult for operators to meet the signaling traffic SLAs [29]. Second, the signaling procedures can be classified into two types based on their frequency (see Table 1) and nature of processing.
A small percentage of the signaling traffic consists of procedures like the attach procedure (1–2% of total traffic, as per [43, 55]), which is executed when a user connects to the mobile network for the first time, or the handover procedure (~5%), which is executed when the user moves across regions of the mobile network. A significant fraction of the signaling traffic (~63–90%) is made up of procedures like the S1 release, which is invoked to release the forwarding state of the user when the user session goes idle for a brief period, and the service request procedure, which restores the user's forwarding state when the user becomes active again. Further, these two classes of signaling procedures also require access to different types of state during their processing. Attaching a user to the network entails authenticating the user using a network-wide subscriber database, and setting up the forwarding path of the user under mobility requires access to the global network topology. However, frequent signaling procedures like the S1 release and service request access only the user context of a single subscriber, and not network-wide global state.

The key idea of our work is that we can improve control plane performance of the mobile packet core by offloading a subset of the control plane procedures, like the S1 release and service request, from the control plane onto the dataplane switches. Our idea is inspired by recent advances in dataplane technologies, where dataplane switches are evolving from fixed function hardware towards programmable components that can forward traffic at line rate while being highly customizable [41, 53]. Since the S1 release and service request procedures only access and modify user-specific context, the handling of these procedures can be embedded into the packet processing pipeline of programmable dataplane switches, provided the required user context is made available in the switches. Offloading these frequent signaling procedures to the dataplane switches improves both control plane throughput (by utilizing spare switch capacity for handling signaling traffic) and latency (by handling signaling traffic closer to the end user at the switches). We will use the term offloadable procedures to describe signaling procedures in the mobile control plane that can be easily offloaded to programmable dataplane switches.

This paper describes TurboEPC (§3), a redesign of the LTE EPC mobile packet core, where offloadable control plane procedures are handled at programmable switches in the dataplane for better control plane performance. TurboEPC modifies the processing of the non-offloadable messages (like the attach procedure) in the control plane in such a way that a copy of the user-specific context information that is generated/modified during such procedures is pushed to dataplane switches closer to the end user. This user context is stored in the switches along with the forwarding state needed for dataplane processing, and is used to process offloadable signaling messages within the dataplane switch itself.

There are several challenges in realizing this idea. First, the control plane state stored in switches may be modified locally by the offloaded signaling messages, causing it to diverge from the "master copy" in the centralized control plane. This inconsistency in state may impact the correctness of the processing of other non-offloadable signaling messages in the control plane, but synchronizing the two copies of the state continuously would erode the performance gains of the offload itself. TurboEPC overcomes this challenge by synchronizing the offloaded control plane state with its master copy only when such state is required for the processing of some non-offloadable message, and piggybacks this state onto the said non-offloadable message itself. Second, dataplane switches have limited memory, and the contexts of millions of active users [55, 57] cannot possibly be accommodated within a single switch. To overcome this challenge, TurboEPC partitions user context across multiple switches as per operator policy, thereby increasing the probability that the user context can be stored within the dataplane. Third, switch failures can lead to loss of the latest version of the user context stored in switches, and result in inconsistencies in the control plane state of the user. TurboEPC overcomes this challenge by replicating user context across switches and implements a failover mechanism to tackle switch failures.

We implemented TurboEPC (§4) over a simplified mobile packet core consisting of the ONOS [31] SDN controller in the control plane and P4-based bmv2 software switches [58] in the dataplane. The control and dataplane components communicate using P4Runtime [59]. Our P4-based dataplane switch pipeline was also ported to P4-programmable Netronome Agilio CX smartNICs [53], helping us realize an implementation of TurboEPC on programmable hardware. Evaluation of our prototype (§5) shows that the software prototype of TurboEPC improves LTE EPC control plane throughput by 2.3× and reduces latency by 90% over the traditional CUPS-based EPC design under realistic traffic mixes, by utilizing spare dataplane switch capacity for signaling message processing. Our hardware prototype shows throughput and latency improvements by up to 102× and 98% respectively when the switch hardware stores the state of 65K concurrent users, and 22× and 97% respectively when the switch CPU is busy forwarding dataplane traffic at linerate.
While prior work has proposed several optimizations to the mobile packet core architecture (§6), to the best of our knowledge, we are the first to show that the control plane of mobile data networks can be accelerated by offloading signaling procedures onto programmable dataplane switches. Further, while we have evaluated our ideas over the 4G core, we believe that our contributions apply to the future 5G packet core [17] as well, because the design of the 5G dataplane maps closely to that in the CUPS-based 4G EPC. To summarize, the key contributions of this paper are:
• TurboEPC, a redesigned mobile packet core that offloads a significant fraction of signaling procedures from the control plane to programmable dataplanes, thereby improving performance.
• An implementation of TurboEPC over P4-based programmable software/hardware switches, to demonstrate the feasibility of our design.
• A quantification of the performance gains of TurboEPC over the traditional CUPS-based EPC design.

2 BACKGROUND & MOTIVATION
Mobile packet core architecture. The core of a mobile network connects the radio access network, consisting of user equipments (UEs) and base stations (eNBs), with other packet data networks, including the Internet. Figure 1(a) shows the architecture of the traditional 4G packet core, also called the LTE EPC (Long Term Evolution Evolved Packet Core). The main components of the EPC are the control plane Mobility Management Entity (MME), which handles all signaling procedures, and the dataplane Serving and Packet Gateways (SGW and PGW), which forward user traffic. The SGW and PGW also participate in control plane procedures pertaining to establishing and tearing down user forwarding paths. In order to enable independent scaling of the control and dataplane logic in the S/P-GWs, later releases of 4G LTE espoused the Control and User Plane Separation (CUPS) principle. Figure 1(b) shows the LTE EPC architecture with CUPS; the S/P-GWs are separated into control and dataplane entities, which communicate using a standardized protocol called PFCP (Packet Forwarding Control Protocol [2]). The upcoming 5G standard fully embraces the CUPS principle, as shown in Figure 1(c). In the 5G core, the Access and Mobility Management Function (AMF), Session Management Function (SMF), and other components handle signaling traffic in the control plane, while the User Plane Function (UPF) forwards traffic in the dataplane. The control and dataplane components once again communicate via PFCP. We base our discussion of TurboEPC in the rest of the paper on the CUPS-based EPC architecture shown in Figure 1(b). We assume that the MME and the control plane components of the S/P-GWs are implemented atop an SDN controller, and the dataplane of the S/P-GWs is implemented in SDN switches. Our ideas easily generalize to the 5G architecture, as well as other CUPS-based EPC implementations, e.g., if the control plane components were to be standalone applications.

LTE EPC procedures. Figure 2 briefly illustrates a subset of the LTE EPC control plane procedures. When a UE connects to an LTE network for the first time, the initial message sent from the UE via the eNB triggers the attach procedure in the core. During this procedure, the UE and the network mutually authenticate each other, using the user state stored in the Home Subscriber Subsystem (HSS), and establish security keys for use during future communication. Finally, the MME sets up the state required to forward user traffic through the core at the SGW and PGW that are on the path from the user to the external packet data network. The detach procedure reverses the processing of the attach procedure.

In the dataplane, user data packets are tunneled through the S/P-GWs using the GPRS Tunneling Protocol (GTP). The GTP header carries Tunnel Endpoint Identifiers (TEIDs) that uniquely identify the path of a user's traffic through the core, and the S/P-GWs in the core network route dataplane traffic based on the TEID values. Separate TEIDs are generated for each of the two links on the datapath (eNB-SGW and SGW-PGW) and for each of the two directions of traffic (uplink and downlink). When a user's IP data packet arrives from the wireless network at the eNB, it is encapsulated into a GTP packet, which is then transmitted over UDP/IP, first between the eNB and the SGW, and then between the SGW and PGW. The egress PGW strips the GTP header before forwarding the user's data to external networks.

If the UE goes idle without sending data for a certain duration (usually 10–30 seconds [21]), an S1 release procedure (Figure 2(b)) is invoked. During this procedure, the uplink/downlink forwarding rules for the user are deleted from the eNB, downlink forwarding rules are deleted from the SGW, and the connection state of the user changes to idle. Later, when the UE becomes active again, the UE initiates a service request procedure (Figure 2(c)) to restore the forwarding state that was released during the idle period at the dataplane gateways. The user state at the MME also changes back to being actively connected. When a UE moves from one network location to another, it triggers a handover procedure in the core. The handover procedure involves, among other things, releasing the user's forwarding context in the old dataplane gateways, and setting up the user's forwarding context along the new path. Note that the core network performs several other procedures beyond those discussed here; however, this description suffices to understand our work.
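To make the GTP-based tunneling above concrete, the short Python sketch below builds and parses the standard 8-byte GTP-U header around a user IP packet. This is an illustrative model only, not code from the TurboEPC prototype (which is written in P4); the TEID values and the teids mapping are invented for the example.

```python
import struct

# Minimal GTP-U (GTPv1, user plane) encapsulation sketch; not TurboEPC code.
GTP_FLAGS = (1 << 5) | (1 << 4)  # version=1, protocol type=GTP
GTP_MSG_GPDU = 0xFF              # G-PDU: carries an encapsulated user packet

def gtp_encap(teid: int, inner_packet: bytes) -> bytes:
    """Prepend the 8-byte GTP-U header; the result travels over UDP/IP on
    the eNB-SGW and SGW-PGW links."""
    return struct.pack("!BBHI", GTP_FLAGS, GTP_MSG_GPDU,
                       len(inner_packet), teid) + inner_packet

def gtp_decap(packet: bytes):
    """Return (teid, inner packet); e.g., what the egress PGW does before
    forwarding the user's data to external networks."""
    _, msg_type, length, teid = struct.unpack("!BBHI", packet[:8])
    assert msg_type == GTP_MSG_GPDU
    return teid, packet[8:8 + length]

# One user session needs four TEIDs: one per link and per direction.
teids = {("eNB-SGW", "up"): 0x1001, ("eNB-SGW", "down"): 0x1002,
         ("SGW-PGW", "up"): 0x2001, ("SGW-PGW", "down"): 0x2002}

tunneled = gtp_encap(teids[("eNB-SGW", "up")], b"<user IP packet>")
assert gtp_decap(tunneled) == (0x1001, b"<user IP packet>")
```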
Message | Security keys | Permanent identifiers | Temporary identifiers | IP address | Registration management state | Connection management state | User location | Forwarding state | Policy/QoS | Frequency (%)
Attach req | r+w | r | r+w | r+w | r+w | r+w | r+w | r+w | r+w | 0.5–1
Detach req | — | r | r+w | r+w | r+w | r+w | r+w | r+w | — | 0.5–1
Service req | — | — | r+w | r | — | r+w | — | r+w | — | 30–46
S1 release | — | — | r+w | r | — | r+w | — | r+w | — | 30–46
Handover req | r+w | r | r+w | r+w | r+w | r+w | r+w | r+w | r+w | 4–5
(r = read, w = write)
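The frequency column of this table is enough to reproduce the traffic split quoted in the introduction. The snippet below simply encodes that column and sums the offloadable share; it is an arithmetic illustration, not part of the system.

```python
# Frequency ranges (% of total signaling traffic) from the table above.
freq = {"attach": (0.5, 1.0), "detach": (0.5, 1.0),
        "service_request": (30.0, 46.0), "s1_release": (30.0, 46.0),
        "handover": (4.0, 5.0)}

# S1 release and service request touch only per-user state (no network-wide
# state), so they are the offloadable candidates in TurboEPC.
offloadable = {"service_request", "s1_release"}

lo = sum(freq[p][0] for p in offloadable)
hi = sum(freq[p][1] for p in offloadable)
# Prints 60%-92%, in line with the ~63-90% fraction quoted in Section 1
# (the published ranges are approximate).
print(f"offloadable share of signaling traffic: {lo:.0f}%-{hi:.0f}%")
```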
Switch | User state (in bytes) | Forwarding state (in bytes)
eNB | 0 | 32
SGW | 64 | 28
PGW | 0 | 19
Table 4: Size of state stored at TurboEPC switches.

and control plane messages to the EPC architecture, our ideas can be generalized to other systems where these definitions apply as well.

Figure 3 compares the CUPS-based traditional EPC design with TurboEPC. In the traditional CUPS-based EPC design (Figure 3(a)), the MME, SGW-C, and PGW-C components are implemented within a root SDN controller in the control plane, while the dataplane processing is performed in dataplane switches (SGW-D & PGW-D). The eNB forwards all control plane traffic to the root controller, which processes these messages and installs forwarding state at the S/P-GW switches. All control plane state, including the per-user context, is maintained only in the control plane. In contrast, in the TurboEPC design (shown in Figure 3(b)), the eNB forwards offloadable messages (e.g., S1 release and service request) to the dataplane S/P-GW switches.¹ To enable the processing of offloadable signaling messages in the dataplane, the root controller in TurboEPC pushes a subset of the generated/modified per-user context into the dataplane switches after the completion of every non-offloadable signaling message processing. The user context that is pushed to the dataplane consists of a mapping between the UE identifier and the following subset of information pertaining to the user: the tunnel identifiers (TEIDs) and the UE connection state (connected/idle). This user context is stored in dataplane switch data structures, much like the forwarding state, and consumes an additional ≈64 bytes of memory over and above the ≈32 bytes of forwarding state in our prototype (as shown in Table 4).

¹ We assume that the eNB is capable of analyzing the header of a signaling message to determine if it is offloadable or not.

Offloadable signaling messages that arrive at the edge dataplane switches (close to the eNB) are processed within the switch dataplane itself, by accessing and modifying the offloaded per-user context. For example, S1 release processing requires the TurboEPC switch dataplane to delete the uplink/downlink TEIDs at the eNB and the downlink TEID at the SGW, change the user connection state to idle, and update the GUTI if required. Because these offloadable messages reach the switch at least a few tens of seconds (idle timeout) after the context is pushed by the root controller, the state offload does not cause any additional delays while waiting for state to be synchronized. If the signaling message requires a reply to be sent back to the user, the reply is generated and sent by the switch dataplane as well.

Note that, after the user context has been modified by the offloadable signaling messages within the switch data structures, the latest copy of this state resides only in the dataplane. TurboEPC does not synchronize this state back to the root after every modification to the offloaded state, because doing so nullifies the performance gains due to offload in the first place. Instead, TurboEPC lazily synchronizes this state with its master copy at the root controller only when required. That is, all future offloadable messages will access the latest copy of the offloaded state within the dataplane itself, and non-offloadable messages that do not depend on this offloaded state will be directly forwarded to the root by the eNB. However, there are some non-offloadable messages in EPC (e.g., some types of handover messages) that require access to both the latest offloaded user context in the dataplane as well as the non-offloaded state stored in the root. These messages are first sent to the dataplane switches by the eNB, and any processing of the message that can be done without access to global state is performed at the switch itself. Next, the message is forwarded from the dataplane switch to the root controller, with a copy of the modified user context (which is subsequently invalidated at the switch) appended to the packet, in order to correctly complete the rest of the processing at the root.

We acknowledge that TurboEPC introduces a small amount of overhead during the processing of non-offloadable handover messages, since we need to piggyback the user context from the switch to the root controller, as described above. This overhead may be acceptable in current networks, because handover messages comprise only 4–5% [43, 55] of all signaling traffic. However, handover traffic can increase in future networks, e.g., with small cells in 5G. We plan to revisit our handover processing to reduce overhead in such use cases as part of our future work.

This basic design of TurboEPC faces two significant challenges. (i) A typical mobile core must handle millions of active connected users [55, 57], while switch memory is usually limited. For example, recent high-end programmable switches have a few tens of MB of memory available to store tables [41], which means that a single switch can only store user context information for a few 100K users in our current design. In fact, the Netronome programmable NIC hardware used in our prototype implementation [54] could only store user context information for 65K users. Therefore, it is unlikely that a single dataplane switch can accommodate the contexts of all users connected to an eNB. (ii) The latest version of the modified user context stored at the switches may be lost in the case of switch failures, making the UE's view and the network's view of the user's context inconsistent. We now describe how TurboEPC overcomes these challenges.
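The behavior described in this subsection can be summarized in a small Python model: the root installs per-user context, offloadable messages are processed and acknowledged locally (marking the context dirty), and a non-offloadable message lazily synchronizes state by carrying the dirty context up to the root. This is an illustrative sketch of our description, not the P4/ONOS prototype code; all class and field names are invented.

```python
from dataclasses import dataclass

@dataclass
class UserContext:
    """Per-UE state offloaded to a switch: TEIDs plus connection state
    (illustrative fields, not the prototype's exact ~64-byte layout)."""
    teid_uplink: int
    teid_downlink: int
    connected: bool = True
    dirty: bool = False      # modified locally since the root pushed it

class TurboEPCSwitch:
    def __init__(self):
        self.contexts = {}   # UE identifier -> UserContext

    def install_context(self, ue_id, ctx):
        """Called by the root controller after a non-offloadable procedure
        (e.g., attach) completes."""
        self.contexts[ue_id] = ctx

    def s1_release(self, ue_id):
        ctx = self.contexts[ue_id]
        ctx.connected = False    # forwarding rules torn down, UE goes idle
        ctx.dirty = True         # the latest copy now lives only here
        return "s1-release-ack"  # reply generated by the switch itself

    def service_request(self, ue_id):
        ctx = self.contexts[ue_id]
        ctx.connected = True     # forwarding state restored
        ctx.dirty = True
        return "service-request-ack"

    def to_root(self, ue_id, msg: bytes) -> bytes:
        """Lazy synchronization: piggyback the locally modified context on a
        non-offloadable message headed to the root, then invalidate the
        local copy."""
        ctx = self.contexts.pop(ue_id)
        return msg + b"|ctx:" + repr(ctx).encode()
```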
3.2 Partitioning for Scalability
In order to overcome single switch memory limitations, and maximize handling of offloadable messages at the dataplane, TurboEPC relies on multiple programmable switches in the core network. TurboEPC partitions the user context required to handle offloadable messages amongst multiple dataplane switches along the path from the eNB to the S/P-GW (possibly including the S/P-GW itself). Further, if the dataplane switches cannot accommodate all user contexts even with partitioning, some subset of the user contexts can be retained in the root controller itself. With this design, any given dataplane switch stores the contexts of only a subset of the users, and handles the offloadable signaling messages pertaining to only those users. The switches over which the partitioning of user context state is done can be connected to each other in one of two ways, as we describe below.

Series design. In the series design shown in Figure 4(a), the contexts of a set of users traversing a certain eNB to S/P-GW path in the network are split amongst a series of programmable switches placed along the path. When an offloadable control plane message arrives at one of the switches in the series, it looks up the user context tables to check if the state of the incoming packet's user exists on the switch. If it exists (a hit), the switch processes the signaling message as discussed in §3.1. If the user context is not found (a miss), the packet is forwarded to the next switch in the series until the last switch is reached. If the user context is not found even at the last switch, the message is forwarded to the root controller, and is processed like in the traditional EPC.

Parallel design. Figure 4(b) depicts a parallel design, where the user context is distributed amongst programmable switches located on multiple parallel network paths between the eNB and the S/P-GW in the network. The difference from the series design is that the eNB now needs to maintain information on how the user contexts are partitioned along multiple paths, and must forward offloadable messages of a certain user along the correct path that has the user's state. This entails the extra step of parsing the signaling message header to identify the user, and an additional table lookup at the eNB to identify the path to send the message on. Offloadable signaling messages that do not find the necessary user context at the switches on any of the parallel paths are forwarded to the root. While the series design leads to simpler forwarding rules at the eNB, the parallel design lends itself well to load balancing across network paths. Note that, while our current implementation supports only the simple series and parallel designs described above, a network could employ a combination of series and parallel designs, where user contexts are partitioned across multiple parallel paths from the eNB to the S/P-GWs, and are further split amongst multiple switches on each parallel path. Across all designs, the root controller installs suitable rules at all switches, to enable forwarding of signaling messages towards the switch that can handle them. §5 compares the performance of both designs, and evaluates the impact of partitioning state on TurboEPC performance.

Partitioning user context. The question of how best to partition user contexts across multiple programmable switches in a large network depends upon many factors, including the number of active users, the size of the core network, the capacity of the programmable switches, and the routing and traffic engineering policies employed within the network, and is beyond the scope of this paper. Another interesting question that we defer to future work is deciding which users should be handled at which switches. With the advent of new use cases such as vehicular automation, IoT, smart sensors, and AR/VR in next generation networks, it is becoming important to provide ultra-low latency and ultra-high reliability in processing the signaling traffic of some users. Subscribers who require low latency for their frequent signaling requests, but are not highly mobile (e.g., smart sensors), are ideal candidates to offload to the dataplane. It is also conceivable that an operator would wish to offload the contexts of premium subscribers. TurboEPC can support any such operator-desired placement policy.
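A few lines of Python capture the difference between the two lookup disciplines (reusing the TurboEPCSwitch sketch from §3.1; again an illustration, not prototype code):

```python
# Series: the message walks the chain of switches on the eNB->S/P-GW path
# until some switch holds the user's context; otherwise it reaches the root.
def series_lookup(chain, ue_id):
    for switch in chain:
        if ue_id in switch.contexts:
            return switch            # hit: processed at this switch (3.1)
    return "root"                    # miss everywhere: traditional path

# Parallel: the eNB itself keeps a partition table mapping each user to the
# path whose switch stores the context (one extra parse + lookup at the eNB).
def parallel_lookup(enb_partition_table, paths, ue_id):
    path = enb_partition_table.get(ue_id)
    if path is not None and ue_id in paths[path].contexts:
        return paths[path]
    return "root"
```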
3.3 Replication for Fault Tolerance
In TurboEPC, a subset of the user context is pushed into the dataplane switches during the attach procedure. This context is then modified in the dataplane tables during the processing of subsequent offloadable signaling messages. For example, the S1 release message changes the connection state in the context from connected to idle. In the case of a switch failure, such modifications could be lost, leaving the UE in an inconsistent state. For example, a UE might believe it is idle while a stale copy of the user context at the root controller might indicate that the user is actively connected.

To be resilient to such failure scenarios, TurboEPC stores the user context at one primary dataplane switch, and another secondary switch. During the processing of non-offloadable messages such as the attach procedure, the root controller pushes the user context to the primary as well as the secondary switch of that user. The root also sets up forwarding paths such that offloadable signaling messages of a user are directed to the primary switch of the user. Upon processing an offloadable message, the primary switch first synchronously replicates the updated user context at the secondary switch, before generating a response to the signaling message back to the user, as shown in Figure 4(c). Our current implementation uses simple synchronous state replication from the primary to one other secondary switch, and is not resilient to failures of both the primary and the secondary switch.
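Continuing the Python sketch from §3.1 and §3.2, primary/secondary replication and the root's failover step might look as follows (the enb_rules table and all names are hypothetical; the prototype implements this logic in the P4 switches and the root controller):

```python
from dataclasses import replace

class FaultTolerantSwitch(TurboEPCSwitch):
    """Primary switch that synchronously mirrors context updates to a
    secondary before the reply leaves the switch (cf. Figure 4(c))."""
    def __init__(self, secondary):
        super().__init__()
        self.secondary = secondary

    def install_context(self, ue_id, ctx):
        super().install_context(ue_id, ctx)
        self.secondary.install_context(ue_id, replace(ctx))  # root pushes to both

    def s1_release(self, ue_id):
        reply = super().s1_release(ue_id)
        # Replicate the updated context before acknowledging the UE, so an
        # acknowledged state change survives a primary failure.
        self.secondary.contexts[ue_id] = replace(self.contexts[ue_id])
        return reply

def fail_over(enb_rules, ue_ids, secondary):
    """Root-controller recovery: re-point the eNB's per-user redirection
    rules at the secondary, which already holds the replicated contexts."""
    for ue_id in ue_ids:
        enb_rules[ue_id] = secondary
```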
TurboEPC packet processing pipeline. We now briefly describe the P4-based packet processing pipeline in the TurboEPC dataplane switches (Figure 6). Incoming packets in an EPC switch are first run through a message redirection table that matches on various header fields to identify whether the incoming message is a signaling message, and if so, where it should be forwarded. This table is populated by the root controller to enable correct redirection of non-offloadable signaling messages to the root, and of offloadable messages to the switch that has the particular user's context. Packets that do not match the message redirection table continue along the pipeline, and are matched through multiple GTP forwarding tables for GTP-based dataplane forwarding (a sketch of this pipeline appears after Table 5).

Traffic Mix | Attach, Detach % | S1 release, Service request % | Handover %
Att-1 | 1 | 99 | 0
Att-5 | 5 | 95 | 0
Att-10 | 10 | 90 | 0
Att-50 | 50 | 50 | 0
HO-5 | 10 | 85 | 5
Typical [43] | 1–2 | 63–94 | 5
Table 5: LTE-EPC traffic mix used for experiments.
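As a rough model of the two-stage pipeline above (the prototype implements it as P4 match-action tables populated over P4Runtime; the Python below is only a sketch with descriptive, invented names):

```python
class SwitchPipeline:
    def __init__(self):
        # Both tables are populated by the root controller.
        self.msg_redirection = {}  # (is_signaling, ue_id) -> action
        self.gtp_forwarding = {}   # teid -> egress port

    def process(self, pkt):
        action = self.msg_redirection.get((pkt["is_signaling"], pkt.get("ue_id")))
        if action == "to_root":    # non-offloadable signaling message
            return "send to root controller"
        if action == "local":      # offloadable, this switch owns the context
            return "process in switch dataplane (3.1)"
        if action is not None:     # offloadable, context held elsewhere
            return f"forward towards switch {action}"
        # No redirection match: ordinary data packet, GTP-based forwarding.
        return f"egress port {self.gtp_forwarding.get(pkt['teid'], 'drop')}"
```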
[Figure 7: TurboEPC vs. traditional EPC: Throughput. Average throughput (req/s) across LTE-EPC traffic mixes Att-1, Att-5, Att-10, Att-50, HO-5, and Typical.]
[Figure, caption lost in extraction: end-to-end latency (ms) of Traditional-EPC vs. TurboEPC across the same traffic mixes.]
[Figure 9: Throughput with varying distance to core, and varying number of dataplane switches. TurboEPC 1-chain and 2-chain throughput vs. RTT to the core network (<1ms, 5ms, 10ms).]
[Figure, caption lost in extraction: TurboEPC 1/2/3/4-chain latency and Traditional-EPC latency vs. RTT to the core network.]
[Figure 11, caption lost in extraction: average throughput (req/s) and latency (ms) of Traditional-EPC, TurboEPC-Series(1/2/3/miss), and TurboEPC-Parallel(1/2/3/miss).]
[Figure 12, caption lost in extraction: average throughput over time (secs) during failover, with the Fail and Recover events marked.]
[Figure 13: TurboEPC latency during failover.]
like attach requests and handovers. Because offloadable messages form a significant fraction of signaling traffic, TurboEPC improves the overall control plane performance of the mobile packet core, even though a small fraction of signaling messages may see slightly degraded performance.

Series vs. parallel partitioning. Next, we perform experiments with the series vs. parallel state partitioning design variants of the TurboEPC software switch prototypes, to evaluate the performance impact of the additional complexity of these designs. This experiment was performed with traffic mix Att-1 of Table 5 (1% attach-detach requests), and results for other traffic mixes were similar. We use multiple (up to 3) TurboEPC switches in series and parallel configurations, and partition 100 active users uniformly over these switches. Besides these 100 users, our load generator also generates traffic on behalf of an additional 20 users whose contexts were not stored in the dataplane switches, to emulate the scenario where all contexts cannot be accommodated in the dataplane. Figure 11 shows the average control plane throughput and latency of the TurboEPC-Series(n) and TurboEPC-Parallel(n) designs, for varying number of switches n in series and parallel, both when the context of the users is found within one of the switches (hit) and when it is not (miss). We see from the figure that the TurboEPC throughput scales well when an additional switch becomes available to process offloadable signaling messages. The scaling is imperfect when there are 3 switches in series or parallel, because the eNB switch became the bottleneck in these scenarios. This eNB bottleneck is more pronounced in the case of the parallel design, because the eNB does extra work to look up the switch that has the user's context in the parallel design. We hope to tune the eNB software switch to ameliorate this bottleneck in the future.

While the throughput increases with extra TurboEPC switches, the control plane latency also increases due to extra hop traversals and extra table lookups, as compared to the basic TurboEPC design. This impact on latency is more pronounced in the series designs, where each switch adds an extra hop to latency. However, even with 3 switches in series or parallel, TurboEPC latency is still lower than that of the traditional EPC. We also see from the figure that the miss latency of offloadable message processing is worse than the message processing latency of the traditional EPC, because the messages undergo multiple table lookups within the dataplane before eventually ending up at the root controller.

TurboEPC fault tolerance. Next, we evaluate the fault tolerance of the TurboEPC design, by simulating a failure of the primary switch in the middle of an experiment and observing the recovery. Figure 12 shows the average throughput and Figure 13 shows the average latency of the fault-tolerant TurboEPC for an experiment of duration 1200 seconds, where the primary switch was triggered to fail after 600 seconds. Also shown in the graphs are the throughput and latency values of the basic TurboEPC without any fault tolerance, for reference. We see that the throughput of the basic TurboEPC is 40% higher and the latency is 33% lower than the fault-tolerant design, due to the absence of replication overhead. After the failure of the primary switch, we found that the root controller takes about 15 seconds to detect the primary switch failure, ∼2 ms to push rules to the eNB that route incoming packets to the secondary switch, and ∼30 ms to restart offloadable signaling message processing at the secondary switch. During this recovery period, we observed ∼200 signaling message retransmissions, but all signaling messages were eventually correctly handled by TurboEPC after the failure.

5.2 TurboEPC hardware prototype
We now evaluate our hardware-based TurboEPC prototype, built using the P4-programmable Netronome Agilio smartNIC [54].
Setup. The TurboEPC hardware setup was hosted on three Intel [...] offloadable message processing throughput and latency of a single TurboEPC hardware switch. We evaluate the maximum throughput with the smartNIC loaded with user state size varying from 100 to 65K. We found that the throughput does not vary when we add state of more users to the smartNIC. Also shown are the throughput and latency numbers for the traditional CUPS-based EPC (RTT to the root < 1ms) for reference. We see from the table that our TurboEPC hardware switch can successfully serve up to 65K users, while providing 102× higher throughput and 98% lower latency than the traditional EPC.

Performance with dataplane traffic. TurboEPC improves control plane throughput over the traditional EPC by leveraging the extra capacity at dataplane switches for offloadable signaling message processing. However, the performance gains of TurboEPC may be lower if the switch is busy forwarding dataplane traffic. We now measure the impact of this dataplane cross traffic on the control plane throughput of TurboEPC. We pump increasing amounts of dataplane traffic through our TurboEPC hardware switch (with state for 65K users) and measure the maximum rate at which the switch can process offloadable signaling messages while forwarding data traffic simultaneously. Figure 15 shows the signaling message throughput and latency as a function of the dataplane traffic forwarded by the TurboEPC hardware dataplane switch. We see from the figure that as the data traffic rate increases, the offloadable signaling message throughput decreases, and response latency varies between 100µs and 180µs. The throughput and latency of the traditional EPC (RTT to the root < 1ms) are also shown for reference in the figures. We observe that when the switch is idle, the hardware-based TurboEPC throughput is 102× higher, and the latency 98% lower, as compared to the traditional EPC. However, even when the switch is forwarding data at line rate (8Gbps), we observe throughput to be 22× higher and latency 97% lower than the traditional EPC, confirming our intuition that spare switch CPU can be used for handling offloaded signaling traffic. As part of future work, we plan to explore an adaptive offload design that offloads signaling message processing to the dataplane only when switches have spare processing capacity, an idea we have explored in our prior work [49].

[Figure 15: TurboEPC throughput with data traffic interference. Signaling throughput (log scale) vs. data traffic rate at the switch, from no data traffic up to 8Gbps.]

6 RELATED WORK
Optimizations to the Packet Core. Prior work includes several proposals that redesign the mobile packet core to achieve a diverse set of goals. SoftCell [23] accelerates 4G dataplane forwarding by offloading the packet route installation task to the edge switch; it further minimizes the forwarding table size by aggregating flow rules within the switch. While this work is primarily focused on optimizing dataplane processing, TurboEPC accelerates the control plane via offload of signaling message processing to the edge switch. CleanG [37], PEPC [45], SCALE [5], DMME [4], MMLite [39], MobileStream [10], DPCM [33], and other similar proposals [34, 44, 46] optimize 4G/5G control plane processing, much like TurboEPC. CleanG [37] and PEPC [45] refactor the EPC control plane processing to reduce the overhead of state transfer across components. SCALE [5] proposes a distributed design of the control plane, and horizontally scales the EPC control plane by distributing signaling load across multiple replicas. MMLite [39] proposes a stateless, scalable MME design by storing the user-specific state in shared memory. MobileStream [10] decomposes the traditionally monolithic control plane components and proposes the use of a streaming framework for scalability.
DPCM [33] modifies the EPC protocol by reducing the number of messages exchanged and by starting dataplane forwarding before completion of control plane processing. While these proposals advocate optimized architectures of the EPC control plane, none of them revisits the boundary between the EPC control and dataplanes. TurboEPC, on the other hand, revisits the split of functionality between the control plane software and dataplane switches, and proposes a refactoring of the mobile core with the goal of offloading a subset of control plane processing to programmable dataplane switches closer to the end user. Therefore, this body of work is orthogonal and complementary to ours, and TurboEPC can leverage these control plane optimizations for the processing of non-offloadable messages at the root controller.

Programmable Dataplanes. While the first wave of SDN research decoupled the control plane from the dataplane and made the control plane highly programmable, the second wave of SDN research has made even the dataplanes highly programmable, realizing the true vision of software defined networking. Today, dataplanes can be customized using P4 [6], a programming language to define packet processing pipelines. These software-defined dataplanes can then be compiled to run on diverse targets, e.g., software switches [50, 58], hardware programmable switches guaranteed to work at line rate [7, 11, 41, 51], FPGAs [60], and smart programmable NICs [54]. Further, these programmable dataplanes can be configured from software SDN controllers using standard protocols [35, 59]. Programmable dataplanes have enabled a variety of new applications within the dataplane, e.g., in-band network telemetry (INT) [27], traffic engineering [52], load balancing [12, 36], consensus [14, 15], traffic monitoring [40], key-value stores [25, 32], congestion control [26], and GTP header processing [3, 9]. Molero et al. [38] demonstrate the possibility of accelerating control plane functions such as failure detection/notification via offload to programmable dataplanes. TurboEPC takes this line of work one step further, and proposes the offload of frequent and simple signaling procedures to programmable switches.

Control Plane Scalability. With the SDN paradigm, a logically centralized control plane can potentially become a performance bottleneck, and prior work has identified two broad approaches to solve this control plane scalability challenge. Some SDN controllers [30, 56, 61] use the technique of horizontal scaling, where the incoming control plane traffic is distributed amongst multiple homogeneous SDN controllers, which cooperate to maintain a consistent view of the common global network-wide state amongst themselves using standard consensus protocols. In contrast, other SDN controllers [13, 18, 19, 49, 62] use hierarchical scaling to offload control plane functionality to lower levels of "local" SDN controllers that perform different functions. Our work is inspired by hierarchical SDN controllers but is quite different from them: we apply the idea of offloading computation from SDN controllers to dataplane switches in the CUPS-based mobile packet core.

7 CONCLUSION
This paper described TurboEPC, a mobile packet core design where a subset of signaling messages are offloaded to programmable dataplane switches in order to improve control plane performance. TurboEPC dataplane switches store a small amount of control plane state in switch tables, and use this state to process some of the more frequent signaling messages at switches closer to the edge. We implemented TurboEPC on P4-based software switches and programmable hardware, and demonstrated that offloading signaling messages to the dataplane significantly improves control plane throughput and latency.

ACKNOWLEDGEMENTS
We thank our shepherd Sonia Fahmy and the anonymous reviewers for their insightful feedback.

REFERENCES
[1] 3GPP. 2017. 5G 3GPP specifications. https://www.3gpp.org/ftp/Specs/archive/23_series/23.502/
[2] 3GPP. 2017. Control and User Plane Separation. http://www.3gpp.org/cups
[3] Ashkan Aghdai et al. 2018. Transparent Edge Gateway for Mobile Networks. In IEEE 26th International Conference on Network Protocols (ICNP).
[4] X. An, F. Pianese, I. Widjaja, and U. G. Acer. 2012. DMME: A distributed LTE mobility management entity. Bell Labs Technical Journal 17, 2 (2012), 97–120.
[5] Arijit Banerjee, Rajesh Mahindra, Karthik Sundaresan, Sneha Kasera, Kobus Van der Merwe, and Sampath Rangarajan. 2015. Scaling the LTE Control-plane for Future Mobile Access. In Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies.
[6] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-independent Packet Processors. SIGCOMM Computer Communication Review 44 (2014).
[7] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM Conference.
[8] Gabriel Brown. 2012. On Signalling Storm. Retrieved November 10, 2018 from https://blog.3g4g.co.uk/2012/06/on-signalling-storm-ltews.html
[9] Carmelo Cascone and Uyen Chau. 2018. Offloading VNFs to programmable switches using P4. In ONS North America.
[10] Junguk Cho, Ryan Stutsman, and Jacobus Van der Merwe. 2018. MobileStream: A Scalable, Programmable and Evolvable Mobile Core Control Plane Platform. In Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies.
[11] Sharad Chole, Andy Fingerhut, Sha Ma, Anirudh Sivaraman, Shay Vargaftik, Alon Berger, Gal Mendelson, Mohammad Alizadeh, Shang-Tse Chuang, Isaac Keslassy, Ariel Orda, and Tom Edsall. 2017. dRMT: Disaggregated Programmable Switching. In Proceedings of the ACM SIGCOMM Conference.
[12] Eyal Cidon, Sean Choi, Sachin Katti, and Nick McKeown. 2017. AppSwitch: Application-layer Load Balancing Within a Software Switch. In Proceedings of the APNet.
[13] Andrew R. Curtis et al. 2011. DevoFlow: Scaling Flow Management for High-performance Networks. In Proceedings of the ACM SIGCOMM.
[14] Huynh Tu Dang et al. 2018. Consensus for Non-Volatile Main Memory. In IEEE 26th International Conference on Network Protocols (ICNP).
[15] Huynh Tu Dang, Daniele Sciascia, Marco Canini, Fernando Pedone, and Robert Soule. 2015. NetPaxos: Consensus at Network Speed. In Proceedings of the ACM SIGCOMM SoSR.
[16] ETSI. 2017. The Evolved Packet Core. http://www.3gpp.org/technologies/keywords-acronyms/100-the-evolved-packet-core
[17] ETSI. 2018. 5G standards specification (23.501). https://www.etsi.org/deliver/etsi_ts/123500_123599/123501/15.02.00_60/ts_123501v150200p.pdf
[18] Luyuan Fang, Fabio Chiussi, Deepak Bansal, Vijay Gill, Tony Lin, Jeff Cox, and Gary Ratterree. 2015. Hierarchical SDN for the hyper-scale, hyper-elastic data center and cloud. In Proceedings of the SoSR.
[19] Soheil Hassas Yeganeh and Yashar Ganjali. 2012. Kandoo: A Framework for Efficient and Scalable Offloading of Control Applications. In Proceedings of the HotSDN.
[20] R. E. Hattachi. 2015. Next Generation Mobile Networks, NGMN. https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf
[21] Open Air Interface. 2016. EPC: S1 release. https://gitlab.eurecom.fr/oai/openairinterface5g/issues/16
[22] Aman Jain, Sunny Lohani, and Mythili Vutukuru. 2016. Opensource SDN LTE EPC. https://github.com/networkedsystemsIITB/SDN_LTE_EPC
[23] Xin Jin, Li Erran Li, Laurent Vanbever, and Jennifer Rexford. 2013. SoftCell: Scalable and Flexible Cellular Core Network Architecture. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies.
[24] Xin Jin, Xiaozhou Li, Haoyu Zhang, Nate Foster, Jeongkeun Lee, Robert Soule, Changhoon Kim, and Ion Stoica. 2018. NetChain: Scale-Free Sub-RTT Coordination. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18).
[25] Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soule, Jeongkeun Lee, Nate Foster, Changhoon Kim, and Ion Stoica. 2017. NetCache: Balancing Key-Value Stores with Fast In-Network Caching. In Proceedings of the SOSP.
[26] Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. HULA: Scalable Load Balancing Using Programmable Data Planes. In Proceedings of the SoSR.
[27] Changhoon Kim, Anirudh Sivaraman, Naga Katta, Antonin Bas, Advait Dixit, and Lawrence J Wobker. 2015. In-band network telemetry via programmable dataplanes. In ACM SIGCOMM.
[28] Dr. Kim. 2017. 5G stats. https://techneconomyblog.com/tag/economics/
[29] P. Kiss, A. Reale, C. J. Ferrari, and Z. Istenes. 2018. Deployment of IoT applications on 5G edge. In IEEE International Conference on Future IoT Technologies.
[30] Teemu Koponen et al. 2010. Onix: A Distributed Control Platform for Large-scale Production Networks. In Proceedings of the OSDI.
[31] Open Networking Lab. 2017. ONOS SDN controller. https://github.com/opennetworkinglab/onos
[32] Bojie Li, Zhenyuan Ruan, Wencong Xiao, Yuanwei Lu, Yongqiang Xiong, Andrew Putnam, Enhong Chen, and Lintao Zhang. 2017. KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC. In Proceedings of the SOSP.
[33] Yuanjie Li, Zengwen Yuan, and Chunyi Peng. 2017. A control-plane perspective on reducing data access latency in LTE networks. In Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking.
[34] Heikki Lindholm et al. 2015. State Space Analysis to Refactor the Mobile Core. In Proceedings of the AllThingsCellular.
[35] Nick McKeown et al. 2008. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review 38, 2 (2008).
[36] Rui Miao, Hongyi Zeng, Changhoon Kim, Jeongkeun Lee, and Minlan Yu. 2017. SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs. In Proceedings of the ACM SIGCOMM Conference.
[37] Ali Mohammadkhan, KK Ramakrishnan, Ashok Sunder Rajan, and Christian Maciocco. 2016. CleanG: A Clean-Slate EPC Architecture and Control Plane Protocol for Next Generation Cellular Networks. In Proceedings of the 2016 ACM Workshop on Cloud-Assisted Networking.
[38] Edgar Costa Molero, Stefano Vissicchio, and Laurent Vanbever. 2018. Hardware-Accelerated Network Control Planes. In Proceedings of the 17th ACM Workshop on Hot Topics in Networks (HotNets).
[39] Vasudevan Nagendra, Arani Bhattacharya, Anshul Gandhi, and Samir R. Das. 2019. MMLite: A Scalable and Resource Efficient Control Plane for Next Generation Cellular Packet Core. In Proceedings of the 2019 ACM Symposium on SDN Research.
[40] Srinivas Narayana, Anirudh Sivaraman, Vikram Nathan, Prateesh Goyal, Venkat Arun, Mohammad Alizadeh, Vimalkumar Jeyakumar, and Changhoon Kim. 2017. Language-Directed Hardware Design for Network Performance Monitoring. In Proceedings of the ACM SIGCOMM Conference.
[41] Barefoot Networks. 2018. NoviWare 400.5 for Barefoot Tofino chipset. https://noviflow.com/wp-content/uploads/NoviWare-Tofino-Datasheet.pdf
[42] Nokia Siemens Networks. 2012. Signaling is growing 50% faster than data traffic. https://docplayer.net/6278117-Signaling-is-growing-50-faster-than-data-traffic.html
[43] David Nowoswiat. 2013. Managing LTE Core Network Signaling Traffic. https://www.nokia.com/en_int/blog/managing-lte-core-network-signaling-traffic
[44] M. Pozza, A. Rao, A. Bujari, H. Flinck, C. E. Palazzi, and S. Tarkoma. 2017. A refactoring approach for optimizing mobile networks. In 2017 IEEE International Conference on Communications (ICC).
[45] Zafar Ayyub Qazi, Melvin Walls, Aurojit Panda, Vyas Sekar, Sylvia Ratnasamy, and Scott Shenker. 2017. A High Performance Packet Core for Next Generation Cellular Networks. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication.
[46] M. T. Raza, D. Kim, K. Kim, S. Lu, and M. Gerla. 2017. Rethinking LTE network functions virtualization. In IEEE 25th International Conference on Network Protocols (ICNP).
[47] Rinku Shah. 2018. Cuttlefish open source project. https://github.com/networkedsystemsIITB/cuttlefish
[48] Rinku Shah, Vikas Kumar, Mythili Vutukuru, and Purushottam Kulkarni. 2015. TurboEPC github code. https://github.com/rinku-shah/turboepc
[49] Rinku Shah, Mythili Vutukuru, and Purushottam Kulkarni. 2018. Cuttlefish: Hierarchical SDN Controllers with Adaptive Offload. In IEEE 26th International Conference on Network Protocols (ICNP).
[50] Muhammad Shahbaz, Sean Choi, Ben Pfaff, Changhoon Kim, Nick Feamster, Nick McKeown, and Jennifer Rexford. 2016. PISCES: A Programmable, Protocol-Independent Software Switch. In Proceedings of the ACM SIGCOMM Conference (SIGCOMM).
[51] Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet Transactions: High-Level Programming for Line-Rate Switches. In Proceedings of the ACM SIGCOMM Conference.
[52] Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, S. Muthukrishnan, and Jennifer Rexford. 2017. Heavy-Hitter Detection Entirely in the Data Plane. In Proceedings of the SoSR.
[53] Netronome Systems. 2017. vEPC Acceleration Using Agilio SmartNICs. https://www.netronome.com/media/documents/SB_vEPC.pdf
[54] Netronome Systems. 2018. Agilio CX SmartNIC. https://www.netronome.com/m/documents/PB_NFP-4000.pdf
[55] Sami Tabbane. 2016. Core network and transmission dimensioning. https://www.itu.int/en/ITU-D/Regional-Presence/AsiaPacific/SiteAssets/Pages/Events/2016/Aug-WBB-Iran/Wirelessbroadband/core%20network%20dimensioning.pdf
[56] Amin Tootoonchian and Yashar Ganjali. 2010. HyperFlow: A Distributed Control Plane for OpenFlow. In Proceedings of the INM/WREN.
[57] TRAI. 2017. Highlights of Telecom Subscription Data. https://main.trai.gov.in/sites/default/files/PR_60_TSD_Jun_170817.pdf
[58] P4 working group. 2017. Behavioral-model. https://github.com/p4lang/behavioral-model/tree/master/targets/simple_switch_grpc
[59] P4 working group. 2018. P4Runtime. https://github.com/p4lang/PI
[60] Xilinx. 2018. Xilinx FPGA. https://www.xilinx.com/products/silicon-devices/fpga.html
[61] Soheil Hassas Yeganeh and Yashar Ganjali. 2016. Beehive: Simple Distributed Programming in Software-Defined Networks. In Proceedings of the SoSR.
[62] Minlan Yu et al. 2010. Scalable Flow-based Networking with DIFANE. In Proceedings of the ACM SIGCOMM.