2019 IEEE Access - Preprint
Abstract: In recent years, the complexity of the network data plane and its requirements in terms of agility have increased significantly, with many network functions now implemented in software and executed directly in datacenter servers. To avoid bottlenecks and to keep up with the ever increasing network speeds, recent approaches propose to move the software packet processing in kernel space using technologies such as eBPF/XDP, or to offload (part of) it in specialized hardware, the so-called SmartNICs. This paper aims at guiding the reader through the intricacies of the above mentioned technologies, leveraging SmartNICs to build a more efficient processing pipeline and providing concrete insights on their usage for a specific use case, namely, the mitigation of Distributed Denial of Service (DDoS) attacks. In particular, we enhance the mitigation capabilities of edge servers by transparently offloading a portion of DDoS mitigation rules in the SmartNIC, thus achieving a balanced combination of the XDP flexibility in operating traffic sampling and aggregation in the kernel, with the performance of hardware-based filtering.
We evaluate the performance in different combinations of host and SmartNIC-based mitigation, showing that offloading part of the DDoS network function in the SmartNIC can indeed optimize the packet processing but only if combined with additional processing on the host kernel space.
I. INTRODUCTION
With the recent trend of “network softwarization”, promoted by emerging technologies such as Network Function Virtualization (NFV) and Software Defined Networking (SDN), system administrators of data center and enterprise networks have started to replace dedicated hardware-based middleboxes with virtualized Network Functions (NFs) running on commodity servers and end hosts [1]–[6]. This radical change has facilitated the provisioning of advanced and flexible network services, ultimately helping the system administrators to cope with the rapid changes on service requirements and networking workloads.
Unfortunately, the ever growing network capacity installed in data center and enterprise networks requires a highly flexible low-latency network processing, which is hardly achievable with standard packet processing mechanisms implemented in the operating systems of servers and end-hosts. Common solutions rely on kernel bypass approaches, such as DPDK [7] and Netmap [8], which map the network hardware buffers directly to user space memory, hence bypassing the operating system. Although these technologies bring an unquestionable performance improvement, they also have two major limitations. First, they take the ownership of one (or more) CPU cores, thus permanently stealing precious CPU cycles from other tasks (NFs deployed on the servers, or user applications running on the end hosts). Second, they require installing additional kernel modules or updating the network card driver, operations that are not always possible in production networks.
Recent technologies such as eBPF [9], [10] and eXpress Data Path (XDP) [11] offer excellent processing capabilities without requiring the permanent allocation of dedicated resources in the host; eBPF programs combined with XDP are executed at the earliest level of the Linux networking stack, directly upon the receipt of a packet and immediately after the driver RX queues. Furthermore, eBPF/XDP are included in vanilla Linux kernels, hence avoiding the need to install custom kernel modules or additional device drivers.
To further reduce the workload on the precious general-purpose CPU cores of the servers, system administrators have resumed the old idea of introducing programmable intelligent networking adapters (a.k.a., SmartNICs) in their servers [12], [13], hence combining the flexibility of software network functions with the improved performance of the hardware NIC acceleration. SmartNICs offer hardware accelerators that make it possible to partially (or fully) offload packet processing functions; examples include load balancing [14], key-value stores [15] or more generic flow-level network functions [16], [17]. On the other hand, SmartNICs may present additional challenges due to their limited memory and computation capabilities compared to current high-performance servers.
In this paper we consider the potential of exploiting SmartNICs on a specific use case, i.e., to mitigate volumetric DDoS attacks, which are considered as one of the major threats in today’s Internet, accounting for 75.7% of the total DDoS attacks [18]–[20]. While the detection of DDoS attacks is a largely studied problem in the literature with several algorithms proposed to rapidly and efficiently detect an ongoing attack, in this paper we focus on the challenges related to the DDoS attack mitigation; in particular, we explore how the recent advances on the host data-plane acceleration can be used to adequately handle the large speeds required by today’s networks.
This paper provides the following contributions. First, we analyze the various approaches that can be used to design an efficient and cost-effective DDoS mitigation solution. As generally expected, our results show that offloading the mitigation task to the programmable NIC yields significant performance improvements; however, we demonstrate also
that, due to the memory and compute limitations of current SmartNIC technologies, a fully offloaded solution may lead to deleterious performance. Second, as a consequence of the previous findings, we propose the design and implementation of a hybrid mitigation pipeline architecture that leverages the flexibility of eBPF/XDP to handle different types of traffic and attackers and the efficiency of the hardware-based filtering in the SmartNIC to discard traffic from malicious sources. Third, we present a mechanism to transparently offload part of the DDoS mitigation rules into the SmartNIC, which takes into account the most aggressive sources, i.e., the ones that largely impact on the mitigation effectiveness.
The rest of the paper is structured as follows. Section II presents a high-level overview of eBPF and XDP, together with the SmartNIC and TC Flower, the flow classifier of the Linux traffic control kernel subsystem. Section III analyzes the different approaches that can be used to build an efficient DDoS mitigation solution. Section IV presents the design of an architecture that uses the above mentioned technologies to both detect and mitigate DDoS attacks, including the offloading algorithm adopted to install the rules into the SmartNIC (Section IV-A1), while keeping the flexibility and improved performance of the in-kernel XDP packet processing. Finally, Section V provides the necessary evidence for the previous findings, Section VI briefly discusses the related works and Section VII concludes the paper.
II. BACKGROUND
A. extended Berkeley Packet Filter (eBPF)
The extended Berkeley Packet Filter (eBPF) is an enhanced version of the original BPF virtual machine [21], originally developed as a kernel packet filtering mechanism for the BSD operating system and used by tools such as tcpdump. Compared to the original version, eBPF enables the execution of custom bytecode (either interpreted or compiled just-in-time) at various points of the Linux kernel in a safe manner. Furthermore, thanks to the support from the Clang/LLVM compiler, eBPF programs can be written in a restricted-C language, which is then compiled into the corresponding eBPF object file that can be loaded into the kernel through the apposite bpf() system call. In addition to the improved and enriched instruction set, eBPF offers several pre-defined data structures (e.g., hash map, lru map, array) that can be read/written from either kernel or userspace programs, hence providing the possibility to modify the behavior of an eBPF program based upon dynamically changing operating conditions. Moreover, it provides helper functions that can either be used to implement complex features that may not be feasible in the eBPF restricted-C, or to interact with kernel-level functionalities. Finally, eBPF programs can be cascaded in order to create larger service chains. The above additional capabilities allow eBPF to provide its functions in a broad range of kernel-level use cases, such as tracing, security and networking. In particular, in the latter case, this special-purpose event-driven virtual machine enables arbitrary packet processing on incoming/outgoing traffic directly in the Linux kernel, with the possibility to re-configure the existing eBPF programs to adapt to the (dynamically changing) operating conditions. This provides a unique option for flexibility and efficiency that was not available before.
1) eXpress Data Path (XDP): Networking eBPF programs can be attached to different points of the Linux stack. Starting from Linux kernel v4.8, the eXpress Data Path (XDP) provides the possibility to execute those programs at the lowest level of the TCP/IP stack, in the NIC driver itself, before the allocation of costly kernel data structures (e.g., sk_buff), thus achieving the best possible packet processing performance in the kernel stack. As a consequence, they represent the best choice to detect and drop malicious packets with minimal consumption of the host CPU resources, and will represent one of the key technologies exploited in this paper.
B. SmartNICs
Smart Network Interface Cards (SmartNICs) are intelligent adapters used to boost the performance of servers by offloading (part of) the network processing workload from the host CPU to the NIC itself [22]. Although the term SmartNIC is being widely used in the industry and academic world, there is still some confusion over the precise definition. We consider traditional NICs the devices that provide several pre-defined offloaded functions (e.g., transmit/receive segmentation offload, checksum offload) without including a fully programmable processing path, e.g., which may involve the presence of a general-purpose CPU on board. In our context, a SmartNIC is a NIC equipped with a fully-programmable system-on-chip (SoC) multi-core processor that is capable of running a fully-fledged operating system, offering more flexibility and hence potentially taking care of any arbitrary network processing task. This type of SmartNIC can also be enhanced with a set of specialized hardware functionalities that can be used to accelerate specific classes of functions (e.g., Open vSwitch data-plane) or to perform generic packet and flow-filtering. On the other hand, they have limited compute and memory capabilities, making it not always possible (or efficient) to completely offload all types of tasks. Furthermore, SmartNICs feature their own operating system and therefore may have to be handled separately from the host. For instance, offloading a network task to the SmartNIC may require the host to have multiple interactions with the card, such as to compile and inject the new eBPF code, or to execute additional commands (either on the host, or directly on the card) to exploit the available features, such as configuring hardware co-processors. Finally, no current standard exists to interact with SmartNICs, hence different (and often proprietary) methods have to be implemented when the support of several manufacturers is required.
C. TC Flower
The Flow Classifier is a feature of the Linux Traffic Control (TC) kernel subsystem that provides the possibility to match, modify and apply different actions to a packet based on the flow it belongs to. It offers a common interface for hardware vendors to implement an offloading logic within their devices; when a TC Flower rule is added, active NIC drivers check if that rule is supported in hardware; in that case the rule is
pushed to the physical card, causing packets to be directly matched in the hardware device, hence resulting in greater throughput and a decrease of the host CPU usage.
TC Flower represents a promising technology that can hide the differences between different hardware manufacturers, but it is not (yet) able to support all the high-level features that may be available in modern SmartNICs.
III. DDOS MITIGATION: APPROACHES
Once a DDoS attack is detected, efficient packet dropping is a fundamental part of a DDoS attack mitigation solution. In a typical DDoS mitigation pipeline, a set of mitigation rules are deployed in the server’s data plane to filter the malicious traffic. The strategy used to block the malicious sources may be determined by several factors such as the characteristics of the server (e.g., availability of a SmartNIC, its hardware capabilities), the characteristics of the malicious traffic (e.g., number of attackers) or the type and complexity of the rules that are used to classify the illegitimate traffic. In particular, we envision the following three approaches.
1) Host-based mitigation: In this case all traffic (either malicious or legitimate) is processed by the host CPU, which drops incoming packets that match a given blacklist of malicious sources; this represents the only viable option if the system lacks any underlying hardware speedup.
All the host-based mitigation techniques and tools used today fall in two different macro-categories depending on whether packets are processed at kernel or user-space level. Focusing on Linux-based systems, the first category includes iptables and its derivatives, such as nftables, which represent the main tools used to mitigate DDoS attacks. It allows expressing complex policies on the traffic, filtering packets inside the netfilter subsystem. However, the deep level in the networking stack where the packet processing occurs causes poor performance when coping with the increasing speed of today’s DDoS attacks, making this solution practically unfeasible, as demonstrated in Section V.
As opposed to kernel-level processing, a multitude of fast packet I/O frameworks relying on specialized NIC/networking drivers and user-space processing have been built over the past years. Examples such as Netmap [8], DPDK [7], PF_RING ZC [23] rely on a small kernel component that maps the NIC device memory directly to user space, hence making it directly available to (network-specialized) userland applications instead of relying on normal kernel data-path processing. This approach provides huge performance benefits compared to the standard kernel packet processing but incurs several non-negligible drawbacks. First of all, these frameworks require taking the exclusive ownership of the NIC, so that all packets received are processed by the userspace application. This means that, in a DDoS mitigation scenario, packets belonging to legitimate sources have to be inserted back into the kernel, causing unnecessary packet copies that slow down the performance¹. Furthermore, these frameworks require the fixed allocation of one (or more) CPU cores to the above programs, independently from the presence of an ongoing attack, hence reducing the performance-cost ratio, as precious CPU resources are no longer available for normal processing tasks (e.g., virtual machines).
¹ It is worth mentioning that Netmap has a better kernel integration compared to DPDK; in fact, it is possible to inject packets back into the kernel by just passing a pointer, without any copy. However, it is still subject to a high CPU consumption compared to eBPF/XDP.
XDP can be considered as a mix of the previous approaches. It is technically a kernel-space framework, although XDP programs can be injected from userspace to the kernel, after guaranteeing that all security properties are satisfied. XDP programs are executed in the kernel context but as early as possible, well before the netfilter framework, hence providing an improvement of an order of magnitude compared to iptables. The adoption of XDP to implement packet filtering functionalities has grown over the years: (i) its perfect integration with the Linux kernel makes it more efficient to pass legitimate packets up to the stack, (ii) its simple programming model makes it easy to express customized filtering rules without taking care of low-level details such as those required by common user-space frameworks and (iii) its event-driven execution gives the possibility to consume resources only when necessary, providing a perfect trade-off between performance and CPU consumption.
2) SmartNIC-based mitigation: If the server is equipped with a SmartNIC, an alternative approach would be to offload the entire mitigation task to this device. This makes it possible to dedicate all the available resources on the host CPU to the target workloads, operating only on the legitimate traffic, freeing the host CPU from spending precious CPU cycles in the mitigation.
However, although SmartNICs (by definition) support arbitrary data path processing, they often differ on how this can be achieved. Possible options range from running a custom executable, which should already be present on the card, to dynamically injecting a new program created on the fly, e.g., thanks to technologies such as XDP or P4, or to directly compiling those programs into the hardware device [24]. This makes the implementation of offloading features that run on cards from multiple manufacturers more cumbersome.
In our context, we envision two different options: (i) exploit any hardware filter (if available) in the SmartNIC and, if the number of blacklisted addresses exceeds the capability of the hardware (which may be likely, given the typical size of the above structure), block the rest of the traffic with a custom dropping program (e.g., XDP) running on the NIC CPU; (ii) block all the packets in software, running entirely on the SmartNIC CPU, e.g., in case the card does not have any hardware filtering capability. In both cases, the surviving (benign) traffic is redirected to the host where the rest of server applications are running. An evaluation of the above possibilities will be carried out in Section V.
3) Hybrid (SmartNIC + XDP Host): An alternative strategy that combines the advantages of the previous approaches would be to adopt a hybrid solution where part of the malicious traffic is dropped by the SmartNIC (reducing the overhead on the host’s CPU) and the remaining part is handled on the host, possibly leveraging the much greater processing power available in modern server CPUs compared to the one available in embedded devices.
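As a rough illustration of the hybrid idea (part of the blacklist enforced in the SmartNIC, the rest on the host), the split can be sketched as below. This is only a sketch under stated assumptions, not the authors' implementation: the function name `partition_blacklist`, the per-source rate dictionary, and the choice of sending the highest-rate sources to the NIC are all hypothetical and chosen for the example.

```python
from typing import Dict, Set, Tuple


def partition_blacklist(
    drop_rates: Dict[str, float], hw_capacity: int
) -> Tuple[Set[str], Set[str]]:
    """Split blacklisted sources between SmartNIC hardware and the host.

    drop_rates maps a source IP to its observed packet rate (hypothetical input).
    hw_capacity is the number of entries the NIC hardware table can hold.
    """
    if hw_capacity < 0:
        raise ValueError("hw_capacity must be non-negative")
    # Rank sources by rate, highest first; ties are broken by IP for determinism.
    ranked = sorted(drop_rates, key=lambda ip: (-drop_rates[ip], ip))
    nic_rules = set(ranked[:hw_capacity])   # would be dropped in hardware
    host_rules = set(ranked[hw_capacity:])  # left to the host program (e.g., XDP)
    return nic_rules, host_rules


# Usage with made-up rates (packets per second) and a two-entry hardware table.
rates = {"10.0.0.1": 900.0, "10.0.0.2": 50.0, "10.0.0.3": 400.0, "10.0.0.4": 10.0}
nic, host = partition_blacklist(rates, hw_capacity=2)
print(sorted(nic))   # ['10.0.0.1', '10.0.0.3']
print(sorted(host))  # ['10.0.0.2', '10.0.0.4']
```

The two returned sets are disjoint and together cover the whole blacklist, so every blacklisted source is handled in exactly one place.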
In this scenario, we exploit the fixed hardware functions commonly available in the current SmartNICs to perform stateless matching on selected packet fields and apply simple actions such as modify, drop or allow packets. To avoid redirecting all the traffic to the (less powerful) SmartNIC CPU, we could let it pass through the above hardware tables (where the match/drop is performed at line rate) and forward the rest of the packets to the host, where the remaining part of the mitigation pipeline is running. However, given the limited number of entries often available in the above hardware tables, which are not enough to contain the large number of mitigation rules needed during a large DDoS attack, the whole list of dropping targets is partitioned between the NIC and the host dropping program (e.g., XDP). This requires specific algorithms to perform this splitting, which should take into account the difference in terms of supported rules and their importance. Interestingly, this scenario in which the companion filtering XDP program is executed in the server is also compatible with some traditional NICs that support fixed hardware traffic filtering, such as Intel cards with Flow Director². In this case, the mitigation module can use the card-specific syntax (e.g., Flow Director commands) to configure filtering rules, with the consequent decrease of the filtering processing load in the host.
² The Flow Director is an Intel feature that supports advanced filters and packet processing in the NIC; for this reason it is often used in scenarios where packets are small and traffic is heavy (e.g., DoS attacks).
IV. ARCHITECTURE AND IMPLEMENTATION
This section presents a possible architecture that can be used to compare the previous three approaches in the important use case of the DDoS mitigation, enabling a fair comparison of their respective strengths and weaknesses in the implementation of an efficient and cost-effective mitigation pipeline. In particular, we present the different components constituting the proposed architecture (shown in Figure 1) and their role, together with some implementation details that result from the use of the assessed technologies.
[Fig. 1: High-level architecture of the system. Userspace: DDoS detection / mitigation logic (TC Flower, mitigation rate monitor, DDoS attack detection; insert and monitor DDoS rules) and applications. Kernel (eBPF sandbox): filtering XDP program with blacklist and counters (BPF_HASH), followed via tail call by swappable feature extraction XDP programs with IP src-dst statistics (BPF_HASH); packets are either dropped or passed.]
A. Mitigation
The first program encountered in the pipeline is the filtering module, which matches the incoming traffic against the list of blacklisted entries to drop packets coming from malicious sources; surviving packets are redirected to the host where additional (more advanced) checks can be performed before redirecting packets directly to the next program in the pipeline (i.e., the feature extraction).
Although our architecture is flexible enough to instantiate the filtering program in different locations (e.g., SmartNIC, Host, and even partitioned across the two above), at the beginning we instantiate an XDP filtering program in the host in order to obtain the necessary traffic information and decide the best mitigation strategy. If the userspace DDoS mitigation module recognizes the availability of the hardware offload functionality in the SmartNIC, it starts adding the filtering rules into the hardware tables, causing malicious packets to be immediately dropped in hardware. However, since those tables often have a limited size (typically ∼1-2K entries), we place the most active top-K malicious talkers in the SmartNIC hardware tables, where K is the size of those tables, while the remaining ones are filtered by the XDP program running either on the SmartNIC CPU or on the host, depending on a configuration option that enables us to compare the results with different operating conditions.
1) Offloading algorithm: The selection of the top-K malicious talkers that are most appropriate for hardware offloading is carried out by the rate monitor module, which computes a set of statistics on the dropped traffic and applies a hysteresis-based function to predict the advantages of possibly modifying the list of offloaded rules that are active in the SmartNIC. In fact, altering this list requires either computational resources or time (in our card a single rule update may require up to 2 ms), which may be unnecessary if the rank of the new top-K rules does not effectively impact on the mitigation effectiveness.
The pseudo-code of our algorithm is shown in Algorithm 1. First, it computes a list of the global top-K sources, which contains both SmartNIC and XDP entries sorted in descending order according to their rate, and a second list containing only the offloaded entries, i.e., the ones present in the SmartNIC hardware tables, which is arranged in ascending order. Next, it computes the difference of the above lists, resulting in two lists containing two disjoint sets of elements; the first list contains all the candidate rules that are not yet in the SmartNIC and the second list includes the SmartNIC entries that are not in the top-K anymore. At this point, starting from the first element of the former list, it calculates the possible benefit obtained by removing the first entry of the second list (given by the ratio between the rate of the two entries) and inserting this new entry in the SmartNIC; if the value is greater than a certain threshold, the entry is moved into the offloaded list and the algorithm continues with the next entry. This threshold is adjusted according to the current volume of DDoS traffic and it is inversely proportional to it; this avoids unnecessary changes in the top-K SmartNIC list when the traffic rate is low (compared to the maximum achievable rate), which may bring a negligible improvement. On the other hand, it increases the
update likelihood when the volume of traffic is close to the maximum achievable rate; in this scenario, where the system is overloaded, mitigating even slightly more aggressive talkers may introduce substantial performance benefits.
Algorithm 1 Offloading algorithm
Input: K, the max # of supported SmartNIC entries
Output: υ'_k ← The list of SmartNIC entries.
1: γ_k ← TOP-K Global entries
2: υ_k ← TOP-K SmartNIC entries
3: SORTDESCENDING(γ_k)
4: SORTASCENDING(υ_k)
5: γ'_k ← γ_k − υ_k    ▷ Remove already offloaded entries
6: υ'_k ← υ_k − γ_k    ▷ List of non TOP-K rules
7: for each γ'_{i,k} ∈ γ'_k do
8:     β_i ← OFFLOADGAIN(γ'_{i,k}, υ'_{i,k})
9:     if β_i ≥ threshold then
10:        υ'_k ← υ'_k − υ'_{i,k}    ▷ Remove old entry from offload list
11:        υ'_k ← υ'_k + γ'_{i,k}    ▷ Add new entry into offload list
12:    end if
13: end for
B. Feature extraction
Although not strictly belonging to the mitigation pipeline, the feature extraction module monitors the incoming traffic and collects relevant parameters required by the mitigation algorithm (e.g., counting the number of packets for each combination of source and destination hosts). Being placed right after the mitigation module, it receives all the (presumed) benign traffic that has not been previously dropped so that it can be further analyzed and then passed up to the target applications. XDP represents the perfect technology to implement this component since it provides (i) the low overhead given by the kernel-level processing and (ii) the possibility to dynamically change the behavior of the system by re-compiling and re-injecting (in the kernel) an updated program when we require the extraction of a different set of features. Moreover, XDP offers the possibility to export the extracted information into specific key-value data structures shared between the kernel and userspace (i.e., where the DDoS attack detection algorithm is running) or to directly send the entire packet up to userspace if a more in-depth analysis is needed.
In the former case, data are stored in a per-CPU eBPF hash map, which is periodically read by the userspace attack detection application. Since multiple instances of the same XDP program are executed in parallel on different CPU cores, each one processing a different packet, the use of a per-CPU map guarantees very fast access to data thanks to its per-core dedicated memory; consequently data are never realigned with the other caches present on other CPU cores, avoiding the cost of cache synchronization. As a result, each instance of the feature extraction works independently, saving the statistics of each IP source/destination on its own private map. In the latter case, a specific eBPF helper is used to copy packets to a perf event ring buffer, which is then read by the userspace application.
Analysis and Aggregation. Computed traffic statistics are retrieved from each kernel-level hash-map, aggregated by the companion userspace application and saved in memory for further processing. However, this process was found to be relatively slow; our tests report an average of 30 µs to read a single entry from the eBPF map, requiring more than ten seconds to process the entire dataset in case of large DDoS attacks (e.g., ∼300K entries). In fact, eBPF does not provide any possibility to read an entire map within a single bpf() system call, hence requiring to read each single value separately. As a consequence, to guarantee coherent data to the userspace detection application, we should lock the entire table while reading the values, but this would result in the impossibility for the kernel to process the current incoming traffic for a considerable amount of time.
To avoid the above problem, we adopted a swappable dual-map approach, in which the userspace application reads data from a first eBPF map that represents a snapshot of the traffic statistics at a given time, while the XDP program computes the traffic information for the incoming packets received in the previous timespan, and saved in a second map. This process is repeated every time the periodic user-space detection process is triggered, hence allowing the detection algorithm to always work with consistent data. From the implementation point of view, we opted for a swappable dual-program approach instead of a swappable dual-map because of its reduced swapping latency. We create two feature extraction XDP programs, each one with its own hash-map, and swap them atomically by asking the filtering module to dynamically update the address of the next program in the pipeline, which basically means updating the target address of an assembly jump instruction.
C. Detection
The identification of a DDoS attack is performed by the detection module, which operates on the traffic statistics presented in the previous section and exploits the retrieved information to identify the right set of malicious sources, which are then inserted in the blacklist map used by the filtering module to drop the traffic.
Since the selection of the best mitigation algorithm is out of the focus of this paper, we provide here only a small description of the possible choices that, however, need to be carefully selected depending on the characteristics of the environment and the type of workloads running on the end-hosts. In fact, different approaches are available [19], [25] falling in two main categories: (i) anomaly-based detection mechanisms such as entropy-based approaches [26]–[28], used to detect variations in the distribution of traffic features observed in consecutive timeframes and (ii) signature-based approaches that employ a-priori knowledge of attack signatures to match incoming traffic and detect intrusions.
It is important to note that the type of detection algorithm may influence the exported traffic information on the feature extraction module; however, thanks to the excellent programmability of XDP we can change the behavior of the program without impacting on the rest of the architecture.
D. Rate Monitor
Sometimes, a given detection algorithm may erroneously detect some legitimate sources as attackers. To counter this
host is removed by mistake, the detection algorithm will re-add it to the list of malicious sources in the next iteration.
V. PERFORMANCE EVALUATION
This section provides an insight of the benefits of Smart-
A. Test environment
Our testbed includes a first machine used as packet generator, which creates a massive DDoS attack with an increasing number of attack sources, and a second server running the DDoS mitigation pipeline. Both servers are equipped with an Intel Xeon E3-1245 v5 with a quad-core CPU @3.50GHz, 8MB of L3 cache and two 16GB DDR4-2400 RAM modules, running Ubuntu 18.04.2 LTS and kernel 4.15. The two machines are linked with two 25Gbps SmartNICs, with each port directly connected to the corresponding one of the other server.
We used Pktgen-DPDK v3.6.4 and DPDK v19.02 to generate the UDP traffic (with small 64B packets) simulating the attack. We report the dropping rate of the system and the CPU usage, which are the two fundamental parameters to take into account during an attack. We also measure the capability of the server to perform real work (i.e., serve web pages) while under attack, comparing the results of the different mitigation approaches. In this case, the legitimate traffic is generated using the open-source benchmarking tool weighttp, which creates a high number of parallel TCP connections towards the device under test; in this case we count only the successfully completed TCP sessions.
B. Mitigation performance
The first test measures the ability of the server to react to massive DDoS attacks that involve an increasing number of sources (i.e., bots), showing the performance of different mitigation approaches in terms of dropping rate (Mpps) and CPU consumption. We generate 64B UDP packets at line-
[Fig. 2: Dropping rate with an increasing number of attackers. (a): uniformly distributed traffic; (b): traffic normally distributed among all sources. x-axis: # sources; legend entries recovered: HW + XDP SmartNIC, HW + XDP Host, XDP Host.]
where the traffic is uniformly distributed among all sources (Figure 2a) and a situation where the traffic generated by each source follows a Gaussian distribution (Figure 2b). In addition, we report the CPU consumption for the first test (uniform distribution) in Figure 3.
1) Iptables: One of the most common approaches for DDoS attack mitigation relies on iptables, a Linux tool anchored to the netfilter framework that can filter traffic, perform network address translation and manipulate packets. For this test we deployed all the rules containing the source IPs to drop in the PREROUTING netfilter chain, which provides higher efficiency compared to the more common INPUT chain, which is encountered later in the networking stack. Figure 2a and 2b show how the dropping rate of iptables is rather limited, around 2.5-4.5Mpps, even with a relatively small number of attack sources, making this solution incapable of dealing with the massive DDoS attacks under consideration. This is mainly given by the linear matching algorithm used by iptables, whose performance degrades rapidly when an increasing number of rules are used, leading to a throughput almost equal to zero with more than 4K rules. The CPU consumption (Figure 3) confirms this limitation; using iptables to mitigate large DDoS attacks would saturate the CPUs of the
rate at 25Gbps (i.e., 37.2Mpps); we consider both a scenario system, which would be occupied discarding traffic rather then
executing the target services.

2) Host-based mitigation: Compared to iptables, XDP intercepts packets at a lower
level of the stack, right after the NIC driver. This test runs the entire
mitigation pipeline in XDP without any help from the SmartNIC, which simply
redirects all the packets to the host where the XDP program is triggered. The
dropping efficiency of XDP is much higher than that of iptables: it is able to
discard ~26Mpps with up to 1K sources, and still ~10Mpps with 128K attackers,
using all CPU cores of the target machine³. This performance degradation is due
to the eBPF map used (BPF_HASH), in which the lookup time, needed to match the
source IP of the current packet against the blacklist, is influenced by the total
number of map entries.

³ In our case, the limiting factor is our Intel Xeon E3-1245 CPU, which is able
to drop around 10Mpps within a single core, as opposed to other (more powerful)
CPUs that are able to achieve higher rates (e.g., 24Mpps [11]).

3) SmartNIC-based mitigation: In this case the mitigation pipeline is executed
entirely on the SmartNIC. We performed a first test where the attack is mitigated
only through an XDP filtering program in the SmartNIC CPU, without any help from
the hardware filter. Results shown in Figures 2a and 2b confirm a performance
degradation compared to the host-based mitigation due to the slower CPU of the
NIC, balanced by the fact that we do not consume any CPU cycles in the host
(Figure 3), hence leaving room for other applications.

A second test exploits a mixture of hardware filtering and XDP-based software
filtering in the card. Results demonstrate that for a relatively small number of
attack sources (fewer than 512), the dropping rate is equal to the maximum
achievable rate (37.2Mpps); in fact, the first K rules (where K=512 in our card)
are inserted in the SmartNIC hardware tables, causing all the packets to be
dropped at line rate. However, when dealing with larger attacks (greater than
1K), the dropping rate immediately decreases, since an increasing number of
entries stay outside the SmartNIC hardware tables; as a consequence, the dropping
rate is influenced by the performance of the XDP program running in the SmartNIC
CPU. This approach may be reasonable when the DDoS attack rate does not exceed
the maximum achievable dropping rate in the SmartNIC CPU, which in our case is
approximately 15Mpps; handling more massive attacks will cause the SmartNIC to
drop packets without processing them, with a higher chance of also dropping
legitimate traffic, as highlighted in Section V-C.

Fig. 3: CPU usage of the different mitigation approaches under a simulated DDoS
attack (uniform distribution).
(x-axis: # sources; legend: XDP Host, XDP SmartNIC, Iptables)

[...] the dropping rate is considerably higher than the HW + XDP SmartNIC case,
thanks to the higher performance of the host CPU compared to the SmartNIC one.
Although hardware filtering is also available on some “traditional” NICs (e.g.,
Intel with Flow Director), we were unable to implement the hybrid approach on
them because of the unavailability of hardware counters to measure the dropped
packets for each source, which are required by our algorithm; however, we cannot
exclude that other mitigation algorithms can leverage the hardware speed-up
provided by the above cards as well.

5) Final considerations: Figures 2a and 2b confirm a clear advantage of the
hardware offloading, which is even more evident depending on the distribution of
the traffic. For instance, in the second scenario (Figure 2b, with some sources
generating more traffic than others) we can reach even higher dropping
performance, thanks to the offloading algorithm that places the top-K malicious
talkers in the SmartNIC, resulting in more traffic dropped in hardware. The CPU
consumption shown in Figure 3 also confirms the clear advantage of the
offloading, particularly when most of the traffic is handled by the hardware of
the SmartNIC, hence relieving the host CPU from handling that portion of
malicious traffic. It is worth noting that the case where a server has to cope
with a limited number of malicious sources may be rather common, as the incoming
traffic in datacenters may be balanced across multiple servers (backends), each
one being asked to handle a portion of the connections and, hence, also a subset
of the current attackers.

C. Effect on legitimate traffic

This test evaluates the capability of the system to perform useful work (e.g.,
serve web pages) even in the presence of a DDoS attack. We generate 64B UDP
packets towards the server simulating different attack rates and numbers of
attackers, while a weighttp client generates 1M HTTP requests (using 200
concurrent clients) towards the nginx server running on the target device. The
capability of the server to perform real work is reported as the number of
successfully completed requests/s, with a timeout of 5 seconds, varying the rate
of DDoS traffic.

Results, depicted in Figures 4a and 4b, show the performance with 1K and 4K
attackers respectively. In the first case, both hardware-based solutions reach
the same number of connections/s, since almost all entries are dropped by the
hardware, leaving the host’s CPU free to perform real work. The same behavior can
be noticed when the mitigation is performed entirely on the SmartNIC CPU; in this
case, the host’s CPU is underused, achieving the maximum number of HTTP
requests/s that the DUT is able to handle. However, the performance immediately
drops when the attack rate exceeds 15Mpps, which is the maximum rate that the
SmartNIC CPU can sustain; in such a scenario, NIC queues rapidly become full,
hence dropping packets without going through the mitigation
pipeline and increasing the chance to also drop legitimate traffic. With respect
to the XDP Host mitigation, we notice that the number of connections/s is
initially lower, in the presence of small attack rates, compared to the
SmartNIC-based solution, since the host’s CPU has to handle the HTTP requests
and, at the same time, execute the XDP program. However, when the rate of the
attack grows, it continues to handle an adequate number of connections/s until
25Mpps, which is the maximum rate that the host XDP program is able to handle.
Finally, iptables-based mitigation proves unfeasible with a large number of
attack sources because of its very poor processing efficiency, severely impacting
the capability of the server to handle the legitimate traffic.

The same analysis is valid for larger attacks (e.g., 4K sources); the main
difference here is that the HW + XDP Host solution performs significantly better
in this case, thanks to the higher processing capabilities of the host’s CPU
compared to those of the SmartNIC.

Fig. 4: Number of successfully completed HTTP requests/s under different load
rates of a DDoS attack carried out by (a) 1K attackers and (b) 4K attackers.
(x-axis: DDoS Traffic (Mpps); y-axis: HTTP req/s; legend: HW + XDP SmartNIC,
HW + XDP Host, XDP Host, XDP SmartNIC, Iptables)

VI. RELATED WORK

The advantages of using XDP to filter packets at high rates have been largely
discussed and demonstrated [29], [30]; several companies (e.g., Facebook,
Cloudflare) have integrated XDP in their data center networks to protect end
hosts from unwanted traffic, given the enormous benefits from both filtering
performance and low resource consumption. In particular, in [31] Cloudflare
presented a DDoS mitigation architecture that was initially based on kernel
bypass, to overcome the performance limitations of iptables, and classical BPF to
filter [...] of our knowledge, this work is the first that analyzes and proposes
a complete hardware/software architecture for the DDoS mitigation use case.

VII. CONCLUSIONS

Given the sheer increase in the amount of traffic handled by modern datacenters,
SmartNICs represent a promising solution to offload part of the network
processing to dedicated (and possibly more optimized) components. This paper
presents an analysis of the various approaches that could be adopted to introduce
SmartNICs in server-based data plane processing, assessing the achievable results
in particular for the DDoS mitigation use case under different alternatives. In
this respect, the paper describes a solution that combines SmartNICs with other
recent technologies such as eBPF/XDP to handle large amounts of traffic and
attackers. The key aspect of our solution is the adaptive hardware offloading
mechanism, which partitions the attacking sources to be filtered among SmartNIC
and/or host, smartly delegating the filtering of the most aggressive DDoS sources
to the former.

According to our experiments, the best approach is a combination of hardware
filtering on the SmartNIC and XDP software filtering on the host, which proves
more efficient in terms of dropping rate and CPU usage. In fact, running part of
the filtering pipeline on the SmartNIC CPU would lead to inferior dropping
performance due to its slower CPU, resulting in a lower capability to cope with
large and massive DDoS attacks.

Our findings suggest that current SmartNICs can help mitigate the network load on
congested servers, but may not represent a turn-key solution. For instance, an
effective SmartNIC-based solution for DDoS attacks may require the presence of a
DDoS-aware load balancer that distributes incoming datacenter traffic in a way
that reduces the number of attackers landing on each server, whose number should
be compatible with the size of the hardware tables of the SmartNIC. Otherwise,
the solution may require the software running on the SmartNICs to cooperate with
other components
running on the host, reducing the effectiveness of the solution in terms of saved
resources in the servers.

VIII. ACKNOWLEDGEMENT

This work has received funding from the European Union’s Horizon 2020 Research
and Innovation Programme under grant agreement no. 815141 (DECENTER:
Decentralised technologies for orchestrated Cloud-to-Edge intelligence),
www.decenter-project.eu.

REFERENCES

[1] H. Ballani, P. Costa, C. Gkantsidis, M. P. Grosvenor, T. Karagiannis, L. Koromilas, and G. O’Shea, “Enabling end-host network functions,” in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, ser. SIGCOMM ’15. New York, NY, USA: ACM, 2015, pp. 493–507. [Online]. Available: http://doi.acm.org/10.1145/2785956.2787493
[2] M. Casado, T. Koponen, S. Shenker, and A. Tootoonchian, “Fabric: A retrospective on evolving SDN,” in Proceedings of the First Workshop on Hot Topics in Software Defined Networks, ser. HotSDN ’12. New York, NY, USA: ACM, 2012, pp. 85–90. [Online]. Available: http://doi.acm.org/10.1145/2342441.2342459
[3] Y. Li, D. Wei, X. Chen, Z. Song, R. Wu, Y. Li, X. Jin, and W. Xu, “Dumbnet: A smart data center network fabric with dumb switches,” in Proceedings of the Thirteenth EuroSys Conference, ser. EuroSys ’18. New York, NY, USA: ACM, 2018, pp. 9:1–9:13. [Online]. Available: http://doi.acm.org/10.1145/3190508.3190531
[4] T. Karagiannis, R. Mortier, and A. Rowstron, “Network exception handlers: Host-network control in enterprise networks,” in Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, ser. SIGCOMM ’08. New York, NY, USA: ACM, 2008, pp. 123–134. [Online]. Available: http://doi.acm.org/10.1145/1402958.1402973
[5] B. Pfaff, J. Pettit, K. Amidon, M. Casado, T. Koponen, and S. Shenker, “Extending networking into the virtualization layer,” in Hotnets, 2009.
[6] R. Neugebauer, G. Antichi, J. F. Zazo, Y. Audzevich, S. López-Buedo, and A. W. Moore, “Understanding PCIe performance for end host networking,” in Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, ser. SIGCOMM ’18. New York, NY, USA: ACM, 2018, pp. 327–341. [Online]. Available: http://doi.acm.org/10.1145/3230543.3230560
[7] DPDK. (2018, jun) Data plane development kit. [Online]. Available: https://www.dpdk.org/ [Accessed: 2019-03-17]
[8] L. Rizzo, “Netmap: a novel framework for fast packet I/O,” in 21st USENIX Security Symposium (USENIX Security 12), 2012, pp. 101–112.
[9] C. Authors. (2018, jul) BPF and XDP reference guide. [Online]. Available: https://cilium.readthedocs.io/en/latest/bpf/ [Accessed: 2019-03-17]
[10] M. Fleming. (2017, dec) A thorough introduction to eBPF. [Online]. Available: https://lwn.net/Articles/740157/ [Accessed: 2019-03-17]
[11] T. Høiland-Jørgensen, J. D. Brouer, D. Borkmann, J. Fastabend, T. Herbert, D. Ahern, and D. Miller, “The express data path: Fast programmable packet processing in the operating system kernel,” in Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies, ser. CoNEXT ’18. New York, NY, USA: ACM, 2018, pp. 54–66. [Online]. Available: http://doi.acm.org/10.1145/3281411.3281443
[12] D. Firestone, A. Putnam, S. Mundkur, D. Chiou, A. Dabagh, M. Andrewartha, H. Angepat, V. Bhanu, A. Caulfield, E. Chung, H. K. Chandrappa, S. Chaturmohta, M. Humphrey, J. Lavier, N. Lam, F. Liu, K. Ovtcharov, J. Padhye, G. Popuri, S. Raindel, T. Sapre, M. Shaw, G. Silva, M. Sivakumar, N. Srivastava, A. Verma, Q. Zuhair, D. Bansal, D. Burger, K. Vaid, D. A. Maltz, and A. Greenberg, “Azure accelerated networking: SmartNICs in the public cloud,” in 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). Renton, WA: USENIX Association, 2018, pp. 51–66. [Online]. Available: https://www.usenix.org/conference/nsdi18/presentation/firestone
[13] A. Caulfield, P. Costa, and M. Ghobadi, “Beyond SmartNICs: Towards a fully programmable cloud.”
[14] R. Miao, H. Zeng, C. Kim, J. Lee, and M. Yu, “Silkroad: Making stateful layer-4 load balancing fast and cheap using switching ASICs,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, ser. SIGCOMM ’17. New York, NY, USA: ACM, 2017, pp. 15–28. [Online]. Available: http://doi.acm.org/10.1145/3098822.3098824
[15] G. Siracusano and R. Bifulco, “Is it a SmartNIC or a key-value store?: Both!” in Proceedings of the SIGCOMM Posters and Demos, ser. SIGCOMM Posters and Demos ’17. New York, NY, USA: ACM, 2017, pp. 138–140. [Online]. Available: http://doi.acm.org/10.1145/3123878.3132014
[16] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani, V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, F. Huici, and G. Siracusano, “Flowblaze: Stateful packet processing in hardware,” in 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). Boston, MA: USENIX Association, 2019, pp. 531–548. [Online]. Available: https://www.usenix.org/conference/nsdi19/presentation/pontarelli
[17] Y. G. Moon, I. Park, S. Lee, and K. S. Park, “Accelerating flow processing middleboxes with programmable NICs,” in Proceedings of the 9th Asia-Pacific Workshop on Systems, ser. APSys ’18. New York, NY, USA: ACM, 2018, pp. 14:1–14:3. [Online]. Available: http://doi.acm.org/10.1145/3265723.3265744
[18] Arbor Networks. Worldwide infrastructure security report. [Online]. Available: https://pages.arbornetworks.com/rs/082-KNA-087/images/13th_Worldwide_Infrastructure_Security_Report.pdf [Accessed: 2019-03-17]
[19] A. Srivastava, B. B. Gupta, A. Tyagi, A. Sharma, and A. Mishra, “A recent survey on DDoS attacks and defense mechanisms,” in Advances in Parallel Distributed Computing, D. Nagamalai, E. Renault, and M. Dhanuskodi, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 570–580.
[20] E. Alomari, S. Manickam, B. Gupta, S. Karuppayah, and R. Alfaris, “Botnet-based distributed denial of service (DDoS) attacks on web servers: classification and art,” arXiv preprint arXiv:1208.0403, 2012.
[21] S. McCanne and V. Jacobson, “The BSD packet filter: A new architecture for user-level packet capture,” in Proceedings of the USENIX Winter 1993 Conference Proceedings on USENIX Winter 1993 Conference Proceedings, ser. USENIX’93. Berkeley, CA, USA: USENIX Association, 1993, pp. 2–2. [Online]. Available: http://dl.acm.org/citation.cfm?id=1267303.1267305
[22] N. Tausanovitch. (2016, sept) What makes a NIC a SmartNIC, and why is it needed? [Online]. Available: https://www.netronome.com/blog/what-makes-a-nic-a-smartnic-and-why-is-it-needed/ [Accessed: 2019-03-17]
[23] ntop. PF_RING ZC (zero copy). [Online]. Available: https://www.ntop.org/products/packet-capture/pf_ring/pf_ring-zc-zero-copy/ [Accessed: 2019-03-17]
[24] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,” in Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, ser. SIGCOMM ’13. New York, NY, USA: ACM, 2013, pp. 99–110. [Online]. Available: http://doi.acm.org/10.1145/2486001.2486011
[25] P. Kamboj, M. C. Trivedi, V. K. Yadav, and V. K. Singh, “Detection techniques of DDoS attacks: A survey,” in 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON). IEEE, 2017, pp. 675–679.
[26] S. Behal and K. Kumar, “Detection of DDoS attacks and flash events using novel information theory metrics,” Computer Networks, vol. 116, pp. 96–110, 2017.
[27] S. Behal and K. Kumar, “Detection of DDoS attacks and flash events using information theory metrics–an empirical investigation,” Computer Communications, vol. 103, pp. 18–28, 2017.
[28] M. H. Bhuyan, D. Bhattacharyya, and J. K. Kalita, “An empirical evaluation of information metrics for low-rate and high-rate DDoS attack detection,” Pattern Recognition Letters, vol. 51, pp. 1–7, 2015.
[29] B. Blanco and Y. Lu. (2016, oct) Leveraging XDP for programmable, high performance data path in OpenStack. [Online]. Available: https://www.openstack.org/videos/barcelona-2016/leveraging-express-data-path-xdp-for-programmable-high-performance-data-path-in-openstack [Accessed: 2019-03-17]
[30] H. Zhou, Nikita, and M. Lau. (2017, apr) XDP production usage: DDoS protection and L4LB. [Online]. Available: https://www.netdevconf.org/2.1/slides/apr6/zhou-netdev-xdp-2017.pdf [Accessed: 2019-03-17]
[31] G. Bertin, “XDP in practice: integrating XDP into our DDoS mitigation pipeline,” in Technical Conference on Linux Networking, Netdev, 2017.
[32] A. Fabre. L4drop: XDP DDoS mitigations. [Online]. Available: https://blog.cloudflare.com/l4drop-xdp-ebpf-based-ddos-mitigations/ [Accessed: 2019-03-17]
[33] Y. Le, H. Chang, S. Mukherjee, L. Wang, A. Akella, M. M. Swift, and T. V. Lakshman, “Uno: Unifying host and smart NIC offload for flexible packet processing,” in Proceedings of the 2017 Symposium on Cloud Computing, ser. SoCC ’17. New York, NY, USA: ACM, 2017, pp. 506–519. [Online]. Available: http://doi.acm.org/10.1145/3127479.3132252
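
As a quick check on the line rate quoted in Section V-B (37.2Mpps for 64B packets at 25Gbps), the following minimal sketch recomputes the maximum packet rate of an Ethernet link. It assumes the standard per-frame overhead on the wire (7B preamble, 1B start-of-frame delimiter and 12B inter-frame gap, 20B in total), which the text does not state explicitly; the function and constant names are ours and purely illustrative.

```python
# Assumption: standard Ethernet overhead per frame on the wire
# (7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap).
ETH_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Return the maximum number of frames per second on an Ethernet link."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # 25Gbps link, 64B frames, as in the test environment of Section V-A.
    mpps = line_rate_pps(25e9, 64) / 1e6
    print(f"{mpps:.1f} Mpps")  # prints "37.2 Mpps"
```

Under the same assumption, each 64B packet occupies 84B (672 bits) on the wire, which is why small packets are the most demanding case for a packet-dropping pipeline.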
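
The top-K placement mentioned in the final considerations of Section V-B and in the conclusions (most aggressive sources filtered in the SmartNIC hardware tables, the remaining ones in the XDP blacklist) can be sketched as follows. This is an illustrative sketch under stated assumptions and not the authors' implementation: the function name, the per-source packet counters used as input and the tie-breaking rule are hypothetical; only the table size K=512 comes from the text.

```python
import heapq
from typing import Dict, Set, Tuple

# K: size of the SmartNIC hardware table reported for the card in Section V-B.
HW_TABLE_SIZE = 512


def partition_blacklist(
    pkts_per_source: Dict[str, int], k: int = HW_TABLE_SIZE
) -> Tuple[Set[str], Set[str]]:
    """Split blacklisted sources into (hardware, software) sets.

    The k sources with the highest observed packet counts are assigned to
    the hardware table; all the others stay in the software (XDP) blacklist.
    Ties are broken by source address so that the result is deterministic
    (an assumption of this sketch).
    """
    top = heapq.nlargest(k, pkts_per_source.items(), key=lambda kv: (kv[1], kv[0]))
    hardware = {src for src, _ in top}
    software = set(pkts_per_source) - hardware
    return hardware, software


if __name__ == "__main__":
    # Hypothetical counters collected for three blacklisted sources.
    counters = {"10.0.0.1": 900, "10.0.0.2": 50, "10.0.0.3": 700}
    hw, sw = partition_blacklist(counters, k=2)
    print(sorted(hw), sorted(sw))
```

With a skewed traffic distribution such as the one of Figure 2b, the few sources selected for the hardware table account for most of the packets, which is consistent with the higher dropping rate reported for that scenario.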