Cloud and Datacenter Networking
Fat-tree: a scalable commodity DC network architecture
Topology derived from Clos networks
Figure: 3-layer fat-tree network, k=4
Fat-tree (continued)
Figure: 3-layer fat-tree network, k=4
k³/4 hosts, grouped in k pods with (k/2)² hosts each
Each edge switch connects k/2 hosts to k/2 aggregation switches
Each aggregation switch connects k/2 edge switches to k/2 core switches
(5/4)k² switches in total, of which (k/2)² are core switches
To obtain the required high capacity in the core layer, a large number of links are aggregated
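As a sanity check of the formulas above, a small Python sketch (illustrative only, not part of the course material) that computes these quantities for a generic even k:

```python
def fat_tree_counts(k: int) -> dict:
    """Host and switch counts for a 3-layer k-ary fat-tree (k even)."""
    assert k % 2 == 0, "k must be even"
    hosts_per_pod = (k // 2) ** 2              # (k/2)^2 hosts per pod
    return {
        "pods": k,
        "hosts": k * hosts_per_pod,            # k^3/4 hosts in total
        "edge_switches": k * (k // 2),         # k/2 edge switches per pod
        "aggregation_switches": k * (k // 2),  # k/2 aggregation switches per pod
        "core_switches": (k // 2) ** 2,
        "total_switches": 5 * k * k // 4,      # (5/4) k^2 switches overall
    }

# For the k=4 topology in the figure: 16 hosts, 20 switches, 4 of which are core.
print(fat_tree_counts(4))
```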
Fat-tree (continued)
Fat-tree network: redundancy
Figure: 3-layer fat-tree network, k=4
(k/2)² different paths (4 in this topology, one through each core switch) exist between any pair of hosts located in different pods
Only one path exists between a given core switch and any possible host
Question:
how is it possible to exploit the alternate paths?
The network topology in the picture has:
16 access links (server-switch)
16 edge-aggregation links
16 aggregation-core links
No bottlenecks in the upper layers
A limited amount of oversubscription
may be introduced
(for instance, by using only 2 core switches)
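To make the oversubscription remark concrete, here is a small illustrative Python sketch (the function name is hypothetical) computing the downlink-to-uplink capacity ratio at the aggregation layer when fewer core switches are used, assuming all links have equal capacity:

```python
def aggregation_oversubscription(k: int, core_switches: int) -> float:
    """Downlink/uplink capacity ratio at the aggregation layer of a k-ary
    fat-tree, assuming every link has the same capacity."""
    edge_agg_links = k * (k // 2) * (k // 2)   # k pods x (k/2 edge switches) x (k/2 uplinks each)
    agg_core_links = core_switches * k         # each core switch has one link per pod
    return edge_agg_links / agg_core_links

print(aggregation_oversubscription(4, 4))  # 1.0 -> full fat-tree, no oversubscription
print(aggregation_oversubscription(4, 2))  # 2.0 -> only 2 core switches, 2:1 oversubscription
```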
Datacenter networks and alternate paths exploitation
TRILL: Transparent Interconnection of Lots of Links
TRILL relies on special switches called R-Bridges
R-Bridges run a link-state protocol:
learn the network topology through the exchange of Link State Packets (LSPs)
compute the shortest paths between them
The link-state protocol used by TRILL is IS-IS
IS-IS was originally defined as an ISO/OSI standard (ISO/IEC 10589:2002) and later described in IETF RFC 1142
IS-IS was chosen because it runs directly over Layer 2, so it can be run without configuration
no IP addresses need to be assigned
TRILL switches are identified by a 6-byte IS-IS System ID and by a 2-byte nickname
TRILL is compatible with existing IP Routers: R-Bridges are transparent to IP routers
R-Bridges encapsulate each packet they receive from hosts with a header carrying the ID of the next-hop
R-Bridge on the shortest path to the destination
the R-Bridge closest to the destination decapsulates the packet before delivering it to the destination
TRILL data packets between R-Bridges have a Local Link header and a TRILL header
For unicast packets:
the Local Link header contains the addresses of the local source R-Bridge and of the next-hop R-Bridge
TRILL header specifies the first/ingress R-Bridge and the last/egress R-Bridge
A 6-bit hop count is decremented at each R-Bridge
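As a rough illustration of the forwarding logic described above (a simplified sketch, not the actual TRILL header layout; field and function names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TrillHeader:
    """Simplified TRILL header: only the fields mentioned above."""
    ingress_nickname: int   # 2-byte nickname of the first/ingress R-Bridge
    egress_nickname: int    # 2-byte nickname of the last/egress R-Bridge
    hop_count: int          # 6-bit hop count (0..63)

def forward_unicast(hdr: TrillHeader, my_nickname: int, next_hop_for) -> str:
    """What a transit R-Bridge does with a unicast TRILL data packet."""
    if hdr.egress_nickname == my_nickname:
        return "decapsulate and deliver the inner frame to the destination"
    if hdr.hop_count == 0:
        return "drop: hop count exhausted"
    hdr.hop_count -= 1                       # decremented at each R-Bridge
    nh = next_hop_for(hdr.egress_nickname)   # next hop from the IS-IS shortest path
    # The outer Local Link header is rewritten: source = this R-Bridge,
    # destination = the next-hop R-Bridge on the shortest path.
    return f"rewrite Local Link header and forward towards R-Bridge {nh}"
```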
TRILL packet forwarding
Figure by Ronald van der Pol, from “TRILL and IEEE 802.1aq Overview” (Apr.2012)
TRILL in the datacenter
TRILL implemented in the switches at all layers …
Multi-path routing: ECMP
In a datacenter with multiple possible paths between any (source, destination) pair,
ECMP allows traffic to be spread randomly over the alternative paths
At the first edge switch, traffic from 10.0.1.2 to 10.2.0.3 is randomly routed
either on the left path or on the right path
Aggregation switches may also randomly choose one of two different paths
ECMP and flow hashing
To avoid misordered delivery of packets belonging to the same flow, ECMP calculates the
hash of the packet header to determine the output port at each switch
In this manner, packets of the same flow, i.e. with the same (source, destination), follow the
same path and are not misordered
Works well for a large number of small flows → traffic is evenly distributed
If multiple long-lasting flows are mapped onto the same ports, this technique may lead to an
imbalance of traffic across the available paths
This problem arises because the notion of flow used above is too coarse-grained
To avoid this problem and achieve a more fine-grained balancing of traffic, randomization
may occur at micro-flow or flowlet level
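A minimal sketch of hash-based ECMP port selection (illustrative only; real switches typically hash the full 5-tuple, and possibly more fields, in hardware):

```python
import zlib

def ecmp_output_port(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                     proto: int, uplinks: list) -> int:
    """Pick an uplink by hashing the flow's 5-tuple: all packets of the same
    flow get the same hash, hence the same path, and are not reordered."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]

# Example: the flow from 10.0.1.2 to 10.2.0.3 always maps to the same uplink
print(ecmp_output_port("10.0.1.2", "10.2.0.3", 40000, 80, 6, [1, 2]))
```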
ECMP and flowlets
A flowlet is a sequence of consecutive packets of the same flow whose inter-arrival time is smaller than a
conservative estimate of the latency difference between any two paths within the datacenter
network
If two flowlets of the same flow are routed along different paths, no misordered delivery can occur anyway
Figure: packets separated by a gap larger than ∆ belong to different flowlets
[FLARE] Srikanth Kandula, Dina Katabi, Shantanu Sinha, and Arthur Berger.
Dynamic load balancing without packet reordering.
ACM SIGCOMM Comput. Commun. Rev. 37, 2, pp. 51-62, March 2007
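A minimal sketch of flowlet-based load balancing under these assumptions (the threshold value and the names used below are hypothetical, not taken from FLARE):

```python
import random
import time

FLOWLET_GAP = 0.0005   # assumed conservative estimate (seconds) of the maximum
                       # latency difference between any two datacenter paths
flowlet_table = {}     # flow key -> (time of last packet, chosen uplink)

def flowlet_output_port(flow_key: tuple, uplinks: list) -> int:
    """Keep a flow on its current uplink; re-pick the uplink only when the gap
    since the flow's previous packet exceeds FLOWLET_GAP (a new flowlet).
    Packets still in flight on the old path arrive before the new burst,
    so no reordering occurs."""
    now = time.monotonic()
    last_seen, port = flowlet_table.get(flow_key, (0.0, None))
    if port is None or now - last_seen > FLOWLET_GAP:
        port = random.choice(uplinks)        # start of a new flowlet
    flowlet_table[flow_key] = (now, port)
    return port
```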
ECMP issues: local decisions
One issue with ECMP is that each switch makes only local decisions, without any knowledge of the
status of links further along the path
In this example topology, once the path has been pinned to a core switch, there is no
further alternative towards a given destination (i.e. only one path)
If a link fails, ECMP can do nothing to prevent upstream switches from selecting the path that
contains that link, even if an alternative path exists
Figure: example topology with a failed link