Xen Networking
Dongli Zhang
Dongli Zhang (Oracle) Introduction to Oracle VM (Xen) Networking May 30, 2017 1 / 26
Plan
Paravirtualized Networking
vif, bridge, bond
Emulated Networking
Environment:
xen: Oracle VM server 3.3.3 with xen-4.3.0-55.el6.47.33.x86_64
dom0: Unbreakable Enterprise Kernel v4.1.12-89
domU: Unbreakable Enterprise Kernel v4.1.12-89
Paravirtual xen-netfront/xen-netback framework
xen-netfront/xen-netback source code
Unbreakable Enterprise Kernel v4.1.12-89
drivers/net/xen-netfront.c
drivers/net/xen-netback/xenbus.c
drivers/net/xen-netback/netback.c
drivers/net/xen-netback/interface.c
Paravirtual networking scenario 1/2
Figure: two DomUs (id = 1 and id = 2). In each DomU, the application runs over the TCP/IP stack and eth0 (xen-netfront). Each eth0 is connected to its backend interface in Dom0, vif1.0 and vif2.0 (xen-netback), via ring buffer, grant table and event channel. Both vifs are attached to the Dom0 bridge xenbr0.
Paravirtual networking scenario 2/2
Figure: two mirrored Xen hosts. On each host, the DomU application runs over the TCP/IP stack and eth0 (xen-netfront), which reaches vif1.0 (xen-netback) in Dom0 via ring buffer, grant table and event channel. In Dom0, the physical NIC eth0 (e1000e) is enslaved to bond0 (bonding), and both vif1.0 and bond0 are attached to the bridge xenbr0.
PV driver vs. PCI driver
pv xmit: front -> backend 1/3
Figure: per-queue state on both sides. In xen-netfront, struct netfront_info (net_device eth0) holds *queues, pointing to netfront_queue 0 ... n. In xen-netback, struct xenvif (net_device vif1.0) holds *queues, pointing to xenvif_queue 0 ... n. Each queue on either side has:
● queue id
● tx_irq
● rx_irq
● tx ring info
● rx ring info
Frontend queue i and backend queue i share TX Ring Buffer i (queue 0 ... queue n).
pv xmit: front -> backend 2/3
Figure: one GSO skb (gso_size = 1460 in skb_shared_info) placed by xen-netfront on the TX ring (PROD and CONS marked):
● xen_netif_tx_request: gref = 1802, offset = 2298, id = 10, size = 1478 (the 1478 bytes of linear data)
● XEN_NETIF_EXTRA_TYPE_GSO: gso.size = 1460
● xen_netif_tx_request: gref = 1801, offset = 3384, id = 9, size = 712
● xen_netif_tx_request: gref = 1800, offset = 0, id = 8, size = 784
The skb's single skb_frag_t (offset = 3384, size = 1496) crosses a page boundary, so it takes two requests (712 + 784 bytes). In xen-netback, the start of the first slot is copied with GNTTABOP_copy and the remaining data is mapped with GNTTABOP_map into skb_frag_t pages of 1350, 712 and 784 bytes.
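The request sizes above follow from page boundaries: a buffer needs one slot per 4 KiB page it touches. A minimal C sketch of that split, assuming 4 KiB pages; `struct tx_slot` and `split_into_slots()` are hypothetical names for illustration, not xen-netfront functions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB grant pages */

/* Hypothetical description of one TX slot: where its data sits in the page. */
struct tx_slot {
    unsigned int offset; /* offset within the granted page */
    unsigned int size;   /* bytes carried by this slot */
};

/*
 * Split a buffer that starts at 'offset' within its first page and is
 * 'len' bytes long into page-bounded slots. Returns the number of slots.
 */
static size_t split_into_slots(unsigned int offset, unsigned int len,
                               struct tx_slot *slots, size_t max_slots)
{
    size_t n = 0;

    assert(offset < SKETCH_PAGE_SIZE);
    while (len > 0) {
        unsigned int room = SKETCH_PAGE_SIZE - offset; /* bytes left in this page */
        unsigned int chunk = len < room ? len : room;

        assert(n < max_slots);
        slots[n].offset = offset;
        slots[n].size = chunk;
        n++;
        len -= chunk;
        offset = 0; /* every following page is used from its start */
    }
    return n;
}
```

With the slide's numbers, the 1478 bytes of linear data at offset 2298 fit in one slot (2298 + 1478 = 3776), while the 1496-byte frag at offset 3384 becomes 712 bytes up to the page end plus 784 bytes at offset 0 of the next page.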
pv xmit: front -> backend 3/3
Figure: the skb built by xen-netback. The first 128 bytes are copied with GNTTABOP_copy into the linear area (headroom, head, data, tail, end, tailroom). The rest of the data stays in frontend pages, mapped from the frontend's grant table references with GNTTABOP_map and attached as skb_frag_t entries of 1350, 712 and 784 bytes. skb_shared_info holds gso_size = 1460 and void *destructor_arg, which points to a struct ubuf_info whose callback() is xenvif_zerocopy_callback(). The callback is called when the skb (with frags) is freed, so the pages mapped from the frontend can be unmapped.
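A small C sketch of the backend-side handling, assuming the 128-byte copy length shown on the slide: the first slot is split into a copied part and a mapped part, and a destructor callback runs once the frags are released. `plan_first_slot()`, `struct zc_info` and `release_skb_frags()` are hypothetical simplifications; in the kernel this role is played by struct ubuf_info and xenvif_zerocopy_callback().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TX_COPY_LEN 128u /* assumption: the 128 bytes shown on the slide */

/* How the backend treats the first slot of a packet. */
struct first_slot_plan {
    unsigned int copy_len; /* copied into the linear area (GNTTABOP_copy) */
    unsigned int map_len;  /* left in the frontend page, mapped as a frag (GNTTABOP_map) */
};

static struct first_slot_plan plan_first_slot(unsigned int slot_size)
{
    struct first_slot_plan p;

    p.copy_len = slot_size < SKETCH_TX_COPY_LEN ? slot_size : SKETCH_TX_COPY_LEN;
    p.map_len = slot_size - p.copy_len;
    return p;
}

/* Hypothetical stand-in for struct ubuf_info: a callback run on release. */
struct zc_info {
    void (*callback)(struct zc_info *info, bool zerocopy_success);
    unsigned int mapped_pages; /* frontend pages still mapped */
};

/* Stand-in for xenvif_zerocopy_callback(): the pages may now be unmapped. */
static void sketch_zerocopy_callback(struct zc_info *info, bool zerocopy_success)
{
    (void)zerocopy_success;
    info->mapped_pages = 0; /* real netback schedules the grant unmaps instead */
}

/* Called when the skb and its frags are freed. */
static void release_skb_frags(struct zc_info *info)
{
    assert(info->callback != NULL);
    info->callback(info, true);
}
```

For the 1478-byte first slot this gives 128 copied bytes and 1350 mapped bytes, matching the figure. Copying only a short header keeps the linear area small, while the bulk of the payload is never copied.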
pv xmit: backend -> bridge -> bond
netif_receive_skb(skb) // skb->dev = vif1.0
-> struct net_device (xenbr0)
-> br_forward
struct net_bridge_port
● struct net_bridge *br
● struct net_device *dev
● struct list_head list
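br_forward() sends the frame out of the port that the bridge's forwarding database associates with the destination MAC. The following is a simplified C sketch of MAC learning plus forward-or-flood, assuming a tiny fixed-size table; all types and functions are hypothetical and much simpler than the kernel's bridge FDB code. The same lookup also serves the receive direction (bridge -> backend).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FDB_SIZE 16 /* assumption: tiny fixed-size table */

/* Hypothetical bridge port; the kernel's net_bridge_port also has br and list. */
struct sketch_port {
    const char *dev_name; /* e.g. "vif1.0" or "bond0" */
};

struct sketch_fdb_entry {
    unsigned char mac[6];
    struct sketch_port *port;
};

struct sketch_bridge {
    struct sketch_fdb_entry fdb[SKETCH_FDB_SIZE];
    size_t n_fdb;
};

static struct sketch_port *fdb_lookup(const struct sketch_bridge *br,
                                      const unsigned char *mac)
{
    for (size_t i = 0; i < br->n_fdb; i++)
        if (memcmp(br->fdb[i].mac, mac, 6) == 0)
            return br->fdb[i].port;
    return NULL;
}

/* Remember which port a source MAC was last seen on. */
static void fdb_learn(struct sketch_bridge *br, const unsigned char *mac,
                      struct sketch_port *port)
{
    for (size_t i = 0; i < br->n_fdb; i++) {
        if (memcmp(br->fdb[i].mac, mac, 6) == 0) {
            br->fdb[i].port = port; /* the station may have moved */
            return;
        }
    }
    if (br->n_fdb < SKETCH_FDB_SIZE) {
        memcpy(br->fdb[br->n_fdb].mac, mac, 6);
        br->fdb[br->n_fdb].port = port;
        br->n_fdb++;
    }
}

/*
 * Learn the source, then pick the egress port for the destination.
 * NULL means "unknown destination: flood to all ports except 'in'".
 */
static struct sketch_port *bridge_rx(struct sketch_bridge *br,
                                     struct sketch_port *in,
                                     const unsigned char *dst,
                                     const unsigned char *src)
{
    assert(br != NULL && in != NULL);
    fdb_learn(br, src, in);
    return fdb_lookup(br, dst);
}
```

A real bridge also drops the frame when the egress port equals the ingress port, and ages out stale entries. Both are left out here.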
pv xmit: bond -> physical NIC
bond_start_xmit // skb->dev = bond0
struct net_device (bond0)
struct bonding
● struct slave *curr_active_slave
case BOND_MODE_ACTIVE_BACKUP: call bond_dev_queue_xmit
bond_dev_queue_xmit: set skb->dev = slave (eth0, e1000e) // e1000e is taken from bonding.curr_active_slave
struct slave
● struct bonding *bond
● struct net_device *dev
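In active-backup mode the transmit step reduces to "retarget the packet to the current active slave". A minimal C sketch of that decision; the structures are hypothetical, heavily simplified stand-ins for net_device, slave and bonding, and the hand-off to dev_queue_xmit() is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sketch_dev { const char *name; };
struct sketch_slave { struct sketch_dev *dev; };
struct sketch_bonding { struct sketch_slave *curr_active_slave; };
struct sketch_skb { struct sketch_dev *dev; };

enum sketch_tx { SKETCH_TX_OK, SKETCH_TX_DROP };

/*
 * Active-backup transmit: retarget the packet from bond0 to the current
 * active slave. The kernel then hands the skb to dev_queue_xmit().
 */
static enum sketch_tx bond_xmit_active_backup(struct sketch_bonding *bond,
                                              struct sketch_skb *skb)
{
    struct sketch_slave *slave;

    assert(bond != NULL && skb != NULL);
    slave = bond->curr_active_slave;
    if (slave == NULL)
        return SKETCH_TX_DROP; /* no usable slave */
    skb->dev = slave->dev;     /* skb->dev = eth0 (e1000e) */
    return SKETCH_TX_OK;
}
```

Failover then only changes curr_active_slave; the next packet follows the new slave without any change on the bridge or vif side.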
pv recv: physical NIC -> bond -> bridge
__netif_receive_skb(skb)
-> bond_handle_frame // obtain slave from rx_handler_data
struct slave
● struct net_device *dev
● struct bonding *bond
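The receive side works through per-device rx handlers: eth0's handler finds its slave through rx_handler_data, makes the packet appear to come from bond0 and asks for another round, and bond0's handler (the bridge) then consumes it. A simplified C sketch of that chain; the types and functions are hypothetical, and the two result values only mirror the idea of the kernel's RX_HANDLER_ANOTHER and RX_HANDLER_CONSUMED.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;
struct sketch_skb { struct sketch_dev *dev; };

/* Mirrors the idea of RX_HANDLER_CONSUMED / RX_HANDLER_ANOTHER. */
enum sketch_rx { SKETCH_RX_CONSUMED, SKETCH_RX_ANOTHER };

typedef enum sketch_rx (*sketch_rx_handler)(struct sketch_skb *skb);

struct sketch_dev {
    const char *name;
    sketch_rx_handler rx_handler; /* e.g. bond_handle_frame on eth0 */
    void *rx_handler_data;        /* e.g. the struct slave for eth0 */
};

struct sketch_bonding { struct sketch_dev *dev; };
struct sketch_slave { struct sketch_dev *dev; struct sketch_bonding *bond; };

/* Like bond_handle_frame(): find the slave, make the skb come from bond0. */
static enum sketch_rx sketch_bond_handle_frame(struct sketch_skb *skb)
{
    struct sketch_slave *slave = skb->dev->rx_handler_data;

    assert(slave != NULL);
    skb->dev = slave->bond->dev;
    return SKETCH_RX_ANOTHER; /* run the receive path again for bond0 */
}

static struct sketch_dev *bridge_saw; /* ingress device seen by the bridge */

/* Stand-in for the bridge's handler registered on bond0. */
static enum sketch_rx sketch_br_handle_frame(struct sketch_skb *skb)
{
    bridge_saw = skb->dev;
    return SKETCH_RX_CONSUMED;
}

/* Simplified __netif_receive_skb(): repeat while a handler asks for another round. */
static void sketch_netif_receive(struct sketch_skb *skb)
{
    while (skb->dev->rx_handler != NULL &&
           skb->dev->rx_handler(skb) == SKETCH_RX_ANOTHER)
        ;
}
```

The bridge therefore sees bond0, not eth0, as the ingress port, which is why only bond0 needs to be a bridge port.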
pv recv: bridge -> backend
br_forward
struct net_bridge_port
● struct net_bridge *br
● struct net_device *dev
● struct list_head list
pv recv: backend -> frontend
xen-netback: kthread vif1.0-q[n]-guest-rx
● Consume Request
● Produce Response
RX Ring Buffer queue [n]
xen-netfront: xennet_rx_interrupt, queue[n].napi->poll = xennet_poll
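The consume-request/produce-response cycle above follows the Xen shared-ring pattern: the frontend posts empty buffers as requests, and the backend consumes a request and answers in the same slot array with a response. A single-process C sketch of the index arithmetic, assuming a power-of-two ring of 8 slots and simplified request/response layouts (the real xen_netif_rx_request and xen_netif_rx_response carry more fields). All names are hypothetical; memory barriers and event-channel notifications are only noted in comments.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u /* power of two, so indices can be masked */
#define SKETCH_IDX(i) ((i) & (SKETCH_RING_SIZE - 1))

/* Simplified request/response layouts. */
struct sketch_rx_req { uint16_t id; uint32_t gref; };
struct sketch_rx_rsp { uint16_t id; int16_t status; }; /* status >= 0: bytes received */

/* Requests and responses share the same slots. */
union sketch_slot { struct sketch_rx_req req; struct sketch_rx_rsp rsp; };

struct sketch_sring {  /* the shared page */
    uint32_t req_prod; /* advanced by the frontend */
    uint32_t rsp_prod; /* advanced by the backend */
    union sketch_slot ring[SKETCH_RING_SIZE];
};

struct sketch_front { struct sketch_sring *s; uint32_t req_prod_pvt, rsp_cons; };
struct sketch_back { struct sketch_sring *s; uint32_t req_cons, rsp_prod_pvt; };

/* Frontend: post an empty, granted page for the backend to fill. */
static int front_post_buffer(struct sketch_front *fe, uint16_t id, uint32_t gref)
{
    if (fe->req_prod_pvt - fe->rsp_cons >= SKETCH_RING_SIZE)
        return -1; /* ring full */
    fe->s->ring[SKETCH_IDX(fe->req_prod_pvt)].req = (struct sketch_rx_req){ id, gref };
    fe->req_prod_pvt++;
    fe->s->req_prod = fe->req_prod_pvt; /* real code: write barrier, then notify */
    return 0;
}

/* Backend: consume one request and produce one response for 'len' bytes. */
static int back_deliver(struct sketch_back *be, int16_t len)
{
    struct sketch_rx_req req;

    if (be->req_cons == be->s->req_prod)
        return -1; /* no buffer posted: the packet has to wait */
    req = be->s->ring[SKETCH_IDX(be->req_cons)].req;
    be->req_cons++;
    assert(be->rsp_prod_pvt < be->req_cons); /* responses never overtake requests */
    /* the real backend copies the packet into req.gref's page here */
    be->s->ring[SKETCH_IDX(be->rsp_prod_pvt)].rsp = (struct sketch_rx_rsp){ req.id, len };
    be->rsp_prod_pvt++;
    be->s->rsp_prod = be->rsp_prod_pvt; /* real code: barrier, then event channel */
    return 0;
}

/* Frontend (xennet_poll in the real driver): consume one response. */
static int front_take_response(struct sketch_front *fe, struct sketch_rx_rsp *out)
{
    if (fe->rsp_cons == fe->s->rsp_prod)
        return -1;
    *out = fe->s->ring[SKETCH_IDX(fe->rsp_cons)].rsp;
    fe->rsp_cons++;
    return 0;
}
```

Because a response is written only into a slot whose request has already been consumed, one array serves both directions without extra locking.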
xen-netfront/xen-netback summary: req/rsp protocol
Slots used for one packet, in order:
1 1st page of linear data (skb->data)
2 extra info (xen_netif_extra_info), only when needed (e.g., GSO)
3 the rest of linear data (skb->data)
4 all skb fragments (skb_shinfo(skb)->frags)
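The slot order above can be made concrete with a small C sketch that lists the slots one packet would use, assuming 4 KiB pages and frags that start within their first page. `build_slot_order()` and the `slot_kind` names are hypothetical. With the numbers from the "pv xmit 2/3" slide (1478 bytes of linear data at offset 2298, GSO, one 1496-byte frag at offset 3384) it yields four slots.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

enum slot_kind {
    SLOT_LINEAR_FIRST, /* 1: first page of skb->data */
    SLOT_EXTRA_GSO,    /* 2: xen_netif_extra_info */
    SLOT_LINEAR_REST,  /* 3: the rest of skb->data */
    SLOT_FRAG          /* 4: skb_shinfo(skb)->frags */
};

struct sketch_frag { unsigned int offset, size; };

/* Pages touched by [offset, offset + len), with offset inside the first page. */
static unsigned int pages_spanned(unsigned int offset, unsigned int len)
{
    return len == 0 ? 0 : (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

static void push(enum slot_kind *out, size_t *n, size_t max, enum slot_kind k)
{
    assert(*n < max);
    out[(*n)++] = k;
}

/* Hypothetical helper: the order in which one packet's slots are queued. */
static size_t build_slot_order(unsigned int lin_off, unsigned int lin_len, int is_gso,
                               const struct sketch_frag *frags, size_t nfrags,
                               enum slot_kind *out, size_t max)
{
    size_t n = 0;
    unsigned int lin_pages = pages_spanned(lin_off, lin_len);

    assert(lin_off < SKETCH_PAGE_SIZE && lin_pages >= 1);
    push(out, &n, max, SLOT_LINEAR_FIRST);
    if (is_gso)
        push(out, &n, max, SLOT_EXTRA_GSO);
    for (unsigned int i = 1; i < lin_pages; i++)
        push(out, &n, max, SLOT_LINEAR_REST);
    for (size_t f = 0; f < nfrags; f++) {
        unsigned int pages = pages_spanned(frags[f].offset, frags[f].size);
        for (unsigned int i = 0; i < pages; i++)
            push(out, &n, max, SLOT_FRAG);
    }
    return n;
}
```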
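xen-netfront/xen-netback summary: irq and napi
Frontend XMIT -> backend RECV: xennet_start_xmit places the packet on the ring buffer (with grant table references); in xen-netback, xenvif_tx_interrupt triggers napi_poll.
Backend XMIT -> frontend RECV: xen-netback wakes up the kthread xenvif_kthread_guest_rx ([vif1.0-guest-rx]), which runs xenvif_rx_action to fill the ring; in xen-netfront, xennet_rx_interrupt triggers napi_poll.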
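Both interrupt handlers above follow the usual NAPI pattern: the handler only masks further events and schedules a poll, and the poll routine processes at most a budget of packets before re-enabling interrupts. A minimal single-threaded C sketch of that pattern; `struct sketch_napi`, `sketch_rx_interrupt()` and `sketch_poll()` are hypothetical, and the sketch leaves out the kernel's napi_schedule()/napi_complete() machinery and the ring re-check real drivers perform after re-enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue NAPI state. */
struct sketch_napi {
    bool irq_enabled;     /* event channel unmasked */
    bool scheduled;       /* a poll is pending */
    unsigned int pending; /* responses waiting on the ring */
};

/* Interrupt handler (cf. xennet_rx_interrupt): mask and defer to poll. */
static void sketch_rx_interrupt(struct sketch_napi *n)
{
    if (!n->scheduled) {
        n->irq_enabled = false;
        n->scheduled = true;
    }
}

/*
 * Poll routine (cf. xennet_poll): handle at most 'budget' packets.
 * Interrupts are re-enabled only when the ring was drained within budget.
 */
static unsigned int sketch_poll(struct sketch_napi *n, unsigned int budget)
{
    unsigned int done = 0;

    assert(n->scheduled);
    while (done < budget && n->pending > 0) {
        n->pending--; /* stands in for passing one skb up the stack */
        done++;
    }
    if (done < budget) {
        n->scheduled = false; /* like napi_complete() */
        n->irq_enabled = true;
    }
    return done;
}
```

With 70 pending packets and an example budget of 64, the first poll handles 64 and stays scheduled; the second handles the remaining 6 and re-enables the interrupt. Under load, this batching avoids taking one interrupt per packet.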
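features: multiqueue (default)
netif_init() (xen-netfront): xennet_max_queues = num_online_cpus()
netback_init() (xen-netback): xenvif_max_queues = num_online_cpus()
talk_to_netback() (xen-netfront): num_queues = min(max_queues, xennet_max_queues), where max_queues is the limit advertised by the backend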
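A sketch of the negotiation on this slide, plus one plausible way of spreading packets over the negotiated queues (flow hash modulo queue count, assumed here). The frontend reads the backend's limit from the backend's xenstore key multi-queue-max-queues; treating a missing key as 0 below is a convention of this sketch only, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * num_queues = min(backend limit, frontend limit).
 * backend_max == 0 stands for "key absent": a backend without multiqueue.
 */
static unsigned int negotiate_num_queues(unsigned int backend_max,
                                         unsigned int frontend_max)
{
    unsigned int n;

    if (backend_max == 0)
        backend_max = 1;
    n = backend_max < frontend_max ? backend_max : frontend_max;
    return n ? n : 1; /* always at least one queue */
}

/* Per-packet queue choice: keep each flow on one queue. */
static unsigned int select_queue(uint32_t flow_hash, unsigned int num_queues)
{
    assert(num_queues > 0);
    return flow_hash % num_queues;
}
```

Hashing per flow keeps the packets of one TCP connection on a single queue, so they are not reordered across rings.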
features: gso/tso offload
Segmentation Offload
GSO (Generic Segmentation Offload): software segmentation
TSO (TCP Segmentation Offload): hardware segmentation
TSO postpones segmentation to as late (as low a level) as possible
TSO info is shared in the ring buffer via the gso member of struct xen_netif_extra_info:
gso->u.gso.size = skb_shinfo(skb)->gso_size;
gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
TSO and other offload features are advertised through xenstore (e.g., feature-gso-tcpv4)
.ndo_fix_features = xennet_fix_features
.ndo_set_features = xennet_set_features
checksum offload
XEN_NETTXF_csum_blank: protocol checksum field is blank in the packet (hardware offload)
XEN_NETTXF_data_validated: packet data has been validated against the protocol checksum
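A sketch of how a transmit path might fill these fields. The flag bits and GSO constants are restated from xen/interface/io/netif.h as this sketch assumes them, so the block compiles on its own. The checksum-state enum is a hypothetical stand-in for skb->ip_summed, and the mapping (blank checksum sets both flags, already-verified data sets data_validated only) is this sketch's assumption of what xen-netfront does.

```c
#include <assert.h>
#include <stdint.h>

/* Restated (assumed) from xen/interface/io/netif.h. */
#define XEN_NETTXF_csum_blank     (1u << 0)
#define XEN_NETTXF_data_validated (1u << 1)
#define XEN_NETTXF_extra_info     (1u << 3)
#define XEN_NETIF_EXTRA_TYPE_GSO  1
#define XEN_NETIF_GSO_TYPE_TCPV4  1
#define XEN_NETIF_GSO_TYPE_TCPV6  2

/* Hypothetical stand-in for the skb->ip_summed states. */
enum sketch_csum { SKETCH_CSUM_NONE, SKETCH_CSUM_UNNECESSARY, SKETCH_CSUM_PARTIAL };

/* Simplified, flattened GSO extra-info slot. */
struct sketch_extra_gso {
    uint8_t type;      /* XEN_NETIF_EXTRA_TYPE_GSO */
    uint8_t flags;
    uint16_t gso_size; /* u.gso.size: MSS, e.g. 1460 */
    uint8_t gso_type;  /* u.gso.type */
};

/* Flags for the first TX request of a packet. */
static uint16_t sketch_tx_flags(enum sketch_csum cs, int has_extra)
{
    unsigned int flags = 0;

    if (cs == SKETCH_CSUM_PARTIAL)          /* checksum field still blank */
        flags |= XEN_NETTXF_csum_blank | XEN_NETTXF_data_validated;
    else if (cs == SKETCH_CSUM_UNNECESSARY) /* already verified */
        flags |= XEN_NETTXF_data_validated;
    if (has_extra)
        flags |= XEN_NETTXF_extra_info;     /* an extra-info slot follows */
    return (uint16_t)flags;
}

static struct sketch_extra_gso sketch_make_gso_extra(uint16_t mss, int is_ipv6)
{
    struct sketch_extra_gso e = { 0 };

    e.type = XEN_NETIF_EXTRA_TYPE_GSO;
    e.gso_size = mss;
    e.gso_type = is_ipv6 ? XEN_NETIF_GSO_TYPE_TCPV6 : XEN_NETIF_GSO_TYPE_TCPV4;
    assert(e.gso_size > 0);
    return e;
}
```

For the GSO packet on the "pv xmit 2/3" slide, the first request would carry csum_blank, data_validated and extra_info, followed by an extra slot with gso_size = 1460.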
features: multicast
XEN_NETIF_EXTRA_TYPE_MCAST_ADD
XEN_NETIF_EXTRA_TYPE_MCAST_DEL
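These two extra-info types carry a multicast MAC address so the peer can maintain a multicast filter. A minimal C sketch of encoding one, assuming the type values 2 and 3 and a 6-byte address field as in xen/interface/io/netif.h (restated here so the block compiles alone); `struct sketch_extra_mcast` and `make_mcast_extra()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Restated (assumed) from xen/interface/io/netif.h. */
#define XEN_NETIF_EXTRA_TYPE_MCAST_ADD 2
#define XEN_NETIF_EXTRA_TYPE_MCAST_DEL 3

/* Simplified, flattened multicast extra-info slot. */
struct sketch_extra_mcast {
    uint8_t type;    /* MCAST_ADD or MCAST_DEL */
    uint8_t flags;
    uint8_t addr[6]; /* u.mcast.addr */
};

/* Group addresses have the least significant bit of the first octet set. */
static int is_multicast_mac(const uint8_t *mac)
{
    return mac[0] & 1;
}

/* Fill one extra-info slot; rejects non-multicast addresses. */
static int make_mcast_extra(struct sketch_extra_mcast *e, const uint8_t mac[6], int add)
{
    assert(e != NULL && mac != NULL);
    if (!is_multicast_mac(mac))
        return -1;
    memset(e, 0, sizeof(*e));
    e->type = add ? XEN_NETIF_EXTRA_TYPE_MCAST_ADD : XEN_NETIF_EXTRA_TYPE_MCAST_DEL;
    memcpy(e->addr, mac, 6);
    return 0;
}
```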
xen-netfront/xen-netback init
Dom0 user: xm create vm.cfg; the toolstack writes the device config to xenstore
Dom0 kernel (xen-netback, vif1.0) and DomU kernel (xen-netfront, eth0) read the device config from xenstore
udev in Dom0: /etc/udev/rules.d/xen-backend.rules runs /etc/xen/scripts/vif-setup, e.g., to set the vif1.0 MTU and `brctl addif` vif1.0 to xenbr0
performance tuning
netfront/netback multiqueue
Limit and pin dom0 CPUs to first NUMA socket
Interrupt affinity to reduce CPU 0 workload
domU vcpu affinity to improve memory access performance
Jumbo frame
NIC offload
TCP Parameter Settings
Interesting works related to paravirtual I/O
Achieving 10 Gb/s Using Safe and Transparent Network Interface Virtualization. VEE 2009
Efficient and Scalable Paravirtual I/O System. USENIX ATC 2013
rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers. ASPLOS 2015
vRIO: Paravirtual remote I/O. ASPLOS 2016
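Networking Emulation with QEMU
Figure: in the DomU, the application runs over the TCP/IP stack and eth0, driven by the e1000 driver against an emulated e1000 NIC. In Dom0 user space, qemu-dm emulates the e1000 and passes packets to the tap device vif1.0-emu, which is attached to the bridge xenbr0. The Xen hypervisor connects the DomU and qemu-dm through an event channel.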
qemu arguments
pvm
/usr/lib/xen/bin/qemu-dm -d 4 -serial pty -domain-name testpv -videoram 4 -k en-us -vnc 0.0.0.0:0 -vncunused -M xenpv
pvhvm
/usr/lib/xen/bin/qemu-dm -d 5 -domain-name oel65.xm -videoram 4 -k en-us -vnc 0.0.0.0:0
-vncunused -vcpus 2 -vcpu_avail 0x3 -boot dc -serial pty -acpi -net none -M xenfv
hvm
/usr/lib/xen/bin/qemu-dm -d 3 -domain-name oel65.xm -videoram 4 -k en-us -vnc 0.0.0.0:0
-vncunused -vcpus 2 -vcpu_avail 0x3 -boot dc -serial pty -acpi
-net nic,vlan=1,macaddr=00:16:e3:cc:64:a9,model=e1000
-net tap,vlan=1,ifname=vif3.0-emu,bridge=xenbr0,script=no,downscript=no
-M xenfv
Take-Home Message