Xen Networking

This document introduces networking in Oracle VM (Xen), including paravirtualized networking using vif, bridge, and bond interfaces. It discusses the xen-netfront/xen-netback framework and how it uses ring buffers, grant tables, and event channels for communication between guest and host. Examples are given of networking scenarios involving bridging and bonding. The differences between paravirtual and emulated networking drivers are also outlined.


Introduction to Oracle VM (Xen) Networking

Dongli Zhang

Oracle Asia Research and Development Centers (Beijing)


dongli.zhang@oracle.com

May 30, 2017

Dongli Zhang (Oracle) Introduction to Oracle VM (Xen) Networking May 30, 2017 1 / 26
Plan

Paravirtualized Networking
    vif, bridge, bond
Emulated Networking

Environment:
    xen: Oracle VM server 3.3.3 with xen-4.3.0-55.el6.47.33.x86_64
    dom0: Unbreakable Enterprise Kernel v4.1.12-89
    domU: Unbreakable Enterprise Kernel v4.1.12-89

Prerequisite Knowledge: http://finallyjustice.github.io/xen-arch.pdf
    xen framework
    PVM vs. HVM vs. PVHVM
    event channel, grant table
    xen admin hands-on experience (preferred)

Paravirtual xen-netfront/xen-netback framework

xen-netfront/xen-netback source code

Unbreakable Enterprise Kernel v4.1.12-89:
    drivers/net/xen-netfront.c
    drivers/net/xen-netback/xenbus.c
    drivers/net/xen-netback/netback.c
    drivers/net/xen-netback/interface.c

Upstream kernel v4.9-rc8:
    drivers/net/xen-netfront.c
    drivers/net/xen-netback/xenbus.c
    drivers/net/xen-netback/netback.c
    drivers/net/xen-netback/interface.c
    drivers/net/xen-netback/rx.c
    drivers/net/xen-netback/hash.c

Paravirtual networking scenario 1/2

    DomU (id = 1)                      DomU (id = 2)
    Application                        Application
    TCP/IP Stack                       TCP/IP Stack
    eth0 (xen-netfront)                eth0 (xen-netfront)
         |                                  |
    via ring buffer, grant table       via ring buffer, grant table
    and event channel                  and event channel
         |                                  |
    vif1.0 (xen-netback)               vif2.0 (xen-netback)
         |                                  |
         +--------- xenbr0 (bridge) --------+
                         Dom0

Paravirtual networking scenario 2/2

The figure shows two identical stacks side by side, each with its own DomU and Dom0:

    DomU:  Application
           TCP/IP Stack
           eth0 (xen-netfront)
             |  via ring buffer, grant table and event channel
    Dom0:  vif1.0 (xen-netback)
           xenbr0 (bridge)
           bond0 (bonding)
           eth0 (e1000e)

The two eth0 (e1000e) interfaces sit next to each other in the middle of the figure.

PV driver vs. PCI driver

                        PCI driver                     PV driver
device abstraction      pci device, pci driver         xenbus device, xenbus driver
device discovery        PCI Tree                       Xenstore
device configuration    PCI Config Space (IO/MMIO)     Xenstore
data flow               DMA Ring Buffer                Memory Ring Buffer
shared memory           N/A or IOMMU                   Grant Table
interrupt               IOAPIC, MSI, MSI-X             Event Channel

pv xmit: front -> backend 1/3

    xen-netfront                             xen-netback
    net_device (eth0)                        net_device (vif1.0)
    netfront_info                            xenvif
        * queues                                 * queues

    netfront_queue 0   <- TX Ring Buffer (queue 0) ->   xenvif_queue 0
    netfront_queue 1   <- TX Ring Buffer (queue 1) ->   xenvif_queue 1
    ... ...
    netfront_queue n   <- TX Ring Buffer (queue n) ->   xenvif_queue n

Each netfront_queue and each xenvif_queue holds:
    queue id
    tx_irq
    rx_irq
    tx ring info
    rx ring info

pv xmit: front -> backend 2/3

xen-netfront (.ndo_start_xmit = xennet_start_xmit) places requests on the ring buffer;
xen-netback (napi->poll = xenvif_poll) picks them up. The entries below sit between the
CONS and PROD pointers of the ring.

Frontend skb:
    linear data (head/data/tail/end): 1478 bytes
    skb_shared_info: gso_size = 1460
    skb_frag_t: offset = 3384, size = 1496, spanning two pages

Ring buffer entries:
    xen_netif_tx_request    gref = 1802   offset = 2298   id = 10   size = 1478
    xen_netif_extra_info    type = XEN_NETIF_EXTRA_TYPE_GSO, gso.size = 1460
    xen_netif_tx_request    gref = 1801   offset = 3384   id = 9    size = 712
    xen_netif_tx_request    gref = 1800   offset = 0      id = 8    size = 784

Backend skb:
    linear data (head/data/tail/end): 128 bytes, GNTTABOP_copy
    skb_shared_info: gso_size = 1460
    skb_frag_t: page, 1350 bytes, GNTTABOP_map
    skb_frag_t: page, 712 bytes, GNTTABOP_map
    skb_frag_t: page, 784 bytes, GNTTABOP_map
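
The sizes in this figure can be checked with a little arithmetic. The sketch below is a toy Python model, not kernel code: it assumes 4096-byte pages, and the helper names `split_at_page_boundary` and `backend_copy_and_map` are hypothetical. The 128-byte copy length and all other numbers are taken from the figure.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages
TX_COPY_LEN = 128     # bytes copied into the backend skb's linear area (from the figure)

def split_at_page_boundary(offset, size, page_size=PAGE_SIZE):
    """Split a buffer that starts at `offset` within a page into per-page
    (offset, size) chunks. Each chunk needs its own tx request because a
    grant reference covers a single page."""
    chunks = []
    while size > 0:
        chunk = min(size, page_size - offset)
        chunks.append((offset, chunk))
        size -= chunk
        offset = 0  # every following chunk starts at the top of the next page
    return chunks

def backend_copy_and_map(first_req_size, copy_len=TX_COPY_LEN):
    """Return (bytes copied with GNTTABOP_copy, bytes left for GNTTABOP_map)
    for the first request of a packet."""
    copied = min(first_req_size, copy_len)
    return copied, first_req_size - copied

# The 1496-byte fragment at offset 3384 crosses a page boundary.
frag_requests = split_at_page_boundary(offset=3384, size=1496)
# The 1478-byte linear area is partly copied and partly mapped.
copied, mapped = backend_copy_and_map(1478)
print(frag_requests, copied, mapped)
```

This reproduces the 712-byte and 784-byte requests and the 128-byte copy plus 1350-byte map shown for the backend skb.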

pv xmit: front -> backend 3/3

Backend skb:
    linear data (head/data/tail/end): 128 bytes, GNTTABOP_copy
    skb_shared_info:
        gso_size = 1460
        void *destructor_arg -> struct ubuf_info
    skb_frag_t: page, 1350 bytes, GNTTABOP_map
    skb_frag_t: page, 712 bytes, GNTTABOP_map
    skb_frag_t: page, 784 bytes, GNTTABOP_map

The frag pages are mapped from the frontend through grant table references.

struct ubuf_info
    callback() = xenvif_zerocopy_callback()
    Called when the skb (with frags) is freed, to unmap the pages mapped from the frontend.

pv xmit: backend -> bridge -> bond

Call flow:
    netif_receive_skb(skb)               // skb->dev = vif1.0
    vif1.0 rx_handler = br_handle_frame  // check bridging table
    br_forward                           // set skb->dev = bond0
    call bond0.ndo_start_xmit            // = bond_start_xmit

Data structures:
    struct net_device (vif1.0)
        rx_handler_func_t *rx_handler
        void *rx_handler_data
    struct net_device (xenbr0)
    struct net_bridge
        struct list_head port_list
        struct hlist_head hash[BR_HASH_SIZE]
    struct net_bridge_port
        struct net_bridge *br
        struct net_device *dev
        struct list_head list
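
The "check bridging table" step can be pictured as a lookup from destination MAC address to bridge port. The sketch below is a toy Python model, not the kernel's br_handle_frame or br_forward: the `Bridge` class, its method, and the MAC addresses are hypothetical, and a real bridge also ages out entries and handles VLANs, STP and multicast.

```python
class Bridge:
    """Toy learning bridge: maps MAC addresses to port names."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # forwarding table: MAC address -> port name

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Learn the source address, then return the list of output ports."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress side; nothing to forward
        return [out_port]

xenbr0 = Bridge(["vif1.0", "bond0"])
guest_mac = "00:16:3e:00:00:01"   # hypothetical guest address
peer_mac = "52:54:00:00:00:02"    # hypothetical remote address

# A frame from the guest to a not-yet-learned MAC is flooded out of bond0.
first = xenbr0.handle_frame("vif1.0", guest_mac, peer_mac)
# The reply teaches the bridge where the peer lives and goes to vif1.0.
reply = xenbr0.handle_frame("bond0", peer_mac, guest_mac)
# The guest's next frame is forwarded to bond0 only, via the table.
second = xenbr0.handle_frame("vif1.0", guest_mac, peer_mac)
print(first, reply, second)
```

With only two ports, flooding and a table hit pick the same port, but the table is what keeps traffic off unrelated vifs once more guests are attached.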

pv xmit: bond -> physical NIC

Call flow:
    bond_start_xmit                      // skb->dev = bond0
    case BOND_MODE_ACTIVE_BACKUP:
        call bond_dev_queue_xmit
    bond_dev_queue_xmit
        set skb->dev = slave (eth0, e1000e)  // e1000e is from bonding.curr_active_slave
    call e1000e.ndo_start_xmit           // = e1000_xmit_frame

Data structures:
    struct net_device (bond0)
    struct bonding
        struct slave *curr_active_slave
    struct slave
        struct bonding *bond
        struct net_device *dev
    struct net_device (e1000e eth0)
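
The active-backup case above boils down to "send through whichever slave is currently active". The sketch below is a toy Python model rather than the bonding driver: the `Bond` and `Slave` classes and the `fail_over` method are hypothetical, and `eth1` is an assumed second NIC that does not appear on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Slave:
    name: str
    link_up: bool = True
    sent: list = field(default_factory=list)  # frames handed to this NIC

class Bond:
    """Toy active-backup bond: all traffic leaves through one active slave."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self.curr_active_slave = self.slaves[0]

    def fail_over(self):
        """Pick the first slave whose link is up as the new active slave."""
        self.curr_active_slave = next(
            (s for s in self.slaves if s.link_up), None)

    def start_xmit(self, frame):
        """Send a frame through the active slave. Returns the name of the
        slave used, or None if the frame was dropped."""
        slave = self.curr_active_slave
        if slave is None or not slave.link_up:
            return None
        slave.sent.append(frame)
        return slave.name

eth0, eth1 = Slave("eth0"), Slave("eth1")   # eth1 is an assumed backup NIC
bond0 = Bond([eth0, eth1])
used_before = bond0.start_xmit(b"frame-1")
eth0.link_up = False                        # simulate a link failure
bond0.fail_over()
used_after = bond0.start_xmit(b"frame-2")
print(used_before, used_after)
```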

pv recv: physical NIC -> bond -> bridge

Call flow:
    e1000_receive_skb
    napi_gro_receive
    __netif_receive_skb(skb)
    bond_handle_frame                    // obtain slave from rx_handler_data
    skb->dev = master (bond0)
    netif_receive_skb(skb)
    bridging

Data structures:
    struct net_device (e1000e eth0)
        rx_handler_func_t *rx_handler
        void *rx_handler_data
    struct slave
        struct net_device *dev
        struct bonding *bond
    struct net_device (bond0)
    struct bonding
        struct slave *curr_active_slave

pv recv: bridge -> backend

Call flow:
    netif_receive_skb(skb)               // skb->dev = bond0
    bond0 rx_handler = br_handle_frame   // check bridge table
    br_forward                           // set skb->dev = vif1.0
    call vif1.0.ndo_start_xmit           // = xenvif_start_xmit

Data structures:
    struct net_device (bond0)
        rx_handler_func_t *rx_handler
        void *rx_handler_data
    struct bonding
        struct slave *curr_active_slave
    struct net_device (xenbr0)
    struct net_bridge
        struct list_head port_list
        struct hlist_head hash[BR_HASH_SIZE]
    struct net_bridge_port
        struct net_bridge *br
        struct net_device *dev
        struct list_head list

pv recv: backend -> frontend

xen-netback: net_device (vif1.0), .ndo_start_xmit = xenvif_start_xmit
    I. Select a queue for the skb to be sent
    II. Wake up the queue's kthread
    The kthread consumes requests and produces responses.

xen-netfront:
    Pre-produces requests and consumes responses.

Per queue (xenvif -> * queues):
    backend kthread          ring                        frontend irq           frontend napi
    vif1.0-q0-guest-rx       RX Ring Buffer, queue 0     xennet_rx_interrupt    queue[0].napi->poll = xennet_poll
    vif1.0-q1-guest-rx       RX Ring Buffer, queue 1     xennet_rx_interrupt    queue[1].napi->poll = xennet_poll
    ... ...
    vif1.0-q[n]-guest-rx     RX Ring Buffer, queue [n]   xennet_rx_interrupt    queue[n].napi->poll = xennet_poll
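
The split of roles on the RX ring (frontend pre-produces requests, backend consumes them and produces responses, frontend consumes responses) can be sketched with four counters. This is a toy Python model under stated assumptions: the `RxRing` class and its method names are hypothetical, the ring has 8 slots, and the real ring lives in a granted shared page with memory barriers and event channel notifications that are left out here.

```python
class RxRing:
    """Toy request/response ring shared by a frontend and a backend."""

    def __init__(self, size=8):            # assumption: small 8-slot ring
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the frontend when posting requests
        self.req_cons = 0   # advanced by the backend when consuming requests
        self.rsp_prod = 0   # advanced by the backend when producing responses
        self.rsp_cons = 0   # advanced by the frontend when consuming responses

    def frontend_post_request(self, buf_id):
        """Frontend pre-produces a request naming an empty receive buffer."""
        if self.req_prod - self.rsp_cons >= self.size:
            return False                    # ring full
        self.slots[self.req_prod % self.size] = ("req", buf_id)
        self.req_prod += 1
        return True

    def backend_deliver(self, length):
        """Backend consumes one request and overwrites it with a response."""
        if self.req_cons == self.req_prod:
            return False                    # no posted buffer, packet must wait
        _, buf_id = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        self.slots[self.rsp_prod % self.size] = ("rsp", buf_id, length)
        self.rsp_prod += 1
        return True

    def frontend_poll(self):
        """Frontend consumes all pending responses (the napi poll step)."""
        done = []
        while self.rsp_cons != self.rsp_prod:
            _, buf_id, length = self.slots[self.rsp_cons % self.size]
            done.append((buf_id, length))
            self.rsp_cons += 1
        return done

ring = RxRing()
for buf_id in range(3):          # frontend posts three empty buffers up front
    ring.frontend_post_request(buf_id)
ring.backend_deliver(1500)       # backend fills two of them
ring.backend_deliver(60)
received = ring.frontend_poll()
print(received)
```

Because the frontend posts buffers ahead of time, the backend never allocates guest memory itself; it can only deliver when a request is waiting.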

xen-netfront/xen-netback summary: req/rsp protocol

netfront to netback (produce req) and netback to netfront (produce rsp) use the same order:

    1 1st page of linear data (skb->data)
    2 extra info (xen_netif_extra_info)
    3 the rest of linear data (skb->data)
    4 all skb fragments (skb_shinfo(skb)->frags)
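
The ordering above can be written down as a short function. The sketch below is a toy Python model, not the driver code: `packet_slots` and the tuple layout are hypothetical, pages are assumed to be 4096 bytes, and each fragment is assumed to fit within a single page already. The example values are the ones from the xmit figure earlier in the deck.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def packet_slots(linear_offset, linear_len, frags, gso_size=None):
    """Return ring entries for one packet in the order listed on the slide.

    linear_offset: offset of skb->data within its page
    linear_len:    length of the linear data
    frags:         list of (offset, size) pairs, each within one page
    gso_size:      if set, an extra-info entry is emitted
    """
    slots = []
    # 1. first page of linear data
    first = min(linear_len, PAGE_SIZE - linear_offset)
    slots.append(("data", linear_offset, first))
    # 2. extra info, only when there is something to describe (here: GSO)
    if gso_size is not None:
        slots.append(("extra_gso", gso_size))
    # 3. the rest of the linear data, one entry per page
    remaining = linear_len - first
    while remaining > 0:
        chunk = min(remaining, PAGE_SIZE)
        slots.append(("data", 0, chunk))
        remaining -= chunk
    # 4. all skb fragments
    for offset, size in frags:
        slots.append(("frag", offset, size))
    return slots

slots = packet_slots(2298, 1478, [(3384, 712), (0, 784)], gso_size=1460)
for slot in slots:
    print(slot)
```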

xen-netfront/xen-netback summary: irq and napi

DomU and Dom0 share data through the ring buffer and the grant table, and notify each other
through the event channel (Xen Hypervisor).

XMIT (DomU to Dom0):
    DomU: xennet_start_xmit
    Dom0: xenvif_tx_interrupt -> schedule queue->napi -> NET_RX_SOFTIRQ -> net_rx_action
          -> napi_poll -> xenvif_poll -> xenvif_tx_action
    DomU: xennet_tx_interrupt -> clean ring buffer via xennet_tx_buf_gc

RECV (Dom0 to DomU):
    Dom0: xenvif_start_xmit (select queue) -> wake up kthread [vif1.0-guest-rx]
          -> xenvif_kthread_guest_rx -> xenvif_rx_action
    Dom0: xenvif_rx_interrupt -> wake up kthread
    DomU: xennet_rx_interrupt -> schedule queue->napi -> NET_RX_SOFTIRQ -> net_rx_action
          -> napi_poll -> xennet_poll

features: multiqueue (default)

Timeline (DomU-netfront, xenstore, Dom0-netback):

1. Module init
    DomU: netif_init()
        xennet_max_queues = num_online_cpus()
    Dom0: netback_init()
        xenvif_max_queues = num_online_cpus()

2. Dom0: netback_probe()
    write xenvif_max_queues to xenstore "multi-queue-max-queues"
    xenstore: backend: multi-queue-max-queues = 8

3. DomU: talk_to_netback()
    read "multi-queue-max-queues" from xenstore to max_queues
    num_queues = min(max_queues, xennet_max_queues)
    write num_queues to xenstore "multi-queue-num-queues"
    xenstore: frontend: multi-queue-num-queues = 4
    init ring, evtchn/irq for each queue at frontend

4. Dom0: connect()
    read "multi-queue-num-queues" from xenstore to requested_num_queues
    init ring, evtchn/irq, netback kthread for each queue at backend
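
The negotiation amounts to a min() over two limits exchanged through xenstore. The sketch below is a toy Python model under stated assumptions: xenstore is a plain dict, the "backend/" and "frontend/" key prefixes are simplified stand-ins for the real per-device xenstore paths, and the function names only echo the kernel functions named on the slide.

```python
def backend_probe(xenstore, backend_cpus):
    """Backend advertises how many queues it is willing to serve."""
    xenstore["backend/multi-queue-max-queues"] = backend_cpus

def frontend_talk_to_netback(xenstore, frontend_cpus):
    """Frontend picks min(backend limit, own limit) and publishes it."""
    # If the backend wrote no limit, fall back to a single queue.
    max_queues = xenstore.get("backend/multi-queue-max-queues", 1)
    num_queues = min(max_queues, frontend_cpus)
    xenstore["frontend/multi-queue-num-queues"] = num_queues
    return num_queues

def backend_connect(xenstore):
    """Backend reads the frontend's choice and sets up that many queues."""
    requested_num_queues = xenstore.get("frontend/multi-queue-num-queues", 1)
    return [f"queue-{i}" for i in range(requested_num_queues)]

xenstore = {}
backend_probe(xenstore, backend_cpus=8)               # Dom0 with 8 online CPUs
num_queues = frontend_talk_to_netback(xenstore, frontend_cpus=4)  # DomU with 4
queues = backend_connect(xenstore)
print(num_queues, queues)
```

With 8 CPUs in Dom0 and 4 in DomU this yields the values shown in the xenstore column: a backend maximum of 8 and a frontend choice of 4.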

features: gso/tso offload

Segmentation Offload
    GSO (Generic Segmentation Offload): software segmentation
    TSO (TCP Segmentation Offload): hardware segmentation
    TSO postpones segmentation to as late (as low a level) as possible
    TSO info is shared via "struct xen_netif_extra_info" (gso) in the ring buffer
        gso->u.gso.size = skb_shinfo(skb)->gso_size;
        gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
    TSO and other offload features are stored in xenstore (e.g., feature-gso-tcpv4)
        .ndo_fix_features = xennet_fix_features
        .ndo_set_features = xennet_set_features
checksum offload
    XEN_NETTXF_csum_blank: Protocol checksum field is blank in the packet (hardware offload)
    XEN_NETTXF_data_validated: Packet data has been validated against protocol checksum
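
What the gso size means in practice: whoever finally segments the packet cuts the payload into pieces of at most that many bytes. The sketch below is a toy Python illustration, not kernel code: `gso_segment` is a hypothetical helper, the 4000-byte payload is an assumed example, and header replication and checksum fix-up are left out.

```python
def gso_segment(payload, gso_size):
    """Cut a payload into segments of at most gso_size bytes."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    return [payload[i:i + gso_size] for i in range(0, len(payload), gso_size)]

payload = bytes(4000)                   # assumption: 4000 bytes of TCP payload
segments = gso_segment(payload, 1460)   # gso size from the earlier xmit figure
sizes = [len(s) for s in segments]
print(sizes)
```

Passing the large packet plus its gso size across the ring, instead of pre-cut segments, is what lets segmentation happen as late as possible.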

features: multicast

Negotiation through xenstore:
    DomU: "Please support multicast"
    Dom0: "I will support multicast"
    xenstore: request-multicast-control = 1

Address registration:
    DomU sends xen_netif_extra_info of type
        XEN_NETIF_EXTRA_TYPE_MCAST_ADD
        XEN_NETIF_EXTRA_TYPE_MCAST_DEL
    Dom0 keeps the addresses in struct list_head fe_mcast_addr

Filtering in xenvif_start_xmit():
    check registered multicast address and drop the SKB if not registered
        if (!xenvif_mcast_match(vif, eth->h_dest))
            goto drop;
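
The add/delete/match cycle can be sketched as a set of registered addresses. This is a toy Python model, not the netback implementation: the `MulticastFilter` class and its methods are hypothetical, and the decision to always pass unicast and broadcast frames is an assumption of this sketch.

```python
class MulticastFilter:
    """Toy per-vif multicast filter driven by MCAST_ADD / MCAST_DEL requests."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self):
        self.fe_mcast_addr = set()   # addresses registered by the frontend

    def mcast_add(self, mac):
        self.fe_mcast_addr.add(mac.lower())

    def mcast_del(self, mac):
        self.fe_mcast_addr.discard(mac.lower())

    def should_deliver(self, dst_mac):
        """Return True if a frame for dst_mac should be passed to the guest."""
        dst = dst_mac.lower()
        if dst == self.BROADCAST:
            return True              # assumption: broadcast is never filtered
        first_octet = int(dst.split(":")[0], 16)
        if first_octet & 1 == 0:
            return True              # unicast: group bit is clear
        return dst in self.fe_mcast_addr   # multicast: must be registered

vif = MulticastFilter()
vif.mcast_add("01:00:5e:00:00:fb")                    # guest joins one group
joined = vif.should_deliver("01:00:5e:00:00:fb")      # registered group
other = vif.should_deliver("01:00:5e:00:00:01")       # unregistered group
vif.mcast_del("01:00:5e:00:00:fb")                    # guest leaves the group
after_leave = vif.should_deliver("01:00:5e:00:00:fb")
print(joined, other, after_leave)
```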

xen-netfront/xen-netback init

Dom0 user:
    xm create vm.cfg will write device config to xenstore
    udev: /etc/udev/rules.d/xen-backend.rules
        /etc/xen/scripts/vif-setup
        e.g., set vif1.0 mtu and `brctl addif` vif1.0 to xenbr0

DomU (eth0):
    1. Init net_device and netfront_info (queues)
    2. Allocate and write evtchn, gnttab refs to xenstore

Dom0 kernel (vif1.0):
    Init net_device and vif (xenvif).
    Read and bind (map) evtchn and gnttab refs.

performance tuning

netfront/netback multiqueue
Limit and pin dom0 CPUs to first NUMA socket
Interrupt affinity to reduce CPU 0 workload
domU vcpu affinity to improve memory access performance
Jumbo frame
NIC offload
TCP Parameter Settings

Interesting works related to paravirtual I/O

Achieving 10 Gb/s Using Safe and Transparent Network Interface Virtualization. VEE 2009
Efficient and Scalable Paravirtual I/O System. USENIX ATC 2013
rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers. ASPLOS 2015
vRIO: Paravirtual remote I/O. ASPLOS 2016

Networking Emulation with QEMU

    DomU:
        Application
        TCP/IP Stack
        eth0 (e1000), on top of the emulated e1000
    Dom0:
        qemu-dm (userspace): emulated e1000, tap
        vif1.0-emu (tap device)
        xenbr0 (bridge)
    Xen Hypervisor:
        event channel

qemu arguments

pvm
    /usr/lib/xen/bin/qemu-dm -d 4 -serial pty -domain-name testpv -videoram 4 -k en-us \
        -vnc 0.0.0.0:0 -vncunused -M xenpv

pvhvm
    /usr/lib/xen/bin/qemu-dm -d 5 -domain-name oel65.xm -videoram 4 -k en-us -vnc 0.0.0.0:0 \
        -vncunused -vcpus 2 -vcpu_avail 0x3 -boot dc -serial pty -acpi -net none -M xenfv

hvm
    /usr/lib/xen/bin/qemu-dm -d 3 -domain-name oel65.xm -videoram 4 -k en-us -vnc 0.0.0.0:0 \
        -vncunused -vcpus 2 -vcpu_avail 0x3 -boot dc -serial pty -acpi \
        -net nic,vlan=1,macaddr=00:16:e3:cc:64:a9,model=e1000 \
        -net tap,vlan=1,ifname=vif3.0-emu,bridge=xenbr0,script=no,downscript=no \
        -M xenfv

Take-Home Message

xen paravirtual networking workflow


xen paravirtual networking framework
xen paravirtual networking init, protocol, features
xen paravirtual networking performance
xen emulated networking
