How Kubernetes Networking Works – Under the Hood
By Tobias Gurtzick
Kubernetes networking is a complex topic, perhaps even the most complicated one. This post will give you insight into how Kubernetes actually creates networks and how to set up a network for a Kubernetes cluster yourself.
This article doesn’t cover how to set up a Kubernetes cluster itself; you could use minikube to quickly spin up a test cluster. All the examples in this post use a Rancher 2.0 cluster, but they apply everywhere else too. Even if you are planning to use one of the managed public cloud Kubernetes services such as EKS, AKS, GKE or IBM Cloud, you should understand how Kubernetes networking works.
For a basic introduction to Kubernetes networking, please see the post How Kubernetes Networking Works – The Basics (https://neuvector.com/network-security/kubernetes-networking/).
With this, Kubernetes is ready to go from a networking perspective. To test that everything is working, we create two pods.
This will create two pods, which are already utilizing our driver. Looking at one of the containers, we find the network with the IP range 10.42.0.0/24 attached.
A quick ping test from the other pod shows us that the network is working properly.
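The original commands aren’t reproduced here; a minimal sketch of an equivalent test, assuming two busybox pods named pod1 and pod2, looks like this:

kubectl run pod1 --image=busybox --restart=Never -- sleep 3600
kubectl run pod2 --image=busybox --restart=Never -- sleep 3600
# note the pod IPs in the 10.42.0.0/24 range
kubectl get pods -o wide
# ping pod1's IP from pod2 (substitute the address shown by the previous command)
kubectl exec pod2 -- ping -c 3 <pod1-ip>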
It’s also important to mention that Kubernetes does not use docker0, which is Docker’s default bridge, but creates its own bridge named cbr0, a name chosen to differentiate it from the docker0 bridge.
Any solution on L2 or L3 makes a pod addressable on the network. This means a pod is reachable not just within the Docker network, but directly addressable from outside the Docker network, via public or private IP addresses.
However, communication on L2 is cumbersome and your experience will vary depending on your network equipment. Some switches need some time to register your MAC address before it actually becomes reachable to the rest of the network. You could also run into trouble because the neighbor (ARP) tables of the other hosts in the system still run on an outdated cache, and you always need to run with DHCP instead of host-local to avoid conflicting IPs between hosts. The MAC address and neighbor table problems are the reasons solutions such as ipvlan exist. These do not register new MAC addresses but route traffic over the existing one, but they also have their own issues.
The conclusion, and my recommendation, is that for most users an overlay network is the default solution and should be adequate. However, as soon as workloads get more advanced with more specific requirements, you will want to consider other solutions such as BGP and direct routing instead of overlay networks.
Apart from that local communication, communication between pods looks pretty much the same as container-to-container communication in Docker networks.
It gets a bit longer when we communicate with a different pod. The traffic passes over to cbr0, which notices that we are communicating on the same subnet and therefore forwards the traffic directly to its destination pod.
This gets a bit more complicated when we leave the node. cbr0 now passes the traffic to the next node, whose configuration is managed by the CNI. These are basically just routes for the pod subnets, with the destination host as the gateway. The destination host can then proceed on its own cbr0 and forward the traffic to the destination pod, as sketched below.
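As a rough sketch (all addresses here are made up for illustration), the routes a CNI maintains on a node look like this: the local pod subnet points at the node’s own bridge, and every remote pod subnet points at the host IP of the node that owns it.

# local pod subnet, handled by this node's bridge
ip route add 10.42.0.0/24 dev cbr0
# remote pod subnets, routed via the owning nodes' host IPs
ip route add 10.42.1.0/24 via 172.16.0.12
ip route add 10.42.2.0/24 via 172.16.0.13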
You can find the maintained reference plugins, which include most of the important ones, in the official containernetworking repo: https://github.com/containernetworking/plugins.
A CNI at spec version 0.3.1 is basically not very complicated. It consists of three required operations, ADD, DEL and VERSION, which pretty much do what they sound like for managing the network. For a more detailed description of what each operation gets passed and is expected to return, you can read the spec at https://github.com/containernetworking/cni/blob/master/SPEC.md.
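To make this concrete, here is a minimal network configuration of the kind such a plugin consumes, using the reference bridge plugin with host-local IPAM; the name, bridge and subnet values are only illustrative.

{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cbr0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.42.0.0/24",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}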
Flannel
Flannel is a simple network and the easiest setup option for an overlay network. For most users it is, next to Canal, the default network to choose: it is simple to deploy and even provides some native networking capabilities such as host gateways. However, Flannel has some limitations, including a lack of support for network security policies and no capability to run multiple networks.
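For illustration, Flannel’s behavior is driven by a small net-conf.json; switching the backend type from vxlan to host-gw is what enables the host-gateway mode mentioned above (the network CIDR here is an assumption).

{
  "Network": "10.42.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}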
Calico
Calico takes a different approach than Flannel. It is technically not an overlay network, but rather a system to configure routing between all the systems involved. To accomplish this, Calico leverages the Border Gateway Protocol (BGP), which is used for the Internet, in a process named peering, where every peering party exchanges traffic and participates in the BGP network. BGP itself propagates routes under an ASN, with the difference that these are private and there isn’t a need to register them with RIPE.
However, in some scenarios Calico also works with an overlay network, in this case IP-in-IP, which is used whenever a node sits on a different network, in order to enable the exchange of traffic between those two hosts.
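As a sketch of how this selective IP-in-IP mode is configured in Calico v3, an IPPool can set ipipMode to CrossSubnet so the overlay is only used between nodes on different subnets; the pool name and CIDR below are assumptions.

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.42.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true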
Canal
Canal is based on Flannel, but adds some Calico components such as Felix, the host agent, which allows you to utilize network security policies. These are normally missing in Flannel, so Canal basically extends Flannel with the addition of security policies.
Multus
Multus is a CNI that is not actually a network interface itself. It orchestrates multiple interfaces; without an actual network configured, pods couldn’t communicate with multus alone. So multus is an enabler for multi-device and multi-subnet networks. In essence, multus calls the real CNI plugins on behalf of the kubelet and communicates the results back to the kubelet.
Kube-Router
Also worth mentioning is kube-router, which, like Calico, works with BGP and routing instead of an overlay network and utilizes IP-in-IP where needed. It also utilizes IPVS for load balancing.
One of Multus’ limitations is that port mapping does not work, which is documented and tracked in the following issue on GitHub: https://github.com/intel/multus-cni/issues/29. This is going to be fixed in the future, but currently, should you need to map ports, either via nodePort or hostPort configs, you won’t be able to do so due to the referenced bug.
Setting Up Multus
The first thing we need to do is to set up multus itself. This is pretty much the config from the examples in the multus repository, but with some important adjustments. See the link below to the sample.
The first thing was to adjust the config map. Because we plan to have a default network with Flannel, we define the configuration in the delegates array of the multus config. Some important settings here are “masterplugin”: true and the definition of the bridge for the Flannel network itself; you’ll see why we need these in the next steps. Other than that, there isn’t much else to adjust except adding the mount definition for the config map, because for some reason this was not done in the upstream example.
Another important thing about this config map is that everything defined in it becomes a default network that is automatically attached to the containers without further specification. Also, should you want to edit this file, please note that you either need to kill and rerun the containers of the daemonset or reboot your node for the changes to take effect.
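For orientation, the delegates section of the multus config map ends up looking roughly like the following. This is only a sketch, not the full sample, and the network name and bridge name are assumptions based on the later steps.

{
  "name": "multus-demo-network",
  "type": "multus",
  "delegates": [
    {
      "type": "flannel",
      "masterplugin": true,
      "delegate": {
        "isDefaultGateway": true,
        "bridge": "kbr0"
      }
    }
  ]
}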
For the primary Flannel network, things are pretty easy. We can take the example from the multus repository and just deploy it. The only adjustments made here are the CNI mount, adjusted tolerations, and some changes to the CNI settings of Flannel, for example adding “forceAddress”: true and removing “hairpinMode”: true.
This was tested on a cluster that was set up with RKE, but it should work on other clusters as well, as long as you mount the CNIs from your host correctly, in our case /opt/cni/bin.
The multus team themselves did not change much; they only commented out the initcontainer config, which you could just safely delete. This is because multus will set up its delegates and act as the primary “CNI.”
With these samples deployed we are pretty much done, and our pods should now be assigned an IP address. Let’s test it:
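The manifest from the original isn’t reproduced here; one minimal way to create an equivalent pod, assuming a busybox image, is:

kubectl run overlay1pod --image=busybox --restart=Never -- sleep 3600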
pod/overlay1pod created
As you can see, we have successfully deployed a pod, and it was assigned the IP 10.42.2.43 on the eth0 interface, which is the default interface. All extra interfaces will appear as netX, i.e. net1.
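You can verify the interfaces from inside the pod (this assumes the image ships the ip tool, as busybox does):

kubectl exec overlay1pod -- ip addr show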
Next we can actually deploy the secondary network. The setup of this pretty much looks the same as the primary one, but with some key
differences. The most obvious is that we changed the subnet, but we also need to change a few other things.
First of all we need to set a different dataDir, i.e. /var/lib/cni/flannel2, and a different subnetFile, i.e. /run/flannel/flannel2.env. This is needed because they are otherwise already occupied and used by our primary network. Next we need to adjust the bridge, because kbr0 is already used by the primary Flannel overlay network.
The remaining configuration includes changing it to actually target the etcd server which we configured before. In the primary network this was done by connecting to the Kubernetes API directly, via the “--kube-subnet-mgr” flag, but we can’t do that here because we also need to modify the prefix from which we want to read. The sample therefore contains the settings for our cluster’s etcd connection and the etcd prefix, and specifies the subnet file again. Last but not least we add a network definition. The rest of the sample is identical to our main network’s config.
The last thing we need to decide on is the name for the network; we name ours flannel.2.
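Put together, the CNI configuration for the secondary network looks roughly like this sketch; only subnetFile, dataDir and the bridge differ from the primary network, and the bridge name kbr1 is an assumption.

{
  "name": "flannel.2",
  "type": "flannel",
  "subnetFile": "/run/flannel/flannel2.env",
  "dataDir": "/var/lib/cni/flannel2",
  "delegate": {
    "bridge": "kbr1",
    "isDefaultGateway": false
  }
}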
Now we’re finally ready to spawn our first pod with our secondary network.
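Assuming the CRD-based network selection used by current Multus releases (older versions used a different annotation key), attaching the secondary network to a pod looks roughly like this:

apiVersion: v1
kind: Pod
metadata:
  name: secondary-net-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: flannel.2
spec:
  containers:
  - args:
    - sleep
    - infinity
    image: ubuntu:xenial
    name: test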
This should create your new pod with your secondary network, and we should see those attached as additional network interfaces now.
Success: we see the secondary network, which was assigned 10.5.22.4 as its IP address.
Troubleshooting
Should this example not work for you, you will need to look into the logs of your kubelet.
One common issue is missing CNIs. In my first tests I was missing the bridge CNI, since it was not deployed by RKE; fixing this is as easy as downloading the plugins from the containernetworking repo.
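A minimal fix looks like the following; the release version and archive name are assumptions, so check the releases page for the current ones.

# download the reference CNI plugins (including bridge) into the host's CNI directory
curl -L -o /tmp/cni-plugins.tgz \
  https://github.com/containernetworking/plugins/releases/download/v0.7.5/cni-plugins-amd64-v0.7.5.tgz
tar -xzf /tmp/cni-plugins.tgz -C /opt/cni/bin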
Configuring an ingress is quite easy. The following example shows an ingress linked to services: the rules section contains the basic configuration pointing to the backend services, the tls section is where your SSL certificate is linked if you employ SSL (you need this certificate installed beforehand), and the annotation adjusts one of the detailed settings of the nginx ingress. The available annotations are documented here: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 12m
  name: my-ingress
  namespace: default
spec:
  rules:
  - host: my.domain.com
    http:
      paths:
      - backend:
          serviceName: api
          servicePort: 5000
        path: /api
      - backend:
          serviceName: websockets
          servicePort: 5000
        path: /ws
  tls:
  - hosts:
    - my.domain.com
Using {host,node}Ports
A {host,node}Port is basically the equivalent of docker -p port:port, especially the hostPort. The nodePort, unlike the hostPort, is available on all nodes instead of only on the nodes running the pod. For a nodePort, Kubernetes first creates a clusterIP and then load balances traffic over this port. The nodePort itself is just an iptables rule to forward traffic on the port to the clusterIP.
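If you want to see this for yourself, assuming kube-proxy runs in its default iptables mode, you can inspect the NAT chains it maintains on any node:

# rules created for NodePort services
sudo iptables -t nat -L KUBE-NODEPORTS -n
# per-service rules that forward traffic on to the clusterIP
sudo iptables -t nat -L KUBE-SERVICES -n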
A nodePort is rarely used except for quick testing; it is only really needed in production if you want every node to expose the port, e.g. for monitoring. Most of the time you will want to use a layer 4 load balancer instead. A hostPort is mostly used for testing purposes, or very rarely to stick a pod to a specific node and publish it under a specific IP address pointing to that node.
To give you an example, a hostPort is defined in the container spec, like the following:
spec:
  containers:
  - args:
    - sleep
    - infinity
    image: ubuntu:xenial
    imagePullPolicy: Always
    name: test
    ports:
    - containerPort: 1111
      hostPort: 1111
      name: 1111tcp11110
      protocol: TCP
However, a nodePort is defined as a service and not in the container spec. An example of a nodePort service would look like this:
apiVersion: v1
kind: Service
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 7777
    protocol: TCP
    targetPort: 80
  selector:
    app: MyApp
  sessionAffinity: None
  type: NodePort
What is a ClusterIP?
A clusterIP is an internally reachable IP for the Kubernetes cluster and all services within it. This IP itself load balances traffic to all pods that match its selector rules. A clusterIP is also generated automatically in a lot of cases, for example when specifying a service of type: LoadBalancer or setting up a nodePort. The reason behind this is that all the load balancing happens through the clusterIP.
The clusterIP as a concept was created to solve the problem of multiple addressable hosts and the effective renewal of those. It is much easier to have a single IP that does not change than to refetch data via service discovery all the time for all kinds of services. Although there are times when it is appropriate to use service discovery instead, if you want explicit control, for example in some microservice environments.
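For completeness, a plain clusterIP service is just a service of type ClusterIP; the names, labels and ports below are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  selector:
    app: MyApp
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 5000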
Common Troubleshooting
If you use a public cloud environment and set up the hosts manually, your cluster might be missing firewall rules. For example, in AWS you will want to adjust your security groups to allow inter-cluster communication as well as ingress and egress. Otherwise this will lead to an inoperable cluster. Make sure you always open the required ports between master and worker nodes. The same goes for ports that you open on the hosts directly, i.e. hostPort or nodePort.
Network Security
Now that we have set up all of our Kubernetes networking, we also need to make sure that we have some security in place. A simple rule in security is to give applications the least access they need. This ensures, to a certain degree, that even in the case of a security breach attackers will have a hard time digging deeper into your network. While it does not completely secure you, it makes attacks harder and more time consuming. This is important because it gives you more time to react and prevent further damage, and can often prevent the damage altogether. A prominent example is the combination of exploits or vulnerabilities in different applications, which an attacker can only chain together if there is attack surface reachable from multiple vectors (e.g. network, container, host).
The options here are to either utilize network policies, or to look to third-party security solutions for container network security. With network policy we have a solid base to ensure that traffic only flows where it should flow, but this only works for a few CNIs. They work, for example, with Calico and Kube-router. Flannel does not support them, but luckily you can move to Canal, which makes the network policy feature from Calico usable with Flannel. For most other CNIs there is no support, and no support planned.
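To illustrate what such a rule looks like, here is a sketch of a network policy that only allows pods labeled app: frontend to reach pods labeled app: api on TCP port 5000; the labels and port are assumptions.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 5000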
Support is not the only issue, though. The problem is that a network policy rule is only a very simple firewall rule targeting a certain port. This means you can’t apply any advanced settings. For example, you can’t block just a single container on demand should you detect something suspicious about that container. Furthermore, network rules do not understand the traffic, so you don’t have any visibility into the traffic flowing and you’re limited to creating rules on the Layer 3 and 4 level. And lastly, there is no detection of network-based threats or attacks such as DDoS, DNS attacks, SQL injection and other damaging network attacks that can occur even over trusted IP addresses and ports.
This is where specialized container network security solutions like NeuVector (https://neuvector.com/run-time-container-security/) provide the security needed for critical applications such as financial or compliance driven ones. The NeuVector container firewall (https://neuvector.com/docker-security/how-to-deploy-a-docker-container-firewall/) is a solution that I had experience with deploying at Arvato/Bertelsmann and it provided the layer 7 visibility and protection we needed.
It should be noted that any network security solution must be cloud-native, auto-scaling and adaptive. You can’t be checking iptables rules or updating anything manually when you deploy new applications or scale your pods. Perhaps for a simple application stack on a couple of nodes you could manage all this manually, but for any enterprise deployment security can’t be slowing down the CI/CD pipeline.
In addition to the security and visibility, I also found that having connection- and packet-level container network tools helped debug applications during test and staging. With a Kubernetes network you’re never really certain where all the packets are going and which pods are being routed to unless you can see the traffic.
When choosing a CNI, ask yourself whether your application is:
Sensitive to latency?
A monolith?
On multiple networks?
Ipvlan
Ipvlan could be an option for you; it has good performance, but it has its caveats, e.g. you can’t use macv{tap,lan} at the same time on the same host.
Calico
Calico is not the most user-friendly CNI, but it provides you with much better performance compared to VXLAN and can be scaled without issues.
Kube-Router
Kube-router, like Calico, will give you better performance, since they both use BGP and routing, plus it supports LVS/IPVS. But it might not be as battle-tested as Calico.
If you just want something that works, that is understandable, and it is the case for most users. Then Canal or Flannel with VXLAN will probably be the way to go, because they are a no-brainer and just work. However, as I said before, VXLAN is slow and will cost you significantly more resources as you grow. But it is definitely the easiest way to start.
With all this information, I hope that you will have some relevant background and a good understanding of how Kubernetes networking works.
To learn more about the attack surface of Kubernetes workloads and how to protect the CI/CD pipeline, Kubernetes system services, and containers at run-time, download the Ultimate Guide to Kubernetes Security.
Tobias Gurtzick has been programming for more than a decade and is crazy about security, networks, and Linux. When not working on new projects and OSS, he loves to play music.
Tags: Container Networking (https://neuvector.com/tag/container-networking/), Kubernetes (https://neuvector.com/tag/kubernetes/)