OpenShift 4 Technical Deep Dive

OpenShift is a Kubernetes-based container platform that provides developers and organizations tools to manage containerized applications throughout the development lifecycle. It offers cluster services, application services, and developer services including monitoring, logging, routing, builds, and more to enable automated operations and provide the best developer experience.


OPENSHIFT CONTAINER PLATFORM

Functional overview

Self-Service | Standards-based
Multi-language | Web-scale
Automation | Open Source
Collaboration | Enterprise Grade
Multi-tenant | Secure

Value of OpenShift

Cluster Services: Monitoring, Logging, Registry, Router, Telemetry
Application Services: Service Mesh, Serverless, Middleware/Runtimes, ISVs
Developer Services: Dev Tools, CI/CD, Automated Builds, IDE

Automated Operations
Kubernetes
Red Hat Enterprise Linux | RHEL CoreOS

Best IT Ops Experience | CaaS | PaaS | FaaS | Best Developer Experience


Cluster topology at a glance: MASTER nodes host OpenShift services, Kubernetes services, etcd, and per-node infrastructure services (Monitoring | Logging | Tuned, SDN | DNS | Kubelet); WORKER nodes run the same per-node services plus the Registry, Router, Prometheus | Grafana with Alertmanager, and Kibana | Elasticsearch. Everything runs on COMPUTE, NETWORK, and STORAGE.


OpenShift and
Kubernetes
core concepts
a container is the smallest compute unit;
containers are created from container images

IMAGE (the binary) → CONTAINER (the runtime instance)
container images are stored in
an image registry

an image repository contains all versions of
an image in the image registry

myregistry/frontend: frontend:latest, frontend:2.0, frontend:1.1, frontend:1.0
myregistry/mongo: mongo:latest, mongo:3.7, mongo:3.6, mongo:3.4
containers are wrapped in pods, which are
units of deployment and management;
each pod gets its own IP address (e.g. 10.140.4.44, 10.15.6.55)

ReplicationControllers &
ReplicaSets ensure a specified number of
pods are running at any given time;
their specs capture the image name, replica count, labels, and cpu/memory/storage requests
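A minimal ReplicaSet sketch matching the slide's fields (names and values are illustrative, not from the deck):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 3                     # keep three pods running at all times
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: myregistry/frontend:2.0
          resources:
            requests:             # cpu/memory requests from the slide's field list
              cpu: 100m
              memory: 128Mi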
Deployments and
DeploymentConfigurations define how to
roll out new versions of Pods;
on top of the image name, replicas, and labels, they add a version and a rollout strategy
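A minimal Deployment sketch with a rolling update strategy (illustrative values):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  strategy:
    type: RollingUpdate           # replace pods gradually on each new version
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        version: "2.0"
    spec:
      containers:
        - name: frontend
          image: myregistry/frontend:2.0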
a daemonset ensures that all
(or some) nodes run a copy of a pod;
node selectors control placement, e.g. a copy runs on each node labeled foo = bar, but not on the node labeled foo = baz
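A minimal DaemonSet sketch using the node label from the figure (image name is illustrative):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      nodeSelector:
        foo: bar                  # only nodes labeled foo=bar get a copy
      containers:
        - name: agent
          image: myregistry/agent:1.0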
configmaps allow you to decouple
configuration artifacts from image content;
the same image reads environment-specific config:

Dev ConfigMap:  appconfig.conf → MYCONFIG=true
Prod ConfigMap: appconfig.conf → MYCONFIG=false
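The Dev variant as a manifest (a sketch; the Prod ConfigMap would carry MYCONFIG=false):

apiVersion: v1
kind: ConfigMap
metadata:
  name: appconfig
data:
  appconfig.conf: |
    MYCONFIG=true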
secrets provide a mechanism to hold
sensitive information such as passwords:

Dev Secret:  hash.pw → ZGV2Cg==
Prod Secret: hash.pw → cHJvZAo=
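The same example as a manifest (the values are the base64 strings from the slide):

apiVersion: v1
kind: Secret
metadata:
  name: hash-pw
type: Opaque
data:
  hash.pw: ZGV2Cg==               # base64 of "dev"; the Prod secret would hold cHJvZAo=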
services provide internal load-balancing and
service discovery across pods:
a service with a stable IP (10.140.4.44) selects pods by label (role: backend)
and balances traffic across their pod IPs (10.110.1.11, 10.120.2.22, 10.130.3.33)

apps can talk to each other via services
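A Service selecting the role: backend pods from the figure (port is illustrative):

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    role: backend                 # load-balance across all pods with this label
  ports:
    - port: 8080
      targetPort: 8080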


routes make services accessible to clients outside
the environment via real-world URLs, e.g. a route at
app-prod.mycompany.com forwarding to the frontend service (pods labeled role: frontend):

> curl http://app-prod.mycompany.com
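The corresponding OpenShift Route as a manifest (a sketch):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: frontend
spec:
  host: app-prod.mycompany.com    # the real-world URL
  to:
    kind: Service
    name: frontend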
Persistent Volumes and Claims

"My app is stateful." A pod requests storage through a PersistentVolumeClaim (2Gi), which binds to a matching PersistentVolume (2Gi).
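The 2Gi claim from the figure as a manifest (a sketch):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypd
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi                # binds to any available matching 2Gi PersistentVolume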
Liveness and Readiness probes ask each container: alive? ready?
projects isolate applications across teams and environments
(e.g. PAYMENT DEV, PAYMENT PROD, CATALOG, INVENTORY);
access across projects is denied (❌) unless explicitly granted
OpenShift 4
Architecture
All nodes, MASTER and WORKER alike, run on COMPUTE, NETWORK, and STORAGE. Building the cluster up layer by layer:

MASTER nodes host:
● etcd: the cluster's data store
● Kubernetes services: Kubernetes API server, scheduler, and cluster management
● OpenShift services: OpenShift API server, Operator Lifecycle Management, and the Web Console
● Infrastructure services: Monitoring | Logging | Tuned and SDN | DNS | Kubelet

WORKER nodes run the same per-node infrastructure services (Monitoring | Logging | Tuned, SDN | DNS | Kubelet) plus the platform workloads:
● Registry
● Monitoring: Prometheus | Grafana and Alertmanager
● Logging: Kibana | Elasticsearch
● Router
OpenShift
lifecycle,
installation &
upgrades
OpenShift 4
Installation
OPENSHIFT CONTAINER PLATFORM

Full Stack Automation:
● Simplified, opinionated "Best Practices" for cluster provisioning
● Fully automated installation and updates, including the host container OS

Pre-existing Infrastructure:
● Customer-managed resources & infrastructure provisioning
● Plug into existing DNS and security boundaries

HOSTED OPENSHIFT (Azure):
● Deploy directly from the Azure console; jointly managed by Red Hat and Microsoft Azure engineers
● Get a powerful cluster, fully managed by Red Hat engineers and support
Full Stack Automation:
● openshift-install provisions both the cloud resources and the OCP cluster
● Control plane and worker nodes run RHEL CoreOS
● Control plane, worker nodes, and OCP cluster resources are operator-managed

Pre-existing Infrastructure:
● The customer deploys the cloud resources; openshift-install deploys the OCP cluster on top
● Worker nodes are user-managed and may run RHEL CoreOS or RHEL 7
● Note: control plane nodes must run RHEL CoreOS!
Task                              Full Stack Automation   Pre-existing Infrastructure
Build Network                     Installer               User
Setup Load Balancers              Installer               User
Configure DNS                     Installer               User
Hardware/VM Provisioning          Installer               User
OS Installation                   Installer               User
Generate Ignition Configs         Installer               Installer
OS Support                        RHEL CoreOS             RHEL CoreOS + RHEL 7
Node Provisioning / Autoscaling   Yes                     Only for providers with OpenShift Machine API support
OpenShift 4
Lifecycle
Each OpenShift release
is a collection of Operators

● 100% automated, in-place upgrade process
● 30 Operators run every major part of the platform:
  ○ Console, Monitoring, Authentication, Machine management, Kubernetes Control Plane, etcd, DNS, and more
● Operators constantly strive to meet the desired state, merging admin config and Red Hat recommendations
● CI testing is constantly running install, upgrade and stress tests against groups of Operators
OpenShift Upgrades and Migrations

Happy path = upgrade through each version
● On a regular cadence, upgrade to the next supported version.

Optional path = migration tooling
● To skip versions or catch up, use the application migration tooling to move to a new cluster.

What is Extended Update Support (EUS)?
● Extended timeframe for critical security and bug fixes
● Works within a customer's release management philosophies
● Goal: provide a serial pathway to update from EUS to EUS
  ○ Augmented by the Migration Tool and/or Advanced Cluster Management (ACM), based on use-case

Timeline (2020-2022): 4.5 → upgrade → 4.6 EUS → migration or serial upgrade → 4.7

N release: full support, RFEs, bugfixes, security
N-2 release: OTA pathway to N release, critical bugs and security
4.6 EUS for Layered Products/Add-ons (2020-2022)

Complete "hands off" EUS
● Remain on a single supported version for the entire EUS period
● Products: OpenShift Logging, OpenShift Container Storage, Advanced Cluster Manager, Process Automation, OpenShift CNF, Jaeger

Mid-cycle refresh during EUS
● The EUS cycles for these products refresh during the OpenShift EUS
● Products: Cluster Migration Tool, Red Hat SSO, JBoss EAP, Quarkus, Thorntail, Spring Boot, Vert.x, JWS (Tomcat), DataGrid

Normal updates during EUS
● Follows the normal support window for the add-on, shorter than EUS
● Products: OpenShift Virtualization, OpenShift Serverless, OpenShift Pipelines, OpenShift Service Mesh, CodeReady Containers, Red Hat Quay / CSO
Operations
and
infrastructure
deep dive
Red Hat
Enterprise Linux
CoreOS
General Purpose OS (RHEL)
● BENEFITS: 10+ year enterprise life cycle; industry standard security; high performance on any infrastructure; customizable and compatible with a wide ecosystem of partner solutions
● WHEN TO USE: when customization and integration with additional solutions is required

Immutable container host (RHEL CoreOS)
● BENEFITS: self-managing, over-the-air updates; immutable and tightly integrated with OpenShift; host isolation enforced via containers; optimized performance on popular infrastructure
● WHEN TO USE: when cloud-native, hands-free operations are a top priority
Red Hat Enterprise Linux CoreOS is versioned with OpenShift.

Red Hat Enterprise Linux CoreOS is managed by the cluster, so RHEL CoreOS admins are responsible for very little day-to-day host administration.

● Minimal and secure architecture
● Optimized for Kubernetes
● Runs any OCI-compliant image (including docker)

CRI-O tracks and is versioned identically to Kubernetes, simplifying support permutations.



systemd-managed native binaries: kubelet, CRI-O
kubelet-managed static containers: etcd, kube-apiserver, kube-scheduler, kube-controller-manager
Scheduled containers: coredns, openshift-apiserver, openshift-controller-manager, openshift-oauth
OpenShift 4
installation
How to boot a self-managed cluster, step by step:

1. The bootstrap machine boots and hosts the resources the control plane machines need to boot
2. The masters fetch their Ignition configs from the bootstrap machine and boot
3. The masters form an etcd cluster while the bootstrap node runs a temporary control plane
4. The temporary control plane schedules the production control plane onto the masters
5. The production control plane takes over and the bootstrap machine is removed


Masters (special handling)
● Terraform provisions the initial masters*
● The Machine API adopts the existing masters post-provision
● Each master is a standalone Machine object
● Termination protection (avoid self-destruction)

Workers
● Each machine pool corresponds to a MachineSet
● Optionally autoscale (min, max) and health check (replace if not ready > X minutes); see the sketch below

Multi-AZ
● MachineSets are scoped to a single AZ
● The installer stripes N machine sets across AZs by default
● Post-install, best-effort balance via the cluster autoscaler
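A MachineAutoscaler sketch tying the min/max above to a MachineSet (resource names are illustrative):

apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 1                  # never scale the MachineSet below this
  maxReplicas: 6                  # ...or above this
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: mycluster-abc12-worker-us-east-1a   # hypothetical MachineSet name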
OpenShift 4
Cluster
Management
OpenShift Cluster Management

The Machine API: a MachineDeployment manages MachineSets, which manage Machines, each driven by its own controller (MachineDeployment Controller, MachineSet Controller, Machine Controller). The Machine Controller calls the Cloud API to create an instance; the instance bootstraps into a Node, and the NodeLink Controller links the Node back to its Machine. (MachineDeployment and its controller are future concepts.)
OpenShift Cluster Management | Machine Configuration

Machine Config Operator


A Kube-native way to configure hosts

OS configuration is stored and applied across the cluster via the Machine Config Operator.

● Subset of Ignition modules applicable post-provisioning:
  ○ SSH keys
  ○ Files
  ○ systemd units
  ○ kernel arguments
● Standard k8s YAML/JSON manifests
● Desired state of nodes is checked/fixed regularly
● Can be paused to suspend operations

# test.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
spec:
  config:
    storage:
      files:
        - contents:
            source: data:,hello%20world%0A
          verification: {}
          filesystem: root
          mode: 420
          path: /etc/test
OpenShift Cluster Management | Machine Configuration

Operator/Operand Relationships

The Machine Config Operator manages three operands:
● the Machine Config Controller
● a Machine Config Daemon on each node
● the Machine Config Server
OpenShift Cluster Management | Machine Configuration

Machine Config and Machine Config Pool

Inheritance-based mapping of configuration to nodes: all MachineConfigs selected by a pool (e.g. 5-chrony, 50-kargs, and 50-motd, each labeled role:worker) are merged into a single rendered config, rendered-worker-<hash>.
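A sketch of the worker pool's selectors (field values are illustrative of the default pool):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker   # which MachineConfigs to merge
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""               # which nodes receive the rendered config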
OpenShift Cluster Management | Machine Configuration

Custom Machine Config Pools

Hierarchical/layered configuration rendering: a custom pool inherits the base pool's configs and layers its own on top. For example, nodes in the role:highperf pool render:

  5-chrony:  /etc/ntp.conf     (from role:worker)
  5-other:   /etc/other.conf   (from role:highperf)
  50-args:   /etc/args         (from role:worker)
  50-motd:   /etc/motd         (from role:worker)
  51-motd:   /etc/motd         (from role:highperf, overrides 50-motd)
  60-args:   /etc/args         (from role:highperf, overrides 50-args)

Configs are merged in lexical order, so higher-numbered configs win.
OpenShift Cluster Management | Machine Configuration

Machine Config Server

Providing Ignition configuration for provisioning: a new VM/server boots the RHCOS image, and Ignition fetches "worker.ign" (the {.spec.config} of rendered-worker-<hash>) from the Machine Config Server, pointed to by instance metadata:
https://api-int.xxx.local:22623/config/worker
OpenShift Cluster Management | Machine Configuration

Machine Config Server

Identical nodes at massive scale: existing workers and new workers alike receive the {.spec.config} of the same rendered-worker-<hash> from the Machine Config Server.
OpenShift Cluster Management | Machine Configuration

Machine Config Daemon

Preventing drift: the Machine Config Daemon on each node continuously compares on-disk state against the rendered config, rendered-worker-<hash>:

  50-registries (role:worker) → /etc/containers/registries.conf
  5-chrony (role:worker)      → /etc/chrony.conf
  50-motd (role:worker)       → /etc/motd
OpenShift Cluster Management | Machine Configuration

Machine Config Daemon

Acting on drift: the MCO coordinates with the MCD to perform the following actions, in a rolling manner, when OS updates and/or configuration changes are applied:

● Cordon / uncordon nodes
● Drain pods
● Stage node changes
  ○ OS upgrade
  ○ config changes
  ○ systemd units
● Reboot

1. Validate that node state matches the desired state (OS_VERSION = <hash>)
2. Validate cluster state & policy to apply the change (MaxUnavailable = 1)
3. The change is rolled across the cluster
OpenShift Cluster Management | Machine Configuration

Transactional updates with rpm-ostree

Transactional updates ensure that RHEL CoreOS is never altered during runtime. Rather, it is booted directly into an always "known good" version.

● Each OS update is versioned and tested as a complete image
● OS binaries (/usr) are read-only
● OS updates are encapsulated in container images
● File system and package layering are available for hotfixes and debugging
OpenShift Cluster Management
Over-the-air updates: Cluster Components

The release payload lists each component (some-component, ...). The Cluster Version Operator walks the payload and drives each component Operator through the upgrade process; each Operator then updates its own operands.
OpenShift Cluster Management
Over-the-air updates: Nodes

The release payload includes the machine-config-operator and machine-os-content. The Cluster Version Operator updates the Machine Config Operator, which performs a rolling update of the Machine Config Daemons. Each daemon downloads and mounts the new machine-os-content and updates the host using the mounted content.
OpenShift
Security
CONTROL (Application Security): Container Content, CI/CD Pipeline, Container Registry, Deployment Policies

DEFEND (Infrastructure): Container Platform, Container Host, Multi-tenancy, Network Isolation, Storage, Audit & Logging, API Management

EXTEND: Security Ecosystem

Security Context Constraint (SCC): OpenShift's counterpart to the Kubernetes Pod Security Policy (PSP)
● OpenShift provides its own internal CA
● Certificates are used to provide secure connections to:
  ○ master (APIs) and nodes
  ○ Ingress controller and registry
  ○ etcd
● Certificate rotation is automated
● Optionally configure external endpoints to use custom certificates

✓ MASTER  ✓ ETCD  ✓ NODES  ✓ INGRESS CONTROLLER  ✓ CONSOLE  ✓ REGISTRY

Service-serving certificates:
● Annotating a service with service.alpha.openshift.io/serving-cert-secret-name generates a secret (e.g. serving-cert-my) containing tls.crt and tls.key
● Annotating a resource with service.beta.openshift.io/inject-cabundle="true" injects the CA bundle (service-ca.crt)
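A sketch of requesting a serving certificate for a service (service name and port are illustrative; the secret name matches the slide):

apiVersion: v1
kind: Service
metadata:
  name: my-service                # illustrative name
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: serving-cert-my
spec:
  selector:
    app: my-service
  ports:
    - port: 8443                  # the service CA signs a cert for this service's DNS
      targetPort: 8443            # name and stores tls.crt/tls.key in serving-cert-my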
Authentication: identity providers include LDAP, Keystone, GitHub, GitLab, Google, OpenID, Request Header, and Basic (e.g. authenticating userXX).
Authorization (RBAC):

● Project scope & cluster scope available
● Matches request attributes (verb, object, etc.)
● If no roles match, the request is denied (deny by default)
● Operator- and user-level roles are defined by default
● Custom roles are supported
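A minimal project-scoped Role and RoleBinding sketch (namespace and role names are illustrative; userXX is from the deck):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: payment-dev          # hypothetical project
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]   # matched against the request's verb and object
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: payment-dev
subjects:
  - kind: User
    name: userXX
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io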


OpenShift
Monitoring
Metrics collection and storage (Prometheus) | Alerting/notification (Alertmanager) | Metrics visualization (Grafana)

Metrics are gathered from every node (kubelet) and from infra/worker "hardware".

OpenShift
Logging
The logging stack:
○ Elasticsearch: stores and indexes logs
○ Fluentd: collects logs from each node and forwards them to Elasticsearch
○ Kibana: visualizes and queries the aggregated logs

On each node, application logs (container stdout/stderr) and node logs (kubelet and OS services) land in journald on the OS disk, where the collector picks them up.
Supported backends: NFS, OpenStack Cinder, iSCSI, Azure Disk, Azure File, AWS EBS, FlexVolume, GCE Persistent Disk, VMware vSphere VMDK, GlusterFS, Ceph RBD, Fibre Channel, NetApp Trident*, Container Storage Interface (CSI)**

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: z

The kubelet on the node mounts the bound volume (here a 2Gi NFS PersistentVolume) into the container.

PersistentVolumes: a claim (VolumeMount: Z, 2Gi RWX) binds to any matching volume, e.g. a 2Gi NFS export, NetApp Flash/SSD, or a VMware VMDK.

StorageClasses: claims can instead name a class ("Fast": NetApp Flash; "Good": NetApp SSD; "Block": VMware VMDK), and a 2Gi RWX claim for class "Good" is provisioned from that class's backend.
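A sketch of a class and a claim that requests it (provisioner and names are illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: good
provisioner: kubernetes.io/vsphere-volume   # illustrative provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: z
spec:
  storageClassName: good          # ask the "good" class to provision the volume
  accessModes:
    - ReadWriteMany               # RWX
  resources:
    requests:
      storage: 2Gi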
Special
Resources and
Devices
NFD finds certain resources

Node Feature
Discovery Operator
(NFD)

NFD Worker
Daemonset

kubelet CRI-O

Worker Node (CoreOS)

GPU GPU GPU


NFD labels nodes

kubernetes API
(Master)

feature.node.kubernetes.io
/pci-10de.present=true
NFD Worker
Daemonset

kubelet CRI-O

Worker Node (CoreOS)

GPU GPU GPU


Specialty Resource Operator deploys to relevant nodes

Special Resource
Operator
(SRO)

GPU Driver Daemonset | CRI-O Plugin Daemonset | Device Plugin Daemonset | GPU Feature Discovery Daemonset | Node Exporter Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
Worker Node (CoreOS)

GPU GPU GPU


GPU Feature Discovery reports additional capabilities

kubernetes API nvidia.com/gpu.family=tesla


(Master) nvidia.com/gpu.memory=16130
...

GPU Feature
Discovery
Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
Worker Node (CoreOS)

GPU GPU GPU


GPU Driver installs kmod and userspace drivers

kmod-nvidia
nvidia-driver-userspace

GPU Driver
Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
nvidia.com/gpu.family=tesla Worker Node (CoreOS)
nvidia.com/gpu.memory=16130
... GPU GPU GPU
CRI-O Plugin installs prestart hook

CRI-O (runc) prestart hook

CRI-O Plugin
Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
nvidia.com/gpu.family=tesla Worker Node (CoreOS)
nvidia.com/gpu.memory=16130
... GPU GPU GPU
Device Plugin informs kubelet of resource details

nvidia.com/gpu=3 GPU healthy?

Device Plugin
Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
nvidia.com/gpu.family=tesla Worker Node (CoreOS)
nvidia.com/gpu.memory=16130
... GPU GPU GPU
Node Exporter provides metrics on GPU

Prometheus
(cluster monitoring)
/metrics

Node Exporter
Daemonset

kubelet CRI-O
feature.node.kubernetes.io
/pci-10de.present=true
nvidia.com/gpu.family=tesla Worker Node (CoreOS)
nvidia.com/gpu.memory=16130
... GPU GPU GPU
GPU workload deployment

mypod requests a GPU in its spec:

  ...
  resources:
    requests:
      nvidia.com/gpu: 1
  ...

The scheduler places mypod on a worker node with an available GPU, and the kubelet (via CRI-O) starts it there.



automated
kernel module
matching
NFD detects kernel version and labels node

kubernetes API
(Master)

kernel=4.18.0-80
NFD Worker
Daemonset

kubelet CRI-O
kernel=4.18.0-80
Worker Node (CoreOS : kernel 4.18.0-80)

GPU GPU GPU


SRO builds driver container image against kernel

The Special Resource Operator runs a container build (driver-container-4.18.0-80) and pushes the resulting image to the image registry.

kubelet CRI-O
kernel=4.18.0-80
Worker Node (CoreOS : kernel 4.18.0-80)

GPU GPU GPU


SRO targets specific kernel version hosts

Special Resource
Operator
(SRO)

driver-container-4.18.0-80

GPU Driver
Daemonset

kubelet CRI-O
kernel=4.18.0-80
Worker Node (CoreOS : kernel 4.18.0-80)

GPU GPU GPU


NFD detects updated kernel and relabels node

kubernetes API
(Master)

kernel=4.18.0-147*
NFD Worker
Daemonset

kubelet CRI-O
kernel=4.18.0-147*
Worker Node (CoreOS : kernel 4.18.0-147*)

GPU GPU GPU


SRO detects mismatch and rebuilds driver container

The Special Resource Operator detects the mismatch and runs an S2I build (driver-container-4.18.0-147), pushing the new image to the image registry, while the daemonset still runs:

driver-container-4.18.0-80

GPU Driver
Daemonset

kubelet CRI-O
kernel=4.18.0-147*
Worker Node (CoreOS : kernel 4.18.0-147*)

GPU GPU GPU


SRO updates daemonset with new image

Special Resource
Operator
(SRO)

driver-container-4.18.0-147

GPU Driver
Daemonset

kubelet CRI-O
kernel=4.18.0-147
Worker Node (CoreOS : kernel 4.18.0-147)

GPU GPU GPU


OpenShift Virtualization

Technical slides for OpenShift Virtualization can be found here:

● PnT Portal
● Google Slides
Load Balancing and DNS
with OpenShift
For physical, OSP, RHV, and vSphere IPI deployments

On-prem OpenShift IPI DNS and Load Balancer

● OpenShift 4.2, with OpenStack IPI, introduced a new way of doing DNS and load balancing for the api, api-int, and *.apps (Ingress) endpoints
  ○ OCP 4.4 added RHV IPI
  ○ OCP 4.5 added vSphere IPI
  ○ OCP 4.6 added physical IPI
● This method was originally used by the Kubernetes-native Infrastructure concept when creating bare metal clusters

Excruciating detail: https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md
mDNS with CoreDNS
● CoreDNS is used by Kubernetes (and OpenShift) for internal service discovery
  ○ Not used for node discovery
● Multicast DNS (mDNS) works by sending DNS packets, using UDP, to a specific multicast address
  ○ mDNS hosts listen on this address and respond to queries ("What are the etcd servers?" ... "I'm etcd-0!" "I'm etcd-1!" "I'm etcd-2!")
● mDNS in OpenShift
  ○ Nodes publish their IP address/hostname to the local mDNS responder
  ○ The mDNS responder on each node replies with the local value
● DNS SRV records are not used for etcd in OCP 4.4 and later
keepalived
● Used to ensure that the API and Ingress (*.apps) Virtual IPs (VIPs) are always available
● Utilizes Virtual Router Redundancy Protocol (VRRP) to determine node health and elect an active node
  ○ Only one host owns the VIP at any time; the owner uses ARP to claim traffic
  ○ All nodes have equal priority
  ○ Failover can take several seconds
● Node health is checked every second
  ○ Separate health checks for each service (API, Ingress/*.apps)
● ARP is used to associate the VIP with the owner node's interface; the active node passes traffic to the endpoint
API load balancer
1) A client creates a new request to api.cluster-name.domain.name
2) HAProxy on the node actively hosting the API VIP (as determined by keepalived) load balances across control plane nodes using round robin
3) The connection is forwarded to the chosen control plane node, which responds directly to the client, a.k.a. "direct return"
Ingress load balancer
● The VIP, managed by keepalived, will only be hosted on nodes which have a Router instance
  ○ Nodes without a Router continue to participate in the VRRP domain, but fail the check script, so they are ineligible to host the VIP
● Traffic destined for the *.apps Ingress VIP is passed directly to the Router instance on a worker / infra node
Requirements and limitations
1) Multicast is required for the Keepalived (VRRP) and mDNS configuration used
2) VRRP needs layer 2 adjacency to function
a) All control plane nodes must be on the same subnet
b) All worker nodes capable of hosting a router instance must be on the same subnet
c) The VIPs must be on the same subnet as the hosts
3) Ingress (*.apps) throughput is limited to a single node
4) Keepalived failover will result in disconnected sessions, e.g. oc logs -f <pod> will terminate
a) Failover may take several seconds
5) There cannot be more than 119 on-prem IPI cluster instances on the same L2 domain
a) Each cluster uses two VRRP IDs (API, ingress)
b) The function used to generate router IDs returns values of 1-239
c) There is no collision detection between clusters for the VRRP ID
d) The chance of collision goes up as additional clusters are deployed

Alternatives
“I don’t like this,” “I can’t use this,” and/or “this does not meet my needs”. What other options are there?
● Ingress
○ 3rd party partners, such as F5 and Citrix, have certified Operators that are capable of replacing
the Ingress solution as a day 2 operation
● API
○ There is no supported way of replacing the API Keepalived + HAProxy configuration
● DNS
○ There is no supported way of replacing mDNS in this configuration
● DHCP
  ○ DHCP is required for all IPI deployments; there is no supported way of using static IPs with IPI

Remember that IPI is opinionated. If the customer's needs cannot be met by the IPI configuration, and reconfiguring within the scope of supported options is not possible, then UPI is the solution. Machine API integration can be deployed as a day 2 operation for node scaling.
Developer
Experience
Deep Dive
Application
Probes
Application Probes

Three types, one goal: check the health of a container in a pod.

● httpGet
● exec
● tcpSocket
Application Probes

Liveness Probes: alive? If the liveness probe fails, the container is restarted.

Application Probes

Readiness Probes: ready? A Service routes traffic only to pods whose readiness probe passes.
Application Probes

Important settings
initialDelaySeconds: How long to wait after the pod is launched to begin checking

timeoutSeconds: How long to wait for a successful connection (httpGet, tcpSocket only)

periodSeconds: How frequently to recheck

failureThreshold: How many consecutive failed checks before the probe is considered failed
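A sketch combining both probe types with the settings above (paths, ports, and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myregistry/myapp:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15   # wait after launch before first check
        timeoutSeconds: 1         # how long to wait for a successful connection
        periodSeconds: 10         # recheck interval
        failureThreshold: 3       # consecutive failures before the probe is considered failed
      readinessProbe:
        tcpSocket:
          port: 8080
        periodSeconds: 5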
Build and Deploy
Container Images
Three ways in:

DEPLOY YOUR SOURCE CODE:     CODE → BUILD → DEPLOY
DEPLOY YOUR APP BINARY:      APPLICATION → BUILD → DEPLOY
DEPLOY YOUR CONTAINER IMAGE: IMAGE → DEPLOY
