Telco Cloud Platform RAN Reference Architecture Guide 2.2
You can find the most up-to-date technical documentation on the VMware website at:
https://docs.vmware.com/
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
© Copyright 2023 VMware, Inc. All rights reserved. Copyright and trademark information.
1 About the Telco Cloud Platform RAN Reference Architecture Guide
This reference architecture guide provides guidance for designing and deploying a RAN solution
based on VMware Telco Cloud Platform™ — RAN.
Intended Audience
This guide is intended for telecommunications and solution architects, sales engineers, field
consultants, advanced services specialists, and customers who are responsible for designing the
Virtualized Network Functions (VNFs), Cloud Native Network Functions (CNFs), and the RAN
environment in which they run.
The following table lists acronyms used frequently in this guide:

BC: Boundary Clock
CU: Centralized Unit
DPDK: Data Plane Development Kit, an Intel-led packet processing acceleration technology
DU: Distributed Unit
GM: Grandmaster
The following table lists the Cloud Native acronyms used frequently in this guide:
CSI: Container Storage Interface. VMware vSphere® CSI exposes vSphere storage to containerized workloads on container orchestrators, such as Kubernetes. It enables vSAN and other types of vSphere storage.
K8s: Kubernetes
2 Overview of the Telco Cloud Platform RAN Reference Architecture
This section provides the architecture overview including the high-level physical and virtual
infrastructure, networking, and storage elements in the Telco Cloud Platform RAN solution.
n Physical Infrastructure
VMware Telco Cloud Platform is a cloud-native platform that empowers CSPs to manage VNFs
and CNFs across the core, far edge (RAN), enterprise edge, and cloud with efficiency, scalability,
and agility.
Telco Cloud Platform provides the framework to deploy and manage VNFs and CNFs quickly and
efficiently across distributed 5G networks. You can run VNFs and CNFs from dozens of vendors,
on any cloud, with holistic visibility, orchestration, and operational consistency.
For more information about the infrastructure layers of the platform design, see the VMware Telco
Cloud Platform 5G Edition documentation.
Telco Cloud Platform RAN transforms distributed cell sites into a 5G multi-services hub "mini cloud", enabling Communication Services Providers (CSPs) to monetize their RAN investments.
Telco Cloud Platform RAN is designed to meet the performance and latency requirements
inherent to RAN workloads:
n Enables CSPs to run virtualized Baseband functions that include virtualized Distributed Units
(vDUs) and virtualized Central Units (vCUs).
n Simplifies CSPs’ operations consistently across distributed vRAN sites with centralized cloud-
first automation, while reducing the Operating Expense (OpEx).
n Enables CSPs to accelerate innovation speed, deploy 5G services fast, and scale the services
as customers’ demands increase.
To handle the massive amount of data traffic, 5G is designed to separate the user plane from
the control plane and to distribute the user plane as close to the device as possible. As the user
traffic increases, an operator can add more user plane services without changing the control plane
capacity. This distributed architecture can be realized by constructing the data center and network
infrastructure based on hierarchical layers.
Subtending from national data centers are regional data centers. Regional data centers host the
5G core user plane function, voice services functions, and non-call processing infrastructure such
as IPAM, DNS, and NTP servers. Inbound and outbound roaming traffic can also be routed from
the regional data center.
To support new applications and devices that require ultra-low latency and high throughput
networks, CSPs have an opportunity to push the 5G user-plane closer to the application edge.
At the same time, RAN disaggregation enables efficient hardware utilization and pooling gain,
and increases deployment flexibility while reducing the Capital Expenditure (CAPEX) / Operational
Expenditure (OPEX) of Radio Access.
Caution Aggregation data centers such as national, regional, and near edge data centers can
be architected by following Telco Cloud Platform 5G core recommendations, while the cell site
architecture and implementation align with VMware Telco Cloud Platform RAN.
VMware Telco Cloud Platform is a common platform from Core to RAN. This common platform
self-tunes automatically depending on the workload deployed through VMware Telco Cloud
Automation™. To deploy all VNF/CNFs from 5G Core to RAN, the same automation platform,
operational tools, and CaaS layer based on VMware Tanzu® Basic for RAN are used.
Figure 2-2. Relationship between Telco Cloud Platform 5G and Telco Cloud Platform RAN
The figure shows VMware Telco Cloud Automation managing multiple vCenter Server (VC) instances. Each vCenter Server manages VMware ESXi hosts in a vSphere cluster at the data center and single-node real-time (RT) ESXi hosts at the cell sites.
Telco Cloud Platform RAN is a new compute workload domain that spans from Central or Regional
Data Centers to Cell sites. The management components of this compute workload domain
reside in the management cluster inside the RDC management domain. Within this compute
workload domain, VMware ESXi™ hosts are managed as single node hosts and distributed across
thousands of cell sites. This distributed architecture also applies to Kubernetes. Kubernetes cluster
management is centralized, and workload VMs are distributed to respective cell sites.
n Core and Edge connectivity: Core and Edge connectivity can have a significant impact on the 5G core deployment and on application-specific SLAs. The type of radio spectrum, connectivity, and available bandwidth can have a great influence on the placement of CNFs.
n WAN connectivity: In the centralized deployment model, the WAN connectivity must be
reliable between the sites. Any unexpected WAN outage prevents 5G user sessions from
being established as all 5G control travels from the edge to the core.
n Components deployment in Cell Site: Due to the physical constraints of remote Cell
Site locations, place only the required function at the Cell Site and deploy the remaining
components centrally. For example, the platform monitoring and logging are often deployed
centrally to provide universal visibility and control without replicating the core data center at
the remote edge Cell Site locations. Non-latency-sensitive user metrics are often forwarded
centrally for processing.
n Available WAN bandwidth: The available WAN bandwidth between Cell Site and Central Core
sites must be sized to meet the worst-case bandwidth demand. Also, when multiple classes of
an application share a WAN, proper network QoS is critical.
n Fully distributed 5G core stack: A fully distributed 5G core stack is ideal for private 5G use
cases, where the edge data center must be self-contained. It survives extended outages that
impact connectivity to the core data center. The Enterprise edge can be the aggregation
point for the 5G Core control plane, UPF, distributed radio sites, and selective mobile edge
applications. A fully distributed 5G core reduces the dependency on WAN, but it increases the
compute and storage requirements.
n Network Routing in Cell Site: Each Cell Site can locally route the user plane traffic and all the
Internet traffic through the local Internet gateways, while the management and non-real-time
sensitive applications leverage the core for device communication.
In 3GPP R15, the split between the upper and lower sections of the RAN was standardized. The higher-layer split is specified with a well-defined interface (F1) between the Centralized Unit (gNB-CU) and the Distributed Unit (gNB-DU). The CU and its functions have less stringent processing requirements and are more virtualization friendly than the DU and its functions, which sit closer to the radio. The enhanced Common Public Radio Interface (eCPRI) links the DU to the radio.
n A single uniform hardware platform is used across the core network, RAN, and edge. This
simplifies network management while lowering operational and maintenance costs.
n The network functions and computing hardware are isolated in a completely virtualized RAN.
The network functions of the RAN can be performed on the same hardware, giving the service
provider more versatility. The functionality and capacity of a vRAN can be easily implemented
where and when it is required, giving it more flexibility.
The following figure shows vRAN (also called Centralized RAN) and the terminologies that are
used to define various legs of the transport network:
The figure shows the 5G Core Network connected to the gNB-CU over the backhaul (connectivity between the 5GC and CU), the gNB-CU connected to the gNB-DU over the midhaul (connectivity between the CU and DU, the F1 interface), and the gNB-DU connected to the gNB-RRU over the fronthaul (connectivity between the DU and RRU, eCPRI).
n Centralized Unit (CU) provides non-real-time processing and access control. It manages
higher layer protocols including Radio Resource Control (RRC) from the Control Plane, and
Service Data Adaptation Protocol (SDAP) and Packet Data Convergence Protocol (PDCP)
from the User Plane. The CU is connected between the 5G core network and the DUs. One CU
can be connected to multiple DUs.
n Distributed Unit (DU) provides real-time processing and coordinates lower layer protocols
including the Physical Layer, Radio Link Control (RLC), and Media Access Control (MAC).
n Remote Radio Unit (RRU) does the physical layer transmission and reception, supporting
technologies such as Multiple Input Multiple Output (MIMO).
A non-centralized approach co-locates the CU and DU functions, with the RRU physically separated.
The figure shows the 5G Core Network connected to the co-located gNB-CU and gNB-DU, which serve the gNB-RRU.
Centralized Processing:
In the centralized approach, all functional elements of the gNB are physically separated. A single CU is responsible for several DUs. The Next-Generation RAN (NG-RAN) design imposes specific transport network requirements to meet the required distances. This design model requires fronthaul, midhaul, and backhaul connectivity.
The figure shows the 5G Core Network connected over the backhaul to the gNB-CU, which serves the gNB-DU and gNB-RRU. If the CU is virtualized, it can be part of the 5GC NFV deployment, so the backhaul is potentially very short.
In this vRAN design approach, the DU and RRU are co-located such that they are directly
connected without a fronthaul transport network. This connection is fiber-based and may span
hundreds of meters, supporting scenarios where the DU and RRU are within the same building.
This design approach requires midhaul and backhaul connectivity as shown in the following figure.
The figure shows the 5G Core Network connected over the backhaul to the gNB-CU, which connects over the midhaul to gNB-DUs that are co-located with their gNB-RRUs. The gNB-DU to gNB-RRU separation is up to hundreds of meters over fiber, with no fronthaul network.
n Cost Reduction: Centralized processing capability reduces the cost of the DU function.
n Energy Efficiency, Power, and Cost Reduction: Reducing the hardware at the cell site reduces the power consumption and air conditioning needs of that site. The cost savings can be significant when you deploy tens or hundreds of cell sites.
n Flexibility: Flexible hardware deployment leads to a highly scalable and cost-effective RAN
solution. Also, the functional split of the protocol stack has an effect on the transport network.
Physical Infrastructure
The Physical Tier represents compute hardware, storage, and physical networking resources for
the Telco Cloud Platform RAN solution.
For the list of supported platforms, see the VMware Compatibility Guide
Physical Storage
VMware vSAN is used in the compute workload domains in the Regional Data Center to provide highly available shared storage to the clusters. However, the ESXi hosts used for Cell Sites must have a local disk to provide the storage service to the RAN applications.
The ESXi host deployed at the Cell Site uses local disk storage, and all the disks must be recommended in the VMware Compatibility Guide.
The following table lists the storage required for each type of site:
n For Regional Data Center (RDC), the ESXi hosts must contain four or more physical NICs of the
same speed. Use two physical NICs for ESXi host traffic types and two physical NICs for user
workloads.
n For Cell Site, the ESXi hosts must contain a minimum of two physical NICs of the same speed.
Use these physical NICs for ESXi management and user workloads (fronthaul / midhaul) and
an SR-IOV VF for PTP time synchronization.
Note You can use LAN on Motherboard (LOM) as one of the physical NICs.
n For each ESXi host in combination with a pair of ToR switches, use a minimum of two 10 GbE
connections or 25 GbE or larger connections. 802.1Q network trunks support the required
VLANs.
Note In case of a dual-socket host with NUMA placement requirements, use four physical NICs.
Each socket can be mapped with two physical NICs for user workload and one SR-IOV VF for PTP
time synchronization.
Each Cell Site compute workload domain can support up to 128 Cell Site Groups due to the 128
vSphere Distributed Switch limit per vCenter Server. Telco Cloud Automation dedicates a VDS for
each Cell Site Group.
Also, when designing the Cell Sites for the Telco Cloud Platform RAN solution, review the
maximum supported configurations of vCenter Server. Multiple vCenters can be added to a single
Telco Cloud Platform RAN solution to scale the size of the Telco Cloud Platform RAN deployment.
The following table lists some of the supported configuration maximums. For the latest updates to
maximums, see VMware Configuration Maximums.
Type Maximum
Communications service providers (CSPs) are transitioning from physical to cloud networks to
gain operational agility, network resiliency, and low operating costs. This shift marks a radical
departure from the traditional single‑purpose hardware appliance model, especially as CSPs must
now design and operate services across a web of data centers—bridging physical and virtual
ecosystems—while enabling interoperability across competing vendors.
Due to the complexity of coordinating network functions and managing multiple services, CSPs
require an automated approach that removes complexity and error‑prone manual processes. To
address these challenges and improve operational efficiency, CSPs are turning to VMware Telco
Cloud Automation.
The figure shows VMware Telco Cloud Automation providing infrastructure administration and orchestration, with partner integrations, on top of VMware Cloud Foundation, vSphere, VMware Cloud Director, VMware Tanzu Kubernetes, and VMware Integrated OpenStack, spanning public and private clouds for VM-based and container-based core and edge workloads.
Telco Cloud Automation accelerates time-to-market for network functions and services while
igniting operational agility through unified automation—across any network and any cloud. It
applies an automated, cloud‑first approach that streamlines the CSP’s orchestration journey with
native integration to VMware Telco Cloud Infrastructure.
Telco Cloud Automation enables multi‑cloud placement, easing workload instantiation and
mobility from the network core to edge and from private to public clouds. It also offers
standards‑driven modular components to integrate any multi‑vendor MANO architecture. VMware
further enhances interoperability by expanding partner network function certification using the
VMware Ready for Telco Cloud program. With simplified and certified interoperability, CSPs can
now leverage best‑of‑breed solutions and reduce their risks.
n Enhances the service experience through workload mobility, dynamic scalability, closed‑loop
healing, and improved resilience.
n Optimizes cloud resource utilization through VMware xNF Manager (G-xNFM), NFVO, and
VIM/ CaaS/NFVI integrations.
n Evolves to cloud native with Kubernetes upstream compliance, cloud-native patterns, and
CaaS automation.
n Avoids costly integration fees, maximizes current VMware investments, innovates faster,
reduces project complexity, and enables faster deployment with pre‑built VMware
integrations.
n Reduces the time to provision new network sites or to expand capacity into existing
ones. Leverages best‑of‑breed network functions and benefits from a healthy and thriving
multi‑vendor ecosystem.
n Improves service quality of AI-driven workflows that are integrated with VMware Telco Cloud
Service Assurance.
The following diagram shows different hosts and components of the Tanzu Basic for RAN
architecture:
The figure shows the Tanzu Kubernetes management cluster, which maintains the desired state configuration of the workload clusters.
n Etcd: Etcd is a simple, distributed key-value store that stores the Kubernetes cluster
configuration, data, API objects, and service discovery details. For security reasons, etcd must
be accessible only from the Kubernetes API server.
n Kube-API-Server: The Kubernetes API server is the central management entity that receives
all API requests for managing Kubernetes objects and resources. The API server serves as the
frontend to the cluster and is the only cluster component that communicates with the etcd
key-value store.
For added redundancy and availability, place a load balancer in front of the control plane nodes. The load balancer performs health checks of the API server to ensure that external clients such as kubectl connect to a healthy API server even during cluster degradation.
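As an illustration of such a health check, the following Python sketch probes the /readyz endpoint exposed by the Kubernetes API server. The VIP address is a placeholder, and certificate verification is disabled only to keep the sketch short; a production probe should validate the API server certificate.

```python
#!/usr/bin/env python3
"""Minimal sketch: the kind of health probe a load balancer performs against the
Kubernetes API server VIP. The VIP below is a hypothetical placeholder."""
import ssl
import urllib.request

API_ENDPOINT = "https://192.0.2.10:6443/readyz"  # hypothetical control-plane VIP

def api_server_ready(url: str, timeout: float = 3.0) -> bool:
    # Sketch only: skip certificate verification to stay self-contained.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            # A healthy API server answers 200 with the body "ok".
            return response.status == 200 and response.read().strip() == b"ok"
    except OSError:
        return False

if __name__ == "__main__":
    print("API server ready:", api_server_ready(API_ENDPOINT))
```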
vRealize Network Insight allows for application discovery, application visibility, and enhanced
troubleshooting capabilities by collecting and analyzing inventory, metadata, and flow telemetry
of the infrastructure traffic using sFlow/IPFIX. vRealize Network Insight provides detailed traffic
distribution patterns and real-time views of network traffic and patterns.
vRealize Log Insight

vRealize Log Insight:

n Connects to other VMware products such as VMware vCenter Server and ESXi hosts to collect
events, tasks, and alarm data.
n Integrates with vRealize Operations to send notification events and enable launch in context.
n Functions as a collection and analysis point for any system that sends syslog data.
To collect additional logs, you can install an ingestion agent on Linux or Windows servers or use
the preinstalled agent on specific VMware products. Preinstalled agents are useful for custom
application logs and operating systems such as Windows that do not natively support the syslog
protocol.
As Kubernetes and container adoption increases in Telco 5G, vRealize Log Insight can also serve as the centralized log management platform for Tanzu Kubernetes clusters. Cloud administrators can configure container logs to be forwarded to vRLI using industry-standard open-source log agents such as Fluentd and Fluent Bit. Any logs written to standard output (stdout) by the container pod are sent to vRLI by the log agent, with no changes to the CNF itself.
n Single pane of glass providing the CSP Operations teams with rapid insights
n RAN assurance to consume fault and metric data from the RAN environment
vRealize Operations
vRealize Operations tracks and analyzes the operations of multiple data sources by using
specialized analytic algorithms. These algorithms help vRealize Operations learn and predict the
behavior of every object it monitors. Users access this information by using views, reports, and
dashboards.
Note vRealize Operations, vRealize Log Insight, and VMware Telco Cloud Service Assurance are
optional components in VMware Telco Cloud Platform RAN.
3 Telco Cloud Platform RAN Solution Design
This section describes the design and deployment of the Telco Cloud Platform RAN solution.
This design leverages Telco Cloud Platform 5G. It highlights various integration points between
the Telco Cloud Platform 5G Core and RAN design and the dependency services with the virtual
infrastructure to enable a scalable and fault-tolerant Telco Cloud Platform RAN solution design.
n Services Design
n Physical Design
The deployment architecture of the Telco Cloud Platform RAN solution shows the management
and compute workload placement between the Regional Data Center site and the Cell Site. This
guide focuses on the Telco Cloud Platform RAN solution. However, you must understand how a Cell Site host and its applications work in conjunction with a Regional Data Center and what the dependencies are for onboarding a Cell Site host.
The following figure shows an end-to-end deployment model of Telco Cloud Platform RAN
solution with 5G core to understand how they work together.
In the Regional Data Center (RDC) site, vSphere administrators deploy and configure one
management workload domain and one or more Cell-Site compute workload domains. The Cell
Site Compute workload domain dedicates a vCenter Server to manage both Regional Data Center
(RDC) and Cell Site ESXi hosts and workloads.
Figure 3-1. End-to-End Deployment Model of the Telco Cloud Platform RAN Solution with Telco
Cloud Platform 5G
The figure shows the management cluster at the RDC hosting TCA, TCA-CP1, TCA-CP2, vRO, the NSX Manager and vCenter Server for WLD1, and the optional vROps, vRNI, vRLI, and TCSA components. The Tanzu Kubernetes workload cluster spans the sites: the control plane node and a worker node running the vCU reside in WLD1 at the RDC, while a worker node running the vDU resides at the cell site.
Note NSX, vRNI, vROPs, TCSA, and vRLI are optional components in the Telco Cloud Platform
RAN design that is leveraging Telco Cloud Platform 5G, and they are typically deployed at RDC /
Central domains.
NSX is typically not deployed as part of Telco Cloud Platform RAN. However, the Telco Cloud
Platform RAN design works in conjunction with Telco Cloud Platform 5G and hence NSX is
included in the management domain.
This deployment model includes one Regional Data Center (RDC) and one Cell Site as part of the
VMware Telco Cloud Platform RAN solution architecture.
n Regional Data Center (RDC) consists of one Management Workload Domain and one Cell Site
Compute Workload Domain.
n Management workload domain hosts a dedicated vCenter Server and manages all the
SDDC management and operational management components such as vCenter Server, NSX,
vRealize Operations, vRealize Log Insight, and so on.
n VMware NSX is deployed as part of 5G Core components. It is used only for workloads that are
specific to telco management at RDC. NSX is not used for RAN deployments.
n Compute Workload Domains contain a vSphere cluster such as WLD1-Cluster1 and multiple
Cell Site ESXi hosts.
n In the Compute vCenter Server, WLD1-Cluster1 is hosted at the Regional Data Center and
standalone ESXi hosts are deployed at Cell Site locations.
n A dedicated vSphere Distributed Switch is used for both the RDC vSphere cluster and the Cell
Site.
n Kubernetes cluster components such as control plane nodes are deployed in the vSphere
cluster in the RDC.
n Kubernetes Worker nodes are deployed at both the Regional Data Center and Cell Site
locations to support the CNF workloads such as the CU and DU in a geographically distributed
manner.
Services Design
This section describes common external services such as DNS, DHCP, NTP, and PTP required for
the Telco Cloud Platform RAN solution deployment.
Various external services are required for the deployment of the Telco Cloud Platform RAN
components and Tanzu Kubernetes clusters. If you deploy the Telco Cloud Platform RAN solution
in a greenfield environment, you must first deploy your Regional Data Center site and then
onboard the Cell Sites to the Telco Cloud Platform RAN solution.
The following table lists the required external services and dependencies for the Regional Data
Center sites and Cell Site locations:
Domain Name Services (DNS): Provides name resolution for various components of the Telco Cloud Platform RAN solution.

Dynamic Host Configuration Protocol (DHCP): Provides automated IP address allocation for Tanzu Kubernetes clusters at Regional Data Center and Cell Site locations. Note: Ensure that the DHCP service is available local to each site.

Network Time Protocol (NTP): Performs time synchronization between various Telco Core management components at the Central Data Center or Regional Data Center.

Precision Time Protocol (PTP): Distributes accurate time and frequency over telecom mobile networks and the ESXi hosts at Cell Site locations.
DNS
When you deploy the Telco Cloud Platform RAN solution, provide the DNS domain information
for configuring various components of the solution. DNS resolution must be available for all the
components in the solution, including servers, Virtual Machines (VMs), and virtual IPs. Before you
deploy the Telco Cloud Platform RAN management components or create any workload domains,
ensure that both forward and reverse DNS resolutions are functional for each component.
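A quick way to validate this prerequisite is to script forward and reverse lookups for the component FQDNs. The following Python sketch does that; the host names are hypothetical placeholders for your own vCenter Server, Telco Cloud Automation, and ESXi host records.

```python
#!/usr/bin/env python3
"""Minimal sketch: verify forward and reverse DNS resolution for solution components."""
import socket

# Hypothetical component FQDNs -- substitute your own.
COMPONENTS = [
    "vcenter01.rdc.example.com",
    "tca01.rdc.example.com",
    "esxi-cellsite01.example.com",
]

def check_dns(fqdn: str) -> None:
    try:
        ip = socket.gethostbyname(fqdn)         # forward lookup
        name, _, _ = socket.gethostbyaddr(ip)   # reverse lookup
        status = "OK" if name.rstrip(".").lower() == fqdn.lower() else "REVIEW"
        print(f"{fqdn}: forward={ip} reverse={name} [{status}]")
    except socket.gaierror as err:
        print(f"{fqdn}: forward resolution failed ({err})")
    except socket.herror as err:
        print(f"{fqdn}: reverse resolution failed ({err})")

if __name__ == "__main__":
    for component in COMPONENTS:
        check_dns(component)
```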
DHCP
Telco Cloud Platform RAN uses Dynamic Host Configuration Protocol (DHCP) to automatically
configure Tanzu Kubernetes Cluster with an IPv4 address at Regional Data Center and the Cell
Site location. Each RDC and Cell Site must have a dedicated DHCP service locally, and the DHCP
scope must be defined and made available for this purpose. The defined scope must be able to
accommodate all the initial and future Kubernetes workloads used in the Telco Cloud Platform
RAN solution.
The following figure shows the deployment architecture of the DHCP service for a Regional Data
Center and Cell Site location:
Figure 3-2. DHCP Design for Regional Data Center and Cell Site Locations
vCenter
TCA-CP2 DHCP DHCP
WLD-1
Workload Domain
DHCP Network DHCP Network
Mgmt. Network
Midhaul
Note While deploying the Tanzu Kubernetes Control Plane, dedicate a static IP for the
Kubernetes API endpoint.
NTP
All the management components of Telco Cloud Platform RAN must be synchronized against
a common time by using the Network Time Protocol (NTP). The Telco Cloud Platform RAN
components such as vCenter Server Single Sign-On (SSO) are sensitive to a time drift between
distributed components. The synchronized time between various components also assists
troubleshooting efforts.
n The IP addresses of NTP sources can be provided during the initial deployment of Telco Cloud Platform RAN.
n The NTP sources must be reachable by all the components in the Telco Cloud Platform RAN
solution.
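Before deployment, a minimal SNTP query is often enough to confirm that an NTP source is reachable. The following Python sketch sends one and reports a rough clock offset; the server name is a placeholder, and the offset ignores network delay, so treat it as a sanity check rather than a measurement.

```python
#!/usr/bin/env python3
"""Minimal sketch: check that an NTP source is reachable and report a rough offset."""
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)

def query_ntp(server: str, timeout: float = 2.0) -> float:
    packet = b"\x1b" + 47 * b"\0"  # SNTPv3 client request
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_local = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(512)
    # Transmit timestamp: seconds field at bytes 40-43 of the response.
    t_server = struct.unpack("!I", data[40:44])[0] - NTP_EPOCH_OFFSET
    return t_server - t_local

if __name__ == "__main__":
    for source in ["ntp1.example.com"]:  # replace with your NTP sources
        try:
            print(f"{source}: offset ~ {query_ntp(source):+.3f}s")
        except OSError as err:
            print(f"{source}: unreachable ({err})")
```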
PTP
Precision Time Protocol (PTP) delivers time synchronization in various Telco applications and environments. It is defined in the IEEE 1588-2008 standard. PTP helps distribute accurate time and frequency over telecommunication mobile networks. Precise timekeeping is a key attribute for telco applications: it allows these applications to accurately construct the precise sequence of events that occurred or occur in real time. So, each ESXi node in the Telco Cloud Platform RAN solution must be time-synchronized.
Note The precision of a clock describes how consistent its time and frequency are relative
to a reference time source, when measured repeatedly. The distinction between precision and
accuracy is subtle but important.
n PTP profiles: PTP allows various profiles to be defined to amend PTP for use in different
scenarios. A profile is a set of specific PTP configuration options that are selected to meet the
requirements of telco RAN applications.
n PTP traffic: If the network carrying PTP comprises a non-PTP-aware switch in the pathway
between the Grandmaster and Follower clocks, the switch handles PTP as any other data
traffic, affecting the PTP accuracy. In this case, use a proper Quality-of-Service (QoS)
configuration for network delivery to prioritize PTP traffic over all other traffic.
n PTP grandmaster clocks: When networks are distributed geographically across different
locations with Central Data Center, Regional Data Center, and Cell Site and they are connected
over Wide Area Networks (WANs), varying latency across WAN links can compromise PTP
accuracy. In that case, use different PTP Grandmaster clocks in each site and do not extend
PTP across these sites.
The following guidelines apply to the PTP sources:
n The ESXi host must have a third physical NIC dedicated to PTP synchronization.
n The PTP sources such as Telco Grandmaster Clock (T-GM) must be reachable by all the
components in the Telco Cloud Platform RAN solution.
n Use G.8275.1 PTP profile for accurate time synchronization for RAN applications. ITU–T
G.8275.1 defines the PTP profile for network scenarios with full-timing support, which means
all the intermediate switches support Boundary Clock functionality (BC).
n To implement the ideal separation of Management and Workload switches for the Cell Site
locations, each host must have a minimum of two dedicated physical NICs. The host can also
have a single NIC with port separation and without redundancy. PTP can run on an SR-IOV VF
or a dedicated physical port configured with PCI pass-through.
The following figure shows the PTP configuration on an ESXi host. In this scenario, the SR-IOV for
PTP sync is configured on a VF associated with the pNIC2 port. The secondary NIC is required
only when multiple NICs are used for redundancy and not related to PTP.
In the following diagram, a single socket server with a single worker node is used. For dual-socket
servers, separate SR-IOV VFs or pass-through ports (one per NUMA) are required.
The figure shows the worker node connected to the VDS through a VMXNET3 adapter, with an SR-IOV VF on the pNIC2 port used for PTP sync. The PTP traffic traverses PTP-aware switches to reach the PTP Grandmaster.
Note In case of a dual-socket host with NUMA placement, use physical NICs per NUMA socket to
ensure NUMA alignment.
Physical Design
The physical design includes the physical ESXi hosts, storage, and network design for Telco Cloud
Platform RAN.
The device types that are supported as ESXi boot devices are as follows:
n USB or SD embedded devices. The USB or SD flash drive must be at least 8 GB.
n SATADOM devices. The size of the boot device per host must be at least 16 GB.
Caution The use of USB and SD boot devices in ESXi is deprecated. For more information, see
VMware KB85685.
Design decision: Use hardware based on the VMware Compatibility Guide.
Design justification: Ensures full compatibility with vSphere. Allows flexibility and ease of management of both RDC and Cell Site hosts.
Design implication: Hardware choices might be limited.

Design decision: Ensure that all ESXi hosts have a uniform configuration across the Cell Sites.
Design justification: Ease of management and maintenance across the Cell Sites.
Design implication: None.

Design decision: Onboard a Cell Site with a minimum of one ESXi host.
Design justification: Cell Sites are limited in space, and only a few telco workloads can be run.
Design implication: Additional ESXi host resources might be required for redundancy and maintenance.

Design decision: Set up each ESXi host with a minimum of two physical NICs.
Design justification: Ensures full redundancy for the two physical NICs for workloads.
Design implication: In servers with two NUMA nodes, each NUMA node must have a minimum of two physical NICs.
Design decision: Set up each ESXi host in the Cell Site with a minimum of two disks: an ESXi boot drive and local storage for workloads.
Design justification: Local storage is the primary storage solution for Cell Sites. Note: The disk size must be considered based on telco workloads.
Design implication: The local disk must be sized appropriately. Note: Local storage does not support sharing across multiple hosts.

Design decision: Set up each ESXi host in the Cell Site location with a minimum of 192 GB RAM.
Design justification: A good starting point for most workloads. Allows for ESXi and other management overhead.
Design implication: Additional memory might be required based on vendor workload and sizing requirements.
n Configure switch ports that connect to ESXi hosts manually as trunk ports. Virtual switches are
passive devices and do not support trunking protocols, such as Dynamic Trunking Protocol
(DTP).
n Modify the Spanning Tree Protocol (STP) on any port that is connected to an ESXi NIC to
reduce the time it takes to transition ports over to the forwarding state, for example, using the
Trunk PortFast feature on a Cisco physical switch.
n Configure jumbo frames on all switch ports, Inter-Switch Link (ISL), and Switched Virtual
Interfaces (SVIs).
n Spanning Tree Protocol (STP): Although this design does not use the STP, switches usually
include STP configured by default. Designate the ports connected to ESXi hosts as trunk
PortFast.
n MTU: Set MTU for all switch ports, VLANs, and SVIs to jumbo frames for consistency.
Jumbo Frames
IP storage throughput can benefit from the configuration of jumbo frames. Increasing the per-
frame payload from 1500 bytes to the jumbo frame setting improves the efficiency of data transfer.
Jumbo frames must be configured end-to-end. When you enable jumbo frames on an ESXi host,
select an MTU size that matches the MTU size of the physical switch ports.
The workload determines whether to configure jumbo frames on a VM. Configure jumbo frames if the workload regularly transfers large volumes of network data. Also, ensure that both the VM operating system and the VM NICs support jumbo frames. Jumbo frames also improve the performance of vSphere vMotion.
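Because an MTU mismatch anywhere in the path can lead to dropped or fragmented traffic, a quick per-interface check is useful. The following Python sketch, intended to run inside a Linux worker node or VM, flags interfaces whose MTU does not match the 9000-byte jumbo frame target used in this design.

```python
#!/usr/bin/env python3
"""Minimal sketch: report Linux interface MTUs and flag values that differ from the
jumbo-frame target. Run inside a Linux guest or worker node."""
from pathlib import Path

EXPECTED_MTU = 9000  # jumbo-frame value used end to end in this design

def mtu_report(expected: int = EXPECTED_MTU) -> None:
    for iface in sorted(Path("/sys/class/net").iterdir()):
        if iface.name == "lo":
            continue  # loopback is not part of the data path
        mtu = int((iface / "mtu").read_text())
        marker = "OK" if mtu == expected else "CHECK"
        print(f"{iface.name}: mtu={mtu} [{marker}]")

if __name__ == "__main__":
    mtu_report()
```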
Design decision: Implement the following physical network architecture: a minimum of two 10 GbE ports (two 25 GbE ports recommended) on each ToR switch for ESXi host uplinks, and no EtherChannel (LAG/vPC) configuration for ESXi host uplinks.
Design justification: Guarantees availability during a switch failure. Provides compatibility with vSphere host profiles because they do not store link-aggregation settings.
Design implication: Hardware choices might be limited.

Design decision: Use two ToR switches for each Cell Site location for network high availability.
Design justification: This design uses a minimum of two 10 GbE links (two 25 GbE links recommended) for each ESXi host. Provides redundancy and reduces the overall design complexity.
Design implication: Two ToR switches per Cell Tower can increase costs.

Design decision: Use VLANs to segment physical network functions.
Design justification: Supports physical network connectivity without requiring many NICs.
Design implication: Requires uniform configuration and presentation on all the switch ports made available to the ESXi hosts.
Design decision: Assign static IP addresses to all management components.
Design justification: Ensures that interfaces such as management and storage always have the same IP address. This way, you provide support for continuous management of ESXi hosts using vCenter Server and for provisioning IP storage by storage administrators.
Design implication: Requires precise IP address management.

Design decision: Create DNS records for all ESXi hosts and management VMs to enable forward, reverse, short, and FQDN resolution.
Design justification: Ensures consistent resolution of management components using both IP address (reverse lookup) and name resolution.
Design implication: Adds administrative overhead.

Design decision: Configure the MTU size to 9000 bytes (jumbo frames) on the physical switch ports, VLANs, SVIs, vSphere Distributed Switches, and VMkernel ports.
Design justification: Improves traffic throughput.
Design implication: When you adjust the MTU size, you must also configure the entire network path (VMkernel port, distributed switch, physical switches, and routers) to support the same MTU size.
n LLS-C1 Configuration
n LLS-C2 Configuration
n LLS-C3 Configuration
n LLS-C4 Configuration
Note Consider PTP time synchronization based on these designs. However, Telco Cloud Platform
RAN supports LLS-C3 configuration only.
LLS-C1 Configuration
This configuration is based on a point-to-point connection between the DU and RU using the network timing option. LLS-C1 is simple to configure. In this configuration, the DU operates as a PTP Boundary Clock (BC). The DU derives the time signal from the Grandmaster and communicates directly with the RU to synchronize it.
LLS-C2 Configuration
In this configuration, the DU acts as a PTP BC to distribute network timing towards the RU. One or more PTP-supported switches can be installed between the DU and RU.
LLS-C3 Configuration
In this configuration, the PTP Grandmaster shares network timing between the DU and RU at Cell Sites. One or more PTP switches are allowed in the fronthaul network to support network time-sharing. This architecture is widely adopted: introducing the PTP Grandmaster and PTP switch provides an ideal solution for network time-sharing.
LLS-C4 Configuration
In this configuration, PRTC (usually the GNSS receiver) is used locally to provide timing for RU.
PRTC does not depend on the Fronthaul transport network for timing and synchronization.
The figure shows the CU at the RDC, the DU, and the RU, with a local PRTC source providing PTP timing for the RU.
n Option 2: A high-level CU and DU split. With the Option 2 split, the CU handles Service
Data Adaptation Protocol (SDAP) or Packet Data Convergence Protocol (PDCP) with Radio
Resource Control (RRC) while L2/L1 Ethernet functions reside in the DU. Before the data is
sent across the midhaul network, aggregation and statistical multiplexing of the data are
done in the DU. So, the amount of data transmitted across the interface for each radio antenna
appliance is reduced. PTP time synchronization is not mandatory for Option-2 split.
n Option 7.x: A low-level DU and RU split. With Option 7 split, the DU handles the RRC/ PDCP/
Radio Link Control (RLC)/MAC and higher Physical (PHY) functions. The RU handles the
lower PHY and RF functions. Mostly, a single DU is co-located with multiple RUs, offloading
resource-intensive processing from multiple RUs. CU can be centrally located across the WAN,
aggregating multiple DUs. Option 7.x lets operators simplify the deployment of the DU and
RU, leading to a cost-effective solution and an ideal option for a distributed RAN deployment.
Use LLS-C3 for PTP synchronization between the RU and DU.
Mobile operators require the flexibility to choose different splits based on the server hardware
and fronthaul availability. Higher-layer functional splits are required for dense urban areas and
scenarios, while a low fronthaul bit rate is required for a fronthaul interface. With Option 7.x, more
functions are shifted to DUs, enabling more virtualization gains. Hence, Option 7.x split is more
cost-effective than Option-2 DU split.
Cell Site design for local storage can be internal hard disks located inside your ESXi host. Local
storage does not support sharing across multiple hosts. Only one host has access to a datastore
on a local storage device. As a result, although you can use local storage to create VMs, you
cannot use VMware features that require shared storage, such as HA and vMotion.
ESXi supports various local storage devices, including SCSI, IDE, SATA, SAS, flash, and NVMe
devices.
Another option is Network Attached Storage that stores VM files on remote file servers accessed
over a standard TCP/IP network. The NFS client built into ESXi uses Network File System (NFS)
protocol versions 3 and 4.1 to communicate with the NAS/NFS servers. For network connectivity,
the host requires a standard network adapter.
You can mount an NFS volume directly on the ESXi host. You can use the NFS datastore to store
and manage VMs in the same way that you use the VMFS datastores.
NFS Storage depicts a VM using the NFS datastore to store its files. In this configuration, the host
connects to the NAS server, which stores the virtual disk files, through a regular network adapter.
Important Although local storage configuration is possible, it is not recommended. Using a single
connection between storage devices and the host creates a Single Point of Failure (SPOF) that can
cause interruptions when a connection is unreliable or fails. However, because most of the local
storage devices do not support multiple connections, you cannot use multiple paths to access the
local storage.
Before you deploy a Cell Site host, deploy a management domain and a compute workload
domain at your Regional Data Center (RDC) using Telco Cloud Automation. The following sections
describe the VMware SDDC design within a construct of Management workload domain and
Compute workload domain, and how they are related in this RAN design.
Management Domain
The management domain contains a single vSphere cluster called the management cluster. The
management cluster hosts the VMs that manage the solution. This cluster is crucial for the
management and monitoring of the solution. Its high availability deployment ensures that the
management and monitoring services are always available centrally at Regional Data Center.
The management domain hosts components such as the following:

vRealize Suite Standard: Includes vRealize Log Insight and vRealize Operations.
vRealize Network Insight: Communicates with the vCenter Server and NSX Manager instances to collect metrics that are presented through various dashboards and views.

VMware Tanzu Basic for RAN: Creates workload clusters in the compute workload domain.
Note In the Telco Cloud Platform RAN solution, the Management Domain is hosted at Regional
Data Center (RDC).
The compute workload domain can contain multiple vSphere clusters at RDC. These clusters can
contain a minimum of three ESXi hosts and a maximum of 96 hosts (64 hosts when using vSAN),
depending on the resource and availability requirements of the solution being deployed at RDC.
The Cell Site host, which is designed to onboard CNFs on a single node ESXi host, is part of the
compute workload domain. This host provides Kubernetes workload cluster where CNFs such as
vCU and vDU workloads are placed.
Each compute workload domain can support a maximum of 2500 ESXi hosts and 45,000 VMs
in combination with the RDC cluster and Cell Site hosts. If you use other management and
monitoring tools, the vCenter maximums do not apply and the actual number of ESXi hosts and
VMs per workload domain might be less.
vCenter Server is deployed at Regional Data Center and it manages all the Cell Site hosts. So, it
is critical to design vCenter Servers appropriately before onboarding the Cell Site hosts and RAN
applications.
A vCenter Server deployment can consist of two or more vCenter Server instances according to
the scale, number of VMs, number of CNFs, and continuity requirements for your environment.
You must protect the vCenter Server system as it is the central point of management and
monitoring. You can protect vCenter Servers according to the maximum downtime tolerated. Use
the following methods to protect the vCenter Server instances:
Attribute Specification
Number of vCPUs 4
Memory 19 GB
The following table lists different deployment sizes for Compute vCenter Server. Choose the
appropriate size based on your scaling requirements such as the number of Cell Site hosts or
workloads.
Important Ensure that the Compute vCenter Server is dedicated to your Cell Site hosts and RAN
applications.
As a security best practice, replace at least all user-facing certificates with certificates that are
signed by a third-party or enterprise Certificate Authority (CA).
Design decision: Deploy two vCenter Server systems: one vCenter Server supports the management workloads, and the other supports the compute (Cell Site) workloads.
Design justification: Isolates vCenter Server failures to management or compute workloads.
Design implication: Requires licenses for each vCenter Server instance.

Design decision: Protect all vCenter Servers by using vSphere HA.
Design justification: Supports the availability objectives for vCenter Server without the required manual intervention during a failure event.
Design implication: vCenter Server becomes unavailable during the vSphere HA failover.

Design decision: Replace the vCenter Server machine certificate with a certificate signed by a third-party Public Key Infrastructure.
Design justification: Infrastructure administrators connect to the vCenter Server instances using a Web browser to perform configuration, management, and troubleshooting. The default certificate results in certificate warning messages.
Design implication: Replacing and managing certificates is an operational overhead.

Design decision: Use an SHA-2 or higher algorithm when signing certificates.
Design justification: The SHA-1 algorithm is considered less secure and is deprecated.
Design implication: Not all certificate authorities support SHA-2.
Important In the Telco Cloud Platform RAN solution design, both the management vCenter
Server and Compute vCenter Servers are deployed at Regional Data Center. Compute vCenter
Server manages all the Cell Site hosts.
The cluster design at RDC must consider the workloads that the cluster handles. Different cluster
types in this design have different characteristics. When you design the cluster layout in vSphere,
consider the following guidelines:
n Use a few large-sized ESXi hosts or a larger number of small-sized ESXi hosts for the Regional Data Center (RDC).
n Use the ESXi hosts that are sized appropriately for your Cell Site locations.
n Consider the total number of ESXi hosts and cluster limits as per vCenter Server Maximums.
The vSphere HA Admission Control Policy allows an administrator to configure how the cluster
determines available resources. In a small vSphere HA cluster, a large proportion of the cluster
resources is reserved to accommodate ESXi host failures, based on the selected policy.
However, with a Regional Data Center and a Cell Site construct in the Telco Cloud Platform
RAN deployment, you need to only enable vSphere High Availability on your workload cluster at
Regional Data Center. vSphere HA is not required on your cell site host as it is managed as a
standalone host.
According to the application or service, high latency on specific VM networks can also negatively
affect performance. Determine which workloads and networks are sensitive to high latency
by using the information gathered from the current state analysis and by interviewing key
stakeholders and SMEs.
The following table lists the network segments and VLANs for a Cell Site host configuration.
VLAN Purpose
In the case of Cell Site ESXi hosts, create a single virtual switch per Cell Site group. The virtual switch manages each type of network traffic, and a port group is configured for each traffic type to simplify configuration and monitoring. Cell Site ESXi hosts are added to the data center object of the vCenter Server.
The VDS eases this management burden by treating the network as an aggregated resource.
Individual host-level virtual switches are abstracted into one large VDS spanning multiple hosts. In
this design, the data plane remains local to each VDS but the management plane is centralized.
The following figure shows a dedicated VDS at Regional Data Center which is managing
Kubernetes Cluster and Worker nodes along with vCU. Another VDS is configured to manage
all Cell Site group hosts. Both the VDS switches are managed by a Compute vCenter Server which
is hosted at Regional Data Center.
Important Each vCenter Server instance can support up to 128 vSphere Distributed Switches.
Each VDS can manage up to 2000 hosts. So, you must consider your Cell Site scaling
appropriately.
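A quick planning check against the maximums quoted in this guide (128 vSphere Distributed Switches per vCenter Server, up to 2,000 hosts per VDS, and 2,500 ESXi hosts per compute workload domain) can be scripted. The cell site group layout in the sketch below is hypothetical.

```python
"""Minimal sketch: validate a proposed cell-site layout against the scaling figures
quoted in this guide. The group names and host counts are hypothetical."""
MAX_VDS_PER_VCENTER = 128            # one VDS per cell site group
MAX_HOSTS_PER_VDS = 2000
MAX_HOSTS_PER_WORKLOAD_DOMAIN = 2500

def validate_layout(cell_site_groups: dict) -> list:
    findings = []
    if len(cell_site_groups) > MAX_VDS_PER_VCENTER:
        findings.append("Too many cell site groups: one VDS is needed per group.")
    for group, hosts in cell_site_groups.items():
        if hosts > MAX_HOSTS_PER_VDS:
            findings.append(f"{group}: {hosts} hosts exceeds the per-VDS host limit.")
    if sum(cell_site_groups.values()) > MAX_HOSTS_PER_WORKLOAD_DOMAIN:
        findings.append("Total hosts exceed the compute workload domain maximum.")
    return findings or ["Layout fits within the documented maximums."]

if __name__ == "__main__":
    print(validate_layout({"SFO": 800, "Palo Alto": 900, "San Jose": 700}))
```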
The figure shows the Compute vCenter Server at the RDC managing one VDS for WLD-1 Cluster-1 (Host #1, Host #2, and Host #3) and dedicated VDS instances for the cell site groups, for example SFO, Palo Alto, and San Jose, each containing several cell site hosts.

VDS: Shared VDS across Cell Sites in a cell site group.
Limitation: 128 vSphere Distributed Switches supported per vCenter Server.
SR-IOV
SR-IOV is a specification that allows a single Peripheral Component Interconnect Express (PCIe)
physical device under a single root port to appear as multiple separate physical devices to the
hypervisor or the guest operating system.
SR-IOV uses Physical Functions (PFs) and Virtual Functions (VFs) to manage global functions
for the SR-IOV devices. PFs are full PCIe functions that can configure and manage the SR-IOV
functionality. VFs are lightweight PCIe functions that support data flow but have a restricted
set of configuration resources. The number of VFs provided to the hypervisor or the guest
operating system depends on the device. SR-IOV enabled PCIe devices require appropriate BIOS,
hardware, and SR-IOV support in the guest operating system driver or hypervisor instance.
In vSphere, a VM can use an SR-IOV virtual function for networking. The VM and the physical
adapter exchange data directly without using the VMkernel stack as an intermediary. Bypassing
the VMkernel for networking reduces the latency and improves the CPU efficiency for high data
transfer performance.
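To make the PF and VF relationship concrete, the following Python sketch reads the standard SR-IOV attributes that Linux exposes in sysfs for a physical NIC. It illustrates the concept on a generic Linux host rather than an ESXi procedure, and "eth1" is a placeholder for an SR-IOV capable adapter.

```python
#!/usr/bin/env python3
"""Minimal sketch: inspect SR-IOV PF/VF state through Linux sysfs.
The attributes are only present for SR-IOV capable devices."""
from pathlib import Path

def sriov_status(interface: str) -> dict:
    device = Path(f"/sys/class/net/{interface}/device")
    return {
        "total_vfs_supported": int((device / "sriov_totalvfs").read_text()),
        "vfs_configured": int((device / "sriov_numvfs").read_text()),
        # Each configured VF appears as a virtfn<N> symlink under the PF device.
        "vf_pci_addresses": [p.resolve().name for p in sorted(device.glob("virtfn*"))],
    }

if __name__ == "__main__":
    print(sriov_status("eth1"))  # placeholder interface name
```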
The figure shows an ESXi host with two pNICs uplinked to the ToR switch, exposing a PF and VFs. Port groups on the vSphere Distributed Switch provide the port association, and the VF drivers inside VNFC 1 and VNFC 2 exchange data plane traffic directly through their VFs.
Design decision: Use two physical NICs in the Cell Site ESXi host for workloads.
Design justification: Provides redundancy to all port groups.
Design implication: None.

Design decision: Use vSphere Distributed Switches.
Design justification: Simplifies the management of the virtual network.
Design implication: Migration from a standard switch to a distributed switch requires a minimum of two physical NICs to maintain redundancy.

Design decision: Use a single vSphere Distributed Switch per Cell Site Group.
Design justification: Reduces the complexity of the network design. Provides a more scalable architecture for Cell Site locations.
Design implication: Increases the number of vSphere Distributed Switches that must be managed.

Design decision: Use ephemeral port binding for the management port group.
Design justification: Provides the recovery option for the vCenter Server instance that manages the distributed switch.
Design implication: Port-level permissions and controls are lost across power cycles, and no historical context is saved.

Design decision: Use static port binding for all non-management port groups.
Design justification: Ensures that a VM connects to the same port on the vSphere Distributed Switch. This allows for historical data and port-level monitoring.
Design implication: None.
Design decision: Enable health check on all vSphere Distributed Switches.
Design justification: Verifies that all VLANs are trunked to all ESXi hosts attached to the vSphere Distributed Switch and that the MTU sizes match the physical network.
Design implication: You must have a minimum of two physical uplinks to use this feature.

Design decision: Use the Route Based on Physical NIC Load teaming algorithm for all port groups.
Design justification: Reduces the complexity of the network design. Increases resiliency and performance.
Design implication: None.

Design decision: Enable Network I/O Control on all distributed switches.
Design justification: Increases the resiliency and performance of the network.
Design implication: If configured incorrectly, Network I/O Control might impact the network performance for critical traffic types.
Telco Cloud Platform RAN consumes resources from the compute workload domain. Resource
pools provide guaranteed resource availability to workloads. Resource pools are elastic; more
resources can be added as their capacity grows. Each Kubernetes cluster can be mapped to a
resource pool. A resource pool can be dedicated to a Kubernetes cluster or shared across multiple
clusters.
In a RAN deployment design with a Regional Data Center and Cell Sites, Kubernetes control plane
node can be placed on a vSphere cluster at Regional Data Center and Worker nodes can be
placed on an ESXi host at Cell Sites to support CNF workloads. Both vSphere clusters and ESXi
hosts can be managed by a vCenter Server in the compute workload domain.
Design decision: Map the Tanzu Kubernetes clusters to the vSphere Resource Pool in the compute workload domain.
Design justification: Enables resource guarantee and resource isolation.
Design implication: During resource contention, workloads can be starved for resources and can experience performance degradation. Note: You must proactively perform monitoring and capacity management and add the capacity before the contention occurs.

Design decision: Create dedicated DHCP IP subnet pools for the Tanzu Kubernetes cluster management network, and dedicate a static IP for the Kubernetes API endpoint.
Design justification: Simplifies the IP address assignment to Kubernetes clusters. Use static reservations to reserve IP addresses in the DHCP pool for the Kube-Vip address.
Design implication: DHCP servers must be monitored for availability. Address scopes must not overlap with IP addresses that are already in use.

Design decision: Place the Kubernetes cluster management network on a virtual network that is routable to the management network for vSphere, Harbor, and the Airgap mirror.
Design justification: Provides connectivity to the vSphere infrastructure. Simplifies the network design and reduces the network complexity.
Design implication: Increases the network address management overhead. Increased security configuration is required to allow traffic between the resource and management domains.
When you allocate resource pools to Kubernetes clusters, consider the following guidelines:
n Enable 1:1 Kubernetes Cluster to Resource Pool mapping for data plane intensive workloads.
n Enable N:1 Kubernetes Cluster to Resource Pool mapping for control plane workloads where
resources are shared.
n Consider the total number of ESXi hosts and Kubernetes cluster limits.
n Etcd: Etcd must run in cluster mode with an odd number of members to establish a quorum. A 3-node cluster tolerates the loss of a single member, while a 5-node cluster tolerates the loss of two members (see the quorum sketch after this list). In a stacked mode deployment, etcd availability determines the number of Kubernetes control plane nodes.
n Control Plane node: The Kubernetes control plane must run in redundant mode to avoid a
single point of failure. To improve API availability, Kube-Vip is placed in front of the Control
Plane nodes.
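The quorum arithmetic behind these member counts is simple: an etcd cluster stays available only while a majority of members are healthy, so an n-member cluster tolerates floor((n - 1) / 2) failures. The following minimal Python sketch illustrates the calculation.

```python
"""Minimal sketch: etcd fault tolerance as a function of member count."""

def etcd_fault_tolerance(members: int) -> int:
    """Return how many member failures an etcd cluster of this size survives."""
    quorum = members // 2 + 1   # majority needed to keep the cluster writable
    return members - quorum

if __name__ == "__main__":
    for members in (1, 3, 5):
        quorum = members // 2 + 1
        print(f"{members} members: quorum={quorum}, tolerates {etcd_fault_tolerance(members)} failure(s)")
```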
Component Availability
Kube-controller-manager Active/Passive
Kube-scheduler Active/Passive
Worker Node
5G RAN workloads are classified based on their performance requirements. Generic workloads such as web
services, lightweight databases, monitoring dashboards, and so on, are supported adequately
using standard configurations on Kubernetes nodes. In addition to the recommendations outlined
in the Tuning Telco Cloud Platform 5G Edition for Data Plane Intensive Workloads white paper, the
data plane workload performance can benefit from further tuning in the following areas:
n NUMA Topology
n Huge Pages
NUMA Topology: When deploying Kubernetes worker nodes that host high data bandwidth
applications, ensure that the processor, memory, and vNIC are vertically aligned and remain within
a single NUMA boundary.
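One way to check this alignment from inside a Linux worker node is to compare the NUMA node reported for a vNIC with the CPU list of each NUMA node in sysfs. The sketch below does that; "eth0" is a placeholder interface name, and a value of -1 means the platform does not expose locality for that device.

```python
#!/usr/bin/env python3
"""Minimal sketch: read NUMA placement details from Linux sysfs inside a worker node."""
from pathlib import Path

def nic_numa_node(interface: str) -> int:
    # NUMA node of the (virtual) PCI device backing the interface; -1 if not exposed.
    return int(Path(f"/sys/class/net/{interface}/device/numa_node").read_text())

def numa_cpu_map() -> dict:
    nodes = {}
    for node_dir in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        nodes[node_dir.name] = (node_dir / "cpulist").read_text().strip()
    return nodes

if __name__ == "__main__":
    print("NIC eth0 NUMA node:", nic_numa_node("eth0"))  # placeholder interface
    for node, cpus in numa_cpu_map().items():
        print(f"{node}: CPUs {cpus}")
```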
The figure shows CNF1 and CNF2 each aligned within its own NUMA node and its own set of CPU cores, with the VM advanced settings sched.mem.lpage.enable1GPage=TRUE and sched.cpu.latencySensitivity=HIGH applied.
The topology manager is a new component in the Kubelet and provides NUMA awareness to
Kubernetes at the pod admission time. The topology manager figures out the best locality of
resources by pulling topology hints from the Device Manager and the CPU manager. Pods are
then placed based on the topology information to ensure optimal performance.
Note Topology Manager is optional, if the NUMA placement best practices are followed during
the Kubernetes cluster creation.
CPU Core Affinity: CPU pinning can be achieved in different ways. Kubernetes built-in CPU
manager is the most common. The CPU manager implementation is based on cpuset. When a
VM host initializes, host CPU resources are assigned to a shared CPU pool. All non-exclusive CPU
containers run on the CPUs in the shared pool. When the Kubelet creates a container requesting
a guaranteed CPU, CPUs for that container are removed from the shared pool and assigned
exclusively for the life cycle of the container. When a container with exclusive CPUs is terminated,
its CPUs are added back to the shared CPU pool.
n None: Default policy. The kubelet uses the CFS quota to enforce pod CPU limits. The workload
can move between different CPU cores depending on the load on the Pod and the available
capacity on the worker node.
n Static: With the static policy enabled, the CPU request results in the container being allocated the whole CPU, and no other container can be scheduled on that CPU.
Note For data plane intensive workloads, the CPU manager policy must be set to static to
guarantee an exclusive CPU core on the worker node.
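As an illustration only, the following minimal kubelet configuration sketch shows how a static CPU manager policy, a single-NUMA-node Topology Manager policy, and a kubelet CPU reservation might be expressed. In Telco Cloud Platform RAN, these settings are applied through Telco Cloud Automation and the NodeConfig Operator rather than edited by hand, and the reservation values shown here are placeholders.

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Pin containers in the Guaranteed QoS class to exclusive cores.
    cpuManagerPolicy: static
    # Align CPU, memory, and device allocations to a single NUMA node.
    topologyManagerPolicy: single-numa-node
    # Placeholder reservation that keeps the shared CPU pool from being
    # exhausted by system daemons; size it for the expected pod density.
    systemReserved:
      cpu: "2"
      memory: "2Gi"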
CPU Manager for Kubernetes (CMK) is another tool used by select CNF vendors to assign core and NUMA affinity for data plane workloads. Unlike the built-in CPU manager, CMK is not bundled with the Kubernetes binaries and requires a separate download and installation. Use CMK instead of the built-in CPU manager only if the CNF vendor requires it.
Huge Pages: For Telco workloads, the default huge page size can be 2 MB or 1 GB. To report
its huge page capacity, the worker node determines the supported huge page sizes by parsing
the /sys/kernel/mm/hugepages/hugepages-{size}kB directory on the host. Huge pages must
be set to pre-allocated for maximum performance. Pre-allocated huge pages reduce the
amount of available memory on a worker node. A node can only pre-allocate huge pages for
the default size. Transparent Huge Pages must be deactivated.
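As a hedged illustration of how a CNF consumes the pre-allocated 1 GB huge pages, a pod can request them as a first-class resource; the pod name, image, and resource sizes below are placeholders, not values prescribed by this guide.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dataplane-example            # hypothetical pod name
    spec:
      containers:
      - name: dataplane
        image: registry.example.com/dataplane:1.0   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: "2Gi"
            hugepages-1Gi: "4Gi"         # served from the boot-time pool
          limits:
            cpu: "4"
            memory: "2Gi"
            hugepages-1Gi: "4Gi"
        volumeMounts:
        - name: hugepages
          mountPath: /dev/hugepages
      volumes:
      - name: hugepages
        emptyDir:
          medium: HugePages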
Design recommendation: Use three Control nodes per Kubernetes cluster to ensure full redundancy.
Design justification: A 3-node cluster tolerates the loss of a single member.
Design implication:
n Each Control node requires CPU and memory resources.
n The CPU/Memory overhead is high for small Kubernetes cluster sizes.

Design recommendation: Install and activate the PTP clock synchronization service.
Design justification: Kubernetes and its components rely on the system clock to track events, logs, states, and so on.
Design implication: None

Design recommendation: Deactivate Swap on all Kubernetes Cluster Nodes.
Design justification: Swap causes a decrease in the overall performance of the cluster.
Design implication: None

Design recommendation: Align processor, memory, and vNIC vertically and keep them within a single NUMA boundary for data plane intensive workloads.
Design justification:
n Higher packet throughput can be maintained for data transfer across vNICs within the same NUMA zone than across different NUMA zones.
n Latency_sensitivity must be enabled for best-effort NUMA placement.
Design implication: Requires an extra configuration step on the vCenter Server to ensure NUMA alignment.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.

Design recommendation: Set the CPU manager policy to static for data plane intensive workloads.
Design justification: When the CPU manager is used for CPU affinity, the static mode is required to guarantee exclusive CPU cores on the worker node for data-intensive workloads.
Design implication: Requires an extra configuration step for the CPU Manager through the NodeConfig Operator.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.

Design recommendation: When enabling the static CPU manager policy, set aside sufficient CPU resources for the kubelet operation.
Design justification:
n The kubelet requires a CPU reservation to ensure that the shared CPU pool is not exhausted under load.
n The amount of CPU to be reserved depends on the pod density per node.
Design implication:
n Requires an extra configuration step for the CPU Manager.
n An insufficient CPU reservation can impact the Kubernetes cluster stability.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
Design recommendation: Enable huge page allocation at boot time.
Design justification:
n Huge pages reduce TLB misses.
n Huge page allocation at boot time prevents memory from becoming unavailable later due to fragmentation.
n Update the VM setting for Worker Nodes 1G Hugepage.
n Enable IOMMUs to protect system memory between I/O devices.
Design implication:
n Pre-allocated huge pages reduce the amount of available memory on a worker node.
n Requires an extra configuration step in the worker node VM GRUB configuration.
n Enabling huge pages requires a VM reboot.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.

Design recommendation: Set the default huge page size to 1 GB. Set the overcommit size to 0.
Design justification:
n For 64-bit applications, use 1 GB huge pages if the platform supports them.
n The overcommit size defaults to 0; no actions are required.
Design implication: For 1 GB pages, the huge page memory cannot be reserved after the system boot.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.

Design recommendation: Mount a file system of type hugetlbfs on the root file system.
Design justification:
n The hugetlbfs file system type is required by the mmap system call.
n Create an entry in fstab so that the mount point persists after a reboot.
Design implication: Perform an extra configuration step in the worker node VM configuration.
Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
Cell Site Local Storage: The Cell Site vSphere Cluster may have a single-host configuration, so the local disk is the primary choice. You can also use any NFS storage available locally.
If vSAN storage is used in Regional Data Center (RDC), you must follow the vSAN storage policies.
vSAN storage policies define storage requirements for your StorageClass. Cloud Native Persistent
Storage or Volume (PV) inherits performance and availability characteristics made available by
the vSAN storage policy. These policies determine how the storage objects are provisioned and
allocated within the datastore to guarantee the required level of service. A Kubernetes StorageClass is a way for Kubernetes admins to describe the “classes” of storage that the Cloud Admin makes available to a Tanzu Kubernetes cluster. Different StorageClasses map to different vSAN storage policies.
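For illustration, a StorageClass that maps to a vSAN storage policy through the vSphere CSI driver might look like the following sketch; the class name and policy name are placeholders and must match a storage policy that the Cloud Admin defines in vCenter Server.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: vsan-default                  # hypothetical class name
    provisioner: csi.vsphere.vmware.com
    parameters:
      # Placeholder vSAN storage policy defined in vCenter Server.
      storagepolicyname: "vSAN Default Storage Policy"
    reclaimPolicy: Delete
    allowVolumeExpansion: true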
For more information about Cloud Native Storage Design for 5G Core, see the Telco Cloud
Platform 5G Edition Reference Architecture Guide 2.5.
VMware Telco Cloud Automation with infrastructure automation provides a universal 5G Core and
RAN deployment experience to service providers. Infrastructure automation for 5G Core and RAN gives telco providers and telco administrators a virtually zero-IT-touch and zero-touch infrastructure onboarding experience. The Telco Cloud Automation appliance automates the
deployment, configuration, and provisioning of RAN sites.
Network Administrators can provision new telco cloud resources, monitor changes to the RDC
and cell sites, and manage other operational activities. VMware Telco Cloud Automation enables
consistent, secure infrastructure and operations across Central Data Centers, Regional Data
Centers, and Cell Sites with increased enterprise agility and flexibility.
n Telco Cloud Automation Manager (TCA Manager) provides orchestration and management
services for Telco clouds.
TCA-CP and TCA Manager components work together to provide Telco Cloud Automation
services. TCA Manager connects with TCA-CP nodes through site pairing. It relies on the inventory
information captured from TCA-CP to deploy and scale Tanzu Kubernetes clusters. TCA manager
does not communicate with the VIM directly. Workflows are always posted by the TCA manager to
the VIM through TCA-CP.
The Kubernetes cluster bootstrapping environment is completely abstracted into TCA-CP. The
binaries and cluster plans required to bootstrap the Kubernetes clusters are pre-bundled into the
TCA-CP appliance. After the base OS image templates are imported into the respective vCenter
Servers, Kubernetes admins can log into the TCA manager and start deploying Kubernetes
clusters directly from the TCA manager console.
Design recommendation: Integrate the TCA Manager with Active Directory for more control over user access.
Design justification:
n TCA-CP SSO integrates with vCenter Server SSO.
n LDAP enables centralized and consistent user management.
Design implication: Requires additional components to manage in the Management cluster.

Design recommendation: Deploy a single instance of the TCA Manager (of a permissible size) to manage all TCA-CP endpoints.
Design justification:
n Single point of entry into CaaS.
n Simplifies inventory control, user onboarding, and CNF onboarding.
Design implication: Larger deployments with significant scale may require multiple TCA Managers.

Design recommendation: Register the TCA Manager with the management vCenter Server.
Note: Use an account with relevant permissions to complete all actions.
Design justification: The Management vCenter Server is used for TCA user onboarding.
Design implication: None

Design recommendation: Deploy a dedicated TCA-CP node to control the Tanzu Kubernetes management cluster.
Design justification: Required for the deployment of the Tanzu Kubernetes management cluster.
Design implication: TCA-CP requires additional CPU and memory in the management cluster.

Design recommendation: Each TCA-CP node controls a single vCenter Server. Multiple vCenter Servers in one location require multiple TCA-CP nodes.
Design justification: The TCA-CP to vCenter Server mapping cannot be distributed.
Design implication:
n Each time a new vCenter Server is deployed, a new TCA-CP node is required.
n To minimize recovery time in case of TCA-CP failure, each TCA-CP node must be backed up independently, along with the TCA Manager.

Design recommendation: Share the vRealize Orchestrator deployment across all TCA-CP and vCenter Server pairs.
Design justification: A consolidated vRO deployment reduces the number of vRO nodes to deploy and manage.
Design implication: Requires vRO to be highly available if multiple TCA-CP endpoints depend on a shared deployment.

Design recommendation: Deploy a vRO cluster using three nodes.
Design justification: Ensures the high availability of vRO for all TCA-CP endpoints.
Design implication: vRO redundancy requires an external Load Balancer.

Design recommendation: Schedule TCA Manager and TCA-CP backups around the same time as SDDC infrastructure components.
Note: Your backup frequency and schedule might vary based on your business needs and operational procedures.
Design justification:
n Minimizes database synchronization issues upon restore.
n Proper backup of all TCA and SDDC components is crucial to restore the system to its working state in the event of a failure.
n Time-consistent backups taken across all components require less time and effort upon restore.
Design implication: Backups are scheduled manually. The TCA admin must log into each component and configure a backup schedule and frequency.
CaaS Infrastructure
Tanzu Kubernetes Cluster automation in Telco Cloud Automation starts with Kubernetes templates
that capture deployment configurations for a Kubernetes cluster. The cluster templates are a
blueprint for Kubernetes cluster deployments and are intended to minimize repetitive tasks,
enforce best practices, and define guard rails for infrastructure management.
A policy engine is used to honor the SLA required for each template profile by mapping the Telco Cloud Infrastructure resources to the Cluster templates. Policies can be defined based on the tags assigned to the underlying VIM or based on the role and role permission binding. Hence, the appropriate VIM resources are exposed to a set of users, which automates the process from the SDDC through to Kubernetes cluster creation.
The CaaS Infrastructure automation in Telco Cloud Automation consists of the following
components:
n TCA Kubernetes Cluster Template Designer: TCA admin uses the Tanzu Kubernetes Cluster
designer to create Kubernetes Cluster templates to help deploy Kubernetes clusters. A
Kubernetes Cluster template defines the composition of a Kubernetes cluster. A typical
Kubernetes cluster template includes attributes such as the number and size of Control and
worker nodes, Kubernetes CNI, Kubernetes storage interface, and Helm version. The TCA
Kubernetes Cluster template designer does not capture CNF-specific Kubernetes attributes
but instead leverages the VMware NodeConfig operator through late binding. For late binding
details, see TCA VM and Node Config Automation Design.
n SDDC Profile and Inventory Discovery: The Inventory management component of Telco
Cloud Automation can discover the underlying infrastructure for each VIM associated with
a TCA-CP appliance. Hardware characteristics of the vSphere node and vSphere cluster are
discovered using the TCA inventory service. The platform inventory data is made available by
the discovery service to the Cluster Automation Policy engine to assist the Kubernetes cluster
placement. TCA admin can add tags to the infrastructure inventory to provide additional
business logic on top of the discovered data.
n Cluster Automation Policy: The Cluster Automation policy defines the mapping of the TCA
Kubernetes Cluster template to infrastructure. VMware Telco Cloud Platform allows TCA
admins to map the resources using the Cluster Automation Policy to identify and group
the infrastructure to assist users in deploying high-level components on them. The Cluster
Automation Policy indicates the intended usage of the infrastructure. During cluster creation,
TCA validates whether the Kubernetes template requirements are met by the underlying
infrastructure resources.
n Kubernetes Bootstrapper: When the deployment requirements are met, TCA generates a
deployment specification. The Kubernetes Bootstrapper uses the Kubernetes cluster APIs to
create a cluster based on the deployment specification. Bootstrapper is a component of the
TCA-CP.
Design recommendation: Deploy v2 clusters or convert existing workload clusters into v2 clusters.
Design justification:
n Provides a framework that supports the deployment of additional PaaS-type components into the cluster.
n Provides access to advanced cluster topologies.
Design implication: Requires adaptation of any automation process that is currently used to build v1 clusters.

Design recommendation: When creating workload clusters, define only the network labels required for Tanzu Kubernetes management and CNF OAM.
Design justification:
n Network labels are used to create vNICs on each node.
n Data plane vNICs that require SR-IOV are added as part of the node customization during the CNF deployment.
n Late binding of vNICs saves resource consumption on the SDDC infrastructure. Resources are allocated only during CNF instantiation.
Design implication: None

Design recommendation: When creating workload Cluster templates, enable Multus CNI for clusters that host Pods requiring multiple NICs.
Design justification:
n Multus CNI enables the attachment of multiple network interfaces to a Pod.
n Multus acts as a "meta-plugin", a CNI plugin that can call multiple other CNI plugins.
Design implication: Multus is an upstream plugin and follows the community support model.

Design recommendation: When creating workload Cluster templates, enable Whereabouts if cluster-wide IPAM is required for secondary Pod NICs.
Design justification:
n Simplifies IP address assignment for secondary Pod NICs.
n Whereabouts is cluster-wide compared to the default IPAM that comes with most CNIs, such as macvlan.
Design implication: Whereabouts is now available to be deployed through the Add-on Framework.

Design recommendation: When defining workload clusters, enable the nfs_client CSI for multi-access read and write support. When using vSAN, RWX volumes can be supported natively through vSAN File Services and the vSphere CSI driver.
Design justification: Some CNF vendors require ReadWriteMany support for persistent volumes. The NFS provider supports Kubernetes RWX persistent volume types.
Design implication: The NFS backend must be onboarded separately, outside of Telco Cloud Automation.

Design recommendation: When defining workload clusters, enable CSI zoning support for workload clusters that span vSphere clusters or standalone ESXi hosts.
Design justification: Enables Kubernetes to use vSphere storage resources that are not equally available to all nodes.
Design implication: Requires creating zone tags on vSphere objects.

Design recommendation: When defining a workload cluster, if the cluster is designed to host CNFs with different performance profiles, create a separate node pool for each profile. Define unique node labels to distinguish node members among node pools (see the placement sketch after this table).
Design justification:
n Node labels can be used with the Kubernetes scheduler for CNF placement logic.
n A node pool simplifies the CNF placement logic when a cluster is shared between CNFs with different placement logics.
Design implication: Too many node pools might lead to resource underutilization.

Design recommendation: Pre-define a set of infrastructure tags and apply the tags to SDDC infrastructure resources based on the CNF and Kubernetes resource requirements.
Design justification: Tags simplify the grouping of infrastructure components. Tags can be based on hardware attributes or business logic.
Design implication: Infrastructure tag mapping requires administrative-level visibility into the infrastructure composition.

Design recommendation: Pre-define a set of CaaS tags and apply the tags to each Kubernetes cluster defined by the TCA admin.
Design justification: Tags simplify the grouping of Kubernetes templates. Tags can be based on hardware attributes or business logic.
Design implication: Kubernetes template tag mapping requires advanced knowledge of CNF requirements. Kubernetes template mapping can be performed by the TCA admin with assistance from Kubernetes admins.

Design recommendation: Pre-define a set of CNF tags and apply the tags to each CSAR file uploaded to the CNF catalog.
Design justification: Tags simplify the searching of CaaS resources.
Design implication: None
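The following hypothetical sketch shows the node-pool placement approach described in the table above: a DU pod constrains scheduling to a performance-tuned node pool by matching a node label assigned to its members. The label key, label value, pod name, and image are examples only, not predefined platform labels.

    apiVersion: v1
    kind: Pod
    metadata:
      name: vdu-example                   # hypothetical workload
    spec:
      nodeSelector:
        # Example label applied to the nodes of the performance node pool.
        nodepool: du-performance
      containers:
      - name: vdu
        image: registry.example.com/vdu:1.0   # placeholder image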
Important After deploying the resources with Telco Cloud Automation, you cannot rename
infrastructure objects such as datastores or resource pools.
n Policy and Placement Engine enables intent-based and multi-cloud workload and policy placements from the network core to the edge, and from private to public clouds.
Airgapped Design
Isolating your infrastructure from Internet access is often a best practice, but it impacts the
default operational mode of VMware Telco Cloud Automation. The Airgap solution eliminates the
requirement for internet connectivity.
In the non-airgapped design, VMware Telco Cloud Automation uses external repositories for
Harbor and the PhotonOS packages to implement the VM and Node Config operators, new kernel
builds, or additional packages to the nodes. Internet access is required to pull these additional
components.
The Airgap server is a Photon OS VM that is deployed and configured for use by Telco Cloud
Automation. The airgap server is registered as a partner system within the platform and is used in
internet-restricted or airgapped environments.
The airgap server allows the VMware Tanzu Kubernetes Grid clusters to pull the required Kernels,
Binaries, and OCI images from a local environment.
Note While the Airgap server removes the requirement for Internet access to build and manage
Kubernetes clusters, the Airgap server creation requires Internet access to build and pull all the
external images to be stored locally.
The Airgap server can be built on an Internet-accessible zone (direct or through proxy) and then
migrated to the Internet-restricted environment and reconfigured before use. The Airgap server supports the following deployment modes:
n Restricted: This mode uses a proxy server between the Airgap server and the internet. In this
mode, the Airgap server is deployed in the same segment as the Telco Cloud Automation VMs
in a one-armed mode design.
n Airgapped: In this mode, the airgap server is created and migrated/moved to the airgapped
environment. The airgap server has no external connectivity requirements. You can upgrade
the airgap server by a new Airgap deployment or an upgrade patch.
The Airgap server consists of the following main components along with a set of scripts for easy
installation and configuration.
n NGINX is used to request files from the local datastore or harbor environment.
n Harbor is the container registry that hosts the OCI images required by VMware Telco Cloud
Automation and VMware Tanzu Kubernetes Grid.
n Reposync synchronizes the airgapped repository with the upstream repository located on the Internet.
n BOM files are used by the VMware Telco Cloud Automation platform.
Design recommendation: Where required, leverage the airgapped solution to eliminate direct Internet connectivity requirements.
Design justification:
n Provides a secure environment for the Tanzu Kubernetes Grid deployment as external access is restricted.
n Speeds up the Tanzu Kubernetes Grid deployment process by accessing the local infrastructure, without Internet connectivity.
Design implication: Requires the airgap server to be deployed, maintained, and upgraded over time.
Tanzu Kubernetes Management Cluster is a Kubernetes cluster that functions as the primary
management and operational center for the Tanzu Basic for RAN instance. In this management
cluster, the Cluster API runs to create Tanzu Kubernetes clusters and you configure the shared
and in-cluster services that the clusters use.
Tanzu Kubernetes Workload Cluster is a Kubernetes cluster that is deployed from the Tanzu
Kubernetes management cluster. Tanzu Kubernetes clusters can run different versions of
Kubernetes, depending on the CNF workload requirements. Tanzu Kubernetes clusters support
multiple types of CNIs for Pod-to-Pod networking, with Antrea as the default CNI and the vSphere
CSI provider for storage by default. When deployed through Telco Cloud Automation, VMware
NodeConfig Operator is bundled into every workload cluster to handle the node Operating System
(OS) configuration, performance tuning, and OS upgrades required for various types of Telco CNF
workloads for RAN.
[Figure: Tanzu Kubernetes cluster topology across the Regional Data Center and Cell Site. The management workload domain at the Regional Data Center hosts Telco Cloud Automation, the TCA-CP nodes, vRO, vCenter Server, NSX, DNS, NTP, and DHCP services on the sddc_mgmt_network; the compute workload domain (WLD-1) hosts the Kubernetes control plane and worker nodes on the tkg_dhcp_network; and the Cell Site host runs Kubernetes worker nodes for the vCU and vDU workloads.]
n In this design, Kubernetes control plane nodes are deployed at Regional Data Center (RDC)
and Kubernetes worker node is deployed at the Cell Site host.
n Telco Cloud Automation onboards the Cell Site host and orchestrates the deployment of Tanzu
Kubernetes clusters.
n A dedicated DHCP server is available locally at RDC and Cell Site to support the DHCP service
offering for Kubernetes clusters.
n Kubernetes Worker nodes are deployed at Regional Data Center and are extended to Cell
Site locations to support the telco CNF workloads such as vCU and vDU in a geographically
distributed way.
The add-on framework moves some of the cluster configuration options into a modular framework. The modular framework can be used not only for generic cluster elements but also to support an increasing number of Tanzu Kubernetes Grid CLI-managed packages.
n Container Networking Interface (CNI) add-ons: Antrea and Calico. These primary CNI add-ons
are selected during the cluster creation.
n Monitoring add-ons: Prometheus and Fluent-bit. These add-ons are used for metric and
syslog collection, and they can be added to a workload cluster.
n System add-ons: Systemsettings (cluster password and generic Syslog configuration), partner Harbor system connectivity, and cert-manager.
n TCA-Core add-on: nodeconfig operator. This add-on is deployed automatically as part of Telco
Cloud Automation.
The major new additions to this framework focus on monitoring, backup, and system configuration with cert-manager and Whereabouts.
Prometheus Add-on
Prometheus is a monitoring and alerting platform for Kubernetes. It collects and stores metrics
as time-series data. As part of the Prometheus deployment, cadvisor, kube-state-metrics, node
exporters, and the Prometheus server components are deployed into the workload cluster.
When deploying Prometheus, an additional Custom Resource (CR) can be applied. The default
configuration deploys Prometheus with a service type of ClusterIP and a PVC of 150 GB for
metric retention. The default Prometheus configuration from the Tanzu Kubernetes add-on
framework is deployed through the custom resource. The default configuration can be modified as
required. For more information about the Prometheus deployment and configuration options, see
Prometheus Configuration.
Note Prometheus provides the collected metrics to an upstream element, such as vROps, to parse.
For more information about integrating Prometheus with vRealize Operations, contact your local
VMware representative.
Fluent-bit Add-on
Fluent-bit is a lightweight logging processor for Kubernetes. You can deploy fluent-bit through the
add-on framework to forward logging information to an external syslog or the Security Information
and Event Management (SIEM) platform.
Similar to other add-ons, the fluent-bit deployment uses an additional Custom Resource (CR) for
its configuration. Specific fluent-bit configuration is required for the appropriate level of logging at
the cluster level.
Note For more information about the fluent-bit configuration options, see Fluent-bit Configuration. For more information about integrating fluent-bit with vRealize Log Insight, contact
your local VMware representative.
Whereabouts Add-on
Whereabouts is an IP Address Management (IPAM) CNI plugin. It is used in conjunction with
Multus to manage the IP address assignment to secondary pod interfaces in a cluster-wide
configuration.
Whereabouts does not require configuration from the add-on framework Custom Resource (CR)
screen. After the add-on is deployed, the NF must create a Network Attachment Definition with
the IPAM type set to 'whereabouts'. The network definition can then be consumed through the
pod or deployment specification.
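The following is an illustrative NetworkAttachmentDefinition sketch that pairs a macvlan secondary interface with whereabouts IPAM; the attachment name, parent interface, and IP range are placeholders chosen for this example. A pod or deployment then references the attachment through the k8s.v1.cni.cncf.io/networks annotation.

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: oam-secondary                 # hypothetical attachment name
    spec:
      config: |
        {
          "cniVersion": "0.3.1",
          "type": "macvlan",
          "master": "eth1",
          "ipam": {
            "type": "whereabouts",
            "range": "192.168.100.0/24"
          }
        }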
For more information about Whereabouts consumption, see Multus and Whereabouts
deployment.
Cert-Manager Add-on
cert-manager is an x.509 certificate controller for Kubernetes environments. It allows certificates
or certificate issuers to be added as objects or resources within the Kubernetes cluster.
Note The default cert-manager deployment does not create any issuers or clusterissuers.
Configure the issuers after deploying cert-manager. The configuration varies depending on the
customer and application requirements.
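As an example only, a basic self-signed ClusterIssuer can be created after the add-on is deployed; production deployments typically use a CA or ACME issuer that matches the operator's PKI, and the issuer name below is a placeholder.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: selfsigned-issuer             # hypothetical issuer name
    spec:
      selfSigned: {}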
CNF Design
This section outlines the CNF requirements and how CNF can be onboarded and instantiated in
Telco Cloud Platform RAN.
Helm Charts
Helm is the default package manager in Kubernetes, and it is widely leveraged by CNF vendors to
simplify container packaging. With Helm charts, dependencies between CNFs are handled in the
formats agreed upon by the upstream community. This allows Telco operators to consume CNF
packages in a declarative and easy-to-operate manner. With proper version management, Helm
charts also simplify workload updates and inventory control.
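For illustration, sub-chart dependencies are declared in the chart metadata so that the package manager resolves them consistently; the chart names, versions, and repository URL below are placeholders.

    apiVersion: v2
    name: sample-cnf                      # hypothetical parent chart
    version: 1.0.0
    dependencies:
      - name: sample-cnf-config           # hypothetical sub-chart
        version: 1.0.0
        repository: "https://charts.example.com"   # placeholder repository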
A Helm repository is a required component in the Telco Cloud Platform RAN. Production CNF Helm charts must be stored centrally and be accessible by the Tanzu Kubernetes clusters. To reduce the number of management endpoints, the Helm repository must work seamlessly with container images. The container registry must be capable of supporting both container images and Helm charts.
CSAR Design
Network Function (NF) Helm charts are uploaded as a catalog offering wrapped around the
ETSI-compliant TOSCA YAML (CSAR) descriptor file. The descriptor file includes the structure and
composition of the NF and supporting artifacts such as the Helm chart version, the provider, and a set
of pre-instantiation jobs. RAN Network Functions have sets of prerequisite configurations on the
underlying Kubernetes clusters. Those requirements are also defined in the Network Function
CSAR. The features supported by the CSAR extension include:
n Latency Sensitivity
CSAR files can be updated to reflect changes in CNF requirements or deployment models. CNF
developers can update the CSAR package directly within the TCA designer or leverage an external
CICD process to maintain and build newer versions of the CSAR package.
Design recommendation: Deploy all containers using the TCA Manager interface.
Design justification: Direct access to the Kubernetes cluster outside of the Kubernetes cluster admin is not supported.
Design implication: Some containers may not be available with a CSAR package to be deployed through Telco Cloud Automation.

Design recommendation: Define all CNF infrastructure requirements using the TOSCA CSAR extension.
Design justification: When infrastructure requirements are bundled into the CSAR package, Telco Cloud Automation provides placement assistance to locate a Kubernetes cluster that meets CNF requirements. If Telco Cloud Automation cannot place the CNF workload due to the lack of resources, it leverages the node Operator to update an existing cluster with the required hardware and software based on the CSAR file definition.
Design implication: CNF developers must work closely with CNF vendors to ensure that infrastructure requirements are captured correctly in the CSAR file.

Design recommendation: Store the TOSCA-compliant CSAR files in a GIT repository.
Design justification:
n GIT repository is ideal for centralized change control.
n CSAR package versioning, traceability, and peer review are built into GIT when a proper git-flow is implemented and followed.
Design implication: Git and Git-flow are outside the scope of this reference architecture guide.

Design recommendation: Develop a unit test framework to validate each commit into the GIT repository.
Design justification: A well-defined unit test framework catches potential syntax issues associated with each commit.
Design implication: The unit test framework is outside the scope of this reference architecture guide.

Design recommendation: Leverage the existing CI workflow to maintain the builds and releases of CSAR packages.
Design justification: A modern CI pipeline fully integrates with GIT and the unit test framework.
Design implication: The CI pipeline is outside the scope of this reference architecture guide.
For more information about configuring a CSAR package, see the Telco Cloud Automation User
Guide.
Note VMware offers a certification program for CNF and VNF vendors to create certified and
validated CSAR packages.
The notification model provides a way for a consumer (CNF) to subscribe to status and event
updates from the O-Cloud environment.
The CNF can subscribe to notifications through a sidecar co-located in the same pod as the
DU Worker or through a local REST API. The CNF can query, register, and receive notifications
through a callback API specified during the CNF registration.
The initial instantiation of the Cloud API notification framework provides notifications for PTP status events. Rather than each RAN NF vendor incorporating multiple mechanisms to support multiple timing implementations, this notification framework provides a REST API for the NF vendor to subscribe to and receive PTP synchronization events. The Cloud API Notification framework monitors
the PTP status (PTP Sync Status and PTP Lock Status) and delivers notifications through a
message bus to consumer applications.
Design recommendation: If the consumer CNF (DU) requires event notifications, deploy and integrate the sidecar with the CNF.
Design justification: Provides a way for the notification (PTP status changes) to be communicated by the O-Cloud to event consumers.
Design implication: Requires integration between the CNF (DU) and the sidecar applications.

Design recommendation: If the consumer CNF (DU) requires event notifications, the O-Cloud API CSAR must also be deployed.
Design justification: Provides a mechanism for the O-Cloud to monitor PTP and communicate to the sidecar.
Design implication: None
Note For more information about the O-RAN notification deployment in Telco Cloud Platform
RAN, refer to the Telco Cloud Platform RAN Deployment Guide and contact your local VMware
representative.
Note The components of the operations management layer are not included in the Telco Cloud
Platform RAN release bundle. However, VMware Telco Cloud Infrastructure or VMware Telco
Cloud Platform 5G Edition typically has one or more of these components deployed from the
vRealize Operations Suite.
We recommend that you use the Integrated Load Balancer (ILB) on the three-node cluster so that
all log sources can address the cluster by its ILB. By using the ILB, you do not need to reconfigure
log sources with a new destination address in case of a scale-out or node failure. The ILB also
guarantees that vRealize Log Insight accepts all incoming ingestion traffic.
The ILB address is required for users to connect to vRealize Log Insight using either the Web UI or
API. It is also required for clients to ingest logs using syslog or the Ingestion API. A vRealize Log
Insight cluster can scale out to 12 nodes: 1 primary and 11 worker nodes.
Note Multiple ingress IP addresses can be allocated to vRealize Log Insight. Each unique
entry can implement ingress tagging for all log messages. Ingress tagging provides a high-level
distinction between different elements of an environment such as RAN, Core, and so on.
To accommodate all log data in the solution, size the compute resources and storage for the Log
Insight nodes correctly. By default, the vRealize Log Insight appliance uses the predefined values
for small configurations: 4 vCPUs, 8 GB virtual memory, and 530.5 GB disk space. vRealize Log
Insight uses 100 GB disk space to store raw data, index, metadata, and other information.
vRealize Log Insight supports the following alerts that trigger notifications about its health and the
monitored solutions:
n System Alerts: vRealize Log Insight generates notifications when an important system event
occurs. For example, when the disk space is almost exhausted and vRealize Log Insight must
start deleting or archiving old log files.
n Content Pack Alerts: Content packs contain default alerts that can be configured to send
notifications. These alerts are specific to the content pack and are deactivated by default.
n User-Defined Alerts: Administrators and users can define alerts based on the data ingested by
vRealize Log Insight.
Design recommendation: Deploy vRealize Log Insight in a cluster configuration of three nodes with an integrated load balancer:
n one primary node
n two worker nodes
Design justification:
n Provides high availability.
n The ILB:
  n Prevents a single point of failure.
  n Simplifies the vRealize Log Insight deployment and subsequent integration.
  n Simplifies the vRealize Log Insight scale-out operations, reducing the need to reconfigure existing logging sources.
Design implication:
n You must deploy a minimum of three medium nodes.
n You must size each node identically.
n If the capacity of your vRealize Log Insight cluster must expand, identical capacity must be added to each node.

Design recommendation: Deploy vRealize Log Insight nodes of at least medium size.
Design justification: Accommodates the number of expected syslog and vRealize Log Insight Agent connections from the following sources:
n Management and Compute vCenter Servers
n Management and Compute ESXi hosts
n NSX-T Components
n vRealize Operations components
n Telco Cloud Automation
n TKG Clusters
Using medium-size appliances ensures that the storage space for the vRealize Log Insight cluster is sufficient for 7 days of data retention.
Design implication: If you configure vRealize Log Insight to monitor additional syslog sources, increase the size of the nodes.

Design recommendation: Enable alerting over SMTP.
Design justification: Administrators and operators can receive email alerts from vRealize Log Insight.
Design implication: Requires access to an external SMTP server.

Design recommendation: Leverage fluent-bit on the Tanzu Kubernetes clusters to forward syslog information to vRealize Log Insight.
Design justification: Provides a central logging infrastructure for all Cell Sites.
Design implication: All logging traffic is sent over the cell site uplinks; ensure that capacity is available.
The vRealize Operations deployment is a single instance of a 3-node analytics cluster that is
deployed in the management cluster along with a two-node remote collector group.
The analytics cluster of the vRealize Operations deployment contains the nodes that analyze and
store data from the monitored components. You deploy a configuration of the analytics cluster
that meets the requirements for monitoring the number of VMs.
Deploy a three-node vRealize Operations analytics cluster that consists of one primary node,
one replica node, and one data node to enable scale-out and high availability. This design uses
medium-size nodes for the analytics cluster and standard-size nodes for the remote collector
group. To collect the required number of metrics, add a virtual disk of 1 TB on each analytics
cluster node.
The remote collectors known as cloud-proxies collect data from the compute vCenter Servers
in the management cluster. The deployment of remote collectors at edge locations provides a
distributed way to collect information about Tanzu Kubernetes clusters deployed throughout the
RAN.
You can use the self-monitoring capability of vRealize Operations to receive alerts about
operational issues. vRealize Operations displays the following administrative alerts:
n Environment alert: Indicates that vRealize Operations stopped receiving data from one or
more resources. This alert might indicate a problem with system resources or network
infrastructure.
n Log Insight log event: Indicates that the infrastructure on which vRealize Operations is running
has low-level issues. You can also use the log events for root cause analysis.
n Custom dashboard: vRealize Operations shows super metrics for data center monitoring,
capacity trends, and a single pane of glass overview.
Design recommendation: Deploy vRealize Operations as a cluster of three nodes:
n one primary node
n one replica node
n one data node
Design justification: Provides the scale capacity required for monitoring up to 10,000 VMs.
Design implication: All the nodes must be sized identically.

Design recommendation: Deploy two remote collector nodes.
Design justification: Offloads the collection of application metrics from the analytics cluster.
Design implication: When configuring the monitoring of a solution, you must assign a collector group.

Design recommendation: If required, deploy remote collectors or cloud-proxies at edge locations to distribute the collection of information from Tanzu Kubernetes clusters.
Design justification: Provides a distributed way to collect information from the RAN environment.
Design implication: Requires additional collectors and the correct allocation of remote collectors or cloud-proxies per cluster.

Design recommendation: Configure vRealize Operations to collect metrics from the compute vCenter Server.
Design justification: Provides an operations management infrastructure for all Cell Sites.
Design implication: As Cell Sites are added, more data nodes and remote collectors need to be added.

Design recommendation: Deploy each node in the analytics cluster as a medium-size appliance (at a minimum).
Design justification: Provides the scale required to monitor the RAN solution.
Design implication: ESXi hosts in the management cluster must have physical CPUs with a minimum of 8 cores per socket. vRealize Operations uses a total of 24 vCPUs and 96 GB of memory in the management cluster.

Design recommendation: Add more medium-size nodes to the analytics cluster if the number of VMs exceeds 10,000.
Design justification: Ensures that the analytics cluster has enough capacity to meet the VM object and metric growth.
Design implication:
n The capacity of the physical ESXi hosts must be sufficient to accommodate VMs that require 32 GB RAM without bridging NUMA node boundaries.
n The number of nodes must not exceed the number of ESXi hosts in the management cluster minus 1. For example, if the management cluster contains six ESXi hosts, you can deploy up to five vRealize Operations nodes in the analytics cluster.

Design recommendation: Deploy the standard-size remote collector or Cloud-Proxy virtual appliances.
Design justification: Enables metric collection for the expected number of objects.
Design implication: You must provide 4 vCPUs and 8 GB memory in the management cluster or targeted endpoint.

Design recommendation: Add a virtual disk of 1 TB for each analytics cluster node.
Design justification: Provides enough storage for the expected number of objects.
Design implication: You must add the 1 TB disk manually while the VM for the analytics node is powered off.

Design recommendation: Configure vRealize Operations for SMTP outbound alerts.
Design justification: Enables administrators and operators to receive email alerts from vRealize Operations.
Design implication: vRealize Operations must have access to an external SMTP server.
For more information about the vRealize Operations and vRealize Log Insight design at Regional
Data Center, see the Telco Cloud Platform 5G Edition Reference Architecture guide.
vRealize Network Insight is deployed as a cluster called the vRealize Network Insight Platform
Cluster. This cluster processes the collected data and presents it using a dashboard. vRealize
Network Insight also uses a Proxy node to collect data from the data sources, such as vCenter
Server and NSX Manager, and send the data to the Platform Cluster for processing.
Design recommendation: Deploy a three-node vRealize Network Insight Platform cluster.
Design justification: Meets the availability and scalability requirements of up to 10,000 VMs and 2 million flows per day.
Design implication: The Management cluster must be properly sized because each vRealize Network Insight VM requires a 100% CPU reservation.

Design recommendation: Deploy vRealize Network Insight Platform nodes of large size.
Design justification: Large size is the minimum size supported to form a cluster.
Design implication: None

Design recommendation: Deploy at least a single large-sized vRealize Network Insight Proxy node.
Design justification: A single Proxy node meets the requirements of a new deployment. As the solution grows, additional Proxy nodes might be required.
Design implication: vRealize Network Insight Proxy nodes are not highly available, but they are protected by vSphere HA.
VMware Telco Cloud Service Assurance can be deployed in a HA or non-HA model based on
the number of managed devices. For more information, see the Telco Cloud Service Assurance
Deployment Architecture.
The sizing of the Telco Cloud Service Assurance deployment is based on footprints that grow from
25,000 to 200,000 devices. For information about the sizing and deployment guidelines, see the
Telco Cloud Service Assurance Deployment Guide.
Design recommendation: Deploy Telco Cloud Service Assurance in a HA model.
Design justification: Supports scaling from 25,000 to 200,000 devices.
Design implication: Requires additional worker node resources.