VMware Cloud Foundation on VxRail Architecture Guide
Abstract
This guide introduces the architecture of the VMware Cloud Foundation (VCF) on
VxRail solution. It describes the different components within the solution and also
acts as an aid to selecting the configuration needed for your business
requirements.
July 2020
H17731.3
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software that is described in this publication requires an applicable software license.
Copyright © 2020 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, Dell EMC, and other trademarks are trademarks of Dell Inc. or
its subsidiaries. Other trademarks may be trademarks of their respective owners.
VCF on VxRail provides the simplest path to the hybrid cloud through a fully integrated hybrid cloud
platform that leverages native VxRail hardware and software capabilities and other VxRail-unique
integrations (such as vCenter plugins and Dell EMC networking). These components work together to
deliver a new turnkey hybrid cloud user experience with full-stack integration. Full-stack integration
means you get both HCI infrastructure layer and cloud software stack in one complete automated life-
cycle turnkey experience.
By virtualizing all of your infrastructure, you can take advantage of what a fully virtualized infrastructure
can provide, such as resource utilization, workload and infrastructure configuration agility, and advanced
security. With SDDC software life-cycle automation provided by Cloud Foundation (and in particular
SDDC Manager which is a part of Cloud Foundation on top of VxRail), you can streamline the LCM
experience for the full SDDC software and hardware stack.
You no longer need to worry about performing updates and upgrades manually using multiple tools for all
of the SDDC SW and HW components of the stack. These processes are now streamlined using a
common management toolset in SDDC Manager in conjunction with VxRail Manager. You can begin to
leverage the data services benefits that a fully virtualized infrastructure can offer along with SDDC
infrastructure automated LCM. An example of data services is using software-defined networking features
from NSX, like micro-segmentation, which, before software-defined networking tools existed, was nearly
impossible to implement using physical networking tools.
Another important aspect is the introduction of a standardized architecture for how these SDDC
components are deployed together using Cloud Foundation, an integrated cloud software platform.
Having a standardized design incorporated as part of the platform provides you with a guarantee that
these components have been certified with each other and are backed by Dell Technologies. You can
then be assured that there is an automated and validated path forward to get from one known good state
to the next across the end-to-end stack.
Architecture Overview
From the VxRail clusters, you can organize separate pools of capacity into WLDs, each with its own set of
specified CPU, memory, and storage requirements to support various workload types such as Horizon or
business-critical apps like Oracle databases. As new VxRail physical capacity is added by the SDDC
Manager, it is made available for consumption as part of a WLD.
More detail about each type of WLD is provided in the next section.
In the Management WLD cluster, vSphere runs with a dedicated vCenter server and a pair of PSCs in the
same SSO domain. This cluster, backed by vSAN storage, hosts the SDDC Manager and VxRail
Manager VMs, NSX-V, and vRealize Log Insight for Management domain logging. Other components
such as vRealize Operations and vRealize Automation are optional. If a Horizon WLD is deployed, its
management components are also deployed in the Mgmt WLD. Because the management cluster
contains critical infrastructure, consider implementing a basic level of hardware redundancy for this
cluster. The management cluster must have a minimum of four hosts: vSAN FTT=1 requires three hosts
(two data replicas plus a witness component), and a fourth host preserves that protection level while one
host is in maintenance mode.
While the deployment and configuration of the management cluster is fully automated, once it is running,
you manage it just like you would any other VxRail cluster using the vSphere HTML5 client.
• It establishes a common identity management system that can be linked between vCenters.
• It allows the SDDC Manager LCM process to manage the life cycle of all vCenter and PSC components
in the solution.
vCenter Design
3.2.1 VI WLD
The VI WLD can consist of one or more VxRail clusters. The VxRail cluster is the building block for the
VxRail VI WLD. The first cluster of each VI WLD must have four hosts, but subsequent clusters can start
with three hosts. The VI WLD can be either an NSX-V based WLD or an NSX-T based WLD. This can be
selected when adding the first cluster to the WLD. The vCenter and NSX-V or NSX-T Manager for each
VI WLD are deployed into the Mgmt WLD. For an NSX-V based VI WLD the controllers are deployed to
the first cluster in the VI WLD added by the SDDC Manager. Each new VI WLD requires an NSX-V
Manager to be deployed in the Mgmt WLD and the three controllers deployed into the first cluster of the
VI WLD.
For an NSX-T based VI WLD, when the first cluster is added to the first VI WLD, the NSX-T Managers
(three in a cluster) are deployed to the Mgmt WLD. Subsequent NSX-T based VI WLDs do not require
additional NSX-T Managers; instead, each VI WLD vCenter is added as a compute manager in NSX-T.
For both NSX-T based and NSX-V based VI WLDs, the first cluster can be considered a compute-
and-edge cluster as it contains both NSX and compute components. NSX virtual routers can be deployed
to this first cluster. The second and subsequent clusters in a VI WLD can be considered compute-only
clusters as they do not need to host any NSX routing virtual machines.
The Horizon domain consumes one or more VI WLDs but requires additional Horizon desktop
management components to be deployed as part of the Horizon workload creation process. The Horizon
domain is decoupled from resource provisioning: one or more VI WLDs must be created before deploying
a Horizon domain. There are several prerequisites that must be completed before deploying a Horizon
domain; they are documented in the Prerequisites for a Horizon Domain. These additional management
components include:
• Composer Server
• App Volumes
• User Environment Manager
• Unified Access Gateway
The Horizon domain is based on the Horizon reference architecture, which uses Pod Block architecture to
enable you to scale as your use cases grow. For more information about the architecture and number of
supported virtual machines, see the Horizon 7 Pod and Block section in the VMware Workspace ONE
and VMware Horizon 7 Enterprise Edition On-premises Reference Architecture document.
The PKS WLD deploys VMware Enterprise PKS, which is built on upstream Kubernetes and delivered as
an integrated solution. The integration within VCF features automated and centralized life cycle
management and operations of the underlying WLD clusters. It includes integrated container networking
and network security with NSX-T, is easily scalable, and includes automated deployment and
configuration. The following prerequisites must be in place before a PKS WLD can be deployed:
1. NSX-T VI WLD
2. NSX-T Edge VMs are deployed and configured.
3. IP/Hostnames for the PKS Management components
4. NSX-T segments created for PKS Management and Service networks
5. IP block for the Kubernetes Pods
6. IP block for the Kubernetes Nodes
7. Floating IP pool for the load balancers for each Kubernetes cluster (see the address-plan check after this list)
8. CA-signed certificates for Operations Manager, Enterprise PKS control plane, and Harbor
Registry
9. Certificate for NSX-T Super User
10. Resource Pools created in the VI WLD vCenter that will be mapped as PKS availability zones
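Several of these prerequisites are address plans (the Pod and Node IP blocks, the floating IP pool, and the PKS Management and Service segments) that must not overlap with one another. The following minimal sketch, using only the Python standard library, shows one way to sanity-check such a plan before deployment; the CIDR values are placeholders, not recommendations.

    import ipaddress
    from itertools import combinations

    # Hypothetical address plan for a PKS WLD (placeholder values only)
    plan = {
        "pods_ip_block": "172.16.0.0/16",
        "nodes_ip_block": "172.17.0.0/16",
        "floating_ip_pool": "10.40.14.0/24",
        "pks_management_segment": "10.40.12.0/24",
        "pks_service_segment": "10.40.13.0/24",
    }

    networks = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}

    # Flag any pair of ranges that overlap; overlapping blocks would break
    # Kubernetes Pod/Node addressing or load-balancer VIP allocation.
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            print(f"Overlap detected: {name_a} ({net_a}) and {name_b} ({net_b})")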
The following figure shows how one rack can be used to host two different WLDs, the Mgmt WLD and
one tenant WLD. Note that a tenant WLD can consist of one or more clusters; this is discussed later.
[Figure: VxRail node hardware options: power supply choices from 1100 W to 2400 W, up to 20 capacity drives, up to 3 GPUs, 10 GbE or 25 GbE connectivity, and NVMe, Optane, or mixed-use SAS cache drive support (for example, the P580N).]
Note: The Edge uplink deployment for both NSX-V and NSX-T based VI WLD is a manual process that
must be performed after the VI WLD has been completed.
Portgroup | Teaming policy | Uplink 1 | Uplink 2
VxRail Management | Route based on the originating virtual port | Active | Standby
VXLAN VTEP (Only NSX-V) | Route based on the originating virtual port | Active | Active
ESG Uplink 1 (Only NSX-V) | Route based on the originating virtual port | Active | Unused
ESG Uplink 2 (Only NSX-V) | Route based on the originating virtual port | Unused | Active
Portgroup | Teaming policy | Uplink 1 | Uplink 2 | Uplink 3 | Uplink 4
External Management | Route based on Physical NIC load | Active | Active | Unused | Unused
VXLAN VTEP (Only NSX-V) | Route based on the originating virtual port | Standby | Standby | Active | Active
ESG Uplink 1 (Only NSX-V) | Route based on the originating virtual port | Active | Unused | Unused | Unused
ESG Uplink 2 (Only NSX-V) | Route based on the originating virtual port | Unused | Active | Unused | Unused
Note: The NSX-V deployment uses vmnic2 and vmnic3 on the vDS for the VTEPs so VXLAN traffic is
mixed with vSAN and vMotion. As a day 2 operation, you can move this traffic to vmnic0/vmnic1, if
desired, to keep this traffic separate from vSAN.
The NSX Manager ensures security of the control plane communication of the NSX architecture. It
creates self-signed certificates for the nodes of the controller cluster and ESXi hosts that are allowed to
join the NSX domain. Each WLD has an NSX Manager as part of the VCF on VxRail solution.
The NSX vSwitch enables support for overlay networking with the use of the VXLAN protocol and
centralized network configuration. Overlay networking with NSX provides the following capabilities:
• Creation of a flexible logical Layer 2 (L2) overlay over existing IP networks on existing physical
infrastructure
VXLAN Encapsulation
Scaling beyond the 4094 VLAN limitation on traditional switches is solved by leveraging a 24-bit
identifier, named the VXLAN Network Identifier (VNI), which is associated with each L2 segment created in
logical space. This value is carried inside the VXLAN header and is normally associated with an IP subnet,
similar to how VLANs are traditionally used. Intra-IP subnet communication occurs between devices that
are connected to the same virtual network or logical switch.
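As a quick worked comparison of the two identifier spaces (simple arithmetic, not a supported-scale statement for any specific release):

    # 12-bit VLAN ID space vs. 24-bit VXLAN VNI space
    vlan_ids = 2**12 - 2   # 4094 usable VLANs (IDs 0 and 4095 are reserved)
    vni_ids = 2**24        # 16,777,216 possible VNIs
    print(vlan_ids, vni_ids)   # 4094 vs roughly 16.7 million logical segments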
VXLAN tunnel endpoints (VTEPs) are created within the vSphere distributed switch to which the ESXi
hosts that are prepared for NSX for vSphere are connected. VTEPs are responsible for encapsulating
VXLAN traffic as frames in UDP packets and for the corresponding decapsulation. VTEPs are essentially
VMkernel ports with IP addresses and are used both to exchange packets with other VTEPs and to join
IP multicast groups through the Internet Group Management Protocol (IGMP).
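To make the encapsulation concrete, the following minimal sketch builds the 8-byte VXLAN header defined in RFC 7348, which a VTEP prepends to the original L2 frame inside a UDP packet with destination port 4789. It illustrates the header layout only and is not how the ESXi kernel module is implemented.

    import struct

    VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN

    def vxlan_header(vni: int) -> bytes:
        """Build the 8-byte VXLAN header defined in RFC 7348.

        Layout: flags (1 byte, 0x08 = valid VNI), 3 reserved bytes,
        VNI (3 bytes), 1 reserved byte.
        """
        if not 0 <= vni < 2**24:
            raise ValueError("VNI must fit in 24 bits")
        return struct.pack("!B3s3sB", 0x08, b"\x00" * 3, vni.to_bytes(3, "big"), 0)

    # Example: the header a VTEP would prepend (inside UDP/4789) for VNI 5001
    print(vxlan_header(5001).hex())  # 0800000000138900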
The data plane consists of kernel modules running on the hypervisor that provide high performance, low
overhead first-hop routing.
By default, NSX Manager, NSX Controllers, and Edge services gateways are automatically excluded from
the DFW function. During deployment, VCF also adds the Management VMs to the DFW exclusion list.
• NSX-T Managers
• NSX-T Transport Nodes
• NSX-T Segments (Logical Switches)
• Central Control Plane (CCP)–The CCP is implemented on the NSX-T Manager cluster; the cluster
form factor provides both redundancy and scalability of resources. The CCP is logically separated
from all data plane traffic, meaning any failure in the control plane does not affect existing data
plane operations.
• Local Control Plane (LCP)–The LCP runs on transport nodes. It is next to the data plane it
controls and is connected to the CCP. The LCP programs the forwarding entries of the data
plane.
The transport nodes are the hosts running the local control plane daemons and forwarding engines
implementing the NSX-T data plane. The N-VDS is responsible for switching packets according to the
configuration of available network services.
A DR is essentially a router with logical interfaces (LIFs) connected to multiple subnets. It runs as a kernel
module and is distributed in hypervisors across all transport nodes, including Edge nodes. The DR
provides East−West routing capabilities for the NSX domain.
An SR, also referred to as a services component, is instantiated when a service is enabled that cannot be
distributed on a logical router. These services include connectivity to the external physical network
(North−South routing), stateful NAT, and the Edge firewall.
A gateway always has a DR. A gateway has SRs when it is a Tier-0 gateway, or when it is a Tier-1
gateway and has services configured such as NAT or DHCP.
• MTU 9000 for VXLAN traffic (for multi-site dual AZ, ensure the MTU is carried across the ISL); see the overhead calculation after this list
• IGMP Snooping for VXLAN VLAN on the first hop switches
• IGMP querier is enabled on the connected router or Layer 3 switch.
• VLAN for VXLAN is created on the physical switches.
• DHCP is configured for the VXLAN VLAN to assign the VTEP IPs.
• IP Helper on the switches if the DHCP server is in a different L3 network
• Layer 3 license requirement for peering with ESGs
• BGP is configured for each router peering with an ESG.
• Two Uplink VLANs for ESGs in Mgmt WLD
• AVN subnets are routable to the Mgmt WLD management network.
• AVN networks are routable at the Core network to reach external services.
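The MTU requirement in the list above follows from the VXLAN encapsulation overhead. The following sketch is plain arithmetic, not a vendor sizing rule: VXLAN adds roughly 50 bytes of outer headers, so the transport network MTU must exceed the largest guest frame by at least that amount, and an end-to-end MTU of 9000 leaves ample headroom for jumbo frames.

    # Approximate VXLAN overhead seen by the transport network
    # (outer IPv4 + outer UDP + VXLAN header + encapsulated inner Ethernet header)
    overhead = 20 + 8 + 8 + 14   # = 50 bytes; more with IPv6 or an outer 802.1Q tag

    for guest_mtu in (1500, 8950):
        required_transport_mtu = guest_mtu + overhead
        print(f"Guest MTU {guest_mtu} -> transport MTU of at least {required_transport_mtu}")

    # Guest MTU 1500 -> transport MTU of at least 1550 (hence the common 1600 minimum)
    # Guest MTU 8950 -> transport MTU of at least 9000 (jumbo frames end to end)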
Figure 15 shows the NSX-V components that are deployed after the VCF bring-up process is completed.
Figure 16 shows the NSX-V components that are deployed after an NSX-V WLD domain has been added
and a second cluster has been added to the VI WLD. Note that when a second cluster is added, only the
preceding Steps 10, 11, and 13 are performed by SDDC Manager.
Notes:
• Any VCF environment upgraded from 3.9 does not automatically get AVN deployed: the Log Insight
VMs remain on the Management network and the vRealize suite remains on a VLAN backed network.
If migration to AVN is required, it must be done manually. See the VCF 3.9.1 Release Notes for further
details.
• For VCF version 3.9.1, AVN deployment was mandatory; for VCF version 3.10, the AVN deployment is
optional. If AVN is not enabled during deployment, the Log Insight VMs remain on the Management
network and the vRealize suite remains on a VLAN backed network. If migration to AVN is required, it
must be done manually with support from VMware.
During the deployment, anti-affinity rules are used to ensure the NSX Controllers, uDLR Control VMs and
the ESGs do not run on the same node simultaneously. This is critical to prevent impact to the network
services if one host fails in the management WLD. The following diagram illustrates how these
components are typically separated by the anti-affinity rules that are created and applied during the
management WLD deployment.
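These anti-affinity rules are created automatically during bring-up; the following minimal pyVmomi sketch is included only to illustrate what such a rule looks like, for example when inspecting or reproducing one in a lab. The vCenter address, credentials, cluster name, and VM names are placeholders, and this is not the code that SDDC Manager itself runs.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details for a lab vCenter (unverified SSL for lab use only)
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="mgmt-vcenter.example.local", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    content = si.RetrieveContent()

    def find_obj(vimtype, name):
        # Return the first managed object of the given type with the given name
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        return next(obj for obj in view.view if obj.name == name)

    cluster = find_obj(vim.ClusterComputeResource, "MGMT-Cluster")        # placeholder name
    vms = [find_obj(vim.VirtualMachine, n) for n in ("NSX-Controller-1",  # placeholder names
                                                     "NSX-Controller-2",
                                                     "NSX-Controller-3")]

    # Keep the three controller VMs on different hosts
    rule = vim.cluster.AntiAffinityRuleSpec(name="nsx-controller-anti-affinity",
                                            enabled=True, mandatory=False, vm=vms)
    spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

    Disconnect(si)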
Note: No additional NSX-T Managers are needed when a second NSX-T based VI WLD is added. The
vCenter is added as a Compute Manager and the ESXi hosts are prepared for use for NSX-T.
Figure 24 shows the NSX-V and NSX-T components deployed in the Mgmt WLD and the VI WLD, with
two NSX-T clusters added to the VI WLD.
• Overlay – Used for all Overlay traffic for the Host TEP communication
• VLAN – Used for VLAN backed segments, including the Edge VM communications.
When the first cluster is added to the first VI WLD, SDDC Manager creates the Overlay and VLAN
transport zones in NSX-T Manager. Two additional VLAN transport zones must be manually created on
Day 2 for the Edge VM uplink traffic to the physical network.
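The two Day 2 VLAN transport zones can be created in the NSX-T Manager UI or through its REST API. The following minimal sketch uses the NSX-T Manager API with a placeholder manager address, credentials, and names; the payload should be checked against the NSX-T API reference for the version bundled with your VCF release.

    import requests

    NSX_MANAGER = "https://nsxt-mgr.example.local"   # placeholder
    AUTH = ("admin", "changeme")                     # placeholder credentials

    def create_vlan_transport_zone(name: str, host_switch_name: str) -> dict:
        # Create a VLAN-backed transport zone (Day 2, Edge VM uplink traffic)
        body = {
            "display_name": name,
            "host_switch_name": host_switch_name,
            "transport_type": "VLAN_BACKED",
        }
        resp = requests.post(f"{NSX_MANAGER}/api/v1/transport-zones",
                             json=body, auth=AUTH, verify=False)  # verify=False for lab only
        resp.raise_for_status()
        return resp.json()

    # Two VLAN transport zones for the Edge VM uplinks (names are examples)
    for tz_name in ("edge-uplink-tz-1", "edge-uplink-tz-2"):
        tz = create_vlan_transport_zone(tz_name, host_switch_name="nvds-edge-uplinks")
        print(tz["id"], tz["display_name"])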
Note: When subsequent clusters are added to a WLD, or if a new WLD is created, all the nodes
participate in the same VLAN and Overlay Transport Zones. For each cluster the same VLAN or a
different VLAN can be used for the TEP traffic for the Overlay.
Note: Only the Overlay segment is created during the deployment of the NSX-T WLD. The other
segments must be created manually on Day 2.
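Similarly, the additional VLAN backed segments that must be created manually on Day 2 can be defined through the NSX-T policy API. The sketch below uses placeholder names and a placeholder transport zone path; the field names and the trunking VLAN range are assumptions to verify against the NSX-T documentation (see also the trunking note later in this section).

    import requests

    NSX_MANAGER = "https://nsxt-mgr.example.local"   # placeholder
    AUTH = ("admin", "changeme")                     # placeholder credentials

    segment_id = "edge-uplink-1-segment"             # example name
    body = {
        "display_name": segment_id,
        # Path of the VLAN transport zone the segment belongs to (placeholder UUID)
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/"
                               "transport-zones/11111111-2222-3333-4444-555555555555",
        # Trunking range, since the Edge uplink/overlay segments carry tagged VLANs
        "vlan_ids": ["0-4094"],
    }

    resp = requests.patch(f"{NSX_MANAGER}/policy/api/v1/infra/segments/{segment_id}",
                          json=body, auth=AUTH, verify=False)  # verify=False for lab only
    resp.raise_for_status()
    print("Created/updated segment", segment_id)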
During the VCF deployment of an NSX-T VI WLD, when a new cluster is added to the VI WLD, a
transport node profile is created with the settings in the preceding list. When the clusters are added to the
NSX-T VI WLD, the transport node profile is applied to the nodes in the cluster, creating the N-VDS,
adding the nodes to the transport zones, configuring the physical interfaces, and creating and assigning
an IP to a TEP so that hosts can communicate over the overlay network. Figure 26 shows a compute
node with logical segments created that can use the N-VDS to communicate with VMs in the same
transport zone.
Note: The Overlay and uplink segments used to connect the Edge VM overlay and uplink interfaces are in
trunking mode because the Edge transport node N-VDS uses VLAN tagging.
The NSX-T Edge routing design is based on the VVD design located here: Routing Design using NSX-T.
A Tier-0 gateway is deployed in Active/Active mode with ECMP enabled to provide redundancy and better
bandwidth utilization, because both uplinks are used. Two uplink VLANs are needed for North/South
connectivity for the Edge virtual machines in the Edge Node cluster. BGP is used to provide dynamic
routing between the physical environment and the virtual environment: eBGP is used between the Tier-0
gateway and the physical TORs, and an iBGP session is established between the SR components of the
Tier-0 Edge VMs.
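As a point of reference only, the eBGP peering between the Tier-0 gateway and the TORs can also be expressed through the NSX-T policy API. The sketch below uses placeholder IDs, addresses, and AS numbers; the resource path and field names are assumptions that should be verified against the NSX-T API reference for your release, and in many environments this configuration is simply performed in the NSX-T Manager UI.

    import requests

    NSX_MANAGER = "https://nsxt-mgr.example.local"   # placeholder
    AUTH = ("admin", "changeme")                     # placeholder credentials

    tier0_id = "vcf-tier0"            # placeholder Tier-0 gateway ID
    locale_service = "default"        # placeholder locale-services ID
    neighbor_id = "tor-a"             # example neighbor ID

    # eBGP peering from the Tier-0 SR to one of the physical TOR switches
    body = {
        "neighbor_address": "172.27.11.1",   # TOR uplink VLAN interface (placeholder)
        "remote_as_num": "65001",            # TOR AS number (placeholder)
    }

    url = (f"{NSX_MANAGER}/policy/api/v1/infra/tier-0s/{tier0_id}"
           f"/locale-services/{locale_service}/bgp/neighbors/{neighbor_id}")
    resp = requests.patch(url, json=body, auth=AUTH, verify=False)  # verify=False for lab only
    resp.raise_for_status()
    print("BGP neighbor", neighbor_id, "configured")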
• NSX-V or NSX-T ECMP Edge devices establish Layer 3 routing adjacency with the first upstream
Layer 3 device to provide equal cost routing for management and workload traffic.
• The investment you have today in your current physical network infrastructure
• The advantages and disadvantages for both Layer 2 and Layer 3 designs
The following section describes both designs and highlights the main advantages and disadvantages of
each design.
• VLANs are carried throughout the fabric, which increases the size of the broadcast domain beyond
racks if multiple racks are needed for the infrastructure and clusters span racks.
Advantages:
• VLANs can span racks which can be useful for VxRail system VLANs like vSAN/vMotion and
node discovery.
• Layer 2 design might be considered less complex to implement.
Disadvantages:
• L3 is terminated at the leaf, so all the VLANs originating from the ESXi hosts terminate on the leaf.
Advantages:
• Vendor agnostic - Multiple network hardware vendors can be used in the design.
• Reduced VLAN span across racks, thus smaller broadcast domains.
• East−West for an NSX domain can be confined within a rack with intra-rack routing at the leaf.
• East−West across NSX domains or Cross-Rack are routed through the Spine.
• ESG peering is simplified by peering the WLDs with the leaf switches in the rack.
Disadvantages:
• The Layer 2 VLANs cannot span racks. Clusters that span racks will require a solution to allow
VxRail system traffic to span racks using hardware VTEPs.
• The Layer 3 configuration might be more complex to implement.
For more information about Dell Network solutions for VxRail, see the Dell EMC Networking Guides.
Note: For VCF versions earlier than 3.10, adding nodes to an additional vDS is not supported as this
causes an error when adding nodes to a cluster in SDDC manager.
The following physical host connectivity diagrams illustrate the different host connectivity options for NSX-
V and NSX-T based WLDs.
Note: For each cluster that is added to an NSX-T VI WLD, the user will have the option to select the two
pNICs if there are more than two pNICs available. This can provide NIC redundancy if the pNICs are
selected from two different NICs. Any subsequent nodes added to the cluster will use the same pNICs.
Note: Dual Region disaster recovery is not yet supported in VCF on VxRail version 3.10.01.
1. Witness deployed at a third site using the same vSphere version used in the VCF on VxRail
release
2. All stretched cluster configurations must be balanced, with the same number of hosts in AZ1 and AZ2.
Note: The VI WLD clusters can only be stretched if the Mgmt WLD cluster is first stretched.
The following network requirements apply for the Mgmt WLD and the VI WLD clusters that need to be
stretched across the AZs:
Note: The VXLAN VLAN ID is the same at each site, whether the sites are connected using stretched Layer 2 or Layer 3 routed networks.
Note: The vSAN traffic can only be extended using Layer 3 networks between sites. If only Layer 2
stretched networks are available between sites with no capability to extend with Layer 3 routed
networks, an RPQ should be submitted.
Increasing the vSAN traffic MTU to improve performance requires that the witness traffic to the witness
site also uses an MTU of 9000. This might cause an issue if the routed traffic needs to pass through
firewalls or use VPNs for site-to-site connectivity. Witness traffic separation is one option to work around
this issue, but it is not yet officially supported for VCF on VxRail.
Note: Witness Traffic Separation (WTS) is not officially supported but if there is a requirement to use
WTS, the configuration can be supported through the RPQ process. The VCF automation cannot be
used for the stretched cluster configuration. It must be done manually using a standard VxRail SolVe
procedure with some additional guidance.
1. The vCenters for each VCF instance that participates in the same SSO Domain are connected
using Enhanced Link Mode (ELM).
2. Maximum number of WLDs is reduced by half.
o Total of 15 WLDs shared across the 2 VCF instances
o This limitation is due to the maximum number of vCenters that can be connected with
ELM.
3. PSC replication configuration should be in a closed-loop design.
4. Recommended replication order: Site1 PSC1 > Site1 PSC2 > Site2 PSC1 > Site2 PSC2 > Site1 PSC1
5. Manual configuration is required to point Site2 PSC2 back to Site1 PSC.
1. Keep all VCF instances in the same SSO at the same VCF on VxRail version.
o Upgrades should be performed on each VCF on VxRail system in sequential order.
o Ensure that all VCF instances in the same SSO are at N or N-1 versions.
o Do not upgrade a VCF instance that would result in having a participating VCF instance
at an N-2 version.
2. The compatibility rules in VCF LCM do not extend to external VCF instances.
There are no safeguards that would prevent a user from upgrading one VCF instance that would break
compatibility between the PSCs participating in the shared SSO domain.
Note: The Mgmt WLD must be upgraded first. Upgrades cannot be applied to VxRail VI WLD before they
are applied to the Mgmt WLD.
Note: The vRealize Suite is deployed to a VLAN backed network. If these management components are
going to be protected in a multi-site DR configuration, you must migrate the networking to NSX logical
switches. This might also be desirable for a multi-site with stretched cluster.