
Front cover

Lenovo and Midokura OpenStack PoC: Software-Defined Everything

Describes a validated, scale-out Proof of Concept (PoC) implementation of OpenStack

Provides business and technical reasons for Software Defined Environments

Explains advantages of Lenovo hardware and MidoNet's OpenStack Neutron plugin

Describes the configurations for building an agile cloud with a distributed architecture

Krzysztof (Chris) Janiszewski


Michael Lea
Cynthia Thomas
Susan Wu
Abstract

This document outlines a Software Defined Everything infrastructure that virtualizes compute, network, and storage resources and delivers them as a service. Rather than being tied to the hardware components of the infrastructure, the management and control of the compute, network, and storage infrastructure are automated by intelligent software that runs on the Lenovo x86 platform.

This Proof of Concept (PoC) focuses on achieving high availability and scale-out capabilities
for an OpenStack cloud that uses Lenovo hardware and Midokura's OpenStack Neutron
plugin while helping reduce operating expense and capital expense. Midokura provides an
enterprise version of the open source MidoNet software. MidoNet achieves network
virtualization through GRE or VXLAN overlays with a completely distributed architecture.

This document shows how to integrate OpenStack, MidoNet, and Ceph, while using Lenovo
ThinkServer systems with Lenovo Networking switches managed by xCAT.

This paper is intended for customers and partners looking for a reference implementation of
OpenStack.

At Lenovo Press, we bring together experts to produce technical publications around topics of
importance to you, providing information and best practices for using Lenovo products and
solutions to solve IT challenges.

For more information about our most recent publications, see this website:
http://lenovopress.com

Contents

Executive summary
Business objectives
Lenovo server environment
  Cloud installation
  Hardware
  Network switch solution
  Virtual machines
Lenovo physical networking design (leaf/spine)
OpenStack environment
Automating OpenStack deployment and hardware management
Software-Defined Storage: Ceph
Software-Defined Networking: MidoNet
  Configurations
  Operational tools
Professional services
About the authors
Notices
Trademarks



Executive summary
Businesses are being challenged to react faster to growing customer technology demands by creating Infrastructure as a Service (IaaS). With Software-Defined Everything, data centers can scale easily to deploy and grow swiftly to meet user demands.

To achieve these business goals, companies are using OpenStack to integrate so-called "Software-Defined Everything" resources and environments. OpenStack is open source software that enables deployment and management of cloud infrastructure. The OpenStack project is not a single piece of software, but an umbrella that covers multiple software projects to manage processing, storage, and networking resources. The key benefit of OpenStack is the orchestration of the data center to allow for Software-Defined Everything, which reduces cost and complexity, increases the speed of application deployment, and helps with security assurance.

One of the key challenges around OpenStack deployments is networking. Because OpenStack is a highly virtualized environment, it requires a virtualized approach to provide agility and seamless end-to-end connectivity. To solve the networking issues within OpenStack, the presented solution uses Midokura Enterprise MidoNet (MEM). MEM offers a distributed architecture that is built to scale.

By using a network virtualization overlay approach, MidoNet provides Layer 2-4 network
services, including routing, load-balancing, and NAT at the local hypervisor. MidoNet also ties
into OpenStack provisioning tools to provide seamless networking integrations.

Although OpenStack is a software-based solution, it still requires physical hardware to operate while providing compute, storage, and networking infrastructure. In fact, the use of proper hardware is critical to achieving a successful OpenStack deployment. Proper selection of hardware helps ensure that reliability and performance metrics are met and reduces capital expense (CapEx) and operating expense (OpEx) around the solution.

The Lenovo® server, storage, and networking offerings have many clear advantages when it comes to the following key areas of physical infrastructure:
• Servers
  Lenovo offers high-performance Intel-based servers to power the virtual machines.
• Storage
  Lenovo can mix solid-state drives (SSDs) and spinning disks with Lenovo's AnyBay technology, which supports mixing 2.5-inch and 3.5-inch drives so that customers can achieve the right balance between cost and performance.
• Networking
  Lenovo offers high-performance, low-cost 10 Gb Ethernet and 40 Gb Ethernet networking solutions to provide connectivity for storage and servers.



Business objectives
The idea of an ephemeral virtual machine (VM) is gaining traction in the enterprise for
application provisioning and decommissioning. Unlike stand-alone offerings that are provided
by public cloud providers, this type of compute service can be all-inclusive: compute, network,
and storage services are abstracted into pooled services and users are presented with
a la carte choices. Moving to this model provides organizations with an elegantly metered,
monitored, and managed style of computing while offering complete isolation and automated
application-level load balancing.

Lenovo and Midokura helped an organization implement this model. The old workflow required three teams working simultaneously, with processes being ping-ponged across the three teams. The post-OpenStack workflow provides cleaner hand-offs and removes redundant tasks and rework. The streamlined workflow that resulted from implementing OpenStack was the key to providing the operational efficiency and agility that this organization was looking for.

Figure 1 compares the old workflow that the organization used with the new workflow that uses OpenStack.

[Diagram: the current workflow chains development VM requests, cookbook creation, OS deployment, staging and sandbox builds, ACL validation, and production deployment across Teams A and B, with Team C engaged at every step if issues arise. The post on-premises OpenStack cloud workflow reduces this to request, work plan and cookbook creation, deploy to sandbox, validate and review health check URLs, approval, and build and deploy to production.]

Figure 1 Comparing the workflow

The Lenovo-Midokura design offers the following technical and business benefits:
• Rapid deployments and scale-up of new applications
• Reduced management cost and reduced complexity
• Enterprise-class machines that are ideal for cloud-based environments
• Management tools that can support the management of thousands of physical servers
• Ability to scale to thousands of VMs per cloud administrator
• Reduced cost per VM
• Advanced and agile networking that uses Network Virtualization Overlays
• Tenant isolation over shared infrastructure
• Reduced networking hardware costs with the use of Lenovo high-performance Ethernet switches
• Simplified underlying network infrastructure that uses open standards L3 routing protocols
• Improved IT productivity with reduced time to deploy resources

Lenovo server environment

A Proof-of-Concept Software-Defined Environment was created to measure and validate the capabilities of a highly available and highly scalable OpenStack deployment with Software-Defined Networking and Software-Defined Storage. Hardware management was accomplished through the open source xCAT and Confluent software.

The software and hardware configuration that was used in this paper is described next.

Cloud installation
The Cloud installation included the following components:
• Operating system: Red Hat Enterprise Linux 7.1
• OpenStack: Red Hat Enterprise Linux OpenStack Platform 6.0 (Juno)
• SDN: Midokura Enterprise MidoNet 1.8.5
• SDS: Ceph (Giant) 0.87.1
• Hardware management: xCAT 2.9.1 with Confluent 1.0

Hardware
The following hardware was used:
• Four Lenovo ThinkServer® RD550 controller nodes:
  – CPU: 2x Intel Xeon E5-2620 v3
  – Memory: 4x 16 GB 2Rx4 PC4-17000R (64 GB)
  – Media:
    • 4x 4 TB HDD, 7200 RPM (RAID-10, virtualization)
    • 2x 32 GB SD cards (OS)
  – RAID: ThinkServer RAID 720IX adapter with 2 GB supercapacitor upgrade
  – Network:
    • 2x Emulex CNA OCe14102-UX 10 Gb dual port (four ports total) (data)
    • Mezzanine quad RJ45 1 Gb port (management)
• Eight Lenovo ThinkServer RD650 Ceph OSD nodes:
  – CPU: Intel Xeon E5-2620 v3
  – Memory: 4x 16 GB 2Rx4 PC4-17000R (64 GB)
  – Media:
    • 2x 200 GB 12 Gb SAS SSD, 3.5-inch (journal)
    • 8x 6 TB HDD, 7200 RPM, 3.5-inch, 6 Gb SAS, hot swap (OSD)
    • 2x 32 GB SD cards (OS)
  – RAID: ThinkServer RAID 720IX adapter with 2 GB supercapacitor upgrade
  – Network:
    • Emulex CNA OCe14102-UX 10 Gb dual port (data)
    • Mezzanine quad RJ45 1 Gb port (management)
• 16 Lenovo ThinkServer RD550 compute nodes:
  – CPU: 2x Intel Xeon E5-2650 v3
  – Memory: 24x 16 GB 2Rx4 PC4-17000R (384 GB)
  – Media: 2x 32 GB SD cards (OS)
  – Network:
    • 2x Emulex CNA OCe14102-UX 10 Gb dual port (data)
    • Mezzanine quad RJ45 1 Gb port (management)

Network switch solution

The following network solution was used:
• 10 GbE: 4x Lenovo RackSwitch™ G8264
• 1 GbE: 1x Lenovo RackSwitch G8052

One of the goals for this environment was to separate management services for better manageability and easy migration to alternative hosts. With this configuration, the environment was highly available, and a potential disaster recovery process can be handled in a much more efficient fashion. Capacity utilization metering is also easier to accomplish.

To achieve sufficient isolation, management services were contained in VMs that ran under a KVM hypervisor and were managed by xCAT. These management VMs were customized for minimum resource overhead. The selected software platform was RHEL 7.1 with the latest OpenStack Juno and Ceph Giant enhancements. All redundant components ran in active/active mode.

Virtual machines
The following VMs were used:
• Four OpenStack controllers
• Four OpenStack databases (MariaDB with Galera) (3x active / 1x passive)
• Four Network State Databases (MidoNet) (3x active / 1x passive)
• Four HAProxy VMs
• Three Ceph monitor nodes
• xCAT hardware and service VM manager



Figure 2 shows the Proof of Concept environment.

[Diagram: a 1 Gb management switch and a vLAG pair of 10 Gb data switches connect the four RD550 controller nodes, the eight RD650 Ceph storage nodes (CEPH1-CEPH8), and the 16 RD550 compute nodes (CPT1-CPT16) over 1 GbE management, 20 GbE data, and 10 GbE BGP links. Each controller node hosts MidoNet gateway, NSDB, HAProxy, MariaDB, and OpenStack controller VMs; the xCAT VM and three Ceph monitor VMs are distributed among them.]

Figure 2 OpenStack PoC environment

Lenovo physical networking design (leaf/spine)

OpenStack deployments depend on a solid physical network infrastructure that can provide consistent low-latency switching and delivery of data and storage traffic. To meet this need, a leaf/spine (Clos network) design was used. By using a leaf/spine design, the infrastructure can provide massive scale to support over 15,872 servers.

Lenovo 10 GbE and 40 GbE switches were selected for the design because they provide a
reliable, scalable, cost-effective, easy-to-configure, and flexible solution. When a leaf/spine
design is used, there is no need for expensive proprietary switching infrastructure because
the switches need to provide only layer 2 and layer 3 network services.

There is also no need for a large chassis switch because the Clos network can scale out to thousands of servers by using fixed-form, one- or two-rack-unit switches.

In this design, all servers connect to the leaf nodes, and the spine nodes provide interconnects between all of the leaf nodes. Such a design is fully redundant and can survive the loss of multiple spine nodes. To facilitate connectivity across the fabric, a Layer 3 routing protocol was used, which offered the benefits of traffic load balancing, redundancy, and increased bandwidth within the OpenStack environment. Open Shortest Path First (OSPF) was selected as the routing protocol because it is an open standard that is supported by most switching equipment.
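As an illustration, the following minimal sketch shows the kind of OSPF and ECMP configuration involved, written in a generic industry-standard CLI; the exact Lenovo RackSwitch syntax differs, and the router ID, network range, and path count shown here are hypothetical.

! Enable OSPF on the fabric-facing interfaces (backbone area 0)
router ospf 1
 router-id 10.255.0.1
 network 10.10.0.0 0.0.255.255 area 0
 ! Allow equal-cost multipath so traffic is spread across the four spines
 maximum-paths 4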

The use of Virtual Link Aggregation Groups (vLAG), which is a Lenovo Switch feature that
allows for multi-chassis link aggregation, facilitates active-active uplinks of access switches
for server connections. Servers are connected to the vLAG switch pair with the use of Link
Aggregation Control Protocol (LACP). The use of vLAG allows for increased bandwidth to
each server and more network resiliency.

Because MidoNet provides a network overlay that uses VXLAN, there is no need for large
Layer 2 networks. Removing large Layer 2 networks removes the need for large core switches
and the inherent issues of large broadcast domains on physical networks. Also, compute
nodes need only IP connectivity between each other.

MidoNet handles tenant isolation by using VXLAN headers. To further enhance network performance, the design uses Emulex network adapters with hardware VXLAN-offload capabilities.

The Lenovo-Midokura solution is cost effective, provides a high-speed interconnect, and can be modified depending on the customer's bandwidth requirements. The fabric that connects the leaf and spine can use 10 GbE or 40 GbE, which allows for cost savings. It is also possible to use only two spine nodes, but the four-post design increases reliability and provides more bandwidth.



Figure 3 shows the Leaf/Spine vLAG network topology.

[Diagram: Lenovo G8332 spine L2/L3 switches (32 ports of 40G or 128 ports of 10G) connect through L3 ECMP with OSPF to Lenovo G8264 leaf L2/L3 switches (48 ports of 10G, 4 ports of 40G). vLAG pairs of leaf switches provide LACP port channels to the Ceph storage (RD650), compute (RD550), controller, and VXLAN gateway servers, which use Emulex VXLAN offload adapters. Redundant routed interfaces provide BGP peering between the leaf switches and the MidoNet gateways, which serve VXLAN to multiple tenants.]

Figure 3 OpenStack network topology that uses Lenovo Leaf/Spine vLAG

OpenStack environment
Red Hat Enterprise Linux OpenStack Platform 6 was selected for this project because of the enterprise-class support that the vendor provides to meet customer demands. However, instead of using the Red Hat OpenStack installation mechanism (Foreman), the solution was implemented through a manual process for better customization, automated with the xCAT tool. In doing so, the solution benefits from all the premium features of Red Hat OpenStack while keeping more control over each component that is installed and handled by the system.

To prove the scalability of OpenStack, four redundant and active VMs were created to handle the following OpenStack management services:
• Keystone
• Glance
• Cinder
• Nova
• Horizon
• Neutron with the MidoNet plugin
• Heat
• Ceilometer

For the database engine and message broker, four instances of MariaDB with Galera were
clustered and RabbitMQ was selected to meet scalability and redundancy needs.

Each Service VM was placed on separate hardware with the ability to migrate to another KVM host in the event of a hardware failure. Load balancing between management services was accomplished with the help of four redundant HAProxy VMs, with a Keepalived virtual IP implemented to create a single point of entry for the user.
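The following minimal sketch shows what such a load-balancer front end can look like; the addresses are hypothetical (a 192.168.0.10 virtual IP and controllers at 192.168.0.11-14), and only one service (the Keystone public API) is shown out of the full set.

# /etc/haproxy/haproxy.cfg (one service shown; repeat per API endpoint)
listen keystone_public
    bind 192.168.0.10:5000
    balance roundrobin
    server controller1 192.168.0.11:5000 check
    server controller2 192.168.0.12:5000 check
    server controller3 192.168.0.13:5000 check
    server controller4 192.168.0.14:5000 check

# /etc/keepalived/keepalived.conf (floats the VIP across the HAProxy VMs)
vrrp_instance openstack_vip {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.0.10/24
    }
}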

For MidoNet Network State Database (NSDB) redundancy, and to maintain consistency with the rest of the environment, four instances of the Apache ZooKeeper/Cassandra databases were created. For reference, an odd number of ZooKeeper/Cassandra nodes is recommended in the environment for quorum: an ensemble of n voting members tolerates the loss of (n-1)/2 of them, so three active nodes survive one failure and five survive two.

To avoid a split-brain issue, the database systems were placed with an odd number of active nodes and the remaining node in a passive state.

The memcached daemon was used to address the known nova-consoleauth service scale limitation and to handle tokens when multiple users attempt to access VNC services through Horizon.
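A minimal sketch of this setting, assuming hypothetical controller addresses, is shown below; with a shared memcached back end, any nova-consoleauth instance can validate a console token that another instance issued.

# /etc/nova/nova.conf on each OpenStack controller Service VM
[DEFAULT]
memcached_servers = 192.168.0.11:11211,192.168.0.12:11211,192.168.0.13:11211,192.168.0.14:11211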

Extensive tests of multiple failing components were performed; entire nodes, and even the entire cluster, were brought down to verify the HA, which confirmed that disaster recovery can be accomplished in a relatively quick fashion.

This OpenStack solution was built fully redundant, with no single point of failure, and is ready to manage large amounts of compute resources. The manual installation approach with xCAT automation allows for rapid and nondisruptive scaling of the environment.

The separation of the management services in the VMs provides the ability to better monitor
and capacity-plan the infrastructure and easily move resources on demand to dedicated
hardware.

This OpenStack environment meets all the demands of production-grade, highly scalable,
swiftly deployable private clouds.

Automating OpenStack deployment and hardware management

To better manage the cloud infrastructure from a hardware and software perspective, the open source xCAT1 project was used with the addition of Confluent.

xCAT offers complete management for HPC clusters, render farms, Grids, web farms, online
gaming infrastructure, clouds, data centers, and complex infrastructure setups. It is agile,
extensible, and based on years of system administration best practices and experience. It is a
perfect fit for custom OpenStack deployments, including this reference architecture. It also
allows for bare-metal deployment and handles post-operating system installation, automation,
and hardware and VM monitoring mechanisms.

xCAT manages infrastructure by setting up and interacting with the IPMI/BMC components at the hardware level. It also uses Serial-over-LAN (SOL) for each machine to access consoles without the need for a functional network layer.

1 For more information, see this website: http://sourceforge.net/p/xcat/wiki/Main_Page



For the Service VM layer of the solution, xCAT connects to the virsh interface and SOL so that managing the VM infrastructure is as easy as managing hardware. Moreover, xCAT can read sensor information and gather inventories directly from the hardware, which allows hardware failures to be identified quickly and easily.

The ability to push firmware updates in an automated fashion by using a built-in update tool helps maintain hardware features and fixes. These features make the management of hardware and software much simpler by creating a one-stop shop for any management task.

Ultimately, xCAT was set up to manage the following tasks (a few representative commands are sketched after this list):
• Customize and deploy operating system images to all required types of nodes (Ceph, Service VM Controller, Compute, OpenStack, MariaDB, Cassandra/ZooKeeper, and HAProxy)
• Customize and deploy post-installation scripts that define the software infrastructure
• Identify hardware issues with hardware monitoring
• Identify software issues with a parallel shell mechanism
• Update firmware for all hardware components
• Provide Serial-over-LAN connectivity for bare-metal and VM operating systems
• Automate expansion of the cloud or node replacement
• Provide DHCP/DNS/NAT/FTP/HTTP services to the infrastructure nodes
• Provide a local package repository (rpm) for all required software, including RHEL 7.1, Ceph, MidoNet, and EPEL
• Provide a simple web user interface (Confluent) for a quick overview of hardware health and SOL console access
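For illustration, the following commands sketch how these tasks look in day-to-day use. The node and image names are hypothetical, but the commands themselves are standard xCAT tools.

# Point a node at an OS image and deploy it over the network
nodeset cpt1 osimage=rhels7.1-x86_64-install-compute
rpower cpt1 boot

# Hardware health, inventory, and Serial-over-LAN console
rvitals cpt1 all
rinv cpt1 all
rcons cpt1

# Parallel shell across a node group to spot software issues
xdsh compute 'systemctl is-active openstack-nova-compute'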

Software-Defined Storage: Ceph
Cloud and enterprise organizations' data needs grow exponentially, and classic enterprise storage solutions do not suffice to meet the demand in a cost-effective manner. Moreover, the refresh cycle of legacy storage hardware lags behind x86 commodity hardware. The viable answer to this problem is the emerging Software-Defined Storage (SDS) approach. Ceph, one of the leading SDS solutions, provides scale-out software that runs on commodity hardware and can handle exabytes of storage. Ceph is highly reliable, self-healing, easy to manage, and open source.

The PoC environment uses eight dedicated Ceph nodes with over 300 TB of raw storage. Each storage node is populated with 8x 6 TB HDDs for OSDs and 2x 200 GB SSDs for journaling. Ceph can be configured with spindle drives only; however, because journaling devices perform random reads and writes, it is recommended to use SSDs to decrease access time and read latency while accelerating throughput. Performance tests on configurations with journaling SSDs enabled and disabled showed an increase in IOPS of more than 50% with the SSDs enabled.

To save disk cycles from operating system activities, RHEL 7.1 was loaded onto dual on-board SD cards (a Lenovo ThinkServer option) on all nodes, including the Ceph OSD nodes. The dual-card, USB 3.0-based reader with class 10 SD cards allowed enough local storage and speed to load the operating system and all necessary components without sacrificing performance. Software RAID level 1 (mirroring) was used for local redundancy.

Ceph storage availability to compute hosts depends on the Ethernet network; therefore, maximum throughput and minimum latency must be ensured on the internal, underlying network infrastructure. For best results, dual 10 GbE Emulex links were aggregated by using OVS LACP with the balance-tcp hashing algorithm. Lenovo's vLAG functionality on the top-of-rack (TOR) switches allows for full 20 Gb connectivity between the Ceph nodes for storage rebalancing.
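A minimal sketch of such an OVS bond is shown below, assuming hypothetical bridge and interface names (br-storage, p1p1, p1p2):

# Create an LACP bond of the two 10 GbE links on an OVS bridge
ovs-vsctl add-bond br-storage bond0 p1p1 p1p2 lacp=active
# Hash flows per TCP connection so both links carry traffic
ovs-vsctl set port bond0 bond_mode=balance-tcp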

All the compute nodes and OpenStack Controller nodes used Linux bond mode 4 (LACP). The aggregated links were VLAN-trunked for client and cluster network access. Quick performance tests showed Ceph's ability to use the aggregated links to the full extent, especially with read operations to multiple hosts.
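For reference, a Linux mode 4 bond on RHEL 7 can be defined as in the following sketch; the interface names, VLAN ID, and address are hypothetical, and one such VLAN sub-interface would be created for each of the client and cluster networks.

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-p1p1 (repeat for p1p2)
DEVICE=p1p1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0.100 (VLAN-trunked client network)
DEVICE=bond0.100
VLAN=yes
IPADDR=192.168.0.31
PREFIX=24
ONBOOT=yes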



Figure 4 shows the optimal journal-to-OSD disk ratio for the Ceph deployment.

[Diagram: each of the eight Ceph nodes (CEPH 1-8) pairs two SSD journal devices with its HDD OSD devices, keeping the journal-to-OSD disk ratio consistent across the cluster.]

Figure 4 Ceph reference deployment

To protect data from potential hardware failures, a replication factor of 3 was configured. A significant number of placement groups was created to safely spread the load between all the nodes in different failure domains.
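The placement group counts in the cluster configuration (Figure 5) follow the common Ceph rule of thumb, shown here as a rough check rather than an exact sizing method: total PGs ≈ (number of OSDs x 100) / replica count, rounded up to the next power of two.

8 nodes x 8 OSDs per node = 64 OSDs
(64 x 100) / 3 replicas   ≈ 2133 placement groups
next power of two         = 4096 (the pg_num/pgp_num values in Figure 5)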

The Ceph global configuration is shown in Figure 5.

[global]
fsid = cce8c4ea-2efd-408f-845e-87707d26b99a
mon_initial_members = cephmon1, cephmon2, cephmon3
mon_host = 192.168.0.20,192.168.0.21,192.168.0.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

osd_pool_default_size = 3
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
public_network = 192.168.0.0/24
cluster_network = 192.168.1.0/24

[client]
rbd cache = true
Figure 5 Ceph global configuration

Ceph storage was used by multiple OpenStack services: Glance for storing images, Cinder for block storage and volume creation, and Nova for creating VMs directly on Ceph volumes.
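The following sketch shows the kind of Juno-era settings that wire these services to Ceph RBD; the pool names (images, volumes, vms), the cephx user names, and the libvirt secret UUID are hypothetical and must match the actual cluster.

# /etc/glance/glance-api.conf
[glance_store]
stores = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

# /etc/cinder/cinder.conf
[DEFAULT]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = <libvirt secret UUID>

# /etc/nova/nova.conf (boot VMs directly on Ceph)
[libvirt]
images_type = rbd
images_rbd_pool = vms
rbd_user = cinder
rbd_secret_uuid = <libvirt secret UUID>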

Software-Defined Networking: MidoNet

MidoNet is an open source software solution that enables agile cloud networking via Network Virtualization Overlays (NVO). As a software play, MidoNet enables the DevOps and CI movement by providing network agility through its distributed architecture. When paired with OpenStack as a Neutron plugin, MidoNet allows tenants to create logical topologies via virtual routers, networks, security groups, NAT, and load balancing, all of which are created dynamically and implemented with tenant isolation over shared infrastructure.

MidoNet provides the following networking functions:
• Fully distributed architecture with no single points of failure
• Virtual L2 distributed isolation and switching with none of the limitations of conventional VLANs
• Virtual L3 distributed routing
• Distributed load balancing and firewall services
• Stateful and stateless NAT
• Access Control Lists (ACLs)
• RESTful API
• Full tenant isolation
• Monitoring of networking services
• VXLAN and GRE support: tunnel zones and gateways
• Zero-delay NAT connection tracking



MidoNet features a Neutron plugin for OpenStack. MidoNet agents run at the edge of the network on compute and gateway hosts. These datapath hosts (where the MidoNet agents are installed) require only IP connectivity between them and must permit VXLAN or GRE tunnels to pass VM data traffic, which raises maximum transmission unit (MTU) considerations.
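As a rough guide to those MTU considerations (standard encapsulation arithmetic, not a MidoNet-specific figure): VXLAN adds roughly 50 bytes of outer headers to each tunneled frame, so either the underlay MTU is raised (for example, with jumbo frames) or the guest MTU is lowered to about 1450 so that tunneled traffic still fits in a 1500-byte underlay.

1500 (guest MTU) + 14 (Ethernet) + 20 (IP) + 8 (UDP) + 8 (VXLAN) = 1550 bytes on the wire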

Configuration management is provided via a RESTful API server. The API server can typically
be co-located with the neutron-server on OpenStack controllers. The API is stateless and can
be accessed via the MidoNet CLI client or the MidoNet Manager GUI.

Logical topologies and virtual network devices that are created via the API are stored in the Network State Database (NSDB). The NSDB consists of ZooKeeper and Cassandra for logical topology storage. These services can be co-located and deployed in quorum for resiliency.

For more information about the MidoNet Network Models, see the “Overview” blogs that are
available at this website:
http://blog.midonet.org

Figure 6 shows the MidoNet Reference Architecture.

[Diagram: OpenStack controller services host Horizon and MidoNet Manager plus the Neutron and API servers; three NSDB nodes (ZooKeeper/Cassandra) join them on the management network 192.168.0.0/24. Compute nodes and MidoNet gateways share the datapath network 172.16.0.0/24, and the gateways uplink over /30 point-to-point links (10.0.1.0/30 through 10.0.Y.0/30) to external networks and upstream BGP routers.]

Figure 6 MidoNet Reference Architecture

MidoNet achieves L2 - L4 network services in a single virtual hop at the edge, as traffic enters the OpenStack cloud via the gateway nodes or the VMs on compute hosts. There is no reliance on a particular service appliance or service node for a particular network function, which removes bottlenecks in the network and allows the network to scale. This architecture is a great advantage for production-ready clouds over alternative solutions.

15
Table 1 shows a comparison between the MidoNet and Open vSwitch (OVS) Neutron plugins.

Table 1 MidoNet and OVS Neutron plugin comparison

Open Source
  MidoNet: Yes
  OVS: Yes

Hypervisors supported
  MidoNet: KVM, ESXi, Xen, Hyper-V (planned)
  OVS: KVM, Xen

Containers
  MidoNet: Docker
  OVS: Docker

Orchestration tools
  MidoNet: OpenStack, oVirt, RHEV, Docker, Custom, vSphere, Mesos (planned)
  OVS: OpenStack, oVirt, openQRM, openNebula

L2 BUM traffic
  MidoNet: Yes
  OVS: By default, broadcasts are sent to every host, even hosts that do not use the corresponding network; sending to a partial mesh over unicast tunnels requires enabling the extra l2population mechanism driver.

Distributed Layer 3 gateway
  MidoNet: Scales to 100s; no limitations when enabled.
  OVS: The default deployment has an intrinsic architectural issue: the Neutron network node is a single point of failure for routing and higher-layer network services and does not scale well. Early-stage DVR requires installing an extra agent (L3-agent) on compute hosts and still relies on the network node for non-distributed SNAT; currently, DVR cannot be combined with L3 HA/VRRP.

SNAT
  MidoNet: Yes
  OVS: Not distributed; requires iptables (poor scale).

VLAN gateway
  MidoNet: Yes
  OVS: Yes

VXLAN gateway
  MidoNet: Yes
  OVS: L3 HA requires keepalived, which uses VRRP internally (active-standby implications); DVR requires external connectivity on each host (security implications).

HW VTEP L2 gateway
  MidoNet: Yes
  OVS: Yes

Distributed Layer 4 load balancer
  MidoNet: Yes
  OVS: Relies on another driver (HAProxy).

Supports spanning multiple environments
  MidoNet: Yes
  OVS: No

GUI-based configuration
  MidoNet: Yes
  OVS: No

GUI-based monitoring
  MidoNet: Yes
  OVS: No

GUI-based flow tracing
  MidoNet: Yes
  OVS: No

Pricing
  MidoNet: OSS: free. MEM: $1899 USD per host (any number of sockets), including 24x7 support as standard.
  OVS: Free; no support option.


In the proof of concept lab, highly capable servers were used, so some MidoNet components were co-located. On the four OpenStack Controller nodes, the MidoNet agents were installed on the bare-metal operating system to provide gateway node functionality by terminating VXLAN tunnels from the OpenStack environment for external access via the Border Gateway Protocol (BGP) and Equal-Cost Multi-Path routing (ECMP).

Next, a Service VM was created on each of the four Controller nodes for the Network State Database (consisting of ZooKeeper and Cassandra). These projects require deployment in quorum (3, 5, 7, ...) to remain available when nodes fail. In this PoC, with three active nodes and one passive node, the failure tolerance is equivalent to that of a three-node ZooKeeper/Cassandra cluster.

Each of the OpenStack Controller Service VMs was created to serve the main OpenStack
controller functions. Within these Service VMs, the MidoNet API (stateless) server and
MidoNet Manager files (web files to serve up the client-side application) were installed.
MidoNet Manager is part of the Midokura Enterprise MidoNet (MEM) subscription (bundled
with support) and provides a GUI for configuring and maintaining virtual networks in an
OpenStack + MidoNet environment.

Other network-related packages that were installed on the OpenStack Controller Service VMs
include the neutron-server and metadata-agent. Because of the metadata-agent proxy’s
dependency on DHCP namespaces, the dhcp-agent was also installed on the OpenStack
Controller Service VM despite MidoNet's distributed DHCP service. These specific services
were load-balanced by using HAProxy.

Finally, the MidoNet agent was installed on the compute nodes to provide VMs with virtual networking services. Because the MidoNet agent uses local compute power to make all L2 - L4 networking decisions, MidoNet provides the ability to scale: as the number of compute nodes grows, so does the networking compute power.

Configurations
The Gateway nodes provide external connectivity for the OpenStack + MidoNet cloud. BGP
was implemented between the Gateway nodes and the Lenovo Top of Rack switches for its
dynamic routing capabilities.
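To illustrate the switch side of this peering, the following minimal sketch uses a generic industry-standard CLI; the AS numbers and peer addresses are hypothetical, and the exact Lenovo RackSwitch syntax differs.

! Top-of-rack switch: peer with two MidoNet gateway nodes
router bgp 64512
 neighbor 10.0.1.2 remote-as 64513
 neighbor 10.0.2.2 remote-as 64513
 ! Equal-cost paths so traffic is spread across the gateways
 maximum-paths 4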

To exhibit fast failover, the BGP timers were shortened. These settings can easily be adjusted
based on the needs of the users. In this lab, the parameters that are shown in Figure 7 were
modified in the /etc/midolman/midolman.conf file.

# bgpd
bgp_connect_retry=10
bgp_holdtime=15
bgp_keepalive=5
Figure 7 BGP parameters in midolman.conf on MidoNet Gateway Nodes

These parameters provide a maximum of 15 seconds for failover if the BGP peering session
goes down on a Gateway node.

The gateways must have Large Receive Offload (LRO) turned off to ensure MidoNet delivers
packets that are not larger than the MTU of the destination VM. For example, the command
that is shown in Figure 8 turns off LRO for an uplink interface of a gateway.

# ethtool -K p2p1 lro off
Figure 8 Disabling LRO on MidoNet Gateway Nodes

Also, to share state, port groups were created for gateway uplinks. Stateful port groups allow the state of a connection to be shared so that gateways can track connections with asymmetric traffic flows. Figure 9 shows the commands that are used to configure stateful port groups.

midonet> port-group create name SPG stateful true
pgroup0
midonet> port-group pgroup0 add member port router0:port0
port-group pgroup0 port router0:port0
midonet> port-group pgroup0 add member port router0:port1
port-group pgroup0 port router0:port1
Figure 9 Configuring stateful port-groups

The default number of client connections for ZooKeeper was changed on the NSDB nodes.
This change is made in the /etc/zookeeper/zoo.cfg file by using the line that is shown in
Figure 10.

maxClientCnxns=500
Figure 10 Increasing the number of client connections for ZooKeeper instances

This configuration change allows the number of MidoNet agents that are connecting to
ZooKeeper to go beyond the default limit.

Logical routers, rules, and chains were also created to provide multi-VRF functionality for upstream isolation of traffic.

Operational tools
MidoNet Manager is a network management GUI that provides an interface for operating
networks in an OpenStack + MidoNet cloud. It allows the configuration of BGP for gateway
functionality and monitoring of all virtual devices through traffic flow graphs.

When VXLAN overlays are used with OpenStack, operating and monitoring tools become increasingly relevant as you move from proof of concept into production. Traditional monitoring and troubleshooting methods (such as RSPAN) capture packets on physical switches but give no context for a traffic flow.

MidoNet Manager presents flow tracing tools in a GUI that give OpenStack + MidoNet cloud operators the ability to identify specific tenant traffic and trace its flow through a logical topology. The flow tracing gives insight into each virtual network device that is traversed, every security group policy that is applied, and the final fate of the packet. MidoNet Manager provides insights for NetOps and DevOps teams operating and monitoring OpenStack + MidoNet environments that are built for enterprise private clouds.



An example of the initial stage of flow tracing in MidoNet Manager is shown in Figure 11.

[Screen capture of the MidoNet Manager flow tracing view]

Figure 11 MidoNet Manager flow tracing

Professional services
The Lenovo Enterprise Solution Services team helps clients worldwide with the deployment of Lenovo System x® and ThinkServer solutions and technologies. The Enterprise Solution Services team can design and deliver the OpenStack cloud solution that is described in this document, as well as new designs in Software Defined Everything, big data and analytics, HPC, virtualization, or converged infrastructure. Lenovo Enterprise Solution Services also provides on-site training to bring staff up to speed and performs health check services for existing environments.

We feature the following offerings:
• Cloud: Our cloud experts help design complex IaaS, PaaS, or SaaS cloud solutions with our Cloud Design Workshop. We specialize in OpenStack and VMware-based private and hybrid cloud design and implementation services.
• Software-Defined Storage: We provide expertise with design and implementation services for software-defined storage environments. Our consultants can provide assistance with implementing Ceph, Quobyte, or General Parallel File System (GPFS), with GPFS storage server installation and configuration of key operating system and software components, or with other software-defined storage technologies.
• Virtualization: Get assistance with VMware vSphere or Linux KVM through our design, implementation, and health check services.
• Converged Infrastructure: Learn about Flex System virtualization, blade server to Flex System migration assessment, VMware-based private cloud, and Flex System™ Manager quickstart.
• High-Performance Computing (HPC): Our team helps you get the most out of your System x or ThinkServer systems with HPC intelligent cluster implementation services, health check services, and state-of-the-art cloud services for HPC.

For more information, contact Lenovo Enterprise Solution Services at: x86svcs@lenovo.com

The Midokura team provides professional services and training to enable customers with OpenStack and MidoNet. Midokura's expertise is in distributed systems. The Midokura team has real-world experience building distributed systems for large e-commerce sites, such as Amazon and Google.

Midokura Professional Services helps customers move from architectural design through implementation into production, including MidoNet training, in only a couple of weeks. Midokura Professional Services are not only academic; the solutions are practical and come from hands-on deployments, operational experience, and direct contributions to the OpenStack Neutron project.

For more information, contact Midokura at: info@midokura.com



About the authors
Krzysztof (Chris) Janiszewski is a member of the Enterprise Solution Services team at Lenovo. His main background is in designing, developing, and administering multiplatform, clustered, software-defined, and cloud environments. Chris previously led System Test efforts for the IBM OpenStack cloud-based solution for x86, IBM System z, and IBM Power platforms.

Michael Lea, CCIE #11662, CISSP, MBA, is an Enterprise System Engineer for Lenovo's Enterprise Business Group. He has over 18 years of experience in the field designing customers' networks and data centers. Over the past 18 years, Michael has worked with service providers, managed service providers (MSPs), and large enterprises, delivering cost-effective solutions that include networking, data center, and security assurance. By always looking at technical and business requirements, Michael makes certain that the proper technologies are used to help clients meet their business objectives. Previous roles held by Michael include Consulting Systems Engineer with Cisco Systems and IBM.

Cynthia Thomas is a Systems Engineer at Midokura. Her background in networking spans data center, telecommunications, and campus/enterprise solutions. Cynthia has earned a number of professional certifications, including Alcatel-Lucent Network Routing Specialist II (NRS II), Brocade Certified Ethernet Fabric Professional (BCEFP), Brocade Certified IP Network Professional (BCNP), and VMware Technical Sales Professional (VTSP) 5.

Susan Wu is the Director of Technical Marketing at Midokura. Susan previously held product positions at Oracle/Sun, Citrix, AMD, and Docker. She is a frequent speaker at industry conferences, such as Interzone, Cloudcon/Data360, and Data Storage Innovation.

Thanks to the following people for their contributions to this project:
• Srihari Angaluri, Lenovo
• Michael Ford, Midokura
• Adam Johnson, Midokura
• David Watts, Lenovo Press

Notices
Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult
your local Lenovo representative for information on the products and services currently available in your area.
Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo
product, program, or service may be used. Any functionally equivalent product, program, or service that does
not infringe any Lenovo intellectual property right may be used instead. However, it is the user's responsibility
to evaluate and verify the operation of any other product, program, or service.

Lenovo may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:

Lenovo (United States), Inc.
1009 Think Place - Building One
Morrisville, NC 27560
U.S.A.
Attention: Lenovo Director of Licensing

LENOVO PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some
jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.

The products described in this document are not intended for use in implantation or other life support
applications where malfunction may result in injury or death to persons. The information contained in this
document does not affect or change Lenovo product specifications or warranties. Nothing in this document
shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or
third parties. All information contained in this document was obtained in specific environments and is
presented as an illustration. The result obtained in other operating environments may vary.

Lenovo may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.

Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this Lenovo product, and use of those Web sites is at your own risk.

Any performance data contained herein was determined in a controlled environment. Therefore, the result
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.

© Copyright Lenovo 2015. All rights reserved.


Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by General Services Administration (GSA) ADP Schedule Contract
This document REDP-5233-00 was created or updated on June 25, 2015.

Send us your comments in one of the following ways:
• Use the online Contact us review form found at:
  ibm.com/redbooks
• Send your comments in an email to:
  redbooks@us.ibm.com

Trademarks
Lenovo, the Lenovo logo, and For Those Who Do are trademarks or registered trademarks of Lenovo in the
United States, other countries, or both. These and other Lenovo trademarked terms are marked on their first
occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law
trademarks owned by Lenovo at the time this information was published. Such trademarks may also be
registered or common law trademarks in other countries. A current list of Lenovo trademarks is available on
the Web at http://www.lenovo.com/legal/copytrade.html.

The following terms are trademarks of Lenovo in the United States, other countries, or both: Flex System™, Lenovo®, Lenovo(logo)®, RackSwitch™, System x®, and ThinkServer®.

The following terms are trademarks of other companies:

Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

