M05 Cluster Storage Overview
In environments where information transfers between the computer and storage must be
maximized for speed, efficiency, and reliability, such as in banking or trading businesses, high-
performance interconnects that maximize input/output (I/O) throughput are critical. Such
organizations use storage area networks (SANs), usually with Fibre Channel cabling and
interconnects, or links, in redundant configurations, and storage arrays with hardware-based
redundant array of independent disks (RAID) for high availability and high performance.
High-performing equipment helps to ensure that high performance needs can be met, but it does
not guarantee them. The functioning of the storage network is also dependent on the capabilities of
the host operating system, specifically the host operating system drivers that interface with the
storage hardware to pass I/O requests to and from the storage devices. This is especially important
in Fibre Channel SANs, where a complex system of switches and links between servers and
storage requires an effective means of detecting link problems and eliciting the appropriate
response from the operating system.
This module describes in detail how SANs are used in a clustered environment. Storage array
terminology will be discussed in detail. This module also describes the architecture of multipath
I/O.
It is important to remember that shared storage planning is one of the most critical pieces of cluster
deployment. If due diligence in this process is not done prior to production deployment, this is the
area where customers face the greatest risk of cluster instability.
Prerequisites
Before starting this session, you should be able to:
● Understand the steps needed prior to cluster deployment.
● Describe the hardware needed for a Windows Failover cluster.
A SAN is defined as a set of interconnected devices, such as disks, tapes, and servers, that are
connected to a common communication and data transfer infrastructure, such as Fibre Channel.
The common communication and data transfer mechanism for a given deployment is called the
storage fabric.
The purpose of the SAN is to enable multiple servers access to a pool of storage in which any
server can potentially access any storage unit. In this environment, management, which determines
who is authorized to access which devices, and sequencing or serialization guarantees, which
determine who can access which devices at what point in time, play a large role in providing
security.
SANs evolved to address the increasingly difficult job of managing storage at a time when
storage usage is growing explosively. With devices locally attached to a given server or in the
server enclosure itself, performing day-to-day management tasks becomes extremely complex.
Backing up the data in the data center requires complex procedures as the data is distributed
amongst the nodes and is accessible only through the server to which it is attached. As a given
server outgrows its current storage pool, storage specific to that server has to be acquired and
attached, even if there are other servers with plenty of storage space available. Other benefits can
also be gained, such as multiple servers sharing data sequentially or in parallel, and device-to-device
backups, in which data is transferred directly from one device to another without first passing
through a backup server.
A SAN is a network like any other network, such as a LAN infrastructure. You can use a SAN to
connect many different devices and hosts to provide access to any device from anywhere. Direct
attached storage (DAS) technologies, such as SCSI, are tuned to the specific requirements of
connecting mass storage devices to host computers. In particular, they are low latency, high
bandwidth connections with extremely high data integrity semantics. Network technology, on the
other hand, is tuned more to providing application-to-application connectivity in increasingly
complex and large-scale environments. Typical network infrastructures have high connectivity, can
route data across many independent network segments, potentially over very large distances, and
have many network management and troubleshooting tools.
SANs capitalize on the best of the storage technologies and network technologies to provide a low-
latency, high-bandwidth interconnect that can span large distances, offers high connectivity, and
provides a good management infrastructure from the start.
These features enable you to implement Failover Clustering Technologies. Without the ability to
connect multiple hosts to a collection of storage, clustering would not be possible.
A SAN environment provides the following benefits:
● Centralization of storage into a single pool enables storage resources and server resources to
grow independently. It also enables storage to be dynamically assigned from the pool as and
when it is required. Storage on a given server can be increased or decreased as needed
without complex reconfiguring or re-cabling of devices.
● Common infrastructure for attaching storage enables a single common management model for
configuration and deployment.
● Storage devices are inherently shared by multiple systems. Ensuring data integrity guarantees
and enforcing security policies for access rights to a given device is a core part of the
infrastructure.
● Data can be transferred directly from device to device without server intervention. For
example, data can be moved from a disk to a tape without first being read into the memory of
a backup server. This frees up compute cycles for business logic rather than management
related tasks. Direct device-to-device transfer is also used in Geo-Cluster solutions where
SAN replication is needed to keep multiple sites synchronized from a data perspective.
● It enables clusters to be built where shared access to a data set is required. Consider a
clustered Microsoft SQL Server environment. At any point in time, a SQL Server instance
may be hosted on one computer in the cluster and it must have exclusive access to its
associated database on a disk from the node on which it is hosted. In the event of a failure or
an explicit management operation, the SQL Server instance may fail over to another node in
the cluster. After it fails over, the SQL Server instance must be able to have exclusive access
to the database on disk from its new host node.
The following sections discuss the different aspects of a SAN that should be evaluated for your
cluster solution. Diligence and time spent on this evaluation can increase the reliability and
availability of your cluster solution, providing a strong foundation component for your
implementation.
While Fibre Channel is by far the leading technology today, other SAN technologies are available,
for example, SCSI over InfiniBand and iSCSI (the SCSI protocol running over a standard IP
network). All these technologies allow a pool of devices to be accessed from a set of servers,
decoupling the compute needs from the storage needs.
In a network attached storage (NAS) solution, by contrast, the file servers hide the details of how data is stored on disks and present a
high-level file system view to application servers. In a NAS environment, the file servers provide
file system management functions such as the ability to back up a file server.
A SAN provides a broad range of advantages over locally connected devices. It enables computers
to be detached from storage units, providing flexible deployment and re-purposing of servers and
storage to suit current business needs. In a SAN environment, you do not have to be concerned
about buying the right devices for a given server, or with re-cabling a data center to attach storage
to a specific server.
Microsoft supports SANs, both as part of the base Microsoft Windows platform and as part of a
complete Windows Server Clustering high-availability solution. Multiple server clusters can be
deployed in a single SAN environment, along with stand-alone Windows servers and with
non-Windows–based platforms.
Storage Topologies
There are four types of storage I/O technologies supported in Windows Server 2003 server clusters:
iSCSI, parallel SCSI, Serial Attached SCSI (SAS), and Fibre Channel.
Note Microsoft Windows Server 2003 provides support for SCSI interconnects and Fibre Channel
arbitrated loops (FC-AL) for two nodes only. For configurations with more than two nodes, you
need to use a switched Fibre Channel (FC-SW) environment, called a fabric, or an iSCSI solution.
As a clustering administrator, you should be aware of the following implementation issues with
regard to SANs:
● iSCSI
○ Physical Components
iSCSI uses standard IP networks to transfer block-level data between computer systems
and storage devices. Unlike Fibre Channel, iSCSI uses existing network infrastructure,
such as network adapters, network cabling, hubs, switches, routers, and supporting
software. Using network adapters rather than HBAs enables transfer of both SCSI block
commands and normal messaging traffic. This gives iSCSI an advantage over Fibre
Channel network configurations, which require use of both HBAs and network adapters
to accommodate both types of traffic. While this is not a problem for large servers, thin
servers can accommodate only a limited number of interconnects.
iSCSI is based on the serial SCSI standards, and can operate over existing Category 5, or
higher, copper Ethernet cable or fiber optic wiring.
○ Transfer Protocols
iSCSI describes the transport protocol for carrying SCSI commands over TCP/IP. TCP
handles flow control and facilitates reliable transmission of data to the recipient by
providing guaranteed in-order delivery of the data stream. IP routes packets to the
destination network. These protocols ensure that data is correctly transferred from
requesting applications, called initiators, to target devices.
Transmissions across Category 5 network cabling are at speeds up to 1 gigabit per
second. Error rates on gigabit Ethernet are in the same low range as Fibre Channel.
○ Limitations
The amount of time it takes to queue, transfer, and process data across the network is
called latency. A drawback of transmitting SCSI commands across a TCP/IP network is
that latency is higher than it is on Fibre Channel networks, in part because of the
overhead associated with TCP/IP protocols. Additionally, many currently deployed
Ethernet switches were not designed with the low latency specifications associated with
Fibre Channel. Thus, although Ethernet cabling is capable of high speeds, the actual
speed of transmission may be lower than expected, particularly during periods of
network congestion.
The second concern about iSCSI transmission is data integrity, both in terms of errors
and security. Error handling is addressed at each protocol level. Protection against tampering
or snooping as the data passes over networks can be handled by implementing the IP
security protocol (IPsec).
○ Simplify Implementation
iSCSI enables you to create storage networks from existing network components, rather
than having to add a second network fabric type. This simplifies hardware
configurations because Ethernet switches can be used. It also enables using existing
security methods, such as firewalls and IP security, which includes encryption,
authentication, and data integrity measures.
Sharing storage among multiple systems requires a method for managing storage access
so that systems access only the storage that is assigned to them. In Fibre Channel
networks, this is done by assigning systems to zones. In iSCSI, this can be done by using
virtual LAN (VLAN) techniques. In Fibre Channel, LUN masking must be used to
provide finer granularity of storage access. For iSCSI, this is handled as part of the
design by allowing targets to be specific to individual hosts.
The use of IP traffic prioritization or Quality of Service (QoS) can help ensure that
storage traffic has the highest priority on the network, which helps to alleviate latency
issues.
○ Enable Remote Capabilities
iSCSI is not limited to the metropolitan-area distances that constrain Fibre Channel.
iSCSI storage networks can be LANs, MANs, or WANs, allowing global distribution.
iSCSI has the ability to eliminate the conventional boundaries of storage networking,
enabling businesses to access data worldwide, and ensuring the most robust disaster
protection possible. To do this for Fibre Channel based SANs, it is necessary to
introduce additional protocol translations, such as Fibre Channel over IP (FCIP), and devices that
provide this capability on each end of the SAN links.
○ Simplify Clustering
When multiple servers share access to the same storage, as is done with Server Clusters,
configuration of Fibre Channel SANs can be very difficult—one improperly configured
system affects the entire SAN. iSCSI clusters, unlike Fibre Channel clusters, do not
require complex configurations. Instead, iSCSI configuration is easily accomplished as
part of the iSCSI protocol, with little need for intervention by system administrators.
Changes introduced by hardware replacement are largely transparent on iSCSI but are a
major source of errors on Fibre Channel implementations.
● Parallel SCSI
○ Supported in Windows Server 2003 Enterprise Edition, and only for up to two nodes.
○ SCSI adapters and storage solutions need to be certified.
○ SCSI cards that host the interconnect, or shared disks, should have different SCSI IDs,
which are normally 6 and 7. Ensure that device access requirements are in line with SCSI
IDs and priorities.
○ The SCSI adapter BIOS should be disabled.
○ If devices are daisy-chained, ensure that both ends of the shared bus are terminated.
○ Use physical terminating devices and not controller-based or device-based termination.
○ SCSI hubs are not supported.
○ Avoid the use of connector converters, such as 68-pin to 50-pin.
○ Avoid combining multiple device types, such as single ended and differential.
● Fibre Channel
○ Fibre Channel Arbitrated Loops (FC-AL) are supported up to two nodes.
○ Fibre Channel Fabric (FC-SW) is supported for all higher node combinations.
○ Components and configuration need to be on the Windows Server Catalog.
○ Supports a multi-cluster environment.
○ Fault tolerant multi-path I/O drivers and components also need to be certified.
○ Virtualization engines need to be certified.
Note The switch is the only component that is not currently certified by Microsoft. It is
recommended that the end user get the appropriate interoperability guarantees from the switch
vendor before implementing switch fabric topologies. In complicated topologies, where multiple
switches are used and connected through inter-switch links (ISLs), it is recommended that the customer work closely
with Microsoft and the switch and storage vendors during the implementation phase to ensure that
all of the components work well together.
iSCSI Basics
An iSCSI-based network consists of:
● Server and storage device end nodes.
● Either network interface cards or HBAs with iSCSI over TCP/IP capability on the server.
● Storage devices with iSCSI-enabled Ethernet connections, called iSCSI Targets.
● iSCSI storage switches and routers.
● Clients attached to the same network with iSCSI initiator software installed.
Most current SANs use Fibre Channel technology, making it necessary to use multi-protocol
bridges, or storage routers capable of translating iSCSI to Fibre Channel. This enables iSCSI-
connected hosts to communicate with existing Fibre Channel devices.
Storage traffic is commonly initiated by a host computer and received by the target storage device.
Since target devices can have multiple storage devices associated with them (each one being a
logical unit), the final destination of the data is not the target per se, but specific logical units
within the target.
iSCSI, or Internet SCSI, is an IP-based storage networking standard developed by the IETF. By
enabling block-based storage over IP networks, iSCSI enables storage management over greater
distances than Fibre Channel technologies, and at lower cost. The Microsoft iSCSI initiator service
helps to bring the advantages of high-end SAN solutions to small and midsized businesses. The
service enables a full range of solutions, from low-end network, adapter-only implementations to
high-end offloaded configurations that can rival Fibre Channel solutions.
iSCSI Protocol
The iSCSI protocol stack links SCSI commands for storage and IP protocols for networking to
provide an end-to-end protocol for transporting commands and block-level data down through the
host initiator layers and up through the stack layers of the target storage devices. This
communication is fully bidirectional, as shown in Figure 4, where the arrows indicate the
communication path between the initiator and the target by means of the network.
Figure 4. iSCSI Protocol Stack Layers
The initiator, usually a server, makes the application requests. These are converted by the SCSI
class driver to SCSI commands, which are transported in command descriptor blocks (CDBs). At
the iSCSI protocol layer, the SCSI CDBs, under control of the iSCSI device driver, are packaged
in a protocol data unit (PDU) which now carries additional information, including the logical unit
number of the destination device. The PDU is passed on to TCP/IP, which encapsulates the PDU.
TCP/IP then passes it to IP, which adds the routing address of the final device destination. Finally,
the network layer, typically Ethernet, adds information and sends the packet across the physical
network to the target storage device.
Additional PDUs are used for target responses and for the actual data flow. Data for write requests
flows from the initiator to the target and is encapsulated by the initiator; data for read requests
flows from the target to the initiator, and the target does the encapsulation.
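To make the layering concrete, the following Python sketch builds a simplified SCSI command PDU around a CDB and hands it to a TCP socket. The field layout is loosely modeled on the iSCSI Basic Header Segment, but this is an illustrative sketch rather than a complete RFC 3720 implementation; the target address is hypothetical, and a real exchange would be preceded by an iSCSI login phase.

```python
import socket
import struct

def build_iscsi_command_pdu(lun: int, task_tag: int, cdb: bytes,
                            expected_length: int, cmd_sn: int) -> bytes:
    """Build a simplified SCSI Command PDU (illustrative, not full RFC 3720)."""
    opcode = 0x01                      # SCSI Command opcode
    flags = 0x80 | 0x40                # Final bit plus Read flag, for example
    cdb = cdb.ljust(16, b"\x00")[:16]  # the CDB field in the header is 16 bytes
    header = struct.pack(
        ">BB2xI8sIIII16s",             # big-endian, 48-byte simplified header
        opcode, flags,
        0,                             # AHS / data segment lengths (none here)
        struct.pack(">Q", lun),        # 8-byte LUN field
        task_tag,                      # initiator task tag
        expected_length,               # expected data transfer length
        cmd_sn,                        # command sequence number
        0,                             # ExpStatSN placeholder
        cdb,
    )
    return header

def send_read_capacity(target_ip: str, target_port: int = 3260) -> None:
    """Open a TCP connection (normally preceded by an iSCSI login) and send one PDU."""
    cdb = bytes([0x25]) + bytes(9)     # READ CAPACITY(10) CDB, 10 bytes
    pdu = build_iscsi_command_pdu(lun=0, task_tag=1, cdb=cdb,
                                  expected_length=8, cmd_sn=1)
    with socket.create_connection((target_ip, target_port)) as sock:
        sock.sendall(pdu)              # TCP/IP supplies ordering, routing, delivery

# Hypothetical target address; uncomment only against a real, logged-in session.
# send_read_capacity("192.0.2.10")
```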
Discovery
SANs can become very large and complex. While using pooled storage resources is a desirable
configuration, initiators must be able to determine both the storage resources available on the
network, and whether or not access to that storage is permitted. A number of discovery methods
are possible, and to some degree, the method used depends on the size and complexity of the SAN
configuration.
● Administrator Control
For simple SAN configurations, you can manually specify the target node name, IP address,
and port to the initiator and target devices. If any changes occur on the SAN, you must
update these names as well.
● SendTargets
A second option for small storage networks is for the initiator to use the SendTargets operation
to discover targets. The address of a target portal is manually configured and the initiator
establishes a discovery session to perform the SendTargets command. The target device
responds by sending a complete list of additional targets that are available to the initiator.
This method is semi-automated, which means that the administrator might still be required to
enter a range of target addresses.
● SLP
A third method is to use the Service Location Protocol (SLP). Early versions of this protocol
did not scale well to large networks. In the attempt to rectify this limitation, a number of
agents were developed to help discover targets, making discovery management
administratively complex.
● iSNS
The Internet Storage Name Service (iSNS) is a relatively new device discovery protocol,
ratified by the IETF, that provides both naming and resource discovery services for storage
devices on the IP network. iSNS builds upon both IP and Fibre Channel technologies.
The protocol uses an iSNS server as the central location for tracking information about targets
and initiators. The server can run on any host, target, or initiator on the network. iSNS client
software is required in each host initiator or storage target device to enable communication
with the server. In the initiator, the iSNS client registers the initiator and queries the list of
targets. In the target, the iSNS client registers the target with the server.
iSNS provides the following capabilities:
○ Name Registration Service
This enables initiators and targets to register and query the iSNS server directory for
information regarding initiator and target ID and addresses.
○ Network Zoning and Logon Control Service
iSNS initiators can be restricted to zones so that they are prevented from discovering
target devices outside their discovery domains. This prevents initiators from accessing
storage devices that are not intended for their use. Logon control allows targets to
determine which initiators can access them.
○ State Change Notification Service
This service enables iSNS to notify clients of changes in the network, such as the
addition or removal of targets, or changes in zoning membership. Only initiators that are
registered to receive notifications will get these packets, reducing random broadcast
traffic on the network.
From its inception, iSNS was designed to be scalable, working effectively in both
centralized and distributed environments. iSNS also supports Fibre Channel IP, enabling
configurations that link Fibre Channel and iSCSI to use iSNS to get information from Fibre
Channel networks as well. Hence, iSNS can act as a unifying protocol for discovery.
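As an illustration of the directory behavior just described, the following Python sketch models an iSNS-style name service in memory: initiators and targets register, and a query returns only the targets that share a discovery domain with the querying initiator. The class, names, and addresses are hypothetical; this is a conceptual sketch, not the iSNS wire protocol.

```python
from collections import defaultdict

class NameService:
    """In-memory sketch of an iSNS-style directory with discovery domains."""

    def __init__(self):
        self.targets = {}                    # target name -> portal address
        self.domains = defaultdict(set)      # domain name -> registered node names

    def register_target(self, name, address, domain):
        self.targets[name] = address
        self.domains[domain].add(name)

    def register_initiator(self, name, domain):
        self.domains[domain].add(name)

    def query_targets(self, initiator):
        """Return only targets that share a discovery domain with the initiator."""
        visible = {}
        for members in self.domains.values():
            if initiator in members:
                for node in members:
                    if node in self.targets:
                        visible[node] = self.targets[node]
        return visible

# Example: the initiator sees the array in its own domain, but not the other one.
isns = NameService()
isns.register_target("iqn.2003-01.com.example:array1", "192.0.2.10:3260", "domain-a")
isns.register_target("iqn.2003-01.com.example:array2", "192.0.2.20:3260", "domain-b")
isns.register_initiator("iqn.2003-01.com.example:host1", "domain-a")
print(isns.query_targets("iqn.2003-01.com.example:host1"))   # only array1 is returned
```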
Session Management
For the initiator to transmit information to the target, the initiator must first establish a session with
the target through an iSCSI logon process. This process starts the TCP/IP connection, verifies that
the initiator has access to the target (authentication), and allows negotiation of various parameters
including the type of security protocol to be used, and the maximum data packet size. If the logon
is successful, an ID is assigned to both initiator (an initiator session ID, or ISID) and target (a
target session ID, or TSID). Thereafter, the full feature phase, which allows for reading and writing
of data, can begin. Multiple TCP connections can be established between each initiator-target
pair, allowing unrelated transactions during one session. Sessions between the initiator and its
storage devices generally remain open, but logging out is available as an option.
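The following sketch illustrates the kind of parameter negotiation that takes place during login: each side offers values and the session settles on parameters both ends can support. The key names are common iSCSI login keys, but the selection rules shown here are simplified for illustration and do not reproduce the full RFC 3720 negotiation semantics.

```python
def negotiate(initiator_offer: dict, target_offer: dict) -> dict:
    """Pick session parameters both ends can support (simplified illustration)."""
    settled = {}
    # Numeric limits: settle on the smaller of the two offers.
    for key in ("MaxRecvDataSegmentLength", "MaxBurstLength"):
        settled[key] = min(initiator_offer[key], target_offer[key])
    # List-valued keys: first initiator preference the target also supports.
    for key in ("AuthMethod", "HeaderDigest"):
        common = [value for value in initiator_offer[key] if value in target_offer[key]]
        settled[key] = common[0] if common else "None"
    return settled

session = negotiate(
    {"MaxRecvDataSegmentLength": 262144, "MaxBurstLength": 262144,
     "AuthMethod": ["CHAP", "None"], "HeaderDigest": ["CRC32C", "None"]},
    {"MaxRecvDataSegmentLength": 65536, "MaxBurstLength": 262144,
     "AuthMethod": ["CHAP"], "HeaderDigest": ["None"]},
)
print(session)   # an ISID/TSID pair would be assigned once the logon succeeds
```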
Error Handling
While iSCSI can be deployed over gigabit Ethernet, which has low error rates, it is also designed
to run over both standard IP networks and WANs, which have higher error rates. WANs are
particularly error-prone since the possibility of errors increases with distance and the number of
devices across which the information must travel. Errors can occur at a number of levels, including
the iSCSI session level (connection to host lost), the TCP connection level (TCP connection lost),
and the SCSI level (loss or damage to PDU).
Error recovery is enabled through initiator and target buffering of commands and responses. If the
target does not acknowledge receipt of the data because it was lost or corrupted, the buffered data
can be resent by the initiator, a target, or a switch.
iSCSI session recovery, which is necessary if the connection to the target is lost due to network
problems or protocol errors, can be reestablished by the iSCSI initiator. The initiator attempts to
reconnect to the target, continuing until the connection is reestablished.
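A minimal sketch of the recovery idea described above, assuming hypothetical send and reconnect callbacks: the initiator keeps every outstanding command buffered until it is acknowledged, and after a reconnect it simply re-issues whatever is still pending.

```python
import time

class InitiatorQueue:
    """Buffer outstanding commands so they can be re-sent after a failure."""

    def __init__(self, send_fn, reconnect_fn):
        self.send_fn = send_fn            # callable that transmits one command
        self.reconnect_fn = reconnect_fn  # callable that re-establishes the session
        self.pending = {}                 # task tag -> command bytes

    def submit(self, task_tag, command):
        self.pending[task_tag] = command  # keep a copy until acknowledged
        self.send_fn(task_tag, command)

    def acknowledge(self, task_tag):
        self.pending.pop(task_tag, None)  # target confirmed receipt; drop the copy

    def recover(self, max_attempts=5, delay_seconds=2.0):
        """Re-establish the session and retransmit everything still pending."""
        for attempt in range(1, max_attempts + 1):
            if self.reconnect_fn():
                for task_tag, command in sorted(self.pending.items()):
                    self.send_fn(task_tag, command)
                return True
            time.sleep(delay_seconds)     # back off before trying again
        return False
```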
Security
Security is critically important because iSCSI operates in the Internet environment. The IP protocol
itself does not authenticate legitimacy of the data source (sender), and it does not protect the
transferred data. iSCSI, therefore, requires strict measures to ensure security across IP networks.
The iSCSI protocol specifies the use of IP security (IPsec) to ensure that:
● The communicating end points (initiator and target) are authentic.
● The transferred data has been secured through encryption and is thus kept confidential.
● Data integrity is maintained without modification by a third party.
● Data is not processed more than once, even if it has been received multiple times.
The Internet Key Exchange (IKE) protocol can assist with key exchanges, a necessary part of the
IPsec implementation.
iSCSI also requires that the Challenge Handshake Authentication Protocol (CHAP) be
implemented to further authenticate end node identities. Other optional authentication protocols
include Kerberos (such as the Windows implementation), which is a highly scalable option.
Even though the standard requires that these protocols be implemented, there is no such
requirement to use them in an iSCSI network. Before implementing iSCSI, you should review the
security measures to make sure that they are appropriate for the intended use and configuration of
the iSCSI storage network.
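As an illustration of the CHAP exchange, the one-way response defined in RFC 1994 (which iSCSI reuses) is an MD5 hash over the challenge identifier, the shared secret, and the challenge. The sketch below shows that calculation end to end; the secret is hypothetical, and a real deployment would rely on the initiator and target implementations rather than hand-rolled code.

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response = MD5(identifier || secret || challenge), per RFC 1994."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Target side: issue a challenge.
identifier = 0x01
challenge = os.urandom(16)
secret = b"example-shared-secret"          # configured on both initiator and target

# Initiator side: compute and return the response.
response = chap_response(identifier, secret, challenge)

# Target side: recompute and compare to authenticate the initiator.
assert response == chap_response(identifier, secret, challenge)
```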
Performance
SCSI over TCP/IP can suffer performance degradation, especially in high-traffic settings, if the
host CPU is responsible for processing TCP/IP. This performance limitation is especially dramatic
when compared with Fibre Channel, which does not have TCP/IP overhead.
Point-to-Point
Point-to-point Fibre Channel is a simple way to connect two (and only two) devices directly
together, as shown in Figure 5. It is the Fibre Channel equivalent of direct attached storage (DAS).
From a cluster and storage infrastructure perspective, point-to-point is not a scalable enterprise
configuration.
Arbitrated Loops
A Fibre Channel arbitrated loop is exactly what it says; it is a set of hosts and devices that are
connected into a single loop, as shown in Figure 6. It is a cost-effective way to connect up to 126
devices and hosts into a single network.
Figure 6. Fibre Channel Arbitrated Loop
Note An FC-AL can support up to 126 devices and hosts on a single loop, plus one connection to a
switched fabric. The 8-bit loop address field allows 256 values, but only 127 of them are valid loop
addresses, which is what imposes the 126-device limit; a switched fabric environment does not have
this limitation.
Devices on the loop share the media; each device is connected in series to the next device in the
loop and so on around the loop. Any packet traveling from one device to another must pass
through all intermediate devices. In Figure 6, for host A to communicate with device D, all traffic
between the devices must flow through the adapters on host B and device C. The devices in the
loop do not need to look at the packet; they will simply pass it through. This is all done at the
physical layer by the Fibre Channel interface card itself; it does not require processing on the host
or the device. This is very analogous to the way a token-ring topology operates.
When a host or device wants to communicate with another host or device, it must first arbitrate for
the loop. The initiating device does this by sending an arbitration packet around the loop that
contains its own loop address.
The arbitration packet travels around the loop and when the initiating device receives its own
arbitration packet back, the initiating device is considered to be the loop owner. The initiating
device next sends an open request to the destination device, which sets up a logical point-to-point
connection between the initiating device and target. The initiating device can then send as much
data as required before closing down the connection. All intermediate devices pass the data
through. There is no limit on the length of time for any given connection and therefore other
devices wanting to communicate must wait until the data transfer is completed and the connection
is closed before they can arbitrate.
If multiple devices or hosts want to communicate at the same time, each device or host sends out
an arbitration packet that travels around the loop. If an arbitrating device receives an arbitration
packet from a different device before it receives its own packet back, it knows there has been a
collision.
In this case, the device with the lowest loop address is declared the winner and is considered the
loop owner. There is a fairness algorithm built into the standard that prohibits a device from
re-arbitrating until all other devices have been given an opportunity; however, this is an optional
part of the standard.
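The short simulation below sketches the arbitration rule just described: each device that wants the loop sends an arbitration request, colliding requests are resolved in favor of the lowest loop address, and the winner opens a connection and transfers its frames while intermediate devices pass them through. It is purely illustrative and ignores timing, the fairness window, and the actual FC-AL priority encoding.

```python
def arbitrate(requesting_addresses):
    """Return the winning loop address when several devices arbitrate at once."""
    if not requesting_addresses:
        return None
    # When arbitration packets collide, the lowest loop address wins the loop.
    return min(requesting_addresses)

def transfer(winner, target, data_frames):
    """The loop owner opens a point-to-point connection and sends its frames."""
    print(f"Device {winner:#04x} opens a connection to {target:#04x}")
    for frame in data_frames:
        # Intermediate devices simply pass the frames through at the physical layer.
        print(f"  frame: {frame}")
    print("  connection closed; other devices may now arbitrate")

winner = arbitrate({0x23, 0x45, 0x01})   # devices 0x01, 0x23, and 0x45 all arbitrate
transfer(winner, target=0x45, data_frames=["READ", "DATA", "STATUS"])
```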
Note Not all devices and HBAs support loop configurations because it is an optional part of the
Fibre Channel standard. However, for a loop to operate correctly, all devices on the loop must have
arbitrated loop support. Figure 7 shows a schematic of the wiring for a simple arbitrated loop
configuration.
Communication in an arbitrated loop can occur in both directions on the loop depending on the
technology used to build the loop. In some cases communication can occur both ways
simultaneously.
Although a loop can support up to 126 devices, as the number of devices on the arbitrated loop
increases, so does the length of the path. This increases the latency of individual operations.
Many loop devices, such as JBODs, have dipswitches to set the device address on the loop, known
as hard addressing. Most, if not all, devices implement hard addresses, so it is possible to assign a
loop ID to a device. However, just as in a SCSI configuration, different devices must have unique
hard IDs. In cases where a device on the loop already has a conflicting address when a new device
is added, the new device either picks a different ID or it does not get an ID at all.
Note Most of the current FC-AL devices are configured automatically to avoid any address
conflicts. However, if a conflict does happen then it can lead to I/O disruptions or failures.
Unlike many bus technologies, the devices on an arbitrated loop do not have to be given fixed
addresses either by software configuration or via hardware switches. When the loop initializes,
each device on the loop must obtain an Arbitrated Loop Physical Address, which is dynamically
assigned. This process is initiated when a host or device sends out a Loop Initialization Primitive
(LIP), which accomplishes similar functionality to a parallel-SCSI Bus Reset; a master is
dynamically selected for the loop and the master controls a well-defined process where each device
is assigned an address.
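The sketch below illustrates the outcome of loop initialization as described above: devices that request a hard address keep it if it is free, everything else receives one of the remaining soft addresses, and a device is left without an address if the pool is exhausted. The 126-address pool matches the FC-AL limit; the function and device names are illustrative.

```python
def initialize_loop(devices):
    """Assign loop addresses after a LIP.

    `devices` maps a device name to its requested hard address (or None).
    Returns a mapping of device name to assigned address (or None if none left).
    """
    pool = list(range(1, 127))        # 126 assignable loop addresses
    assigned = {}

    # First pass: honor hard addresses that do not conflict.
    for name, wanted in devices.items():
        if wanted is not None and wanted in pool:
            assigned[name] = wanted
            pool.remove(wanted)

    # Second pass: everything else takes a free soft address, if any remain.
    for name, wanted in devices.items():
        if name not in assigned:
            assigned[name] = pool.pop(0) if pool else None
    return assigned

print(initialize_loop({"host-a": 5, "jbod-1": 5, "tape-1": None}))
# host-a keeps 5, jbod-1 is moved to another address, tape-1 gets a soft address
```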
A LIP is generated by a device or host when the adapter is powered up or when a loop failure is
detected, such as loss of carrier. Unfortunately, this means that when new devices are added to a
loop or when devices on the loop are power-cycled, all the devices and hosts on the loop can
change their physical addresses. LIPs are also issued when the Cluster Server needs to break
reservations to disks and take ownership when nodes are arbitrating for disks.
For these reasons, arbitrated loops provide a solution for small numbers of hosts and devices in
relatively static configurations.
Switched Fabric
When a host or device is powered on, it must first log in to the fabric. This enables the device to
determine the type of fabric (there is a set of characteristics about what the fabric will support) and
it causes a host or device to be given a fabric address. A given host or device continues to use the
same fabric address while it is logged into the fabric and the fabric address is guaranteed to be
unique for that fabric. When a host or device wants to communicate with another device, it must
establish a connection to that device before transmitting data in a way similar to the arbitrated
loop. However, unlike the arbitrated loop, the connection open packets and the data packets are
sent directly from the source to the target. In this case, the switches take care of routing the packets
in the fabric.
Fibre Channel fabrics can be extended in different ways, such as by federating or cascading
switches. As a result, switched fabrics provide a much more scalable SAN environment for large
configurations than is possible using an arbitrated loop configuration.
You can deploy Fibre Channel arbitrated loop configurations within larger switched SANs. Many
switches incorporate functionality to enable arbitrated loop or point-to-point devices to be connected
to any given port. The ports can typically sense whether the device is a loop device or not and adapt the
protocols and port semantics accordingly. This enables platforms, specific host adapters, or devices
that only support arbitrated loop configurations today, to be attached to switched SAN fabrics.
FC-AL
Pros:
● Low cost
● Loops are easily expanded and combined, with up to 126 hosts and devices
● Easy for vendors to develop
Cons:
● More complex to deploy
● Maximum of 126 devices
● Devices share media, thus lower overall bandwidth
Switched Fabric
Pros:
● Easy to deploy
● Supports 16 million hosts and devices
● Communication at full wire speed; no shared media
● Switches provide fault isolation and re-routing
Cons:
● Increased development complexity
● Interoperability issues between components from different vendors
● Higher cost
Note Be sure to select the correct HBA for the topology that you are using. Although some
switches can auto-detect the type of HBA in use, using the wrong HBA in a topology can cause
many stability issues in the storage fabric.
Hubs
Hubs are the simplest form of Fibre Channel devices and are used to connect devices and hosts
into arbitrated loop (FC-AL) configurations. Hubs typically have 4, 8, 12, or 16 ports, allowing up
to 16 devices and hosts to be attached. However, the bandwidth on a hub is shared by all devices
on the hub. In addition, hubs are typically half-duplex, although newer full duplex hubs are
becoming available. In other words, communication between devices or hosts on a hub can only
occur in one direction at a time. Because of these performance constraints, hubs are typically used
in small and low bandwidth configurations.
Figure 9 below shows two hosts and two storage devices connected to the hub with the arrows
showing the physical loop provided by the hub.
Figure 9. FC-AL Hub Configuration
A typical hub detects empty ports on the hub and does not configure them into the loop. Some
hubs provide higher levels of control over how the ports are configured and when devices are
inserted into the loop.
Switches
A switch is a more complex storage fabric device that provides the full Fibre Channel bandwidth to
each port independently, as shown in Figure 10. Typical switches enable ports to be configured in
either an arbitrated loop or a switched mode fabric.
When a switch is used in an arbitrated loop configuration, the ports are typically full bandwidth
and bi-directional, enabling devices and hosts to communicate at full Fibre Channel speed in both
directions. In this mode, ports are configured into a loop, providing a high-performance arbitrated
loop configuration.
Switches are the basic infrastructure used for large, point-to-point, switched fabrics. In this mode, a
switch enables any device to communicate directly with any other device at full Fibre Channel
speed, currently 1 Gbit/sec or 2 Gbit/sec.
Figure 10. Switched Fibre Configuration
Switches typically support 16, 32, 64, or 128 ports, enabling complex fabric configurations. In
addition, switches can be connected together in a variety of ways to provide larger configurations
that consist of multiple switches. Several manufacturers, such as Brocade, Cisco, and McData,
provide a range of switches for different deployment configurations, from very high performance
switches that can be connected together to provide a core fabric to edge switches that connect
servers and devices with less intensive requirements.
Figure 11 below shows how switches can be interconnected to provide a scalable storage fabric
supporting many hundreds of devices and hosts.
The core backbone of the SAN fabric is provided by high performance and high port density
switches. The inter-switch bandwidth in the core is typically 8Gbit/sec and above. Large data
center class machines and large storage pools can be connected directly to the backbone for
maximum performance. Servers and storage with lower performance requirements, such as
departmental servers, may be connected via large arrays of edge switches, each of which may have
16 to 64 ports.
Storage Controller
In its most basic form, a storage controller is a box that houses a set of disks and provides a
connection, which may be redundant and highly available, to a SAN fabric. Typically, disks in this
type of controller appear as individual devices that map directly to the individual spindles housed
in the controller. This is known as a Just a Bunch of Disks (JBOD) configuration. The controller
provides no added value; it is just a concentrator that connects multiple devices to a single fabric
switch port, or to a small number of ports for high availability.
Modern controllers usually provide some level of redundancy for data. For example, many
controllers offer a wide variety of RAID levels, such as RAID 1, RAID 5, RAID 0+1, and many
other algorithms to ensure data availability in the event of the failure of an individual disk drive. In
this case, the hosts do not see devices that correspond directly to the individual spindles. Rather,
the controller presents a virtual view of highly available storage devices, called logical devices, to
the hosts.
In Figure 13, although there are five physical disk drives in the storage cabinet, only two logical
devices are visible to the hosts and can be addressed through the storage fabric. The controller does
not expose the physical disks themselves.
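As a simple illustration of this presentation model, the sketch below carves a pool of physical spindles into two logical devices and reports only the logical view, much as the controller in Figure 13 hides its five disks behind two logical devices. The RAID 1 and RAID 5 capacity formulas are standard; the class and its names are hypothetical.

```python
class StorageController:
    """Present logical devices to hosts while hiding the physical spindles."""

    def __init__(self, spindle_count, spindle_gb):
        self.free_spindles = spindle_count
        self.spindle_gb = spindle_gb
        self.logical_devices = {}            # LUN -> usable capacity in GB

    def create_logical_device(self, lun, raid_level, spindles):
        if spindles > self.free_spindles:
            raise ValueError("not enough free spindles")
        if raid_level == 1:                  # mirrored: half the raw capacity
            usable = spindles * self.spindle_gb // 2
        elif raid_level == 5:                # one spindle's worth of parity
            usable = (spindles - 1) * self.spindle_gb
        else:
            raise ValueError("only RAID 1 and RAID 5 are modeled here")
        self.free_spindles -= spindles
        self.logical_devices[lun] = usable

    def report_luns(self):
        """What a host sees through the fabric: LUNs, never spindles."""
        return {lun: f"{gb} GB" for lun, gb in self.logical_devices.items()}

controller = StorageController(spindle_count=5, spindle_gb=146)
controller.create_logical_device(lun=0, raid_level=1, spindles=2)   # mirrored pair
controller.create_logical_device(lun=1, raid_level=5, spindles=3)   # 3-disk RAID 5
print(controller.report_luns())              # {0: '146 GB', 1: '292 GB'}
```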
Many controllers are capable of connecting directly to a switched fabric. However, the disk drives
themselves are typically either SCSI or are disks that have a built-in FC-AL interface.
In Figure 14, the storage infrastructure that the disks connect to is completely independent from the
infrastructure presented to the storage fabric.
A highly available storage controller requires at least two ports for connection to the Fibre Channel
fabric; controllers usually have only a small number of such ports. The logical devices
themselves are exposed through the controller ports as LUNs.
Highly available SAN deployments also rely on hardware serviceability features, for example:
● Built-in hot-swap and hot-plug for all components from HBAs to switches and controllers.
Many high-end switches and most, if not all, enterprise-class storage controllers enable
interface cards, memory, CPU, and disk drives to be hot-swapped.
There are various SAN designs that have different performance and availability characteristics.
Different switch vendors provide different levels of support and different topologies. However,
most of the topologies are derived from standard network topology design. The topologies include:
● Multiple independent fabrics
● Federated fabrics
● Core backbone
Multiple Independent Fabrics
Listed below are the pros and cons of multiple independent fabrics:
● Pros
○ Resilient to management or user errors. For example, if security is changed or zones are
deleted, the configuration on the alternate fabric is untouched and can be re-applied to the
broken fabric.
● Cons
○ Managing multiple independent fabrics can be costly and error prone. Each fabric should
have the same zoning and security information to ensure a consistent view of the fabric
regardless of the communication port chosen.
○ Hosts and devices must have multiple adapters. In the case of a host, multiple adapters
are typically treated as different storage buses. Additional multipathing software, such as
Microsoft MPIO, is required to ensure that the host gets a single view of the devices
across the two HBAs.
Federated Fabrics
In a federated fabric, multiple switches are connected together, as shown in Figure 16. Individual
hosts and devices are connected to at least two switches.
Figure 16. Federated Switches for Single Fabric View
Listed below are the pros and cons of federated fabrics:
● Pros
○ Management is simplified; the configuration is a highly available single fabric, and
therefore there is only one set of zoning information and one set of security information
to manage.
○ The fabric itself can route around failures such as link failures and switch failures.
● Cons
○ Hosts with multiple adapters must run additional multipathing software to ensure that the
host gets a single view of the devices where there are multiple paths from the HBAs to
the devices.
○ Management errors are propagated to the entire fabric.
Core Backbone
A core backbone configuration enables you to scale-out a federated fabric environment. Figure 17
shows a backbone configuration. The core of the fabric is built using highly scalable, high
performance switches where the inter-switch connections provide high performance
communication, currently at speeds of 8-10 Gbit/sec.
Figure 17. Backbone Configuration
Redundant edge switches can be cascaded from the core infrastructure to provide high numbers of
ports for storage and host devices. Listed below are the pros and cons of a backbone
configuration:
● Pros
○ Highly scalable and available SAN configuration.
○ Management is simplified; the configuration is a highly available single fabric.
Therefore, there is only one set of zoning information and one set of security
information to manage.
○ The fabric itself can route around failures, such as link failures and switch failures.
● Cons
○ Hosts with multiple adapters must run additional multipathing software to ensure that the
host gets a single view of the devices where there are multiple paths from the HBAs to
the devices.
Fabric Management
SANs are becoming increasingly complex, and large configurations are becoming more and more
common. While SANs certainly provide many benefits over direct-attached storage, the key issue is
how to manage this complexity.
A storage fabric can have many devices and hosts attached to it. With all of the data stored in a
single, ubiquitous cloud of storage, controlling which hosts have access to what data is extremely
important. It is also important that the security mechanism be an end-to-end solution so that badly
behaved devices or hosts cannot circumvent security and access unauthorized data.
The most common methods to accomplish this are:
● Zoning
● LUN Masking
Zoning
You can attach multiple devices and nodes to a SAN. When data is stored in a single cloud, or
storage entity, it is important that you control which hosts have access to specific devices. Zoning
controls access from one node to another. Zoning enables you to isolate a single server to a group of
storage devices or a single storage device, or to associate a group of multiple servers with one or
more storage devices, as might be needed in a server cluster deployment.
The basic premise of zoning is to control who can see what in a SAN. There are a number of
approaches broken down according to server, storage and switch.
On any server, there are various mechanisms to control which devices an application can see and
whether or not the application can talk to another device. At the lowest level, the firmware or
driver of an HBA has a masking capability to control whether or not the server can see other
devices. In addition, you can configure the operating system to control which devices it tries to
mount as a storage volume. Finally, you can also use extra layered software for volume
management, clustering, and file system sharing, which can also control application access.
For storage-based zoning, if you ignore JBODs and the earlier RAID subsystems, most disk arrays
provide a form of selective presentation. The array is configured with a list of which servers can
access which LUNs on which ports, and it simply ignores or rejects access requests from
devices that are not in those lists.
In terms of switch zoning, most Fibre Channel switches support some form of zoning to control
which devices on which ports can access other devices or ports.
Ideally, you should use a combination of both approaches. You should, using some operating
system or software capability, control what devices or LUNs are mounted on the server. Do not use
a mount-all approach. Use selective presentation on the storage array, and use zoning in the fabric.
Hard Zoning
Hard zoning is a mechanism, implemented at the switch level, which provides an isolation
boundary. A port, which may be either a host adapter port or a storage controller port, can be
configured as part of a zone. Only ports in a given zone can communicate with other ports in that zone. The
zoning is configured and access control is implemented by the switches in the fabric, so a host
adapter cannot spoof the zones that it is in and gain access to data for which it has not been
configured.
Some switches cannot do hard zoning at all. Some can, although not to the granularity of
individual ports and only with many restrictions. Other switches can do hard zoning only if all the
zones use the port-ID syntax, which is why port-ID zoning is often assumed to be the same as hard
zoning. Still other switches can now do hard zoning of zones that use either port-ID or WWN
syntax.
Figure 18. Zoning
In Figure 18, hosts A and B can access data from storage controller S1. However, host C cannot
access the data because it is not in Zone A. Host C can only access data from storage S2.
Many current switches allow overlapping zones. This enables a storage controller to reside in
multiple zones, enabling the devices in that controller to be shared amongst different servers in
different zones, as shown in Figure 19. Finer granularity access controls are required to protect
individual disks against access from unauthorized servers in this environment.
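The sketch below models the access rule illustrated in Figure 18: two ports can communicate only if at least one zone contains both of them, and a storage port may appear in more than one zone where overlapping zones are supported. The zone layout mirrors the figure; the data structures are illustrative and do not represent any particular switch interface.

```python
# Zone membership as configured on the fabric switches (mirrors Figures 18 and 19).
zones = {
    "zone_a": {"host_a", "host_b", "storage_s1"},
    "zone_b": {"host_c", "storage_s2"},
}

def can_communicate(port_1: str, port_2: str) -> bool:
    """Two ports may talk only if some zone contains both of them."""
    return any(port_1 in members and port_2 in members for members in zones.values())

print(can_communicate("host_a", "storage_s1"))   # True  - both are in zone_a
print(can_communicate("host_c", "storage_s1"))   # False - host_c is only in zone_b
print(can_communicate("host_c", "storage_s2"))   # True  - both are in zone_b
```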
Soft Zoning
Soft zones are a software-based zoning method. Unlike hard zones, which are implemented at a
hardware level and have finite boundaries, a soft zone is based on the World Wide Name (WWN).
A WWN is a 64-bit identifier for a device or port.
All zoning can be implemented in either hardware or software. Software zoning is done by the
name server or other fabric access software. When a host tries to open a connection to a device,
access controls can be checked at that time.
Zoning is not only a security feature, but also limits the traffic flow within a given SAN
environment. Traffic between ports is only routed to those pieces of the fabric that are in the same
zone. With modern switches, as new switches are added to an existing fabric, the new switches are
automatically updated with the current zoning information.
I/Os from hosts or devices in a fabric cannot leak out and affect other zones in the fabric causing
noise or cross talk between zones. This is fundamental to deploying Server clusters on a SAN.
A virtual SAN (VSAN) is a higher-level construct with a separate name server database, rather
than one database that is common to all zones. It may even run as a separate service within the
switch, so the possibility of cross-contamination is lower and problems are more localized.
Of course, there will still be problems if a device is connected to two separate VSANs, because the
device could behave unexpectedly and can potentially bring down both VSANs.
Note, however, that a standards-based management system might use the Fibre Channel unzoned
name server query to identify all the devices on the fabric, regardless of zone membership.
LUN Masking
While zoning provides a high-level security infrastructure in the storage fabric, it does not provide
the fine-grain level of access control needed for large storage devices. In a typical environment, a
storage controller may have many gigabytes or terabytes of storage to be shared amongst a set of
servers.
Storage controllers typically provide LUN-level access controls that enable an administrator to
restrict access to a given LUN to one or more hosts. By providing this access control at the storage
controller, the controller itself can enforce access policies to the data.
LUN masking, performed at the storage controller level, enables you to define relationships
between LUNs and individual servers. Storage controllers usually provide the means for creating
LUN-level access controls that allow access to a given LUN by one or more hosts. By providing
this access control at the storage controller, the controller itself enforces access policies to the
devices. LUN masking provides more granular security than zoning, because LUNs provide a
means for sharing storage at the port level.
LUN masking, performed at the host level, hides specific LUNs from applications. Although the
HBA and the lower layers of the operating system have access to and could communicate with a
set of devices, LUN masking prevents the higher layers from knowing that the device exists and
therefore applications cannot use those devices. LUN masking is a policy-driven software security
and access control mechanism enforced at the host. For this policy to be successful, the
administrator has to trust the drivers and the operating systems to adhere to the policies.
LUN masking is a SAN security technique that accomplishes a similar goal to zoning, but in a
different way. To understand this process, you must know that an initiator, which is typically a
server or workstation, begins a transaction with a target, which is typically a storage device such as
a tape or disk array, by generating an I/O command. A logical unit in the SCSI-based target
executes the I/O commands. A LUN, then, is a SCSI identifier for the logical unit within a target.
In Fibre Channel SANs, LUNs are assigned based on the WWNs of the devices and components.
LUNs represent physical storage components, including disks and tape drives.
In LUN masking, LUNs are assigned to host servers; the server can see only the LUNs that are
assigned to it. If multiple servers or departments are accessing a single storage device, LUN
masking enables you to limit the visibility of these servers or departments to a specific LUN, or
multiple LUNS, to help ensure security.
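A minimal sketch of the masking behavior described above, as it might be enforced at a storage controller: each initiator WWN maps to the set of LUNs it is allowed to see, LUN reports are filtered through that map, and commands addressed to a masked LUN are rejected. The WWNs and LUN numbers are hypothetical.

```python
# Masking table kept by the storage controller: initiator WWN -> visible LUNs.
lun_masks = {
    "50:06:01:60:90:20:1e:aa": {0, 1},       # cluster node 1
    "50:06:01:60:90:20:1e:ab": {0, 1},       # cluster node 2
    "50:06:01:60:90:20:1e:ff": {4},          # unrelated backup server
}

all_luns = {0, 1, 2, 3, 4}

def report_luns(initiator_wwn: str) -> set:
    """Return only the LUNs the initiator has been granted; hide everything else."""
    return all_luns & lun_masks.get(initiator_wwn, set())

def check_io(initiator_wwn: str, lun: int) -> bool:
    """Reject commands addressed to a LUN the initiator is not allowed to use."""
    return lun in lun_masks.get(initiator_wwn, set())

print(report_luns("50:06:01:60:90:20:1e:aa"))    # {0, 1}
print(check_io("50:06:01:60:90:20:1e:ff", 0))    # False - masked out
```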
You can implement LUN masking at various locations within the SAN, including storage arrays,
bridges, routers, and HBAs.
When LUN masking is implemented in an HBA, software on the server and firmware in the HBA
limit the addresses from which commands are accepted. You can configure the HBA device driver
to restrict visibility to specific LUNs. One characteristic of this technique is that its boundaries are
essentially limited to the server in which the HBA resides.
You can also implement LUN masking in a RAID subsystem, where typically a disk controller
orchestrates the operation of a set of disk drives. In this scenario, the subsystem maintains a table
of port addresses via the RAID subsystem controller. This table indicates which addresses are
allowed to issue commands to specific LUNs. In addition, certain LUNs are masked out so that the
storage controller does not present them to specific servers. This form of LUN masking extends to
the subsystem in which the mapping is executed.
If a RAID system does not support LUN masking, you can implement this functionality by using a
bridge or router placed between the servers and the storage devices and subsystems. In this case,
you can configure the system so that only specific servers can see certain LUNs.
When properly implemented, LUN masking fully isolates servers and storage from events such as
resets. It is important to thoroughly test your design and implementation of LUN masking,
especially if you use LUN masking in server clusters.
As with all things, the approach you use depends as much on how you operate as on your
technology. Consider the technology and the implementation carefully.
Although zoning is not the answer to all your problems, it is a vital part of storage provisioning. It
is recommended to implement zoning even if it seems like overkill for a small SAN. After you get
going in the right direction, it will be easier to continue with a robust and reliable approach.
SAN Management
SAN management is a huge topic on its own and is outside the scope of this training. Different
vendors provide a wide range of tools for setting up, configuring, monitoring, and managing the
SAN fabric, as well as the state of devices and hosts on the fabric.
Note Certain applications, such as Exchange and SQL Server, used in clusters only support
Synchronous data replication. Ensure that the application in question supports the method of
replication you choose.
However, such solutions have limited scaling capacities and any additional complexities, such
as storage replication and recovery mechanisms, must be addressed by the hardware vendors.
Some server clusters carry a feature that enables the boot disk, pagefile disks, and the cluster
disks to be hosted on the same channel. There are other performance and operational
implications that need to be considered before implementation.
See Microsoft Knowledge Base article Q305547: Support for Booting from a Storage Area
Network (SAN), which discusses this feature.
Note Enabling Kerberos on a network name has a number of implications that you should ensure
you fully understand before checking the box.
All cluster node computer accounts, as well as the virtual server computer account, must be trusted
for delegation. See online help for how to do this.
To ensure that the user’s private keys are available to all nodes in the cluster, you must enable
roaming profiles for users who want to store data using EFS. See online help for how to enable
roaming profiles.
After the cluster file shares have been created and the configuration steps above have been carried
out, user data can be stored in encrypted files for added security.
Disk Quotas
Configuring disk quotas on shared disks is fully supported.
Autochk/Chkdsk/Chkntfs
Every time Windows restarts, autochk.exe is called to scan all volumes to check if the volume dirty
bit is set. If the dirty bit is set, autochk performs an immediate chkdsk /f on that volume to repair
any potential corruption. Chkdsk is a native Windows tool that can determine the extent of file and
file system corruption. If Chkdsk runs in write-mode, it will automatically attempt to remedy disk
corruption.
The Chkntfs.exe utility is designed to disable the automatic running of chkdsk on specific volumes
when Windows restarts from an improper shutdown. Chkntfs can also be used to unschedule a
chkdsk if chkdsk /f was used to schedule one on an active volume at the next system restart.
All of the above are supported to run in specific configurations in Server Clusters. The relevant KB
articles that explain the procedures in detail are:
● Q174617 - Chkdsk runs while running Microsoft Cluster Server Setup
● Q176970 - How to Run the CHKDSK /F Command on a Shared Cluster Disk
● Q160963 - CHKNTFS.EXE: What You Can Use It For
● File System
○ Cluster server only supports the NTFS file system on cluster disks. This ensures that file
protection can be used to protect data on the cluster disks. Since the cluster disks can
fail over between nodes, you must only use domain user accounts, Local System,
Network Service, or Local Service to protect files. Local user accounts on one machine
have no meaning on other machines in the cluster.
● Disk Health Checks
○ Cluster disks are periodically checked to make sure that they are healthy. The cluster
service account must have write access to the top-level directory of all cluster disks. If
the cluster account does not have write access, the disk may be declared as failed.
● Quorum Disk
○ The quorum disk health determines the health of the entire cluster. If the quorum disk
fails, the cluster service will become unavailable on all cluster nodes. The cluster service
checks the health of the quorum disk and arbitrates for exclusive access to the physical
drive using standard I/O operations. These operations are queued to the device along
with any other I/Os to that device. If the cluster service I/O operations are delayed by
extremely heavy traffic, the cluster service will declare the quorum disk as failed and
force a regroup to bring the quorum back online somewhere else in the cluster. To
protect against malicious applications flooding the quorum disk with I/Os, the quorum
disk should be a dedicated disk that is not used by other applications that generate heavy
I/O. Access to the quorum disk should be restricted to the local Administrator group and
the cluster service account.
○ If the quorum disk fills up, the cluster service may be unable to log required data. In this
case, the cluster service will fail, potentially on all cluster nodes. To protect against
malicious applications filling up the quorum disk, access should be restricted to the local
Administrator group and the cluster service account.
○ For both reasons above, do not use the quorum disk to store other application data.
● Cluster Data Disks
○ As with the quorum disk, other cluster disks are periodically checked using the same
technique. When securing clustered data disks, the Cluster Service Account (CSA) still needs
to have privileges on the disk, or the health checks will not be able to complete and
the Cluster service will assume the disk has failed.
Must Do
● Each cluster on a SAN must be deployed in its own zone. The cluster uses mechanisms to
protect access to the disks that can have an adverse effect on other clusters in the same
zone. By using zoning to separate the cluster traffic from other cluster or non-cluster
traffic, there is no chance of interference. Figure 21 shows two clusters sharing a single
storage controller. Each cluster is in its own zone. The LUNs presented by the storage
controller must be allocated to individual clusters using the fine-grained security provided
by the storage controller itself. LUNs must be set up as visible to all nodes in the cluster,
and a given LUN should be visible to only a single cluster.
Figure 21. Clusters assigned to individual zones
● All HBAs in a single cluster must be of the same type and at the same firmware revision
level. Many storage and switch vendors require that all HBAs in the same zone, and in
some cases in the same fabric, are of the same type and have the same firmware revision.
● All storage device drivers and HBA device drivers in a cluster must be at the same
software version.
● SCSI bus resets are not used on a Fibre Channel arbitrated loop. They are interpreted by
the HBA and driver software and cause a loop initialization primitive (LIP) to be sent,
which resets all devices on the loop.
● When adding a new server to a SAN, ensure that the HBA is appropriate for the topology.
In some configurations, adding an arbitrated loop HBA to a switched Fibre Channel fabric
can result in widespread failures of the storage fabric. There have been real-world
examples of this causing serious downtime.
● The cluster software ensures that access to devices that can be reached by multiple hosts
in the same cluster is controlled and that only one host actually mounts a given disk at any
one time. When first creating a cluster, make sure that only one node can access the disks
that are to be managed by the cluster. This can be done either by leaving the other
prospective cluster members powered off, or by using access controls or zoning to stop
the other hosts from accessing the disks. After a single-node cluster has been created, the
disks marked as cluster-managed are protected; the other hosts can then be booted, or the
disks made visible to them, so that they can be added to the cluster. This is no different
from any cluster configuration that has disks accessible from multiple hosts.
Must Not Do
● You must never allow multiple hosts access to the same storage devices unless they are in
the same cluster. If multiple hosts that are not in the same cluster can access a given disk,
it will lead to data corruption.
● You must never put any non-disk device into the same zone as cluster disk storage
devices.
Other Hints
Highly available systems such as clustered servers should typically be deployed with multiple
HBAs and a highly available storage fabric. In these cases, be sure to always load the multipath
driver software. Without it, if the I/O subsystem in the Windows Server 2003 platform sees two
HBAs, it assumes they are separate buses and enumerates all the devices on each as though they
were distinct devices, when the host is actually seeing multiple paths to the same disks.
Many controllers provide snapshots at the controller level that can be exposed to the cluster as a
completely separate LUN. The cluster does not react well to multiple devices having the same
signature. If the snapshot is exposed back to the host with the original disk online, the base I/O
subsystem will re-write the signature. However, if the snapshot is exposed to another node in the
cluster, the cluster software will not recognize it as a different disk. Do not expose a hardware
snapshot of a clustered disk back to a node in the same cluster. While this is not specifically a SAN
issue, the controllers that provide this functionality are typically deployed in a SAN environment.
Note: Some controllers use a cluster resource type other than Physical Disk; you need to create a
resource of the appropriate type for such environments. Only basic, MBR-format disks that contain
at least one NTFS partition can be managed by the cluster, and a disk must be formatted before it
is added.
Remember that the same rules apply when adding disks as when creating a cluster: if multiple
nodes can see a disk before any node in the cluster is managing it, data corruption will result.
When adding a new disk, first make the disk visible to only one cluster node. After the disk is
added as a cluster resource, make the disk visible to the other cluster nodes.
To remove a disk from a cluster, first remove the cluster resource corresponding to that disk. After
it is removed from the cluster, the disk can be removed, either physically or through deletion and
re-assignment of the LUN.
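These add and remove steps can also be scripted with the cluster.exe command-line tool. The following is a hedged sketch, assuming a resource named Disk Z: in a group named Cluster Group (both names are placeholders, not values from this document); associating the resource with the correct LUN, through its disk signature private property, is typically done in Cluster Administrator:
cluster res                                                                 (list existing resources and their state)
cluster res "Disk Z:" /create /group:"Cluster Group" /type:"Physical Disk"  (create the physical disk resource)
cluster res "Disk Z:" /online                                               (bring the new disk resource online)
cluster res "Disk Z:" /offline                                              (take the disk resource offline before removal)
cluster res "Disk Z:" /delete                                               (remove the resource; the LUN can then be unpresented)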
There are several KB articles on replacing a cluster-managed disk. While disks in a cluster should
typically be RAID sets or mirror sets, there are sometimes issues that cause catastrophic failures
leading to a disk having to be rebuilt from the ground up. There are also cases where cluster disks
are not redundant and failure of those disks also leads to a disk having to be replaced. The steps
outlined in those articles should be used if you need to rebuild a LUN due to failures. You can
refer to the following KB articles:
● Q243195 - Replacing a cluster managed disk in Windows NT 4.0
● Q280425 – Recovering from an Event ID 1034 on a Server Cluster
● Q280425 – Using ASR to replace a disk in Windows Server 2003
Expanding Disks
In Windows Server 2003, you can expand volumes dynamically without requiring a reboot. You
can use the Diskpart tool to expand volumes dynamically; Diskpart is included with Windows
Server 2003 and is available for Windows 2000 as a download from www.microsoft.com.
As long as the underlying disk subsystem, primarily the SAN, supports dynamic expansion of a
LUN, and the resulting free space is visible immediately after the partition in Disk Management,
diskpart can be used to move the end of the partition anywhere within that free space.
The diskpart command used to expand a volume is:
extend
Extends the volume with focus into the next contiguous unallocated space. For basic volumes, the
unallocated space must be on the same disk as the partition with focus, and it must follow (be at a
higher sector offset than) that partition. A dynamic simple or spanned volume can be extended to
any empty space on any dynamic disk. Using this command, you can extend an existing volume
into newly created space.
If the partition was previously formatted with the NTFS file system, the file system is
automatically extended to occupy the larger partition. No data loss occurs. If the partition was
previously formatted with any file system format other than NTFS, the command fails with no
change to the partition.
You cannot extend the current system or boot partitions.
Syntax
extend [size=N] [disk=N] [noerr]
extend filesystem [noerr]
Parameters
size=N
The amount of space in megabytes (MB) to add to the current partition. If no size is given, the
volume is extended to take up all of the next contiguous unallocated space.
disk=N
The dynamic disk on which the volume is extended. An amount of space equal to size=N is
allocated on the disk. If no disk is specified, the volume is extended on the current disk.
filesystem
For use only on disks where the file system was not extended with the volume. Extends the file
system of the volume with focus so that the file system occupies the entire volume.
noerr
For scripting only. When an error is encountered, DiskPart continues to process commands as if
the error did not occur. Without the noerr parameter, an error causes DiskPart to exit with an
error code.
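A minimal sketch of extending a volume interactively follows (the volume number and size are placeholders, not values from this document); the same commands can be placed in a text file and run unattended with diskpart /s scriptfile.txt:
diskpart
DISKPART> list volume            (identify the volume that has contiguous free space following it)
DISKPART> select volume 3        (give focus to the volume to be extended)
DISKPART> extend size=1024       (add 1024 MB; omit size= to use all contiguous unallocated space)
DISKPART> exit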
See the following link for more extensive documentation on the diskpart syntax:
http://technet2.microsoft.com/WindowsServer/en/library/ca099518-dde5-4eac-a1f1-38eff6e3e5091033.mspx?mfr=true
○ For device removal, the device receives an alert and the driver determines that the device
can be removed. When an adapter is removed, the multipath bus driver must remove any
child devices in the stack as appropriate.
● Dynamic Load Balancing
The multipath software supports the ability to distribute I/O transactions across multiple
adapters. The device-specific module is responsible for the load-balancing policy for its storage
device. For example, this policy can enable the device-specific module to recommend which path
to take for better performance, based upon the current characteristics of the connected SAN.
● Fault Tolerance
Multipath software can also function in a fault-tolerant mode in which only a single channel is
active at any given moment. Should a failure in that channel be detected by the MPIO
subsystem, disk I/O can be transparently routed to the inactive channel without any interruption
being detected by the OS. Some MPIO solutions can operate in a hybrid mode where both
channels are active and a failure of either channel reroutes I/O to the other.
Architecture
The primary goal of the MPIO architecture is to ensure correct operation to a disk device that can
be reached via more than one physical path. When the operating system determines that there are
multiple paths to a disk device, it uses the multi-path drivers to correctly initialize and then use
one of the physical paths at the right time. At a high level, the MS MPIO architecture
accomplishes this by use of pseudo device objects within the operating system that replace the
physical and functional device objects involved.
In a multipath configuration, multiple HBAs can see the same logical disk. The MS MPIO
architecture creates a pseudo HBA that acts in their place whenever an operation is attempted on
a disk device that can be reached via either of the HBAs. This is similar in concept to NIC teaming.
The MS MPIO architecture also creates pseudo logical disk devices, one for each logical disk in
the system that can be reached via more than one path. The drivers place these pseudo logical
devices in the I/O stack and prevent direct I/O to the physical devices involved.
When an I/O operation is sent to one of these pseudo devices, the MS MPIO architecture makes a
determination based upon interaction with the DSM on which physical path to use. The I/O
directed to the pseudo device is instead directed to one of the physical pathways for completion.
Additional References
The following Microsoft Knowledge Base articles provide additional information about storage:
● Q174617: Chkdsk runs while running Microsoft Cluster Server Setup
● Q176970: How to Run the CHKDSK /F Command on a Shared Cluster Disk
● Q160963: CHKNTFS.EXE: What You Can Use It For
● Q293778: Multiple-Path Software May cause Disk Signature to Change.
● Q243195: Replacing a cluster managed disk in Windows NT 4.0
Review
Topics discussed in this session include:
● Storage Area Network terms and technologies
● iSCSI
● Microsoft disk features in a clustered environment
● Multipath I/O