CERC Dell Best
Authored By:
Worldwide Services Team
____________________
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN
TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED
AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.
Trademarks used in this text: Dell, the DELL logo, PowerEdge, PowerVault, Precision, and OpenManage
are trademarks of Dell Inc.; Microsoft, Windows, Windows NT, and Windows Server are either trademarks
or registered trademarks of Microsoft Corporation in the United States and/or other countries; Red Hat, Red
Hat Enterprise Linux, and Red Hat Linux are registered trademarks of Red Hat, Inc. in the United States
and other countries; Novell and Netware are registered trademarks of Novell, Inc., in the United States and
other countries.
Other trademarks and trade names may be used in this document to refer to either the entities claiming the
marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.
OBJECTIVE AND SCOPE
This document contains the best practices for routine maintenance of systems using the CERC
SATA 1.5/6ch and CERC SATA 1.5/2s controllers to handle their RAID needs. This document is
not intended for addressing or recommending the type or size of arrays for specific applications.
These maintenance best practices are recommended to all Dell™ Enterprise users to avoid
failures, downtime, and data loss. These practices will help to ensure a better user experience by
maintaining the integrity of data and minimizing cost of downtime.
SECTION 1: INTRODUCTION
The goal of Redundant Array of Independent Disks (RAID) is to provide better performance
and/or reliability from combinations of disk drives than the performance provided with non-RAID
configurations. Serial ATA (SATA) disks are based on a low-cost technology that replaces
Parallel ATA (PATA) disk drives in value servers. Serial ATA incorporates significant technical
enhancements over traditional ATA making it ideal for RAID implementations. Along with several
configuration benefits, SATA improves data transmissions through a point-to-point topology,
which eliminates bus sharing and allows up to a full 1.5 Gb/s bandwidth to each drive. The SATA
standard also specifies a power connector that is different from the 4-pin connector used by
Parallel ATA (PATA) drives. The larger number of pins is used to supply three different
voltages if required – 3.3V, 5V, and 12V. Hot-swapping is another key feature supported by some
SATA solutions, but not by PATA.
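As a rough worked figure, the 1.5 Gb/s line rate can be converted into payload bandwidth per drive. The sketch below assumes the 8b/10b line encoding used by first-generation SATA, in which each data byte occupies 10 bits on the wire. The function name is illustrative only and is not part of any Dell utility.

```python
def sata_payload_mb_per_s(line_rate_gbps, line_bits_per_byte=10):
    """Convert a SATA line rate in Gb/s to payload bandwidth in MB/s.

    Assumption: 8b/10b encoding, so every 8-bit data byte is carried
    as 10 line bits.
    """
    bits_per_second = line_rate_gbps * 1_000_000_000
    bytes_per_second = bits_per_second / line_bits_per_byte
    return bytes_per_second / 1_000_000


print(sata_payload_mb_per_s(1.5))  # 150.0
```

Because the topology is point-to-point, this figure is an upper bound for each drive's link and is not shared among drives.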
Dell offers two cost-effective SATA RAID solutions, specifically the CERC SATA 1.5/6ch and the
CERC SATA 1.5/2s.
• Optimized Disk Utilization – Enables use of the full capacity of all the drives, even if the
drive sizes vary.
• Online RAID Level Migration – Enables migration between RAID levels without
rebuilding the array from scratch.
• Multiple Arrays – Enables the user to create multiple arrays from a single set of drives.
• SATA Disk Hot Plug – The PowerVault 745N storage solution with a CERC SATA
1.5/6Ch supports hot plug hard drives. Hot plug hard drives can be added and removed
without shutting down the system.
The CERC SATA 1.5/6ch supports RAID levels 0, 1, 5, 10, and simple volume configurations. It
also supports automatic failover, which allows the controller to automatically rebuild an array
when a failed drive is replaced with a new drive. This feature applies only to fault-tolerant arrays.
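To compare the supported RAID levels, usable capacity can be estimated with the conventional formulas. The sketch below is a simplification: it sizes every member to the smallest drive and ignores controller metadata. The Optimized Disk Utilization feature described above may make leftover space on larger drives available to additional arrays, which this sketch does not model. The function name and the minimum drive counts are assumptions based on general RAID practice, not on the controller documentation.

```python
from typing import Sequence


def usable_capacity_gb(raid_level: str, drive_sizes_gb: Sequence[int]) -> int:
    """Estimate usable capacity of one array, sized to the smallest member."""
    count = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if raid_level == "0":
        if count < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return count * smallest            # striping, no redundancy
    if raid_level == "1":
        if count != 2:
            raise ValueError("RAID 1 uses exactly two drives")
        return smallest                    # one drive mirrors the other
    if raid_level == "5":
        if count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (count - 1) * smallest      # one drive's worth of parity
    if raid_level == "10":
        if count < 4 or count % 2:
            raise ValueError("RAID 10 needs an even number of drives, four or more")
        return (count // 2) * smallest     # striped mirror pairs
    raise ValueError(f"unsupported RAID level: {raid_level}")


print(usable_capacity_gb("5", [80, 80, 80]))          # 160
print(usable_capacity_gb("10", [250, 250, 250, 250])) # 500
```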
The CERC SATA 1.5/6ch card is offered with PowerEdge systems 700, 750, 800, 1800, 830,
850, and 1420SC; PowerVault system 745N; and Precision Workstation systems 470 and 670.
Figure 1 is a product image of the CERC SATA 1.5/6ch card:
For more controller specifications and supported operating systems, please refer to Appendix A in
this document.
For more controller specifications and supported operating systems, please refer to Appendix B in
this document.
Migrating or upgrading from the CERC SATA 1.5/2s to the CERC SATA 1.5/6ch is not supported.
In addition, migrating the CERC SATA 1.5/2s from non-RAID mode (“RAID off”) to RAID mode
(“RAID on”) is also not supported.
SECTION 2: OVERVIEW OF STEPS ENSURING RAID BEST
PRACTICES
The following is an overview of the steps that can be taken to ensure RAID Best Practices.
Maintenance of Arrays
• Run regular consistency checks on the system.
• Ensure that properly qualified SATA cables are used and that they are not excessively
bent.
Recovery of Arrays
• Capture system logs and details surrounding array failures to assist in the recovery.
• Write down the exact steps or circumstances that caused the system to enter the failed
state.
• Proper procedures should be followed when increasing array size depending on whether
an array expansion is done by increasing hard-drive size or by increasing the number of
drives.
• CERC Array Configuration Utility – Used to create, configure, and manage arrays. Also
used to initialize logical drives and rescan hard drives.
• SATASelect – Used to change device and controller settings.
• Disk Utilities – Used to format or verify media.
Note: The CERC SATA 1.5/2s BIOS RAID Configuration Utility only consists of the Array
Configuration Utility and the Disk Utilities. It does not contain a SATASelect utility.
To run the utility, press <Ctrl><A> when prompted by the following message during system
startup: "Press <Ctrl><A> for BIOS RAID Configuration Utility".
• During array creation, there is an option available to enable read and write caching for the
array. When enabled (default setting), maximum performance is seen. However, there is a
potential for data loss or corruption during a power failure. Caching should be enabled to
optimize performance, unless the user data is highly sensitive, or the user’s application
performs completely random reads.
• During array creation, 3 options will be provided – Build, Clear and Quick Init. The Build
operation is a background initialization of a redundant array. The array is accessible
throughout. The Clear operation is a foreground initialization of a fault-tolerant array and
zeros out all blocks of the array. The array is not accessible until the clear task is complete.
With the Quick Init operation, an array is available immediately with no on-going background
controller activity. For a RAID 5, write performance is impacted until a Verify with Fix is run on
the array.
• When deleting an array, a backup of the data on the array should be done. Deleted arrays
cannot be restored and all the data on the array will be lost.
• The CERC SATA 1.5/6ch has a Disk Initialization option, which overwrites the partition table
on the disk and makes any data on the disk inaccessible. If the drive is used in an array, the
array may not be able to be used again. A drive that is part of a boot array should not be
initialized. (The boot array is the lowest numbered array – normally 00)
• The CERC SATA 1.5/2s has a Configure Drives option. If an installed disk does not appear in
the disk selection list for creating a new array or if it appears grayed out, it will need to be
configured before it can be used as part of an array. If a drive is configured, but not made
part of a RAID 0 or a RAID 1, it will function as a simple volume. Configuring a single drive
overwrites the partition table on the disk and makes any data on the disk inaccessible. If the
drive is used in an array, the array may not be able to be used again.
Disk Utilities
With the disk utilities, a low-level format or a verify operation of the hard disks can be done via the
Format Disk or Verify Disk Media options. Format Disk is a low-level format of the hard drive that
writes zeros to the entire disk. SATA drives are formatted at the factory and do not need to be
reformatted. Formatting destroys all the data on the drive. It is recommended that a fully tested
backup of all the data that is to be recovered is available before performing the Format Disk
option. Verify Disk Media scans the media of a disk drive for defects and remaps any
recoverable defects.
Drive Status
Optimal - An array member disk with this status is in the optimal state and there are no errors
detected. For drives that do not belong to an array, this status indicates that they are ready for
use.
Unable to Access Drive - A drive obtains this status when the controller card is unable to detect
any physical connection between the controller and the hard drive. This could occur due to
hardware errors on the drive, loose connections between the controller port and the hard drive, or
accidental unseating of the drive.
Missing member - A drive obtains this status from the previous Unable to Access Drive status
after a rescan or a system reboot is done, during which the bus is rescanned and the
configuration is updated to reflect the missing drive.
Grayed out - A drive with this status is one that used to be part of a logical array, and is
recognized as a previous member of that array, but is not currently incorporated as a member of
the degraded or failed array.
Array Status
Optimal - An array with this status is optimal and ready for use.
Building/Verifying - An array with this state is currently building the mirror for a RAID 1 array or
calculating the parity for a RAID 5 array.
Failed - An array assumes the failed status when two or more hard drives fail and the data is lost.
Impacted - An array obtains this status when its performance becomes impacted. This could
happen when:
• The two mirrors of a RAID 1 are not identical.
• There are parity inconsistencies in a RAID 5 array.
• A building (scrubbing) process is aborted before the array becomes optimal.
The CERC SATA 1.5/6ch controller uses nonvolatile flash to store on-board software, such as
BIOS, microprocessor kernel, and monitor. Whenever it becomes necessary to update any of
those components, you can update your controller's flash components using this utility. The utility
updates the controller's flash by reading flash image data from a supplied User Flash Image (UFI)
file and writing it to the controller's flash components. A UFI file contains all of a controller's flash
images, as well as information about each image. It also includes general controller information,
such as controller type, to ensure that the utility uses the correct UFI file when updating the
controller's flash.
• Update - Updates all the flash components on a controller with the flash image data from
a UFI file.
• Save - Reads the contents of a controller's flash components and saves the data to a UFI
file. This enables you to later restore a controller's flash to its previous contents should
the need arise.
• Verify - Reads the contents of a controller's flash components and compares them to the
contents of the specified flash image file.
• Version - Displays version information about a controller's flash components.
• List - Lists all the supported controllers detected in your system.
The array options include Creating, Migrating, Deleting, Rebuilding and Verifying an Array, and
Preparing an array for Windows.
Some key points to take note of:
• When performing a RAID level migration, interrupting this process may result in data
loss. Partitioning or formatting the new array will result in complete data loss.
• Deleting an array destroys all data on the array. Deleting an array in which the
operating system resides will destroy the operating system and the system will no
longer boot. RSM will not allow the deletion of an array in which the operating
system resides. The partition must first be deleted, or the array will need to be deleted
from the controller BIOS.
• If multiple drives fail in separate disk groups, replace each defunct drive. If multiple
physical drives fail simultaneously within the same disk group, contact your service
representative.
OMSA has drop-down menus and wizards for executing storage management and configuration
tasks.
With OMSM, the Rebuild rate, Background Initialization rate, and Check Consistency rate
can all be set. Foreign configurations can also be imported.
Versions of Dell OpenManage Server Administrator (OMSA) previous to version 4.4 used AM as
the RAID management utility. OMSA versions 4.4 and later use OMSM.
Note: The functionality of the Storage Management applications is limited on the CERC SATA
1.5/2s. No reconfiguration of the subsystem can be done. However, the applications are still
useful for obtaining array status, starting consistency check, and forcing rebuilds if they do not
start automatically.
Table 1 is a feature comparison chart of RAID Storage Manager, Array Manager, and
OpenManage Storage Management (OMSM):
Feature                   RAID Storage Manager   Array Manager   OMSM
Remote RAID Management    No                     Yes             Yes
Alarm Functionality       Yes                    Yes             Yes
Adaptec Support           Yes                    Yes             Yes
AMI/LSI Support           No                     Yes             Yes
Force Online Option       No                     Yes             Yes
Automatic Rebuild         Yes                    Yes             Yes
For more information on RAID support in Linux, please refer to the CERC SATA 1.5/6ch or CERC
SATA 1.5/2S user guides found on support.dell.com.
Tables 2 and 3 show the various possible Array and Drive status for the CERC SATA 1.5/6ch
across the three main Storage Management utilities.
[Tables 2 and 3 column headings: BIOS RAID Utility, AM, OMSM, RSM]
Similarly, Tables 4 and 5 show the various possible Array and Drive status for the CERC SATA
1.5/2s across the three main Storage Management utilities.
Consistency Check of RAID Arrays – CERC SATA 1.5/6ch
RAID arrays are used mainly to protect critical data through redundancy, either in the form of
parity calculations or simple mirroring. Hard drive media defect rates have improved over time, even
as drive sizes continue to increase. Hard drives, however, are not expected to be completely
flawless and normal wear on a drive may lead to an increase in media or “grown” defects over
time. These bad blocks will need to be remapped to another location on the drive. If a bad block
is detected during a normal write operation, the controller will mark that block as bad and the
block will be added to the “grown defects list” in the drive’s NVRAM. That write operation will be
considered incomplete until the data is properly written to a remapped location successfully. If a
bad block is detected during a normal read operation, the controller will reconstruct the missing
data and remap to a new location.
A double fault scenario is one in which the controller detects a bad block on a drive in a RAID
array and then detects a second bad block on another drive in the same data stripe. This
scenario can also occur when rebuilding a degraded logical drive, when the controller encounters
a bad block on a good drive in the array. This will lead to a rebuild failure and potential data loss.
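The double fault condition can be illustrated with a short sketch that groups known bad blocks by stripe row and reports any row affected on two or more drives. It assumes a simplified layout in which block N on every member drive belongs to stripe row N divided by the stripe unit size. The drive names, the stripe unit size, and the function name are hypothetical, and the controller's real on-disk layout is not described in this document.

```python
from collections import defaultdict


def find_double_fault_stripes(bad_blocks_by_drive, blocks_per_stripe_unit):
    """Return the sorted stripe rows that have bad blocks on two or more drives.

    Assumption: block N on each drive maps to stripe row
    N // blocks_per_stripe_unit (a simplified layout for illustration).
    """
    drives_hit = defaultdict(set)
    for drive, blocks in bad_blocks_by_drive.items():
        for block in blocks:
            drives_hit[block // blocks_per_stripe_unit].add(drive)
    return sorted(row for row, drives in drives_hit.items() if len(drives) >= 2)


# Hypothetical defect lists for a three-drive array.
bad_blocks = {"disk0": [100, 5000], "disk1": [120], "disk2": [9000]}
print(find_double_fault_stripes(bad_blocks, 128))  # [0]
```

A row reported here could not be reconstructed from redundancy alone, which is why bad blocks should be found and remapped while the array is still optimal.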
For the CERC SATA 1.5/6ch, there are two types of consistency checks offered – the consistency
check and the background consistency check. The consistency check (CC) is used to restore the
consistency for redundant arrays after unexpected events, such as a power loss. For RAID 5
based arrays, it recalculates and restores parity if needed. For RAID 1 based arrays, it
restores the mirror. If media errors are encountered, data recovery is initiated as in a
background consistency check (BCC). A CC can be initiated via any of the storage management
applications, RSM, AM or OMSS. Regular consistency checks will reduce the risk of double fault
scenarios. To avoid downtime and to ensure data integrity, it is recommended that consistency
checks be included as part of routine maintenance of all RAID systems. For more information on
scheduling consistency checks in Windows, refer to Appendix C. To enable a consistency check
in the following Array Management Utilities, the following steps need to be performed:
Background Consistency Check of RAID Arrays – CERC SATA
1.5/6ch
Background consistency check is a method used by the CERC SATA 1.5/6ch controller to detect
hard drive media errors and recover data. It can be enabled in the controller BIOS to run in the
background while other processes are going on, in order to proactively and efficiently detect and
fix media errors. When a hard drive media error is detected, it proceeds to recover the lost data
by regenerating the right data from peer disks and relocating the generated data. Background
consistency check will only run on redundant arrays.
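Regenerating data from peer disks can be illustrated for RAID 5, where parity is the byte-wise XOR of the data blocks in a stripe. The sketch below shows the general technique only; it is not the controller firmware's implementation, and the function names and sample bytes are illustrative.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """A stripe is consistent when the stored parity matches the recomputed parity."""
    return xor_blocks(data_blocks) == parity_block


def regenerate_missing(surviving_blocks, parity_block):
    """Rebuild the one unreadable data block from its peers and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

print(stripe_is_consistent(data, parity))                       # True
print(regenerate_missing([data[0], data[2]], parity) == data[1])  # True
```

The same XOR property explains why a second unreadable block in the same stripe cannot be recovered: one equation cannot yield two unknowns.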
The Background Consistency Check feature was implemented in the CERC SATA 1.5/6ch
firmware version 4.1.0.7417. It is disabled by default and will need to be manually enabled in the
controller BIOS in order to implement it. Typical performance impact when this feature is enabled
is about 1-4%. The worst-case scenario can be approximately 10% for Random Writes.
Performance numbers may also vary depending on the configuration.
To verify the drives, <Ctrl><S> can be used. A prompt will pop up asking if the utility should
automatically fix any errors. When the verification is complete, a verification complete message
will appear.
Another key point to note is that the Verify command cannot be performed on the CERC SATA
1.5/2s while another operation is queued, such as rebuild or initialization. If the Verify command is
run while another activity is in progress, the system will return to the Manage Arrays section
without completing the verify process.
It is recommended that consistency checks on each RAID logical volume be performed at least
once a month. This will increase the chance of detecting any media defects (bad blocks), remapping
them, and recalculating the parity on the data stripes. This will also reduce the probability of
encountering double fault scenarios during a rebuild, which cause inconvenient downtime.
Please refer to the Appendix for instructions on how to set up automated scheduling of
consistency checks on Windows Systems.
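The monthly recommendation can be tracked with a small script that flags volumes whose last consistency check is too old. This is a minimal sketch: the volume names and dates are made up, the 31-day threshold is one reading of "once a month", and how the last-check dates are obtained from a given management utility is outside its scope.

```python
from datetime import date, timedelta


def overdue_volumes(last_check_by_volume, today, max_age_days=31):
    """Return the sorted names of volumes not checked within max_age_days."""
    limit = timedelta(days=max_age_days)
    return sorted(
        name
        for name, last_check in last_check_by_volume.items()
        if today - last_check > limit
    )


# Hypothetical record of the last completed consistency check per volume.
last_checks = {"Array 00": date(2024, 1, 1), "Array 01": date(2024, 2, 20)}
print(overdue_volumes(last_checks, today=date(2024, 3, 1)))  # ['Array 00']
```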
Upgrading Firmware, Drivers and Storage Management Utilities
Concurrently to the Latest Versions
The latest RAID controller firmware and driver for both the CERC SATA 1.5/6ch and CERC SATA
1.5/2s can be found on support.dell.com. This upgrade will ensure maximum performance,
reliability, and functionality of the RAID controllers. Upgrading the firmware and drivers along with
the latest versions of the Storage Management utilities will ensure correct functionality at all levels
and availability of all features. It is recommended that the driver be updated before
the firmware.
All storage management applications’ event logs should also be monitored regularly for any
media errors, including corrected media errors. Corrected media errors are normal, but an
excessive number of such errors within a short period of time may be indicative of a drive that will
need to be proactively replaced during a maintenance cycle. These event logs will also be
available from any Novell Netware server with the Windows console. These events are
displayed in the Array Manager event logs, as shown in Figure 2.
The BIOS-based event logs can also be monitored. The BIOS-based event logs store all firmware
events like configuration changes, array creation, boot activity, and so on. This event log has a
fixed size and once full, older events are flushed as newer events are populated. This log is also
volatile, and it is cleared with each system restart.
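One way to spot an excessive number of corrected media errors within a short period is a sliding-window count per drive. The sketch below assumes the events have already been extracted from a management utility's log as (timestamp, drive ID) pairs. The 24-hour window and the threshold of five are placeholders, since this document does not define what counts as excessive, and the drive IDs are hypothetical.

```python
from datetime import datetime, timedelta


def drives_to_watch(events, window=timedelta(hours=24), threshold=5):
    """Return sorted drive IDs with at least `threshold` corrected media
    errors inside any time span of length `window`.

    events: iterable of (timestamp, drive_id) pairs.
    """
    stamps_by_drive = {}
    for timestamp, drive_id in events:
        stamps_by_drive.setdefault(drive_id, []).append(timestamp)

    flagged = []
    for drive_id, stamps in stamps_by_drive.items():
        stamps.sort()
        start = 0
        for end, timestamp in enumerate(stamps):
            # Slide the window start forward until it spans at most `window`.
            while timestamp - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.append(drive_id)
                break
    return sorted(flagged)


base = datetime(2024, 3, 1, 8, 0)
sample = [(base + timedelta(minutes=10 * i), "0:1") for i in range(5)]
sample += [(base, "0:2"), (base + timedelta(days=3), "0:2")]
print(drives_to_watch(sample))  # ['0:1']
```

Because the BIOS-based log is volatile and fixed in size, a check like this is better fed from the storage management application logs.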
2. From the BIOS RAID Configuration utility menu, press <Ctrl><P>.
The Controller Service menu appears.
3. Select Controller Log Information, and then press <Enter>.
The current log is displayed.
Hotspare Assignments
A hot spare is a drive that is reserved to replace a failed drive in a redundant array. In the event of
a drive failure, the hot spare replaces the failed drive and the array is rebuilt automatically. Until
it becomes an array member as a result of a failure, a hot spare can be unassigned using a
management utility.
Note: For the rebuild to complete successfully, a hot spare must be of the same size or larger
than the smallest drive in an array.
For the CERC SATA 1.5/2s, there is an Add/Delete Hotspares option in the controller BIOS.
However, the CERC SATA 1.5/2s supports only two-drive configurations, and no hot spares can be
assigned when the array is in an optimal state. This option can be used only in the case of a
degraded array, if there are problems starting a rebuild.
• Global: Protects any array that the spare drive has sufficient capacity to protect
• Dedicated: Protects only the array to which it has been assigned
Note: For a RAID 10 array, the system can use the same global hot spare to replace two
failed drives in the same array if the global hot spare is at least twice the size of the failed
drives in the array. This is not recommended because redundancy will be affected. When
assigning a global hot spare in a system with a RAID 10 array, a spare which is the same size
as the members of the array should be used.
When a drive in an array containing a dedicated hot spare fails, the spare is automatically used to
store the data contained on the failed drive if the spare has enough capacity. The spare becomes
a member of the array and will no longer be identified as a hot spare. If the spare is larger than
the drive it is replacing, the extra portion will remain unused.
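The sizing and scope rules for hot spares can be sketched as a simple eligibility check. It applies the rule stated above that a spare must be at least as large as the smallest drive in the array, and it distinguishes a global spare from a dedicated one. The array names, sizes, and function names are hypothetical.

```python
def spare_can_protect(spare_gb, member_sizes_gb):
    """A spare qualifies if it is at least as large as the smallest member."""
    return spare_gb >= min(member_sizes_gb)


def arrays_protected(spare_gb, arrays, dedicated_to=None):
    """Return the sorted names of arrays this spare can protect.

    arrays: dict mapping array name to a list of member sizes in GB.
    dedicated_to: array name for a dedicated spare, or None for a global spare.
    """
    if dedicated_to is None:
        candidates = arrays
    else:
        candidates = {dedicated_to: arrays[dedicated_to]}
    return sorted(
        name
        for name, members in candidates.items()
        if spare_can_protect(spare_gb, members)
    )


arrays = {"Array 00": [80, 80], "Array 01": [250, 250, 250]}
print(arrays_protected(80, arrays))                            # ['Array 00']
print(arrays_protected(250, arrays))                           # ['Array 00', 'Array 01']
print(arrays_protected(250, arrays, dedicated_to="Array 01"))  # ['Array 01']
```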
Automatic Failover
The automatic failover feature allows the controller to automatically rebuild an array when a failed
drive is replaced with a new drive. This feature applies only to fault tolerant arrays. In the CERC
SATA 1.5/6ch controller BIOS, to ensure that automatic failover is enabled, the following steps
can be performed:
1. At the BIOS screen, press the <Ctrl> + <A> keys together when prompted to enter the
Adaptec RAID Configuration Utility.
2. From the Options menu, select SATASelect Utility. Then select Controller
Configuration. Verify that Automatic Failover is enabled. If it is disabled, press <Enter>
to select the Enabled option. Press <Esc> to exit and choose Yes to save changes
made.
Cabling Practices
The following are some general cabling best practices that should be followed:
• Ensure that properly qualified cables are being used
• Ensure that the SATA cables are properly connected to the controller or SATA ports and
SATA hard drives and that there are no loose connections
• Ensure that the cables are not excessively bent
• Ensure that the cable lengths are appropriate for installation
• Examine the cables for cuts or exposed shielding
For the Linux operating system, the “Getconfig” tool can be used to retrieve issue information
when possible. Getconfig is a utility that pulls the hardware configuration data from various
sources on a Linux system. Retrieving the system data manually is very labor intensive, and
this tool fully automates the process.
Cables – All cables should be approved cables for the particular application. Pins should be
inspected and all external and internal SATA cables attached to the system should be reseated. If
any damage is identified on the cable pins, the female connection should be inspected as well for
resultant damage.
Power – Each system in the rack should be protected by an approved Uninterruptible Power
Supply (UPS) and it should be verified that each server has the proper amount of power. Low
voltage or power spikes can knock an array offline.
Firmware and Driver – A mismatch of firmware and driver could result in random controller
lockups, hangs or BSODs. The system should be at the current approved level. The latest
firmware and driver versions can be found on support.dell.com. When performing a firmware and
a driver update, it is recommended that the driver be updated before the firmware. The driver is
always backward compatible with older firmware. A new firmware with an older driver will usually
result in abnormal behavior.
Defective Hard Drive – In some cases, a drive can cause noise on the SATA bus and knock an
array offline. If this is the case, the diagnostics should fail the drive, unless the drive is producing
sufficient noise to render the bus inoperable. System logs should be reviewed for drive reported
errors. Drives can also be reseated to ensure good connections.
The following message is seen when the firmware detects that a new drive has been inserted
since the last boot:
The following Arrays have Missing or Rebuilding or Failed members and are degraded:
Array#4
The following message appears when an array is missing more than one drive and is failed.
The Following Arrays have missing required members and cannot be configured:
Array#4
The following message appears when the firmware has a problem with one of the attached drives.
Either the firmware cannot prepare the driver during boot up or a firmware kernel crash has
occurred:
“Fatal Error: Controller monitor failed. Controller not started. Press any key to continue.”
The CERC SATA 1.5/6ch and CERC SATA 1.5/2s user guides available on support.dell.com can
be used to find more troubleshooting tips and guidelines.
For the CERC SATA 1.5/6ch, the Automatic Failover feature allows the controller to automatically
rebuild an array when a failed drive is replaced. This feature only applies to fault-tolerant arrays.
This feature is enabled by default. If disabled, the array will need to be rebuilt manually. The drive
replacement should be done when the system is turned off (except in the case of the PV745N,
which supports hot plugging). A hot spare is a disk that is not used in data storage, but is
reserved for use as a replacement for one of the other drives in the array in the event of a failure.
If the rebuild does not start automatically, storage management utilities like Array
Manager or OMSM can be used to perform a rescan and trigger the rebuild. If the rebuild still
does not start, a manual rebuild should be attempted. In the controller BIOS, the <CTRL><S>
function can be used to assign the newly inserted drive as a dedicated hot spare or <CTRL><G>
can be used to assign it as a global hot spare for the array to be rebuilt. Once the hot spare is
assigned, the rebuild should start. Storage management utilities can also be used to assign a
drive as a hot spare. If an error message displays while assigning the hot spare, initialize the
newly inserted hard drive first to erase the old configuration data from the previous usage.
Note: Initializing a hard drive will destroy all the data on that drive.
<CTRL><R> can only be used when the array status is failed. All the original array
member disks must be present in the system. Drives grayed out under Array Members can be
considered to be original members of the failed array. <CTRL><R> can incorporate these
drives back into the array. If the array is in a degraded state, a hot spare should be assigned
to initiate a rebuild to restore the array to its optimal status. <CTRL><R> cannot be used to
recover RAID 0 arrays.
Note: <CTRL><R> cannot guarantee the consistency of the data. The integrity of the data will
need to be verified. This option should be used only to try to recover the data. The data may be
lost permanently.
1. At the BIOS screen, press <CTRL><A> to enter the BIOS RAID Configuration Utility.
2. From the Options menu, select Array Configuration Utility. Then select Manage
Arrays.
3. Choose the desired array under List of Arrays. Press <CTRL><R> to Enable/Restore
RAID. When the warning message appears, type Y to continue. Back up as much data as
possible from the recovered array.
The Enable/Restore RAID function is also available in Array Manager and OMSM and is referred
to as Restore Dead Disk Segments. If the OS is up and running, this option can be used to
force the drives online.
1. Delete the array. When asked whether to delete the boot sectors, select NONE.
2. Re-create an array with the same size as the one that failed.
3. Check if the system can be booted to the OS.
4. If the system can be booted to the OS, perform a backup of all the data required. Then,
perform a Verify Disk Media (in the BIOS) or Dell Diagnostics hard drive long DST test on
the originally problematic drives.
5. If the test passes, re-create the array, perform an Array Verify, and restore the data from
the backup.
6. If the test fails, replace the hard drives that fail the hard drive diagnostics.
Note: The preceding restoration process is not a design feature of the CERC SATA 1.5/2s and it
cannot be guaranteed to help restore a system in a failed state.
To determine if a double fault scenario has happened under Array Manager or OMSM:
If a rebuild fails, the array disks should be first checked. If a drive other than the one that was
replaced or re-inserted to fix the original issue appears “Degraded” or “Offline”, this indicates the
double fault scenario. If a drive that was re-inserted or replaced appears “Degraded” or “Offline”,
then follow the steps described in the Recovering from Arrays in a Degraded State section.
Alternatively, under the Events log, the ID of the hard drive which showed the medium error can
be checked. If the ID is that of an existing drive in the array, one that was not replaced or re-
inserted, this indicates the double fault scenario. The medium error and rebuild failure error
messages will appear as shown below:
Note: When rebuild fails due to a double fault scenario, it is advisable to back up all critical data,
re-create the array, and restore the data. To avoid this scenario in the future, consistency checks
should be scheduled on a regular basis.
To determine if a double fault scenario has happened under the controller BIOS:
If a rebuild fails, check the Array Members under Array Properties. If one of the existing array
disks, one that was not replaced or re-inserted, appears to be grayed out or missing, this
indicates the double fault scenario.
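The diagnostic rules above can be summarized as a small decision function. The default status strings are those quoted for Array Manager and OMSM; for the controller BIOS view, the status names from the Drive Status list ("Grayed out", "Missing member") can be passed instead. The drive IDs, return strings, and function name are illustrative and do not correspond to any Dell tool.

```python
def classify_rebuild_failure(replaced_drive, status_by_drive,
                             bad_states=("Degraded", "Offline")):
    """Classify a failed rebuild from the per-drive status listing.

    replaced_drive: ID of the drive that was replaced or re-inserted.
    status_by_drive: dict mapping drive ID to the status shown by the utility.
    """
    other_bad = [
        drive
        for drive, status in status_by_drive.items()
        if drive != replaced_drive and status in bad_states
    ]
    if other_bad:
        # An existing member is also in trouble: the double fault scenario.
        return "double fault"
    if status_by_drive.get(replaced_drive) in bad_states:
        # Only the replaced drive is affected: treat as a degraded array.
        return "replaced drive problem"
    return "undetermined"


print(classify_rebuild_failure("0:2", {"0:0": "Optimal", "0:1": "Offline", "0:2": "Degraded"}))
# double fault
print(classify_rebuild_failure("0:2", {"0:0": "Optimal", "0:1": "Optimal", "0:2": "Offline"}))
# replaced drive problem
```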
Rebuilding
If a rebuild fails due to a double fault scenario, the rebuild will not kick off again even if a drive is
assigned as a hot spare. This is working as designed. In a double fault scenario, the firmware is
unable to generate the parity for the stripe due to a bad block on the secondary drive (double fault
on an array), thus disabling rebuilding on the replaced new drive (in other words, it does not kick
off rebuild at all even if the drive is replaced again with another drive).
In some cases, a rebuild might not kick off at all on the new or original drive even if there’s no
double fault scenario. To recover from this situation, initialize the drive and/or assign the
replaced or original drive as a hot spare to kick off the rebuild process.
i. The replaced drive is bad. In this case, run hard drive diagnostics on the
replaced drive to verify if the drive is truly bad before replacing it again.
ii. One of the existing drives, in a degraded volume, has a bad block (Double
Fault Scenario). In this case, depending on which virtual disk is affected due
to the bad block, the rebuild will fail ONLY on the affected Virtual Disk.
However, it will continue and complete on the rest of the virtual disks.
a. If a user reinserts the same drive or replaces it with a new one, the
rebuild will not restart for ONLY those virtual disks that had failed
earlier (due to dual failure scenario).
b. Array Manager logs the following error in the AM log:
Perc2Pro 544 CERC SATA1.5/6ch Controller 0 , Virtual Disk (OS
0) rebuild failed
This problem can be avoided by updating the CERC SATA 1.5/6ch firmware to
version 4.1.0.7417 or later. It is also recommended that regular consistency checks
be scheduled to avoid running into rebuild failure issues.
3. A RAID 1 rebuild may not start and may generate a stop error on a CERC SATA
1.5/2s system.
If a drive fails in a RAID 1 and the rebuild option is selected within OMSS 1.0, the
rebuild may not start or may generate a “stop error”.
The server should be restarted and the Configure Drives option should be selected
in the controller BIOS. The new disk should be selected, followed by the Add/Delete
Hotspare option. The new disk should then be selected again. After rebooting the
system, the virtual disk will rebuild automatically when the operating system starts.
This procedure can also be performed when the drive is first replaced, in which case,
no operating system boot will be required.
All newer OMSS versions have a fix for this issue. No hardware replacements should
be required.
Note: Virtual disks or arrays larger than 2 terabytes (TB) cannot be created on any Dell CERC
controller. The SATA specification supports 2 TB Virtual Disk (array). However, the 2 TB limitation
on CERC SATA is imposed due to BIOS, driver, and Application Programming Interface (API)
restrictions. Currently, Dell does not have any plans to support Virtual Disks larger than 2 TB on
the CERC SATA 1.5/6ch or the CERC SATA 1.5/2s. Due to this limitation, certain Capacity Expansion
(CE) or RAID Level Migration (RLM) operations beyond the 2 TB limit may not work.
Capacity Expansion
Capacity Expansion involves adding a physical disk member to an existing RAID array and
expanding the logical drive by utilizing the additional capacity. CE also allows expansion of the
logical drive by utilizing the unused space in the existing drives, without inserting a new drive.
Windows Server 2003, Windows 2000, and Netware also support Online Capacity Expansion
(OCE). Upon completion of an array expansion, the additional capacity can be used without
restarting the system. This feature is not available in the CERC SATA 1.5/2s.
The following are the basic procedures to follow when expanding an array. There are two ways
to expand an array. The first is to increase the hard drive size, for example by
upgrading all 80GB hard drives to 250GB or 400GB hard drives. The second is to increase
the total number of hard drives that make up the array, for example by adding a drive
to a three-drive RAID 5 to make a four-drive RAID 5.
Note: Before any array expansion operation, all critical data should be backed up in the event of
an array reconstruction failure.
Note: Array expansion cannot be done via the BIOS. A storage management application such as
Array Manager or OMSM is required.
Note: The CERC SATA 1.5/6ch does not support hard drive sizes larger than or equal to 1 TB.
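Before planning an expansion, it can help to estimate the resulting capacity and compare it with the limits noted in this document. The following is a minimal sketch using the standard usable-capacity formulas for each RAID level. It assumes equal-size drives, decimal units (1 TB = 1000 GB), and a single virtual disk spanning the whole array; the function and constant names are hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units, as drive capacities are usually quoted
MAX_VIRTUAL_DISK_GB = 2 * GB_PER_TB  # virtual disks larger than 2 TB cannot be created
MAX_DRIVE_GB = 1 * GB_PER_TB         # drives of 1 TB or larger are not supported


def usable_capacity_gb(raid_level, drive_count, drive_gb):
    """Return the usable capacity of an array built from equal-size drives."""
    if raid_level == 0 and drive_count >= 2:
        return drive_count * drive_gb            # striping, no redundancy
    if raid_level == 1 and drive_count == 2:
        return drive_gb                          # one mirrored pair
    if raid_level == 5 and drive_count >= 3:
        return (drive_count - 1) * drive_gb      # one drive's worth of parity
    if raid_level == 10 and drive_count >= 4 and drive_count % 2 == 0:
        return (drive_count // 2) * drive_gb     # striped mirrored pairs
    raise ValueError("unsupported RAID level or drive count")


def expansion_problems(raid_level, drive_count, drive_gb):
    """List the documented limits that a planned configuration would violate."""
    problems = []
    if drive_gb >= MAX_DRIVE_GB:
        problems.append("drive size is 1 TB or larger")
    if usable_capacity_gb(raid_level, drive_count, drive_gb) > MAX_VIRTUAL_DISK_GB:
        problems.append("virtual disk would exceed 2 TB")
    return problems
```

For example, `usable_capacity_gb(5, 3, 80)` gives 160 GB for a three-drive RAID 5 of 80GB drives, and adding a fourth drive raises this to 240 GB. A six-drive RAID 0 of 400GB drives would total 2400 GB, so `expansion_problems(0, 6, 400)` reports that the virtual disk would exceed 2 TB.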
RAID 0 Array:
1. Back up all data, replace the existing drives with the new drives, re-create the array, and
restore the data on the new array from backup.
RAID 1 Array:
1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Perform a Capacity Expansion.
Note: This series of steps may be very time consuming depending on the size of the drives and
the system utilization by applications, due to the dual rebuild cycles required. Alternatively, the
following steps can be performed to reduce the number of rebuild cycles:
1. Back up all data.
2. Delete the existing array and create a new RAID 1 array.
3. Restore the data on the new array from backup.
RAID 5 Array:
1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Repeat this process until all the drives are replaced.
7. Perform a Capacity Expansion.
Note: This series of steps may be very time consuming depending on the number of drives in the
RAID 5 array, the system utilization by applications, and the size of the drives, due to the multiple
rebuild cycles required. Alternatively, the following steps can be performed to reduce the number
of rebuild cycles:
1. Back up all data.
2. Delete the existing array and create a new RAID 5 array.
3. Restore the data on the new array from backup.
RAID 10 Array:
1. Back up all data (Recommended).
2. Remove the first drive and add the replacement drive.
3. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
4. Upon rebuild completion, remove the second drive and add the replacement drive.
5. Perform a rebuild process (either manually or automatically if auto rebuild is enabled).
6. Repeat this process until all the drives are replaced.
7. Perform a Capacity Expansion.
Note: This series of steps may be very time consuming depending on the size of the drives and
system utilization by applications, due to the multiple rebuild cycles required. Alternatively, the
following steps can be performed to reduce the number of rebuild cycles.
1. Back up all data.
2. Delete the existing array and create a new RAID 10 array.
3. Restore the data on the new array from backup.
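The trade-off between the drive-by-drive procedure and the backup-and-restore alternative can be made concrete by counting rebuild cycles. The following is a minimal sketch that lists the drive-by-drive steps for an array of a given size and estimates the total rebuild time. The hours-per-rebuild figure is a hypothetical input that must be measured on the actual system; it is not a value from this document.

```python
def drive_swap_plan(drive_count):
    """List the steps for replacing every member drive one at a time."""
    if drive_count < 2:
        raise ValueError("a redundant array needs at least two drives")
    steps = ["Back up all data (recommended)."]
    for position in range(1, drive_count + 1):
        steps.append("Replace drive %d of %d with the larger drive."
                     % (position, drive_count))
        steps.append("Rebuild and wait for completion.")
    steps.append("Perform a Capacity Expansion.")
    return steps


def rebuild_cycles(drive_count):
    """One full rebuild is needed for every drive that is replaced."""
    return sum(1 for step in drive_swap_plan(drive_count)
               if step.startswith("Rebuild"))


def estimated_rebuild_hours(drive_count, hours_per_rebuild):
    """Total rebuild time, given a measured duration for a single rebuild."""
    return rebuild_cycles(drive_count) * hours_per_rebuild
```

A RAID 1 array needs two rebuild cycles, while a four-drive RAID 5 or RAID 10 array needs four. If the total estimate exceeds the time needed to back up, re-create, and restore the array, the alternative procedure is likely the better choice.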
Note: When a failed drive is replaced with a drive that is larger than the rest of the drives in
the array, the size of the virtual disk will not increase, due to disk coercion. The leftover
space on the new drive, however, is available for use by other virtual disks in the
system.
RAID 1 Array:
No additional drives can be added to a RAID 1 array, because by definition, it is formed using
only 2 drives.
RAID 5 Array:
1. Back up all data (Recommended).
2. Select the additional drives to be added.
3. Perform an Array Reconfiguration operation.
The CERC SATA 1.5/6ch supports modifying existing arrays by expansion, migration from one
array type to another, and changing the stripe size. These migration scenarios are described in
Table 6.
Current Array Type    New Array Type
RAID 0                RAID 5 or 10
RAID 1                RAID 0 or 5 or 10
RAID 5                RAID 0 or 10
RAID 10               RAID 0 or 5
RAID Level Migration (RLM) can occur by migrating from a lower redundancy RAID level to a higher
level or from a higher redundancy RAID level to a lower level, both without taking the array
offline. Both types of RLM must involve migration to an array with a capacity greater than or
equal to the original array. This can be done by combining the RLM operation with the CE
operation. Figures 3, 4 and 5 illustrate an RLM from a 2-drive RAID 1 array to a 4-drive RAID 5 array.
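The migration table and the capacity rule above can be expressed as a simple lookup. The following is a minimal sketch; the dictionary and function names are hypothetical, and the capacities in the usage example assume 80GB drives for illustration only.

```python
# Supported migrations, transcribed from the migration table in this section.
ALLOWED_MIGRATIONS = {
    "RAID 0": {"RAID 5", "RAID 10"},
    "RAID 1": {"RAID 0", "RAID 5", "RAID 10"},
    "RAID 5": {"RAID 0", "RAID 10"},
    "RAID 10": {"RAID 0", "RAID 5"},
}


def can_migrate(current_type, new_type, current_capacity_gb, new_capacity_gb):
    """Check a planned RLM against the table and the capacity rule.

    The new array must be of a supported target type and must have a
    capacity greater than or equal to that of the original array.
    """
    type_allowed = new_type in ALLOWED_MIGRATIONS.get(current_type, set())
    capacity_ok = new_capacity_gb >= current_capacity_gb
    return type_allowed and capacity_ok
```

For the example in this section, a 2-drive RAID 1 array of 80GB drives (80 GB usable) migrating to a 4-drive RAID 5 array (240 GB usable) passes both checks, so `can_migrate("RAID 1", "RAID 5", 80, 240)` returns `True`. A migration from RAID 0 to RAID 1 is not listed in the table and returns `False`.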
Figure 4: Selecting the Attributes for the Reconfigured Virtual Disk
2. A second issue involved the CERC SATA 1.5/6ch crashing the Windows Server when
morphing a RAID 5 array. This happened when the morph destination hard drive
sequence did not match the source hard drive sequence on the same set of hard drives.
This issue has been fixed in the latest firmware version 4.1.0.7417.
Note: To avoid the issues listed above and other issues related to array morphing, the
CERC SATA 1.5/6ch firmware should be updated to the latest version, 4.1.0.7417, which can
be found on support.dell.com.
SECTION 6: PERFORMANCE
The CERC SATA 1.5/2s is a software-based RAID implementation and has no internal cache
memory. This software implementation is integrated within the driver, which contains the code to
run the RAID engine within the OS environment. Driver-based RAID depends completely on the
resources of the system processor and memory for RAID execution, and may affect system
performance in high CPU utilization environments. There is also a known issue with Windows
Server 2003: if a system is running the native Microsoft driver atapi.sys instead of the CERC
SATA 1.5/2s driver, the system may run in Programmed Input/Output (PIO) mode, which may
lead to poor performance. For all CERC SATA 1.5/2s RAID implementations, it is recommended
that the latest aarich.sys driver, found on support.dell.com, be used.
The CERC SATA 1.5/6ch is a hardware RAID implementation, in which dedicated hardware with
embedded firmware is used to control the RAID operations. The performance of a hardware RAID
solution is dependent on the processing power of the controller’s I/O processor and the cache
size, unlike software RAID, whose performance is directly dependent on server CPU performance
and load.
Cache is fast-access memory that serves as intermediate storage for data that is read from or
written to the drives. Two caches can affect performance: the hard drive
cache and the controller cache. The hard drive cache is enabled by default. With the hard drive
cache enabled, a performance gain of up to 40% on write commands can be obtained. The
CERC SATA 1.5/6ch has 64 MB of fixed ECC SDRAM controller cache memory, which, when
enabled, can significantly improve sequential and random write performance.
The I/O throughput of the CERC SATA 1.5/6ch is mostly determined by the attached hard drive
performance, the onboard I/O processor’s processing power, and the onboard cache size. The
local CPU speed and memory size have little effect on the RAID storage subsystem throughput.
However, if a machine is running applications that consume a lot of system memory and free
memory becomes scarce, this could affect the RAID subsystem’s operations.
The following are the main SATA configuration options that may affect the performance of the
CERC SATA 1.5/6ch.
Note: When Write Cache is enabled, there is a potential for data loss or corruption during
a power failure. A UPS solution is recommended to ensure fault tolerance.
• DMA (Default: ENABLED) – When enabled, Direct Memory Access (DMA) mode is used
for the drive, providing maximum performance.
• Allow Read Ahead (Default: ENABLED) – When enabled, the drive’s read ahead cache
algorithm is used, providing maximum performance under most circumstances.
• Stripe Size (Default: 64KB) – The default stripe size gives the best overall performance
in most network environments.
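To illustrate why stripe size affects performance, the following is a minimal sketch of how a striped (RAID 0 style) layout maps a byte offset to a drive. It assumes a simple round-robin layout and uses a 64 KB stripe as an illustrative value; the controller's actual on-disk layout is not described in this document, and the function names are hypothetical.

```python
KB = 1024


def locate(offset_bytes, stripe_size_bytes, drive_count):
    """Map a byte offset in the virtual disk to (drive, stripe row, offset in stripe)."""
    stripe_index = offset_bytes // stripe_size_bytes
    drive = stripe_index % drive_count        # stripes rotate across the drives
    row = stripe_index // drive_count         # how many full rows precede this stripe
    offset_in_stripe = offset_bytes % stripe_size_bytes
    return drive, row, offset_in_stripe


def drives_touched(offset_bytes, length_bytes, stripe_size_bytes, drive_count):
    """Count how many drives a single request has to access."""
    first_stripe = offset_bytes // stripe_size_bytes
    last_stripe = (offset_bytes + length_bytes - 1) // stripe_size_bytes
    return min(drive_count, last_stripe - first_stripe + 1)
```

With a 64 KB stripe on four drives, a 4 KB request touches a single drive, so several small requests can be serviced by different drives at the same time. A 256 KB sequential request spans four stripes and is spread across all four drives.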
Write Cache
The write cache policy for the CERC SATA 1.5/6ch is usually set during creation of a Virtual
Disk. This policy can be changed using an array management utility such as Array Manager or
OMSM. The hard drive cache is enabled by default. With the hard drive cache enabled, a
performance gain of up to 40% on write commands can be obtained.
The write cache policy cannot be changed with Array Manager 3.6 or earlier. Array Manager 3.7
provides a way, via a registry change, to enable the write cache on the CERC SATA 1.5/6ch
controller. This registry change allows an Array Manager user to run the Change Policy virtual
disk command and select the Write Cache Enabled Always setting. The setting can be enabled
only on CERC SATA 1.5/6ch controllers, and it does not require re-creating the Virtual Disks.
Before making any registry changes, it is recommended that all critical data be backed up.
The Write Cache Enabled Always setting can lead to loss of cached data: data in the write cache
will be lost if power is lost to the server. This setting should be used only when there is a
UPS battery backup for the system. Even with UPS battery backup for the server, there is no
guarantee that cached data will not be lost during a power failure. This setting should be
selected only on virtual disks that contain non-critical data or data where the potential for
data loss will not be catastrophic.
Note: This functionality was added with Array Manager 3.7 and will not work with earlier versions.
Array Manager 3.7 can only be installed while installing OMSA 4.3 and above.
Appendix: SATA BEST PRACTICES
CERC SATA 1.5/6ch Controller Specifications
Minimum System Requirements
Server or workstation with one universal PCI slot and a motherboard and BIOS that comply with
the PCI Local Bus Specification, Revision 2.2 and provide large memory-mapped address
ranges.
Controller Specifications
Component                    Description
Computer bus                 32 or 64-bit PCI local bus
On-board processors          Intel 80302 Intelligent I/O Processor
                             Three Silicon Image SI3512 dual SATA 1.0 controllers with command queuing
Cache memory                 64 MB, fixed ECC SDRAM
Data safety                  Audible alarm
Device protocol              SATA 1.0 and SATA II
RAID levels                  RAID 0, RAID 1, RAID 10, RAID 5, and Simple volume
Container (array) support    Up to 64 containers per controller; 64 partitions maximum per container
PCI bus                      64-bit, 66 MHz (32-bit, 33 MHz-compatible)
SATA channels                Six internal channels
Device support               Up to six SATA devices per controller (1 per channel)
                             Supports a RAID container as a boot device
CERC SATA 1.5/2s Controller Specifications
Controller Specifications
Component                    Description
SATA Controller              Integrated ICH5R (SC1420/SC1425/PE1800) and ICH6R (SC420/PE800)
Cache memory                 None
Device protocol              SATA 1.0
RAID levels                  RAID 0, RAID 1, Up to two single configured drives
Container support            One container. One logical drive
SATA channels                Two SATA ports
Device support               One SATA drive per port, maximum two HDDs
                             Supports Logical Drive as boot device
SMART Support                Yes
Setting Up Automated Scheduling of Consistency Checks on Windows Systems
1. For systems with a Windows operating system and Array Manager installed, you can use the
Scheduled Tasks option from the menu under the Accessories folder. Double-click
Add Scheduled Task and the following wizard will appear:
3. Click Browse and locate the file amcli.exe. The AMCLI executable is located in the Array
Manager installation directory.
4. Select the file and click OK.
5. Click Next and the following screen will appear:
6. Enter a name for this task and select how often the task should be run. It is recommended
that this task be run at least once a month.
7. Click Next and the following screen will appear:
8. Select the time at which the Consistency Check should run. The check affects system
performance, so schedule it for a period of low activity.
9. Click Next and the following screen will appear:
10. Fill in the name and password fields appropriately so the task can be executed correctly.
11. Click Next and the following screen will appear:
12. Select the checkbox for Open advanced properties for this task when I click Finish.
13. Click Finish and the following screen will appear:
14. In the Run textbox, you can type the command and its parameters. The following is example
syntax for scheduling a consistency check on virtual disk 1: "C:\PathName\amcli.exe" /c1
where PathName is the path to the AMCLI executable. This is the command that the
scheduler executes every time it runs this event.
15. To run a consistency check, the parameter is amcli /cn where the c option indicates
consistency check and n is the number of a virtual disk as displayed in the Array
Manager tree view.
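When several virtual disks need a scheduled check, the Run command for each task can be generated rather than typed by hand. The following is a minimal sketch that builds the command string using only the /cn syntax described above. The function name is hypothetical, and the path shown is the placeholder from this document, not a real installation path.

```python
def consistency_check_command(amcli_path, virtual_disk_number):
    """Build the Run command for a consistency check on one virtual disk.

    amcli_path is the full path to amcli.exe. virtual_disk_number is the
    number of the virtual disk as displayed in the Array Manager tree view.
    """
    if not isinstance(virtual_disk_number, int) or virtual_disk_number < 0:
        raise ValueError("virtual disk number must be a non-negative integer")
    # Quote the path so that directory names containing spaces are handled.
    return '"{}" /c{}'.format(amcli_path, virtual_disk_number)


# Placeholder path from this document; substitute the actual Array Manager
# installation directory.
command = consistency_check_command(r"C:\PathName\amcli.exe", 1)
```

Here `command` holds the same string as the example in step 14, and it can be pasted into the Run textbox of the scheduled task.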