04 - Data Movement Process
Understanding how CommVault® software moves data within the production and protected environment is
essential to configuring the physical and logical environments correctly and to improving performance.
Chunks
CommVault software writes protected data to media in chunks. During data protection jobs, as each chunk is
written to media, indexes and the CommServe database are updated. For index-based jobs this creates points from
which a job can be resumed if network, client, or Media Agent problems occur; the job can continue
from the most recent successfully written chunk. It also allows indexed jobs to be recovered up to the most recent chunk
if the job fails to complete. This partial recovery is performed using the restore by job option, which can recover
data up to the most recent successfully written chunk.
As a general rule, the larger the chunk size, the more efficient the protection operation. When jobs run over
unreliable links, such as WAN backups, decreasing the chunk size may improve overall performance: if a
disruption occurs during the job, any data written to media since the last chunk update to the index cache and
CommServe server must be rewritten, so a smaller chunk size limits the amount of data that has to be
re-transmitted over the link. For reliable clients, Media Agents, and networks, a larger chunk size will improve
performance.
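As a rough illustration of this tradeoff, the sketch below estimates how much data must be re-sent at different chunk sizes. The interruption count and the assumption that, on average, half a chunk is uncommitted when a disruption occurs are illustrative only, not CommVault behavior guarantees.

# Illustrative sketch: expected re-transmission cost vs. chunk size.
# All numbers below are hypothetical examples, not CommVault defaults.

def expected_retransmit_gb(chunk_gb, interruptions):
    """Each interruption re-sends whatever was written since the last
    committed chunk boundary -- on average, about half a chunk."""
    return interruptions * (chunk_gb / 2.0)

for chunk_gb in (0.5, 2, 4, 16):
    wasted = expected_retransmit_gb(chunk_gb, interruptions=5)
    print(f"chunk size {chunk_gb:>4} GB -> ~{wasted:5.1f} GB re-sent")

Over a reliable link the interruption count approaches zero and the retransmission penalty disappears, which is why larger chunks win in that case.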
Chunk sizes are determined by the job type being performed and the media being used. Depending on the media type,
the default chunk sizes are as follows:
Tape media writes chunks based on whether the job type is indexed or non-indexed:
o 4 GB chunks for index-based backups.
o 16 GB chunks for non-indexed backups.
Tape Media – Chunk size can be globally set in the Media Management applet in Control Panel.
Tape or Disk Media – Chunk size can be set at the data path level in the storage policy copy. This is
done in the data path properties in the Data Path tab of the policy copy. Settings at the data path level
will override settings configured in Control Panel.
Blocks
Block size determines the size of the blocks used when writing data to protected storage. This size can be modified to meet
hardware requirements and to improve performance. The default block size CommVault software uses is 64 KB.
Block size can be set in the data path properties in the Data Path tab of the storage policy copy. It is important to
note that block size is hardware dependent: increasing this setting requires all network cards,
Host Bus Adapters, switches, operating systems, and drives to support the block size. Consider this aspect not
only in the production environment but also at any DR locations where recovery operations may be performed.
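The sketch below shows one way to sanity-check a proposed block size against every component in the path. The component list and limits are hypothetical examples, not measured values.

# Hypothetical sanity check: a block size is only usable if every
# component along the data path supports it.
supported_kb = {
    "network card": 256,
    "HBA": 256,
    "switch": 128,   # the bottleneck in this example
    "operating system": 1024,
    "tape drive": 256,
}

proposed_kb = 256
limit = min(supported_kb.values())
if proposed_kb <= limit:
    print(f"{proposed_kb} KB blocks are supported end to end")
else:
    print(f"unsupported: the path limit is {limit} KB")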
Data Interface Pairs
Data interface pairs (DIPs) define dedicated source and destination network paths for data movement. They can be configured in the following ways:
Job Configuration tab of client properties – can be used to configure source and target paths for a
client.
Data Interface Pairs applet in Control Panel – can be used to configure source and target paths for clients
and Media Agents.
DataIFPairs.exe – this resource pack utility allows bulk entry of multiple DIPs using an answer file.
The following diagram illustrates the use of data interface pairs from client to Media
Agent for primary backups, and from Media Agent to Media Agent when auxiliary copies run
to generate secondary copies. When multi-streaming jobs, streams can use separate
physical connections from source to destination to improve performance.
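To make the concept concrete, here is a minimal sketch of how a DIP lookup could behave, with a fallback to default hostnames when no pair is configured. The table, host names, and function are illustrative, not CommVault internals.

# Conceptual sketch of data interface pair resolution. The pair table
# and host names below are hypothetical examples.
dips = {
    # (source, destination) -> (source interface, destination interface)
    ("client1", "ma1"): ("client1-bkpnet", "ma1-bkpnet"),
}

def resolve_interfaces(source, destination):
    """Prefer a configured DIP; otherwise fall back to the defaults."""
    return dips.get((source, destination), (source, destination))

print(resolve_interfaces("client1", "ma1"))  # dedicated backup network
print(resolve_interfaces("client2", "ma1"))  # no DIP: default interfaces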
Data Streams
Data Streams are what CommVault software uses to move data from source to destination. The source can be
production data or CommVault protected data. A destination stream will always be to CommVault protected
storage. Understanding the data stream concept will allow a CommCell environment to be optimally configured
to meet protection and recovery windows. This concept will be discussed in great detail in the following sections.
The following diagram illustrates the stream movement process from source to
destination. One or more read operations are performed on source data which is then
moved to the Media Agent as job streams. The Media Agent writes the data to protected
storage as device streams.
Multiple Subclients
There are many advantages to using multiple subclients in a CommCell environment. These advantages are
discussed throughout this book; this section focuses only on the performance aspects of using multiple
subclients.
Running multiple subclients concurrently allows multi-stream reads and data movement during protection
operations. This can improve data protection performance and, when multi-stream restore methods are used,
recovery times as well. Using multiple subclients to define content is useful in the following
situations:
Using multiple subclients to define data on different physical drives – This method optimizes read
performance by isolating subclient contents to specific physical drives; when the subclients run
concurrently, each reads content from a specific drive.
Using multiple subclients for iDataAgents that don’t support multi-stream operations – This
method can be used for agents such as the Exchange mailbox agent to improve performance by running
data protection jobs on multiple subclients concurrently.
Using multiple subclients to define different backup patterns – This method can be used when the
amount of data requiring protection is too large to fit into a single operation window. Different
subclients can be scheduled to run during different protection periods making use of multiple operation
windows to meet protection needs.
Multi-Stream Subclients
For iDataAgents that support multi-streaming, individual subclients can be set to use multiple read streams for
data protection operations. Depending on the iDataAgent being used, this is done through either the Data Readers
setting or the Data Streams setting.
Data Readers
Data Readers determine the number of concurrent read operations that will be performed when protecting a
subclient. By default, the number of readers permitted for concurrent read operations is based on the number of
physical disks available, with a limit of one reader per physical disk. If there is one physical disk with two logical
partitions, setting the readers to 2 will have no effect. Too many simultaneous read operations on a single
disk could cause the disk heads to thrash, slowing down read operations and potentially decreasing the
life of the disk. The Data Readers setting is configured in the General tab of the subclient and defaults to two
readers.
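A minimal sketch of the effective-reader rule just described follows; the function and its inputs are illustrative, not CommVault internals.

# Illustrative sketch of the reader limit described above.

def effective_readers(configured, physical_disks, allow_multiple=False):
    """One reader per physical disk, unless 'Allow multiple readers
    within a drive or mount point' is enabled on the subclient."""
    if allow_multiple:
        return configured
    return min(configured, physical_disks)

print(effective_readers(configured=4, physical_disks=2))   # -> 2
print(effective_readers(4, 2, allow_multiple=True))        # -> 4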
The following diagram illustrates a client using multiple readers defined through two
subclients to multi-stream data protection jobs. Subclient 1 is defined using two data
readers. Subclient 2 is defined with three data readers and Allow multiple readers within a
drive or mount point enabled.
Data Streams
Some iDataAgents will be configured using data streams and not data readers. For example, Microsoft SQL and
Oracle subclients use data streams to determine the number of job streams that will be used for data protection
operations. Data Streams are configured in the Storage Device tab of the subclient. Although they will be
configured differently in the subclient, they still serve the same purpose of multi-streaming data protection
operations.
There should not be more than one stream configured for each physical drive. By default, CommVault
software will automatically use one stream per physical drive regardless of the number of streams
configured. However, if Allow multiple readers within a drive or mount point is selected, the
software will use the number of streams specified. This can cause disk contention, which may slow down
performance and shorten the physical life of the disks.
If multiple subclients are configured to read from the same physical disk, consider scheduling the
subclients to run at different times to prevent contention on the disks.
Only increase the number of streams to help meet protection windows. Multi-streaming data protection
jobs is done to improve performance; if windows are being met, there is no reason to alter streams. Every
extra stream configured on source data uses a corresponding stream in the protected
environment, and using too many read streams may result in other jobs not being able to run until storage
stream resources become available.
When a job runs it can only use the number of streams that are currently available in the storage
environment. This means that if a subclient is configured to use four streams but the Media Agent and
storage only have two streams available, the job will use only two streams (see the sketch following this
list). You can use the advanced backup option Reserve Resources Before Scan to reserve the number of
streams configured for the job and ensure adequate streams are available. This option should only be used
for mission critical jobs, as the reserved streams remain locked for the duration of the job.
When multi-streaming a subclient for MS-SQL, DB2, or Sybase, the streams cannot be multiplexed to a
single tape. Each stream must be written to a separate tape, and an equivalent number of drives must be
available to restore all streams concurrently during restore operations. The streams can be combined onto
a tape during secondary copy auxiliary jobs, but they would then have to be pre-staged to a disk library
before being recovered. If multiple databases are being protected and performance needs to be improved,
it is recommended to use separate subclients for different databases. When multiple subclients each use
single stream operations, their streams can be multiplexed or combined to tape.
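As referenced above, here is a hedged sketch of the stream allocation rule; the function and its modeling of Reserve Resources Before Scan are illustrative, not CommVault internals.

# Sketch of stream allocation at job start. 'requested' is what the
# subclient is configured for; 'available' is what the Media Agent and
# storage currently have free.

def streams_granted(requested, available, reserve_before_scan=False):
    """Without reservation, the job simply runs with whatever is free.
    With Reserve Resources Before Scan, the full request is locked for
    the duration of the job (modeled here as returning the request)."""
    if reserve_before_scan:
        return requested
    return min(requested, available)

print(streams_granted(requested=4, available=2))          # -> runs with 2
print(streams_granted(4, 2, reserve_before_scan=True))    # -> waits, then 4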
Job Streams
Job Streams for primary data protection jobs are network streams running from the client to the Media Agent.
The number of concurrent job streams that can run in an environment is based on the number of streams Media
Agents are configured to accept, the number of streams a library will accept, and the number of storage policy
device streams configured. The number of job streams a Media Agent will accept is determined by the Maximum
number of parallel data transfer operations setting configured in the General tab of the Media Agent
properties. This option defaults to 100 streams and is dependent on the Optimize for concurrent LAN backups
option being selected in the Control tab; that option is enabled by default and it is recommended not to change
it. Library and storage policy device stream configuration will be discussed in detail in the next section.
Fields in the Job Controller can be customized to show additional information. The fields
Number of Data Readers and Data Readers in Use can be added to view the number of job
streams being attempted and used for each job. Refer to CommVault documentation for
more information on customizing the Job Controller.
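As a quick illustration of how the limits described above interact, the sketch below computes effective concurrency as the most constrained component; the numbers are hypothetical.

# The effective number of concurrent job streams is capped by the most
# constrained component. Values below are examples only.

def max_concurrent_job_streams(media_agent_limit, library_streams,
                               policy_device_streams):
    return min(media_agent_limit, library_streams, policy_device_streams)

# A Media Agent at its default of 100, a tape library with 4 drives,
# and a storage policy configured for 8 device streams:
print(max_concurrent_job_streams(100, 4, 8))   # -> 4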
Device Streams
As Job Streams are received by the Media Agent, data is put into chunk format and written to media as Device
Streams. The number of device streams that can be used will depend on the library type, library configuration,
and storage policy configuration.
The following diagram illustrates multiple job streams being multiplexed into device
streams within the Media Agent. Multiplexing to tape libraries can improve write
performance by keeping drive buffers filled allowing the drives to write faster.
The following diagram illustrates four job streams writing to a disk library with two
mount paths. Each job stream equates to a device stream when writing to disk libraries.
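The arithmetic behind the two diagrams can be sketched as follows; the multiplexing factor and stream counts are illustrative examples.

# Sketch of the job-stream to device-stream relationship. The
# multiplexing factor here is an example value.
import math

def device_streams_needed(job_streams, multiplexing_factor=1, disk=False):
    """Disk libraries map job streams to device streams 1:1; tape can
    interleave several job streams into one device stream."""
    if disk:
        return job_streams
    return math.ceil(job_streams / multiplexing_factor)

print(device_streams_needed(8, multiplexing_factor=4))   # -> 2 tape drives
print(device_streams_needed(4, disk=True))               # -> 4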
Job scheduling
On-demand jobs
Scripting
Job Scheduling
Jobs are scheduled for subclients through a dedicated schedule at the subclient or data set level, or through the use
of schedule policies. Whichever method is used, it is important to note that jobs always run at the subclient level. If
a schedule policy is used to back up five client file systems and each client has two subclients, a total of ten
jobs will run when the policy executes.
On-Demand Jobs
On-demand jobs can be run at the subclient or data set level. On-demand jobs run immediately.
Scripting
Scripts can be manually created or automatically generated through the CommCell console. Once a script is
created, it can be run at the client by manually executing it or by calling it from a separate
scheduling mechanism or another script. When a data protection script executes, it contacts the CommServe
server and, based on the script's parameters, the CommServe server will execute the job.
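For example, a console-generated script could be invoked from an external scheduler with a small wrapper like the hedged sketch below; the script path is a placeholder, not a real CommVault artifact.

# Hypothetical wrapper for launching a console-generated script from an
# external scheduler (cron, Task Scheduler, etc.).
import subprocess

SCRIPT = "/opt/backup_scripts/run_fs_backup.sh"   # placeholder path

try:
    result = subprocess.run([SCRIPT], capture_output=True, text=True)
    print("exit code:", result.returncode)
except FileNotFoundError:
    print(f"{SCRIPT} not found; substitute your saved script path")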
By associating subclients with secondary copies, data will be copied and managed in subclient groups
using the same media sets. When multiple subclients are associated with a copy, and when subclients are using
multiple streams, the management of the streams on media and the movement of the streams becomes important.
If the secondary copy is not configured correctly, media management requirements will not be met. One
method to properly consolidate and manage streams in secondary copies is to use the combine to streams option.
Combine to Streams
A storage policy can be configured to allow the use of multiple streams for primary copy backup. Multi-streaming
of backup data is done to improve backup performance. Normally, each stream used for the primary copy
requires a corresponding stream on each secondary copy. In the case of tape media for a secondary copy, multi-
stream storage policies will consume multiple media. The combine to streams option can be used to consolidate
multiple streams from source data on to fewer media when secondary copies are run. This allows for better media
management and the grouping of like data onto media for storage.
Example: You back up a home folders subclient to a disk library using three streams to maximize performance.
The total size of protected data is 600 GB. You want to consolidate those three streams onto a single 800 GB
capacity tape for off-site storage.
Solution: By creating a secondary copy and setting the Combine to Streams setting to 1 you will serially place
each stream onto the media.
Setting Combine to Streams to 1 will take streams A, B, and C and serially write them to
one tape.
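A toy version of the example follows, assuming the 600 GB is spread evenly across the three streams (an assumption for illustration only):

# Toy illustration of Combine to Streams = 1: source streams are
# written to the tape one after another instead of in parallel.
streams_gb = {"A": 200, "B": 200, "C": 200}   # 600 GB total (assumed split)
tape_capacity_gb = 800

used = 0
for name, size in streams_gb.items():
    used += size                               # streams land serially
    print(f"stream {name} written; tape at {used}/{tape_capacity_gb} GB")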
In some cases, using the combine to streams option may not be the best method to manage data. Multi-streaming
backup data is done to improve performance, but once those streams are consolidated to the same media set they can
only be recovered in a single stream operation. Though combining streams has a media consolidation benefit, it
will have a negative effect on restore performance.
Another reason not to combine streams is multi-streamed backups of SQL, DB2, and Sybase subclients.
When these iDataAgents use a single subclient with multi-streaming enabled, the streams must be restored in the
same sequence in which they were backed up. If the streams are combined to the same tape, they must be pre-staged to
disk before they can be recovered. In this case, not enabling combine to streams and placing each stream on
separate media bypasses the pre-staging of the data and also allows multiple streams to be restored concurrently,
making the restore process considerably faster. Note that this only applies to subclients that have been multi-
streamed. If multiple subclients have been single streamed and combined to media, they will NOT have to be pre-
staged prior to recovery.
Auxiliary copy
o Schedule
o On demand
o Save as script
o Automatic copy
Inline copy
Parallel copy
Deferred copy
An auxiliary copy operation can be scheduled, run on demand, saved as a script, or set to run automatically.
When configuring auxiliary copy operations, there are several options you can configure:
Automatic Copy
Most jobs run once per day, so a normal schedule can be used for auxiliary copies. The automatic copy option
allows you to set a check interval for source data to be copied. This can be a great advantage when jobs are being
run multiple times per day, or when you are unsure when the source data will be available for copy.
Example: A critical database is running transaction log backups every four hours. You want to run an auxiliary
copy of the source transaction logs to a secondary location, in this case a disk library off-site.
Solution: Schedule the transaction logs to back up every four hours. Then set the automatic copy option to check
for source data. If source data is present, the auxiliary copy will run, creating an additional copy of the data.
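Conceptually, the automatic copy behaves like the polling loop sketched below; 'new_jobs_to_copy' and 'run_auxiliary_copy' are placeholder callables for illustration, not CommVault APIs.

# Conceptual model of an automatic copy check interval. The two
# callables are placeholders for "is there uncopied source data?" and
# "start the auxiliary copy".
import time

def automatic_copy_loop(check_interval_minutes,
                        new_jobs_to_copy, run_auxiliary_copy):
    while True:
        if new_jobs_to_copy():
            run_auxiliary_copy()
        time.sleep(check_interval_minutes * 60)

# Example wiring (never terminates; illustration only):
# automatic_copy_loop(30, lambda: True, lambda: print("aux copy started"))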
Inline Copy
The Inline Copy feature allows you to create additional copies of data at the same time you are performing
primary backups. This feature can be useful when you need to get two copies of data done quickly. Data is passed
from the client to the Media Agent as job streams. The Media Agent then creates two sets of device streams, each
going to the appropriate library. This can be a quick method for creating multiple copies but there are some
caveats:
Inline Copy is not supported if Client Side Deduplication has been enabled.
If the primary copy fails the secondary copy will also fail.
Since both copies are made at the same time twice as many library resources will be required which may
prevent other jobs from running.
Since backup data is streamed, data is sent to both libraries simultaneously, which may degrade
overall performance. Basically, your job will only run as fast as the slowest resource.
Inline copy receives two streams from the client and sends those streams to two
different libraries.
Parallel Copy
A parallel copy will generate two secondary copy jobs concurrently when an auxiliary copy job runs. Both
secondary copies must have the Enable Parallel Copy option enabled and the destination libraries must be
accessible from the same Media Agent.
Two secondary copies are run in parallel through the Media Agent.
Deferred Copy
Deferring an auxiliary copy prevents the copy from running for a specified number of days. Setting this option
will result in data not aging from the source location, regardless of the retention on the source, until the auxiliary
copy is completed. This option is traditionally used in Hierarchical Storage Management (HSM) strategies, where
data remains in a storage policy copy for a certain period of time; after that period the data is
copied to another storage policy copy and deleted from the source once the copy completes. Although this
method mirrors how traditional HSM solutions worked, with CommVault software it is
recommended to copy data to multiple HSM copies to provide for disaster recovery as well as HSM archiving.
This concept will be discussed in more detail in the Compliance, Information Management & eDiscovery chapter.