04 - Data Movement Process
Understanding how CommVault® software moves data within the production and protected environment is
essential to configuring the physical and logical environments correctly and to improving performance.
Chunks
CommVault software writes protected data to media in chunks. During data protection jobs, as each chunk is
written to media, indexes and the CommServe database are updated. For index-based jobs this creates points from
which a job can be resumed if network, client, or Media Agent problems occur; the job can continue
from the most recent successfully written chunk. It also allows indexed jobs to be recovered up to the most recent chunk
if the job fails to complete. This partial recovery is performed using the restore by job option, which can recover
data up to the most recent successfully written chunk.
As a general rule, the larger the chunk size, the more efficient the protection operation. When jobs run over
unreliable links, such as WAN backups, decreasing the chunk size may improve overall performance: if a
disruption occurs during the job, any data written to media since the last chunk update to the index cache and
CommServe server must be rewritten, so a smaller chunk size limits the amount of data that has to be
re-transmitted over the link. For reliable clients, Media Agents, and networks, a larger chunk size will improve
performance.
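As a rough illustration of this tradeoff, the sketch below estimates how much data must be re-sent at different chunk sizes. The interruption count and the assumption that, on average, half a chunk is uncommitted when a disruption occurs are illustrative only, not CommVault behavior guarantees.

# Illustrative sketch: expected re-transmission cost vs. chunk size.
# All numbers below are hypothetical examples, not CommVault defaults.

def expected_retransmit_gb(chunk_gb, interruptions):
    """Each interruption re-sends whatever was written since the last
    committed chunk boundary -- on average, about half a chunk."""
    return interruptions * (chunk_gb / 2.0)

for chunk_gb in (0.5, 2, 4, 16):
    wasted = expected_retransmit_gb(chunk_gb, interruptions=5)
    print(f"chunk size {chunk_gb:>4} GB -> ~{wasted:5.1f} GB re-sent")

Over a reliable link the interruption count approaches zero and the retransmission penalty disappears, which is why larger chunks win in that case.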
Chunk sizes are determined by the job type being performed and the media being used. Depending on the media type,
the default chunk sizes are as follows:
Tape media writes chunks based on whether the job type is indexed or non-indexed:
o 4 GB chunks for index-based backups.
o 16 GB chunks for non-indexed backups.
Tape Media – Chunk size can be globally set in the Media Management applet in Control Panel.
Tape or Disk Media – Chunk size can be set at the data path level in the storage policy copy. This is
done in the data path properties in the Data Path tab of the policy copy. Settings at the data path level
will override settings configured in Control Panel.
Blocks
Block size determines the size of the blocks used when writing data to protected storage. This size can be modified to meet
hardware requirements and to improve performance. The default block size CommVault software uses is 64 KB.
Block size can be set in the data path properties in the Data Path tab of the storage policy copy. It is important to
note that block size is hardware dependent: increasing this setting requires all network cards,
Host Bus Adapters, switches, operating systems, and drives to support the block size. Consider this aspect not
only in the production environment but also at any DR locations where recovery operations may be performed.
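The sketch below shows one way to sanity-check a proposed block size against every component in the path. The component list and limits are hypothetical examples, not measured values.

# Hypothetical sanity check: a block size is only usable if every
# component along the data path supports it.
supported_kb = {
    "network card": 256,
    "HBA": 256,
    "switch": 128,   # the bottleneck in this example
    "operating system": 1024,
    "tape drive": 256,
}

proposed_kb = 256
limit = min(supported_kb.values())
if proposed_kb <= limit:
    print(f"{proposed_kb} KB blocks are supported end to end")
else:
    print(f"unsupported: the path limit is {limit} KB")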
Data Interface Pairs
Data interface pairs (DIPs) define dedicated source and destination network paths for data movement. They can be configured in the following ways:
Job Configuration tab of client properties – can be used to configure source and target paths for a
client.
Data Interface Pairs applet in Control Panel – can be used to configure source and target paths for clients
and Media Agents.
DataIFPairs.exe – this resource pack utility allows bulk entry of multiple DIPs using an answer file.
The following diagram illustrates the use of data interface pairs from client to Media
Agent for primary backups, and from Media Agent to Media Agent when auxiliary copies run
to generate secondary copies. When multi-streaming jobs, streams can use separate
physical connections from source to destination to improve performance.
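To make the concept concrete, here is a minimal sketch of how a DIP lookup could behave, with a fallback to default hostnames when no pair is configured. The table, host names, and function are illustrative, not CommVault internals.

# Conceptual sketch of data interface pair resolution. The pair table
# and host names below are hypothetical examples.
dips = {
    # (source, destination) -> (source interface, destination interface)
    ("client1", "ma1"): ("client1-bkpnet", "ma1-bkpnet"),
}

def resolve_interfaces(source, destination):
    """Prefer a configured DIP; otherwise fall back to the defaults."""
    return dips.get((source, destination), (source, destination))

print(resolve_interfaces("client1", "ma1"))  # dedicated backup network
print(resolve_interfaces("client2", "ma1"))  # no DIP: default interfaces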
Data Streams
Data Streams are what CommVault software uses to move data from source to destination. The source can be
production data or CommVault protected data. A destination stream will always be to CommVault protected
storage. Understanding the data stream concept will allow a CommCell environment to be optimally configured
to meet protection and recovery windows. This concept will be discussed in great detail in the following sections.
The following diagram illustrates the stream movement process from source to
destination. One or more read operations are performed on source data which is then
moved to the Media Agent as job streams. The Media Agent writes the data to protected
storage as device streams.
Multiple Subclients
There are many advantages to using multiple subclients in a CommCell environment. These advantages are
discussed throughout this book; this section focuses only on the performance aspects of using multiple
subclients.
Running multiple subclients concurrently allows multi-stream reads and data movement during protection
operations. This can improve data protection performance and, when multi-stream restore methods are used,
recovery times as well. Using multiple subclients to define content is useful in the following
situations:
Using multiple subclients to define data on different physical drives – This method optimizes read
performance by isolating subclient contents to specific physical drives; when the subclients run
concurrently, each reads content from a specific drive.
Using multiple subclients for iDataAgents that don’t support multi-stream operations – This
method can be used for agents such as the Exchange mailbox agent to improve performance by running
data protection jobs on multiple subclients concurrently.
Using multiple subclients to define different backup patterns – This method can be used when the
amount of data requiring protection is too large to fit into a single operation window. Different
subclients can be scheduled to run during different protection periods making use of multiple operation
windows to meet protection needs.
Multi-Stream Subclients
For iDataAgents that support multi-streaming, individual subclients can be set to use multiple read streams for
data protection operations. Depending on the iDataAgent being used, this is done through either the Data Readers
setting or the Data Streams setting.
Data Readers
Data Readers determine the number of concurrent read operations that will be performed when protecting a
subclient. By default, the number of readers permitted for concurrent read operations is based on the number of
physical disks available, with a limit of one reader per physical disk. If there is one physical disk with two logical
partitions, setting the readers to 2 will have no effect. Too many simultaneous read operations on a single
disk could cause the disk heads to thrash, slowing down read operations and potentially decreasing the
life of the disk. The Data Readers setting is configured in the General tab of the subclient and defaults to two
readers.
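A minimal sketch of the effective-reader rule just described follows; the function and its inputs are illustrative, not CommVault internals.

# Illustrative sketch of the reader limit described above.

def effective_readers(configured, physical_disks, allow_multiple=False):
    """One reader per physical disk, unless 'Allow multiple readers
    within a drive or mount point' is enabled on the subclient."""
    if allow_multiple:
        return configured
    return min(configured, physical_disks)

print(effective_readers(configured=4, physical_disks=2))   # -> 2
print(effective_readers(4, 2, allow_multiple=True))        # -> 4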
The following diagram illustrates a client using multiple readers defined through two
subclients to multi-stream data protection jobs. Subclient 1 is defined using two data
readers. Subclient 2 is defined with three data readers and Allow multiple readers within a
drive or mount point enabled.
Data Streams
Some iDataAgents will be configured using data streams and not data readers. For example, Microsoft SQL and
Oracle subclients use data streams to determine the number of job streams that will be used for data protection
operations. Data Streams are configured in the Storage Device tab of the subclient. Although they will be
configured differently in the subclient, they still serve the same purpose of multi-streaming data protection
operations.
There should not be more than one stream configured for each physical drive. By default, CommVault
software will automatically use one stream per physical drive regardless of the number of streams
configured. However, if Allow multiple readers within a drive or mount point is selected, the
software will use the number of streams specified. This can cause disk contention, which may slow down
performance and shorten the physical life of the disks.
If multiple subclients are configured to read from the same physical disk, consider scheduling the
subclients to run at different times to prevent contention on the disks.
Only increase the number of streams to help meet protection windows. Multi-streaming data protection
jobs is done to improve performance; if windows are being met, there is no reason to alter streams. Every
extra stream configured on source data uses a corresponding stream in the protected
environment, and using too many read streams may result in other jobs not being able to run until storage
stream resources become available.
When a job runs it can only use the number of streams that are currently available in the storage
environment. This means that if a subclient is configured to use four streams but the Media Agent and
storage only have two streams available, the job will use only two streams (see the sketch following this
list). You can use the advanced backup option Reserve Resources Before Scan to reserve the number of
streams configured for the job and ensure adequate streams are available. This option should only be used
for mission critical jobs, as the reserved streams remain locked for the duration of the job.
When multi-streaming a subclient for MS-SQL, DB2, or Sybase, the streams cannot be multiplexed to a
single tape. Each stream must be written to a separate tape, and an equivalent number of drives must be
available to restore all streams concurrently during restore operations. The streams can be combined onto
a tape during secondary copy auxiliary jobs, but they would then have to be pre-staged to a disk library
before being recovered. If multiple databases are being protected and performance needs to be improved,
it is recommended to use separate subclients for different databases. When multiple subclients each use
single stream operations, their streams can be multiplexed or combined to tape.
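As referenced above, here is a hedged sketch of the stream allocation rule; the function and its modeling of Reserve Resources Before Scan are illustrative, not CommVault internals.

# Sketch of stream allocation at job start. 'requested' is what the
# subclient is configured for; 'available' is what the Media Agent and
# storage currently have free.

def streams_granted(requested, available, reserve_before_scan=False):
    """Without reservation, the job simply runs with whatever is free.
    With Reserve Resources Before Scan, the full request is locked for
    the duration of the job (modeled here as returning the request)."""
    if reserve_before_scan:
        return requested
    return min(requested, available)

print(streams_granted(requested=4, available=2))          # -> runs with 2
print(streams_granted(4, 2, reserve_before_scan=True))    # -> waits, then 4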
Job Streams
Job Streams for primary data protection jobs are network streams running from the client to the Media Agent.
The number of concurrent job streams that can run in an environment is based on the number of streams Media
Agents are configured to accept, the number of streams a library will accept, and the number of storage policy
device streams configured. The number of job streams a Media Agent will accept is determined by the Maximum
number of parallel data transfer operations setting configured in the General tab of the Media Agent
properties. This option defaults to 100 streams and is dependent on the Optimize for concurrent LAN backups
option being selected in the Control tab; that option is enabled by default and it is recommended not to change
it. Library and storage policy device stream configuration will be discussed in detail in the next section.
Fields in the Job Controller can be customized to show additional information. The fields
Number of Data Readers and Data Readers in Use can be added to view the number of job
streams being attempted and used for each job. Refer to CommVault documentation for
more information on customizing the Job Controller.
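As a quick illustration of how the limits described above interact, the sketch below computes effective concurrency as the most constrained component; the numbers are hypothetical.

# The effective number of concurrent job streams is capped by the most
# constrained component. Values below are examples only.

def max_concurrent_job_streams(media_agent_limit, library_streams,
                               policy_device_streams):
    return min(media_agent_limit, library_streams, policy_device_streams)

# A Media Agent at its default of 100, a tape library with 4 drives,
# and a storage policy configured for 8 device streams:
print(max_concurrent_job_streams(100, 4, 8))   # -> 4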
Device Streams
As Job Streams are received by the Media Agent, data is put into chunk format and written to media as Device
Streams. The number of device streams that can be used will depend on the library type, library configuration,
and storage policy configuration.
The following diagram illustrates multiple job streams being multiplexed into device
streams within the Media Agent. Multiplexing to tape libraries can improve write
performance by keeping drive buffers filled allowing the drives to write faster.
The following diagram illustrates four job streams writing to a disk library with two
mount paths. Each job stream equates to a device stream when writing to disk libraries.
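The arithmetic behind the two diagrams can be sketched as follows; the multiplexing factor and stream counts are illustrative examples.

# Sketch of the job-stream to device-stream relationship. The
# multiplexing factor here is an example value.
import math

def device_streams_needed(job_streams, multiplexing_factor=1, disk=False):
    """Disk libraries map job streams to device streams 1:1; tape can
    interleave several job streams into one device stream."""
    if disk:
        return job_streams
    return math.ceil(job_streams / multiplexing_factor)

print(device_streams_needed(8, multiplexing_factor=4))   # -> 2 tape drives
print(device_streams_needed(4, disk=True))               # -> 4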
Job scheduling
On-demand jobs
Scripting
Job Scheduling
Jobs are scheduled for subclients through a dedicated schedule at the subclient or data set level, or through the use
of schedule policies. Whichever method is used, it is important to note that jobs always run at the subclient level. If
a schedule policy is used to back up five client file systems and each client has two subclients, a total of ten
jobs will run when the policy executes.
On-Demand Jobs
On-demand jobs can be run at the subclient or data set level. On-demand jobs run immediately.
Scripting
Scripts can be manually created or automatically generated through the CommCell console. Once a script is
created, it can be run at the client by manually executing it or by calling it from a separate
scheduling mechanism or another script. When a data protection script executes, it contacts the CommServe
server and, based on the script's parameters, the CommServe server will execute the job.
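For example, a console-generated script could be invoked from an external scheduler with a small wrapper like the hedged sketch below; the script path is a placeholder, not a real CommVault artifact.

# Hypothetical wrapper for launching a console-generated script from an
# external scheduler (cron, Task Scheduler, etc.).
import subprocess

SCRIPT = "/opt/backup_scripts/run_fs_backup.sh"   # placeholder path

try:
    result = subprocess.run([SCRIPT], capture_output=True, text=True)
    print("exit code:", result.returncode)
except FileNotFoundError:
    print(f"{SCRIPT} not found; substitute your saved script path")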
By associating subclients with secondary copies, data will be copied and managed in subclient groups
using the same media sets. When multiple subclients are associated with a copy, and when subclients are using
multiple streams, the management of the streams on media and the movement of the streams becomes important.
If the secondary copy is not configured correctly, media management requirements will not be met. One
method to properly consolidate and manage streams in secondary copies is to use the combine to streams option.
Combine to Streams
A storage policy can be configured to allow the use of multiple streams for primary copy backup. Multi-streaming
of backup data is done to improve backup performance. Normally, each stream used for the primary copy
requires a corresponding stream on each secondary copy. In the case of tape media for a secondary copy, multi-
stream storage policies will consume multiple media. The combine to streams option can be used to consolidate
multiple streams from source data on to fewer media when secondary copies are run. This allows for better media
management and the grouping of like data onto media for storage.
Example: You back up a home folders subclient to a disk library using three streams to maximize performance.
The total size of protected data is 600 GB. You want to consolidate those three streams onto a single 800 GB
capacity tape for off-site storage.
Solution: By creating a secondary copy and setting the Combine to Streams setting to 1 you will serially place
each stream onto the media.
Setting Combine to Streams to 1 will take streams A, B, and C and serially write them to
one tape.
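A toy version of the example follows, assuming the 600 GB is spread evenly across the three streams (an assumption for illustration only):

# Toy illustration of Combine to Streams = 1: source streams are
# written to the tape one after another instead of in parallel.
streams_gb = {"A": 200, "B": 200, "C": 200}   # 600 GB total (assumed split)
tape_capacity_gb = 800

used = 0
for name, size in streams_gb.items():
    used += size                               # streams land serially
    print(f"stream {name} written; tape at {used}/{tape_capacity_gb} GB")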
In some cases, using the combine to streams option may not be the best method to manage data. Multi-streaming
backup data is done to improve performance, but once those streams are consolidated to the same media set they can
only be recovered in a single stream operation. Though combining streams has a media consolidation benefit, it
will have a negative effect on restore performance.
Another reason not to combine streams is multi-streamed backups of SQL, DB2, and Sybase subclients.
When these iDataAgents use a single subclient with multi-streaming enabled, the streams must be restored in the
same sequence in which they were backed up. If the streams are combined to the same tape, they must be pre-staged to
disk before they can be recovered. In this case, not enabling combine to streams and placing each stream on
separate media bypasses the pre-staging of the data and also allows multiple streams to be restored concurrently,
making the restore process considerably faster. Note that this only applies to subclients that have been multi-
streamed. If multiple subclients have been single streamed and combined to media, they will NOT have to be pre-
staged prior to recovery.
Auxiliary copy
o Schedule
o On demand
o Save as script
o Automatic copy
Inline copy
Parallel copy
Deferred copy
An auxiliary copy operation can be scheduled, run on demand, saved as a script, or set to run automatically.
When configuring auxiliary copy operations, there are several options you can configure:
Automatic Copy
Most jobs run once per day, so a normal schedule can be used for auxiliary copies. The automatic copy option
allows you to set a check interval for source data to be copied. This can be a great advantage when jobs are being
run multiple times per day, or when you are unsure when the source data will be available for copy.
Example: A critical database is running transaction log backups every four hours. You want to run an auxiliary
copy of the source transaction logs to a secondary location, in this case a disk library off-site.
Solution: Schedule the transaction logs to back up every four hours. Then set the automatic copy option to check
for source data. If source data is present, the auxiliary copy will run, creating an additional copy of the data.
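Conceptually, the automatic copy behaves like the polling loop sketched below; 'new_jobs_to_copy' and 'run_auxiliary_copy' are placeholder callables for illustration, not CommVault APIs.

# Conceptual model of an automatic copy check interval. The two
# callables are placeholders for "is there uncopied source data?" and
# "start the auxiliary copy".
import time

def automatic_copy_loop(check_interval_minutes,
                        new_jobs_to_copy, run_auxiliary_copy):
    while True:
        if new_jobs_to_copy():
            run_auxiliary_copy()
        time.sleep(check_interval_minutes * 60)

# Example wiring (never terminates; illustration only):
# automatic_copy_loop(30, lambda: True, lambda: print("aux copy started"))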
Inline Copy
The Inline Copy feature allows you to create additional copies of data at the same time you are performing
primary backups. This feature can be useful when you need to get two copies of data done quickly. Data is passed
from the client to the Media Agent as job streams. The Media Agent then creates two sets of device streams, each
going to the appropriate library. This can be a quick method for creating multiple copies but there are some
caveats:
Inline Copy is not supported if Client Side Deduplication has been enabled.
If the primary copy fails the secondary copy will also fail.
Since both copies are made at the same time twice as many library resources will be required which may
prevent other jobs from running.
Since backup data is streamed, data is sent to both libraries simultaneously, which may degrade
overall performance. Basically, your job will only run as fast as the slowest resource.
Inline copy receives two streams from the client and sends those streams to two
different libraries.
Parallel Copy
A parallel copy will generate two secondary copy jobs concurrently when an auxiliary copy job runs. Both
secondary copies must have the Enable Parallel Copy option enabled and the destination libraries must be
accessible from the same Media Agent.
Two secondary copies are run in parallel through the Media Agent.
Deferred Copy
Deferring an auxiliary copy prevents the copy from running for a specified number of days. Setting this option
will result in data not aging from the source location, regardless of the retention on the source, until the auxiliary
copy is completed. This option is traditionally used in Hierarchical Storage Management (HSM) strategies, where
data remains in a storage policy copy for a certain period of time; after that period the data is
copied to another storage policy copy and deleted from the source once the copy completes. Although this
method mirrors how traditional HSM solutions worked, with CommVault software it is
recommended to copy data to multiple HSM copies to provide for disaster recovery as well as HSM archiving.
This concept will be discussed in more detail in the Compliance, Information Management & eDiscovery chapter.