
SRM Mod 1 DR Intro

Disaster Recovery is the process of successfully developing, testing, and implementing Disaster Recovery plans. The creation and maintenance of Disaster Recovery plans requires an understanding of business processes and risks. 93% of companies that lost their datacenter for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster.

Uploaded by

Kedar Vishnu Lad

Introduction to Disaster Recovery

Module 1

VMware Site Recovery Manager Rev A


Copyright 2008 VMware, Inc. All rights reserved.


Course Map
SRM Foundations: Introduction to Disaster Recovery, SRM Overview and Architecture, SRM Planning
SRM Installation and Configuration: SRM Installation, Array Managers, Inventory Mappings, Protection Groups, Recovery Plans
SRM Operations: SRM Alarms and Site Status, Troubleshooting
SRM Testing and Failover: Failover Testing and Failover, Failback

Importance and Module Objectives


Importance
Disaster recovery is the process of successfully developing, testing,
and implementing disaster recovery plans.
The creation and maintenance of disaster recovery plans requires an
understanding of business processes and risks.

Objectives for the Learner


Identify disaster recovery processes
Identify common disaster recovery terms
Identify how disaster recovery processes and terms map to VMware
Site Recovery Manager (SRM) components and features


Lesson Topics
What Is a Disaster?
What Is Disaster Recovery?
Risk Assessment
Recovery Sites and Data Replication
Remote Site Separation
Physical Disaster Recovery Process
Complications of Traditional Recovery
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Business Continuity
Organizational Impact
Challenges of Disaster Recovery
Service Disruptions
Regulatory Compliance
Disaster Recovery Planning and BCP
Failover and Failback
Disaster Recovery Planning
Runbooks


When Disaster Strikes


93% of companies that lost their datacenter for 10 days or
more due to a disaster filed for bankruptcy within one year
of the disaster (source: National Archives and Records
Administration).
Most backup application servers are located in the same
datacenter as the primary servers.


What Is a Disaster?
Complete loss of a datacenter
Often caused by a natural disaster
Loss might include destruction of the facility, or it might just
render it unusable for a significant amount of time.
Declaration of a disaster usually requires consensus from
multiple parts of the organization (at the CEO/CFO level).
What is not a disaster?
Failure of an individual system
A temporary service interruption


What Is Disaster Recovery?


Disaster recovery (DR) is the process of successfully
developing, testing, and implementing disaster recovery
plans.
A disaster recovery plan (DRP) contains procedures to
implement during and immediately following a disaster.
DRPs are designed to do the following:
Reduce further damage
Maintain critical systems
Maintain production
Guide individuals on the exact procedures that should be followed
during and immediately after a disaster


Disaster Recovery Is Not a Product


Disaster recovery is not a product.
Companies are dynamic.
Protection requires effective DR planning.
VMware Site Recovery Manager (SRM) is a product that
helps you quickly restore your organization's IT
infrastructure.
To be used effectively, SRM must be combined with
effective disaster recovery planning.
VMware Professional Services can help you create
disaster recovery plans.


Risk Assessment
Most businesses are at risk for at least one of the following:
Acts of nature
Fire
Wildfire
Earthquake or volcanic eruption
Tornado
Hurricane
Flooding and water damage
Man-made disasters
Acts of terrorism
Accidents and mistakes

No one can defend against all hazards.


Risk assessment is required.

Recovery Sites and Data Replication


Recovery sites are backup datacenters.
They are capable of supporting some or all of the critical
business functions.
Recovery sites should be remote.
Data must be replicated from the primary site to the
remote site.
Replication depends on one or more of the following:
WAN network connectivity
Storage vendor replication technology
Sneakernet transfers using portable media


Remote Site Separation


How remote is remote enough?
Before the September 11, 2001, attacks
Some businesses had datacenters in one of the World Trade Center
towers.
Some of these same businesses had backup datacenters in either the
other tower or another building in the WTC complex.

Before Hurricanes Katrina and Rita


Some businesses in the New Orleans area had backup centers either
on the Texas coast or the Mississippi coast.

Exercise care in choosing the location of remote sites.
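
One measurable input to the "how remote is remote enough" question is the straight-line distance between the two sites. The sketch below computes it with the haversine formula. The function names and the idea of a minimum-distance threshold are illustrative assumptions, not guidance from this course, and distance alone does not capture shared hazards such as a common coastline or power grid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_separated(primary, recovery, min_km):
    """True if the sites are at least min_km apart.

    primary and recovery are (latitude, longitude) tuples; min_km is a
    hypothetical policy value that each organization must choose itself.
    """
    return great_circle_km(*primary, *recovery) >= min_km
```

Two buildings in the same complex would fail almost any threshold passed to `is_separated`, which is the failure mode described above.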


Physical Disaster Recovery Process

From the production site to the recovery site, over the WAN:

1. Copy important data: hardware configuration, system disk, application installation, application data.
2. Transfer data to the recovery site: backup tapes, CDs/DVDs, images (e.g., Ghost), replication.
3. Start system and application recovery: OS recovery, configuration, data recovery, testing.


Complications of Traditional Recovery


Physical DR challenges
Variety of information and data to protect
Complex recovery process
Inability to sufficiently test recovery

Recovery tiers, from fastest to slowest:

RPO          RTO          Cost
Immediate    Immediate    $$$
24+ hrs      48+ hrs      $$
7+ days      5+ days

RPO: Recovery point objective
RTO: Recovery time objective
RPO and RTO are defined in the following two slides.


Recovery Point Objective (RPO)


An RPO is a recovery point objective.
The point in time to which systems must be recovered after an outage
The amount of data loss an organization can endure

Different departments within an organization might have different RPO requirements.
Example: If the disaster occurs between 8 a.m. and 5 p.m.,
you must return the system to the state it was in at 5 p.m.
the previous day.
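
The 5 p.m. example can be sketched as a small calculation. It assumes, purely for illustration, that one recoverable copy is taken every day at a fixed time; the function names are hypothetical and not part of any product.

```python
from datetime import datetime, time, timedelta


def last_recovery_point(disaster_at, backup_time=time(17, 0)):
    """Most recent daily copy taken strictly before the disaster."""
    candidate = datetime.combine(disaster_at.date(), backup_time)
    if candidate >= disaster_at:
        # Today's copy has not been taken yet, so fall back to yesterday's.
        candidate -= timedelta(days=1)
    return candidate


def data_loss_window(disaster_at, backup_time=time(17, 0)):
    """Work that would be lost: time between the last copy and the disaster."""
    return disaster_at - last_recovery_point(disaster_at, backup_time)


def meets_rpo(disaster_at, rpo, backup_time=time(17, 0)):
    """True if the data lost at this moment stays within the RPO."""
    return data_loss_window(disaster_at, backup_time) <= rpo
```

For a disaster at 10:30 a.m., the recovery point is 5 p.m. the previous day, so 17.5 hours of changes are lost. With a single daily copy the loss can approach 24 hours, so an RPO tighter than one day would need more frequent copies or replication.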


Recovery Time Objective (RTO)


An RTO is a recovery time objective.
How long recovery of services is allowed to take after the disaster
Defines the amount of downtime an organization can endure

RTO includes:
Fault detection
Recovering data
Bringing applications back online

Example: The help desk system must be returned to service within 72 hours of the disaster.
RTO related to RPO: The help desk system must be
returned to service within 72 hours of the disaster (RTO),
and its operating state must match 5 p.m. of the day
before the disaster (RPO).
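
Because the RTO covers fault detection, data recovery, and bringing applications back online, a plan can be checked by adding up the estimated duration of each phase. This is a minimal sketch; the phase estimates in the usage note are invented for illustration.

```python
from datetime import timedelta


def estimated_recovery_time(phases):
    """Sum the estimated duration of every recovery phase.

    phases maps a phase name to a timedelta estimate.
    """
    return sum(phases.values(), timedelta())


def rto_margin(phases, rto):
    """Time to spare against the RTO; a negative value means the plan misses it."""
    return rto - estimated_recovery_time(phases)
```

With hypothetical estimates of 2 hours for fault detection, 40 hours for recovering data, and 12 hours for bringing applications back online, the total is 54 hours, which leaves an 18-hour margin against the 72-hour help desk RTO.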

Business Continuity Organizational Impact


Organizational levels: CEO/CFO, Department/IT, IT
Proactive readiness planning: business continuity, high availability
Reactive procedures: disaster recovery, backup and recovery


Challenges of Disaster Recovery


Minimize downtime
Many manual processes for recovery
Multiple steps to overcome hardware differences
Incomplete or out-of-date runbooks

Reduce risk
Testing requires additional hardware and infrastructure.
Usually only data is regularly and cleanly updated.
Frequent failures during recovery

Control cost
Simplest recovery requires identical hardware.
Idle recovery hardware is impossible to repurpose.
Multiple third-party products necessary for recovery


Service Disruptions
DR plans could be used during service disruptions.
Planned
Maintenance
Shared resource contention

Unplanned
Application-level failure
Hardware-level failure
Datacenter-level or site-level failure
Natural disaster


Regulatory Compliance
What compliance guidelines control your business?
Recovery time objectives (RTOs)
Recovery point objectives (RPOs)
Manual vs. automatic
Failback requirements
Security and access controls
Technologies to use


Disaster Recovery Planning and Business Continuity Planning
Disaster recovery planning is not the same as business
continuity planning (BCP).
Disaster recovery planning
Focuses on actions to be taken during and immediately following a
disaster
Primarily concerned with safeguarding assets and personnel
Is procedure-oriented
Includes failover planning

Business continuity planning


Long-term strategy for keeping the business functional
Primarily concerned with stability
Includes failback planning


Failover and Failback


Failover
The process of moving key systems from a protected site to a
recovery site.
Depending on RPOs and RTOs, different systems might have to fail
over at different times.
Once failover is complete, the system runs from the recovery
site.

Failback
Businesses usually cannot run on the recovery site forever.
Failback must be done in an orderly manner to prevent further service
disruptions.
Failback might be even harder than failover.


Beginning Disaster Recovery Planning


1. Conduct a business impact analysis (BIA).
2. Determine loss criteria.
3. Determine maximum tolerable downtime (MTD).
4. Define recovery point objectives (RPOs) and recovery
time objectives (RTOs), based on MTD.
5. Create runbooks.
6. Test your disaster recovery plan (DRP).


Business Impact Analysis


Assets can be thought of as the crown jewels of a
business.
Blueprints and design documents
Formulas
Processes
Tools and special equipment
Data systems
Corporate records

Identify company assets.


Determine all possible threats to those assets.
Determine critical business functions.
Identify interdependencies between functions and
departments.

Determine Loss Criteria


What would happen if you lost an asset?
Loss of public confidence
Lost production capacity
Lost profits
Lost revenue
Contract violations
Legal and regulatory violations
Increase in operational expenses

Examining what would happen if you lost an asset can
help you rate its priority for protection and recovery.


Determine MTD for Assets


Maximum tolerable downtime (MTD)
How soon after an outage do the business failures and losses
identified in the loss criteria analysis begin?
Critical (minutes to hours)
Urgent (within 24 hours)
Important (within 72 hours)
Normal (7-14 days)
Nonessential (30 days or more)

Directly related to recovery point objectives (RPOs) and
recovery time objectives (RTOs)
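
The MTD classes above can be turned into a simple lookup, sketched below. The slide gives only rough ranges, so the exact cutoffs here are assumptions: 4 hours as the upper bound for "Critical", and every gap between the stated ranges assigned to the next slower class.

```python
from datetime import timedelta

# (upper bound, class) pairs in ascending order. The 4-hour bound for
# "Critical" and the 14-day bound for "Normal" are assumed cutoffs.
MTD_CLASSES = [
    (timedelta(hours=4), "Critical"),
    (timedelta(hours=24), "Urgent"),
    (timedelta(hours=72), "Important"),
    (timedelta(days=14), "Normal"),
]


def classify_mtd(mtd):
    """Return the MTD class for a maximum tolerable downtime."""
    for upper_bound, label in MTD_CLASSES:
        if mtd <= upper_bound:
            return label
    return "Nonessential"


def rto_fits_mtd(rto, mtd):
    """An RTO is only meaningful if it does not exceed the asset's MTD."""
    return rto <= mtd
```

Under these assumptions, an asset that can tolerate three days of downtime is classified as "Important", and a 72-hour RTO fits within that MTD.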


Runbooks
A runbook is a specific set of step-by-step procedures to
guide an operator or a system administrator in how to do
the following:
Rebuild a server starting with the operating system
Restore key user account information (user IDs and passwords)
Recreate infrastructure components such as LDAP directory-based
organizational units (OUs), groups, folders, trees, security rights, and
privileges
Reload key application software
Reconfigure the application software
Reload the application data (often by recovering from backup media)
Reopen the system for end users to return to work

Runbooks should be designed for use at the recovery site,
which might have different hardware than the protected
site.

Create Runbooks
Create a runbook for each identified asset.
Order the runbooks by priority so that they will be executed
in the correct order.
Order and priority are based on RPOs and RTOs.
Make sure that you account for system, department, and
function interdependencies when you plan the runbook
order.
1. Infrastructure: DNS, DHCP, AD
2. Production: Manufacturing control, Process control
3. Customer-facing: Web site, Order center, Help desk
4. Financials: Payroll, Accounting
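
Ordering runbooks by priority while respecting interdependencies is a dependency-ordering problem. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) and picks the highest-priority runbook among those whose dependencies are already complete. The function name and the dependency data in the usage note are hypothetical; only the four priority groups come from the slide.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def runbook_order(priority, depends_on):
    """Return asset names in execution order.

    priority maps asset -> priority number (1 runs first).
    depends_on maps asset -> assets that must be recovered before it.
    Every asset named in depends_on is assumed to appear in priority.
    """
    sorter = TopologicalSorter()
    for asset in priority:
        sorter.add(asset, *depends_on.get(asset, ()))
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    order, ready = [], []
    while sorter.is_active():
        # Collect assets whose dependencies have all been recovered.
        for asset in sorter.get_ready():
            heapq.heappush(ready, (priority[asset], asset))
        # Run the most urgent ready asset; ties are broken by name.
        _, asset = heapq.heappop(ready)
        order.append(asset)
        sorter.done(asset)
    return order
```

For example, with priorities `{"DNS": 1, "AD": 1, "Web site": 3, "Payroll": 4}` and assumed dependencies `{"AD": ["DNS"], "Web site": ["DNS"], "Payroll": ["AD"]}`, DNS is recovered before AD even though both share priority 1. A dependency always wins over a priority number, so a low-priority system that others rely on is still recovered first.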

Problems with Runbooks


Runbooks are hard to create.
Many systems are interdependent.
It is difficult to capture information required for the following:
System configuration
Application configuration
Infrastructure-related dependencies

Runbooks are even harder to maintain.


Systems and applications are patched on a regular basis.
Configurations change.
Both of these can affect restart procedures.

SRM can help automate the creation and maintenance of
runbooks.


Testing Disaster Recovery


Tests usually reveal DRP problems.
Regulatory requirements might govern how often you have
to test.
Yearly
Twice yearly
Quarterly
Monthly
Weekly

Different systems might require testing at different
frequencies.
Testing at the recovery site can cause problems if the site
is dual-purposed as a production site.
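
When different systems are tested at different frequencies, a small schedule check can flag overdue tests. This is a sketch only: the day counts assigned to each frequency are approximations chosen for illustration, and the system names in the usage note are hypothetical.

```python
from datetime import date, timedelta

# Approximate interval, in days, for each test frequency listed above.
TEST_INTERVAL_DAYS = {
    "yearly": 365,
    "twice yearly": 182,
    "quarterly": 91,
    "monthly": 30,
    "weekly": 7,
}


def next_test_due(last_tested, frequency):
    """Date by which the next DR test should have been run."""
    return last_tested + timedelta(days=TEST_INTERVAL_DAYS[frequency])


def overdue_systems(schedule, today):
    """Names of systems whose next test date has already passed.

    schedule maps system name -> (frequency, date of last test).
    """
    return sorted(
        name
        for name, (frequency, last_tested) in schedule.items()
        if next_test_due(last_tested, frequency) < today
    )
```

A help desk system tested quarterly and last tested on January 1 would be reported as overdue on June 1, while a payroll system tested monthly and last tested on May 20 would not.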


Module Summary
Disaster recovery is the process of successfully
developing, testing, and implementing disaster recovery
plans.
A disaster recovery plan contains procedures to implement
during and immediately following a disaster.
Identify and explain the following terms and concepts:
Disaster recovery is not a product.

Business impact analysis

Risk assessment

Loss criteria

Remote sites

MTD

DRP and BCP

Runbooks

Failover and failback

Testing

RTO and RPO


Questions?
