Domain 7 - Security Operations
Domain 7: Overview
• Involves the application of information security concepts and best
practices to the operation of enterprise computing systems.
• Covers the tasks and situations that information security professionals
are expected to perform or are presented with on a daily basis.
• Security operations pertains to everything that takes place to keep
networks, computer systems, applications, and environments up and
running in a secure and protected manner.
• It consists of ensuring that people, applications, and servers have the
proper access privileges to only the resources to which they are entitled
and that oversight is implemented via monitoring, auditing, and
reporting controls.
• Operations take place after the network is developed and implemented.
This includes the continual maintenance of an environment and the
activities that should take place on a day-to-day or week-to-week basis.
These activities are routine in nature and enable the network and
individual computer systems to continue running correctly and securely.
Domain 7: Security Operations
• Administrative Security
• Forensics
• Incident Response Management
• Operational Preventive and Detective Controls
• Asset Management
• Continuity of Operations
• BCP and DRP Overview and Process
• Developing a BCP/DRP
• Backups and Availability
• DRP Testing, Training and Awareness
• Continued BCP/DRP Maintenance
• Specific BCP/DRP Frameworks
Unique Terms and Definitions
• Business Continuity Plan (BCP)—a long-term plan to ensure the
continuity of business operations
• Collusion—An agreement between two or more individuals to subvert
the security of a system
• Continuity of Operations Plan (COOP)—a plan to maintain operations
during a disaster.
• Disaster—any disruptive event that interrupts normal system
operations
• Disaster Recovery Plan (DRP)—a short-term plan to recover from a
disruptive event
• Mean Time Between Failures (MTBF)—quantifies how long a new or
repaired system will run on average before failing
• Mean Time to Repair (MTTR)—describes how long it will take to
recover a failed system
• Mirroring—Complete duplication of data to another disk, used by some
levels of RAID.
• Redundant Array of Inexpensive Disks (RAID)—A method of using
multiple disk drives to achieve greater data reliability, greater speed, or
both
• Striping—Spreading data writes across multiple disks to achieve
performance gains, used by some levels of RAID
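As a rough illustration of the striping and mirroring definitions above, the following Python sketch models disks as plain lists. The function names `stripe` and `mirror` are hypothetical, and this is not how a real RAID controller works; it only shows where each block of data would land.

```python
def stripe(blocks, n_disks):
    """Spread blocks across disks in round-robin order (striping)."""
    if n_disks <= 0:
        raise ValueError("n_disks must be positive")
    disks = [[] for _ in range(n_disks)]
    for index, block in enumerate(blocks):
        disks[index % n_disks].append(block)
    return disks


def mirror(blocks, n_disks=2):
    """Write a complete copy of every block to each disk (mirroring)."""
    if n_disks <= 0:
        raise ValueError("n_disks must be positive")
    return [list(blocks) for _ in range(n_disks)]
```

With striping, losing one disk loses part of every large file; with mirroring, every disk holds a full copy, which is why the two are often combined.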
Administrative security
• Administrative Security provides the means to control people's
operational access to data
Least Privilege or Minimum Necessary Access
• Dictates that persons have no more than the access that is strictly
required for the performance of their duties
• May also be referred to as the principle of minimum necessary
access
• Discretionary Access Control (DAC) – most often applicable
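One way to picture least privilege is a deny-by-default permission check. The sketch below is a minimal illustration, assuming a made-up role table; the role and permission names are hypothetical and do not come from any particular product.

```python
# Hypothetical role-to-permission table; each role lists only what its duties require.
ROLE_PERMISSIONS = {
    "backup_operator": {"read_backup", "write_backup"},
    "help_desk": {"reset_password"},
}


def is_allowed(role, permission):
    """Deny by default: allow only permissions explicitly granted to the role."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

An unknown role, or a permission that is not listed for the role, is refused without any special-case code.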
Need to know
• Mandatory Access Control (MAC)
• Access determination is based upon clearance levels of subjects
and classification levels of objects
• An extension to the principle of least privilege in MAC
environments is the concept of compartmentalization
• Compartmentalization, a method for enforcing need to know, goes beyond
reliance upon clearance level: the subject must also require access to the
specific information.
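The combination of clearance level and need to know can be sketched as two separate checks that must both pass. The level names, class names, and compartment labels below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Illustrative ordering of levels; real schemes vary by organization.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}


@dataclass(frozen=True)
class Subject:
    clearance: str
    compartments: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class DataObject:
    classification: str
    compartments: frozenset = field(default_factory=frozenset)


def can_read(subject, obj):
    """Grant access only if the clearance is high enough AND need to know is met."""
    cleared = LEVELS[subject.clearance] >= LEVELS[obj.classification]
    # Need to know: the subject must hold every compartment the object is tagged with.
    need_to_know = obj.compartments <= subject.compartments
    return cleared and need_to_know
```

A subject with a top-level clearance but without the object's compartment is still refused, which is the point of compartmentalization.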
Separation of Duties
• Prescribes that multiple people are required to complete critical
or sensitive transactions
• Goal of separation of duties is to ensure that in order for
someone to be able to abuse their access to sensitive data or
transactions, they must convince another party to act in concert
• Collusion is the term used for the two parties conspiring to undermine the security
of the transaction
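A minimal sketch of separation of duties is a two-person rule: the person who initiates a sensitive transaction cannot also approve it. The function name and the idea of identifying people by user name are assumptions for illustration.

```python
def is_authorized(initiator, approver):
    """A sensitive transaction needs two different people: one initiates, another approves."""
    return bool(initiator) and bool(approver) and initiator != approver
```

The check cannot detect collusion; it only guarantees that abuse requires at least two cooperating people.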
Rotation of Duties/Job Rotation
• Also known as job rotation or rotation of responsibilities
• Provides a means to help mitigate the risk associated with any one individual having
too many privileges
• Requires that critical functions or responsibilities are not continuously performed by
the same single person without interruption
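• Also protects against the “hit by a bus” or “win the lottery” scenario, in which the
only person performing a critical function suddenly becomes unavailable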
Exam Warning: Though job or responsibility rotation is an important control, this, like
many other controls, is often compared against the cost of implementing the control.
Many organizations will opt for not implementing rotation of duties because of the cost
associated with implementation. For the exam, be certain to appreciate that cost is
always a consideration, and can trump the implementation of some controls.
Mandatory Leave/Forced Vacation
• Also known as forced vacation
• Can identify areas where depth of coverage is lacking
• Can also help discover fraudulent or suspicious behavior
• Knowledge that mandatory leave is a possibility might deter some
individuals from engaging in the fraudulent behavior in the first
place
Non-Disclosure Agreement
• A work-related contractual agreement that ensures that, prior to
being given access to sensitive information or data, an individual
or organization appreciates their legal responsibility to maintain
the confidentiality of sensitive information.
• Often signed by job candidates before they are hired, as well as
consultants or contractors
• Largely a directive control
Background Checks
• Also known as background investigations or preemployment screening
• Majority of background investigations are performed as part of a
preemployment screening process
• The sensitivity of the position being filled or data to which the individual will
have access strongly determines the degree to which this information is
scrutinized and the depth to which the investigation will report
• Ongoing, or postemployment, investigations seek to determine whether the
individual continues to be worthy of the trust required of their position
• Background checks performed in advance of employment serve as a
preventive control while ongoing repeat background checks constitute a
detective control and possibly a deterrent.
Privilege Monitoring
• Heightened privileges require both greater scrutiny and more
thoughtful controls
• Some of the job functions that warrant greater scrutiny include:
account creation/modification/deletion, system reboots, data
backup, data restoration, source code access, audit log access,
security configuration capabilities, etc.
Digital Forensics
• Provides a formal approach to dealing with investigations and evidence
with special consideration of the legal aspects of the process
• Forensics is closely related to incident response
• Main distinction between forensics and incident response is that forensics is
evidence-centric and typically more closely associated with crimes, while incident
response is more dedicated to identifying, containing, and recovering from security
incidents
• The forensic process must preserve the “crime scene” and the
evidence in order to prevent unintentionally violating the integrity of
either the data or the data's environment
• Prevent unintentional modification of the system
• Antiforensics makes forensic investigation difficult or impossible
• One method is malware that is entirely memory-resident, and not installed on the disk drive. If an
investigator removes power from a system with entirely memory-resident malware, all volatile
memory including RAM is lost, and evidence is destroyed.
• Valuable data is gathered during the live forensic capture
• The main source of forensic data typically comes from binary images of secondary
storage and portable storage devices such as hard disk drives, USB flash drives, CDs,
DVDs, and possibly associated cellular phones and mp3 players
• A binary or bit stream image is used because an exact replica of the original data is
needed
• Normal backup software will only capture the active partitions of a disk, and only
that data which is marked as allocated
The four types of data that exist:
• Allocated space—portions of a disk partition which are marked as
actively containing data.
• Unallocated space—portions of a disk partition that do not contain
active data. This includes space that has never been allocated, and
previously allocated space that has been marked unallocated. If a
file is deleted, the portions of the disk that held the deleted file are
marked as unallocated and available for use.
• Slack space—data is stored in specific size chunks known as clusters. A cluster
is the minimum size that can be allocated by a file system. If a particular file,
or final portion of a file, does not require the use of the entire cluster then
some extra space will exist within the cluster. This leftover space is known as
slack space: it may contain old data, or can be used intentionally by attackers
to hide information.
• “Bad” blocks/clusters/sectors—hard disks routinely end up with sectors that
cannot be read due to some physical defect. The sectors marked as bad will
be ignored by the operating system since no data could be read in those
defective portions. Attackers could intentionally mark sectors or clusters as
being bad in order to hide data within this portion of the disk.
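The slack space description above reduces to simple arithmetic: the unused remainder of the last cluster a file occupies. The sketch below assumes a 4096-byte cluster as a default, which is a common value but not one stated in the source.

```python
def slack_bytes(file_size, cluster_size=4096):
    """Return the unused bytes left in the final cluster occupied by a file."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size must be > 0")
    remainder = file_size % cluster_size
    # A file that exactly fills its clusters (or is empty) leaves no slack.
    return 0 if remainder == 0 else cluster_size - remainder
```

For example, a 10,000-byte file on 4096-byte clusters occupies three clusters (12,288 bytes), leaving 2,288 bytes of slack that could hold remnants of older data.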
• Numerous tools can be used to create the binary image, including free
tools such as dd and windd as well as commercial tools such as Ghost (when
run with specific nondefault switches enabled), AccessData's FTK, or
Guidance Software's EnCase.
• The general phases of the forensic process are:
• the identification of potential evidence;
• the acquisition of that evidence;
• analysis of the evidence;
• production of a report
• Hashing algorithms are used to verify the integrity of binary images
• When possible, the original media should not be used for analysis
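The integrity check mentioned above can be sketched with Python's standard `hashlib` module: hash the original and the image, then compare the digests. SHA-256 is chosen here as an assumption (the source does not name an algorithm), and in-memory byte streams stand in for a real evidence drive and its image.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so a large image need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_matches(original_stream, image_stream):
    """Return True when both streams produce the same SHA-256 digest."""
    return sha256_of_stream(original_stream) == sha256_of_stream(image_stream)


if __name__ == "__main__":
    # Stand-ins for an evidence drive and its bit stream image.
    original = io.BytesIO(b"evidence bytes")
    image = io.BytesIO(b"evidence bytes")
    print(image_matches(original, image))
```

In practice the streams would be files opened in binary mode (`open(path, "rb")`), and the digest of the original would be recorded at acquisition time so later working copies can be checked against it.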
Live Forensics
• Forensics investigators have traditionally removed power from a
system, but the typical approach now is to gather volatile data.
Acquiring volatile data is called live forensics.
• The need for live forensics has grown tremendously due to
non-persistent tools that don’t write anything to disk
• One example from Metasploit…
Live Forensics - Metasploit
• Popular free and open source exploitation framework
• Metasploit framework allows for the modularization of the underlying
components of an attack, which allows for exploit developers to focus
on their core competency without having to expend energy on
distribution or even developing a delivery, targeting, and payload
mechanism for their exploit
• Provides reusable components to limit extra work
• A payload is what Metasploit does after successfully exploiting a target
Live Forensics – Metasploit & Meterpreter
• Meterpreter is one of the most powerful Metasploit payloads
• Can allow password hashes of a compromised computer to be dumped to an
attacker's machine
• The password hashes can then be fed into a password cracker
• Or the password hashes might be used directly in Metasploit's
PSExec exploit module, which is an implementation of functionality provided by
Sysinternals' (now owned by Microsoft) PSExec, but bolstered to support Pass the
Hash functionality.
NOTE: There are many and varied RAID configurations which are simply combinations of the standard RAID
levels. Nested RAID solutions are becoming increasingly common with larger arrays of disks that require a
high degree of both reliability and speed. Some common nested RAID levels include RAID 0+1, 1+0, 5+0, 6+0,
and (1+0)+0, which are also commonly written as RAID 01, 10, 50, 60, and 100, respectively.
Fault Tolerance - System Redundancy
Redundant Hardware
• Built-in redundancy (power supplies, disk controllers, and NICs are most
common)
• An inventory of spare modules to service the entire datacenter's servers
would be less expensive than having all servers configured with an installed
redundant power supply
Redundant Systems
• Entire systems available in inventory to serve as a means to recover
• Have an SLA with hardware manufacturers to be able to procure
replacement equipment in a timely fashion
BCP and DRP Overview and Process (formerly a domain by itself)
Business Continuity Planning and Disaster Recovery Planning are two very distinct
disciplines
Business Continuity Planning (BCP)
• Goal of a BCP is to ensure that the business will continue to operate
before, throughout, and after a disaster event is experienced
• Focus of a BCP is on the business as a whole
• Business Continuity Planning provides a long-term strategy
• Takes into account items such as people, vital records, and processes in
addition to critical systems
Disaster Recovery Planning (DRP)
• Disaster Recovery Plan is more tactical in its approach
• Short-term plan for dealing with specific IT-oriented disruptions
• Provides a means for immediate response to disasters
• Does not focus on long-term business impact
Relationship between BCP and DRP
• Business Continuity Plan is an umbrella plan that includes multiple specific
plans, most importantly the Disaster Recovery Plan
• Two plans, which have different scopes, are intertwined
• Disaster Recovery Plan serves as a subset of the overall Business Continuity
Plan
• NIST Special Publication 800-34 provides a visual means for understanding
the interrelatedness of a BCP and a DRP, as well as Continuity of Operations
Plan (COOP), Occupant Emergency Plan (OEP), and others.
Disasters or Disruptive Events
Classifications of disasters
• A common way of categorizing the causes of disasters is by whether the threat agent is
natural, human, or environmental in nature
• Natural—the most obvious threats that can result in a disaster are naturally occurring. This category includes
such threats as earthquakes, hurricanes, tornadoes, floods, and some types of fires (closely related to geographical
location)
• Human—the human category of threats represents the most common source of disasters. Human threats can be
further classified as to whether they constitute an intentional or unintentional threat
• Examples of human-intentional threats include terrorists, malware, rogue insider, Denial of Service, hacktivism,
phishing, social engineering, etc.
• Examples of human-unintentional threats are primarily those that involve inadvertent errors and omissions, in which
the person through lack of knowledge, laziness, or carelessness served as a source of disruption
• Environmental—focused on environment as it pertains to the information systems or datacenter. This class of threat
includes items such as power issues (blackout, brownout, surge, spike), system component or other equipment failures,
application or software flaws
• Analysis of threats and associated likelihoods is an important part of the BCP and DRP process
Errors and omissions
• Typically considered the single most common source of disruptive events
• Threat is inadvertently caused by humans, most often in the employ of the
organization, who unintentionally serve as a source of harm
• Data entry mistakes are an example of errors and omissions
Natural Disasters
• Include earthquakes, hurricanes, floods, tsunamis, etc.
• Likelihood of natural threats occurring is largely based upon the geographical location
of the organization's information systems or datacenters
• Generally have a rather low likelihood of occurring
• Impact can be severe
Electrical or Power Problems
• Much more common than natural disasters
• Considered an environmental disaster
• Mitigated with uninterruptible power supplies (UPS) and/or backup generators
Temperature and Humidity Failures
• Critical controls that must be managed during a disaster
• Increased server density can provide for significant heat issues
• Mean Time Between Failures (MTBF) for electrical equipment will decrease if
temperature and humidity levels are not within a tolerable range.
Warfare, terrorism, and sabotage
• Human-intentional threats
• Threat can vary dramatically based on geographic location, industry, brand
value, as well as the interrelatedness with other high-value target
organizations
• Cyber-warfare
• “Aurora” attacks (named after the word “Aurora,” which was found in a
sample of malware used in the attacks). As the New York Times reported on
2/18/2010: “A series of online attacks on Google and dozens of other
American corporations have been traced to computers at two educational
institutions in China, including one with close ties to the Chinese military, say
people involved in the investigation.”
Financially-motivated Attackers
• Exfiltration of cardholder data, identity theft, pump-and-dump stock
schemes, bogus anti-malware tools, or corporate espionage, etc.
• Organized crime syndicates
Personnel Shortages
• Another significant source of disruption can come by means of having staff
unavailable
• Most organizations will have some critical processes that are
people-dependent
• Pandemics and Disease
• Major biological problems such as pandemic flu or highly communicable infectious disease
outbreaks
• A pandemic occurs when an infection spreads through an extremely large geographical area, while
an epidemic is more localized
• Strikes
• Strikes usually are carried out in such a manner that the organization can plan for the occurrence
• Most strikes are announced and planned in advance, which provides the organization with some
lead time
• Personnel Availability
• Sudden separation from employment of a critical member of the workforce
Communications Failure
• Increasing dependence of organizations on call centers, IP telephony, general
Internet access, and providing services via the Internet
• One of the most common disaster-causing events is telecommunications lines
being inadvertently cut by someone digging where they are not supposed to
NOTE: One of the eye-opening impacts of Hurricane Katrina was a rather significant outage of Internet2,
which provides high-speed connectivity for education and research networks. Qwest, which provides the
infrastructure for Internet2, suffered an outage in one of the major long-haul links that ran from Atlanta to
Houston. Reportedly, the outage was due to lack of availability of fuel in the area. In addition to this outage,
which impacted more than just those areas directly affected by the hurricane, there were substantial outages
throughout Mississippi, which at its peak had more than a third of its public address space rendered
unreachable.
The Disaster Recovery Process
The general process of disaster recovery involves responding to the
disruption; activation of the recovery team; ongoing tactical
communication of the status of disaster and its associated recovery;
further assessment of the damage caused by the disruptive event; and
recovery of critical assets and processes in a manner consistent with the
extent of the disaster.
• Different organizations and experts alike might disagree about the
number or names of phases in the process
• Personnel safety remains the top priority
Respond
• Initial response begins the process of assessing the damage
• Speed is essential (initial assessment)
• The initial assessment will determine if the event in question constitutes a
disaster
• The initial response team should be mindful of assessing the facility's safety
for continued personnel usage
Activate Team
If during the initial response to a disruptive event a disaster is declared, then
the team that will be responsible for recovery needs to be activated.
Communicate
• Ensure that consistent timely status updates are communicated back to the central
team managing the response and recovery process
• Communication often must occur out-of-band
• The organization must also be prepared to provide external communications
Assess
• More detailed and thorough assessment
• Assess the extent of the damage and determine the proper steps to ensure the
organization's ability to meet its mission and Maximum Tolerable Downtime (MTD)
• Team could recommend that the ultimate restoration or reconstitution occurs at the
alternate site
Reconstitution
• Successfully recover critical business operations either at primary or
secondary site
• If an alternate site is leveraged, adequate safety and security controls
must be in place in order to maintain the expected degree of security
the organization typically employs
• A salvage team will be employed to begin the recovery process at the
primary facility that experienced the disaster
Developing a BCP/DRP
• High-level steps, according to NIST 800-34:
• Project Initiation
• Scope the Project
• Business Impact Analysis
• Identify Preventive Controls
• Recovery Strategy
• Plan Design and Development
• Implementation, Training, and Testing
• BCP/DRP Maintenance
• NIST 800-34 is the National Institute of Standards and Technology's Information
Technology Contingency Planning Guide, which can be found at
http://csrc.nist.gov/publications/nistpubs/800-34/sp800-34.pdf.
Project Initiation
In order to develop the BCP/DRP, the scope of the project must be determined
and agreed upon. This involves seven distinct milestones:
• 1. Develop the contingency planning policy statement: A formal department
or agency policy provides the authority and guidance necessary to develop an
effective contingency plan.
• 2. Conduct the business impact analysis (BIA): The BIA helps to identify and
prioritize critical IT systems and components. A template for developing the
BIA is also provided to assist the user.
• 3. Identify preventive controls: Measures taken to reduce the effects of
system disruptions can increase system availability and reduce contingency
life cycle costs.
• 4. Develop recovery strategies: Thorough recovery strategies ensure that the
system may be recovered quickly and effectively following a disruption.
• 5. Develop an IT contingency plan: The contingency plan should contain
detailed guidance and procedures for restoring a damaged system.
• 6. Plan testing, training, and exercises: Testing the plan identifies planning
gaps, whereas training prepares recovery personnel for plan activation; both
activities improve plan effectiveness and overall agency preparedness.
• 7. Plan maintenance: The plan should be a living document that is updated
regularly to remain current with system enhancements.
Management Support
“C”-level managers:
• Must agree to any plan set forth
• Must agree to support the action items listed in the plan if an emergency
event occurs
• Refers to people within an organization like the chief executive officer (CEO),
the chief operating officer (COO), the chief information officer (CIO), and the
chief financial officer (CFO)
• Have enough power and authority to speak for the entire organization when
dealing with outside media
• High enough within the organization to commit resources
Other Roles
BCP/DRP Project Manager
• Key Point of Contact for ensuring that a BCP/DRP is completed and routinely
tested
• Must be a good manager and leader in case there is an event that causes the
BCP or DRP to be implemented
• Point of Contact (POC) for every person within the organization during a crisis
• Must be very organized
• Credibility and enough authority within the organization to make important,
critical decisions with regard to implementing the BCP/DRP
• Does not need to have in-depth technical skills
Continuity Planning Project Team (CPPT)
• Comprises those personnel that will have responsibilities if/when an
emergency occurs
• Comprised of stakeholders within an organization
• Focuses on identifying who needs to play a role if a specific emergency event
were to occur
• Includes people from the human resources section, public relations (PR), IT
staff, physical security, line managers, essential personnel for full business
effectiveness, and anyone else responsible for essential functions
Scoping the Project
• Define exactly what assets are protected by the plan, which emergency
events the plan will be able to address, and the resources
necessary to completely create and implement the plan
• “What is in and out of scope for this plan?”
• After receiving C-level approval and input from the rest of the
organization, objectives and deliverables can be determined
• Objectives are usually created as “if/then” statements
• For example, “If there is a hurricane, then the organization will enact plan H—the Physical
Relocation and Employee Safety Plan.” Plan H is unique to the organization but it does encompass
all the BCP/DRP subplans required
• An objective would be to create this plan and have it reviewed by all members of the organization
by a specific date.
• The objective will have a number of deliverables required to create and fully vet this plan: for
example, draft documents, exercise planning meetings, table top preliminary exercises, etc.
• Executive management must at least ensure that support is given for three BCP/DRP
items:
• 1. Executive management support is needed for initiating the plan.
• 2. Executive management support is needed for final approval of the plan.
• 3. Executive management must demonstrate due care and due diligence and be held liable under
applicable laws/regulations.
Assessing the Critical State
• Assessing the critical state can be difficult
because determining which pieces of the IT
infrastructure are critical depends solely on
how they support the users within the
organization.
• When compiling the critical state and asset
list associated with it, the BCP/DRP project
manager should note how the assets impact
the organization in a section called the
“Business Impact” section.
Conduct Business Impact Analysis (BIA)
• Formal method for determining how a disruption to the IT system(s) of an
organization will impact the organization
• An analysis to identify and prioritize critical IT systems and components
• Enables the BCP/DRP project manager to fully characterize the IT contingency
requirements and priorities
• Objective is to correlate the IT system components with the critical services they
support
• Also aims to quantify the consequence of a disruption to the system component and
how that will affect the organization
• Determine the Maximum Tolerable Downtime (MTD) for a specific IT asset
• Also provides information to improve business processes and efficiencies because it
details all of the organization's policies and implementation efforts
The BIA comprises two processes: identification of critical
assets and a comprehensive risk assessment.
Identify Critical Assets
• BIA and Critical State Asset List is conducted for every IT system within the
organization, no matter how trivial or unimportant, leading to…
• A list of those IT assets that are deemed business-essential by the
organization
Conduct BCP/DRP-focused Risk Assessment
• Determines what risks are inherent to which IT assets
• A vulnerability analysis is also conducted for each IT system and major
application
Determine Maximum Tolerable Downtime
• Describes the total time a system can be inoperable before an organization is
severely impacted
• It is also the maximum time it takes to execute the reconstitution phase
• Comprises two metrics: Recovery Time Objective (RTO) and the Work
Recovery Time (WRT)
Alternate terms for MTD
• Depending on the business continuity framework that is used, other terms
may be substituted for Maximum Tolerable Downtime. These include
Maximum Allowable Downtime (MAD), Maximum Tolerable Outage (MTO),
and Maximum Acceptable Outage (MAO).
Failure and Recovery Metrics
• Used to quantify how frequently systems fail, how long a system may
exist in a failed state, and the maximum time to recover from failure.
• These metrics include the Recovery Point Objective (RPO), Recovery
Time Objective (RTO), Work Recovery Time (WRT), Mean Time
Between Failures (MTBF), Mean Time to Repair (MTTR), and
Minimum Operating Requirements (MOR).
Recovery Point Objective
• The amount of data loss or system inaccessibility (measured in time)
that an organization can withstand.
• “If you perform weekly backups, someone made a decision that your
company could tolerate the loss of a week's worth of data. If backups
are performed on Saturday evenings and a system fails on Saturday
afternoon, you have lost the entire week's worth of data. This is the
recovery point objective. In this case, the RPO is 1 week.”
• RPO represents the maximum acceptable amount of data/work loss
for a given process because of a disaster or disruptive event
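The weekly backup example above can be worked through with Python's `datetime` module. The specific dates are assumptions chosen so that both fall on a Saturday; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, failure):
    """Work performed since the last good backup is lost when the system fails."""
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    return failure - last_backup


def meets_rpo(backup_interval, rpo):
    """A backup schedule satisfies the RPO if the gap between backups never exceeds it."""
    return backup_interval <= rpo


# Backup on Saturday evening; failure the following Saturday afternoon.
last_backup = datetime(2024, 1, 6, 20, 0)
failure = datetime(2024, 1, 13, 14, 0)
lost = data_loss_window(last_backup, failure)
```

Here `lost` comes to 6 days and 18 hours, which is nearly the full week. A weekly schedule meets an RPO of one week but would not meet an RPO of one day.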
Recovery Time Objective (RTO) and Work Recovery Time (WRT)
• Recovery Time Objective (RTO) describes the maximum time allowed
to recover business or IT systems
• RTO is also called the systems recovery time. It is one part of the Maximum
Tolerable Downtime: once the system is physically running, it must still be
configured.
• Work Recovery Time (WRT) describes the time required to configure a
recovered system.
• “Downtime consists of two elements, the systems recovery time and
the work recovery time. Therefore, MTD = RTO + WRT.”
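The relationship MTD = RTO + WRT can be checked with a small Python sketch. The function names and the hour values in the example are illustrative assumptions.

```python
def maximum_tolerable_downtime(rto_hours: float, wrt_hours: float) -> float:
    """MTD = RTO + WRT, all expressed in hours."""
    return rto_hours + wrt_hours


def plan_fits_mtd(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """True if system recovery plus work recovery fits inside the tolerated downtime."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical figures: 48 hours to recover the system, 24 hours to configure it.
mtd = maximum_tolerable_downtime(48, 24)  # 72 hours
```

If the business can only tolerate 60 hours of downtime, a 48-hour RTO with a 24-hour WRT does not fit, and one of the two has to shrink.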
Domain 7: Security Operations
Mean Time Between Failures
• Quantifies how long a new or repaired system will run before failing
• Typically generated by a component vendor and is largely applicable to
hardware as opposed to applications and software.
• A vendor selling LCD computer monitors may run 100 monitors 24 hours a
day for 2 weeks and observe just one monitor failure. The vendor then
extrapolates the following:
100 LCD Monitors x 14 days x 24 hours/day = 1 failure/33,600 hours
• The BCP/DRP team determines the number of failures to expect within the
IT system over a given period of time.
• The calculated MTBF becomes less reliable as an organization uses fewer
and fewer hardware assets.
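The monitor extrapolation above, and the expected-failure estimate a BCP/DRP team might derive from it, can be sketched in Python. The function names and the 50-unit fleet are assumptions for illustration; the estimate also assumes a constant failure rate, which real hardware may not follow.

```python
def observed_mtbf_hours(units: int, days: int, failures: int, hours_per_day: int = 24) -> float:
    """Total unit-hours of operation divided by the number of failures observed."""
    if failures <= 0:
        raise ValueError("at least one observed failure is needed to estimate MTBF")
    return units * days * hours_per_day / failures


def expected_failures(units: int, hours: float, mtbf_hours: float) -> float:
    """Rough failure count for a fleet over a period, assuming a constant failure rate."""
    return units * hours / mtbf_hours


mtbf = observed_mtbf_hours(units=100, days=14, failures=1)  # 33,600 hours
# Hypothetical fleet: 50 monitors running around the clock for one year.
yearly = expected_failures(units=50, hours=24 * 365, mtbf_hours=mtbf)  # roughly 13
```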
Domain 7: Security Operations
Mean Time to Repair (MTTR)
• Describes how long it will take to recover a specific failed system
• Best estimate for reconstituting the IT system so that business continuity may
occur
Minimum Operating Requirements
• Describes the minimum environmental and connectivity requirements in
order to operate computer equipment
• Important to determine and document for each IT-critical asset because, in
the event of a disruptive event or disaster, proper analysis can be conducted
quickly to determine if the IT assets will be able to function in the emergency
environment
Domain 7: Security Operations
Identify Preventive Controls
• Preventive controls prevent disruptive events from having an impact
• The BIA will identify some risks which may be mitigated immediately
Recovery Strategy
• Once the BIA is complete, the BCP team knows the Maximum Tolerable
Downtime. This metric, as well as others including the Recovery Point
Objective and Recovery Time Objective, are used to determine the recovery
strategy.
• Always maintain technical, physical, and administrative controls when using
any recovery option
Domain 7: Security Operations
Recovery Strategy
Supply Chain Management
• In an age of “just in time” shipment of goods, organizations may fail to acquire
adequate replacement computers.
• Some computer manufacturers offer guaranteed replacement insurance for a specific
range of disasters. The insurance is priced per server, and includes a service level
agreement that specifies the replacement time. All forms of relevant insurance
should be analyzed by the BCP team.
Telecommunication Management
• Ensures the availability of electronic communications during a disaster
• Often one of the first processes to fail during a disaster
• Wired circuits such as T1s, T3s, frame relay, etc., need to be specifically addressed
• Power can be provided by generator if necessary.
Domain 7: Security Operations
Recovery Strategy
Utility Management
• Utility management addresses the availability of utilities such as power,
water, gas, etc. during a disaster
• The utility management plan should address all utilities required by business
operations, including power, heating, cooling, and water.
• Specific sections should address the unavailability of any required utility.
Recovery options
• Once an organization has determined its maximum tolerable downtime, the
choice of recovery options can be determined. For example, a 10-day MTD
indicates that a cold site may be a reasonable option. An MTD of a few hours
indicates that a redundant site or hot site is a potential option.
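As a rough sketch, the mapping from MTD to candidate recovery options could be expressed in Python as follows. The cut-off values (one day, one week) are illustrative assumptions chosen to agree with the examples in this section; they are not fixed thresholds from any framework.

```python
def candidate_recovery_options(mtd_hours: float) -> list:
    """Suggest recovery options for a given MTD using assumed, illustrative cut-offs."""
    if mtd_hours <= 0:
        raise ValueError("MTD must be positive")
    if mtd_hours < 24:           # a few hours: only the fastest options fit
        return ["redundant site", "hot site"]
    if mtd_hours < 7 * 24:       # days: a warm site restored from backups may fit
        return ["warm site"]
    return ["cold site"]         # a week or more: a cold site may be reasonable


few_hours = candidate_recovery_options(4)
ten_days = candidate_recovery_options(10 * 24)
```

Cost, the sensitivity of the data, and the controls available at each site would also shape the real decision.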
Domain 7: Security Operations
Recovery Options
Redundant Site
• A redundant site is an exact production duplicate of a system that has the capability to seamlessly
operate all necessary IT operations without loss of services to the end user of the system.
• A redundant site receives data backups in real time so that in the event of a disaster, the users of the
system have no loss of data.
• The most expensive recovery option
Hot Site
• A hot site is a location that an organization may relocate to following a major disruption or disaster.
• It is a datacenter with a raised floor, power, utilities, computer peripherals, and fully configured
computers.
• Will have all necessary hardware and critical applications data mirrored in real time.
• A hot site will have the capability to allow the organization to resume critical operations within a
very short period of time—sometimes in less than an hour.
• Has all the same physical, technical, and administrative controls as the production site.
Domain 7: Security Operations
Recovery Options
Warm Site
• Has some aspects of a hot site, for example, readily-accessible hardware and connectivity, but it will have
to rely upon backup data in order to reconstitute a system after a disruption.
• It is a datacenter with a raised floor, power, utilities, computer peripherals, and fully configured computers.
• MTD of at least 1-3 days
• The longer the MTD is, the less expensive the recovery solution will be.
Cold Site
• The least expensive recovery solution to implement.
• Does not include backup copies of data, nor does it contain any immediately available hardware.
• Longest amount of time of all recovery solutions to implement and restore critical IT services for the
organization
• MTD—usually measured in weeks, not days.
• Typically a datacenter with a raised floor, power, utilities, and physical security, but not much beyond that.
Domain 7: Security Operations
Recovery Options
Reciprocal Agreement
• A bi-directional agreement between two organizations in which one organization
promises another organization that it can move in and share space if it experiences a
disaster.
• Documented in the form of a contract
• Also referred to as Mutual Aid Agreements (MAAs)
Mobile Site
• “datacenters on wheels”: towable trailers that contain racks of computer equipment,
as well as HVAC, fire suppression and physical security.
• A good fit for disasters such as a datacenter flood
• Typically placed within the physical property lines, and are protected by defenses
such as fences, gates, and security cameras
Domain 7: Security Operations
Recovery Options
Subscription Services
• Some organizations outsource their BCP/DRP planning and/or
implementation by paying another company to perform those
services.
• Effectively transfers the risk to the insurer company.
• Based upon a simple insurance model, and companies such as
IBM have built profit models and offer services for customers
offering BCP/DRP insurance.
Domain 7: Security Operations
Related Plans
The Business Continuity Plan is an umbrella plan that contains other
plans:
• Disaster recovery plan
• Continuity of Operations Plan (COOP)
• Business Resumption/Recovery Plan (BRP)
• Continuity of Support Plan
• Cyber Incident Response Plan
• Occupant Emergency Plan (OEP)
• Crisis Management Plan (CMP)
Domain 7: Security Operations
Related Plans
Continuity of Operations Plan (COOP)
• Describes the procedures required to maintain operations during a disaster
• Includes transfer of personnel to an alternate disaster recovery site, and operations of that
site.
Business Recovery Plan (BRP)
• Also known as the Business Resumption Plan
• Details the steps required to restore normal business operations after recovering from a
disruptive event
• May include switching operations from an alternate site back to a (repaired) primary site.
• Picks up when the COOP is complete
• Narrow and focused: the BRP is sometimes included as an appendix to the Business Continuity
Plan
Domain 7: Security Operations
Related Plans
Continuity of Support Plan
• Focuses narrowly on support of specific IT systems and applications
• Also called the IT Contingency Plan, emphasizing IT over general business support
Cyber Incident Response Plan
• Designed to respond to disruptive cyber events, including network-based attacks, worms, computer
viruses, Trojan horses, etc.
Occupant Emergency Plan (OEP)
• Provides the “response procedures for occupants of a facility in the event of a situation posing a potential
threat to the health and safety of personnel, the environment, or property. Such events would include a
fire, hurricane, criminal attack, or a medical emergency.”
• Facilities-focused, as opposed to business or IT-focused.
• Focused on safety and evacuation, and should describe specific safety drills, including evacuation drills
(also known as fire drills)
• Specific safety roles should be described, including safety warden and meeting point leader
Domain 7: Security Operations
Related Plans
Crisis Management Plan (CMP)
• Designed to provide coordination among the managers of the
organization in the event of an emergency or disruptive event
• Details the actions management must take to ensure that life and
safety of personnel and property are immediately protected in case of
a disaster
• Crisis Communications Plan
• Component of the Crisis Management Plan
• Sometimes called the communications plan
• A plan for communicating to staff and the public in the event of a disruptive event
Domain 7: Security Operations
Related Plans
• Crisis Communications Plan
• Call Trees
• Is used to quickly communicate news throughout an organization without
overburdening any specific person
• Works by assigning each employee a small number of other employees they are
responsible for calling in an emergency event
• Most effective when there is two-way reporting of successful communication
• Should contain alternate contact methods, in case the primary methods are
unavailable
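A call tree of this kind can be sketched in Python by giving each person a small, fixed number of people to call. The names, the fan-out of two, and both function names are hypothetical; a real call tree would also store alternate contact methods for each person.

```python
def build_call_tree(employees: list, fan_out: int = 3) -> dict:
    """Map each employee to the few colleagues they must call (first entry is the root)."""
    if fan_out < 1:
        raise ValueError("fan_out must be at least 1")
    tree = {}
    for index, name in enumerate(employees):
        first = index * fan_out + 1
        tree[name] = employees[first:first + fan_out]
    return tree


def unacknowledged(tree: dict, caller: str, acknowledged: set) -> list:
    """Two-way reporting: callees of `caller` who have not confirmed receipt."""
    return [name for name in tree[caller] if name not in acknowledged]


staff = ["Ana", "Ben", "Cy", "Di", "Eve", "Flo", "Gus"]
tree = build_call_tree(staff, fan_out=2)
# Ana calls Ben and Cy; Ben calls Di and Eve; Cy calls Flo and Gus.
still_waiting = unacknowledged(tree, "Ana", acknowledged={"Ben"})
```

Everyone except the root is called exactly once, so no single person is overburdened, and the acknowledgment check shows who still needs to be reached.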
Domain 7: Security Operations
Calling Tree
Domain 7: Security Operations
Related Plans
• Crisis Communications Plan
• Automated Call Trees
• Automatically contact all BCP/DRP team members after a disruptive event
• Tree can be activated by an authorized member, triggered by a phone call, email,
or Web transaction
• Once triggered, all BCP/DRP members are automatically contacted
• Can require positive verification of receipt of a message, such as “press 1 to
acknowledge receipt.”
• Automated call trees are hosted offsite, and typically supported by a third-party
BCP/DRP provider
Domain 7: Security Operations
Related Plans
• Crisis Communications Plan
• Emergency Operations Center (EOC)
• The command post established during or just after an emergency event
• Placement of the EOC will depend on resources that are available
• Vital Records
• Should be stored offsite, at a location and in a format that will allow access during
a disaster
• Have both electronic and hardcopy versions of all vital records
• Include contact information for all critical staff. Additional vital records include
licensing information, support contracts, service level agreements, reciprocal
agreements, telecom circuit IDs, etc.
Domain 7: Security Operations
Executive Succession Planning
• Organizations must ensure that there is always an executive
available to make decisions during a disaster
• A common mistake is allowing entire executive teams to be
offsite at distant meetings
• One of the simplest executive powers is the ability to endorse
checks and procure money.
Domain 7: Security Operations
Plan Approval
• Now that the initial BCP/DRP plan has been completed, senior
management approval is the required next step
• It is ultimately senior management's responsibility to protect an
organization's critical assets and personnel
• Senior management must understand that they are responsible
for the plan, fully understand the plan, take ownership of it, and
ensure its success.
Domain 7: Security Operations
Backups and availability (again…)
• In order to be able to successfully recover critical business operations,
the organization needs to be able to effectively and efficiently backup
and restore both systems and data
• Verification of recoverability from backups is often overlooked
• Critical backup media must be stored offsite
• Ensure that the organization can quickly procure large high-end tape
drives (if necessary)
• If the MTTR is greater than the MTD, then an alternate backup or
availability methodology must be employed
Domain 7: Security Operations
Backups and availability (again…)
Hardcopy Data
• Hardcopy data is any data that are accessed through reading or
writing on paper rather than processing through a computer
system.
• In weather-emergency-prone areas such as Florida, Mississippi,
and Louisiana, many businesses develop a “paper only” DRP,
which will allow them to operate key critical processes with just
hard copies of data, battery-operated calculators, and other small
electronics, as well as pens and pencils
Domain 7: Security Operations
Backups and availability (again…)
Electronic Backups
• Archives that are stored electronically
• Full Backups
• Every piece of data is copied and stored on the backup repository
• Time consuming, bandwidth intensive, and resource intensive
• Will ensure that any necessary data is available
• Incremental Backups
• Archive data that have changed since the last full or incremental backup
• Differential Backups
• Archive data that have changed since the last full backup
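The practical difference between incremental and differential backups shows up at restore time. The following Python sketch lists which backup sets would be needed for a restore; the labels and function name are hypothetical, and it assumes a schedule that uses either incrementals or differentials after each full backup, not a mix of both.

```python
def restore_chain(backups: list) -> list:
    """Return labels of the backups needed to restore, given (label, kind) pairs oldest first."""
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")
    last_full = full_positions[-1]
    since_full = backups[last_full + 1:]
    kinds = {kind for _, kind in since_full}
    if not kinds <= {"incremental", "differential"}:
        raise ValueError("unknown backup kind")
    if len(kinds) > 1:
        raise ValueError("this sketch assumes incrementals and differentials are not mixed")

    chain = [backups[last_full]]
    if kinds == {"incremental"}:
        chain.extend(since_full)       # every incremental since the full is required
    elif kinds == {"differential"}:
        chain.append(since_full[-1])   # only the most recent differential is required
    return [label for label, _ in chain]


incremental_week = [("Sun", "full"), ("Mon", "incremental"), ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"), ("Tue", "differential"), ("Wed", "differential")]
```

Incrementals are smaller to take but need every set since the full backup at restore time; differentials grow each day but need only two sets to restore.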
Domain 7: Security Operations
Backups and availability (again…)
Electronic Backups
• Archives that are stored electronically
• Electronic vaulting
• Batch process of electronically transmitting data that is to be backed up on a routine, regularly
scheduled time interval
• Used to transfer bulk information to an offsite facility
• Good tool for data that need to be backed up on a daily or possibly even hourly rate
• Stores sensitive data offsite
• Can perform the backup at very short intervals to ensure that the most recent data is backed up
• Occurs across the Internet in most cases (important that the information sent for backup be sent
via a secure communication channel and protected through a strong encryption protocol)
Domain 7: Security Operations
Backups and availability (again…)
Electronic Backups
• Archives that are stored electronically
• Remote Journaling
• A database journal contains a log of all database transactions
• May be used to recover from a database failure
• Remote Journaling saves the database checkpoints and database journal to a remote
site
• Database shadowing
• Uses two or more identical databases that are updated simultaneously
• Can exist locally, but it is best practice to host one shadow database offsite
• Allows faster recovery when compared with remote journaling
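Recovery from a remotely journaled database amounts to loading the last checkpoint and replaying the journal entries recorded after it. A minimal Python sketch follows, assuming a toy key-value "database" and journal entries of the form (sequence number, key, new value); none of these names correspond to a real database product.

```python
def recover_from_journal(checkpoint: dict, checkpoint_seq: int, journal: list) -> dict:
    """Rebuild state by replaying journal entries newer than the checkpoint, in order."""
    state = dict(checkpoint)  # copy so the stored checkpoint is left untouched
    for seq, key, value in sorted(journal, key=lambda entry: entry[0]):
        if seq > checkpoint_seq:
            state[key] = value
    return state


# Hypothetical data held at the remote site.
checkpoint = {"acct-1": 100, "acct-2": 50}   # taken after transaction 10
journal = [(9, "acct-1", 100), (11, "acct-1", 80), (12, "acct-3", 20)]
recovered = recover_from_journal(checkpoint, checkpoint_seq=10, journal=journal)
```

Replaying the journal takes time proportional to the number of transactions since the checkpoint, which is one reason a shadow database that is already current can be brought into service faster.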
Domain 7: Security Operations
Software Escrow
• Allows an organization to maintain the availability of its applications even
if the vendor that originally developed the software goes out of business
• A neutral third party holds the source code
• Should the development organization go out of business or
otherwise violate the terms of the software escrow agreement,
then the third party holding the escrow will provide the source
code and any other information to the purchasing organization.
Domain 7: Security Operations
DRP testing, training, and awareness
• Skipping these steps is one of the most common BCP/DRP mistakes
• A DRP is never complete, but is rather a continually amended method
for ensuring the ability for the organization to recover in an acceptable
manner
• Testing is used to find and correct mistakes in the plan
• An effective DRP will involve some inherently complex operations and
maneuvers to be performed by administrators
• Each member of the DRP team should be exceedingly familiar with the
particulars of their role in the DRP
Domain 7: Security Operations
DRP Testing
• In order to ensure that a Disaster Recovery Plan represents a viable
plan for recovery, thorough testing is needed
• Routine infrastructure, hardware, software, and configuration changes
materially alter the way in which the DRP needs to be carried out
• To ensure both the initial and continued efficacy of the DRP as a feasible
recovery methodology, testing needs to be performed.
• Different types of tests
• At a minimum, regardless of the type of test selected, tests should be
performed on an annual basis
Domain 7: Security Operations
DRP Testing
DRP Review
• Most basic form of DRP testing
• Focused on simply reading the DRP in its entirety to ensure completeness of coverage
• Typically performed by the team that developed the plan, and will involve team members
reading the plan in its entirety to quickly review the overall plan for any obvious flaws
Checklist
• Also known as consistency testing
• Lists all necessary components required for successful recovery, and ensures that they are, or
will be, readily available should a disaster occur
• Often performed concurrently with the structured walkthrough or tabletop testing as a first
testing threshold
• Focused on ensuring that the organization has, or can acquire in a timely fashion, sufficient
resources on which their successful recovery is dependent
Domain 7: Security Operations
DRP Testing
Parallel Processing
• Common in environments where transactional data is a key component of the
critical business processing
• Typically involves recovering critical processing components at an alternate
computing facility and restoring data from a previous backup
• Regular production systems are not interrupted
• Transactions from the day after the backup are then run against the newly
restored data, and the same results achieved during normal operations for
the date in question should be mirrored by the recovery system's results
• Organizations that are highly dependent upon mainframe and midrange
systems will often employ this type of test.
Domain 7: Security Operations
DRP Testing
Partial and Complete Business Interruption
• This type of test can actually be the cause of a disaster, so
extreme caution should be exercised before attempting an
actual interruption test
• Testing will include having the organization stop processing
normal business at the primary location, and instead leverage
the alternate computing facility
• More common in organizations where fully redundant, load-balanced
operations exist
Domain 7: Security Operations
Training
• An element of DRP training comes as part of performing the tests
• More detailed training on some specific elements of the DRP process may be
required.
Starting Emergency Power
• Converting a datacenter to emergency power, such as backup generators
• Specific training and testing of changing over to emergency power should be
regularly performed.
Calling Tree Training/Test
• Individuals with calling responsibilities are expected to be able to answer
within a very short time period, or otherwise make arrangements.
Domain 7: Security Operations
Awareness
Even for those members who have little active role with
respect to the overall recovery process, there is still the
matter of ensuring that all members of an organization are
aware of the organization's prioritization of safety and
business viability in the wake of a disaster.
Domain 7: Security Operations
Continued BCP/DRP maintenance
• The BCP/DRP must be kept up to date
• BCP/DRP plans must keep pace with all critical business and IT changes.
Change Management
• The Change Management process is designed to ensure that security is not adversely
affected as systems are introduced, changed, and updated.
• Includes tracking and documenting all planned changes, formal approval for
substantial changes, and documentation of the results of the completed change
• All changes must be auditable
• The change control board manages this process
• The BCP team should be a member of the change control board, and attend all
meetings to identify any changes that must be addressed by the BCP/DRP plan
Domain 7: Security Operations
BCP/DRP Mistakes
Common BCP/DRP mistakes include:
• Lack of management support
• Lack of business unit involvement
• Lack of prioritization among critical staff
• Improper (often overly narrow) scope
• Inadequate telecommunications management
• Inadequate supply chain management
• Incomplete or inadequate crisis management plan
• Lack of testing
• Lack of training and awareness
• Failure to keep the BCP/DRP plan up to date
Domain 7: Security Operations
Specific BCP/DRP frameworks
A handful of specific frameworks include NIST SP 800-34,
ISO/IEC-27031, BS-25999, and BCI.
NIST SP 800-34
• The National Institute of Standards and Technology (NIST)
Special Publication 800-34 “Contingency Planning Guide for
Information Technology Systems”
• May be downloaded at
http://csrc.nist.gov/publications/nistpubs/800-34/sp800-34.pdf.
Domain 7: Security Operations
Specific BCP/DRP frameworks
ISO/IEC-27031
• Draft guideline that is part of the ISO 27000 series, which also includes ISO 27001 and ISO 27002
• Focuses on BCP (DRP is handled by another framework)
• The current formal name is “ISO/IEC 27031 Information technology—Security techniques—Guidelines for ICT Readiness
for Business Continuity (final committee draft).” According to http://www.iso27001security.com/html/27031.html,
ISO/IEC 27031 is designed to:
• “Provide a framework (methods and processes) for any organization—private, governmental, and nongovernmental;
• Identify and specify all relevant aspects including performance criteria, design, and implementation details, for improving ICT readiness as
part of the organization's ISMS, helping to ensure business continuity;
• Enable an organization to measure its continuity, security and hence readiness to survive a disaster in a consistent and recognized manner.”
• Terms and acronyms used by ISO/IEC 27031 include:
• ICT—Information and Communications Technology
• ISMS—Information Security Management System
• A separate ISO plan for disaster recovery is ISO/IEC 24762:2008, “Information technology—Security techniques—
Guidelines for information and communications technology disaster recovery services.” More information is available at
http://www.iso.org/iso/catalogue_detail.htm?csnumber=41532
Domain 7: Security Operations
Specific BCP/DRP frameworks
BS-25999
• British Standards Institution (BSI, http://www.bsigroup.co.uk/) released BS-25999, which is in two parts:
• “Part 1, the Code of Practice, provides business continuity management best practice recommendations. Please note that this is a guidance
document only.
• Part 2, the Specification, provides the requirements for a Business Continuity Management System (BCMS) based on BCM best practice.
This is the part of the standard that you can use to demonstrate compliance via an auditing and certification process.”
BCI
• The Business Continuity Institute (BCI, http://www.thebci.org/) published its six-step Good Practice Guidelines (GPG) in
2008; the latest version (2013) describes the Business Continuity Management (BCM) process:
• Management Practices
• PP1 Policy & Program Management
• PP2 Embedding Business Continuity
• Technical Practices
• PP3 Analysis
• PP4 Design
• PP5 Implementation
• PP6 Validation
Domain 7: Security Operations
Thank you.