WP Creating Maintaining Soc PDF
This white paper was written by: McAfee® Foundstone® Professional Services

Table of Contents
Introduction
Define the Security Operations Center
Determine the Processes
  Required templates
  Reporting process
Understand the Environment
Developing Use Cases
Identify the Customer
Staff the SOC
  Staffing schedule
  Holiday coverage
  Shift logs, incident logs, and turnover
Event Management
  Incident assignment, update, and escalation
  Security severity
  Incident and event categorization
  Incident resolution and escalation procedures
  Third-party resolution and escalation procedures
  Incident escalation contact list
  Escalation guidelines
  Tier functional responsibilities
Leveraging the IT Infrastructure Library (ITIL) Service Management Lifecycle
Conclusion
References
About McAfee Foundstone Professional Services
Introduction
Security is becoming increasingly established in the corporate structure; it is no longer
acceptable for security to be a secondary function of an IT department. To address this challenge,
organizations are investing in security operations centers (SOCs) that provide increased security
and rapid response to events throughout their networks. Building a SOC can be a monumental task.
Although the finer points of SOC deployment are very much network-specific, every organization must
include three major components: people, process, and technology. These three components exist in
all elements of security and should be considered equally critical. This paper explains how strong
people and well-defined processes can result in an operationally effective SOC.
Proper planning is critical in the development and implementation phases. As with many security
programs, an iterative process is most effective for developing a refined set of procedures. This
approach allows an organization to recognize benefits from its investment more quickly, positioning
it to take advantage of knowledge gained and lessons learned through the actual operation of the
SOC. It is important to set appropriate expectations and timelines for the SOC deployment so that
the initial operational period is viewed as a period for refinement.
Define the Security Operations Center
This manual will serve as an ongoing reference for SOC staff and management. The SOC definition
statement should be clear and specific, as in the following example statement:
“The SOC is responsible for monitoring, detecting, and isolating incidents and the
management of the organization’s security products, network devices, end-user devices,
and systems. This function is performed seven days a week, 24 hours per day. The SOC is
the primary location of the staff and the systems dedicated for this function.”
The above example may not be comprehensive for some organizations and should be expanded
upon with more specific details based on your organization’s mission and objectives. Once the
responsibility definition has been documented, a list of service functions for the SOC must be
defined. These may include:
Service Functions
• Status Monitoring and Incident Detection.
  – SIEM Console.
  – AV Console.
  – IPS Console.
  – DLP Console.
• Initial Diagnostics and Incident Isolation.
• Problem Correction.
• Security Systems and Software.
  – Update and test DAT definitions.
  – Apply corrective IDS/IPS and Firewall Rules.
  – Apply other corrective software as instructed or required.
• Computing Equipment and Endpoint Devices.
  – Remote administration.
  – Update antivirus.
  – Tune HIPS alerts.
  – Configure whitelisting.
• Work with Third-Party Vendors.
• Escalation to Next Tier Level.
• Closure of Incidents.
  – Coordination with tier levels.
  – Coordination with end users and system administrators.
• Persistent Threat Investigation.
Once defined, the service functions will guide the daily processes and procedures of the SOC staff.
Each tier within the SOC can then be assigned a set of responsibilities based on the expertise of
the individuals at that tier level. For example, monitoring the antivirus (AV) and security
information and event management (SIEM) consoles may be a service function of every tier, whereas
working with third-party vendors may be reserved for tier 2 or tier 3 SOC staff. With the service
functions in place, a series of documents must be developed to ensure that the appropriate
information is gathered during an event or incident and that all SOC staff work consistently.
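The tier assignment described above can be captured as a simple lookup table. The sketch below is a minimal Python illustration, assuming hypothetical function names (`monitor_av_console`, `work_with_third_party_vendors`, and so on); only the two examples from the text are encoded, and a real catalogue would list every service function.

```python
# Hypothetical mapping of SOC service functions to the tiers assigned to them.
# Only the two examples from the text are encoded; extend it from your own catalogue.
SERVICE_FUNCTION_TIERS: dict[str, set[int]] = {
    "monitor_av_console": {1, 2, 3},          # every tier
    "monitor_siem_console": {1, 2, 3},        # every tier
    "work_with_third_party_vendors": {2, 3},  # reserved for tier 2 or tier 3
}


def can_perform(tier: int, function: str) -> bool:
    """Return True if staff at `tier` are assigned `function`."""
    # Unknown functions are treated as unassigned rather than allowed by default.
    return tier in SERVICE_FUNCTION_TIERS.get(function, set())


def functions_for_tier(tier: int) -> list[str]:
    """List the service functions assigned to a tier, sorted for stable output."""
    return sorted(name for name, tiers in SERVICE_FUNCTION_TIERS.items() if tier in tiers)
```

For example, `functions_for_tier(1)` lists only the two console-monitoring functions, and `can_perform(1, "work_with_third_party_vendors")` is False. Defaulting unknown functions to "not assigned" keeps gaps in the catalogue visible instead of silently granting access.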
Many of the procedures for the service functions listed above may need to be customized for the
technology in use. For example, a monitoring procedure for McAfee® Enterprise Security Manager, the
Intel® Security SIEM solution, would be very different from the monitoring procedure for another
vendor's SIEM product, although the two share some characteristics.
Required templates
A series of baseline templates should be created to maintain documentation consistency by
establishing the same format and basic information sets across policy and procedure documents.
For example, templates for data input into ticketing systems and the governance, risk, and
compliance (GRC) system will need to be developed to help ensure that the appropriate technical
information is gathered. A few key templates are:
• Shift log templates for each use case.
• Templates for each incident trouble ticket category.
Reporting process
Regular reporting is a primary SOC function: reports must be generated and provided to different
audiences within the organization. Typically, a weekly incident report detailing the activity within
the SOC is prepared and delivered to management and other members of the core escalation contact
list.
The SOC manager should review all incident records regularly to ensure they were resolved within
the parameters of the defined severity levels. The manager should also audit incident records
that have exceeded standard resolution times to validate that the incident records were handled
appropriately. The SOC processes and procedures should be reviewed regularly and updated based
on the report data reviews and audits. In addition, many other reports can be created depending on
the type of data received or requested by management.
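One way to support the manager's audit is a small script that flags incident records whose resolution time exceeded the goal for their severity. This is a sketch under stated assumptions: the `IncidentRecord` fields and record IDs are hypothetical, and only the Severity 1 goal (1 hour, from the severity example later in this paper) is filled in; the other goals must come from your own severity definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Resolution goals per severity level. Only Severity 1 ("1 hour to immediate")
# comes from this paper's severity example; add your organization's other goals.
RESOLUTION_GOALS: dict[int, timedelta] = {1: timedelta(hours=1)}


@dataclass
class IncidentRecord:
    record_id: str                # hypothetical incident record identifier
    severity: int
    opened: datetime
    resolved: Optional[datetime]  # None while the incident is still open


def exceeded_goal(records, now, goals=RESOLUTION_GOALS):
    """Return IDs of records whose resolution time exceeded their severity's goal.

    Open records are measured against `now`. Records whose severity has no
    defined goal are skipped, since there is nothing to audit them against.
    """
    flagged = []
    for rec in records:
        goal = goals.get(rec.severity)
        if goal is None:
            continue
        end = rec.resolved if rec.resolved is not None else now
        if end - rec.opened > goal:
            flagged.append(rec.record_id)
    return flagged
```

The flagged IDs are the records the manager would pull for review; open records are included so that an incident still running past its goal is caught before it closes.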
For a detailed list of reports, refer to “Operationalizing Information Security: Putting the Top 10
SIEM Best Practices to Work” by Scott Gordon in the References section. That list includes other
key reports for measuring staff, such as:
• Shift log metrics.
• Trouble ticket metrics.
Understand the Environment
As part of defining the SOC's service functions, the security architecture will be defined, and the
SOC staff will have access to the different components and tools within that architecture. These
may include, but are not limited to:
• SIEM monitoring and correlation.
• Antivirus monitoring and logging.
• Network and host IDS/IPS monitoring and logging.
• Network and host DLP monitoring and logging.
• Centralized logging platforms (syslog, etc.).
• Email and spam gateways and filtering.
Use Cases
Use case development is a critical component of a SOC and must be well understood. The following
two write-ups can help explain the process for creating use cases, as well as additional reporting
that can be defined for the SOC environment:
• “SIEM Use Cases: What You Need to Know” by msonomer.
• “Operationalizing Information Security: Putting the Top 10 SIEM Best Practices to Work”
  by Scott Gordon.
See also the References section for more details.
[Figure: Customer inbound process (1. Phone, 2. Email) through the Helpdesk to SOC Level 1, with escalation to the SOC Level 1 Supervisor and SOC Level 2, and further escalation (1. Forensics, 2. 3rd Party) to SOC Level 3.]
Staff the SOC
Finding the right skills and hiring staff is currently difficult because the number of security
professionals in the market is limited. The security staff within the SOC must have a solid
background in many aspects of computer technology, usually focusing on networks, applications, and,
in some cases, reverse engineering. In addition, a good manager or director is required to ensure
that documentation, optimization, and reporting are maintained appropriately. Typical roles within a
SOC may include:
• Security Analyst.
• Security Specialist.
• Forensics or Threat Investigator.
• Manager or Director.
Staffing schedule
When setting up a SOC, ensuring appropriate coverage is critical. Some SOCs support 24/7
operations; others provide limited remote support after certain hours. The following tables are a
partial representation of the staffing hours for an eight-week period. Each SOC engineer is assigned
according to the shift schedule for that period. Engineers are identified by shift letter: A denotes
the morning shift in the SOC and the after-hours shift Monday through Friday, and B denotes the
afternoon shift in the NOC and the pager shift over the weekends.
Creating and Maintaining a SOC 7
White Paper
Staff level          Week:  1   2   3   4   5   6   7   8
Manager                     M
SOC Engineer (SE)           A   A   A   A   A   A   A   A
SOC Engineer (SE)           B   B   B   B   B   B   B   B

Time slot      Mon  Tue  Wed  Thu  Fri  Sat  Sun
00:00–00:30    A    A    A    A    A    B    B
…
06:00–06:30    A    A    A    A    A    B    B
06:30–07:00    A    A    A    A    A    B    B
07:00–07:30    AB   AB   AB   AB   AB   B    B
07:30–08:00    B    B    B    B    B    B    B
…
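The schedule above can be checked mechanically for coverage gaps. The sketch below encodes only the 00:00–08:00 rows shown in the table (it does not guess the elided rows), assumes the seven columns run Monday through Sunday, and reports the shift letters on duty for each half-hour slot, so the A-to-B handover at 07:00–07:30 appears as "AB". The shift tuples are an illustrative representation, not a prescribed format.

```python
# Each entry: (shift label, days covered, start "HH:MM", end "HH:MM").
# Days are 0=Mon .. 6=Sun. Only the 00:00-08:00 rows visible in the table are
# encoded; the elided rows after 08:00 are not assumed here.
WEEKDAYS = range(0, 5)
WEEKEND = range(5, 7)
SHIFTS = [
    ("A", WEEKDAYS, "00:00", "07:30"),
    ("B", WEEKDAYS, "07:00", "08:00"),
    ("B", WEEKEND, "00:00", "08:00"),
]
SLOT_MINUTES = 30


def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def coverage(shifts, window_start="00:00", window_end="08:00"):
    """Map (day, slot start minute) to the sorted shift labels covering that slot."""
    start, end = to_minutes(window_start), to_minutes(window_end)
    result = {}
    for day in range(7):
        for slot in range(start, end, SLOT_MINUTES):
            labels = {
                label
                for label, days, s, e in shifts
                if day in days and to_minutes(s) <= slot < to_minutes(e)
            }
            result[(day, slot)] = "".join(sorted(labels))
    return result


def gaps(shifts, **window):
    """Return the (day, slot) pairs that no shift covers."""
    return [key for key, labels in coverage(shifts, **window).items() if not labels]
```

With the shifts as encoded, `gaps(SHIFTS)` is empty; removing shift A leaves the weekday 00:00–07:00 slots uncovered, which is exactly the kind of hole a schedule review should surface.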
Holiday coverage
One item typically overlooked is holiday coverage. In most cases, holidays should be treated as
normal business days. There should be dedicated staff in the SOC for the given shift as described in
the organization’s staffing schedule. All responsibilities regarding standard shift schedules should
also be in effect.
Some specific shift log procedures that are often overlooked are:
• Entries in the shift log are mandatory for each shift; a “blank” entry is not acceptable.
• If there is no activity and there are no open problems to turn over, put an entry in the log that
  says “No incidents to turn over.”
Shift log entries should use a defined format that includes the following:
• Details of the event.
• Impact of the threat to the organization or asset.
• Description of the items found while researching the event during the investigation.
• Recommendations for the next analyst who might take over the incident.
If possible, shift logs should be maintained in a secure system with role-based access control, such
as a GRC system. A typical shift log entry is shown below.
Details:
The SOC has detected traffic from <source IP / hostname> to <destination IPs> over <ports>.
Information gathered indicates the asset is infected with malware. The traffic is being
reported by <device detecting traffic>.
Impact:
Malware is performing a remote call back, possibly leaking data or expanding its presence in
the network.
Description:
<Detailed observations of the pattern and activity>.
Recommendations:
Locate the asset at the source IP and contain the device. If no signs of malware are found,
determine the cause of the detected event and remediate. If signs of malware are found, perform the
required antivirus updates and/or forensics on the machine. Remediate or clean the system before
reconnecting it to the network.
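The shift log rules above (mandatory entries, no blank sections, an explicit "No incidents to turn over." line when nothing is open) can be enforced at the point of entry. The following is a minimal sketch, assuming a hypothetical `ShiftLogEntry` type with the four sections of the defined format; it is not tied to any particular GRC product.

```python
from dataclasses import dataclass

NO_ACTIVITY = "No incidents to turn over."


@dataclass
class ShiftLogEntry:
    # The four parts of the defined shift log format described above.
    details: str
    impact: str
    description: str
    recommendations: str

    def render(self) -> str:
        sections = {
            "Details": self.details,
            "Impact": self.impact,
            "Description": self.description,
            "Recommendations": self.recommendations,
        }
        missing = [name for name, text in sections.items() if not text.strip()]
        if missing:
            # Blank sections are rejected, mirroring the "no blank entries" rule.
            raise ValueError("incomplete shift log entry, missing: " + ", ".join(missing))
        return "\n".join(f"{name}:\n{text.strip()}" for name, text in sections.items())


def shift_log(entries: list[ShiftLogEntry]) -> str:
    """Render a shift's log; a shift with nothing to report still gets an explicit entry."""
    if not entries:
        return NO_ACTIVITY
    return "\n\n".join(entry.render() for entry in entries)
```

Rejecting incomplete entries, rather than silently filling them, keeps the next analyst from inheriting an incident with no recommendations attached.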
In addition to shift logs, incident log entries should be kept. Although incidents should be
maintained in a ticketing system, daily log entries should be used to hand incidents over between
shifts. This log should follow a defined format that includes the following information: time stamp,
staff initials, the incident record number, and a brief description of the incident or event. An
example of a typical incident log entry is below.
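As an illustration of the four-field format described above (not a reproduction of the original example), the following sketch formats one incident log line. The pipe separator, the timestamp layout, and the record ID `IR-0042` are hypothetical choices.

```python
from datetime import datetime


def incident_log_line(ts: datetime, initials: str, record_id: str, summary: str) -> str:
    """Format one incident log entry: time stamp, staff initials, record number, description."""
    if not initials.strip() or not record_id.strip():
        raise ValueError("staff initials and incident record number are required")
    # A year-first timestamp keeps entries sortable across shifts and days.
    return f"{ts:%Y-%m-%d %H:%M} | {initials.strip().upper()} | {record_id.strip()} | {summary.strip()}"
```

For example, `incident_log_line(datetime(2013, 6, 3, 7, 15), "jd", "IR-0042", "Malware call-back")` produces `2013-06-03 07:15 | JD | IR-0042 | Malware call-back`. Requiring the record number ties every log line back to the ticketing system.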
Event Management
The core function and technology of a SOC are based on events from hundreds or even thousands of
different systems. Essentially, the SOC is the correlation point for every monitored event logged
within the organization. For each of these events, the SOC must decide how it will be managed and
acted upon. Event management must include a list of instructions that apply on a 24x7 basis; this
list does not necessarily have to be the Incident Response Program Guide or Handbook. An event is
anything that comes into the SOC and is monitored, whereas an incident is an event that must be
acted upon.
As a part of event management, the SOC provides telephone and email assistance to its customers
covering some of the following areas:
• Malware outbreak.
• Phishing attacks.
• Social engineering calls.
• Access to the organization’s security portal.
• Data leak/loss incidents.
• Customer account lockout.
• Customer inquiries.
Defining guidelines for level 1 SOC support is also important. These may include:
• Open an incident ticket for any problems noticed and reported.
• Serve as the initial point of contact for customers on the organization’s network.
• Maintain daily shift logs.
• Perform rudimentary testing and diagnosis.
• Validate that the incident is not a user error.
• Formally assign the incident to the SOC.
Priority     Description                                                                   Response time
Priority 1   Multiple systems and devices affected/compromised or possible data breach.   Within 10 minutes
Priority should not be confused with severity, which is explained below. Priority is the required
level of response time, identified based on the extent of the impact when the incident ticket is
created or updated.
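Because priority governs response time (not resolution), a response-time check can be kept separate from any severity logic. The sketch below assumes hypothetical ticket timestamps and encodes only the Priority 1 target from the table above; the targets for other priorities must come from your own priority table.

```python
from datetime import datetime, timedelta
from typing import Optional

# Response-time targets by priority. Only Priority 1 ("within 10 minutes") comes
# from the example above; other priorities must come from your own priority table.
RESPONSE_TARGETS: dict[int, timedelta] = {1: timedelta(minutes=10)}


def response_breached(
    priority: int,
    created: datetime,
    first_response: Optional[datetime],
    now: datetime,
) -> Optional[bool]:
    """Return True if the response target was missed, False if it has not been
    missed (yet), or None if no target is defined for this priority."""
    target = RESPONSE_TARGETS.get(priority)
    if target is None:
        return None
    # An unanswered ticket is measured against the current time.
    responded_at = first_response if first_response is not None else now
    return responded_at - created > target
```

Returning None for an undefined priority, rather than False, makes a missing entry in the priority table visible instead of reporting it as "on time".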
Security severity
Providing clear and adequate details on severity levels is required for all levels of the SOC and its
customers. Typically four or five severity levels are used. Organizations will want to be very specific
in defining the different levels. Below is an example.
[Table: example severity levels, with columns Severity and Explanation.]
In addition, each severity must be expanded upon. For example, Severity Level 1 may be
described as:
SEVERITY 1: HIGH
• Complete compromise of a system component and possible full data-privacy breach.
• Critical impact to the organization (reputational).
• Attack possibly still in progress.
• Multiple systems, groups, and users affected.
• Resolution Goal: 1 hour to immediate.
• Immediate manager notification when the incident record is created.
Severity level 2 (Medium), level 3 (Low), etc. should also be defined in a similar manner.
For high-priority incidents, the SOC should have a defined distribution list for sending the
assigned incident record ID and the problem resolution.
If an issue is not resolved at the first tier, it must be escalated to the next tier, and the SOC
must have documented procedures in place to address such escalations. For example, if an issue is
escalated to tier 2, the procedure may dictate something like the following:
As the initial Incident Record Owner, the Level 1 SOC engineer evaluates the problem and determines
whether he or she can resolve the issue.
If the Level 1 SOC engineer can resolve the Incident Record, he or she:
• Defines the incident in specific terms.
• Gathers additional facts necessary for troubleshooting and resolving the issue(s).
• Considers possible causes or options and creates an action plan.
• Implements the action plan and observes the results.
• Iterates these steps until the issue is resolved or requires Level 2 SOC assistance.
If the Level 1 SOC engineer cannot resolve the Incident Record, he or she determines whether
assistance from another Level 1 SOC professional or from Level 2 is required.
If field security support is required, the SOC professional follows the escalation process,
referring to the documented escalation procedures to dispatch an on-site security analyst.
If Level 2 assistance is required, the SOC technician assigns the Incident Record to the Level 2
group responsible for resolving the problem, and then refers to escalation procedures to notify
the appropriate Level 2 security professional.
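The Level 1 owner's routing decision described above can be summarized as a small function. This is a sketch, not a prescribed procedure: the `Route` names are hypothetical, and the order in which field support and peer assistance are checked is an assumption, since the text does not rank them.

```python
from enum import Enum


class Route(Enum):
    # Hypothetical outcomes of the Level 1 owner's evaluation.
    RESOLVE_AT_LEVEL_1 = "Level 1 owner works the action plan"
    ANOTHER_LEVEL_1 = "Request help from another Level 1 engineer"
    FIELD_DISPATCH = "Escalate to dispatch an on-site security analyst"
    ASSIGN_TO_LEVEL_2 = "Assign the Incident Record to the responsible Level 2 group"


def route_incident(can_resolve: bool, peer_can_help: bool, needs_field_support: bool) -> Route:
    """Sketch of the Level 1 owner's decision; the check order is an assumption."""
    if can_resolve:
        return Route.RESOLVE_AT_LEVEL_1
    if needs_field_support:
        return Route.FIELD_DISPATCH
    if peer_can_help:
        return Route.ANOTHER_LEVEL_1
    # Nothing else applies: hand off to Level 2 and notify per escalation procedures.
    return Route.ASSIGN_TO_LEVEL_2
```

Writing the decision down this explicitly makes the order of the checks a documented choice that the SOC can review, rather than something each engineer decides differently.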
Additional escalation procedures must also be in place. The SOC must have clearly defined
procedures for the escalation tier that address, at a minimum:
• Resources to assist with the resolution of incidents.
• Review of open incident records.
• Status updates.
• No response from the customer (as noted earlier, the customer is defined as part of the SOC
  services and in many cases may be the end user or system administrator).
• Adding notes to the incident record.
• Additional escalations.
• Incident record closure.
• High-priority / high-severity handling.
• Lack of resolution.
In addition, a detailed step-by-step process needs to be documented for each level in the SOC so
that analysts know exactly what information is required, whom to contact, and how to deliver the
known information quickly and accurately. Below is an example of this escalation process:
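[Figure: Escalation process flowchart. An event escalated from the Helpdesk (1. Hack, 2. Malware) goes to SOC Level 2, which identifies the host and incident information; yes/no decision points lead to document, update, and notification steps, to resolving the incident, or to the escalation process with notification and forensics.]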
Escalation guidelines
The process of correcting incidents requires that detection, isolation, circumvention, and resolution
disciplines be established and practiced by all levels of the SOC. This process can and should be
mapped to the phases of the incident response plan, where applicable. A structured progression of
recommended actions is required to direct individuals to perform meaningful analysis and take
appropriate actions while troubleshooting. The SOC staff must also have guidelines for referring
incidents they cannot resolve to the proper specialists. These can be organized in a simple table
format, as shown in the high-level example below.
Notice the phases of the incident resolution process evolve from left to right and from Level 1 to
Level 2. When activities at one skill level have been exhausted on an incident, the incident should be
escalated to the next skill level for further action.
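The left-to-right, level-by-level progression can be sketched as a walk over an escalation table. The phase names below come from the text; the activities and their assignment to levels are hypothetical placeholders, since the original table is not reproduced here.

```python
# Phase names come from the text; the activities and their assignment to
# levels are hypothetical placeholders for your own escalation table.
PHASES = ("detection", "isolation", "circumvention", "resolution")
ESCALATION_TABLE: dict[int, dict[str, list[str]]] = {
    1: {"detection": ["review console alert"], "isolation": ["contain endpoint"]},
    2: {
        "isolation": ["analyze network traffic"],
        "circumvention": ["apply corrective IPS rule"],
        "resolution": ["remediate host"],
    },
}


def next_step(level: int, completed: set[str]):
    """Return (level, phase, activity) for the next activity not yet performed.

    Phases are walked left to right within a level; only when every activity at
    the current level is exhausted does the search move to the next skill level.
    Returns None when nothing is left, meaning the incident should be referred
    to the proper specialists.
    """
    for lvl in sorted(k for k in ESCALATION_TABLE if k >= level):
        for phase in PHASES:
            for activity in ESCALATION_TABLE[lvl].get(phase, []):
                if activity not in completed:
                    return lvl, phase, activity
    return None
```

A None result corresponds to the guideline for referring incidents that cannot be resolved to the proper specialists.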
Leveraging the IT Infrastructure Library (ITIL) Service Management Lifecycle
Service Design: During this stage of the ITIL framework, it is important that the SOC has analyzed
and documented all the business requirements. This enables the SOC to provide value to the
business and to align its strategies and objectives with those of the organization. It also enables
the SOC to define key performance indicators (KPIs) that can be leveraged to design services in
accordance with the business requirements.
As noted earlier, the service catalogue (the “Service Functions”) must be defined. For each of the
SOC's core functions, service level agreements (SLAs) will need to be clearly defined with
management. Typically, the business should drive the SLAs. Other key considerations to address are
personnel management of the SOC and continuity of operations.
Service Transition: The important items to consider in this stage are changes to the
infrastructure. The SOC must be made aware of changes implemented across the enterprise;
otherwise, correctly configured monitoring systems will raise alarms on those changes and create
unnecessary work. The SOC may also perform specific services for which it is responsible for
changes. As a result, tight integration between the SOC and change management is required.
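One way to realize this integration is to check each alarm against approved change windows before opening new work. The sketch below assumes a hypothetical `ChangeWindow` record exported from the change management system (the change ID, host, and window fields are illustrative); alarms inside a window are still recorded but linked to the change.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeWindow:
    change_id: str  # hypothetical change-management ticket ID
    host: str
    start: datetime
    end: datetime


def triage_alarm(host: str, at: datetime, changes: list[ChangeWindow]) -> str:
    """Tag alarms that fall inside an approved change window instead of opening new work.

    The alarm is still recorded (linked to the change) so the SOC can confirm it
    was expected; only alarms outside any window are raised as new events.
    """
    for change in changes:
        if change.host == host and change.start <= at <= change.end:
            return f"expected: linked to {change.change_id}"
    return "new event"
```

Linking rather than suppressing keeps an audit trail: if an "expected" alarm turns out to be malicious activity hiding behind a change window, the record is still there.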
Service Operations: The service operations were defined earlier; they mostly describe how event and
incident management is conducted for the business. Several items must be in place for service
operations, including:
• Trend analysis.
• Tracking of remediation items.
• Reporting to the organization on SOC activities.
• Classification of issues.
• Software license compliance.
• Tracking and inventory of assets.
Conclusion
As security becomes integrated into an organization's processes and matures over time, many
organizations will choose to implement some type of security operations center. A SOC can provide
significant value as long as proper planning occurs and sound processes have been created.
Hopefully this document has provided insight for those either embarking on a new SOC or looking to
improve their current operations. With a well-managed operation and well-trained employees, an
organization can be confident that its customers are satisfied with the quality of service and trust
its response to security events.
References
ITIL; http://www.itil-officialsite.com/
Chairman of the Joint Chiefs of Staff Manual; CJCSM 6510.01B, 10 July 2012; http://www.dtic.mil/cjcs_directives/cdata/unlimit/m651001.pdf
Operationalizing Information Security: Putting the Top 10 SIEM Best Practices to Work by Scott Gordon; http://www.eslared.org.ve/walcs/walc2012/material/track4/Monitoreo/Top_10_SIEM_Best_Practices.pdf
McAfee. Part of Intel Security.
2821 Mission College Boulevard
Santa Clara, CA 95054
888 847 8766
www.intelsecurity.com

Intel and the Intel logo are registered trademarks of the Intel Corporation in the US and/or other countries. McAfee, the McAfee logo, and Foundstone are registered trademarks or trademarks of McAfee, Inc. or its subsidiaries in the US and other countries. Other marks and brands may be claimed as the property of others. The product plans, specifications and descriptions herein are provided for information only and subject to change without notice, and are provided without warranty of any kind, express or implied. Copyright © 2013 McAfee, Inc. 60059wp_creating-soc_0613B_ETMG