
Problem Management Process and Procedure

Fermilab Computing Division-PM-1.0

Problem Management
Fermilab Process and Procedure

Prepared for: Fermi National Laboratory
June 12, 2009

Page 1 of 41
Fermi National Accelerator Lab Private / Proprietary
Copyright 2009 All Rights Reserved


GENERAL
Description

This document establishes a Problem Management (PM) process and procedures for the Fermilab Computing Division. Adoption and implementation of this process and its supporting procedures ensures the timely recovery of services and minimizes the adverse impact on business operations.

Purpose

The purpose of this process is to establish a Problem Management (PM) process for the Fermilab Computing Division. Adoption and implementation of this process provides a structured method to seek and establish the root cause of incidents and to initiate actions to improve or correct the situation. This minimizes the adverse impact on the operational ability of the business due to incidents and problems caused by errors within the IT infrastructure.

Applicable to: Problem Management process in support of the ISO20000 initiative.

Supersedes: N/A

Document Owner: Problem Manager

Owner Org: Computing Division

Effective Dates: 07-01-2009 to 07-01-2010

Revision Date: 06-14-2009

VERSION HISTORY
Version: 1.0
Date: 6/14/2009
Author(s): Gerald Guglielmo, Problem Coordinators, David Whitten Plexent LLP
Change Summary: Initial document

TABLE OF CONTENTS
PROBLEM MANAGEMENT GOAL, BENEFITS
PROBLEM MANAGEMENT
PROCESS CONTEXT DIAGRAM INTERFACING PROCESS FLOW
PROBLEM MANAGEMENT PROCESS FLOW
16 PROBLEM MANAGEMENT PROCESS ROLES AND RESPONSIBILITIES
16 PROBLEM MANAGEMENT RACI MATRIX
16.1 PROACTIVE PROBLEM MANAGEMENT PROCEDURE
16.1 PROACTIVE PROBLEM MANAGEMENT BUSINESS PROCEDURE RULES
16.1 PROACTIVE PROBLEM MANAGEMENT PROCEDURE NARRATIVE
16.1 PROACTIVE PROBLEM MANAGEMENT RISKS
16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING PROCEDURE
16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING BUSINESS PROCEDURE RULES
16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING PROCEDURE NARRATIVE
16.2 DETECTION AND LOGGING RISKS
16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION PROCEDURE
16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION BUSINESS PROCEDURE RULES
16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION PROCEDURE NARRATIVE
16.3 CATEGORIZATION AND PRIORITIZATION RISKS
16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE
16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE RULES
16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE NARRATIVE
16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS RISKS
16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE
16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE RULES
16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE NARRATIVE
16.5 PROBLEM MANAGEMENT ERROR CONTROL RISKS
16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE
16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE RULES
16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE NARRATIVE
16.6 PROBLEM MANAGEMENT CLOSURE RISKS
16.7 PROBLEM MANAGEMENT CONTINUOUS IMPROVEMENT PROCESS FLOW
16.7 CONTINUOUS IMPROVEMENT PROCESS BUSINESS PROCEDURE RULES
16.7 CONTINUOUS IMPROVEMENT PROCESS PROCEDURE NARRATIVE
16.7 PROBLEM MANAGEMENT CONTINUOUS IMPROVEMENT RISKS
POTENTIAL PROBLEM MANAGEMENT PROCESS MEASUREMENTS (KPIS)
PROBLEM MANAGEMENT SUPPORTING DOCUMENTS


PROBLEM MANAGEMENT GOAL, BENEFITS

Goal

To contribute to the mission of the laboratory by providing the highest possible levels of IT Service availability through minimization of the impact of Incidents and Problems within the environment by:
Proactive prevention of Incidents and Problems
Elimination of recurring Incidents
Understanding the root cause of Incidents so that corrective action can be undertaken

Benefits

Higher IT Service availability and user productivity, less disruption, reduced expenditure on fixes, and reduced costs in resolving repeat incidents as a result of the following Problem Management activities:
Proactive discovery and prevention of Incidents and Problems through trending analysis of ITSM data
Reactive discovery of the root cause of Incidents so that corrective action can be undertaken
A reduction over time in the number and impact of Problems and Known Errors through permanent resolution


PROBLEM MANAGEMENT
Process Context Diagram Interfacing Process Flow

NOTE: This graphic illustrates the basic interactions between Problem Management and the
ITIL processes at a high level and does not represent detailed dependencies.


PROBLEM MANAGEMENT PROCESS FLOW


16 PROBLEM MANAGEMENT PROCESS ROLES AND RESPONSIBILITIES

Role: Problem Manager

Responsibilities:
Receives Major Incident notification from Incident Management
Determines IT Services and CIs affected
Analyzes symptoms
Confirms Incident Report number
Confirms that Problem Management will engage with incidents as necessary
Selects the appropriate Service Support Providers who will respond to the Problem tickets. If a Known Error and matching Workaround exist, a decision should be made about whether this Workaround should be employed to resolve the Incident/Problem at this time.
Discusses the root cause analysis and Known Error
Discusses options for resolving the Known Error with Technical Experts and Finance team members
Documents options for resolution
Presents proposed Resolution options to Decision Authority
Discusses the proposed options in terms of risks, costs, timescales, etc.
Observes the implementation of the Request for Change and receives information on the outcome via the Release Management process
Decides whether the implemented Change has successfully resolved the Problem/Known Error
Discusses Problem Management's activities during the Major Incident
Takes away Lessons Learned from the meeting
Passes information from the Major Incident Review to the Problem Coordinator so that necessary updates can be made to the Problem Record, Workaround, Known Error
Applies Lessons Learned to the Problem Management process as necessary
Decides on course of action


Role: Problem Coordinator

Responsibilities:
Receives request from Problem Manager to partake in Problem Management response to the Major Incident
Gathers the data collected to date by Incident Management
Analyzes the data collected from various sources relating to the Major Incident
Analyzes historical data to see whether a new Problem Record needs to be created or whether an existing Problem Record needs to be updated or reopened and updated
Undertakes any necessary actions to create a Problem Record
Uses agreed trending analysis techniques on data in the Problem Management System, Incident Management System, and Configuration Management Data Base to uncover trends
If a Problem Record has been created as a result of a Major Incident, logs the Incident Records that have been created by the Service Desk in the Problem Record
If a Problem Record has been created as a result of proactive Problem Management trending analysis, logs the Incident Records that have been created by the Service Desk in the Problem Record
Using established criteria, attaches a category code to the Problem Record
Using established criteria, attaches a Priority to the Problem Record
Verifies that an appropriate Technical Expert has been assigned the Problem
Undertakes an investigation into the Problem using documented techniques
Using the root cause analysis data, completes the Problem diagnosis and documents results in the Problem Record
Verifies whether there is already a Known Error and matching Workaround in the Knowledge Management System that relates to this Problem type
Takes the results of the root cause analysis and documents the Known Error in the Knowledge Management System
Updates the Problem Record to indicate the Known Error has been documented, noting its reference number
If necessary, updates the Incident Record
Creates a link from all existing Incident and Problem Records to the Known Error in the Knowledge Management Database
Discusses the root cause analysis and Known Error
Discusses options for resolving the Known Error with Technical Experts and Finance team members
Documents options for resolution
Creates a Workaround that allows users to bypass or mitigate the Known Error
Tests the Workaround
Gains Approval for the Workaround
Documents the Workaround
Associates Problem Records in the Problem Management System to the Workaround
Associates Known Errors in the Knowledge Management System to the Workaround
Communicates the Workaround
Confirms with users that the Workaround is working
Decides whether the Workaround will provide an ongoing fix to the Known Error or whether the impact and severity of the Error are so severe that the costs of a permanent fix via an RFC are justified
Generates a Request for Change (RFC) intended to permanently resolve the Problem/Known Error
Submits the RFC through the Change Management process
Makes necessary updates to the Problem Record
Makes necessary updates to the Known Error record
Takes the information provided to the Problem Manager at the Major Problem Review and makes necessary updates to the Problem Record, Workaround, Known Error
When all necessary updates have been made to the Problem Record, reviews for accuracy and then closes the Problem Record
Generates Reports and Management information as necessary

Role: Technical Expert

Responsibilities:
Assists Problem Coordinator in an investigation into the Problem using documented techniques, and in Root Cause Analysis
Creates a Workaround that allows users to bypass or mitigate the Known Error
Tests the Workaround
Gains Approval for the Workaround
Documents the Workaround
Confirms with users that the Workaround is working
Decides whether the Workaround will provide an ongoing fix to the Known Error or whether the impact and severity of the Error are so severe that the costs of a permanent fix via an RFC are justified
Proposes options for resolution of the Problem
Observes the implementation of the Request for Change and receives information on the outcome via the Release Management process
Decides whether the implemented Change has successfully resolved the Problem/Known Error

Role: Change Manager

Responsibilities:
Observes the implementation of the Request for Change and receives information on the outcome via the Release Management process

Role: Customer

Responsibilities:
Decides whether the implemented Change has successfully resolved the Problem/Known Error


16 PROBLEM MANAGEMENT RACI MATRIX


16.1 PROACTIVE PROBLEM MANAGEMENT PROCEDURE


16.1 PROACTIVE PROBLEM MANAGEMENT BUSINESS PROCEDURE RULES

Inputs:
Monitoring Events
Incidents

Entry Criteria:
Regularly-scheduled proactive Problem Management trending analysis activity is due
A request to undertake trend analysis has been received
Suspicion that a Problem exists has been communicated and requires ad hoc analysis

General Comments:
The purpose of this procedure is to proactively identify problems to reduce the occurrence of repeating incidents and first time incidents.

16.1 PROACTIVE PROBLEM MANAGEMENT PROCEDURE NARRATIVE

Step: 16.1.1 Analysis of Incident and Problem Data
Responsible Role: Problem Coordinator
Action:
Analyze incident, problem, and (known) error data to produce management information and identify underlying problems.
Identify trends by considering these types of questions:
Is the number of incidents of a particular type increasing?
Is the number of incidents within a particular site increasing?
Is the number of incidents involving a particular CI or service increasing?
Is the number of unresolved incidents increasing?
Is the number of incidents by status changing?
Are there indicators of trouble in lab critical areas?
Are there observed patterns that indicate hidden problems?
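
The first three trend questions in step 16.1.1 amount to comparing incident counts between two periods, grouped by type, site, or CI. The following Python is a minimal sketch of that comparison and is not part of the Fermilab procedure. The `Incident` record, its field names, and the `rising_counts` helper are all hypothetical, and the sketch assumes the analyst supplies incidents covering two periods of comparable length on either side of the split date.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    # Hypothetical minimal incident record; field names are illustrative only.
    opened: date
    category: str
    site: str
    ci: str


def rising_counts(incidents, key, split):
    """Return the values of `key` that have more incidents on or after
    `split` than before it, mapped to (count_before, count_after)."""
    before = Counter(getattr(i, key) for i in incidents if i.opened < split)
    after = Counter(getattr(i, key) for i in incidents if i.opened >= split)
    # A Counter returns 0 for missing keys, so newly appearing values
    # are reported with a "before" count of 0.
    return {k: (before[k], after[k]) for k in after if after[k] > before[k]}
```

Calling `rising_counts(incidents, "category", split)`, then again with `"site"` and `"ci"`, gives one candidate list per question. A rising count is only a prompt for the Problem Coordinator's judgment, not a decision to open a Problem.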

Step: 16.1.2 Produce Trending and Analysis Reports
Responsible Role: Problem Coordinator
Action:
Produce Trending and Analysis reports including:
Change of pattern in number of incidents of a particular type, site, Configuration Item (CI) or Asset
Trend analysis of the number of incidents by status
Review of indicators of trouble in lab critical areas
Reasoning which describes patterns that indicate hidden problems
Other appropriate information as deemed necessary
Include recommendations as to whether a problem should be opened or not
May designate issues for immediate advancement


Step: 16.1.3 Determine if Issues Should be Advanced
Responsible Role: Problem Manager/Problem Coordinator
Action:
Determine if trend or systemic issues should be advanced to a Problem. This may be in conjunction with the other Problem Coordinator or the Problem Manager.
Questions to consider:
Is there an increase in Incidents for a particular issue that is not already identified as a Problem or Known Issue?
Was there a significant impact to the Service Desk from multiple incidents that was not already captured as a Problem, but needs investigation to prevent similar occurrences?
Will a root cause analysis and solution produce a possible benefit large enough to warrant the cost of an Investigation, Diagnosis, and possible RFCs?
Is the potential problem in question repeatable or likely to happen again, for which an analysis and solution may prevent a future outage?


Outputs: Identified Problems

Exit Criteria: Problem Management team engaged

16.1 PROACTIVE PROBLEM MANAGEMENT RISKS

Risk: Analysis not undertaken
Impact: Problem Management is only engaged in reactive duties (i.e. engagement by Incident Management) and not proactive duties. This could mean missing Problems and Known Errors that would be uncovered by trend analysis, along with an opportunity to erase these from the environment.

Risk: Inadequate analysis
Impact: Creation of spurious problems, reducing staff efficiency. Failure to identify problems and take the necessary corrective action.


16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING PROCEDURE


16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING BUSINESS PROCEDURE RULES

Inputs:
Identified Problems
Major Incident record
Multiple Incidents
Known Error information from external source
Known Error information from Release Management

Entry Criteria:
Proactive Trend Analysis has been completed
Problem Management team engaged in support of a Major Incident
Major Incident notification received from Incident Management

General Comments:
The purpose of this procedure is to detail the steps necessary to complete the Problem Detection and Logging process for the Fermilab Computing Division.

16.2 PROBLEM MANAGEMENT DETECTION AND LOGGING PROCEDURE NARRATIVE

Step: 16.2.1 Problem Detection
Responsible Role: Problem Manager, Problem Coordinator
Action:
Involves one or more of the following:
Receives Major Incident notification from Incident Management
Determines IT Services and CIs affected
Gathers the data collected to date by Incident Management
Analyzes symptoms
Analyzes the data collected from various sources relating to the Major Incident
Confirms Incident Report number
Confirms that Problem Management will engage
Selects the appropriate Service Support team who will respond to the Problem and verifies that an appropriate team member(s) has been assigned
Analyzes historical data to see whether a new Problem Record needs to be created or whether an existing Problem Record needs to be updated or reopened and updated
Updates the Incident Record if necessary/appropriate

Step: 16.2.2 Problem Logging
Responsible Role: Problem Manager, Problem Coordinator, Technical Expert
Action:
Undertakes any necessary actions to create a Problem Record. This involves creating a new record in the Problem Management System (information from the Major Incident Record may need to be copied across from the Incident Management System; this may be automated if an integrated tool suite is in use). The type of information that may be captured includes:
Unique identifier, Date and time stamps
Name and contact information of the Problem initiator
Incident count/linking incidents
Linked RFCs
Problem details/description
Problem category
Priority
Service and SLAs affected
Links to further information
History/Details of all diagnostic or attempted recovery actions taken
Status
Workarounds
Permanent solution

If it is ascertained that this is a repeat Problem, a new Problem Record can be created, or an existing Problem Record can be updated, or an existing closed Problem Record may need to be reopened, depending on the nature of the Problem and the length of time since it last occurred.
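
The information listed in step 16.2.2 maps naturally onto a record type. The Python below is a minimal sketch of such a Problem Record and does not describe the schema of the actual Problem Management tool. The class name, field names, default status value, and the `link_incident` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ProblemRecord:
    # All names here are hypothetical; they mirror the list in step 16.2.2.
    identifier: str
    initiator_name: str
    initiator_contact: str
    description: str
    opened_at: datetime = field(default_factory=datetime.now)
    category: Optional[str] = None   # set during categorization
    priority: Optional[str] = None   # set during prioritization
    status: str = "Open"             # assumed default status value
    services_and_slas_affected: List[str] = field(default_factory=list)
    linked_incidents: List[str] = field(default_factory=list)
    linked_rfcs: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    workarounds: List[str] = field(default_factory=list)
    permanent_solution: Optional[str] = None

    @property
    def incident_count(self) -> int:
        # The incident count is derived from the links rather than stored.
        return len(self.linked_incidents)

    def link_incident(self, incident_id: str) -> None:
        """Link an Incident Record once; repeat links are ignored."""
        if incident_id not in self.linked_incidents:
            self.linked_incidents.append(incident_id)
            self.history.append(f"Linked incident {incident_id}")
```

Deriving the incident count from the linked records keeps the two from drifting apart, which is one reasonable design choice among several.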

Step: 16.2.3 Associate Records
Responsible Role: Problem Coordinator
Action:
If a Problem Record has been created as a result of a Major Incident, or by a Technical Expert, link the Incident Records that have been created by the Service Desk to the Problem Record using existing tool functionality.
If a Problem Record has been created as a result of proactive Problem Management trending analysis, link the Incident Records that have been created by the Service Desk to the Problem Record using existing tool functionality.
If a Problem Record has been created by a Technical Expert, review and associate the ticket with existing Problem Records if possible.

Outputs:
Analyzed/Updated Major Incident data
Updated Problem Record

Exit Criteria:
A new Problem Record has been created or an existing Problem Record has been updated

16.2 DETECTION AND LOGGING RISKS

Risk: If Problem Management is not engaged by Incident Management
Impact: Incidents will be resolved without root cause being investigated and understood.

Risk: If Problem Records are not generated
Impact: The opportunity to learn about the root cause of Incidents is lost; the Incidents are never permanently resolved and keep being re-reported to the Service Desk.

Risk: Problem Record not created
Impact: Organization misses the opportunity to investigate and drive Known Errors out of the environment. These would keep being re-reported to the Service Desk, using resources unnecessarily each time.

Risk: Records not associated
Impact: Records that fall through the gap continue to be treated as separate events, taking up resources and duplicating effort.

16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION PROCEDURE

16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION BUSINESS PROCEDURE RULES

Inputs:
Analyzed/Updated Major Incident data
Updated Problem Record

Entry Criteria:
A Problem Record has been created or updated

General Comments:
The Priority level will dictate the resources attached to the Problem by Problem Management.
The Priority level will also dictate the timeliness of actions associated with the Problem as documented in SLAs/OLAs. These may include timeliness of communications, updates to the Problem Record, Workaround creation, permanent resolution proposals, etc.

16.3 PROBLEM MANAGEMENT CATEGORIZATION AND PRIORITIZATION PROCEDURE NARRATIVE

Step: 16.3.1 Problem Classification
Responsible Role: Problem Coordinator
Action:
Using established criteria, a category code is attached to the Problem Record.

Step: 16.3.2 Problem Prioritization
Responsible Role: Problem Coordinator
Action:
Using established criteria, a Priority is attached to the Problem Record. In addition to the information above, these criteria could also include:
Duration of Problem to date
Impact (cost) to date
Whether the system can be recovered, or whether it needs to be replaced
How much it will cost to fix
How long it will take to fix the Problem
How extensive the Problem is
See IM Appendix 11 for further criteria in determining urgency and impact
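
As a rough illustration of step 16.3.2, a Priority can be derived from impact and urgency with a lookup table. The three levels, the 1 to 5 scale, and every cell of the matrix below are assumptions made for this sketch. The actual criteria are those in IM Appendix 11, which are not reproduced here.

```python
# Hypothetical impact/urgency matrix: keys are (impact, urgency),
# values are priorities where 1 is the most pressing.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def priority(impact: str, urgency: str) -> int:
    """Look up the Priority for an impact/urgency pair (case-insensitive)."""
    key = (impact.lower(), urgency.lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"unknown impact/urgency pair: {impact!r}, {urgency!r}")
    return PRIORITY_MATRIX[key]
```

Keeping the mapping in a table rather than in branching logic means the established criteria can be revised without touching the code that applies them.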

Outputs:
Updates to IM/PM Tool
Categorized and Prioritized Problem Record

Exit Criteria:
Problem is categorized and prioritized

16.3 CATEGORIZATION AND PRIORITIZATION RISKS

Risk: Incorrect Categorization
Impact: Inaccurate reporting. Inaccurate attempts at root cause analysis.

Risk: Incorrect Prioritization
Impact: Inappropriate level of attention and resources applied to the Problem.


16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE


16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE RULES

Inputs:
Incident information
Change information
Problem Records

Entry Criteria:
A Categorized and Prioritized Problem Record

General Comments:
The purpose of this procedure is to detail the steps necessary to complete the problem investigation and diagnosis process for the Fermilab Computing Division.

16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS PROCEDURE NARRATIVE

Step: 16.4.1 Problem Investigation
Responsible Role: Technical Expert, Problem Coordinator, Problem Manager
Action:
Problem analysis to identify the root cause, workarounds, and potential solutions to the problem should include:
Identify the team as necessary.
Using tools as available, document findings and store evidence in the Problem Management tool.
Review standard Operational Level Agreements (OLAs) and monitor progress.
As necessary, utilize problem analysis techniques, such as Ishikawa diagrams, Kepner-Tregoe, flow diagrams, and other analysis methodologies as needed.

Step: 16.4.2 Problem Diagnosis
Responsible Role: Technical Experts, Problem Coordinator, Problem Manager
Action:
Determine if a Problem can be associated with a Known Error. Possibilities to note include:
o Root Cause and CI are known
o There is a possibility of a recurrence
Identify workarounds.
Determine Root Cause(s) and record in data record.
Assess the problem and recommend action to resolve problem.
Record details in data record.
Update Knowledge Base.

Outputs: Updated Problem Record

Exit Criteria: Workaround, Root Cause or Known Error identified

16.4 PROBLEM MANAGEMENT INVESTIGATION AND DIAGNOSIS RISKS

Risk: Problem not investigated
Impact: Root cause not understood, Problem cannot be fully investigated and resolved. Continued inefficiency.

Risk: Problem diagnosis not captured
Impact: Future need to re-analyze similar problem. Wasted effort. Permanent resolution not achieved.

Risk: Incorrect diagnosis captured
Impact: Root cause not understood. Incorrect resolution attempts. Wasted effort. Permanent resolution not achieved.

16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE

16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE RULES

Inputs:
Root Cause data
Diagnosed Problem Record
Financial Information

Entry Criteria:
A Problem Record with root cause analysis undertaken and Diagnosis completed

General Comments:
A Workaround is a temporary means of resolving and overcoming the symptoms of an Incident. However, even if a Workaround is found, it is still important to work on a permanent resolution.
When a Workaround is identified, the Problem Record still remains open and the details of the Workaround are recorded in the Problem Record (and the Known Error Database or Knowledge Management System) and communicated to Service Desk personnel.
A Known Error record must be created and saved in the Knowledge Management System or Known Error Database once diagnosis is complete. This is so that further occurrences of Incidents and/or Problems can be more easily identified and linked together, and so that necessary actions can quickly be undertaken.

16.5 PROBLEM MANAGEMENT ERROR CONTROL PROCEDURE NARRATIVE

Step: 16.5.1 Identify Known Error
Responsible Role: Problem Coordinator, Problem Manager
Action:
Problem Coordinator
Verifies whether there is already a Known Error and matching Workaround in the Knowledge Management System that relates to this Problem
Reports findings to Problem Manager
Problem Manager (and others if necessary)
If a Known Error and matching Workaround exist, a decision should be made about whether this Workaround should be employed to resolve the Incident/Problem at this time.
If no Known Error is in place, proceed to procedure 16.5.2
If Known Error is in place, proceed to procedure 16.5.3
If Workaround in place is approved for use with this Problem, communicate this fact to necessary parties (Service Desk etc.) and proceed to procedure 16.5.3

Step: 16.5.2 Create Known Error Record
Responsible Role: Problem Coordinator
Action:
Using the results of the root cause analysis, document the Known Error in the Knowledge Management System
Update the Problem Record to indicate the Known Error has been documented, noting its reference number
If necessary, update the Incident Record(s) and ensure communication to the Service Desk

Step: 16.5.3 Identify Workaround
Responsible Role: Problem Coordinator
Action:
    Determine if a Workaround exists for the Known Error
    If not, develop a Workaround if possible and record it in the Known Error record
    Determine suitability of the Workaround

Step: 16.5.4 Associate Records
Responsible Role: Problem Coordinator
Action:
    Creates a link from all existing Incident and Problem Records to the Known Error in the Knowledge Management System

Step: 16.5.5 Plan Resolution(s)
Responsible Role: Problem Coordinator, Technical Expert
Action:
    Technical Expert and Problem Coordinator
        Discuss the root cause analysis and Known Error
        Discuss options for resolving the Known Error with the team
        Document options for resolution. These could include a temporary Workaround, creating a Request for Change to permanently resolve the Known Error, or both. The risks of performing actions, the risks of not performing actions, costs, and estimated timescales should all be documented so that the Problem Coordinator is able to balance all facts in making the final decision
    Problem Manager/Coordinator
        Discusses the proposed options in terms of risks, costs, timescales, etc.
        Decides on course of action
        If a Workaround will be utilized, proceed to procedure 16.5.6
        If a Workaround will not be utilized but a Request for Change will, proceed to procedure 16.5.7

Step: 16.5.6 Document Workaround
Responsible Role: Problem Coordinator, Technical Expert
Action:
    Problem Coordinator and Technical Expert
        Create a Workaround that allows users to bypass or mitigate the Known Error
        Test the Workaround
        Publish the Workaround
        Document the Workaround in the Knowledge Management System
        Associate Problem Records in the Problem Management System to the Workaround
        Associate Known Errors in the Knowledge Management System to the Workaround
        Communicate the Workaround
        Confirm with users that the Workaround is working
        If an RFC is also required, proceed to procedure 16.5.7
        If no RFC is required, proceed to procedure 16.6.2

Step: 16.5.7 Document RFC(s)
Responsible Role: Problem Coordinator, Technical Expert
Action:
    Generates a Request for Change (RFC) intended to permanently resolve the Problem/Known Error
    Submits the RFC through the Change Management process

Outputs:
    Updated Problem Record
    Known Error documented
    Workaround documented
    Request for Change

Exit Criteria:
    Workaround identified
    RFC generated to Change Management, if a change leading to permanent resolution can be identified
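The decision points in the Error Control narrative can be summarized as a few routing rules. The following Python sketch is an illustration under stated assumptions, not a definitive implementation: the function and parameter names are hypothetical, and only the branches written out in steps 16.5.1, 16.5.5, and 16.5.6 are covered. Steps that simply follow one another in sequence are omitted.

```python
def after_identify_known_error(known_error_exists: bool) -> str:
    """Step 16.5.1: route on whether a Known Error is already recorded."""
    return "16.5.3" if known_error_exists else "16.5.2"


def after_plan_resolution(use_workaround: bool, raise_rfc: bool) -> str:
    """Step 16.5.5: route on the course of action that was decided."""
    if use_workaround:
        return "16.5.6"
    if raise_rfc:
        return "16.5.7"
    # The narrative does not cover this case, so the sketch refuses to guess.
    raise ValueError("no course of action decided")


def after_document_workaround(rfc_required: bool) -> str:
    """Step 16.5.6: continue to the RFC step or leave Error Control."""
    return "16.5.7" if rfc_required else "16.6.2"
```

For example, a Problem with no existing Known Error is routed to 16.5.2, and a documented Workaround with no accompanying RFC leaves Error Control for 16.6.2, as the narrative states.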


16.5 PROBLEM MANAGEMENT ERROR CONTROL RISKS


Risk: An existing Workaround is not recognized and the Problem continues to be investigated
Impact: Wasted time and resources, unnecessary extended outages

Risk: Known Error record not created
Impact: Subsequent reports of the Incident/Problem at the Service Desk will not be associated with the Known Error and will be investigated independently, wasting time and resources and leading to unnecessary extended outages

Risk: Incidents and Problems not associated to a Known Error
Impact: When reported to the Service Desk, the newly reported Incidents and Problems are investigated independently, leading to wasted time and resources and potentially extended outages

Risk: No resolution options are documented
Impact: The Problem will remain, leading to extended or repeated outages, until an option is agreed

Risk: Too few resolution options are documented
Impact: A full cost/benefit analysis cannot be performed without all appropriate options having been documented

Risk: No Workaround documented
Impact: Incident Management would need either to re-develop the same workaround for each similar incident, or to informally remember the workaround used. Incident will remain alive, causing user difficulties and requiring Incident Management attention, until at least a temporary workaround is available.


16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE


16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE RULES

Inputs:
    Incident Data
    Problem Data
    Known Error data
    Workaround data
    Root Cause Analysis
    Resolution options
    Request(s) For Change

Entry Criteria:
    Implemented RFC in support of permanently resolving a Known Error; or
    Implemented Workaround without an associated RFC

General Comments:
    When a Major Problem occurs, a Major Problem Review must be held as soon as possible thereafter. The Major Problem Review is an opportunity to examine:
        Things that were done correctly
        Things that were done incorrectly
        Items that can be improved in the future
        How to prevent reoccurrence
        Whether or not a third-party is responsible
        Whether follow-up actions are required
    No review is required for minor problems

16.6 PROBLEM MANAGEMENT CLOSURE PROCEDURE NARRATIVE

Step: 16.6.1 Resolution
Responsible Role: Problem Manager, Technical Expert, Change Manager
Action:
    Problem Manager, Technical Expert, and Change Manager
        Observe the implementation of the Request for Change and receive information on the outcome via the Release Management process
    Problem Manager, Technical Expert, and Customer
        Decide whether the implemented Change has successfully resolved the Problem/Known Error
        If YES, ensure any necessary communications are undertaken. Proceed to procedure 16.6.2
        If NO and the service is no longer used, then proceed to procedure 16.6.2
        If NO, subsequent research will need to be undertaken; the Workaround will need to remain in effect and necessary communications undertaken. Resume process at procedure 16.5.5
    Problem Coordinator
        Makes necessary updates to the Problem Record
        Makes necessary updates to the Known Error record and Workaround documentation
        If minor problem, proceed to procedure 16.6.3

Step: 16.6.2 Major Problem Review
Responsible Role: Problem Manager
Action:
    Discusses Problem Management's activities during the Major Problem under review, including:
        Incident data provided to Problem Management
        Problem data
        Known Error data
        Workaround data
        Root Cause Analysis information
        Proposed resolution options
        Request(s) For Change
        The operation of the process
    Takes away Lessons Learned from the meeting, which could include:
        Process improvement recommendations for support processes and ITIL processes

Step: 16.6.3 Update Problem Record
Responsible Role: Problem Manager, Problem Coordinator
Action:
    Make necessary updates to the Problem Record, Workaround, Known Error
    Problem Manager applies Lessons Learned to the Problem Management process as necessary

Step: 16.6.4 Close Problem Record
Responsible Role: Problem Coordinator
Action:
    Assigns appropriate closure code.
    When all necessary updates have been made to the Problem Record, reviews for accuracy and then closes the Problem Record
    Informs Incident Management (and updates the Knowledge Base) of the problem closure so that all linked incidents receive the appropriate attention to ensure their proper closure.

Step: 16.6.5 Management Reporting
Responsible Role: Problem Manager, Problem Coordinator
Action:
    Generates and disseminates Reports and Management Information as necessary

Outputs:
    Lessons Learned
    Updated Knowledge Base
    Closed Problem Record
    Closed Incident Record(s)
    Management Information (reports)

Exit Criteria:
    Closed Problem and Incident Record(s)
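The closure steps can be sketched in the same way. The Python below is a minimal illustration, assuming hypothetical names (after_resolution, OpenProblem, close_problem) and a free-text closure code; it is not drawn from any Fermilab system. It encodes the three outcomes of step 16.6.1 and the rule in step 16.6.4 that a closure code is assigned and Incident Management is informed of the linked incidents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def after_resolution(change_resolved_problem: bool, service_still_used: bool = True) -> str:
    """Step 16.6.1: pick the next procedure from the outcome of the Change."""
    if change_resolved_problem or not service_still_used:
        return "16.6.2"
    # Not resolved: the Workaround stays in effect and planning resumes.
    return "16.5.5"


@dataclass
class OpenProblem:
    """Hypothetical Problem Record holding only what closure needs."""
    reference: str
    linked_incidents: List[str] = field(default_factory=list)
    closure_code: Optional[str] = None
    status: str = "open"


def close_problem(problem: OpenProblem, closure_code: str) -> List[str]:
    """Step 16.6.4: close the record and return the incidents to hand over."""
    if not closure_code:
        raise ValueError("a closure code must be assigned before closing")
    problem.closure_code = closure_code
    problem.status = "closed"
    # Incident Management is told which linked incidents still need closure.
    return list(problem.linked_incidents)
```

Returning the linked incident references, rather than closing them directly, reflects the narrative: Problem Management informs Incident Management, which remains responsible for closing its own records.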


16.6 PROBLEM MANAGEMENT CLOSURE RISKS


Risk: Planned resolution fails
Impact: Problem/Known Error will be ongoing. Another attempt at resolution will need to be undertaken. Extended outages will result.

Risk: Major Problem Review not held
Impact: Opportunities for process improvement lost

Risk: Problem Record not updated as necessary
Impact: A full account of the entire history of the Problem is not maintained, reporting is inhibited, action items may be lost, potential for process improvement may be lost

Risk: Problem Record not closed
Impact: Assumption that it is ongoing and requires action leading to unnecessary work

Risk: Reports not disseminated
Impact: Management unable to act on contents


16.7 PROBLEM MANAGEMENT CONTINUOUS IMPROVEMENT PROCESS FLOW


16.7 CONTINUOUS IMPROVEMENT PROCESS BUSINESS PROCEDURE RULES

Inputs:
    Trending reports
    Process reports
    Problem Review Reports

Entry Criteria:
    Regularly-scheduled proactive Problem Management trending analysis activity
    Process reports indicate a need for improvement in the process itself
    Problem Review Reports indicate a Problem Management process failure

General Comments:
    The purpose of this procedure is to proactively identify issues with the Problem Management process itself and to make needed corrections in conjunction with Service Level Management.

16.7 CONTINUOUS IMPROVEMENT PROCESS PROCEDURE NARRATIVE

Step: 16.7 Analysis of Incident and Problem Data
Responsible Role: Problem Manager, Problem Coordinator
Action:
    Produce Trending and Analysis reports to relate potential problems or problem successes to the incident environment.
    The success of Problem Management is demonstrated by:
        The reduction in the number of incidents within a given category.
        The reduction of time needed to resolve incidents.
        Decrease of other costs incurred associated with resolution.
    Problem Management reports shall consider, but not be limited to, the following subjects:
        Effectiveness of Problem Management: details about the number of incidents before and after solving a problem, recorded problems, number of Requests for Change (RFCs) raised, and resolved known errors.
        Relationship between reactive and proactive Problem Management: increasing proactive intervention instead of reacting to incidents shows an increasing maturity of the process.
        Quality of the products being developed: products handed over from the development environment should be of a high quality; otherwise they will introduce new problems. Reports about new products and their known errors are relevant for quality monitoring.
        Status and Action Plans for open problems: summary of what has been done so far, and what will be done next to advance top problems, including planned RFCs and required time and resources.
        Proposals to improve Problem Management: if the information about the above factors indicates that the process does not comply with the objectives, then proposals may be made for recording, investigation, proactive activities, and other processes as necessary.
    Regular process audits may be carried out to the plan for continual process improvement.

Outputs:
    Lessons Learned
    Problem Management Service Improvement Project (SIP)
    Problem Management requirements document
    Action plans for improving Problem Management
    Management Information (reports)

Exit Criteria:
    Action plan for performing a Service Improvement Project or a decision to not change the process.
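The first success measure above, a reduction in the number of incidents within a given category, lends itself to a short calculation. The following Python sketch is one possible way to compute it and is offered as an assumption-laden illustration: the function name, the (date, category) input shape, and the idea of splitting incidents at the date a Problem was resolved are all hypothetical. It also assumes the caller supplies observation windows of equal length before and after that date, since otherwise the counts are not comparable.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def incident_reduction(
    incidents: Iterable[Tuple[date, str]], resolved_on: date
) -> Dict[str, float]:
    """Percentage reduction in incident count per category after resolved_on."""
    before: Counter = Counter()
    after: Counter = Counter()
    for opened, category in incidents:
        # Incidents opened on or after the resolution date count as "after".
        (before if opened < resolved_on else after)[category] += 1
    # Only categories seen before the resolution date have a baseline.
    return {
        category: 100.0 * (count - after[category]) / count
        for category, count in before.items()
    }
```

A negative value indicates that incidents in that category increased after the resolution date, which would be a signal for further trending analysis rather than evidence of success.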

16.7 PROBLEM MANAGEMENT CONTINUOUS IMPROVEMENT RISKS


Risk: Problem Management processes are not reviewed on a regular basis
Impact: Problem Management fails to meet the need of Fermilab

Risk: Quality of products resulting from Problem Management process
Impact: Problem Management becomes stale and no longer serves the interest of Fermilab Computing Division.

Risk: Status and Action Plans not developed or followed through
Impact: Failure to actively manage the Problem Management process will result in a reintroduction of Problems into the operational environment


POTENTIAL PROBLEM MANAGEMENT PROCESS MEASUREMENTS (KPIS)


Select 3 or 4 of these KPIs that best fit the organizational requirements for measuring performance. As the organization and process mature, the selected KPIs are likely to change.
Short Term (0-3 Months: Learning the Process)
Number of Incidents requiring Problem Management engagement
Number of Problem Records created
Number of times trends discovered
Number of or Percentage of Problems identified through reactive Problem Management
Number of or Percentage of Problems identified through proactive Problem
Management
Percentage of successful associations between Incidents and Problems
Percentage of Problem Records Categorized
Percentage of Problem Records Categorized correctly
Percentage of Problem Records Prioritized
Percentage of Problem Records Prioritized correctly
Percentage of Problem Records investigated
Percentage of Problem Records diagnosed
Percentage of Problem Records diagnosed correctly
Number of times an existing Workaround is assigned to a Problem
Percentage of Problems with Workarounds assigned
Number of new Known Errors
Percentage of Known Errors with documented Workaround
Number of Incidents associated to a Known Error
Number of Problems associated to a Known Error
Percentage of Incidents correctly associated to a Known Error
Percentage of Problems correctly associated to a Known Error
Number of or Percentage of Problems by related CI
Medium Term (4-9 Months: Process Is Maturing)
Average time to find root cause
Plans for resolution of open Problems
Number of documented options for resolving Known Errors
Time taken to create Workaround
Number of created RFCs to resolve Known Errors
Proportion of RFCs to Known Errors
Number of successful permanent resolutions
Number of Major Problem Reviews held
Long Term (9+ Months)
Number of or Percentage of Problems by owner
Number of or Percentage of Problems by status
Percentage of Major Problem Reviews/Major Problems
Percentage of Records updated following Major Problem Review


Percentage of closed Problem Records within timescales
Percentage of Problems resolved within SLA/OLA targets
Number of or Percentage of Problems by originating area
Number of or Percentage of Problems by owner
Number of or Percentage of Problems by status
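Several of the measurements listed above are simple ratios over Known Error records. As a hedged illustration, the Python sketch below computes two of them: "Percentage of Known Errors with documented Workaround" and "Proportion of RFCs to Known Errors". The row layout (reference, workaround, rfc_count) and the function names are assumptions for the example and do not describe an actual Knowledge Management System export.

```python
from typing import Dict, List, Optional, TypedDict


class KnownErrorRow(TypedDict):
    """Hypothetical export row; the field names are assumptions."""
    reference: str
    workaround: Optional[str]
    rfc_count: int


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def known_error_kpis(rows: List[KnownErrorRow]) -> Dict[str, float]:
    """Compute two of the listed KPIs from Known Error rows."""
    with_workaround = sum(1 for row in rows if row["workaround"])
    total_rfcs = sum(row["rfc_count"] for row in rows)
    return {
        "pct_known_errors_with_workaround": percentage(with_workaround, len(rows)),
        "rfcs_per_known_error": total_rfcs / len(rows) if rows else 0.0,
    }
```

Guarding against an empty list matters in the short-term phase, when few or no Known Errors may have been recorded yet.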


PROBLEM MANAGEMENT SUPPORTING DOCUMENTS


Document Name: Fermilab Problem Management Policy
Description: Policy
Relationship: Policy

Document Name: Fermilab Problem Management Process and Procedures
Description: Process
Relationship: This document

Document Name: Problem Management Process Metrics
Description: Performance Management Metrics
Relationship: This document

Document Name: Fermilab Incident Management Process and Procedures Appendix 11
Description: Severity Table and Escalation Table
Relationship: Priority and Urgency guidelines
