System Safety Handbook
FAA System Safety Handbook, Chapter 1: Introduction
December 30, 2000
1.1 Introduction
The System Safety Handbook (SSH) was developed for the use of Federal Aviation Administration
(FAA) employees, supporting contractors and any other entities that are involved in applying
system safety policies and procedures throughout FAA. As the Federal agency with primary
responsibility for civil aviation safety, the FAA develops and applies safety techniques and
procedures in a wide range of activities, from NAS modernization to air traffic control and aircraft
certification. On June 28, 1998, the FAA Administrator issued Order 8040.4 to establish FAA
safety risk management policy. This policy requires all the Lines of Business (LOB) of the FAA to
establish and implement a formal risk management program consistent with the LOB’s role in the
FAA. The policy reads in part: “The FAA shall use a formal, disciplined, and documented decision
making process to address safety risks in relation to high-consequence decisions impacting the
complete life cycle.”
In addition, the Order established the FAA Safety Risk Management Committee (SRMC)
consisting of safety and risk management professionals representing Associate/Assistant
Administrators and the offices of the Chief Counsel, Civil Rights, Government and Industry
Affairs, and Public Affairs. Upon request from the responsible program offices, the SRMC provides
advice and guidance to help the program offices fulfill their authority and responsibility for
implementing Order 8040.4.
This System Safety Handbook provides guidance to the program offices. It is intended to describe
“how” to set up and implement the safety risk management process. The SSH establishes a set of
consistent and standardized procedures and analytical tools that will enable each LOB or program
office in the FAA to comply with Order 8040.4.
In the FAA, the Acquisition Management System (AMS) provides agency-wide policy and guidance
that applies to all phases of the acquisition life cycle. Consistent with Order 8040.4, AMS policy
(Section 2.9.13) is that System Safety Management shall be conducted throughout the acquisition
life cycle. The SSH is designed to support this AMS system safety management policy.
It is included in the FAA Acquisition System Toolset (FAST), and is referenced in several of the
FAST process documents. It is also designed to support safety risk management activities in FAA
not covered by AMS policy and guidance.
This SSH is intended for use in support of specific system safety program plans. While the SSH
provides guidance on “how” to perform safety risk management, other questions concerning
“when, who, and why” should be addressed through the three types of plans discussed in this
document: the System Safety Management Plan (SSMP), the System Safety Program Plan (SSPP),
and the Integrated System Safety Program Plan (ISSPP). The SSH focuses on “how” to perform
safety risk management, while these planning documents describe, in Chapter 5, the organization’s
processes and procedures for implementing system safety.
High-level SSMPs describe general organizational processes and procedures for the
implementation of system safety programs, while more specific SSPPs are developed for individual
programs and projects. The ISSPP is intended for large, complex systems with multiple
subcontractors. The SRMC is responsible for developing an overall FAA SSMP, while the System
Engineering Council develops the SSMP for AMS processes, such as Mission Analysis,
Investment Analysis, and Solution Implementation. Integrated Product Team (IPT) leaders,
program managers, project managers, and other team leaders develop SSPPs appropriate to their
activities. Chapter 4 of the SSH provides guidance for the development of an SSPP.
1.2 Purpose
The purpose of this handbook is to provide instructions on how to perform system safety
engineering and management for FAA personnel involved in system safety activities, including
FAA contractor management, engineering, safety specialists, team members on Integrated Product
Development System (IPDS) teams, analysts and personnel throughout FAA regions, centers,
facilities, and any other entities involved in aviation operations.
1.3 Scope
This handbook is intended to support system safety and safety risk management throughout the
FAA. It does not supersede regulations or other procedures or policies; however, this handbook
provides best practices in system safety engineering and management. Where such regulations or
procedures exist, this handbook will indicate the reference and direct the reader to that document.
If a conflict exists between the SSH and FAA policies and regulations, the policies and regulations
supersede this document. However, if results of analysis using the tools and techniques in this
SSH identify policy or regulatory issues that conflict with existing FAA policies and regulations,
the issues should be brought to the attention of the Office of System Safety (ASY), and
consideration should be given to changing the policy or regulation. This handbook is also intended
to provide guidance to FAA contractors who support the FAA by providing systems and/or
analyses. This handbook does not supersede the specific contract, but it can be referenced in the
statement of work or other documents as a guide.
facilities and equipment operation. Chapter 13 is a special discussion of the commercial launch
vehicle safety and certification process. Chapter 14 addresses training, Chapter 15 discusses
operational risk management, Chapter 16 treats Organizational Systems in Aviation, and Chapter
17 concludes with Human Factors Safety Principles.
In addition, the following eight appendices to the Investment Analysis Plan (IAP) contain guidance
related to system safety:
Where these FAST documents indicate a requirement for including system safety activities, or
results of safety analyses, in documentation or briefings, they generally reference the appropriate
chapter in the SSH for a discussion of how to comply with the requirement. Figure 1-1 shows the
flowdown of system safety relationships from AMS Section 2 to the other FAST documents listed
above. Section 2.9.13, System Safety Management, is the primary policy statement in Section 2. It
states as a requirement that each line of business shall implement a system safety program in
accordance with FAA Order 8040.4. The second tier of documents provides further guidance on
how to implement the order, and the appendices to the Investment Analysis Process document
provide templates and formats for documentation that will be taken to the JRC.
Table 1-1 shows the applicability of each chapter in this handbook to the corresponding AMS segment.
[Figure: AMS Section 2 linked to the FAST documents: Mission Analysis Process; Investment Analysis Process (Appendices A, B, C, D, G, H, and J); Acquisition Strategy Paper; Integrated Program Plan.]
Figure 1-1: Documents Affected by the System Safety Policy Changes to the Acquisition
Management System (AMS)
1.7 Glossary
Appendix A contains a glossary of terms used throughout the handbook. It is important to
understand, for example, the difference between a hazard and a risk, and how these terms relate to
the system safety methods. The glossary also discusses the different definitions associated with
specific system safety terminology, and these distinctions are important to understand. The
glossary can be used as a reference, i.e., as a dictionary, and it includes many terms and
definitions associated with system safety. The glossary can also be used for training and
educational purposes. Depending on the need, these terms and definitions can be used when
discussing methodology or when conducting presentations. Some terms in the glossary are not
specifically addressed in the handbook; they are nonetheless useful as reference material.
Chapter 10:
System Software Safety
10.1 Introduction
Much of the information in this chapter has been extracted from the JSSSC Software System Safety
Handbook, December, 1999, and concepts from DO-178B, Software Considerations in Airborne Systems
and Equipment Certification, December 1, 1992.
Since the introduction of the digital computer, system safety practitioners have been concerned with the
implications of computers performing safety-critical or safety-significant functions. In earlier years, software
engineers and programmers constrained software from performing high-risk or hazardous operations
where human intervention was deemed both essential and prudent from a safety perspective. Today,
however, computers often autonomously control safety-critical functions and operations. This is due
primarily to the ability of computers to perform at speeds unmatched by their human operator counterparts.
Software logic also allows decisions to be implemented unemotionally and precisely. In fact,
some current operations no longer include a human operator.
Software that controls safety-critical functions introduces risks that must be thoroughly assessed and
mitigated during the program by management and by design, software, and system safety engineering.
Much has been written in previous years about "Software Safety" and the problems faced by the
engineering community. However, little of the guidance available to the safety practitioner was logical,
practical, or economical. This chapter introduces an approach, supported by engineering evidence, showing
that software can be analyzed within the context of both systems engineering and system safety engineering
principles. The approach ensures that the safety risk associated with software performing safety-significant
functions is identified, documented, and mitigated while supporting design-engineering objectives along the
critical path of the system acquisition life cycle.
The concepts of risk associated with software performing safety-critical functions were introduced in the
1970s. At that time, the safety community believed that traditional safety engineering methods and
techniques were no longer appropriate for software safety engineering analysis. This put most safety
engineers in a “wait and see” position. Useful tools, techniques, and methods for safety risk
management were not available in the 1970s, even though software was becoming more prevalent in system
designs.
In the following two decades, it became clear that traditional safety engineering methods were in fact
partially effective for software safety analysis. This does not imply, however, that modified techniques are
unwarranted. Several facts must be recognized before a specific software safety approach is introduced. These
basic facts must be considered by the design engineering community to successfully implement a system safety
methodology that addresses the software implications.
• Software safety is a systems issue, not a software-specific issue. The hazards caused by
software must be analyzed and solved within the context of good systems engineering
principles.
• An isolated safety engineer may not be able to produce effective solutions to potential
software-caused hazardous conditions without the assistance of supplemental expertise.
The software safety "team" should consist of the safety engineer, software engineer,
system engineer, software quality engineer, appropriate "ility" engineers (configuration
management, test & evaluation, verification & validation, reliability, and human
factors), and the subsystem domain engineer.
• Today's system-level hazards, in most instances, contain multiple contributing factors
from hardware, software, human error, and/or combinations of each, and,
• Finally, software safety engineering cannot be performed effectively outside the
umbrella of the total system safety engineering effort. There must be an identified link
between software faults, conditions, contributing factors, specific hazards and/or
hazardous conditions of the system.
The safety engineer must also never lose sight of the fundamental concepts of system safety
engineering. The goal of the system safety effort is not to produce a hazard analysis report, but to
influence the design of the system to ensure that it is safe when it enters the production phase of the
acquisition life cycle. This can be accomplished effectively if the following process tasks are performed:
• The software performs a function that is not required, i.e., getting the wrong answer,
issuing the wrong control instruction, or doing the right action but under inappropriate
conditions.
• The software possesses timing and/or sequencing problems, i.e., failing to ensure that
two things happen at the same time, at different times, or in a particular order.
• The software fails to recognize that a hazardous condition has occurred that requires corrective
action.
• The software fails to recognize a safety-critical function and fails to initiate the
appropriate fault tolerant response.
• The software produces the intended but inappropriate response to a hazardous condition.
The specific causes most commonly associated with the software failure mechanisms
listed above are:
• Design and Coding Errors: These errors are usually introduced by the programmer and
can result from specification errors or from poor structured programming techniques. They
include incomplete interfaces, timing errors, incorrect interfaces, incorrect algorithms, logic
errors, lack of self-tests, overload faults, endless loops, and syntax errors. Fault tolerant
algorithms and parameters are especially susceptible to these errors.
• Hardware/Computer Induced Errors: Although not as common as other errors, they can
exist. Possibilities include random power supply transients, computer functions that
transform one or more bits in a computer word that unintentionally change the meaning
of the software instruction, and hardware failure modes that are not identified and/or
corrected by the software to revert the system to a safe state.
• Documentation Errors: Poor documentation can be the cause of software errors through
miscommunication. Miscommunication can introduce the software errors mentioned
above. This includes inaccurate documentation pertaining to system specifications,
design requirements, test requirements, source code and software architecture documents
including data flow and functional flow diagrams.
[Figure: Software Safety Process Steps]
Planning Provisions
The software system safety plan should contain provisions assuring that:
Typical activities expected of the team range from identifying software-based hazards to tracing
safety requirements, and from identifying limitations in the actual code to developing software safety test
plans and, ultimately, reviewing test results for compliance with safety requirements.
Management
Software System Safety program management begins as soon as the System Safety Program (SSP) is
established and continues throughout the system development. Management of the effort requires a variety
of tasks or processes from establishing the Software Safety Working Group (SwSWG) to preparing the
System Safety Assessment Report (SSAR). Even after a system is placed into service, management of the
software system safety effort continues to address modifications and enhancements to the software and the
system. Often, changes in the use or application of a system necessitate a re-assessment of the safety of the
software in the new application. Effective management of the safety program is essential to the effective
reduction of the system risk. Initial efforts parallel portions of the planning process since many of the
required efforts need to begin very early in the safety program. Safety management pertaining to software
generally ends with the completion of the program and its associated testing, whether that program is a single
phase of the development process or continues throughout the development, production, deployment, and
maintenance phases. Management efforts end when the last safety deliverable is completed and accepted by the FAA.
Management efforts may then revert to a “caretaker” status in which the safety manager monitors the use of
the system in the field and identifies potential safety deficiencies based on user reports and accident/incident
reports. Even if the developer has no responsibility for the system after deployment, the safety program
manager can develop a valuable database of lessons learned for future systems by identifying these safety
deficiencies.
Establishing a software safety program includes establishing a SwSWG. This is normally a sub-group of the
SSWG and is chaired by the safety manager. The SwSWG has overall responsibility for the following:
Risk Severity
Regardless of the contributory factors (hardware, software, human error, and software-influenced human
error), the severity of the risk remains constant. That is, the consequence of the risk is the same regardless of
what actually caused the hazard to propagate within the context of the system. Because the severity is
unchanged, the severity tables presented in Chapter 3 remain applicable criteria for the determination of risk
severity for those hazards possessing software causal factors.
Risk Probability
Because it is difficult to assign accurate probabilities to faults or errors within software modules of code, a
supplemental method of determining risk probability is required when software causal factors exist. Figure
10-2 demonstrates that, to determine a risk probability, software contributory factors must be
assessed in conjunction with the contributors from hardware and human error. The determination of
hardware and human error contributor probabilities remains constant in terms of historical “best” practices.
However, the likelihood of the software contribution to the risk's cumulative causes must also be addressed.
[Figure 10-2: Hazard contributory factors from hardware, software, and human error]
There have been numerous methods of determining the software’s influence on system-level risks. Two of
the most popular software categorization lists are presented in MIL-STD-882C and RTCA DO-178B (see
Figure 10-3). These do not specifically determine software-caused risk probabilities, but instead assess the
software’s “control capability” within the context of the software contributors. In doing so, each software
contributor can be labeled with a software control category to help determine the degree of autonomy that
the software has over the hazardous event. The software safety team must review these lists and
tailor them to meet the objectives of the system safety and software development programs.
MIL-STD-882C software control categories:
(I) Software exercises autonomous control over potentially hazardous hardware systems, subsystems or
components without the possibility of intervention to preclude the occurrence of a hazard. Failure of the
software or a failure to prevent an event leads directly to a hazard’s occurrence.
(IIa) Software exercises control over potentially hazardous hardware systems, subsystems, or components
allowing time for intervention by independent safety systems to mitigate the hazard. However, these
systems by themselves are not considered adequate.
(IIb) Software item displays information requiring immediate operator action to mitigate a hazard.
Software failure will allow or fail to prevent the hazard’s occurrence.
(IIIa) Software item issues commands over potentially hazardous hardware systems, subsystems, or
components requiring human action to complete the control function. There are several, redundant,
independent safety measures for each hazardous event.
(IIIb) Software generates information of a safety critical nature used to make safety critical decisions.
There are several, redundant, independent safety measures for each hazardous event.
(IV) Software does not control safety critical hardware systems, subsystems, or components and does not
provide safety critical information.

RTCA DO-178B software levels:
(A) Software whose anomalous behavior, as shown by the system safety assessment process, would cause
or contribute to a failure of system function resulting in a catastrophic failure condition for the aircraft.
(B) Software whose anomalous behavior, as shown by the system safety assessment process, would cause
or contribute to a failure of system function resulting in a hazardous/severe-major failure condition for
the aircraft.
(C) Software whose anomalous behavior, as shown by the system safety assessment process, would cause
or contribute to a failure of system function resulting in a major failure condition for the aircraft.
(D) Software whose anomalous behavior, as shown by the system safety assessment process, would cause
or contribute to a failure of system function resulting in a minor failure condition for the aircraft.
(E) Software whose anomalous behavior, as shown by the system safety assessment process, would cause
or contribute to a failure of function with no effect on aircraft operational capability or pilot workload.
Once software has been confirmed as level E by the certification authority, no further guidelines of this
document apply.
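As an illustration only, the DO-178B list above can be expressed as a lookup from failure condition class to software level. This is a minimal Python sketch: the names FailureCondition, most_severe, and software_level are hypothetical, and the rule that the most severe failure condition the software could contribute to determines its level is an assumption stated here, not text from this handbook.

```python
from enum import Enum


class FailureCondition(Enum):
    """Failure condition classes from the DO-178B list, most severe first."""
    CATASTROPHIC = "catastrophic"
    HAZARDOUS_SEVERE_MAJOR = "hazardous/severe-major"
    MAJOR = "major"
    MINOR = "minor"
    NO_EFFECT = "no effect"


# Software level for software whose anomalous behavior could cause or
# contribute to a failure condition of each class (levels A through E).
SOFTWARE_LEVEL = {
    FailureCondition.CATASTROPHIC: "A",
    FailureCondition.HAZARDOUS_SEVERE_MAJOR: "B",
    FailureCondition.MAJOR: "C",
    FailureCondition.MINOR: "D",
    FailureCondition.NO_EFFECT: "E",
}


def most_severe(conditions):
    """Return the most severe condition; the enum is declared in severity order."""
    order = list(FailureCondition)
    return min(conditions, key=order.index)


def software_level(conditions):
    """Assumption: the level follows the most severe failure condition
    the software could cause or contribute to."""
    return SOFTWARE_LEVEL[most_severe(conditions)]


# Usage: software that could contribute to a minor or a major failure condition.
print(software_level([FailureCondition.MINOR, FailureCondition.MAJOR]))  # C
```

Declaring the enum members in severity order lets the "most severe" choice be a simple minimum over declaration position rather than a separate ranking table.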
Once again, the concept of labeling software contributors with control capabilities is foreign to most software
developers and programmers. They must be convinced that this activity has utility in the identification and
prioritization of software entities that possess safety implications. In most instances, the software
development community wants the list to be as simple and short as possible. The most important aspect
of the activity must not be lost: the ability to categorize software causal factors to determine both risk
likelihood and the design, code, and test activities required to mitigate the potential software cause.
Autonomous software with functional links to catastrophic risks demands more coverage than software that
influences low-severity risks.
                   Severity
Control Category   Catastrophic   Critical   Marginal   Negligible
(I)                1              1          3          5
(IIa)              1              2          4          5
(IIb)              1              2          4          5
(IV)               3              4          5          5

Control categories:
(I) Software exercises autonomous control over potentially hazardous hardware systems, subsystems or
components without the possibility of intervention to preclude the occurrence of a hazard. Failure of the
software or a failure to prevent an event leads directly to a hazard’s occurrence.
(IIa) Software exercises control over potentially hazardous hardware systems, subsystems, or components
allowing time for intervention by independent safety systems to mitigate the hazard. However, these
systems by themselves are not considered adequate.
(IIb) Software item displays information requiring immediate operator action to mitigate a hazard.
Software failure will allow or fail to prevent the hazard’s occurrence.
(IV) Software does not control safety critical hardware systems, subsystems, or components and does not
provide safety critical information.
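The matrix can be read as a two-way lookup, control category by severity. The minimal Python sketch below transcribes only the four rows shown (rows IIIa and IIIb do not appear here). The function names and the usage data are hypothetical, and treating a lower index as calling for more design, code, and test coverage is an assumption about how the numbers are meant to be read.

```python
# Software hazard criticality matrix, transcribed from the rows above.
# Rows (IIIa) and (IIIb) do not appear in this extract and are omitted.
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")

MATRIX = {
    "I": (1, 1, 3, 5),
    "IIa": (1, 2, 4, 5),
    "IIb": (1, 2, 4, 5),
    "IV": (3, 4, 5, 5),
}


def criticality_index(control_category, severity):
    """Look up the matrix entry for one software causal factor.

    Raises KeyError for a category not in the transcribed matrix and
    ValueError for an unrecognized severity.
    """
    return MATRIX[control_category][SEVERITIES.index(severity)]


def rank_causal_factors(factors):
    """Sort (name, control_category, severity) tuples so the lowest index
    comes first. Assumption: a lower index means the causal factor needs
    more design, code, and test coverage."""
    return sorted(factors, key=lambda f: criticality_index(f[1], f[2]))


# Usage with hypothetical causal factors.
factors = [
    ("status display refresh", "IV", "Marginal"),
    ("autonomous valve command", "I", "Catastrophic"),
    ("operator alert message", "IIb", "Critical"),
]
for name, category, severity in rank_causal_factors(factors):
    print(criticality_index(category, severity), name)
```

Failing loudly on a missing row (KeyError) is deliberate: silently defaulting an unlisted category would hide exactly the gap noted above.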
The specifics of how to perform either a subsystem or system hazard analysis are briefly described in
Chapters 8 and 9. The foundation of a system safety (or software safety) program is a
systematic and complete hazard analysis process.
One of the most helpful steps within a credible software safety program is to categorize the specific causes of
the hazards and software inputs in each of the analyses (PHA, SSHA, SHA, and Operating & Support Hazard
Analysis (O&SHA)). Hazard causes can be identified as those caused by: hardware and/or hardware
components; software inputs or lack of software input; human error; and/or software-influenced human error
or hardware or human errors propagating through the software. Hazards may result from one specific cause
or any combination of causes. For example, “loss of thrust” on an aircraft may have causal factors in any
of the four categories.
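Recording each causal factor together with its cause category makes it straightforward to route the software-related causes to the software safety team. The minimal Python sketch below assumes a simple data model; the class and function names are hypothetical, and the four “loss of thrust” causal factors in the usage example are invented for illustration, not taken from an actual analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class CauseCategory(Enum):
    """The four hazard cause categories described above."""
    HARDWARE = "hardware and/or hardware components"
    SOFTWARE = "software inputs or lack of software input"
    HUMAN_ERROR = "human error"
    SOFTWARE_INFLUENCED = ("software-influenced human error, or hardware or "
                           "human errors propagating through the software")


@dataclass(frozen=True)
class CausalFactor:
    description: str
    category: CauseCategory


def group_by_category(factors):
    """Map each cause category to the causal factors that fall in it."""
    groups = defaultdict(list)
    for factor in factors:
        groups[factor.category].append(factor.description)
    return dict(groups)


def needs_software_analysis(factors):
    """Causal factors that involve software and should go to the
    software safety team."""
    software = {CauseCategory.SOFTWARE, CauseCategory.SOFTWARE_INFLUENCED}
    return [f.description for f in factors if f.category in software]


# Hypothetical causal factors for the "loss of thrust" hazard.
loss_of_thrust = [
    CausalFactor("fuel pump failure", CauseCategory.HARDWARE),
    CausalFactor("erroneous thrust command", CauseCategory.SOFTWARE),
    CausalFactor("incorrect throttle setting", CauseCategory.HUMAN_ERROR),
    CausalFactor("misleading engine display", CauseCategory.SOFTWARE_INFLUENCED),
]
print(needs_software_analysis(loss_of_thrust))
```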
The preliminary software design SSHA begins upon the identification of the software subsystem and uses the
derived system-specific safety-critical software requirements. The purpose is to analyze the system, software
architecture and preliminary Computer Software Configuration Item (CSCI) design. At this point, all generic and functional Software Safety
Requirements (SSRs) should have been identified and it is time to begin allocating them to the identified
safety-critical functions and tracing them to the design.
The allocation of the SSRs to the identified hazards can be accomplished through the development of SSR
verification trees that link safety-critical and safety-significant SSRs to each Safety-Critical Function (SCF).
The SCFs in turn are already identified and linked to each hazard. By verifying the nodes through analysis
(code/interface, logic, functional flow, algorithm, and timing analysis) and/or testing (identification of
specific test procedures to verify the requirement), the Software Safety Engineer (SwSE) is essentially
verifying that the design requirements have been implemented successfully. The choice of analysis and/or
testing to verify the SSRs is up to the individual Safety Engineer, whose decision is based on the criticality of
the requirement to the overall safety of the system and the nature of the SSR. Whenever possible, the Safety
Engineer should use testing for verification.
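Because each hazard links to SCFs and each SCF links to SSRs, a verification tree can be walked mechanically to find gaps before a hazard is closed. The following minimal Python sketch assumes a hazard, SCF, SSR data model with hypothetical class names and identifiers. It also assumes that an SSR counts as closed only when a verification method (analysis or test) has been chosen and the verification has been completed.

```python
from dataclasses import dataclass, field


@dataclass
class SSR:
    """Software Safety Requirement: a leaf node of the verification tree."""
    ssr_id: str
    method: str = ""       # "analysis" or "test"; empty if not yet chosen
    verified: bool = False


@dataclass
class SCF:
    """Safety-Critical Function, with the SSRs allocated to it."""
    name: str
    ssrs: list = field(default_factory=list)


@dataclass
class Hazard:
    name: str
    scfs: list = field(default_factory=list)


def open_items(hazard):
    """Walk hazard -> SCF -> SSR and list the gaps that keep the hazard open."""
    gaps = []
    for scf in hazard.scfs:
        if not scf.ssrs:
            gaps.append(f"{scf.name}: no SSRs allocated")
        for ssr in scf.ssrs:
            if ssr.method not in ("analysis", "test"):
                gaps.append(f"{ssr.ssr_id}: no verification method")
            elif not ssr.verified:
                gaps.append(f"{ssr.ssr_id}: not yet verified by {ssr.method}")
    return gaps


# Usage with hypothetical identifiers.
hazard = Hazard("HAZ-1", [
    SCF("SCF-A", [SSR("SSR-1", "test", True), SSR("SSR-2")]),
    SCF("SCF-B", [SSR("SSR-3", "analysis")]),
    SCF("SCF-C"),
])
for gap in open_items(hazard):
    print(gap)
```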
Numerous methods and analytical techniques are available to plan, identify, trace, and track safety-critical
CSCIs and Computer Software Units (CSUs). Guidance material is available from the Institute of Electrical
and Electronics Engineers (IEEE) (Standard for Software Safety Plans), the UK Ministry of Defence
(Defence Standard 00-55, Annex B), the Department of Defense (DOD) (DOD-STD-2167 and MIL-STD-1629),
NASA (NASA-STD-2100.91), the JSSSC Software System Safety Handbook, and DO-178B.
10.3.5 Testing
Two sets of analyses should be performed during the testing phase:
Test Coverage
For small pieces of code it is sometimes possible to achieve 100% test coverage (i.e., to exercise every
possible state and path of the code). However, it is often not possible to achieve 100% test coverage because of
the enormous number of permutations of states in a computer program execution, versus the time it would
take to exercise all those possible states. In addition, there is often a large, indeterminate number of
environmental variables, too many to simulate completely.
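The growth in the number of paths is easy to quantify under a simplifying assumption: code containing k independent two-way decisions in sequence, with no loops, has 2^k paths. The minimal Python sketch below uses hypothetical function names and an assumed rate of 1,000 tests per second. At that rate, 30 decisions already take about 12 days to exhaust, and 40 decisions take roughly 35 years. Loops and environmental variables only make the picture worse.

```python
def path_count(decisions):
    """Paths through code with `decisions` independent two-way branches
    in sequence: each branch doubles the number of paths."""
    return 2 ** decisions


def days_to_exhaust(decisions, tests_per_second):
    """Days needed to run one test per path at the given test rate."""
    return path_count(decisions) / tests_per_second / 86_400


for k in (10, 20, 30, 40):
    print(k, path_count(k), round(days_to_exhaust(k, 1_000), 2))
```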
Some analysis is advisable to assess the optimum test coverage as part of the test planning process. There is
a body of theory that attempts to calculate the probability that a system with a certain failure probability will
pass a given number of tests.
“White box” testing can be performed at the modular level. Statistical methods such as Monte Carlo
simulations can be useful in planning "worst case" credible scenarios to be tested.
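A minimal Monte Carlo sketch of this idea follows. It samples the environmental variables uniformly within credible ranges, evaluates a safety margin for each sample, and keeps the lowest-margin samples as candidate “worst case” scenarios for the test plan. The variable ranges, the margin function, and the function names are hypothetical placeholders for a real system model. Plain random sampling does not guarantee that the true worst case is found.

```python
import random


def sample_scenarios(ranges, n, seed=0):
    """Draw n scenarios, sampling each variable uniformly within its
    credible (low, high) range."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            for _ in range(n)]


def worst_cases(ranges, margin, n=10_000, keep=5, seed=0):
    """Return the `keep` sampled scenarios with the smallest safety margin;
    these become candidate "worst case" credible scenarios to test."""
    return sorted(sample_scenarios(ranges, n, seed), key=margin)[:keep]


# Hypothetical credible ranges and a hypothetical margin model.
ranges = {"ambient_temp_c": (-40.0, 55.0), "supply_voltage": (24.0, 32.0)}


def margin(s):
    # Illustrative only: margin shrinks at high temperature and low voltage.
    return (s["supply_voltage"] - 24.0) + (55.0 - s["ambient_temp_c"]) / 10.0


for scenario in worst_cases(ranges, margin):
    print({k: round(v, 1) for k, v in scenario.items()}, round(margin(scenario), 2))
```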
All test discrepancies of safety critical software should be evaluated and corrected in an appropriate manner.
• The safety criteria and methodology used to classify and rank software-related hazards
(causal factors), including any assumptions from which the criteria and
methodologies were derived;
• The results of the analyses and testing performed;
• The hazards that have an identified residual risk and the assessment of that risk;
• The list of significant hazards and the specific safety recommendations or precautions
required to reduce their safety risk; and
• A discussion of the engineering decisions made that affect the residual risk at a system
level.
The final section of the SSAR should be a statement by the program safety lead engineer describing the
overall risk associated with the software in the system context and their acceptance of that risk.
Chapter 11:
Test and Evaluation Safety
11.1 Introduction
Verification testing will be required at some point in the life cycle of a system and its
component(s). Tests may be conducted at many hierarchical levels and may
involve materials, hardware, software, interfaces, processes, and procedures, or
combinations of these. These tests determine whether the design meets its requirements, whether
personnel are compatible with the equipment and operating conditions, and whether the design and
procedures are adequate. Two broad types of testing can benefit safety; they are discussed below.
Chapter 12:
Facilities System Safety
12.1 Introduction
The purpose of facility system safety is to apply system safety techniques to a facility from its initial design
through its demolition. This perspective is often referred to as the Facility Acquisition Life Cycle. The
term “facility” is used in this chapter to mean a physical structure or group of structures in a specific
geographic site, the surrounding areas near the structures, and the operational activities in or near the
structures. Aspects that facility system safety addresses include structural systems, Heating, Ventilation,
and Air-conditioning (HVAC) systems, electrical systems, hydraulic systems, pressure and pneumatic
systems, fire protection systems, water treatment systems, equipment and material handling, normal
operations (e.g., parking garages), and unique operational activities (e.g., chemical laboratories). This Life
Cycle approach also applies to all activities associated with installation, operation, maintenance,
demolition, and disposal, rather than focusing only on the operator.
Facilities are major subsystems that pose safety risks to system and facility operational and maintenance
staff. Control of such risks is maintained through the timely implementation of safety processes similar to
those employed for safety risk management for airborne and ground systems. MIL-STD-882, Section 4,
“General Requirements,” defines the minimum requirements of a safety program. These requirements define
the minimum elements of a risk management process, with analysis details to be tailored to the application.
[Figure: FAA Orders; Lessons Learned; Structure (Re-Eng., Renovation); Equipment (Re-Eng., Modify/Upgrade)]
It is important to note that there is a hierarchy of safety and health directives and specifications in the FAA.
All efforts should start with FAA Order 3900.19, Occupational Safety and Health Program, rather than other
related FAA Orders (e.g. FAA Order 6000.15, General Maintenance Handbook for Airway Facilities) and
FAA Specifications (e.g. FAA-G-2100, Electronic Equipment, General Requirements). These related
documents contain only a small part of the safety and health requirements contained in FAA Order
3900.19, FAA Occupational Safety and Health Program and the Occupational Safety and Health
Administration (OSHA) Standards.
The methodologies as defined in MIL-STD-882 are applicable to both construction and equipment design
and re-engineering. As with all safety significant subsystems, the System Safety process for facilities
should be tailored to each project in scope and complexity. The effort expended should be commensurate
with the degree of risk involved. This objective is accomplished through a facility risk assessment process
during the mission need and/or Demonstration and Evaluation (DEMVAL) phase(s).
Facilities system safety involves the identification of the risks involving new facility construction and the
placement of physical facilities on site. The risks associated with construction operations, the placement of
hazardous facilities and materials, worker safety and facility design considerations are evaluated. Hazard
analyses are conducted to identify the risks indicated above.
Consideration should be given to physical construction hazards, such as materials handling, heavy equipment
movement, and fire protection during construction. Facility designs are also evaluated from the perspectives of
life safety, fire protection, airport traffic, structural integrity, and other physical hazards. The location of
hazardous operations is also evaluated to determine their placement and accessibility; for example, high-hazard
operations should be located away from general populations.
Consideration should also be given to contingency planning, accident reconstruction, emergency
egress/ingress, emergency equipment access and aircraft traffic flow. Line of sight considerations should
be evaluated, as well as factors involving electromagnetic environmental effects. Construction quality is
also an important consideration; physical designs must, at a minimum, meet existing standards, codes, and
regulations.
System safety is also concerned with the analysis of newly installed equipment. The following generic
hazards should be evaluated within formal analysis activities. Generic hazard areas are: electrical,
implosion, explosion, material handling, potential energy, fire hazards, electrostatic discharge, noise,
rotational energy, chemical energy, hazardous materials, floor loading, lighting and visual access,
electromagnetic environmental effects, walking/working surfaces, ramp access, equipment
failure/malfunction, foreign object damage, inadvertent disassembly, biological hazards, thermal,
non-ionizing radiation, pinch/nip points, system hazards, entrapment, confined spaces, and material
incompatibility.
The environmental assessment and consultation process provides officials and decision makers, as well as
members of the public, with an understanding of the potential environmental impacts of the proposed
action. The final decision is to be made on the basis of a number of factors. Environmental considerations
are to be weighed as fully and as fairly as non-environmental considerations. The FAA's objective is to
enhance environmental quality and avoid or minimize adverse environmental impacts that might result from
a proposed Federal action in a manner consistent with the FAA's principal mission to provide for the safety
of aircraft operations.
In conducting site evaluations, the following risks must be evaluated from a system safety perspective.
• Noise
• Environmental Site Characterization
• Compatible Land Use
• Emergency Access and Existing Infrastructure
• Water Supply
• Local Emergency Facilities
• Social Impacts
• Induced Socioeconomic Impacts
• Air and Water Quality
• Historic, Architectural, Archeological, and Cultural Resources
• Biotic Communities
• Local Weather Phenomena (e.g., tornadoes, hurricanes, and lightning)
• Physical Phenomena (e.g., mudslides and earthquakes)
• Endangered and Threatened Species of Flora and Fauna
• Wetlands
• Animal Migration
• Floodplains
• Coastal Zone Management
• Coastal Barriers
• Wild and Scenic Rivers
• Farmland
• Energy Supply and Natural Resources
• Solid Waste
• Construction Impacts
The final step before the user takes control of the facility is the occupancy inspection. This inspection
verifies the presence of critical safety features incorporated into the design. The use of a hazard tracking
system can facilitate the final safety assessment. This review may identify safety features that might
otherwise be overlooked during the inspection. A Hazard Tracking Log can generate a checklist for safety
items that should be part of this inspection.
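Generating that checklist from the log amounts to filtering it for controls that exist as physical design features. The sketch below assumes a hypothetical `LogEntry` record and treats only design-controlled items as inspectable; it is an illustration, not a prescribed FAA log format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    """One Hazard Tracking Log row (hypothetical field names)."""
    hazard_id: str
    hazard: str
    safety_feature: str         # control expected to exist in the finished facility
    controlled_by_design: bool  # True if the control is a physical design feature


def occupancy_checklist(log: List[LogEntry]) -> List[str]:
    """Build inspection items for safety features incorporated into the design.

    Procedure-only controls are skipped because they cannot be verified by
    inspecting the building; they are tracked separately.
    """
    return [
        f"[ ] {entry.hazard_id}: verify {entry.safety_feature} ({entry.hazard})"
        for entry in log
        if entry.controlled_by_design
    ]


if __name__ == "__main__":
    log = [
        LogEntry("H-001", "loads exceed crane hoist capacity",
                 "rated capacity painted on both sides of crane", True),
        LogEntry("H-002", "loss of control through operator error",
                 "operator qualification by floor supervisor", False),
    ]
    for item in occupancy_checklist(log):
        print(item)
```

Only H-001 appears in the printed checklist; H-002 depends on a procedure, which the inspector cannot see in the finished building.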
The results of the occupancy inspection can serve as a measure of the effectiveness of the System Safety
Program Plan (SSPP). Any hazards discovered during the inspection fall into one of two categories: (1) a
hazard that was previously identified, together with the corrective action to be taken to control it, or (2) a
hazard not previously identified that requires further action. Items in this second category can be used to
measure the effectiveness of the SSPP for a particular facility.
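The text does not give a formula for this measure. One plausible choice, assumed here for illustration, is the fraction of inspection findings that were not previously identified; `Finding` and `missed_hazard_fraction` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    """A hazard observed during the occupancy inspection."""
    description: str
    previously_identified: bool  # True if already in the Hazard Tracking Log


def classify_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (previously identified, not previously identified)."""
    known = [f for f in findings if f.previously_identified]
    new = [f for f in findings if not f.previously_identified]
    return known, new


def missed_hazard_fraction(findings: List[Finding]) -> float:
    """Share of inspection findings that the SSPP did not anticipate.

    Lower is better; 0.0 is returned when the inspection found nothing.
    """
    if not findings:
        return 0.0
    _, new = classify_findings(findings)
    return len(new) / len(findings)
```

For example, three previously identified findings and one new finding give a missed-hazard fraction of 0.25.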
• Ensure the application of all relevant building safety codes, including OSHA, National
Fire Protection Association, and FAA Order 3900.19B safety requirements.
• Conduct hazard analyses to determine safety requirements at all interfaces between the
facility and those systems planned for installation.
• Review equipment installation, operation, and maintenance plans to make sure all
design and procedural safety requirements have been met.
• Continue updating the hazard correction tracking begun during the design phases.
• Evaluate accidents or other losses to determine if they were the result of safety
deficiencies or oversight.
• Update hazard analyses to identify any new hazards that may result from change
orders.
In addition, guidance for conducting a Hazardous Material Management Program (HMMP) is provided in
National Aerospace Standard (NAS) 411. The purpose of an HMMP is to provide measures for the
elimination, reduction, or control of hazardous materials. An HMMP is composed of several tasks that
complement an SSPP:
• HMMP Plan
• Cost analysis for material alternatives over the life cycle of the material
• Documented trade-off analyses
• Training
• HMMP Report
Low-risk facilities (e.g., housing and administrative buildings). In these types of facilities, risks to building
occupants are low and normally limited to those associated with everyday life. Accident experience with
similar structures must be acceptable, and no additional hazards (e.g., flammable liquids, toxic materials)
may be introduced by the building occupants. Except in special cases, no further system safety
hazard analysis is necessary for low-risk facility programs.
Medium-risk facilities (e.g., maintenance facilities, heating plants, or benign facilities with safety-critical
missions such as Air Traffic Control (ATC) buildings). This group of facilities often presents industrial-type
safety risks to the building occupants, and the loss of the facility's operation has an impact on the
safety of the NAS. Accidents are generally more frequent and potentially more severe. A Preliminary
Hazard Analysis (PHA) is appropriate. System Hazard Analysis (SHA) and Subsystem Hazard Analysis
(SSHA) may also be appropriate. The facility design or systems engineering team members are major
contributors to these analyses. User community participation is also important.
High-risk facilities (e.g., high-energy-related facilities, fuel storage, or aircraft maintenance). This category
usually contains unique hazards of which only an experienced user of a similar facility will have detailed
knowledge. Because of this, it is appropriate for the user, or someone with applicable user experience, to
prepare the PHA in addition to the Preliminary Hazard List (PHL). Additional hazard analyses (e.g., system,
subsystem, and operating and support hazard analyses) may be required.
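The three categories can be read as a lookup from facility risk level to the analyses the safety effort should plan for. This is a minimal sketch under two stated assumptions: a PHL is prepared for every facility program, and "may be required" items are decided case by case. The names `FacilityRisk` and `analyses_for` are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class FacilityRisk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumption: every facility program starts with a PHL; optional analyses are
# chosen case by case by the safety team.
_ANALYSES: Dict[FacilityRisk, Dict[str, List[str]]] = {
    FacilityRisk.LOW: {
        "required": ["PHL", "list of applicable safety standards and codes"],
        "may_be_required": [],
    },
    FacilityRisk.MEDIUM: {
        "required": ["PHL", "PHA"],
        "may_be_required": ["SHA", "SSHA"],
    },
    FacilityRisk.HIGH: {
        "required": ["PHL", "PHA (prepared by someone with applicable user experience)"],
        "may_be_required": ["SHA", "SSHA", "O&SHA"],
    },
}


def analyses_for(risk: FacilityRisk) -> Dict[str, List[str]]:
    """Return a copy of the analysis plan so callers can edit it safely."""
    plan = _ANALYSES[risk]
    return {key: list(items) for key, items in plan.items()}
```

Returning copies keeps a program office from accidentally editing the shared table when it tailors a plan for one project.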
Another example is presented in FAA Order 3900.19, FAA Occupational Safety and Health Program.
This Order requires that “increased risk workplaces be inspected twice a year and all general workplaces
once a year.” Increased risk workplaces are based on an evaluation by an Occupational Safety and Health
professional and include areas such as battery rooms and mechanical areas.
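The inspection frequencies quoted from FAA Order 3900.19 translate directly into a schedule. The sketch assumes evenly spaced inspections and a fixed 365-day year, so dates drift slightly across leap years; the function name and workplace keys are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Inspections per year, from the FAA Order 3900.19 requirement quoted above.
INSPECTIONS_PER_YEAR = {"increased_risk": 2, "general": 1}


def inspection_schedule(workplace: str, start: date, years: int = 1) -> List[date]:
    """Evenly spaced inspection dates (even spacing is an assumption)."""
    per_year = INSPECTIONS_PER_YEAR[workplace]
    interval = timedelta(days=365 // per_year)
    return [start + i * interval for i in range(per_year * years)]
```

For a battery room first inspected on 1 January 2001, `inspection_schedule("increased_risk", date(2001, 1, 1))` yields 1 January and 2 July 2001.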
In facility system safety applications, there are many ways of classifying risk based on
exposures, such as fire loading or hazardous materials. The National Fire Protection Association provides
details on these various risk categorization schemes (see page 12-34, NFPA Health (Hazard) Identification
System).
12.4 Facility System Safety Program
Preparation of a facility system safety program involves the same tasks detailed in Chapter 5. However,
there are unique applications and facility attributes which are discussed in this section.
The concept of operational risk management is the application of operational safety and facility system
safety. More explicit information on Operational Risk Management is found in Chapter 15.
Other factors influencing the SSPP are overall project time constraints, manpower availability, and
monetary resources. The degree of system safety effort expended depends on whether the project replaces
an existing facility, creates a new facility, involves new technology, or is based on standard designs. A
more detailed discussion of each of the elements of a System Safety Program Plan is in Chapter 5.
ORMG Process
The ORMG process consists of nine major elements, which are depicted in Figure 12-3.
[Figure: ORMG process elements, including Hazard Identification (Master Matrix), Requirements Cross-Check, Document in Initial SER, and Update SER (iterative)]
• Existing Human Factors Review documents
• Existing Computer-Human Interface Evaluations
• Safety Assessment Review documents
• Site Transition & Activation Plan (STAP)
• System Technical Manuals
• Site Transition and Activation Management Plan (STAMP)
• System/Subsystem Specification (SSS)
The analysis relates generic hazards and controls to the specific maintenance steps required for
maintaining and repairing the system. The maintenance steps identified during the review should be
integrated into a matrix. In evaluating hazards associated with the maintenance procedures, the specific
procedures could fall into generic maintenance categories, which are characterized, for example, as listed
below:
12.5.1 Change Analysis¹
Change analysis examines the potential effects of modifications to existing systems from a starting point or
baseline. The change analysis systematically hypothesizes worst-case effects of each modification from
that baseline. Consider the existing, known system as the baseline. Examine the nature of all contemplated
changes and analyze the potential effects of each change (singly) and all changes (collectively) upon
system risks. The process often requires a system walk-down, which is the method of physically
examining the system or facility to identify the current configuration.
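The bookkeeping half of change analysis, finding every difference from the baseline, can be sketched as a comparison of two configuration records. This assumes each configuration is a hypothetical mapping from item name to its description; the hazard evaluation of each change still has to be done by analysts.

```python
from typing import Dict, List, Optional, Tuple

# (kind, item, baseline value, current value)
Change = Tuple[str, str, Optional[str], Optional[str]]


def change_analysis(baseline: Dict[str, str], current: Dict[str, str]) -> List[Change]:
    """List every difference between a baseline and a current configuration.

    Each change is returned separately so it can be analyzed singly; the
    whole list is what must be analyzed collectively.
    """
    changes: List[Change] = []
    for item in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(item), current.get(item)
        if before == after:
            continue  # unchanged from the baseline
        if before is None:
            kind = "added"
        elif after is None:
            kind = "removed"
        else:
            kind = "modified"
        changes.append((kind, item, before, after))
    return changes
```

The same function serves the "as designed" versus "as built" comparison: pass the design configuration as the baseline and the walk-down results as the current configuration.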
Alternatively, a change analysis could be initiated on an existing facility by comparing the “as designed”
configuration with the “as built” configuration. To accomplish this, the differences from the “as designed”
configuration must first be physically identified. The process steps are:
This PHL effort serves several important functions. It provides the FAA with an early vehicle for
identifying safety, health, and environmental concerns. The results of this determination are used to size
the scope of the necessary safety effort for the specification, design and construction activities. It provides
the Associate Administrator with the data necessary to assess the cost of the safety effort and include it in
requests for funding. By requiring the PHL to accompany the funding documentation, funding for system
safety tasks becomes an integral part of the budget process.
Generation of the initial PHL includes identification of safety-critical areas. Areas that need special safety
emphasis (e.g., walk-through risk analysis) are identified. The process for identifying hazards can be
accomplished through the use of checklists, lessons learned, compliance inspections/audits, accidents/near
misses, regulatory developments, and brainstorming sessions. For existing facilities, the PHL can be
created using information contained in the Environment and Safety Information System (ESIS). All
available sources should be used for identifying, characterizing, and controlling safety risks. Examples of
such inputs are shown in Figure 12-3. The availability of this information permits the FAA to
incorporate special requirements into the detailed functional requirements and specifications. This input
may be in the form of specific design features, test requirements, or SSP tasks. The resulting contract
integrates system safety into the design of a facility starting with the concept exploration phase.
1
System Safety Analysis Handbook, System Safety Society, July 1993.
Figure 12-3 Sample Inputs for Safety Risk Identification and Characterization (inputs shown: PHL, PHA,
user-defined unacceptable or undesirable events, design reviews, hazard analysis outputs, and health hazard reports)
The PHL also generates an initial list of risks that should initiate a Hazard Tracking Log, a database of
risks, their severity and probability of occurrence, hazard mitigation, and status. New risks are identified
throughout the design process, entered into and tracked by the log. As the design progresses, corrective
actions are included and risks are eliminated or controlled using the system safety order of precedence (See
Chapter 3, Table 3-1). Status is tracked throughout the design and construction process.
Safety risks may be logged closed in one of three ways: (1) risks eliminated or controlled by design are
simply “closed”; (2) risks to be controlled by procedures, or by a combination of design and procedures, are
marked closed but annotated to ensure that standard operating procedures (SOPs) are developed to reduce
the risk, and a list of operation and maintenance procedures to be developed is generated and turned over
to the user; (3) risks to be accepted as is, or with partial controls, are closed and risk acceptance
documentation is prepared. This process documents all risks and their status and highlights any additional
actions required. Thus, the hazard tracking system documents the status of safety risks throughout the
facility's life cycle.
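The log and its three closure paths can be sketched as a small data model. This is a hypothetical in-memory structure, not an FAA schema; the severity and probability strings are free-form labels assumed for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Closure(Enum):
    DESIGN = "eliminated or controlled by design"
    PROCEDURE = "controlled by procedures, or by design plus procedures"
    ACCEPTED = "accepted as is or with partial controls"


@dataclass
class HazardRecord:
    hazard_id: str
    description: str
    severity: str
    probability: str
    mitigation: str = ""
    status: str = "open"
    closure: Optional[Closure] = None
    notes: List[str] = field(default_factory=list)


class HazardTrackingLog:
    """Hypothetical in-memory log; a real one would be a shared database."""

    def __init__(self) -> None:
        self.records: Dict[str, HazardRecord] = {}
        self.procedures_for_user: List[str] = []   # O&M procedures to hand over
        self.acceptance_documents: List[str] = []  # risk acceptance paperwork to prepare

    def add(self, record: HazardRecord) -> None:
        self.records[record.hazard_id] = record

    def close(self, hazard_id: str, how: Closure) -> None:
        record = self.records[hazard_id]
        record.status = "closed"
        record.closure = how
        if how is Closure.PROCEDURE:
            # Closed, but annotated so the SOP actually gets written.
            record.notes.append("SOP to be developed")
            self.procedures_for_user.append(f"{hazard_id}: {record.mitigation}")
        elif how is Closure.ACCEPTED:
            self.acceptance_documents.append(
                f"{hazard_id}: prepare risk acceptance documentation")

    def open_items(self) -> List[HazardRecord]:
        return [r for r in self.records.values() if r.status == "open"]
```

Keeping the hand-over list and the acceptance list on the log itself means both are produced as a side effect of closing hazards, rather than being compiled separately at the end of construction.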
As an expanded version of the PHL, the PHA contains greater detail in three areas. First, hazard control
information is added to identified hazards. Second, a more comprehensive and systematic analysis to
identify additional hazards is performed. Third, greater detail on hazards previously identified in the PHL
is provided.
Detailed knowledge of all operations to be conducted within the facility and any hazards presented by
nearby operations is required. Based on the best available data, including lessons learned, hazards
associated with the proposed facility design or functions are evaluated for risk severity and probability,
together with operational constraints.
If the PHA indicates that the facility is a “low-risk” building and no further analysis is necessary, a list of
applicable safety standards and codes is still required. If the facility is “medium” or “high” risk, methods
to control risk must be instituted.
For existing systems, the Operating and Support Hazard Analysis (O&SHA) is intended to address changing
conditions through an iterative process that can include subject matter expert (SME) participation and a
review of installed systems. This information could be documented in subsequent Safety Engineering Reports.
O&SHA is limited to the evaluation of risks associated with the operation and support of the system. The
materials normally available to perform an O&SHA include the following:
Operating and Support Hazard Analysis Approach
This approach is based on the guidance of MIL-STD-882, System Safety Program Plan Requirements and
the International System Safety Society, Hazard Analysis Handbook. The O&SHA evaluates hazards
resulting from the implementation of operations or tasks performed by persons and considers the following:
Throughout the process, the human is considered an element of the total system, receiving inputs and
initiating outputs during the conduct of operations and support. The O&SHA methodology identifies the
safety-related requirements needed to eliminate hazards or mitigate them to an acceptable level of risk using
established safety order of precedence. This precedence involves initial consideration of the elimination of
the particular risk via a concept of substitution. If this is not possible, the risk should be eliminated by the
application of engineering design. Further, if it is not possible to design out the risk, safety devices should
be utilized. The order of progression continues and considers that if safety devices are not appropriate,
design should include automatic warning capabilities. If warning devices are not possible, the risks are to
be controlled via formal administrative procedures, including training.
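The order of precedence above amounts to choosing the first feasible option from a ranked list. In this minimal sketch, which options are feasible for a given risk is an input the analyst supplies; the category labels paraphrase the paragraph and `select_control` is a hypothetical name.

```python
from typing import Iterable, Optional

# Ranked from most to least preferred, as described in the O&SHA approach.
PRECEDENCE = (
    "eliminate by substitution",
    "eliminate by engineering design",
    "safety devices",
    "automatic warning devices",
    "administrative procedures and training",
)


def select_control(feasible: Iterable[str]) -> Optional[str]:
    """Return the highest-precedence control type the analyst judged feasible."""
    options = set(feasible)
    unknown = options - set(PRECEDENCE)
    if unknown:
        raise ValueError(f"unrecognized control type(s): {sorted(unknown)}")
    for control in PRECEDENCE:
        if control in options:
            return control
    return None  # no feasible control: the risk must be formally accepted or the task changed
```

If both a safety device and a training procedure are feasible, the function picks the safety device, which reflects the principle that procedures are the last resort.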
The O&SHA is a more formal system safety engineering method that is designed to go beyond a Job Safety
Analysis (JSA). System safety is concerned with any possible risk associated with the system. This includes
consideration of the human, hardware, software, and environmental exposures of the system. The analysis
considers human factors and all associated interfaces and interactions. As an additional outcome of the
O&SHA, different JSAs could be developed and presented depending on exposure and need. It is anticipated
that JSAs will be used to conduct training associated with new systems. Specific JSAs addressing particular
maintenance tasks, specific operations, and design considerations can be developed.
For further information concerning operating and support hazards and risks associated with aviation,
contact the FAA Office of System Safety.
An example of a risk assessment matrix is provided in Table 12-1. This matrix indicates the related hazard
code, the hazard or scenario description, and the scenario code. Both the initial risk and the final risk
associated with the specific scenario are also indicated. There is also a section for supporting comments.
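A matrix row with initial and final risk makes it easy to flag scenarios where the listed controls did not clearly lower the risk. The sketch assumes MIL-STD-882 style codes (severity I to IV, probability A to E) written as, for example, "IIC"; Table 12-1 may use different labels. It deliberately avoids inventing a combined risk index: a risk counts as reduced only if it is no worse on either axis and better on at least one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed MIL-STD-882 style categories; higher rank means worse.
SEVERITY_RANK = {"I": 4, "II": 3, "III": 2, "IV": 1}         # Catastrophic ... Negligible
PROBABILITY_RANK = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # Frequent ... Improbable


@dataclass
class MatrixRow:
    hazard_code: str
    scenario: str
    scenario_code: str
    initial_risk: str  # severity numeral + probability letter, e.g. "IIC"
    final_risk: str
    comments: str = ""


def parse_risk(code: str) -> Tuple[int, int]:
    """Split a code such as 'IIC' into (severity rank, probability rank)."""
    return SEVERITY_RANK[code[:-1]], PROBABILITY_RANK[code[-1]]


def risk_reduced(row: MatrixRow) -> bool:
    """True if the final risk is no worse on both axes and better on at least one."""
    s0, p0 = parse_risk(row.initial_risk)
    s1, p1 = parse_risk(row.final_risk)
    return s1 <= s0 and p1 <= p0 and (s1, p1) != (s0, p0)


def rows_needing_review(rows: List[MatrixRow]) -> List[str]:
    """Scenario codes whose controls did not demonstrably lower the risk."""
    return [r.scenario_code for r in rows if not risk_reduced(r)]
```

A row that moves from "IIC" to "IID" passes; a row whose final risk equals its initial risk is flagged for the supporting-comments column to explain.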
Table 12-2 Hazard Tracking Log Example: LOCATION: Building 5 Paint Booth
| ITEM/FUNCTION | PHASE | HAZARD | CONTROL | CORRECTIVE ACTION & STATUS |
| Cranes (2), 1000 LB (top of paint booth frame) | Lifting | Loads exceed crane hoist capacity. | Rated capacity painted on both sides in figures readable from the floor level. Ref. Operating Manual.... | Closed. Use of cranes limited by operating procedure to loads less than 600 lbs. |
| Crane (1), 10,000 LB bridge (in front of paint booth) | Lifting | Loads exceed crane hoist capacity. | All bridge cranes proof loaded every 4 years. Certification tag containing date of proof load, capacity, and retest date located near grip. | Closed. No anticipated loads exceed 5000 lbs. |
|  | Lifting | Loss of control through operator error. | All crane operators qualified and authorized by floor supervisor. | Closed. |
The requirement cross-check analysis is a technique that relates the hazard description or risk to specific
controls and related requirements. Table 12-3 is an example of a requirement cross-check analysis
matrix. It comprises the following elements: a hazard description code; the hazard description or accident
scenario; and the hazard rationale, associated with a specific exposure or piece of equipment. The matrix also
displays a control code and the hazard controls, and it provides reference columns for the appropriate
requirement cross-check. In this example, OSHA requirements, FAA requirements, and National Fire
Protection Association requirements are referenced.
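The matrix described above can be sketched as rows with one reference column per requirement source, plus two checks: controls that cite no requirement, and an inverted index from each requirement to the controls that satisfy it. Field names are hypothetical, and any references used with it (for example 29 CFR 1910.179, the OSHA standard for overhead and gantry cranes) are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CrossCheckRow:
    hazard_code: str
    hazard: str
    rationale: str
    control_code: str
    control: str
    osha: List[str] = field(default_factory=list)  # OSHA requirement references
    faa: List[str] = field(default_factory=list)   # FAA Order references
    nfpa: List[str] = field(default_factory=list)  # NFPA code references


def unreferenced_controls(rows: List[CrossCheckRow]) -> List[str]:
    """Control codes that cite no requirement in any reference column."""
    return [r.control_code for r in rows if not (r.osha or r.faa or r.nfpa)]


def controls_by_requirement(rows: List[CrossCheckRow]) -> Dict[str, List[str]]:
    """Invert the matrix: each cited requirement -> control codes that address it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for r in rows:
        for ref in r.osha + r.faa + r.nfpa:
            index[ref].append(r.control_code)
    return dict(index)
```

An unreferenced control is not necessarily wrong, but it signals that the control rests on engineering judgment alone and should be justified in the rationale column.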
Table 12-4 Hazard Tracking Log Example: LOCATION: Building 5 Paint Booth
| ITEM/FUNCTION | PHASE | HAZARD | CONTROL | CORRECTIVE ACTION & STATUS |
| Cranes (2), 1000 LB (top of paint booth frame) | Lifting | Loads exceed crane hoist capacity. | Rated capacity painted on both sides in figures readable from the floor level. Ref. Operating Manual.... | Closed. Use of cranes limited by operating procedure to loads less than 600 lbs. |
| Crane (1), 10,000 LB bridge (in front of paint booth) | Lifting | Loads exceed crane hoist capacity. | All bridge cranes proof loaded every 4 years. Certification tag containing date of proof load, capacity, and retest date located near grip. | Closed. No anticipated loads exceed 5000 lbs. |
|  | Lifting | Loss of control through operator error. | All crane operators qualified and authorized by floor supervisor. | Closed. |
Listing by a Nationally Recognized Testing Laboratory (NRTL), such as UL, does not automatically ensure that
an item can be used at an acceptable level of risk. These listings only indicate that the item has been tested
and listed according to the laboratory’s criteria. These criteria may not reflect the actual risks associated
with the particular application of the component or its use in a system. Hazard analysis techniques should be
employed to identify these risks and implement controls to reduce them to acceptable levels. The hazard is
related to the actual application of the product. A computer powered by 110 VAC might be very dangerous if
not used as intended; for example, if it were used beside a swimming pool, it would be dangerous regardless of
the UL standard it was manufactured to comply with. Therefore, products manufactured to product
manufacturing standards require the same system safety analysis as developmental items to ensure that they
are manufactured to the correct standard and used in an acceptable manner.
Conformance to codes, requirements, and standards is no assurance of acceptable levels of risk when
performing tasks. Risks should be diagnosed by hazard analysis techniques like the O&SHA. When risks
are identified, they are either eliminated or controlled to an acceptable level by the application of hazard
controls.
Commercial-off-the-shelf, non-developmental items (COTS-NDI) pose risks that must be isolated by
formal hazard analysis methods. The use of COTS-NDI does not ensure that the components, or the systems
in which they are used, are OSHA compliant. COTS-NDI components cannot be considered as having been
manufactured to any specific standard unless they have been tested by an NRTL. Therefore, the use of
COTS-NDI requires the same system safety analysis as developmental items to ensure that they are
manufactured and used in an acceptable manner.
Hazardous materials must be identified in facilities and equipment that have been designated for
disposition. Failure to comply with the applicable regulations can lead to fines, penalties, and other
regulatory actions. Under the Federal Facilities Compliance Act of 1992, states and local authorities may
fine and/or penalize federal officials for not complying with state and local environmental requirements.
Improper disposal of equipment containing hazardous materials would expose the FAA to liability in terms
of regulatory actions and lawsuits (e.g., fines, penalties, and cleanup of waste sites).
There are many regulatory drivers when dealing with hazardous materials disposition. These include:
• National Environmental Policy Act (NEPA)
• Toxic Substances Control Act (TSCA)
• Federal Facilities Compliance Act of 1992 (FFCA)
• Community Environmental Response Facilitation Act (CERFA)
• DOT Shipping Regulations - Hazardous Materials Regulation
• OSHA Regulations (HAZCOM)
• State, local, and tribal laws
• FAA Orders
• Disposal guidance provided in FAA Order 4660.8, Real Property Management and
Disposal
• Disposition guidance contained in FAA Order 4800.2C, Utilization and Disposal of
Excess and Surplus Personal Property
NFPA Health (Hazard) Identification System
• Material that on exposure under fire conditions would offer no hazard beyond that of
ordinary combustible material. (Example: peanut oil)
• Material that on exposure would cause irritation but only minor residual injury.
(Example: turpentine)
• Material that on intense or continued but not chronic exposure could cause temporary
incapacitation or possible residual injury. (Example: ammonia gas)
• Material that on very short exposure could cause death or major residual injury.
(Example: hydrogen cyanide)
FAA Order 1600.46, Physical Security Review of New Facilities, Office Space or Operating Areas
Human Factors Design Guide. Daniel Wagner, U.S. Dept of Transportation, FAA, January 15, 1996.
Public Law 91-596; Executive Order 12196, Occupational Safety and Health Programs for Federal
Employees
System Safety 2000, A Practical Guide for Planning, Managing, and Conducting System Safety
Programs, J. Stephenson, 1991.
System Safety Analysis Handbook, System Safety Society (SSS), July 1993.
FAA System Safety Handbook, Chapter 13: Launch Safety
December 30, 2000
Chapter 13: Launch Safety
13.1 Introduction
The office of the Associate Administrator for Commercial Space Transportation (AST), under Title 49,
U.S. Code, Subtitle IX, Sections 70101-70119 (formerly the Commercial Space Launch Act), exercises
the FAA’s responsibility to:
regulate the commercial space transportation industry, only to the extent necessary to ensure
compliance with international obligations of the United States and to protect the public health
and safety, safety of property, and national security and foreign policy interests of the United
States, …encourage, facilitate, and promote commercial space launches by the private sector,
recommend appropriate changes in Federal statutes, treaties, regulations, policies, plans,
and procedures, and facilitate the strengthening and expansion of the United States space
transportation infrastructure. [emphasis added]
The mandated mission of the AST is “…to protect the public health and safety and the safety of
property….”
AST has issued licenses for commercial launches of both sub-orbital sounding rockets and orbital
expendable launch vehicles. These launches have taken place from Cape Canaveral Air Station (CCAS),
Florida, Vandenberg Air Force Base (VAFB), California, White Sands Missile Range (WSMR), New
Mexico, Wallops Flight Facility (WFF), Wallops Island, Virginia, overseas, and the Pacific Ocean.
AST has also issued launch site operator licenses to Space Systems International (SSI) of California, the
Spaceport Florida Authority (SFA), the Virginia Commercial Space Flight Authority (VCSFA), and the
Alaska Aerospace Development Corporation (AADC). SSI operates the California Spaceport located on
VAFB; SFA the Florida Space Port located on CCAS; VCSFA the Virginia Space Flight Center located
on WFF; and AADC the Kodiak Launch Complex, located on Kodiak Island, Alaska.
There are a number of technical analyses, some quantitative and some qualitative, that the applicant may
perform to demonstrate that its commercial launch operations will pose no unacceptable threat
to the public. The quantitative analyses tend to focus on (1) the reliability and functions of critical safety
systems, and (2) the hazards associated with the hardware and the risk those hazards pose to public
property and individuals near the launch site and along the flight path, and to satellites and other on-orbit
spacecraft. The most common hazard analyses used for this purpose are Fault Tree Analysis, Failure
Modes and Effects Analysis, and Over-flight Risk and On-Orbit Collision Risk analyses using the Poisson
Probability Distribution. The qualitative analyses focus on the organizational attributes of the applicant
such as launch safety policies and procedures, communications, qualifications of key individuals, and
critical internal and external interfaces.
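The Poisson distribution enters these risk analyses because rare, independent events (a casualty on the ground, a collision in orbit) are counted against an expected number λ computed elsewhere. The relations are P(k events) = λ^k e^(-λ) / k! and P(at least one event) = 1 - e^(-λ). The sketch below computes both; λ itself is an input, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, expected: float) -> float:
    """P(exactly k events) when events occur independently with mean `expected`."""
    if k < 0 or expected < 0:
        raise ValueError("k and expected must be non-negative")
    return expected ** k * math.exp(-expected) / math.factorial(k)


def prob_at_least_one(expected: float) -> float:
    """P(one or more events) = 1 - exp(-expected).

    math.expm1 preserves precision for the very small expectations typical of
    over-flight and on-orbit collision risk.
    """
    if expected < 0:
        raise ValueError("expected must be non-negative")
    return -math.expm1(-expected)
```

For small λ, P(at least one) is almost exactly λ itself, which is why expected-value and probability criteria are often quoted interchangeably at these risk levels.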
It is AST/LASD’s responsibility to ensure that the hazard analyses presented by the applicant
demonstrate effective management of accident risks by identifying and controlling the implicit as well as
explicit hazards inherent in the launch vehicle and proposed mission. LASD must evaluate the applicant’s
safety data and safety related hardware/software elements and operations to ascertain that the
demonstrations provided by the applicant are adequate and valid.
Specifically, the LASD evaluation is designed to determine if the applicant has:
• Identified all energy and toxic sources and implemented controls to preclude accidental or
inadvertent release.
• Evaluated safety critical aspects, potential safety problems, and accident risk factors.
• Identified potential hazardous environments or events, and assessed their causes, possible
effects and probable frequency of occurrence.
• Implemented effective hazard elimination, prevention or mitigation measures or techniques to
minimize accident risk to acceptable levels.
• Specified the means by which hazard controls or mitigation methodology can be verified and
validated.
• A third party for death, bodily injury, or property damage or loss resulting from an activity
carried out under the license; and
• The U.S. Government against a person for damage or loss to government property resulting
from an activity carried out under the license.
Section 70112 also requires that the Department of Transportation set the amounts of financial
responsibility required of the licensee. The licensee can then elect to meet this requirement by:
The methodology developed for setting financial responsibility requirements for commercial launch
activities is called Maximum Probable Loss (MPL) analysis1. MPL analysis was developed to protect
launch participants from the maximum probable loss due to claims by third parties and the loss of
government property during commercial launch activities. Note that this is maximum probable loss, not
maximum possible loss. Generally speaking, MPL is determined by identifying all possible accident
scenarios, examining those with the highest potential losses for both government property and third parties,
and then estimating the level of loss that would not be exceeded at a given probability threshold. If the
launch is to take place from a private licensed range and no government property is at risk, no
government property financial responsibility requirement will be issued.
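The phrase "the level of loss that would not be exceeded at a given probability threshold" can be made concrete with a loss-exceedance calculation. The sketch below assumes a set of mutually exclusive accident scenarios, each with a probability and an estimated loss; the threshold is a parameter, and the scenario values are hypothetical, not the thresholds AST applies. It illustrates the idea only and is not the MPL methodology itself.

```python
from typing import Iterable, Tuple

def maximum_probable_loss(scenarios: Iterable[Tuple[float, float]], threshold: float) -> float:
    """Smallest loss L such that P(loss > L) <= threshold.

    scenarios: (probability, loss) pairs for mutually exclusive accident scenarios.
    Scenarios are scanned from the largest loss downward while accumulating the
    probability that the loss exceeds the current level.
    """
    ordered = sorted(scenarios, key=lambda s: s[1], reverse=True)
    exceedance = 0.0  # total probability of scenarios with a larger loss than the current one
    for probability, loss in ordered:
        if exceedance + probability > threshold:
            # Losses above this level are rarer than the threshold; this level is the MPL.
            return loss
        exceedance += probability
    return 0.0  # all loss scenarios together are rarer than the threshold

if __name__ == "__main__":
    # Hypothetical (probability, loss in dollars) pairs.
    scenarios = [(1e-8, 500e6), (5e-8, 200e6), (1e-6, 50e6), (1e-3, 1e6)]
    print(maximum_probable_loss(scenarios, threshold=1e-7))  # 50000000.0
```

Note how the two rarest scenarios, despite their larger losses, fall below the threshold and therefore do not set the MPL; this is the practical difference between maximum probable and maximum possible loss.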
An integral part of, and critical input to, the MPL is the Facility Damage and Personnel Injury (DAMP)
Analysis2. DAMP uses information about launch vehicles, trajectories, failure responses, facilities and
populations in the launch area to estimate the risk and casualty expectations from impacting inert debris,
secondary debris and overpressures from impact explosions. Together, the MPL and DAMP analyses are
used to make the financial responsibility determinations necessary to ensure compensation for losses
resulting from an activity carried out under the commercial license.
1
Futron Corporation developed the MPL Analysis methodology employed by AST.
2
Research Triangle Institute developed the DAMP Analysis methodology employed by AST.
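One of the quantities DAMP estimates is casualty expectation. As a generic illustration of that quantity (a common textbook form, not the DAMP algorithm), expected casualties can be summed over debris fragments as E_c = Σ P_i · ρ_i · A_c,i, where P_i is the probability that fragment i impacts a populated area, ρ_i is the population density there, and A_c,i is the fragment's effective casualty area. The `Fragment` type and all numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Fragment:
    name: str
    impact_probability: float  # probability the fragment lands in the populated area
    population_density: float  # people per square metre in that area
    casualty_area_m2: float    # effective casualty area of the fragment

def expected_casualties(fragments: List[Fragment]) -> float:
    """E_c = sum_i P_i * rho_i * A_c,i (assumes small, independent contributions)."""
    return sum(f.impact_probability * f.population_density * f.casualty_area_m2
               for f in fragments)

if __name__ == "__main__":
    debris = [
        Fragment("inert tank section", 1e-4, 2e-4, 30.0),
        Fragment("engine", 5e-5, 2e-4, 80.0),
    ]
    print(f"E_c = {expected_casualties(debris):.2e}")
```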
Figure 13-3: System Safety, Software Acquisition and Systematic Software Acquisition Process (figure labels: System Safety Program Plan, initiated in the Conceptual Phase and updated for the remainder of the system life cycle; Configuration Management)
13.4.1 Overview
The System Safety Engineering Process is the structured application of system safety engineering and
management principles, criteria, and techniques to address safety within the constraints of operational
effectiveness, time, and cost throughout all phases of a system’s life cycle. The intent of the System
Safety Engineering Process is to identify, eliminate, or control hazards to acceptable levels of risk
throughout a system’s life cycle.
This process is performed by the vehicle developer/operator. Because of the complexity and variety of
vehicle concepts and operations, such a process can help ensure that all elements affecting public safety
are considered and addressed. Without such a process, very detailed requirements would have to be
imposed on all systems and operations to ensure that all hazards have been addressed; such requirements
could have the undesired effect of restricting design alternatives and innovation, or could effectively
dictate design and operations concepts.
The process (as described in MIL-STD 882C) includes a System Safety Program Plan (SSPP). The SSPP (or
its equivalent) provides a description of the strategy by which recognized and accepted safety standards
and requirements, including organizational responsibilities, resources, methods of accomplishment,
milestones, and levels of effort, are to be tailored and integrated with other system engineering functions.
The SSPP lays out a disciplined, systematic methodology that ensures all risks – all events and system
failures (probability and consequence) that contribute to expected casualty – are identified and eliminated,
or that their probability of occurrence is reduced to acceptable levels of risk.
The SSPP should indicate the methods employed for identifying hazards, such as Preliminary Hazard
Analysis (PHA), Subsystem Hazard Analysis (SSHA), Failure Mode and Effects Analysis (FMEA), and Fault
Tree Analysis (FTA). Risk mitigation measures are likewise identified in the plan. These include avoidance,
design/redesign, process/procedures, and operational rules and constraints.
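Fault Tree Analysis, one of the hazard identification methods listed above, combines basic-event probabilities through AND and OR gates to estimate the probability of an undesired top event. The minimal sketch below assumes statistically independent basic events that appear only once in the tree (real fault trees with repeated events are usually solved through minimal cut sets); the event names and probabilities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class BasicEvent:
    name: str
    probability: float

@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)

def probability(node: Union[Gate, BasicEvent]) -> float:
    """Evaluate a fault tree assuming independent, non-repeated basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        result = 1.0
        for p in child_probs:
            result *= p  # every input must occur
        return result
    if node.kind == "OR":
        none_occur = 1.0
        for p in child_probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur  # at least one input occurs
    raise ValueError(f"unknown gate type: {node.kind}")

if __name__ == "__main__":
    # Hypothetical top event: inadvertent release of stored pressure energy.
    top = Gate("OR", [
        BasicEvent("pressure vessel rupture", 1e-5),
        Gate("AND", [BasicEvent("relief valve fails closed", 1e-3),
                     BasicEvent("regulator fails open", 1e-3)]),
    ])
    print(f"P(top event) = {probability(top):.3e}")
```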
The System Safety Engineering Process identifies the safety critical systems. Safety critical systems are
defined as any system or subsystem whose performance or reliability can affect public health and safety
and safety of property. Such systems, whether they directly or indirectly affect the flight of the vehicle,
may or may not be critical depending on other factors such as flight path and vehicle ability to reach
populated areas. For this reason, it is important to analyze each system for each phase of the vehicle
mission from ground operations and launch through reentry and landing operations. Examples of
potentially safety critical systems that may be identified through the system safety analysis process using
PHA or other hazard analysis techniques may include, but are not limited to:
• Landing Systems
• Reentry Propulsion System
• Guidance, Navigation and Control System(s), Critical Avionics (Hardware and Software) -
includes Attitude, Thrust and Aerodynamic Control Systems
• Health Monitoring System (hardware and software)
• Flight Safety System (FSS)
• Flight Dynamics (ascent and reentry) for stability (including separation dynamics) and
maneuverability
• Ground Based Flight Safety Systems (if any) including telemetry, tracking and command and
control systems
• Depending on the concept, additional “systems” might include pilot and life support systems
and landing systems if they materially affect public health and safety
• Others identified through hazard analysis
AST uses a pre-application consultation process to help a potential applicant understand what must be
documented and to help identify potential issues with an applicant’s proposed activities that could
preclude its obtaining a license. The pre-application process should be initiated by the applicant early in
their system development (if possible during the operations concept definition phase) and maintained
until their formal license application is completed. This pre-application process should be used to provide
AST with an understanding of the safety processes to be used, the safety critical systems identified,
analysis and test plan development, analysis and test results, operations planning and flight rules
development.
Analyses may be acceptable as the primary validation methodology in those instances where the flight
regime cannot be simulated by tests, provided there is appropriate technical rationale and justification.
Qualification tests, as referenced in the safety demonstration process and the System Safety Program
Plan, are normally conducted to environments higher than expected. For example, expendable launch
vehicle (ELV) Flight Safety Systems (FSS) are qualified to environments a factor of two or higher than
expected. (See Figure 13-2) These tests are conducted to demonstrate performance and adequate design
margins and may be in the form of multi-environmental ground tests, tests to failure, and special flight
tests. Such tests are normally preceded by detailed test plans and followed by test reports.3
3
Test plans are important elements of the ground and flight test programs. Such plans define, in advance, the nature of the test (what
is being tested and what the test is intended to demonstrate with respect to system functioning, system performance and system
reliability). The test plan should be consistent with the claims and purpose of the test and wherever appropriate, depending on the
purpose of the test, clearly defined criteria for pass and fail should be identified. A well-defined test plan and accompanying test
report may replace observation by the FAA.
[Figure: Use Environment and Qualification Test Environment; axes: Temperature and Vibration]
In addition, quality assurance (QA) records are useful in verifying both design adequacy and vehicle
assembly and checkout (workmanship).
Table 13-1, Validation Acceptance Matrix, identifies sample approaches that may be employed to
validate acceptance for critical systems. Examples of types of analyses, ground tests, and flight tests are
provided following this matrix. (Note: Quality assurance programs and associated records are essential
wherever analysis or testing covering all critical systems is involved.)
Thermal Protection X P P
Reentry (de-orbit) X P P
Health Monitoring * X X X
13.4.3 Analyses
There are various types of analyses that may be appropriate to help validate the viability of a critical
system or component. The following provides examples of some types of critical systems analysis
methodologies and tools.
4
ISO 9000-3 is used in the design, development, and maintenance of software. Its purpose is to help produce software products
that meet the customers' needs and expectations. It does so by explaining how to control the quality of both products and the
processes that produce these products. For software product quality, the standard highlights four measures: specification, code
reviews, software testing and measurements.
Mechanical Systems and Components (Vehicle Structure, Pressurization, Propulsion System including
engine frame thrust points, Ground Support Equipment)
Types of Tests: Load, Vibration (dynamic and modal), Shock, Thermal, Acoustic, Hydrostatic,
Pressure, Leak, Fatigue, X-ray, Center of Gravity, Mass Properties, Moment of Inertia, Static
Firing, Bruceton Ordnance, Balance, Test to Failure (simulating non-nominal flight conditions),
Non-Destructive Inspections
Electrical/Electronic Systems (Electrical, Guidance, Tracking, Telemetry and Command, Flight Safety
System (FSS), Ordnance, Flight Control and Recovery)
Types of Tests: Functional, Power/Frequency Deviation, Thermal Vacuum, Vibration, Shock,
Acceleration, X-ray, recovery under component failures, abort simulations, TDRSS integration
testing (up to and including pre-launch testing with flight vehicle)
Software (Electrical, Guidance, Tracking, Telemetry, Command, FSS, Ordnance, Flight Control and
Recovery)
Types of Tests: Functional, Fault Tolerance, Cycle Time, Simulation, Fault Response,
Independent Verification and Validation, Timing, Voting Protocol, Abort sequences (flight and
in-orbit) under non-nominal conditions with multiple system failures, Integrated Systems Tests
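The "Bruceton Ordnance" entry among the mechanical tests above refers to the Bruceton (up-and-down) method for estimating ordnance sensitivity: after a fire the next trial is run one step lower, after a non-fire one step higher, and the 50% firing level is estimated with the Dixon-Mood formula from the less frequent outcome. The sketch below assumes equally spaced stimulus levels identified by integer index; the record and level values are hypothetical, and a real program would also estimate the standard deviation and the derived all-fire and no-fire levels.

```python
from collections import Counter
from typing import List, Tuple

def check_up_down(record: List[Tuple[int, bool]]) -> bool:
    """True if each trial follows the rule: fire -> step down, no fire -> step up."""
    for (level, fired), (next_level, _) in zip(record, record[1:]):
        if next_level != (level - 1 if fired else level + 1):
            return False
    return True

def bruceton_mean(record: List[Tuple[int, bool]], base_level: float, step: float) -> float:
    """Dixon-Mood estimate of the 50% firing level.

    record: (level_index, fired) per trial; stimulus = base_level + index * step.
    """
    fires = Counter(level for level, fired in record if fired)
    no_fires = Counter(level for level, fired in record if not fired)
    if not fires or not no_fires:
        raise ValueError("need both fires and non-fires")
    # Use the less frequent outcome; on a tie this sketch arbitrarily uses non-fires.
    use_no_fires = sum(no_fires.values()) <= sum(fires.values())
    counts = no_fires if use_no_fires else fires
    lowest = min(counts)                 # lowest level where the chosen outcome occurred
    n_total = sum(counts.values())       # N
    a_sum = sum((level - lowest) * n for level, n in counts.items())  # A = sum(i * n_i)
    x0 = base_level + lowest * step
    correction = 0.5 if use_no_fires else -0.5  # +1/2 for non-fires, -1/2 for fires
    return x0 + step * (a_sum / n_total + correction)

if __name__ == "__main__":
    record = [(2, True), (1, False), (2, True), (1, True), (0, False),
              (1, False), (2, True), (1, False), (2, True), (1, True)]
    assert check_up_down(record)
    print(bruceton_mean(record, base_level=10.0, step=2.0))  # 12.5
```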
The purpose of flight-testing is to verify the system performance, validate the design, identify system
deficiencies, and demonstrate safe operations. Experience repeatedly shows that while necessary and
important, analyses and ground tests cannot and do not uncover all potential safety issues associated with
new launch systems. Even in circumstances where all known/identified safety critical functions can be
exercised and validated on the ground, the concern remains that unrecognized or unknown interactions
(“the unknown unknowns”) may exist.
The structure of the test program will identify the flight test framework and test objectives, establish the
duration and extent of testing, identify the vehicle’s critical systems, identify the data to be collected, and
detail planned responses to nominal and unsatisfactory test results.
Test flight information includes verification of stability, controllability, and the proper functioning of the
vehicle components throughout the planned sequence of events for the flight. All critical flight parameters
should be recorded during flight. A post-flight comparative analysis of predicted versus actual test flight
data is a crucial tool in validating safety critical performance. Below are examples of items from each test
flight that may be needed to verify the safety of a reusable launch vehicle. Listed with each item are
examples of what test-flight data should be monitored or recorded during the flight and assessed post-
flight:
Vehicle/stage launch phase: Stability and controllability during powered phase of flight.
• Vehicle stage individual rocket motor ignition timing, updates on propellant flow rates,
chamber temperature, chamber pressure, burn duration, mixture ratio, thrust, and specific
impulse (Isp)
• Vehicle stage trajectory data (vehicle position, velocity, altitude, attitude rates, and roll,
pitch and yaw attitudes)
Staging/separation phase of boost and upper stages: Stable shutdown of engines, and nominal
separation of the booster and upper stages.
• Separation activity (timestamps, separation shock loads, and dynamics between timestamps)
Booster stage flyback phase (if applicable): Flyback engine cut-off, fuel dump or vent (if
required), nominal descent to the planned impact area, proper functioning and reliability of the
RLV landing systems.
• Booster stage post-separation (flyback) trajectory data
• Electrical power usage and reserves
• Booster stage landing system deployment activity (timestamp)
• Actual thermal and vibroacoustic environment
• Actual structural loads environment
• Functional performance of the Vehicle Health Monitoring System
• Functional performance of the Flight Safety System/Safe Abort System
• Attitude, Guidance and Control system activities
Vehicle stage ascent phase (if multistage): nominal ignition of the stage’s engine, stability and
controllability of the stage during engine operation, orbital insertion – simulated (for suborbital)
or actual – of the vehicle.
• Vehicle individual rocket motor ignition timing, updates on propellant flow rates, chamber
temperature, chamber pressure, and burn duration
• Vehicle circularization and phasing burn activities (ignition timing, updates on propellant
flow rates, chamber temperature, chamber pressure, and burn duration)
• Vehicle trajectory data (vehicle position, altitude, velocity, roll, pitch, yaw attitudes at a
minimum)
• Attitude, guidance and control system activities
Vehicle descent (including vehicle’s de-orbit burn targeting and execution phases): Function of
the programmed flight of the vehicle/upper stage to maintain the capability to land (if reusable) at
the planned landing site, or to reenter for disposal (if expendable), assurance of fuel dump or
depletion, and proper descent and navigation to the planned or alternate landing site.
• Vehicle pre-deorbit burn trajectory data
• Vehicle deorbit burn data (ignition timing, updates on propellant flow rate, chamber
temperature, chamber pressure, and burn duration)
• Vehicle descent trajectory data (position, velocity, and attitude)
• Attitude, Guidance and Control system activities
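The post-flight comparison of predicted versus actual data described above can be organized as a set of parameters, each with a predicted value and an allowable deviation. The sketch below also derives specific impulse from recorded thrust and propellant mass flow using Isp = F / (ṁ · g0). The parameter names, tolerances, and values are hypothetical; an applicant would take them from its own flight test plan.

```python
from dataclasses import dataclass
from typing import Dict, List

G0 = 9.80665  # standard gravity, m/s^2

def specific_impulse(thrust_n: float, mass_flow_kg_s: float) -> float:
    """Specific impulse in seconds: Isp = F / (mdot * g0)."""
    return thrust_n / (mass_flow_kg_s * G0)

@dataclass
class Prediction:
    value: float               # predicted value (assumed non-zero)
    tolerance_fraction: float  # allowed |actual - predicted| / |predicted|

def compare_flight_data(predicted: Dict[str, Prediction], actual: Dict[str, float]) -> List[str]:
    """Return findings for parameters that were not recorded or are outside tolerance."""
    findings = []
    for name, pred in predicted.items():
        if name not in actual:
            findings.append(f"{name}: not recorded")
            continue
        deviation = abs(actual[name] - pred.value) / abs(pred.value)
        if deviation > pred.tolerance_fraction:
            findings.append(f"{name}: deviation {deviation:.1%} exceeds {pred.tolerance_fraction:.1%}")
    return findings

if __name__ == "__main__":
    actual = {"burn_duration_s": 151.0, "chamber_pressure_bar": 68.0,
              "isp_s": specific_impulse(thrust_n=1.2e6, mass_flow_kg_s=400.0)}
    predicted = {"burn_duration_s": Prediction(150.0, 0.02),
                 "chamber_pressure_bar": Prediction(70.0, 0.02),
                 "isp_s": Prediction(310.0, 0.03),
                 "separation_shock_g": Prediction(20.0, 0.10)}
    for finding in compare_flight_data(predicted, actual):
        print(finding)
```

Treating an unrecorded parameter as a finding reflects the text's point that all critical flight parameters should be recorded during flight.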
The use of similarity is not new to launch operations. EWR 127-1, Paragraph 4.14.1.2, states that, as
required, qualification by similarity analysis shall be performed; if qualification by similarity is not
approved, then qualification testing shall be performed. For example, if component A is to be considered
as a candidate for qualification by similarity to a component B that has already been qualified for use,
component A shall have to be a minor variation of component B. Dissimilarities shall require
understanding and evaluation in terms of weight, mechanical configuration, thermal effects, and dynamic
response. Also, the environments encountered by component B during its qualification or flight history
shall have to be equal to or more severe than the qualification environments intended for component A.
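The similarity criteria quoted from EWR 127-1 reduce to two checks: each dissimilarity between component A and the already-qualified component B must be evaluated, and B's qualification or flight environments must be equal to or more severe than A's intended qualification environments. The sketch below encodes only those two checks. The environment names, the convention that a larger magnitude is more severe, and the data layout are assumptions for illustration.

```python
from typing import Dict, List, Set

def similarity_findings(b_environments: Dict[str, float],
                        a_qual_environments: Dict[str, float],
                        dissimilarities: Set[str],
                        evaluated: Set[str]) -> List[str]:
    """List reasons qualification of component A by similarity to B is not yet supported.

    Assumes each environment is a single magnitude where a larger value is more severe.
    dissimilarities: areas where A differs from B (e.g. weight, mechanical configuration,
    thermal effects, dynamic response); evaluated: those already analyzed.
    """
    findings = []
    for env, a_level in a_qual_environments.items():
        b_level = b_environments.get(env)
        if b_level is None:
            findings.append(f"{env}: no qualification or flight history for B")
        elif b_level < a_level:
            findings.append(f"{env}: B saw {b_level}, less severe than A's {a_level}")
    for area in sorted(dissimilarities - evaluated):
        findings.append(f"dissimilarity in {area} not evaluated")
    return findings

if __name__ == "__main__":
    b_env = {"random vibration (g rms)": 14.1, "shock (g)": 1500.0}
    a_qual = {"random vibration (g rms)": 12.0, "shock (g)": 2000.0, "thermal cycles": 24.0}
    for finding in similarity_findings(b_env, a_qual, {"weight", "thermal effects"}, {"weight"}):
        print(finding)
```

An empty result means only that these two criteria are met; approval of qualification by similarity still rests with the reviewing authority.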
Test parameters and analytic assumptions will further define the limits of flight operations. The scope of
the analyses and environmental tests, for example, will constitute the dimensions of the applicant’s
demonstration process and therefore define the limits of approved operations if a license is issued. Such
testing limits, identified system and subsystem limits, and analyses also are expected to be reflected in
mission monitoring and mission rules addressing such aspects as commit to launch, flight abort, and
commit to reentry.
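One way tested and analyzed limits can carry into mission rules is a commit-to-launch check that compares current readings against the envelope demonstrated during qualification. The sketch below assumes each rule is a simple [low, high] window and treats a missing reading as a no-go; the parameter names and limits are hypothetical.

```python
from typing import Dict, List, Tuple

Limits = Dict[str, Tuple[float, float]]  # parameter -> (low, high) demonstrated limits

def commit_decision(limits: Limits, readings: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (go, violations); any missing or out-of-window reading is a no-go."""
    violations = []
    for name, (low, high) in limits.items():
        value = readings.get(name)
        if value is None:
            violations.append(f"{name}: no reading")
        elif not low <= value <= high:
            violations.append(f"{name}: {value} outside [{low}, {high}]")
    return (not violations, violations)

if __name__ == "__main__":
    limits = {"ground_wind_m_s": (0.0, 12.0), "battery_voltage_v": (27.0, 33.0)}
    go, why = commit_decision(limits, {"ground_wind_m_s": 14.5, "battery_voltage_v": 29.1})
    print("GO" if go else f"NO-GO: {why}")
```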
Vehicle capabilities/limitations and operational factors such as launch location and flight path each affect
public risk. The completion of system operation demonstrations, such as flight simulations and controlled
flight tests, provides additional confidence in the vehicle systems and performance capabilities. As
confidence in the system’s overall operational safety performance increases, key operational constraints
such as restrictions on overflight of populated areas may be relaxed.
The following are examples of the types of operations-related considerations that may need to be
addressed by the applicant when establishing their operations scenarios.
Limits on flight regime (ties in with analysis, testing and demonstrating confidence in
system performance and reliability)
Figure 13-2: Interrelationship between Safety Critical Systems and Safety Critical Operations (figure label: Public Risk)
(or function) whose performance or reliability can affect (i.e. malfunction or failure will endanger) public
health, safety and safety of property.5
Introduction
The Systematic Software Safety Process (SSSP) encompasses the application of an organized periodic
review and assessment of safety-critical software and software associated with safety-critical systems,
subsystems and functions. The Systematic Software Safety Process consists primarily of the following
elements:
• Software safety organization is properly chartered and a safety team is commissioned in time.
• Acceptable levels of software risk are defined consistently with risks defined for the entire
system.
• Interfaces between software and the rest of the system’s functions are clearly delineated and
understood.
• Software application concepts are examined to identify safety-critical software functions and
their associated hazards.
• Requirements and specifications are examined for safety hazards (e.g. identification of
hazardous commands, processing limits, sequence of events, timing constraints, failure
tolerance, etc.)
• Software safety requirements are properly incorporated into the design and implementation.
• Appropriate verification and validation requirements are established to assure proper
implementation of software system safety requirements.
• Test plans and procedures can achieve the intent of the software safety verification
requirements.
5
Reference D.
The Institute of Electrical and Electronics Engineers (IEEE) offers a comprehensive standard (IEEE Std
1228, Standard for Software Safety Plans) focusing solely on planning. The Standard describes both
software safety management and supporting analyses in sufficient detail. The Standard’s annex describes
the kinds of analyses to be performed during the software requirements, design, code, test and change
phases of the traditional life cycle. Similar planning models are provided by the UK Ministry of Defence
(MoD) Defence Standard 00-55, Annex B.
A software safety organization can take one of many shapes, depending on the needs of the applicant or
licensed operator. However, the following requisites are recommended: centralization, independence, and
high status.
Centralization allows a single organization to focus entirely on hazards and their resolutions during any
life cycle phase, be it design, coding or testing. Independence prevents bias and conflicts of interest
during organizationally sensitive hazard assessment and management. A high status empowers the team
to conduct its mission with sufficient visibility and importance. By endorsing these requisites, CST
applicants and operators will indicate they are attentive to the safety aspects of their project or mission.
Typical activities expected of the team range from identifying software-based hazards, to tracing
safety requirements and limitations in the actual code, to developing software safety test plans and
reviewing test results for their compliance with safety requirements.
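Tracing safety requirements into the code and into tests, one of the team activities listed above, is commonly maintained as a traceability matrix. The sketch below flags safety requirements that no code unit implements or no test case verifies, and trace entries that cite unknown requirement IDs. The requirement identifiers, file names, and data layout are hypothetical.

```python
from typing import Dict, List, Set

def trace_gaps(requirements: Set[str],
               code_trace: Dict[str, Set[str]],
               test_trace: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Report gaps in a safety requirements traceability matrix.

    code_trace / test_trace map a code unit or test case to the requirement IDs it covers.
    """
    implemented = set().union(*code_trace.values())
    verified = set().union(*test_trace.values())
    return {
        "not_implemented": sorted(requirements - implemented),
        "not_verified": sorted(requirements - verified),
        "unknown_ids": sorted((implemented | verified) - requirements),
    }

if __name__ == "__main__":
    reqs = {"SSR-1", "SSR-2", "SSR-3"}
    code = {"abort_logic.c": {"SSR-1", "SSR-2"}, "arm_fsm.c": {"SSR-4"}}
    tests = {"TC-10": {"SSR-1"}}
    print(trace_gaps(reqs, code, tests))
```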
waterfall and spiral methodologies. Although different models may carry different lifecycle emphasis, the
adopted model should not affect the SSSP itself. For discussion purposes only, this enclosure adopts a
waterfall model (as defined in IEEE/EIA 12207, Standard for Information Technology: Software Life Cycle
Processes). For brevity, only some phases (development, operation, maintenance and support) of the
Standard are addressed in terms of their relationship to software safety activities. This relationship is
summarized in Table 13-2. The table’s contents partly reflect some of the guidance offered by the National
Aeronautics and Space Administration (NASA) Standard 8719.13A and NASA Guidebook GB-1740.13-
96.
Table 13-2: Software Safety Activities Relative to the Software Life Cycle
Life Cycle Phase | Corresponding Safety Activity | Inputs | Expected Results | Milestones To Be Met
Concept/Requirements/Specifications | -Review software concept for safety provisions; -Derive generic and system-specific software safety requirements; -Analyze software requirements for hazards; -Identify potential software/system interface hazards; -Develop Functional Hazards List (FHL); -Develop initial Preliminary Software Hazard Analysis (PSHA) | PHL; -Preliminary Hazard Analysis (PHA) [from system safety analysis]; -Generic and system-wide safety specs | PSHA Report | Software Concept Review (SCR); Software Requirements Review (SRR); Software Specification Review (SSR)
Architecture/Preliminary Design | At high design level: -Identify Safety Critical Computer Software Components (SCCSCs); -Verify correctness and completeness of architecture; -Ensure test coverage of software safety requirements | PSHA | Software Safety Architectural Design Hazard Analysis (SSADHA) Report | Preliminary Design Review (PDR)
Detailed Design | At the low design (unit) level: -Focus on SCCSCs at the unit level; -Verify correctness/completeness of detail design | PSHA, SSADHA | Software Safety Detailed Design Hazard Analysis (SSDDHA) Report | Critical Design Review (CDR)
Implementation/Coding | -Examine correctness and completeness of code from safety requirements; -Identify possibly unsafe code; -Walk through/audit the code | PSHA, SSADHA, SSDDHA | Software Safety Implementation Hazard Analysis (SSIHA) Report | Test Readiness Review (TRR)
Integration and Testing | -Ensure test coverage of software safety requirements; -Review test documents and results for safety requirements; -Final SSHA | Test documents | -Software Safety Integration Testing (SSIT) Report; -Final SSHA Report | Acceptance
Operations and Maintenance | -Operating and Support Hazard Analysis (O&SHA) | All of the above plus all incident reports | O&SHA Report(s), as required | Deployment
Figure 3 provides a composite overview of the entire safety process. The figure consists of three parts.
The top part reflects the broader System Safety Process described in draft Advisory Circular 431.35-2.
The middle part illustrates a typical waterfall Software Acquisition Process life cycle. The bottom part
also partly corresponds to the Systematic Software Safety Process. In Figure 3, all processes shown in
horizontal bars are subject to a hypothetical schedule with time duration not drawn to any scale.
Special Provisions
Commercial Off the Shelf (COTS): COTS software targets a broad range of applications, with no
specific one envisioned ahead of time. Therefore, care must be taken to minimize the risk that COTS
software introduces when it becomes embedded in or coupled to specific applications. Consideration ought to be
given to designing the system such that COTS software remains isolated from safety-critical functions. If
isolation is not possible, then safeguards and oversight should be applied.
Software Reuse: Reusable software originates from a previous or different application. Usually,
developers intend to apply it to their current system, integrating it “as is” or with some minor
modifications. The Software Safety Team verification/validation plan, etc.) Annex B should serve as a
general model for preparing software safety documents
The results of most of the safety analysis activities usually require preparing several hazard analysis
reports documenting the findings of the safety team. The team also has the responsibility of presenting
its findings to decision-making management at critical milestones, such as the Software Requirements
Review (SRR), Preliminary Design Review (PDR), Critical Design Review (CDR), etc. Towards this end,
UK MoD Defence Standard 00-55, Annex E, describes how to prepare a software safety “case”. The Standard
defines a case as “a well-organized and reasoned justification, based on objective evidence, that the
software does or will satisfy the safety aspects of the Software Requirement”.
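A safety case in the sense quoted above is often organized as a top claim broken into sub-claims, with each leaf claim supported by objective evidence such as analysis reports or test results. The minimal sketch below checks that every leaf claim cites at least one evidence item. The `Claim` structure and the example claims are illustrative assumptions, not the format prescribed by Standard 00-55.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    statement: str
    evidence: List[str] = field(default_factory=list)  # e.g. report or test IDs
    subclaims: List["Claim"] = field(default_factory=list)

def unsupported_claims(claim: Claim) -> List[str]:
    """Return leaf claims with no supporting evidence, in depth-first order."""
    if not claim.subclaims:
        return [] if claim.evidence else [claim.statement]
    result: List[str] = []
    for sub in claim.subclaims:
        result.extend(unsupported_claims(sub))
    return result

if __name__ == "__main__":
    case = Claim("Software satisfies the safety aspects of the Software Requirement", subclaims=[
        Claim("Identified software hazards are mitigated", evidence=["Final SSHA report"]),
        Claim("Safety requirements are verified by test", subclaims=[
            Claim("Abort sequences tested under multiple failures", evidence=["SSIT report"]),
            Claim("Interrupt priority scheme verified"),
        ]),
    ])
    print(unsupported_claims(case))
```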
In turn, defective software can be labeled hazardous if it consists of safety-critical functions that
command, control and monitor sensitive CST systems. Some typical software functions considered
safety-critical include:
• Ignition Control: any function that controls or directly influences the pre-arming, arming,
release, launch, or detonation of a CST launch system.
• Flight Control: any function that determines, controls, or directs the instantaneous flight path
of a CST vehicle.
• Navigation: any function that determines and controls the navigational direction of a CST
vehicle.
• Monitoring: any function that monitors the state of CST systems for purposes of ensuring their
safety.
• Hazard Sensing: any function that senses hazards and/or displays information concerning the
protection of the CST system.
• Energy Control: any function that controls or regulates energy sources in the CST system.
• Fault Detection: any function that detects, prioritizes, or restores software faults with
corrective logic.
• Interrupt Processing: any function that provides interrupt priority schemes and routines to
enable or disable software-processing interrupts.
• Autonomous Control: any function that has autonomous control over safety-critical hardware.
• Safety Information Display: any function that generates and displays the status of safety-
critical hardware or software systems.
• Computation: any function that computes safety-critical data.
Risk Assessment
A key element in system safety program planning is the identification of the acceptable level of risk for
the system. The basis for this is the identification of hazards. Various methodologies used in the
identification of hazards are addressed in Sections 2.3 & 2.4 of draft AC 431.35-2. Once the hazards and
risks are identified, they need to be prioritized and categorized so that resources can be allocated to the
functional areas having an unacceptable risk potential. Risk assessment and the use of a Hazard Risk
Index (HRI) Matrix as a standardized means with which to group hazards by risk are described in
Attachment 2, Sections 6.1 & 6.2 of draft AC 431.35-2. This section presents specialized methods of
analyzing hazards that have software influence or causal factors, and supplements the HRI presented
in draft AC 431.35-2.
The Hazard Risk Index presented in draft AC 431.35-2 is predicated on the probability of hazard
occurrence and the ability to obtain component reliability information from engineering sources.
Hardware reliability modeling of a system is well established; however, there is no uniform, accurate or
practical approach to predicting and measuring the software reliability portion of the system. Software
does not fail in the same manner as hardware: because it is not a physical entity, it does not wear out,
break, or degrade over time. Software problems are therefore referred to as software errors. Software errors
generally occur due to implementation or human failure mechanisms (such as documentation errors, coding
errors, incorrect interpretation of design requirements, specification oversight, etc.) or requirement errors
(failure to anticipate a set of conditions that lead to a hazard). Software also has many more failure paths
than hardware, making it difficult to test all paths. Thus the ultimate goal of software system safety is to
find and eliminate the built-in, unintended and undesired hazardous functions driven by software in a CST
system.
The second half of the equation for the classification of risk is applying an acceptable methodology for
determining the software’s influence on system level hazards. The probability factors contained in draft
AC 431.35-2 have been determined for hardware based upon historical “best” practices. Data for the
assignment of accurate probabilities to software errors has not matured. Thus alternate methods for
determining the probability contributed by software causal factors need to be used. Numerous methods of
determining software effects on hardware have been developed; two of the most commonly used are
presented in MIL-STD 882C and RTCA DO-178 and are shown in Figure 4. These methods address the
software’s “control capability” within the context of the software causal factors. An applicant’s Software
System Safety Team should review these lists and tailor them to meet the objectives of their CST system
and integrated software development program.
Categorizing software causal factors serves to determine both the likelihood and the design, coding,
and test activities required to mitigate the potential software contributor. A Software Hazard
Criticality (SHC) Matrix, similar to the hazard risk index (HRI)6 matrix is used to determine the
acceptability of risk for software hazards. Figure 3 shows an example of a typical SHC matrix using the
control categories of MIL-STD 882C [Mil882C]. The SHC matrix can assist the software system safety
team in allocating software safety requirements against resources and in the prioritization of software
design and programming tasks.
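The SHC matrix can be represented as a lookup from a software control category and a hazard severity to a software hazard risk index, which then orders hazards for analysis and testing effort. The index values below are placeholders for illustration only and are not quoted from MIL-STD 882C; as the text notes, an applicant's Software System Safety Team would substitute its own tailored matrix. The convention that 1 denotes the highest risk is also an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, List, Tuple

class Severity(Enum):
    CATASTROPHIC = "catastrophic"
    CRITICAL = "critical"
    MARGINAL = "marginal"
    NEGLIGIBLE = "negligible"

SEVERITY_ORDER = [Severity.CATASTROPHIC, Severity.CRITICAL, Severity.MARGINAL, Severity.NEGLIGIBLE]

# Hypothetical, illustrative matrix: control category -> index per severity (1 = highest risk).
SHC_MATRIX: Dict[str, Tuple[int, int, int, int]] = {
    "I":   (1, 1, 3, 5),
    "II":  (1, 2, 4, 5),
    "III": (2, 3, 5, 5),
    "IV":  (3, 4, 5, 5),
}

def software_hazard_index(control_category: str, severity: Severity) -> int:
    """Look up the (illustrative) software hazard risk index for one hazard."""
    return SHC_MATRIX[control_category][SEVERITY_ORDER.index(severity)]

def prioritize(hazards: List[Tuple[str, str, Severity]]) -> List[Tuple[str, str, Severity]]:
    """Sort (name, control_category, severity) tuples so the highest-risk hazards come first."""
    return sorted(hazards, key=lambda h: software_hazard_index(h[1], h[2]))

if __name__ == "__main__":
    hazards = [("display lag", "IV", Severity.MARGINAL),
               ("autonomous abort logic error", "I", Severity.CATASTROPHIC)]
    for name, category, severity in prioritize(hazards):
        print(name, software_hazard_index(category, severity))
```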
6 See Attachment 2, Section 6.2 of AC 431.35-2 for discussion and illustration of HRI.
7 The actual analysis techniques used to identify hazards, their causes and effects, hazard elimination, or risk reduction
requirements and how they should be met should be addressed in the applicant’s System Safety Program Plan. The System
Safety Society’s System Safety Handbook identifies additional system safety analysis techniques that can be used.
8 Reference E
[Figure 3: Example Software Hazard Criticality (SHC) matrix. Rows: control category; columns: Catastrophic, Critical, Marginal, Negligible.]
The following list is intended to provide a limited, representative sampling of the software safety analysis
methods and tools available to the CST licensee or operator. General system safety analyses have been
omitted because they are addressed in Paragraph 4.3. It is the licensee or operator’s responsibility to assess
the applicability and viability of a particular analysis method or tool to their CST, methods of operations,
and organizational capabilities.
• Code Inspection: a formal review process during which a safety team checks the actual code,
comparing it stepwise to a list of hazard concerns.
• Hardware/Software Safety Analysis⁹: this analysis is a derivative of the system PHA¹⁰. The
PHA, when integrated with the requirements levied on the software, identifies those
programs, routines, or modules that are critical to system safety and must be examined in
depth.
• Software Failure Modes and Effects Analysis (SFMEA)¹¹: identifies software-related design
deficiencies through analysis of process flow charts. It also identifies areas of interest for
verification/validation and test and evaluation. The technique is used during and after the
development of software specifications. The results of the PHA and SSHA, if complete, can
be used as a guide for focusing the analysis.
• Software Fault Tree Analysis (SFTA)¹²: used to identify the root cause(s) of a “top”
undesired event. When a branch of the hardware FTA leads to the software of the system, the
SFTA is applied to that portion of software controlling that branch of the hardware FTA. The
outputs from the SFMEA, Software Requirements Hazard Analysis (SRHA), Interface
Analysis, and Human Factors/Man-Machine Interface Analysis can provide inputs to the
SFTA. SFTA can be performed at any or all levels of system design and development.
• Software Hazard Analysis (SHA)¹³: used to identify, evaluate, and eliminate or mitigate
software hazards by means of a structured analytical approach that is integrated into the
software development process.
• Software Sneak Circuit Analysis (SSCA)¹⁴: used to uncover program logic that could cause
undesired program outputs or inhibits, or incorrect sequencing/timing. When software
controls a safety critical event, an SSCA can help detect a condition that would cause a
catastrophic mishap if the cause were an inadvertent enabling condition.
9 Alternate Names: Software Hazard Analysis (SHA) and Follow-On Software Hazard Analysis.
10 See Paragraph 4.3.
11 Alternate Names: Software Fault Hazard Analysis (SFHA) and Software Hazardous Effects Analysis (SHEA).
12 Alternate Name: Soft Tree Analysis (STA).
13 Alternate Name: Software Safety Analysis (SSA).
14 Should be cross-referenced to system SCA.
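SFTA uses the same AND/OR gate logic as a hardware fault tree; the branch that leads into the software is expanded into software causal events. The Python sketch below is an illustration, not a prescribed tool: it computes the minimal cut sets of a small tree (the smallest combinations of basic events that produce the top event) by expanding gates from the top down, in the spirit of the MOCUS method. The event names and tree structure are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class Event:
    """A basic (leaf) event: a hardware failure or a software causal factor."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An AND or OR gate over child nodes."""
    kind: str  # "AND" or "OR"
    inputs: Tuple["Node", ...]


Node = Union[Event, Gate]
CutSets = Set[FrozenSet[str]]


def minimize(sets: CutSets) -> CutSets:
    """Drop any cut set that strictly contains another (it is not minimal)."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node: Node) -> CutSets:
    """Return the minimal cut sets of the subtree rooted at node."""
    if isinstance(node, Event):
        return {frozenset([node.name])}
    children = [cut_sets(child) for child in node.inputs]
    if node.kind == "OR":
        # Any one child's cut set is enough to trigger an OR gate.
        combined = set().union(*children)
    elif node.kind == "AND":
        # One cut set from every child must occur together.
        combined = {frozenset().union(*combo) for combo in product(*children)}
    else:
        raise ValueError(f"unknown gate kind: {node.kind!r}")
    return minimize(combined)


if __name__ == "__main__":
    # Hypothetical top event: "propellant valve opens inadvertently".
    software_branch = Gate("AND", (
        Event("SW: open command issued in wrong phase"),
        Event("SW: phase check bypassed"),
    ))
    top = Gate("OR", (
        Gate("AND", (software_branch, Event("HW: interlock failed"))),
        Event("HW: valve stuck open"),
    ))
    for cs in sorted(cut_sets(top), key=len):
        print(sorted(cs))
```

Each cut set containing a software event points to the portion of the software that the SFTA must examine, and the size of the cut set shows how many independent failures stand between that software fault and the top event.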
• A software quality assurance program should be established for systems having safety-critical
functions.
• At least two people should be thoroughly familiar with the design, coding, testing and
operation of each software module in the CST system.
• The software should be analyzed throughout the design, development, and maintenance
processes by a software system safety team to verify and validate that the safety design
requirements have been correctly and completely implemented.
• The processes described in the software development plan should be enforceable and
auditable. Specific coding standards or testing strategies should be enforced and
independently audited.
• Desk audits, peer reviews, static and dynamic analysis tools and techniques, and debugging
tools should be used to verify implementation of identified safety-critical computing system
functions.
• The CST system should have at least one safe state identified for each operational phase.
• Software should return hardware systems under the control of software to a designed safe
state when unsafe conditions are detected.
• Where practical, safety-critical functions should be performed on a standalone computer. If
this is not practical, safety-critical functions should be isolated to the maximum extent
practical from non-critical functions.
• The CST system and its software should be designed for ease of maintenance by personnel
not associated with the original design team.
• The software should be designed to detect safety-critical failures in external input or output
hardware devices and revert to a safe state upon their occurrence.
• The software should make provisions for logging all system errors detected.
• Software control of safety-critical functions should have feedback mechanisms that give
positive indications of the function’s occurrence.
• The system and software should be designed to ensure that design safety requirements are not
violated under peak load conditions.
• The applicant should clearly identify an overall policy for error handling. Specific error
detection and recovery situations should be identified.
• When redundancy is used to reduce the vulnerability of a software system to a single
mechanical or logic failure, the additional failure modes from the redundancy scheme should
be identified and mitigated.
• The CST system should be designed to ensure that the system is in a safe state during power-
up.
• The CST system should not enter an unsafe or hazardous state after an intermittent power
transient or fluctuation.
• The CST system should gracefully degrade to a secondary mode of operation or shut down in
the event of a total power loss so that potentially unsafe states are not created.
• The CST system should be designed such that a failure of the primary control computer will
be detected and the CST system returned to a safe state.
• The software should be designed to perform a system level check at power-up to verify that
the system is safe and functioning properly prior to application of power to safety-critical
functions.
• When read-only memories are used, positive measures, such as operational software
instructions, should be taken to ensure that the data is not corrupted or destroyed.
• Periodic checks of memory, instruction, and data bus(es) should be performed.
• Fault detection and isolation programs should be written for safety-critical subsystems of the
computing system.
• Operational checks of testable safety-critical system elements should be made immediately
prior to performance of a related safety-critical operation.
• The software should be designed to prevent unauthorized system or subsystem interaction
from initiating or sustaining a safety-critical sequence.
• The system design should prevent unauthorized or inadvertent access to or modification of
the software and object coding.
• The executive program or operating system should ensure the integrity of data or programs
loaded into memory prior to their execution.
• The executive program or operating system should ensure the integrity of data and program
during operational reconfiguration.
• Safety-critical computing system functions and their interfaces to safety-critical hardware
should be controlled at all times. The interfaces should be monitored to ensure that erroneous
or spurious data does not adversely affect the system, that interface failures are detected, and
that the state of the interface is safe during power-up, during power fluctuations and
interruptions, and in the event of system errors or hardware failure.
• Safety-critical operator display legends and other interface functions should be clear, concise
and unambiguous and, where possible, be duplicated using separate display devices.
• The software should be capable of detecting improper operator entries or sequences of entries
or operations and of preventing execution of safety-critical functions as a result.
• The system should alert the operator to an erroneous entry or operation.
• Alerts should be designed such that routine alerts are readily distinguished from safety-
critical alerts.
• Safety-critical computing system functions should have one and only one possible path
leading to their execution.
• Files used to store safety-critical data should be unique and should have a single purpose.
• The software should be annotated, designed, and documented for ease of analysis,
maintenance, and testing of future changes to the software. Safety-critical variables should be
identified in such a manner that they can be readily distinguished from non-safety-critical
variables.
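Several of the guidelines above can be illustrated together: a safe state at power-up, an integrity check before safety-critical functions are enabled, periodic memory checks, rejection of improper operator sequences with an alert, positive feedback, a single path to execution, and error logging. The Python sketch below is a simplified, single-threaded model built on stated assumptions: a hypothetical ARM/FIRE command sequence, a CRC-32 (`zlib.crc32`) over the loaded software image as the integrity check, and the standard `logging` module as the error log. It shows the control logic only; a real CST system would implement these mechanisms in its own flight or ground software and hardware.

```python
import logging
import zlib
from enum import Enum, auto
from typing import List

log = logging.getLogger("cst.safety")  # hypothetical logger name


class State(Enum):
    SAFE = auto()
    ARMED = auto()
    ACTIVE = auto()


# Hypothetical command table: exactly one path leads to ACTIVE (SAFE -> ARMED -> ACTIVE).
TRANSITIONS = {
    ("arm", State.SAFE): State.ARMED,
    ("fire", State.ARMED): State.ACTIVE,
    ("safe", State.ARMED): State.SAFE,
    ("safe", State.ACTIVE): State.SAFE,
}


class SafetyController:
    def __init__(self, image: bytes, expected_crc: int) -> None:
        self.state = State.SAFE  # the system powers up in its safe state
        self.errors: List[str] = []  # record of every detected error and alert
        self._image = image
        self._expected_crc = expected_crc
        self.enabled = self.memory_check()  # power-up check gates all commands

    def _log(self, message: str) -> None:
        self.errors.append(message)
        log.error(message)

    def _fault(self, message: str) -> None:
        """Detected failure: log it and return to the safe state."""
        self._log(message)
        self.state = State.SAFE

    def memory_check(self) -> bool:
        """Power-up and periodic integrity check of the loaded image."""
        if zlib.crc32(self._image) != self._expected_crc:
            self.enabled = False
            self._fault("integrity check failed: image CRC mismatch")
            return False
        return True

    def command(self, cmd: str) -> bool:
        """Execute an operator command only if it is valid in the current state."""
        if not self.enabled:
            self._log(f"alert: {cmd!r} refused, integrity check has not passed")
            return False
        next_state = TRANSITIONS.get((cmd, self.state))
        if next_state is None:
            # Improper entry or sequence: alert the operator, do not execute.
            self._log(f"alert: {cmd!r} not permitted in state {self.state.name}")
            return False
        self.state = next_state
        return True

    def confirm(self, feedback_ok: bool) -> None:
        """Require positive feedback that a commanded function occurred."""
        if self.state is not State.SAFE and not feedback_ok:
            self._fault("no positive feedback from hardware; reverting to safe state")
```

The sketch separates operator errors, which are refused and logged without changing state, from detected failures, which also force the safe state; an actual error-handling policy would define that split explicitly.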
Configuration Control
The overall System Configuration Management Plan should provide for the establishment of a Software
Configuration Control Board (SCCB) prior to the establishment of the initial baseline. The SCCB should
review and approve all software changes (modifications and updates) occurring after the initial baseline
has been established.
The software system safety program plan should provide for a thorough configuration management
process that includes version identification, access control, change audits, and the ability to restore
previous revisions of the system.
Modified software or firmware should be clearly identified with the version of the modification, including
configuration control information. Both physical and electronic “fingerprinting” of the version are
encouraged.
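Electronic fingerprinting, version identification, change audit, and restoration of previous revisions can be sketched with a cryptographic hash. The Python sketch below assumes SHA-256 (`hashlib.sha256`) as the fingerprint and models SCCB approval as a required, hypothetical approval reference on every change after the initial baseline. It illustrates the bookkeeping only and is not a substitute for a configuration management tool.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Revision:
    version: str
    sha256: str    # electronic fingerprint of the software image
    approval: str  # hypothetical SCCB decision reference ("" for the initial baseline)


@dataclass
class SoftwareBaseline:
    history: List[Revision] = field(default_factory=list)  # ordered change audit trail
    images: Dict[str, bytes] = field(default_factory=dict)

    def commit(self, version: str, image: bytes, approval: str = "") -> str:
        """Record a new revision; changes after the initial baseline need approval."""
        if version in self.images:
            raise ValueError(f"version {version!r} already exists")
        if self.history and not approval:
            raise PermissionError("change after initial baseline requires SCCB approval")
        digest = hashlib.sha256(image).hexdigest()
        self.history.append(Revision(version, digest, approval))
        self.images[version] = image
        return digest

    def fingerprint(self, version: str) -> str:
        for rev in self.history:
            if rev.version == version:
                return rev.sha256
        raise KeyError(version)

    def verify(self, version: str, image: bytes) -> bool:
        """True if the image matches the recorded fingerprint of that version."""
        return hashlib.sha256(image).hexdigest() == self.fingerprint(version)

    def restore(self, version: str) -> bytes:
        """Return a previous revision after re-verifying its fingerprint."""
        image = self.images[version]
        if not self.verify(version, image):
            raise RuntimeError(f"stored image for {version!r} fails its fingerprint")
        return image
```

Recording the fingerprint together with the version and approval reference lets an auditor confirm that the software actually loaded matches the approved revision.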
Testing
Systematic and thorough testing should provide evidence for critical software assurance. Software test
results should be analyzed to identify potential safety anomalies that may occur. The applicant should use
independent test planning, execution, and review for critical software. Software system testing should
exercise a realistic sample of expected operational inputs. Software testing should include boundary, out-
of-bounds and boundary crossing test conditions. At a minimum, software testing should include
minimum and maximum input data rates in worst case configurations to determine the system capabilities
and responses to these conditions. Software testing should include duration stress testing, with each
stress test run for at least the maximum expected operating time of the system. Testing should be
conducted under simulated operational environments. Software qualification and acceptance testing
should be conducted for safety-critical functions.
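The boundary, out-of-bounds, and boundary-crossing conditions named above can be generated mechanically from an input's valid range. The Python sketch below assumes an integer-valued input with inclusive limits (for example, a hypothetical input data rate of 10 to 100 messages per second) and an acceptance check supplied by the tester. Crossing sequences matter most when the system under test is stateful; here each sample is simply compared with its expected accept or reject result.

```python
from typing import Callable, Dict, List, Tuple


def boundary_cases(lo: int, hi: int, step: int = 1) -> Dict[str, list]:
    """Generate test inputs for the inclusive valid range [lo, hi]."""
    if lo >= hi or step <= 0:
        raise ValueError("need lo < hi and a positive step")
    return {
        "boundary": [lo, lo + step, hi - step, hi],
        "out_of_bounds": [lo - step, hi + step],
        # Sequences that step across each limit, ending outside the range.
        "crossing": [[lo + step, lo, lo - step], [hi - step, hi, hi + step]],
    }


def run_boundary_tests(check: Callable[[int], bool], lo: int, hi: int,
                       step: int = 1) -> List[Tuple[str, int]]:
    """Return (expectation, value) pairs for every input the check got wrong."""
    cases = boundary_cases(lo, hi, step)
    failures: List[Tuple[str, int]] = []

    def expect(value: int, should_accept: bool) -> None:
        if check(value) != should_accept:
            failures.append(("should accept" if should_accept else "should reject", value))

    for value in cases["boundary"]:
        expect(value, True)
    for value in cases["out_of_bounds"]:
        expect(value, False)
    for sequence in cases["crossing"]:
        for value in sequence:
            expect(value, lo <= value <= hi)
    return failures
```

For example, `run_boundary_tests(lambda rate: 10 < rate <= 100, 10, 100)` reports the lower limit 10 as wrongly rejected, the kind of off-by-one error that boundary testing is meant to catch.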
References:
AST Licensing and Safety Division Directive No. 001, Licensing Process and Procedures, dated March 15, 1996.
FAA Advisory Circular AC 431-01, Reusable Launch Vehicle System Safety Process (Draft), dated April 1999.
Code of Federal Regulations, Title 14, Chapter III, Commercial Space Transportation, Federal Aviation
Administration, Department of Transportation, Part 415 – Launch Licenses, and Part 431 – Launch and
Reentry of a Reusable Launch Vehicle (RLV).
FAA Advisory Circular AC 431-03, Software System Safety (Draft).
System Safety Society, System Safety Handbook, 2nd Edition, dated July 1997.
Joint Software System Safety Committee, Software System Safety Handbook.
Eastern and Western Range Safety Requirements, EWR 127-1.
The Application of System Safety to the Commercial Launch Industry Licensing Process, FAA/ASY Safety
Risk Assessment News Reports No. 97-4 and 97-5.
FAA System Safety Handbook, Chapter 14: System Safety Training
December 30, 2000
This section provides guidance to a system safety trainer to successfully conduct a systematic safety
training activity. Specific topics discussed include Training Needs Analysis, Task Analysis, Learning
Objectives, Learning Behaviors, and Delivering Effective Safety Training.
Safety training plays a vital role in a system safety program. The trainer must assess the training needs
of the organization with the following questions in mind (all of which are important):
What is the extent of system safety knowledge of the participants within the organization?
What are the participants’ tasks that involve system safety knowledge?
What are the background, experience, and education of the participants?
What training has been provided in the past?
What is management’s attitude toward system safety and training?
Is training being provided to management or to system safety working group participants?
Will participants be trained in hazard analysis?
1 Bob Thornburgh, President of Environmental Services, Inc.; presentation at the 15th International Systems Safety
Conference, Washington, D.C., August 1997.
Here are some guidelines for bringing the organization into compliance with safety training requirements:
• Read the pertinent regulations. The regulations are often difficult to comprehend,
and it may be necessary to read them several times. However, whoever has primary
responsibility for safety training should read them rather than rely solely on other
people for interpretation.
• Attend professional development workshops and talk to colleagues. In addition to
reading the regulations, the trainer should attend professional development workshops
and talk with colleagues and regulatory personnel to stay current and to share
implementation strategies.
• Work with management to set training priorities. After analyzing requirements and
safety training needs, management and the training unit must meet to set safety
training priorities and to develop a training calendar.
• Design, deliver, and evaluate systematic instruction. Most regulations state training
requirements in terms of required hours and topics. The trainer must translate the
requirements into a systematic plan of instruction, including learning objectives,
instructional strategies, and evaluation methods. This chapter provides the
fundamentals for designing safety-training programs, but does not cover basic
information on delivering or evaluating safety-training programs.
• Document training. Documentation of training is an essential ingredient of all
training, and is especially crucial for safety training. Inspectors usually review
documentation, and documentation is often used as evidence of good intent on the
industry’s behalf. With easy storage of information available through computers,
many companies are maintaining safety-training records over the life spans of their
personnel. They are also asking employees to verify with a signature that safety
training has been delivered.
Once you have determined the training expectations, put the training objectives in writing and secure
consensus from the organization. If the expectations are unrealistic, they should be discussed. Unrealistic
expectations usually result from a failure to understand what constitutes effective training. A common
example is a request to train 200 people with wide variation in background knowledge and need-to-know.
Look for creative solutions to this problem, such as several safety-training sessions for different groups,
the use of several safety trainers, the use of multiple teaching strategies, and/or multimedia.
Another example of an unrealistic expectation might be a request to have training at 7:00 a.m. on Saturday
with no additional pay for workers who have just worked a shift from 11:00 p.m. to 7:00 a.m. It is easy to
anticipate a problem in motivating the group. Be sure to set appropriate times and dates.
An even more serious type of unrealistic expectation results from a request to minimize dangers,
encourage shortcuts, or overlook hazards. Deliberately misinforming trainees could result in liability for
the trainer. Therefore, the trainer should feel comfortable with the philosophy and practices of the
organization. On rare occasions, trainers elect to walk away from training opportunities rather than
compromise their personal training standards. Normally, however, organizations are supportive when the
trainer explains how the training will promote effectiveness, efficiency, and safety.
• During the task analysis, the safety trainer often identifies environmental constraints and/or
motivational problems as well as problems with lack of skills and knowledge. If the trainer can
assist management in resolving environmental constraints and/or motivational problems,
barriers to effective training will be reduced.
• The safety trainer determines the prerequisite skills and knowledge needed to perform the task so
that training can begin at the appropriate level.
There are several ways to begin a task analysis, depending upon the safety-training situation:
• The safety trainer can observe the task being performed. This is an excellent method for
analyzing routine tasks. It may not work as well for tasks such as emergency procedures that
are rarely, if ever, performed under normal circumstances.
• The safety trainer can interview one or more workers who perform or supervise the task. Once
a task inventory has been developed, it should always be reviewed and validated by job
incumbents.
• The safety trainer may be able to perform the task, develop a task inventory, and submit it for
review and validation by job incumbents.
• Some tasks have prescribed steps that are outlined by the policies and procedures manual. It is
always important to review this manual so that the training and the written policy and
procedures are properly aligned. However, the safety trainer should be alert to situations where
actual practice varies from written policy.
Verbs or action words used to describe behavior should be as specific as possible. Avoid popular but
vague terms such as “know,” “learn,” “comprehend,” “study,” “cover,” and “understand.”
Right: Participants will be able to measure and record the concentration of Volatile Organic
Compound (VOC) in a sample of ground water.
Wrong: Participants will learn about ground water sampling.
The desired behavior must be observable and measurable so that the trainer can determine if it has been
learned.
Objectives should be given orally and in writing to the participants, so that they understand the purpose of
the training session.
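A trainer reviewing many draft objectives could screen them for the vague terms listed above before a closer review. The Python sketch below is a simple word match that assumes flagging each base term plus its -s, -ed, and -ing forms is good enough for a first pass. It misses irregular forms such as "understood" and can flag legitimate uses such as "cover the drum," and it cannot judge whether an objective is observable and measurable, so every objective still needs the trainer's review.

```python
import re
from typing import List

# Vague terms listed in the text; matching of inflected forms is a simple heuristic.
VAGUE_TERMS = ("know", "learn", "comprehend", "study", "cover", "understand")


def vague_terms(objective: str) -> List[str]:
    """Return the vague base terms that appear in a learning objective."""
    found = set()
    for word in re.findall(r"[a-z]+", objective.lower()):
        for base in VAGUE_TERMS:
            if word in (base, base + "s", base + "ed", base + "ing"):
                found.add(base)
    return sorted(found)


if __name__ == "__main__":
    print(vague_terms("Participants will learn about ground water sampling."))  # ['learn']
```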
Target Audience
The target audience (participants or trainees) must be considered because the same topic may be
approached differently based on the background of the groups to be trained. The following examples of
learning objectives describe the audience. In each learning objective, the target audience is highlighted.
When an entire training course is designed for a particular audience, often the audience is described only
once in a blanket statement, such as the following: “This course is designed as a safety orientation for new
personnel.” Once the audience is established, then the audience component does not have to be repeated
each time.
Behavior
The behavior component of the objective is the action component. It is the most crucial component of the
objective in that it pinpoints the way in which trainees will demonstrate they have gained knowledge.
Learning is measured by a change in behavior. How will trainees prove what they have learned? Will they
explain...? Will they calculate...? Will they operate...? Will they repair...? Will they troubleshoot...? The
highlighted verbs in the following examples indicate the behavior required.
The behavior component should be easy to determine based on the task analysis, which was written in
behavioral terms.
Conditions
The condition component of the objective describes special conditions (constraints, limitations,
environment, or resources) under which the behavior must be demonstrated. If trainees are expected to
demonstrate how to don a respirator in a room filled with tear gas rather than in a normal classroom
environment, that would constitute a special condition. Please note that the condition component indicates
the condition under which the behavior will be tested, not the condition under which the behavior was
learned. Examples:
Right: Given a list of chemical symbols and their atomic structure, participants in a
beginning chemistry course will construct a Periodic Table of Elements. (This
condition is correct; participants will be able to refer to symbols and atomic
structure while they are being tested.)
Right: From memory, participants in an advanced course will construct a Periodic Table
of Elements. (This condition is also correct; it outlines a testing condition.)
Wrong: Given a unit of instruction on the Periodic Table, participants will then
construct a Periodic Table of Elements. (This tells something about how the
knowledge was learned, not a condition under which the knowledge will be tested.)
The condition component does not have to be included if the condition is obvious, such as the one in the
following example:
Given paper and pencil, trainees will list the safety rules regarding
facility areas. (The condition is obvious and does not need to be stated.)
The hazardous waste supervisor will calculate required statistics with an accuracy of plus
or minus 0.001.
Given a facility layout, the employees will circle the location of fire extinguishers with a
minimum of 80% accuracy.
Given a scenario of an emergency situation, employees will respond in less than three
minutes.
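The components discussed above (audience, behavior, condition, and, in the last three examples, a measurable standard such as an accuracy or a time limit) can be captured in a simple record so that each trainee's demonstration is scored against the stated standard. In the Python sketch below the class, field, and method names are hypothetical; accuracy is a fraction, the time limit is in seconds, and "less than three minutes" is read as strictly under 180 seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LearningObjective:
    audience: str
    behavior: str
    condition: Optional[str] = None       # omit when the condition is obvious
    min_accuracy: Optional[float] = None  # e.g. 0.80 for "80% accuracy"
    max_seconds: Optional[float] = None   # e.g. 180 for "less than three minutes"

    def text(self) -> str:
        """Render the objective as a single sentence."""
        if self.condition:
            sentence = f"{self.condition}, {self.audience} will {self.behavior}"
        else:
            subject = self.audience[:1].upper() + self.audience[1:]
            sentence = f"{subject} will {self.behavior}"
        if self.min_accuracy is not None:
            sentence += f" with a minimum of {self.min_accuracy:.0%} accuracy"
        if self.max_seconds is not None:
            sentence += f" in less than {self.max_seconds:g} seconds"
        return sentence + "."

    def passed(self, correct: Optional[int] = None, total: Optional[int] = None,
               seconds: Optional[float] = None) -> bool:
        """Score one trainee's demonstration against the stated standard."""
        if self.min_accuracy is not None:
            if correct is None or not total:
                return False
            if correct / total < self.min_accuracy:
                return False
        if self.max_seconds is not None:
            if seconds is None or seconds >= self.max_seconds:
                return False
        return True
```

For the fire extinguisher example, a trainee who circles 8 of 10 locations meets the 80% standard, while one who circles 9 of 12 (75%) does not.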
Cognitive behaviors describe observable, measurable ways the trainees demonstrate that they have gained
the knowledge and/or skill necessary to perform a safety task. Most learning objectives describe cognitive
behaviors. Some cognitive behaviors are easy to master; others are much more difficult. In designing safety
and environmental instruction, trainers move from the simple to the complex in order to verify that trainees
have the basic foundation they need before moving on to higher level skills. It is crucial to identify the level
of knowledge required because knowledge-level objectives can be taught in a lecture session, and
comprehension-level objectives can be taught with a guided discussion format. However, most training
sessions are designed for trainees to apply the information and to solve problems. Therefore, participants
need to learn by doing; they need to be drilled on actual safety case problems.
This does not mean that the basic skills have to be re-taught if the trainer can verify through observations,
pretests, training records, etc., that pre-requisite skills have been mastered. However, many training
sessions have turned into a disaster because the trainer made the assumption that the trainees had mastered
basic skills and began the training at too high a level. In contrast, some training sessions have bored the
participants by being too basic. Therefore, it is important for safety trainers to be able to label learning
objectives and design safety training sessions appropriate to the level of cognitive behavior required to
perform a task. Following are descriptions and examples of types of cognitive behaviors.
Knowledge-level cognitive behaviors are the easiest to teach, learn, and evaluate. They often refer to rote
memorization or identification. Trainees often “parrot” information or memorize lists or name objects.
Common knowledge-level behaviors include action words such as these: identify, name, list, repeat,
recognize, state, match, and define. Examples:
Comprehension-level cognitive behaviors have a higher level of difficulty than knowledge-level cognitive
behaviors, because they require learners to process and interpret information; however, learners are not
required to actually apply/demonstrate the behavior. Commonly used action words at this level include
verbs such as these: explain, discuss, interpret, classify, categorize, cite evidence for, compare, contrast,
illustrate, give examples of, differentiate, and distinguish between. Examples:
Application-level cognitive behaviors move beyond the realm of explaining concepts orally or in writing;
they deal with putting ideas into practice and involve a routine process. Trainees apply the knowledge they
have learned. Some examples of action words commonly used in application-level cognitive behaviors
include the following: demonstrate, calculate, do, operate, implement, compute, construct, measure,
prepare, and produce. Examples:
Problem-solving cognitive behaviors involve a higher level of cognitive skills than application-level
cognitive behaviors. The easiest way to differentiate between the two is to associate the application level
with routine activities and the problem-solving level with non-routine activities that require analysis
(breaking a problem into parts), synthesis (looking at parts of a problem and formulating a generalization or
conclusion), or evaluation (judging the appropriateness, effectiveness, and/or efficiency of a decision or
process and choosing among alternatives). Some examples of action words commonly used
in problem-solving cognitive behaviors include the following: troubleshoot, analyze, create, develop, devise,
evaluate, formulate, generalize, infer, integrate, invent, plan, predict, reorganize, solve, and synthesize.
Examples:
There is no way to prepare a list stating that an action word is always on a certain level. The lists of
example action words included in the discussion above are suggestions and are not all-inclusive. Safety
trainers must use professional judgment to determine the level of cognitive behavior indicated. The same
action word can be used on different levels. Example:
Psychomotor Behaviors
Learning new behaviors always includes cognitive skills (knowledge, comprehension, application, and/or
problem solving). In addition, the trainer needs to be cognizant of psychomotor skills that may be required
in the application phase of learning. Psychomotor behaviors pertain to the proper and skillful use of body
mechanics and may involve gross and/or fine motor skills. Examples:
Safety training sessions for psychomotor skills should involve as many of the senses as possible. The safety
trainer should adapt the format of training to match the skill level of the learner and the difficulty of the
task. Following is an example of a sound process for teaching psychomotor skills:
Step 1: The safety instructor shows a respirator and explains its function and importance.
(Lecture)
Step 2: The trainees explain the function and importance of the respirator. (Cognitive -
comprehension level)
Step 3: The safety instructor holds up the respirator, names the parts, and explains functions.
(Lecture/demonstration)
Step 4: The trainees hold up respirators, name the parts, and explain the functions. (Cognitive -
knowledge and comprehension levels)
Step 5: The instructor explains and demonstrates how to don a respirator. (Lecture/demonstration)
Step 6: The trainees explain how to don a respirator while the safety instructor follows trainees’
instructions. (Cognitive - comprehension level)
Important Note: Step 6 gives the safety instructor an opportunity to check for
understanding and is especially useful when teaching a task that could be dangerous
to the trainee or others or that involves expensive tools or equipment that could be
damaged.
Step 7: The trainees don a respirator properly. (Cognitive - application level and psychomotor)
Step 8: Explain and practice; explain and practice; EXPLAIN AND PRACTICE. (Cognitive -
comprehension and application levels and psychomotor)
The key to teaching psychomotor skills is that the more the learner observes the task, explains the task, and
practices the task correctly, the better he/she performs the task.
Affective Behaviors
Affective behaviors pertain to attitudes, feelings, beliefs, values, and emotions. The safety trainer must
recognize that affective behaviors influence how efficiently and effectively learners acquire cognitive and
psychomotor behaviors. Learning can be influenced by positive factors (success, rewards, reinforcement,
perceived value, etc.) and by negative factors (failure, disinterest, punishments, fears, etc.). Examples:
Supervisors resent training time and tell employees they must make up time lost.
Employees develop a negative attitude toward training.
OR
Supervisors explain the training could save lives, attend training with employees, and
reinforce training on the job.
Employees are afraid of chemical spills and are anxious to learn how to avoid them.
OR
Employees have been told through the grapevine that the safety training is boring and
a waste of time. Employees have a negative attitude toward training.
Employees have just received a bonus for 365 accident-free days and have a positive
attitude toward the company and toward safety training.
OR
The company announces 30 minutes before the safety training session begins that there
will be a massive layoff. Training will probably not be a priority for employees today.
Other affective behaviors (attitudes and emotions) that must be considered go beyond positive or negative
motivations toward learning. Examples:
An employee may have the knowledge and skills to repair an air conditioning system, but
a fear of heights may prevent him/her from repairing a unit located on the roof.
An employee may know how to don a self-contained breathing apparatus, but panics when
he/she does so.
Training objectives that state affective behaviors are usually much more difficult to observe and measure
than cognitive behaviors. Nevertheless, they are crucial to the ultimate success of the safety-training
program. Following are some examples of affective objectives:
Employees will demonstrate safety awareness by leaving guards on equipment and wearing
safety glasses in designated areas.
Employees will demonstrate awareness of chemical flammability by smoking only in
designated areas.
Employees will state in a survey that they appreciate safety-training sessions.
A critical factor to remember is that while training can stress the importance of affective behaviors, people
are most influenced by the behavioral norms of an organization. Remember: Before attempting to make
changes in an organization, it is first important to identify existing norms and their effects on employees.
Behavioral norms refer to the peer pressure that results from the attitudes and actions of the
employees/management as a group. Behavioral norms are the behaviors a group expects its members to
display. Examples:
• Although training may emphasize the importance of wearing a face mask and helmet in a
“clean” room, if most employees ignore the rule, new employees will “learn” to ignore the
rule as well.
• Although smoking and non-smoking areas may be clearly labeled in the plant, if new
employees observe supervisors and “old-timers” breaking the rules, they will tend to
perceive the non-smoking rule as not very important, despite what was stated in an
orientation session.
• Although a new employee learns to perform a task well in safety training sessions,
he/she will quickly change performance if the supervisor undermines the safety
training and insists there is a better, faster way to do the job.
For safety training to be successful, it must have the support of all levels of management. Safety training
does not occur in a vacuum. The organizational climate and behavioral norms, in fact, are likely to be more
powerful than the behavior taught in safety training sessions, because the group can enforce its norms with
continual rewards, encouragement, and pressure. Supervisors should see themselves as coaches who
continue to reinforce safety training. Otherwise, the safety training is unlikely to have a long-term impact
on the organization.
• Despite the cliché that “old dogs can’t learn new tricks,” healthy adults are capable of lifelong
learning. At some point, rote memorization may take more time, but an older adult can assimilate
purposeful learning as fast as or faster than a high school student.
• Most adults want satisfactory answers to these questions before they begin to learn: “Why is it
important?” and “How can I apply it?”
• Adults are used to functioning in adult roles, which means they are capable of and desirous of
participating in decision making about learning.
• Adults have specific objectives for learning and generally know how they learn best.
Delegation of decisions on setting objectives may help learners, especially managers, gain the
knowledge and skills they really need.
• Adults do not like to be treated “like children” (neither do children) and especially do not
appreciate being reprimanded in front of others.
• Adults like organization and like to know the “big picture.”
• Adults have experienced learning situations before and have positive and/or negative
preconceptions about learning and about their own abilities.
• Adults have had a wealth of unique individual experiences to invest in learning and can
transfer knowledge when new learning is related to old learning.
• Adults recognize good training and bad training when they see it.
There are several guidelines to remember when designing adult training sessions:
• Early in the safety training session, explain the purpose and importance of the session.
• Share the framework (organization) of the safety learning session with the participants.
• Demonstrate a fundamental respect for the learners. Ask questions and really listen to
their responses. Never reprimand anyone in front of others, even if it means taking an
unscheduled break to resolve a problem.
• Acknowledge the learners’ experience and expertise when appropriate. Draw out their
ideas, and try not to tell them anything they could tell you. Do not embarrass them
when they make mistakes.
• Allow choices when possible within a structured framework. Example: “For this
exercise, would you rather work in pairs or individually?”
• Reading manuals/books
• Watching audio-visual presentations
• Hearing a lecture
• Observing demonstrations
• Participating in discussions
• Role-playing
• Performing an experiment
• Taking a field trip
• Hands-on learning
• Responding to a scenario
• Making a presentation
Some learners prefer to learn by themselves; others prefer to work in groups. Some people need a lot of
organization and learn small steps sequentially; others assimilate whole concepts with a flash of insight or
intuition.
Some people are very visual and learn best through drawings, pictorial transparencies, slides,
demonstrations, etc.; others learn best through words and prefer text-based transparencies and slides,
reading, and lectures.
Split-hemisphere learning offers one explanation for why varied activities increase retention. Just as
different sides of the brain control opposite sides of the body, the two sides of the brain absorb and
record different types of information. It is the combination of the effects of both sides that allows us
to think and react to information.
Although various tests have been developed to try to identify how people learn best, they are not practical
for most safety training sessions. Rather, the trainer needs to be aware that differences in learning styles
exist and try to combine as many types of activities and media as possible so that learners can have access
to the way they learn best and also learn to adapt to other learning styles as well. That means that a safety
training session might include a handout for readers, a lecture for listeners, and an experiment for doers,
depending on the objective.
The key to accommodating learning styles is that instructional strategies and media be selected as a means
to help the learner and not as a convenience for the instructor. For example, a new employee orientation
pamphlet and videotape should be selected if they prove to be an excellent instructional strategy for
teaching new employees; they should not be selected just because they are a convenient means of
orientation. Also, the safety trainer should constantly look for alternative strategies and media so that if one
strategy or type of media is ineffective, the safety trainer has multiple strategies from which to select.
FAA Academy
FAA Office of System Safety, System Safety Engineering and Analysis Division
FAA System Safety Handbook, Chapter 15: Operational Risk Management
December 30, 2000
Risk management, as discussed throughout this handbook, is pre-emptive rather than reactive.
The approach is based on the philosophy that it is irresponsible and wasteful to wait for an
accident to happen and then figure out how to prevent it from happening again. We manage risk
whenever we modify the way we do something to make our chances of success as great as
possible, while making our chances of failure, injury or loss as small as possible. It’s a common-
sense approach to balancing the risks against the benefits to be gained in a situation and then
choosing the most effective course of action.
Often, the approach to risk management is highly dependent on individual methods and
experience levels and is usually highly reactive. It is natural to focus on those hazards that have
caused problems in the past. In the FAA's operational environment where there is a continual
chance of something going wrong, it helps to have a well-defined process for looking at tasks to
prevent problems. Operational Risk Management, or ORM, is a decision-making tool that helps
to systematically identify risks and benefits and determine the best courses of action for any given
situation. ORM is designed to minimize risks in order to reduce mishaps, preserve assets, and
safeguard health and welfare.
Risk is defined as the probability and severity of accident or loss from exposure to various
hazards, including injury to people and loss of resources. All FAA operations in the United
States, and indeed even our personal daily activities, involve risk and require decisions that
include risk assessment and risk management. Operational Risk Management (ORM) is simply a
formalized way of thinking about these things. ORM is a simple six-step process that
identifies operational hazards and takes reasonable measures to reduce risk to personnel,
equipment and the mission.
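Because risk combines probability and severity, it is often scored on a two-axis matrix. The Python sketch below is illustrative only: the category names, the multiplicative score, and the HIGH/MEDIUM/LOW cut-offs are assumptions, not values taken from this handbook.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity categories, least to most severe.
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Probability(IntEnum):
    # Hypothetical likelihood levels, least to most likely.
    UNLIKELY = 1
    SELDOM = 2
    OCCASIONAL = 3
    LIKELY = 4
    FREQUENT = 5


def risk_level(severity: Severity, probability: Probability) -> str:
    """Combine severity and probability into a coarse risk level.

    The product score and the cut-offs (12 and 6) are illustrative
    assumptions, not values published in this handbook.
    """
    score = int(severity) * int(probability)
    if score >= 12:
        return "HIGH"
    if score >= 6:
        return "MEDIUM"
    return "LOW"


print(risk_level(Severity.CATASTROPHIC, Probability.LIKELY))  # HIGH
print(risk_level(Severity.MARGINAL, Probability.SELDOM))      # LOW
```

Because both factors enter the score, a control that lowers either the likelihood or the severity of a hazard can move it to a lower level.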
In FAA operations, decisions need to take into account the significance of the operation, the
timeliness of the decision required, and what level of management is empowered to make the
decision. Risk should be identified and managed using the same disciplined process that governs
other aspects of the Agency’s endeavors, with the aim of reducing risk to personnel and resources
to the lowest practical level.
Risk management must be a fully integrated part of planning and executing any operation,
routinely applied by management, not a way of reacting when some unforeseen problem occurs.
Careful determination of risks, along with analysis and control of the hazards they create, results
in a plan of action that anticipates difficulties that might arise under varying conditions and
predetermines ways of dealing with these difficulties. Managers are responsible for the routine use of
risk management at every level of activity, starting with the planning of that activity and
continuing through its completion.
Figure 15-1 illustrates the objectives of the ORM process: protecting people, equipment and other
resources, while making the most effective use of them. Preventing accidents, and in turn
reducing losses, is an important aspect of meeting this objective. By minimizing the risk
of injury and loss, we ultimately reduce costs and stay on schedule. Thus, the fundamental goal of
risk management is to enhance the effectiveness of people and equipment by determining how
they can be used most efficiently.
[Figure 15-1 label: Maximize Operational Capability]
[Figure: ORM process cycle; legible step labels: 1. Identify the Hazards, 3. Analyze Risk Control Measures, 4. Make Control Decisions, 6. Supervise and Review]
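The six ORM steps form a repeating loop in which the last step, Supervise and Review, feeds back into the first, Identify the Hazards. The minimal Python sketch below models that loop. The names used for step 2 (Assess the Risk) and step 5 (Implement Risk Controls) are assumed from common ORM usage, and the class and function names are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class OrmStep(IntEnum):
    IDENTIFY_HAZARDS = 1
    ASSESS_RISK = 2               # name assumed from common ORM usage
    ANALYZE_CONTROL_MEASURES = 3
    MAKE_CONTROL_DECISIONS = 4
    IMPLEMENT_CONTROLS = 5        # name assumed from common ORM usage
    SUPERVISE_AND_REVIEW = 6


def next_step(step: OrmStep) -> OrmStep:
    """Return the step after `step`; Supervise and Review loops back to step 1."""
    return OrmStep(step % len(OrmStep) + 1)


def one_cycle(start: OrmStep = OrmStep.IDENTIFY_HAZARDS) -> list[OrmStep]:
    """List one full pass through all six steps, beginning at `start`."""
    steps = [start]
    while len(steps) < len(OrmStep):
        steps.append(next_step(steps[-1]))
    return steps


for step in one_cycle():
    print(int(step), step.name)
```

Modeling the process as a loop rather than a straight line reflects the point that review results are fed back into a fresh round of hazard identification.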
Unidentified risk: Risk that has not yet been identified. Some risk is not identifiable or
measurable, but is no less important for that. Mishap investigations may reveal some previously
unidentified risks.
Total risk: The sum of identified and unidentified risk. Ideally, identified risk will comprise the
larger proportion of the two.
Acceptable risk: The part of identified risk that is allowed to persist after controls are applied.
Risk can be judged acceptable when further efforts to reduce it would degrade
the probability of success of the operation, or when a point of diminishing returns has been
reached.
Unacceptable risk: That portion of identified risk that cannot be tolerated and must be either
eliminated or controlled.
Residual risk: The portion of total risk that remains after management efforts have been
employed. Residual risk comprises acceptable risk and unidentified risk.
Figure 15-3: Types of Risk
[Figure labels: Unacceptable/Eliminate, Unacceptable/Control, Acceptable, Unidentified, Residual]
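If the risk definitions above are read as an additive partition (an interpretive assumption, since the handbook does not assign numbers to these quantities), they can be written compactly. The last line follows by substituting the first three: the part of total risk that is not residual is exactly the unacceptable risk that management eliminates or controls.

```latex
\begin{aligned}
R_{\text{total}} &= R_{\text{identified}} + R_{\text{unidentified}} \\
R_{\text{identified}} &= R_{\text{acceptable}} + R_{\text{unacceptable}} \\
R_{\text{residual}} &= R_{\text{acceptable}} + R_{\text{unidentified}} \\
R_{\text{total}} - R_{\text{residual}} &= R_{\text{identified}} - R_{\text{acceptable}} = R_{\text{unacceptable}}
\end{aligned}
```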
15.8.1 Managers
• Are responsible for effective management of risk.
• Select from risk reduction options recommended by staff.
• Accept or reject risk based upon the benefit to be derived.
• Train and motivate personnel to use risk management techniques.
• Elevate decisions to a higher level when it is appropriate.
15.8.2 Staff
• Assess risks and develop risk reduction alternatives.
• Integrate risk controls into plans and orders.
• Identify unnecessary risk controls.
15.8.3 Supervisors
• Apply the risk management process.
• Consistently apply effective risk management concepts and methods to operations
and tasks.
• Elevate risk issues beyond their control or authority to superiors for resolution.
15.8.4 Individuals
• Understand, accept and implement risk management processes.
• Maintain a constant awareness of the changing risks associated with the operation or
task.
• Make supervisors immediately aware of any unrealistic risk reduction measures or
high-risk procedures.
The 5-M model, depicted in Figure 15-4, is adapted from military ORM. In this model, “Man” is
used to indicate the human participation in the activity, irrespective of the gender of the human
involved. “Mission” is the military term that corresponds to what we in civil aviation call
“operation.” This model provides a framework for analyzing systems and determining the
relationships between the elements that work together to perform the task.
The 5-M's are Man, Machine, Media, Management, and Mission. Man, Machine, and Media
interact to produce a successful Mission (or, sometimes, an unsuccessful one). The amount of
overlap or interaction between the individual components is a characteristic of each system and
evolves as the system develops. Management provides the procedures and rules governing the
interactions between the other elements.
When an operation is unsuccessful or an accident occurs, the system must be analyzed; the inputs
and interaction among the 5-Ms must be thoroughly reassessed. Management is often the
controlling factor in operational success or failure. The National Safety Council cites
management processes as a factor in as many as 80 percent of reported accidents.
15.9.1 Man
The human factor is the area of greatest variability, and thus the source of the majority of risks.
Selection: The right person psychologically and physically, trained in event proficiency,
procedures and habit patterns.
Performance: Awareness, perceptions, task saturation, distraction, channeled attention, stress,
peer pressure, confidence, insight, adaptive skills, pressure/workload, fatigue (physical,
motivational, sleep deprivation, circadian rhythm).
Personal Factors: Expectancies, job satisfaction, values, families/friends, command/control,
perceived pressure (over tasking) and communication skills.
15.9.2 Media
Media are defined as external, largely environmental and operational, conditions. For
example:
15.9.3 Machine
Hardware and software used as intended, and their limitations at the interface with man.
15.9.4 Management
Directs the process by defining standards, procedures, and controls. Although management
provides procedures and rules to govern interactions, it cannot completely control the system
elements. For example, weather is not under management control, and individual decisions affect
personnel far more than management policies do.
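One way to put the 5-M model to work is as a review checklist that covers each element and each pairwise interaction among Man, Machine, and Media. The Python sketch below is hypothetical: the prompt wording paraphrases the descriptions above, and the function and constant names are illustrative, not part of any FAA tool.

```python
from __future__ import annotations

from itertools import combinations

# Review prompts paraphrasing the 5-M descriptions above (wording is illustrative).
FIVE_M_PROMPTS = {
    "Man": "selection, performance, and personal factors of the people involved",
    "Machine": "hardware and software, and their limitations at the human interface",
    "Media": "external environmental and operational conditions",
    "Management": "standards, procedures, and controls governing the interactions",
    "Mission": "the operation itself and what a successful outcome requires",
}

# Man, Machine, and Media interact; Management governs those interactions.
INTERACTING = ("Man", "Machine", "Media")


def five_m_checklist(operation: str) -> list[str]:
    """Return one review item per M plus one per pairwise Man/Machine/Media interaction."""
    items = [f"{operation}: {m} - {prompt}" for m, prompt in FIVE_M_PROMPTS.items()]
    for a, b in combinations(INTERACTING, 2):
        items.append(f"{operation}: {a}/{b} interaction - do procedures govern it?")
    return items


for item in five_m_checklist("Night runway inspection"):
    print(item)
```

The interaction items matter because, as noted above, the overlap between elements is a characteristic of each system and changes as the system evolves.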
15.10.1 Time-Critical
Time-critical risk management is an "on the run" mental or verbal review of the situation using
the basic risk management process without necessarily recording the information. This time-
critical process of risk management is employed by personnel to consider risk while making
decisions in a time-compressed situation. This level of risk management is used during the
execution phase of training or operations as well as in planning and execution during crisis
responses. It is also the most easily applied level of risk management in off-duty situations. It is
particularly helpful for choosing the appropriate course of action when an unplanned event occurs
during execution of a planned operation or daily routine.
15.10.2 Deliberate
Deliberate Risk Management is the application of the complete process. It primarily uses
experience and brainstorming to identify hazards and risks and to develop controls, and it is therefore
most effective when done in a group. Examples of deliberate applications include the planning of
upcoming operations, review of standard operating, maintenance, or training procedures, and
damage control or disaster response planning.
15.10.3 Strategic
This is the deliberate process with more thorough hazard identification and risk assessment
involving research of available data, use of diagram and analysis tools, formal testing, or long-term
tracking of the risks associated with the system or operation (normally with assistance from
technical experts). It is used to study the hazards and their associated risks in a complex operation
or system, or one in which the hazards are not well understood. Examples of strategic
applications include the long-term planning of complex operations, introduction of new
equipment, materials, and operations, development of tactics and training curricula, high-risk
facility construction, and major system overhaul or repair. Strategic risk management should be
used on high-priority or high-visibility risks.
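The choice among the three levels can be summarized as a simple decision rule. The Python sketch below is a simplification under stated assumptions: it reduces the judgment to three yes/no questions drawn from the descriptions above, and the names are hypothetical. In practice the level is chosen by judgment, not by a function.

```python
from enum import Enum


class OrmLevel(Enum):
    TIME_CRITICAL = "time-critical"
    DELIBERATE = "deliberate"
    STRATEGIC = "strategic"


def select_level(time_compressed: bool,
                 complex_or_poorly_understood: bool,
                 high_priority_or_visibility: bool) -> OrmLevel:
    """Hypothetical rule of thumb for picking a level of risk management."""
    if time_compressed:
        # No time to record or research: run the process mentally or verbally.
        return OrmLevel.TIME_CRITICAL
    if complex_or_poorly_understood or high_priority_or_visibility:
        # Worth research, formal tools, and technical experts.
        return OrmLevel.STRATEGIC
    # Default: apply the complete process, ideally as a group.
    return OrmLevel.DELIBERATE


print(select_level(False, True, False).value)  # strategic
```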
• Logic Diagram
• Change Analysis Tool
• Opportunity Assessment
• Training Realism Assessment
Strategic Tools
If time and resources permit, and additional hazard information is required, use strategic hazard
analysis tools. These are normally used for medium- and long-term planning, complex operations,
or operations in which the hazards are not well understood.
The first step of in-depth analysis should be to examine existing databases or available historical
and hazard information regarding the operation. Suggested tools are:
• Accident analysis
• Cause and effect diagrams
The following tools are particularly useful for complex, coordinated operations in which multiple
units, participants, and system components and simultaneous events are involved:
The following tools are particularly useful for analyzing the hazards associated with physical
position and movement of assets:
• Mapping tool
• Energy trace and barrier analysis
• Interface analysis
Accident/Incident Reports: Reports from within the organization are especially valuable because they
represent memory applicable to the local workplace, cockpit, flight, etc. Other sources might be NTSB
reports, medical reports, maintenance records, and fire and police reports.
Operational Personnel: Relevant experience is arguably the best source of hazard identification.
Reinventing the wheel each time an operation is proposed is neither desired nor efficient. Seek
out those with whom you work who have participated in similar operations and solicit their input.
Outside Experts: Look to those outside your organization for expert opinions or advice.
Current Guidance: A wealth of relevant direction can always be found in the guidance that
governs our operations. Consider regulations, operating instructions, checklists, briefing guides,
SOPs, NOTAMs, and policy letters.
Surveys: The survey can be a powerful tool because it pinpoints people in the operation with firsthand
knowledge. Often, first-line supervisors in the same facility do not have as good an
understanding of risk as those who confront it every day.
Inspections: Inspections can consist of spot checks, walk-throughs, checklist inspections, site
surveys, and mandatory inspections. Use staff personnel to provide input beyond the standard
third-party inspection.
The "standard order of precedence" indicates that the ideal action is to “plan or design for
minimum risk” with less desirable options being, in order, to add safety devices, add warning
devices, or change procedures and training. This order of preference makes perfect sense while
the system is still being designed, but once the system is fielded this approach is frequently not
cost effective. Redesigning to eliminate a risk or add safety or warning devices is both expensive
and time consuming and, until the retrofit is complete, the risk remains unabated.
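The standard order of precedence amounts to a ranking of control types. The Python sketch below sorts candidate controls by that ranking; the ControlOption type and the example descriptions are hypothetical. Because the ranking ignores cost and schedule, its output is a starting point for weighing fielded-system tradeoffs, not a decision.

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard order of precedence, most preferred first.
PRECEDENCE = (
    "design for minimum risk",
    "add safety devices",
    "add warning devices",
    "change procedures and training",
)


@dataclass
class ControlOption:
    description: str
    kind: str  # must be one of the PRECEDENCE entries

    def __post_init__(self) -> None:
        if self.kind not in PRECEDENCE:
            raise ValueError(f"unknown control kind: {self.kind!r}")


def rank_by_precedence(options: list[ControlOption]) -> list[ControlOption]:
    """Sort options by precedence; sorted() is stable, so ties keep input order."""
    return sorted(options, key=lambda option: PRECEDENCE.index(option.kind))


candidates = [
    ControlOption("Revise maintenance checklist", "change procedures and training"),
    ControlOption("Install interlock guard", "add safety devices"),
    ControlOption("Relocate unit to ground level", "design for minimum risk"),
]
for option in rank_by_precedence(candidates):
    print(option.kind, "->", option.description)
```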
Normally, revising operational or support procedures may be the lowest-cost alternative. While
this does not eliminate the risk, it may significantly reduce the likelihood of an accident or the
severity of the outcome (risk), and the change can usually be implemented quickly. Even when a
redesign is planned, interim changes in procedures or maintenance requirements are usually
required. In general, these changes may be as simple as improving training, posting warnings, or
improving operator or technician qualifications. Other options include substituting preferred parts,
instituting or changing time-change requirements, or increasing inspections.
The feasible alternatives must be evaluated, balancing their costs and expected benefits in terms
of operational performance, dollars and continued risk exposure during implementation. A
completed risk assessment should clearly define these tradeoffs for the decision-maker.
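The point about continued risk exposure during implementation can be made concrete with expected-mishap arithmetic. The Python sketch below assumes a constant mishap rate before a control is in place and a lower constant rate afterward; every rate, cost, and duration in the example is a hypothetical placeholder. It produces a tradeoff table for the decision-maker rather than choosing an alternative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    cost: float                  # implementation cost in dollars (hypothetical)
    months_to_implement: float   # time until the control is fully in place
    residual_rate: float         # expected mishaps per month once in place (hypothetical)


def expected_mishaps(alt: Alternative, current_rate: float, horizon_months: float) -> float:
    """Expected mishaps over the horizon.

    The current (unabated) rate applies until the control is in place;
    the alternative's residual rate applies for the rest of the horizon.
    """
    t = min(alt.months_to_implement, horizon_months)
    return current_rate * t + alt.residual_rate * (horizon_months - t)


def tradeoff_table(alts: list[Alternative], current_rate: float,
                   horizon_months: float) -> list[tuple[str, float, float]]:
    """(name, cost, expected mishaps) for each alternative, for the decision-maker."""
    return [(a.name, a.cost, round(expected_mishaps(a, current_rate, horizon_months), 2))
            for a in alts]


redesign = Alternative("Redesign component", 500_000, 18, 0.01)
procedure = Alternative("Revise procedure and training", 20_000, 1, 0.04)
for row in tradeoff_table([redesign, procedure], current_rate=0.10, horizon_months=24):
    print(row)
```

With these placeholder numbers, the procedure change yields fewer expected mishaps over 24 months (about 1.02) than the redesign (about 1.86), because the redesign leaves the original risk unabated for 18 months.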
Some Special Considerations in Risk Control. The following factors should be considered
when applying the third step of ORM.
Try to apply risk controls only in those activities and to those who are actually at risk. Too often
risk controls are applied indiscriminately across an organization leading to wasted resources and
unnecessary irritation of busy operational personnel.
Apply redundant risk controls when practical and cost effective. If the first line of defense fails,
the backup risk control(s) may prevent loss.
Involve operational personnel, especially those likely to be directly impacted by a risk control, in
the selection and development of risk controls whenever possible. This involvement will result in
better risk controls and in general a more positive risk control process.
Benchmark (find best practices in other organizations) as extensively as possible to reduce the
cost associated with the development of risk controls. Why expend the time and resources
necessary to develop a risk control and then have to test it in application when you may be able to
find an already complete, validated approach in another organization?
Establish a timeline to guide the integration of the risk control into operational processes.
Develop the risk control within the organization’s culture. Every organization has a style or a
culture. While the culture changes over time due to the impact of managers and other
modifications, the personnel in the organization know the culture at any given time. It is
important to develop risk controls that are consistent with this culture. For example, a rigid,
centrally directed risk control would be incompatible with an organizational culture that
emphasizes decentralized flexibility. Conversely, a decentralized risk control may not be effective
in an organization accustomed to top down direction and control. If you have any doubts about
the compatibility of a risk control within your organization, ask some personnel in the
organization what they think. People are the culture and their reactions will tell you what you
need to know.
Develop the best possible supporting tools and guides (infrastructure) to aid operating personnel
in implementing the risk control. Examples include standard operating procedures (SOPs), model
applications, job aids, checklists, training materials, decision guides, help lines, and similar items.
The more support that is provided, the easier the task for the affected personnel. The easier the
task, the greater the chances for success.
Develop a time line for implementing the risk control. Identify major milestones, being careful to
allow reasonable timeframes and assuring that plans are compatible with the realities of
organizational resource constraints.
Visible backing from managers and supervisors can greatly increase a risk control's chances of
success. It is usually a good idea to signal clearly to an organization that there is interest in a risk
control if the manager in fact has some interest. Figure 15-8 illustrates actions in order of priority
that can be taken to signal leader support. Most managers are interested in risk control and are
willing to do anything reasonable to support the process. Take the time as you develop a risk
control to visualize a role for organization leaders.
Action 1—Supervise
Monitor the operation to ensure:
Any time the personnel, equipment, or tasking change or new operations are anticipated in an
environment not covered in the initial risk management analysis, the risks and control measures
should be reevaluated. The best tool for accomplishing this is change analysis.
Successful performance is achieved by shifting the cost versus benefit balance more in favor of
benefit through controlling risks. By using ORM whenever anything changes, we consistently
control risks, those known before an operation and those that develop during an operation. Being
proactive and addressing the risks before they get in the way of operation accomplishment saves
resources, enhances operational performance, and prevents the accident chain from ever forming.
Action 2—Review
The process review must be systematic. After assets are expended to control risks, a cost-benefit
review must be conducted to see whether risk and cost are in balance. Any changes in the
system (the 5-M model and the flow charts from the earlier steps provide convenient benchmarks
to compare the present system to the original) should be recognized and appropriate risk management
controls applied.
To accomplish an effective review, supervisors need to identify whether the actual cost is in line
with expectations. The supervisor also needs to see what effect the control measure has had on
operational performance. It will be difficult to evaluate the control measure by itself, so focus on
the aspect of operational performance the control measure was designed to improve.
When a decision is made to assume risk, the factors (cost versus benefit information) involved in the
decision should be recorded. When an accident or other negative consequence occurs, proper
documentation allows reviewers to trace the risk decision process to see where errors might have
occurred or whether changes in procedures and tools led to the consequences. In addition, it is
unlikely that every risk analysis will be perfect the first time. When risk analyses contain errors of
omission or commission, it is important that those errors be identified and corrected.
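Recording the factors behind each risk-acceptance decision is what makes the later comparison of forecast and outcome possible. The Python sketch below shows one hypothetical record layout; the field names, the choice of a mishap count as the forecast quantity, and the example values are assumptions, not an FAA format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RiskDecisionRecord:
    """Hypothetical log entry for a decision to accept risk."""
    hazard: str
    decided_by: str
    decided_on: date
    expected_cost: float        # cost of the accepted controls, in dollars
    expected_benefit: str       # benefit the decision-maker weighed against the risk
    forecast_mishaps: float     # mishaps forecast for the review period
    actual_mishaps: Optional[float] = None
    notes: list[str] = field(default_factory=list)

    def record_outcome(self, actual_mishaps: float, note: str = "") -> None:
        """Store what actually happened so the forecast can be checked."""
        self.actual_mishaps = actual_mishaps
        if note:
            self.notes.append(note)

    def forecast_error(self) -> Optional[float]:
        """Actual minus forecast mishaps; None until an outcome is recorded."""
        if self.actual_mishaps is None:
            return None
        return self.actual_mishaps - self.forecast_mishaps


record = RiskDecisionRecord("Ramp work during thunderstorm watch", "Facility manager",
                            date(2000, 12, 30), 1_000.0, "Maintain departure schedule", 0.5)
record.record_outcome(2.0, "Two minor mishaps reported")
print(record.forecast_error())  # 1.5
```

A persistent positive forecast error across many records would suggest the analyses are underestimating risk, which is the kind of error of omission a review is meant to surface.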
Measurements are necessary to ensure accurate evaluations of how effectively controls eliminated
hazards or reduced risks. After-action reports, surveys, and in-progress reviews provide great
starting places for measurements. To be meaningful, measurements must quantitatively or
qualitatively identify reductions of risk, improvements in operational success, or enhancement of
capabilities.
Action 3—Feedback
A review by itself is not enough: a feedback system must be established to ensure that the
corrective or preventative action taken was effective and that any newly discovered hazards
identified during the operation are analyzed and corrective action taken. Feedback informs all
involved as to how the implementation process is working, and whether or not the controls were
effective. Whenever a control process is changed without providing the reasons, co-ownership at
the lower levels is lost. The overall effectiveness of these implemented controls must also be
shared with other organizations that might have similar risks to ensure the greatest possible
number of people benefit. Feedback can be in the form of briefings, lessons learned, cross-tell
reports, benchmarking, database reports, etc. Without this feedback loop, we lack the benefit of
knowing if the previous forecasts were accurate, contained minor errors, or were completely
incorrect.
Direct Measures of Behavior. When the target of a risk control is behavior, it is possible to
actually sample behavior changes in the target group. For example, the results of an effort to get
personnel to wear seat belts can be assessed by observing restraint use a number of times before the
seat belt program begins and taking a similar sample afterward. The change, if any, is a direct
measure of the effectiveness of the risk control. Each sample would establish the number of
personnel using belts as a percentage of total observations. Subsequent samples would indicate
our success in sustaining the impact of the risk control.
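The seat belt example reduces to simple percentages. The Python sketch below computes the usage rate for each observation sample and the change relative to the baseline in percentage points; the function names and the observation counts in the example are hypothetical.

```python
from __future__ import annotations


def usage_rate(belted: int, observations: int) -> float:
    """Percent of observations in which a seat belt was in use."""
    if observations <= 0:
        raise ValueError("observations must be positive")
    return 100.0 * belted / observations


def change_from_baseline(baseline: tuple[int, int],
                         later_samples: list[tuple[int, int]]) -> list[float]:
    """Percentage-point change of each later sample relative to the baseline sample.

    Each sample is a (belted, observations) pair.
    """
    base = usage_rate(*baseline)
    return [usage_rate(*sample) - base for sample in later_samples]


# Hypothetical counts: 60 of 150 before the program, then two later samples.
print(change_from_baseline((60, 150), [(120, 150), (105, 150)]))  # [40.0, 30.0]
```

A decline between follow-up samples, as in this example, is the kind of signal that repeated sampling is meant to catch before the effect of the risk control fades further.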
Direct Measures of Conditions. It is possible to assess the changes in physical conditions in the
workplace. For example, the number of foreign objects found on the flight line can be assessed
before and after a risk control initiative aimed at reducing foreign object damage.
Measures of Attitudes. Surveys can also assess the attitudes of personnel toward risk-related
issues. While constructing survey questions is a technical task that must be done carefully, the FAA often
conducts surveys, and it may be possible to integrate questions into these surveys, taking advantage
of the experts who manage these survey processes. Nevertheless, even informal surveys taken
verbally in very small organizations will quickly indicate the views of personnel.
Measures of Knowledge. Some risk controls are designed to increase knowledge of some hazard
or of hazard control procedures. A short quiz, perhaps administered during a safety meeting
before and after a training risk control is initiated, can measure the resulting change in knowledge.
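A before-and-after quiz can be scored as an average gain. The Python sketch below compares only people who took both quizzes, so that changes in who attended do not distort the result; the function name, participant names, and scores are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def mean_gain(pre: dict[str, float], post: dict[str, float]) -> float:
    """Average score change for participants who took both the pre- and post-quiz."""
    both = pre.keys() & post.keys()
    if not both:
        raise ValueError("no participant took both quizzes")
    return mean(post[name] - pre[name] for name in both)


pre_scores = {"Avery": 5.0, "Blake": 6.0, "Casey": 4.0}
post_scores = {"Avery": 8.0, "Blake": 9.0, "Drew": 7.0}
print(mean_gain(pre_scores, post_scores))  # 3.0
```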
Safety and Other Loss Control Review Procedures. Programmatic and procedural risk control
initiatives (such as revisions to standard operating procedures) can be assessed through various
kinds of reviews. The typical review involves a standard set of questions or statements reflecting
desirable standards of performance against which actual operating situations are compared.
15.12 Conclusion
Operational risk management provides a logical and systematic means of identifying and
controlling risk. Operational risk management is not a complex process, but does require
individuals to support and implement the basic principles on a continuing basis. Operational risk
management offers individuals and organizations a powerful tool for increasing effectiveness and
reducing accidents. The ORM process is accessible to and usable by everyone in every
conceivable setting or scenario. It ensures that all FAA personnel will have a voice in the critical
decisions that determine success or failure in all our operations and activities. Properly
implemented, ORM will always enhance performance.
FAA System Safety Handbook, Chapter 16: Operational Safety in Aviation
December 30, 2000
Many years ago, Heinrich conducted a statistical study of accidents and determined that out of 300
incidents, one fatal accident may occur. This suggested a general ratio of 1 to 300. Years later,
Frank Bird conducted a similar study and noted that out of 600 incidents, one fatal accident occurred,
indicating a ratio of 1 to 600. Figure 16-1 illustrates the concept that for every accident or incident that is
reported, there may be a much larger number that are not reported.
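Read as a rough rule of thumb rather than a predictive model, the ratios above translate an incident count into an implied number of accidents. The Python sketch below is illustrative only; the function name and the incident count in the example are hypothetical.

```python
def implied_accidents(incidents: int, incidents_per_accident: int) -> float:
    """Accidents implied by an incident count under a 1-to-N ratio."""
    if incidents_per_accident <= 0:
        raise ValueError("incidents_per_accident must be positive")
    return incidents / incidents_per_accident


for label, ratio in (("1 to 300", 300), ("1 to 600", 600)):
    print(label, implied_accidents(1200, ratio))
```

Under these ratios, 1,200 incidents imply about 4 accidents at 1 to 300 but only about 2 at 1 to 600, which shows how sensitive such estimates are to the ratio assumed.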
It is important to identify incidents that could have resulted in accidents. An incident is any occurrence that
could have resulted in an accident, i.e., fatal harm; because the harm did not occur, it is considered an
incident. The point is that all incidents that could have resulted in an accident should be reported to
determine the relevant factors associated with that incident.
Figure 16-1: Heinrich Pyramid (levels: Accidents, Incidents, Unreported Occurrences)
privately owned and operated international information infrastructure that would use a broad variety of
worldwide aviation data sources together with comprehensive analytical techniques to assist in identifying
emerging safety concerns.
As the aviation community exchanged ideas on the GAIN concept over the first 2 ½ years after its
announcement, a variety of descriptions were applied to GAIN by various segments of the aviation
community. The GAIN Steering Committee considered various comments and recommendations on GAIN
and agreed upon the following description of GAIN in January 1999:
“GAIN promotes and facilitates the voluntary collection and sharing of safety information by and among
users in the international aviation community to improve safety.”
The Steering Committee also changed the meaning of the GAIN acronym to “Global Aviation Information
Network” to better define the program.
The GAIN organization consists of the Steering Committee, Working Groups, Program Office, and a
planned Government Support Team.
The Steering Committee consists of industry stakeholders (airlines, manufacturers, employee groups and
their trade associations) that set high-level GAIN policy, issue charters to direct the Working Groups, and
guide the Program Office. Represented on the GAIN Steering Committee are Airbus Industrie, Air France,
Air Line Pilots Association (ALPA), Air Transport Association (ATA), Boeing Commercial Airplane
Group, British Airways, Continental Airlines, Flight Safety Foundation, International Association of
Machinists (IAM), Japan Airlines, National Air Traffic Controllers Association (NATCA), National
Business Aviation Association (NBAA), Northwest Airlines, and the U.S. military. The Steering
Committee meets on a quarterly basis.
The Executive Committee consists of several Steering Committee members and acts on behalf of the
whole Steering Committee on administrative matters or as directed.
The Working Groups are interdisciplinary industry/government teams that work GAIN issues in a largely
autonomous fashion, within the charters established for them by the Steering Committee. Working Groups
are listed below in paragraph 16.1.2.
The Program Office administers GAIN and supports the Steering Committee, Working Groups, and the
Government Support Team by communicating with GAIN participants, planning meetings and conferences,
preparing meeting minutes, and other tasks.
A Government Support Team (GST) is planned, which will include representatives of government
regulatory authorities from various countries plus related international groups. The GST will provide
assistance to airlines and air traffic organizations in developing or improving safety reporting systems and
sharing safety information.
WG A: Aviation Operator Safety Practices - This group will develop products to help operators obtain
information on starting, improving, or expanding their internal aviation safety programs. The products
should include commonly accepted standards and best operating practices, methods, procedures, tools and
guidelines for use by safety managers. The group will identify currently available materials that support
the development of these products. These materials could include sample safety reporting forms, computer
programs for tracking safety reports, suggested procedures, manuals, and other information to help
operators start or improve programs without "reinventing the wheel." The working group will then develop
products that safety officers can use to implement programs to collect, analyze, and share aviation safety
information.
WG B: Analytical Methods and Tools - The group will: (a) identify and increase awareness of existing
analytical methods and tools; (b) solicit requirements for additional analytical methods and tools from the
aviation community; and (c) promote the use of existing methods and tools as well as the development of
new ones. The group will endeavor to address various types of safety data and information (including
voluntary reports and digitally derived aircraft and ATC system safety performance data). It will also
benchmark or validate, to the extent possible, the usefulness and usability of the tools and the level of
proficiency needed to use them, as a guide for potential users; identify the data needed to use the tools;
and transfer knowledge about methods and tools to users.
WG C: Global Information Sharing Prototypes - This group will develop prototypes to begin global
sharing of aviation safety information. These prototypes could include (a) a sharing system capability for
automated sharing of safety incident/event reports derived from existing and new safety reporting systems
to enhance current sharing activities among airline safety managers; (b) a sharing library containing safety
information "published" by airlines and other aviation organizations; (c) an aviation safety Internet site to
encourage use of existing "public" information/data sources.
WG D: Reducing Impediments (Organizational, Regulatory, Civil Litigation, Criminal Sanction,
and Risk of Public Disclosure) - This working group will identify and evaluate barriers that prevent the
collection and sharing of aviation safety information among various organizations and propose solutions
that are reasonable and effective. They will pursue changes in ICAO Annexes to appropriately protect
information from accident/incident prevention programs. They will propose means to obtain legislation to
protect reporters and providers of safety information. They will promote “jeopardy-free” reporting
procedures and create methods to obtain organizational commitment to sharing safety information.
"This rule is intended to encourage the voluntary implementation of FOQA by providing assurance that
information obtained from such programs cannot be used by the FAA for punitive enforcement purposes."
FOQA (Flight Operational Quality Assurance) is the voluntary collection, analysis, and sharing of routine
flight operations data obtained from flight data recorders. The FOQA program is one of several in which
the FAA is working in partnership with industry and labor to enhance aviation safety.
The FAA also has a new program in which it works in partnership with industry to use improved
methods and technology to detect potential defects in aircraft engines.
One of the models under development is the Aircraft Performance Risk Assessment Model (ASPRAM). Its
objective is to use empirical data and expert judgment to quantify the risk of incidents and accidents. The
general approach is to develop an automated means of analyzing commercial aircraft flight recorder data to
identify non-accident precursors and their causes. Expert opinion is incorporated into the automated model
through knowledge-based rules, which are used to identify precursor events and assess the risk of incidents
and accidents.
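The idea of expert knowledge encoded as rules can be sketched as a small rule engine: each rule tests a flight summary for a precursor and carries an expert-assigned weight, and the fired weights are combined into a risk level. This is not ASPRAM's actual design; the handbook does not describe its rules or scoring, so every rule, field name, weight, and threshold below is hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str                           # hypothetical precursor label
    condition: Callable[[dict], bool]   # expert-authored test on a flight summary
    weight: float                       # expert-assigned contribution to risk


# Hypothetical rules standing in for expert knowledge.
RULES = [
    Rule("unstable approach", lambda f: f["approach_speed_dev_kt"] > 10, 3.0),
    Rule("long flare", lambda f: f["touchdown_distance_ft"] > 3000, 2.0),
    Rule("late gear extension", lambda f: f["gear_down_height_ft"] < 1000, 4.0),
]


def assess(flight: dict, rules=RULES):
    """Return (fired precursor names, combined score, hypothetical risk level)."""
    fired = [r for r in rules if r.condition(flight)]
    score = sum(r.weight for r in fired)
    if score >= 5.0:
        level = "high"
    elif score > 0.0:
        level = "elevated"
    else:
        level = "nominal"
    return [r.name for r in fired], score, level


if __name__ == "__main__":
    summary = {"approach_speed_dev_kt": 14, "touchdown_distance_ft": 3500, "gear_down_height_ft": 1500}
    print(assess(summary))
```

Separating the rules (expert judgment) from the evaluation loop (automation) mirrors the split the paragraph describes, and lets experts revise rules without changing the engine.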
The GAIN “Aviation Operator’s Safety Practices” Working Group has developed the “Operator’s Flight
Safety Handbook” (OFSH). Specifically, the international aviation safety community, in coordination with
industry and government, worked together to adapt the Airbus “Flight Safety Manager’s Handbook” into a
generic, worldwide product. It is intended to serve as a guide for creating and operating a flight safety
function within an operator’s organization. Operators are encouraged to tailor the document as necessary
to be compatible with the philosophy, practices, and procedures of their organization.1
Section 1 of the OFSH2 lists the important elements of an effective safety program:
Section 2 of the OFSH discusses Organization and Administration. “A safety programme is essentially a
coordinated set of procedures for effectively managing the safety of an operation.”3 Management should
specify the company’s standards, ensure that everyone knows and accepts them, and make sure a system is
in place so that deviations from the standards are recognized, reported, and corrected.
The company’s Policy Manual should contain a statement signed by the Chief Executive Officer that
specifies the company’s safety culture and commitment, to give them credence and validation.
1 GAIN Working Group A, “Aviation Operator’s Safety Handbook,” 3rd Draft Review, March 13-14, 2000.
2 Ibid., GAIN Working Group A.
3 Ibid., GAIN Working Group A.
− Safety Objectives
− Flight Safety Committee
− Hazard Reporting
− Immunity-based Reporting
− Compliance and Verification
− Safety Trends Analysis
− FOQA Collection/Analysis
− Dissemination of Flight Safety Information
− Liaison with other Departments
Section 4 is a review of Human Factors issues in aviation. The key points touched on in this section
include:
− Human Error
− Ergonomics
− The SHEL Model
− Aim of Human Factors in Aviation
− Safety & Efficiency
− Personality vs. Attitude
− Crew Resource Management
Section 5 discusses the concepts of Incident/Accident Investigation and Reports. Specific definitions of
concepts associated with incident/accident investigation are presented. Accident investigation and reporting
are also addressed.
Section 6 discusses Emergency Response and Crisis Management and provides a detailed checklist of
requirements for a Crisis Management Center.
Section 7 of the OFSH discusses Risk Management. It highlights the true cost of risk and covers risk
profiles, decision making, and cost/benefit considerations.
Section 8 provides information on external program interfaces and on the safety practices of contractors,
subcontractors, and other third parties.
The appendices provide additional detailed information, including sample report forms, references,
organization and manufacturer information, reviews of analytical methods and tools, sample safety surveys
and audits, an overview of the risk management process, and corporate accident response team guidelines.
FAA System Safety Handbook, Chapter 17: Human Factors Principles & Practices
December 30, 2000
17.0 Human Factors Engineering and System Safety: Principles and Practices
This chapter serves as an outline for integrating human factors into activities where safety is a major
consideration. The introductory section provides an overview of the FAA human factors process and
principles. The remaining sections present key human factors functions and guidelines that must be
accomplished to produce a successful human factors program, drawing on approaches that have proven
successful in previous programs for integrating human factors into acquisition programs.
The critical impact of human factors on safety is well documented in programs, studies, analyses, and
accident and incident investigations. FAA Order 9550.8, Human Factors Policy, directs that:
Human factors shall be systematically integrated into the planning and execution of the
functions of all FAA elements and activities associated with system acquisitions and
system operations. FAA endeavors shall emphasize human factors considerations to
enhance system performance and capitalize upon the relative strengths of people and
machines. These considerations shall be integrated at the earliest phases of FAA
projects.
Objectives of the human factors approach should be to: a) Conduct the planning, reviewing,
prioritization, coordination, generation, and updating of valid and timely human factors information to
support agency needs; b) Develop and institutionalize formal procedures that systematically
incorporate human factors considerations into agency activities; and, c) Establish and maintain the
organizational infrastructure that provides the necessary human factors expertise to agency programs.
This chapter will help in that endeavor. Additional information on human factors support and
requirements can be obtained from the AUA and AND Human Factors Coordinators or the Office of
the Chief Scientific and Technical Advisor for Human Factors, AAR-100, (202) 267-7125.
When human factors is applied early in the acquisition process, it increases the likelihood of improved
performance, safety, and productivity and of decreased lifecycle staffing and training costs, and it
becomes well integrated into the program’s strategy, planning, cost and schedule baselines, and technical
trade-offs. Changes in operational, maintenance, or design concepts during the later phases of a project
are expensive and entail high-risk program adjustments. Identifying lifecycle costs and the human
performance components of system operation and maintenance during requirements definition
decreases program risks and long-term operations costs.
FAA System Safety Handbook, Chapter 17: Human Factors Principles & Practices
August 2, 2000
Hardware and software design affects both the accuracy of operator task performance and the amount
of time required for each task. Applying human factors principles to the “total system” design will
increase performance accuracy, decrease performance time, and enhance safety. Research has shown
that designing the system to improve human performance is the most cost-effective and safest solution,
especially if it is done early in the acquisition process.
• Human-system performance characteristics and their associated cost, benefits, and risks
assist in deciding among alternatives (especially since lifecycle operation and support costs
are often largely dependent upon personnel-related costs)
• Human-system performance and safety risks are appropriately addressed in program
baselines
Early in the acquisition program, the investment analysis must identify for each alternative the full
range of human factors and interfaces (e.g., cognitive, organizational, physical, functional,
environmental) necessary to achieve an acceptable level of performance for operating, maintaining, and
supporting the system in concert with meeting the system’s functional requirements. The analysis
should provide information on what is known and unknown about the human-system performance risks
in meeting minimum system performance requirements. Potential human factors/safety issues are listed
in Table 17-1.
determinations, alternative analyses, lifecycle cost estimates, cost-benefit analyses, risk assessments,
supportability assessments, and operational suitability assessments. The HFC helps identify system-
specific and aggregate technical human factors engineering problems and issues that might otherwise
go undetected because of their obscurity, complexity, or elaborate inter-relationships. The human performance
considerations are developed for staffing levels, operator and maintainer skills, training strategies,
human-computer interface, human engineering design features, safety and health issues, and workload
and operational performance considerations in procedures and other human-system interfaces. The
HFC facilitates the establishment of the necessary tools, techniques, methods, databases, metrics,
measures, criteria, and lessons learned to conduct human factors analyses in investment analysis
activities. The HFC provides technical quality control of human factors products, participates in
special working groups, assists in team reviews, helps prepare documentation, and collaborates on
technical exchanges among government and contractor personnel.
Human factors considerations relevant to meeting system performance and functional requirements
(and having safety implications) include:
• Human performance (e.g., human capabilities and limitations, workload, function allocation,
hardware and software design, decision aids, environmental constraints, and team versus individual
performance)
• Training (e.g., length of training, training effectiveness, retraining, training devices and facilities,
and embedded training)
• Staffing (e.g., staffing levels, team composition, and organizational structure)
• Personnel selection (e.g., minimum skill levels, special skills, and experience levels)
• Safety and health aspects (e.g., hazardous materials or conditions, system or equipment design,
operational or procedural constraints, biomedical influences, protective equipment, and required
warnings and alarms).
The HFC provides input to the acquisition program baseline by conducting the following activities:
• Determine the human factors cost, benefit, schedule, and performance baselines for each candidate
solution
• Identify the human factors and human performance measures and thresholds to be achieved (e.g.,
for the equipment, software, environment, support concepts, and configurations expected for the
solution)
• Determine the human factors activities to be undertaken during the program, the schedule for
conducting them, their relative priority, and the expected costs to be incurred
• Calculate or estimate the relative or absolute benefits of the human factors component of each
solution in terms of decision criteria (e.g., cost, schedule, human-system performance)
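One common way to express "relative benefits in terms of decision criteria" is a weighted-sum score per candidate solution. The handbook does not prescribe this method; the sketch assumes criterion scores already normalized to a 0 (worst) to 1 (best) scale, and the weights, solution names, and scores are hypothetical.

```python
# Hypothetical criterion weights; the handbook does not prescribe weights or this scoring rule.
WEIGHTS = {"cost": 0.3, "schedule": 0.2, "human_system_performance": 0.5}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of normalized criterion scores (0 = worst, 1 = best)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float] = WEIGHTS) -> list[str]:
    """Candidate names ordered from best to worst weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)


if __name__ == "__main__":
    candidates = {
        "Solution A": {"cost": 0.8, "schedule": 0.6, "human_system_performance": 0.5},
        "Solution B": {"cost": 0.5, "schedule": 0.7, "human_system_performance": 0.9},
    }
    for name in rank(candidates):
        print(name, round(weighted_score(candidates[name]), 2))
```

A weighted sum makes the trade-off explicit: with these illustrative weights, Solution B's stronger human-system performance outweighs Solution A's lower cost. Changing the weights can reverse the ranking, so the weights themselves deserve review.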
Establishing a Human Factors Program for a given program or project requires focusing on the tasks
the humans (operators, maintainers, and support personnel) will perform on the system, and the
program activities that must be undertaken to allow early identification and resolution of human
performance issues. Figure 17-1 illustrates the steps to be taken in developing the Human Factors
Program.
Figure 17-1: Steps in developing the Human Factors Program (only the label "Step 5: ID Human Factors Issues" is recoverable)
Because each project or program is unique in its pace, cost, size, complexity, and human interfaces, the
Human Factors Program should be tailored to meet program demands. As the system progresses
through the lifecycle phases of the acquisition process, changes will occur. The Human Factors
Program must be structured and maintained to change iteratively with the project. To aid in the
management of the Human Factors Program, a Human Factors Working Group may be established.
There is a strong link between the program documentation and the planning, management, and
execution of the program. The documentation that supports a program defines the performance
requirements and capabilities the program is to meet, the approach to be taken, and the specific tasks
and activities that must be performed during design, development, and implementation of the program.
Similarly, the human factors inputs to the program documentation accomplish the same result
regarding the Human Factors Program. Human factors inputs define human performance requirements
and criteria, identify human performance and resource trade-offs, specify human performance
thresholds, establish an approach to ensure human performance supports project performance, and
define the specific tasks and activities to be conducted.
Without such input, the capabilities and limitations of the designated operators and maintainers will not
adequately influence the design, and may result in lower levels of operational suitability, effectiveness,
and safety.
By identifying and defining human resource and human performance considerations, inputs are
provided to the development of project concepts for functional allocation, hardware and software,
operations and training, and organizational structure. Through the process of assessing these concepts
and the related human resource and human performance trade-offs of various alternatives, the project
concepts (e.g., for requirements, design, and implementation) iteratively evolve. This process applies
equally to various kinds of projects and programs (including developmental, NDI, or COTS acquisitions).
The purpose of this process is to place these essential ingredients into the project specifications so that human
performance capabilities and limitations will be incorporated in the project in a binding manner.
results in a safe, efficient, usable system for the lowest possible expenditure of resources, the human
performance constraints and requirements need to be placed into the system specification.
A good SOW starts with an understanding of what the sponsor wants the contractor to do. The
starting point for determining human factors requirements for inclusion in the SOW is a review of
human factors requirements in the early project documentation (such as requirements documents,
program baselines, and program plans) to identify human factors issues that must be resolved, and
tasks and analyses that must be conducted by the contractor to ensure that human performance goals
are met.
Essential human factors elements that must be addressed by the requirements in the SOW include:
• Limits to the skill level and characteristics of operator, maintainer, and support personnel
• Maximum acceptable training burden
• Minimum acceptable performance of critical tasks
• Acceptable staffing limits
• System safety and health hazards
The contractor’s response to these requirements will result in a comprehensive human factors program
for the system that defines the management and technical aspects of the effort. The response should
also address the scheduling of key events and their timing in relation to other system engineering
activities. The contractor’s program must demonstrate how it effectively integrates human factors with
its design and development process.
The scope and level of effort to be applied to the various human factors tasks and activities must be
tailored to suit the type of system being acquired and the phase of development. The SOW should
describe the specific task or activity required and the associated data deliverable. Human factors
reviews and demonstrations should be planned and conducted to coordinate and verify that
requirements are being met. The contractor should convincingly indicate how human performance data
would influence system lifecycle design and support.
items that are pertinent to the project being acquired, and what is necessary to give the human factors
engineer sufficient information to assess the quality and suitability of the contractor’s human factors
effort. The Human Factors Coordinator should prepare a list of human factors-related DIDs applicable
to the project being acquired and provide them for inclusion in the SOW.
System engineering is an interdisciplinary approach to evolve and verify an integrated and lifecycle-
balanced set of system product and process solutions that satisfy customer needs. The Human Factors
Coordinator assists in the system engineering task by contributing information related to design
enhancements, safety features, automation impacts, human-system performance trade-offs, ease of use,
and workload. The Human Factors Coordinator also assists in identifying potential task overloading or
skill creep for system operators and maintainers. Where user teams, operator juries, and representatives
participate in bringing an operational viewpoint to design, the human factors engineer complements their
effort to ensure that performance data represent more than individual preferences.
Optimally, the Human Factors Coordinator participates fully in system engineering design decisions.
While the actual design and development work may be completed by either the sponsor or the
contractor, the Human Factors Coordinator (in conjunction with the Human Factors Working Group)
provides close, continuous direction throughout the process. To accomplish this, the Human Factors
Coordinator reviews all documentation for human performance impacts that will affect total system
performance and exercises his or her responsibility by participating in technical meetings and system
engineering design reviews.
The human engineer actively participates in four major interrelated areas of system engineering:
• Planning
• Analysis
• Design and Development
• Test and Evaluation
The human engineering planning effort specifies the documentation requirements and assists in the
coordination with other program activities. Sponsor and contractor documentation provides traceability
from initially identifying human engineering requirements during analysis and/or system engineering,
through implementing such requirements during design and development, to verifying that these
requirements have been met during test and evaluation. The efforts performed to fulfill the human
engineering requirements must be coordinated with, but not duplicate, efforts performed by other
system engineering functions.
17.4.3 Human Engineering in Detail Design
During detail design, the human engineering requirements are converted into detail engineering design
features. Design of the equipment should satisfy human-system performance requirements and meet
the applicable human engineering design criteria. The human factors engineer participates in design
reviews and engineering change proposals for those items having a human interface. Essential products
to be reviewed in detail design include hardware design and interfaces, tests and studies, drawings and
representations, environmental conditions, procedures, software, and technical documentation.
The fact that the above may occur at various stages in system development should not preclude a final
human engineering verification of the complete system.
Human factors planning for test and evaluation (T&E) activities is initiated early in the project
management process. Specific human factors-related T&E tasks and activities are subsequently
identified in the project/program planning documentation. The conduct of the human factors T&E is
integrated with the system T&E program, which is largely performed during program implementation.
Key principles for addressing human factors requirements in system testing are:
Providing human factors in system testing entails an early start and a continuous process. Figure 17-2
illustrates the flow of this process.
Figure 17-2: Flow of human factors in system testing (recoverable labels: A, P(Task Error); B, Mean Time to Complete Task)
Human engineering testing is incorporated into the project test and evaluation program and is
integrated into engineering design and development tests, demonstrations, acceptance tests, fielding and
other implementation assessments. Compliance with human engineering requirements should be tested
as early as possible. Human engineering findings from design reviews, mockup inspections,
demonstrations, and other early engineering tests should be used in planning and conducting later tests.
Human engineering test planning is directed toward verifying that the system can be operated,
maintained, and supported by user personnel in its intended operational environment.
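Test data for the two measures labelled in Figure 17-2, the probability of task error and the mean time to complete a task, can be summarized per task and checked against test criteria. The sketch below assumes each trial is recorded as (task, completion time in seconds, error flag); the task names, trial values, and pass/fail criteria are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """Map each task to (P(task error), mean time to complete in seconds)."""
    by_task = defaultdict(list)
    for task, seconds, error in trials:
        by_task[task].append((seconds, error))
    summary = {}
    for task, runs in by_task.items():
        p_error = sum(1 for _, err in runs if err) / len(runs)
        mean_time = mean(sec for sec, _ in runs)
        summary[task] = (p_error, mean_time)
    return summary


def meets_criteria(summary, max_p_error, max_mean_time_s):
    """True for each task whose error rate and mean time are both within the (hypothetical) criteria."""
    return {task: p <= max_p_error and t <= max_mean_time_s
            for task, (p, t) in summary.items()}


if __name__ == "__main__":
    trials = [  # hypothetical trial records
        ("handoff", 12.0, False), ("handoff", 15.0, True),
        ("handoff", 9.0, False), ("handoff", 12.0, False),
        ("reroute", 30.0, False), ("reroute", 26.0, False),
    ]
    s = summarize(trials)
    print(s)
    print(meets_criteria(s, max_p_error=0.1, max_mean_time_s=30.0))
```

Reporting both measures per task, rather than a single combined score, keeps accuracy and speed visible separately, in line with the principle that performance is measured in terms of time and accuracy.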
Human engineering test planning should also consider data needed or to be provided by operational test
and evaluation. Test planning includes methods of testing (e.g., use of checklists, data sheets, test
participant descriptors, questionnaires, operating procedures, and test procedures), schedules,
quantitative measures, test criteria and reporting processes. Human engineering portions of tests
include:
Unfavorable outcomes occurring during test and evaluation are subjected to a human engineering
review to differentiate between failures of the equipment alone, failures resulting from human-system
incompatibilities and failures due to human error. Human-system incompatibilities and human errors
occurring in the performance of critical tasks are analyzed to determine the reason for their occurrence
and to propose corrective action(s).
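The review described above sorts each unfavorable outcome into one of three causes, then singles out human-system incompatibilities and human errors on critical tasks for further analysis. The sketch below records that triage; it assumes the human engineering review has already assigned each outcome's cause, and the record layout and outcome IDs are hypothetical.

```python
from enum import Enum


class Cause(Enum):
    EQUIPMENT = "failure of the equipment alone"
    INCOMPATIBILITY = "human-system incompatibility"
    HUMAN_ERROR = "human error"


def needs_cause_analysis(outcome: dict) -> bool:
    """Incompatibilities and human errors on critical tasks are analyzed further."""
    return outcome["critical_task"] and outcome["cause"] in (Cause.INCOMPATIBILITY, Cause.HUMAN_ERROR)


def triage(outcomes):
    """Count outcomes by cause and list the IDs that require cause analysis and corrective action."""
    counts = {c: 0 for c in Cause}
    flagged = []
    for o in outcomes:
        counts[o["cause"]] += 1
        if needs_cause_analysis(o):
            flagged.append(o["id"])
    return counts, flagged


if __name__ == "__main__":
    outcomes = [  # hypothetical test outcomes
        {"id": "T1", "cause": Cause.EQUIPMENT, "critical_task": True},
        {"id": "T2", "cause": Cause.HUMAN_ERROR, "critical_task": True},
        {"id": "T3", "cause": Cause.INCOMPATIBILITY, "critical_task": False},
    ]
    counts, flagged = triage(outcomes)
    print({c.name: n for c, n in counts.items()}, flagged)
```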
For example, ‘Free Flight’, as described by RTCA Task Force 3, is a concept that proposes placing more
responsibility on flight crews to maintain safe separation from other aircraft in the NAS. This idea could
potentially shift aircraft separation responsibility from controllers to flight crews, creating a ‘shared
separation’ authority environment. The guiding principle of the Free Flight concept is to provide benefits
to users and providers. Some of the benefits include improved safety through enhanced conflict detection
and resolution capabilities, more flexibility to manage flight operations, greater predictability of the NAS,
and better decision-making tools for air traffic controllers and pilots. The major benefit anticipated for
users is greater freedom to choose efficient routes and altitudes, resulting in savings on fuel and
operating costs. To realize these benefits, it may be necessary to supply traffic information to flight
crews and to develop operating methods and tools for both the air and the ground to assure safety. While
studies have been done on new tools developed to display traffic information in the cockpit (with
conflict alerting logic) and to support controller decisions, investigating how the tools might safely work
together in a shared separation environment requires considerable exploration and analysis.
An experiment examining the effect of shared separation authority on flight operations, when both air
and ground have enhanced traffic and conflict alerting systems, would necessarily emphasize identifying
and evaluating the human factors impact. Such an evaluation would require detailed knowledge about how
safety, human-system integration, and system-to-system performance are affected in the following broad
areas:
Many compromises in safety that lead to errors, accidents, or incidents can be attributed to unforeseen
effects of new technologies, new operational procedures, and changing organizations on the
human-system and system-to-system interfaces. Only through rigorous exploration of these
interrelationships can the safety of the NAS be ensured.
Table 17-2: Overarching Human Factors Guidelines/Principles
Overarching Human Factors Principles
1. Honor The User (The user defines requirements – but only in a structured, data-driven way.)
2. To Err Is Human (People are not machines; machines are not perfect; design the interface to tolerate
errors of both.)
3. Human Factors Is Not Free (Plan the resources for human factors program support.)
4. Human Factors Requires Experts (The application of human factors engineering is neither easy nor
common sense, except in hindsight after an incident, accident, or poor design; co-locate human factors
resources near the project/program teams they serve.)
5. People Are the Same; Individuals Are Different (Design for people sameness & tolerance of measured
differences, especially in their skill and performance.)
6. Early Operator and Maintainer Decisions Drive Safety and Lifecycle Support Costs (Identify early in
the program development process a requirement to subject every product to an "Out-of-Box" human factors
study.)
7. Operator and Maintainer Skill Is a Function of Aptitude and Training (Training is part of the system
engineering and safety performance package.)
8. Performance Is Measured in Terms of Time and Accuracy (Performance is a matter of degree, determined
quantitatively and qualitatively; test for human performance early and often.)
9. Task Safety & Performance Are Determined by the Design (Designs can improve or detract from task
safety and performance.)
10. Operator and Maintainer Performance Affect System Performance (How people use the system IS
the measure of the system’s capabilities and risks.)
FAA System Safety Handbook, Chapter 2: System Safety Policy and Process
December 30, 2000
2.0 System Safety Policy and Process
This section describes the System Safety policies and processes used within the FAA.
Plan: The safety risk management process shall be predetermined and documented in a plan that includes the criteria
for acceptable risk.
Hazard identification: The hazard analyses and assessments required in the plan shall identify the safety risks
associated with the system or operations under evaluation.
Analysis: The risks shall be characterized in terms of severity of consequence and likelihood of occurrence in
accordance with the plan.
Comparative Safety Assessment: The Comparative Safety Assessment of the hazards examined shall be compared to
the acceptability criteria specified in the plan and the results provided in a manner and method easily adapted for
decision making.
Decision: The risk management decision shall include the Comparative Safety Assessment. Comparative Safety
Assessments may be used to compare and contrast options.
The order permits quantitative or qualitative assessments, but states a preference for quantitative. It requires
the assessments, to the maximum extent feasible, to be scientifically objective, unbiased, and inclusive of all
relevant data. Assumptions shall be avoided when feasible, but when unavoidable they shall be conservative
and the basis for the assumption shall be clearly identified. As a decision tool, the Comparative Safety
Assessment should be related to current risks and should compare the risks of various alternatives when
applicable.
In addition, the order requires each LOB or program office to plan the following for each high-consequence
decision:
Perform and provide a Comparative Safety Assessment that compares each alternative considered (including no action
or change, or baseline) for the purpose of ranking the alternatives for decision making.
Assess the costs and safety risk reduction or increase (or other benefits) associated with each alternative under final
consideration.
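A Comparative Safety Assessment of this kind amounts to ranking every alternative, baseline included, and reporting its change in risk and cost against that baseline. The Python sketch below shows only that bookkeeping. The `Alternative` type, the single risk figure (assumed here to be expected accidents per year), and the sample values are hypothetical; an actual CSA would derive risk from the program's hazard analyses and judge it against the criteria in its plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """One option in a hypothetical Comparative Safety Assessment."""
    name: str
    risk: float  # assumed metric: expected accidents per year
    cost: float  # assumed metric: total cost in dollars


def compare_to_baseline(baseline, alternatives):
    """Rank every option, including the baseline (no action or change), from
    lowest to highest assessed risk, and report each option's change in risk
    and cost relative to the baseline."""
    options = [baseline, *alternatives]
    ranked = sorted(options, key=lambda alt: alt.risk)
    return [
        (alt.name, alt.risk - baseline.risk, alt.cost - baseline.cost)
        for alt in ranked
    ]


# Illustrative numbers only.
baseline = Alternative("No change (baseline)", risk=0.020, cost=0.0)
options = [
    Alternative("Option A", risk=0.012, cost=2_000_000.0),
    Alternative("Option B", risk=0.025, cost=500_000.0),
]
for name, d_risk, d_cost in compare_to_baseline(baseline, options):
    print(f"{name}: risk change {d_risk:+.3f}/yr, cost change {d_cost:+,.0f}")
```

Sorting on one number is a simplification: an alternative that lowers overall risk can still introduce a new high-severity hazard, so a ranked list like this supports, rather than replaces, the hazard-by-hazard comparison.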
2.1.2 AMS Policies
The AMS policy contains the following paragraphs in 2.9.13:
System Safety Management shall be conducted and documented throughout the acquisition management lifecycle.
Critical safety issues identified during mission analysis are recorded in the Mission Need Statement; a system safety
assessment of candidate solutions to mission need is reported in the Investment Analysis Report; and Integrated Product
Teams provide for program-specific safety risk management planning in the Acquisition Strategy Paper.
Each line of business involved in acquisition management must institute a system safety management process that
includes at a minimum: hazard identification, hazard classification (severity of consequences and likelihood of
occurrence), measures to mitigate hazards or reduce risk to an acceptable level, verification that mitigation measures are
incorporated into product design and implementation, and assessment of residual risk. Status of System Safety shall be
presented at all Joint Resources Council (JRC) meetings. Detailed guidelines for system safety management are found
in the FAST.
The safety risk management process operates as an integral part of the AMS under the oversight of the FAA
System Engineering Council. Figure 2-1 depicts the AMS Integrated Product Development System (IPDS)
process and the supporting system safety activities. The details of “how” to perform each activity shown in
this diagram are discussed in later chapters. General guidance for AMS safety activities is contained in the
NAS System Safety Management Plan (SSMP).
Figure: System Safety Products in the AMS Life Cycle (graphic not reproduced). OSA: system-level, preliminary
(some assumptions), some safety requirements. Comparative Safety Assessment (CSA)/Preliminary Hazard Analysis
(PHA): top-down, focused on the known system mission and approaches and on changes at the NAS system level;
preliminary in nature; core safety requirements.
The prime goal of the AMS system safety program is the early identification and continuous control of
hazards in the NAS design. The NAS is composed of the elements shown in Figure 2-2.
The outputs of the AMS system safety process are used by FAA management to make decisions based on
safety risk. These outputs are:
2.2.1 Integrated Product Development System and Safety Risk Management Process
Figure 2-1 depicts the integrated product development system process and the supporting system safety
activities. The integrated product development system is broken down into a number of life cycle
milestones, which include: Mission Analysis, Investment Analysis, Solution Implementation, In-Service
Management, and Service Life Extension. As noted in Figure 2-1, system safety activities will vary
depending on the phase of the life cycle. The OSA is to be conducted during mission analysis, prior to the
mission need decision at JRC-1. During investment analysis, initial system safety analysis is further refined
into a Comparative Safety Assessment and a Preliminary Hazard Analysis (as needed). After the investment
analysis, more formal system safety activities are initiated by the product teams for that program and in
accordance with the NAS SSMP. During solution implementation, a formal system safety program plan is to
be implemented. System safety activities should include system and subsystem hazard analysis. Prior to the
in-service decision, an operating and support hazard analysis is conducted to evaluate the risks during
in-service management and service life extension.
Operating and support hazard analyses can also be conducted for existing facilities, systems, subsystems,
and equipment. Hazard tracking and risk resolution begins as soon as hazards and their associated risks
have been identified, and continues until the risk controls are successfully validated and verified.
Accident and incident investigation, as well as data collection and analysis, are conducted throughout the life
cycle to identify other hazards or risks that affect the system. The specific details of this safety analysis
process are discussed further in Chapter 4.
The Comparative Safety Assessment (CSA) is an analysis type that provides management with a listing of all
the hazards associated with a design change, along with an assessment of the risk of each
alternative considered. It is used to rank the options for decision-making purposes. The CSA for a given
proposal or design change uses the PHL developed for the OSA. The OSA process is depicted below in
Figure 2-3.
Figure 2-3: OSA Process (diagram not reproduced). Elements shown: CONOPS, system description, functions,
OED, PHL, hazard severity analysis, safety objectives, OHA, ASOR, OSA, SEC, and JRC.
Legend: CONOPS = Concept of Operations; OED = Operational Environment Protection; PHL = Preliminary Hazard
List; OHA = Operational Hazard Assessment; ASOR = Allocation of Safety Objectives and Requirements;
SEC = System Engineering Council; JRC = Joint Resources Council.
2.2.3 Hazard Tracking and Risk Resolution
The purpose of hazard tracking and risk resolution is to ensure a closed-loop process of identifying and
controlling risks. A key part of this process, management risk acceptance, ensures that the management
activity responsible for system development and fielding is aware of the hazards and makes a considered
decision concerning the implementation of hazard controls. This process is shown in Figure 2-4.
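The closed-loop idea can be sketched as a hazard record that stays open until its controls are verified and management has accepted the residual risk. This minimal Python sketch assumes its own record layout; `HazardRecord`, `Control`, and `risk_accepted_by` are hypothetical names, not fields of any FAA hazard tracking system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Control:
    description: str
    verified: bool = False  # True once the control is validated and verified


@dataclass
class HazardRecord:
    """Hypothetical hazard tracking entry; field names are illustrative."""
    hazard_id: str
    description: str
    controls: List[Control] = field(default_factory=list)
    risk_accepted_by: Optional[str] = None  # authority that accepted residual risk

    def is_closed(self) -> bool:
        # The loop closes only when every listed control has been verified
        # and management has recorded a risk acceptance decision.
        return (all(c.verified for c in self.controls)
                and self.risk_accepted_by is not None)


def open_hazards(log: List[HazardRecord]) -> List[str]:
    """IDs of hazards that still need active tracking."""
    return [h.hazard_id for h in log if not h.is_closed()]
```

In this sketch a hazard such as loss of air-to-ground communication stays open even after its controls are verified, until the responsible authority (represented here by a plain string such as "JRC") records acceptance.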
Figure 2-4: Hazard Tracking and Risk Resolution (flowchart not reproduced). Inputs: hazard analyses (PHA, SSHA,
SHA, O&SHA) and incidents. Decision points: High Risk? (if no, document), Adequate Controls?, Risk Accepted?,
and Additional Controls?. Other elements: Hazard Tracking Report, SSWG evaluation, design or requirement
change, JRC/SEC risk acceptance, IPT evaluation, Active Hazard Tracking Report, and Signed Hazard Tracking Report.
FAA System Safety Handbook, Chapter 3: Principles of System Safety
December 30, 2000
Chapter 3:
Principles of System Safety
Figure: Cost ($) versus safety effort (graphic not reproduced), showing the cost of accidents, the cost of the
safety program, and the total cost, with the point to seek marked.
Realistically, a certain degree of safety risk must be accepted. Determining the acceptable level of risk is
generally the responsibility of management. Any management decisions, including those related to safety,
must consider other essential program elements. The marginal costs of implementing hazard control
requirements in a system must be weighed against the expected costs of not implementing such controls.
The cost of not implementing hazard controls is often difficult to quantify before the fact. Quantifying
expected accident costs in advance requires two risk-related factors: the potential consequences of an
accident and the probability of its occurrence. The more severe the consequences of an accident (in terms
of dollars, injury, national prestige, etc.), the lower the probability of its occurrence must be for the risk
to be acceptable. In such cases, it is worthwhile to spend money to reduce the probability by implementing
hazard controls. Conversely, accidents with less severe consequences may be acceptable risks at higher
probabilities of occurrence and will consequently justify a smaller expenditure to further reduce their
frequency of occurrence. Using this concept as a baseline, design limits must be defined.
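The trade-off described above can be made concrete with expected cost, that is, probability times consequence. The sketch below is a minimal Python illustration; the function names and every number in it are hypothetical, and it assumes the probability is expressed per operating hour and the consequence as a single dollar figure.

```python
def expected_accident_cost(p_per_hour: float, consequence_cost: float,
                           exposure_hours: float) -> float:
    """Expected accident cost over an exposure period.

    With a per-hour probability p, the expected number of accidents over
    n hours is p * n; multiplying by the cost of one accident gives the
    expected cost (probability x consequence)."""
    return p_per_hour * exposure_hours * consequence_cost


def control_is_worthwhile(control_cost: float, p_before: float, p_after: float,
                          consequence_cost: float, exposure_hours: float) -> bool:
    """True if the reduction in expected accident cost exceeds the marginal
    cost of implementing the hazard control."""
    avoided = (expected_accident_cost(p_before, consequence_cost, exposure_hours)
               - expected_accident_cost(p_after, consequence_cost, exposure_hours))
    return avoided > control_cost


# Illustrative figures only: a $10M control that cuts the per-hour probability
# tenfold on a $500M-consequence hazard over one million operating hours.
print(control_is_worthwhile(1e7, 1e-6, 1e-7, 5e8, 1e6))  # prints True
```

The same control applied to a hazard whose consequence is one hundred times smaller ($5M) would avoid only about $4.5M in expected cost, less than the $10M control cost, which mirrors the point that less severe consequences justify less expenditure.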
1 FAA Order 8040.4, Paragraph 5.c.
Figure: Contributory hazards, hazard, and harm (graphic not reproduced).
Probable
    Qualitative: Anticipated to occur one or more times during the entire system/operational life of an item.
    Quantitative: Probability of occurrence per operational hour is greater than 1 x 10^-5.
Remote
    Qualitative: Unlikely to occur to each item during its total life. May occur several times in the life of
    an entire system or fleet.
    Quantitative: Probability of occurrence per operational hour is less than 1 x 10^-5, but greater than 1 x 10^-7.
Extremely Remote
    Qualitative: Not anticipated to occur to each item during its total life. May occur a few times in the life
    of an entire system or fleet.
    Quantitative: Probability of occurrence per operational hour is less than 1 x 10^-7, but greater than 1 x 10^-9.
Extremely Improbable
    Qualitative: So unlikely that it is not anticipated to occur during the entire operational life of an entire
    system or fleet.
    Quantitative: Probability of occurrence per operational hour is less than 1 x 10^-9.
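The quantitative thresholds above can be applied mechanically. This Python sketch is an illustration, not part of the handbook; because the definitions do not say where a value of exactly 1 x 10^-5, 10^-7, or 10^-9 falls, it assumes such boundary values belong to the more likely (more conservative) category.

```python
def likelihood_category(p_per_hour: float) -> str:
    """Map a per-operational-hour probability to the likelihood terms above.

    Assumption: a value exactly on a boundary is placed in the more likely
    category, since the definitions leave boundary values unassigned."""
    if not 0.0 <= p_per_hour <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_per_hour >= 1e-5:
        return "Probable"
    if p_per_hour >= 1e-7:
        return "Remote"
    if p_per_hour >= 1e-9:
        return "Extremely Remote"
    return "Extremely Improbable"
```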
2 Aircraft Performance Risk Assessment Model (APRAM), Rannoch Corporation, February 28, 2000.
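Probability (quantitative): 1.0 | 10^-3 | 10^-5 | 10^-7 | 10^-9
FAR descriptive terms: Probable (1.0 to 10^-5); Improbable (10^-5 to 10^-9); Extremely Improbable (below 10^-9)
JAR descriptive terms: Frequent (1.0 to 10^-3); Reasonably Probable (10^-3 to 10^-5); Remote (10^-5 to 10^-7);
Extremely Remote (10^-7 to 10^-9); Extremely Improbable (below 10^-9)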
Assessment of risk is made by combining the severity of consequence with the likelihood of occurrence in
a matrix. Risk acceptance criteria to be used in the FAA AMS process are shown in Figure 3-3 and
Figure 3-4.
Figure: Risk matrix (cell shading not reproduced).
Severity: No Safety Effect (5), Minor (4), Major (3), Hazardous (2), Catastrophic (1).
Likelihood: Probable (A), Remote (B), Extremely Remote (C), Extremely Improbable (D).
Legend: High Risk, Medium Risk, Low Risk.
An example based on MIL-STD-882C is shown in Figure 3-5. The matrix may be referred to as a Hazard
Risk Index (HRI), a Risk Rating Factor (RRF), or other terminology, but in all cases, it is the criteria used
by management to determine acceptability of risk.
The Comparative Safety Assessment Matrix of Figure 3-5 illustrates an acceptance criteria methodology.
Region R1 on the matrix is an area of high risk and may be considered unacceptable by the managing
authority. Region R2 may be acceptable with management review of controls and/or mitigations, and R3
may be acceptable with management review. R4 is a low risk region that is usually acceptable without
review.
                          HAZARD CATEGORIES
FREQUENCY OF      I             II          III         IV
OCCURRENCE        CATASTROPHIC  CRITICAL    MARGINAL    NEGLIGIBLE
(A) Frequent      IA            IIA         IIIA        IVA
(B) Probable      IB            IIB         IIIB        IVB
(C) Occasional    IC            IIC         IIIC        IVC
(D) Remote        ID            IID         IIID        IVD
(E) Improbable    IE            IIE         IIIE        IVE
Regions R1 through R4 are marked on the original matrix; their cell boundaries are not reproduced here.
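The index codes in this matrix combine a hazard category (I to IV) with a frequency level (A to E), and the text assigns a review action to each region R1 to R4. The Python sketch below encodes those two pieces. Because the assignment of individual cells to regions is not recoverable here, it takes that mapping as a caller-supplied dictionary; the names `hazard_risk_index` and `acceptance_action` are hypothetical.

```python
SEVERITY = {"I": "Catastrophic", "II": "Critical", "III": "Marginal", "IV": "Negligible"}
FREQUENCY = {"A": "Frequent", "B": "Probable", "C": "Occasional",
             "D": "Remote", "E": "Improbable"}

# Review actions for each region, as described in the text.
REGION_ACTION = {
    "R1": "High risk; may be considered unacceptable by the managing authority",
    "R2": "May be acceptable with management review of controls and/or mitigations",
    "R3": "May be acceptable with management review",
    "R4": "Low risk; usually acceptable without review",
}


def hazard_risk_index(severity: str, frequency: str) -> str:
    """Build a matrix cell code such as 'IIC' from a hazard category (I-IV)
    and a frequency of occurrence (A-E)."""
    if severity not in SEVERITY or frequency not in FREQUENCY:
        raise ValueError(f"unknown category: {severity!r}, {frequency!r}")
    return severity + frequency


def acceptance_action(cell: str, region_of: dict) -> str:
    """Look up the review action for a cell. region_of is a program-specific
    mapping from cell codes to regions R1-R4 supplied by the caller."""
    return REGION_ACTION[region_of[cell]]
```

Keeping the cell-to-region table as an input lets each program plug in the matrix its managing authority has actually approved.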
Early in a development phase, performance objectives may tend to overshadow efforts to reduce safety
risk, partly because safety sometimes represents a constraint on a design. For this reason, safety risk
reduction is often ignored or overlooked. In other cases, safety risk may be appraised, but not thoroughly
enough to serve as a significant input to the decision-making process. As a result, the sudden
identification of a significant safety risk, or the occurrence of an actual incident, late in the program can
have an overwhelming impact on schedule, cost, and sometimes performance. To avoid this situation,
methods to reduce safety risk must be applied commensurate with the task being performed in each
program phase.
In the early development phase (investment analysis and the early part of solution implementation), the
system safety activities are usually directed toward: 1) establishing risk acceptability parameters; 2)
practical tradeoffs between engineering design and defined safety risk parameters; 3) avoidance of
alternative approaches with high safety risk potential; 4) defining system test requirements to demonstrate
safety characteristics; and 5) safety planning for follow-on phases. The culmination of this effort is the
Comparative Safety Assessment, a summary of the work done toward minimizing unresolved safety
concerns and a calculated appraisal of the risk. Properly done, it allows informed management decisions
concerning the acceptability of the risk.
Ensure that competent, responsible, and qualified engineers are assigned in program offices and contractor
organizations to manage the system safety program.
Ensure that system safety managers are placed within the organizational structure so that they have the
authority and organizational flexibility to perform effectively.
Ensure that all known hazards and their associated risks are defined, documented, and tracked as a
program policy so that the decision-makers are made aware of the risks being assumed when the system
becomes operational.
Require that an assessment of safety risk be presented as a part of program reviews and at decision
milestones. Make decisions on risk acceptability for the program and accept responsibility for that
decision.
3. Provide warning devices. When neither design nor safety devices can effectively eliminate identified
risks or adequately reduce risk, devices shall be used to detect the condition and to produce an adequate
warning signal. Warning signals and their application shall be designed to minimize the likelihood of
inappropriate human reaction and response. Warning signs and placards shall be provided to alert
operational and support personnel of such risks as exposure to high voltage and heavy objects.
Examples:
Modern safety management, that is, “system safety management,” adopts techniques of system theory,
statistical analysis, behavioral sciences, and the continuous improvement concept. Two elements critical
to this modern approach are a good organizational safety culture and people involvement.
Establishing system safety working groups, analysis teams, and product teams fosters positive cultural
involvement when these groups work by consensus to conduct hazard analysis and manage system
safety programs.
Real-time safety analysis is conducted when operational personnel are involved in the identification of
hazards and risks, which is the key to behavioral-based safety. The concept consists of a “train-the-
trainer” format. See chapter 14 for a detailed discussion of how a selected safety team is provided the
necessary tools and is taught how to:
The first step in performing safety risk management is describing the system under consideration. This
description should include, at a minimum, the functions, general physical characteristics, and operations of
the system. Detailed physical descriptions are normally not required unless the safety analysis is focused on
this area.
Keep in mind that the reason for performing safety analyses is to identify hazards and risks and to
communicate that information to the audience. At a minimum, the safety assessment should describe the
system in sufficient detail that the projected audience can understand the safety risks.
A system description has both breadth and depth. The breadth of a system description refers to the system
boundaries. Bounding means limiting the system to those elements of the system model that affect or
interact with each other to accomplish the central mission(s) or function. Depth refers to the level of detail in
the description. In general, the level of detail in the description varies inversely with the breadth of the
system. For a system as broad as the National Airspace System (NAS), the description would be very
general, with little detail on individual components. On the other hand, a simple system, such as a
valve in a landing gear design, could be described in considerable detail to support the assessment.
First, a definition of “system” is needed. This handbook and MIL-STD-882 [i] (System Safety Program
Requirements) define a system as:
A composite, at any level of complexity, of personnel, procedures, material, tools,
equipment, facilities, and software. The elements of this composite entity are used together
in the intended operation or support environment to perform a given task or achieve a
specific production, support, or mission requirement.
Graphically, this is represented by the 5M and SHELL models, which depict, in general, the types of
elements that should be considered within most systems.
Mission. The mission is the purpose or central function of the system. This is the reason that all the other
elements are brought together.
Man. This is the human element of a system. If a system requires humans for operation, maintenance, or
installation this element must be considered in the system description.
Machine. This is the hardware and software (including firmware) element of a system.
Management. Management includes the procedures, policy, and regulations involved in operating,
maintaining, installing, and decommissioning a system.
Media. Media is the environment in which a system will be operated, maintained, and installed. This
environment includes operational and ambient conditions. Operational environment means the
conditions in which the mission or function is planned and executed. Operational conditions are those
involving things such as air traffic density, communication congestion, workload, etc. Part of the
operational environment could be described by the type of operation (air traffic control, air carrier,
general aviation, etc.) and phase (ground taxiing, takeoff, approach, enroute, transoceanic, landing, etc.).
Ambient conditions are those involving temperature, humidity, lightning, electromagnetic effects,
radiation, precipitation, vibration, etc.
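A 5M description can be kept as a simple record so that gaps are easy to spot. The Python sketch below is a hypothetical structure, not a prescribed format: each field holds free-text notes for one element, and `missing_elements` only flags elements left blank for the analyst to review.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FiveMDescription:
    """Hypothetical container for a 5M system description; each field holds
    free-text notes for one element of the model."""
    mission: str = ""
    man: str = ""
    machine: str = ""
    management: str = ""
    media: str = ""

    def missing_elements(self) -> List[str]:
        """Names of 5M elements that have not been described yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = FiveMDescription(
    mission="Provide precision approach guidance to runways",
    machine="Ground transmitters and airborne receivers",
)
print(draft.missing_elements())
```

A blank element is a prompt rather than an error; for example, the Man element must be described whenever the system requires humans for operation, maintenance, or installation.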
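Figure: SHELL model (graphic not reproduced). Liveware (L) sits at the center, connected to Hardware (H),
Software (S), Environment (E), and a second Liveware (L) block.
S = Software (procedures, symbology, etc.)
H = Hardware (machine)
E = Environment (operational and ambient)
L = Liveware (human element)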
In the SHELL model, the match or mismatch of the blocks (interface) is just as important as the
characteristics described by the blocks themselves. These blocks may be re-arranged as required to
describe the system. A connection between blocks indicates an interface between the two elements.
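Because the interfaces between blocks matter as much as the blocks, an analysis can list them explicitly. This Python sketch assumes the conventional SHELL layout, in which the central Liveware block connects to Software, Hardware, Environment, and another Liveware block; the function and variable names are hypothetical, and other connections can be passed in when the blocks are rearranged.

```python
from typing import List, Tuple

BLOCKS = {
    "S": "Software (procedures, symbology, etc.)",
    "H": "Hardware (machine)",
    "E": "Environment (operational and ambient)",
    "L": "Liveware (human element)",
}

# Conventional layout: the central human connects to every other block,
# including another human (for example, a pilot and a controller).
CLASSIC_LAYOUT = [("L", "S"), ("L", "H"), ("L", "E"), ("L", "L")]


def shell_interfaces(connections: List[Tuple[str, str]]) -> List[str]:
    """Describe each connected pair of blocks as an interface to analyze."""
    result = []
    for a, b in connections:
        if a not in BLOCKS or b not in BLOCKS:
            raise ValueError(f"unknown block in {a}-{b}")
        result.append(f"{a}-{b}: {BLOCKS[a]} / {BLOCKS[b]}")
    return result


for line in shell_interfaces(CLASSIC_LAYOUT):
    print(line)
```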
Each element of the system should be described both functionally and physically if possible. A function is
defined as
Physical characteristics: A physical description provides the audience with information on the real
composition and organization of the tangible system elements. As before, the level of detail varies with the
size and complexity of the system, with the end objective being adequate audience understanding of the
safety risk.
Both models describe interfaces. These interfaces come in many forms. The table below is a list of
interface types that the system engineer may encounter.
[i] MIL-STD-882 (1984). Military Standard System Safety Program Requirements. Department of Defense.
FAA System Safety Handbook, Chapter 4: Pre-Investment Decision Safety Assessments
December 30, 2000
Chapter 4:
Safety Assessments Before Investment Decision
Figure: System safety products before the investment decision (graphic not reproduced). OSA: system-level,
preliminary (some assumptions), some safety requirements. Comparative Safety Assessment (CSA)/Preliminary
Hazard Analysis (PHA): top-down, focused on the known system mission and approaches and on changes at the
NAS system level; preliminary in nature; core safety requirements.
An Operational Safety Assessment (OSA) has been designed to provide a disciplined, internationally
developed (RTCA SC189) method of objectively assessing the safety requirements of aerospace systems.
In the FAA, the OSA is used to evaluate Communication, Navigation, Surveillance (CNS) and Air Traffic
Management (ATM) systems. The OSA identifies and provides an assessment of the hazards in a system,
defines safety requirements, and builds a foundation for follow-on institutional safety analyses related to
Investment Analysis, Solution Implementation, In-Service Management, and Service Life Extension.
The OSA is composed of two fundamental elements: (1) the Operational Services & Environment
Description (OSED), and (2) an Operational Hazard Assessment (OHA). The OSED is a description of
the system physical and functional characteristics, the environment’s physical and functional
characteristics, air traffic services, and operational procedures. This description includes both the ground
and air elements of the system to be analyzed. The OHA is a qualitative safety assessment of the
operational hazards associated with the OSED. Each hazard is classified according to its potential
severity. Each classified hazard is then mapped to a safety objective based on probability of occurrence.
In general, as severity increases, the safety objective requires a lower probability of occurrence.
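The mapping from severity class to safety objective can be expressed as a lookup table. The values in this Python sketch are an assumption, patterned on the FAR/JAR 25.1309-style scheme (catastrophic: extremely improbable, below 10^-9 per hour; hazardous: extremely remote, below 10^-7; major: remote, below 10^-5); an actual OSA takes its objectives from the program's approved risk matrix.

```python
from typing import Dict, Optional

# Assumed severity-to-objective mapping, patterned on the 25.1309-style
# scheme; it is not read from the handbook's figure. Severity codes follow
# the matrix numbering (1 = Catastrophic ... 5 = No Safety Effect).
MAX_PROBABILITY_PER_HOUR: Dict[int, Optional[float]] = {
    1: 1e-9,  # Catastrophic: Extremely Improbable
    2: 1e-7,  # Hazardous: Extremely Remote
    3: 1e-5,  # Major: Remote
    4: None,  # Minor: no quantitative objective assumed
    5: None,  # No Safety Effect: no objective
}


def meets_safety_objective(severity: int, p_per_hour: float) -> bool:
    """True if a hazard's estimated per-hour probability satisfies the
    assumed objective for its severity class."""
    if severity not in MAX_PROBABILITY_PER_HOUR:
        raise ValueError(f"unknown severity code: {severity}")
    limit = MAX_PROBABILITY_PER_HOUR[severity]
    return limit is None or p_per_hour < limit
```

The table makes the inverse relationship visible: each step up in severity tightens the allowable probability by two orders of magnitude in this assumed scheme.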
The information contained in the OSA supports the early definition of system level requirements. It is not
a risk assessment in a classical sense. Instead, the OSA’s function is to determine the system’s
requirements early in the life cycle. The early identification and documentation of these requirements
may improve system integration, lower developmental costs, and increase system performance and
probability of program success. While the OSA itself is not a risk assessment, it does support further
safety risk assessments that are required by FAA Order 8040.4. The follow-on safety assessments may
build on the OSA’s OSED and OHA, by using the hazard list, system descriptions, and severity codes
identified in the OSA. The OSA also provides an essential input into CSA safety assessments that
support trade studies and decision making in the operational and acquisition processes.
The CSA is a safety assessment performed by system safety to assess the hazards and relative risks
associated with alternatives in a change proposal. The alternatives can be design changes, procedure
changes, or program changes. It is useful in trade studies and in decision-making activities where one or
more options are being compared in a system or alternative evaluation. This type of risk assessment can
be used by management to compare and rank risk reduction alternatives. More details on how to perform
a CSA are included in section 4.2.
• Define the boundaries of the system under consideration. Determine, separate, and document
what elements of the system you will describe/analyze from those that you will not
describe/analyze. The result of this process is a model of the system under analysis that will
be used to analyze hazards.
• Using models such as those described in chapter 3, describe the system physical and
functional characteristics, the environment physical and functional characteristics, air traffic
services, human elements (e.g., pilots and controllers), and operational procedures.
• From this description, determine and list the system functions. For example, the primary
function of a precision navigation system is to provide aircraft and flight crews with vertical and
horizontal guidance to the desired landing area. These functions could be split, if desired, into
vertical and horizontal guidance. Supporting functions are those that give the system the
capability to perform the primary function. For instance, a supporting function
of the precision navigation system would be transmission of the RF energy for horizontal
guidance. It is up to the system engineering team to determine how to group these functions
and to what level to take the analysis. Detailed analyses would go into the lower-level
functions. Typically, the OSA functional analysis is limited to the top-level functions. See the
FAA System Engineering Manual for more detailed guidance on functional analysis.
Hazard: The potential for harm. Unsafe acts or unsafe conditions that could result
in an accident. (A hazard is not an accident.)
Since the work has already been done in defining the system operational environment, it is often best to
relate the functions of the system to hazards. For example, in analyzing the NAS, one would find the
functions of the NAS listed in Table 4-1. These functions are then translated into hazards
that would be included in the preliminary hazard list. For many of the listed hazards, other conditions must
be present before an accident could occur; these conditions are covered in the detailed description of the risk
assessment. The purpose here is to develop a concise, clear, and understandable PHL.
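Turning functions into candidate hazards can start with a loss-of-function phrasing. This Python sketch is a hypothetical helper, not an FAA tool: it rewrites a function stated as "Provide X" into "Loss of X" and otherwise falls back to a generic label.

```python
from typing import List, Tuple


def draft_hazard(function: str) -> str:
    """Phrase the loss of a function as a candidate hazard,
    for example 'Provide X' becomes 'Loss of X'."""
    prefix = "Provide "
    if function.startswith(prefix):
        return "Loss of " + function[len(prefix):].strip()
    return "Loss of function: " + function.strip()


def preliminary_hazard_list(functions: List[str]) -> List[Tuple[str, str]]:
    """Pair each top-level function with a draft hazard for team review."""
    return [(fn, draft_hazard(fn)) for fn in functions]


for fn, hazard in preliminary_hazard_list([
    "Provide air-to-ground communications and control",
    "Provide precision approach instrument guidance to runways",
]):
    print(f"{fn} -> {hazard}")
```

Loss of function is only one way a function can fail, so the team still reviews each draft entry and adds hazards, such as degraded or misleading output, that a simple loss-of-function statement would miss.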
Table 4-1: Examples of NAS System Functions and Their Associated Hazards
In addition to the functional analysis, the tools listed in Table 4-2 can be used to identify the foreseeable
hazards to system operation.
CHANGE ANALYSIS
    Purpose: To detect the hazard implications of both planned and unplanned change.
    Method: Compare the current situation to a previous situation.
CAUSE & EFFECT TOOL
    Purpose: To add depth and increased structure to the hazard identification process through the use
    of graphic trees.
    Method: Draw the basic cause and effect diagram on a worksheet. Use a team knowledgeable of the
    operation to develop causal factors for each branch. Can be used as a positive or negative diagram.
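Change analysis compares the current situation to a previous one. A minimal Python sketch, assuming each situation is described as a dictionary from element name to description (a hypothetical format), lists every added, removed, or modified element so that each change can be reviewed for hazard implications.

```python
from typing import Dict, List, Tuple


def change_analysis(previous: Dict[str, str],
                    current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List every added, removed, or modified element between two
    descriptions of the same system, sorted by element name."""
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        before, after = previous.get(key), current.get(key)
        if before == after:
            continue  # unchanged elements need no change review
        if before is None:
            changes.append(("added", key, after))
        elif after is None:
            changes.append(("removed", key, before))
        else:
            changes.append(("modified", key, f"{before} -> {after}"))
    return changes


previous = {"procedure": "Voice clearance", "equipment": "VHF radio"}
current = {"procedure": "Data link clearance", "equipment": "VHF radio",
           "training": "Data link course"}
for change in change_analysis(previous, current):
    print(change)
```

The comparison only finds what changed; judging whether each change introduces or alters a hazard remains the analysis team's task.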
OHA Tasks
The tasks to be accomplished in this phase are:
• From the function list (or tools listed in Table 4-2) develop the list of hazards potentially existing
in the system under study
• Determine the potential severity of each hazard in the hazard list by referring to the risk
determination section of Chapter 3.
Once the target level of safety (TLS) is determined for each hazard, requirements can be written to ensure
that the appropriate hazard controls are established as system requirements.
Figure: Hazard Classification Steps (graphic not reproduced).
    Step 1. Determine the potential severity of each hazard in the OHA.
    Step 2. Map severity to the chart to determine the probability requirement (minimum) and objective (desired).
    Chart severity columns: No Safety Effect (5), Minor (4), Major (3), Hazardous (2), Catastrophic (1).
    Chart likelihood rows shown include Probable (A) and Remote (B); legend includes Low Risk.
Clearly, risk reduction by design is the preferred method of mitigation. But even if the risk is reduced, the
term “reduction” still implies the existence of residual risk, which is the risk left over after the controls
are applied. For example, residual risk can be controlled as described in Table 4-3, which lists the NAS
System Function, NAS System Hazard, and NAS System Control.
As the engineer performs the assessment, controls that do not yet exist can be identified and listed. These
controls are included in the requirements of the OSA. This is done by turning the controls into
measurable and testable requirements or “shall” statements. A critical function of System Engineering is
the determination and allocation of requirements early in the concept and definition phase. System
Safety’s function in this process is to develop safety-related requirements early in the design to facilitate
System Engineering. A primary source of safety requirements is the OSA. The controls identified, both
existing and recommended, should be translated into a set of system level requirements. For example,
Table 4-4 lists the same hazards and controls that were examined in Table 4-3. The requirements are
examples only and are meant for illustration.
NAS System Function | NAS System Hazard | NAS System Controls | NAS System Requirements
Provide air to ground communications and control. | Loss of air to ground communication and control. | Multiple communication channels. Multiple radios. Procedures for loss of communication. Phase dependent: communication is not always critical. | The NAS system shall provide for multiple communication modes in the enroute structure, at least 2 channels in each region being in the VHF frequency spectrum, and one available through the satellite communication system. The total Mean Time Between Failure (MTBF) of these systems may not be less than X hours.
Provide precision approach instrument guidance to runways. | Loss of precision instrument guidance to the runway. | Reliability. Alternate approaches available. Procedures for alternate airport selection. Fuel reserve procedures. System detection and alert. Phase and condition (IMC vs. VMC) dependent. | The NAS shall provide at least two backup non-precision approaches at each airport with a precision approach capability. The NAS procedures shall require part 121 operators to select an alternate destination if the forecast weather at the planned destination is less than 500’ and 1 mile over the destination’s weather planning minimums within one hour of the planned arrival.
Provide Enroute Flight Advisories of severe weather. | Lack of EFAS warning of severe weather to flight crew. | Early detection systems (satellite) for severe weather. Multiple dissemination means. Procedures (condition dependent) require alternate airports. Fuel reserve procedures. | The NAS shall detect icing conditions greater than moderate accretion when they actually exist in any area of 10 miles square and at least 1000’ thick for greater than 15 minutes duration.
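Because the requirements above are meant to be measurable and testable, two of them can be expressed as simple predicates. This is an illustrative sketch only. It assumes that “500’ and 1 mile over the destination’s weather planning minimums” means an alternate is required when either the ceiling margin or the visibility margin falls short, that “within one hour” means forecasts valid within 60 minutes either side of the planned arrival, and that “an area of 10 miles square” means a region at least 10 miles on a side. The function names and the trace/light/moderate/severe icing scale are assumptions for illustration.

```python
ICING_INTENSITY = ["trace", "light", "moderate", "severe"]  # assumed ordinal scale


def alternate_required(forecasts, min_ceiling_ft, min_visibility_mi, window_min=60):
    """Return True if any forecast near the planned arrival is below minimums plus 500 ft / 1 mile.

    forecasts: iterable of (minutes_from_planned_arrival, ceiling_ft, visibility_mi).
    Assumption: a shortfall in either ceiling or visibility triggers the requirement.
    """
    for offset_min, ceiling_ft, visibility_mi in forecasts:
        if abs(offset_min) > window_min:
            continue  # outside the one-hour window around the planned arrival
        if ceiling_ft < min_ceiling_ft + 500 or visibility_mi < min_visibility_mi + 1:
            return True
    return False


def icing_must_be_detected(intensity, extent_mi, thickness_ft, duration_min):
    """Return True if an icing condition falls inside the example detection requirement."""
    return (
        ICING_INTENSITY.index(intensity) > ICING_INTENSITY.index("moderate")
        and extent_mi >= 10       # assumed: at least 10 miles on a side
        and thickness_ft >= 1000  # at least 1000 ft thick
        and duration_min > 15     # greater than 15 minutes duration
    )


if __name__ == "__main__":
    # Hypothetical destination minimums: 800 ft ceiling, 3 miles visibility.
    print(alternate_required([(0, 1200, 4.0)], 800, 3.0))  # True: ceiling margin short
    print(alternate_required([(0, 2000, 5.0), (90, 300, 0.5)], 800, 3.0))  # False: poor forecast is outside the window
    print(icing_must_be_detected("severe", 12, 1500, 20))  # True
```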
The first step within the CSA process involves describing the system under study in terms of the 5M
model (chapter 3). Since most decisions are a selection of alternatives, each alternative must be described
in sufficient detail to ensure the audience can understand the hazards and risks evaluated. Often
one of the alternatives is “no change,” that is, retaining the baseline system. A preliminary hazard list
(PHL) is developed and then each hazard’s risk is assessed in the context of the alternatives. After this is
done, requirements and recommendations can be made based on the data in the CSA. A CSA should be
written so that the decision-maker can clearly distinguish the relative safety merit of each alternative. An
example (with instructions) of a CSA is included in Appendix B.
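To show the relative safety merit of each alternative at a glance, the per-hazard risk results can be tallied and compared. The sketch below assumes each hazard's risk has already been classified as high, medium, or low with the chapter 3 risk matrix, and ranks alternatives first by their number of high risks, then medium, then low. That ordering rule is an illustrative assumption, not a method prescribed by this handbook; the alternative names and risk values are hypothetical.

```python
from collections import Counter

RISK_LEVELS = ("high", "medium", "low")  # most to least severe


def rank_alternatives(assessments):
    """Rank alternatives from lowest to highest safety risk.

    assessments: dict mapping alternative name -> list of per-hazard risk levels.
    Alternatives are compared by (number of high, number of medium, number of low) risks.
    """
    def risk_profile(item):
        _, levels = item
        counts = Counter(levels)
        return tuple(counts[level] for level in RISK_LEVELS)

    return [name for name, _ in sorted(assessments.items(), key=risk_profile)]


if __name__ == "__main__":
    hypothetical = {
        "Baseline (no change)": ["high", "medium", "low"],
        "Alternative A": ["medium", "medium", "low"],
        "Alternative B": ["medium", "low", "low"],
    }
    print(rank_alternatives(hypothetical))
    # ['Alternative B', 'Alternative A', 'Baseline (no change)']
```

A tally of this kind summarizes the comparison; it is meant to accompany, not replace, the hazard-by-hazard assessment the decision-maker reviews.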
• Be objective.
• Be unbiased.
• Include all relevant data.
• Use assumptions only if specific information is not available. If assumptions are made, they should be
conservative and clearly identified, and they should be made in such a manner that they do not
adversely affect the safety of the system.
Define risk in terms of severity and likelihood in accordance with chapter 3, paragraph 3.4. Severity is
independent of likelihood: it can and should be defined without considering the likelihood of
occurrence. Likelihood, by contrast, depends on severity. It should be defined by how often an accident
of the stated severity can be expected to occur, not by how often the hazard occurs.
Compare the results of the risk assessment of each hazard for each alternative considered in order to rank
the alternatives for decision making purposes.
Assess the safety risk reduction or other benefits associated with implementation of and compliance with
an alternative under consideration.
Assess risk in accordance with the risk determination defined in Tables 3-2 and 3-3.
Evaluate the likelihood of occurrence of the hazard conditions resulting in an accident at the level of
severity indicated in (4) above. These definitions can be found in chapter 3, Table 7 of this guidebook.
This means that the likelihood selected is the probability of an accident happening in the conditions
described in (4), and not the probability of just the hazard occurring.
Document the assumptions and the justification for how the severity and likelihood of each hazard
condition were determined.
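The difference between how often a hazard occurs and how often it results in an accident can be shown with a small worked calculation. The sketch assumes the hazard occurs at a known rate per operational hour and that the probability of it leading to an accident of the stated severity depends on the operating condition (for example, flight phase, or IMC versus VMC). The function name and all numbers are hypothetical; the result is the value that would be compared against the likelihood definitions in chapter 3.

```python
import math


def accident_rate(hazard_rate_per_hour, conditions):
    """Rate of accidents at the stated severity, per operational hour.

    hazard_rate_per_hour: how often the hazard occurs.
    conditions: list of (fraction_of_exposure, p_accident_given_hazard) pairs,
                one per operating condition; fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in conditions)
    if not math.isclose(total_fraction, 1.0):
        raise ValueError(f"exposure fractions sum to {total_fraction}, expected 1")
    p_accident = sum(fraction * p for fraction, p in conditions)
    return hazard_rate_per_hour * p_accident


if __name__ == "__main__":
    # Hypothetical: the hazard occurs 1e-3 times per hour; it is benign in VMC (90% of
    # exposure) but leads to an accident 1% of the time in IMC (10% of exposure).
    rate = accident_rate(1e-3, [(0.9, 0.0), (0.1, 0.01)])
    print(f"{rate:.1e} accidents per hour")  # 1.0e-06, a thousand times lower than the hazard rate
```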
FAA System Safety Handbook, Chapter 5: Post-Investment Decision Safety Activities
December 30, 2000
A formal safety program that stresses early hazard identification and elimination or reduction of
associated risk to a level acceptable to the managing activity (MA) is not only effective from a safety
point of view but is also cost effective.
The FAA SSP is built on common-sense procedures that have proven effective on many programs.
These procedures, commonly known as the Safety Order of Precedence, are summarized in Table 5-1.
These four general procedures are used to establish the following SSP activities:
• Eliminate identified hazards or reduce associated risk through design, including material
selection or substitution.
• Design to minimize risk created by human error in the operation and support of the system.
• Protect power sources, controls, and critical components of redundant subsystems by
separation, isolation, or shielding.
• When design approaches cannot eliminate a hazard, provide warning and caution notes in
assembly, operations, maintenance, and repair instructions, and distinctive markings on
hazardous components and materials, equipment, and facilities to ensure personnel and
equipment protection. These will be standardized in accordance with MA requirements.
Table 5-1: Safety Order of Precedence
Provide warning devices (precedence 3): When neither design nor safety devices can effectively
eliminate identified risks or adequately reduce risk, devices shall be used to detect the condition and
to produce an adequate warning signal. Warning signals and their application shall be designed to
minimize the likelihood of inappropriate human reaction and response.
The following is a partial list of safety activities that can help the program manager control safety risks.
• Develop and distribute safety guidance for the entire life cycle of the system (i.e., design,
development, production, test, transportation, handling, operation, and maintenance).
• Integrate safety activities into all systems engineering and National Airspace Integrated
Logistics Support (NAILS) activities. This integration requires the entire design,
manufacturing, test and logistics support teams to identify hazards and implement controls.
• Perform safety analysis in a timely manner.
• Communicate safety requirements and analyses to all subcontractors of safety-significant
equipment.
• Ensure that safety analysis results are discussed in design and document reviews.
• Execute closed loop procedures to ensure that required safety controls are actually
implemented (e.g., warnings in technical manuals and training programs).
• Review historical data for similar applications.
• Demonstrate corrective actions for identified risks.
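A closed-loop procedure means a hazard cannot be closed until every control assigned to it has been verified as implemented. The hypothetical `TrackedHazard` class below sketches that rule; its fields and status values are illustrative assumptions, not a prescribed hazard tracking format.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedHazard:
    """Hypothetical hazard tracking entry that enforces closed-loop closure."""
    hazard_id: str
    description: str
    controls: dict[str, bool] = field(default_factory=dict)  # control -> verified as implemented
    status: str = "open"

    def add_control(self, control: str) -> None:
        self.controls[control] = False  # a new control starts unverified

    def verify(self, control: str) -> None:
        if control not in self.controls:
            raise KeyError(f"unknown control: {control}")
        self.controls[control] = True

    def close(self) -> None:
        """Close the hazard only if it has controls and all of them are verified."""
        unverified = [c for c, done in self.controls.items() if not done]
        if not self.controls or unverified:
            raise ValueError(f"cannot close {self.hazard_id}; unverified controls: {unverified}")
        self.status = "closed"


if __name__ == "__main__":
    h = TrackedHazard("H-001", "Loss of air to ground communication and control")
    h.add_control("Loss-of-communication procedure in technical manual")
    h.add_control("Loss-of-communication procedure in training program")
    h.verify("Loss-of-communication procedure in technical manual")
    try:
        h.close()
    except ValueError as err:
        print(err)  # the training control is not yet verified
    h.verify("Loss-of-communication procedure in training program")
    h.close()
    print(h.status)  # closed
```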
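[Figure: System safety program elements across the pre-contract and contract phases. Labels: system requirements, contractual requirements, safety program plan, test analyses and evaluation, PHA, SSHA, O&SHA, SSA report. SSPP contents shown: scope, organization, milestones, requirements, safety data, safety verification, mishap reporting, interfaces.]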
The FAA establishes the contractual requirements for an SSPP in the Statement of Work (SOW). The
FAA requires the contractor to establish and maintain an effective and efficient SSP. This is usually the
first safety requirement stated in the SOW. SSP requirements are defined by MIL-STD-882, Section 4.
They are the only mandatory requirements and cannot be tailored. The purpose of the System Safety
Program Plan is to plan and document the system safety engineering effort necessary to ensure a safe system.
The SSPP will:
• Include information on how system safety will be integrated into the overall Integrated Product
Development System and Integrated Product Team structure in the FAA.
• Define how hazards and residual risk are communicated to the program manager, and how
the program manager will formally accept and track the hazards and residual risk.
The SSPP contains the scope, organization, milestones, requirements, safety data, safety verification,
accident reporting, and safety program interfaces.
The SSPP is usually required to be submitted as a deliverable for MA approval 30 to 45 days after the start of
the contract. In some situations, the MA may require that a preliminary SSPP be submitted with the
proposal to ensure that the contractor has planned and costed an adequate SSP. Because the system safety
effort can be cut back in a cost-competitive procurement, requiring MA approval of the SSPP
gives the MA the control needed to minimize this possibility.
A good SSPP demonstrates risk control planning through an integrated program management and
engineering effort. It is directed towards achieving the specified safety requirements of the SOW and
equipment specification. The plan includes details of those methods the contractor uses to implement
each system safety task described by the SOW and those safety related documents listed in the contract
for compliance (MIL-STD-882, paragraph 6.2). Examples of safety-related documents include
Occupational Safety and Health Administration (OSHA) regulations and other national standards, such as those
issued by the National Fire Protection Association (NFPA). The SSPP lists all requirements and activities required
to satisfy the SSP objectives, including all appropriate related tasks. A complete breakdown of system
safety tasks, subtasks, and resource allocations for each program element through the term of the contract
is also included. A baseline plan is required at the beginning of the first contractual phase (e.g.,
Demonstration and Validation or Full-Scale Development) and is updated at the beginning of each
subsequent phase (e.g., production) to describe the tasks and responsibilities for the follow-on phase.
Plans generated by one contractor are rarely efficient or effective for another. Each plan is unique to the
corporate personality and management system. This is important to remember in competitive
procurement of a developed or partially developed system. The plan is prepared so that it describes the
system safety approach to be used on a given program at a given contractor's facilities and describes the
system safety aspects and interfaces of all appropriate program activities. The contractor's approach to
defining the critical tasks leading to system safety certification is included.
The plan should describe an organization featuring a system safety manager who is directly responsible to
the program manager or the program manager's agent for system safety. This agent must not be
organizationally inhibited from assigning action to any level of program management. The plan further
describes methods by which critical safety problems are brought to the attention of program management
and for management approval of closeout action. Organizations in which system safety responsibility
runs only through lower levels of management are ineffective and therefore unacceptable.
The SSPP is usually valid for a specific phase of the system life cycle, because separate contracts are
awarded as development of equipment proceeds through each phase of the life cycle. For example, a
contract award may be for the development of a prototype during the validation phase. A subsequent
contract may be awarded to develop pre-production hardware and software during full-scale development,
and still another awarded when the equipment enters the production phase. Progressing from one phase
of the life cycle to the next, the new contract's SOW should specify that the SSPP prepared for the former
contract be revised to satisfy the requirements of the new contract and/or contractor.
Each plan should include a systematic, detailed description of the scope and magnitude of the overall SSP
and its tasks. This includes a breakdown of the project by organizational component, safety tasks,
subtasks, events, and responsibilities of each organizational element, including resource allocations and
the contractor's estimate of the level of effort necessary to effectively accomplish the contractual task. It
is helpful to the evaluator if two matrices are included:
[Figure: Example system safety organization; labels include Managing Authority, Mechanical Design, R&M, and Software Design.]
Responsibility and authority of all personnel with significant safety interfaces
• The contractor's system safety personnel.
• Internal control for the proper implementation of system safety requirements and criteria
affecting hardware, operational resources, and personnel should be the responsibility of the
system safety manager through the manager's interface with other program disciplines.
• The system safety manager should also be responsible for initiation of required action
whenever internal coordination of controls fails in the resolution of problems.
• Other contractor organizational elements involved in the System Safety Working Groups
(SSWGs). System safety responsibilities are an inherent part of every program function and
task. Examples include reliability and test and evaluation (T&E).
• The organizational unit responsible for executing each task (e.g., reliability or T&E) and its
authority in regard to resolution of all identified hazards. Resolution and action relating to
system safety matters may be effective at all organizational levels but must include the
organizational level possessing resolution authority (e.g., program or engineering manager).
The SSP manager should be identified by name, with address and phone number.
The staffing plan of the system safety organization for the duration of the contract
It should include staff loading, control of resources, and the qualifications of key system safety personnel
assigned, including those who possess coordination/approval authority for contractor-prepared
documentation.
The procedures by which the contractor will integrate and coordinate the system safety efforts,
including assignment of the system safety requirements to internal organizations and subcontractors,
coordination of subcontractor SSPs, integration of hazard analysis, program status reporting, and SSWGs.
The contractor must provide a description of a system safety function with a management authority, as the
agent of the program manager, to maintain a continual overview of the technical and planning aspects of
the total program. Although the specific organizational assignment of this function is a contractor's
responsibility, the plan must show a direct accountability to the program manager with unrestricted access
to any level of management to be acceptable.
The ultimate responsibility for all decisions relating to the conduct and implementation of the SSP rests
with the program director or manager. Each element manager is expected to be fully accountable for the
implementation of safety requirements in the respective area of responsibility.
In the usual performance of their duties, SSP managers must have direct approval authority over any
safety critical program documentation, design, procedures, or procedural operation. A log of non-
deliverable data should be maintained showing all program documentation reviewed, concurrence or non-
concurrence, reasons why the system safety engineer concurs or non-concurs, and actions taken as a result
of non-concurrence. The MA should assess activity and progress by reviewing this log.
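The log of non-deliverable data described above is essentially a list of review records. A minimal sketch, with hypothetical field names and document titles, shows how the MA could pull out the non-concurrences that have no recorded follow-up action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewEntry:
    """One line of a hypothetical system safety review log."""
    document: str
    concurs: bool
    reason: str
    action_taken: Optional[str] = None  # action resulting from a non-concurrence


def open_nonconcurrences(log: list[ReviewEntry]) -> list[str]:
    """Documents the system safety engineer did not concur with that have no recorded action."""
    return [e.document for e in log if not e.concurs and not e.action_taken]


if __name__ == "__main__":
    log = [
        ReviewEntry("Hypothetical test procedure TP-1", True, "Hazards addressed"),
        ReviewEntry("Hypothetical drawing D-7", False, "Interlock omitted", "Design change requested"),
        ReviewEntry("Hypothetical manual M-2", False, "Warning note missing"),
    ]
    print(open_nonconcurrences(log))  # ['Hypothetical manual M-2']
```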
For major programs, the staffing forecast can be provided at the significant safety task level.
The contractor is required to assign a system safety manager who meets specific educational and
professional requirements and who has had significant assignments in the professional practice of safety.
Qualifications should reflect the system's criticality and SSP magnitude. Application of common sense is
necessary. Clearly, the safety manager for an airframe program requires different credentials than one
responsible for an avionics program. For major programs, a range of six to nine years of system safety
experience is required. In some cases, it is justifiable to require either a registered Professional Engineer
(PE) or a board Certified Safety Professional (CSP).
In other cases, work experience may be substituted for educational requirements. Small programs or
organizations may have limited access to personnel with full time safety experience, and the MA should
be confident that such credentials are necessary for the specific application before invoking them.
The minimum qualifications for the systems safety manager or staff should be included in the contract.
Meeting this requirement may be difficult: CSPs are rare at electronics development and manufacturing
companies. If a CSP is required, the contractor is likely to hire a part-time CSP consultant, a questionable
approach. PEs are more common, but few have careers involving safety. Appendix A in MIL-STD-882
provides a table of minimum qualifications for programs based upon complexity and demands on CSP or
PE qualifications. This approach ignores the hazard severity of the system.
Table 5-2 is suggested as a qualification baseline. It is not absolute and is offered only as guidance. The
MA may adjust these qualifications, as appropriate.
Program Complexity | Program Severity | Education | Experience | Certification
High | Catastrophic | BS in Engineering or applicable other | Six years in system safety | CSP or PE desired; equivalent 10 yrs experience
High | Critical | BS in Engineering or applicable other | Six years in system safety or related discipline | CSP or PE desired; equivalent 10 yrs experience
High | Marginal | BS in Engineering or applicable other | Two years in system safety or related discipline | CSP or PE desired; equivalent 10 yrs experience
Moderate | Catastrophic | BS in Engineering or applicable other | Four years in system safety | CSP or PE desired; equivalent 10 yrs experience
Moderate | Critical | BS in Engineering or applicable other | Four years in system safety or related discipline | None
Moderate | Marginal | BS plus training in system safety | Two years in system safety or related discipline | None
Low | Catastrophic | BS plus training in system safety | Four years in system safety or related discipline | None
Low | Critical | BS plus training in system safety | Two years in system safety or related discipline | None
Low | Marginal | High School Diploma plus training in system safety | Two years in system safety or related discipline | None
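Because Table 5-2 is keyed on two attributes, it maps directly onto a lookup. The sketch encodes the table rows as given; the names `QUALIFICATION_BASELINE` and `qualification_baseline` are illustrative, and, as the text notes, the MA may adjust these values for a given program.

```python
BS_ENG = "BS in Engineering or applicable other"
BS_TRAIN = "BS plus training in system safety"
HS_TRAIN = "High School Diploma plus training in system safety"
CERT = "CSP or PE desired; equivalent 10 yrs experience"
RELATED = "in system safety or related discipline"

# Table 5-2: (complexity, severity) -> (education, experience, certification)
QUALIFICATION_BASELINE = {
    ("High", "Catastrophic"): (BS_ENG, "Six years in system safety", CERT),
    ("High", "Critical"): (BS_ENG, f"Six years {RELATED}", CERT),
    ("High", "Marginal"): (BS_ENG, f"Two years {RELATED}", CERT),
    ("Moderate", "Catastrophic"): (BS_ENG, "Four years in system safety", CERT),
    ("Moderate", "Critical"): (BS_ENG, f"Four years {RELATED}", "None"),
    ("Moderate", "Marginal"): (BS_TRAIN, f"Two years {RELATED}", "None"),
    ("Low", "Catastrophic"): (BS_TRAIN, f"Four years {RELATED}", "None"),
    ("Low", "Critical"): (BS_TRAIN, f"Two years {RELATED}", "None"),
    ("Low", "Marginal"): (HS_TRAIN, f"Two years {RELATED}", "None"),
}


def qualification_baseline(complexity: str, severity: str) -> tuple[str, str, str]:
    """Return (education, experience, certification) for a program complexity and severity."""
    key = (complexity.capitalize(), severity.capitalize())
    if key not in QUALIFICATION_BASELINE:
        raise KeyError(f"no Table 5-2 row for {key}")
    return QUALIFICATION_BASELINE[key]


if __name__ == "__main__":
    print(qualification_baseline("moderate", "critical"))
```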
An SSPP prepared in accordance with MIL-STD-882 provides the FAA with an opportunity to review the
contractor's scheduling of safety tasks in a timely fashion, permitting corrective action when applicable.
MIL-STD-882 guides the contractor to plan and organize the system safety effort and provides the MA
with necessary information for FAA support planning by requiring the elements listed below.
These requirements should be adjusted for the program as necessary.
SSP milestones
Program schedule of safety tasks including start and completion dates, reports, reviews, and estimated
staff loading
Identification of integrated system safety activities (e.g., design analysis, tests, and demonstration)
applicable to the SSP but specified in other engineering studies to preclude duplication. (See Chapter 6,
System Safety Integration and Risk Assessment)
The SSPP must provide the timing and interrelationships of system safety tasks relative to other program
tasks. A suitable program milestone section of an SSPP will include a Gantt chart showing each
significant SSP task, the period of performance for each, and related overall program milestones. For
example, one expects the establishment of design criteria and the generation of the SSPP to begin almost
immediately during any design phase; analyses to run concurrently with design activities and reach at least
interim completion prior to major design reviews; and hazard tracking systems to be established prior
to significant testing. Figure 5-3 shows an example of a Gantt chart.
The schedule for each SSP task in the SSPP should be tied to a major milestone (e.g., start 30 days after
or before the preliminary design review [PDR]) rather than a specific date, as MIL-STD-882 requires. In
this manner, the SSPP does not need revision whenever the master program schedule shifts. The same
MA control is maintained through the program master schedule, but without the cost of revising the
document or obtaining a schedule date waiver.
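Tying tasks to milestones means each SSPP entry stores a milestone name and an offset, and calendar dates are derived from the current master schedule. The sketch below is hypothetical (task names, dates, and the `resolve_schedule` function are illustrative). It shows that when the PDR date slips, the derived task date moves with it while the SSPP entry itself is unchanged.

```python
from datetime import date, timedelta


def resolve_schedule(tasks, milestones):
    """Convert milestone-relative SSPP task entries into calendar dates.

    tasks: task name -> (milestone name, offset in days; negative means before the milestone)
    milestones: milestone name -> date from the current program master schedule
    """
    return {name: milestones[ms] + timedelta(days=offset) for name, (ms, offset) in tasks.items()}


if __name__ == "__main__":
    sspp_tasks = {
        "Interim preliminary hazard analysis complete": ("PDR", -30),
        "Hazard tracking system established": ("First major test", -30),
    }
    schedule_v1 = {"PDR": date(2001, 3, 1), "First major test": date(2001, 9, 1)}
    schedule_v2 = {"PDR": date(2001, 4, 2), "First major test": date(2001, 9, 1)}  # PDR slips
    task = "Interim preliminary hazard analysis complete"
    print(resolve_schedule(sspp_tasks, schedule_v1)[task])  # 2001-01-30
    print(resolve_schedule(sspp_tasks, schedule_v2)[task])  # 2001-03-03
```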
Quantitative requirements – usually expressed as a failure or accident rate, such as “The Catastrophic
system accident rate shall not exceed x.xx × 10^-Y per operational hour.”
Accident risk requirements – could be expressed as “No hazards assigned a Catastrophic accident
severity are acceptable.” Accident risk requirements could also be expressed as a level defined by the
accident risk assessment matrix (see Chapter x. yy), such as “No Category 3 or higher accident risks are
acceptable.”
Standardization requirements – are expressed relative to a known standard that is relevant to the system
being developed. Examples include “The system will comply with the Code of Federal Regulations
CFR-XXX” or “The system will comply with international standards developed by ICAO.”
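A quantitative accident-rate requirement can be checked against operating data, but a point estimate alone says little when few or no accidents have occurred. The sketch below computes the observed rate and, assuming accidents follow a Poisson process, the number of accident-free operating hours needed to claim the rate limit at a given confidence: T = -ln(1 - C) / λ. The limit of 1 × 10^-7 per hour is a hypothetical stand-in for the unspecified value in the example requirement, and the function names are illustrative.

```python
import math


def observed_rate(accidents: int, operating_hours: float) -> float:
    """Point estimate of the accident rate per operational hour."""
    return accidents / operating_hours


def accident_free_hours_needed(limit_per_hour: float, confidence: float = 0.95) -> float:
    """Hours with zero accidents needed to show rate <= limit at the given confidence.

    Assumes a Poisson process: P(0 accidents in T hours) = exp(-limit * T). Setting this
    equal to 1 - confidence and solving for T gives T = -ln(1 - confidence) / limit.
    """
    return -math.log(1.0 - confidence) / limit_per_hour


if __name__ == "__main__":
    limit = 1e-7  # hypothetical requirement: 1 x 10^-7 catastrophic accidents per hour
    print(observed_rate(0, 5_000_000) <= limit)  # True, but only a point estimate
    print(f"{accident_free_hours_needed(limit):.3g} h")  # about 3e+07 hours at 95% confidence
```

With only five million accident-free hours the point estimate meets the limit, yet roughly six times that exposure would be needed before the claim holds at 95% confidence.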
A composite list of all SSP requirements is included in the requirements and criteria section of the SSPP
for several reasons, including the following.
Organization and integration of safety requirements establishing clear SSP objectives. Frequently, safety
requirements are included at multiple levels in a variety of specifications. Assembling a composite list of
safety requirements can be time consuming; generating and formally documenting this list can therefore be
expected to save significant staff labor costs and to prevent omissions by those without significant
system safety experience.
Providing MA assurance that no safety requirements have been missed and that the safety requirements
have been interpreted correctly.
Documentation
Including a description of risk assessment procedures and safety precedence is an important
example of how the SSPP helps the MA and the contractor reach a common understanding.
Without such details explicitly described in the SSPP, both the MA and contractor could, in good faith,
proceed down different paths until they discover the difference of interpretation at a major program
milestone.
The hazard analyses described in Chapters 8 & 9 illustrate some methodologies used to identify risks, and
assign severity and criticality criteria. Safety precedence is a method of controlling specific unacceptable
hazards. A closed-loop procedure is required to ensure that identified unacceptable risks are resolved in a
documented, disciplined manner. The inclusion of such procedures demonstrates both necessary control
and personnel independence.
The presence of the safety criteria in the SSPP is an important step in the system safety management
process. This information must flow down to the system and design engineers (including appropriate
subcontractors). The SSPP must provide a procedure that incorporates system safety requirements and criteria
in all safety critical item (CI) specifications. Such safety requirements include both specific design and
verification elements.
Unambiguous communication between the FAA and the contractor depends on standardized definitions.
The FAA may choose, for expediency, to invoke a MIL-STD-882 SSP. It must be noted that the
definitions included in MIL-STD-882 are not identical to those used in the FAA community. Therefore,
the SOW should indicate that the definitions in this handbook (or other FAA documents) supersede those
in MIL-STD-882 (see the Glossary for examples).
The analysis techniques and formats to be used in the qualitative or quantitative analysis to identify risks,
their hazards and effects, hazard elimination, or risk reduction requirements, and how these requirements
are met.
The depth within the system to which each technique is used, including risk identification associated with
the system, subsystem, components, personnel, ground support equipment, government-furnished equipment (GFE), facilities, and their
interrelationship in the logistic support, training, maintenance, and operational environments.
The integration of subcontractor hazard analyses with overall system hazard analyses.
Analysis is the method of identifying hazards. A sound analytical and documentation approach is required
if the end product is to be useful. An inappropriate analytical approach can be identified in the
contractor's discussion within the SSPP.
Each program is required to assess the risk of accident in the design concept as it relates to injury to
personnel, damage to equipment, or any other forms of harm. The result of this assessment is a definition
of those factors and conditions that present unacceptable accident risk throughout the program.
This definition provides a program baseline for formulation of design criteria and assessment of the
adequacy of its application through systems analysis, design reviews, and operational analysis. System
safety analyses are accomplished by various methods. As noted in Chapters 8 and 9 of this handbook, the
basic safety philosophy and design goals must be established prior to initiation of any program analysis
task. Without this advance planning, the SSP becomes a random identification of hazards resulting in
operational warnings and cautions instead of design correction (i.e., temporary rather than permanent solutions).
The SSPP, therefore, describes the methods to be used to perform system safety analyses. The methods
may be quantitative or qualitative, inductive or deductive, but must produce results consistent with
mission goals.
It is important that the SSP describe procedures that will initiate design changes or safety trade studies
when safety analyses indicate such action is necessary. Specific criteria or safety philosophy guides trade
studies or design changes. Whenever a management decision is necessary, an assessment of the risk is
presented so that all facts can be considered for a proposed decision. It is common to find budget
considerations driving the design without proper risk assessment. Without safety representation, design
decisions may be made primarily to reduce short-term costs, increasing the accident risk. Such a decision
ignores the economics of an accident. In many cases, accident costs far exceed the short-term
savings achieved through this process.
The contractor's system safety engineers should be involved in all trade-studies. The SSPP must identify
the responsible activity charged with generating CRAs, and with reviewing and approving the results of
trade-studies to assure that the intent of the original design criteria is met.
The hazard analysis section of the SSPP should describe in detail the activities that will identify the
impact of changes and modifications on the accident potential of delivered and other existing systems. All
changes or modifications to existing systems must be analyzed for their impact on the safety risk baseline
established by the basic system safety analysis effort. In some cases this analysis can be very limited;
in others a substantial effort is appropriate. The results must be included for review as a part of
each engineering change proposal.
• The verification (e.g., test, analysis, inspection) requirements for ensuring that safety is
adequately demonstrated. Identify any certification requirements for safety devices (e.g., fire
extinguishers, circuit breakers) or other special safety features (e.g., interlocks). Note that
some certification requirements will be identified as the design develops so the SSPP should
contain procedures for identifying and documenting these requirements.
• Procedures for making sure test information is transmitted to the MA for review and analysis.
• Procedures for ensuring the safe conduct of all tests.
The FAA System Engineering Manual may be consulted for further information on verification and
validation.
5.3.9 Training
This portion of the SSPP contains the contractor's plan for using the results of the SSP in various training
areas. Often hazards that relate to training are identified in the Safety Engineering Report (SER) or in the
System Engineering Design Analysis Report. Procedures should provide for transmitting this information
to any activity preparing training plans. The specifics involved in safety training may be found in Chapter
14.
The SSP will produce results that should be applied in training operator, maintenance, and test personnel.
This training should not only be continuous but also conducted both formally and informally as the
program progresses. The SSPP should also address training devices.
The SSPP should also define the time and circumstances under which the MA assumes primary
responsibility for accident and incident investigation. The support provided by the contractor to
government investigators should be addressed. The procedures by which the MA will be notified of the
results of contractor accident investigations should be spelled out. Provisions should be made for a
government observer to be present for contractor investigations.
Any incident that could have affected the system should be evaluated from a system safety point of view.
An incident in this case is any unplanned occurrence that could have resulted in an accident. Incidents
involve the actions associated with hazards, both unsafe acts and unsafe conditions that could have resulted
in harm. Participants within the system safety program should be trained in the identification of
incidents; this involves a concept called behavioral-based safety, which is discussed in Chapter 12,
Facilities System Safety.
5.3.11 Interfaces
Since conducting an SSP will eventually affect almost every other element of a system development
program, a concerted effort must be made to effectively integrate support activities. Each engineering
and management discipline often pursues its own objectives independently, or at best, in coordination
only with mainstream program activities such as design engineering and testing.
To ensure that the SSP is comprehensive, the contractor must impose requirements on subcontractors and
suppliers that are consistent with and contribute to the overall SSP. This part of the SSPP must show the
contractor's procedures for accomplishing this task. The prime contractor must evaluate variations and
specify clear requirements tailored to the needs of the SSP. Occasionally, the MA procures subsystems or
components under separate contracts to be integrated into the overall system. Subcontracted subsystems
that impact safety should be required to implement an SSP.
The integration of these programs into the overall SSP is usually the responsibility of the prime contractor
for the overall system. When the prime contractor is to be responsible for this integration, the Request for
Proposal (RFP) must specifically state the requirement. This subparagraph of the SSPP should indicate
how the prime contractor plans to effect this integration and what procedures will be followed in the event
of a conflict.
The MA system safety manager should be aware that the prime contractor is not always responsible for
the integration of the SSP. For example, in some SSPs, the MA is the SSP integrator for several associate
contractors. The next section of this chapter contains guidance specific to the management of a complex
program with multiple subcontractors requiring an Integrated System Safety Program Plan.
The first step is to develop a plan that is specifically designed to suit the particular project, process,
operation, or system. An ISSPP should be developed for each unique complex entity, such as a particular
line of business, project, system, development, research task, or test. Such an entity is composed of
many parts, tasks, subsystems, operations, or functions, and all of these sub-parts should be combined
logically. This is the process of integration. All the major elements of the ISSPP should be
integrated, as explained in the following paragraphs.
1. Military Standard 882C explains and defines System Safety Program Requirements; Military Standard 882D is the
current update as of 1999. MIL-STD-882D no longer provides the details that version C provided.
the project. It includes integrated efforts of management, team members, subcontractors and all other
participants.
The objective is to establish a management integrator to ensure that coordination occurs among the
many entities involved in system safety. The tasks and activities associated with integration
management are defined in the ISSPP, which becomes a model for all other programs within the
effort. Other participants, partners, and subcontractors are to submit plans for approval and
acceptance by the integrator. These plans then become part of the ISSPP.
For large or complex efforts where an ISSPP has been established, activities of the Integrated System
Safety Working Group (ISSWG) are defined in the ISSPP. The ISSWG includes responsible personnel
who are involved in the system safety process. The plan specifically indicates that, for example,
Operations, System Engineering, Test Engineering, Software Engineering, and System Safety
Engineering personnel are active participants in the ISSWG. The integrator may act as the chair of the
ISSWG with key system safety participants from each sub-entity. The group may meet formally on a
particular schedule. Activities are documented in meeting minutes. Participants are assigned actions.
A technique similar to, and often treated as a subset of, PERT is the Critical Path Method (CPM).3 It also
involves identifying all the steps needed to get from a decision to a desired conclusion and depicting them
systematically to determine the most time-consuming path through a network. This path is designated on
the diagram as the “critical path”, and the steps along it are “critical activities”.
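The forward and backward passes behind CPM can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an FAA tool: the function name `critical_path`, the activity names, and the durations (in weeks) are hypothetical, and the network is assumed to be acyclic. Activities whose latest start equals their earliest start have zero float and therefore lie on the critical path.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, preds):
    """Return (project_length, critical_activities) for an acyclic activity network.

    durations: dict mapping activity -> duration
    preds: dict mapping activity -> list of predecessor activities
    """
    graph = {a: set(preds.get(a, ())) for a in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest start (es) and earliest finish (ef).
    es, ef = {}, {}
    for a in order:
        es[a] = max((ef[p] for p in graph[a]), default=0)
        ef[a] = es[a] + durations[a]
    length = max(ef.values())

    # Successor lists are needed for the backward pass.
    succs = {a: [] for a in durations}
    for a, ps in graph.items():
        for p in ps:
            succs[p].append(a)

    # Backward pass: latest finish (lf) and latest start (ls).
    lf, ls = {}, {}
    for a in reversed(order):
        lf[a] = min((ls[s] for s in succs[a]), default=length)
        ls[a] = lf[a] - durations[a]

    # Zero total float (ls == es) marks a critical activity.
    critical = [a for a in order if ls[a] == es[a]]
    return length, critical


# Hypothetical safety-analysis network (durations in weeks).
durations = {"PHL": 2, "PHA": 4, "SSHA": 6, "SHA": 5, "O&SHA": 3, "Review": 1}
preds = {
    "PHA": ["PHL"],
    "SSHA": ["PHA"],
    "SHA": ["PHA"],
    "O&SHA": ["PHA"],
    "Review": ["SSHA", "SHA", "O&SHA"],
}
length, critical = critical_path(durations, preds)
print(length, critical)
```

With these assumed durations the SSHA branch drives the schedule, while SHA and O&SHA carry 1 and 3 weeks of float respectively.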
Because safety management efforts are dynamic and variable, the networks developed should match the
complexity required. For large programs, a master PERT network can be developed, with lower-level
PERT charts referenced to provide needed detail. Using CPM in conjunction with PERT can help explore
the variables that influence programs.4 Further detail on PERT and CPM is available in the references.
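PERT is commonly paired with CPM by giving each activity a three-point estimate: expected time te = (o + 4m + p) / 6 and variance ((p - o) / 6)^2, where o, m, and p are the optimistic, most likely, and pessimistic durations. The sketch below sums these along one path (for example, the critical path found with CPM) and, under the classic PERT assumptions of independent activities and an approximately normal total, estimates the probability of meeting a deadline. The function names and all durations are hypothetical.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def pert_estimate(optimistic, most_likely, pessimistic):
    """Classic PERT three-point estimate: returns (expected_time, variance)."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return expected, variance


def path_completion(estimates, deadline):
    """Expected path duration and probability of finishing by `deadline`.

    estimates: list of (optimistic, most_likely, pessimistic) tuples for the
    activities on one path. Assumes independent activities and a normal
    approximation for the path total.
    """
    total_mean = 0.0
    total_var = 0.0
    for o, m, p in estimates:
        mean, var = pert_estimate(o, m, p)
        total_mean += mean
        total_var += var
    if total_var == 0:
        return total_mean, 1.0 if deadline >= total_mean else 0.0
    probability = NormalDist(total_mean, sqrt(total_var)).cdf(deadline)
    return total_mean, probability


# Hypothetical critical-path activities (weeks): PHL, PHA, SSHA, review.
path = [(1, 2, 3), (3, 4, 5), (4, 6, 8), (1, 1, 1)]
mean, prob = path_completion(path, deadline=14)
print(f"expected {mean:.1f} weeks, P(done by week 14) = {prob:.2f}")
```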
2. J.V. Grimaldi and R.H. Simonds, Safety Management, Third Edition, Richard D. Irwin, Inc., Homewood, Illinois, 1975.
3. Ibid., Grimaldi.
4. System Safety Society, System Safety Analysis Handbook, 2nd Edition, 1997.
5. J. Stephenson, System Safety 2000: A Practical Guide for Planning, Managing, and Conducting System Safety Programs,
Van Nostrand Reinhold, New York, 1991.
Verification
• Test: environmental, vibration, thermal, acoustic, modal survey, EMC, functional, performance
• Assessment: analysis, demonstration, similarity, inspection, validation of records, simulation, review of design documentation
Figure 5-4: Safety Verification Methods
hazard is to be closed. This activity is conducted and/or reviewed during ISSWG meetings or formal
safety reviews.
Integrated Risk/Hazard Tracking and Risk Resolution is accomplished using the Safety Action
Record (SAR). The SAR captures the appropriate elements of hazard analysis, risk assessment,
and related studies conducted in support of system safety. See Chapter 2 (Paragraph 2.2.1.5) for a
discussion of the Hazard Tracking/Risk Resolution process.
No single verification method listed above provides total system safety assurance. Safety
verification is conducted in support of the closed-loop hazard tracking and risk resolution process.
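A closed-loop tracking record in the spirit of the SAR can be sketched as follows. The field names, the two-state status, and the closure rule (every control must have at least one recorded verification method before the hazard can be closed) are assumptions made for illustration; the actual SAR contents and closure criteria come from the program's hazard tracking process described in Chapter 2. The verification method names follow Figure 5-4.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

# Method names taken from Figure 5-4 (Safety Verification Methods).
VERIFICATION_METHODS = {
    "test", "analysis", "demonstration", "similarity", "inspection",
    "validation of records", "simulation", "review of design documentation",
}


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class SafetyActionRecord:
    """Hypothetical SAR structure for closed-loop hazard tracking."""
    sar_id: str
    hazard: str
    controls: List[str]
    verifications: Dict[str, List[str]] = field(default_factory=dict)
    status: Status = Status.OPEN
    closed_at: Optional[str] = None  # e.g. the ISSWG meeting that closed it

    def record_verification(self, control: str, method: str) -> None:
        if control not in self.controls:
            raise ValueError(f"{control!r} is not a control on {self.sar_id}")
        if method not in VERIFICATION_METHODS:
            raise ValueError(f"unknown verification method {method!r}")
        self.verifications.setdefault(control, []).append(method)

    def unverified_controls(self) -> List[str]:
        return [c for c in self.controls if not self.verifications.get(c)]

    def close(self, review: str) -> None:
        # Closing the loop: refuse to close while any control is unverified.
        missing = self.unverified_controls()
        if missing:
            raise RuntimeError(f"{self.sar_id} has unverified controls: {missing}")
        self.status = Status.CLOSED
        self.closed_at = review


sar = SafetyActionRecord(
    sar_id="SAR-001",  # hypothetical identifier and hazard
    hazard="Loss of primary display",
    controls=["backup display", "display-failure alert"],
)
sar.record_verification("backup display", "test")
sar.record_verification("display-failure alert", "analysis")
sar.close(review="ISSWG meeting 12")
print(sar.status)
```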
Hazard Control Analysis considers the possibility of insufficient control of the system. Controls are to be
evaluated for effectiveness, and they should enhance the design. Keep in mind that system safety efforts
must not themselves cause harm to the system: any change to a system must be evaluated from a system
risk viewpoint. For more information regarding verification and validation, see the FAA System
Engineering Manual.
Specific training is to be conducted for system users, controllers, systems engineers, and technicians.
Training considers normal operations with standard operating procedures, maintenance with appropriate
precautions, test and simulation training, and contingency response. Specific hazard control procedures
will be recommended as a result of analysis efforts. See Chapter 14 for more information on System
Safety training.
Concepts of system safety integration are also applied systematically through formal accident
investigation techniques. Many systematic techniques have been applied successfully, for example:6
Scenario Analysis (SA), Sequentially Timed Events Plot (STEP), Root Cause Analysis (RCA), Energy
Trace Barrier Analysis (ETBA), Management Oversight and Risk Tree (MORT), and Project Evaluation
Tree (PET).7 For further details, consult the references provided. Hazard analysis can be viewed as the
inverse of accident investigation: similar inductive and deductive techniques are applied in both.
6. Ibid., System Safety Society.
7. Ibid., Stephenson.
5.4.13 Integrated Inputs to the ISSPP
The external inputs to the system safety process are the design concepts of the system, formal documents,
engineering notebooks, and design discussions during formal meetings and informal communications.
The ongoing outputs of the system safety process are hazard analysis, risk assessment, risk mitigation, risk
management, and optimized safety.
Inputs:
• Concept of Operations
• Requirements Document
• System/Subsystem Specification
• Management and System Engineering Plans (e.g., Master Test Plan)
• Design details
Design engineers are key players in the system safety effort. Together with systems engineers, they
translate user requirements into system design and are required to optimize many conflicting constraints.
In doing this, they eliminate or mitigate known hazards but may create unidentified new hazards. System
safety provides design engineers with safety requirements, validation and verification requirements, and
advice and knowledge based on the SSP's interfacing with the many participants in the design and
acquisition processes.
On a typical program, safety engineers interface with a number of other disciplines as reflected in Table
5-3. In most cases, the frequency of interfacing with these other disciplines is less than that with the
design engineers. Nevertheless, the exchange of data between safety engineering and the program
functions is both important and in some cases mutually beneficial.
Reliability engineers, for example, perform analyses that safety engineering can often use at no additional
cost. These analyses do not supplant safety-directed analyses; rather, they provide data that improve
the quality and efficiency of the safety analysis process. Three types of reliability analyses are reliability
models, failure rate predictions, and Failure Mode, Effects, and Criticality Analysis (FMECA).
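Two of these reliability products reduce to arithmetic that safety engineers often reuse. The sketch below assumes constant failure rates (an exponential model) and a series system, in which any part failure fails the item, so part failure rates add and R(t) = exp(-λt). It also computes FMECA failure-mode criticality in the MIL-STD-1629A form Cm = β·α·λp·t, where β is the conditional probability that the mode causes the loss being assessed, α is the failure mode ratio, λp is the part failure rate, and t is the operating time; item criticality is the sum of the mode criticalities. Function names and all numeric values are hypothetical.

```python
import math


def series_failure_rate(part_rates):
    """Series reliability model: the item fails if any part fails, so rates add."""
    return sum(part_rates)


def reliability(failure_rate, hours):
    """Probability of surviving `hours` with a constant failure rate."""
    return math.exp(-failure_rate * hours)


def mode_criticality(beta, alpha, part_rate, hours):
    """FMECA failure-mode criticality number, Cm = beta * alpha * lambda_p * t."""
    return beta * alpha * part_rate * hours


def item_criticality(modes, part_rate, hours):
    """Sum of mode criticalities; modes is a list of (beta, alpha) pairs."""
    return sum(mode_criticality(b, a, part_rate, hours) for b, a in modes)


# Hypothetical part failure rates (failures per hour) for a three-part item.
rates = [2e-6, 3e-6, 5e-6]
lam = series_failure_rate(rates)
print(f"lambda = {lam:.1e}/h, R(1000 h) = {reliability(lam, 1000):.5f}")

# Hypothetical failure modes of one part as (beta, alpha); the alphas sum to 1.
modes = [(0.5, 0.3), (0.1, 0.7)]
print(f"Cr = {item_criticality(modes, part_rate=5e-6, hours=1000):.2e}")
```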
The safety/maintainability engineering interface is an example of providing mutual benefits. The system
safety program analyzes critical maintenance tasks and procedures. Hazards are identified, evaluated, and
appropriate controls employed to minimize risk. Maintainability analyses, on the other hand, provide
inputs to the hazard analyses, particularly the Operating and Support Hazard Analysis (O&SHA).
Table 5-3: Other Engineering Organizations Involved in Safety Programs

Design Engineering
  Normal functions: Designs equipment and systems to meet contractual specifications for the mission.
  Safety functions: Analyzes the safest designs and procedures. Ensures that safety requirements in end-product
  item specifications and codes are met. Incorporates safety requirements for subcontractors and vendors in
  specifications and drawings.

Human (Factors) Engineering
  Normal functions: Ensures optimal integration of human, machine, and environment.
  Safety functions: Analyzes the human-machine interface for operation, maintenance, repair, testing, and other
  proposed tasks to minimize human error, provide safe operating conditions, and prevent fatigue. Performs
  procedural analysis.

Reliability Engineering
  Normal functions: Ensures equipment will operate successfully for specific periods under stipulated conditions.
  Safety functions: Performs failure mode, effects, and criticality analysis (FMECA) and failure rate predictions
  quantifying the probability of failure. Performs tests, as necessary, to supplement analytical data. Reviews
  trouble and failure reports for safety implications.

Maintainability Engineering
  Normal functions: Ensures hardware status and availability.
  Safety functions: Ensures that operating status can be determined, minimizes wearout failures through
  preventive maintenance, and provides safe maintenance access and procedures. Participates in analyzing
  proposed maintenance procedures and equipment for safety aspects.

Test Engineering
  Normal functions: Conducts laboratory and field tests of parts, subassemblies, equipment, and systems to
  determine whether their performance meets contractual requirements.
  Safety functions: Evaluates hardware and procedures to determine whether they are safe in operation and
  whether additional safeguards are necessary. Determines whether equipment has any dangerous characteristics,
  dangerous energy levels, or dangerous failure modes. Evaluates the effects of adverse environments on safety.

Product (Field) Support
  Normal functions: Maintains liaison between the customer and the producing company.
  Safety functions: Assists the customer with safety problems encountered in the field. Constitutes the major
  channel for feedback of field information on performance, hazards, accidents, and near misses.

Production Engineering
  Normal functions: Determines the most economical and best means of producing the product in accordance with
  approved designs.
  Safety functions: Ensures that designed safety is not degraded by poor workmanship or unauthorized production
  process changes.

Industrial Safety
  Normal functions: Ensures that company personnel are not injured and company property is not damaged by
  accidents.
  Safety functions: Provides advice/information on accident prevention for industrial processes and procedures.

Training
  Normal functions: Improves the technical and managerial capabilities of company and user personnel.
  Safety functions: Ensures that personnel involved in system development, production, and operation are trained
  to the levels necessary for the safe accomplishment of their tasks.
Close cooperation between system safety and quality assurance (QA) benefits both functions in several
ways. QA should incorporate, in its policies and procedures, methods to identify and control critical items
throughout the life cycle of a system. The safety function flags safety-critical items and procedures. QA
can then track safety-critical items through manufacturing, acceptance tests, transportation, and
maintenance, and call new or inadequately controlled hazards to the attention of the safety
engineer.
Human engineering (HE) and safety engineering are often concerned with similar issues and use related
methodologies (see Chapter 17, Human Factors Safety Principles). HE analyzes the identified physiological
and psychological capabilities and limitations of all human interfaces. A variety of human factors inputs
affect how safety-critical items and tasks influence the production, employment, and maintenance of a
system. Environmental factors that affect the human-machine interface are also investigated, and safety
issues are identified.
The safety/testing interface is often underestimated. Testing can be physically dangerous. The safety and
test engineers must work together to minimize safety risk. Testing is a vital part of the verification
process and must be included in a comprehensive SSP. It verifies the accomplishment of safety
requirements. Testing may involve:
• Components
• Mock-ups
• Simulations in a laboratory environment
• Development and operational test and evaluation efforts.
System safety may require special tests of safety requirements, or may analyze the results of other tests
for safety verification.
The requirements for the interface between safety and product support are similar to those for the
interface between safety and manufacturing. Each examines the personnel and manpower factors of the
design. System safety ensures that these areas address concerns related to identified hazards and the
associated procedures. The operational, maintenance, and training implications of hazards are passed on
to the user as a result of the design and procedural process.
5.7 Tailoring
An effective SSP is tailored to the particular product acquisition. The FAA's policy is to tailor each SSP
to be compatible with the SSMP, the criticality of the system, the size of the acquisition, and the program
phase of that system's life cycle. The resultant safety program becomes a contractual requirement placed
upon system contractors and subcontractors.
Readily adaptable to the FAA's mission, MIL-STD-882D was created to provide a standardized means for
establishing or continuing SSPs of varying sizes at each phase of system development. The SSMP, together
with MIL-STD-882, contains a list of tasks from which the FAA program manager may tailor an effective
SSP to meet a specific set of requirements. The purpose of each task is stated at the beginning of its
description, and fully understanding these purposes is critical before attempting to tailor an SSP. There
are three general categories of programs: Low Risk, Moderate Risk, and High Risk.
Selecting the appropriate category is difficult and in practice depends on factors that are difficult to
quantify, particularly in the early phases of a program. Therefore, this decision should be reviewed at
each phase of the program, so that the best information available directs the magnitude of the safety
program. The following steps, applied to the risk methodology in Chapter 3, illustrate the technique used
for the program risk decision process.
• Generate a CRA (and PHA if needed) in the IA phase. These analyses will identify the types
of hazards and their risks. The development of an airframe and that of a ground communications
system could both produce a system with hazards that can lead to death, that is, Severity 1 or 2
hazards. If one development program is far more complex and includes more Severity 1 or 2
hazards, with a higher probability of occurrence, than the other, it is clearly the high-risk program
and the other the low-risk one. The PHL includes information from sources such as safety, analytical,
and historical experience from similar systems and missions. The PHL process should be updated
and continued in the investment analysis phase.
• Begin the Preliminary Hazard Analysis (PHA) as soon as possible. The PHA focuses on the
details of the system design. In addition to the historical experience used for the PHL,
information about technologies, materials, and architectural features such as redundancy is
available as a source for the PHA. Systems using new and immature technologies or designs
are riskier than those that use proven technologies or modifications of existing designs.
• Use a detailed hazard analysis to provide new and more precise information about safety risk
for the program production and deployment phases. This step helps minimize the risk of
accidents during the test and evaluation process.
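The comparison in the first step can be made concrete with a simple scoring sketch. Everything here is an assumption for illustration: severity is an integer with 1 as the worst case, likelihood uses letters from A (most frequent) to E (least frequent), the score simply adds a likelihood weight for each Severity 1 or 2 hazard, and the category thresholds are hypothetical. A real program would apply the risk methodology of Chapter 3 instead.

```python
# Hypothetical likelihood weights: A = most frequent ... E = least frequent.
LIKELIHOOD_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def program_risk_score(hazards):
    """Sum likelihood weights over Severity 1 or 2 hazards.

    hazards: list of (severity, likelihood) pairs from the CRA/PHA,
    where severity 1 is the worst case.
    """
    return sum(LIKELIHOOD_WEIGHT[lik] for sev, lik in hazards if sev in (1, 2))


def program_category(score, moderate_at=5, high_at=12):
    """Map a score to a program category; thresholds are illustrative only."""
    if score >= high_at:
        return "High Risk"
    if score >= moderate_at:
        return "Moderate Risk"
    return "Low Risk"


# Hypothetical hazard lists echoing the airframe vs. ground communications example.
airframe = [(1, "C"), (1, "B"), (2, "B"), (2, "A"), (3, "A")]
ground_comm = [(2, "E"), (3, "C"), (4, "B")]
for name, hazards in [("airframe", airframe), ("ground comm", ground_comm)]:
    score = program_risk_score(hazards)
    print(name, score, program_category(score))
```

Because early hazard data are sparse, the inputs and the resulting category would be recomputed at each program phase, as the text recommends.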
A major challenge that confronts government and industry organizations responsible for an SSP is the
selection of those tasks that can materially aid in attaining program safety requirements. Scheduling and
funding constraints mandate a cost-effective selection, one that is based on identified program needs. The
considerations presented herein are intended to provide guidance and rationale for this selection. They
are also intended to provoke questions and encourage problem solving by engineers, operations, and
support personnel.
After selection, the tasks must be identified and tailored to match the system and program specifications.
It is important to coordinate task requirements with other engineering support groups (e.g., reliability,
logistics) to eliminate duplication of tasks and to become aware of additional information of value to
system safety. The timing and depth required for each task, as well as the action to be taken based on task
outcome, depend on program requirements. For these reasons, precise rules are not stated.
Some contractual activities provide cost savings, flexibility, and pre-award planning without affecting
compliance or control. These are:
• Coordinate the delivery schedule of safety analysis deliverables with program milestones,
such as a major design review, rather than with a fixed number of days after contract award. This
avoids the need for contractual changes to adjust for schedule changes. The deliverables should be
provided approximately 30 days prior to the milestones, giving the reviewer current information and
time to prepare for the design review. The deliverable can be established as a major program
milestone; however, this carries the risk of halting an entire program for a single deliverable.
• Consider requiring updates to the first deliverable rather than separate, stand-alone
deliverables at major milestones. For example, if the first system hazard analysis is
scheduled for delivery at the Systems Design Review (SDR), the submittal required at the
Preliminary Design Review (PDR) might be limited to substitute and supplementary pages.
This requires planning such as configuration control requirements (e.g., page numbering and
dating schemes).
• If major design decisions that significantly affect the cost of safety analyses are expected
during the contract, fix the size of the effort in a manner that maintains FAA control. An
example is a flight control methodology decision, such as would be applied to fly-by-wire,
glass cockpit, or mechanical systems. The number of fault trees required in a safety
analysis depends on the system selected. A good contractual approach would be to fix, during
negotiations, the number of fault trees to be provided. The contract would state that both
the FAA and the contractor must agree on which fault trees are to be performed. Thus the
task can be tailored to the design well after contract award without affecting contract
performance or cost.
• Maintain a reasonable balance between the analyses and deliverables specified. When the
program manager determines that limiting the deliverables is economically necessary, the
contractor must maintain a detailed, controlled, and legible project log that is available for MA
review and audit. A compromise approach would be to permit deliverables in contractor
format, eliminating formatting costs. Requiring FAA approval of alternating deliverables may
also be considered. In this situation, program control is maintained at the program's major
milestones. The MA has the option of reviewing the status of all safety tasks and analyses at
these points in the program, and it has approval authority at each formal design review.
This control is more significant than that of a single deliverable.
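The first recommendation, tying deliverables to milestones rather than to fixed dates, can be illustrated with a small Python sketch: due dates are derived from milestone dates with a 30-day lead, so a milestone slip moves the due date without a contract change. The function `due_dates`, the milestone dates, and the deliverable names are hypothetical.

```python
from datetime import date, timedelta


def due_dates(milestones, deliverables, lead_days=30):
    """Compute deliverable due dates relative to program milestones.

    milestones: dict milestone name -> scheduled date
    deliverables: dict deliverable name -> milestone it supports
    lead_days: days before the milestone that the deliverable is due
    """
    lead = timedelta(days=lead_days)
    return {name: milestones[ms] - lead for name, ms in deliverables.items()}


milestones = {"SDR": date(2001, 3, 1), "PDR": date(2001, 9, 14)}  # hypothetical
deliverables = {
    "System hazard analysis": "SDR",
    "SHA update pages": "PDR",
}
print(due_dates(milestones, deliverables))

# If PDR slips two weeks, only the milestone date changes; the contract
# language ("30 days before PDR") stays the same.
milestones["PDR"] += timedelta(weeks=2)
print(due_dates(milestones, deliverables))
```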
The tasks below are recommended as a minimum effort for a small SSP.
8. FAA System Engineering Manual.
Hazard review checklists are available to support hazard and risk identification. These checklists can be
found in the system safety literature and in safety standards and requirements (see bibliography).
The PHA is developed as an output of the preliminary hazard list (PHL). It expands this list to
include hazards and risks, along with their potential effects and controls.
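The expansion from PHL to PHA can be pictured as turning each listed hazard into a worksheet row that gains effects, a risk assessment, and controls. The sketch below is a hypothetical data layout, not a prescribed FAA format; the field names, the example hazards, and the completeness rule (severity, likelihood, and at least one control recorded) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PHAEntry:
    """One PHA worksheet row (hypothetical layout)."""
    hazard: str
    effects: List[str] = field(default_factory=list)
    severity: Optional[int] = None      # 1 = worst case
    likelihood: Optional[str] = None    # e.g. "A" (most frequent) .. "E"
    controls: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        return (self.severity is not None and self.likelihood is not None
                and bool(self.controls))


def expand_phl(phl, assessments):
    """Build PHA rows from PHL hazards and report which remain unassessed.

    phl: list of hazard descriptions
    assessments: dict hazard -> dict of PHAEntry field values
    """
    rows = [PHAEntry(hazard=h, **assessments.get(h, {})) for h in phl]
    open_items = [r.hazard for r in rows if not r.is_complete()]
    return rows, open_items


phl = ["Electrical shock during maintenance", "Loss of surveillance data"]
assessments = {
    "Electrical shock during maintenance": {
        "effects": ["Technician injury"],
        "severity": 1,
        "likelihood": "D",
        "controls": ["Electrical interlock", "Lockout/tagout procedure"],
    },
}
rows, open_items = expand_phl(phl, assessments)
print(open_items)
```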
An in-depth hazard analysis generally follows the PHA, with a subsystem hazard analysis (SSHA), a
system hazard analysis (SHA), and an operating and support hazard analysis (O&SHA), as appropriate.
For most small programs, a PHA alone will suffice. In that case, the PHA should include all
identified risks, hazards, and controls associated with the life cycle of the system.
A comprehensive evaluation of the risks being assumed is needed prior to test or evaluation of the system
or at contract completion. The evaluation identifies the following:
• All safety features of the hardware, software, human and system design
• Procedural risks that may be present
• Specific procedural controls and precautions that should be followed
The risks encountered in a small program can be as severe and as likely to occur as those in a major
program. Care must be taken to ensure that, in tailoring the system safety effort to fit a small
program, one does not over-reduce the scope, but instead uses the tailoring process to optimize the SSP
for the specific system being acquired or evaluated.
• If hazard data are available, identify the system safety analyses needed and the dates they are
required.
• Identify and perform any additional system safety analyses needed for interfaces between
GFE and the other systems.
• Ideally, the GFE has sufficient history available to the FAA that unsatisfactory operating
characteristics are well known or have been identified in previous hazard analyses. The MA
should identify these unsatisfactory characteristics or provide the analyses, if available, to the
contractor. The contractor will then compensate for these characteristics in the interface
design. In some cases, such characteristics may not be known, or analyses and/or history may not
be available. In that case, either the contractor or the MA must perform the analyses necessary for
interface design.
More complex and critical items require an MA decision process to ensure that the risk of an accident is
acceptable. Commercial development of subsystems such as radios, or of systems such as aircraft, is
likely to include some form of failure-related analysis, such as a FMECA or fault tree analysis.
A review of this contractor-formatted analysis may provide the necessary assurance. A poorly documented
or undocumented analysis has the opposite effect.
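When reviewing a contractor's fault tree, a quick independent check of the top-event probability can be useful. The sketch assumes independent basic events, in which case an AND gate multiplies its input probabilities and an OR gate gives 1 minus the product of (1 - p) over its inputs. The tree, event names, and probabilities are hypothetical, and trees in which the same basic event appears more than once need minimal cut set analysis rather than this simple recursion.

```python
from math import prod  # Python 3.8+


def evaluate(node, basic_events):
    """Top-event probability of a fault tree with independent basic events.

    node: either a basic-event name (str) or a (gate, [children]) tuple,
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return basic_events[node]
    gate, children = node
    probs = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        return prod(probs)
    if gate == "OR":
        return 1 - prod(1 - p for p in probs)
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical tree: loss of radio communication occurs if the transmitter
# fails OR both the primary and backup power supplies fail.
tree = ("OR", ["transmitter", ("AND", ["primary_power", "backup_power"])])
basic = {"transmitter": 1e-4, "primary_power": 1e-3, "backup_power": 2e-3}
print(f"P(top) = {evaluate(tree, basic):.4e}")
```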
The COTS/NDI concept provides significant up-front cost and schedule benefits but raises safety and
supportability issues. For the NAS to benefit fully from COTS/NDI acquisitions, the SSP must be able to
ensure the operational safety of the final system without significantly adding to its
acquisition cost. The retrofitting of extensive safety analyses or system modifications may negate any
advantage of choosing COTS/NDI.
For COTS/NDI acquisitions, a safety assessment for the intended use should be performed and
documented before purchase. Such analyses should contribute to source and/or product selection. This
should be contained in the buyer’s SSPP. COTS/NDI will be evaluated for operational use by
considering all aspects of the item's suitability for the intended purpose. Suitability criteria should
include technical performance, safety, reliability, maintainability, inter-operability, logistics support,
expected operational and maintenance environment, survivability, and intended life cycle. To assure risk
acceptability, appropriate hazard analysis must be conducted to evaluate the risks associated with initial
field testing of COTS/NDI.
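The pre-purchase evaluation can be tracked as a simple gate over the suitability criteria listed above. In this sketch the criteria strings come from the text, while the function names, the True/False findings, and the rule that every criterion must be evaluated and found acceptable, together with a documented safety assessment, are assumptions for illustration.

```python
# Suitability criteria as listed in this section.
SUITABILITY_CRITERIA = (
    "technical performance",
    "safety",
    "reliability",
    "maintainability",
    "inter-operability",
    "logistics support",
    "expected operational and maintenance environment",
    "survivability",
    "intended life cycle",
)


def suitability_gaps(findings):
    """Return (not_evaluated, unacceptable) criteria.

    findings: dict criterion -> True (acceptable) or False (not acceptable);
    a criterion missing from the dict has not been evaluated yet.
    """
    not_evaluated = [c for c in SUITABILITY_CRITERIA if c not in findings]
    unacceptable = [c for c in SUITABILITY_CRITERIA if findings.get(c) is False]
    return not_evaluated, unacceptable


def ready_to_purchase(findings, safety_assessment_documented):
    """Hypothetical gate: documented safety assessment and no open gaps."""
    not_evaluated, unacceptable = suitability_gaps(findings)
    return safety_assessment_documented and not not_evaluated and not unacceptable


findings = {c: True for c in SUITABILITY_CRITERIA}
findings["survivability"] = False  # hypothetical finding
print(suitability_gaps(findings))
print(ready_to_purchase(findings, safety_assessment_documented=True))
```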
Many developers of COTS/NDI may not have SSPs or staff to assess the suitability of COTS/NDI
proposed for NAS applications. Therefore, the MA must do the following:
• Establish minimum analysis requirements for each procurement. These vary with the
nature of the item being procured and the criticality of its mission. Examples include mission
and usage analysis and specific hazard analyses to determine the item's potential impact on
the remainder of the system or on the NAS itself.
• Include in each procurement document the system safety analyses required for accurate and
standardized bidding.
• Restrict the application of the procured COTS/NDI to the missions analyzed, or reinitiate the
analysis process for new missions.
• Apply skillful, creative tailoring when limiting the SSP scope to accommodate program size
and procurement schedules.
• Marketing investigation, hazard analysis, and System Safety Working Groups are additional
considerations and are explained below.
evaluated by government and non-government agencies such as the FAA, the Department of Defense (DOD),
and Underwriters Laboratories. It must then determine what this information provides when compared to
mission requirements. The following basic questions form the basis of a COTS/NDI procurement checklist:
• Has the system been designed and built to meet any applicable safety standards? Which
ones?
• Have any hazard analyses been performed? Request copies of the analyses and the reviewing
agency comments.
• What is the accident and incident history for the system? Request specifics.
• Are protective equipment and/or procedures needed during operation, maintenance, storage,
or transport? Request specifics.
• Does the system contain or use any hazardous materials, have potentially hazardous
emissions, or generate hazardous waste?
• Are special licenses or certificates required to own, store, or use the system?
Hazard Analysis
A safety engineering report may be all that is necessary or available to gather detailed hazard information
concerning a COTS/NDI program. If the selected program must be modified to meet mission
requirements, other hazard analyses may be required, especially if the modifications are not otherwise
covered.
FAA System Safety Handbook, Chapter 6: System Safety Guidelines for Contracting
December 30, 2000
Chapter 6:
• Acquisition planning,
• Documentation of detailed requirements,
• Communication of requirements to industry,
• Evaluation of the resulting proposals or bids,
• Negotiation and/or selection of the source to perform the contract, and
• Management of the awarded contract to assure delivery of the supplies or services
required.
The execution of these steps should be tailored for each acquisition. Figure 6-1 illustrates a sample
acquisition from planning through contract negotiation. The following paragraphs describe the activities
within the contracting process.
FAA System Safety Handbook, Chapter 6: System Safety Guidelines for Contracting
August 2, 2000
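Figure 6-1: Sample acquisition from planning through contract negotiation (diagram not reproduced; its labels
include Acceptable Hazard Risk, PHL, Sys. Safety Design Requirements, Equipment Specification, Safety
Requirements, Statement of Work, CDRL, SSPP Requirements, SSP Bidders Instructions, Screening Information,
Request & RFP, Negotiation, and Contractor Selection)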
For the former, qualified technical personnel must select and/or tailor an existing specification for the
items required, or create a new one if no appropriate specification exists. The specification must reflect
two types of safety data:
• Performance parameters (e.g., acceptable risk levels, specific safety criteria such as
electrical interlocks)
• Test and evaluation requirements (e.g., specific safety tests to be performed and/or specific
program tests to be monitored for safety)
Traditionally, administrative requirements have been specified in the request for proposal. MIL-STD-882D
takes the position that, given the technical requirements, the administrative requirements can be
left to the bidding contractor to define as part of the bidding process. The proposal evaluation team will
judge the adequacy of the proposed safety program. An inadequate proposed safety program can either be
judged non-responsive or amended during negotiation.
The following administrative requirements must be defined and included in the negotiated contract and/or
Statement of Work (SOW):
• Delivery Schedule (e.g., schedule of safety reviews, analyses, and deliverables. It is suggested
that delivery be tied to specific program milestones rather than calendar dates, e.g., 45 days
before the Critical Design Review.)
• Data Requirements (e.g., number of safety analysis reports to be prepared, required format,
content, approval requirements, distribution)
Another valuable element of acquisition planning is estimating contractor costs of safety program elements
to assist in:
Each solicitation contains at least three sections that impact the final negotiated SSP:
• Equipment Specification
• Statement of Work (SOW)
• Instructions for preparation of proposals/bids, and evaluation criteria (Sections L and
M, respectively)
use of FAA and Military Standards can simplify the specification of design criteria. For example, FAA-G-
2100F provides physical safety design criteria. MIL-STD-1522 contains specific instruction for pressure
vessels, placement of relief valves, gauges, and high-pressure flex hose containment. MIL-STD-454,
Requirement 1, specifies design controls for electrical hazards, and MIL-STD-1472 does so for ergonomic
issues. Whether these specifications are contractor-prepared or supplied by the managing activity, it is
important that proper instructions are given directly to the designer who controls the final safety
configuration of the system.
MIL-STD-490 gives a standard format for preparing the common types of specifications. Appendix I of
MIL-STD-490 identifies the title and contents of each paragraph of the system specification. Other
appendices describe other types of specifications, such as prime item development and product
specifications. Several paragraphs in each specification are safety related. These include:
The SOW task descriptions can consist of a detailed statement of the task or contain only references to
paragraphs in other documents such as MIL-STD-882 or this handbook. Elaborate task descriptions are
not required. However, a simple statement in the body of the SOW, such as "The contractor shall conduct
a System Safety Program to identify and control accident risk," does not define the safety requirements
adequately. A contractor might argue that it is required only to caution its design team to look out for and
minimize hazards.
• The requirement for planning and implementing an SSP tailored to the requirements of
MIL-STD-882.
• Defining relationships among the prime contractor and associate contractors,
integrating contractors, and subcontractors i.e. "Who's the Boss?".
• The requirement for contractor support of safety meetings such as System Safety
Working Groups (SSWG). If extensive travel is anticipated, the FAA should either
estimate the number of trips and locations or structure the contract to have this
element on a cost-reimbursable basis.
• Definition of number and schedule of safety reviews, with a statement of what should
be covered at the reviews. Safety reviews are best scheduled for major design reviews,
such as the system design review, preliminary design review, and critical design
review.
• Requirement for contractor participation in special certification activities, such as for
aircraft. The FAA may anticipate that support from a communications supplier may
be necessary for the aircraft certification process.
• Procedures for reporting hazards. The CDRL will specify the format and delivery
schedule of hazard reports. Permitting contractor format can save documentation
costs but, where there are multiple contractors, may make integration difficult.
Preparing data submissions can be expensive and can represent a major portion of the contractor's safety
resources. The system safety data requirements listed on the CDRL, therefore, should represent only the
absolute minimum required to manage or support the safety review and approval process. Two choices must
be made and reflected in the CDRL: 1) Should the contractor prepare the data in a format specified by a
data item description (DID) or in contractor format? 2) Which submittals require approval for acceptance
and payment?
The contractor is not paid for data not covered by the CDRL/DID and is not obligated to deliver
anything not required by a CDRL. It is advantageous to use the available DIDs effectively. When
specifying DIDs, examine them carefully, sentence by sentence, to assure applicability. It is
suggested that the data review and approval cycle be 30-45 days. Longer review cycles force the
contractor, in many cases, to revise an analysis of an obsolete configuration.
NOTE: This approach takes advantage of standardized DIDs and is not meant to imply that page
limitations on system safety plans are inappropriate. A well-prepared plan can cover the subject in fewer
than 50 pages.
To encourage attention to system safety in the technical proposal, the instructions to bidders should include
wording such as: "The offeror shall submit a summary of system safety considerations involved in initial
trade studies." In later development phases, it may be advantageous to require the offeror to "submit a
preliminary assessment of accident risk." The validation phase may require the bidder to describe system
safety design approaches that are planned for particularly high-risk areas (e.g., separated routing of
hydraulic lines or separate room installation of redundant standby generators). During this program phase,
the following statement could be included:
The offeror shall submit a description of the planned system safety design
and operational approach for identification and control of safety-critical,
high-risk system design characteristics.
As previously noted, the RFP can request submission of draft data items, such as the SSPP or Preliminary
Hazard List (PHL), before contract award. Alternatively, the bidders can be instructed to discuss their
proposed SSP in detail, including typical hazards and design solutions for them or candidate hazards for
analysis. Careful wording can provide almost the same results as a draft data item. Key areas of interest,
such as personnel qualifications or analysis capabilities, can be cited from data items as guides for the
bidders' discussions. For example, "discuss your proposed SSP in detail using data item DI-SAFT-80100,
paragraphs 10.2 and 10.3, as a guide." Using DI-SAFT-80100 as a guide, sample criteria could include
the following:
• Describe in detail the system safety organization, showing organizational and functional
relationships and lines of communication
• Describe in detail the analysis technique and format to be used to identify and resolve hazards
• Justify in detail any deviations from the RFP.
Proposals are evaluated against the award criteria included in the RFP. If safety is not listed in the award
criteria, the bidders' responses to safety requirements have little impact on the award decision.
Negotiations take place with each contractor still in contention after initial review. The IPT members
review in detail all segments of each contractor's proposal and score the acceptability of each element in the
evaluation criteria. Extensive cost and price analysis of the contractors' proposals must be performed to
determine that the final price is "fair and reasonable" to both the government and the contractor.
The relative proposed cost of the SSP reflects the seriousness that each contractor places on System
Safety. It is not, in itself, the ultimate indicator, as some contractors may "work smarter" than others.
• Proposal Evaluation
• Contractor Evaluation
• Negotiation
The data that follows is divided into eight groups and provided in a checklist format. The contents are
comprehensive and should be tailored for each application. A contractor's response to an RFP that
addresses all issues listed below is likely to be large for most proposals. Additionally, adherence to the
complete list is not appropriate for many acquisitions. Formal questions to the bidders or discussions
during negotiations can resolve reasonable omissions.
See Chapter 5 for a more detailed discussion of SSPP contents and the SSPP template. The ISSPP should
be considered a special case of the SSPP that involves multiple major subcontractors that must be
integrated by the Prime Contractor/Integration Contractor.
Contractor's SSP
Requirements and guidance for a contractor's SSP are specified in the Statement of Work (SOW) and the
Data Item Description (DID). Good SSPs have the following characteristics, which should be reflected in
either the SSPP or internal documented practices:
• Review of, and inputs to, all plans and contractual documents related to safety.
• Maintenance of safety-related data, generated on the program by the safety staff.
• Maintenance of a log, available for FAA review, of all program documentation reviewed
that records all concurrences, non-concurrences, reasons for non-concurrence, and actions
taken to resolve any non-concurrence.
• Coordination of safety-related matters with contractor program management and all
program elements and disciplines.
• Coordination of system safety, industrial safety, and product safety activities on the
program to ensure protection of the system during manufacture and assembly.
• Establishment of internal reporting systems and procedures for investigation and
disposition of accidents and safety incidents, including potentially hazardous conditions not
yet involved in an accident/incident; such matters are reported to the purchasing office as
required by the contract.
• Performance of specified Hazard Analyses.
• Participation in all requirements reviews, preliminary design reviews, critical design
reviews, and scheduled safety reviews to assure that:
- All contractually imposed system safety requirements are met.
- Safety program schedule and CDRL data deliverable content are
compatible.
- Hazard analysis method formats, from all safety program participants,
permit integration in a cost effective manner.
- Technical data are provided to support the preparation of required
analyses.
• Participation in all test, flight, or operational readiness reviews and arrangement for
presentation of required safety data.
• Provision for technical support to program engineering activities on a daily basis. Such
technical support includes consultation on safety-related problems, research on new
product development, and research and/or interpretation of safety requirements,
specifications, and standards.
• Planned participation in configuration control board activities, as necessary, to enable
review and concurrence with safety-significant system configuration and changes.
• Review of all trade studies. Identification of those that involve or affect safety.
Participation in all safety related trade studies to assure that system safety trade criteria
are developed and the final decision is made with proper consideration of accident risk.
• Provisions for system safety engineering personnel participation in all trade studies
identified as being safety-related. Ensure that safety impact items and accident risk
assessments are given appropriate weight as decision drivers.
• Provision of trade study documentation showing that the accident risk for the recommended
solution is equal to or less than that of the other alternatives being traded, or sufficient
justification for recommending another alternative.
• Identification of any deficiencies regarding safety analysis or risk assessment, when they
are not provided with government-furnished equipment and property.
• Identification of deficiencies where adequate data to complete contracted safety tasks is not
provided.
• Acknowledgement of specified deliverable safety data format, as cited on the CDRL.
Where no format is indicated, the contractor may use any format that presents the
information in a comprehensible manner.
• Provision for safety certification of safety-critical program documentation and all safety
data items contained in the CDRL.
• Recognition that the SSP encompasses operational site activities. These activities include
all operations listed in operational time lines, including system installation, checkout,
modification, and operation.
• Acknowledgment that SSP consideration must be given to operations and interfaces with
ground support equipment, and to the needs of the operators relating to personnel
subsystems, such as panel layouts, individual operator tasks, fatigue prevention,
biomedical considerations, etc.
• Incorporation of facility safety design criteria in the facility specifications.
• Evaluation of the safety impact of system design changes. Revision or update of subsystem
hazard analyses and operating and support hazard analyses to reflect system design
changes during the life of the program.
• Attention given to planning, design, and refurbishment of reusable support equipment,
including equipment carried on flight vehicles, to assure that safety is not degraded by
continued usage.
• Planned review of engineering change proposals (ECPs) to evaluate and assess their impact
on the safety design baseline. This safety assessment must be a part of the ECP and include
the results of all hazard analyses done for the ECP.
• Planned system safety training for specific types and levels of personnel (i.e., managers,
engineers, and technicians involved in the design, product assurance operations,
production, and field support). Safety inputs to training programs are tailored to the
personnel categories involved and included in lesson plans and examinations.
• Contractor safety training may also include government personnel who will be involved in
contractor activities.
• Safety training includes such subjects as hazard types, recognition, causes, effects, and
preventive and control measures; procedures, checklists, and human error; safeguards,
safety devices, and protective equipment, monitoring and warning devices, and contingency
procedures.
• Provision for engineering and technical support for accident investigations when deemed
necessary by the management activity. This support includes providing contractor
technical personnel to the accident investigation board.
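The documentation review log described in the list above (concurrences, non-concurrences, reasons, and resolving actions, available for FAA review) is simple record keeping. As a hedged illustration only, the Python sketch below models such a log; the class names, field names, the `open_nonconcurrences` helper, and the document identifiers are all hypothetical, not a format prescribed by this handbook or by any DID.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewEntry:
    """One program document reviewed by the contractor safety staff (hypothetical fields)."""
    document: str
    concur: bool
    reason: Optional[str] = None                      # reason for non-concurrence, if any
    actions: List[str] = field(default_factory=list)  # actions taken to resolve it
    resolved: bool = False


@dataclass
class ReviewLog:
    entries: List[ReviewEntry] = field(default_factory=list)

    def record(self, entry: ReviewEntry) -> None:
        # Reject a non-concurrence that gives no reason, since the log must record one.
        if not entry.concur and not entry.reason:
            raise ValueError(f"non-concurrence on {entry.document} needs a reason")
        self.entries.append(entry)

    def open_nonconcurrences(self) -> List[str]:
        """List documents whose non-concurrence has not yet been resolved."""
        return [e.document for e in self.entries if not e.concur and not e.resolved]


log = ReviewLog()
log.record(ReviewEntry("Test Plan Rev A", concur=True))  # hypothetical document
log.record(ReviewEntry("ECP-012", concur=False,          # hypothetical ECP number
                       reason="No hazard analysis attached"))
```

A query such as `open_nonconcurrences()` gives the reviewer the unresolved items without reading the whole log.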
• Contractor Integration
• System Safety Program Reviews/Audits
• System Safety Working Group/System Safety Working Group Support
• Hazard Tracking/Risk Resolution
• System Safety Progress Report
Contractor Integration
Major program projects often require multiple associate contractors, subcontractors, integration
contractors, and architect and engineering (AE) firms. On these programs, the integrating contractor often
has the responsibility to oversee system safety efforts of associate contractors or AE firms.
A program with many associate contractors or subcontractors requires an SSPP that places major
emphasis on the integration process, flowdown of system safety requirements and responsibilities, and
monitoring of subcontractor performance. This SSPP is called an Integrated System Safety Program Plan
(ISSPP), and it generally follows the requirements of MIL-STD-882. Figure 6-4 illustrates the additional
ISSPP tasks.
The systems integrator or construction contractor has the visibility and, therefore, must have the
responsibility of performing the system hazard analyses and assessments that cover the interfaces between
the various contractors' portions of the system or construction effort. When an integration contractor does
not exist, and the managing authority procures the subsystems directly, this responsibility is given to the
managing authority. In situations where an integration contractor exists, the managing authority must
clearly and contractually define the role and responsibilities of the integration contractor for the associate
contractors. Management is responsible for assisting the integrator in these efforts to ensure that all
contractors and firms mutually understand the system safety requirements and their respective
responsibilities in order to comply with them.
Figure 6-4 (ISSPP tasks, flowchart): Many associate contractors? No: SSPP (see Chapter 5). Yes: Establish ISSPP; Structure Contract; Analysis Requirement for System Interfaces; Provide Risk Analysis Guidance to all Contractors. To be included in ISSPP.
The following is a list of tasks from which the managing authority may choose the systems integration
contractor's responsibilities. Those selected should be included in the RFP and SOW.
1. Prepare ISSPP following the requirements. The ISSPP will define the role of the systems integration
contractor and the effort required from each associate contractor to help integrate system safety
requirements for the total system. In addition, the plan may address and identify:
(a) Definitions of where the control, authority, and responsibility transitions from the integrating
contractor to the subcontractors and associate contractors
(b) Analyses, risk assessment, and verification data to be developed by each associate contractor
with format and method utilized
(c) Data each associate contractor is required to submit to the integrator and scheduled delivery
keyed to program milestones
(d) Schedule and other information considered pertinent by the integrator
(e) The method of development of system-level requirements to be allocated to each associate
contractor as a part of the system specification, end-item specifications, and other interface
documents
(f) Safety-related data pertaining to off-the-shelf items
(g) Integrated safety analyses to be conducted and support required from associate contractors and
subcontractors
(h) Integrating contractor's roles in the test range or other certification processes
(i) SSP milestones
2. Initiate action through the managing authority to ensure each associate contractor is required to be
responsive to the ISSPP. Recommend contractual modifications to the managing authority where the need exists.
3. Examine the integrated system design, operations, and specifically the interfaces between the products
of each associate contractor during risk assessment. This requires using interface data that can often only
be provided by an associate contractor.
4. Summarize the mishap risk presented by the operation of the integrated system during safety
assessments.
5. Provide assistance and guidance to associate contractors regarding safety matters.
6. Resolve differences between associate contractors in areas related to safety, especially during
development of safety inputs to systems and item specifications. When the integrator cannot resolve
problems, notify the managing authority for resolution and approval.
7. Initiate action through the managing authority to ensure information required by an associate contractor
from the integrating contractor (or other associate contractors) to accomplish safety tasks is provided in an
agreed-to format. Establish associated logs to prevent such requests from "becoming lost."
8. Develop a method of exchanging safety information between contractors. If necessary, schedule and
conduct technical meetings between all associate contractors to discuss, review, and integrate the safety
effort. Provide for informal one-on-one telephone contact. Consider establishing system safety databases
at the systems integration contractor with telephone access and/or the distribution of monthly safety reports
featuring contributions from each contractor. These may be extracted from monthly progress reports, if the
progress report requirements are specified accordingly.
9. Implement an audit program to ensure that the objectives and requirements of the SSP are being
accomplished. Notify in writing any associate contractor of its failure to meet contract program or
technical system safety requirements for which it is responsible. Whenever such written notification is
given, the integrator for the safety effort will send a copy of the notification letter to the managing authority.
Establish a deficiency log to track the status of any such issues.
• Imposition of MIL-STD-882D
• Imposition of this System Safety Handbook
• Designation of the system safety integrating contractor
• Designation of the status of the other contractors
• Requirements for any special integration safety analyses
• Requirements to support test, environmental, and/or other certification processes.
• Test procedures must include inputs from the safety analyses and identify test and
operations and support requirements.
• Verification of system design, and operational planning compliance with test or
operating site safety requirements, is documented in the final analysis summary.
• Establishment of internal procedures for identification and timely action or
elimination/control of potentially hazardous test conditions induced by design
deficiencies, unsafe acts, or procedural errors. Procedures should be established to
identify, review, and supervise potentially hazardous, high-risk tests, including those
tests performed specifically to obtain safety data.
• Contractor system safety organization review and approval of test plans, test procedures, safety
surveillance procedures, and changes to verify incorporation of safety
requirements identified by the system analysis. The contractor system safety
organization assures that an assessment of accident risk is included in all pretest
readiness reviews.
• Safety requirements for support equipment are identified in the system safety analyses.
• Support equipment safety design criteria are incorporated in the segment
specifications.
• Test, operations, and field support personnel are certified as having completed a
training course in safety principles and methods.
• Safety requirements for ground handling have been developed and included in the
transportation and handling plans and procedures. Safety requirements for operations
and servicing are included in the operational procedures. The procedures are upgraded
and refined, as required, to correct deficiencies that damage equipment or injure
personnel.
Safety Audits
System safety audits should be conducted by the system safety manager and, on a periodic basis, by a
contractor management team independent of the program. The list of issues to be included in the audit
program may be selected from the following list:
♦ They are designing and producing items whose design or quality will not
degrade safety
♦ Safety analyses are conducted as required
♦ System safety problems are being brought to the attention of their own
program managers and prime contractor management.
• For each program, group the items in the checklist into four categories:
• Those explicitly required by the SOW and/or contract
• Those that, in the view of the reviewer, are desirable or necessary to perform in meeting the
explicitly stated requirements
• Those that are not applicable to the program for which the evaluation is being performed
• Those that, in the opinion of the evaluator, were not included in the RFP, SOW, or contract.
• For purposes of evaluation, the latter two categories must be handled delicately. If one or more bidders
omitted an important element that was not explicitly included in the RFP, all bidders must be given an equal
opportunity to bid the missing SSP elements.
• Ultimately, the first two categories are used for evaluation. Clearly, the decision process must use
the explicitly stated or negotiated requirements. The applicable elements in the checklist can be graded
requirement by requirement, either simply as compliant or non-compliant or by assigning "grades" to
the response to each requirement. Grade the responses numerically to reflect the degree of compliance as:
The final step is to add (or average) the scores for each bidder to determine acceptability or to select the best. For
close decisions, the process can be repeated for the implicit requirements described in the second category above.
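The numerical grading scale referred to above is not reproduced in this section. As a hedged illustration of the add-or-average step, the Python sketch below assumes a hypothetical 0 to 3 grade per requirement (0 = non-compliant, 3 = fully compliant), hypothetical checklist items and bidders, and a hypothetical `close_margin` for deciding when scores are close enough to repeat the scoring with the implied (second-category) requirements. Only the first two categories are scored, as the text directs.

```python
from statistics import mean

# Category labels for the four groups described above (labels are hypothetical).
EXPLICIT, IMPLIED, NOT_APPLICABLE, OMITTED = "explicit", "implied", "n/a", "omitted"

# Hypothetical checklist items, each assigned to one category for this program.
checklist = {
    "SSPP submitted": EXPLICIT,
    "Hazard tracking system": EXPLICIT,
    "Subcontractor flowdown procedure": IMPLIED,
    "Test range certification support": NOT_APPLICABLE,
}

# Hypothetical grades on an ASSUMED 0-3 scale (0 = non-compliant, 3 = fully compliant).
grades = {
    "Bidder A": {"SSPP submitted": 3, "Hazard tracking system": 2,
                 "Subcontractor flowdown procedure": 1},
    "Bidder B": {"SSPP submitted": 2, "Hazard tracking system": 3,
                 "Subcontractor flowdown procedure": 3},
}


def score(bidder, category, use_average=False):
    """Add (or average) one bidder's grades over the items in one category.

    An item the bidder did not address counts as 0 (non-compliant).
    """
    items = [item for item, cat in checklist.items() if cat == category]
    values = [grades[bidder].get(item, 0) for item in items]
    return mean(values) if use_average else sum(values)


def rank(close_margin=1):
    """Rank bidders on explicit requirements; if the top two are within
    close_margin, repeat the scoring with the implied requirements added."""
    explicit = {b: score(b, EXPLICIT) for b in grades}
    ordered = sorted(explicit, key=explicit.get, reverse=True)
    if len(ordered) > 1 and explicit[ordered[0]] - explicit[ordered[1]] <= close_margin:
        totals = {b: explicit[b] + score(b, IMPLIED) for b in grades}
        ordered = sorted(totals, key=totals.get, reverse=True)
    return ordered
```

With these numbers both bidders tie at 5 on the explicit requirements, so the implied requirements decide the ranking in Bidder B's favor (8 to 6).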
• Corporate or Division. Many companies establish safety policies at the Corporate and Division
levels. These safety policies or standards are imposed on all company development and/or
production activities. The presence of such standards, accompanied by audit procedures can
provide the evaluation team with an indication of company commitment, standardized safety
approaches, and safety culture.
• Procurement Activity. Contractors write specifications and SOWs for subcontractors and
vendors. An internal procedure or actual examples of previous subcontracts should demonstrate an
intelligent process of requirements "flow down". It is not sufficient to impose system safety
requirements on a prime contractor and monitor that contractor's SSP if that contractor uses major
system components developed without benefit of an SSP.
• Management of Program's SSP. The contractor's SSPP describes in detail planned management
controls. The plan should reflect a combination of contractual direction, company policies, and
"hands-on" experience in developing, managing, and controlling the SSP and its resources. The
contractor's SSP manager's credentials must include knowledge not only of company policies,
procedures, and practices but also of the technical requirements, necessary activities and tools, and
the characteristics of the operational environments.
• Contractor's Engineering SSP. The system safety engineer should possess in-depth knowledge of
engineering concepts including hazard risk assessment and control, the system, and associated
accident risk to implement the SSP. The engineer develops design checklists, defines specific
requirements, performs hazard analyses, operates or monitors hazard tracking systems, and in
conjunction with the design team implements corrective action. Qualifications of system safety
personnel are discussed in Chapter 4.
• Specifications and Requirements. The potential exists for engineers and designers, possessing
minimal safety knowledge, to be charged with incorporating safety criteria, specifications, and
requirements into the system or product design. It is essential that this activity be monitored by
system safety engineering to verify that these requirements and criteria are incorporated in the
design. It is important that someone with system safety competence "flow down" the safety
requirements throughout the "specification tree". It is the lower-level specifications (typically C specifications)
that contain the detailed design criteria which get translated into the design. If safety requirements are
not properly incorporated at this level, they will be missed in the design process.
• Operational or Test Location. The contractor must demonstrate in his SSPP, Test Plans, and
Logistics documentation that the SSP does not end at the factory door. The contractor must
consider safety during test programs and in planned support for government or system integrator
activities.
Personnel Qualifications and Experience. To provide decision makers with competent hazard
risk assessments, the FAA’s program/assistant manager must insist that the contractor have qualified,
responsive system safety management and technical personnel. This is necessary because the contractor’s
system safety manager is the one who certifies, for his employer, that all safety requirements have been
met. Necessary qualifications vary from program to program, as discussed in Chapter 5, Table 5-2.
FAA-sponsored programs involve either the procurement of hardware/systems or the procurement of services. In the former, the
role of the evaluator is often to determine whether bidding contractors have the capability (and track record) to
meet contractual requirements. In the latter case, the acquisition of services, the evaluation may be more
focused on the qualifications of individuals. In some cases the evaluator is provided resumes of
proposed individuals; in others, more generic “job descriptions” that establish minimum qualifications for
well-defined “charters”.
A useful approach to evaluating either resumes for proposed key positions or job descriptions is to use a
“Job Analysis Worksheet”. A sample is included as Figure 6-5. It is appropriate to require key resumes
(and an obligation to use the associated individuals post award) in the Request for Proposal (RFP)
instructions to bidders. A Job Analysis Worksheet is a checklist of desired job requirements per required
skill level, reflecting the knowledge, skills, and abilities (KSA) necessary to implement the program
successfully. The submitted key resumes, or alternatively the position descriptions, are reviewed against the job
requirements as reflected in each KSA to determine whether the candidate meets the FAA’s requirements. A
sample position description is provided as Exhibit 6-4.
1 Knowledge and ability to manage interrelationships of all components of an SSP in support of both
management and engineering activities. This includes planning, implementation, and authorization
of monetary and personnel resources.
2 Knowledge of theoretical and practical engineering principles and techniques.
3 Knowledge of systems.
4 Knowledge of operational and maintenance environments.
5 Knowledge of management concepts and techniques.
6 Knowledge of the life-cycle acquisition process.
7 Ability to apply fundamentals of diversified engineering disciplines to achieve system safety
engineering objectives.
8 Ability to adapt and apply system safety analytical methods and techniques to related scientific
disciplines.
9 Ability to do independent research on complex systems to apply safety criteria.
10 Skill in the organization, analysis, interpretation, and evaluation of scientific/engineering data in
the recognition and solution of safety-related engineering problems.
11 Skill in written and oral communication.
12 Ability to keep abreast of changes in scientific knowledge and engineering technology and apply
new information to the solution of engineering problems.
1 Acts as agent of the program manager for all system safety aspects of the program. Provides
monthly briefings to the program management on the status of the SSP.
2 Serves as system safety manager for safety engineering functions of major programs. (KSA 1
through 11)
3 Manages activities which review and evaluate information related to types and location of hazards.
(KSA 1,2,3,4,7,9,12)
4 Manages activities to perform extensive engineering studies to determine hazard levels and to
propose solutions. (KSA 1,2,6,7,8,9,11)
5 Manages the development of system guidelines and techniques for new/developing systems and
emerging technologies. (KSA 6,7,8,9,10,12)
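The worksheet pairs each duty with the KSA numbers it draws on, so checking a resume against the duties reduces to a set comparison. A minimal Python sketch follows, assuming the evaluator has already judged which KSAs a candidate's resume demonstrates; the duty-to-KSA table is copied from the list above, while the `ksa_gaps` function and the candidate data are hypothetical. Duty 1 lists no KSAs and is omitted.

```python
# KSA numbers required for each duty, taken from the duty list above.
DUTY_KSAS = {
    2: set(range(1, 12)),              # KSA 1 through 11
    3: {1, 2, 3, 4, 7, 9, 12},
    4: {1, 2, 6, 7, 8, 9, 11},
    5: {6, 7, 8, 9, 10, 12},
}


def ksa_gaps(duties, demonstrated):
    """Return, in ascending order, the KSAs required by the given duties
    that the candidate's resume does not demonstrate."""
    required = set().union(*(DUTY_KSAS[d] for d in duties))
    return sorted(required - set(demonstrated))


# Hypothetical candidate: KSAs the evaluator judged evident from the resume.
candidate = {1, 2, 3, 4, 6, 7, 9, 11, 12}
gaps = ksa_gaps([3, 4, 5], candidate)
```

For this hypothetical candidate, duties 3 through 5 require KSAs 1 through 4 and 6 through 12, so the sketch reports KSAs 8 and 10 as gaps for the evaluator to probe in discussions.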
Qualifications
Minimum of a baccalaureate degree in engineering, applied science, safety, or another closely related
field appropriate to system safety. Some education or experience in Business Administration is desirable.
Licensure as a Professional Engineer (PE), preferably in safety engineering, or credentials as a Certified
Safety Professional (CSP) in system safety aspects is also desirable. Approximately 10 years of
diversified experience in various aspects of system safety is desired; or demonstrated capability through
previous experience and education to perform successfully the duties and responsibilities shown below.
Serve as a professional authority for the SSP covering the planning, designing, producing, testing,
operating, and maintaining of product systems and associated support equipment. May be assigned to
small programs as system safety representative with duties as described below.
Review initial product system designs and advise design personnel concerning incorporation of safety
requirements into product system, support equipment, test and operational facilities based on safety
standards, prior experience, and data associated with preliminary testing of these items.
Assure a cooperative working relationship and exchange of operational and design safety data with
government regulatory bodies, customers, and other companies engaged in the development and
manufacture of aerospace systems. Act as a company representative for various customer and industry
operational and design safety activities and assist in the planning and conducting of safety conferences.
Evaluate new or modified product systems in order to formulate training programs for updating operating
crews and indoctrinating new employees in system test and operational procedures. Establish training
programs reflecting the latest safety concepts, techniques, and procedures.
Direct investigations of accidents involving design, test, operation, and maintenance of product systems and
associated facilities, and present detailed analyses to concerned customer and company personnel. Collect,
analyze, and interpret data on malfunctions, and keep safety personnel at all organizational levels informed
of the latest developments, resulting from investigation findings, that affect design specifications or test
and operational techniques. Collaborate with functional safety organizations in order to set and maintain
safety standards. Recommend changes to design, operating procedures, test and operational facilities, and
other affected areas, or other remedial action based on accident investigation findings or statistical analysis,
to ensure maximum compliance with appropriate safety standards.
Coordinate with line departments to obtain technical and personnel resources required to implement and
maintain safety program requirements.
Figure 6-6 Sample Job Description
6.3.3 Negotiation
Negotiation consists of fact finding, discussion, and bargaining. The process leads to several benefits:
• A full understanding by the contractor of the safety requirements, and of the contractor's
commitment to meeting these requirements
• Correction of proposed SSP deficiencies.
• A mutual understanding of any safety tradeoffs that may be necessary. Trade-off
parameters include performance, schedule, logistics support, and costs.
The negotiation process is the last chance to ensure that all necessary safety program and safety risk criteria
are incorporated in the contract. It permits both the FAA and the contractor to clear up differing
requirement interpretations and implementation conflicts. Just as importantly, the contractor and the FAA
can maximize effectiveness for planned safety program cost expenditures. For example, delivering System
Safety Assessment Reports (SSAR) or Safety Engineering Reports (SER) in a specific media format, e.g., a
desktop publishing package, may be an unexpected cost driver for a company that has standardized on an
office suite such as MS or Corel Office. Similarly, when approval of SARs is specified, the contractor needs
to cost the assumed rework. If the assumed rework is high, the FAA may choose to forgo approval on early
program submittals and substitute comments instead. There are obvious risks associated with forgoing
approval on deliverables.
• Contract direction can only be provided through the Government contracting office.
• Government personnel must provide corrective feedback, as needed, in a manner
that does not discourage candor and sharing of information. To that end, participation
in frequent Technical Information Meetings (TIMs) and other activities such as
Hazard Record Review Boards is a positive action.
• Formal review with official feedback is primarily provided through Major Program
Milestones (such as a Critical Design Review (CDR)) and the contract deliverables,
e.g., S/SHA and SAR.
• SSPP
• Work breakdown of system safety tasks, subtasks, and manpower
The safety review performed at PDR considers the identified hazards and looks at the intended design
controls. The cognizant FAA system safety manager usually reviews the following documents at this point:
During the documentation review, the following key points should be checked:
Finally, the government system safety manager must determine if the following requirements have been
met:
• Preliminary design meets requirements established by the negotiated contract
• Hazards compatible with the level of system development have been identified
• Proposed hazard controls and verification methods are adequate
• Safety-critical interfaces have been established and properly analyzed.
• A Hazard Tracking and Incident Reporting System is in place.
The requirements that must be met at CDR for a successful program are:
The requirements for a successful safety program at the pre-operational phase are:
The purpose of such meetings is to provide greater emphasis on the details of the SSP progress and
analyses than is practical at a major milestone review. Given that they are required, the schedule duration,
the pace of development, and the phase of the program should determine the frequency. One scenario for a
two-year full-scale development program might include a kick-off safety meeting shortly after contract
award and one safety review prior to Preliminary Design Review (PDR). Special meetings during the T&E
phase would be held when test results suggest a need. Since one of the primary purposes of a special safety
review is to discuss safety program tasks in greater detail than is compatible with a major program
milestone schedule, some cost savings may be achieved by requesting parallel safety sessions at a major
milestone review. This approach permits the desired detail to be discussed without accumulating the costs
of an independent meeting.
All program reviews and audits provide an opportunity to review and assign action items and to explore
other areas of concern. A mutually acceptable agenda/checklist should be negotiated in advance of the
meeting to ensure all system safety open items are covered and that all participants are prepared for
meaningful discussions.
The acquisition of expensive, complex, or critical systems, equipment, or major facilities requires
considerable simultaneous interaction between the integration contractor and the associate contractors. In
these situations, the managing authority may require the formation of a System Safety Working
Group (SSWG). The SSWG is a formally chartered group of staff,
representing organizations participating in the acquisition process. This group exists to assist the managing
authority system program manager in achieving the system safety objectives. Contractor support of an
SSWG is useful and may be necessary to ensure procured hardware or software is acceptably free from
risks that could injure personnel or cause unnecessary damage or loss of resources.
The contractor, as an active member of the SSWG, may support the managing authority by providing or
supporting presentations to the government certifying activities such as phase safety reviews or safety
review boards. The following list provides management with SSWG support options to selectively impose
on contractors:
• Present the contractor safety program status, including results of design or operations risk assessments
• Summarize hazard analyses, including identification of problems and status of resolution
• Present results of analyses of prior mishaps or accidents, and hazardous malfunctions,
including recommendations and action taken to prevent recurrences
• Respond to action items assigned by the chairman of the SSWG
• Develop and validate system safety requirements and criteria applicable to the program
• Identify safety deficiencies of the program and provide recommendations for corrective
actions or prevention of recurrence
• Plan and coordinate support for a required certification process
• Document and distribute meeting agendas and minutes
Hazard tracking need not be a complex procedure. Any hazard tracking tool that tracks the information
contained in Section 6.2 and complies with the SSMP and SSPP is acceptable for hazard tracking in the
FAA at the program level. The managing authority, the system integrator, or each contractor may maintain
the Safety Action Record (SAR) database. Each risk that meets or exceeds the threshold specified by the
managing authority should be entered into the SAR database when first identified. Each action taken to
eliminate the risk or reduce the associated risk is documented. Management will detail the procedure for
closing out the hazard or accepting any residual risk. The SAR may be documented and delivered as
part of the system safety progress summary using the Safety Engineering Report, or it can be included as part
of an overall program engineering/management report.
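The SAR workflow described above (enter each risk that meets or exceeds the managing authority's threshold when it is first identified, document every action taken, then close out the hazard or accept the residual risk) can be sketched as a small data structure. This is a minimal illustration only: the class names, field names, numeric risk score, and threshold value are hypothetical assumptions, since the handbook accepts any tool that tracks the required information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SafetyActionRecord:
    """One hypothetical SAR entry; the field names are illustrative only."""
    hazard_id: str
    description: str
    risk_score: int                      # assumed numeric risk index, higher = worse
    actions: List[str] = field(default_factory=list)
    status: str = "open"                 # "open", "closed", or "accepted"
    accepted_by: Optional[str] = None


class SARDatabase:
    """Minimal closed-loop hazard log (a hypothetical sketch, not an FAA tool)."""

    def __init__(self, entry_threshold: int) -> None:
        # Threshold specified by the managing authority; the value is an assumption.
        self.entry_threshold = entry_threshold
        self.records: Dict[str, SafetyActionRecord] = {}

    def identify(self, hazard_id: str, description: str, risk_score: int) -> bool:
        """Enter a risk when first identified if it meets or exceeds the threshold."""
        if risk_score < self.entry_threshold:
            return False
        self.records[hazard_id] = SafetyActionRecord(hazard_id, description, risk_score)
        return True

    def log_action(self, hazard_id: str, action: str, new_score: int) -> None:
        """Document an action taken to eliminate or reduce the risk."""
        record = self.records[hazard_id]
        record.actions.append(action)
        record.risk_score = new_score

    def close_out(self, hazard_id: str) -> None:
        """Close a hazard only if at least one corrective action is documented."""
        record = self.records[hazard_id]
        if not record.actions:
            raise ValueError(f"{hazard_id}: no corrective action documented")
        record.status = "closed"

    def accept_residual_risk(self, hazard_id: str, authority: str) -> None:
        """Record acceptance of the residual risk and who accepted it."""
        record = self.records[hazard_id]
        record.status = "accepted"
        record.accepted_by = authority

    def open_items(self) -> List[str]:
        """Hazards still awaiting close-out or risk acceptance."""
        return [hid for hid, rec in self.records.items() if rec.status == "open"]
```

For example, with `SARDatabase(entry_threshold=10)`, a hazard scored 16 is entered and stays open until `close_out` or `accept_residual_risk` is called, while a hazard scored 4 is not entered. Refusing to close a record with no documented action is one way to preserve the "paper trail" the text calls for.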
Management has considerable flexibility in choosing a closed-loop system for closing out a risk. See Figure
6-7. The key is the maintenance and accessibility of a SAR. The contractor can be required to establish the
SAR and include within it a description of the specific corrective action taken to downgrade medium and
high risk hazards. The corrective action details and log updates can be included in monthly reports,
subsequent data submissions, and at major program milestones.
Figure 6-7 (diagram not reproduced). Recoverable labels: Risk Assessment; Further controls? (Y/N); Review;
SSWG Risk Assessment; SEC Review.
Management can review and approve/disapprove the corrective action or its impact by mail, at major
program milestones, SSWG meetings, safety review board meetings, or any other engineering control
process found to be effective. Although the method selected is flexible, a "paper trail" reflecting the
identification of medium and high risks, a summary of the corrective action alternatives considered,
conclusions, and the names of the review team is desirable.
• Procedures to record hazards into the log and the level of detail of the log entry
• Procedure by which the contractor shall obtain close out or risk acceptance by the MA
for each hazard
The contractor may prepare a periodic system safety progress report summarizing general progress made
relative to the SSP during the specified reporting period and projected work for the next reporting period.
The report should contain the following information:
• A brief summary of activities, progress, and status of the safety effort in relation to the
scheduled program milestones. It should include progress toward completion of safety
data prepared or in work.
• Newly recognized significant hazards and significant changes in the degree of control of
the remaining known hazards.
• Status of all recommended corrective actions not yet implemented.
• Significant cost and schedule changes that impact the safety program.
• Discussion of contractor documentation reviewed by SSWG during the reporting period.
Indicate whether the documents were acceptable for safety content and whether or not
inputs to improve the safety posture were made.
• Proposed agenda items for the next SSWG meeting, if such groups are formed.
FAA System Safety Handbook, Chapter 7: Integrated System Hazard Analysis
December 30, 2000
Chapter 7:
Integrated System Hazard Analysis
In summary form, to accomplish Integrated System Hazard Analysis, system risks are identified as
potential system accident scenarios and the associated contributory hazards. Controls are then designed to
eliminate the risks or control them to an acceptable level. The ISSWG may conduct this activity during safety
reviews and Integrated Risk/Hazard Tracking and Risk Resolution.
The analyst should be concerned with machine/environment interactions resulting from change/deviation
stresses as they occur in time/space, and with physical harm to persons, functional damage, and system
degradation. The interaction consideration evaluates the interrelations between the human (including
procedures), the machine, and the environment: the elements of a system. The human parameter relates to
appropriate human factors engineering and associated elements: biomechanics, ergonomics, and human
performance variables. The machine equates to the physical hardware, firmware, and software. The human
and machine are within a specific environment, and adverse effects due to the environment are to be
studied. One model used for this analysis, the 5M model, has been described earlier. See Chapter 3 for
further elaboration.
The interactions and interfaces between the human, machine, and environment are to be evaluated by
applying the above techniques together with Hazard Control Analysis, in which the possibility of
insufficient control of the system is analyzed.
Adverse deviations will affect system safety. The purpose of analysis is to identify possible deviations that
can contribute to accident scenarios. Deviations are malfunctions, degradation, errors, failures, faults, and
system anomalies. They are unsafe conditions and/or acts with the potential for harm. These are termed
contributory hazards in this System Safety Handbook.
Figure 7-1 shows the sequence of events that could cause an accident from a fuel tank rupture on board an
aircraft. There are a number of contributory hazards associated with this event: fuel vapor present, ignition
spark, ignition and tank overpressurization, tank rupture, and projected fragments. The contributors
associated with this potential accident involve exposed conductors within the fuel tank due to wire
insulation degradation, and the presence of adequate ignition energy. The outcome could be any combination
of aircraft damage, injury, and/or property damage.
Figure 7-2 shows the sequence of events that could cause an accident due to a hydraulic brake failure and
aircraft runway run-off. Note that in this case there are, again, many contributors to this event: failure of
the primary hydraulic brake system, an inappropriate attempt to activate the emergency brake system, loss
of aircraft braking capability, and the aircraft running off the end of the runway and contacting obstructions.
The outcomes could also vary from aircraft damage to injury and/or property damage. Note that the initiating
events relate to the failure of the primary hydraulic brake system. This failure is itself the outcome of many
other contributors that caused the hydraulic brake system to fail. Further note that the improper operation of
the emergency brake system is also considered an initiating event.
Figure 7-3 indicates the sequence of events that could cause an accident in which, because of an unsecured
cabin door, the aircraft captain suffers hypoxia. Note that this event is not necessarily due to a particular failure.
As previously indicated, there are many contributors: the aircraft is airborne without proper cabin pressure
indication, and the captain enters the unpressurized cabin without the proper personal protective equipment.
The initiators in this scenario involve the cabin door not being properly secured, inadequate preflight
checks, and less than adequate indication of cabin pressure loss in the cockpit. The outcome of this
accident is that the captain suffers hypoxia. Note that if both crew members had investigated the anomaly,
both pilots could have experienced hypoxia and the aircraft could have been lost.
The safeguards that would either eliminate the specific hazards or control the risk to an acceptable level
have also been indicated in the figures. Keep in mind that if a safeguard does not function, that in itself is a
hazard. In summary, it is not easy to identify the single most important hazard within the scenario
sequence. As discussed, the initiating hazards, the contributory hazards, and the primary hazard must all be
considered in determining the risk. The analyst must understand the differences between hazards, which
are the potential for harm, and their associated risks. As stated, a risk is composed of the hazards within the
logical sequence. In some cases, analysts may interchange terminology and refer to a hazard as a risk, or
vice versa. Caution must be exercised in the use of these terms. When conducting a risk assessment, the
analyst must consider all possible combinations of hazards that may constitute one particular risk, which is
expressed as the severity and likelihood of a potential accident.
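The idea that a risk is made up of the hazards in a logical sequence, and that a non-functioning safeguard is itself a hazard, can be illustrated with a simple AND-gate sketch of the fuel tank scenario in Figure 7-1, in the spirit of fault tree logic. The hazard and safeguard names below are hypothetical, and the rule that any single working safeguard breaks the sequence is a simplifying assumption; a real analysis would also consider ordering and timing.

```python
from typing import Dict, List


def scenario_can_occur(present: Dict[str, bool], safeguards: Dict[str, bool]) -> bool:
    """AND-gate view of one accident scenario (simplifying assumption).

    present:    contributory hazard -> True if the hazard exists
    safeguards: safeguard -> True if it is functioning
    The scenario is possible only if every contributory hazard is present
    and no functioning safeguard interrupts the sequence.
    """
    all_hazards_present = all(present.values())
    any_safeguard_working = any(safeguards.values())
    return all_hazards_present and not any_safeguard_working


def active_hazards(present: Dict[str, bool], safeguards: Dict[str, bool]) -> List[str]:
    """List present hazards, counting each non-functioning safeguard as a hazard."""
    hazards = [name for name, exists in present.items() if exists]
    hazards += [f"LTA safeguard: {name}" for name, works in safeguards.items() if not works]
    return hazards
```

With all three contributors present, the scenario stays blocked while either hypothetical safeguard works; once both fail, `scenario_can_occur` returns `True` and `active_hazards` lists the failed safeguards alongside the contributory hazards.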
(Diagram not reproduced. Columns: initiating hazard, contributory hazards, primary hazards, catastrophic
events. Recoverable labels: engine start-up; inlet covers installed; training; orientation; LTA design; human
reliability.)
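Figure 7-1 (diagram not reproduced). Columns: initiating hazard, contributory hazards, primary hazards,
catastrophic events. Recoverable labels: wire insulation failure; fuel vapor; aircraft damaged. Safeguards
shown include designing wiring to withstand the environment and controlling fuel.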
Figure 7-2 (diagram not reproduced). Columns: initiating hazard, contributory hazards, primary hazards,
catastrophic events. Recoverable labels: failure of hydraulic brake system; failure of primary hydraulic brake
system; speed brake application; aircraft contacts obstruction; aircraft damaged and/or injury. Safeguards:
adequate design redundancy; design system to withstand environment; adequate maintenance and
inspection; adequate training; adequate emergency brake design for manual operation; adequate human
factors design; minimize obstructions.
Figure 7-3 (diagram not reproduced). Columns: initiating hazard, contributory hazards, primary hazards,
critical event. Recoverable labels: cabin door not secured; inadequate preflight; LTA indication in cockpit;
aircraft airborne without pressure indication; captain enters unpressurized cabin; inadequate personnel
protective equipment; captain suffers hypoxia and/or injury. Safeguards: qualification, training, and safe
operating procedures; successful preflight; secure cabin. (LTA = less than adequate.)
An important system objective should include technical risk management or operational risk management.
Further consideration should be given to the identification of system risks and to how system risks relate to
the specialty engineering disciplines. Risk is an expression of probable loss over a specific period of time or
over a number of operational cycles. There are situations where reliability and system safety risks are in
concert, and in other cases tradeoffs must be made.
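Because risk is expressed over a period of time or a number of operational cycles, the same per-cycle probability implies very different exposure depending on how many cycles are considered. The sketch below assumes a constant, independent per-cycle probability p, which is a simplification. It computes the chance of at least one occurrence in N cycles, 1 - (1 - p)^N, the expected number of occurrences, N * p, and a probable loss obtained by multiplying that expectation by a hypothetical loss per occurrence. The numbers used are illustrative, not handbook values.

```python
def prob_at_least_one(p_per_cycle: float, cycles: int) -> float:
    """Probability of at least one occurrence in `cycles` independent cycles."""
    if not 0.0 <= p_per_cycle <= 1.0:
        raise ValueError("p_per_cycle must be a probability")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - p_per_cycle) ** cycles


def expected_occurrences(p_per_cycle: float, cycles: int) -> float:
    """Expected number of occurrences over `cycles` cycles (N * p)."""
    return p_per_cycle * cycles


def expected_loss(p_per_cycle: float, cycles: int, loss_per_event: float) -> float:
    """Probable loss: expected occurrences times the loss per occurrence."""
    return expected_occurrences(p_per_cycle, cycles) * loss_per_event
```

With p = 1e-6 per cycle, `prob_at_least_one(1e-6, 1_000_000)` is roughly 0.63 and `expected_occurrences` is about 1, so a "one in a million" event is more likely than not to occur at least once over a million cycles.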
A common consideration between reliability and system safety is the potential unreliability of the system
and the associated adverse events. Adverse events can be analogous to potential system accidents.
Reliability is the probability that a system will perform its intended function satisfactorily for a prescribed
time under stipulated environmental conditions. The system safety objective is “the optimum degree of
safety…”; since nothing is perfectly safe, the objective is to eliminate known system risk or control it to an
acceptable level.
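The reliability definition above is often made concrete with the exponential model R(t) = exp(-λt), which assumes a constant failure rate λ. That is a common textbook assumption, not something this handbook prescribes, and it does not describe wear-out or early-life failures. The sketch computes reliability, its complement (unreliability), and the mean time between failures under that assumption; the failure rate used below is hypothetical.

```python
import math


def reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate (per hour)."""
    if failure_rate < 0 or hours < 0:
        raise ValueError("failure_rate and hours must be non-negative")
    return math.exp(-failure_rate * hours)


def unreliability(failure_rate: float, hours: float) -> float:
    """Probability of failure within `hours`: 1 - R(t)."""
    return 1.0 - reliability(failure_rate, hours)


def mtbf(failure_rate: float) -> float:
    """Mean time between failures under the same constant-rate assumption."""
    return 1.0 / failure_rate
```

For an assumed λ of 1e-4 per hour, a 10-hour mission gives R of about 0.999, while over one MTBF (10,000 hours) R falls to exp(-1), about 0.37. As the text notes, a low R does not by itself make the system hazardous; that depends on whether the failure can lead to harm or whether the design fails safe.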
When evaluating risk, contributory hazards are important. Contributory hazards are unsafe acts and
unsafe conditions with the potential for harm. Unsafe acts are human errors that can occur at any time
throughout the system life cycle. Human reliability addresses human error or human failure. Unsafe
conditions can be failures, malfunctions, faults, and anomalies that are contributory hazards. An unreliable
system is not automatically hazardous; systems can be designed to fail-safe. Procedures and administrative
controls can be developed to accommodate human error or unreliable humans, to assure that harm will not
result.
The model below (Figure 7-5) shows the relationship between contributory hazards and adverse events,
which are potential accidents under study.
ADVERSE EVENTS
TOP EVENT (worst-case harm):
• Catastrophic event
• Fatality
• Loss of system
• Major environmental impact
Contributory Hazards (unsafe acts and/or unsafe conditions; initiators can occur at any time):
• Human errors and/or human acts
• Conditions: failures, faults, anomalies, malfunctions
Less than Adequate (LTA) Controls:
• Inappropriate control
• Missing control
• Control malfunction
LTA Verification of Controls:
• Verification error
• Loss of verification
• Inadequate verification
Determining potential event propagation through a complex system can involve extensive analysis.
Specific reliability and system safety methods such as software hazard analysis, failure modes and effects
analysis, human interface analysis, scenario analysis, and modeling techniques can be applied to determine
system risks, e.g., the inappropriate interaction of software, human (including procedures), machine, and
environment.
In 1984, Dr. Perrow further extended the multi-linear logic discussion with his definition of a system
accident: “system accidents involve the unanticipated interaction of multiple failures.”
From a system safety viewpoint, the problem of risk identification becomes even more complex, in that the
dynamics of a potential system accident are also evaluated. When considering multi-event logic,
determining quantitative probability of an event becomes extensive, laborious, and possibly inconclusive.
The above model of the adverse event represents a convention (an estimation) of a potential system accident
with the associated top event: the harm expected, contributory hazards, less than adequate controls, and
possibly less than adequate verification. The particular potential accident has a specific initial risk and
residual risk.
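The elements of the adverse event model (a top event with its worst-case harm, contributory hazards, less than adequate controls, and less than adequate verification, each with an initial and a residual risk) map naturally onto a record structure. This is a hypothetical sketch: the field names and the ordinal 1 to 5 severity and likelihood scales are assumptions for illustration, not a handbook format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (severity, likelihood); 1 = lowest, 5 = highest (assumed ordinal scales)
Risk = Tuple[int, int]


@dataclass
class AdverseEventModel:
    top_event: str
    worst_case_harm: str
    contributory_hazards: List[str]
    lta_controls: List[str] = field(default_factory=list)
    lta_verification: List[str] = field(default_factory=list)
    initial_risk: Risk = (1, 1)
    residual_risk: Risk = (1, 1)

    def __post_init__(self) -> None:
        # Reject values outside the assumed 1-5 scales.
        for name, (sev, lik) in (("initial_risk", self.initial_risk),
                                 ("residual_risk", self.residual_risk)):
            if not (1 <= sev <= 5 and 1 <= lik <= 5):
                raise ValueError(f"{name} must use the 1-5 scales")

    def open_weaknesses(self) -> List[str]:
        """Controls and verifications still judged less than adequate."""
        return ([f"LTA control: {c}" for c in self.lta_controls] +
                [f"LTA verification: {v}" for v in self.lta_verification])

    def risk_reduced(self) -> bool:
        """True if residual risk is lower in at least one dimension and higher in none."""
        s0, l0 = self.initial_risk
        s1, l1 = self.residual_risk
        return s1 <= s0 and l1 <= l0 and (s1, l1) != (s0, l0)
```

Keeping the LTA controls and verifications as explicit lists makes the remaining weak points visible alongside the initial and residual risk, rather than recording only a single risk number.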
Since risk is an expression of probable loss over a specific period of time or over a number of operational
cycles, risk is composed of two major potential accident variables: loss and likelihood. The loss relates to
harm, or severity of consequence. Likelihood is more often a qualitative estimate of loss. Quantitative
likelihood estimates can be inappropriate, since specific quantitative methods are questionable given the
lack of relevant, appropriate data. Statistics can be misunderstood or manipulated to provide erroneous
information. There are further contradictions, which add to complexity when multi-event logic is
considered. This logic includes event flow, initiation, verification/control/hazard interaction, human
response, and software error.
The overall intent of system safety is to prevent potential system accidents by the elimination of associated
risk, or by controlling the risk to an acceptable level. The point is that reliance on probability as the total
means of controlling risk can be inappropriate. Figures 7-1 through 7-3 provided examples of undesired
events that require multiple conditions to exist simultaneously and in a specific sequence. Figure 7-6
summarizes multi-event logic.
Figure 7-6 (diagram not reproduced): multiple events leading to an outcome.
The model of an adverse event above is used to illustrate the concept of risk control. For example, consider
a potential system accident where reliability and system safety design and administrative controls are
applied to reduce system risk. There is a top event, contributory hazards, less than adequate controls, and
less than adequate verification. The controls can reduce the severity and/or likelihood of the adverse event.
Consider the potential loss of a single engine aircraft due to engine failure. Simple linear logic would
indicate that a failure of the aircraft’s engine during flight would result in a forced landing, possibly into
unsuitable terrain. Multi-event logic, which can define a potential system accident, would indicate
additional complexities, e.g., loss of aircraft control due to inappropriate human reaction, deviation from
emergency landing procedures, less than adequate altitude, and/or less than adequate glide ratio. The
reliability-related engineering controls in this situation would be appropriate to system safety and would
consider the overall reliability of the engine, the fuel subsystems, and the aerodynamics of the aircraft. The
system safety related controls would further consider other contributory hazards such as inappropriate
human reaction and deviation from emergency procedures. These additional controls are administrative in
nature and involve the design of emergency procedures, training, human response, communication
procedures, and recovery procedures.
1 Lowrance, William W., Of Acceptable Risk: Science and the Determination of Safety, 1945, Copyright 1976 by
William Kaufmann, Inc.
In this example, the controls above would decrease the likelihood of the event and possibly the severity.
The severity would decrease as a result of a successful emergency landing procedure, where the pilot walks
away and there is minimal damage to the aircraft. The analyst must consider worst case credible scenarios
as well as any other credible scenarios that could result in less harm.
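The effect of controls on severity and likelihood in the single-engine example can be shown with a simple qualitative risk matrix. The category names, their ordering, the additive index, and the high/medium/low cut-offs below are all assumptions chosen for illustration; they are not this handbook's risk acceptance criteria, which each program defines.

```python
from typing import Tuple

# Ordered lowest to highest. Names, ordering, and cut-offs are illustrative assumptions.
SEVERITY = ["no safety effect", "minor", "major", "hazardous", "catastrophic"]
LIKELIHOOD = ["extremely improbable", "extremely remote", "remote", "probable", "frequent"]


def risk_level(severity: str, likelihood: str) -> str:
    """Classify a (severity, likelihood) pair with an assumed additive index 0..8."""
    score = SEVERITY.index(severity) + LIKELIHOOD.index(likelihood)
    if score >= 6:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


def apply_controls(severity: str, likelihood: str,
                   severity_steps: int = 0, likelihood_steps: int = 0) -> Tuple[str, str]:
    """Lower severity and/or likelihood by whole categories, floored at the lowest."""
    sev = max(SEVERITY.index(severity) - severity_steps, 0)
    lik = max(LIKELIHOOD.index(likelihood) - likelihood_steps, 0)
    return SEVERITY[sev], LIKELIHOOD[lik]
```

Under these assumptions, `risk_level("catastrophic", "remote")` is "high". If reliability controls on the engine and fuel system lower the likelihood one step, and a successful emergency landing procedure lowers the worst credible severity one step, `apply_controls("catastrophic", "remote", 1, 1)` gives ("hazardous", "extremely remote"), which this matrix rates "medium".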
This has been a review of a somewhat complex potential system accident in which the hardware, the
human, and the environment were evaluated. There would be additional complexity if software were
included in the example. The aircraft could have been equipped with a fly-by-wire flight control system, or
an automated fuel system.
Software does not fail, but hardware and firmware can fail. Humans can make software-related errors.
Design requirements can be inappropriate. Humans can make errors in coding. The complexity or sheer size
of the software design could add to the error potential. There could be other design anomalies, sneak
paths, and inappropriate do-loops. The sources of software error can be extensive. According to Raheja,
“Studies show that about 60 percent of software errors are logic and design errors; the remainder are
coding- and service-related errors.” 2 There are specific software analysis and control methods that can be
successfully applied to contributory hazards that are related to software.
Again referring to the adverse event model above, note that software errors can result in unsafe conditions
or they could contribute to unsafe acts. Software controls can be inappropriate. The verification of
controls could be less than adequate.
2 Raheja, Dev G., Assurance Technologies: Principles and Practices, McGraw-Hill, 1991, page 269.
3 Hammer, Willie, Handbook of System and Product Safety, Prentice-Hall, Inc., 1972, page 21.
Consider that deficiencies are contributory hazards, unsafe acts and/or conditions that can cause harm.
Without appropriate hazard analysis how would it be possible to identify the contributors?
Codes, standards, and requirements may not be appropriate, or they may be inadequate for the particular
design. Therefore, risk control may be inadequate. The documents may be the result of many efforts,
which may or may not be appropriately related to system safety objectives. For example, activities of
committees may result in consensus, but the assumptions may not address specific hazards. The extensive
analysis that has been conducted in support of document development may not have considered the
appropriate risks. Also, the document may be outdated by rapid technological advancement.
As pointed out in the Final Report of the National Commission on Product Safety, industrial standards are
based on the desire to promote maximum acceptance within industry. To achieve this goal, the standards
are frequently innocuous and ineffective.4
Good engineering practice is required in all design fields. Certain basic practices can be utilized, but a
careful analysis must be conducted to ensure that the design is suitable for its intended use.
Take for example a complex microprocessor and its associated software. According to Jones, these complex
systems are never perfect:
(response to all inputs not fully characterized), there may be remnant faults in
hardware/software and the system will become unpredictable in its response when exposed
to abnormal (unscheduled) conditions e.g. excess thermal, mechanical, chemical,
radiation environments.5
This being the case, what can the system safety engineer do to assure acceptable risk? How does one prove
independence and appropriate monitoring?
Defining acceptable risk is dependent on the specific entity under analysis, i.e., the project, process,
procedure, subsystem, or system. Judgment has to be made to determine what can be tolerated should a
loss occur. What is an acceptable catastrophic event likelihood? Is a single fatality acceptable, if the event
can occur once in a million chances? This risk assessment activity can be conducted during a system safety
working group effort within a safety review process. The point to be made here is that an assumption based
upon a single hazard or risk control (redundancy and monitoring) may be oversimplified.
4 Ibid., Hammer, page 26.
5 Jones, Malcolm, The Role of Microelectronics and Software in a Very High Consequence System, Proceedings
of the 15th International System Safety Conference, 1997, page 336.
Proving true redundancy is not cut-and-dried in complex systems. It may be possible to design a hardware
subsystem and show redundancy, e.g., redundant flight control cables, redundant hydraulic lines, or
redundant piping. When there are complex load paths, complex microprocessors, and software, true
independence can be questioned. The load paths, microprocessors, and software must also be independent.
Ideally, different independent designs should be developed for each redundant leg. However, even
independent designs produced by different manufacturers may share a common failure mode if the
requirements given to the software programmers are wrong.
The concepts of redundancy management should be appropriately applied.6 Separate microprocessors and
software should be independently developed. Single point failures should be eliminated where there are
common connections between redundant legs. The switchover control that accommodates redundancy
transfer should also be redundant. System safety would be concerned with the potential loss of transfer
capability due to a single common event.
Common events can eliminate redundancy. The use of similar hardware and software presents additional
risks, which can result in loss of redundancy. A less than adequate process, material selection, common
error in assembly, material degradation, quality control, inappropriate stress testing, or calculation
assumption; all can present latent risks which can result in common events. A general rule in system safety
states that the system is not redundant unless the state of the backup leg is known and the transfer is truly
independent.
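The effect of a common event on redundancy can be shown with a little arithmetic. The sketch below is a simplified model under stated assumptions, not a method prescribed by this handbook: each of two legs fails independently with a hypothetical probability p_leg, and a separate common-cause event with a hypothetical probability p_cc disables both legs at once.

```python
def p_loss_independent(p_leg: float) -> float:
    """Both legs must fail; only under true independence do the probabilities multiply."""
    return p_leg * p_leg


def p_loss_with_common_cause(p_leg: float, p_cc: float) -> float:
    """Redundancy is lost if the common event occurs OR both legs fail on their own.

    Uses P(A or B) = P(A) + P(B) - P(A)P(B), assuming the common event is
    itself independent of the individual leg failures (a simplifying assumption).
    """
    p_both = p_loss_independent(p_leg)
    return p_cc + p_both - p_cc * p_both


if __name__ == "__main__":
    p_leg = 1e-4  # hypothetical per-leg failure probability
    p_cc = 1e-6   # hypothetical probability of a common event (fire, shared requirement error)
    print(f"independent legs only: {p_loss_independent(p_leg):.2e}")
    print(f"with a common event:   {p_loss_with_common_cause(p_leg, p_cc):.2e}")
```

With these placeholder numbers the common event raises the probability of losing both legs roughly a hundredfold, which is why the state of the backup leg and the independence of the transfer must be verified rather than assumed.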
Physical location is another important element when evaluating independence and redundancy. Appropriate
techniques of separation, protection, and isolation are important. In conducting Common Cause Analysis,
a technique described in the System Safety Analysis Handbook,7 as well as this handbook, not only is the
failure state evaluated, but possible common contributory events are also part of the equation. The analyst
identifies the accident sequence in which common contributory events are possible due to physical
relationships.
Other analysis techniques also address location relationships, for example, vicinity analysis and zonal
analysis. One must determine the possible outcome should a common event occur that can affect all legs of
redundancy simultaneously, e.g., a major fire within a particular fire division, an earthquake causing
common damage, fuel leakage in an equipment bay of an aircraft, or an aircraft strike into a hazardous
location.
Keep in mind that the designers of the Titanic considered compartmentalization for watertight construction.
However, they failed to consider latent common design flaws, such as defects in the steel plating, the state
of knowledge of the steel manufacturing process, or the effects of cold water on steel.
Another misconception relates to monitoring: the belief that the system is safe because it is monitored. Safety
monitoring should be designed appropriately to assure that there is confidence in the knowledge of the
System State. The system is said to be balanced when it is functioning within appropriate design
parameters. Should the system become unbalanced, the condition must be recognized in order to stabilize
the system before the point of no return. This concept is illustrated in Figure 7-5. The "point of no return"
is the point beyond which damage or an accident may occur.
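Whether a monitor actually protects the system depends on detecting the imbalance early enough to stabilize the system before the point of no return. The following is a minimal sketch under hypothetical assumptions: a single parameter sampled at fixed intervals, an alert limit set inside the point-of-no-return limit, and a response time expressed as a number of samples. None of these names or values come from the handbook.

```python
from typing import Optional, Sequence


def first_exceedance(readings: Sequence[float], limit: float) -> Optional[int]:
    """Index of the first sample above `limit`, or None if the limit is never exceeded."""
    for index, value in enumerate(readings):
        if value > limit:
            return index
    return None


def recoverable(readings: Sequence[float], alert_limit: float,
                no_return_limit: float, response_samples: int) -> bool:
    """True if an alert at `alert_limit` leaves enough samples to stabilize the
    system before the monitored parameter passes the point of no return."""
    if alert_limit >= no_return_limit:
        raise ValueError("alert limit must lie inside the point-of-no-return limit")
    no_return = first_exceedance(readings, no_return_limit)
    if no_return is None:
        return True  # this trace never reaches the point of no return
    alert = first_exceedance(readings, alert_limit)
    # Any sample above the no-return limit is also above the alert limit.
    assert alert is not None
    return alert + response_samples < no_return


if __name__ == "__main__":
    trace = [1.0, 1.1, 1.3, 1.6, 2.0, 2.5]  # hypothetical drift away from balance
    print(recoverable(trace, alert_limit=1.2, no_return_limit=2.2, response_samples=2))
    print(recoverable(trace, alert_limit=1.2, no_return_limit=2.2, response_samples=3))
```

With a two-sample response the alert leaves margin; with a three-sample response the same monitor still detects the imbalance, but too late to stabilize the system.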
6 Redundancy management requirements were developed for initial Space Station designs.
7 System Safety Society, System Safety Analysis Handbook, 2nd Edition, 1997, pages 3-37 and 3-38.
Figure 7-5: Event flow (labels: System in Balance, Normal State; Initiator Event(s); System Becomes Unbalanced; Contingency Starts; Detection; Loss Control Starts; Point of No Return; Harm; System Down; Retest Satisfactorily)
Monitoring devices can be incorporated into the design to check that conditions do not reach dangerous
levels (or imbalance) to ensure that no contingency exists or is imminent. Monitors8 can be used to
indicate:
8 Ibid. Hammer, page 262.
analysis to determine the probability of an accident9. These concerns and objections are based on the following
reasons:
• A probability, such as reliability, guarantees nothing. Actually, a probability indicates
that a failure, error, or mishap is possible, even though it may occur rarely over a
period of time or during a considerable number of operations. Unfortunately, a
probability cannot indicate exactly when, during which operation, or to which person a
mishap will occur. It may occur during the first, last, or any intermediate operation in
a series. For example, a solid propellant rocket motor developed as the propulsion unit
for a missile had an overall reliability indicating that two motors of every 100,000
fired would probably fail. The first one tested blew up.
• Probabilities are projections determined from statistics obtained from past experience.
Although equipment to be used in actual operations may be exactly the same as the
equipment for which the statistics were obtained, the conditions under which it will be
operated may be different. In addition, variations in production, maintenance,
handling, and similar processes generally preclude two or more pieces of equipment
being exactly alike. There are numerous instances in which minor changes in methods
to produce a component with the same or improved design characteristics as previous
items have instead caused failures and accidents. If an accident has occurred,
correction of the cause by change in the design, material, code, procedures, or
production process may immediately nullify certain statistical data.
• Generalized probabilities do not serve well for specific, localized situations. In other
situations, data may be valid but only in special circumstances. Statistics derived
from military or commercial aviation sources may indicate that a specific number of
aircraft accidents due to bird strikes take place every 100,000 or million flight hours.
On a broad basis involving all aircraft flight time, the probability of a bird strike is
comparatively low. However, at certain airports near coastal areas where birds
abound, the probability of a bird-strike accident is much higher.
• Human error can have damaging effects even when equipment or system reliability has
not been lessened. A common example is the loaded rifle. It is highly reliable, but
people have been killed or wounded when cleaning or carrying loaded rifles.
• Probabilities are usually predicated on an infinite or large number of trials.
Probabilities, such as reliabilities for complex systems, are of necessity based upon
very small samples, and therefore have relatively low confidence levels.
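The rocket motor example can be made concrete. Assuming each firing is an independent trial with the same failure probability (the premise behind the quoted two-in-100,000 figure), the firing on which the first failure occurs follows a geometric distribution. This is a standard textbook model used here only for illustration, not an analysis from the handbook.

```python
def p_first_failure_on(k: int, p: float) -> float:
    """Probability that firings 1..k-1 succeed and firing k fails (geometric distribution)."""
    if k < 1:
        raise ValueError("k counts firings starting at 1")
    return (1.0 - p) ** (k - 1) * p


def p_any_failure_within(n: int, p: float) -> float:
    """Probability of at least one failure in n independent firings."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 2 / 100_000  # two failures per 100,000 firings
    print(f"first firing fails:              {p_first_failure_on(1, p):.1e}")
    print(f"at least one failure in 100,000: {p_any_failure_within(100_000, p):.3f}")
```

Any single firing, the first included, fails with the same probability p, so a failure on the first test is unlikely but entirely consistent with the prediction. Even over 100,000 firings the chance of at least one failure is only about 0.86, and the figure says nothing about which firing it will be.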
9 Ibid. Hammer, pages 91 and 92.
10 Allocco, Michael, Automation, System Risks and System Accidents, 18th International System Safety Society Conference.
Inappropriate interaction between the human and the machine can also introduce additional risks. The human
can become overloaded and stressed due to inappropriately displayed data, an inappropriate control input, or
similar erroneous interface. The operator may not fully understand the automation, due to its complexity.
It may not be possible to understand a particular system state. The human may not be able to determine if
the system is operating properly, or if malfunctions have occurred.
Imagine relying on an automated system that, because of a malfunction or inappropriate function, displays
artificial indications and communicates inappropriately. In this case the human may
react to an artificial situation. The condition can be compounded during an emergency, and the end result
can be catastrophic. Consider an automated reality that presents an artificial world to which the human reacts.
Should we trust what the machines tell us in all cases?
The integration parameters concerning acclimation further complicate the picture when evaluating
contingency, backup, damage control, or loss control. It is not easy to determine the System State; when
something goes wrong, reality can become artificial. The trust in the system can be questioned.
Determining what broke could be a big problem. When automation fails, the system could have a mind of
its own. The human may be forced to take back control of the malfunctioning system. To accomplish such
a contingency may require the system committee. These sorts of contingencies can be addressed within
appropriate system safety analysis.
11 Ibid. Raheja, page 262.
• It may not be possible to determine what went wrong, what failed, or what broke.
• The system does not have to break to contribute to the system accident.
• Planned functions can be contributory hazards.
• Software functions can be inadequate or inappropriate.
• A change in one part of the software is likely to affect system risk.
• A change in the application may change the risk.
• Software is not generic and is not necessarily reusable.
• The system can be “spoofed”.
• A single error can propagate throughout a complex system.
• Any software error, no matter how apparently inconsequential, can cause contributory
events. Consider process tools, automated calculations, automated design tools, and
safety systems.
• It is very hard to appropriately segregate safety-critical software in open, loosely
coupled systems.
• Combinations of contributory events can have catastrophic results.
Notwithstanding the many concerns and observations listed in these axioms, software-complex systems can be
successfully designed to achieve acceptable risk through appropriately integrated specialty
engineering programs that identify and eliminate or control system risks.
FAA System Safety Handbook, Chapter 8: Safety Analysis/Hazard Analysis Tasks
December 30, 2000
Chapter 8:
Safety Analysis: Hazard Analysis Tasks
Figure 8.1-1 is a top-level summary of a proactive SSP. Initial safety criteria are established by the
managing activity (MA) and incorporated in the Request for Proposal (RFP) and subsequent contract and
prime item specification. The vehicle used by the MA is a Preliminary Hazard List (PHL). Following
contract award, the first technical task of a contractor's system safety staff is the flowdown of safety
criteria to subsystem specifications and the translation of such criteria into a simplified form easily usable
by the detailed design staff. The detailed criteria are generated from a Requirements Hazard Analysis
using the PHL and Preliminary Hazard Analysis (PHA) as inputs along with requirements from standards,
regulations, or other appropriate sources. Safety design criteria to control safety critical software
commands and responses (e.g., inadvertent command, failure to command, untimely command or
responses, or MA designated undesired events) must be included so that appropriate action can be taken
to incorporate them in the software and hardware specifications. This analysis, in some cases, is
performed before contract award.
Figure 8.1-1: Top-level summary of a proactive SSP (surviving labels: Additional Design; Safety Reviews; Requirements)
Sources for detailed safety design criteria include Occupational Safety and Health Administration
(OSHA) standards, MIL-STD-454, Requirement 1, and MIL-STD-882. Design review is typically a
continual process using hazard analyses. Active participation at internal and customer design reviews is
also necessary to capture critical hazards and their characteristics. All major milestone design reviews
(reference FAA Order 1810.1F, paragraph 2-8) provide a formal opportunity for obtaining safety
information and precipitating active dialogue between the MA safety staff and the contractor's safety and
design engineering staff. All resulting action items should be documented with personnel responsibility
assignments and an action item closing date. No formal design review should be considered complete
until safety critical action items are closed out satisfactorily in the view of both the MA and the
contractor. That is, both must sign that the action has been satisfactorily closed out.
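The closure rule above lends itself to a simple record-keeping check. The sketch below is hypothetical: the field names and the two-signature model are illustrative choices rather than a format the handbook defines, and it is not a substitute for the Hazard Tracking/Risk Resolution system described in Chapter 4.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """One design review action item (hypothetical field names)."""
    description: str
    responsible: str                           # personnel responsibility assignment
    closing_date: date                         # agreed action item closing date
    safety_critical: bool
    ma_signoff: Optional[date] = None          # managing activity concurrence
    contractor_signoff: Optional[date] = None  # contractor concurrence

    def is_closed(self) -> bool:
        # Closed only when BOTH the MA and the contractor have signed.
        return self.ma_signoff is not None and self.contractor_signoff is not None


def review_complete(items: List[ActionItem]) -> bool:
    """A design review is complete only when every safety-critical item is closed."""
    return all(item.is_closed() for item in items if item.safety_critical)


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items whose closing date has passed."""
    return [item for item in items if not item.is_closed() and item.closing_date < today]


if __name__ == "__main__":
    item = ActionItem("Add interlock to access panel", "J. Doe", date(2000, 12, 1), True,
                      ma_signoff=date(2000, 11, 20))
    print(review_complete([item]))  # only the MA has signed at this point
    item.contractor_signoff = date(2000, 11, 22)
    print(review_complete([item]))  # both parties have now signed
```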
All critical hazards identified by either hazard analyses or other design review activities must be formally
documented. Notification of each should be provided to the appropriate contractor staff for corrective
action or control. The Hazard Tracking/Risk Resolution system in Chapter 4 of this handbook should be
used to track the status of each critical hazard.
8.2 Analysis
• Training
• Maintenance
• Operational and maintenance environments
• System/component disposal
1. Describe and bound the system in accordance with system description instructions in Chapter 3.
2. Perform functional analysis if appropriate to the system under study.
3. Develop a preliminary hazard list.
4. Identify contributory hazards, initiators, or any other causes.
5. Establish hazard control baseline by identifying existing controls when appropriate.
6. Determine potential outcomes, effects, or harm.
7. Perform a risk assessment of the severity of consequence and likelihood of occurrence.
8. Rank hazards according to risk.
9. Develop a set of recommendations and requirements to eliminate or control risks.
10. Provide managers, designers, test planners, and other affected decision makers with the
information and data needed to permit effective trade-offs.
11. Conduct hazard tracking and risk resolution of medium and high risks. Verify that
recommendations and requirements identified in Step 9 have been implemented.
12. Demonstrate compliance with given safety related technical specifications, operational
requirements, and design criteria.
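Steps 7, 8, and 11 can be sketched as a small ranking routine. This is a minimal illustration, not the handbook's risk matrix: the severity and likelihood labels, their ordinal scores, and the low/medium/high bands below are hypothetical placeholders, and a real analysis would substitute the definitions in Chapter 3.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ordinal scales (placeholders for the Chapter 3 definitions).
SEVERITY = {"minor": 1, "major": 2, "hazardous": 3, "catastrophic": 4}
LIKELIHOOD = {"extremely improbable": 1, "extremely remote": 2, "remote": 3, "probable": 4}


@dataclass
class Hazard:
    name: str
    severity: str
    likelihood: str

    def score(self) -> int:
        # Step 7: combine severity of consequence and likelihood of occurrence.
        return SEVERITY[self.severity] * LIKELIHOOD[self.likelihood]


def risk_level(hazard: Hazard) -> str:
    """Hypothetical banding of the ordinal score into low, medium, or high risk."""
    s = hazard.score()
    if s >= 9:
        return "high"
    if s >= 5:
        return "medium"
    return "low"


def rank(hazards: List[Hazard]) -> List[Hazard]:
    """Step 8: highest risk first; ties broken in favor of the more severe hazard."""
    return sorted(hazards, key=lambda h: (h.score(), SEVERITY[h.severity]), reverse=True)


def to_track(hazards: List[Hazard]) -> List[Hazard]:
    """Step 11: medium and high risks go to hazard tracking and risk resolution."""
    return [h for h in rank(hazards) if risk_level(h) != "low"]


if __name__ == "__main__":
    hazards = [
        Hazard("erroneous display data", "hazardous", "remote"),
        Hazard("hydraulic line chafing", "major", "remote"),
        Hazard("panel edge laceration", "minor", "probable"),
        Hazard("dual power loss", "catastrophic", "extremely improbable"),
    ]
    for h in rank(hazards):
        print(f"{risk_level(h):6} {h.score():2} {h.name}")
```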
Hazard identification
• Identification
• Evaluation
• Resolution
Timely solutions
Verification that safety requirements have been met or that risk is eliminated or controlled to an
acceptable level
Identification of a risk is the first step in the risk control process. Identifying a risk provides no assurance
that it will be eliminated or controlled. The risk must be documented, evaluated (likelihood and severity),
and when appropriate, highlighted to those with decision making authority.
Evaluation of risks requires determination of how frequently a risk occurs and how severe it could be if
an accident occurs as a result of the hazards. A severe risk that has a realistic possibility of occurring
requires action; one that has an extremely remote chance may not require action. Similarly, a non-critical
accident that has a realistic chance of occurring may not require further study. Frequency may be
characterized qualitatively by terms such as "frequent" or "rarely." It may also be measured
quantitatively such as by a probability (e.g., one in a million flight hours). In summary, the evaluation
step prioritizes and focuses the system safety activity and maximizes the return-on-investment for safety
expenditures.
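The link between quantitative and qualitative frequency terms can be expressed as a lookup. In this sketch the per-flight-hour boundaries and labels are illustrative placeholders chosen only to show the mechanics; the definitions that actually apply are those in Chapter 3.

```python
# Illustrative boundaries only: (exclusive lower bound per flight hour, qualitative term),
# ordered from most to least frequent. Substitute the Chapter 3 definitions in practice.
LIKELIHOOD_BINS = [
    (1e-5, "probable"),
    (1e-7, "remote"),
    (1e-9, "extremely remote"),
]


def qualitative_likelihood(p_per_flight_hour: float) -> str:
    """Translate a quantitative probability into a qualitative frequency term."""
    if not 0.0 <= p_per_flight_hour <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for lower_bound, term in LIKELIHOOD_BINS:
        if p_per_flight_hour > lower_bound:
            return term
    return "extremely improbable"


if __name__ == "__main__":
    # "One in a million flight hours" from the text above.
    print(qualitative_likelihood(1e-6))
```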
The timing of safety analysis and resulting corrective action is critical to minimize the impact on cost and
schedule. The later in the life cycle of the equipment that safety modifications are incorporated, the
higher the impact on cost and schedule. The analysis staff should work closely with the designers to feed
their recommendations or, at a minimum, objections back to the designers as soon as they are identified.
A safe design is the end product, not a hazard analysis. By working closely with the design team, hazards
can be eliminated or controlled in the most efficient manner. A less efficient alternative is for the safety
engineer to work alone, performing an independent safety analysis and formally reporting the results.
This approach has several disadvantages.
Significant risks will be corrected later than they would be if the design engineer were alerted to the problem
shortly after the safety engineer detected it. The result is a more costly fix, program resistance
to change, and potentially a less effective control. Working in a vacuum, the safety engineer may also
overstate the severity of a published risk, or the risk may already have been overcome by subsequent design
evolution.
Once the risks have been analyzed and evaluated, the remaining task of safety engineering is to follow the
development and verify that the agreed-upon safety requirements are met by the design or that the risks
are controlled to an acceptable level.
Two reliability analyses (one a subset of the other) are often compared to hazard analyses. Performance
of a Failure Modes and Effects Analysis (FMEA) is the first step in generating the Failure Modes, Effects,
and Criticality Analysis (FMECA). Both types of analyses can serve as a final product depending on the
situation. An FMECA is generated from an FMEA by adding a criticality figure of merit. These analyses
are performed to provide reliability and supportability information.
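The handbook does not specify the criticality figure of merit. One widely used form, from MIL-STD-1629A (named here as an outside reference, not as this handbook's method), is the failure mode criticality number C_m = β α λ_p t, where β is the conditional probability that the failure mode produces the assessed effect, α is the failure mode ratio, λ_p is the part failure rate, and t is the operating time. Summing C_m over an item's failure modes within one severity classification gives the item criticality number C_r. The item values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FailureMode:
    name: str
    alpha: float  # failure mode ratio: share of the item's failures that take this mode
    beta: float   # conditional probability that this mode produces the assessed effect


def mode_criticality(mode: FailureMode, lambda_p: float, t: float) -> float:
    """C_m = beta * alpha * lambda_p * t."""
    return mode.beta * mode.alpha * lambda_p * t


def item_criticality(modes: Iterable[FailureMode], lambda_p: float, t: float) -> float:
    """C_r: sum of C_m over the item's modes within one severity classification."""
    return sum(mode_criticality(m, lambda_p, t) for m in modes)


if __name__ == "__main__":
    # Hypothetical item: failure rate 5e-6 per hour, 10-hour operating time,
    # with both modes assumed to fall in the same severity classification.
    modes = [FailureMode("open", alpha=0.6, beta=1.0),
             FailureMode("short", alpha=0.4, beta=0.5)]
    assert abs(sum(m.alpha for m in modes) - 1.0) < 1e-9  # mode ratios should total 1
    for m in modes:
        print(m.name, mode_criticality(m, 5e-6, 10.0))
    print("item", item_criticality(modes, 5e-6, 10.0))
```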
A hazard analysis uses a top-down methodology that first identifies risks and then isolates all possible (or
probable) causes. For an operational system, it is performed for specific suspect hazards. In the case of
the hazard analysis, failures, operating procedures, human factors, and transient conditions are included in
the list of hazard causes.
The FMECA is limited even further in that it only considers hardware failures. It may be performed
either top-down or bottom-up, usually the latter. It is generated by asking questions such as "If this fails,
what is the impact on the system? Can I detect it? Will it cause anything else to fail?" If so, the induced
failure is called a secondary failure.
Reliability predictions establish either a failure rate for an assembly (or component) or a probability of
failure. This quantitative data, at both the component and assembly level, is a major source of data for
quantitative reliability analysis. Understanding how this data was generated is necessary to use it correctly. In summary,
however, hazard analyses are first performed in a qualitative manner identifying risks, their causes, and
the significance of hazards associated with the risk.
8.2.4 What General Procedures Should Be Followed in the Performance of a Hazard Analysis?
Establish safety requirements baseline and applicable history (i.e., system restraints):
Identify risks for each contributory factor (e.g., risks caused by the maintenance environment and the
interface hazards). An example would be performing maintenance tasks incompatible with gloves in a
very cold environment.
Assign severity categories and determine probability levels. Risk probability levels may be assigned either
qualitatively or quantitatively. Risk severity is determined through hazard analysis. It reflects, using a
qualitative measure, the worst credible accident that may result from the risk. Severity categories
range from death to negligible effect on personnel and equipment. Evaluating the safety of the system or the
risk of the hazard(s) quantitatively requires the development of a probability model and the use of
Boolean algebra. The latter is used to identify possible states or conditions (and combinations thereof)
that may result in accidents. The model is used to quantify the likelihood of those conditions occurring.
Develop corrective actions for critical risks. This may take the form of design or procedural changes.
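The probability model can be sketched with the two basic Boolean gates of a fault tree. This minimal example assumes the basic events are statistically independent, an assumption that common-cause events can violate; the event names and probabilities are hypothetical.

```python
from functools import reduce
from typing import Iterable


def and_gate(probs: Iterable[float]) -> float:
    """Output occurs only if every independent input occurs: product of probabilities."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def or_gate(probs: Iterable[float]) -> float:
    """Output occurs if any independent input occurs: 1 - product of (1 - p)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)


if __name__ == "__main__":
    # Hypothetical top event: (pump fails OR valve sticks) AND the warning fails.
    p_pump, p_valve, p_warning = 1e-3, 2e-3, 1e-2
    p_loss_of_flow = or_gate([p_pump, p_valve])
    p_top = and_gate([p_loss_of_flow, p_warning])
    print(f"loss of flow: {p_loss_of_flow:.6f}")
    print(f"top event:    {p_top:.3e}")
```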
The objective of a qualitative analysis is similar to that of a quantitative one; its method is
simply less precise. That is, in a qualitative analysis, a risk probability is described in accordance with the
likelihood criteria discussed in Chapter 3.
Qualitative analysis verifies the proper interpretation and application of the safety design criteria
established by the preliminary hazard study. It also verifies that the system will operate within the safety
goals and parameters established by the Operational Safety Assessment (OSA). It ensures that the search
for design weaknesses is approached in a methodical, focused way.
In a quantitative analysis, the risk probability is expressed using a number or rate. The objective is to
achieve maximum safety by minimizing, eliminating, or establishing control over significant risks.
Significant risks are identified through engineering estimations, experience, and documented history of
similar equipment.
A probability is the expectation that an event will occur a certain number of times in a specific number of
trials. Actuarial methods employed by insurance companies are a familiar example of the use of
probabilities for predicting future occurrences based on past experiences. Reliability engineering uses
similar techniques to predict the likelihood (probability) that a system will operate successfully for a
specified mission time. Reliability is the probability of success. It is calculated from the probability of
failure, in turn calculated from failure rates (failures/unit of time) of hardware (electronic or mechanical).
An estimate of the system failure probability or unreliability can be obtained from reliability data using
the formula (which assumes a constant failure rate):
$$P = 1 - e^{-\lambda t}$$
where P is the probability of failure, e is the base of the natural logarithm, λ is the failure rate in failures per hour,
and t is the number of hours operated.
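The formula is straightforward to evaluate. This sketch assumes a constant failure rate, which is what the exponential form implies; the rate and operating time are hypothetical. It uses Python's math.expm1 to keep precision when λt is very small, where P is approximately equal to λt.

```python
import math


def failure_probability(failure_rate_per_hour: float, hours: float) -> float:
    """P = 1 - e^(-lambda * t) for a constant failure rate lambda."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and operating time must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation when x is tiny.
    return -math.expm1(-failure_rate_per_hour * hours)


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of success over the same period: R = e^(-lambda * t) = 1 - P."""
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    lam, t = 1e-5, 10.0  # hypothetical: 1e-5 failures per hour, 10-hour operation
    p = failure_probability(lam, t)
    print(f"P = {p:.6e}  (lambda * t = {lam * t:.1e})")
    print(f"R = {reliability(lam, t):.8f}")
```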
However, system safety analyses predict the probability of a broader definition of failure than does
reliability. This definition includes:
It is important to note that the likelihood of damage or injury reflects a broader range of events or
possibilities than reliability. Many situations exist in which equipment can fail and no damage or injury
occurs because systems can be designed to fail safe. Conversely, many situations exist in which
personnel are injured using equipment that functioned reliably (the way it was designed) but at the wrong
time because of an unsafe design or procedure. A simple example is an electrical shock received by a
repair technician working in an area where power has not failed.
• A probability indicates that a failure, error, or accident is possible even though it may occur rarely
over a period of time or during a considerable number of operations. A probability cannot indicate
exactly when, during which operation, or to which person an accident will occur. It may occur during
the first, last, or any intermediate operation in a series without altering the analysis results. Consider
an example in which the likelihood of an aircraft engine failing is accurately predicted to be one in
100,000. The first engine fails the first time it is tried. One might expect the probability of the
second one failing to be less. But, because these are independent events, the probability of the second
one failing is still one in 100,000. The classic example demonstrating this principle is that of flipping a coin.
The probability of it landing "heads-up" is 1 chance in 2 or 0.5. This is true every time the coin is
flipped even if the last 10 trials experienced a "heads-up" result. Message: Do not change the
prediction to match limited data.
• Probabilities are statistical projections that can be based upon specific past experience. Even if
equipment is expected to perform the same operations as those used in the historical data source, the
circumstances under which it will be operated can be expected to be different. Additional variations
in production, maintenance, handling, and similar processes generally preclude two or more pieces of
equipment being exactly alike. Minor changes in equipment have been known to cause failures and
accidents when the item was used. If an accident or failure occurs, correcting it by changing the
design, material, procedures, or production process immediately nullifies certain portions of the data.
Message: Consider the statistical nature of probabilities when formulating a conclusion.
• Sometimes data are valid only in special circumstances. For instance, a statistical source may
indicate that a specific number of aircraft accidents due to birdstrikes take place every 100,000 or
million hours. One may conclude from this data that the probability of a birdstrike is comparatively
low. Hidden by the data analysis approach is the fact that at certain airfields, such as Boston, the
Midway Islands, and other coastal and insular areas where birds abound, the probability of a
birdstrike accident is much higher than the average. This example demonstrates that generalized
probabilities will not serve well for specific, localized areas. This applies to other environmental
hazards such as lightning, fog, rain, snow, and hurricanes. Message: Look for important variables
that may affect conclusions based on statistics.
• Reliability predictions are based upon equipment being operated within prescribed parameters over a
specific period of time. When the equipment's environment or operational profile exceeds those
design limits, the prediction is invalid. Safety analyses based on this data that attempt to
predict safety performance under abnormal and/or emergency conditions may also be invalid.
Reliability predictions do not extend to performance of components or subassemblies following a
failure. That is, the failure rate or characteristics of failed units or assemblies are not accounted for in
reliability generated predictions. Design deficiencies are not accounted for in reliability predictions.
For example, a reliability prediction accounts for the failure rate of components, not the validity of
the logic. Message: Be clear on what conditions the probabilities used in the risk analysis represent.
• Human error can have damaging effects even when equipment reliability is high. For example, a
loaded rifle is highly reliable, yet many people have been killed or wounded when cleaning, carrying,
or playing with loaded guns. Message: Consider the impact of human error on accident probability
estimations.
• The confidence in a probability prediction, as in any statistic, is based on the sample size of the
source data. Predictions based on small sample sizes have a low confidence level; those based on a
large sample size provide a high degree of confidence. Message: Understand the source of prediction
data. Consider the confidence level of the data.
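The effect of sample size on confidence can be shown with an interval estimate for a failure proportion. The sketch below uses the Wilson score interval, one standard choice among several (the handbook does not prescribe a method); the failure counts are hypothetical and give the same point estimate at two sample sizes.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% confidence at z = 1.96)."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p_hat = failures / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    for failures, trials in [(1, 10), (100, 1000)]:  # same 10% point estimate
        low, high = wilson_interval(failures, trials)
        print(f"{failures}/{trials}: {low:.3f} to {high:.3f}")
```

Both samples suggest a 10 percent failure proportion, but the ten-trial sample is consistent with anything from roughly 2 to 40 percent, while the thousand-trial sample narrows it to roughly 8 to 12 percent.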
When the limitations are understood, the use of probabilities permits a more precise risk analysis than the
qualitative approach. Calculated hazard risks can be compared to acceptable thresholds to determine
when redesign is necessary. They permit the comparison of alternate design approaches during trade studies,
leading to more thorough evaluations. Performing quantitative analyses requires more work than
qualitative analyses and therefore costs more. If the limitations of the numbers used are not clearly stated
and understood, the wrong conclusion may be reached. When care is taken, a quantitative analysis can be
significantly more useful than a qualitative one.
The design and pre-design system safety engineering activities are listed below:
The completion of these activities represents the bulk of the SSP. The output and the effects of
implementing the activities are the safety program. Review of the documented analyses provides the MA
and integrator visibility into the effectiveness and quality of the safety program. It is recommended that
these analyses be documented in a format compatible with an efficient review.
• Inclusion of a "road map" to show the sequence of tasks performed during the analysis.
• Presentation style, which may be in contractor format, consistent with the logic of the
analysis procedure.
• All primary (critical) hazards and risks listed in an unambiguous manner.
• All recommended hazard controls and corrective actions detailed.
Questions that the reviewer should ask as the analyses are reviewed include the following:
• Do the contributory hazards listed include those that have been identified in accidents of
similar systems?
• Are the recommended hazard controls and corrective actions realistic and sufficient?
• Are the recommended actions fed back into the line management system in a positive way
that can be tracked?
Figure 8-2 illustrates the interrelationship of these tasks and their relationship to the design and
contractual process.
Figure 8-2: Interrelationship of the hazard analysis tasks (surviving labels: PHA; SSHA; O&SHA; Pre-Contract; Contract)
The PHL lists hazards that may require special safety design emphasis and hazardous areas where in-depth
analyses need to be done. One use of the PHL is to provide inputs for determining the scope of follow-on
hazard analyses (e.g., PHA, SSHA). The PHL may be documented using a table-type format.
The PHA effort should begin during the earliest practical phase and be updated in each subsequent
phase. Typically, it is first performed during the conceptual phase but, when applicable, may be
performed on an operational system. Performing a PHA early in the life cycle of a system provides
important inputs to tradeoff studies in the early phases of system development. In the case of an
operational system, it aids in an early determination of the state of safety. The output of the PHA may be
used in developing system safety requirements and in preparing performance and design specifications.
In addition, the PHA is the basic hazard analysis that establishes the framework for other hazard analyses
that may be performed.
A PHA must include, but not be limited to, the following information:
• As complete a description as possible of the system or systems being analyzed, how it will be
used, and interfaces with existing system(s). If an OED was performed during pre-
development, this can form the basis for a system description.
• A review of pertinent historical safety experience (lessons learned on similar systems)
• A categorized listing of basic energy sources
• An investigation of the various energy sources to determine the provisions that have been
developed for their control
• Identification of the safety requirements and other regulations pertaining to personnel safety,
environmental hazards, and toxic substances with which the system must comply.
• Recommendation of corrective actions.
Since the PHA should be initiated very early in the planning phase, the data available to the analyst may
be incomplete and informal. Therefore, the analysis should be structured to permit continual revision and
updating as the conceptual approach is modified and refined. As soon as the subsystem design details are
complete enough to allow the analyst to begin the subsystem hazard analysis in detail, the PHA can be
terminated. The PHA may be documented in any manner that renders the information above clear and
understandable to the non-safety community. A tabular format is usually used.
• Design sketches, drawings, and data describing the system and subsystem elements for the
various conceptual approaches under consideration
• Functional flow diagrams and related data describing the proposed sequence of activities,
functions, and operations involving the system elements during the contemplated life span
• Background information related to safety requirements associated with the contemplated
testing, manufacturing, storage, repair, and use locations and safety-related experiences of
similar previous programs or activities.
At a minimum, the PHA must consider the following for the identification and evaluation of hazards:
• If available, operating, test, maintenance, and emergency procedures (e.g., human factors
engineering, human error analysis of operator functions, tasks, and requirements; effect of
factors such as equipment layout, lighting requirements, potential exposures to toxic
materials, effects of noise or radiation on human performance; life support requirements and
their safety implications in manned systems, crash safety, egress, rescue, survival, and
salvage).
• If available, facilities, support equipment (e.g., provisions for storage, assembly, checkout,
proof testing of hazardous systems/assemblies that may involve toxic, flammable, explosive,
corrosive, or cryogenic materials; radiation or noise emitters; electrical power sources), and
training (e.g., training and certification pertaining to safety operations and maintenance).
• Safety-related equipment, safeguards, and possible alternate approaches (e.g., interlocks,
system redundancy, hardware or software fail-safe design considerations, subsystem
protection, fire detection and suppression systems, personal protective equipment, industrial
ventilation, and noise or radiation barriers).
During the Demonstration and Evaluation and/or Full-Scale Development phases, the developer should
analyze the system along with hardware/software design and requirements documents to refine the
identification of hazards associated with the following, for resolution:
• The control of the system
• Safety-critical data generated or controlled by the system
• Safety-critical non-control functions performed by the system
• Unsafe operating modes.
The requirements hazard analysis is substantially complete by the time the allocated baseline is defined.
The requirements are developed to address hazards, both specific and nonspecific, in hardware and
software.
The requirements hazard analysis may use the PHL and the PHA as a basis, if available. The analysis
relates the hazards identified to the system design and identifies or develops design requirements to
eliminate or reduce the risk of the identified hazards to an acceptable level. The requirements hazard
analysis is also used to incorporate design requirements that are safety related but not tied to a specific
hazard. This analysis includes the following:
Determination of applicable generic system safety design requirements and guidelines for both hardware
and software from applicable military specifications, Government standards, and other documents for the
system under development. Incorporate these requirements and guidelines into the high-level system
specifications and design documents, as appropriate.
Analysis of the system design requirements, system/segment specifications, preliminary hardware
configuration item development specifications, software requirements specifications, and the interface
requirements specifications, as appropriate, including the following sub-activities:
• Develop, refine, and specify system safety design requirements and guidelines;
translate into system, hardware, and software requirements and guidelines, where
appropriate; implement in the design and development of the system hardware and
associated software.
• Identify hazards and relate them to the specifications or documents above and
develop design requirements to reduce the risk of those hazards.
• Analyze the preliminary system design to identify potential hardware/software
interfaces at a gross level that may cause or contribute to potential hazards.
Interfaces to be identified include control functions, monitoring functions, safety
systems, and functions that may have indirect impact on safety.
• Perform a preliminary risk assessment on the identified safety-critical software
functions using the hazard risk matrix or software hazard risk matrix of Chapter 10 or
another process as mutually agreed to by the contractor and the MA.
• Ensure that system safety design requirements are properly incorporated into the
operator, user, and diagnostic manuals.
• Develop safety-related design change recommendations and testing requirements and
incorporate them into preliminary design documents and the hardware, software, and system
test plans. The following subactivities should be accomplished:
• Develop safety-related change recommendations to the design and specification
documents listed above and include a means of verification for each design
requirement.
• Develop testing requirements. The contractor may develop safety-related test
requirements for incorporation into the hardware, software, and system integration
test documents.
• Support the system requirements review, system design review, and software specification
review from a system safety viewpoint. Address the system safety program, analyses
performed and to be performed, significant hazards identified, hazard resolutions or proposed
resolutions, and means of verification.
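The sub-activities above amount to keeping two links intact: every identified hazard should be addressed by at least one design requirement, and every safety-related design requirement should carry a means of verification. A minimal traceability check along those lines is sketched below in Python. The record layout (`SafetyRequirement` with `hazard_ids` and `verification` fields) and the hazard and requirement IDs are hypothetical, not a format prescribed by this handbook; a requirement with no hazard IDs stands for a generic safety requirement not tied to a specific hazard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SafetyRequirement:
    """Hypothetical record for one safety-related design requirement."""
    req_id: str
    text: str
    # Hazards this requirement addresses; empty for generic requirements
    # (e.g., from standards) that are not tied to a specific hazard.
    hazard_ids: list = field(default_factory=list)
    # Means of verification (e.g., "test", "analysis", "inspection"); None if not yet assigned.
    verification: Optional[str] = None


def audit_traceability(requirements, hazard_ids):
    """Return (requirements lacking a means of verification, hazards no requirement addresses)."""
    unverified = [r.req_id for r in requirements if not r.verification]
    addressed = {h for r in requirements for h in r.hazard_ids}
    unaddressed = sorted(set(hazard_ids) - addressed)
    return unverified, unaddressed


if __name__ == "__main__":
    reqs = [
        SafetyRequirement("SR-1", "Interlock the access panel with the power supply", ["H-1"], "test"),
        SafetyRequirement("SR-2", "Apply generic EMI design guidelines"),
    ]
    print(audit_traceability(reqs, ["H-1", "H-2"]))  # (['SR-2'], ['H-2'])
```

Either list being non-empty is a prompt for the analyst, not a verdict: a generic requirement may legitimately await its verification method until test plans mature.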
For work performed under contract, details to be specified in the SOW shall include, as applicable:
• Definition of acceptable level of risk within the context of the system, subsystem, or
component under analysis
• Level of contractor support required for design reviews
• Specification of the type of risk assessment process.
As soon as subsystems are designed in sufficient detail, or well into concept design for facilities
acquisition, the SSHA can begin. Design changes to components also need to be evaluated to determine
whether the safety of the system is affected. The techniques used for this analysis must be carefully
selected to minimize problems in integrating subsystem hazard analyses into the system hazard analysis.
The SSHA may be documented in text, in tabular format, or in a combination of the two.
A contractor may perform and document a subsystem hazard analysis to identify all components and
equipment, including software, whose performance, performance degradation, functional failure, or
inadvertent functioning could result in a hazard or whose design does not satisfy contractual safety
requirements. The analysis may include:
• A determination of the hazards or risks, including reasonable human errors as well as single
and multiple failures.
• A determination of the potential effect of software events, faults, and occurrences (such as
improper timing), including those in software developed by other contractors, on the safety of
the subsystem
• A determination that the safety design criteria in the software specification(s) have been
satisfied
• A determination that the method of implementation of software design requirements and
corrective actions has not impaired or decreased the safety of the subsystem and has not
introduced any new hazards.
If no specific analysis techniques are directed, the contractor may obtain MA approval of technique(s) to
be used prior to performing the analysis. When software to be used in conjunction with the subsystem is
being developed under standards, the contractor performing the SSHA will monitor, obtain, and use the
output of each phase of the formal software development process in evaluating the software contribution
to the SSHA (See Chapter 10 for discussion of standards commonly used). Problems identified that
require the response of the software developer shall be reported to the MA in time to support the ongoing
phase of the software development process. The contractor must update the SSHA when needed as a
result of any system design changes, including software changes that affect system safety.
For work performed under contract, details to be specified in the SOW shall include, as applicable:
A contractor may perform and document an SHA to identify hazards and assess the risk of the total
system design, including software, and specifically the subsystem interfaces. This analysis must include a
review of subsystem interrelationships for:
If no specific analysis techniques are directed, the contractor may obtain MA approval of technique(s) to
be used prior to performing the analysis. The SHA may be performed using similar techniques to those
used for the SSHA. When software to be used in conjunction with the system is being developed under
software standards, the contractor performing the SHA should be required to monitor, obtain, and use the
output of each phase of the formal software development process in evaluating the software contribution
to safety. (See Chapter 10, Software Safety Process) Problems identified that require the response of the
software developer should be reported to the MA in time to support the ongoing phase of the software
development process. A contractor should also be required to update the SHA when needed as a result of
any system design changes, including software, which affect system safety. In this way, the MA is kept
up to date about the safety impact of the design evolution and is in a position to direct changes.
When work is performed under contract, details to be specified in the SOW shall include, as applicable:
[Figure: O&SHA inputs, including test plans and procedures, all planned testing, prime equipment and installation design, design documentation, maintenance and training, emergency actions, maintenance and other procedures, other hazard analyses, storage, test equipment, and training design]
The O&SHA effort should start early enough to provide inputs to the design, system test, and operation.
This analysis is most effective as a continuing closed-loop iterative process, whereby proposed changes,
additions, and formulation of functional activities are evaluated for safety considerations prior to formal
acceptance. The analyst performing the O&SHA should have available:
• Effects of off-the-shelf hardware and software across the interface with other system
components or subsystems.
Timely application of the O&SHA will provide design guidance. The findings and recommendations
resulting from the O&SHA may affect the diverse functional responsibilities associated with a given
program. Therefore, it is important that the analysis results are properly distributed for the effective
accomplishment of the O&SHA objectives. The techniques used to perform this analysis must be
carefully selected to minimize problems in integrating O&SHAs with other hazard analyses. The
O&SHA may be documented in any format that provides clear and concise information to the non-safety
community.
A contractor may perform and document an O&SHA to examine procedurally controlled activities. The
O&SHA identifies and evaluates hazards resulting from the implementation of operations or tasks
performed by persons, considering the following:
The O&SHA must identify the safety requirements or alternatives needed to eliminate identified hazards,
or to reduce the associated risk to a level that is acceptable under either regulatory or contractually
specified criteria. The analysis may identify the following:
• Activities that occur under hazardous conditions, their time periods, and the actions required
to minimize risk during these activities/time periods
• Changes needed in functional or design requirements for system hardware/software, facilities,
tooling, or support/test equipment to eliminate hazards or reduce associated risks
• Requirements for safety devices and equipment, including personnel safety and life support
equipment
• Warnings, cautions, and special emergency procedures (e.g., egress, rescue, escape),
including those necessitated by failure of a software-controlled operation to produce the
expected and required safe result or indication
• Requirements for handling, storage, transportation, maintenance, and disposal of hazardous
materials
• Requirements for safety training and personnel certification.
The O&SHA documents system safety assessment of procedures involved in system production,
deployment, installation, assembly, test, operation, maintenance, servicing, transportation, storage,
modification, and disposal. A contractor must update the O&SHA when needed as a result of any system
design or operational changes. If no specific analysis techniques are directed, the contractor should
obtain MA approval of technique(s) to be used prior to performing the analysis.
For work performed under contract, details to be specified in the SOW shall include, as applicable:
The first step of the HHA is to identify and determine quantities of potentially hazardous materials or
physical agents (noise, radiation, heat stress, cold stress) involved with the system and its logistical
support. The next step is to analyze how these materials or physical agents are used in the system and for
its logistical support. Based on the use, quantity, and type of substance/agent, estimate where and how
personnel exposures may occur and, if possible, the degree or frequency of exposure. The final step is to
incorporate cost-effective controls into the design of the system and its logistical support
equipment/facilities to reduce exposures to acceptable levels. The life-cycle costs of required controls
could be high, and consideration of alternative systems may be appropriate.
An HHA evaluates the hazards and costs due to system component materials, evaluates alternative
materials, and recommends materials that reduce the associated risks and life-cycle costs. Materials are
evaluated if (because of their physical, chemical, or biological characteristics; quantity; or concentrations)
they cause or contribute to adverse effects in organisms or offspring, pose a substantial present or future
danger to the environment, or result in damage to or loss of equipment or property during the system's life
cycle.
• Chemical hazards - Hazardous materials that are flammable, corrosive, toxic, carcinogens or
suspected carcinogens, systemic poisons, asphyxiants, or respiratory irritants
• Physical hazards (e.g., noise, heat, cold, ionizing and non-ionizing radiation)
• Biological hazards (e.g., bacteria, fungi)
• Ergonomic hazards (e.g., lifting, task saturation)
• Other hazardous materials that may be introduced by the system during manufacture,
operation, or maintenance.
• System, facility, and personnel protective equipment requirements (e.g., ventilation, noise
attenuation, radiation barriers) to allow safe operation and maintenance. When feasible
engineering designs are not available to reduce hazards to acceptable levels, alternative
protective measures must be specified (e.g., protective clothing, operation or maintenance
procedures to reduce risk to an acceptable level).
• Potential material substitutions and projected disposal issues. The HHA discusses long-term
effects such as the cost of using alternative materials over the life cycle or the capability and
cost of disposing of a substance.
• Hazardous material data. The HHA describes the means for identifying and tracking
information for each hazardous material. Specific categories of health hazards and impacts
that may be considered are acute health, chronic health, cancer, contact, flammability,
reactivity, and environment.
• Identification of the hazardous materials by name(s) and stock numbers (or CAS numbers);
the affected system components and processes; the quantities, characteristics, and
concentrations of the materials in the system; and source documents relating to the materials
• Determination of the conditions under which the hazardous materials can release or emit
components in a form that may be inhaled, ingested, absorbed by living beings, or leached
into the environment
• Characterization of material hazards and determination of reference quantities and hazard
ratings for system materials in question
• Estimation of the expected usage rate of each hazardous material for each process or
component for the system and program-wide impact
• Recommendations for the disposition of each hazardous material identified. If a reference
quantity is exceeded by the estimated usage rate, material substitution or altered processes
may be considered to reduce risks associated with the material hazards while evaluating the
impact on program costs.
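The disposition rule in the last bullet (compare each material's estimated usage rate with its reference quantity) can be screened mechanically before the cost trade is made. The Python sketch below assumes usage rates and reference quantities are already expressed in the same units (for example, kilograms per year) and uses hypothetical material names; materials with no reference quantity are reported separately because they still need to be characterized.

```python
def screen_materials(usage_rates, reference_quantities):
    """Split materials into (usage exceeds reference quantity, no reference quantity yet).

    Both arguments map material name -> quantity, in the same (assumed) units.
    """
    exceeded, uncharacterized = [], []
    for material, rate in sorted(usage_rates.items()):
        reference = reference_quantities.get(material)
        if reference is None:
            # Hazard rating and reference quantity still to be determined.
            uncharacterized.append(material)
        elif rate > reference:
            # Candidate for material substitution or an altered process.
            exceeded.append(material)
    return exceeded, uncharacterized


if __name__ == "__main__":
    usage = {"solvent_a": 120.0, "sealant_b": 5.0, "coolant_c": 40.0}
    refs = {"solvent_a": 100.0, "sealant_b": 10.0}
    print(screen_materials(usage, refs))  # (['solvent_a'], ['coolant_c'])
```

A flagged material is not automatically rejected; as the text notes, substitution or process changes are weighed against their impact on program costs.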
For each proposed and alternative material, the assessment must provide the following data for
management review:
• Material identification. Includes material identity, common or trade names, chemical name,
chemical abstract service (CAS) number, national stock number (NSN), local stock number,
physical state, and manufacturers and suppliers
• Material use and quantity. Includes component name, description, operations details, total
system and life cycle quantities to be used, and concentrations of any mixtures
• Hazard identification. Identifies the adverse effects of the material on personnel, the system,
environment, or facilities
• Toxicity assessment. Describes expected frequency, duration, and amount of exposure.
References for the assessment must be provided
For work performed under contract, details to be specified in the SOW include:
• The identification of actual hazards and risks. Hazards may occur from either simultaneous
or sequential failures and from "outside" influences, such as environmental factors or
operator errors.
• An assessment of each identified risk. A realistic assessment considers the risk severity (i.e.,
what is the worst that can happen?) and the potential frequency of occurrence (i.e., how often
can the accident occur?). Risk as a function of expected loss is determined by the severity of
loss and how often the loss occurs. Some hazards are present all of the time, or most of the
time, but do not cause losses.
• Recommendations for resolution of the risk (i.e., what should we do about it?). Possible
solutions mapped into the safety precedence of Chapter 4 are shown in Figure 8-4.
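The expected-loss view in the second bullet multiplies severity, expressed as a loss per accident, by how often the accident is expected to occur. A minimal Python sketch follows; the loss figures, frequencies, and hazard names are invented for illustration, and this is not the hazard risk matrix referenced elsewhere in the handbook.

```python
from dataclasses import dataclass


@dataclass
class HazardEstimate:
    name: str
    loss_per_accident: float   # severity expressed as a loss (illustrative dollar units)
    accidents_per_year: float  # estimated frequency of occurrence

    @property
    def expected_annual_loss(self) -> float:
        # Expected loss = severity of loss x frequency of loss.
        return self.loss_per_accident * self.accidents_per_year


def rank_by_expected_loss(hazards):
    """Order hazards from largest to smallest expected annual loss."""
    return sorted(hazards, key=lambda h: h.expected_annual_loss, reverse=True)


if __name__ == "__main__":
    hazards = [
        HazardEstimate("hull loss", 50_000_000.0, 0.0001),
        HazardEstimate("ramp vehicle damage", 20_000.0, 2.0),
    ]
    for h in rank_by_expected_loss(hazards):
        print(h.name, h.expected_annual_loss)
```

The example also shows the limit of a single number: the rare hull loss ranks below frequent minor damage even though its severity is far greater, which is one reason severity and frequency are normally also judged separately.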
should be submitted prior to the preliminary design review. The instructions for a system request for
proposal (RFP) with critical safety characteristics should include the requirements to submit a draft PHA
with the proposal. This initial PHA provides a basis for evaluating the bidder's understanding of the
safety issues. As detailed design specifications emerge, the PHA must be revised. The
System Hazard Analysis and Subsystem Hazard Analyses (SHA and SSHA) are typically submitted prior
to a Critical Design Review (CDR) or other similar review. They cannot be completed until the design is
finalized at completion of the CDR. Finally, operating and support hazard analyses (O&SHA) are
typically submitted after operating, servicing, maintenance, and overhaul procedures are written and
before initial system operation.
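The typical submission points above can be written down as a simple schedule check. In the Python sketch below, the milestone dates and the mapping of each analysis to the milestone it should precede are illustrative assumptions drawn from this paragraph; an actual program would take both from its contract and master schedule. Because the SHA and SSHA cannot be finished until the CDR is complete, the check applies to the pre-CDR submittal.

```python
from datetime import date

# Milestone each submittal should precede (typical timing only; the contract governs).
DUE_BEFORE = {
    "PHA": "PDR",
    "SSHA": "CDR",  # pre-CDR submittal; completion follows the CDR
    "SHA": "CDR",
    "O&SHA": "initial operation",
}


def late_submittals(submitted, milestones):
    """Return analyses whose submittal date is not before the milestone they should precede."""
    late = []
    for analysis, when in sorted(submitted.items()):
        milestone_date = milestones[DUE_BEFORE[analysis]]
        if when >= milestone_date:
            late.append(analysis)
    return late


if __name__ == "__main__":
    milestones = {
        "PDR": date(2001, 3, 1),
        "CDR": date(2001, 9, 1),
        "initial operation": date(2002, 6, 1),
    }
    submitted = {"PHA": date(2001, 2, 1), "SSHA": date(2001, 9, 15), "SHA": date(2001, 8, 1)}
    print(late_submittals(submitted, milestones))  # ['SSHA']
```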
Analyses must be done in time to be beneficial. Determining that the timing was too late and rejecting the
analysis for this reason provides little benefit. For example, if an SHA is performed near the end of the
design cycle, it provides little benefit. The time to prevent this situation is during contract generation or,
less efficiently, at a major program milestone such as a design review.
When reviewing an analysis, the following may provide some insight as to whether an analysis was
performed in a timely manner:
• Is there a lack of detail in the reports? This lack of detail may also be due to insufficient
experience or knowledge on the analyst's part, or due to lack of detailed design information at
the time.
• Are hazards corrected by procedure changes, rather than through design changes? This may
indicate that hazards were detected too late to impact the design or that the safety program
did not receive the proper management attention.
• Are the controls for some hazards difficult to assess and therefore require verification
through testing or demonstration? For example, consider an audio alarm control for
minimizing the likelihood of landing an aircraft in a wheels-up condition. The analyst or the
reviewer may realize that there are many potential audio alarms in the cockpit that may
require marginally too much time to sift through. The lack of a planned test or test details
should raise a warning flag. This may indicate poor integration between design, safety, and
test personnel or an inadequate understanding of system safety impact on the test program.
• Is there a lack of specific recommendations? Some incomplete or late hazard reports may
have vague recommendations such as "needs further evaluation" or "will be corrected by
procedures." Recommendations that could have or should have been acted on by the
contractor and closed out before the report was submitted are other clear indications of
inadequate attention. Recommendations to make the design comply with contractual
specifications and interface requirements are acceptable resolutions, provided the
specifications address the hazard(s) identified.
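Several of these warning signs can be screened for automatically when a contractor delivers hazard reports as structured data, leaving the evaluator to judge the flagged entries. The Python sketch below assumes a hypothetical report layout (a dictionary with `recommendation`, `control_type`, `needs_test`, and `test_reference` keys) and checks only for the vague phrasings quoted above; it cannot detect a lack of detail, which still requires an engineer's review.

```python
VAGUE_PHRASES = ("needs further evaluation", "will be corrected by procedures")


def review_flags(report):
    """Return warning flags for one hazard report entry (hypothetical dict layout)."""
    flags = []
    recommendation = report.get("recommendation", "").strip().lower()
    if not recommendation:
        flags.append("no recommendation")
    elif any(phrase in recommendation for phrase in VAGUE_PHRASES):
        flags.append("vague recommendation")
    if report.get("control_type") == "procedure":
        # Procedural fixes may mean the hazard was found too late to change the design.
        flags.append("procedural control instead of design change")
    if report.get("needs_test") and not report.get("test_reference"):
        flags.append("verification test not planned")
    return flags


if __name__ == "__main__":
    entry = {
        "recommendation": "Needs further evaluation.",
        "control_type": "procedure",
        "needs_test": True,
    }
    print(review_flags(entry))
```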
Ideally, the final corrective action(s) should be stated in the analysis. In most cases, this is not possible
because the design may not be finalized, or procedures have not been written. In either case, actions that
control risk to acceptable levels should be identified. For example, if a hazard requires procedural
corrective action, the report should state where the procedure would be found, even if it will be in a
document not yet written. If the corrective action is a planned design change, the report should state that,
and how the design change will be tracked (i.e., who will do what and when). In any case, the planned
specific risk control actions should be included in the data submission. These risks should be listed in a
hazard tracking and resolution system for monitoring.
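The paragraph asks that each planned risk control action say where a procedure will be found, or who will make a design change and when. A hazard tracking entry can enforce that by not treating an action as fully specified until those details are present. The record below is a hypothetical Python sketch, not the layout of any particular hazard tracking and resolution system; the field names and example hazard IDs are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ControlAction:
    hazard_id: str
    kind: str                       # "design" or "procedure"
    description: str
    owner: Optional[str] = None     # who will implement a design change
    due: Optional[date] = None      # when it will be done
    document: Optional[str] = None  # where the procedure will be found (may be a future document)


def missing_details(action):
    """List the details the report still needs for this planned control action."""
    if action.kind == "procedure":
        return [] if action.document else ["document"]
    if action.kind == "design":
        return [name for name in ("owner", "due") if getattr(action, name) is None]
    raise ValueError(f"unknown action kind: {action.kind!r}")


if __name__ == "__main__":
    step = ControlAction("H-7", "procedure", "Confirm gear down before final approach")
    print(missing_details(step))  # ['document']
```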
If specific risk control implementation details are not yet known (as can happen in some cases), there are
two main options:
• Keep the analysis open and periodically revise the report as risk control actions are
implemented. (This will require a contract change proposal if outside the scope of the original
statement of work (SOW)). For example, an SSHA might recommend adding a warning horn
to the gear "not down" lamp for an aircraft. After alternatives have been evaluated and a
decision made, the analysis report (and equipment specification) should be revised to include
"An auditory and a visual warning will be provided to warn if the landing gear is not
extended under the following conditions .....".
• Close the analysis, but indicate how to track the recommendation. (Provisions for tracking
such recommendations must be within the scope of the contract's SOW.) This is usually done
for a PHA, which is rarely revised. For example, a PHA may recommend a backup
emergency hydraulic pump. The analysis should state something like "... recommend
emergency hydraulic pump that will be tracked under Section L of the hydraulic subsystem
hazard analysis." This method works fine if the contract's SOW requires the analyst to
develop a tracking system to keep hazards from getting lost between one analysis and the
next. The presence of a centralized hazard tracking system is a good indicator of a quality
system safety program and should be a contractual requirement.
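The second option depends on a centralized hazard tracking system that keeps open recommendations from being lost when one analysis is closed and another picks them up. The Python sketch below shows that hand-off under assumed names (`HazardLog`, analysis labels such as "Hydraulic SSHA", hazard ID "H-12"); it is a minimal illustration, not a description of any specific tracking tool.

```python
class HazardLog:
    """Minimal central hazard log keyed by hazard ID (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def open(self, hazard_id, recommendation, analysis):
        self._entries[hazard_id] = {
            "recommendation": recommendation,
            "tracked_in": analysis,
            "closed": False,
        }

    def close_analysis(self, analysis, successor):
        """Close an analysis report; its still-open hazards move to the successor analysis."""
        moved = []
        for hazard_id, entry in sorted(self._entries.items()):
            if entry["tracked_in"] == analysis and not entry["closed"]:
                entry["tracked_in"] = successor
                moved.append(hazard_id)
        return moved

    def close_hazard(self, hazard_id):
        self._entries[hazard_id]["closed"] = True

    def tracked_in(self, hazard_id):
        return self._entries[hazard_id]["tracked_in"]

    def open_hazards(self):
        return sorted(h for h, e in self._entries.items() if not e["closed"])


if __name__ == "__main__":
    log = HazardLog()
    log.open("H-12", "Add backup emergency hydraulic pump", "PHA")
    print(log.close_analysis("PHA", "Hydraulic SSHA"))  # ['H-12']
    print(log.tracked_in("H-12"))  # Hydraulic SSHA
```

Closing the PHA does not close the hazard; it only changes where the hazard is tracked, which is the property the paragraph relies on.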
Failure Modes and Effects Analysis (FMEA) / Failure Modes, Effects, and Criticality Analysis (FMECA).
Some system safety analyses get a "jump start" from FMEAs or FMECAs prepared by reliability
engineers. The FMEA/FMECA data get incorporated into system safety analyses by adding a hazard
category or other appropriate entries. This saves staffing and funds. An FMEA/FMECA performed by a
reliability engineer will have different objectives than the safety engineer's analyses. The following
cautions should be noted:
• Corrective action for hazards surfaced by these tools is the responsibility of the safety
engineer(s).
• Sequential or multiple hazards may not be identified by the FMEA/FMECA.
• Some hazards may be missing. This is because many hazards are not a result of component
failures (e.g., human errors, sneak circuits).
• Not all failure modes are hazards. If the FMECA is blindly used as the foundation for a
hazard analysis, time could be wasted on adding safety entries on non-safety-critical systems.
• Human error hazards might not be identified.
• System risks will not have been identified.
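When FMEA/FMECA data are reused, the step described above (adding a hazard category and dropping failure modes that are not hazards) can be sketched as follows. The row layout, the category label, and the analyst-supplied `categorize` function are hypothetical. The corrective action field is left empty because, per the first caution, resolving the hazard remains the safety engineer's job; and nothing in this step can add the human-error or sneak-circuit hazards an FMEA never listed.

```python
def fmea_to_hazard_rows(fmea_rows, categorize):
    """Return FMEA rows judged hazardous, each extended with safety entries.

    categorize(row) is supplied by the safety analyst and returns a hazard
    category string, or None when the failure mode is not a hazard.
    """
    hazard_rows = []
    for row in fmea_rows:
        category = categorize(row)
        if category is None:
            continue  # not every failure mode is a hazard
        hazard_rows.append({**row, "hazard_category": category, "corrective_action": None})
    return hazard_rows


if __name__ == "__main__":
    rows = [
        {"component": "cabinet finish", "failure_mode": "discoloration", "effect": "cosmetic"},
        {"component": "actuator", "failure_mode": "rupture", "effect": "loss of control surface"},
    ]

    def judge(row):
        return "Catastrophic" if row["failure_mode"] == "rupture" else None

    for r in fmea_to_hazard_rows(rows, judge):
        print(r["component"], r["hazard_category"])  # actuator Catastrophic
```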
• The matrix format is the most widely used. This method lists the component parts of a
subsystem on a preprinted form that includes several columns, the number of which can vary
according to the analysis being done. As a minimum, there should be columns for each of the
following:
• Logic diagrams, particularly fault trees, are used to focus on certain risks. These are
deductive analyses that begin with a defined undesired event (usually an accident condition)
then branch out to organize all faults, sub-events, or conditions that can lead to the original
undesired event.
• The narrative format will suffice for a few cases, such as focusing on a few easily identified
risks associated with simple systems. This format is the easiest to apply (for the analyst), but
is the most difficult to evaluate. There is no way to determine if a narrative report covers all
risks so the evaluator is relying totally on the analyst's judgment.
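To make the deductive structure of a fault tree concrete, the Python sketch below represents a small tree as nested AND/OR gates over basic events and evaluates the top-event probability bottom-up. The event names and probabilities are invented, and the calculation assumes the basic events are independent and that no basic event appears under more than one gate; when either assumption fails, these simple gate formulas no longer hold and the cut sets must be examined instead.

```python
from math import prod


def top_event_probability(node, p):
    """Probability of a fault-tree node.

    node is either a basic-event name (str) or a tuple (gate, children),
    where gate is "AND" or "OR". p maps basic-event names to probabilities.
    Assumes independent basic events, each appearing only once in the tree.
    """
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [top_event_probability(child, p) for child in children]
    if gate == "AND":
        return prod(probs)  # all inputs must occur
    if gate == "OR":
        return 1.0 - prod(1.0 - q for q in probs)  # at least one input occurs
    raise ValueError(f"unknown gate: {gate!r}")


if __name__ == "__main__":
    # Top event: loss of hydraulic pressure (hypothetical tree).
    tree = ("OR", [("AND", ["pump_a_fails", "pump_b_fails"]), "main_line_ruptures"])
    probs = {"pump_a_fails": 0.01, "pump_b_fails": 0.01, "main_line_ruptures": 0.001}
    print(top_event_probability(tree, probs))  # roughly 1.1e-3
```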
A PHA should always be performed for each separate program or project. The PHA provides an initial
assessment of the overall program risk and it is used as a baseline for follow-on analyses, such as SSHAs,
SHAs, and O&SHAs. It also identifies the need for safety tests and is used to establish safety
requirements for inclusion in the system's specifications.
Subsequent decisions relate to the desirability of SSHA, SHA, and/or O&SHA. This decision is based
upon several factors:
• The nature and use of the system being evaluated, especially safety criticality.
• The results of the PHA. If the system being analyzed has no unresolved safety concerns, then
further analyses may not be necessary. If the hazards appear to be based upon training or
procedural problems, then an O&SHA may be the next step. The results of the PHA will
dictate the need.
• The complexity of the system being analyzed. A major system, such as an aircraft or air
traffic control center would need separate analyses for different subsystems, then an overall
system analysis to integrate, or find the hazards resulting from the interfaces between the
different subsystems. On the other hand, an aircraft landing gear system should need only a
single hazard analysis.
• The available funding.
The tabular, or matrix, format is the most widely used format for a PHA, primarily because it provides a
convenient assessment of the overall risks to a system. The basic tabular format may have entries for
hazard sources, such as energy sources (e.g., electrical, pneumatic, mechanical). This PHA would list all
known electrical energy sources with their initial hazard assessments, and then recommended corrective
action. Another type of tabular format PHA would list key hazards (such as fire and explosion) and
identify the known potential contributors for these events.
Some PHAs will be in the form of a logic diagram or Fault Tree Analysis (FTA). These are usually done
to identify the major causes of a top undesired event, and are generally not done to a detailed level.
Instead, the details are added during subsequent analyses. A few PHAs will be done in a narrative format.
Typically, each paragraph will cover an individual risk, its impact, and proposed resolution. Narrative
analyses are preferred for covering a risk in detail, but have the drawback of not having a good tracking
system unless tracking numbers are assigned. Narrative PHAs can have provisions for tracking risks by
limiting each paragraph to a single risk and using the paragraph numbers for tracking.
If the answer to any of the questions is "no," then revising or re-performing the PHA may be necessary.
One pitfall may be timing. By the time a PHA is completed and submitted, there may be insufficient time
to do much with it before the program continues on toward future milestones. In order to obtain the most
benefit from the PHA process, the evaluator must work closely with the analyst to ensure the analysis is
proceeding correctly. Periodic submittals of an analysis do not always provide enough time to correct
inappropriate approaches before program milestones push the program beyond the point where the
analysis is beneficial.
Most SSHAs are documented in the matrix format, while some are fault trees or other forms of logic
diagrams. Fault trees, by themselves, are incomplete and do not directly provide useful information. The
utility of fault trees comes from the cut and path sets they generate and the analysis of the cut and path sets
for common cause failures and independence of failures/faults. Fault trees are good for analyzing a
specific undesired event (e.g., rupture of pressure tank), and can find sequential and simultaneous failures,
but are time consuming and expensive. The SSHAs are more detailed than the PHA and are intended to
show that the subsystem design meets the safety requirements in the subsystem specification(s). If
hazards are not identified and corrected during the design process, they might not be identified and
corrected later when the subsystem designs are frozen and the cost of making a change is significantly
increased.
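The cut sets referred to above are the combinations of basic events that together cause the top event; a minimal cut set contains no smaller cut set. The Python sketch below derives minimal cut sets from a fault tree written as nested AND/OR gates (a simple top-down expansion followed by removal of supersets), then flags two things an analyst would look for: single-event cut sets and multi-event cut sets whose events share an assumed common cause. The tree, event names, and the `cause_of` mapping (for example, two pumps fed from one power bus) are hypothetical.

```python
from itertools import product


def minimal_cut_sets(node):
    """Minimal cut sets of a fault tree given as nested ("AND"|"OR", [children]) tuples."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [minimal_cut_sets(child) for child in children]
    if gate == "OR":
        sets = [s for sets_of_child in child_sets for s in sets_of_child]
    elif gate == "AND":
        # Combine one cut set from each child.
        sets = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate: {gate!r}")
    unique = set(sets)
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


def review_cut_sets(cut_sets, cause_of):
    """Return (single-point cut sets, cut sets whose events all share one common cause)."""
    single_point = [s for s in cut_sets if len(s) == 1]
    common_cause = []
    for s in cut_sets:
        causes = {cause_of.get(event) for event in s}
        if len(s) > 1 and len(causes) == 1 and None not in causes:
            common_cause.append(s)
    return single_point, common_cause


if __name__ == "__main__":
    tree = ("OR", [("AND", ["pump_a", "pump_b"]), "relief_valve"])
    cuts = minimal_cut_sets(tree)
    print([sorted(s) for s in cuts])  # [['relief_valve'], ['pump_a', 'pump_b']]
    print(review_cut_sets(cuts, {"pump_a": "bus_1", "pump_b": "bus_1"}))
```

A common-cause hit means the two "redundant" pumps are not independent, which is exactly the situation in which the independence-based probability arithmetic of a fault tree understates the risk.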
[Figure: System, such as an aircraft (B747)]
Secondary (undeveloped) and environmental failures require judgment too. During most FTAs, these
failures usually are not developed (i.e., pursued further) as they may be beyond the scope of the analyses.
These failures are labeled by diamond symbols in a fault tree.
• Write the SOW so that the "final" SSHA is delivered when the production baseline design is
really established.
• Require the risk to be tracked until it is really closed out.
8.7.5 How Can Other Sources of Data be Used to Complete the Analysis?
The FMEA or FMECA can provide SSHA data. These analyses use a matrix format partially suitable for
an SSHA. The format lists each component, the component function, types of failure, and the effects of the failures.
Most FMEAs also include component failure rate information. An FMEA can be used as a basis for an
SSHA, but several factors must be considered:
• Many FMEAs do not list hazard categories (e.g., Category I - catastrophic) necessary for
hazard analyses.
• Hazards may not be resolved in a reliability analysis. These analyses emphasize failure
effects and rates. They do not always lead to or document corrective action for hazards.
• Failure rate data used for reliability purposes may not be meaningful for safety analyses.
Reliability levels that meet reliability requirements (normally in the 0.9 or 0.99 range) may not be
adequate to meet safety requirements (often in the 0.999999 range). In addition, many
reliability failures, such as a leaking actuator, may not be hazardous, although a leak may become
a safety issue if it goes undetected and degradation continues. Others, such as a ruptured
actuator, may be hazards.
• Sequential or multiple hazards, and their associated risks, might not be addressed.
• FMEAs address only failures and ignore safety-related faults such as human or
procedural errors.
In spite of these shortcomings, it is normally more cost-effective to expand a reliability analysis into a
useful SSHA, by adding Hazard Category and Hazard Resolution entries and modifying the reliability data
as appropriate for safety, than to start from scratch.
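The gap between the reliability and safety figures quoted above is easier to see in terms of failure probability: 0.99 corresponds to a failure probability of 1 in 100, while 0.999999 corresponds to 1 in 1,000,000. The short Python sketch below uses only those two illustrative values from the text and assumes both figures are stated on the same mission or time basis.

```python
def failure_probability(reliability):
    """Probability of failure over the same mission or time basis as the reliability figure."""
    return 1.0 - reliability


def shortfall_factor(reliability, safety_target):
    """How many times more likely failure is than the safety target allows."""
    return failure_probability(reliability) / failure_probability(safety_target)


if __name__ == "__main__":
    print(round(shortfall_factor(0.99, 0.999999)))  # 10000
```

A component that comfortably meets a 0.99 reliability requirement can therefore be about four orders of magnitude short of a 0.999999 safety requirement, which is why reliability data may need rework before they support a safety claim.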
An FTA is ideal for focusing on a single undesired event (e.g., failure of engine ignition) but is time
consuming and can be expensive. Nevertheless, the FTA should be used for any serious risk whose
causes are not immediately obvious (e.g., O-ring failure) and that needs to be examined in detail because
of the concern over the effects of multiple failures and common cause failures. The approach is to list the
undesired events, then construct a fault tree for each one.
Ideally, the SHA will identify hazards and risks that apply to more than a single subsystem and are not
identified in the SSHAs. Most risks of this type arise at interfaces between subsystems. For example, an
Air Traffic Control (ATC) system might have separate SSHAs on the communications and data processing
systems. Assume that these SSHAs controlled all known critical and catastrophic hazards. The SHA
might identify a previously undiscovered hazard (e.g., incompatible maximum data transfer rates leading
to data corruption). The analysis approach is to examine the interfaces between subsystems. In addition,
the SHA looks for ways in which safety-critical system level functions can be lost.
Consider, for example, an aircraft anti-skid braking SSHA. It cannot be performed comprehensively if
the input information is limited to the landing gear design since there are many other subsystems that
interface with the anti-skid subsystem. For instance, the cockpit contains the control panel that turns the
anti-skid system on and off and notifies the crew of an anti-skid system failure. This control panel is
normally not documented in the landing gear design package, and potential hazards could be missed if the analysis
focuses only on the landing gear. Other brake system interfaces exist at the hydraulic and electrical
power supply subsystems. The SHA is designed to cut across all interfaces.
The system and subsystem definitions are important to the evaluation of an SHA. If the overall system
(and its subsystems) is not adequately defined, it is difficult to perform a successful SHA. In most
cases, system definition is simple. An aircraft, for example, can be a system. In an aircraft "system"
there are many subsystems, such as flight controls and landing gear.
• Are all the proper interfaces considered? It is obvious that aircraft flight control subsystems
interface with hydraulic power subsystems, but less obvious that they also interface with electrical,
structural, and display subsystems. The evaluator must be familiar with the system being
analyzed; if not, the evaluator cannot determine whether or not all interfaces were covered.
• How were the interfaces considered? For example, did the analysis consider both the mechanical
and the electrical connections between two subsystems, such as the structural and hydraulic subsystems?
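An evaluator's interface check can be sketched as a set comparison. This minimal Python example assumes the system definition has been reduced to a hypothetical list of (subsystem, subsystem, medium) interfaces. The `medium` field reflects the second question above: a mechanical connection and an electrical connection between the same two subsystems are treated as separate interfaces.

```python
from typing import Iterable, List, NamedTuple


class Interface(NamedTuple):
    a: str       # one subsystem
    b: str       # the other subsystem
    medium: str  # e.g. "mechanical", "electrical", "hydraulic", "data"


def normalize(i: Interface) -> Interface:
    # (A, B) and (B, A) describe the same interface.
    a, b = sorted((i.a, i.b))
    return Interface(a, b, i.medium)


def uncovered_interfaces(declared: Iterable[Interface],
                         analyzed: Iterable[Interface]) -> List[Interface]:
    """Interfaces in the system definition that the SHA never addressed."""
    done = {normalize(i) for i in analyzed}
    return sorted({normalize(i) for i in declared} - done)
```

For the anti-skid example, declaring the cockpit control panel, hydraulic power, and electrical power interfaces and then listing those the SHA actually covered would expose a missing control panel interface. The check is only as good as the declared list, so it does not replace the evaluator's familiarity with the system.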
Timing of the O&SHA is important. Generally, the O&SHA output (i.e., hazard control) is safety's
blessing on "procedures." In most cases, procedures are not available for review until the system begins
initial use or initial test and evaluation. As a result, the O&SHA is typically the last formal analysis to be
completed. Actually, the sooner the analysis begins, the better. Even before the system is designed, an
O&SHA can be started to identify hazards associated with the anticipated operation of the system. Ideally,
the O&SHA should begin with the formulation of the system and not be completed until sometime after
initial test of the system (which may identify additional hazards). This is critical because design and
construction of support facilities must begin well before the system is ready for fielding, and all special
safety features (e.g., fire suppression systems) must be identified early or the costs to modify the facilities
may force program managers and users to accept unnecessary risks.
When evaluating an O&SHA, it is important to ensure that the analysis considers not only the normal
operation of the system, but also abnormal and emergency operation, system installation, maintenance,
servicing, storage, and other operations. Misuse must also be considered. In
other words, if anyone will be doing anything with the system, planned or unplanned, the O&SHA should
cover it.
• Is there auxiliary equipment (e.g., loading, handling, and servicing equipment, or tools) that is planned
to be used with the system?
• Is there a training program? Who will do the training, when, and how? What training aids
will be used? Mock-ups and simulators may be needed for complex systems.
• Are there procedures and manuals? These must be reviewed and revised as needed to
eliminate or control hazards. This effort requires that the analyst have good working
relationships with the organization developing the procedures. If procedures are revised for
any reason, the safety analyst needs to be involved.
• Are there procedures for the handling, use, storage, and disposal of hazardous
materials?
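Because the O&SHA should cover anything anyone will do with the system, a reviewer may want a quick completeness check against the operating modes named above. This Python sketch assumes each O&SHA entry is a dictionary with a hypothetical `mode` key; the mode list is taken from the evaluation paragraph above and would need extending for a particular system.

```python
from typing import Dict, Iterable, List

# Operating modes named in the O&SHA evaluation guidance (extend as needed).
REQUIRED_MODES = {
    "normal operation", "abnormal operation", "emergency operation",
    "installation", "maintenance", "servicing", "storage", "misuse",
}


def missing_modes(entries: Iterable[Dict[str, str]]) -> List[str]:
    """Operating modes for which the O&SHA has no entry at all."""
    covered = {e["mode"].strip().lower() for e in entries}
    return sorted(REQUIRED_MODES - covered)
```

An empty result only shows that every mode has at least one entry; it says nothing about whether the hazards listed for that mode are complete.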
Human factors are an important consideration for the O&SHA. The O&SHA should be done in concert
with the human factors organization, since many incidents and accidents can be caused by operator error.
Equipment must be user friendly, and the O&SHA is an appropriate tool to ensure this takes place.
Ideally, the O&SHA should be performed jointly by system safety and human factors personnel.
O&SHAs are normally completed and submitted as a single document, typically in a matrix format. For a
complex system, this analysis is composed of several separate analyses, such as one for operation and
another for maintaining and servicing the system (sometimes called maintenance hazard analysis). The
latter might be performed for several different levels of maintenance. Maintenance analyses consider
actions such as disconnecting and re-applying power, use of access doors, panels, and hardstands.
The O&SHA should also include expanded operations, i.e., uses of the system for reasonable operations
not explicitly specified in the equipment specification. For example, an O&SHA should normally cover
the risks associated with aircraft refueling and engine maintenance. Unusual operational conditions
(e.g., bad weather approaching) may require refueling to be performed simultaneously with maintenance,
and the O&SHA may need to address that combination. Early test programs are a significant source of
operating and support hazards not previously identified. An observant safety monitor might notice, for
example, the proximity of an aircraft fuel vent outlet to hot engines. Corrective action would be to
relocate the vent to remove fuel vapors from the vicinity of the hot engines. To benefit from test
programs and identify these "expanded operations," the contract can require the O&SHA to use test
experience as an input to the analysis.
The FTA is one of several deductive logic model techniques, and is by far the most common. The FTA
begins with a stated top-level hazardous/undesired event and uses logic diagrams to identify single events
and combinations of events that could cause the top event. The logic diagram can then be analyzed to
identify single and multiple events that can cause the top event. Probability of occurrence values are
assigned to the lowest events in the tree. FTA utilizes Boolean Algebra to determine the probability of
occurrence of the top (and intermediate) events. When properly done, the FTA shows all the problem
areas and makes the critical areas stand out. The FTA has two drawbacks:
• Depending on the complexity of the system being analyzed, it can be time consuming, and
therefore very expensive.
• It does not identify all system hazards; it only identifies failures associated with the
predetermined top event being analyzed. For example, an FTA will not identify "ruptured
tank" as a hazard in a home water heater, but once that top event is chosen, it will show all
failures that lead to it. In other words, the analyst needs to identify the hazards themselves by
other means, since a fault tree cannot identify them.
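The Boolean evaluation of a tree can be sketched in a few lines of Python. This minimal example assumes independent basic events and a tree in which no basic event appears in more than one branch; when events repeat, the probability has to be computed from the minimal cut sets instead. The class names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Basic:
    name: str
    p: float  # probability assigned to a lowest-level event


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of a top or intermediate event (independent inputs assumed)."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":   # every input must occur
        return math.prod(ps)
    if node.kind == "OR":    # at least one input occurs
        return 1.0 - math.prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate type: {node.kind!r}")
```

The OR gate is computed as one minus the probability that no input occurs, which stays exact for independent inputs instead of simply adding the input probabilities.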
The graphic symbols used in an FTA are provided in Figure 8-6.
[Figure: FTA graphic symbols, in two groups: Events and Gates]
The first area for evaluation (and probably the most difficult) is the top event. This top event should be
very carefully defined and stated. If it is too broad (e.g., aircraft crashes), the resulting FTA will be overly
large. On the other hand, if the top event is too narrow (e.g., aircraft crashes due to pitch-down caused by
broken bellcrank pin), then the time and expense for the FTA may not yield significant results. The top
event should specify the exact hazard and define the limits of the FTA. In this example, a good top event
would be "uncommanded aircraft pitch-down," which would center the fault tree around the aircraft flight
control system, but would draw in other factors, such as pilot inputs and engine failures. In some cases, a
broad top event may be useful to organize and tie together several fault trees. In the example, the top
event would be "aircraft crash." This event would be connected to an OR-gate having several detailed top
events as shown in Figure 8-5. Some fault trees do not lend themselves to quantification because the
factors that tie the occurrence of a second level event to the top event are normally outside the
control/influence of the operator (e.g., an aircraft that experiences loss of engine power may or may not
crash depending on altitude at which the loss occurs).
[Figure: fault tree with broad top event "Airplane Crashes"]
A quick evaluation of a fault tree may be possible by looking at the logic gates. Most fault trees will have
a substantial majority of OR gates. If a fault tree has too many OR gates, nearly every fault or event may lead
to the top event. This may not be the case, but a large majority of OR gates will certainly suggest it.
An evaluator needs to be sure that logic symbols are well defined and understood. If nonstandard
symbols are used, they must not be mixed up with other symbols.
Check for proper control of transfers. Transfers are reference numbers permitting linking between pages
of FTA graphics. Fault trees can be extremely large, requiring the use of many pages and clear interpage
references. Occasionally, a transfer number may be changed during fault tree construction. If the
corresponding sub-tree does not have the same transfer number, then improper logic will result.
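Transfer control lends itself to a simple automated cross-check. In this Python sketch, each page of the fault tree is described by a hypothetical record of the transfer labels it references and the sub-trees it defines. Any label referenced but never defined, or defined but never referenced, is reported for the evaluator.

```python
from typing import Dict, List, Set, Tuple


def check_transfers(pages: Dict[str, Dict[str, Set[str]]]) -> Tuple[List[str], List[str]]:
    """Cross-check transfer labels between fault tree pages.

    pages maps a page id to {"references": labels of transfer symbols that
    point elsewhere, "subtrees": labels of sub-trees drawn on that page}.
    Returns (dangling, orphaned): labels referenced but never defined, and
    sub-trees defined but never referenced.
    """
    referenced = set().union(*(pg["references"] for pg in pages.values()))
    defined = set().union(*(pg["subtrees"] for pg in pages.values()))
    return sorted(referenced - defined), sorted(defined - referenced)
```

A label renumbered on one page but not on the matching page shows up in both lists at once, which is the failure described above.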
Cut sets (minimum combinations of events that lead to the top event) need to be evaluated for
completeness and accuracy. Establishing the correct number of cuts and their depth is a matter of
engineering judgment. The fault tree in Figure 8-6 obscures some of the logic visible in Figure 8-5,
preventing identification of necessary corrective action. Figure 8-7 illustrates that Figure 8-6 was
not complete.
[Figure: fault tree with top event "Airplane Crashes"]
Each fault tree should include a list of minimum cut sets. Without this list, it is difficult to identify
critical faults or combinations of events. For large or complicated fault trees, a computer is necessary to
catch all of the cut sets; it is nearly impossible for a single individual to find all of the cut sets.
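The kind of computer support meant here can be illustrated with a small cut set generator. This Python sketch follows the classic top-down substitution idea (an AND gate's inputs stay in the same row, an OR gate splits the row into one row per input), then removes duplicate and non-minimal sets by Boolean absorption. The dictionary format for the tree is an assumption, and the brute-force approach is only practical for small trees.

```python
from typing import Dict, FrozenSet, List, Tuple

Tree = Dict[str, Tuple[str, List[str]]]  # gate -> ("AND" | "OR", inputs)


def minimal_cut_sets(gates: Tree, top: str) -> List[FrozenSet[str]]:
    """Minimal cut sets of `top`; names that are not gates are basic events."""
    rows = [frozenset([top])]
    while any(e in gates for row in rows for e in row):
        new_rows = []
        for row in rows:
            gate = next((e for e in row if e in gates), None)
            if gate is None:          # row already holds only basic events
                new_rows.append(row)
                continue
            kind, inputs = gates[gate]
            rest = row - {gate}
            if kind == "AND":         # all inputs are needed together
                new_rows.append(rest | frozenset(inputs))
            elif kind == "OR":        # any one input is enough
                new_rows.extend(rest | {i} for i in inputs)
            else:
                raise ValueError(f"unknown gate type: {kind!r}")
        rows = new_rows
    unique = set(rows)
    # Absorption: drop any set that contains a smaller cut set.
    minimal = [c for c in unique if not any(o < c for o in unique)]
    return sorted(minimal, key=lambda c: (len(c), sorted(c)))
```

With TOP = OR(G1, G2, D), G1 = AND(A, B), G2 = AND(A, G3), and G3 = OR(B, C), the expansion produces {A, B} twice, and the function returns {D}, {A, B}, {A, C}.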
For a large fault tree, it may be difficult to determine whether or not the failure paths were completely
developed. If the evaluator is not totally familiar with the system, the evaluator may need to rely upon
other means. A good indication is the shape of the symbols at the branch bottom. If the symbols are
primarily circles (primary failures), the tree is likely to be complete. On the other hand, if many symbols
are diamonds (secondary failures or areas needing development), then it is likely the fault tree needs
expansion.
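The circles-versus-diamonds heuristic can be computed directly when the tree is held electronically. This Python sketch assumes a hypothetical mapping from each branch-bottom event to the symbol used for it. The handbook gives no numeric threshold, so the function reports the share and the event names and leaves the judgment to the evaluator.

```python
from typing import Dict, List, Tuple


def undeveloped_events(bottom_events: Dict[str, str]) -> Tuple[float, List[str]]:
    """Share and names of branch-bottom events drawn as diamonds.

    bottom_events maps event name -> "circle" (primary failure) or
    "diamond" (not developed further).
    """
    diamonds = sorted(n for n, sym in bottom_events.items() if sym == "diamond")
    share = len(diamonds) / len(bottom_events) if bottom_events else 0.0
    return share, diamonds
```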
Faulty logic is probably the most difficult area to evaluate, unless the faults lie within the gates, which are
relatively easy to spot. A gate-to-gate connection shows that the analyst might not completely understand
the workings of the system being evaluated. Each gate must lead to a clearly defined specific event, i.e.,
what is the event and when does it occur? If the event consists of any component failures that can directly
cause that event, an OR gate is needed to define the event. If the event does not consist of any component
failures, look for an AND gate.
When reviewing an FTA with quantitative hazard probabilities of occurrence, identify the events with
relatively large probability of occurrence. They should be discussed in the analysis summaries, probably
as primary cause factors.
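Once minimal cut sets and basic event probabilities exist, the candidates for discussion can be ranked mechanically. This is a minimal Python sketch, assuming independent events so that each cut set's probability is the product of its members' probabilities. The input format is illustrative.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def rank_cut_sets(cut_sets: Iterable[Set[str]],
                  p: Dict[str, float]) -> List[Tuple[float, List[str]]]:
    """Cut sets ordered from most to least probable."""
    scored = [(math.prod(p[e] for e in cs), sorted(cs)) for cs in cut_sets]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

The events that appear in the top-ranked cut sets are natural candidates for the primary cause factors mentioned above.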
A large fault tree performed manually is susceptible to errors and omissions. There are many advantages
of computer modeling relative to manual analysis (of complex systems):
• Logic errors and event (or branch) duplications can be quickly spotted.
• Cut sets (showing minimum combinations leading to the top event) can be listed.
• Numerical calculations (e.g., event probabilities) can be quickly done.
• A neat, readable fault tree can be drawn.
• Establishing overall risk levels (usually specified in terms of risk severity and risk
probability).
• Determining areas that need particular attention due to their higher probabilities of a failure.
Overall risk can be expressed by looking at the combination of severity (i.e., what is the worst that can
happen?) and probability (i.e., how often will it happen?). This is a realistic and widely accepted
approach. A high-severity hazard can have a low risk. For example, an aircraft wing
separation in flight is definitely a catastrophic hazard, but under normal flight conditions, it is not likely to
occur, so the risk is relatively low. At the other end of the spectrum, many jet engines spill a small
amount of fuel on the ground during shutdown. This has a relatively low severity with a high probability of
occurrence, so the overall risk is low.
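The severity-and-probability combination is usually applied through a risk assessment matrix. The Python sketch below uses severity levels 1 (catastrophic) to 4 (negligible) and probability levels A (frequent) to E (improbable). The cell values are hypothetical, chosen only so that the two examples above come out as described; a program would substitute its own approved matrix.

```python
# Hypothetical risk matrix: rows are severity 1 (catastrophic) to 4
# (negligible); columns are probability A (frequent) to E (improbable).
PROBABILITY_LEVELS = ("A", "B", "C", "D", "E")
RISK_MATRIX = {
    1: ["high", "high", "high", "medium", "low"],
    2: ["high", "high", "medium", "low", "low"],
    3: ["medium", "medium", "low", "low", "low"],
    4: ["low", "low", "low", "low", "low"],
}


def risk_level(severity: int, probability: str) -> str:
    """Look up the overall risk for a severity/probability pair."""
    if severity not in RISK_MATRIX or probability not in PROBABILITY_LEVELS:
        raise ValueError(f"unknown level: {severity}, {probability!r}")
    return RISK_MATRIX[severity][PROBABILITY_LEVELS.index(probability)]
```

Under this matrix, wing separation in flight, `risk_level(1, "E")`, and fuel spilled at shutdown, `risk_level(4, "A")`, both come out "low", for the opposite reasons given in the text.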
Judgment is needed for preparing an analysis and for evaluating it. An analyst might judge a "wheel
down" light failure as a Severity 2 or 3 risk because its failure still gives the aircraft "get home" capability
with reduced performance. On the other hand, if the wheels fail to lock in a down position and no
warning is given, significant damage and injury may result. This scenario is Severity 1. Judgment is
also needed for establishing risk probabilities.
An accurate method for determining risk probabilities is to use component failure rates (e.g., valve xxx
will fail to close once in 6 x 10^5 operations). However, there are some pitfalls that need to be considered
during evaluation:
• Where did the failure rates come from? Industry data sources? Government data sources?
Others? What is their accuracy?
• If the component has a usage history on a prior system, its failure rate on the new system
might be the same. However, the newer system might subject the component to a different
use cycle or environment, and significantly affect the failure rate.
• For newly developed components, how was the failure rate determined?
• Does the failure rate reflect the hazard failure mode or does it represent all failure modes?
For example, if a hazard is caused by capacitor shorting, the failure rate might represent all
capacitor failure modes including open and value drift. The result is exaggeration of the
probability of occurrence.
• System failures have many contributors, including human errors and software malfunctions, not
just hardware failures.
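Two of the pitfalls above reduce to simple arithmetic. This Python sketch, with illustrative function names, converts a "once in N operations" figure into a per-operation probability, extends it to a number of operations (assuming the operations are independent), and apportions an all-modes figure to the single hazardous mode. The mode fraction is an assumed input that would have to come from failure mode distribution data for the part.

```python
def per_operation_probability(operations_per_failure: float) -> float:
    """'Fails once in N operations' -> probability of failure per operation."""
    return 1.0 / operations_per_failure


def prob_at_least_one(p_per_op: float, n_ops: int) -> float:
    """Probability of one or more failures in n_ops independent operations."""
    return 1.0 - (1.0 - p_per_op) ** n_ops


def mode_probability(p_all_modes: float, mode_fraction: float) -> float:
    """Share of an all-modes figure due to the hazardous mode only
    (e.g. capacitor shorts, excluding opens and value drift)."""
    if not 0.0 <= mode_fraction <= 1.0:
        raise ValueError("mode_fraction must be between 0 and 1")
    return p_all_modes * mode_fraction
```

For the valve example, a failure probability of about 1.7 x 10^-6 per operation still gives roughly a 63 percent chance of at least one failure over 6 x 10^5 operations.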
Any of the above techniques can be used successfully. If more than one contractor or organization will be
performing analyses, or if one is subcontracted to another contractually, all of them must be required to
use the same definitions of probability levels, or some mismatching will result.
FAA System Safety Handbook, Chapter 9: Analysis Techniques
December 30, 2000
Chapter 9:
Analysis Techniques
9.1 Introduction
Many analysis tools are available to perform hazard analyses for each program. These range from the
relatively simple to the complex. In general, however, they fall into two categories:
This chapter describes characteristics of many popular analysis approaches and, in some cases, provides
procedures and examples of these techniques. The analysis techniques covered in this chapter are the
following:
Fault Hazard
Fault Tree
Common Cause Failure
Sneak Circuit
Energy Trace
Failure Modes, Effects, and Criticality Analysis (FMECA)
The fault hazard analysis must consider both "catastrophic" and "out-of-tolerance" modes of failure. For
example, a 5K resistor with five-percent tolerance (plus or minus 250 ohms) can have failing open or failing
short as its catastrophic (functional) failure modes, while the out-of-tolerance modes might include too low
or too high a resistance.
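The distinction between the two kinds of failure mode can be sketched as a classification of a resistor's measured value. This Python example uses the 5K, five-percent resistor from the text (acceptable range 4,750 to 5,250 ohms); the open and short thresholds are hypothetical values chosen for illustration.

```python
def resistor_mode(measured_ohms: float, nominal_ohms: float = 5000.0,
                  tolerance: float = 0.05, open_ohms: float = 1e9,
                  short_ohms: float = 1.0) -> str:
    """Classify a resistor's state for a fault hazard analysis entry.

    open_ohms and short_ohms are illustrative thresholds, not standard values.
    """
    if measured_ohms >= open_ohms:
        return "failed open"            # catastrophic mode
    if measured_ohms <= short_ohms:
        return "failed short"           # catastrophic mode
    low = nominal_ohms * (1 - tolerance)
    high = nominal_ohms * (1 + tolerance)
    if measured_ohms < low:
        return "out of tolerance (low)"
    if measured_ohms > high:
        return "out of tolerance (high)"
    return "within tolerance"
```

As the analysis steps below point out, whether a mode such as "failed short" is physically realistic for a given part type is a separate question from whether it can be classified.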
To conduct a fault hazard analysis, it is necessary to know and understand certain system characteristics:
Equipment mission
Operational constraints
Success and failure boundaries
Realistic failure modes and a measure of their probability of occurrence.
1. The system is divided into modules (usually functional or partitioning) that can be handled
effectively.
2. Functional diagrams, schematics, and drawings for the system and each subsystem are then
reviewed to determine their interrelationships and the interrelationships of the component
subassemblies. This review may be done by the preparation and use of block diagrams.
3. For analyses performed down to the component level, a complete component list with the specific
function of each component is prepared for each module as it is to be analyzed. For those cases
when the analyses are to be performed at the functional or partitioning level, this list is for the
lowest analysis level.
4. Operational and environmental stresses affecting the system are reviewed for adverse effects on the
system or its components.
5. Significant failure mechanisms that could occur and affect components are determined from
analysis of the engineering drawings and functional diagrams. Effects of subsystem failures are
then considered.
6. The failure modes of individual components that would lead to the various possible failure
mechanisms of the subsystem are then identified. Basically, it is the failure of the component that
produces the failure of the entire system. However, since some components may have more than
one failure mode, each mode must be analyzed for its effect on the assembly and then on the
subsystem. This may be accomplished by tabulating all failure modes and listing the effects of
each (e.g., a resistor that might fail open or short, high or low). An understanding of the physics of
failure is necessary. For example, most resistors cannot fail in a shorted mode. If the analyst does
not understand this, considerable effort may be wasted on attempting to control an unrealistic
hazard.
7. All conditions that affect a component or assembly should be listed to indicate whether there are
special periods of operation, stress, personnel action, or combinations of events that would increase
the probabilities of failure or damage.
8. The risk category should be assigned.
9. Preventative or corrective measures to eliminate or control the risks are listed.
10. Initial probability rates are entered. These are "best judgments" and are revised as the design
process goes on. Care must be taken to make sure that the probability represents that of the
particular failure mode being evaluated. A single failure rate is often provided to cover all of a
component's failure modes rather than separate ones for each. For example, MIL-HDBK-217, a
common source of failure rates, does not provide one failure rate for capacitor shorts, another for
opens, and a third for changes in value. It simply provides a single failure rate for each operating
condition (temperature, electrical stress, and so forth).
11. A preliminary criticality analysis may be performed as a final step.
The Fault Hazard analysis has some serious limitations. They include:
1. A subsystem is likely to have failures that do not result in accidents. Tracking all of these in the
System Safety Program (SSP) is a costly, inefficient process. If this is the approach to be used,
combining it with an FMEA (or FMECA) performed by the reliability program can save some
costs.
2. This approach usually concentrates on hardware failures, to a lesser extent on software failures,
and often gives inadequate attention to human factors. For example, a switch with an extremely
low failure rate may be dropped from consideration, but the wrong placement of the switch may
lead to an accident. The adjacent placement of a power switch and a light switch, especially of
similar designs, can lead to operator errors.
3. Environmental conditions are usually considered, but the probability of occurrence of these
conditions is rarely considered. This may result in applying controls for unrealistic events.
4. Probability of failure leading to hardware related hazards ignores latent defects introduced through
substandard manufacturing processes. Thus some hazards may be missed.
5. One of the greatest pitfalls in fault hazard analysis (and in other techniques) is overprecision in
mathematical analysis. Too often, analysts try to obtain "exact" numbers from "inexact" data, and
too much time may be spent on improving the precision of the analysis rather than on eliminating the
hazards.
Fault tree analysis (FTA) is used by the professional safety and reliability community both to prevent and
to resolve hazards and failures. Both qualitative and quantitative methods are used to identify the areas of
a system that are most critical to safe operation. Either approach is effective. The output is a graphical presentation providing
technical and administrative personnel with a map of "failure or hazard" paths. FTA symbols may be
found in Figure 8- 5. The reviewer and the analyst must develop an insight into system behavior,
particularly those aspects that might lead to the hazard under investigation.
Qualitative FTAs are cost effective and invaluable safety engineering tools. The generation of a qualitative
fault tree is always the first step. Quantitative approaches multiply the usefulness of the FTA but are more
expensive and often very difficult to perform.
An FTA (similar to a logic diagram) is a "deductive" analytical tool used to study a specific undesired
event such as "engine failure." The "deductive" approach begins with a defined undesired event, usually a
postulated accident condition, and systematically considers all known events, faults, and occurrences that
could cause or contribute to the occurrence of the undesired event. Top level events may be identified
through any safety analysis approach, through operational experience, or through a "Could it happen?"
hypothesis. The procedural steps of performing an FTA are:
1. Assume a system state, and identify and clearly document the top level undesired event(s).
This is often accomplished by using the PHL or PHA. Alternatively, design documentation such as
schematics, flow diagrams, and level B & C documentation may be reviewed.
2. Develop the upper levels of the trees via a top down process. That is, determine the intermediate
failures and combinations of failures or events that are the minimum to cause the next higher level
event to occur. The logical relationships are graphically generated as described below using
standardized FTA logic symbols.
3. Continue the top down process until the root causes for each branch are identified and/or until
further decomposition is not considered necessary.
4. Assign probabilities of failure to the lowest level event in each branch of the tree. These may be
derived through predictions, allocations, or historical data.
5. Establish a Boolean equation for the tree using Boolean logic and evaluate the probability of the
undesired top level event.
6. Compare the result to the system level requirement. If the requirement is not met, implement
corrective action. Corrective actions vary from redesign to analysis refinement.
The FTA is a graphical logic representation of fault events that may occur to a functional system. This
logical analysis must be a functional representation of the system and must include all combinations of
system fault events that can cause or contribute to the undesired event. Each contributing fault event
should be further analyzed to determine the logical relationships of underlying fault events that may cause
them. This tree of fault events is expanded until all "input" fault events are defined in terms of basic,
identifiable faults that may then be quantified for computation of probabilities, if desired. When the tree
has been completed, it becomes a logic gate network of fault paths, both singular and multiple, containing
combinations of events and conditions that include primary, secondary, and upstream inputs that may
influence or command the hazardous mode.
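[Figure 9-1: Sample fault tree for an aircraft engine failure. Top event "Engine Failure" (gate O1) with branches for fuel flow (fuel pump, filter, carburetor), coolant (no coolant, fan, pump; pump failure from seal or bearing), and ignition (ignition systems #1 and #2).]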
Standardized symbology is used and is shown in Figure 8-5. A non-technical person can, with minimal
training, determine from the fault tree, the combination and alternatives of events that may lead to failure or
a hazard. Figure 9-1 is a sample fault tree for an aircraft engine failure. In this sample there are three
possible causes of engine failure: fuel flow, coolant, or ignition failure. The alternatives and combinations
leading to any of these conditions may also be determined by inspection of the FTA.
Based on available data, probabilities of occurrences for each event can be assigned. Algebraic
expressions can be formulated to determine the probability of the top level event occurring. This can be
compared to acceptable thresholds and the necessity and direction of corrective action determined.
The FTA shows the logical connections between failure events and the top level hazard or event. "Event,"
the terminology used, is an occurrence of any kind. Hazards and normal or abnormal system operations are
examples. For example, both "engine overheats" and "frozen bearing" are abnormal events. Events are
shown as some combination of rectangles, circles, triangles, diamonds, and "houses." Rectangles represent
events that are a combination of lower level events. Circles represent events that require no further
expansion. Triangles reflect events that are dependent on lower level events where the analyst has chosen
to develop the fault tree further. Diamonds represent events that are not developed further, usually due to
insufficient information. Depending upon criticality, it may be necessary to develop these branches further.
In the aircraft engine example, a coolant pump failure may be caused by a seal failure. This level was not
further developed. The example does not include a "house." That symbol illustrates a normal (versus
failure) event. If the hazard were "unintentional stowing of the landing gear," a normal condition for the
hazard would be the presence of electrical power.
FTA symbols can depict all aspects of NAS events. The example reflects a hardware based problem. More
typically, software (incorrect assumptions or boundary conditions), human factors (inadequate displays),
and environmental conditions (ice) are also included, as appropriate.
Events can be further broken down as primary and secondary. A primary event is a coolant pump failure
caused by a bad bearing. A secondary event would be a pump failure caused by ice through the omission
of antifreeze in the coolant on a cold day. The analyst may also distinguish between faults and failures. An
ignition turned off at the wrong time is a fault; an ignition switch that will not conduct current is an
example of a failure.
Events are linked together by "AND" and "OR" logic gates. The latter is used in the example for both fuel
flow and carburetor failures. For example, fuel flow failures can be caused by either a failed fuel pump or
a blocked fuel filter. An "AND" gate is used for the ignition failure illustrating that the ignition systems are
redundant. That is, both must fail for the engine to fail. These logic gates are called Boolean gates or
operators. Boolean algebra is used for the quantitative approach. The "AND" and "OR" gates are
numbered sequentially A# or O# respectively in Figure 9-1.
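The quantitative use of these gates can be sketched in Python for a simplified version of the engine example. The gate layout is simplified from the text (fuel flow and coolant branches joined by OR gates, the two ignition systems joined by an AND gate). All probabilities are hypothetical per-flight values, the requirement is an assumed target, and the basic events are assumed independent.

```python
import math


def p_or(*ps: float) -> float:
    """OR gate: probability that at least one independent input occurs."""
    return 1.0 - math.prod(1.0 - p for p in ps)


def p_and(*ps: float) -> float:
    """AND gate: probability that all independent inputs occur."""
    return math.prod(ps)


# Hypothetical per-flight probabilities for the basic events.
prob = {
    "fuel_pump": 1e-4, "filter_blocked": 5e-5, "carburetor": 2e-5,
    "coolant_lost": 5e-5, "fan": 1e-4, "pump_seal": 3e-5, "pump_bearing": 2e-5,
    "ignition_1": 1e-3, "ignition_2": 1e-3,
}
REQUIREMENT = 1e-3  # assumed acceptable per-flight probability of engine failure

no_fuel_flow = p_or(prob["fuel_pump"], prob["filter_blocked"], prob["carburetor"])
coolant_failure = p_or(prob["coolant_lost"], prob["fan"],
                       p_or(prob["pump_seal"], prob["pump_bearing"]))  # pump: seal or bearing
ignition_failure = p_and(prob["ignition_1"], prob["ignition_2"])        # redundant systems
engine_failure = p_or(no_fuel_flow, coolant_failure, ignition_failure)

print(f"P(engine failure) = {engine_failure:.3e}; "
      f"meets requirement: {engine_failure <= REQUIREMENT}")
```

The AND gate is what keeps the ignition contribution small: two systems at 10^-3 each contribute about 10^-6, far less than either single-string branch.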
As previously stated, the FTA is built through a deductive "top down" process. It is a deductive process in
that it considers combinations of events in the "cause" path as opposed to the inductive approach, which
does not. The process is asking a series of logical questions such as "What could cause the engine to fail?"
When all causes are identified, the series of questions is repeated at the next lower level, i.e., "What would
prevent fuel flow?" Interdependent relationships are established in the same manner.
When a quantitative analysis is performed, probabilities of occurrences are assigned to each event. The
values are determined through analytical processes such as reliability predictions, engineering estimates, or
the reduction of field data (when available). A completed tree is called a Boolean model. The probability of
occurrence of the top level hazard is calculated by generating a Boolean equation. It expresses the chain of
events required for the hazard to occur. Such an equation may reflect several alternative paths. Boolean
equations rapidly become very complex for simple looking trees. They usually require computer modeling
for solution.
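For a handful of cut sets, the Boolean model can be evaluated exactly by inclusion-exclusion, which also shows why computer support is needed: the number of terms roughly doubles with every added cut set. This Python sketch assumes independent basic events and compares the exact value with the common rare-event approximation (the sum of cut set probabilities), which by Boole's inequality never underestimates the exact result.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Set


def exact_top_probability(cut_sets: Iterable[Set[str]], p: Dict[str, float]) -> float:
    """P(at least one cut set occurs), by inclusion-exclusion.

    Assumes independent basic events. Each term uses the union of the events
    in the combined cut sets, so an event shared by several cut sets is
    counted once. The number of terms is 2**n - 1 for n cut sets.
    """
    sets = [frozenset(c) for c in cut_sets]
    total = 0.0
    for k in range(1, len(sets) + 1):
        sign = 1.0 if k % 2 else -1.0
        for combo in combinations(sets, k):
            total += sign * math.prod(p[e] for e in frozenset().union(*combo))
    return total


def rare_event_approximation(cut_sets: Iterable[Set[str]], p: Dict[str, float]) -> float:
    """Sum of cut set probabilities; never less than the exact value."""
    return sum(math.prod(p[e] for e in c) for c in cut_sets)
```

For cut sets {A, B} and {A, C} with P(A) = 0.1, P(B) = 0.2, and P(C) = 0.3, the exact value is 0.044, while the approximation gives 0.05, because it counts the shared event A twice.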
In addition to evaluating the significance of a risk and the likelihood of occurrence, FTAs facilitate
presentations of the hazards, causes, and discussions of safety issues. They can contribute to the
generation of the Master Minimum Equipment List (MMEL).
The FTA's graphical format is superior to the tabular or matrix format in that the inter-relationships are
obvious. The FTA graphic format is a good tool for an analyst who is not knowledgeable about the system being
examined. The matrix format is still necessary for a hazard analysis to pick up severity, criticality, family
tree, probability of event, cause of event, and other information. Being a top-down approach, in contrast to
the fault hazard and FMECA, the FTA may miss some non-obvious top level hazards.
In a common cause failure analysis (CCFA), fault trees are examined through the development of matrices to determine if failures are linked to some common
cause relating to environment, location, secondary causes, human error, or quality control. A cut set is a
set of basic events (e.g., a set of component failures) whose occurrence causes the system to fail. A
minimum cut set is one that has been reduced to eliminate all redundant "fault paths." CCFA provides a
better understanding of the interdependent relationship between FTA events and their causes. It analyzes
safety systems for "real" redundancy. This analysis provides additional insight into system failures after
development of a detailed FTA when data on components, physical layout, operators, and inspectors are
available.
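The common cause screening step can be sketched as a search for cut sets whose members share an attribute. This Python example assumes a hypothetical attribute table for each basic event. The keys (location, environment, maintenance crew, production lot) stand in for the location, environment, human error, and quality control causes listed above. Single-event cut sets are skipped because they involve no redundancy.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical attribute names standing in for common cause categories.
COMMON_CAUSE_KEYS = ("location", "environment", "maintenance_crew", "production_lot")


def common_cause_candidates(
    cut_sets: Iterable[Set[str]],
    attributes: Dict[str, Dict[str, str]],
    keys: Tuple[str, ...] = COMMON_CAUSE_KEYS,
) -> List[Tuple[List[str], str, str]]:
    """Multi-event cut sets whose members all share one attribute value."""
    findings = []
    for cs in cut_sets:
        if len(cs) < 2:
            continue  # no redundancy to defeat
        for key in keys:
            values = {attributes.get(e, {}).get(key) for e in cs}
            if len(values) == 1 and None not in values:
                findings.append((sorted(cs), key, next(iter(values))))
    return findings
```

In the engine example, two ignition systems mounted in the same location would be flagged, because a single event such as a fire at that location could defeat the apparent redundancy of the AND gate.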
Sneak Circuit Analysis (SCA) is a unique method of evaluating electrical circuits. SCA employs
recognition of topological patterns that are characteristic of all circuits and systems. The purpose of this
analysis technique is to uncover latent (sneak) circuits and conditions that inhibit desired functions or cause
undesired functions to occur, without a component having failed. The process is to convert schematic
diagrams to topographical drawings and search them for sneak circuits. This labor-intensive process is best
performed by special-purpose software. Figure 9-2 shows an automobile circuit that contains a sneak
circuit. The sneak path is through the directional switch and flasher, the brake light switch, and the radio.
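The path-search idea behind SCA can be illustrated with a small sketch. It assumes a hypothetical circuit (not the Figure 9-2 circuit) that has already been reduced to nodes joined by loads and switches. For a given switch state, it enumerates every simple path from power to ground and reports loads that are energized even though the design intent says they should be off. The model treats conduction as undirected and ignores diodes and voltage levels. It only illustrates what a sneak path is and does not replace the topological clue analysis described in this section.

```python
from collections import defaultdict

# Hypothetical circuit: (element, node_a, node_b, kind).
# Names are illustrative only and do not reproduce Figure 9-2.
ELEMENTS = [
    ("S_LAMP", "PWR", "A", "switch"),
    ("LAMP", "A", "GND", "load"),
    ("S_RADIO", "PWR", "B", "switch"),
    ("RADIO", "B", "GND", "load"),
    ("INDICATOR", "A", "B", "load"),  # cross-connection between the two branches
]


def conducting_graph(elements, closed):
    """Adjacency list of conducting elements: every load, plus switches that are closed."""
    graph = defaultdict(list)
    for name, a, b, kind in elements:
        if kind != "switch" or name in closed:
            graph[a].append((b, name))
            graph[b].append((a, name))
    return graph


def simple_paths(graph, node, target, visited, used):
    """Yield the element names along every simple path from node to target."""
    if node == target:
        yield list(used)
        return
    for nxt, name in graph[node]:
        if nxt not in visited:
            visited.add(nxt)
            used.append(name)
            yield from simple_paths(graph, nxt, target, visited, used)
            visited.remove(nxt)
            used.pop()


def energized_loads(elements, closed):
    """Loads lying on at least one power-to-ground path for the given closed switches."""
    kinds = {name: kind for name, _, _, kind in elements}
    graph = conducting_graph(elements, closed)
    loads = set()
    for path in simple_paths(graph, "PWR", "GND", {"PWR"}, []):
        loads.update(e for e in path if kinds[e] == "load")
    return loads


# Hypothetical design intent: closing S_RADIO should energize only the radio.
intended = {"RADIO"}
actual = energized_loads(ELEMENTS, closed={"S_RADIO"})
sneaks = actual - intended
print("Unintended loads:", sorted(sneaks))
```

Closing only `S_RADIO` also drives current through `INDICATOR` and `LAMP` via node A. No component has failed, yet an undesired function occurs, which is the defining property of a sneak circuit.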
The latent nature of sneak circuits and the realization that they are found in all types of electrical/electronic
systems suggest that applying SCA to any system that is required to operate with high
reliability is valuable. The process is quite expensive and is often limited to highly critical (from the safety
viewpoint) systems. Applications outside the FAA include nuclear plant safety
subsystems, ordnance handling systems, and spacecraft. Consideration should be given to utilizing this
tool for FAA applications that eliminate human control, such as an autopilot.
The fact that the circuits can be broken down into the patterns shown allows a series of clues to be applied
for recognition of possible sneak circuit conditions. These clues help to identify combinations of controls
and loads that are involved in all types of sneak circuits. Analysis of the node-topographs for sneak circuit
conditions is done systematically with the application of sneak circuit clues to one node at a time. When all
of the clues that apply to a particular pattern have been considered, it is assured that all possible sneak
circuits that could result from that portion of the circuit have been identified. The clues help the analyst to
determine the different ways a given circuit pattern can produce a "sneak." Figure 9-3 is a node topograph
equivalent of Figure 9-2.
[Figure: sneak circuit example; labeled elements: Power, Directional Switch, Flasher, Lights, Brake Light Switch, Radio]
There are four basic categories of sneak circuits that will be found.
In addition to the identification of sneak circuits, results include disclosure of data errors and areas of
design concern. Data errors are identified and reported incrementally on Drawing Error Reports from the
time of data receipt through the analysis period. These errors generally consist of lack of agreement
between or within input documents. Conditions of design concern are primarily identified during the
network tree analysis. Design concern conditions include:
The three resultant products of SCA (sneak circuit, design concern, and drawing error conditions) are
reported with an explanation of the condition found, illustrated as required, and accompanied with a
recommendation for correction.
The purpose of energy trace analysis is to ensure that all hazards and their immediate causes are identified.
Once the hazards and their causes are identified, they can be used as top events in a fault tree or used to
verify the completeness of a fault hazard analysis. Consequently, the energy trace analysis method
complements but does not replace other analyses, such as fault trees, sneak circuit analyses, event trees,
and FMEAs.
Identification of energy sources and energy transfer processes is the key element in the energy source
analysis procedure. Once sources of energy have been identified, the analyst eliminates or controls the
hazard using the system safety precedence described in Chapter 3, Table 3-1.
These analyses point out potential unwanted conditions that could conceivably happen. Each condition is
evaluated further to assess its hazard potential. The analysis and control procedures discussed throughout
this handbook are applied to the identified hazards.
1. Identify the resource being protected (personnel or equipment) to guide the direction of the analysis
toward the identification of only those conditions (i.e., hazards) that would be critical or
catastrophic from a mission viewpoint.
2. Identify system and subsystems, and safety critical components.
3. Identify the operational phase(s), such as preflight, taxi, takeoff, cruise, landing, that each
system/subsystem/component will experience. It is often desirable to report results of hazard
analyses for each separate operational phase.
4. Identify the operating states for the subsystems/components (e.g., on/off, pressurized, hot, cooled)
during each operational phase.
5. Identify the energy sources or transfer modes that are associated with each subsystem and each
operating state. A list of general energy source types and energy transfer mechanisms is presented
in Figure 9-4.
6. Identify the energy release mechanism for each energy source (released or transferred in an
uncontrolled/unplanned manner). It is possible that a normal (i.e., as designed) energy release
could interact adversely with other components in a manner not previously or adequately
considered.
7. Review a generic threat checklist for each component and energy source or transfer mode.
Experience has shown that certain threats are associated with specific energy sources and
components.
8. Identify causal factors associated with each energy release mechanism. A hazard causal factor
may have subordinate or underlying causal factors associated with it. For instance, excessive
stress may be a "top level" factor. The excessive stress may, in turn, be caused by secondary
factors such as inadequate design, material flaws, poor quality welds, excessive loads due to
pressure or structural bending. By systematically evaluating such causal factors, an analyst may
identify potential design or operating deficiencies that could lead to hazardous conditions. Causal
factors are identified independent of the probability of occurrence of the factor; the main question
to be answered is: Can the causal factor occur or exist?
9. Identify the potential accident that could result from energy released by a particular release
mechanism.
10. Define the hazardous consequences that could result given the accident specified in the previous
step.
11. Evaluate the hazard category (i.e., critical, catastrophic, or other) associated with the potential
accident.
12. Identify the specific hazard associated with the component and the energy source or transfer mode
relative to the resource being protected.
13. Recommend actions to control the hazardous conditions.
14. Specify verification procedures to assure that the controls have been implemented adequately.
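Steps 3 through 5 amount to a cross-product of operational phases, components, operating states, and energy sources, and the later steps fill in one worksheet row per combination. The sketch below uses a hypothetical inventory, and its field names were chosen for illustration. It only builds the blank rows and applies the step 1 focus on critical or catastrophic conditions, because steps 6 through 14 require engineering judgment.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class EnergyHazardRow:
    """One worksheet row; comments give the step number from the list above."""
    resource: str                 # step 1: resource being protected
    component: str                # step 2: system/subsystem/component
    phase: str                    # step 3: operational phase
    state: str                    # step 4: operating state
    energy_source: str            # step 5: energy source or transfer mode
    release_mechanism: str = ""   # step 6
    causal_factors: tuple = ()    # step 8
    potential_accident: str = ""  # step 9
    consequence: str = ""         # step 10
    category: str = ""            # step 11: "catastrophic", "critical", or "other"
    hazard: str = ""              # step 12
    control: str = ""             # step 13
    verification: str = ""        # step 14


# Hypothetical inventory: component -> {operating state -> energy sources}.
INVENTORY = {
    "hydraulic pump": {"pressurized": ["hydraulic pressure"], "off": []},
    "battery": {"on": ["electrical", "chemical"]},
}
PHASES = ["preflight", "taxi", "takeoff"]


def blank_worksheet(resource, inventory, phases):
    """Create one empty row per phase/component/state/energy-source combination."""
    rows = []
    for phase, (component, states) in product(phases, inventory.items()):
        for state, sources in states.items():
            for source in sources:
                rows.append(EnergyHazardRow(resource, component, phase, state, source))
    return rows


def rows_to_carry(rows):
    """Keep only rows judged critical or catastrophic (the step 1 focus)."""
    return [r for r in rows if r.category in ("critical", "catastrophic")]


rows = blank_worksheet("personnel", INVENTORY, PHASES)
rows[0].category = "critical"  # filled in by the analyst at step 11
print(len(rows), "rows,", len(rows_to_carry(rows)), "carried forward")
```

With three phases and three energy sources across the two components, the sketch produces nine rows. Only the row the analyst marks as critical is carried forward.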
There are some risk/hazard control methodologies that lend themselves to an energy source hazard analysis
approach. These include the following strategies:
Hazard analyses typically use a top-down analysis methodology (e.g., Fault Tree). The approach first
identifies specific hazards and isolates all possible (or probable) causes. The FMEA/FMECA may be
performed either top-down or bottom-up, usually the latter.
Hazard analyses consider failures, operating procedures, human factors, and transient conditions in the list
of hazard causes. The FMECA is more limited. It only considers failures (hardware and software). It is
generated from a different set of questions than the HA: “If this fails, what is the impact on the system?
Can I detect it? Will it cause anything else to fail?” If so, the induced failure is called a secondary failure.
FMEAs may be performed at the hardware or functional level and often are a combination of both. For
economic reasons, the FMEA often is performed at the functional level below the printed circuit board or
software module assembly level and at hardware or smaller code groups at higher assembly levels. The
approach is to characterize the results of all probable component failure modes or every low level function.
A frozen bearing (component) or a shaft unable to turn (function) are valid failure modes.
The procedural approach to generating an FMEA is comparable to that of the Fault Hazard Analysis. The
first step is to list all components or low level functions. Then, by examining system block diagrams,
schematics, etc., the function of each component is identified. Next, all reasonably possible failure modes
of the lowest “component” being analyzed are identified. Using a coolant pump bearing as an example (see
Figure 9-5), they might include frozen, high friction, or too much play. For each identified failure mode,
the effect at the local level, an intermediate level, and the top system level are recorded. A local effect
might be “the shaft won’t turn”, the intermediate “pump won’t circulate coolant”, and the system level
“engine overheats and fails”. At this point in the analysis, the FMEA might identify a hazard.
The analyst next documents the method of fault detection. This input is valuable for designing self-test
features or the test interface of a system. More importantly, it can alert an aircrew to a failure in progress
prior to a catastrophic event. A frozen pump bearing might be detected by monitoring power to the pump
motor or coolant temperature. Given adequate warning, the engine can be shut down before it is damaged, or the
aircraft landed before the engine fails. Next, compensating provisions are identified as the first step in
determining the impact of the failure. If there are redundant pumps or combined cooling techniques, the
significance of the failure is less than if the engine depends on a single pump. The severity categories used
for the hazard analysis can be used as the severity class in the FMEA. A comments column is usually
added to the FMEA to provide additional information that might assist the reviewer in understanding any
FMEA column.
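The FMEA procedure above maps naturally onto one record per failure mode. The sketch below assumes a hypothetical `FmeaRow` record whose fields follow the text: local, intermediate, and system-level effects, then detection method, compensating provisions, severity class, and comments. It is populated with the coolant pump bearing example. Only the three failure modes and the frozen-bearing effects come from the text. The remaining fields are left for the analyst.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    item: str
    function: str
    failure_mode: str
    local_effect: str = ""
    intermediate_effect: str = ""
    system_effect: str = ""
    detection_method: str = ""         # recorded at the fault detection step
    compensating_provisions: str = ""  # e.g. redundant pumps, combined cooling
    severity_class: str = ""           # reuse the hazard analysis severity categories
    comments: str = ""


def start_fmea(item, function, failure_modes):
    """Create one row per reasonably possible failure mode of the lowest-level item."""
    return [FmeaRow(item, function, mode) for mode in failure_modes]


rows = start_fmea(
    "coolant pump bearing",
    "facilitate shaft rotation",
    ["frozen", "high friction", "too much play"],  # failure modes named in the text
)

frozen = rows[0]
frozen.local_effect = "shaft won't turn"
frozen.intermediate_effect = "pump won't circulate coolant"
frozen.system_effect = "engine overheats and fails"

for row in rows:
    print(f"{row.failure_mode:14} -> {row.system_effect or '(to be analyzed)'}")
```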
A criticality figure of merit must be added to the FMEA to generate the FMECA shown in Figure 9-5.
Severity levels cannot be assigned without first identifying the purpose of the
FMECA. For example, a component with a high failure rate would have a high severity factor for a
reliability analysis, whereas a long-lead-time or expensive part would be more important in a supportability analysis.
Neither may be significant from a safety perspective. Therefore, a safety analysis requires a unique
criticality index or equation. The assignment of a criticality index is called a criticality analysis. The index
is a mathematical combination of severity and probability of occurrence (likelihood of occurrence).
Figure 9-5: Sample Failure Modes, Effects, and Criticality Analysis
Item/Function | Function | Failure Modes | Local Effects | Next Higher Effects | End Effects | Failure Detection Method | Compensation Provisions | Severity Class | Failure Rate
Pump bearing | Facilitate shaft rotation | Frozen | Shaft won't rotate | Pump failure | Engine failure | Engine temp | Air cooling | I |
 | | High friction | Shaft turns slowly | Loss of cooling capacity | Engine runs hot | " | " | II |
 | | Loose (wear) | Shaft slips | " | Low horsepower | " | " | III |
Not shown are columns that may be added including frequency class, interfaces, and comments.
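A criticality index can be as simple as a product of severity and likelihood weights. The sketch below is purely illustrative. The numeric weights and the likelihood letters A to E are hypothetical, and only the severity classes I to III for the three bearing failure modes come from Figure 9-5. As the text stresses, a safety analysis should define its own index, so these weights would have to be replaced by the program's own severity and likelihood definitions.

```python
# Hypothetical weights: severity class I is the most severe (as in Figure 9-5).
SEVERITY_WEIGHT = {"I": 4, "II": 3, "III": 2, "IV": 1}
# Hypothetical likelihood levels: A = frequent ... E = improbable.
LIKELIHOOD_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def criticality_index(severity_class, likelihood_level):
    """Combine severity and likelihood into one number; higher means more critical."""
    return SEVERITY_WEIGHT[severity_class] * LIKELIHOOD_WEIGHT[likelihood_level]


# (failure mode, severity class from Figure 9-5, hypothetical likelihood level)
modes = [
    ("frozen", "I", "E"),
    ("high friction", "II", "C"),
    ("loose (wear)", "III", "B"),
]

ranked = sorted(modes, key=lambda m: criticality_index(m[1], m[2]), reverse=True)
for mode, sev, lik in ranked:
    print(f"{mode:14} severity {sev:3} likelihood {lik}  index {criticality_index(sev, lik)}")
```

With these invented likelihoods, the catastrophic but improbable frozen bearing ranks last. That outcome is one reason a risk-matrix lookup, which keeps high-severity items visible regardless of likelihood, is often preferred over a plain product.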
The FMECA and the hazard analyses provide some redundant information but, more importantly, some
complementary information. The HA considers human factors and system interface problems; the
FMECA does not. The FMECA, however, is more likely than the HA to identify hazards caused by component or
software module failures, and it documents compensating provisions and fault detection features. These are
all important safety data.
[1] Stephens, Richard A. and Talso, Warner, System Safety Analysis Handbook: A Source Book for Safety Practitioners, System Safety Society, 2nd Edition, August 1999.
for additional readings in Appendix C. The FAA’s Office of System Safety can provide instruction and
assistance in the application of the listed methods and techniques.
7 | Change Analysis | Change Analysis examines the effects of modifications from a starting point or baseline. | Any change to a system, equipment, procedure, or operation should be evaluated from a system safety
11 | Confined Space Safety | The purpose of this analysis technique is to provide a systematic examination of confined space risks. | Any confined area where there may be a hazardous atmosphere, toxic fumes or gas, or a lack of oxygen could present risks. Confined Space Safety should be considered at tank farms, fuel storage areas, manholes, transformer vaults, confined electrical spaces, and raceways.
13 | Control Rating Code | Control Rating Code is a generally applicable system safety-based procedure used to produce consistent safety effectiveness ratings of candidate actions intended to control hazards found during analysis or accident analysis. Its purpose is to control recommendation quality, apply accepted safety principles, and prioritize hazard controls. | Control Rating Code can be applied when there are many hazard control options available. The technique can be applied toward any safe operating procedure or design hazard control.
14 | Critical Incident Technique [2] | This is a method of identifying errors and unsafe conditions that contribute to both potential and actual accidents or incidents within a given population by means of a stratified random sample of participant-observers selected from within the population. This technique can be universally applied in any operational environment. | Operational personnel can collect information on potential or past errors or unsafe conditions. Hazard controls are then developed to minimize the potential error or unsafe condition.
[2] Tarrants, William E., The Measurement of Safety Performance, Garland STPM Press, 1980.
16 | Critical Path Analysis | Critical Path Analysis identifies critical paths in a Program Evaluation graphical network. Simply put, the network is a graph whose symbology and nomenclature define tasks and activities. The critical path in a network is the longest time path between the beginning and end events. | This technique is applied in support of large system safety programs, when extensive system safety-related tasks are required.
17 | Damage Modes and Effects Analysis | Damage Modes and Effects Analysis evaluates the damage potential as a result of an accident caused by hazards and related failures. | Risks can be minimized and their associated hazards eliminated by evaluating damage progression and severity.
18 | Deactivation Safety Analysis | This analysis identifies safety concerns associated with facilities that are decommissioned/closed. | The deactivation process involves placing a facility into a safe mode and stable condition that can be monitored if needed.
20 | Energy Analysis | The energy analysis is a means of conducting a system safety evaluation of a system that looks at the “energetics” of the system. | The technique can be applied to all systems that contain, make use of, or store energy in any form (e.g., potential, kinetic mechanical, electrical, ionizing or non-ionizing radiation, chemical, and thermal energy).
21 | Energy Trace and Barrier Analysis | Energy Trace and Barrier Analysis is similar to Energy Analysis and Barrier Analysis. The analysis can produce a consistent, detailed understanding of the sources and nature of energy flows that can or did produce accidental harm. | The technique can be applied to all systems that contain, make use of, or store energy in any form (e.g., potential, kinetic mechanical, electrical, ionizing or non-ionizing radiation, chemical, and thermal energy).
22 | Energy Trace Checklist | Similar to Energy Trace and Barrier Analysis, Energy Analysis, and Barrier Analysis. The analysis aids in the identification of hazards associated with energetics within a system by use of a specifically designed checklist. | The analysis could be used when conducting evaluations and surveys for hazard identification associated with all forms of energy. The use of a checklist can provide a systematic way of collecting information on many similar exposures.
23 | Environmental Risk Analysis | The analysis is conducted to assess the risk of environmental noncompliance that may result in hazards and associated risks. | The analysis is conducted for any system that uses or produces toxic hazardous materials that could cause harm to people and the environment.
24 | Event and Causal Factor Charting | Event and Causal Factor Charting utilizes a block diagram to depict cause and effect. | The technique is effective for solving complicated problems because it provides a means to organize the data, provides a summary of what is known and unknown about the event, and results in a detailed sequence of facts and activities.
25 | Event Tree Analysis | An Event Tree models the sequence of events that results from a single initiating event. | The tool can be used to organize, characterize, and quantify potential accidents in a methodical manner.
26 Explosives Safety This method enables the safety Explosives Safety Analysis can be
32 | Fault Isolation Methodology | The method is used to determine and locate faults in large-scale ground based systems. | Determine faults in any large-scale ground based system that is computer controlled.
34 | Fire Hazards Analysis | Fire Hazards Analysis is applied to evaluate the risks associated with fire exposures. There are several fire-hazard analysis techniques, e.g., load analysis, hazard inventory, fire spread, and the scenario method. | Any fire risk can be evaluated.
35 | Flow Analysis | The analysis evaluates confined or unconfined flow of fluids or energy, intentional or unintentional, from one component/sub-system/system to another. | The technique is applicable to all systems which transport or which control the flow of fluids or energy.
36 | Hazard Analysis | Generic and specialty techniques to identify hazards. Generally, any formal or informal study, evaluation, or analysis to identify hazards. | Multi-use technique to identify hazards within any system, sub-system, operation, task or procedure.
37 | Hazard Mode Effects Analysis | Method of establishing and comparing potential effects of hazards with applicable design criteria. | Multi-use technique
38 | Hardware/Software Safety Analysis | The analysis evaluates the interface between hardware and software to identify hazards within the interface. | Any complex system with hardware and software.
39 | Health Hazard Assessment | The method is used to identify health hazards and risks associated with any system, sub-system, operation, task or procedure. The method evaluates routine, planned, or unplanned use and releases of hazardous materials or physical agents. | The technique is applicable to all systems which transport, handle, transfer, use, or dispose of hazardous materials or physical agents.
42 | Human Reliability Analysis | The purpose of the Human Reliability Analysis is to assess factors that may impact human reliability in the operation of the system. | The analysis is appropriate where reliable human performance is necessary for the success of human-machine systems.
43 | Interface Analysis | The analysis is used to identify hazards due to interface incompatibilities. The methodology entails seeking those physical and functional incompatibilities between adjacent, interconnected, or interacting elements of a system which, if allowed to persist under all conditions of operation, would generate risks. | Interface Analysis is applicable to all systems. All interfaces should be investigated: machine-software, environment-human, environment-machine, human-human, machine-machine, etc.
44 | Job Safety Analysis | This technique is used to assess the various ways a task may be performed so that the most efficient and appropriate way to do a task is selected. | Job Safety Analysis can be applied to evaluate any job, task, human function, or operation.
66 | Software Failure Modes and Effects Analysis | This technique identifies software-related design deficiencies through analysis of process flow-charting. It also identifies areas for verification/validation and test evaluation. | Software is embedded into vital and critical systems of current as well as future aircraft, facilities, and equipment. This methodology can be used for any software process; however, application to software-controlled hardware systems is the predominant application. It can be used to analyze control, sequencing, timing, monitoring, and the ability to take a system from an unsafe to a safe condition.
67 | Software Fault Tree Analysis | This technique is employed to identify the root cause(s) of a “top” undesired event and to assure adequate protection of safety-critical functions by inhibits, interlocks, and/or hardware. | Any software process at any level of development or change can be analyzed deductively. However, the predominant application is software-controlled hardware systems.
68 | Software Hazard Analysis | The purpose of this technique is to identify, evaluate, and eliminate or mitigate software hazards by means of a structured analytical approach that is integrated into the software development process. | This practice is universally appropriate to software systems.
69 | Software Sneak Circuit Analysis | Software Sneak Circuit Analysis (SSCA) is designed to discover program logic that could cause undesired program outputs or inhibits, or incorrect sequencing/timing. | The technique is universally appropriate to any software program.
70 | Structural Safety Analysis | This method is used to validate mechanical structures. Inadequate structural assessment results in increased risk due to the potential for latent design problems. | The approach is appropriate to structural design, e.g., the airframe.
71 | Subsystem Hazard Analysis | Subsystem Hazard Analysis (SSHA) identifies hazards and their effects that may occur as a result of design. | This protocol is appropriate to subsystems only.
72 System Hazard System Hazard Analysis purpose is Any closed loop hazard
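Technique 16 defines the critical path as the longest time path between the beginning and end events of a task network. A minimal sketch of that computation uses hypothetical task names and durations. It processes tasks in topological order using Python's standard `graphlib` module (Python 3.9 or later) and records each task's earliest finish time. It then walks back along the predecessor that finished last.

```python
from graphlib import TopologicalSorter

# Hypothetical task network: task -> (duration in weeks, predecessor tasks).
TASKS = {
    "hazard list": (2, []),
    "subsystem analysis": (6, ["hazard list"]),
    "FMECA": (4, ["hazard list"]),
    "system analysis": (3, ["subsystem analysis", "FMECA"]),
    "procedures review": (2, ["hazard list"]),
    "final report": (1, ["system analysis", "procedures review"]),
}


def critical_path(tasks):
    """Return (project duration, tasks on the longest path) for an acyclic task network."""
    graph = {task: deps for task, (_, deps) in tasks.items()}
    finish, previous = {}, {}
    for task in TopologicalSorter(graph).static_order():  # predecessors come first
        duration, deps = tasks[task]
        start = max((finish[d] for d in deps), default=0)
        # Remember the predecessor that finished last; it lies on the longest path.
        previous[task] = max(deps, key=finish.get) if deps else None
        finish[task] = start + duration
    last = max(finish, key=finish.get)
    total = finish[last]
    path = []
    task = last
    while task is not None:
        path.append(task)
        task = previous[task]
    return total, path[::-1]


total, path = critical_path(TASKS)
print(total, "weeks:", " -> ".join(path))
```

In this example, the tasks off the critical path (the FMECA and the procedures review) have slack. A delay in any task on the path delays the whole effort.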
FAA System Safety Handbook, Appendix A: Glossary
December 30, 2000
Appendix A
Glossary
TERM CONCEPT or DESCRIPTION
Acceptable Risk The residual (final) risk that remains after controls (i.e., Hazard Controls / Risk
Controls) have been applied to the associated Contributory Hazards, and that has been identified
and communicated to management for acceptance.
Accident An unplanned fortuitous event that results in harm, i.e. loss, fatality, injury, system loss; also
see Risk Severity. The specific type and level of harm must be defined; the worst case severity
that can be expected as the result of the specific event under study. Various contributory
hazards can result in a single accident; also see Contributory Hazard, Cause, Root Cause, and
Initiating Events.
Accident. An unplanned event that results in a harmful outcome; e.g. death, injury,
occupational illness, or major damage to or loss of property.
Death.
Injury.
Occupational illness.
Damage to or loss of equipment or property.
Damage to the environment.
Accreditation A formal declaration by the Accreditation Authority that a system is approved to operate in a
particular manner using a prescribed set of safeguards.
Act A formal decision or law passed by a legislative body.
Administrative Hazard Control Administrative controls to eliminate or reduce safety-related risk, e.g., training, programs,
procedures, warnings, instructions, tasks, plans; also see Risk Control.
Anomalous Behavior Behavior which is not in accordance with the documented requirements.
Architecture The organizational structure of a system, identifying its components, their interfaces and a
concept of execution between them.
Assumed Risk The residual risk associated with a specific hazardous event or primary hazard, which has been
accepted by management.
Audit An independent examination of the life cycle processes and their products for compliance,
accuracy, completeness and traceability.
Audit Trail The creation of a chronological record of system activities (audit trail) that is sufficient to
enable the reconstruction, review, and examination of the sequence of environments and
activities surrounding or leading to an operation, procedure, or an event in a transaction from
its inception to its final results.
Authenticate (1) To verify the identity of a user, device, or other entity in a system, often as a prerequisite to
allowing access to resources in the system.
(2) To verify the integrity of data that has been stored, transmitted, or otherwise exposed to
possible unauthorized modification.
Barrier A material object or set of objects that separates, demarcates, or serves as a barricade; or
something immaterial that impedes or separates. Both physical and non-physical barriers are
utilized and applied in hazard control; i.e. anything used to control, prevent or impede unwanted
adverse energy flow and / or anything used to control, prevent or impede unwanted event flow.
Baseline The approved, documented configuration of a software or hardware configuration item that
thereafter serves as the basis for further development and that can be changed only through change
control procedures.
Cause Something that brings about an event; a person or thing that is the occasion of an action or
state; a reason for an action or condition.
Certification Legal recognition by the certification authority that a product, service, organization or person
complies with the applicable requirements. Such certification comprises the activity of
checking the product, service, organization or person and the formal recognition of compliance
with the applicable requirements by issue of certificate, license, approval or other document as
required by national law or procedures. In particular, certification of a product involves:
(a) the process of assuring the design of a product to ensure that it complies with a set of
standards applicable to that type of product so as to demonstrate an acceptable level of safety,
(acceptable risk);
(b) the process of assessing an individual product to ensure that it conforms to the certified type
design;
(c) the issue of any certificate required by national laws to declare that compliance or
conformity has been found with applicable standards in accordance with item (a).
Certification Authority The organization or person responsible within the state (country) concerned with the
certification of compliance with applicable requirements.
Class(es) Parameters of risk are classified in order to conduct analysis, evaluations, reviews,
presentations, etc.; i.e. generic contributory hazards, generic risks, generic events.
Code A collection of laws, standards, or criteria relating to a particular subject.
Component A combination of parts, devices, and structures, usually self-contained, which performs a
distinctive function in the operation of the overall equipment.
Configuration The requirements, design and implementation that define a particular version of a system or
system component.
Configuration Control The process of evaluating, approving or disapproving, and coordinating changes to
configuration items after formal establishment of their configuration identification.
Configuration Item A collection of hardware or software elements treated as a unit for the purpose of configuration
management.
Configuration Management The process of identifying and defining the configuration items in a system, controlling the
release and change of these items throughout the system life cycle, recording and reporting the
status of configuration items and change requests, and verifying the completeness and
correctness of configuration items.
Contributory Hazard The potential for harm. An unsafe act and/or unsafe condition that contributes to the
accident (see Cause, Root Cause, Contributory Events, Initiator); the potential for adverse energy
flow to result in an accident. A hazard is not an accident. A failure or a malfunction can result
in an unsafe condition and/or an unsafe act. Human error can result in an unsafe act.
Contributory Hazards define the contributory events that lead to the final outcome. For
simplicity, Contributory Hazards can also include Initiating Events and Primary Hazards.
Sequential logic defining the Hazardous Event should remain consistent throughout the hazard
analysis process.
Consequence See Risk Severity.
Control See Risk Control
Criticality Reliability term. The degree of impact that a malfunction has on the operation of a system.
Critical Path Defines the sequence of events that control the amount of time needed to complete the effort
described within the PERT (Program Evaluation Review Technique) network.
Danger Danger expresses a relative exposure to a hazard. A hazard may be present, but there may be
little danger because of the precautions taken.
Damage Damage is the severity of injury and/or the physical, functional, and/or monetary loss
that could result if hazard control is less than adequate.
Debug The process of locating and eliminating errors that have been shown, directly or by inference, to
exist in software.
Deductive Analysis A top-down approach of analysis logic: “What can cause a specific event to occur?”
Derived Requirements Essential, necessary or desired attributes not explicitly documented, but logically implied by the
documented requirements.
Development Configuration The requirements, design and implementation that define a particular version of a system or
system component.
Design Handbooks, Guides and Manuals Contain non-mandatory general rules, concepts, and examples of good and best practices to
assist a designer or operator.
Emulator A combination of computer program and hardware that mimics the instruction and execution of
another computer or system.
Engineering Controls Engineering design controls to eliminate or reduce safety-related risks; also see Hazard Control and
Risk Control.
Entity Item That which can be individually described and considered. May be an activity, process, product,
organization, system, person or any combination thereof.
Environment (a) The aggregate of operational and ambient conditions, including the external procedures,
conditions, and objects that affect the development, operation, and maintenance of a system.
Operational conditions include traffic density, communication density, workload, etc. Ambient
conditions include weather, electromagnetic interference (EMI), vibration, acoustics, etc.
(b) Everything external to a system that can affect or be affected by the system.
FAA System Safety Handbook, Appendix A: Glossary
December 30, 2000
TERM CONCEPT or DESCRIPTION
Error An act that, through ignorance, deficiency, or accident, departs from or fails to achieve what
should be done. Errors can be predictable or random. Errors can also be categorized as
primary or contributory. Primary errors are those committed by personnel immediately and
directly involved with the accident. Contributory errors result from actions on the part of
personnel whose duties preceded and affected the situation during which the results of the error
became apparent. Error is also the difference between a computed, observed, or measured value or
condition and the true, specified, or theoretically correct value or condition; or a mistake in
engineering, requirement specification, design, implementation, or operation that could
result in a failure and/or contributory hazard. There are four types of human error: 1)
omission, 2) commission, 3) sequence, and 4) timing.
Explosion Proof The item is designed to withstand an internal explosion and to vent explosive gases below their
ignition temperature.
Fail-Operational A design characteristic that permits continued operation in spite of the occurrence of a discrete
malfunction.
Fail-safe A characteristic of a system whereby any malfunction affecting the system safety will cause the
system to revert to a state that is known to be within acceptable risk parameters.
Fail-Soft Pertaining to a system that continues to provide partial operational capability in the event of
certain malfunctions.
Failure Reliability term. The inability of a system, subsystem, component, or part to perform its
required function within specified limits, under specified conditions, for a specified duration. A
failure may result in an unsafe condition and/or act, i.e., a hazard. Failure is also defined as the
termination of the ability of a system element to perform a required function, or as the lack of
correct performance. Failures and hazards are not interchangeable.
Firmware The combination of a hardware device and computer instructions and data that reside as read-
only software on that device.
Formal Verification The process of evaluating the products of a given phase using formal mathematical proofs to
ensure correctness and consistency with respect to the products and standards provided as input
to that phase.
Formal Qualification Review Formal evaluation by top management of the status and adequacy of the quality system in
relation to quality policy and objectives.
Formal Qualification The process that allows the determination of whether a configuration item complies with the
requirements allocated to it.
Hazard The potential for harm; see also Contributory Hazard and Primary Hazard. A hazard is not an
accident. Per FAA Order 8040.4, a hazard is a "Condition, event, or circumstance that could lead to or
contribute to an unplanned or undesired event."
Hazard or hazardous condition. Anything, real or potential, that could make possible, or
contribute to making possible, an accident.
Hazardous Event An occurrence that creates a hazard. This logic indicates that a Hazardous Event is an
occurrence that creates the potential for harm; Initiating Event or Root Cause are more
appropriate terms.
The Hazardous Event (now) defines the total sequence of events from the Initiating Event to the
final outcome (the harm), encompassing the Initiating Event, Contributory Hazards, Primary Hazard,
and Risk Severity.
Report Status The Hazardous Event under study is considered open or closed depending on the
status of Hazard Control.
Report Status (Open) The Hazardous Event under study is considered open; the corrective action
evaluation and verification is in process. The status will remain open until management
has reviewed the actions taken and accepted the associated risk. All related Contributory
Hazards are to be evaluated.
Report Status (Closed) The Hazardous Event under study is considered closed; the corrective action
evaluation and verification is completed, and management has reviewed the actions
taken and has accepted the associated risk.
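The open/closed rule above depends on two conditions. The sketch below shows a hazard tracking log entry that derives its report status from them. The class and field names (`HazardousEventRecord`, `verification_complete`, `risk_accepted`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HazardousEventRecord:
    """Hypothetical tracking-log entry for one hazardous event under study."""

    event_id: str
    verification_complete: bool = False  # corrective-action evaluation and verification done
    risk_accepted: bool = False  # management reviewed the actions and accepted the risk

    @property
    def report_status(self) -> str:
        # Closed only when both conditions hold; otherwise the event stays open.
        if self.verification_complete and self.risk_accepted:
            return "Closed"
        return "Open"


record = HazardousEventRecord("HE-001", verification_complete=True)
print(record.report_status)  # Open: management has not yet accepted the risk
record.risk_accepted = True
print(record.report_status)  # Closed
```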
Hazard Probability Hazard Probability defines, in quantitative or qualitative terms, the estimated probability of the
specific Contributory Hazards that are defined within the Hazardous Event under study; these are
possible elements within a fault tree.
Note that hazard probability is not defined as the aggregate probability of occurrence of the
individual hazardous events that create a specific hazard; see Hazardous Event and Accident.
Also note that Hazard Probability is not the same as likelihood; see Likelihood.
Hazard Severity. An assessment of the consequences of the worst credible accident that could
be caused by a specific hazard.
Hazard Tracking and Resolution A tracking log is maintained for closeout. Risk tracking and risk resolution should be
conducted throughout the system life cycle. Risk/hazard controls are to be formally verified.
Inadvertent Operation Unintentional operation.
Independent Verification & Validation (IV&V) Confirmation by independent examination and provision of objective
evidence that specified requirements have been fulfilled, and that the particular requirements for a
specific intended use are fulfilled.
Inductive Analysis A bottom-up approach of analysis logic: “What happens if a specific failure occurs?”
Incident A near miss accident with minor consequences that could have resulted in greater loss.
An unplanned event that could have resulted in an accident, or did result in minor damage, and
which indicates the existence of, though may not define, a hazard or hazardous condition.
Sometimes called a mishap.
Initiating Events Initiating events; initiator; the contributory hazard, unsafe act, and/or unsafe condition that
initiated the adverse event flow that resulted in the hazardous event under evaluation; see also
Root Cause.
Intrinsically Safe Design Designers determine which hazards could be present, the level of associated risk that could
constitute danger, and the controls to assure acceptable risk. Nothing is perfectly safe; see Safe.
Inspection A static technique that relies on visual examination of development products to detect
deviations, violations or other problems.
Latent Present and capable of becoming visible or active, though not now visible or active.
Likelihood Likelihood defines, in quantitative or qualitative terms, the estimated probability of the specific
hazardous event under study. Likelihood is one element of associated risk. Fault trees and
other models can be constructed, individual Hazard Probabilities can be estimated, and
likelihood can then be calculated via Boolean logic. It should be noted that an estimated likelihood,
as defined in conventional hazard analysis, may be appropriate given the variability, confidence,
resources, and other factors involved.
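The sketch below shows AND and OR gates that combine individual Hazard Probabilities into a likelihood using Boolean logic. It assumes the basic events are statistically independent, and real fault trees often violate that assumption through common-cause failures. The tree structure and the probabilities are hypothetical.

```python
from math import prod


def and_gate(*probabilities):
    # Output event requires every input event; independence assumed.
    return prod(probabilities)


def or_gate(*probabilities):
    # Output event requires at least one input event; independence assumed.
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the hazardous event occurs if (A AND B) OR C.
p_a, p_b, p_c = 1e-3, 2e-2, 1e-5
likelihood = or_gate(and_gate(p_a, p_b), p_c)
print(f"{likelihood:.3e}")  # 3.000e-05
```

For rare events, the OR gate result is very close to the simple sum of its inputs (here 2e-5 + 1e-5). This is why the rare-event approximation is often used in practice.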
Mishap A hazard. Note that the use of mishap within the FAA community differs from its use in
MIL-STD-882C, which equates mishap with an accident.
N-Version Software Software developed and tested to fulfill a set of requirements where multiple versions of
software are intentionally made independent and different. Differences can be in some or all of:
specifications, design, use of language, algorithms, data structures, etc.
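N-version designs are commonly paired with a voter that compares the outputs of the versions at run time. The definition above does not mention a voter, so treat this arrangement as an assumption. The majority-voting sketch below uses three hypothetical versions, one of which is deliberately faulty.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the output produced by a strict majority of versions, or None if none exists."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


# Three hypothetical versions of "square a number", written independently.
def version_a(x):
    return x * x


def version_b(x):
    return x ** 2


def version_c(x):
    return x * x + 1  # deliberately faulty version


outputs = [version(4) for version in (version_a, version_b, version_c)]
print(majority_vote(outputs))  # 16
```

When no majority exists, the function returns `None`. The caller can then revert to a fail-safe state instead of trusting any single version.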
Non-Developmental Item (NDI) Deliverable part not developed as a part of the developmental process being addressed.
Non-developmental software is deliverable software that is not developed under the contract but is
provided by the developer or some other party. Non-developmental software may also be referred to
as reusable software, government furnished software, commercially available software, or
Commercial Off-The-Shelf (COTS) software.
Non-Programmable (N-P) System A system based upon non-programmable hardware devices (i.e., a system not based on
programmable electronics). NOTE: Examples include hardwired electrical or electronic
systems and mechanical, hydraulic, or pneumatic systems.
Objective Evidence Information that can be proved true, based on facts obtained through observation,
measurement, test, or other means.
Optimum Safety The associated risks that have been identified have been accepted, provided that all identified
controls are implemented and enforced.
Phase Defined segment of work. Note: a phase does not imply the use of any specific life-cycle
model, nor does it imply a period of time in the development of a product.
Practice Recommended methods, rules, and designs for voluntary compliance.
Process Set of inter-related resources and activities, which transform inputs into outputs.
Product Service History Historical data generated by activities at the interface between the supplier and the customer,
and by supplier internal activities, to meet the customer needs regarding the quality, reliability,
and safety trends of the product or service.
Product Liability Generic term used to describe the onus on a producer or others to make restitution for loss
related to personal injury, property damage, or other harm caused by a product.
Proximate Cause The relationship between the plaintiff’s injuries and the plaintiff’s failure to exercise a legal
duty, such as reasonable care.
Primary Hazard A primary hazard is one that can directly and immediately result in loss, consequence, adverse
outcome, damage, fatality, system loss, degradation, loss of function, injury, etc. The primary
hazard is also referred to as a catastrophe, catastrophic event, critical event, marginal event, or
negligible event.
Quality Assurance A planned and systematic pattern of actions necessary to provide adequate confidence that an
item or product conforms to established requirements.
Quality Audit Systematic and independent examination to determine whether quality activities and related
results comply with planned arrangements and whether these arrangements are implemented
effectively and are suitable to achieve objectives.
Quality Evaluation Systematic examination of the extent to which an entity is capable of fulfilling specified
requirements.
Qualification Process Process of demonstrating whether an entity is capable of fulfilling specified requirements.
Quantitative Assessment In any discussion of mishap risk management and risk assessment, the question of quantified
acceptability parameters arises. Under such conditions, care should be taken not to forget
the limitations of a mathematical approach. In any high-risk system, there is a strong
temptation to rely totally on statistical probability because, on the surface, it looks like a
convenient way to measure safety ("who can argue with numbers?"). To do so, however, requires
that the limitations and principles of this approach be well understood and that past
engineering experience not be ignored. Quantitative acceptability parameters must be well
defined, predictable, demonstrable, and above all, useful. They must be useful in the sense that
they can be easily related to the design and the associated decision criteria. More detail on the
limitations of the use of probabilities may be found in Chapter 7.
Many factors fundamental to system safety are not quantifiable. Design deficiencies are not
easily examined from a statistical standpoint. Additionally, the danger exists that system safety
analysts and managers will become so enamored with the statistics that simpler and more
meaningful engineering processes are ignored. Quantification of certain specific failure modes
that depend on one or two system components can be effective in bolstering the decision to
accept or correct them.
System safety analysis and risk assessment do not eliminate the need for good engineering judgment.
and / or operating units.
Hazard Probability and Severity are measurable and, when combined, give us risk.
Identified risk is the risk that has been determined through various analysis
techniques. The first task of system safety is to identify, within practical limitations, all
possible risks. This step precedes determining the significance of the risk (severity) and the
likelihood of its occurrence (hazard probability). The time and costs of analysis efforts, the
quality of the safety program, and the state of technology affect the number of risks identified.
Unidentified risk is the risk not yet identified. Some unidentified risks are subsequently
identified when a mishap occurs. Some risk is never known.
Unacceptable risk is that risk which cannot be tolerated by the managing activity. It is
a subset of identified risk that must be eliminated or controlled.
Acceptable risk is the part of identified risk that is allowed to persist without further
engineering or management action. Making this decision is a difficult yet necessary
responsibility of the managing activity. This decision is made with full knowledge that it is the
user who is exposed to this risk.
Residual risk is the risk left over after system safety efforts have been fully employed.
It is not necessarily the same as acceptable risk. Residual risk is the sum of acceptable risk and
unidentified risk. This is the total risk passed on to the user.
[Figure: types of risk, with labels Eliminate, Control, Acceptable, Unacceptable, Residual, Unidentified]
Risk Analysis The development of a qualitative and/or quantitative estimate of risk based on evaluation and
mathematical techniques.
Risk Acceptance Accepting risk is a function of both risk assessment and risk management. Risk acceptance is
not a simple matter, and the concept is difficult for some to accept. Several points must be kept
in mind.
On the surface, taking risks seems foolish and something to be avoided. Everything we do, however,
involves risk. Defining acceptable risk is subjective, and perceived risks are often as important
as actual risks. Risks imposed on us by others are generally considered to be less acceptable
than those inherent in nature. There are dangers in every type of travel, but there are also dangers in
staying home: 40 percent of all fatal accidents occur there. There are dangers in eating most
food, caused by pesticides, preservatives, natural fats, or simply eating more than necessary. There
are breathing-related dangers in industrial and urban areas. Air pollution leads to
the death of at least 10,000 Americans each year; inhaling natural radioactivity is believed to
kill a similar number; and many diseases are contracted by inhaling germs. Each year, 12,000 Americans
are killed in job-related accidents, and probably 10 times that number die from job-related
illness. There are dangers in exercising and dangers in not getting enough exercise.
Risk is an unavoidable part of our everyday lives.
We all accept risk, knowingly or unknowingly. In an FAA program, it is ultimately the
responsibility of the managing activity (MA) to determine how much risk, and of what kind, is to be
accepted and what is not. In the real world, making this decision is a trade-off process involving
many inputs. As trade-offs are being considered and the design progresses, it may become evident
that some of the safety parameters are forcing higher program risk. From the program manager's
perspective, a relaxation of one or more of the established parameters may appear to be
advantageous when considering the broader perspective of cost and performance optimization.
The program manager has the authority and responsibility, in some circumstances, to make a
decision against the recommendation of the system safety manager. The system safety manager
must recognize such management prerogatives.
A prudent program manager must make a decision whether to fix the identified problem or
formally document acceptance of the added risk. In some cases, this requires contract or
system specification modification. When the program manager decides to accept the risk, the
decision must be coordinated with all affected organizations and then documented so that in
future years everyone will know and understand the elements of the decision and why it was
made. It also provides necessary data if the decision must be revisited.
Risk Assessment The process by which the results of risk analysis are used to make decisions.
Risk Control The risk associated with the hazardous event under study is adequately controlled by the
reduction of severity and/or likelihood, via the application of engineering and/or
administrative hazard controls. Anything that mitigates or ameliorates the risk. See system
safety design order of precedence in Chapter 3.
Risk Hazard Index By combining the probability of occurrence with hazard severity, a matrix is created where
intersecting rows and columns are defined by a Risk Hazard Index (RHI). The risk hazard
index forms the basis for judging both the acceptability of a risk and the management level at
which the decision of acceptability will be made. The index may also be used to prioritize
resources to resolve risks due to hazards or to standardize hazard notification or response
actions.
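The sketch below shows an RHI lookup using severity and probability names in the style of MIL-STD-882. The index values and the acceptability bands are hypothetical placeholders, with lower numbers marking higher risk. A program would substitute the matrix and thresholds approved by its managing activity.

```python
# Severity columns and probability rows in the style of MIL-STD-882.
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")
PROBABILITIES = ("Frequent", "Probable", "Occasional", "Remote", "Improbable")

# Hypothetical RHI values: rows follow PROBABILITIES, columns follow SEVERITIES.
# Lower numbers mark higher risk.
RHI_TABLE = (
    (1, 3, 7, 13),
    (2, 5, 9, 16),
    (4, 6, 11, 18),
    (8, 10, 14, 19),
    (12, 15, 17, 20),
)


def risk_hazard_index(severity: str, probability: str) -> int:
    """Return the index at the intersection of a probability row and a severity column."""
    row = PROBABILITIES.index(probability)
    col = SEVERITIES.index(severity)
    return RHI_TABLE[row][col]


def acceptability(index: int) -> str:
    # Hypothetical bands; each program defines its own thresholds and decision levels.
    if index <= 5:
        return "unacceptable"
    if index <= 9:
        return "undesirable: managing activity decision required"
    if index <= 17:
        return "acceptable with managing activity review"
    return "acceptable without review"


idx = risk_hazard_index("Critical", "Occasional")
print(idx, acceptability(idx))  # 6 undesirable: managing activity decision required
```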
3.7 Residual Risk. To make important program decisions, the PM must know what residual
risk exists in the system being acquired. When such risks are marginally acceptable or
potentially unacceptable, the PM is required to raise the presence of residual risk to higher
levels of authority, such as the Service Director or Associate/Assistant Administrator, for action
or acceptance. To present a cohesive description of the hazard to this higher level of decision
making, either the contractor or the FAA must document all analyses performed and the actions
taken to control the hazard. In some contractual situations, the PM may apply additional
resources or other remedies to help the contractor satisfactorily resolve the issue. If not, the
PM can add his position to the contractor information and forward the matter to a higher
decision level. A decision matrix very similar to a Risk Hazard Index, called in this example a
Risk Hazard Level index, can be used to establish which decisions fall under the PM and which
should be forwarded to a higher organizational level.
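The split between decisions made by the PM and decisions made at a higher level can be expressed as a simple filter over a Risk Hazard Level index. The sketch below assumes a hypothetical convention in which lower index values mean higher risk. It also assumes a hypothetical cut-off (`pm_limit`) below which a decision must be forwarded. Both would come from the program's own decision matrix.

```python
def route_decisions(hazards, pm_limit=10):
    """Split (hazard_id, risk_hazard_level) pairs by decision authority.

    Levels at or above pm_limit (lower risk) stay with the PM; levels below it
    (higher risk) are forwarded to a higher organizational level.
    """
    pm_decisions, forwarded = [], []
    for hazard_id, level in hazards:
        if level < 1:
            raise ValueError(f"invalid risk hazard level for {hazard_id}: {level}")
        if level >= pm_limit:
            pm_decisions.append(hazard_id)
        else:
            forwarded.append(hazard_id)
    return pm_decisions, forwarded


# Hypothetical hazards with their Risk Hazard Level values.
hazards = [("HZ-01", 3), ("HZ-02", 14), ("HZ-03", 9), ("HZ-04", 20)]
pm_decisions, forwarded = route_decisions(hazards)
print("PM:", pm_decisions)  # PM: ['HZ-02', 'HZ-04']
print("Forward:", forwarded)  # Forward: ['HZ-01', 'HZ-03']
```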
Risk Severity The harm expected should the hazardous event occur (i.e., loss, consequence, adverse outcome,
damage, fatality, system loss, degradation, loss of function, or injury), considering the risk
associated with the hazardous event under evaluation.
Severity range thresholds for each severity category should be comparable when considering
personnel, system, or facility losses. For example, events or conditions that could cause the loss
of an entire aircraft or facility would be categorized by MIL-STD-882 as catastrophic. Loss of
a single crewman, mechanic, or passenger would also fall in the catastrophic category. Severe
injuries, such as total loss of sight of a mechanic, and system damage of several million dollars
are not normally considered to have equal value, even though both are in the critical category.
If the RHI ranking criteria use risk as a function of severity and probability, quantitative scales
or qualitative scales based on quantitative logic should be used. If the expected losses (or risk)
associated with a hazardous event or condition are to be estimated by multiplying the expected
severity of the accident by the probability of the accident, then some quantitative basis is
necessary. Failure to provide a quantitative basis for the scales can cause significant confusion
and dissipation of safety resources when an arbitrary risk ranking scale is used.
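The sketch below calculates expected loss as risk = expected severity × probability. It assumes severity is expressed in dollars and probability per year, and the hazards and figures are hypothetical. Because both scales are quantitative, their product yields a ranking that an arbitrary ordinal scale could not provide.

```python
def expected_loss(severity, probability):
    """Expected loss: expected accident severity multiplied by accident probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return severity * probability


# Hypothetical hazards: (id, expected severity in dollars, probability per year).
hazards = [
    ("H1", 5_000_000, 1e-4),
    ("H2", 50_000, 2e-2),
    ("H3", 500, 0.5),
]
ranked = sorted(hazards, key=lambda h: expected_loss(h[1], h[2]), reverse=True)
for hazard_id, severity, probability in ranked:
    print(hazard_id, round(expected_loss(severity, probability), 2))
```

Here the rare but severe hazard H1 ranks below H2. As the severity discussion above notes, a purely monetary scale cannot capture loss of life, so a ranking like this should still be checked against engineering judgment.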
Develop the severity values using order-of-magnitude ranges. There are several advantages to
separating severity categories by order-of-magnitude ranges. They include:
Quantify the threshold values for the probability ranges. Quantification reduces the confusion
associated with strictly qualitative definitions. Although it is impossible to quantify the ranges
in MIL-STD-882C because of its extremely broad application, developing quantified probability ranges
for specific systems is a relatively easy task to accomplish.
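The sketch below defines quantified, order-of-magnitude ranges for a specific system. The thresholds (rates per operating hour, one decade apart) and the use of dollars for severity are hypothetical. This is consistent with the point above that such values must be set for each system rather than taken from MIL-STD-882C.

```python
import math

# Hypothetical lower bounds (events per operating hour), one decade apart.
PROBABILITY_LEVELS = (
    ("Frequent", 1e-3),
    ("Probable", 1e-4),
    ("Occasional", 1e-5),
    ("Remote", 1e-6),
)  # any rate below the last bound is "Improbable"


def probability_level(rate: float) -> str:
    """Map a quantified occurrence rate to its named probability range."""
    for name, lower_bound in PROBABILITY_LEVELS:
        if rate >= lower_bound:
            return name
    return "Improbable"


def severity_decade(loss_dollars: float) -> int:
    """Order-of-magnitude bin for a monetary loss, e.g. 3.2e6 dollars -> 6."""
    if loss_dollars <= 0:
        raise ValueError("loss must be positive")
    return math.floor(math.log10(loss_dollars))


print(probability_level(2e-4), severity_decade(3.2e6))  # Probable 6
```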
Safety or Safe. Freedom from those conditions that can cause death, injury, occupational
illness, damage to or loss of equipment or property, or damage to the environment.
Note that absolute safety is not possible because complete freedom from all hazardous
conditions is not possible. Therefore, safety is a relative term that implies a level of risk that is
both perceived and accepted. Thus, the emphasis in system safety programs (SSPs), as reflected in
the definitions above, is on managing risk. Chapter 3 of the FAA System Safety Handbook describes
the risk management process. "System" is also a relative term. A subsystem can be viewed as a
system with narrower predetermined boundaries than the system. System safety is not an absolute
quantity either. System safety is an optimized level of risk that is constrained by cost, time,
and operational effectiveness (performance). System safety requires that risk be evaluated and
the level of risk accepted or rejected by an authority. Finally, system safety is a discipline
employed from the initial design steps through system disposal (also known as "cradle to grave"
or "womb to tomb").
Safety Analysis All associated analysis methods, processes, and/or techniques used to systematically evaluate
safety-related risks.
Safety Risk Management Committee (SRMC) The principal reason to employ risk management and/or risk analysis is
to improve decision-making. Risk analysis and risk management are at the heart of many FAA regulatory
decisions. For example, risk analysis was performed to determine the hazards to flight from airborne
wind shear. Risk management was also evident in the decision to require that all airliners be
equipped with airborne wind shear detection. Risk management requires first analyzing risk, which
in turn requires access to sufficient credible data, and then developing policies and procedures to
eliminate, mitigate, and/or manage the risks. In keeping with this process, an intra-agency team
(the SRMC) was formed to examine the FAA’s approach to risk management. The committee
was and remains open to representatives of all FAA organizations interested in risk
management.
This committee inventoried existing FAA risk management processes, capabilities, and
practices. Processes included the types of decisions appropriate for risk management and current
technical approaches. Capabilities included personnel skill levels, tools, and access to needed
data. Practices included details of implementation and documentation.
The SRMC has become a standing committee that serves as a resource for the FAA. It currently
exchanges risk management information between offices and other government agencies to
avoid duplication of effort, provides support across program lines (including a risk
management/analysis training assistance capability), and identifies and recommends needed
enhancements to FAA risk management/analysis capabilities and/or efficiencies.
Safety Critical All interactions, elements, components, subsystems, functions, processes, and interfaces within the
system that can affect a predetermined level of risk.
Safety Engineering Report Documents the results of safety analyses, including Operational Safety Assessments (OSA),
Comparative Risk Assessments (CRA), Preliminary Hazard Analyses (PHA), System Hazard
Analyses (SHA), Subsystem Hazard Analyses (SSHA), and Operational and Support Hazard
Analysis (O&SHA).
Security Risk Some safety risks that the FAA must manage are the result of security issues. By their nature,
the details of the methodologies used to analyze and assess security hazards/risks cannot be
published in this document. This section does, however, summarize a top-level approach to
security risk management, especially as it relates to the methodologies used for safety risk
management. Because safety and security risk management have not always developed in
parallel, their terminology sometimes differs, and several security-unique terms are introduced.
Safety and security hazards both arise from a series of events that lead to a
questionable condition. In security analyses, the term vulnerability is used to summarize the
event path (the approach used to achieve a negative effect) that leads to the hazard.
Single Point Failure A single item of hardware, the failure of which would lead directly to loss of life and/or
loss of the system. More broadly, any single malfunction, failure, or error that would lead directly
to loss of life and/or loss of the system.
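One common way to screen for single point failures is to flag every minimal cut set that contains exactly one item. This assumes a fault tree has already been reduced to its minimal cut sets, a concept not defined in this glossary. The component names below are hypothetical.

```python
def single_point_failures(minimal_cut_sets):
    """Return, sorted, the items that form a minimal cut set on their own."""
    return sorted(next(iter(cut_set)) for cut_set in minimal_cut_sets if len(cut_set) == 1)


# Hypothetical minimal cut sets for the top event "loss of system".
cut_sets = [
    {"pump_A", "pump_B"},  # redundant pumps: both must fail
    {"power_bus"},
    {"controller_software_error"},
]
print(single_point_failures(cut_sets))  # ['controller_software_error', 'power_bus']
```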
Software Computer programs, procedures, rules, and associated documentation and data pertaining to the
operation of a computer system.
Software Code A software program or routine or set of routines, which were specified, developed and tested
for a system configuration.
Structured Programming Any software development technique that includes structured design and results in the
development of structured programs.
Subprogram A separately compilable, executable component of a computer program.
Subroutine A routine that returns control to the program or subprogram that called it.
Subsystem An element of a system that, in itself, may constitute a system.
Syntax The structural or grammatical rules that define how the symbols in a language are to be
combined to form words, phrases, expressions, and other allowable constructs.
System A composite, at any level of complexity, of personnel, procedures, materials, tools, equipment,
facilities, and software. The elements of this composite entity are used together in the intended
operational or support environment to perform a given task or achieve a specific production or
support requirement; a set or arrangement of components so related or connected as to form a
unity or organic whole.
System Safety A standardized management and engineering discipline that integrates the consideration of man,
machine, and environment in planning, designing, testing, operating, and maintaining FAA
operations, procedures, and acquisition projects. System safety is applied throughout a
system's entire life cycle to achieve an acceptable level of risk within the constraints of
operational effectiveness, time, and cost.
System Safety Analysis The analysis of a complex system by means of methods, techniques, and/or processes to
comprehensively evaluate the safety-related risks that are associated with the system under study.
System Safety Engineer An engineer qualified by appropriate credentials (training, education, registration,
certification, and/or experience) to perform system safety engineering.
One should have an appropriate background and credentials directly related to system safety in
order to practice in the field, e.g., CSP, PE, training, education, and actual experience.
System Safety Engineering An engineering discipline requiring specialized professional knowledge and skills in
applying scientific and engineering principles, criteria, and techniques to identify and eliminate,
or reduce, safety-related risks.
System Safety Working Group A formally chartered group of persons representing organizations associated with the
system under study, organized to assist management in achieving the system safety objectives.
System Safety Manager A person responsible for managing the system safety program.
System Safety Objectives System safety is achieved through the implementation and careful execution of an SSP.
As stated previously, the ultimate objective of system safety is to eliminate or minimize accidents
and their results. The objectives of an SSP are to ensure that:
• Safety, consistent with system purpose and program constraints, is designed into the system in
a timely, cost-effective manner.
• Hazards are identified, evaluated, and eliminated, or the associated risk reduced to a level
acceptable to the managing activity (MA) throughout the entire life cycle of a system.
• Historical safety data, including lessons learned from other systems, are considered and used.
• Minimum risk is sought in accepting and using new designs, materials, and production and
test techniques.
• Actions taken to eliminate hazards or reduce risk to a level acceptable to the MA are
documented.
• Consideration is given to safety, ease of disposal, and storage of any hazardous materials
associated with the system.
• Significant safety data are documented as "lessons learned" and are submitted to data banks,
design handbooks, or specifications.
• Hazards identified after production are minimized consistent with program restraints.
System Safety Order of Precedence The overall goal of a system safety program is to design systems that do not
contain unacceptable hazards. However, the nature of most complex systems makes it impossible or
impractical to design them completely hazard-free. As hazard analyses are performed, hazards
will be identified that require resolution. System safety precedence defines the order to be
followed for satisfying system safety requirements and reducing the presence and impact of
risks. The alternatives for eliminating the specific hazard or controlling its associated risk must
be evaluated so that an acceptable method for risk reduction can be pursued.
Design for Minimum Risk. The most effective safety program is one that eliminates hazards
through design. If an identified hazard cannot be eliminated, reduce the associated risk to an
acceptable level, as defined by the MA, through design selection. Defining minimum risk is not
a simple matter. It is not a cookbook process that can be numerically developed without
considerable thought. Minimum risk varies from program to program. See paragraph 3.6 for
more information.
Incorporate Safety Devices. If identified hazards cannot be eliminated or their associated risk
adequately reduced through design selection, that risk should be reduced to a level acceptable to
the MA through the use of fixed, automatic, or other protective safety design features or
devices. Provisions should be made for periodic functional checks of safety devices when
applicable.
Provide Warning Devices. When neither design nor safety devices can effectively eliminate
identified hazards or adequately reduce associated risk, devices should be used to detect the
condition and to produce an adequate warning signal to alert personnel of the hazard. Warning
signals and their application must be designed to minimize the probability of incorrect
personnel reaction to the signals and shall be standardized within like types of systems.
Develop Procedures and Training. Where it is impractical to eliminate hazards through design
selection or adequately reduce the associated risk with safety and warning devices, procedures
and training should be used. However, without a specific waiver from the MA, no warning,
caution, or other form of written advisory shall be used as the only risk reduction method for
Category I or II hazards. Procedures may include the use of personal protective equipment.
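The order of precedence above amounts to a selection rule: choose the highest-ranked control that is feasible, and reject a procedures-only solution for a Category I or II hazard unless the MA has granted a waiver. The Python sketch below illustrates that rule under stated assumptions; the `Mitigation` class, the `kind` labels, the numeric ranks, and the `waiver` flag are hypothetical names, and hazard categories are assumed to be passed as the integers 1 to 4.

```python
from dataclasses import dataclass

# Hypothetical ranks for the four precedence steps (lower number = preferred).
PRECEDENCE = {
    "design": 1,          # design for minimum risk
    "safety_device": 2,   # incorporate safety devices
    "warning_device": 3,  # provide warning devices
    "procedure": 4,       # develop procedures and training
}


@dataclass
class Mitigation:
    kind: str          # one of the PRECEDENCE keys
    description: str
    feasible: bool     # whether this control can actually be applied


def select_mitigation(options, hazard_category, waiver=False):
    """Return the feasible mitigation highest in the order of precedence.

    hazard_category is assumed to be 1-4 (Category I-IV). A procedure-only
    control for a Category I or II hazard is rejected unless waiver is True.
    """
    feasible = [m for m in options if m.feasible]
    if not feasible:
        raise ValueError("no feasible mitigation identified")
    best = min(feasible, key=lambda m: PRECEDENCE[m.kind])
    if best.kind == "procedure" and hazard_category in (1, 2) and not waiver:
        raise ValueError("procedures alone need an MA waiver for Category I/II hazards")
    return best


options = [
    Mitigation("procedure", "Checklist step before maintenance", True),
    Mitigation("safety_device", "Interlock on access panel", True),
    Mitigation("design", "Remove stored-energy source", False),
]
print(select_mitigation(options, hazard_category=2).description)
# Interlock on access panel
```

Raising an error, rather than silently returning the procedure, forces the waiver decision to be made explicitly by the MA.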
System Safety Program: The tasks and activities of system safety that enhance effectiveness by ensuring that requirements are met in a timely, cost-effective manner throughout all phases of the system life cycle.
System Safety Program Plan: A description of the planned methods to be used to implement the system safety requirements.
System Safety Requirements by Acquisition Phase:
Concept Exploration
• Evaluate system safety design features
• Identify possible interface problems
• Highlight special safety considerations
• Describe safety tests/data needed for next phase
• Update requirements based on analysis results
• Review designs of similar systems
• Use past experience with similar system requirements
• Identify waiver requirements
• Prepare a report for milestone reviews
• Tailor subsequent phase SSPs.
Demonstration/Validation
• Evaluate storage, packing, and handling requirements/plans
• Review production plans, drawings, procedures
• Review plans for disposal of hazardous materials
• Prepare documentation for major milestones
• Tailor requirements for production
• Review National Airspace Integrated Logistics Support (NAILS) considerations.
Facilities-Related Requirements
(b) Detailed instructions for the set-up and execution of a given set of test cases, and
instructions for the evaluation of results of executing the test cases.
Traceability: Ability to trace the history, application, or location of an entity by means of recorded identifications.
Transient Error: An error that occurs once, or at unpredictable intervals.
Validation: The process of evaluating a system (or a subset of it) during or at the end of the development process to determine whether it satisfies specified requirements. Conformance to requirements is not a total assurance of acceptable risk.
Verification: The process of evaluating a system (or a subset of it) to determine whether the products of a given development phase satisfy the conditions imposed at the start of the phase.
Volatile Memory: Memory that requires a continuous supply of power to its internal circuitry to prevent the loss of stored information.
Voting: A scheme in which the outputs of three or more channels of a system implementation are compared with each other in order to determine agreement between two or more channels, and to permit continued operation in the presence of a malfunction in one of the channels. A degree of fault/malfunction tolerance is obtained.
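For channels that produce discrete outputs, the voting scheme above can be sketched as an exact-match majority vote. This is a minimal Python illustration, not a production voter: `majority_vote` is a hypothetical name, and channels with analog outputs would instead need a threshold (non-exact) comparison.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of channels, or None.

    With three channels, one malfunctioning channel is outvoted by the other
    two, so operation can continue (2-out-of-3 voting).
    """
    if len(outputs) < 3:
        raise ValueError("voting requires three or more channels")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


print(majority_vote(["open", "open", "closed"]))   # open
print(majority_vote(["open", "closed", "stuck"]))  # None: no agreement
```

Returning `None` rather than guessing lets the caller treat a lack of agreement as a detected fault.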
Watchdog Timer: A device that monitors a prescribed operation of computer hardware and/or software and provides an indication when such operation has ceased.
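A watchdog is usually a hardware counter that resets or safes the processor when it is not serviced in time. The Python class below is only a software model of that logic, with the current time passed in explicitly so the behavior is deterministic; the class name, method names, and millisecond units are assumptions made for illustration.

```python
class WatchdogTimer:
    """Software model of a watchdog timer (times in milliseconds).

    The monitored task must call kick() at least once per timeout_ms;
    expired() reports True once that deadline has been missed.
    """

    def __init__(self, timeout_ms: int, start_ms: int = 0):
        self.timeout_ms = timeout_ms
        self.last_kick_ms = start_ms

    def kick(self, now_ms: int) -> None:
        # Called by the monitored software each time it completes a cycle.
        self.last_kick_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        # True means the monitored operation appears to have ceased.
        return now_ms - self.last_kick_ms > self.timeout_ms


wd = WatchdogTimer(timeout_ms=500)
wd.kick(now_ms=400)
print(wd.expired(now_ms=800))   # False: last kick was 400 ms ago
print(wd.expired(now_ms=1000))  # True: 600 ms without a kick
```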
Zero Energy State: All energy within the system has been reduced to the lowest possible energy level, at “zero energy level” if possible. All stored or residual energy, such as that within capacitors, springs, elevated devices, rotating flywheels, hydraulic systems, and pneumatic systems, has been dissipated.
Where it is not possible to dissipate or de-energize all energy within the system, additional controls should be implemented, e.g., lockout, repositioning, isolating, restraining, guarding, shielding, relief, and bleed-off devices.
FAA System Safety Handbook, Appendix B: Comparative Risk Assessment (CRA) Form
December 30, 2000
Appendix B
SEC TRACKING No: This is the number assigned to the CRA by the FAA System Engineering Council (SEC).
CRA Title: Title as assigned by the FAA SEC.
SYSTEM: This is the system being affected by the change, e.g., the National Airspace System.
Initial Date: Date initiated.
SEC Date: Date first reviewed by the SEC.
REFERENCES: A short list of references. A long list can be continued on a separate page.
SSE INFORMATION
SSE Name/Title: Name and title of the person who performed the analysis or led the team.
Location: Address and office symbol of the SSE.
Telephone No.:
SUMMARY OF HAZARD CLASSIFICATION:
(worst credible case; see List of Hazards below for individual risk assessments)
Option A (Baseline): Place the highest risk assessment code for the baseline here.
Proposed Change Option(s) B-X: Place the highest risk assessment code for the alternatives here.
DESCRIPTION OF (Option A) BASELINE AND PROPOSED CHANGE(s)
Option A: Describe the system under study here in terms of the 5M Model discussed in Chapter 2. Describe the baseline (or no-change) system and each alternative. This section can be continued in an appendix if it does not fit in this area. Avoid excessive detail, but include enough for the decision-maker to understand the risk associated with each alternative.
SEVERITY:
1 CATASTROPHIC – Death, system or aircraft loss, permanent total disability
2 HAZARDOUS – Severe injury or major aircraft or system damage
3 MAJOR – Minor injury or minor aircraft or system damage
4 MINOR – Less than minor injury or aircraft or system damage
5 NO SAFETY EFFECT
[Risk matrix: SEVERITY (rows) versus PROBABILITY (columns A, B, C, D)]
PROBABILITY:
HAZARD LIST
(List the hazard conditions here. Enter the risk assessment codes for each hazard and alternative to the right.)
No. | Hazard Condition | RISK ASSESSMENT CODE (RAC): Baseline Option A | Option B | Option C | Option D | Option E
1 | Loss of communication between air traffic controllers and aircraft (flight essential) | 1D | 1D | 1C | 1C | 1B
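Each RAC above combines a severity number with a probability letter. Where no risk matrix is at hand, one conservative way to shortlist the "worst credible" codes for the summary block is to keep every code that no other code beats on both severity and probability. The Python sketch below does that; it assumes severity 1 (Catastrophic) is worst and that probability letters run from A (most likely) to E (least likely), consistent with the "E-Improbable" entry later in this form. The function names and the second hazard (3B) are hypothetical, and the sketch does not replace the high/medium/low assessment taken from the risk matrix.

```python
from __future__ import annotations


def parse_rac(code: str) -> tuple[int, str]:
    """Split a RAC such as '1D' into (severity, probability)."""
    severity, probability = int(code[:-1]), code[-1].upper()
    if not 1 <= severity <= 5 or probability not in "ABCDE":
        raise ValueError(f"unrecognized RAC: {code!r}")
    return severity, probability


def at_least_as_risky(a: str, b: str) -> bool:
    """True if RAC a is at least as severe AND at least as likely as RAC b."""
    (sev_a, prob_a), (sev_b, prob_b) = parse_rac(a), parse_rac(b)
    # Assumed scales: a lower severity number and an earlier letter are worse.
    return sev_a <= sev_b and prob_a <= prob_b


def worst_credible(codes: list[str]) -> list[str]:
    """Return the RACs that no other RAC in the list dominates."""
    unique = sorted({c.upper() for c in codes})
    return [c for c in unique
            if not any(o != c and at_least_as_risky(o, c) for o in unique)]


hazards = {                      # "3B" is a hypothetical second hazard
    "Baseline Option A": ["1D", "3B"],
    "Option E": ["1B", "3B"],
}
for option, codes in hazards.items():
    print(option, worst_credible(codes))
# Baseline Option A ['1D', '3B']
# Option E ['1B']
```

The baseline keeps two candidates because 1D and 3B cannot be ranked against each other without the matrix; that is where the analyst's judgment, or the handbook's risk matrix, has to decide.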
Summarize your conclusions: which option is best (and second, third, etc.) and why. Include enough detail to communicate appropriately with the audience.
Recommendations: Provide additional controls to further mitigate or eliminate the risks. Follow the safety order of precedence, i.e., (1) eliminate/mitigate by design, (2) incorporate safety features, (3) provide warnings, and (4) procedures/training (see Chapter 4 for further elaboration of the Safety Order of Precedence). Define SSE requirements for reducing the risk of the design/option(s).
Baseline Option A
Severity: 1-Catastrophic; Probability: E-Improbable; Assessment: Medium Risk
Option B
Severity: NA; Probability: NA; Assessment: NA
Severity
The Aeronautical Information Manual (AIM) states: “Radio communications are a critical link in the ATC system. The link can be a strong bond between pilot and controller or it can be broken with surprising speed and disastrous results.”[i]
Probability
Severity Definitions
FAA System Safety Handbook, Appendix C: Related Readings in Aviation System Safety
December 30, 2000
Appendix C
REFERENCES
GOVERNMENT REFERENCES
FAA Advisory Circular 25.1309 (Draft), System Design and Analysis, January 28, 1998
DOD 5000.2R, Mandatory Procedures for Major Defense Acquisition Programs and Major
Automated Information Systems, March 15, 1996
DOD-STD-2167A, Military Standard: Defense System Software Development, February 29, 1988
MIL-STD-1629A “Procedures for Performing a Failure Mode, Effects and Criticality Analysis,”
November 1980.
MIL-STD-1472D, “Human Engineering Design Criteria for Military Systems, Equipment and
Facilities,” 14 March 1989.
29 CFR 1910.119 Process Safety Management, U.S. Government Printing Office, July 1992.
Department of the Air Force, Software Technology Support Center, Guidelines for Successful
Acquisition and Management of Software-Intensive Systems: Weapon Systems, Command and
Control Systems, Management Information Systems, Version-2, June 1996, Volumes 1 and 2
AFISC SSH 1-1, Software System Safety Handbook, September 5, 1985
Department of Defense, AF Inspection and Safety Center (now the AF Safety Agency), AFISC SSH 1-1, “Software System Safety,” September 1985.
Department of Labor, 29 CFR 1910, “OSHA Regulations for General Industry,” July 1992.
Department of Labor, 29 CFR 1926, “OSHA Regulations for Construction Industry,” July 1992.
Department of Labor, OSHA 3133, “Process Safety Management Guidelines for Compliance,”
1992.
Environmental Protection Agency, 1990a, Guidance for Data Usability in Risk Assessment,
EPA/540/G-90/008, Office of Emergency and Remedial Response, Washington, DC 1990.
COMMERCIAL REFERENCES
ACGIH, “Guide for Control of Laser Hazards,” American Conference of Governmental Industrial Hygienists, 1990.
American Society for Testing and Materials (ASTM), 1916 Race Street, Philadelphia, PA 19103.
ASTM STP762, “Fire Risk Assessment”, American Society for Testing and Materials, 1980.
IEEE STD 1228, Institute of Electrical and Electronics Engineers, Inc., Standard for Software Safety Plans, 1994.
IEEE STD 829, Institute of Electrical and Electronics Engineers, Inc., Standard for Software Test Documentation, 1983.
IEEE STD 830, Institute of Electrical and Electronics Engineers, Inc., Guide to Software Requirements Specification, 1984.
IEEE STD 1012, Institute of Electrical and Electronics Engineers, Inc., Standard for Software Verification and Validation Plans, 1987.
Joint Software System Safety Committee, "Software System Safety Handbook", December 1999
NASA NSTS 22254, “Methodology for Conduct of NSTS Hazard Analyses,” May 1987.
National Fire Protection Association, “Properties of Flammable Liquids, Gases and Solids”.
Nuclear Regulatory Commission (NRC), “Safety/Risk Analysis Methodology”, April 12, 1993.
Joint Services Computer Resources Management Group, “Software System Safety Handbook: A
Technical and Managerial Team Approach”, Published on Compact Disc, December 1999.
INDIVIDUAL REFERENCES
Ang, A.H.S., and Tang, W.H., “Probability Concepts in Engineering Planning and Design”, Vol. II, John Wiley and Sons, 1984.
Bahr, N. J., “System Safety Engineering and Risk Assessment: A Practical Approach”, Taylor and Francis, 1997.
Benner, L., “Guide 7: A Guide for Using Energy Trace and Barrier Analysis with the STEP Investigation System”, Events Analysis, Inc., Oakton, Va., 1985.
Briscoe, G.J., “Risk Management Guide”, EG&G Idaho, Inc. SSDC-11, June 1997.
Brown, M. L., “Software Systems Safety and Human Error”, Proceedings: COMPASS 1988.
Brown, M. L., “What is Software Safety and Whose Fault Is It Anyway?”, Proceedings: COMPASS 1987.
Brown, M. L., “Applications of Commercially Developed Software in Safety Critical Systems”, Proceedings of Parari ’99, November 1999.
Bozarth, J. D., “Software Safety Requirement Derivation and Verification”, Hazard Prevention, Q1, 1998.
Card, D.N. and Schultz, D.J., “Implementing a Software Safety Program”, Proceedings:
COMPASS 1987
Clark, R., Benner, L. and White, L. M., “Risk Assessment Techniques Manual,” Transportation
Safety Institute, March 1987, Oklahoma City, OK.
Clemens, P.L. “A Compendium of Hazard Identification and Evaluation Techniques for System
Safety Application,” Hazard Prevention, March/April, 1982.
Cooper, J.A., “Fuzzy-Algebra Uncertainty Analysis,” Journal of Intelligent and Fuzzy Systems, Vol. 2, No. 4, 1994.
Connolly, B., “Software Safety Goal Verification Using Fault Tree Techniques: A Critically Ill
Patient Monitor Example”, Proceedings: COMPASS 1989
Dunn, R., Ullman, R., “Quality Assurance For Computer Software”, McGraw Hill, 1982
Forrest, M., and McGoldrick, Brendan, “Realistic Attributes of Various Software Safety Methodologies”, Proceedings: 9th International System Safety Society, 1989.
Hammer, W. R., “Identifying Hazards in Weapon Systems – The Checklist Approach”, Proceedings: Parari ’97, Canberra, Australia.
Hammer, Willie, “Occupational Safety Management and Engineering”, 2nd Ed., Prentice-Hall, Inc., Englewood Cliffs, NJ, 1981.
Heinrich, H.W., Petersen, D., Roos, N., “Industrial Accident Prevention: A Safety Management Approach”, McGraw-Hill, 5th Ed., 1980.
Johnson, W.G., “MORT – The Management Oversight and Risk Tree,” SAN 821-2, U.S. Atomic Energy Commission, 12 February 1973.
Kjos, K., “Development of an Expert System for System Safety Analysis”, Proceedings: 8th International System Safety Conference, Volume II.
Klir, G.J., Yuan, B., “Fuzzy Sets and Fuzzy Logic: Theory and Applications”, Prentice Hall PTR, 1995.
Lawrence, J.D., “Survey of Industry Methods for Producing Highly Reliable Software”, NUREG/CR-6278, Lawrence Livermore National Laboratory, November 1994.
Leveson, N. G., “Safeware: System Safety and Computers, A Guide to Preventing Accidents and Losses Caused By Technology”, Addison Wesley, 1995.
Leveson, N. G., “Software Safety: Why, What, and How”, Computing Surveys, Vol. 18, No. 2, June 1986.
Littlewood, B. and Strigini, L., “The Risks of Software”, Scientific American, November 1992.
Mills, H. D., “Engineering Discipline for Software Procurement”, Proceedings: COMPASS 1987.
Moriarty, Brian and Roland, Harold E., “System Safety Engineering and Management”, Second Edition, John Wiley & Sons, 1990.
Raheja, Dev G., “Assurance Technologies: Principles and Practices”, McGraw-Hill, Inc., 1991.
Rodger, W.P., “Introduction to System Safety Engineering”, John Wiley and Sons.
Saaty, T.L., “The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation”, 2nd Ed., RWS Publications, 1996.
Stephenson, Joe, “System Safety 2000: A Practical Guide for Planning, Managing, and Conducting System Safety Programs”, Van Nostrand Reinhold, 1991.
Tarrants, William E., “The Measurement of Safety Performance”, Garland STPM Press, 1980.
OTHER REFERENCES
UK Ministry of Defense. Interim DEF STAN 00-54: “Requirements for Safety Related Electronic
Hardware in Defense Equipment”, April 1999.
UK Ministry of Defense. Defense Standard 00-55: “Requirements for Safety Related Software in
Defense Equipment”, Issue 2, 1997
FAA System Safety Handbook, Appendix D
December 30, 2000
Appendix D
Structured Analysis and Formal Methods
Structured Analysis became popular in the 1980s and is still used by many. The analysis consists of translating the system concept (or real world) into data and control terminology, that is, into data flow diagrams (DFDs). The flow of data and control from bubble to data store to bubble can be very hard to track, and the number of bubbles can become extremely large. One approach is to first define the events from the outside world that require the system to react and assign a bubble to each event; bubbles that need to interact are then connected until the system is defined. This can be rather overwhelming, so the bubbles are usually grouped into higher-level bubbles. Data dictionaries are needed to describe the data and command flows, and a process specification is needed to capture the transaction/transformation information. The problems have been: 1) choosing bubbles appropriately, 2) partitioning those bubbles in a meaningful and mutually agreed-upon manner, 3) the size of the documentation needed to understand the data flows, 4) the method is still strongly functional in nature and thus subject to frequent change, 5) although “data” flow is emphasized, “data” modeling is not, so there is little understanding of what the subject matter of the system actually is, and 6) not only is it hard for the customer to follow how the concept is mapped into these data flows and bubbles, it has also been very hard for the designers, who must convert the DFD organization into an implementable format.
Information Modeling, using entity-relationship diagrams, is really a forerunner of OOA. The analysis first finds objects in the problem space, describes them with attributes, adds relationships, refines them into super- and sub-types, and then defines associative objects. Some normalization then generally occurs. Information modeling is thought to fall short of true OOA in that, according to Peter Coad and Edward Yourdon:
This handbook presents in detail the two newest and most promising methods of structured analysis and design: Object-Oriented (OOA/OOD) and Formal Methods (FM). OOA/OOD and FM can incorporate the best from each of the above methods and can be used effectively in conjunction with each other. Lutz and Ampo described their successful experience of using OOD combined with Formal Methods as follows: “For the target applications, object-oriented modeling offered several advantages as an initial step in developing formal specifications. This reduced the effort in producing an initial formal specification. We also found that the object-oriented models did not always represent the ‘why’ of the requirements, i.e., the underlying intent or strategy of the software. In contrast, the formal specification often clearly revealed the intent of the requirements.”
Object-Oriented Design (OOD) is gaining increasing acceptance worldwide. OOD methods fall short of full Formal Methods because they generally do not include logic engines or theorem provers. But they are more widely used than Formal Methods, and a large infrastructure of tools and expertise is readily available to support practical OOD usage.
OOA/OOD is the new paradigm and is viewed by many as the best solution to most problems. Some advantages of modeling the real world into objects are that 1) it is thought to follow a more natural human thinking process, and 2) objects, if properly chosen, are the most stable perspective of the real-world problem space and can be more resilient to change, because the functions/services and data and commands/messages are isolated and hidden from the overall system. For example, while over the course of the development life cycle the number, as well as the types, of functions (e.g., turn camera 1 on, download sensor data, ignite starter, fire engine 3) may change, the basic objects (e.g., cameras, sensors, starter, engines, operator) needed to create a system usually stay constant. That is, while there may now be three cameras instead of two, the new Camera-3 is just an instance of the basic object ‘camera’. Or, while an infrared camera may now be the type needed, there is still a ‘camera’; its differences in power, warm-up time, and data storage are kept isolated (hidden) from the rest of the system.
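The camera example above can be sketched in Python to show how a stable object hides type-specific details. Everything here is hypothetical and illustrative: the class names, the `turn_on`/`is_ready` interface, and the 30-second warm-up value are assumptions, and no hardware is driven.

```python
from abc import ABC, abstractmethod


class Camera(ABC):
    """Stable abstraction: the rest of the system uses only this interface."""

    def __init__(self, name: str):
        self.name = name
        self._powered = False            # internal state, hidden from callers

    def turn_on(self) -> None:
        self._warm_up()                  # type-specific detail stays inside
        self._powered = True

    def is_ready(self) -> bool:
        return self._powered

    @abstractmethod
    def _warm_up(self) -> None:
        """Each camera type implements its own start-up behavior."""


class VisibleCamera(Camera):
    def _warm_up(self) -> None:
        pass                             # assumed: no warm-up needed


class InfraredCamera(Camera):
    def __init__(self, name: str, warm_up_s: float = 30.0):
        super().__init__(name)
        self._warm_up_s = warm_up_s      # hypothetical warm-up time
        self.warm_up_done = False

    def _warm_up(self) -> None:
        # A real driver would wait self._warm_up_s seconds; here we only record it.
        self.warm_up_done = True


# Adding Camera-3, or making it infrared, does not change the calling code.
cameras = [VisibleCamera("Camera-1"), VisibleCamera("Camera-2"),
           InfraredCamera("Camera-3")]
for cam in cameras:
    cam.turn_on()
print(all(cam.is_ready() for cam in cameras))  # True
```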
OOA incorporates the principles of abstraction, information hiding, inheritance, and a method of
organizing the problem space by using the three most “human” means of classification. These combined
principles, if properly applied, establish a more modular, bounded, stable and understandable software
system. These aspects of OOA should make a system created under this method more robust and less
susceptible to changes, properties which help create a safer software system design.
Abstraction refers to concentrating on only certain aspects of a complex problem, system, idea or
situation in order to better comprehend that portion. The perspective of the analyst focuses on similar
characteristics of the system objects that are most important to them. Then, at a later time, the analyst can
address other objects and their desired attributes or examine the details of an object and deal with each in
more depth. Data abstraction is used by OOA to create the primary organization for thinking and
specification in that the objects are first selected from a certain perspective and then each object is defined
in detail. An object is defined by the attributes it has and the functions it performs on those attributes.
An abstraction can be viewed, as per Shaw, as “a simplified description, or specification, of a system that
emphasizes some of the system’s details or properties while suppressing others. A good abstraction is
one that emphasizes details that are significant to the reader or user and suppresses details that are, at least
for the moment, immaterial or diversionary”.
Information hiding also helps manage complexity in that it allows encapsulation of requirements that might be subject to change. In addition, it helps to isolate the rest of the system from some object-specific design decisions. Thus, the rest of the software system sees only what is absolutely necessary of the inner workings of any object.
Inheritance “defines a relationship among classes [objects], wherein one class shares the structure or behavior defined in one or more classes... Inheritance thus represents a hierarchy of abstractions, in which a subclass [object] inherits from one or more superclasses [ancestor objects]. Typically, a subclass augments or redefines the existing structure and behavior of its superclasses”.
Classification theory states that humans normally organize their thinking by:
• looking at an object and comparing its attributes to those experienced before (e.g., looking at a cat, humans tend to think of its size, color, temperament, etc. in relation to past experience with cats);
• distinguishing between an entire object and its component parts (e.g., a rose bush versus its roots, flowers, leaves, thorns, stems, etc.); and
• classifying objects into distinct and separate groups (e.g., trees, grass, cows, cats, politicians).
In OOA, the first organization is to take the problem space and render it into objects and their attributes
(abstraction). The second step of organization is into Assembly Structures, where an object and its parts
are considered. The third form of organization of the problem space is into Classification Structures, during which the problem space is examined for generalized and specialized instances of objects (inheritance). For example, in a railway system the objects could be engines (provide power to pull cars), cars (provide storage for cargo), tracks (provide a pathway for trains to follow/ride on), switches (provide direction changing), stations (places to exchange cargo), etc. Then you would look at the Assembly Structure of cars and determine what is important about their piece parts: their wheels, floor construction, coupling mechanism, siding, etc. Finally, the Classification Structure could divide cars into cattle, passenger, grain, refrigerated, and volatile liquid cars.
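The railway example can be expressed in code to separate the two kinds of structure: composition models the Assembly Structure (a car has wheelsets and a coupler), and subclassing models the Classification Structure (a cattle car is a kind of car). The Python sketch below is illustrative only; the class names, the four-wheelset default, and the coupler type are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Assembly Structure: a car is composed of its piece parts.
@dataclass
class Wheelset:
    position: int


@dataclass
class Coupler:
    kind: str = "knuckle"            # assumed coupler type


@dataclass
class Car:
    car_id: str
    wheelsets: list[Wheelset] = field(
        default_factory=lambda: [Wheelset(i) for i in range(4)])
    coupler: Coupler = field(default_factory=Coupler)

    def cargo(self) -> str:
        return "general cargo"


# Classification Structure: specialized cars inherit from the general Car.
class CattleCar(Car):
    def cargo(self) -> str:
        return "livestock"


class PassengerCar(Car):
    def cargo(self) -> str:
        return "passengers"


class VolatileLiquidCar(Car):
    def cargo(self) -> str:
        return "volatile liquid"


train = [CattleCar("C-101"), PassengerCar("P-7"), VolatileLiquidCar("T-3")]
for car in train:
    print(car.car_id, car.cargo(), len(car.wheelsets))
# C-101 livestock 4
# P-7 passengers 4
# T-3 volatile liquid 4
```

Adding a refrigerated car would mean one new subclass; the assembly parts and the code that iterates over the train stay unchanged.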
The purpose of all this classification is to provide modularity, which partitions the system along well-defined boundaries that can be individually and independently understood, designed, and revised. However, despite “classification theory”, choosing which objects represent a system is not always straightforward. In addition, each analyst or designer will have their own abstraction, or view of the system, and these views must be reconciled. OO does provide a structured approach to software system design and can be very useful in helping to bring about a safer, more reliable system.
“Formal Methods (FM) consists of a set of techniques and tools based on mathematical modeling and
formal logic that are used to specify and verify requirements and designs for computer systems and
software.”
While Formal Methods (FM) are not widely used in US industry, FM has gained some acceptance in
Europe. A considerable learning curve must be surmounted for newcomers, which can be expensive.
Once this hurdle is surmounted successfully, some users find that it can reduce overall development life-
cycle cost by eliminating many costly defects prior to coding.
A digital system may fail as a result of either physical component failure, or design errors. The validation
of an ultra-reliable system must deal with both of these potential sources of error.
Well-known techniques exist for handling physical component failure; these techniques use redundancy
and voting. The reliability assessment problem in the presence of physical faults is based upon Markov
modeling techniques and is well understood.
The design error problem is a much greater threat. Unfortunately, no scientifically justifiable defense against this threat is currently used in practice. Three basic strategies are advocated for dealing with design errors:
1. Testing (life testing)
2. Design Diversity (i.e., software fault-tolerance: N-version programming, recovery blocks, etc.)
3. Fault Avoidance (e.g., formal specification and verification)
The problem with life testing is that in order to measure ultra-reliability one must test for exorbitant amounts of time. For example, to measure a 10⁻⁹ probability of failure for a 1-hour mission, one must test for more than 114,000 years.
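The 114,000-year figure follows from simple arithmetic: a failure probability of 10⁻⁹ per 1-hour mission corresponds to about 10⁹ hours between failures, and a year has about 8,766 hours (365.25 days). The Python sketch below reproduces that estimate; the function names are hypothetical, and the result is only a lower bound, because demonstrating the rate with statistical confidence (for example, zero failures at 95% confidence, assuming exponentially distributed failure times, needs roughly 3/λ hours) would take longer still.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def expected_test_years(failure_rate_per_hour: float) -> float:
    """Test time (years) in which one failure is expected: 1 / rate."""
    return 1.0 / failure_rate_per_hour / HOURS_PER_YEAR


def zero_failure_test_years(failure_rate_per_hour: float, confidence: float) -> float:
    """Failure-free test time (years) needed to show the rate at the given
    confidence, assuming exponentially distributed times to failure."""
    return -math.log(1.0 - confidence) / failure_rate_per_hour / HOURS_PER_YEAR


print(round(expected_test_years(1e-9)))            # 114077
print(round(zero_failure_test_years(1e-9, 0.95)))  # roughly 3 times longer
```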
Many advocate design diversity as a means to overcome the limitations of testing. The basic idea is to use separate design/implementation teams to produce multiple versions from the same specification. Then, non-exact threshold voters are used to mask the effect of a design error in one of the versions. The hope is that the design flaws will manifest errors independently, or nearly so.
By assuming independence, one can obtain ultra-reliable-level estimates of reliability even though the individual versions have failure rates on the order of 10⁻⁴. Unfortunately, the independence assumption has been rejected at the 99% confidence level in several experiments for low-reliability software. Furthermore, the independence assumption cannot ever be validated for high-reliability software because of the exorbitant test times required. If one cannot assume independence, then one must measure correlations. This is infeasible as well: it requires as much testing time as life-testing the system, because the correlations must be in the ultra-reliable region in order for the system to be ultra-reliable. Therefore, it is not possible, within feasible amounts of testing time, to establish that design diversity achieves ultra-reliability.
Consequently, design diversity can create an illusion of ultra-reliability without actually providing it.
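The arithmetic behind the independence argument can be made concrete. If each version fails on a given demand with probability p and the versions fail independently, a majority voter fails only when more than half of the versions fail together, which is a binomial tail. The Python sketch below (function name hypothetical) shows how per-version rates of 10⁻⁴ turn into apparent system rates of about 3 × 10⁻⁸ for three versions and about 10⁻¹¹ for five. These numbers are only as good as the independence assumption, which, as explained above, cannot be validated at these levels.

```python
from math import comb


def majority_failure_prob(n_versions: int, p: float) -> float:
    """P(more than half of n versions fail on the same demand).

    ASSUMES each version fails independently with probability p, which is
    exactly the assumption the text cautions against.
    """
    k_min = n_versions // 2 + 1
    return sum(comb(n_versions, k) * p**k * (1 - p) ** (n_versions - k)
               for k in range(k_min, n_versions + 1))


print(f"{majority_failure_prob(3, 1e-4):.2e}")  # 3.00e-08
print(f"{majority_failure_prob(5, 1e-4):.2e}")  # 1.00e-11
```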
It is felt that formal methods currently offer the only intellectually defensible method for handling the design fault problem. Because the often-quoted 1 - 10⁻⁹ reliability is well beyond the range of quantification, there is no choice but to develop life-critical systems in the most rigorous manner available to us, which is the use of formal methods.
Traditional engineering disciplines rely heavily on mathematical models and calculation to make judgments about designs. For example, aeronautical engineers make extensive use of computational fluid dynamics (CFD) to calculate and predict how particular airframe designs will behave in flight. We use the term formal methods to refer to the variety of mathematical modeling techniques that are applicable to computer system (software and hardware) design. That is, formal methods are the applied mathematics of computer system engineering and, when properly applied, can serve a role in computer system design similar to the role CFD plays in airframe design.
Formal methods may be used to specify and model the behavior of a system and to mathematically verify
that the system design and implementation satisfy system functional and safety properties. These
specifications, models, and verifications may be done using a variety of techniques and with various
degrees of rigor. The following is an imperfect, but useful, taxonomy of the degrees of rigor in formal
methods:
Level 1 represents the use of mathematical logic or a specification language that has a formal semantics to
specify the system. This can be done at several levels of abstraction. For example, one level might
enumerate the required abstract properties of the system, while another level describes an implementation
that is algorithmic in style.
Level 2 formal methods goes beyond Level 1 by developing pencil-and-paper proofs that the more
concrete levels logically imply the more abstract-property oriented levels. This is usually done in the
manner illustrated below.
Level 3 is the most rigorous application of formal methods. Here one uses a semi-automatic theorem prover to make sure that all of the proofs are valid. The Level 3 process of convincing a mechanical prover is really a process of developing an argument for an ultimate skeptic who must be shown every detail.
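As a small illustration of Level 3 rigor, the Lean 4 fragment below gives a concrete, algorithmic definition of a 2-out-of-3 voter on Boolean channel outputs and mechanically proves an abstract property: whenever two channels agree, the output equals their value regardless of the third channel. The definition, the theorem name, and the choice of Boolean outputs are hypothetical; the proof checks every case exhaustively, which is only possible because the state space here is tiny.

```lean
-- Concrete, algorithmic level: a 2-out-of-3 voter on Boolean outputs.
def vote (a b c : Bool) : Bool :=
  (a && b) || (b && c) || (a && c)

-- Abstract property: if any two channels agree on x, the voter outputs x,
-- whatever the third (possibly faulty) channel reports.
theorem vote_masks_single_fault (x y : Bool) :
    vote x x y = x ∧ vote x y x = x ∧ vote y x x = x := by
  cases x <;> cases y <;> decide
```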
Formal methods is not an all-or-nothing approach. The application of formal methods to only the most
critical portions of a system is a pragmatic and useful strategy. Although a complete formal verification
of a large complex system is impractical at this time, a great increase in confidence in the system can be
obtained by the use of formal methods at key locations in the system.
Formal inspections and formal analysis are different. Formal Inspections should be performed within
every major step of the software development process.
Formal Inspections, while valuable within each design phase or cycle, have the most impact when applied
early in the life of a project, especially the requirements specification and definition stages of a project.
Studies have shown that the majority of all faults/failures, including those that impinge on safety, come
from missing or misunderstood requirements. Formal Inspection greatly improves the communication
within a project and enhances understanding of the system while scrubbing out many of the major
errors/defects.
For the Formal Inspections of software requirements, the inspection team should include representatives
from Systems Engineering, Operations, Software Design and Code, Software Product Assurance, Safety,
and any other system function that software will control or monitor. It is very important that software
safety be involved in the Formal Inspections.
It is also very helpful to have inspection checklists for each phase of development that reflect both generic
and project specific criteria. The requirements discussed in this section and in Robyn R. Lutz's paper
"Targeting Safety-Related Errors During Software Requirements Analysis" will greatly aid in establishing
this checklist. Also, the checklists provided in the NASA Software Formal Inspections Guidebook are
helpful.
Timing and sizing analysis for safety critical functions evaluates software requirements that relate to
execution time and memory allocation. Timing and sizing analysis focuses on program constraints.
Typical constraint requirements are maximum execution time and maximum memory usage. The safety
organization should evaluate the adequacy and feasibility of safety critical timing and sizing
requirements. These analyses also evaluate whether adequate resources have been allocated in each case,
under worst-case scenarios. For example, could I/O channels be overloaded by many error messages, preventing safety-critical features from operating?
Quantifying timing/sizing resource requirements can be very difficult. Estimates can be based on the
actual parameters of similar existing systems.
Items to consider include:
In many cases it is difficult to predict the amount of computing resources required. Hence, making use
of past experience is important.
Assessing memory usage can be based on previous software development experience if there is
sufficient confidence in it. More detailed estimates should evaluate the size of the code to be stored in
memory, the additional space required for storing data, and scratchpad space for storing interim and
final results of computations. Memory estimates made in early program phases can be inaccurate; they
should be updated using prototype code and simulations until they become realistic.
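As a rough illustration of the estimate described above, the following Python sketch totals code, data, and scratchpad space and compares the total with available memory after holding back a reserve. The function name, the component sizes, and the 50% reserve margin are all hypothetical assumptions chosen for illustration, not values taken from FAA or NASA guidance.

```python
def memory_budget(code_bytes: int, data_bytes: int, scratch_bytes: int,
                  available_bytes: int, reserve_fraction: float = 0.5):
    """Return (required, allowed, ok) for a simple sizing estimate.

    required: code + stored data + scratchpad for interim/final results.
    allowed:  available memory minus an assumed reserve margin, held back
              because early-phase estimates tend to be inaccurate.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    required = code_bytes + data_bytes + scratch_bytes
    allowed = available_bytes * (1.0 - reserve_fraction)
    return required, allowed, required <= allowed


# Hypothetical early-phase estimate, to be refined with prototype code.
required, allowed, ok = memory_budget(
    code_bytes=96_000, data_bytes=40_000, scratch_bytes=24_000,
    available_bytes=512_000)
print(f"required={required} allowed={allowed:.0f} within budget={ok}")
```

As prototype code and simulations replace guesses, the inputs are updated and the reserve can shrink.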
Dynamic Memory Allocation can be viewed as either a practical memory run time solution or as a
nightmare for assuring proper timing and usage of critical data. Any suggestion of Dynamic Memory
Allocation, common in OOD, CH environments, should be examined very carefully, even in
“non-critical” functional modules.
Address I/O for science data collection, housekeeping, and control. Evaluate resource conflicts between
science data collection and safety critical data availability. During failure events, I/O channels can be
overloaded by error messages, and important messages can be lost or overwritten (e.g., the British
“Piper Alpha” offshore oil platform disaster). Possible solutions include additional modules designed to
capture, correlate, and manage lower-level error messages, or passing errors up through the calling
routines until they reach a level that can handle the problem, so that only critical faults, or combinations
of faults that may lead to a failure, are passed on.
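One way to realize the first option above is sketched below in Python: a hypothetical `ErrorManager` that correlates repeated lower-level errors by source and forwards only critical faults, plus one summary per source once it reaches a repeat threshold, to a bounded output channel. The class name, the "critical"/"minor" severity labels, and the capacity and threshold values are illustrative assumptions, not a prescribed design.

```python
from collections import Counter, deque


class ErrorManager:
    """Hypothetical error-message manager protecting a bounded I/O channel."""

    def __init__(self, channel_capacity: int = 8, repeat_threshold: int = 5):
        self.channel = deque()                 # messages actually transmitted
        self.capacity = channel_capacity
        self.repeat_threshold = repeat_threshold
        self.counts = Counter()                # lower-level errors seen per source

    def _send(self, message: str) -> bool:
        if len(self.channel) >= self.capacity:
            return False                       # channel full: message not sent
        self.channel.append(message)
        return True

    def report(self, source: str, severity: str, text: str) -> bool:
        if severity == "critical":
            # Critical faults bypass correlation; if the channel is full,
            # the oldest non-critical message is evicted to make room.
            if len(self.channel) >= self.capacity:
                for i, msg in enumerate(self.channel):
                    if not msg.startswith("CRITICAL"):
                        del self.channel[i]
                        break
            return self._send(f"CRITICAL {source}: {text}")
        # Lower-level errors are only counted; a single summary is forwarded
        # when a source first reaches the repeat threshold.
        self.counts[source] += 1
        if self.counts[source] == self.repeat_threshold:
            return self._send(f"SUMMARY {source}: {self.repeat_threshold} errors")
        return False


# A flood of 100 minor errors produces one summary, leaving room for the critical fault.
mgr = ErrorManager(channel_capacity=3, repeat_threshold=4)
for _ in range(100):
    mgr.report("sensor-7", "minor", "checksum mismatch")
mgr.report("valve-2", "critical", "pressure limit exceeded")
print(list(mgr.channel))
```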
Execution times versus CPU load and availability. Investigate time variations of CPU load, determine
circumstances of peak load and whether it is acceptable. Consider multi-tasking effects. Note that
excessive multi-tasking can result in system instability leading to “crashes”.
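For periodic tasks, one simple way to check worst-case CPU load is to sum each task's worst-case execution time divided by its period. The Python sketch below uses hypothetical task figures and an assumed 70% utilization ceiling as project margin; it ignores interrupts, blocking, and context-switch overhead, which a real analysis would need to include. The second calculation shows how an error-logging surge during a failure event can push the peak load past the ceiling.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet_ms: float     # worst-case execution time per activation
    period_ms: float   # activation period


def peak_utilization(tasks):
    """Worst-case CPU utilization U = sum(WCET_i / period_i)."""
    return sum(t.wcet_ms / t.period_ms for t in tasks)


tasks = [
    Task("control_loop", wcet_ms=2.0, period_ms=10.0),
    Task("telemetry", wcet_ms=5.0, period_ms=50.0),
    Task("error_logging", wcet_ms=12.0, period_ms=100.0),
]
CEILING = 0.70  # assumed project margin, not a standard value
u = peak_utilization(tasks)
print(f"nominal utilization = {u:.2f}, acceptable = {u <= CEILING}")

# Failure scenario (hypothetical): error logging runs five times as often.
surge = tasks[:2] + [Task("error_logging", wcet_ms=12.0, period_ms=20.0)]
print(f"failure-mode utilization = {peak_utilization(surge):.2f}")
```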
Analysis should address the validity of the system performance models used, together with simulation and
test data, if available.
Appendix E: System Safety Principles
• Safety Management
• MA responsibilities:
Appendix F
• Hazard Identification
• Risk Assessment
• Risk Control Option Analysis
• Risk Control Decisions
• Risk Control Implementation
• Supervision and Review
In an organization with a mature ORM culture, the use of these tools by all personnel will be regarded as
the natural course of events. The norm will be “Why would I even consider exposing myself and others to
the risks of this activity before I have identified the hazards involved using the best procedures or designs
available?” The following pages describe each tool using a standard format with models and examples.
PURPOSE: The Operations Analysis (OA) provides an itemized sequence of events or a flow diagram
depicting the major events of an operation. This assures that all elements of the operation are evaluated as
potential sources of risk. This analysis overcomes a major weakness of traditional risk management,
which tends to focus effort on one or two aspects of an operation that are intuitively identified as risky,
often to the exclusion of other aspects that may actually be riskier. The Operations Analysis also guides
the allocation of risk management resources over time as an operation unfolds event by event in a
systematic manner.
APPLICATION: The Operations Analysis or flow diagram is used in nearly all risk management
applications, including the most time-critical situations. It responds to the key risk management question
“What am I facing here and from where can risk arise?”
METHOD: Whenever possible, the Operations Analysis is taken directly from the planning of the
operation. It is difficult to imagine planning an operation without identifying the key events in a time
sequence. If for some reason such a list is not available, the analyst creates it using the best available
understanding of the operation. The best practice is to break down the operation into time-sequenced
segments strongly related by tasks and activities. Normally, this is well above the detail of individual
tasks. It may be appropriate to break down aspects of an operation that carry obviously higher risk into
more detail than less risky areas. The product of an OA is a compilation of the major events of an
operation in sequence, with or without time checks. An alternative to the Operations Analysis is the flow
diagram. Commonly used symbols are provided at Figure 1.1.1A. Putting the steps of the process on
index cards or sticky-back note paper allows the diagram to be rearranged without erasing and redrawing,
thus encouraging contributions.
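The product of an OA is an ordered list of major events, optionally with time checks, in which higher-risk segments may be broken down into more detail. A minimal Python sketch of that structure follows; the `Event` class, its `detail` field, the event names, and the time checks are hypothetical, loosely modeled on an aircraft delivery flow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    name: str
    time_check: Optional[str] = None                       # optional, e.g. "T+0:30"
    detail: List["Event"] = field(default_factory=list)    # sub-steps for riskier segments


def flatten(events, depth=0):
    """Yield (depth, event) in time sequence so every element gets reviewed."""
    for ev in events:
        yield depth, ev
        yield from flatten(ev.detail, depth + 1)


operation = [
    Event("Receive tasking", "T-2:00"),
    Event("Preflight and load", "T-1:00", detail=[   # higher risk: broken down further
        Event("Fuel aircraft"),
        Event("Load and secure cargo"),
    ]),
    Event("Fly to destination", "T+0:00"),
    Event("Arrive at destination", "T+3:00"),
    Event("Aircraft accepted"),
    Event("Final report"),
]

for depth, ev in flatten(operation):
    print("  " * depth + ev.name + (f" ({ev.time_check})" if ev.time_check else ""))
```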
(Figure: example Operations Analysis flow diagram; surviving labels include Aircraft Accepted, Arrive at Destination, Final Report, and End.)
RESOURCES: The key resources for the Operations Analysis are the operational planners. Using their
operational layout will facilitate the integration of risk controls into the main operational plan and will
eliminate the expenditure of duplicate resources on this aspect of hazard identification.
COMMENTS: Look back on your own experience. How many times have you been surprised or seen
others surprised because they overlooked possible sources of problems? The OA is the key to minimizing
this source of accidents.
PURPOSE: The PHA provides an initial overview of the hazards present in the overall flow of the
operation. It provides a hazard assessment that is broad, but usually not deep. The key idea of the PHA is
to consider the risk inherent to every aspect of an operation. The PHA helps overcome the tendency to
focus immediately on risk in one aspect of an operation, sometimes at the expense of overlooking more
serious issues elsewhere in the operation. The PHA will often serve as the hazard identification process
when risk is low or routine. In higher risk operations, it serves to focus and prioritize follow-on hazard
analyses by displaying the full range of risk issues.
APPLICATION: The PHA is used in nearly all risk management applications except the most time-
critical. Its broad scope is an excellent guide to the identification of issues that may require more detailed
hazard identification tools.
METHOD: The PHA is usually based on the Operations Analysis or flow diagram, taking each event in
turn from it. Analysts apply their experience and intuition, use reference publications and standards of
various kinds, and consult with personnel who may have useful input. The extent of the effort is dictated
by resource and time limitations, and by the estimate of the degree of overall risk inherent in the
operation. Hazards that are detected are often listed directly on a copy of the Operations Analysis as
shown at Figure 1.1.2A. Alternatively, a more formal PHA format such as the worksheet shown at Figure
1.1.2B can be used. The completed PHA is used to identify hazards requiring more in-depth hazard
identification, or it may lead directly to the remaining five steps of the ORM process if hazard levels are
judged to be low. Key to the effectiveness of the PHA is assuring that all events of the operation are
covered.
Figure 1.1.2A Building the PHA Directly from the Operations Analysis Flow Diagram
Callout 1: List the operational phases vertically down the page. Be sure to leave plenty of space on the
worksheet between each phase to allow several hazards to be noted for each phase.
Callout 2: List the hazards noted for each operational phase here. Strive for detail within the limits
imposed by the time you have set aside for this tool.
RESOURCES: The two key resources for the PHA are the expertise of personnel actually experienced in
the operation and the body of regulations, standards, and instructions that may be available. The PHA
can be accomplished in small groups to broaden the … A copy of a PHA accomplished for an earlier,
similar operation would aid in the process.
COMMENTS: The PHA is relatively easy to use and takes little time. Its significant power to impact
risk arises from the forced consideration of risk in all phases of an operation. This means that a key to
success is to link the PHA closely to the Operations Analysis.
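Because the PHA takes each OA event in turn, a PHA worksheet can be thought of as a mapping from events to the hazards noted for them. The Python sketch below, using hypothetical events and hazards, reports every OA event that has no hazard entry, so no phase is silently skipped. An empty entry is only a prompt for review, not proof that a phase is hazard-free.

```python
from typing import Dict, List

# Operations Analysis events in time sequence (hypothetical example).
oa_events = ["Receive tasking", "Preflight and load", "Fly to destination",
             "Arrive at destination", "Final report"]

# PHA worksheet: hazards noted for each operational phase (hypothetical).
pha: Dict[str, List[str]] = {
    "Preflight and load": ["Fuel spill", "Cargo not secured"],
    "Fly to destination": ["Weather deterioration"],
    "Arrive at destination": ["Ground collision while taxiing"],
}


def unreviewed_events(events: List[str], worksheet: Dict[str, List[str]]) -> List[str]:
    """Return OA events with no PHA entry, in their original time sequence."""
    return [e for e in events if not worksheet.get(e)]


print(unreviewed_events(oa_events, pha))
```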
PURPOSE: The "What If" tool is one of the most powerful hazard identification tools. As in the case of
the Scenario Process tool, it is designed to add structure to the intuitive and experiential expertise of
operational personnel. The "What If" tool is especially effective in capturing hazard data about failure
modes that may create hazards. It is somewhat more structured than the PHA. Because of its ease of use,
it is probably the single most practical and effective tool for use by operational personnel.
APPLICATION: The "What If" tool should be used in most hazard identification applications, including
many time-critical applications. A classic use of the "What If" tool is as the first tool used after the
Operations Analysis and the PHA. For example, the PHA reveals an area of hazard that needs additional
investigation. The best single tool to further investigate that area will be the “What If” tool. The user will
zoom in on the particular area of concern, add detail to the OA in this area and then use the "What If"
procedure to identify the hazards.
METHOD: Ensure that participants have a thorough knowledge of the anticipated flow of the operation.
Visualize the expected flow of events in time sequence from the beginning to the end of the operation.
Select a segment of the operation on which to focus. Visualize the selected segment with "Murphy"
injected. Make a conscious effort to visualize hazards. Ask, "What if various failures occurred or
problems arose?" Add hazards and their causes to your hazard list and assess them based on probability
and severity.
The "What-If" analysis can be expanded to further explore the hazards in an operation by developing
short scenarios that reflect the worst credible outcome from the compound effects of multiple hazards in
the operation.
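The final step of the method, adding hazards to the list and assessing them by probability and severity, can be sketched in Python as below. The four-level ordinal scales and the product-based ranking are hypothetical simplifications chosen for illustration; an organization would substitute its own risk assessment matrix. The example questions echo the machine-moving example later in this section, but their ratings are invented.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales (higher = worse); not defined in this section.
PROBABILITY = {"unlikely": 1, "seldom": 2, "occasional": 3, "likely": 4}
SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}


@dataclass
class Hazard:
    question: str      # the "What if ...?" that surfaced the hazard
    probability: str
    severity: str

    def score(self) -> int:
        return PROBABILITY[self.probability] * SEVERITY[self.severity]


hazards = [
    Hazard("What if the lock out/tag out is not properly applied?", "seldom", "catastrophic"),
    Hazard("What if the lift point on the machine is damaged?", "occasional", "marginal"),
    Hazard("What if the floor fails under the lifting device?", "unlikely", "critical"),
]

# Review the highest-scoring hazards first.
ranked = sorted(hazards, key=Hazard.score, reverse=True)
for h in ranked:
    print(h.score(), h.question)
```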
RESOURCES: A key resource for the "What If" tool is the Operations Analysis. It may be desirable to
add detail to it in the area to be targeted by the "What If" analysis. However, in most cases an OA can be
used as-is, if it is available. The "What If" tool is specifically designed to be used by personnel actually
involved in an operation. Therefore, the most critical "What If" resource is the involvement of operators
and their first-line supervisors. Because of the tool's effectiveness, dynamic character, and ease of
application, these personnel are generally quite willing to support the "What If" process.
COMMENTS: The "What If" tool is so effective that the Occupational Safety and Health
Administration (OSHA) has designated it as one of six tools from among which activities facing
catastrophic risk situations must choose under the mandatory hazard analysis provisions of the process
safety standard.
EXAMPLES: Following (Figure 1.1.3A) is an extract of typical output from the "What If" tool.
Situation: Picture a group of 3 operational employees informally applying the round robin
procedure for the "What If" tool to a task to move a multi-ton machine from one location to
another. A part of the discussion might go as follows:
Joe: What if the machine tips over and falls breaking the electrical wires that run within the walls
behind it?
Bill: What if it strikes the welding manifolds located on the wall on the West Side? (This
illustrates “piggybacking” as Bill produces a variation of the hazard initially presented by Joe).
Mary: What if the floor fails due to the concentration of weight on the base of the lifting
device?
Joe: What if the point on the machine used to lift it is damaged by the lift?
Bill: What if there are electrical, air pressure hoses, or other attachments to the machine that are
not properly neutralized?
Mary: What if the lock out/tag out is not properly applied to energy sources servicing the
machine? And so on....
Note: The list above for example might be broken down as follows:
These related groups of hazards are then subjected to the remaining five steps of the ORM
process.
PURPOSE: The Scenario Process tool is a time-tested procedure to identify hazards by visualizing them.
It is designed to capture the intuitive and experiential expertise of personnel involved in planning or
executing an operation, in a structured manner. It is especially useful in connecting individual hazards
into situations that might actually occur. It is also used to visualize the worst credible outcome of one or
more related hazards, and is therefore an important contributor to the risk assessment process.
APPLICATION: The Scenario Process tool should be used in most hazard identification applications,
including some time-critical applications. In the time-critical mode, it is indeed one of the few practical
tools, in that the user can quickly form a “mental movie” of the flow of events immediately ahead and the
associated hazards.
METHOD: The user of the Scenario Process tool attempts to visualize the flow of events in an
operation. This is often described as constructing a “mental movie”. It is often effective to close the eyes,
relax and let the images flow. Usually the best procedure is to use the flow of events established in the
OA. An effective method is to visualize the flow of events twice. The first time, see the events as they are
intended to flow. The next time, inject “Murphy” at every possible turn. As hazards are visualized, they
are recorded for further action. Some good guidelines for the development of scenarios are as follows:
Limit them to 60 words or less. Don’t get tied up in grammatical excellence (in fact they don’t have to be
recorded at all). Use historical experience but avoid embarrassing anyone. Encourage imagination (this
helps identify risks that have not been previously encountered). Carry scenarios to the worst credible
event.
RESOURCES: The key resource for the Scenario Process tool is the Operations Analysis. It provides
the script for the flow of events that will be visualized. Using the tool does not require a specialist.
Operational personnel leading or actually performing the task being assessed are key resources for the
OA. Using this tool is often entertaining and dynamic, and it motivates even the most junior personnel in
the organization.
COMMENTS: A special value of the Scenario Process tool is its ability to link two or more individual
hazards developed using other tools into an operation-relevant scenario.
EXAMPLES: Following is an example (Figure 1.1.4A) of how the Scenario Process tool might be used in
an operational situation.
PURPOSE: The Logic Diagram is intended to provide considerable structure and detail as a primary
hazard identification procedure. Its graphic structure is an excellent means of capturing and correlating
the hazard data produced by the other primary tools. Because of its graphic display, it can also be an
effective hazard-briefing tool. The more structured and logical nature of the Logic Diagram adds
substantial depth to the hazard identification process to complement the other more intuitive and
experiential tools. Finally, an important purpose of the Logic Diagram is to establish the connectivity and
linkages that often exist between hazards. It does this very effectively through its tree-like structure.
APPLICATION: Because it is more structured, the Logic Diagram requires considerable time and effort
to accomplish. Following the principles of ORM, its use will be more limited than the other primary tools.
This means limiting its use to higher risk issues. By its nature it is also most effective with more
complicated operations in which several hazards may be interlinked in various ways. Because it is more
complicated than the other primary tools, it requires more practice, and may not appeal to all operational
personnel. However, in an organizational climate committed to ORM excellence, the Logic Diagram will
be a welcome and often-used addition to the hazard identification toolbox.
METHOD: There are three types of Logic Diagrams. These are the:
Positive diagram. This variation is designed to highlight the factors that must be in place if risk is to be
effectively controlled in the operation. It works from a safe outcome back to the factors that must be in
place to produce it.
Event diagram. This variation focuses on an individual operational event (often a failure or hazard
identified using the "What If" tool) and examines the possible consequences of the event. It works from an
event that may produce risk and shows what the loss outcomes of the event may be.
Negative diagram. This variation selects a loss event and then analyzes the various hazards that could
combine to produce that loss. It works from an actual or possible loss and identifies what factors could
produce it.
All of the various Logic Diagram options can be applied either to an actual operating system or one being
planned. Of course, the best time for application is in the planning stages of the operational lifecycle. All
of the Logic Diagram options begin with a top block. In the case of the positive diagram, this is a desired
outcome; in the case of the event diagram, this is an operations event or contingency possibility; in the
case of the negative diagram, it is a loss event. When working with a positive or negative diagram, the
user then reasons out the factors that could produce the top event. These are entered on the next line of
blocks. With the event diagram, the user lists the possible results of the event
being analyzed. The conditions that could produce the factors on the second line are then considered and
they are entered on the third line. The goal is to be as logical as possible when constructing Logic
Diagrams, but it is more important to keep the hazard identification goal in mind than to construct a
masterpiece of logical thinking. Therefore, a Logic Diagram should be a worksheet with lots of changes
and variations marked on it. With the addition of a chalkboard or flip chart, it becomes an excellent group
tool.
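A negative Logic Diagram is a tree: the loss event sits in the top block, and each line below lists factors that could produce the blocks above it. The Python sketch below builds on the container example of Figure 1.1.5D, with two branches marked "hypothetical" added for illustration, and lists every chain from the top event down to a block with no further causes. This is one simple way to trace the linkages between hazards that the diagram is meant to expose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    label: str
    causes: List["Block"] = field(default_factory=list)  # next line of blocks down


def cause_paths(block, path=()):
    """Yield each chain from the top block down to a block with no listed causes."""
    path = path + (block.label,)
    if not block.causes:
        yield path
    for cause in block.causes:
        yield from cause_paths(cause, path)


# Negative diagram: the top block is a loss event.
top = Block("Container falls off vehicle & ruptures", [
    Block("Failure of tiedown gear", [
        Block("Failure to inspect & test tiedowns IAW procedures"),
        Block("Worn tiedown straps (hypothetical)"),
    ]),
    Block("Vehicle driven too fast through turns (hypothetical)"),
])

for p in cause_paths(top):
    print(" <- ".join(p))   # read "<-" as "is produced by"
```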
Figure 1.1.5A below is a generic diagram, and it is followed by a simplified example of each of the types
of Logic Diagrams (Figures 1.1.5B, 1.1.5C, 1.1.5D).
Figure 1.1.5A (generic Logic Diagram): a top EVENT block with rows of SUPPORTING CAUSE blocks beneath it.
Figure 1.1.5B (positive diagram): CONTAINER STAYS ON VEHICLE, supported by TIEDOWN PROPERLY ACCOMPLISHED, etc.
Figure 1.1.5C (event diagram): FORKLIFT PROCEDURES VIOLATED, EXCEEDED LIFT CAPACITY, with possible results LIFT MECHANISM FAILS, LIFT FAILS, etc.; LOAD BOUNCES TO THE GROUND; CONTAINER RUPTURES, CHEMICAL AGENT LEAKS.
Figure 1.1.5D (negative diagram): CONTAINER FALLS OFF VEHICLE & RUPTURES, produced by FAILURE OF TIEDOWN GEAR, etc., which traces to FAILURE TO INSPECT & TEST TIEDOWNS IAW PROCEDURES, etc., and to VARIOUS ROOT CAUSES.
RESOURCES: All of the other primary tools are key resources for the Logic Diagram, as it can
correlate hazards that they generate. If available, a safety professional may be an effective facilitator for
the Logic Diagram process.
COMMENTS: The Logic Diagram is the most comprehensive tool available among the primary
procedures. Compared to other approaches to hazard identification, it will substantially increase the
quantity and quality of hazards identified.
EXAMPLE: Figure 1.1.5E illustrates how a negative diagram could be constructed for moving a heavy
piece of equipment.
The Logic Diagram pulls together all sources of hazards and displays them in a graphic
format that clarifies the risk issues.
Figure 1.1.5E (negative diagram for moving a heavy piece of equipment); surviving blocks: mechanical
failure of the forklift; load is too heavy for the forklift; improper operator technique (jerky, bad
technique); the load shifts due to lift point or failure to secure; machine strikes an overhead obstacle and
tilts; the machine breaks at the point of lift, with improper operator technique (jerky, bad technique)
listed again beneath it.
Some changes are planned, but many others occur incrementally over time, without any conscious
direction. The Change Analysis is intended to analyze the hazard implications of either planned or
incremental changes. The Change Analysis helps to focus only on the changed aspects of the operation,
thus eliminating the need to reanalyze the total operation, just because a change has occurred in one area.
The Change Analysis is also used to detect the occurrence of change. By periodically comparing current
procedures with previous ones, unplanned changes are identified and clearly defined. Finally, Change
Analysis is an important accident investigation tool. Because many incidents/accidents are due to the
injection of change into systems, an important investigative objective is to identify these changes using the
Change Analysis procedure.
• Whenever significant changes are planned in operations in which there is significant operational risk of
any kind. An example is the decision to conduct a certain type of operation at night that has heretofore
only been done in daylight.
• Periodically in any important operation, to detect the occurrence of unplanned changes.
• As an accident investigation tool.
• As the only hazard identification tool required when an operational area has been subjected to in-depth
hazard analysis. In this role, the Change Analysis will reveal whether any elements exist in the current
operations that were not considered in the previous in-depth analysis.
METHOD: The Change Analysis is best accomplished using a format such as the sample worksheet
shown at Figure 1.1.6B. The factors in the column on the left side of this tool are intended as a
comprehensive change checklist.
To use the worksheet: The user starts at the top of the column and considers the current situation compared
to a previous situation and identifies any change in any of the factors.
When used in an accident investigation, the accident situation is compared to a previous baseline.
The significance of detected changes can be evaluated intuitively, or the changes can be subjected to
"What If", Logic Diagram, scenario, or other specialized analyses.
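The worksheet comparison amounts to walking down the change checklist and comparing each factor with a baseline. In the Python sketch below, the factor names and values are hypothetical (the actual checklist is the left column of the Figure 1.1.6B worksheet), and the example mirrors the daylight-to-night change mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical change-checklist factors; Figure 1.1.6B defines the real list.
FACTORS = ["time of day", "personnel", "equipment", "procedures", "environment"]


def detect_changes(baseline: Dict[str, str],
                   current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Walk the checklist top to bottom; return (factor, was, now) for each change."""
    changes = []
    for factor in FACTORS:
        was = baseline.get(factor, "not recorded")
        now = current.get(factor, "not recorded")
        if was != now:
            changes.append((factor, was, now))
    return changes


baseline = {"time of day": "daylight", "personnel": "certified crew",
            "equipment": "forklift A", "procedures": "rev 3", "environment": "dry"}
current = {**baseline, "time of day": "night", "procedures": "rev 4"}

for factor, was, now in detect_changes(baseline, current):
    print(f"{factor}: {was} -> {now}")
```

Each detected change would then be evaluated intuitively or passed to a more detailed tool.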
RESOURCES: Experienced operational personnel are a key resource for the Change Analysis tool.
Those who have long-term involvement in an operational process must help define the “comparable
situation.” Another important resource is the documentation of process flows and task analyses. Large
numbers of such analyses have been completed in recent years in connection with quality improvement
and reengineering projects. These materials are excellent definitions of the baseline against which change
can be evaluated.
COMMENTS: In organizations with mature ORM processes, most, if not all, higher risk activities will
have been subjected to thorough ORM applications and the resulting risk controls will have been
incorporated into operational guidance. In these situations, the majority of day-to-day ORM activity will
be the application of Change Analysis to determine if the operation has any unique aspects that have not
been previously analyzed.
ALTERNATIVE NAMES: The cause and effect diagram, the fishbone tool, the Ishikawa diagram.
PURPOSE: The Cause and Effect Tool is a variation of the Logic Tree tool and is used in the same
hazard identification role as the general Logic Diagram. The particular advantage of the Cause and Effect
Tool is its origin in the quality management process: because thousands of personnel have already been
trained in it, they require little additional training to apply it to the problem of detecting risk.
APPLICATION: The Cause and Effect Tool will be effective in organizations that have had some
success with the quality initiative. It should be used in the same manner as the Logic Diagram and can be
applied in both a positive and negative variation.
METHOD: The Cause and Effect diagram is a Logic Diagram with a significant variation. It provides
more structure than the Logic Diagram through the branches that give it one of its alternate names, the
fishbone diagram. The user can tailor the basic “bones” based upon special characteristics of the
operation being analyzed. Either a positive or negative outcome block is designated at the right side of the
diagram. Using the structure of the diagram, the user completes the diagram by adding causal factors in
either the “M” or “P” structure. Using branches off the basic entries, additional hazards can be added.
The Cause and Effect diagram should be used in a team setting whenever possible.
RESOURCES: There are many publications describing in great detail how to use cause and effect
diagrams.[1]
COMMENTS:
EXAMPLES: An example of Cause and Effect Tool in action is illustrated at Figure 1.1.7A.
[1] K. Ishikawa, Guide to Quality Control, Quality Resources, White Plains, New York, 12th printing, 1994.
SITUATION: Over the last six months, the supervisor of an aircraft maintenance operation has been receiving
reports from Quality Assurance of tools found in aircraft after maintenance. The supervisor has followed up,
but each case has involved a different individual, and his spot checks seem to indicate good compliance with
tool control procedures. He decides to use a cause and effect diagram to consider all the possible sources of
the tool control problem. The supervisor develops the cause and effect diagram with the help of two or three
of his best maintenance personnel in a group application.
NOTE: Tool control is one of the areas where 99% performance is not adequate. That would mean one in every
hundred tool uses ends with a misplaced tool. The standard must be that among the tens (or hundreds) of
thousands of individual uses of tools over a year, not one tool is misplaced.
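The arithmetic behind this note is simple. Assuming, for illustration, that each tool use independently carries a 1% chance of a misplaced tool, the expected number of misplaced tools grows in direct proportion to the number of uses N. Taking N = 10,000 uses per year, the low end of the range quoted:

```latex
E[\text{misplaced tools}] = (1 - 0.99)\,N = 0.01 \times 10{,}000 = 100
```

At N = 100,000 uses, the same rate gives 1,000 misplaced tools, each a potential foreign object in an aircraft.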
Figure 1.1.7A (cause and effect diagram for tool control); surviving category labels: Human, Methods,
Materials, Machinery, People, Procedures.
Using the positive diagram as a guide, the supervisor and working group apply all possible and practical
options developed from it.
• They can be used by nearly everyone in the organization, though some may require either training or
professional facilitation.
• Each tool provides a capability not fully realized in any of the primary tools.
• They use the tools of the less formal safety program to support the ORM process.
• They are well supported with forms, job aids, and models.
• Their effectiveness has been proven.
In an organization with a mature ORM process, all personnel will be aware of the existence of these
specialty tools and capable of recognizing the need for their application. While not everyone will be
comfortable using every procedure, a number of people within the organization will have experience
applying one or another of them.
PURPOSE: The special role of the HAZOP is hazard analysis of completely new operations. In these
situations, traditional intuitive and experiential hazard identification procedures are especially weak. This
lack of experience hobbles tools such as the "What If" and Scenario Process tools, which rely heavily on
experienced operational personnel. The HAZOP deliberately maximizes structure and minimizes the need
for experience to increase its usefulness in these situations.
APPLICATION: The HAZOP should be considered when a completely new process or procedure is
going to be undertaken. The issue should be one where there is significant risk because the HAZOP does
demand significant expenditure of effort and may not be cost effective if used against low risk issues. The
HAZOP is also useful when an operator or leader senses that “something is wrong” but cannot
identify it. The HAZOP will dig very deeply into the operation to identify what that “something” is.
METHOD: The HAZOP is the most highly structured of the hazard identification procedures. It uses a
standard set of guide terms (Figure 1.2.1A), which are then linked in every possible way with a tailored set of
process terms (for example “flow”). The process terms are developed directly from the actual process or
from the Operations Analysis. The two words together, for example “no” (a guideword) and “flow” (a
process term) will describe a deviation. These are then evaluated to see if a meaningful hazard is
indicated. If so, the hazard is entered in the hazard inventory for further evaluation. Because of its rigid
process, the HAZOP is especially suitable for one-person hazard identification efforts.
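The pairing of guidewords with process terms is mechanical, which is part of why the HAZOP suits one-person efforts. The Python sketch below generates every guideword/process-term deviation for the analyst to screen. Because the Figure 1.2.1A list is not reproduced here, the guidewords used are an assumed, commonly cited core set (NO, MORE, LESS, AS WELL AS, PART OF, REVERSE, OTHER THAN), and the process terms are hypothetical.

```python
from itertools import product

# Commonly cited HAZOP guidewords (assumed; Figure 1.2.1A gives the handbook's set).
GUIDEWORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]

# Process terms tailored from the actual process or the Operations Analysis (hypothetical).
PROCESS_TERMS = ["flow", "pressure", "temperature"]


def deviations(guidewords, terms):
    """Link each guideword with each process term, e.g. 'NO flow'."""
    return [f"{g} {t}" for g, t in product(guidewords, terms)]


candidates = deviations(GUIDEWORDS, PROCESS_TERMS)
print(len(candidates), candidates[:3])
```

Each candidate deviation is then judged by the analyst; only those indicating a meaningful hazard go into the hazard inventory.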
Figure 1.2.1A Standard HAZOP Guidewords
RESOURCES: There are few resources available to assist with HAZOP; none are really needed.
COMMENTS: The HAZOP is highly structured, and often time-consuming. Nevertheless, in its special
role, this tool works very effectively. OSHA selected it for inclusion in the set of six mandated procedures
of the OSHA process safety standard.
PURPOSE: The map analysis is designed to use terrain maps and other system models and schematics to
identify both things at risk and the sources of hazards. Properly applied, the tool will reveal the following:
• Task elements at risk
• The sources of risk
• The extent of the risk (proximity)
• Potential barriers between hazard sources and operational assets
APPLICATION: The Mapping Tool can be used in a variety of situations. The explosive quantity-
distance criteria are a classic example of map analysis. The location of the flammable storage is plotted
and then the distance to various vulnerable locations (inhabited buildings, highways, etc.) is determined.
The same principles can be extended to any facility. We can use a diagram of a maintenance shop to note
the location of hazards such as gases, pressure vessels, flammables, etc. Key assets can also be plotted.
Then hazardous interactions are noted and the layout of the facility can be optimized in terms of risk
reduction.
METHOD: The Mapping Tool requires some creativity to realize its full potential. The starting point is
a map, facility layout, or equipment schematic. The locations of hazard sources are noted. The easiest
way to detect these sources is to locate energy sources, since all hazards involve the unwanted release of
energy. Figure 1.2.2A lists the kinds of energy to look for. Mark the locations of these sources on the map
or diagram. Then, keeping the operation in mind, locate the personnel, equipment, and facilities that the
various potentially hazardous energy sources could impact. Note these potentially hazardous links and
enter them in the hazard inventory for risk management.
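Once energy sources and assets are plotted, the proximity check can be automated. The Python sketch below uses hypothetical facility-layout coordinates in meters and an assumed single 10 m radius of concern for every source; real quantity-distance or clearance criteria vary by energy type and amount and would replace that radius.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position on the facility layout, meters

energy_sources: Dict[str, Point] = {        # hypothetical
    "pressure vessel (pressure)": (5.0, 5.0),
    "paint locker (chemical)": (30.0, 8.0),
}
assets: Dict[str, Point] = {                # hypothetical
    "break room (personnel)": (12.0, 5.0),
    "parts crib (equipment)": (45.0, 40.0),
}


def hazardous_links(sources: Dict[str, Point], targets: Dict[str, Point],
                    radius_m: float = 10.0) -> List[Tuple[str, str, float]]:
    """Return (source, asset, distance) for every asset within radius_m of a source."""
    links = []
    for s_name, s_pos in sources.items():
        for a_name, a_pos in targets.items():
            d = math.dist(s_pos, a_pos)
            if d <= radius_m:
                links.append((s_name, a_name, round(d, 1)))
    return links


for link in hazardous_links(energy_sources, assets):
    print(link)
```

Each flagged link is a candidate for the hazard inventory, or for relocating the asset or adding a barrier.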
Figure 1.2.2A Kinds of energy to look for:
• Electrical
• Kinetic (moving mass, e.g., a vehicle, a machine part, a bullet)
• Potential (non-moving mass, e.g., a heavy object suspended overhead)
• Chemical (e.g., explosives, corrosive materials)
• Noise and vibration
• Thermal (heat)
• Radiation (non-ionizing, e.g., microwave; ionizing, e.g., nuclear radiation, x-rays)
• Pressure (air, hydraulic, water)
RESOURCES: Maps can convey a great deal of information, but cannot replace the value of an on-site
assessment. Similarly, when working with an equipment schematic or a facility layout, there is no
substitute for an on-site inspection of the equipment or survey of the facility.
COMMENTS: The map analysis is valuable in itself, but it is also excellent input for many other tools
such as the Interface Analysis, Energy Trace and Barrier Analysis, and Change Analysis.
EXAMPLE: The following example (Figure 1.2.2B) illustrates the use of a facility schematic, focusing on its
energy sources, as might be done in support of an Energy Trace and Barrier Analysis.
SITUATION: A team has been assigned the task of renovating an older facility
for use as a museum for historical aviation memorabilia. They evaluate the facility layout
(schematic below). By evaluating the potential energy sources presented in this
schematic, it is possible to identify hazards that may be created by the operations to be conducted.
Figure 1.2.2B Facility schematic (labels: areas with old gas lines for medical use; areas of former medical use; pneumatic lines for old mail distribution)
PURPOSE: The Interface Analysis is intended to uncover the hazardous linkages or interfaces between
seemingly unrelated activities. For example, suppose we plan to build a new facility. What hazards may be
created for other operations during construction and after the facility is operational? The Interface
Analysis reveals these hazards by focusing on energy exchanges. By looking at these potential energy
transfers between two different activities, we can often detect hazards that are difficult to detect in any
other way.
APPLICATION: An Interface Analysis should be conducted any time a new activity is being introduced
and there is any chance at all that unfavorable interaction could occur. A good cue to the need for an
Interface Analysis is the use of either the Change Analysis (indicating that something new is being introduced) or
the map analysis (indicating possible interactions).
METHOD: The Interface Analysis is normally based on an outline such as the one illustrated at Figure
1.2.3A. The outline provides a list of potential energy types and guides the consideration of the potential
interactions. A determination is made whether a particular type of energy is present and then whether
there is potential for that form of energy to adversely affect other activities. As in all aspects of hazard
identification, the creation of a good Operations Analysis is vital.
Figure 1.2.3A The Interface Analysis Worksheet
Energy Element
Kinetic (objects in motion)
Electromagnetic (microwave, radio, laser)
Radiation (radioactive, x-ray)
Chemical
Other
Personnel Element: Personnel moving from one area to another
Equipment Element: Machines and material moving from one area to another
Supply/materiel Element:
Intentional movement from one area to another
Unintentional movement from one area to another
Product Element: Movement of product from one area to another
Information Element: Flow of information from one area to another, or interference (e.g., jamming)
Bio-material Element
Infectious materials (virus, bacteria, etc.)
Wildlife
Odors
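The worksheet can be read as a cross-check between two activities: for each element, note whether one activity generates it and whether the other could be adversely affected by it, in both directions. The Python sketch below shows that bookkeeping. The activity names and element sets are hypothetical and only loosely echo the worksheet headings above.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    present: set[str] = field(default_factory=set)     # elements this activity generates
    vulnerable: set[str] = field(default_factory=set)  # elements that could adversely affect it

def interfaces(a: Activity, b: Activity) -> list[tuple[str, str, str]]:
    """List (from, to, element) for every element one activity generates that the other is vulnerable to."""
    found = []
    for src, dst in ((a, b), (b, a)):  # check both directions
        for element in sorted(src.present & dst.vulnerable):
            found.append((src.name, dst.name, element))
    return found

# Hypothetical activities for a construction project next to ongoing operations.
construction = Activity("construction site",
                        present={"kinetic", "personnel movement", "hazmat"},
                        vulnerable={"personnel movement"})
base = Activity("base operations",
                present={"personnel movement", "electromagnetic"},
                vulnerable={"kinetic", "hazmat", "information"})

for link in interfaces(construction, base):
    print(link)
```

Each printed link is a candidate hazard to carry into the hazard inventory. The check is only as good as the element lists, which is why participation by people from both activities matters.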
RESOURCES: Interface Analyses are best accomplished when personnel from all of the involved
activities participate, so that hazards and interfaces in both directions can be effectively and
knowledgeably addressed. A safety office representative can also be useful in advising on the types and
characteristics of energy transfers that are possible.
COMMENTS: The lessons of the past indicate that we should give serious attention to use of the
Interface Analysis. Nearly anyone who has been involved in operations for any length of time can relate
stories of overlooked interfaces that have had serious adverse consequences.
Energy Interface
Movement of heavy construction equipment
Movement of heavy building supplies
Movement of heavy equipment for repair
Possible hazmat storage/use at the facility
Personnel Interface
Movement of construction personnel (vehicle or pedestrian) through base area
Movement of repair facility personnel through base area
Possible movement of base personnel (vehicular or pedestrian) near or through the facility
Equipment Interface: Movement of equipment as indicated above
Supply Interface
Possible movement of hazmat through base area
Possible movement of fuels and gases
Supply flow for maintenance area through base area
Product Interface
Movement of equipment for repair by tow truck or heavy equipment transport through the base area
Information Interface
Damage to buried or overhead wires during construction or movement of equipment
Possible electromagnetic interference due to maintenance testing, arcing, etc.
Biomaterial Interface: None
PURPOSE: Most organizations have accumulated extensive, detailed databases that are gold mines of
risk data. The purpose of the analysis is to apply this data to the prevention of future accidents or
incidents.
APPLICATION: Every organization should complete an operation incident analysis annually. The
objective is to update the understanding of current trends and causal factors. The analysis should be
completed for each organizational component that is likely to have unique factors.
METHOD: The analysis can be approached in many ways. The process generally builds a database of
the factors listed below, which then serves as the basis for identifying the risk drivers.
Typical factors to examine include the following:
RESOURCES: The analysis relies upon a relatively complete and accurate database. The FAA's system
safety office (ASY) may have the needed data. That office can also provide assistance in the analysis
process. System Safety personnel may have already completed analyses of similar activities or may be
able to suggest the most productive areas for initial analysis.
COMMENTS: The data in these databases has been acquired the hard way: through the painful and costly
mistakes of hundreds of individuals. By taking full advantage of this information, the analysis process can
be more realistic, efficient, and thorough, and can thereby prevent the same accidents and incidents from
occurring over and over again.
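Once incident records are coded by causal factor, identifying the risk drivers can begin with a simple frequency ranking, run separately for each organizational component as the APPLICATION paragraph suggests. The Python sketch below uses invented records and field names. A real analysis would also weigh severity and exposure, not just counts.

```python
from collections import Counter

# Hypothetical incident records; each lists the causal factors coded for that event.
incidents = [
    {"id": 1, "unit": "maintenance", "factors": ["fatigue", "inadequate procedure"]},
    {"id": 2, "unit": "maintenance", "factors": ["fatigue", "tool defect"]},
    {"id": 3, "unit": "operations", "factors": ["communication", "fatigue"]},
    {"id": 4, "unit": "operations", "factors": ["communication"]},
]

def risk_drivers(records, unit=None, top=3):
    """Rank causal factors by the number of incidents citing them, optionally for one unit."""
    counts = Counter()
    for rec in records:
        if unit is None or rec["unit"] == unit:
            # dict.fromkeys removes duplicates, so a factor counts once per incident
            counts.update(list(dict.fromkeys(rec["factors"])))
    return counts.most_common(top)

print(risk_drivers(incidents, top=2))           # [('fatigue', 3), ('communication', 2)]
print(risk_drivers(incidents, unit="operations"))
```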
PURPOSE: Often the most knowledgeable personnel in the area of risk are those who operate the
system. They see the problems and often think about potential solutions. The purpose of the Interview
Tool is to capture the experience of these personnel in ways that are efficient and positive for them.
Properly implemented, the Interview Tool can be among the most valuable hazard identification tools.
APPLICATION: Every organization can use the Interview Tool in one form or another.
METHOD: The Interview Tool’s great strength is versatility. Figure 1.2.5A illustrates the many options
available to collect interview data. Key to all of these is to create a situation in which interviewees feel
free to honestly report what they know, without fear of any adverse consequences. This means absolute
confidentiality must be assured, for example by never linking names to the data.
RESOURCES: It is possible to operate the interview process facility-wide with the data being supplied
to individual units. Hazard interviews can also be integrated into other interview activities. For example,
counseling sessions could include a hazard interview segment. In these ways, the expertise and resource
demands of the Interview Tool can be minimized.
COMMENTS: The key source of risk is human error. Of all the hazard identification tools, the Interview Tool is potentially
the most effective at capturing human error data.
1. Describe below incidents, near misses or close calls that you have experienced or seen since
you have been in this organization. State the location and nature (i.e. what happened and why)
of the incident. If you can’t think of an incident, then describe two hazards you have observed.
Personnel: _________________________________________________________________
Incident 1__________________________________________________________________
Incident 2__________________________________________________________________
Supervisors: _______________________________________________________________
Incident 1__________________________________________________________________
Incident 2__________________________________________________________________
Incident 1__________________________________________________________________
Incident 2__________________________________________________________________
PURPOSE: Inspections have two primary purposes. (1) To detect hazards. Inspections
accomplish this through the direct observation of operations. The process is aided by the existence of
detailed standards against which operations can be compared. The OSHA standards and those of various national
standards organizations provide good examples. (2) To evaluate the degree of compliance with
established risk controls. When inspections are targeted at management and safety management
processes, they are usually called surveys. These surveys assess the effectiveness of management
procedures by evaluating status against some survey criteria or standard. Inspections are also important
as accountability tools and can be turned into important training opportunities.
APPLICATION: Inspections and surveys are used in the risk management process in much the same
manner as in traditional safety programs. Where the traditional approach may require that all facilities be
inspected on the same frequency schedule, the ORM concept might dictate that high-risk activities be
inspected ten times or more frequently than lower risk operations, and that some of the lowest risk
operations be inspected once every five years or so. The degree of risk drives the frequency and depth of
the inspections and surveys.
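One way to make "the degree of risk drives the frequency" concrete is a lookup from assessed risk level to inspection interval. The intervals in this Python sketch are hypothetical. They were chosen only to reproduce the ratios in the text: high-risk activities inspected ten times as often as low-risk ones, and the lowest-risk operations about every five years. Each organization would set its own values.

```python
# Hypothetical intervals in months; only the ratios come from the text.
INSPECTION_INTERVAL_MONTHS = {
    "extremely high": 1,
    "high": 3,
    "medium": 12,
    "low": 30,
    "lowest": 60,  # about once every five years
}

def inspections_per_year(risk_level: str) -> float:
    """Inspections per year implied by the assumed interval for a risk level."""
    return 12 / INSPECTION_INTERVAL_MONTHS[risk_level]

def schedule(activities: dict[str, str]) -> list[tuple[str, int]]:
    """Return (activity, interval in months), most frequently inspected first."""
    rows = [(name, INSPECTION_INTERVAL_MONTHS[level]) for name, level in activities.items()]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical activities and their assessed risk levels.
activities = {"fuel farm": "high", "admin office": "lowest", "motor pool": "medium"}
print(schedule(activities))
print(INSPECTION_INTERVAL_MONTHS["low"] / INSPECTION_INTERVAL_MONTHS["high"])  # 10.0
```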
METHOD: There are many methods of conducting inspections. From a risk management point of view
the key is focusing upon what will be inspected. The first step in effective inspections is the selection of
inspection criteria and the development of a checklist or protocol. This must be risk-based. Commercial
protocols are available that contain criteria validated to be connected with safety excellence.
Alternatively, excellent criteria can be developed using incident databases and the results of other hazard
identification tools such as the Operations Analysis and Logic Diagrams. Some of these criteria sets have been
computerized to facilitate entry and processing of data. Once criteria are developed, a schedule is created
and inspections are begun. The inspection itself must be as positive an experience as possible for the
people whose activity is being inspected. Personnel performing inspections should be carefully trained,
not only in the technical processes involved, but also in human relations. During inspections, the ORM
concept encourages another departure from traditional inspection practices: recording every observation
of key behaviors or conditions, safe as well as unsafe. This makes it possible to
evaluate the trend in organization performance by calculating the percentage of unsafe (non-standard)
versus safe (meet or exceed standard) observations. Once the observations are made, the data must be
carefully entered in the overall hazard inventory database. Once in the database, the data can be analyzed
as part of the overall body of data or as a mini-database composed of inspection findings only.
RESOURCES: There are many inspection criteria, checklists and related job aids available
commercially. Many have been tailored for specific types of organizations and activities. The System
Safety Office can be a valuable resource in the development of criteria and can provide technical support
in the form of interpretations, procedural guidance, and correlation of data.
COMMENTS: Inspections and surveys have long track records of success in detecting hazards and
reducing risk. However, they have been criticized as being inconsistent with modern management practice
because they are a form of “downstream” quality control. By the time a hazard is detected by an
inspection, it may already have caused loss. The ORM approach to inspections emphasizes focus on the
higher risks within the organization and emphasizes the use of management and safety program surveys
that detect the underlying causes of hazards, rather than the hazards themselves.
EXAMPLES: Conventional inspections normally involve seeking and recording unsafe acts or
conditions. The number of these may reflect either the number of unsafe acts or conditions occurring in
the organization or the extent of the effort expended to find hazards. Thus, conventional inspections are
not a reliable indicator of the extent of risk. To change the nature of the process, it is often only necessary
to record the total number of observations made of key behaviors, then determine the number of unsafe
behaviors. This yields a rate of “unsafeness” that is independent of the number of observations made.
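The arithmetic behind this rate is simple, but it is worth seeing because raw counts and rates can point in opposite directions. In the hypothetical Python sketch below, the second inspection records more unsafe behaviors than the first only because it made five times as many observations. The unsafe rate shows that performance actually improved.

```python
def unsafe_rate(observations: list[bool]) -> float:
    """Fraction of observations recorded as unsafe (True = unsafe, False = safe)."""
    if not observations:
        raise ValueError("no observations recorded")
    return sum(observations) / len(observations)

# Hypothetical observation logs for two inspections.
first = [True] * 8 + [False] * 32      # 8 unsafe of 40 observations
second = [True] * 20 + [False] * 180   # 20 unsafe of 200 observations

print(sum(first), sum(second))                  # 8 20: raw counts suggest things got worse
print(unsafe_rate(first), unsafe_rate(second))  # 0.2 0.1: the rate shows improvement
```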
ALTERNATIVE NAMES: The task analysis, job safety analysis, JHA, JSA
PURPOSE: The purpose of the Job Hazard Analysis (JHA) is to examine in detail the safety
considerations of a single job. A variation of the JHA called a task analysis focuses on a single task, i.e.,
some smaller segment of a “job.”
APPLICATION: Some organizations have established the goal of completing a JHA on every job in the
organization. If this can be accomplished cost effectively, it is worthwhile. Certainly, the higher risk jobs
in an organization warrant application of the JHA procedure. Within the risk management approach, it is
important that such a plan be carried out beginning with the most significant risk areas.
METHOD: The JHA is best accomplished using an outline similar to the one illustrated at Figure 1.2.7A. As shown
in the illustration, the job is broken down into its individual steps. Jobs that involve many quite different
tasks should be handled by analyzing each major task separately. The illustration considers risks both to
the workers involved and to the system, as well as risk controls for both. Tools such as the Scenario
and "What If" tools can contribute to the identification of potential hazards. There are two alternative
ways to accomplish the JHA process. A safety professional can complete the process by asking questions
of the workers and supervisors involved. Alternatively, supervisors could be trained in the JHA process
and directed to analyze the jobs they supervise.
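The worksheet structure described above maps naturally onto a small record type: the job broken into steps, hazards to workers and to the system, and controls for both. The Python sketch below is a hypothetical representation, not a prescribed FAA format, and the sample job and its entries are invented for illustration. The helper flags steps that list a hazard but no control yet, one simple completeness check an analyst might run.

```python
from dataclasses import dataclass, field

@dataclass
class JobStep:
    description: str
    worker_hazards: list[str] = field(default_factory=list)  # risks to the people doing the step
    system_hazards: list[str] = field(default_factory=list)  # risks to equipment or the system
    controls: list[str] = field(default_factory=list)        # risk controls for either kind

@dataclass
class JobHazardAnalysis:
    job: str
    steps: list[JobStep] = field(default_factory=list)

    def uncontrolled_steps(self) -> list[str]:
        """Steps that list at least one hazard but no risk control yet."""
        return [s.description for s in self.steps
                if (s.worker_hazards or s.system_hazards) and not s.controls]

# Hypothetical job used only to exercise the structure.
jha = JobHazardAnalysis("Replace hydraulic pump", [
    JobStep("Depressurize system", worker_hazards=["residual pressure"],
            controls=["bleed pressure and confirm gauge reads zero"]),
    JobStep("Remove old pump", worker_hazards=["heavy lift"],
            system_hazards=["fluid contamination"]),
    JobStep("Install new pump"),
])
print(jha.uncontrolled_steps())  # ['Remove old pump']
```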
RESOURCES: The System Safety Office has personnel trained in detail in the JHA process who can
serve as consultants, and may have videos that walk a person through the process.
COMMENTS: The JHA is risk management. The concept of completing in-depth hazard assessments of
all jobs involving significant risk with the active participation of the personnel doing the work is an ideal
model of ORM in action.
PURPOSE: The Opportunity Assessment is intended to identify opportunities to expand the capabilities
of the organization and/or to significantly reduce the operational cost of risk control procedures. Either of
these possibilities means expanded capabilities.
METHOD: The Opportunity Assessment involves five key steps as outlined at Figure 1.2.10A. In Step
1, operational areas that would benefit substantially from expanded capabilities are identified and
prioritized. Additionally, areas where risk controls are consuming extensive resources or are otherwise
constraining operation capabilities are listed and prioritized. Step 2 involves the analysis of the specific
risk-related barriers that are limiting the desired expanded performance or causing the significant expense.
This is a critical step. Only by identifying the risk issues precisely can focused effort be brought to bear
to overcome them. Step 3 attacks the barriers by using the risk management process. This normally
involves reassessment of the hazards, application of improved risk controls, improved implementation of
existing controls, or a combination of these options. Step 4 is used when available risk management
procedures don't appear to offer any breakthrough possibilities. In these cases the organization must seek
out new ORM tools using benchmarking procedures or, if necessary, develop new procedures. Step 5 involves
exploiting any breakthroughs achieved by pushing the operational limits or cost savings until a new
barrier is reached. The cycle then repeats and a process of continuous improvement begins.
As might be expected, these tools are complex and require significant training to use. Full proficiency
also requires experience in using them. They are best reserved for use by loss control professionals.
Those with an engineering, scientific, or other technical background are certainly capable of using these
tools with a little read-in. Even though professionals use the tools, much of the data that must be fed into
the procedures must come from operators.
In an organization with a mature ORM culture, all personnel in the organization will be aware that higher
risk justifies more extensive hazard identification. They will feel comfortable calling for help from loss
control professionals, knowing that these individuals have the advanced tools needed to cope with the
most serious situations. These advanced tools will play a key role in the mature ORM culture in helping
the organization reach its hazard identification goal: No significant hazard undetected.
PURPOSE: The Energy Trace and Barrier Analysis (ETBA) is a procedure intended to detect hazards
by focusing in detail on the presence of energy in a system and the barriers for controlling that energy. It
is conceptually similar to the Interface Analysis in its focus on energy forms, but is considerably more
thorough and systematic.
APPLICATION: The ETBA is intended for use by loss control and system safety professionals and is targeted
against higher risk operations, especially those involving large amounts of energy or a wide variety of
energy types. The method is used extensively in the acquisition of new systems and other complex
systems.
METHOD: Step 1 is the identification of the types of energy found in the system. It often requires considerable
expertise to detect the presence of the types of energy listed at Figure 1.3.1B.
Step 2 is the trace step. Once identified as present, the point of origin of a particular type of energy must
be determined and then the flow of that energy through the system must be traced.
In Step 3 the barriers to the unwanted release of that energy must be analyzed. For example, electrical
energy is usually moved in wires with an insulated covering.
In Step 4 the risk of barrier failure and the unwanted release of the energy are assessed. Finally, in Step 5,
risk control options are considered and selected.
Step 4. Determine the risk (the potential for hazardous energy to escape control and damage
something significant)
Electrical
Kinetic (moving mass e.g. a vehicle, a machine part, a bullet)
Potential (not moving mass e.g. a heavy object suspended overhead)
Chemical (e.g. explosives, corrosive materials)
Noise and Vibration
Thermal (heat)
Radiation (Non-ionizing e.g. microwave, and ionizing e.g. nuclear radiation, x-rays)
Pressure (air, hydraulic, water)
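The results of Steps 1 through 3 can be recorded as one entry per traced energy flow. The Python sketch below is a hypothetical bookkeeping aid, not the ETBA itself. It applies a crude screening rule, flagging any flow protected by fewer than two barriers (an assumed threshold), to decide which flows deserve the full Step 4 risk assessment first. A barrier count says nothing about barrier reliability or severity, so it cannot replace that assessment.

```python
from dataclasses import dataclass

@dataclass
class EnergyFlow:
    energy_type: str     # Step 1: type of energy identified
    origin: str          # Step 2: point of origin
    path: list[str]      # Step 2: route traced through the system
    barriers: list[str]  # Step 3: barriers against unwanted release

def needs_review(flows: list[EnergyFlow], min_barriers: int = 2) -> list[tuple[str, str, int]]:
    """Flag flows with fewer than min_barriers barriers (assumed screening rule)."""
    return [(f.energy_type, f.origin, len(f.barriers))
            for f in flows if len(f.barriers) < min_barriers]

# Hypothetical maintenance-shop flows.
flows = [
    EnergyFlow("electrical", "wall receptacle", ["cord", "drill motor"], ["cord insulation"]),
    EnergyFlow("pressure", "air compressor", ["hose", "impact wrench"], ["relief valve", "rated hose"]),
    EnergyFlow("potential", "overhead hoist", ["chain", "suspended engine"], []),
]
for flagged in needs_review(flows):
    print(flagged)
```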
RESOURCES: This tool requires sophisticated understanding of the technical characteristics of systems
and of the various energy types and barriers. Availability of a safety professional, especially a safety
engineer or other professional engineer, is important.
COMMENTS: Most accidents involve the unwanted release of one kind of energy or another. This fact
makes the ETBA a powerful hazard identification tool. When the risk stakes are high and the system is
complex, the ETBA is a must-have.
Scenario: The supervisor of a maintenance facility has just investigated a serious incident
involving one of his personnel who received a serious shock while using a portable power drill in
the maintenance area. The tool involved used a standard three-prong plug. Investigation revealed
that the tool and the receptacle were both functioning properly. The individual was shocked when
he was holding the tool and made contact with a piece of metal electrical conduit (the one his drill
was plugged into) that had become energized as a result of an internal fault. As a result, the
current flowed through the individual to the tool and through the grounded tool to ground, resulting
in the severe shock. The supervisor decides to fully assess the control of electrical energy in this
area.
Option 1. Three-prong tool. Electrical energy flows from the source through an insulated
wire to the tool and its single-insulated electric motor. In the event of an internal fault, the flow is
from the case of the tool through the ground wire and the grounded third prong to ground,
via a properly grounded receptacle.
Hazards: Receptacle not properly grounded, third prong removed, the person provides a lower-resistance
path, break in any of the ground paths (case, cord, plug, and receptacle). These hazards are
serious in terms of how often they are encountered in the work environment and might be expected to
be present in 10% or more of cases.
Option 2. Double-insulated tool. The tool is not grounded. Protection is provided by double
insulation of the complete flow of electrical energy at all points in the tool. In the event of an internal
fault, there are two layers of insulation between the fault and the person, preventing current from
shorting through the user.
Hazards: If the double layers of insulation are damaged as a result of extended use, rough
handling, or repair/maintenance activity, the double insulation barrier can be compromised. In the
absence of a fully effective tool inspection and replacement program, such damage is not unusual.
Option 3. Ground Fault Circuit Interrupters (GFCIs). Either of the above types of tools is used
(double insulated is preferred). Electrical energy flows as described above in both the normal and
fault situations. However, in the event of a fault (or any other cause of a current imbalance between the
conductors of a circuit), the imbalance is detected almost instantly and the circuit is opened, preventing the flow
of dangerous amounts of current. Because no dangerous amount of current can flow, the individual
using the tool is in no danger of shock. Circuit interrupters are reliable at a level of 1 in 10,000 or
higher, and when they do fail, most failures are fail-safe. Ground fault circuit
interrupters are inexpensive to purchase and relatively easy to install. In this case, the best
option is very likely to be the use of the circuit interrupter in connection with either Option 1 or 2,
with Option 2 preferred. This combination for all practical purposes eliminates the possibility of
electric shock and injury/death as a result of using portable power tools.
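Two ideas in this scenario can be made concrete. First, a ground fault circuit interrupter compares the current leaving on the line conductor with the current returning on the neutral, and any difference must be leaking somewhere, possibly through a person. Second, layering barriers multiplies their failure probabilities, provided the failures are independent. The Python sketch below assumes a trip threshold of 5 mA (US Class A personnel-protection devices are generally specified to trip in roughly the 4 to 6 mA range) and takes the 10% and 1 in 10,000 figures from the text. Independence is an idealization, because a common cause could defeat both barriers at once.

```python
# Assumed trip threshold in milliamperes (US Class A GFCIs trip at roughly 4 to 6 mA).
TRIP_THRESHOLD_MA = 5.0

def gfci_trips(line_ma: float, neutral_ma: float,
               threshold_ma: float = TRIP_THRESHOLD_MA) -> bool:
    """Trip when current leaving on the line conductor does not all return on the neutral."""
    return abs(line_ma - neutral_ma) >= threshold_ma

def combined_failure(*barrier_failure_probs: float) -> float:
    """Probability that every barrier fails, assuming the failures are independent."""
    result = 1.0
    for p in barrier_failure_probs:
        result *= p
    return result

print(gfci_trips(8000.0, 8000.0))  # normal operation: False
print(gfci_trips(8000.0, 7950.0))  # 50 mA leaking to ground: True

# Ground path defective in about 10% of cases (Option 1) and a GFCI
# failure rate of 1 in 10,000, both taken from the text.
p_shock_path = combined_failure(0.10, 1e-4)
print(f"{p_shock_path:.0e}")       # about 1 in 100,000
```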
1.3.2 THE FAULT TREE ANALYSIS
FORMAL NAME: The Fault Tree Analysis
PURPOSE: The Fault Tree Analysis (FTA) is a hazard identification tool based on the negative type
Logic Diagram. The FTA adds several dimensions to the basic logic tree. The most important of these
additions are the use of symbols to add information to the trees and the possibility of adding quantitative
risk data to the diagrams. With these additions, the FTA adds substantial hazard identification value to
the basic Logic Diagram previously discussed.
APPLICATION: Because of its relative complexity and detail, it is normally not cost effective to use the
FTA against risks assessed below the level of extremely high or high. The method is used extensively in
the acquisition of new systems and other complex systems where, due to the complexity and criticality of
the system, the tool is a must.
METHOD: The FTA is constructed exactly like a negative Logic Diagram except that the symbols
depicted in Figure 1.3.2A are used.
A basic event. An event, usually a malfunction, for which further causes are not normally sought.
A normal event. An event in an operational sequence that is within expected performance standards.
An “AND” gate. Requires all of the events connected below it to occur before the event connected above it can occur.
An “OR” gate. Any one of the events connected below it can independently cause the event placed above the OR gate.
An undeveloped event. An event not developed further because of lack of information or because the event lacks significance.
Transfer symbols. These symbols transfer the user to another part of the diagram. They are used to
eliminate the need to repeat identical analyses that have been completed in connection with another part
of the fault tree.
RESOURCES: The System Safety Office is the best source of information regarding Fault Tree
Analysis. Like the other advanced tools, the FTA will involve the consultation of a safety professional or
engineer trained in the use of the tool. If the probabilistic aspects are added, it will also require a database
capable of supplying the detailed data needed.
COMMENTS: The FTA is one of the few hazard identification procedures that will support
quantification when the necessary data resources are available.
EXAMPLE: A brief example of the FTA is provided at Figure 1.3.2B. It illustrates how an event may be
traced to specific causes that can be very precisely identified at the lowest levels.
Fire Occurs in Storeroom
  AND gate (condition: Airflow < Critical Value)
    Combustibles Stored in Storeroom (OR gate)
      Combustibles Stored in Storeroom
      Combustibles Leak into Storeroom
      Stock Material Degrades to Combustible State
    Ignition Source in Storeroom (OR gate)
      Electrical Spark Occurs
      Radiant Thermal Energy Raises ...
      Direct Thermal Energy Present
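Because the FTA supports quantification, a tree like the storeroom example can be evaluated from the bottom up once basic-event probabilities are available. The Python sketch below assumes the basic events are statistically independent: an OR gate gives one minus the product of the complements, and an AND gate gives the product. It also treats the "Airflow < Critical Value" condition as one more input to the AND gate, which is a modeling assumption. All probability values are hypothetical placeholders, not data.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    gate: str = "BASIC"  # "BASIC", "AND", or "OR"
    p: float = 0.0       # probability, used only for basic events
    inputs: list["Event"] = field(default_factory=list)

def probability(event: Event) -> float:
    """Evaluate the tree bottom-up, assuming independent basic events."""
    if event.gate == "BASIC":
        return event.p
    probs = [probability(child) for child in event.inputs]
    if event.gate == "AND":
        result = 1.0
        for q in probs:
            result *= q          # all inputs must occur
        return result
    if event.gate == "OR":
        none_occur = 1.0
        for q in probs:
            none_occur *= 1.0 - q
        return 1.0 - none_occur  # at least one input occurs
    raise ValueError(f"unknown gate {event.gate!r}")

# Hypothetical probabilities for the storeroom example.
fuel = Event("Combustibles in storeroom", "OR", inputs=[
    Event("Combustibles stored in storeroom", p=0.05),
    Event("Combustibles leak into storeroom", p=0.01),
    Event("Stock material degrades to combustible state", p=0.02),
])
ignition = Event("Ignition source in storeroom", "OR", inputs=[
    Event("Electrical spark occurs", p=0.01),
    Event("Radiant thermal energy", p=0.005),
    Event("Direct thermal energy present", p=0.002),
])
top = Event("Fire occurs in storeroom", "AND", inputs=[
    fuel, ignition, Event("Airflow < critical value", p=0.5),
])
print(f"{probability(top):.2e}")
```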
PURPOSE: The Failure Modes and Effects Analysis (FMEA) is designed to evaluate the impact of
the failure of various system components. A brief example of FMEA illustrating this purpose is the
analysis of the impact of the failure of the communications component (radio, landline, computer, etc.) of
a system on the overall operation. The focus of the FMEA is on how such a failure could occur (failure
mode) and the impact of such a failure (effects).
APPLICATION: The FMEA is generally regarded as a reliability tool but most operational personnel
can use the tool effectively. The FMEA can be thought of as a more detailed “What If” analysis. It is
especially useful in contingency planning, where it is used to evaluate the impact of various possible
failures (contingencies). The FMEA can be used in place of the "What If" analysis when greater detail is
needed or it can be used to examine the impact of hazards developed using the "What If" tool in much
greater detail.
METHOD: The FMEA uses a worksheet similar to the one illustrated at Figure 1.3.3A. As noted on the
sample worksheet, a specific component of the system to be analyzed is identified, and its possible
failure modes are listed. Several components can be analyzed, and each may have several failure modes.
For example, a rotating part might freeze up, explode, break up, slow down, or even
reverse direction. Each of these failure modes may have differing impacts on connected components and
the overall system. The worksheet calls for an assessment of the probability of each identified failure
mode.
System_________________________ Date_______________
Subsystem _____________________ Analyst_____________
RESOURCES: The best source of more detailed information on the FMEA is the System Safety Office.
Situation: The manager of a major facility is concerned about the possible impact of the failure of the
landline communications system that provides the sole communications capability at the site. The
decision is made to do a Failure Modes and Effects Analysis. An extract from the resulting FMEA is
shown below.
Component: Landline wire
Function: Comm
Failure mode and cause: Cut (natural cause, falling tree, etc.)
Failure effect on higher item: Comm system down
Effect on system: Cease fire
Probability: Probable
Corrective action: Clear natural obstacles from around wires
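Kept as structured records, FMEA worksheet rows can be sorted so that the most probable failure modes are reviewed first. The Python sketch below uses qualitative probability levels in the style of MIL-STD-882 (Frequent, Probable, Occasional, Remote, Improbable). That scale, the field names, and the second row are assumptions for illustration; the first row restates the landline extract above.

```python
from dataclasses import dataclass

# Qualitative probability levels, most to least likely (MIL-STD-882 style).
PROBABILITY_RANK = {"Frequent": 1, "Probable": 2, "Occasional": 3, "Remote": 4, "Improbable": 5}

@dataclass
class FmeaRow:
    component: str
    function: str
    failure_mode: str        # failure mode and cause
    effect_higher_item: str
    effect_system: str
    probability: str         # one of the PROBABILITY_RANK keys
    corrective_action: str

def prioritize(rows: list[FmeaRow]) -> list[FmeaRow]:
    """Order rows so the most likely failure modes come first."""
    return sorted(rows, key=lambda row: PROBABILITY_RANK[row.probability])

rows = [
    # Hypothetical second row, added only to show the ordering.
    FmeaRow("Switchboard", "Comm", "Power loss (storm)", "Comm system down",
            "Cease fire", "Remote", "Install backup power"),
    FmeaRow("Landline wire", "Comm", "Cut (natural cause, falling tree, etc.)",
            "Comm system down", "Cease fire", "Probable",
            "Clear natural obstacles from around wires"),
]
for row in prioritize(rows):
    print(row.probability, "-", row.component, "-", row.failure_mode)
```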
ALTERNATIVE NAMES: The timeline tool, the sequential time event plot (STEP)²
PURPOSE: The Multi-linear Events Sequencing Tool (MES) is a specialized hazard identification
procedure designed to detect hazards arising from the time relationship of various operational activities.
The MES detects situations in which either the absolute or relative timing of events may create risk. For
example, an operational planner may have crammed too many events into a single period of time, creating
a task overload problem for the personnel involved. Alternatively, the MES may reveal that two or more
events in an operational plan conflict because a person or piece of equipment is required for both but
obviously cannot be in two places at once. The MES can be used as a hazard identification tool or as an
incident investigation tool.
APPLICATION: The MES is usually considered a loss prevention method, but the MES worksheet
simplifies the process to the point that a motivated individual can effectively use it. The MES should be
used any time that risk levels are significant and when timing and/or time relationships may be a source of
risk. It is an essential tool when the time relationships are relatively complex.
METHOD: The MES uses a worksheet similar to the one illustrated at Figure 4.1. The sample
worksheet displays the timeline of the operation across the top and the “actors” (people or things) down
the left side. The flow of events is displayed on the worksheet, showing the relationship between the
actors on a time basis. Once the operation is displayed on the worksheet, the sources of risk will be
evident as the flow is examined.
2
K. Hendrick and L. Benner, Investigating Accidents with STEP, Marcel Dekker, New York, 1988.
Actors (people or things involved in the process)
RESOURCES: The best source for more detailed information on the MES is the System Safety staff.
As with the other advanced tools, using the MES will normally involve consultation with a safety
professional familiar with its application.
COMMENTS: The MES is unique in its role of examining the time-risk implications of operations.
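The resource conflict described under PURPOSE (one person or piece of equipment required in two places at once) is an interval-overlap check on the worksheet rows. The Python sketch below assumes each planned event is recorded as an actor, a start time, an end time (minutes from the start of the operation), and a description. The plan itself is hypothetical, and the sketch covers only the double-booking case, not task overload.

```python
from collections import defaultdict

# Hypothetical plan rows: (actor, start, end, description); times in minutes.
plan = [
    ("crane", 0, 60, "lift generator"),
    ("crane", 45, 90, "lift antenna mast"),
    ("crew chief", 0, 30, "brief crew"),
    ("crew chief", 30, 60, "supervise lift"),
]

def conflicts(events):
    """Return (actor, first, second) for events that need one actor at overlapping times."""
    by_actor = defaultdict(list)
    for actor, start, end, desc in events:
        by_actor[actor].append((start, end, desc))
    found = []
    for actor, items in by_actor.items():
        items.sort()
        for i, (s1, e1, d1) in enumerate(items):
            for s2, e2, d2 in items[i + 1:]:
                # Intervals [start, end) overlap if each starts before the other ends.
                if s2 < e1 and s1 < e2:
                    found.append((actor, d1, d2))
    return found

for clash in conflicts(plan):
    print(clash)  # ('crane', 'lift generator', 'lift antenna mast')
```

Back-to-back events (the crew chief's briefing ending at minute 30 and the lift starting at minute 30) are deliberately not flagged, because the intervals are treated as half-open.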
PURPOSE: The Management Oversight and Risk Tree (MORT) uses a series of charts developed and
perfected over several years by the Department of Energy in connection with their nuclear safety
programs. Each chart identifies a potential operating or management level hazard that might be present in
an operation. The attention to detail characteristic of MORT is illustrated by the fact that the full MORT
diagram or tree contains more than 10,000 blocks. Even the simplest MORT chart contains about 300
blocks. The full application of MORT is a time-consuming and costly venture. The basic MORT chart
with about 300 blocks can be routinely used as a check on the other hazard identification tools. By
reviewing the major headings of the MORT chart, an analyst will often be reminded of a type of hazard
that was overlooked in the initial analysis. The MORT diagram is also very effective in assuring attention
to the underlying management root causes of hazards.
APPLICATION: Full application of MORT is reserved for the highest risks and most operation-critical
activities because of the time and expense required. MORT generally requires a specially trained loss
control professional to assure proper application.
METHOD: MORT is accomplished using the MORT diagrams, of which there are several levels
available. The most comprehensive, with about 10,000 blocks, fills a book. There is an intermediate
diagram with about 1500 blocks, and a basic diagram with about 300. It is possible to tailor a MORT
diagram by choosing various branches of the tree and using only those segments. The MORT is
essentially a negative tree, so the process begins by placing an undesired loss event at the top of the
diagram used. The user then systematically responds to the issues posed by the diagram. All aspects of
the diagram are considered and the “less than adequate” blocks are highlighted for risk control action.
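The negative-tree evaluation described above can be illustrated with a short Python sketch. The sketch assumes every gate is an OR gate, so a parent block counts as less than adequate (LTA) if any block beneath it is LTA. That is a simplification. The block names are illustrative only and are not taken from the actual MORT chart.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One block in a simplified MORT-style negative tree."""
    name: str
    lta: bool = False  # analyst's judgment on a leaf: "less than adequate"?
    children: list = field(default_factory=list)


def is_lta(block):
    """Treat a parent as LTA if any child is LTA (every gate assumed to be OR)."""
    if not block.children:
        return block.lta
    return any(is_lta(child) for child in block.children)


def highlighted(block, path=()):
    """Return the LTA leaf blocks, each with its path from the top event."""
    path = path + (block.name,)
    if not block.children:
        return [" > ".join(path)] if block.lta else []
    found = []
    for child in block.children:
        found.extend(highlighted(child, path))
    return found


# Illustrative tree: the undesired loss event is placed at the top.
top = Block("Accidental losses", children=[
    Block("Specific control factors", children=[
        Block("Barriers", lta=True),
        Block("Supervision"),
    ]),
    Block("Management system factors", children=[
        Block("Policy"),
        Block("Risk assessment system", lta=True),
    ]),
])

for item in highlighted(top):
    print("LTA:", item)
```

The highlighted leaves are the candidates for risk control action. A faithful implementation would need the real chart's AND and OR gates.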
RESOURCES: The best source of information on MORT is the System Safety Office.
COMMENTS: The MORT diagram is an elaborate negative Logic Diagram. The difference is primarily
that the MORT diagram is already filled out for the user, allowing a person to identify the contributory
factors for a given undesirable event. Since the MORT is very detailed, as mentioned above, a person can
identify basic causes for essentially any type of event.
EXAMPLES: The top blocks of the MORT diagram are displayed at Figure 1.3.5A.
[Figure 1.3.5A: top blocks of the MORT diagram, headed by the top event "Accidental Losses".]
Example. The example below demonstrates the application of the matrix to the risk associated with
moving a heavy piece of machinery.
Risk to be assessed: The risk of the machine falling over and injuring personnel.
Probability assessment: The following paragraphs illustrate the thinking process that might be followed in
developing the probability segment of the risk assessment:
Use previous experience and the database, if available. “We moved a similar machine once before and
although it did not fall over, there were some close calls. This machine is not as easy to secure as that
machine and has a higher center of gravity and poses an even greater chance of falling. The base safety
office indicates that there was an accident about 18 months ago that involved a similar operation. An
individual received a broken leg in that case.”
Use the output of the hazard analysis process. “Our hazard analysis shows that there are several steps in
the machine movement process where the machine is vulnerable to falling. Furthermore, there are several
different types of contributory hazards that could cause the machine to fall. Both these factors increase
the probability of falling.”
Consider expert opinion. “My experienced manager feels that there is a real danger of the machine
falling.”
Consider your own intuition and judgment. “My gut feeling is that there is a real possibility we could lose
control of this machine and topple it. The fact that we rarely move machines quite like this one increases
the probability of trouble.”
Refer to the matrix terms. “Hmmm, the decision seems to be between likely and occasional. I understand
likely to mean that the machine is likely to fall, meaning a pretty high probability. Certainly there is a real
chance it may fall, but if we are careful, there should be no problem. I am going to select Occasional as
the best option from the matrix.”
Severity assessment. The following illustrates the thinking process that might occur in selecting the
severity portion of the risk assessment matrix for the machine falling risk:
Identify likely outcomes. “If the machine falls, it will crush whatever it lands on. Such an injury will
almost certainly be severe. Because of the height of the machine, it can easily fall on a person’s head and
body with almost certain fatal results. There are also a variety of different crushing injuries, especially of
the feet, even if the machine falls only a short distance.”
Identify the most likely outcomes. “Because of the weight of the machine, a severe injury is almost
certain. Because people are fairly agile and a falling machine gives a little warning before it topples,
death is not likely.”
Consider factors other than injuries. “We identified several equipment and facility items at risk. Most of
these we have guarded, but some are still vulnerable. If the machine falls, nobody can do anything to
protect these items. It would take at least a couple of days to get us back in full production.”
Refer to the matrix (see Figure 2.1A). “Let’s see, any injury is likely to be severe, but a fatality is not
very probable. Property damage could be expensive and could cost us a lot of production time.
Considering both factors, I think that critical is the best choice.”
Combine probability and severity in the matrix. The thinking process should be as follows:
“The probability category occasional is in the middle of the matrix (refer to the matrix below). I go down
the occasional column until it meets the critical row coming from the left side. The result is a high rating.
I notice that it is among the lower high ratings, but it is still high.”
[Risk assessment matrix. Probability (columns): Frequent (A), Likely (B), Occasional (C), Seldom (D),
Unlikely (E). Severity (rows): Catastrophic (I), Critical (II), Moderate (III), Negligible (IV).
Risk levels: Extremely High, High, Medium, Low.]
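The lookup performed in the scenario amounts to indexing a table. In this Python sketch, the axes and risk levels come from the matrix above. The risk level assigned to each cell, however, is an assumption patterned on common military operational risk management matrices. The scenario text confirms only the Occasional/Critical = High cell. Substitute the cell values from the organization's own matrix before relying on the sketch.

```python
PROBABILITY = ["Frequent", "Likely", "Occasional", "Seldom", "Unlikely"]   # A..E
SEVERITY = ["Catastrophic", "Critical", "Moderate", "Negligible"]          # I..IV

# ASSUMED cell values (rows = severity I..IV, columns = probability A..E).
MATRIX = [
    ["Extremely High", "Extremely High", "High",   "High",   "Medium"],
    ["Extremely High", "High",           "High",   "Medium", "Low"],
    ["High",           "Medium",         "Medium", "Low",    "Low"],
    ["Medium",         "Low",            "Low",    "Low",    "Low"],
]


def risk_level(probability, severity):
    """Read the risk level where the probability column meets the severity row."""
    return MATRIX[SEVERITY.index(severity)][PROBABILITY.index(probability)]


print(risk_level("Occasional", "Critical"))
```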
Limitations and concerns with the use of the matrix. As you followed the scenario above, you may have
noted that there are some problems involved in using the matrix. These include the following:
Subjectivity. There are at least two dimensions of subjectivity involved in the use of the matrix. The first
is in the interpretation of the matrix categories. Your interpretation of the term “critical” may be quite
different from mine. The second is in the interpretation of the risk. If a few weeks ago I saw a machine
much like the one to be moved fall over and crush a person to death, I might have a greater tendency to
rate both the probability and severity higher than someone who did not have such an experience. If time
and resources permit, averaging the ratings of several personnel can reduce this variation.
Inconsistency. The subjectivity described above naturally leads to some inconsistency. A risk rated very
high in one organization may only have a high rating in another. This becomes a real problem if the two
risks are competing for a limited pot of risk control resources (as they always are). There will be real
motivation to inflate risk assessments to enhance competitiveness for limited resources.
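One way to average several personnel's ratings is shown in the Python sketch below. It treats each category as a position on an ordinal scale, averages the positions, and rounds to the nearest category. Treating the categories as equally spaced is an assumption of this sketch; taking the median rating is a common alternative that avoids it. The function name `consensus` is hypothetical.

```python
PROBABILITY = ["Frequent", "Likely", "Occasional", "Seldom", "Unlikely"]
SEVERITY = ["Catastrophic", "Critical", "Moderate", "Negligible"]


def consensus(ratings, scale):
    """Average raters' category choices by position on the scale, rounding half up."""
    positions = [scale.index(rating) for rating in ratings]
    mean = sum(positions) / len(positions)
    return scale[int(mean + 0.5)]


# Hypothetical ratings of the machine-moving risk by several personnel.
print(consensus(["Likely", "Occasional", "Occasional", "Seldom"], PROBABILITY))
print(consensus(["Critical", "Critical", "Catastrophic"], SEVERITY))
```

Averaging narrows individual variation within one organization. It does not remove the inconsistency between organizations described above.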
Avoiding risk altogether requires canceling or delaying the job or operation, an option that is
rarely exercised because of operational importance. However, it may be possible to avoid specific risks:
risks associated with a night operation may be avoided by planning the operation for daytime; likewise,
thunderstorms can be avoided by changing the route of flight.
Delaying a risk. It may be possible to delay a risk. If there is no time deadline or other operational benefit
to speedy accomplishment of a risky task, then it is often desirable to delay the acceptance of the risk.
During the delay, the situation may change and the requirement to accept the risk may go away. During
the delay, additional risk control options may become available for one reason or another (resources
become available, new technology becomes available, etc.), thereby reducing the overall risk.
Risk transference does not change the probability or severity of the risk, but it may decrease the probability
or severity of the risk actually experienced by the individual or organization accomplishing the activity.
At a minimum, the risk to the original individual or organization is greatly decreased or eliminated
because the possible losses or costs are shifted to another entity.
Risk is commonly spread out by either increasing the exposure distance or by lengthening the time
between exposure events. Aircraft may be parked so that an explosion or fire in one aircraft will not
propagate to others. Risk may also be spread over a group of personnel by rotating the personnel involved
in a high-risk operation.
Compensate for a risk. We can create a redundant capability in certain special circumstances. Flight
control redundancy is an example of an engineering or design redundancy. Another example is to plan for
a backup so that, when a critical piece of equipment or other asset is damaged or destroyed, we have
capabilities available to bring online to continue the operation.
Risk can be reduced. The overall goal of risk management is to plan operations or design systems that do
not contain hazards and risks. However, the nature of most complex operations and systems makes it
impossible or impractical to design them completely risk-free. As hazard analyses are performed, hazards
will be identified that will require resolution. To be effective, risk management strategies must address the
components of risk: probability, severity, or exposure. A proven order of precedence for dealing with
hazards and reducing the resulting risks is:
Plan or Design for Minimum Risk. From the first, plan the operation or design the system to eliminate hazards.
Without hazards there is no probability, severity, or exposure. If an identified hazard cannot be eliminated,
reduce the associated risk to an acceptable level. For example, flight control components can be designed so
that they cannot be incorrectly connected during maintenance operations.
Incorporate Safety Devices. If identified hazards cannot be eliminated or their associated risk adequately
reduced by modifying the operation or system elements or their inputs, that risk should be reduced to an
acceptable level through the use of safety design features or devices. Safety devices can affect probability
and reduce severity: an automobile seat belt doesn’t prevent a collision but reduces the severity of
injuries.
Provide Warning Devices. When planning, system design, and safety devices cannot effectively eliminate
identified hazards or adequately reduce the associated risk, warning devices should be used to detect the
condition and alert personnel to the hazard. As an example, aircraft could be retrofitted with a low
altitude ground collision warning system to reduce the risk of controlled flight into the ground. Warning
signals and their application should be designed to minimize the probability of incorrect personnel
reactions to the signals and should be standardized. Flashing red lights and sirens are common warning
devices that most people understand.
Develop Procedures and Training. Where it is impractical to eliminate hazards through design selection or
adequately reduce the associated risk with safety and warning devices, procedures and training should be
used. A warning system by itself may not be effective without training or procedures required to respond
to the hazardous condition. The greater the human contribution to the functioning of the system or
involvement in the operational process, the greater the chance for variability. However, if the system is
well designed and the operation well planned, the only remaining risk reduction strategies may be
procedures and training. Emergency procedure training and disaster preparedness exercises improve
human response to hazardous situations.
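The order of precedence can be read as a selection rule. Use the highest-precedence category that brings the risk to an acceptable level, and fall back to a lower category only when the higher ones cannot. The Python sketch below assumes each candidate control has already been given a numeric residual-risk score (lower is better) and that an acceptability threshold exists. The scoring scale and the function name `select_control` are hypothetical. In practice, controls from several categories are often combined, for example a warning device together with training; this simple rule does not model that.

```python
from typing import Optional

# Categories in the order of precedence given above.
PRECEDENCE = [
    "plan or design for minimum risk",
    "safety devices",
    "warning devices",
    "procedures and training",
]


def select_control(residual_risk: dict, acceptable: int) -> Optional[str]:
    """Return the highest-precedence category whose control is acceptable.

    residual_risk maps a category to the (hypothetical) residual-risk score
    that its best available control achieves; lower scores mean less risk.
    """
    for category in PRECEDENCE:
        score = residual_risk.get(category)
        if score is not None and score <= acceptable:
            return category
    return None  # no single category is enough; combine controls or reassess


# Design changes are not feasible here. Training alone scores best, but the
# warning device is preferred because it ranks higher and is acceptable.
options = {"warning devices": 2, "procedures and training": 1}
print(select_control(options, acceptable=2))
```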
In most cases it will not be possible to eliminate safety risk entirely, but it will be possible to significantly
reduce it. There are many risk reduction options available. Examples are included in the next section.
Developing a decision-making process and system: Risk decision-making should be scrutinized in a risk
decision system.
Selecting the best combination of risk controls: This process can be as simple as intuitively
choosing what appears to be the best control or group of controls, or complex enough to justify the use of
the most sophisticated decision-making tools available. For most risks involving moderate levels of risk
and relatively small investments in risk controls, the intuitive method is fully satisfactory. Guidelines for
intuitive decisions are:
Don’t select control options to produce the lowest level of risk; select the combination yielding the level
of risk that best supports the operation. This means keeping in mind the need to take appropriate risks
when they are necessary for improved performance.
Be aware that some risk controls are incompatible. In some cases using risk control A will cancel the
effect of risk control B. Obviously, using both A and B wastes resources. For example, a fully
effective machine guard may make it completely unnecessary to use personal protective equipment such
as goggles and face shields. Using both will waste resources and impose a burden on operators.
Be aware that some risk controls reinforce each other. For example, a strong enforcement program to
discipline violators of safety rules will be complemented by a positive incentive program to reward safe
performance. The impact of the two coordinated together will usually be stronger than the sum of their
impacts.
Evaluate full costs versus full benefits. Try to identify all the benefits of a risk control package and
evaluate them against all of its costs. Traditionally, this comparison has been limited to comparisons
of the incident/accident costs versus the safety function costs.
When it is supportive, choose redundant risk controls to protect against risk in depth.
Keep in mind that the objective is not risk control; it is optimum risk control.
Selecting risk controls when risks are high and risk control costs are important - cost benefit assessment.
In these cases, the stakes are high enough to justify application of more formal decision-making
processes. All of the tools existing in the management science of decision-making apply to the process of
risk decision-making. Two of these tools should be used routinely and deserve space in this publication.
The first is cost benefit assessment, a simplified variation of cost benefit analysis. Cost benefit analysis is
a science in itself; however, it can be simplified sufficiently for routine use in risk management
decision-making even at the lowest organizational levels. Some fiscal accuracy will be lost in this process of
simplification, but the result of the application will be a much better selection of risk controls than if the
procedures were not used. Budget personnel are usually trained in these procedures and can add value to
the application. The process involves the following steps:
Step 1. Measure the full, lifecycle costs of the risk controls to include all costs to all involved parties. For
example, a motorcycle helmet standard should account for the fact that each operator will need to pay for
a helmet.
Step 2. Develop the best possible estimate of the likely lifecycle benefits of the risk control package to
include any non-safety benefits expressed as a dollar estimate. For example, an ergonomics program can
be expected to produce significant productivity benefits in addition to a reduction in cumulative trauma
injuries.
Step 4. Develop the benefit-to-cost ratio. You are seeking the best possible benefit-to-cost ratio, but at
least 2 to 1.
Step 5. Fine-tune the risk control package to achieve an improved “bang for the buck”. The example at
Figure 4.1A illustrates this process of fine-tuning applied to an ergonomics-training course (risk control).
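Steps 1, 2, and 4 reduce to simple arithmetic. This Python sketch totals hypothetical year-by-year lifecycle costs and benefits in dollars and checks the result against the 2 to 1 minimum. The amounts are left undiscounted for simplicity, which is an assumption; a formal cost benefit analysis would normally discount out-year amounts. The function names and figures are illustrative only.

```python
def benefit_cost_ratio(costs_by_year, benefits_by_year):
    """Step 1 and Step 2 totals (undiscounted), combined into the Step 4 ratio."""
    total_cost = sum(costs_by_year)
    total_benefit = sum(benefits_by_year)
    if total_cost <= 0:
        raise ValueError("total lifecycle cost must be positive")
    return total_benefit / total_cost


def meets_minimum(ratio, minimum=2.0):
    """True if the package clears the 2 to 1 benefit-to-cost floor."""
    return ratio >= minimum


# Hypothetical three-year package: start-up cost, then sustainment.
costs = [10_000, 2_000, 2_000]
# Hypothetical benefits, including non-safety (e.g., productivity) gains.
benefits = [15_000, 15_000, 15_000]

ratio = benefit_cost_ratio(costs, benefits)
print(f"{ratio:.2f} to 1, meets minimum: {meets_minimum(ratio)}")
```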
Anyone can throw money at a problem. A manager finds the optimum level of resources producing
an optimum level of effectiveness, i.e. maximum bang for the buck. Consider an ergonomics-
training program involving training 400 supervisors from across the entire organization in a 4-hour
(3 hours training, 1-hour admin) ergonomics-training course that will cost $30,500 including
student time. Ergonomics losses have been averaging $300,000 per year and estimates are that the
risk control will reduce this loss by 10% or $30,000. On the basis of a cost benefit assessment over
the next year (ignoring any out-year considerations), this risk control appears to have an unfavorable
one-year benefit-to-cost ratio (below 1 to 1): $30,000 in benefit versus a $30,500 investment, a $500 loss.
Apparently it is not a sound investment on a one-year basis. This is particularly true when we
consider that most decision-makers will want the comfort of a 2 or 3 to 1 benefit-to-cost ratio to ensure
a positive outcome. Can this project be turned into a winner?
We can make it a winner by drawing on ergonomics injury/illness information from loss control office
data, on risk management concepts, and on a useful tool called “Pareto’s Law”.
Pareto’s Law, as previously mentioned, essentially states that 80% of most problems can be found
in 20% of the exposure. For example, 80% of all traffic accidents might involve only 20% of the
driver population. We can use this law, guided by our injury/illness data, to turn the training
program into a solid winner. Here is what we might do.
Step 1. Let’s assume that Pareto’s Law applies to the distribution of ergonomics problems within
this organization. If so, then 80% of the ergonomics problem can be found in 20% of our
exposures. Our data can tell us which 20%. We can then target the 20% (80 students) of the
original 400 students that are accounting for 80% of our ergonomics costs ($240,000).
Step 2. Let’s also assume that Pareto’s Law applies to the importance of tasks that we intend to
teach in the training course. If the three hours of training included 10 tasks, let’s assume that two of
those tasks (20%) will in fact account for 80% of the benefit of the course. Again, our data should
be able to indicate this. Let’s also assume that, by good luck, each of these two tasks takes the same
time to teach as each of the other eight. We might now decide to teach only these two tasks, which will
require only 36 minutes (20% of 180 minutes). We will still retain 80% of the $240,000 target value, or
$192,000.
Step 3. Since the training now only requires 36 minutes, we will modify our training procedure to
conduct the training in the workshops rather than in a classroom. This reduces our admin time from
1 hour (wash up, travel, get there well before it actually starts, and return to work) to 4 minutes.
Our total training time is now 40 minutes.
Summary. We are still targeting $192,000 of the original $300,000 annual loss, but our cost factor
is now 80 employees for 40 minutes at $15/hour ($800), with our teaching cost cut to 1/5th of the $6000
(80 students instead of 400), which is $1200. We still have our staff cost, so the total cost of the
project is now $2500. We will still get the 10% reduction in the remaining $192,000 that we are
still targeting, which totals $19,200. Our benefit-to-cost ratio is now a robust 7.68 to 1. If all goes
well with the initial training and we actually demonstrate a 20% loss reduction, we may choose to
expand the training to the next riskiest 20% of our 400 personnel, which should also produce a very
positive return.
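The arithmetic of this example can be checked with a short Python sketch. The $15/hour student rate, $6,000 teaching cost, $300,000 annual loss, and 10% reduction come from the example. The $500 staff cost is not stated explicitly. It is inferred here as what remains of the original $30,500 after $24,000 of student time (400 × 4 hours × $15) and $6,000 of teaching. Scaling the teaching cost with class size follows the example's "1/5th" reasoning. The function name `assess` is hypothetical.

```python
WAGE_PER_HOUR = 15          # student time, $/hour (from the example)
TEACHING_COST_400 = 6000    # teaching cost for all 400 students (from the example)
STAFF_COST = 500            # inferred: $30,500 - (400 x 4 h x $15) - $6,000
ANNUAL_LOSS = 300_000       # average annual ergonomics losses
REDUCTION = 0.10            # expected loss reduction from the course


def assess(students, minutes_each, targeted_loss):
    """Return (cost, benefit, benefit-to-cost ratio) for one version of the course."""
    student_time = students * minutes_each * WAGE_PER_HOUR / 60
    teaching = TEACHING_COST_400 * students / 400   # scales with class size
    cost = student_time + teaching + STAFF_COST
    benefit = REDUCTION * targeted_loss
    return cost, benefit, benefit / cost


# Original plan: 400 supervisors, 4 hours (240 minutes) each, whole loss targeted.
original = assess(400, 240, ANNUAL_LOSS)
# Pareto-tuned plan: the riskiest 20% (80 students), the two key tasks only
# (36 minutes) plus 4 minutes of admin, retaining 80% of 80% of the loss.
tuned = assess(80, 40, ANNUAL_LOSS * 0.8 * 0.8)

for label, (cost, benefit, ratio) in (("original", original), ("tuned", tuned)):
    print(f"{label}: cost ${cost:,.0f}, benefit ${benefit:,.0f}, ratio {ratio:.2f}")
```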
Selecting risk controls when risks are high and risk control costs are important - use of decision matrices.
An excellent tool for evaluating various risk control options is the decision matrix. On the vertical
dimension of the matrix we list the operation supportive characteristics we are looking for in risk controls.
Across the top of the matrix we list the various risk control options (individual options or packages of
options). Then we rate each control option on a scale of 1 (very low) to 10 (very high) for each of the
desirable characteristics. If we choose to, we can weight each desirable characteristic based on its
operational significance and calculate the weighted score (illustrated below). All other things being equal,
the options with the higher scores are the stronger options. A generic illustration is provided at Figure
4.1B.
RATING FACTOR                   WEIGHT*  RISK CONTROL OPTIONS/PACKAGES (rating/weighted score)
                                         #1      #2      #3      #4      #5      #6
Low cost                        5        9/45    6/30    4/20    5/25    8/40    8/40
Easy to implement               4        10/40   7/28    5/20    6/24    8/32    8/32
Positive operator involvement   5        8/40    2/10    1/5     6/30    3/15    7/35
Consistent with culture         3        10/30   2/6     9/27    6/18    6/18    6/18
Easy to integrate               3        9/27    5/15    6/18    7/21    6/18    5/15
Easy to measure                 2        10/20   10/20   10/20   8/16    8/16    5/10
Low risk (sure to succeed)      3        9/27    9/27    10/30   2/6     4/12    5/15
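The weighted scores in the matrix can be totaled per option with a few lines of Python. The ratings and weights below are copied from the figure. The figure does not show totals, so the column sums here are computed rather than quoted. As the text notes, a higher total indicates a stronger option only when all other things are equal. The function name `weighted_totals` is hypothetical.

```python
# Weights and 1-10 ratings for options #1 to #6, copied from the figure.
WEIGHTS = {
    "Low cost": 5,
    "Easy to implement": 4,
    "Positive operator involvement": 5,
    "Consistent with culture": 3,
    "Easy to integrate": 3,
    "Easy to measure": 2,
    "Low risk (sure to succeed)": 3,
}
RATINGS = {
    "Low cost": [9, 6, 4, 5, 8, 8],
    "Easy to implement": [10, 7, 5, 6, 8, 8],
    "Positive operator involvement": [8, 2, 1, 6, 3, 7],
    "Consistent with culture": [10, 2, 9, 6, 6, 6],
    "Easy to integrate": [9, 5, 6, 7, 6, 5],
    "Easy to measure": [10, 10, 10, 8, 8, 5],
    "Low risk (sure to succeed)": [9, 9, 10, 2, 4, 5],
}


def weighted_totals(weights, ratings):
    """Sum weight x rating over all factors, separately for each option."""
    n_options = len(next(iter(ratings.values())))
    return [sum(weights[f] * ratings[f][i] for f in weights) for i in range(n_options)]


totals = weighted_totals(WEIGHTS, RATINGS)
best = max(range(len(totals)), key=lambda i: totals[i]) + 1  # 1-based option number
print(totals, "-> option", best)
```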
Summary. It is not unusual for a risk control package to cost hundreds of thousands of dollars and even
millions over time. Millions of dollars and critical operations may be at risk. The expenditure of several
tens of thousands of dollars to get the decision right is sound management practice and good risk
management.
5.1 Introduction
Figure 5.1A summarizes a Risk Control Implementation model. It is based on accountability being an
essential element of risk management success. Organizations and individuals must be held accountable for
the risk decisions and actions that they take or the risk control motivation is minimized. The model
depicted at Figure 5.1A is the basis of positive accountability and strong risk control behavior.
communicated to the responsible individual. This step of the process is the rigorous application of the old
adage that “What is monitored (or measured) and checked gets done.”
In regard to ORM, indicators should provide information concerning the success or lack of success of
controls intended to mitigate a risk. These indicators could focus on those key areas identified during the
assessment as being critical to minimizing a serious risk area. Additionally, matrices may be developed to
generically identify operations/areas where ORM efforts are needed.
The following is a representative set of risk measures that a maintenance shop leader could use to assess
the shop’s progress toward the goal of improving safety performance. Similar indicators could be developed
in the areas of environment, fire prevention, security, and other loss control areas.
The tool control effectiveness index. Establish key indicators of tool control program effectiveness
(percentage of tool checks completed, items found by QA, score on knowledge quiz regarding control
procedures, etc.). All that is needed is a sampling of data in one or more of these areas. If more than one
area is sampled, the scores can be weighted if desired and rolled up into a single tool control index by
averaging them. See Figure 6.1A for the example.
The protective clothing and equipment risk index. This index measures the effectiveness with which shop
personnel are using required protective clothing and equipment. Data are collected by making spot
observations periodically during the workday, recorded on a check sheet, and rolled up monthly. The
index is the percentage of safe observations out of the total number of observations made, as
illustrated at Figure 6.1B.
The protective clothing and equipment safety index is 78 (21 safe observations divided by 27 total = 78%).
In this index, high scores are desirable.
The emergency procedures index. This index measures the readiness of the shop to respond to various
emergencies such as fires, injuries, and hazmat releases. It is made up of a compilation of indicators as
shown at Figure 6.1C. A high score is desirable.
The quality assurance score. This score measures a defined set of maintenance indicators tailored to the
particular type of aircraft serviced. Quality Assurance (QA) personnel record deviations in these target
areas as a percentage of total observations made. The specific types of deviations are noted. The score is
the percentage of positive observations with a high score being desirable. Secondary scores could be
developed for each type of deviation if desired.
The overall index. Any combination of the indicators previously mentioned, along with others as desired,
can be rolled up into an overall index for the maintenance facility as illustrated at Figure 6.1D.
[Figure 6.1D: overall index for the maintenance facility. TOTAL: 357.6; AVERAGE: 89.4]
This index is the overall safety index for the maintenance facility. The goal is to push toward
100% or a maximum score of 400. This index would be used in our accountability procedures
to measure performance and establish the basis for rewards or corrective action.
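This section describes three kinds of roll-up, sketched below in Python:

* a weighted average of sampled indicators, for the tool control index;
* a percentage of safe observations, for the protective clothing and equipment index;
* a total or average of component indices, for the overall index.

Each component is assumed to be on a 0 to 100 scale, with higher scores better. The function names, indicator names, scores, and weights are hypothetical, with one exception: the 21-of-27 check sheet reproduces the protective clothing and equipment index of 78. For reference, 357.6 / 4 gives the reported average of 89.4.

```python
def weighted_index(scores, weights=None):
    """Tool control style roll-up: weighted average of 0-100 indicator scores."""
    if weights is None:
        weights = {name: 1 for name in scores}  # equal weights by default
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def percent_safe(observations):
    """Protective equipment style index: percent of spot observations marked safe."""
    if not observations:
        raise ValueError("no observations recorded")
    return round(100 * sum(observations) / len(observations))


def overall_index(indices):
    """Overall roll-up: (total, average, maximum possible total) of 0-100 indices."""
    total = sum(indices.values())
    return total, total / len(indices), 100 * len(indices)


# Hypothetical tool control sample; counts such as QA findings are assumed
# to have been converted to a 0-100 score already.
tool_scores = {"tool checks completed": 92, "QA findings (scaled)": 80, "procedures quiz": 85}
tool_weights = {"tool checks completed": 2, "QA findings (scaled)": 1, "procedures quiz": 1}
tool_index = weighted_index(tool_scores, tool_weights)

# Check sheet from the text: 21 safe observations out of 27.
ppe_index = percent_safe([True] * 21 + [False] * 6)

# Hypothetical emergency procedures and QA scores complete the roll-up.
total, average, maximum = overall_index({
    "tool control": tool_index,
    "protective clothing and equipment": ppe_index,
    "emergency procedures": 95,
    "quality assurance": 88,
})
print(f"tool {tool_index}, PPE {ppe_index}, total {total} of {maximum}, average {average}")
```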
Once the data has been collected and analyzed, the results need to be provided to the unit. With this
information the unit will be able to concentrate their efforts on those areas where improvement would
produce the greatest gain.
Summary. It is not difficult to set up useful and effective measures of operational risk, particularly once
the key risks have been identified during a risk assessment. Additionally, the workload associated with
such indicators can be minimized by using data already collected and by collecting the data as an
integrated routine aspect of operational processes.
Appendix G
Distribution: A-WXYZ-2; A-FOF-0 (Ltd)    Initiated by: ASY-300
8040.4
6/26/98
2. DISTRIBUTION. This order is distributed to the division level in the Washington headquarters,
regions, and centers, with limited distribution to all field offices and facilities.
4. SCOPE. This order requires the application of a flexible but formalized safety risk management
process for all high-consequence decisions, except in situations deemed by the Administrator to be an
emergency. A high-consequence decision is one that either creates or could be reasonably estimated to
result in a statistical increase or decrease, as determined by the program office, in personal injuries and/or
loss of life and health, a change in property values, loss of or damage to property, costs or savings, or other
economic impacts valued at $100,000,000 or more per annum. The objective of this policy is to formalize a
common sense approach to risk management and safety risk analysis/assessment in FAA decisionmaking.
This order is not intended to interfere with regulatory processes and activities. Each program office will
interpret, establish, and execute the policy contained herein consistent with its role and responsibility. The
Safety Risk Management Committee will consist of technical personnel with risk assessment expertise and
be available for guidance across all FAA programs.
5. SAFETY RISK MANAGEMENT POLICY. The FAA shall use a formal, disciplined, and
documented decisionmaking process to address safety risks in relation to high-consequence decisions
impacting the complete product life cycle. The critical information resulting from a safety risk
management process can thereby be effectively communicated in an objective and unbiased manner to
decisionmakers, and from decisionmakers to the public. All decisionmaking authorities within the FAA
shall maintain safety risk management expertise appropriate to their operations, and shall perform and
document the safety risk management process prior to issuing the high-consequence decision. The choice
of methodologies to support risk management efforts remains the responsibility of each program office. The
decisionmaking authority shall determine the documentation format. The approach to safety risk
management is composed of the following steps:
a. Plan. A case-specific plan for risk analysis and risk assessment shall be predetermined in
adequate detail for appropriate review and agreement by the decisionmaking authority prior to commitment
of resources. The plan shall additionally describe criteria for acceptable risk.
b. Hazard Identification. The specific safety hazard or list of hazards to be addressed by the safety
risk management plan shall be explicitly identified to prevent ambiguity in subsequent analysis and
assessment.
c. Analysis. Both elements of risk (hazard severity and likelihood of occurrence) shall be
characterized. The inability to quantify and/or lack of historical data on a particular hazard does not
exclude the hazard from this requirement. If the seriousness of a hazard can be expected to increase over
the effective life of the decision, this should be noted. Additionally, both elements should be estimated for
each hazard being analyzed, even if historical and/or quantitative data is not available.
d. Assessment. The combined impact of the risk elements in paragraph 5c shall be compared to
acceptability criteria and the results provided for decisionmaking.
e. Decision. The risk management decision shall consider the risk assessment results conducted in
accordance with paragraph 5d. Risk assessment results may be used to compare and contrast alternative
options.
(5) Distinguish clearly as to what risks would be affected by the decision and what risks
would not.
(7) Relate to current risk or the risk resulting from not adopting the proposal being
considered.
b. Principles. The principles to be applied when preparing safety risk assessments are:
(1) Each risk assessment should first analyze the two elements of risk: severity of the
hazard and likelihood of occurrence. Risk assessment is then performed by comparing the combined effect
of their characteristics to acceptable criteria as determined in the plan (paragraph 5a).
(2) A risk assessment may be qualitative and/or quantitative. To the maximum extent
practicable, these risk assessments will be quantitative.
(4) Basic assumptions should be documented or, if only bounds can be estimated reliably,
the range encompassed should be described.
(a) Describe any model used in the risk assessment and make explicit the
assumptions incorporated in the model.
(d) Indicate the extent that the model and the assumptions incorporated have been
validated by or conflict with empirical data.
(6) All safety risk assessments should include or summarize the information of paragraphs
6a(3) and 6a(4) as well as 6b(4) and 6b(5). This record should be maintained by the organization
performing the assessment in accordance with Order 1350.15B, Records Organization, Transfer, and
Destruction Standards.
a. Compare the results of a risk assessment for each risk-reduction alternative considered,
including no action, in order to rank each risk assessment for decisionmaking purposes. The assessment
will consider future conditions, e.g., increased traffic volume.
b. Assess the costs and the safety risk reduction or other benefits associated with implementation
of, and compliance with, an alternative under final consideration.
9. SAFETY RISK MANAGEMENT COMMITTEE. This order establishes the Safety Risk
Management Committee. Appendix 2, Safety Risk Management Committee, contains the committee
charter. The committee shall provide a service to any FAA organization for safety risk management
planning, as outlined in appendix 2, when requested by the responsible program office. It also meets
periodically (e.g., two to four times per year) to exchange risk management ideas and information. The
committee will provide advice and counsel to the Office of System Safety, the Assistant Administrator for
System Safety, and other management officials when requested.
Jane F. Garvey
Administrator
APPENDIX 1. DEFINITIONS.
1. COSTS. Direct and indirect costs to the United States Government, State, local, and tribal
governments, international trade impacts, and the private sector.
6. PRODUCT LIFE CYCLE. The entire sequence from precertification activities through those
associated with removal from service.
7. MISHAP. Unplanned event, or series of events, that results in death, injury, occupational illness, or
damage to or loss of equipment or property.
8. RISK. Expression of the impact of an undesired event in terms of event severity and event likelihood.
9. RISK ASSESSMENT.
a. Process of identifying hazards and quantifying or qualifying the degree of risk they pose for
exposed individuals, populations, or resources; and/or
b. Document containing the explanation of how the assessment process is applied to individual
activities or conditions.
10. RISK CHARACTERIZATION. Identification or evaluation of the two components of risk, i.e.,
undesired event severity and likelihood of occurrence.
11. RISK MANAGEMENT. Management activity ensuring that risk is identified and eliminated or
controlled within established program risk parameters.
12. SAFETY RISK. Expression of the probability and impact of an undesired event in terms of hazard
severity and hazard likelihood.
13. SUBSTITUTION RISK. Additional risk to human health or safety, to include property risk, from an
action designed to reduce some other risk(s).
APPENDIX 2. SAFETY RISK MANAGEMENT COMMITTEE.
1. PURPOSE. The Safety Risk Management Committee provides a communication and support team to
supplement the overall risk analysis capability and efficiency of key FAA organizations.
2. RESPONSIBILITIES. The Committee supports FAA safety risk management activities. It provides
advice and guidance, upon request from responsible program offices, to help them fulfill their authority and
responsibility to incorporate safety risk management as a decisionmaking tool. It serves as an internal
vehicle for risk management process communication, for coordination of risk analysis methods, and for use
of common practices where appropriate. This includes, but is not limited to:
a. Continuing the internal exchange of risk management information among key FAA
organizations.
b. Fostering the exchange of risk management ideas and information with other government
agencies and industry to avoid duplication of effort.
g. Assisting in the identification of suitable risk analysis tools and initiating appropriate training in
the use of these tools.
3. COMPOSITION. The Safety Risk Management Committee is composed of safety and risk
management professionals representing all Associate/Assistant Administrators and the Offices of the Chief
Counsel, Civil Rights, Government and Industry Affairs, and Public Affairs. The Assistant Administrator
for System Safety will designate an individual to chair the committee. The chairperson is responsible for
providing written notice of all meetings to committee members and, in coordination with the executive
secretary, keeping minutes of the meetings.
4. ASSIGNMENTS. The Safety Risk Management Committee may form ad hoc working groups to
address specific issues when requested by the responsible program office. These working
groups will consist of member representatives from across the FAA. Working groups will be disbanded
upon completion of their task. The Office of System Safety shall provide the position of executive
secretary of the committee. The Office of System Safety shall also furnish other administrative support.
5. FUNDING. Resources for support staff and working group activities will be provided as determined
by the Assistant Administrator for System Safety. Unless otherwise stated, each member is responsible for
his/her own costs associated with committee membership.
APPENDIX H
MIL-STD-882D
NOT MEASUREMENT SENSITIVE
10 February 2000
SUPERSEDING
MIL-STD-882C
19 January 1993
DEPARTMENT OF DEFENSE
STANDARD PRACTICE FOR SYSTEM SAFETY
FOREWORD
1. This standard is approved for use by all Departments and Agencies within the
Department of Defense (DoD).
2. The DoD is committed to protecting: private and public personnel from accidental
death, injury, or occupational illness; weapon systems, equipment, material, and facilities from
accidental destruction or damage; and public property while executing its mission of national
defense. Within mission requirements, the DoD will also ensure that the quality of the
environment is protected to the maximum extent practical. The DoD has implemented
environmental, safety, and health efforts to meet these objectives. Integral to these efforts is the
use of a system safety approach to manage the risk of mishaps associated with DoD operations.
A key objective of the DoD system safety approach is to include mishap risk management
consistent with mission requirements, in technology development by design for DoD systems,
subsystems, equipment, facilities, and their interfaces and operation. The DoD goal is zero
mishaps.
4. This revision applies the tenets of acquisition reform to system safety in Government
procurement. A joint Government/Industry process action team oversaw this revision. The
Government Electronics and Information Technology Association (GEIA) G-48 committee on
system safety represented industry on the process action team. System safety information (e.g.,
system safety tasks and commonly used approaches) associated with previous versions of this
standard is in the Defense Acquisition Deskbook (see 6.8). This standard practice is no longer
the source for any safety-related data item descriptions (DIDs).
CONTENTS
FOREWORD
1. SCOPE
1.1 Scope
2. APPLICABLE DOCUMENTS
3. DEFINITIONS
3.1 Acronyms used in this standard
3.2 Definitions
3.2.1 Acquisition program
3.2.2 Developer
3.2.3 Hazard
3.2.4 Hazardous material
3.2.5 Life cycle
3.2.6 Mishap
3.2.7 Mishap risk
3.2.8 Program manager
3.2.9 Residual mishap risk
3.2.10 Safety
3.2.11 Subsystem
3.2.12 System
3.2.13 System safety
3.2.14 System safety engineering
6. NOTES
6.1 Intended use
6.2 Data requirements
6.3 Subject term (key words) listing
APPENDIXES
A Guidance for implementation of system safety efforts
TABLES
A-I. Suggested mishap severity categories
A-II. Suggested mishap probability levels
A-III. Example mishap risk assessment values
A-IV. Example mishap risk categories and mishap risk acceptance levels
1. SCOPE
1.1 Scope. This document outlines a standard practice for conducting system safety.
The system safety practice as defined herein conforms to the acquisition procedures in
DoD Regulation 5000.2-R and provides a consistent means of evaluating identified risks.
Mishap risk must be identified, evaluated, and mitigated to a level acceptable (as defined by the
system user or customer) to the appropriate authority and compliant with federal (and state where
applicable) laws and regulations, Executive Orders, treaties, and agreements. Program trade
studies associated with mitigating mishap risk must consider total life cycle cost in any decision.
When requiring MIL-STD-882 in a solicitation or contract and no specific paragraphs of this
standard are identified, then apply only those requirements presented in section 4.
2. APPLICABLE DOCUMENTS
Sections 3, 4, and 5 of this standard contain no applicable documents. This section does not
include documents cited in other sections of this standard or recommended for additional
information or as examples.
3. DEFINITIONS
3.1 Acronyms used in this standard. The acronyms used in this standard are defined as
follows:
3.2 Definitions. Within this document, the following definitions apply (see 6.4):
3.2.3 Hazard. Any real or potential condition that can cause injury, illness, or death to
personnel; damage to or loss of a system, equipment or property; or damage to the environment.
3.2.4 Hazardous material. Any substance that, due to its chemical, physical, or
biological nature, causes safety, public health, or environmental concerns that would require an
elevated level of effort to manage.
3.2.5 Life cycle. All phases of the system's life including design, research, development,
test and evaluation, production, deployment (inventory), operations and support, and disposal.
3.2.7 Mishap risk. An expression of the impact and possibility of a mishap in terms of
potential mishap severity and probability of occurrence.
3.2.8 Program Manager (PM). A government official who is responsible for managing
an acquisition program. Also, a general term of reference to those organizations directed by
individual managers, exercising authority over the planning, direction, and control of tasks and
associated functions essential for support of designated systems. This term will normally be
used in lieu of any other titles, e.g., system support manager, weapon program manager, system
manager, and project manager.
3.2.9 Residual mishap risk. The remaining mishap risk that exists after all mitigation
techniques have been implemented or exhausted, in accordance with the system safety design
order of precedence (see 4.4).
3.2.10 Safety. Freedom from those conditions that can cause death, injury, occupational
illness, damage to or loss of equipment or property, or damage to the environment.
3.2.12 System. An integrated composite of people, products, and processes that provide
a capability to satisfy a stated need or objective.
4. GENERAL REQUIREMENTS
This section defines the system safety requirements to perform throughout the life cycle for any
system, new development, upgrade, modification, resolution of deficiencies, or technology
development. When properly applied, these requirements should ensure the identification and
understanding of all known hazards and their associated risks, and the elimination of mishap risk or its
reduction to acceptable levels. The objective of system safety is to achieve acceptable mishap risk
through a systematic approach of hazard analysis, risk assessment, and risk management. This
document delineates the minimum mandatory requirements for an acceptable system safety
program for any DoD system. When MIL-STD-882 is required in a solicitation or contract, but
no specific references are included, then only the requirements in this section are applicable.
System safety requirements consist of the following:
4.1 Documentation of the system safety approach. Document the developer's and
program manager's approved system safety engineering approach. This documentation shall:
b. Include information on system safety integration into the overall program structure.
c. Define how hazards and residual mishap risk are communicated to and accepted by the
appropriate risk acceptance authority (see 4.7) and how hazards and residual mishap risk will be
tracked (see 4.8).
4.3 Assessment of mishap risk. Assess the severity and probability of the mishap risk
associated with each identified hazard, i.e., determine the potential negative impact of the hazard
on personnel, facilities, equipment, operations, the public, and the environment, as well as on the
system itself. The tables in Appendix A are to be used unless otherwise specified.
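Section 4.3 pairs each hazard's severity with its probability and defers to the Appendix A tables, which are not reproduced in this excerpt. The Python sketch below assumes the category names and numeric values of the example mishap risk assessment matrix (Table A-III), in which 1 denotes the highest risk; verify them against the standard or substitute program-specific tables. The hazards listed are hypothetical.

```python
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")

# Assumed example matrix (Table A-III): rows are probability levels,
# columns follow SEVERITIES; lower numbers denote higher mishap risk.
RISK_VALUES = {
    "Frequent":   (1, 3, 7, 13),
    "Probable":   (2, 5, 9, 16),
    "Occasional": (4, 6, 11, 18),
    "Remote":     (8, 10, 14, 19),
    "Improbable": (12, 15, 17, 20),
}


def risk_value(severity: str, probability: str) -> int:
    """Look up the mishap risk assessment value for one hazard."""
    return RISK_VALUES[probability][SEVERITIES.index(severity)]


# Hypothetical hazards: name -> (severity, probability)
hazards = {
    "H-1 hydraulic line rupture": ("Critical", "Remote"),
    "H-2 display glare": ("Marginal", "Frequent"),
    "H-3 uncommanded actuation": ("Catastrophic", "Improbable"),
}

# Prioritize for mitigation: highest risk (lowest value) first.
ranked = sorted(hazards, key=lambda h: risk_value(*hazards[h]))
for name in ranked:
    print(risk_value(*hazards[name]), name)
```

With these assumed values, a Marginal but Frequent hazard (7) outranks a Catastrophic but Improbable one (12): priority comes from the combination of severity and probability, not from severity alone.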
4.4 Identification of mishap risk mitigation measures. Identify potential mishap risk
mitigation alternatives and the expected effectiveness of each alternative or method. Mishap risk
mitigation is an iterative process that culminates when the residual mishap risk has been reduced
to a level acceptable to the appropriate authority. The system safety design order of precedence
for mitigating identified hazards is:
b. Incorporate safety devices. If unable to eliminate the hazard through design selection,
reduce the mishap risk to an acceptable level using protective safety features or devices.
c. Provide warning devices. If safety devices do not adequately lower the mishap risk of
the hazard, include a detection and warning system to alert personnel to the particular hazard.
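Section 4.4 describes mitigation as iterative: measures are considered in the design order of precedence until the residual mishap risk is acceptable. This Python sketch models only the three categories visible in this excerpt (design selection, implied by item b, plus safety devices and warning devices). The multiplicative risk-reduction factors, the numeric risk scale, and the example measures are assumptions made for illustration.

```python
from enum import IntEnum


class Precedence(IntEnum):
    """Mitigation categories, in the order they are to be considered."""
    DESIGN_SELECTION = 1  # eliminate the hazard through design selection
    SAFETY_DEVICE = 2     # protective safety features or devices
    WARNING_DEVICE = 3    # detection and warning system


def select_mitigations(candidates, initial_risk, acceptable_risk):
    """Apply candidate measures in precedence order until risk is acceptable.

    candidates: (Precedence, name, factor) tuples; each factor (0 <= f <= 1)
    scales the current risk, an assumed model of mitigation effectiveness.
    Returns the measures applied and the resulting residual risk, which may
    still exceed acceptable_risk if every candidate is exhausted.
    """
    applied, risk = [], initial_risk
    # sorted() is stable, so measures at the same level keep their input order.
    for level, name, factor in sorted(candidates, key=lambda c: c[0]):
        if risk <= acceptable_risk:
            break
        applied.append((level.name, name))
        risk *= factor
    return applied, risk


# Hypothetical measures for a single hazard, deliberately listed out of order.
measures = [
    (Precedence.WARNING_DEVICE, "caution light", 0.5),
    (Precedence.SAFETY_DEVICE, "interlock", 0.125),
    (Precedence.DESIGN_SELECTION, "non-flammable fluid", 0.25),
]
applied, residual = select_mitigations(measures, initial_risk=100.0, acceptable_risk=5.0)
print(applied, residual)
```

Here the design change and the interlock bring the assumed risk from 100 to 3.125, so the warning device is never needed. If all candidates were exhausted above the threshold, the remaining value would be the residual mishap risk carried forward for acceptance under 4.7.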
4.5 Reduction of mishap risk to an acceptable level. Reduce the mishap risk through a
mitigation approach mutually agreed to by both the developer and the program manager.
Communicate residual mishap risk and hazards to the associated test effort for verification.
4.6 Verification of mishap risk reduction. Verify the mishap risk reduction and
mitigation through appropriate analysis, testing, or inspection. Document the determined
residual mishap risk. Report all new hazards identified during testing to the program manager
and the developer.
4.7 Review of hazards and acceptance of residual mishap risk by the appropriate
authority. Notify the program manager of identified hazards and residual mishap risk. Unless
otherwise specified, the suggested tables A-I through A-III of the appendix will be used to rank
residual risk. The program manager shall ensure that remaining hazards and residual mishap risk
are reviewed and accepted by the appropriate risk acceptance authority (ref. table A-IV). The
appropriate risk acceptance authority will include the system user in the mishap risk review. The
appropriate risk acceptance authority shall formally acknowledge and document acceptance of
hazards and residual mishap risk.
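Section 4.7 ties residual risk to an acceptance authority through Table A-IV and requires the system user in the review and a documented acceptance. The Python sketch below assumes value bands and authority names patterned on the example in Table A-IV, which this excerpt does not reproduce; the record type and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed bands patterned on the example in Table A-IV (not reproduced here):
# (lowest value, highest value, risk category, acceptance authority)
ACCEPTANCE_BANDS = (
    (1, 5, "High", "Component Acquisition Executive"),
    (6, 9, "Serious", "Program Executive Officer"),
    (10, 17, "Medium", "Program Manager"),
    (18, 20, "Low", "As directed"),
)


def acceptance_authority(value: int) -> tuple[str, str]:
    """Return (risk category, required acceptance authority) for a risk value."""
    for low, high, category, authority in ACCEPTANCE_BANDS:
        if low <= value <= high:
            return category, authority
    raise ValueError(f"risk value {value} is outside 1-20")


@dataclass(frozen=True)
class AcceptanceRecord:
    """Formal, documented acceptance of a hazard's residual mishap risk."""
    hazard_id: str
    risk_value: int
    category: str
    accepted_by: str


def accept_residual_risk(hazard_id: str, value: int, accepted_by: str,
                         user_in_review: bool) -> AcceptanceRecord:
    """Validate the acceptance against the assumed bands and document it."""
    category, required = acceptance_authority(value)
    if not user_in_review:
        raise ValueError("the system user must be included in the mishap risk review")
    if required != "As directed" and accepted_by != required:
        raise ValueError(f"{category} risk must be accepted by the {required}")
    return AcceptanceRecord(hazard_id, value, category, accepted_by)


record = accept_residual_risk("H-1", 10, "Program Manager", user_in_review=True)
print(record)
```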
4.8 Tracking of hazards, their closures, and residual mishap risk. Track hazards, their
closure actions, and the residual mishap risk. Maintain a tracking system that includes hazards,
their closure actions, and residual mishap risk throughout the system life cycle. The program
manager shall keep the system user advised of the hazards and residual mishap risk.
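Section 4.8 asks for a tracking system that covers hazards, their closure actions, and residual mishap risk across the whole life cycle, with the system user kept informed. A minimal in-memory Python sketch follows. The class and field names are hypothetical, and a real program would keep the log in a persistent store.

```python
from dataclasses import dataclass, field


@dataclass
class HazardRecord:
    hazard_id: str
    description: str
    residual_risk: str  # e.g., a risk category from the program's matrix
    closure_actions: list[str] = field(default_factory=list)
    closed: bool = False


class HazardLog:
    """Tracks hazards, closure actions, and residual risk (hypothetical design)."""

    def __init__(self) -> None:
        self._records: dict[str, HazardRecord] = {}

    def add(self, record: HazardRecord) -> None:
        if record.hazard_id in self._records:
            raise ValueError(f"duplicate hazard id {record.hazard_id}")
        self._records[record.hazard_id] = record

    def add_closure_action(self, hazard_id: str, action: str) -> None:
        self._records[hazard_id].closure_actions.append(action)

    def close(self, hazard_id: str, residual_risk: str) -> None:
        record = self._records[hazard_id]
        if not record.closure_actions:
            raise ValueError("record at least one closure action before closing")
        record.residual_risk = residual_risk
        record.closed = True  # closed hazards stay in the log for the life cycle

    def user_summary(self) -> list[str]:
        """One line per hazard, for keeping the system user advised."""
        return [
            f"{r.hazard_id} [{'closed' if r.closed else 'open'}] residual risk: {r.residual_risk}"
            for r in self._records.values()
        ]


log = HazardLog()
log.add(HazardRecord("H-1", "hydraulic line rupture", "Medium"))
log.add(HazardRecord("H-2", "display glare", "Serious"))
log.add_closure_action("H-1", "install burst-resistant hose")
log.close("H-1", "Low")
for line in log.user_summary():
    print(line)
```

Closing a hazard only marks it; the record and its closure actions remain, so the history is still available for later review and lessons learned.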
5. DETAILED REQUIREMENTS
Program managers shall identify in the solicitation and system specification any specific system
safety engineering requirements including risk assessment and acceptance, unique classifications
and certifications (see 6.6 and 6.7), or any mishap reduction needs unique to their program.
Additional information in developing program specific requirements is located in Appendix A.
6. NOTES
(This section contains information of a general or explanatory nature that may be helpful, but is
not mandatory.)
6.1 Intended use. This standard establishes a common basis for expectations of a
properly executed system safety effort.
6.2 Data requirements. Hazard analysis data may be obtained from contracted sources
by citing DI-MISC-80508, Technical Report - Study/Services. When it is necessary to obtain
data, list the applicable Data Item Descriptions (DIDs) on the Contract Data Requirements List
(DD Form 1423), except where the DoD Federal Acquisition Regulation Supplement exempts
the requirement for a DD Form 1423. The developer and the program manager are encouraged
to negotiate access to internal development data when hard copies are not necessary. They are
also encouraged to request that any type of safety plan required to be provided by the
contractor be submitted with the proposal. It is further requested that any of the below-listed
data items be condensed into the statement of work and the resulting data delivered in one
general type scientific report.
Current DIDs that may be applicable to a system safety effort (check DoD 5010.12-L,
Acquisition Management Systems and Data Requirements Control List (AMSDL) for the most
current version before using), include:
6.3 Subject term (key words) listing.
Environmental
Hazard
Mishap
Mishap probability levels
Mishap risk
Mishap severity categories
Occupational Health
Residual mishap risk
System safety engineering
6.4 Definitions used in this standard. The definitions at 3.2 may be different from
those used in other specialty areas. One must carefully check the specific definition of a term
in question for its area of origination before applying the approach described in this document.
6.6 Explosive hazard classification and characteristic data. Any new or modified item of
munitions or of an explosive nature that will be transported to or stored at a DoD installation or
facility must first obtain an interim or final explosive hazard classification. The system safety
effort should provide the data necessary for the program manager to obtain the necessary
classification(s). These data should include identification of safety hazards involved in handling,
shipping, and storage related to production, use, and disposal of the item.
6.7 Use of system safety data in certification and other specialized safety approvals.
Hazard analyses are often required for many related certifications and specialized reviews.
Examples of activities requiring data generated during a system safety effort include:
a. Federal Aviation Administration airworthiness certification of designs and modifications
b. DoD airworthiness determination
c. Nuclear and non-nuclear munitions certification
d. Flight readiness reviews
e. Flight test safety review board reviews
f. Nuclear Regulatory Commission licensing
g. Department of Energy certification
Special safety-related approval authorities include USAF Radioisotope Committee,
Weapon System Explosive Safety Review Board (Navy), Non-Nuclear Weapons and Explosives
Safety Board (NNWESB), Army Fuze Safety Review Board, Triservice Laser Safety Review
Board, and the DoD Explosive Safety Board. Acquisition agencies should ensure that
appropriate service safety agency approvals are obtained prior to use of new or modified
weapons systems in an operational or test environment.
6.9 Identification of changes. Due to the extent of the changes, marginal notations are
not used in this revision to identify changes with respect to the previous issue.
APPENDIX A
A.1 SCOPE
A.1.1 Scope. This appendix provides rationale and guidance to fit the needs of most
system safety efforts. It includes further explanation of the effort and activities available to meet
the requirements described in section 4 of this standard. This appendix is not a mandatory part
of this standard and is not to be included in solicitations by reference. However, program
managers may extract portions of this appendix for inclusion in requirement documents and
solicitations.
A.2.1 General. The documents listed in this section are referenced in sections A.3, A.4,
and A.5. This section does not include documents cited in other sections of this appendix or
recommended for additional information or as examples.
A.2.2.1 Specifications, standards, and handbooks. This section is not applicable to this
appendix.
A.2.2.2 Other Government documents, drawings, and publications. The following other
Government document forms a part of this document to the extent specified herein. Unless
otherwise specified, the issue is that cited in the solicitation.
(Copies of DoD 5000.2-R are available from the Washington Headquarters Services,
Directives and Records Branch (Directives Section), Washington, DC or from the DoD
Acquisition Deskbook).
A.2.4 Order of precedence. Since this appendix is not mandatory, in the event of a conflict
between the text of this appendix and the reference cited herein, the text of the reference takes
precedence. Nothing in this appendix supersedes applicable laws and regulations unless a
specific exemption has been obtained.
A.3 DEFINITIONS
A.3.1 Acronyms used in this appendix. No additional acronyms are used in this
appendix.
A.3.2.2 Fail-safe. A design feature that ensures the system remains safe, or in the event
of a failure, causes the system to revert to a state that will not cause a mishap.
A.3.2.6 Mishap risk assessment. The process of characterizing hazards within risk areas
and critical technical processes, analyzing them for their potential mishap severity and
probabilities of occurrence, and prioritizing them for risk mitigation actions.
A.3.2.10 Safety critical. A term applied to any condition, event, operation, process, or
item whose proper recognition, control, performance, or tolerance is essential to safe system
operation and support (e.g., safety critical function, safety critical path, or safety critical
component).
A.3.2.11 System safety management. All plans and actions taken to identify, assess,
mitigate, and continuously track, control, and document environmental, safety, and health
mishap risks encountered in the development, test, acquisition, use, and disposal of DoD weapon
systems, subsystems, equipment, and facilities.
A.4.1 General. System safety applies engineering and management principles, criteria,
and techniques to achieve acceptable mishap risk, within the constraints of operational
effectiveness, time, and cost, throughout all phases of the system life cycle. It draws upon
professional knowledge and specialized skills in the mathematical, physical, and scientific
disciplines, together with the principles and methods of engineering design and analysis, to
specify and evaluate the environmental, safety, and health mishap risk associated with a system.
Experience indicates that the degree of safety achieved in a system is directly dependent upon
the emphasis given. The program manager and the developer must apply this emphasis during
all phases of the system's life cycle. A safe design is a prerequisite for safe operations, with the
goal being to produce an inherently safe product that will have the minimum safety-imposed
operational restrictions.
A.4.1.1 System safety in environmental and health hazard management. DoD 5000.2-R
has directed the integration of environmental, safety, and health hazard management into the
systems engineering process. While environmental and health hazard management are normally
associated with the application of statutory direction and requirements, the management of
mishap risk associated with actual environmental and health hazards is directly addressed by the
system safety approach. Therefore, environmental and health hazards can be analyzed and
managed with the same tools as any other hazard, whether they affect equipment, the
environment, or personnel.
A.4.2 Purpose (see 1.1). All DoD program managers shall establish and execute
programs that manage the probability and severity of all hazards for their systems
(DoD 5000.2-R). Provision for system safety requirements and effort as defined by this standard
should be included in all applicable contracts negotiated by DoD. These contracts include those
negotiated within each DoD agency, by one DoD agency for another, and by DoD for other
Government agencies. In addition, each DoD in-house program will address system safety.
A.4.3 System safety planning. Before formally documenting the system safety approach,
the program manager, in concert with systems engineering and associated system safety
professionals, must determine what system safety effort is necessary to meet program and
regulatory requirements. This effort will be built around the requirements set forth in section 4
and includes developing a planned approach for safety task accomplishment, providing qualified
people to accomplish the tasks, establishing the authority for implementing the safety tasks
through all levels of management, and allocating appropriate resources to ensure that the safety
tasks are completed.
A.4.3.1 System safety planning subtasks. System safety planning subtasks should:
c. Establish system safety milestones and relate these to major program milestones,
program element responsibility, and required inputs and outputs.
f. Establish an approach and methodology for reporting to the program manager the
following minimum information:
g. Establish the method for the formal acceptance and documenting of residual mishap
risks and the associated hazards.
h. Establish the method for communicating hazards, the associated risks, and residual
mishap risk to the system user.
i. Specify requirements for other specialized safety approvals (e.g., nuclear, range,
explosive, chemical, biological, electromagnetic radiation, and lasers) as necessary (reference 6.6
and 6.7).
A.4.3.2 Safety performance requirements. These are the general safety requirements
needed to meet the core program objectives. The more closely these requirements relate to a
given program, the more easily the designers can incorporate them into the system. In the
appropriate system specifications, incorporate the safety performance requirements that are
applicable, and the specific risk levels considered acceptable for the system. Acceptable risk
levels can be defined in terms of: a hazard category developed through a mishap risk assessment
matrix; an overall system mishap rate; demonstration of controls required to preclude
unacceptable conditions; satisfaction of specified standards and regulatory requirements; or other
suitable mishap risk assessment procedures. Listed below are examples of safety performance
statements.
A.4.3.3 Safety design requirements. The program manager, in concert with the chief
engineer and utilizing systems engineering and associated system safety professionals, should
establish specific safety design requirements for the overall system. The objective of safety
design requirements is to achieve acceptable mishap risk through a systematic application of
design guidance from standards, specifications, regulations, design handbooks, safety design
checklists, and other sources. Review these for safety design parameters and acceptance criteria
applicable to the system. Safety design requirements derived from the selected parameters, as
well as any associated acceptance criteria, are included in the system specification. Expand these
requirements and criteria for inclusion in the associated follow-on or lower level specifications.
See general safety system design requirements below.
b. Hazardous substances, components, and operations are isolated from other activities,
areas, personnel, and incompatible materials.
f. Consider safety devices that will minimize mishap risk (e.g., interlocks, redundancy,
fail-safe design, system protection, fire suppression, and protective measures such as clothing,
equipment, devices, and procedures) for hazards that cannot be eliminated. Make provisions for
periodic functional checks of safety devices when applicable.
j. Safety critical tasks may require personnel proficiency; if so, the developer should
propose a proficiency certification process to be used.
l. Inadequate or overly restrictive requirements regarding safety are not included in the
system specification.
A.4.3.3.1 Some program managers include the following conditions in their solicitation,
system specification, or contract as requirements for the system design. These condition
statements are used optionally as supplemental requirements based on specific program needs.
a. Single component failure, common mode failure, human error, or a design feature that
could cause a mishap of Catastrophic or Critical mishap severity categories.
d. Packaging or handling procedures and characteristics that could cause a mishap for
which no controls have been provided to protect personnel or sensitive equipment.
a. For non-safety critical command and control functions: a system design that requires
two or more independent human errors, or that requires two or more independent failures, or a
combination of independent failure and human error.
b. For safety critical command and control functions: a system design that requires at
least three independent failures, or three independent human errors, or a combination of three
independent failures and human errors.
d. System designs that positively prevent damage propagation from one component to
another or prevent sufficient energy propagation to cause a mishap.
e. System design limitations on operation, interaction, or sequencing that preclude
occurrence of a mishap.
f. System designs that provide an approved safety factor, or a fixed design allowance that
limits, to an acceptable level, possibilities of structural failure or release of energy sufficient to
cause a mishap.
g. System designs that control energy build-up that could potentially cause a mishap
(e.g., fuses, relief valves, or electrical explosion proofing).
i. System designs that positively alert the controlling personnel to a hazardous situation
where the capability for operator reaction has been provided.
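The condition statements on independent failures and human errors (two or more for non-safety critical command and control functions, at least three for safety critical ones) can be checked against a fault tree. The number of independent events needed to cause the mishap is the size of the smallest minimal cut set. The Python sketch below is an illustration under assumptions: the tuple-based tree encoding and the event names are hypothetical, and basic events are assumed independent. A common-mode failure would have to be modeled as one event, so a single-point failure appears as a cut set of size one.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a fault-tree node as a set of frozensets.

    A node is a basic-event name (str) or a tuple (gate, children) where
    gate is "AND" or "OR" and children is a list of nodes.
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set is enough to cause the output event.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate {gate!r}")


def minimal_cut_sets(node):
    """Discard any cut set that contains a smaller cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


def events_to_mishap(node) -> int:
    """Fewest independent failures and/or human errors that cause the mishap."""
    return min(len(s) for s in minimal_cut_sets(node))


def meets_condition(node, safety_critical: bool) -> bool:
    """Two events required for non-safety critical functions, three for safety critical."""
    return events_to_mishap(node) >= (3 if safety_critical else 2)


# Hypothetical tree: uncommanded actuation needs a relay failure AND
# either a software command error or an operator error.
tree = ("AND", ["relay fails closed",
                ("OR", ["software command error", "operator error"])])
print(sorted(sorted(s) for s in minimal_cut_sets(tree)))
print(events_to_mishap(tree), meets_condition(tree, False), meets_condition(tree, True))
```

In this made-up tree, two independent events suffice, so the design would satisfy the non-safety critical condition but not the safety critical one. Adding a further independent input to the top AND gate, such as a separate interlock, would raise every minimal cut set to three events.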
a. Management is always aware of the mishap risks associated with the system, and
formally documents this awareness. Hazards associated with the system are identified, assessed,
tracked, monitored, and the associated risks are either eliminated or controlled to an acceptable
level throughout the life cycle. Identify and archive those actions taken to eliminate or reduce
mishap risk for tracking and lessons learned purposes.
b. Historical hazard and mishap data, including lessons learned from other systems, are
considered and used.
e. System users are kept abreast of the safety of the system and included in the safety
decision process.
A.4.4.1 Documentation of the system safety approach. The documentation of the system
safety approach should describe the planned tasks and activities of system safety management
and system engineering required to identify, evaluate, and eliminate or control hazards, or to
reduce the residual mishap risk to a level acceptable throughout the system life cycle. The
documentation should describe, as a minimum, the four elements of an effective system safety
effort: a planned approach for task accomplishment, qualified people to accomplish tasks, the
authority to implement tasks through all levels of management, and the appropriate commitment
of resources (both manning and funding) to ensure that safety tasks are completed. Specifically,
the documentation should:
a. Describe the scope of the overall system program and the related system safety effort.
Define system safety program milestones. Relate these to major program milestones, program
element responsibility, and required inputs and outputs.
b. Describe the safety tasks and activities of system safety management and engineering.
Describe the interrelationships between system safety and other functional elements of the
program. List the other program requirements and tasks applicable to system safety and
reference where they are specified or described. Include the organizational relationships
between other functional elements having responsibility for tasks with system safety impacts and
the system safety management and engineering organization including the review and approval
authority of those tasks.
d. Describe the process through which management decisions will be made (for example,
timely notification of unacceptable risks, necessary action, incidents or malfunctions, waivers to
safety requirements, and program deviations). Include a description of how residual mishap risk
is formally accepted and how this acceptance is documented.
e. Describe the mishap risk assessment procedures, including the mishap severity
categories, mishap probability levels, and the system safety design order of precedence that
should be followed to satisfy the safety requirements of the program. State any qualitative or
quantitative measures of safety to be used for mishap risk assessment including a description of
the acceptable and unacceptable risk levels (if applicable). Include system safety definitions that
modify, deviate from, or are in addition to those in this standard or generally accepted by the
system safety community (see Defense Acquisition Deskbook and System Safety Society’s
System Safety Analysis Handbook) (see A.6.1).
f. Describe how resolution and action relative to system safety will be implemented at
the program management level possessing resolution authority.
h. Describe the mishap or incident notification, investigation, and reporting process for
the program, including notification of the program manager.
i. Describe the approach for collecting and processing pertinent historical hazard,
mishap, and safety lessons learned data. Include a description of how a system hazard log is
developed and kept current (see A.4.4.8.1).
j. Describe how the user is kept abreast of residual mishap risk and the associated
hazards.
A.4.4.3 Assessment of mishap risk. Assess the severity and probability of the mishap
risk associated with each identified hazard, i.e., determine the potential impact of the hazard on
personnel, facilities, equipment, operations, the public, or environment, as well as on the system
itself. Other factors, such as numbers of persons exposed, may also be used to assess risk.
A.4.4.3.1 Mishap risk assessment tools. To determine what actions to take to eliminate
or control identified hazards, a system of determining the level of mishap risk involved must be
developed. A good mishap risk assessment tool will enable decision makers to properly
understand the level of mishap risk involved, relative to what it will cost in schedule and dollars
to reduce that mishap risk to an acceptable level.
A.4.4.3.2 Tool development. The key to developing most mishap risk assessment tools
is the characterization of mishap risks by mishap severity and mishap probability. Since the
highest system safety design order of precedence is to eliminate hazards by design, a mishap risk
assessment procedure considering only mishap severity will generally suffice during the early
design phase to minimize the system’s mishap risks (for example, just don’t use hazardous or
toxic material in the design). When all hazards cannot be eliminated during the early design
phase, a mishap risk assessment procedure based upon the mishap probability as well as the
mishap severity provides a resultant mishap risk assessment. The assessment is used to establish
priorities for corrective action, resolution of identified hazards, and notification to management
of the mishap risks. The information provided here is a suggested tool and set of definitions that
can be used. Program managers can develop tools and definitions appropriate to their individual
programs.
NOTE: These mishap severity categories provide guidance to a wide variety of programs.
However, adaptation to a particular program is generally required to provide a mutual
understanding between the program manager and the developer as to the meaning of the terms
used in the category definitions. Other risk assessment techniques may be used provided that
the user approves them.
derived from research, analysis, and evaluation of historical safety data from similar systems.
Supporting rationale for assigning a mishap probability is documented in hazard analysis
reports. Suggested qualitative mishap probability levels are shown in Table A-II.
A.4.4.3.2.3 Mishap risk assessment. Mishap risk classification by mishap severity and
mishap probability can be performed by using a mishap risk assessment matrix. This
assessment allows one to assign a mishap risk assessment value to a hazard based on its mishap
severity and its mishap probability. This value is then often used to rank different hazards as to
their associated mishap risks. An example of a mishap risk assessment matrix is shown at
Table A-III.
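TABLE A-III. Example mishap risk assessment values.
                                  SEVERITY
PROBABILITY     Catastrophic   Critical   Marginal   Negligible
Frequent              1            3          7          13
Probable              2            5          9          16
Occasional            4            6         11          18
Remote                8           10         14          19
Improbable           12           15         17          20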
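The matrix lookup described in A.4.4.3.2.3 can be expressed as a table indexed by probability level and severity category. This is a minimal sketch using the example values above, assuming the four columns run from Catastrophic to Negligible as in the MIL-STD-882D severity categories; the hazard identifiers are hypothetical.

```python
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")

# Example assessment values, one row per probability level,
# columns in the SEVERITIES order (1 = highest mishap risk).
MATRIX = {
    "Frequent":   (1, 3, 7, 13),
    "Probable":   (2, 5, 9, 16),
    "Occasional": (4, 6, 11, 18),
    "Remote":     (8, 10, 14, 19),
    "Improbable": (12, 15, 17, 20),
}


def risk_value(severity: str, probability: str) -> int:
    """Mishap risk assessment value for a hazard's severity and probability."""
    return MATRIX[probability][SEVERITIES.index(severity)]


# Hypothetical hazards ranked from highest to lowest mishap risk.
hazards = {"H1": ("Critical", "Remote"),
           "H2": ("Marginal", "Frequent"),
           "H3": ("Catastrophic", "Improbable")}
ranked = sorted(hazards, key=lambda h: risk_value(*hazards[h]))
print([(h, risk_value(*hazards[h])) for h in ranked])
# [('H2', 7), ('H1', 10), ('H3', 12)]
```

In this example matrix a frequent marginal hazard (7) outranks a remote critical one (10), which is the point of combining the two dimensions rather than ranking on severity alone.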
A.4.4.3.2.4 Mishap risk categories. Mishap risk assessment values are often used in
grouping individual hazards into mishap risk categories. Mishap risk categories are then used
to generate specific action such as mandatory reporting of certain hazards to management for
action or formal acceptance of the associated mishap risk. Table A-IV includes an example
listing of mishap risk categories and the associated assessment values. In the example, the
system management has determined that mishap risk assessment values 1 through 5 constitute
“High” risk while values 6 through 9 constitute “Serious” risk.
TABLE A-IV. Example mishap risk categories and mishap risk acceptance levels.
*Representative mishap risk acceptance levels are shown in the above table. Mishap risk
acceptance is discussed in paragraph A.4.4.7. The using organization must be consulted by the
corresponding levels of program management prior to mishap risk acceptance.
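Grouping assessment values into categories (A.4.4.3.2.4) is a simple banding step. The sketch below encodes only the two bands stated in the text, High (1 through 5) and Serious (6 through 9). The label for the remaining values and the choice of which categories trigger mandatory reporting are placeholders that a program would define for itself.

```python
def risk_category(value: int) -> str:
    """Band a mishap risk assessment value (1-20) into a risk category.

    Only High (1-5) and Serious (6-9) come from the text; values 10-20 get a
    placeholder label to be replaced by program-defined categories.
    """
    if not 1 <= value <= 20:
        raise ValueError("assessment value must be 1 through 20")
    if value <= 5:
        return "High"
    if value <= 9:
        return "Serious"
    return "Program-defined"


# Hypothetical policy: these categories require mandatory reporting to management.
REPORTABLE = {"High", "Serious"}


def requires_reporting(value: int) -> bool:
    return risk_category(value) in REPORTABLE


print([risk_category(v) for v in (3, 6, 9, 14)])
# ['High', 'Serious', 'Serious', 'Program-defined']
print(requires_reporting(9), requires_reporting(10))  # True False
```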
A.4.4.3.2.5 Mishap risk impact. The mishap risk impact is assessed, as necessary,
using other factors to discriminate between hazards having the same mishap risk value. One
might discriminate between hazards with the same mishap risk assessment value in terms of
mission capabilities, or social, economic, and political factors. Program management will
closely consult with the using organization on the decisions used to prioritize resulting actions.
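When hazards share an assessment value, A.4.4.3.2.5 suggests discriminating between them with other factors. One hypothetical way to do that is a secondary sort key; here a single invented mission_impact score (higher is worse) stands in for mission, social, economic, or political considerations.

```python
# Hypothetical hazards; mission_impact (0-10, higher is worse) stands in for
# mission, social, economic, or political factors.
hazards = [
    {"id": "H7", "risk_value": 6, "mission_impact": 3},
    {"id": "H2", "risk_value": 6, "mission_impact": 8},
    {"id": "H5", "risk_value": 4, "mission_impact": 1},
]


def prioritize(hazards):
    """Sort by risk value (lower = higher risk), breaking ties by mission impact."""
    return sorted(hazards, key=lambda h: (h["risk_value"], -h["mission_impact"]))


print([h["id"] for h in prioritize(hazards)])  # ['H5', 'H2', 'H7']
```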
A.4.4.3.3 Mishap risk assessment approaches. Commonly used approaches for assessing
mishap risk can be found in the Defense Acquisition Deskbook and System Safety Society’s
System Safety Analysis Handbook (see A.6.1).
A.4.4.4 Identification of mishap risk mitigation measures. Identify potential mishap risk
mitigation alternatives and the expected effectiveness of each alternative or method. Mishap risk
mitigation is an iterative process that culminates when the residual mishap risk has been reduced
to a level acceptable to the appropriate authority.
A.4.4.4.1 Prioritize hazards for corrective action. Hazards should be prioritized so that
corrective action efforts can be focused on the most serious hazards first. A categorization of
hazards may be conducted according to the mishap risk potential they present.
A.4.4.4.2 System safety design order of precedence (see 4.4). The ultimate goal of a
system safety program is to design systems that contain no hazards. However, since the nature
of most complex systems makes it impossible or impractical to design them completely hazard-
free, a successful system safety program often provides a system design where there exist no
hazards resulting in an unacceptable level of mishap risk. As hazard analyses are performed,
hazards will be identified that will require resolution. The system safety design order of
precedence defines the order to be followed for satisfying system safety requirements and
reducing risks. The alternatives for eliminating the specific hazard or controlling its associated
risk are evaluated so that an acceptable method for mishap risk reduction can be agreed to.
A.4.4.5 Reduction of mishap risk to an acceptable level. Reduce the system mishap risk
through a mitigation approach mutually agreed to by the developer, program manager and the
using organization.
A.4.4.5.1 Communication with associated test efforts. Residual mishap risk and
associated hazards must be communicated to the system test efforts for verification.
A.4.4.6 Verification of mishap risk reduction. Verify the mishap risk reduction and
mitigation through appropriate analysis, testing, or inspection. Document the determined
residual mishap risk. The program manager must ensure that the selected mitigation approaches
will result in the expected residual mishap risk. To provide this assurance, the system test effort
should verify the performance of the mitigation actions. New hazards identified during testing
must be reported to the program manager and the developer.
A.4.4.6.1 Testing for a safe design. Tests and demonstrations must be defined to
validate selected safety features of the system. Test or demonstrate safety critical equipment and
procedures to determine the mishap severity or to establish the margin of safety of the design.
Consider induced or simulated failures to demonstrate the failure mode and acceptability of
safety critical equipment. When it cannot be analytically determined whether the corrective
action taken will adequately control a hazard, conduct safety tests to evaluate the effectiveness of
the controls. Where costs for safety testing would be prohibitive, safety characteristics or
procedures may be verified by engineering analyses, analogy, laboratory test, functional
mockups, or subscale/model simulation. Integrate testing of safety systems into appropriate
system test and demonstration plans to the maximum extent possible.
A.4.4.6.2 Conducting safe testing. The program manager must ensure that test teams are
familiar with mishap risks of the system. Test plans, procedures, and test results for all tests
including design verification, operational evaluation, production acceptance, and shelf-life
validation should be reviewed to ensure that:
A.4.4.7 Review and acceptance of residual mishap risk by the appropriate authority.
Notify the program manager of identified hazards and residual mishap risk. For long duration
programs, incremental or periodic reporting should be used.
A.4.4.7.1 Residual mishap risk. The mishap risk that remains after all planned mishap
risk management measures have been implemented is considered residual mishap risk. Residual
mishap risk is documented along with the reason(s) for incomplete mitigation.
A.4.4.7.2 Residual mishap risk management. The program manager must know what
residual mishap risk exists in the system being acquired. For significant mishap risks, the
program manager is required to elevate reporting of residual mishap risk to higher levels of
appropriate authority (such as the Program Executive Officer or Component Acquisition
Executive) for action or acceptance. The program manager is encouraged to apply additional
resources or other remedies to help the developer satisfactorily resolve hazards providing
significant mishap risk. Table A-IV includes an example of a mishap risk acceptance level
matrix based on the mishap risk assessment value and mishap risk category.
A.4.4.7.3 Residual mishap risk acceptance. The program manager is responsible for
formally documenting the acceptance of the residual mishap risk of the system by the appropriate
authority. The program manager should update this residual mishap risk and the associated
hazards to reflect changes/modifications in the system or its use. The program manager and
using organization should jointly determine the updated residual mishap risk prior to acceptance
of the risk and system hazards by the risk acceptance authority, and should document the
agreement between the user and the risk acceptance authority.
A.4.4.8 Tracking hazards and residual mishap risk. Track hazards, their closures, and
residual mishap risk. A tracking system for hazards, their closures, and residual mishap risk
must be maintained throughout the system life cycle. The program manager must keep the
system user apprised of system hazards and residual mishap risk.
A.4.4.8.1 Process for tracking of hazards and residual mishap risk. Each system must
have a current log of identified hazards and residual mishap risk, including an assessment of the
residual mishap risk (see A.4.4.7). As changes are integrated into the system, this log is updated
to incorporate added or changed hazards and the associated residual mishap risk. The
Government must formally acknowledge acceptance of system hazards and residual mishap risk.
Users will be kept informed of hazards and residual mishap risk associated with their systems.
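A hazard log of the kind A.4.4.8.1 requires can be sketched as a keyed record store in which every change to a hazard is recorded and clears its acceptance, so that the updated residual mishap risk must be accepted again. The field names, the change reference "ECP-12", and the acceptance authority shown are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HazardEntry:
    hazard_id: str
    description: str
    risk_value: int                    # residual mishap risk assessment value
    mitigation_gap: str = ""           # reason(s) for incomplete mitigation
    accepted_by: Optional[str] = None  # acceptance authority, once accepted
    history: list = field(default_factory=list)  # (change reference, risk value)


class HazardLog:
    """Current log of identified hazards and residual mishap risk."""

    def __init__(self):
        self.entries = {}

    def record(self, entry: HazardEntry, change_ref: str) -> None:
        """Add a new hazard or update an existing one after a system change."""
        old = self.entries.get(entry.hazard_id)
        entry.history = (old.history if old else []) + [(change_ref, entry.risk_value)]
        entry.accepted_by = None  # a changed hazard needs fresh acceptance
        self.entries[entry.hazard_id] = entry

    def accept(self, hazard_id: str, authority: str) -> None:
        self.entries[hazard_id].accepted_by = authority

    def unaccepted(self):
        """Hazard IDs whose residual mishap risk is not yet formally accepted."""
        return sorted(h for h, e in self.entries.items() if e.accepted_by is None)


log = HazardLog()
log.record(HazardEntry("HAZ-001", "Inadvertent actuator command", 10), "baseline")
log.accept("HAZ-001", "Program Manager")
log.record(HazardEntry("HAZ-001", "Inadvertent actuator command", 6,
                       mitigation_gap="interlock deferred"), "ECP-12")
print(log.unaccepted())                # ['HAZ-001']
print(log.entries["HAZ-001"].history)  # [('baseline', 10), ('ECP-12', 6)]
```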
A.5.1 Program manager responsibilities. The program manager must ensure that all
types of hazards are identified, evaluated, and mitigated to a level compliant with acquisition
management policy, federal (and state where applicable) laws and regulations, Executive Orders,
treaties, and agreements. The program manager should:
A.5.1.1 Establish, plan, organize, implement, and maintain an effective system safety
effort that is integrated into all life cycle phases.
A.5.1.2 Ensure that system safety planning is documented to provide all program
participants with visibility into how the system safety effort is to be conducted.
A.5.1.3 Establish definitive safety requirements for the procurement, development, and
sustainment of the system. The requirements should be set forth clearly in the appropriate
system specifications and contractual documents.
A.5.1.5 Monitor the developer’s system safety activities and review and approve
delivered data in a timely manner, if applicable, to ensure adequate performance and compliance
with safety requirements.
A.5.1.6 Ensure that the appropriate system specifications are updated to reflect results of
analyses, tests, and evaluations.
A.5.1.7 Evaluate new lessons learned for inclusion into appropriate databases and submit
recommendations to the responsible organization.
A.5.1.8 Establish system safety teams to assist the program manager in developing and
implementing a system safety effort.
A.5.1.11 Keep the system users apprised of system hazards and residual mishap risk.
A.5.1.12 Ensure the program meets the intent of the latest MIL-STD 882.
A.5.1.13 Ensure adequate resources are available to support the program system safety
effort.
A.5.1.14 Ensure system safety technical and managerial personnel are qualified and
certified for the job.
A.6 NOTES
A.6.1 DoD acquisition practices and safety analysis techniques. Information on DoD
acquisition practices and safety analysis techniques is available at the referenced Internet sites.
Nothing in the referenced information is considered binding or additive to the requirements
provided in this standard.
A.6.1.2 System Safety Analysis Handbook. Unionville, VA: System Safety Society.
CONCLUDING MATERIAL
Reviewing activities:
Army - AR, AT, CR, MI
Navy - EC, OS, SA, SH
Air Force - 10, 11, 13, 19
STANDARDIZATION DOCUMENT IMPROVEMENT PROPOSAL
INSTRUCTIONS
1. The preparing activity must complete blocks 1, 2, 3, and 8. In block 1, both the document number and revision letter
should be given.
2. The submitter of this form must complete blocks 4, 5, 6, and 7, and send to preparing activity.
3. The preparing activity must provide a reply within 30 days from receipt of the form.
NOTE: This form may not be used to request copies of documents, nor to request waivers, or clarification of
requirements on current contracts. Comments submitted on this form do not constitute or imply authorization to waive any
portion of the referenced document(s) or to amend contractual requirements.
1. DOCUMENT NUMBER 2. DOCUMENT DATE (YYYYMMDD)
I RECOMMEND A CHANGE:
MIL-STD-882 20000210
3. DOCUMENT TITLE
System Safety
4. NATURE OF CHANGE (Identify paragraph number and include proposed rewrite, if possible. Attach extra sheets as needed.)
6. SUBMITTER
a. NAME (Last, First, Middle Initial) b. ORGANIZATION
c. ADDRESS (Include zip code) d. TELEPHONE (Include Area Code) 7. DATE SUBMITTED
(1) Commercial (YYYYMMDD)
(2) DSN
(if applicable)
8. PREPARING ACTIVITY
a. NAME b. TELEPHONE (Include Area Code)
Headquarters, Air Force Materiel Command (1) Commercial (937) 257-6007
System Safety Division
(2) DSN 787-6007
b. ADDRESS (Include Zip Code) IF YOU DO NOT RECEIVE A REPLY WITHIN 45 DAYS, CONTACT:
Defense Standardization Program Office (DLSC-LM)
HQ AFMC/SES 8725 John J. Kingman Road, Suite 2533
4375 Chidlaw Road Fort Belvoir, Virginia 22060-6621
Wright Patterson AFB, Ohio 45433-5006 Telephone 703 767-6888 DSN 427-6888
DD Form 1426, FEB 1999 (EG) PREVIOUS EDITION IS OBSOLETE. WHS/DIOR, Feb 99
FAA System Safety Handbook, Appendix J: Software Safety
December 30, 2000
Appendix J
Software Safety
SOFTWARE SAFETY...............................................................................................................................1
The purpose of this section is to describe the software safety activities that should be incorporated into the
software development phases of project development. The software safety information that should be included
in the documents produced during these phases is also discussed. The term "software components" is used in a
general sense to represent important software development products such as software requirements, software
designs, software code or program sets, software tests, etc.
This section of the handbook describes the software safety team involvement in developing safety
requirements for software. The software safety requirements can be top-down (flowed down from system
requirements) and/or bottom-up (derived from hazard analyses). In some organizations, top-down flow is
the only permitted route for requirements into software, and in those cases, newly derived bottom-up safety
requirements must be flowed back into the system specification.
The requirements of software components are typically expressed as functions with corresponding inputs,
processes, and outputs, plus additional requirements on interfaces, limits, ranges, precision, accuracy, and
performance. There may also be requirements on the data of the program set - its attributes, relationships,
and persistence, among others.
Software safety requirements are derived from the system and subsystem safety requirements developed to
mitigate hazards identified in the Preliminary, System, and Subsystem Hazard Analyses.
Also, the assigned safety engineer flows requirements to systems engineering. The systems engineering
group and the software development group have a responsibility to coordinate and negotiate requirement
flowdown to be consistent with the software safety requirement flowdown.
The software safety organization should flow requirements into the Software Requirements Document (SRD)
and the Software Interface Specification (SIS) or Interface Control Document (ICD). Safety-related
requirements must be clearly identified in the SRD.
SIS activities identify, define, and document interface requirements internal to the sub-system in which software
resides, and between system (including hardware and operator interfaces), subsystem, and program set
components and operation procedures. Note that the SIS is sometimes effectively contained in the SRD, or
within an Interface Control Document (ICD) which defines all system interfaces, including hardware to
hardware, hardware to software, and software to software.
• Through top-down analysis of system design requirements (from specifications): The
system requirements may identify system hazards up-front, and specify which system
functions are safety critical. The (software) safety organization participates or leads the
mapping of these requirements to software.
• From the Preliminary Hazard Analysis (PHA): The PHA looks down into the system
from the point of view of system hazards. Preliminary hazard causes are mapped to, or
interact with, software. Software hazard control features are identified and specified as
requirements.
• Through bottom-up analysis of design data (e.g., flow diagrams, FMEAs, fault trees, etc.):
Design implementations allowed but not anticipated by the system requirements are analyzed and new
hazard causes are identified. Software hazard controls are specified via requirements when the hazard causes
map to, or interact with, software.
An area of concern in the flowdown process is incomplete analysis, and/or inconsistent analysis of highly
complex systems, or use of ad hoc techniques by biased or inexperienced analysts. The most rigorous (and
most expensive) method of addressing this concern is adoption of formal methods for requirements analysis
and flowdown. Less rigorous and less expensive ways include checklists and/or a standardized structured
approach to software safety as discussed below and throughout this guidebook.
The following section contains a description of the type of analysis and gives the methodology by defining the
task, the resources required to perform the analysis, and the expected output from the analyses.
• Develop a systematic checklist of software safety requirements and any hazard controls,
ensuring they correctly and completely include (and cross reference) the appropriate
specifications, hazard analyses, test and design documents. This should include both
generic and specific safety requirements.
• Develop a hazard requirement flowdown matrix that maps safety requirements and
hazard controls to system/software functions and to software modules and components.
Where components are not yet defined, flow to the lowest level possible and tag for
future flowdown.
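The hazard requirement flowdown matrix in the second bullet can be kept as a mapping from each safety requirement to the functions and modules that implement it, with undefined modules tagged for later flowdown. A minimal sketch; the requirement text, function names, and file names are hypothetical.

```python
# Hypothetical flowdown matrix: safety requirement -> [(function, module)].
# A module of None marks a component not yet defined ("tag for future flowdown").
flowdown = {
    "SR-1 Inhibit firing until armed": [("Arm/fire control", "fire_ctl.c")],
    "SR-2 Detect loss of heartbeat": [("Health monitoring", None)],
    "SR-3 Two operator actions to override": [
        ("Command processing", "cmd_parse.c"),
        ("Operator interface", None),
    ],
}


def pending_flowdown(matrix):
    """(requirement, function) pairs still awaiting flowdown to a module."""
    return [(req, fn) for req, links in matrix.items()
            for fn, module in links if module is None]


def requirements_for(matrix, module):
    """Reverse lookup: safety requirements a given module implements."""
    return [req for req, links in matrix.items()
            if any(m == module for _, m in links)]


for req, fn in pending_flowdown(flowdown):
    print(f"TBD module: {req} via {fn}")
print(requirements_for(flowdown, "cmd_parse.c"))
# ['SR-3 Two operator actions to override']
```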
The evaluation will determine whether the requirement has safety implications and, if so, the requirement is
designated “safety critical”. It is then placed into a tracking system to ensure traceability of software
requirements throughout the software development cycle from the highest-level specification all the way to
the code and test documentation. All of the following techniques are focused on safety critical software
components.
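Tracing a safety-critical requirement from the highest-level specification to the code and test documentation amounts to checking that every link in the chain exists. The sketch assumes a four-level chain (specification, design, code, test) and invented document identifiers; real programs will have their own levels and tools.

```python
LEVELS = ("spec", "design", "code", "test")

# Hypothetical trace records for safety-critical requirements.
trace = {
    "SR-1": {"spec": "SSS 3.2.1", "design": "SDD 4.1", "code": "fire_ctl.c", "test": "TP-07"},
    "SR-2": {"spec": "SSS 3.2.4", "design": "SDD 4.3", "code": "health.c", "test": None},
    "SR-3": {"spec": "SSS 3.5.2", "design": None, "code": None, "test": None},
}


def trace_gaps(trace):
    """Map each requirement with a broken trace to the first missing level."""
    gaps = {}
    for req, links in trace.items():
        for level in LEVELS:
            if not links.get(level):
                gaps[req] = level
                break
    return gaps


print(trace_gaps(trace))  # {'SR-2': 'test', 'SR-3': 'design'}
```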
The system safety organization coordinates with the project system engineering organization to review and
agree on the criticality designations. At this point the systems engineers may elect to make design changes to
reduce the criticality levels or consolidate modules reducing the number of critical modules.
At this point, some bottom-up analyses can be performed. Bottom-up analyses identify requirements or
design implementations that are inconsistent with, or not addressed by, system requirements. Bottom-up
analyses can also reveal unexpected pathways (e.g., sneak circuits) for reaching hazardous or unsafe states.
System requirements should be corrected when necessary.
It is possible that software components or subsystems might not be defined during the Requirements Phase,
so those portions of the Criticality Analysis would be deferred to the Architectural Design Phase. In any case,
the Criticality Analysis will be updated during the Architectural Design Phase to reflect the more detailed
definition of software components.
• All software requirements are analyzed in order to identify additional potential system
hazards that the system PHA did not reveal and to identify potential areas where system
requirements were not correctly flowed to the software. Identified potential hazards are
then addressed by adding or changing the system requirements and reflowing them to
hardware, software and operations as appropriate.
• At the system level: identify hardware or software items that receive/pass/initiate critical
signals or hazardous commands.
• At the software requirements level: identify software functions or objects that
receive/pass/initiate critical signals or hazardous commands.
• This safety activity examines the system/software requirements and design to identify
unsafe conditions for resolution such as out-of-sequence, wrong event, inappropriate
magnitude, incorrect polarity, inadvertent command, adverse environment, deadlocking,
and failure-to-command modes.
• The software safety requirements analysis considers specific requirements, such as the
characteristics discussed below, as critical software characteristics.
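Several of the unsafe conditions listed above can be screened for at a command interface. The following is a hypothetical guard, not a prescribed design: it assumes a simple SAFE/ARMED command sequence, a positive-only magnitude (so a negative value indicates wrong polarity), and an explicit operator confirmation flag. Conditions such as deadlocking, adverse environment, and failure-to-command need other analyses and are not modeled.

```python
from dataclasses import dataclass

# Hypothetical sequencing rule: the system states from which each command is valid.
ALLOWED_FROM = {"ARM": {"SAFE"}, "FIRE": {"ARMED"}, "SAFE": {"SAFE", "ARMED"}}


@dataclass
class Command:
    name: str
    magnitude: float = 0.0          # assumed valid range is 0..max_magnitude
    operator_confirmed: bool = False


def check_command(cmd: Command, state: str, max_magnitude: float = 100.0) -> list:
    """Return the unsafe conditions detected for cmd in the current state."""
    problems = []
    if state not in ALLOWED_FROM.get(cmd.name, set()):
        problems.append("out-of-sequence or wrong event")
    if abs(cmd.magnitude) > max_magnitude:
        problems.append("inappropriate magnitude")
    if cmd.magnitude < 0:
        problems.append("incorrect polarity")
    if cmd.name == "FIRE" and not cmd.operator_confirmed:
        problems.append("possible inadvertent command")
    return problems


print(check_command(Command("FIRE", 50.0, True), "ARMED"))  # []
print(check_command(Command("FIRE", 50.0, True), "SAFE"))   # ['out-of-sequence or wrong event']
print(check_command(Command("FIRE", -150.0), "ARMED"))
# ['inappropriate magnitude', 'incorrect polarity', 'possible inadvertent command']
```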
The following resources are available for the Requirements Criticality Analysis. Note: documents in brackets
correspond to terminology from DOD-STD-2167. Other document names correspond to NASA-STD-
2100.91.
The results and findings of the Criticality Analyses should be fed to the System Requirements and System
Safety Analyses. For all discrepancies identified, either the requirements should be changed because they are
incomplete or incorrect, or else the design must be changed to meet the requirements. The analysis identifies
additional hazards that the system analysis did not include, and identifies areas where system or interface
requirements were not correctly assigned to the software.
The results of the criticality analysis may be used to develop Formal Inspection checklists for performing the
formal inspection process described later in INSERT REFERENCE.
All characteristics of safety critical software must be evaluated to determine if they are safety critical. Safety
critical characteristics should be controlled by requirements that receive rigorous quality control in
conjunction with rigorous analysis and test. Often all characteristics of safety critical software are
themselves safety critical.
This list is not exhaustive and often varies depending on the system architecture and environment.
Generic requirements prevent costly duplication of effort by taking advantage of existing proven techniques
and lessons learned rather than reinventing techniques or repeating mistakes. Most development programs
should be able to make use of some generic requirements; however, they should be used with care.
As technology evolves, or as new applications are implemented, new "generic" requirements will likely arise,
and other sources of generic requirements might become available. A partial listing of generic requirement
sources is shown below:
• EWRR (Eastern and Western Range Regulation) 127-1, Section 3.16.4 Safety Critical
Computing System Software Design Requirements.
• AFISC SSH 1-1 System Safety Handbook - Software System Safety, Headquarters Air
Force Inspection and Safety Center.
• EIA Bulletin SEB6-A System Safety Engineering in Software Development (Electronic
Industries Association)
• Underwriters Laboratory - UL 1998 Standard for Safety - Safety-Related Software,
January 4th, 1994
A listing of many of the generic software safety requirements is presented in the table below.
The failure of safety critical software functions shall be detected, isolated, and recovered from such that
catastrophic and critical hazardous events are prevented from occurring.
Software shall perform automatic Failure Detection, Isolation, and Recovery (FDIR) for identified safety
critical functions with a time to criticality under 24 hours.
Automatic recovery actions taken shall be reported. There shall be no necessary response from ground
operators to proceed with the recovery action.
The FDIR switchover software shall be resident on an available, non-failed control platform which is
different from the one with the function being monitored.
Override commands shall require multiple operator actions.
Software shall process the necessary commands within the time to criticality of a hazardous event.
Hazardous commands shall only be issued by the controlling application, or by authorized ground personnel.
Software that executes hazardous commands shall notify ground personnel upon execution or provide the
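
The FDIR requirements above (automatic detection, isolation, and recovery; reporting without a required operator response; switchover logic hosted on a platform other than the one being monitored) can be illustrated with a heartbeat watchdog. This is a minimal sketch under assumed conditions: a single monitored function, a hypothetical 2 s heartbeat timeout, and platforms represented as simple objects.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    name: str
    healthy: bool = True


@dataclass
class FdirMonitor:
    monitor_host: Platform   # where the switchover logic runs
    platforms: list          # candidate hosts for the monitored function
    active: Platform         # current host of the monitored function
    timeout_s: float = 2.0   # hypothetical heartbeat timeout
    reports: list = field(default_factory=list)

    def __post_init__(self):
        # Switchover logic must not reside on the platform it monitors.
        if self.monitor_host is self.active:
            raise ValueError("monitor must run on a different platform")

    def check(self, last_heartbeat_s: float, now_s: float) -> None:
        # Detection: the monitored function has missed its heartbeat.
        if now_s - last_heartbeat_s <= self.timeout_s:
            return
        # Isolation: take the failed platform out of service.
        failed = self.active
        failed.healthy = False
        # Recovery: move the function to a healthy platform other than the
        # monitor's host, then report; no operator response is required.
        for p in self.platforms:
            if p.healthy and p is not self.monitor_host:
                self.active = p
                self.reports.append(f"{failed.name} failed; switched to {p.name}")
                return
        self.reports.append(f"{failed.name} failed; no backup available")


a, b, c = Platform("CPU-A"), Platform("CPU-B"), Platform("CPU-C")
fdir = FdirMonitor(monitor_host=c, platforms=[a, b, c], active=a)
fdir.check(last_heartbeat_s=10.0, now_s=11.0)  # within timeout: no action
fdir.check(last_heartbeat_s=10.0, now_s=13.5)  # timeout: switch to CPU-B
print(fdir.active.name, fdir.reports)
# CPU-B ['CPU-A failed; switched to CPU-B']
```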
Coding Standards
Coding Standards, a class of generic software requirements, are, in practice, “safe” subsets of programming
languages. These are needed because most compilers can be unpredictable in how they work. For example,
dynamic memory allocation is unpredictable. In applications where some portions of memory are safety
critical, it is important to control which memory elements are assigned in a particular compilation process;
the defaults chosen by the compiler might be unsafe. Some attempts have been made at developing coding
safety standards (safe subsets).
use of Z-transforms to develop difference equations to implement the control laws. This
will also make most efficient use of real-time computing resources.
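As an illustration of turning a control law into a difference equation, consider a first-order lag G(s) = 1/(tau*s + 1) sampled every T seconds through a zero-order hold. Its z-domain form G(z) = (1 - a)z^-1 / (1 - a*z^-1), with a = exp(-T/tau), gives the recursion implemented below. The plant and the discretization method are assumptions chosen for brevity, not taken from the handbook.

```python
import math


class FirstOrderLag:
    """Difference equation y[n] = a*y[n-1] + (1 - a)*u[n-1], a = exp(-T/tau),
    the zero-order-hold discretization of G(s) = 1/(tau*s + 1)."""

    def __init__(self, tau: float, T: float):
        self.a = math.exp(-T / tau)
        self.y_prev = 0.0  # y[n-1]
        self.u_prev = 0.0  # u[n-1]

    def step(self, u: float) -> float:
        """Advance one sample with input u[n]; return output y[n]."""
        y = self.a * self.y_prev + (1.0 - self.a) * self.u_prev
        self.y_prev, self.u_prev = y, u
        return y


lag = FirstOrderLag(tau=0.5, T=0.01)
outputs = [lag.step(1.0) for _ in range(300)]
# For a unit step applied at n = 0, y[n] = 1 - a**n.
print(round(outputs[50], 4))  # 0.6321, i.e. 1 - e**-1 after one time constant
```

Each update costs two multiplications and one addition, which is the efficiency the text alludes to compared with integrating the continuous-time model on line.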
Sampling rates should be selected with consideration for noise levels and expected
variations of control system and physical parameters. For measuring signals that are not
critical, the sample rate should be at least twice the maximum expected signal frequency to
avoid aliasing. For critical signals, and parameters used for closed loop control, it is
generally accepted that the sampling rate must be much higher; at least a factor of ten above
the system characteristic frequency is customary.
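The two rules of thumb above translate directly into a minimum sampling rate. The helper below also shows the frequency at which an under-sampled tone would appear (its alias); the function names are illustrative.

```python
def min_sample_rate_hz(freq_hz: float, critical: bool) -> float:
    """Minimum sampling rate from the rules of thumb above.

    Non-critical measurements: at least 2x the highest expected signal
    frequency (avoids aliasing). Critical signals and closed-loop control
    parameters: at least 10x the system characteristic frequency.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return (10.0 if critical else 2.0) * freq_hz


def alias_hz(signal_hz: float, sample_hz: float) -> float:
    """Apparent frequency of a tone at signal_hz when sampled at sample_hz."""
    return abs(signal_hz - sample_hz * round(signal_hz / sample_hz))


print(min_sample_rate_hz(50.0, critical=False))  # 100.0
print(min_sample_rate_hz(4.0, critical=True))    # 40.0
print(alias_hz(70.0, 100.0))                     # 30.0: a 70 Hz tone under-sampled at 100 Hz
```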
Dynamic memory allocation: ensure adequate resources are available to accommodate
usage of dynamic memory allocation, without conflicts. Identify and protect critical
memory blocks. Poor memory management has been a leading factor in several critical
failures.
Memory Checking: Self-test of memory usage can be performed as part of BIT/self-test to give
advance warning of imminent saturation of memory.
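One common way to make memory use predictable and checkable, assumed here rather than prescribed by the handbook, is to reserve a fixed pool of blocks at start-up and track the high-water mark so a BIT/self-test can warn before the pool saturates. Python cannot actually protect memory, so this sketch only models the bookkeeping; the 80% warning threshold is hypothetical.

```python
class FixedPool:
    """Pool of n_blocks equal-size blocks reserved once at start-up."""

    def __init__(self, n_blocks: int, warn_fraction: float = 0.8):
        self.n_blocks = n_blocks
        self.free = list(range(n_blocks))  # indices of free blocks
        self.high_water = 0                # peak number of blocks in use
        self.warn_fraction = warn_fraction

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("pool exhausted")
        block = self.free.pop()
        self.high_water = max(self.high_water, self.n_blocks - len(self.free))
        return block

    def release(self, block: int) -> None:
        if not 0 <= block < self.n_blocks or block in self.free:
            raise ValueError(f"invalid or double release of block {block}")
        self.free.append(block)

    def self_test(self) -> bool:
        """BIT check: True while peak usage stays below the warning threshold."""
        return self.high_water < self.warn_fraction * self.n_blocks


pool = FixedPool(10)
held = [pool.alloc() for _ in range(7)]
print(pool.self_test())   # True: peak 7 of 10 blocks
held.append(pool.alloc())
print(pool.self_test())   # False: peak 8 reaches the 80% threshold
for blk in held:
    pool.release(blk)
print(pool.self_test())   # still False: the high-water mark is latched
```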
Quantization: Digitized systems should select word-lengths long enough to reduce the
effects of quantization noise to ensure stability of the system. Selection of word-lengths and
floating-point coefficients should be appropriate with regard to the parameters being processed
in the context of the overall control system. Word-lengths that are too short can result in system
instability and misleading readouts. Word-lengths that are too long result in excessively complex
software, heavy demand on CPU resources, and scheduling and timing conflicts.
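The word-length trade-off can be quantified with standard uniform-quantizer formulas: an N-bit word spanning a full-scale range R has a step q = R/2^N, an RMS quantization error of about q/sqrt(12) (assuming the error is uniformly distributed over one step), and a signal-to-quantization-noise ratio of about 6.02N + 1.76 dB for a full-scale sine wave. A sketch with hypothetical sensor numbers:

```python
import math


def quantization_step(full_scale: float, bits: int) -> float:
    """Step size q of a uniform quantizer with 2**bits levels over full_scale."""
    return full_scale / 2 ** bits


def quantization_noise_rms(full_scale: float, bits: int) -> float:
    """RMS quantization error, assuming error uniform over one step."""
    return quantization_step(full_scale, bits) / math.sqrt(12)


def sqnr_db(bits: int) -> float:
    """Approximate SQNR for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def min_bits(full_scale: float, resolution: float) -> int:
    """Shortest word length whose step does not exceed the required resolution."""
    return math.ceil(math.log2(full_scale / resolution))


# Hypothetical sensor: 20 V span, 0.01 V resolution needed.
n = min_bits(20.0, 0.01)
print(n, quantization_step(20.0, n))  # 11 0.009765625
print(round(sqnr_db(n), 2))           # 67.98
```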
Computational Delay: Computers take a finite time to read data and to calculate and output
results, so some control parameters will always be out of date. Control systems must
accommodate this. Also check the timing clock reference datum, synchronization, and accuracy
(jitter). Analyze task scheduling (e.g., with Rate Monotonic Analysis (RMA)).
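As a sketch of the scheduling check, assuming independent periodic tasks with deadlines equal to their periods and rate-monotonic priorities (shorter period, higher priority), the Liu and Layland utilization bound gives a quick sufficient test and response-time analysis gives an exact one. The task sets are hypothetical:

```python
import math
from typing import List, Optional, Sequence, Tuple

Task = Tuple[int, int]  # (worst-case execution time C, period T) in the same time units


def utilization(tasks: Sequence[Task]) -> float:
    return sum(c / t for c, t in tasks)


def liu_layland_bound(n: int) -> float:
    """Sufficient (not necessary) RM schedulability bound for n tasks."""
    return n * (2 ** (1.0 / n) - 1)


def response_times(tasks: Sequence[Task]) -> List[Optional[int]]:
    """Exact worst-case response time per task, in period order; None marks a missed deadline."""
    ordered = sorted(tasks, key=lambda task: task[1])  # RM: shortest period first
    results: List[Optional[int]] = []
    for i, (c, t) in enumerate(ordered):
        r = c
        while True:
            # Interference from every higher-priority task released within r.
            r_next = c + sum(math.ceil(r / tj) * cj for cj, tj in ordered[:i])
            if r_next > t:
                results.append(None)
                break
            if r_next == r:
                results.append(r)
                break
            r = r_next
    return results


tasks = [(2, 4), (2, 5)]
print(utilization(tasks) <= liu_layland_bound(len(tasks)))  # False: bound test inconclusive
print(response_times(tasks))                                # [2, 4]: both deadlines met
```

Failing the utilization bound does not prove a task set unschedulable, as the exact analysis of this example shows. A real RMA must also account for blocking, jitter, and interrupt overhead, which this sketch omits.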
implement the software safety requirements. After allocation of the software safety requirements to the
software design, Safety Critical Computer Software Components (SCCSCs) are identified. Bottom-up safety
analysis is performed on the architectural design to identify potential hazards and to define and analyze
SCCSCs, and the early test plans are reviewed to verify that safety-related testing has been incorporated.
Analyses included in the Architectural Design Phase are:
Software for a system, while often subjected to a single development program, actually consists of a set of
multi-purpose, multifunction entities. The software functions need to be subdivided into many modules and
further broken down to components.
Some of these modules will be safety critical, and some will not. The criticality analysis provides the
appropriate initial criticality designation for each software function. The safety activity relates the
hazards identified in the analyses previously described to the Computer Software Components (CSCs)
that may affect or control the hazards.
This analysis identifies all those software components that implement software safety requirements or
components that interface with SCCSCs that can affect their output. The designation Safety Critical
Computer Software Component (SCCSC) should be applied to any module, component, subroutine or other
software entity identified by this analysis.
While the Requirements Criticality and Update Criticality analyses simply assign a Yes or No to whether each
component is safety critical, the Risk Assessment process takes this further. Each SCCSC is prioritized for
analysis and corrective action according to the five levels of Hazard Prioritization ranking given previously.
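A minimal sketch of that step, assuming the criticality analyses yield a Yes/No flag per component and that the five hazard priority levels are encoded as integers 1 (highest) to 5 (lowest); the component names and that numbering are hypothetical:

```python
from typing import Dict, List


def analysis_worklist(is_critical: Dict[str, bool], hazard_rank: Dict[str, int]) -> List[str]:
    """Order the SCCSCs for analysis and corrective action, highest priority first."""
    ranked = []
    for name, critical in is_critical.items():
        if not critical:
            continue  # non-critical components are not ranked
        rank = hazard_rank.get(name)
        if rank is None or not 1 <= rank <= 5:
            # Every SCCSC must carry a valid priority; missing data is an error, not a default.
            raise ValueError(f"{name}: missing or invalid hazard priority")
        ranked.append((rank, name))
    return [name for _, name in sorted(ranked)]  # ties broken alphabetically


print(analysis_worklist(
    {"nav_filter": True, "display_format": False, "cmd_inhibit": True},
    {"nav_filter": 2, "cmd_inhibit": 1},
))  # ['cmd_inhibit', 'nav_filter']
```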
unacceptable hazards occur. This is done by postulating credible faults/failures and evaluating their effects
on the system. Input/output timing, multiple event, out-of-sequence event, failure of event, wrong event,
inappropriate magnitude, incorrect polarity, adverse environment, deadlocking, and hardware failure
sensitivities are included in the analysis.
Methods used for FMEA (Failure Modes and Effects Analysis) can be applied by substituting software
components for hardware components in each case. A widely used FMEA procedure is MIL-STD-1629,
which is based on the following eight steps. Formal Inspections (described earlier), design reviews and
animation/simulation augment this process.
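As a sketch of how the software substitution might start, the worksheet below crosses each component with the generic software failure modes listed earlier in this section, leaving effects and severity for the analyst to fill in. The component names and column layout are hypothetical; this is not the MIL-STD-1629 worksheet format:

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List

# Generic software failure modes, taken from the list of sensitivities in the text.
FAILURE_MODES = (
    "input/output timing", "multiple event", "out-of-sequence event",
    "failure of event", "wrong event", "inappropriate magnitude",
    "incorrect polarity", "adverse environment", "deadlocking",
    "hardware failure sensitivity",
)


@dataclass
class FmeaRow:
    component: str
    failure_mode: str
    effect: str = "TBD"    # filled in by the analyst
    severity: str = "TBD"  # filled in by the analyst


def build_worksheet(components: Iterable[str]) -> List[FmeaRow]:
    return [FmeaRow(c, m) for c, m in product(components, FAILURE_MODES)]


def open_items(rows: List[FmeaRow]) -> List[FmeaRow]:
    """Rows whose effect or severity has not yet been assessed."""
    return [r for r in rows if "TBD" in (r.effect, r.severity)]


sheet = build_worksheet(["altitude_hold", "cmd_parser"])
sheet[0].effect, sheet[0].severity = "late actuator command", "Major"
print(len(sheet), len(open_items(sheet)))  # 20 19
```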
Design Reviews
Design data is reviewed to ensure it properly reflects applicable software safety requirements. Design
changes are generated where necessary. Applicability matrices, compliance matrices, and compliance
checklists can be used to assist in completing this task. Output products are engineering change requests,
hazard reports (to capture design decisions affecting hazard controls and verification) and action items.
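A compliance matrix check can be sketched as a comparison of two sets, assuming each design element records the safety requirement IDs it claims to satisfy; the requirement IDs and element names are hypothetical:

```python
from typing import Dict, Iterable, List, Set, Tuple


def compliance_gaps(requirements: Iterable[str],
                    trace: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (requirements no design element satisfies, IDs cited by design but not required)."""
    required = set(requirements)
    claimed = set().union(*trace.values())  # an empty trace gives an empty set
    return sorted(required - claimed), sorted(claimed - required)


missing, unknown = compliance_gaps(
    ["SSR-1", "SSR-2", "SSR-3"],
    {"watchdog_task": {"SSR-1"}, "arm_interlock": {"SSR-2", "SSR-9"}},
)
print(missing, unknown)  # ['SSR-3'] ['SSR-9']
```

Each unsatisfied requirement is a candidate engineering change request or action item; each unknown ID suggests a stale or mistyped trace.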
Animation/Simulation
Simulators, prototypes (or other dynamic representations of the required functionality as specified by the
design), and test cases to exercise crucial functions can be developed. Run the tests and observe the system
response. Requirements can be modified as appropriate. Documented test results can confirm expected
behavior or reveal unexpected behavior. The status of critical verifications is captured by hazard reports.
J.4.5 Interface Analysis
Interdependence Analysis
Examine the software to determine the interdependence among CSCs, modules, tables, variables, etc.
Elements of software that directly or indirectly influence SCCSCs are also identified as SCCSCs, and as
such should be analyzed for their undesired effects; for example, shared memory blocks used by two or
more SCCSCs. The inputs and outputs of each SCCSC are inspected and traced to their origin and
destination.
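The trace can be sketched as a reverse reachability search over an influence graph, assuming an edge A -> B means element A can affect B's data or behavior (for example, by writing a shared memory block that B reads). The element names are hypothetical:

```python
from collections import deque
from typing import Dict, Iterable, Set


def expand_sccscs(influences: Dict[str, Set[str]], sccscs: Iterable[str]) -> Set[str]:
    """Return the SCCSCs plus every element that directly or indirectly influences one."""
    # Reverse the edges so the search walks from each SCCSC back to its influencers.
    influenced_by: Dict[str, Set[str]] = {}
    for source, targets in influences.items():
        for target in targets:
            influenced_by.setdefault(target, set()).add(source)

    result = set(sccscs)
    queue = deque(result)
    while queue:
        element = queue.popleft()
        for source in influenced_by.get(element, set()):
            if source not in result:
                result.add(source)
                queue.append(source)
    return result


graph = {
    "ui_task": {"logger"},
    "logger": {"shared_buf"},  # logger writes a block that the SCCSC reads
    "shared_buf": {"alt_hold"},
    "telemetry": set(),
}
print(sorted(expand_sccscs(graph, ["alt_hold"])))
# ['alt_hold', 'logger', 'shared_buf', 'ui_task']
```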
Independence Analysis
The safety activity evaluates available design documentation to determine the independence/dependence and
interdependence of SCCSCs to both safety-critical and non-safety-critical CSCs. Those CSCs that are found
to affect the output of SCCSCs are designated as SCCSCs. Areas where FCR (Fault Containment Region)
integrity is compromised are identified. The methodology is to map the safety critical functions to the
software modules and map the software modules to the hardware hosts and FCRs. Each input and output of
each SCCSC should be inspected. Resources are definition of safety critical functions needing to independent
design descriptions, and data diagrams. Design changes to achieve valid FCRs and corrections to SCCSC
designations may be necessary.
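The mapping can be sketched as two lookups, assuming each safety function lists the (possibly redundant) modules that implement it and each module is assigned to one FCR. The two checks shown, redundant modules sharing an FCR and non-critical modules sharing an FCR with an SCCSC, are illustrative examples of compromised FCR integrity; all names are hypothetical:

```python
from typing import Dict, List, Set


def fcr_findings(function_modules: Dict[str, List[str]],
                 module_fcr: Dict[str, str],
                 sccscs: Set[str]) -> List[str]:
    findings = []
    for function, modules in sorted(function_modules.items()):
        regions = [module_fcr[m] for m in modules]
        if len(set(regions)) < len(regions):
            # One fault in the shared region could defeat every redundant copy.
            findings.append(f"{function}: redundant modules share an FCR")
    critical_regions = {module_fcr[m] for m in sccscs}
    for module, region in sorted(module_fcr.items()):
        if module not in sccscs and region in critical_regions:
            findings.append(f"{module}: non-critical module shares {region} with an SCCSC")
    return findings


print(fcr_findings(
    {"engine_shutdown": ["shutdown_a", "shutdown_b"]},
    {"shutdown_a": "FCR1", "shutdown_b": "FCR1", "maint_log": "FCR2"},
    {"shutdown_a", "shutdown_b"},
))
# ['engine_shutdown: redundant modules share an FCR']
```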
The ultimate, fully rigorous DLA uses the application of Formal Methods (FM). Where FM is
inappropriate because its cost is high relative to the low cost or low criticality of the software, a simpler
DLA can be used. Less formal DLA involves a human inspector reviewing a relatively small quantity of critical
software artifacts (e.g., PDL, prototype code) and manually tracing the logic. Safety critical logic to be
inspected can include failure detection/diagnosis, redundancy management, variable alarm limits, and
command inhibit logical preconditions.
Commercial automatic software source analyzers can be used to augment this activity, but should not be
relied upon absolutely since they may suffer from deficiencies and errors, a common concern of COTS tools
and COTS in general.
Interrupts and their effect on data must receive special attention in safety-critical areas. Analysis should
verify that interrupts and interrupt handling routines do not alter critical data items used by other routines.
The integrity of each data item should be evaluated with respect to its environment and host. Shared
memory and dynamic memory allocation can affect data integrity. Data items should also be protected from
being overwritten by unauthorized applications. The effects of EMI on memory should be
reviewed in conjunction with system safety.
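One common way to detect that a critical data item has been overwritten, for instance by an interrupt handler or a stray write, is to store it together with its bitwise complement and verify the pair before each use. This technique is not named in the text above; the sketch below simulates it in Python with a hypothetical 16-bit word:

```python
class ComplementedWord:
    """A 16-bit critical value stored alongside its bitwise complement."""

    MASK = 0xFFFF

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        if not 0 <= value <= self.MASK:
            raise ValueError("value does not fit in 16 bits")
        self._value = value
        self._complement = ~value & self.MASK

    def read(self) -> int:
        # A consistent pair XORs to all ones; anything else means corruption.
        if self._value ^ self._complement != self.MASK:
            raise RuntimeError("critical data item corrupted")
        return self._value


word = ComplementedWord(0x1234)
print(hex(word.read()))  # 0x1234
word._value = 0x1235     # simulate an unauthorized overwrite of one copy
# word.read() would now raise RuntimeError
```

On real hardware the two copies would usually be kept in separate memory locations, so that a single fault is unlikely to corrupt both consistently.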
Interface characteristics to be addressed should include data encoding, error checking and synchronization.
The analysis should consider the validity and effectiveness of checksums and CRCs. The sophistication of
error checking implemented should be appropriate for the predicted bit error rate of the interface. An overall
system error rate should be defined, and budgeted to each interface. Examples of interface problems:
• Sender sends eight-bit word with bit 7 as parity, but recipient believes bit 0 is parity.
• Sender transmits updates at 10 Hz, but receiver only updates at 1 Hz.
• Sender encodes word with leading bit start, but receiver decodes with trailing bit start.
• Interface deadlock prevents data transfer (e.g., Receiver ignores or cannot recognize
“ready to send”).
• User reads data from wrong address.
• Sender addresses data to wrong address.
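The first example can be reproduced in a few lines, as a sketch assuming seven data bits with even parity. An even-parity check over all eight bits gives the same result whichever bit is called the parity bit, so the receiver's check passes and the wrong value is accepted silently. The function names are hypothetical:

```python
def ones(word: int) -> int:
    return bin(word).count("1")


def send(data7: int) -> int:
    """Sender's convention: data in bits 0-6, even-parity bit in bit 7."""
    if not 0 <= data7 < 0x80:
        raise ValueError("data must fit in 7 bits")
    return ((ones(data7) & 1) << 7) | data7


def receive_bit0_parity(word: int) -> tuple:
    """Receiver's (wrong) convention: parity in bit 0, data in bits 1-7."""
    return word >> 1, ones(word) % 2 == 0


def receive_bit7_parity(word: int) -> tuple:
    """Receiver using the sender's convention."""
    return word & 0x7F, ones(word) % 2 == 0


w = send(3)
print(receive_bit7_parity(w))  # (3, True)
print(receive_bit0_parity(w))  # (1, True): wrong data, parity still "good"
```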
In a language such as C or C++, where data typing is not strict, the sender may use different data types than
the receiver expects. (Where there is strong data typing, the compiler will catch this.)
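The effect can be demonstrated with Python's standard `struct` module, as a sketch: the same four bytes decode to very different values depending on the type the receiver assumes. The one-byte type tag used as a mitigation is a hypothetical message convention, not a standard format:

```python
import struct

payload = struct.pack("<f", 1.0)          # sender: little-endian IEEE 754 float32
(as_int,) = struct.unpack("<i", payload)  # receiver wrongly assumes int32
print(as_int)                             # 1065353216 (0x3F800000), not 1

# Mitigation sketch: carry an explicit type tag and reject mismatches.
FORMATS = {"f": "<f", "i": "<i"}


def encode(kind: str, value) -> bytes:
    return kind.encode("ascii") + struct.pack(FORMATS[kind], value)


def decode(message: bytes, expected_kind: str):
    kind = message[:1].decode("ascii")
    if kind != expected_kind:
        raise TypeError(f"expected type {expected_kind!r}, received {kind!r}")
    (value,) = struct.unpack(FORMATS[kind], message[1:])
    return value


print(decode(encode("f", 1.0), "f"))  # 1.0
```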
J.5.7 Petri-Nets
Petri-nets are a graphical technique that can be used to model and analyze safety-critical systems for such
properties as reachability, recoverability, deadlock, and fault tolerance. Petri-nets allow the identification of
the relationships between system components such as hardware and software, and human interaction or
effects on both hardware and software. Real-time Petri-net techniques can also allow analysts to build
dynamic models that incorporate timing information. In so doing, the sequencing and scheduling of system
actions can be monitored and checked for states that could lead to unsafe conditions.
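A minimal sketch of reachability analysis for an untimed place/transition net: markings are explored breadth-first, and any reachable marking with no enabled transition is reported as a deadlock. The example net, two tasks that acquire two shared locks in opposite order, is hypothetical:

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Marking = Tuple[int, ...]           # token count per place
Arcs = Dict[int, int]               # place index -> arc weight
Net = Dict[str, Tuple[Arcs, Arcs]]  # transition -> (input arcs, output arcs)


def enabled(m: Marking, pre: Arcs) -> bool:
    return all(m[p] >= w for p, w in pre.items())


def fire(m: Marking, pre: Arcs, post: Arcs) -> Marking:
    tokens = list(m)
    for p, w in pre.items():
        tokens[p] -= w
    for p, w in post.items():
        tokens[p] += w
    return tuple(tokens)


def explore(initial: Marking, net: Net,
            max_states: int = 10_000) -> Tuple[Set[Marking], List[Marking]]:
    """Return (reachable markings, reachable markings with no enabled transition)."""
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        m = queue.popleft()
        successors = [fire(m, pre, post) for pre, post in net.values() if enabled(m, pre)]
        if not successors:
            deadlocks.append(m)
        for s in successors:
            if s not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state space limit reached; net may be unbounded")
                seen.add(s)
                queue.append(s)
    return seen, deadlocks


# Places: 0 T1 idle, 1 T1 holds A, 2 T2 idle, 3 T2 holds B, 4 lock A free, 5 lock B free
net: Net = {
    "t1_take_A": ({0: 1, 4: 1}, {1: 1}),
    "t1_take_B_run": ({1: 1, 5: 1}, {0: 1, 4: 1, 5: 1}),
    "t2_take_B": ({2: 1, 5: 1}, {3: 1}),
    "t2_take_A_run": ({3: 1, 4: 1}, {2: 1, 4: 1, 5: 1}),
}
reachable, deadlocks = explore((1, 0, 1, 0, 1, 1), net)
print(len(reachable), deadlocks)  # 4 [(0, 1, 0, 1, 0, 0)]
```

The deadlocked marking is the state in which each task holds one lock and waits for the other. Timing and probabilities, which can be added to a Petri-net, are outside this sketch.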
The Petri-net modeling tool is different from most other analysis methods in that it clearly demonstrates the
dynamic progression of state transitions. Petri-nets can also be translated into mathematical logic
expressions that can be analyzed by automated tools. Information can be extracted and reorganized into
analysis-assisting graphs and tables that are relatively easy to understand (e.g., reachability graphs, inverse
Petri-net graphs, critical state graphs). Some of the potential advantages of Petri-nets over other safety
analysis techniques include the following:
Adding time and probabilities to each Petri-net allows incorporation of timing and
probabilistic information into the analysis. The model may be used to analyze the
system for other features besides safety.
Unfortunately, Petri-nets require a large amount of detailed analysis to model even relatively small systems,
making them very expensive. To reduce expenses, a few alternative Petri-net modeling
techniques have been proposed, each tailored to perform a specific type of safety analysis. For example,
time Petri-nets (TPN) account for the time dependency of real-time systems; inverse Petri-nets,
needed specifically to perform safety analysis, use the previously discussed backward modeling approach to
avoid modeling all of the possible reachable states; and critical state inverse Petri-nets further refine
inverse Petri-net analysis by modeling only reachable states at predefined criticality levels.
Petri-net analysis can be performed at any phase of the software development cycle, though for reasons of
expense and complexity it is highly recommended that the process be started at the beginning of the
development cycle and expanded for each of the succeeding phases. Petri-net, inverse Petri-net and critical
state Petri-nets are all relatively new technologies, are costly to implement, and absolutely require technical
expertise on the part of the analyst. Petri net analysis is a complex subject, and is treated in more detail in
Appendix C of this handbook.
The Dynamic Flowgraph Methodology (DFM) is an integrated, methodical approach to modeling and
analyzing the behavior of software-driven embedded systems for the purpose of dependability assessment
and verification. The methodology has two fundamental goals: 1) to identify how events can occur in a
system; and 2) to identify an appropriate testing strategy based on an analysis of system functional behavior.
To achieve these goals, the methodology employs a modeling framework in which models expressing the
logic of the system being analyzed are developed in terms of contributing relationships between physical
variables and temporal characteristics of the execution of software modules.
Models are analyzed to determine how a certain state (desirable or undesirable) can be reached. This is done
by developing timed fault trees which take the form of logical combinations of static trees relating the system
parameters at different points in time. The resulting information concerning the hardware and software states
that can lead to certain events of interest can then be used to increase confidence in the system, eliminate
unsafe execution paths, and identify testing criteria for safety critical software functions.
Linguistic, structural, and combined metrics exist for measuring the complexity of software; they are
discussed briefly below.
Use complexity estimation techniques, such as McCabe or Halstead. If an automated tool is available, the
software design and/or code can be run through the tool. If there is no automated tool available, examine the
critical areas of the detailed design and any preliminary code for areas of deep nesting, large numbers of
parameters to be passed, intense and numerous communication paths, etc. (Refer to references cited above.)
Resources are the detailed design, high level language description, source code, and automated complexity
measurement tool(s).
Output products are complexity metrics, predicted error estimates, and areas of high complexity identified
for further analysis or consideration for simplification.
Several automated tools on the market provide these metrics. The level and type of
complexity can indicate areas where further analysis or testing may be warranted. Beware, however: these
metrics should be used with caution, as they may indicate that a structure such as a CASE statement is
highly complex when in reality that complexity leads to a simpler, more straightforward method of
programming and maintenance, thus decreasing the risk of errors.
Linguistic metrics measure some property of the text without regard for its contents (e.g., lines of
code, number of statements, number and type of operators, total number and type of tokens, etc.). Halstead's
metrics are a well-known measure of several of these attributes.
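Given counts of distinct and total operators and operands, Halstead's basic measures follow from simple formulas, sketched below. Counting conventions (what counts as an operator or an operand) vary between tools, so the tallies for the one-line example `x = a + b * a` are an assumption for illustration:

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """n1, n2: distinct operators/operands; N1, N2: total operator/operand occurrences."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# x = a + b * a  ->  operators {=, +, *}: n1=3, N1=3; operands {x, a, b}: n2=3, N2=4
metrics = halstead(n1=3, n2=3, N1=3, N2=4)
print({k: round(v, 2) for k, v in metrics.items()})  # vocabulary 6, length 7, volume ~18.09
```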
Structural metrics focus on control flow and data flow within the software and can usually be mapped into
a graphical representation. Structural relationships such as the number of links and/or calls, number of nodes,
nesting depth, etc. are examined to get a measure of complexity. McCabe's Cyclomatic Complexity metric is
the best known and most widely used metric for this type of complexity evaluation.
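For a control-flow graph with E edges, N nodes and P connected components, McCabe's metric is V(G) = E - N + 2P; for a structured routine with a single entry and exit, this equals the number of binary decisions plus one. A minimal sketch, with a hypothetical graph for a loop containing an if/else:

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str]


def cyclomatic_complexity(edges: Iterable[Edge], components: int = 1) -> int:
    """V(G) = E - N + 2P, with nodes taken from the edge list (no isolated nodes)."""
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2 * components


# while (loop_test) { if (cond) {then} else {else} }: two decisions
cfg = [
    ("start", "loop_test"), ("loop_test", "cond"), ("loop_test", "end"),
    ("cond", "then"), ("cond", "else"),
    ("then", "join"), ("else", "join"), ("join", "loop_test"),
]
print(cyclomatic_complexity(cfg))  # 3
```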
The use of software to control safety-critical processes is placing software development environments (i.e.
languages, compilers, utilities, etc.) under increased scrutiny. When computer languages are taught, students
are seldom warned of the limitations and insecurities that the environment possesses. An insecurity is a
feature of a programming language whose implementation makes it impossible or extremely difficult to
detect some violation of the language rules, by mechanical analysis of a program's text. The computer
science profession has only recently focused on the issues of the inherent reliability of programming
environments for safety-critical applications.
This section will provide an introduction to the criteria for determining which languages are well suited for
safety-critical applications. In addition, a safe subset of the Ada language will be discussed, along with
the rationale for rejecting certain language constructs. Reading knowledge of Pascal, Ada, C or another
modern high level block structured language is required to understand the concepts that are being discussed.
There are two primary reasons for restricting a language definition to a subset: 1) some features are defined
in an ambiguous manner and 2) some features are excessively complex. A language is considered suitable
for use in a safety-critical application if it has a precise definition (complete functionality as well), is
logically coherent, and has a manageable size and complexity. Excessive complexity makes it
virtually impossible to verify certain language features. Overall, the issues of logical soundness and
complexity will be the key toward understanding why a language is restricted to a subset for safety-critical
applications.
An overview of the insecurities in the Ada language standard is included in this entry. Only those issues
that are due to the ambiguity of the standard will be surveyed. The problems that arise because a specific
implementation (e.g., a compiler) is incorrect can be tracked by asking the compiler vendor for a historical
list of known bugs and defect repair times. This information should give a user a basis with which to
compare the quality of product and service of different vendors.
Probably the most common misuse in practically all programming languages is that of uninitialized
variables. This mistake is very hard to catch because unit testing will not flag it unless explicitly designed to
do so. The typical manifestation of this error is when a program that has been working successfully is run
under different environmental conditions and the results are not as expected.
Calls to de-allocate memory should be examined to make sure not only that the pointer is released but also that
the memory used by the structure is released.
The order of evaluation of operands when side effects from function calls modify the operands is generally
dismissed as poor programming practice, but in reality it is an issue that is poorly defined (no standard of any
type has been defined) and arbitrarily resolved by implementers of language compilers.
Method of Assessment
The technique used to compare programming languages will not deal with differences among manufacturers
of the same language. Compiler vendor implementations, by and large, do not differ significantly from the
intent of the standard; however, standards are not unambiguous, and they are interpreted conveniently for
marketing purposes. One should be aware that implementations will not adhere 100% to the standard
because of the extremely large number of states a compiler can produce. The focus of this study then is to
review the definition of a few languages for certain characteristics that will provide for the user a shell
against inadvertent misuse. When evaluating a language, the following questions should be asked of the
language as a minimum:
Formal Methods have been used with success on both military and commercial systems that were considered
safety-critical applications. The benefits from the application of the methodology accrue to both safety and
non-safety areas. Formal Methods do not guarantee a precise quantifiable level of reliability; at present they
are only acknowledged as producing systems that provide a high level of assurance.
On a qualitative level the following list identifies different levels of application of assurance methods in
software development. They are ranked by the perceived level of assurance achieved, with the lowest
numbered approaches representing the highest level of assurance. Each of the approaches to software
development is briefly explained by focusing on that part of the development that distinguishes it from the
other methods.
1. Formal development down to object code requires that formal mathematical proofs be
   carried out on the executable code.
2. Formal development down to source code requires that the formal specification of the
   system undergo proofs of properties of the system.
3. Rigorous development down to source code is when requirements are written in a formal
   specification language and emulators of the requirements are written. The emulators serve
   the purpose of a prototype to test the code for correctness of functional behavior.
4. Structured development to requirements analysis then rigorous development down to
   source code performs all of the steps of approach 3. The source code
   undergoes a verification process that resembles a proof but falls short of one.
5. Structured development down to source code is the application of the structured
   analysis/structured design method. It consists of a conceptual diagram that graphically
   illustrates functions, data structures, inputs, outputs, and mass storage and their
   interrelationships. Code is written based on the information in the diagram.
6. Ad hoc techniques encompass all of the non-structured and informal techniques (e.g.,
   hacking, or code a little, then test a little).
Whether or not formal methods are used to develop a system, a high level RSM can be used to provide a
view into the architecture of an implementation without being engulfed by all the accompanying detail.
Semantic analysis criteria can be applied to this representation and to lower level models to verify the
behavior of the RSM and determine that its behavior is acceptable. The analysis criteria will be listed in a
section below and in subsequent sections because they are applicable at practically every stage of the
development life cycle.
The state machine models should be built to abstract different levels of hierarchy. The models are
partitioned in a manner that is based on considerations of size and logical cohesiveness. An uppermost level
model should contain at most 15 to 20 states; this limit is based on the practical consideration of
comprehensibility. In turn, each of the states from the original diagram can be exploded in a fashion similar
to the bubbles in a data flow diagram/control flow diagram (DFD/CFD) (from a structured
analysis/structured design methodology) to the level of detail required. An RSM model of one of the lower
levels contains a significant amount of detail about the system.
The states in each diagram are numbered and classified with one of the following attributes: Passive, Startup,
Safe, Unsafe, Shutdown, Stranded, and Hazard. For the state machine to represent a viable system, the
diagram must obey certain properties that will be explained later in this work.
The passive state represents an inert system, that is, nothing is being produced. However, in
the passive state, input sensors are considered to be operational. Every diagram of a system
contains at least one passive state. A passive state may transition to an unsafe state.
The startup state represents the initialization of the system. Before any output is produced,
the system must have transitioned into the startup state where all internal variables are set to
known values. A startup state must be proven to be safe before continuing work on the
remaining states. If the initialization fails, a timeout may be specified and a state transition
to an unsafe or passive state may be defined.
Input/Output Variables
All information from the sensors should be used somewhere in the RSM. If not, either an input from a
sensor is not required or, more importantly, an omission has been made from the software requirements
specification. For outputs it can be stated that, if there is a legal value for an output that is never produced,
then a requirement for software behavior has been omitted.
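Both completeness checks, together with the rule that every diagram contains at least one passive state and the suggested limit on top-level size, can be sketched as set comparisons over a simple RSM model. The model classes, the example heater controller, and the way triggers and outputs are recorded are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List, Set, Tuple


class Attr(Enum):
    PASSIVE = "Passive"
    STARTUP = "Startup"
    SAFE = "Safe"
    UNSAFE = "Unsafe"
    SHUTDOWN = "Shutdown"
    STRANDED = "Stranded"
    HAZARD = "Hazard"


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    inputs: FrozenSet[str]                # sensor inputs read by the trigger condition
    outputs: Tuple[Tuple[str, str], ...]  # (output name, value) pairs produced


def rsm_findings(states: Dict[str, Attr], transitions: List[Transition],
                 sensor_inputs: Set[str], legal_outputs: Dict[str, Set[str]],
                 max_states: int = 20) -> List[str]:
    findings = []
    if Attr.PASSIVE not in states.values():
        findings.append("no passive state")
    if len(states) > max_states:
        findings.append(f"more than {max_states} states at this level")
    used = set().union(*(t.inputs for t in transitions))
    for name in sorted(sensor_inputs - used):
        findings.append(f"sensor input never used: {name}")
    produced = {pair for t in transitions for pair in t.outputs}
    for name, values in sorted(legal_outputs.items()):
        for value in sorted(values):
            if (name, value) not in produced:
                findings.append(f"legal output never produced: {name}={value}")
    return findings


states = {"Idle": Attr.PASSIVE, "Init": Attr.STARTUP, "Run": Attr.SAFE}
transitions = [
    Transition("Idle", "Init", frozenset({"power_on"}), (("heater", "off"),)),
    Transition("Init", "Run", frozenset({"temp_ok"}), (("heater", "on"),)),
]
for finding in rsm_findings(states, transitions, {"power_on", "temp_ok", "door_open"},
                            {"heater": {"on", "off"}, "alarm": {"on", "off"}}):
    print(finding)
```

Each finding points either to an unneeded input or output or, more importantly, to an omission in the software requirements specification.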
State Attributes
The state attributes of the RSM are to be labeled according to the scheme in Chapter 10.
Some of these code analysis techniques mirror those used in detailed design analysis. However, the results
of the analysis techniques might be significantly different from those obtained during earlier development phases,
because the final code may differ substantially from what was expected or predicted.
Each of the analyses in this section should be undergoing its second iteration, since all of them
should have been applied previously to the code-like products (PDL) of the detailed design.
There are some commercial tools available that perform one or more of these analyses in a single package.
Such tools, for example logic analyzers and path analyzers, can be evaluated for their validity in performing
these tasks. However, unvalidated COTS tools, in themselves, cannot generally be considered valid methods
for formal safety analysis. COTS tools are often useful for revealing previously unknown defects.
Note that the definitive formal code analysis is that performed on the final version of the code. A great deal
of the code analysis is done on earlier versions of code, but a final check on the final version is essential.
For safety purposes, it is desirable that the final version have no “instrumentation” (i.e., extra code added
in order to see where erroneous jumps go). One may need to run the code on an instruction set emulator that
can monitor the code from the outside, without adding the instrumentation.
Logic reconstruction entails the preparation of flow charts from the code and comparing them to the design
material descriptions and flow charts.
Equation reconstruction is accomplished by comparing the equations in the code to the ones provided with
the design materials.
Memory decoding identifies critical instruction sequences even when they may be disguised as data. The
analyst should determine whether each instruction is valid and if the conditions under which it can be
executed are valid. Memory decoding should be done on the final un-instrumented code. Employment of
Fault Trees and Petri Nets has been discussed in the previous section of this appendix.
Of particular concern to safety is ensuring the integrity of safety critical data against being inadvertently
altered or overwritten. For example, check to see if interrupt processing is interfering with safety critical
data. Also, check the “typing” of safety critical declared variables.
Each of these interfaces is a source of potential problems. Code interface analysis is intended to verify that
the interfaces have been implemented properly. Hardware and human operator interfaces should be made
part of the “Design Constraint Analysis” discussed below.
The physical limitations of the processing hardware platform should be addressed. Timing, sizing and
throughput analyses should also be repeated as part of this process to ensure that the available computing
resources and memory are adequate for safety critical functions and processes.
Underflows/overflows in certain languages (e.g., Ada) give rise to “exceptions” or error messages generated
by the software. These conditions should be eliminated by design if possible; if they cannot be precluded,
then error handling routines in the application must provide appropriate responses, such as retry, restart, etc.
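One way to eliminate overflow by design is to make the arithmetic saturate at the limits of the word rather than wrap around or raise an exception. A minimal sketch for 16-bit signed values; whether clamping is an acceptable response for a given parameter is a safety decision this sketch does not make:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturating_add_int16(a: int, b: int) -> int:
    """Add two 16-bit signed values, clamping to the representable range."""
    for v in (a, b):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError(f"operand out of int16 range: {v}")
    # Python ints do not overflow, so compute exactly, then clamp.
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(saturating_add_int16(32000, 1000))    # 32767 instead of wrapping to -32536
print(saturating_add_int16(-32000, -1000))  # -32768
```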
There is no particular technique for identifying unused code; however, unused code is often identified during
the course of performing other types of code analysis. Unused code can be found during unit testing with
COTS coverage analyzer tools.
Care should be taken during logical code analyses to ensure that every part of the code is eventually
exercised at some time during all possible operating modes of the system.
Some analysis is advisable to assess the optimum test coverage as part of the test planning process. There is
a body of theory that attempts to calculate the probability that a system with a certain failure probability will
pass a given number of tests.
Techniques known as “white box” testing can be performed, usually at the modular level.
Statistical methods such as Monte Carlo simulations can be useful in planning "worst case" credible
scenarios to be tested.
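A minimal sketch of a Monte Carlo worst-case search: parameters are drawn uniformly within their tolerance bands, a model of the response is evaluated for each draw, and the worst draw is kept as a candidate test scenario. The response model, parameter names, tolerances, and the uniform distribution are all hypothetical:

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def worst_case_search(response: Callable[[Params], float], nominal: Params,
                      tolerance: Params, trials: int = 10_000,
                      seed: int = 1) -> Tuple[Params, float]:
    """Return the sampled parameter set with the largest (worst) response."""
    rng = random.Random(seed)  # fixed seed makes the test plan reproducible
    worst_params: Params = {}
    worst_value = float("-inf")
    for _ in range(trials):
        params = {k: rng.uniform(nominal[k] - tolerance[k], nominal[k] + tolerance[k])
                  for k in nominal}
        value = response(params)
        if value > worst_value:
            worst_params, worst_value = params, value
    return worst_params, worst_value


# Hypothetical figure of merit: loop gain times computational delay (larger is worse).
params, value = worst_case_search(lambda p: p["gain"] * p["delay_s"],
                                  nominal={"gain": 2.0, "delay_s": 0.05},
                                  tolerance={"gain": 0.2, "delay_s": 0.01})
print(params, value)
```

Random sampling can miss extreme corners of the tolerance space, so the result is best treated as a complement to analytic corner cases rather than proof that the true worst case was found.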