C5. Best practices for proper alarm management
8/7/2024
Process alarms are critical safeguards put into place on operating units to enable
operators to avoid incidents that impact a company’s safety, reliability and bottom line. It
is absolutely essential to any safe facility operation that alarms are properly managed
and utilized to address real and consequential operational issues before they become
incidents.
How can alarm management performance be evaluated? Some top-performing line operations managers, for example, request a snapshot of the alarm screen each morning, with an explanation of the cause of any active alarms and what has been done to address them. Unfortunately, this can have unintended consequences: operators may purposely reduce or eliminate alarms by deactivating them, giving managers a very small, non-representative list of active alarms each day (some days with zero active alarms) that does not reflect the actual state of operations.
Some of these top performers mistakenly believe they have achieved their desired “performance” when, in fact, their console operators have put problem alarms into a “shelved,” “bypassed,” “inhibited” or other “named state” that removes them from easy monitoring and effectively removes them as safeguards against serious incidents. Some console operators do this so they can operate without distractions. Most do not understand that, whatever their reasoning, disabling critical alarms puts the safety of their operations at risk. To combat this issue, most consoles have timers that prevent an alarm “bypass” from remaining active beyond the current shift. However, the oncoming console operator can easily extend every bypass at once by “selecting all” and authorizing the extension. This is an unacceptable process.
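The weakness of a bulk-extendable shift timer is easier to see in code. The sketch below is a hypothetical shelving register, not any DCS vendor's API, and the class, method and field names are illustrative assumptions. Every shelve needs a documented reason, each shelf expires after an assumed 12-h shift, and an extension must be authorized alarm by alarm with its own justification, so there is deliberately no “select all” path.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


# Hypothetical record for one shelved alarm (illustrative, not a vendor API).
@dataclass
class ShelvedAlarm:
    tag: str
    reason: str
    shelved_by: str
    expires_at: datetime
    extensions: list = field(default_factory=list)  # (operator, time, reason)


class ShelvingRegister:
    """Hypothetical shelving register: shelves expire after one shift
    (assumed 12 h here) and can only be extended one alarm at a time."""

    def __init__(self, shift_length: timedelta = timedelta(hours=12)):
        self.shift_length = shift_length
        self.entries = {}

    def shelve(self, tag: str, reason: str, operator: str, now: datetime):
        if not reason.strip():
            raise ValueError("a documented reason is required to shelve an alarm")
        self.entries[tag] = ShelvedAlarm(tag, reason, operator,
                                         now + self.shift_length)

    def extend(self, tag: str, reason: str, operator: str, now: datetime):
        """Extend a single alarm; there is intentionally no extend_all()."""
        entry = self.entries[tag]  # KeyError if the alarm is not shelved
        if not reason.strip():
            raise ValueError(f"{tag}: each extension needs its own justification")
        entry.extensions.append((operator, now, reason))
        entry.expires_at = now + self.shift_length

    def expire(self, now: datetime):
        """Return alarms to service once their shelf time has run out."""
        expired = [tag for tag, e in self.entries.items() if e.expires_at <= now]
        for tag in expired:
            del self.entries[tag]
        return expired
```

The design choice is that an extension is a per-alarm decision with an auditable reason, which forces the oncoming operator to look at each shelved alarm instead of re-authorizing the whole list.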
Case study. The company senior vice president (SVP) of operations had cause to visit
a floating production storage and offloading (FPSO) facility. The offshore installation
manager (OIM) had prepared an agenda, but it was approaching 6 pm and the OIM had
planned a dinner with the leadership team. However, after disembarking from the helicopter and completing the health, safety and environmental (HSE) briefing, the SVP immediately donned his personal protective equipment (PPE) and asked to visit the control room. The crew looked puzzled, but the SVP was clearly setting an example of good, strong leadership. On entering the control room, his first request was to see the inhibits and overrides register, from which he randomly selected a few items and asked why these inhibits had been in place for so long. This was a great demonstration of a senior executive who was well-versed in operations and, more importantly, in operational risk.
Proper alarm management. Good leaders “walk the talk”: they make their expectations clear to all stakeholders, especially front-line supervisors, and spell out the consequences of non-compliance when managing risk.
Leadership and support organizations must monitor and act on the root causes of excessive and nuisance alarms to ensure safe and incident-free operation. It is critical to capture the alarms that are “inhibited” or “shelved” in daily metrics, so that management understands the current liabilities and can reinforce the actions that affect alarm management (FIG. 1). Many companies have added a morning distributed control system (DCS) report, called a point attribute report (PAR), that lists each point whose alarm status has been manually changed. This report, combined with the alarm screenshot report or a similar indicator, is a better measure of proper alarm management.
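A report in the spirit of a PAR can be approximated from any log of manual alarm-state changes. The sketch below assumes a hypothetical record format of (timestamp, tag, new_state, user) and a hypothetical set of state names; real DCS journals differ by vendor and would need to be mapped into this shape first. It lists every point changed in the last 24 h and every point still suppressed at report time, together with how many days it has been in that state, which is the question the SVP asked of the inhibits register.

```python
from datetime import datetime, timedelta

# Assumed state names for suppressed alarms; map vendor-specific names here.
SUPPRESSED_STATES = {"shelved", "bypassed", "inhibited", "disabled"}


def point_attribute_report(change_log, report_time: datetime,
                           window: timedelta = timedelta(hours=24)):
    """change_log: iterable of hypothetical (timestamp, tag, new_state, user)
    records. Returns points changed in the window and points still suppressed."""
    start = report_time - window
    changed = sorted({tag for ts, tag, _state, _user in change_log
                      if start <= ts < report_time})

    # Replay the log in time order to find each point's current state
    # and when it entered that state.
    current = {}
    for ts, tag, state, _user in sorted(change_log):
        if ts < report_time:
            current[tag] = (state, ts)

    # (tag, state, whole days in that state) for every suppressed point.
    suppressed = sorted(
        (tag, state, (report_time - since).days)
        for tag, (state, since) in current.items()
        if state in SUPPRESSED_STATES
    )
    return {"changed_in_window": changed, "currently_suppressed": suppressed}
```

Read alongside the morning alarm screenshot, a short active-alarm list paired with a long `currently_suppressed` list is exactly the pattern the screenshot alone hides.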
Proper alarm management that enables operators to avoid serious incidents includes an
alarm management strategy comprising the following:
Console 1 showed good performance until the last two days of the month, when a major upset on Day 30 produced alarm rates well beyond what could be managed. A detailed review of the upset is warranted, with the necessary modifications made so that the response to future events is manageable.
Consoles 2 and 3 showed the best performance of the group, which would be considered good performance, assuming that their counts realistically reflect actual alarms and are not lowered by inhibited alarms.
Console 4 showed elevated alarm rates for 8 days in mid-month. The cause of these elevated alarm conditions should be understood and corrected to prevent recurrence.
Consoles 5 and 6 required major alarm management reviews to correct their ongoing alarm overload. A steady diet of alarms every few minutes is not a reasonable workload for console operators and prevents the safe and reliable unit performance that comes with steady-state operation. It is also unlikely that the advanced controllers on these units are properly enabled with adequate freedom to keep the unit in stable and optimal operation.
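The console-level comparisons above reduce to simple counting. The sketch below assumes per-console daily alarm counts and a list of alarm timestamps are available. The band limits (150 and 300 alarms per console-day) and the flood definition (more than 10 alarms in a 10-min period) are illustrative placeholders chosen here, not figures from this article, and should be replaced with the targets a site has adopted, for example from ANSI/ISA-18.2. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Illustrative placeholders (assumed), not values from this article.
ACCEPTABLE_PER_DAY = 150
MANAGEABLE_PER_DAY = 300
FLOOD_THRESHOLD = 10                  # more than 10 alarms in one period = flood
FLOOD_PERIOD = timedelta(minutes=10)


def classify_day(count: int) -> str:
    """Place one console-day's alarm count in a band."""
    if count <= ACCEPTABLE_PER_DAY:
        return "acceptable"
    if count <= MANAGEABLE_PER_DAY:
        return "review"
    return "overload"


def console_summary(daily_counts: dict) -> dict:
    """daily_counts maps console name -> list of daily alarm counts.
    Returns, per console, how many days fell in each band."""
    summary = {}
    for console, counts in daily_counts.items():
        bands = defaultdict(int)
        for count in counts:
            bands[classify_day(count)] += 1
        summary[console] = dict(bands)
    return summary


def flood_periods(alarm_times: list[datetime],
                  threshold: int = FLOOD_THRESHOLD,
                  period: timedelta = FLOOD_PERIOD):
    """Bucket alarm timestamps into fixed periods aligned to the hour of the
    earliest alarm; return (period_start, count) for each flooded period."""
    if not alarm_times:
        return []
    origin = min(alarm_times).replace(minute=0, second=0, microsecond=0)
    counts = Counter((t - origin) // period for t in alarm_times)
    return [(origin + i * period, n)
            for i, n in sorted(counts.items()) if n > threshold]
```

The arithmetic behind the “steady diet” concern is stark: one alarm every 5 minutes, around the clock, is 1,440 / 5 = 288 alarms per console per day, while a single upset like Console 1's shows up instead as a cluster of flooded periods.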
Today’s operations require a much more sophisticated evaluation of current alarm conditions, because anyone other than the console operator has only a limited view of them. Some companies include a screen dedicated to critical alarms in their DCS screen distribution, giving ready access to the console operator as well as to technical support and line management.