American comedian Sam Levenson once said, “You must learn from the mistakes of others. You can’t possibly live long enough to make them all yourself.” When it comes to alarm management, Levenson is correct.
Ineffective alarm systems pose a serious risk to safety, the environment, and plant profitability. Too often, alarm system effectiveness is unknowingly undermined by poorly configured alarms. Static alarm settings cannot adapt to changing plant conditions, and these and other nuisances produce alarm floods that overwhelm operators at the very moment they need concise direction.
Alarm systems are the primary tool for identifying abnormal situations and helping plant personnel take timely, appropriate action to return their processes to operational targets. For facilities considering an alarm management program, it is worth examining common alarm management blunders so that steps can be taken to avoid them.
Examining the alarm management process
Before diving into the common mistakes, it is worth reviewing the proper execution path. The overall structure of a successful alarm management process is fundamentally the same across industries and plant sizes: benchmarking, alarm philosophy, rationalization, implementation, continuous improvement, and maintenance.
Benchmarking, or evaluating current performance, is the stage at which to identify the most pressing alarm system problems and the biggest opportunities for improvement. It is also the best place to start if it is unclear what actions a successful alarm management program requires.
With the landscape better understood, the next step is to develop an alarm philosophy document (APD). The APD is not a theoretical exercise but an engineering document covering all aspects of the alarm management system. It should clearly set out the key concepts and governing rules of the alarm strategy, such as what constitutes an alarm and which risk categories apply to site operations. The philosophy should also define roles and responsibilities, change management procedures, and project goals such as target alarm rates. For reference, ANSI/ISA-18.2-2009, Management of Alarm Systems for the Process Industries, identifies the sections required in the alarm philosophy document and also covers other areas, such as rationalization.
The purpose of alarm rationalization is to determine the causes, consequences, and corrective actions for each alarm. There are different methods of rationalization, and more than one may be used to resolve alarm issues. Once rationalization is complete, the next step is implementation, which is key to a successful alarm system and must cover control logic, alarm design, and graphics.
Alarm behavior changes as the process and its controls change, so routine performance monitoring is essential; it also reveals new opportunities for improvement, such as dynamic alarm strategies. Once alarm practices have reached a comfortable level, they can be integrated into the plant workflow to sustain optimized performance over the long term.
Even with a proper execution path in place, however, facilities can still commit blunders that undermine an alarm management program. By avoiding the following common pitfalls, they can help operators respond effectively to abnormal situations and keep their plants on operational targets.
Blunder #1: Operations not taking ownership of the alarm system
Often, a facility’s operations group believes alarm issues belong to the controls or instrumentation group because that group maintains the computer control system. Operations must take ownership and recognize that the control system is also its responsibility: how the system functions is determined by operations’ requirements. By analogy, it would be irresponsible to take a car to a mechanic and let the mechanic decide what maintenance is needed. Similarly, operations should not expect the group that maintains the control system to resolve alarm issues. That group will make the required changes, but operations must drive them.
Blunder #2: Missing or incomplete alarm philosophy document
The APD defines all aspects of the alarm system. One reason alarm systems get out of control is that alarms are not properly designed or maintained, and failing to establish and document best practices is a recipe for disaster. The APD should therefore set out guidelines for performing alarm rationalization: for example, a methodology and rules for setting alarms, an alarm review to build commitment and consolidate training, and an audit process to ensure the philosophy is applied consistently. These guidelines define the criteria for a legitimate alarm and for setting its priority. They form the backbone of the APD, which serves as a corporate standard guiding the organization’s alarm management initiatives.
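Priority rules in a philosophy document are often written as a matrix of consequence severity against the time available to respond, plus a test for whether a condition qualifies as an alarm at all. The sketch below encodes one such rule set. The severity levels, response-time bands, resulting priorities, and the function name `assign_priority` are all hypothetical illustrations, not values from ISA-18.2 or from any particular site.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical consequence-severity levels an APD might define."""
    MINOR = 1
    MAJOR = 2
    SEVERE = 3

# Hypothetical time-to-respond bands: (upper bound in minutes, band name).
RESPONSE_BANDS = [(3.0, "fast"), (10.0, "medium"), (30.0, "slow")]

# Hypothetical priority matrix: (severity, response band) -> alarm priority.
PRIORITY_MATRIX = {
    (Severity.MINOR, "fast"): "medium",
    (Severity.MINOR, "medium"): "low",
    (Severity.MINOR, "slow"): "low",
    (Severity.MAJOR, "fast"): "high",
    (Severity.MAJOR, "medium"): "medium",
    (Severity.MAJOR, "slow"): "low",
    (Severity.SEVERE, "fast"): "high",
    (Severity.SEVERE, "medium"): "high",
    (Severity.SEVERE, "slow"): "medium",
}

def assign_priority(has_operator_action: bool, severity: Severity,
                    minutes_to_respond: float) -> str:
    """Apply the hypothetical APD rules to one candidate alarm."""
    # Rule 1: a legitimate alarm must require a defined operator action.
    if not has_operator_action:
        return "not an alarm"
    # Rule 2: priority follows severity and the time available to respond.
    for upper_bound, band in RESPONSE_BANDS:
        if minutes_to_respond <= upper_bound:
            return PRIORITY_MATRIX[(severity, band)]
    # Rule 3 (hypothetical): with more than 30 minutes available, use another
    # notification method rather than an operator alarm.
    return "not an alarm"
```

Writing the rules as data rather than as judgment calls makes them auditable: the same table is applied to every alarm during rationalization and can be rechecked during audits.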
Blunder #3: Using the wrong tools
Alarm and event archiving and proper analysis tools are required to ensure that time spent correcting problems delivers maximum return. All alarms should be reviewed to ensure consistent priorities, but it is inefficient, costly, and irresponsible to correct minor nuisances while problems that pose a serious risk to plant safety remain. Beyond simple analysis, tools are available for automatic change control, punch-list generation, discrepancy reporting, and project tracking. Give forethought, however, to how alarm information will be used once it is in a repository. Although these tasks can be performed without special software, it is not practical to do so; the effort is often so daunting that alarm management initiatives collapse under the weight of their own logistics. It is best to do away with paper trails for change control and with spreadsheets posing as master alarm databases.
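Discrepancy reporting becomes a simple comparison once a master alarm database exists: each alarm’s rationalized settings are checked against what is actually configured in the control system. The sketch below assumes both sources have already been loaded into Python dictionaries keyed by (tag, alarm type); in practice they would come from a database and a DCS export whose formats are site-specific. The tag names, attribute names, and the `discrepancy_report` function are hypothetical.

```python
from typing import Dict, List, Tuple

AlarmKey = Tuple[str, str]         # (tag, alarm type), e.g. ("FIC-101", "PVHI")
AlarmSettings = Dict[str, object]  # attribute -> value, e.g. {"limit": 120.0}

def discrepancy_report(master: Dict[AlarmKey, AlarmSettings],
                       configured: Dict[AlarmKey, AlarmSettings]) -> List[str]:
    """Compare the master alarm database with the live control system configuration."""
    report: List[str] = []
    for key in sorted(master.keys() | configured.keys()):
        tag, alarm_type = key
        if key not in configured:
            report.append(f"{tag}.{alarm_type}: in master database but not configured")
        elif key not in master:
            report.append(f"{tag}.{alarm_type}: configured but never rationalized")
        else:
            # Compare every attribute present on either side.
            for attr in sorted(master[key].keys() | configured[key].keys()):
                expected, actual = master[key].get(attr), configured[key].get(attr)
                if expected != actual:
                    report.append(
                        f"{tag}.{alarm_type}.{attr}: expected {expected!r}, found {actual!r}")
    return report

master = {
    ("FIC-101", "PVHI"): {"limit": 120.0, "priority": "high"},
    ("TI-205", "PVLO"): {"limit": 40.0, "priority": "low"},
}
configured = {
    ("FIC-101", "PVHI"): {"limit": 125.0, "priority": "high"},
    ("LI-310", "PVHI"): {"limit": 90.0, "priority": "medium"},
}
for line in discrepancy_report(master, configured):
    print(line)
# FIC-101.PVHI.limit: expected 120.0, found 125.0
# LI-310.PVHI: configured but never rationalized
# TI-205.PVLO: in master database but not configured
```

Running a report like this on a schedule, rather than by hand against a spreadsheet, is what keeps the master database and the live system from drifting apart.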
Blunder #4: Neglecting to benchmark
Benchmarking is vital to any serious improvement initiative: if current performance is not measured, progress cannot be accurately determined. The first step is to track alarm rates for several weeks to obtain a baseline, and then to assess how the plant’s current alarm levels compare with industry standards. With benchmarking and assessment complete, the next step is identifying opportunities for improvement. Below, in order of importance, are the key questions to answer during this assessment.
- Is the dynamic (real-time) alarm load acceptable for all operators?
- Does the dynamic alarm prioritization meet industry standards?
- Which tags are troublesome during steady-state operation?
- How does the alarm count configured in the distributed control system (DCS), measured as alarms per tag, compare to standards?
- How does the configured alarm distribution compare to standards?
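The last four questions in the list are largely counting exercises once alarm history and the DCS configuration have been exported. The sketch below assumes alarm events have already been reduced to lists of priorities and tags (for the troublesome-tag question, restricted to steady-state periods) and that the configuration is available as a mapping from tag to configured alarm types. The 80/15/5 low/medium/high split is a commonly cited guideline used here as an assumption; the function names are hypothetical, and the targets in your own APD should take precedence.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Commonly cited target priority split (assumption); substitute your APD targets.
TARGET_SPLIT = {"low": 0.80, "medium": 0.15, "high": 0.05}

def priority_split(priorities: List[str]) -> Dict[str, float]:
    """Fraction of alarms at each priority (annunciated or configured alarms)."""
    counts = Counter(priorities)
    total = len(priorities) or 1  # avoid division by zero on empty input
    return {p: counts[p] / total for p in TARGET_SPLIT}

def bad_actors(alarm_tags: List[str], n: int = 10) -> Tuple[List[Tuple[str, int]], float]:
    """The n most frequent alarm tags and their combined share (%) of all alarms."""
    counts = Counter(alarm_tags)
    top = counts.most_common(n)
    share = 100.0 * sum(count for _, count in top) / max(len(alarm_tags), 1)
    return top, share

def alarms_per_tag(configured: Dict[str, List[str]]) -> float:
    """Average number of configured alarms per tag, given tag -> alarm types."""
    if not configured:
        return 0.0
    return sum(len(types) for types in configured.values()) / len(configured)
```

A handful of tags producing a large share of all alarms is usually the quickest win, which is why the bad-actor list is worth generating before any other corrective work.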
Tracking key performance indicators shows whether alarm rates are improving or deteriorating, but the true measure of a successful alarm management system is a reduction in unplanned outages, safety incidents, and environmental excursions.
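The first question in the benchmarking list, and the improving-or-deteriorating comparison, both rest on alarm-load figures measured per operator position. The sketch below computes a few such KPIs from archived alarm timestamps for a baseline period, so the same function can be rerun on a later period and compared. The threshold of more than 10 alarms in 10 minutes is a commonly cited ISA-18.2 flood figure and is an assumption here; fixed 10-minute buckets are a simplification of a full flood analysis, and the function names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List

INTERVAL = timedelta(minutes=10)
FLOOD_LIMIT = 10  # assumed flood threshold: more than 10 alarms in 10 minutes

def alarm_load(alarm_times: List[datetime], start: datetime, end: datetime) -> Dict[str, float]:
    """Alarm-load KPIs for one operator position over [start, end); requires end > start."""
    times = [t for t in alarm_times if start <= t < end]
    n_intervals = max(1, math.ceil((end - start) / INTERVAL))
    # Bucket alarms into fixed 10-minute intervals counted from `start`.
    per_interval = Counter(int((t - start) / INTERVAL) for t in times)
    floods = sum(1 for count in per_interval.values() if count > FLOOD_LIMIT)
    days = (end - start) / timedelta(days=1)
    return {
        "alarms_per_day": len(times) / days,
        "avg_per_10_min": len(times) / n_intervals,
        "peak_per_10_min": max(per_interval.values(), default=0),
        "pct_time_in_flood": 100.0 * floods / n_intervals,
    }

def trend(baseline: Dict[str, float], current: Dict[str, float]) -> Dict[str, str]:
    """Label each KPI relative to the baseline (lower is better for all of them)."""
    return {
        k: "improving" if current[k] < baseline[k]
        else "deteriorating" if current[k] > baseline[k]
        else "unchanged"
        for k in baseline
    }
```

As the paragraph above notes, these figures only track the alarm system itself; the real test remains fewer unplanned outages, safety incidents, and environmental excursions.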
Blunder #5: Only tracking alarms
A common mistake is to track only alarms, which is not sufficient: alarm rationalization requires more than one type of data. For example, when an alarm occurs, it is necessary to know whether an operator actually responded to it. If the operator did not respond, the alarm is likely a nuisance alarm. Tracking operator actions is an effective way to identify control problems and automation opportunities and to audit the effectiveness of the alarm strategy. Examine the ratio of operator actions to audible process alarms to identify poor alarm strategies; under a strategy in which every alarm requires operator intervention, this ratio should exceed 2:1.
Other data worth tracking relate to operator actions, including controller setpoint changes, mode changes, and system errors. If a controller’s mode or output is repeatedly changed, it is a clear sign that the loop needs fixing, and coupling the action data with controller performance data allows the loop’s problems to be diagnosed quickly. If a controller’s setpoint is changed frequently and the controller has no supervisory control, the automation engineer must resolve the discrepancy; new automation strategies can free the operator to focus on pushing limits rather than maintaining process stability. Process variable history is also important for determining some alarm deadband settings and for performing engineering reviews before implementation. Finally, watch for poorly performing loops whose output the operator manipulates manually to keep the process under control. Such loops are typically not a priority, and their work orders sit at the bottom of the maintenance list, yet the operator spends significant time managing them.
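Both checks in this paragraph are straightforward once alarms and operator actions are archived together. The sketch below assumes each event has been reduced to a (tag, timestamp) pair. The 10-minute response window and the function names are hypothetical, and matching an action to an alarm by tag alone is a simplification, since the correct response to an alarm may be made on a different tag.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[str, datetime]  # (tag, timestamp) from the alarm and event archive

def action_to_alarm_ratio(alarms: List[Event], actions: List[Event]) -> float:
    """Operator actions per audible process alarm over the same period."""
    if not alarms:
        return float("nan")  # undefined when no alarms occurred
    return len(actions) / len(alarms)

def unanswered_alarms(alarms: List[Event], actions: List[Event],
                      window: timedelta = timedelta(minutes=10)) -> List[Event]:
    """Alarms with no operator action on the same tag within `window` (nuisance candidates)."""
    return [
        (tag, t_alarm)
        for tag, t_alarm in alarms
        if not any(a_tag == tag and t_alarm <= t_act <= t_alarm + window
                   for a_tag, t_act in actions)
    ]
```

A ratio below the 2:1 figure cited above, or a long list of unanswered alarms, points to alarms that belong on the rationalization agenda.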
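The loop checks described above can be automated by counting operator changes per controller over a review period. The sketch below assumes change events have been reduced to (loop tag, change kind) pairs with kinds "mode", "output", and "setpoint", and that a lookup records which loops have supervisory control. The threshold of 20 changes, the event format, and the function name are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def loops_needing_attention(changes: Iterable[Tuple[str, str]],
                            supervised: Dict[str, bool],
                            threshold: int = 20) -> Dict[str, str]:
    """Flag loops whose operator change counts exceed `threshold` in the review period.

    `changes` holds (loop tag, change kind) pairs; `supervised` maps a loop tag
    to whether it has supervisory control. Formats and threshold are assumptions.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for loop, kind in changes:
        counts[loop][kind] += 1
    findings: Dict[str, str] = {}
    for loop, c in counts.items():
        if c["mode"] + c["output"] > threshold:
            # The operator keeps taking the loop out of automatic or driving the output.
            findings[loop] = "repeated mode/output changes: the loop needs fixing"
        elif c["setpoint"] > threshold and not supervised.get(loop, False):
            # The operator is effectively acting as the supervisory controller.
            findings[loop] = "frequent setpoint changes, no supervisory control: automation opportunity"
    return findings
```

Pairing this list with controller performance data, as the paragraph suggests, helps separate tuning problems from instrument or process problems.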
Blunder #6: Assuming users will read documentation
It is naïve to expect personnel to read and examine every manual and handbook thoroughly, and handing operators proper documentation is no substitute for practical training. The easiest way to undermine effective alarm management is to implement a solution without giving personnel the hands-on training they need. A real-world example illustrates the point. A large petrochemical plant went to great effort to improve its alarm system performance through alarm rationalization. Once the new settings were designed, the changes were uploaded to the control system over the span of 45 minutes. Although the operator knew about the rationalization process and that the changes were being made, he did not understand their ramifications. Afterward, the console was quiet, and he was very concerned that something was wrong with the computer control system or that the changes had been too radical. It took multiple shifts before he was comfortable that he would still get an alarm for a real problem with enough time to respond. With a culture shift of this size, extra time for training should be expected.
Blunder #7: Cutting resource corners
Alarm rationalization is a labor-intensive effort that yields great results when executed correctly. Even so, facilities often try to minimize costs by shrinking the rationalization team. If the right personnel are not in the room, the alarm determinations will be incomplete. It is disturbingly common for companies to exclude the most important resource from rationalization meetings: the operator. Operators are the end users and the primary stakeholders in alarm optimization, and if they are excluded from the rationalization process, the project will fail. Instrument technicians, automation engineers, process engineers, and field operators cannot stand in for them; only an experienced console operator can fill the “operator” role. This person fights alarms and unit problems throughout the day and across shifts, and that knowledge is invaluable during rationalization.
Alarm rationalization is the process of applying operational experience to alarm system design. Although operators are the most important participants in this process, they cannot carry this burden alone. Without a facilitator who is familiar with alarm rationalization, the rationalization project will take longer than it should, yield poor results, and most likely have to be repeated.
Finally, alarm rationalization requires an engineering review before implementation. This review ensures that the results are consistent with hazard and operability studies (HAZOPs) and safety integrity level (SIL) studies. The process, unit, or contact engineer owns this role.
Blunder #8: Not incorporating alarm rationalization in incident investigations/HAZOPs/LOPAs
The biggest sources of new nuisance alarms after rationalization are incident investigations, HAZOPs, and layer of protection analyses (LOPAs). After a facility has invested time and effort in rationalizing its alarms, these procedures are allowed to add alarms without proper rationalization because they have identified a safety issue, a loss of production, or an environmental excursion. Yet with the right personnel already in the room, at least two rationalization topics have likely been covered: causes and consequences. Reviewing the remaining topics (corrective action, time to respond, alarm setpoint, and severity) adds only a few minutes and results in rationalized alarms. The solution is to modify the procedures for incident investigations, HAZOPs, and LOPAs to incorporate alarm rationalization whenever an alarm is defined.
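One way to build rationalization into these procedures is to capture every proposed alarm in the same record used during rationalization and refuse to implement it until every topic is filled in. The sketch below is a minimal version of that idea; the class name, field names, and example tag are hypothetical, with the fields taken from the topics listed above.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class AlarmRationalization:
    """Hypothetical record of the topics reviewed when an alarm is rationalized."""
    tag: str
    causes: List[str] = field(default_factory=list)
    consequences: List[str] = field(default_factory=list)
    corrective_action: Optional[str] = None
    time_to_respond_min: Optional[float] = None
    setpoint: Optional[float] = None
    severity: Optional[str] = None

    def missing_topics(self) -> List[str]:
        """Topics still blank; the alarm should not be implemented until this is empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "", [])]

# A new alarm from a HAZOP has often covered causes and consequences only.
rec = AlarmRationalization(tag="PI-410", causes=["Compressor trip"],
                           consequences=["Flare release"])
print(rec.missing_topics())
# ['corrective_action', 'time_to_respond_min', 'setpoint', 'severity']
```

The remaining four fields are exactly the "few more minutes" of discussion the paragraph describes.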
Blunder #9: Adding dynamic alarming too early
Advances in dynamic alarm techniques have led many facilities to believe that dynamic alarming is the solution to their alarm problems. But it is easier to suppress alarms than to determine their causes and correct them. Rationalization needs to be completed before dynamic alarming is applied, and in most cases rationalization will resolve the alarm issue by itself. Part of the rationalization process is to identify opportunities for dynamic alarming. Facilities must also consider how dynamic alarm strategies will be maintained: they become part of the system and will need to be reevaluated whenever the control strategy or process changes.
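State-based alarming, in which alarm limits or enablement change with the operating state, is one common form of dynamic alarming. The sketch below illustrates the maintenance point rather than a complete scheme: every dynamic setting is stored as an explicit override of a rationalized baseline limit, so overrides can be listed and re-reviewed when the process or control strategy changes, and a check flags any override for an alarm that was never rationalized. The states, tags, limits, and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

AlarmKey = Tuple[str, str]  # (tag, alarm type)

# Rationalized baseline limits (hypothetical tags and values).
BASE_LIMITS: Dict[AlarmKey, float] = {
    ("TI-205", "PVLO"): 150.0,
    ("FI-101", "PVLO"): 40.0,
}

# State-specific overrides (hypothetical states); None suppresses the alarm in that state.
STATE_OVERRIDES: Dict[str, Dict[AlarmKey, Optional[float]]] = {
    "running": {},
    "startup": {("TI-205", "PVLO"): 60.0},
    "shutdown": {("TI-205", "PVLO"): None, ("FI-101", "PVLO"): None},
}

def active_limit(state: str, key: AlarmKey) -> Optional[float]:
    """Limit in force for `key` in `state`, falling back to the rationalized baseline."""
    return STATE_OVERRIDES[state].get(key, BASE_LIMITS[key])

def low_alarm_active(state: str, key: AlarmKey, value: float) -> bool:
    """True if a low alarm should be annunciated for `value` in `state`."""
    limit = active_limit(state, key)
    return limit is not None and value < limit

def unrationalized_overrides() -> List[Tuple[str, AlarmKey]]:
    """Overrides that refer to alarms with no rationalized baseline (should be empty)."""
    return [(state, key) for state, overrides in STATE_OVERRIDES.items()
            for key in overrides if key not in BASE_LIMITS]
```

Keeping the overrides in one reviewable table, tied to rationalized baselines, is what distinguishes a maintained dynamic strategy from ad hoc suppression.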
Blunder #10: Lack of accountability
Failure to assign roles and responsibilities is the most common and most deadly oversight in an alarm management project. One way to address it is to encourage “accountability through visibility”: give everyone access to one another’s project data, which encourages plant personnel to work together. There may be resistance, and the approach may sound intense, but it is effective in practice and the end result is improved plant operations.
It is best to define maintenance tasks and assign responsibilities in the alarm philosophy document, and to keep them simple, both on paper and in day-to-day practice, so that support is sustained. Assigning responsibilities also gives personnel an opportunity to participate in system installation and verification. People who helped with the initial configuration have a sense of ownership and are more inclined to use the new technologies.
An alarm management program can significantly improve plant safety, reliability, and profitability, but only if it is deployed properly. By following the recommended life cycle methodology and avoiding these common mistakes, facilities can build an effective alarm management program that makes plant personnel more productive and plant operations more reliable.