In the aftermath of some high-profile national security incidents, investigations that start with the catastrophe or near miss and work backwards have shown a clear series of events or warning signs which, if recognised, might have allowed the incident to be prevented. The security services have subsequently been criticised for being ‘unable to connect the dots’ and intervene appropriately. Similarly, investigations into process safety incidents that work backwards from the loss of containment or other more serious outcome often show a clear series of events or warning signs which, if recognised and acted upon, might have prevented the incident. In both the security arena and the process industries, however, overlaying the context of other activities, information, and distractions on the clear ‘dots’ uncovered by the top-down investigation often reveals a picture in which the warning signs are much less visible, or perhaps virtually unrecognisable. In a process safety incident, factors such as alarm flooding, or operator attention being focused on the plant area where the upset is most visible, can obscure the warning signs or mask the escalation of the incident.
This paper outlines a number of principles to consider for improving the effectiveness of critical alarms in the challenging real-world environment of an operating plant during a major upset or incident. Critical alarms are defined here as those where the alarm, coupled with the expected operator response, is considered to be a ‘layer of protection’ against a major accident hazard scenario. The effectiveness (or otherwise) of a critical alarm depends on a combination of engineering, organisational, and human aspects including design, ergonomics, training, safety culture, and auditing. Some important factors include:
- Assessment methodologies that clearly highlight when the alarm and associated response would be considered ‘critical’, taking into account whether there is sufficient time available for intervention between the alarm and the dangerous occurrence
- Designing the alarm system (instrument, signal processor, annunciator) to meet the required integrity
- Having the alarm appropriately segregated from lower priority plant alarms and displays (e.g. separately annunciated or always visible) to ensure it isn't lost in an alarm flood
- Having pre-determined, documented, and readily available response expectations
- Providing adequate operator training, including an overview of the hazard the alarm protects against, the likely causes, the potential consequences, and the expected response
- Auditing to ensure that the actual response times and actions are reasonable against the assumptions used in the risk assessment
- Having a culture where critical alarms are consistently acted on (as opposed to a culture where the first response is to check the instrument to ensure it isn't faulty, for example)
- Checks that the alarms are not rendered ineffective by coming in spuriously or under non-hazardous conditions, resulting in their being ignored during a real incident
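Two of the factors above, the time available for intervention and the audit of actual response times, lend themselves to a simple numerical check. The following is a minimal sketch only, not a method taken from this paper: the class `CriticalAlarm`, its field names, the tag `LAH-101`, and the 50% margin are all hypothetical values chosen for illustration, and any real criterion would come from the site's own risk assessment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CriticalAlarm:
    """Hypothetical record of the assumptions made for one critical alarm."""
    tag: str                         # illustrative alarm tag, not from the paper
    time_to_hazard_s: float          # assumed time from alarm to dangerous occurrence
    assumed_response_time_s: float   # operator response time used in the risk assessment


def has_time_to_intervene(alarm: CriticalAlarm, margin: float = 0.5) -> bool:
    """Return True if the assumed response fits within a fraction of the available time.

    The default margin of 0.5 is an assumption for illustration only.
    """
    return alarm.assumed_response_time_s <= margin * alarm.time_to_hazard_s


def audit_response_times(alarm: CriticalAlarm, observed_s: Sequence[float]) -> List[float]:
    """Return the observed response times that exceed the risk assessment assumption."""
    return [t for t in observed_s if t > alarm.assumed_response_time_s]


# Example with made-up numbers: 20 minutes to the hazard, 10 minutes assumed response.
alarm = CriticalAlarm(tag="LAH-101", time_to_hazard_s=1200.0, assumed_response_time_s=600.0)
time_ok = has_time_to_intervene(alarm)
exceedances = audit_response_times(alarm, [300.0, 650.0, 600.0, 900.0])
```

Any entries in `exceedances` would indicate that the response time assumed in the risk assessment is not being achieved in practice, which is the kind of finding the audit factor above is intended to surface.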
In an operating company, being able to meet these success factors implies a cohesive process safety management system where the design processes, training systems, operating teams, audit programmes, and feedback mechanisms are clearly connected and the outputs from one organisation or activity become the inputs for those downstream. It is also important that the organisation design provides individuals who are in a position to see the entire range of activities and ensure that the interactions are robust and effective.