Maximising the Value of Existing Process Safety Data: Three Case Studies from Empirisys | AIChE

Vast amounts of data are being under-utilised in high-hazard industries. At Empirisys, we combine our process safety and data science expertise to maximise the value of existing data, ultimately to support, not replace, decision-making to manage risk. The following three case studies exemplify our approach: Detect connects operational and organisational data sources to measure and track Performance Influencing Factors (PIFs); Boost uses Large Language Models (LLMs) to streamline safety observations and increase their utility; and our enhanced barrier model offers a new understanding of operational risk through data modelling and visualisation.

Detect: Tools to Avoid Human Error

Detect is an application that connects to operational and organisational data sources, including control of work, maintenance and incident applications. It is being developed collaboratively by Empirisys and BP and has been extensively reviewed and tested by industry experts and through academic research. Detect uses a machine learning algorithm to identify the underlying factors that contribute to incidents, commonly known as Performance Influencing Factors (PIFs).

PIFs are conditions that influence an individual's or team's performance in safety-critical environments. They can encompass a wide range of elements, including safety culture, leadership, equipment reliability, training, procedures and communication. The goal of Detect is to help users understand and manage the factors that are essential to safe and efficient operations in high-risk industries. Each PIF score is calculated by averaging the scores of its sub-components, which we call Markers. The relevance and weight of each Marker within a PIF are determined by a machine learning algorithm that analyses past incidents to estimate how strongly each Marker is associated with the likelihood of incidents.
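
The text describes each PIF as an average of its Marker scores, with each Marker's weight learned from past incidents. The following is a minimal sketch, assuming that combination is a weighted mean and that the weights have already been produced by the (unspecified) machine learning model. The Marker class, the pif_score function, the Marker names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    score: float   # normalised Marker score, assumed here to be on a 0-1 scale
    weight: float  # relevance weight, assumed to come from a trained model


def pif_score(markers: list[Marker]) -> float:
    """Weighted mean of Marker scores; zero-weight Markers do not contribute."""
    total_weight = sum(m.weight for m in markers)
    if total_weight == 0:
        raise ValueError("at least one Marker must have a non-zero weight")
    return sum(m.score * m.weight for m in markers) / total_weight


# Hypothetical Markers for a hypothetical "Procedures" PIF
procedures = [
    Marker("overdue_procedure_reviews", score=0.8, weight=0.5),
    Marker("permits_without_risk_assessment", score=0.4, weight=0.3),
    Marker("procedure_deviation_reports", score=0.6, weight=0.2),
]
print(round(pif_score(procedures), 2))  # 0.64
```

Setting a Marker's weight to zero removes it from the PIF entirely, which is one simple way to reflect a model finding that the Marker has little relationship with incidents.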

A key strength of Detect is that PIFs and their Markers can be monitored together in the same tool. The machine learning algorithm helps to determine which Markers are of interest (i.e., those that show a relationship with incidents at the same asset) and which have relatively little impact. This keeps monitoring focused on relevant data and brings the Markers that are useful indicators of incidents into focus. It is important to note that a causal relationship between Markers and incidents is not assumed; rather, the algorithm provides a better understanding of the relationship between each Marker and incident occurrence. This relationship is rarely linear, so a RAG (Red, Amber, Green) status is used to indicate the Marker values at which incidents are more likely to happen. This allows each Marker to be monitored in a bespoke, statistically validated manner rather than with a one-size-fits-all approach.
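
Because the relationship is rarely linear, a single "higher is worse" threshold can misclassify Marker values. The sketch below illustrates one simple way to derive per-band RAG statuses from history: split a Marker's range into bands and compare each band's incident rate with the overall rate. The band edges, the 1.0x and 1.5x ratios, the function names and the data are all hypothetical, and the statistical validation used in Detect is not described in the text; a real version would at least need a minimum sample size or significance test per band, which this sketch omits.

```python
from bisect import bisect_right


def fit_rag_bands(history, edges, amber_ratio=1.0, red_ratio=1.5):
    """Assign a RAG status to each value band of one Marker.

    history: (marker_value, incident_occurred) pairs from past periods
    edges:   sorted band boundaries; [10, 20] gives bands <10, 10 to <20, >=20
    A band is Red if its incident rate is at least red_ratio x the overall
    rate, Amber if at least amber_ratio x, otherwise Green. Bands with no
    history get None (insufficient data).
    """
    overall = sum(incident for _, incident in history) / len(history)
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [incidents, periods]
    for value, incident in history:
        band = bisect_right(edges, value)
        counts[band][0] += incident
        counts[band][1] += 1
    statuses = []
    for incidents, periods in counts:
        if periods == 0:
            statuses.append(None)
            continue
        rate = incidents / periods
        if overall > 0 and rate >= red_ratio * overall:
            statuses.append("Red")
        elif overall > 0 and rate >= amber_ratio * overall:
            statuses.append("Amber")
        else:
            statuses.append("Green")
    return statuses


def rag_status(value, edges, statuses):
    """Look up the RAG status for a new Marker value."""
    return statuses[bisect_right(edges, value)]


# Hypothetical history with a non-monotonic relationship: incidents are most
# frequent at low values and least frequent in the middle band.
history = (
    [(5, True)] * 2 + [(5, False)] * 2
    + [(15, False)] * 4
    + [(25, True)] + [(25, False)] * 3
)
edges = [10, 20]
statuses = fit_rag_bands(history, edges)
print(statuses)                         # ['Red', 'Green', 'Amber']
print(rag_status(12, edges, statuses))  # Green
```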

Boost: Utilising NLP & AI to Extract Insight from Observation Datasets

Boost is an AI-enhanced observation collection, analysis and reporting tool developed by Empirisys to increase the value of safety observations. Boost ingests observation data and predicts relevant supplementary information using a combination of machine learning and large language models. Its analytical modelling pipeline produces consistent, complete, high-quality observation data that can be used as a process safety indicator.

The core functionality of Boost was presented at the Global Conference on Process Safety and Big Data in 2023. This presentation will explore a selection of analytical techniques, developed in the months since, that maximise the value and utilisation of the data. Boost development has taken a data-centric approach to improving the quality of the available observation data. Combined with careful selection of appropriate machine learning techniques, this has improved the quality of the insight that analysis can extract from existing data.

Data-centric model development improves performance by increasing the quality of a model's input data, rather than following the ‘traditional’ machine learning approach of devising a model that can ‘replicate’ a training dataset. This approach has been applied to a supervised machine learning classification task and has increased classification accuracy significantly. To translate it to the broader classification tasks that use Large Language Models (LLMs), Empirisys is supervising a Master’s project focused on identifying measures of intrinsic observation quality and reliability. For a subjective, text-based classification task, understanding the intrinsic quality of the input data will guide the selection of targeted analytical techniques and reveal information that can be used to encourage high-quality observation submissions and accurate analysis. The project uses a dataset of observations labelled by subject matter experts, reflecting a close collaboration between process safety experts and data scientists. Pertinent findings will be included in the presentation.
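
The text does not describe the specific quality measures being developed. As a generic illustration of the data-centric idea, the sketch below runs two input-quality checks that are commonly applied before training a classifier: finding near-identical observations that were given different labels, and flagging observations too short to classify reliably. The normalisation rule, the five-word threshold, the record layout, the function names and the example observations are all hypothetical.

```python
import re
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case and collapse punctuation/whitespace so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def audit_observations(records, min_words=5):
    """Return (conflicts, too_short) for (obs_id, text, label) tuples.

    conflicts: normalised text -> (sorted labels, ids) where one text got
               more than one label, a sign of label noise
    too_short: ids of observations with fewer than min_words words
    """
    labels_by_text = defaultdict(set)
    ids_by_text = defaultdict(list)
    too_short = []
    for obs_id, text, label in records:
        key = normalise(text)
        labels_by_text[key].add(label)
        ids_by_text[key].append(obs_id)
        if len(key.split()) < min_words:
            too_short.append(obs_id)
    conflicts = {
        key: (sorted(labels), ids_by_text[key])
        for key, labels in labels_by_text.items()
        if len(labels) > 1
    }
    return conflicts, too_short


# Hypothetical labelled observations
records = [
    ("OBS-1", "Scaffold tag missing on level 2 walkway.", "Access"),
    ("OBS-2", "scaffold tag missing on Level 2 walkway", "Housekeeping"),
    ("OBS-3", "Leak", "Loss of containment"),
    ("OBS-4", "Hose left across walkway near pump P-101, trip hazard.", "Housekeeping"),
]
conflicts, too_short = audit_observations(records)
print(too_short)  # ['OBS-3']
for labels, ids in conflicts.values():
    print(ids, labels)  # ['OBS-1', 'OBS-2'] ['Access', 'Housekeeping']
```

Checks like these change the data rather than the model: the conflicting pair would go back to a subject matter expert for relabelling, and the short observation could prompt a request for more detail at submission time.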

In addition, an alternative view of the observation data has been developed using unsupervised topic modelling, which identifies clusters of similar text without injecting any prior industry knowledge. This retrospective analysis uses LLM text embeddings to identify patterns within observation datasets. It is used to find similar observations, validate other classification tasks and aggregate information into meaningful groups. Surfacing insight from observation data in this way highlights the value that can be generated from existing data sources.
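
The text does not name the embedding model or the clustering algorithm. The sketch below assumes observation embeddings have already been obtained from some LLM embedding model; hand-written three-dimensional vectors stand in for them here. It uses spherical k-means, one common choice that is not necessarily the method used in Boost, to group observations by cosine similarity, and a nearest-neighbour lookup to find similar observations. Function names and data are hypothetical.

```python
import math


def unit(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Group embeddings into k clusters by cosine similarity.

    Centroids start at the first k vectors to keep the example deterministic;
    a real run would use a better initialisation (e.g. k-means++) and choose k
    with care.
    """
    vecs = [unit(v) for v in vectors]
    centroids = vecs[:k]
    assignment = []
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [v for v, a in zip(vecs, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = unit([sum(col) / len(members) for col in zip(*members)])
    return assignment


def most_similar(index, vectors, top_n=3):
    """Indices of the top_n observations most similar to observation `index`."""
    vecs = [unit(v) for v in vectors]
    scored = [(cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index]
    return [i for _, i in sorted(scored, reverse=True)[:top_n]]


observations = [
    "Gas detector alarm test overdue",
    "Dropped object from scaffold",
    "Fixed gas detector found faulty",
    "Unsecured tools left at height",
]
# Stand-in embeddings; real LLM embeddings have hundreds of dimensions.
embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.2, 0.1],
    [0.1, 0.1, 0.95],
]
print(spherical_kmeans(embeddings, k=2))                      # [0, 1, 0, 1]
print(observations[most_similar(0, embeddings, top_n=1)[0]])  # Fixed gas detector found faulty
```

Cluster assignments produced this way can be cross-tabulated against the labels from a supervised classifier, which is one way such clusters could help validate other classification tasks.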

Barrier Model: The Power of Data Modelling & Visualisation

There are clearly beneficial use cases for AI in unlocking hidden insight. However, AI is only one of many tools at the data scientist's disposal and is used only when it adds value. The enhanced hardware barrier model, developed in collaboration with Anasuria Operating Company (AOC), shows how restructuring data and visualising it in a new, intuitive way can produce valuable insight.

The objective was to create useful insight from existing process safety data to support decision-making to manage risk. The work was split into two phases. In Phase 1, Safety and Environmental Critical Elements (SECEs), as defined by their performance standards, were mapped to Major Accident Hazards (MAHs) using the AOC safety case. In Phase 2, equipment was mapped to SECEs and MAHs. After a data quality assessment, risk criteria were established for Process Safety Performance Indicators (PSPIs), and the barrier model insights were then visualised in Power BI.

Managing process safety requires an understanding of the status of people, process and plant. The focus of this work was on process and plant. Existing PSPIs were used to measure the integrity of hardware barriers and SECEs, as well as of MAHs and their related Quantitative Risk Assessment (QRA) Events. Vulnerabilities relating to people were understood through AOC’s competence management processes.

Before engaging Empirisys, AOC monitored PSPIs, but only at the barrier level. In addition, because PSPI data spanned several disparate databases, there was no facility to drill down into individual records to better understand identified risks. The aim was to build on existing PSPIs to provide more granularity and insight. The approach was to design a data model with PSPIs recorded at the lowest level, i.e. by equipment, and aggregated upwards. This was achieved by mapping equipment to SECEs and QRA Events based on each item's function, parent system and physical location on the asset. The mapping of equipment to barriers and MAHs could then be inferred. Crucially, all insights had a direct line of sight back to the source data to support decision-making.
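
The bottom-up data model can be sketched as two small mapping tables plus a roll-up function. This is a minimal sketch, assuming each equipment-level PSPI record has already been given a RAG status against the risk criteria, and that a higher-level item takes the worst status of the records mapped to it (a conservative roll-up rule that the text does not specify). Each aggregated status keeps its contributing record IDs, mirroring the line of sight back to source data. Equipment tags, SECE and MAH names, record IDs and function names are hypothetical; QRA Events could be handled with one more mapping table in the same way.

```python
from collections import defaultdict

# Ordering used to pick the worst status when rolling up (an assumption).
SEVERITY = {"Green": 0, "Amber": 1, "Red": 2}

# Hypothetical mappings: equipment -> SECEs (from function, parent system and
# location) and SECEs -> MAHs (from the safety case).
EQUIPMENT_TO_SECES = {
    "PSV-101": ["Pressure relief"],
    "PSV-102": ["Pressure relief"],
    "GD-201": ["Gas detection"],
}
SECE_TO_MAHS = {
    "Pressure relief": ["Hydrocarbon release", "Vessel overpressure"],
    "Gas detection": ["Hydrocarbon release"],
}


def compose(first, second):
    """Infer an equipment -> MAH mapping by chaining equipment -> SECE -> MAH."""
    return {
        tag: sorted({target for sece in seces for target in second.get(sece, [])})
        for tag, seces in first.items()
    }


def roll_up(records, mapping):
    """Aggregate equipment-level PSPI records to the items they map to.

    records: (record_id, equipment_tag, status) tuples
    mapping: equipment_tag -> list of higher-level items (SECEs, MAHs, ...)
    Returns item -> (worst status, contributing record ids), so every
    aggregated status can be traced back to its source records.
    """
    worst = {}
    sources = defaultdict(list)
    for record_id, tag, status in records:
        for item in mapping.get(tag, []):
            if item not in worst or SEVERITY[status] > SEVERITY[worst[item]]:
                worst[item] = status
            sources[item].append(record_id)
    return {item: (worst[item], sources[item]) for item in worst}


records = [
    ("WO-1", "PSV-101", "Green"),
    ("WO-2", "PSV-102", "Amber"),
    ("WO-3", "GD-201", "Red"),
]
sece_view = roll_up(records, EQUIPMENT_TO_SECES)
mah_view = roll_up(records, compose(EQUIPMENT_TO_SECES, SECE_TO_MAHS))
print(sece_view["Pressure relief"])     # ('Amber', ['WO-1', 'WO-2'])
print(mah_view["Hydrocarbon release"])  # ('Red', ['WO-1', 'WO-2', 'WO-3'])
```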

For the Proof of Concept (PoC), our scope included two MAHs and a subset of dynamic¹ hardware barriers. Further work is ongoing to scale up the enhanced barrier model and integrate the data with the GDi Vision tool to provide a risk-informed digital walkthrough of the asset.

The three case studies demonstrate some of the additional insights and outputs that can be uncovered from existing operational data. By joining datasets and visualising them intuitively in our Proof-of-Concept enhanced barrier model, we gave our client, AOC, a more granular and useful view of its MAH, SECE and QRA data. With Detect, developed in partnership with BP, we modelled the relationship between different datasets and process safety events, enabling users to focus on the Performance Influencing Factors most relevant in a process safety context. Finally, with Boost, we used large language models to improve the quality of observations and the insight gained from them. Data science offers a wide range of techniques, from data mapping all the way to advanced AI models. We match the methodology and technique to the data and use case to unlock a new level of insight. Throughout, we maintain a direct line of sight to the ultimate purpose of our projects and product development: to ensure the safety of the industry and every person working in it.