
(47a) Outlier Detection and Analysis in Batch and Continuous Processes

Authors 

Ryan, P. - Presenter, Response Process Consulting LLC

Title:  Outlier Detection and Analysis in Batch and
Continuous Processes

Author/Presenter:  Peter J. Ryan, Ph.D., P.E.

Author/Presenter email:  peter.ryan@responsepc.com

Company:  Response Process Consulting LLC

Motivation

All manufacturing sectors, continuous
or batch, gather process data for archiving and analysis purposes.  Often the
sheer volume and the quality of the gathered data make it difficult to use
this resource effectively.  Major drawbacks in the quality of the available
data include:

·         large gaps in the process data

·         noise and poor signal-to-noise ratios

·         correlated data

·         limited accuracy

·         limited precision

New methods have been
developed to handle these issues and to develop process models based on this
resource.  Specifically, a new approach to handling missing data and
reconstructing quality data based on the observed (archived) data is
presented.  Once the models are developed, outliers can be identified in the
continuous or batch data, and relationships between the Key Process Indicators
(KPIs) and the upstream process variables can be established.
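
The abstract does not spell out the reconstruction approach itself.  Purely as a
generic point of reference (and not the method presented in this work), one widely
used baseline fills missing entries iteratively from a low-rank PCA model of the
observed data; a minimal Python sketch of that generic idea follows, with the
function name and parameters chosen for illustration only.

# Generic illustration only (not the approach presented in this work):
# iterative PCA-based reconstruction of missing process data.  Gaps are first
# filled with column means, a low-rank PCA model is fit, the missing entries
# are replaced by the model reconstruction, and the loop repeats to convergence.
import numpy as np
from sklearn.decomposition import PCA

def reconstruct_missing(X, n_components=2, n_iter=50, tol=1e-6):
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                              # mask of the data gaps
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        pca = PCA(n_components=n_components).fit(X_filled)
        X_hat = pca.inverse_transform(pca.transform(X_filled))
        X_new = np.where(missing, X_hat, X)            # observed values stay fixed
        if np.allclose(X_new, X_filled, atol=tol):
            break
        X_filled = X_new
    return X_filled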

 

Approach

Often, steady state or
dynamic models are available to describe a process.  However, these
first-principle models often do not have the granularity needed to describe
product specifications such as color, turbidity or solvent loss due to
entrainment in a separator.  Machine learning can be used to examine large
historical process data sets and determine the leading controlled variables of
a process.  The machine learning methods first fit the data to a specified
model and then use the model to explore the data space.  Unsupervised learning
gives the fitting method full latitude to determine what distinguishes the
observations in the data.  Supervised learning requires the fitting method to
consider both the archived process data and measured quality data (acquired
off-line from the process).  An example of unsupervised modeling is Principal
Component Analysis (PCA); an example of supervised modeling is Partial Least
Squares (PLS).  Both methods can be used to model the historical process data
and find outliers by plotting the scores of the two principal components that
capture the most variability in the data.  The scores plots are examined for
clusters that separate production meeting product specifications from
production that does not.  The clusters representing production runs that do
not meet product specifications can be further examined to discover the
upstream process variables that cause the product to miss
specification.  While visual inspection has been described, statistical metrics
such as the Squared Prediction Error (SPE) and Hotelling's T² metric
can be calculated to find the same results in the data.
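
As a point of reference for how these quantities are computed, the following is a
minimal Python sketch, assuming the historical data are already arranged as an
observations-by-variables matrix; the random placeholder data, the two-component
model, and the 95% limits are illustrative assumptions, not values from this work.

# Minimal sketch of the scores, Hotelling's T^2 and SPE calculations.
# X stands in for an (observations x variables) block of archived process data;
# the random data, two-component model and 95% limits are illustrative only.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(200, 12))   # placeholder historical data

Xs = StandardScaler().fit_transform(X)                 # mean-center and scale
pca = PCA(n_components=2).fit(Xs)
scores = pca.transform(Xs)                             # (t1, t2) for the scores plot

# Hotelling's T^2: distance from the model center within the model plane
t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)

# SPE (Q statistic): squared distance from the model plane
residuals = Xs - pca.inverse_transform(scores)
spe = np.sum(residuals**2, axis=1)

# Approximate 95% control limits (F-based for T^2, chi-square fit for SPE)
n = Xs.shape[0]
a = pca.n_components_
t2_lim = a * (n - 1) / (n - a) * stats.f.ppf(0.95, a, n - a)
g = spe.var() / (2.0 * spe.mean())
h = 2.0 * spe.mean()**2 / spe.var()
spe_lim = g * stats.chi2.ppf(0.95, h)

outliers = np.where((t2 > t2_lim) | (spe > spe_lim))[0]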

 

Results

Examples of modeling both
continuous and batch processes are given.  The continuous example is of a
commodity chemical process where color is the product specification of
interest.  The batch example is of a nylon process where the relative viscosity
is the product specification of interest.  While the analysis methods are the
same, one significant difference between handling continuous and batch data is
that the batch data must first be “unfolded” before a model can be developed. 
Once the models are developed, clusters corresponding to successful and
non-compliant products are found.  The non-compliant product clusters are
further examined, and the relationships between the product specification (KPI)
and the upstream process variables causing the non-compliance are discovered. 
Figure 1 shows the scores plot for the continuous commodity
chemical example, and Figure 2 shows the scores plot for the batch nylon
example.  In both cases, clusters of both in-spec and non-compliant production
are observed.  Focusing on the batch example, Figures 3 and 4 show the SPE and
Hotelling's T² charts of the initial process data.  The outliers observed
visually in Figure 2 are also detected numerically by calculating Hotelling's
T².  The T² metric identifies points in the scores plot that lie on the model
plane but far away from the center-of-mass of the model, while the SPE metric
identifies points that lie away from the model plane.
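
A common form of the "unfolding" step mentioned above is batch-wise unfolding:
the three-way array of batches, variables, and time intervals is rearranged into
a two-dimensional matrix so that the same PCA/PLS machinery can be applied.  A
minimal sketch follows, with array sizes chosen purely for illustration.

# Minimal sketch of batch-wise unfolding: a (batches x variables x time) array
# becomes a (batches x variables*time) matrix, so each row is one complete batch
# trajectory and the same PCA/PLS tools apply.  The sizes are illustrative only.
import numpy as np

n_batches, n_vars, n_time = 60, 8, 120                 # assumed dimensions
batch_data = np.random.default_rng(1).normal(size=(n_batches, n_vars, n_time))

X_unfolded = batch_data.reshape(n_batches, n_vars * n_time)

# Centering and scaling each column (one variable at one time interval) removes
# the average batch trajectory and leaves the deviations from it.
X_scaled = (X_unfolded - X_unfolded.mean(axis=0)) / X_unfolded.std(axis=0, ddof=1)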

Contribution charts are used
to discover the relationships between the KPIs and the upstream process
variables, as shown in Figure 5.  Note that in Figure 3 (SPE metric), batch 49
is far away from the model plane, even though its projection lies in the cluster
of points that represent batches with good product specification.  The SPE contribution
chart (Figure 5) reveals that deviations in five process variables caused the
batch to produce off-spec product.  Specifically, the batch did not meet the
turning points in the prescribed trajectories when it reached the 58th
time interval of its run.  An examination of the control system revealed that a
control issue was responsible for missing the turning points.
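
One common way to construct such an SPE contribution chart is to split the squared
residual of the flagged batch across the individual variables (or variable/time
columns of the unfolded matrix).  The sketch below reuses the fitted PCA residuals
from the earlier sketch; the flagged observation index and the variable labels are
illustrative assumptions, not the values from this study.

# Minimal sketch of an SPE contribution chart for one flagged observation,
# reusing the residual matrix from the earlier PCA sketch.  The observation
# index and variable labels are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

obs = 48                                       # index of a flagged observation
spe_contrib = residuals[obs] ** 2              # per-variable share of its SPE

order = np.argsort(spe_contrib)[::-1]          # largest contributors first
labels = [f"var{j + 1}" for j in order]

plt.bar(range(len(order)), spe_contrib[order])
plt.xticks(range(len(order)), labels, rotation=90)
plt.ylabel("Contribution to SPE")
plt.title("SPE contributions for the flagged observation")
plt.tight_layout()
plt.show()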

These examples show how a
very large set of process data – of varying quality (missing data, low
signal-to-noise ratio, correlation, etc.) – can be reduced in dimensionality and
how the leading process variables can be identified and related to the
downstream KPIs to improve product quality and consistency.  The model has the
granularity that is typically missing in first-principle models.  The resource
for this method of model building – historical process data – is readily
available but seldom used for process optimization and analysis because,
without the methods needed to reduce the data-space dimensionality and cope
with missing and correlated data, it presents too much raw, unconditioned
information.

Figure 1.  Scores plot of the model developed using continuous historical process data.

Figure 2.  Scores plot of the model developed using batch historical process data (Batch No. 49; 95% confidence limit).

Figure 3.  Squared Prediction Error (SPE) of the batch data (nylon example).

Figure 4.  Hotelling's T² metric of the batch data (nylon example; batches 50–55; 95% confidence limit).