Outlier Detection and Analysis in Batch and Continuous Processes
Southwest Process Technology Conference
2015
7th Southwest Process Technology Conference
Thermodynamics & Process Simulation
Friday, October 2, 2015 - 10:31am to 10:55am
Title: Outlier Detection and Analysis in Batch and Continuous Processes
Author/Presenter: Peter J. Ryan, Ph.D., P.E.
Author/Presenter email: peter.ryan@responsepc.com
Company: Response Process Consulting LLC
Motivation
All manufacturing sectors, continuous
or batch, gather process data for archiving and analysis. Often the
amount of data gathered, and the quality of that data, make it difficult to use
this resource effectively. Major drawbacks in the quality of the available
data include:
· large gaps in the process data
· noise, poor signal-to-noise ratios
· correlated data
· accuracy
· precision
New methods have been
developed to handle these issues in archived process data and to build
process models from this historical data. Specifically, a new approach to
handling missing data and reconstructing quality data from the observed
(archived) data is presented. Once the models are developed, outliers can be
identified in the continuous or batch data, and relationships between the Key
Process Indicators (KPIs) and the upstream process variables can be
established.
Approach
Often, steady-state or
dynamic models are available to describe a process. However, these
first-principles models often lack the granularity needed to describe product
specifications such as color, turbidity or solvent loss due to entrainment in a
separator. Machine learning can be used to examine large historical process data
sets and determine the leading controlled variables of a process. Machine
learning methods first fit the data to a specified model, and then use the
model to explore the data space. Unsupervised learning lets the fitting
method determine on its own what is different in the data. Supervised
learning requires the fitting method to consider both the archived process data
and measured quality data (acquired off-line from the process). An example of
unsupervised modeling is Principal Component Analysis (PCA). An example of
supervised modeling is Partial Least Squares (PLS). Both of these methods can
be used to model the historical process data and find outliers by plotting the
two principal components (scores) that capture the most variability in
the data. The score plots are examined for clusters that separate production
runs meeting product specifications from runs not meeting them. The
clusters that represent production runs not meeting product specifications can
be further examined to discover the upstream process variables that
cause the product to miss specification. Beyond visual inspection,
statistical metrics such as the Squared Prediction Error (SPE)
and Hotelling's T² can be calculated to find the same outliers
numerically.
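As a rough illustration of the PCA part of this approach, the sketch below fits a two-component model to autoscaled data and computes scores, Hotelling's T² and SPE for each observation. It is a minimal NumPy sketch, not the author's implementation: the function names fit_pca and monitor, the synthetic data, and the mixing matrix W are all hypothetical and chosen only to make the example self-contained.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit a PCA model to autoscaled data using the SVD."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                       # guard against constant columns
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                   # loadings: variables x components
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def monitor(model, X):
    """Return scores, Hotelling's T² and SPE for each row of X."""
    Z = (np.atleast_2d(X) - model["mean"]) / model["std"]
    scores = Z @ model["P"]
    t2 = np.sum(scores ** 2 / model["score_var"], axis=1)   # distance within the scores plane
    residual = Z - scores @ model["P"].T
    spe = np.sum(residual ** 2, axis=1)                     # squared distance off the plane
    return scores, t2, spe

# Hypothetical data: five correlated variables driven by two latent factors.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.8, 0.5, -0.6, 0.2],
              [0.2, -0.5, 0.9, 0.7, 1.0]])
X = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 5))

model = fit_pca(X, n_components=2)
scores, t2, spe = monitor(model, X)

# Two hypothetical new observations:
x_far = np.array([6.0, 0.0]) @ W                                  # keeps the correlation, extreme magnitude
x_break = X.mean(axis=0) + np.array([2.0, 0.0, 0.0, 0.0, 0.0])    # one variable moves alone
_, t2_new, spe_new = monitor(model, np.vstack([x_far, x_break]))
```

In this sketch the first new observation should stand out in T² and the second in SPE, which is the same distinction the two metrics are used for in the score plots.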
Results
Examples of modeling both
continuous and batch processes are given. The continuous example is of a
commodity chemical process where color is the product specification of
interest. The batch example is of a nylon process where the relative viscosity
is the product specification of interest. While the analysis methods are the
same, one significant difference between handling continuous and batch data is
that the batch data must first be "unfolded" before a model can be developed.
Once the models are developed, clusters corresponding to successful and
non-compliant products are found. The non-compliant product clusters are
further examined, and the relationships between the product specification (KPI)
and the upstream process variables causing the non-compliance are discovered.
Figure 1 shows the scores plot of the continuous commodity
chemical example. Figure 2 shows the scores plot of the batch nylon
example. In both cases, distinct clusters of in-spec and
non-compliant production are observed. Focusing on the batch example,
Figures 3 and 4 show the SPE and Hotelling's T² charts of the initial
process data. The outliers observed visually in Figure 2 are also detected
numerically by calculating the Hotelling's T² metric. The Hotelling's T²
metric finds points that lie on the scores plane but far
from the center of mass of the primary cluster. The SPE metric finds
points that lie away from the scores plane.
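The abstract does not say which unfolding scheme is used. A common choice for batch data is batch-wise unfolding, where each batch becomes one row and every (time interval, variable) pair becomes one column. The sketch below assumes that scheme and a hypothetical array layout of (batches, time intervals, variables), with all batches aligned to the same number of intervals; the function names are illustrative only.

```python
import numpy as np

def unfold_batchwise(data):
    """Unfold (batches, time, variables) into (batches, time * variables)."""
    n_batches, n_times, n_vars = data.shape
    # Each row holds one batch: all variables at interval 0, then interval 1, ...
    return data.reshape(n_batches, n_times * n_vars)

def column_to_time_var(col, n_vars):
    """Map an unfolded column index back to (time index, variable index)."""
    return divmod(col, n_vars)

# Hypothetical example: 2 batches, 3 time intervals, 4 variables.
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
unfolded = unfold_batchwise(data)
```

After unfolding, each column can be autoscaled across batches, which removes the average trajectory so that a PCA or PLS model describes batch-to-batch deviations from it. The reverse mapping is what lets a flagged column be traced back to a specific variable at a specific time interval.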
A "Contribution to the SPE"
chart is used to discover the relationships between the KPIs and the upstream
process variables, as shown in Figure 5. Note that in Figure 3 (SPE metric),
batch 49 is far from the scores plane, even though its projection falls within
the cluster of points that represent batches with good product specification.
The SPE contribution chart (Figure 5) reveals that deviations in two
temperatures, two pressures and a flow caused the batch to produce off-spec
product. Specifically, the batch did not meet the turning points in the
prescribed trajectories when it reached the 65th time
interval of its run. An examination of the control system revealed that a
control issue was responsible for missing the turning points.
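One common way to build such a contribution chart is to split the SPE of a single observation into its per-variable squared residuals, which sum to the SPE. The sketch below assumes that definition and an orthonormal loading matrix P; the loadings, the observation z, and the variable names are made up for illustration and are not taken from the nylon example.

```python
import numpy as np

def spe_contributions(z, P):
    """Per-variable contributions to the SPE of one autoscaled observation z."""
    z = np.asarray(z, dtype=float)
    residual = z - P @ (P.T @ z)      # part of z not explained by the scores plane
    return residual ** 2              # contributions sum to the SPE

# Hypothetical loadings: the model plane spans the first two variables only.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
z = np.array([1.0, 2.0, 3.0, 0.5])
names = ["temp_1", "temp_2", "pressure_1", "flow_1"]

contrib = spe_contributions(z, P)
ranked = sorted(zip(names, contrib), key=lambda pair: pair[1], reverse=True)
```

Ranking the contributions points to the variables that move the observation off the scores plane. For unfolded batch data, each entry corresponds to one variable at one time interval, so the ranking can also indicate when in the run the deviation occurred.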
These examples show how a
very large set of process data of varying quality (missing data, low
signal-to-noise ratio, correlation, etc.) can be reduced in dimensionality and
how the leading process variables can be identified and related to the
downstream KPIs to improve product quality and consistency. The model has the
granularity that is typically missing in first-principles models. The resource
for this method of model building, historical process data, is readily
available but seldom used for process optimization
and analysis because, without the methods needed to reduce the data-space
dimensionality and cope with missing and correlated data, this resource
presents too much raw, unconditioned information.