(107c) Regression Strategies for Large Data Sets | AIChE

ABSTRACT for AIChE 2017 Spring Meeting

Regression Strategies for Large Data Sets

James C Cross III

The MathWorks, Inc.

james.cross@mathworks.com

617-605-5818

Plant operators and engineers are increasingly using
historian data to gain insight into the relationships among process
parameters and product attributes.  A universal goal is to determine
the operating conditions that yield the highest product output and/or quality
per unit cost.  Models provide insight, but seldom account for the many physical
nuances that sometimes exert significant influence.

The quantity of data that must be analyzed to draw these
inferences is invariably large, consisting of hundreds or thousands of
quantities sampled at many millions of points in time.  It is seldom possible
to load and process a data set of this size entirely in memory.

A number of frameworks for partitioning large data sets have
been developed and popularized, though adoption by industrial companies has
been notably slower than by IT-centric enterprises.  This paper endeavors to
demystify the processing of out-of-memory data for an engineering audience.

Regression, which is fundamental to data analysis, is used
to motivate the techniques.  The paper begins with the computation of simple
statistics.  Next, some strategies for matrix manipulations (relevant
to both direct and iterative methods) are discussed.  Finally, an illustrative
example of a regression on a large data set is presented.
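
To illustrate the kind of chunk-wise computation outlined above, the following
Python sketch fits a straight line to data that is read one chunk at a time.
It is not taken from the paper: the language, the function names
(read_chunks, fit_line_out_of_memory), and the (x, y) record layout are all
assumptions made for illustration.  The idea is that a one-predictor least-squares
fit needs only five running sums, so no more than one chunk must be in memory
at any time.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: one (x, y) pair per historian sample.
Chunk = List[Tuple[float, float]]


def read_chunks(rows: Iterable[Tuple[float, float]], size: int) -> Iterator[Chunk]:
    """Yield successive chunks so that at most `size` rows are held at once."""
    chunk: Chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def fit_line_out_of_memory(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by accumulating sums chunk by chunk."""
    n = 0
    sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    # Closed-form solution of the 2x2 normal equations.
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0.0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    # Synthetic stand-in for historian data: y = 2 + 3x, streamed lazily.
    samples = ((float(x), 2.0 + 3.0 * x) for x in range(1000))
    print(fit_line_out_of_memory(read_chunks(samples, size=64)))
```

The same pattern extends to several predictors by accumulating the matrix
products X'X and X'y per chunk.  Raw sums can lose precision when values are
large relative to their spread, so centering the data or using a QR-based
update may be preferable in practice.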