(54a) Regression Strategies for Large Data Sets (poster version) | AIChE

Plant operators and engineers are increasingly using historian data to gain insight into the relationships among process parameters and product attributes. A universal goal is to determine the operating conditions that produce the highest product output and/or quality per unit cost. Models provide insight but seldom account for the myriad physical nuances that sometimes exert significant influence.

The quantity of data that must be analyzed to draw these inferences is invariably large, consisting of hundreds or thousands of quantities sampled at many millions of points in time. It is seldom possible to hold data sets of this size in computer memory.

A number of frameworks for partitioning large data sets have been developed and popularized, though adoption by industrial companies has been notably slower than by IT-centric enterprises. This paper endeavors to demystify the processing of out-of-memory data for an engineering audience.

Regression, which is fundamental to data analysis, is used to motivate the techniques. The paper begins with the computation of simple statistics. Strategies for matrix manipulations (relevant to both direct and iterative methods) are then discussed. Finally, an illustrative example of a large data set regression is presented.
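
As a rough illustration of the general idea, and not necessarily the approach taken in the paper, the sketch below fits a least-squares model one block of rows at a time. It accumulates column sums (a simple statistic) together with the products X^T X and X^T y, so only one block needs to be in memory at once. The names `iter_chunks` and `chunked_least_squares`, the chunk size, and the synthetic data are all assumptions made for this example; in practice the blocks would be read from a historian export rather than sliced from an in-memory array.

```python
import numpy as np


def iter_chunks(X, y, chunk_rows):
    """Yield successive row blocks (hypothetical stand-in for reading a file piece by piece)."""
    for start in range(0, X.shape[0], chunk_rows):
        yield X[start:start + chunk_rows], y[start:start + chunk_rows]


def chunked_least_squares(chunks, n_features):
    """Accumulate column sums, X^T X and X^T y over blocks, then solve the normal equations."""
    n_rows = 0
    col_sum = np.zeros(n_features)
    xtx = np.zeros((n_features, n_features))
    xty = np.zeros(n_features)
    for Xc, yc in chunks:
        n_rows += Xc.shape[0]
        col_sum += Xc.sum(axis=0)   # running sum for the column means
        xtx += Xc.T @ Xc            # small (n_features x n_features) accumulator
        xty += Xc.T @ yc
    beta = np.linalg.solve(xtx, xty)
    return beta, col_sum / n_rows


# Synthetic data, small enough to compare against an in-memory fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=10_000)

beta, means = chunked_least_squares(iter_chunks(X, y, 1_000), n_features=3)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The accumulators have a size that depends only on the number of columns, not the number of rows, which is what makes the block-wise pass feasible. Forming X^T X squares the condition number of the problem, so for poorly conditioned data a block-wise QR factorization or an iterative solver may be the safer choice.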