(54a) Regression Strategies for Large Data Sets (poster version)
AIChE 2017 Spring Meeting and 13th Global Congress on Process Safety
Spring Meeting Poster Session and Networking Reception
Big Data Analytics Poster
Monday, March 27, 2017 - 5:00pm to 7:00pm
The quantity of data that must be analyzed to draw the desired inferences is invariably large: hundreds or thousands of variables, each sampled at many millions of points in time. Data sets of this size seldom fit in a computer's main memory.
Several frameworks for partitioning large data sets have been developed and popularized, though industrial companies have adopted them notably more slowly than IT-centric enterprises. This paper aims to demystify the processing of out-of-memory data for an engineering audience.
Regression, a fundamental tool in data analysis, is used to motivate the techniques. The paper begins with the computation of simple statistics, then discusses strategies for matrix manipulations (relevant to both direct and iterative methods), and closes with an illustrative example of a large data set regression.
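As a rough illustration of the kind of partitioned computation described here, the sketch below accumulates simple statistics (row count and column means) and the normal-equation terms X^T X and X^T y one chunk at a time, so that only a single chunk is ever held in memory. It is not taken from the paper: the function names `chunked_least_squares` and `synthetic_chunks`, the chunk sizes, and the synthetic data are all assumptions made for illustration, and the direct normal-equations solve is only one of several possible strategies.

```python
import numpy as np


def chunked_least_squares(chunks):
    """Fit y ~ X @ beta from an iterable of (X_chunk, y_chunk) pairs.

    Hypothetical helper: each chunk is processed once and then discarded,
    so memory use depends on the chunk size and the number of columns,
    not on the total number of rows.
    """
    n = 0
    col_sum = xtx = xty = None
    for X, y in chunks:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if xtx is None:
            p = X.shape[1]
            col_sum = np.zeros(p)
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        n += X.shape[0]              # simple statistic: row count
        col_sum += X.sum(axis=0)     # simple statistic: column sums
        xtx += X.T @ X               # p-by-p accumulator
        xty += X.T @ y               # length-p accumulator
    if xtx is None:
        raise ValueError("no chunks were supplied")
    means = col_sum / n
    beta = np.linalg.solve(xtx, xty)  # direct solve of the normal equations
    return beta, means, n


def synthetic_chunks(n_chunks=20, rows=500, seed=0):
    """Yield synthetic (X, y) chunks; stands in for reading a file in pieces."""
    rng = np.random.default_rng(seed)
    true_beta = np.array([2.0, -1.0, 0.5])  # assumed coefficients
    for _ in range(n_chunks):
        X = rng.normal(size=(rows, 3))
        y = X @ true_beta + 0.01 * rng.normal(size=rows)
        yield X, y


beta, means, n = chunked_least_squares(synthetic_chunks())
print(n, beta)
```

The accumulators are p-by-p and length p regardless of how many rows pass through, which is what makes the approach workable for data that cannot be loaded at once. Forming X^T X squares the condition number of the problem, so for ill-conditioned data a chunked QR factorization or an iterative method may be a better choice.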