(33c) An Open-Source Tool for Implementing and Comparing Sparse Regression Methods
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Software Tools and Implementations for Process Systems Engineering
Monday, November 16, 2020 - 8:30am to 8:45am
In this work, we systematically study a range of data types and problem settings to help users pick a regression method for practical applications. To the extent possible, we connect empirical results to theory. We start by building on previous work that assumes the underlying model is linear, and we compare sparse linear regression methods on two abilities: recovering the true feature set without selecting many irrelevant features, and predicting accurately. We then turn to the case where the underlying model is not assumed to be linear, as in many engineering applications. After defining performance metrics, we examine many different problem settings to see which sparse regression method performs best. These experiments use both synthetic and real data.
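As an illustration of the linear-model comparison described above, the following is a minimal sketch and not the code used in this work. It assumes scikit-learn as a stand-in library, and the helper names (make_sparse_linear_data, support_metrics, compare_methods), the choice of LassoCV and orthogonal matching pursuit, and all data sizes are hypothetical. It generates synthetic data from a sparse linear model, then scores each method on feature-set recovery (precision and recall of the selected features) and on held-out prediction error.

```python
import numpy as np
from sklearn.linear_model import LassoCV, OrthogonalMatchingPursuit
from sklearn.model_selection import train_test_split


def make_sparse_linear_data(n_samples=200, n_features=50, n_relevant=5,
                            noise=0.1, seed=0):
    """Draw data from y = X @ coef + noise, where coef has few nonzeros."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    coef = np.zeros(n_features)
    support = rng.choice(n_features, size=n_relevant, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_relevant)
    coef[support] = signs * rng.uniform(1.0, 3.0, size=n_relevant)
    y = X @ coef + noise * rng.standard_normal(n_samples)
    return X, y, coef


def support_metrics(true_coef, est_coef, tol=1e-8):
    """Precision and recall of the estimated feature set vs. the true one."""
    true_set = set(np.flatnonzero(np.abs(true_coef) > tol))
    est_set = set(np.flatnonzero(np.abs(est_coef) > tol))
    n_correct = len(true_set & est_set)
    precision = n_correct / len(est_set) if est_set else 0.0
    recall = n_correct / len(true_set) if true_set else 0.0
    return precision, recall


def compare_methods(X, y, true_coef, seed=0):
    """Fit each method on a training split; report recovery and test RMSE."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    methods = {
        "lasso_cv": LassoCV(cv=5, random_state=seed),
        # Assumption: OMP is handed the true sparsity level, which is only
        # known here because the data are synthetic.
        "omp": OrthogonalMatchingPursuit(
            n_nonzero_coefs=int(np.count_nonzero(true_coef))),
    }
    results = {}
    for name, model in methods.items():
        model.fit(X_tr, y_tr)
        precision, recall = support_metrics(true_coef, model.coef_)
        rmse = float(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
        results[name] = {"precision": precision, "recall": recall,
                         "rmse": rmse}
    return results


if __name__ == "__main__":
    X, y, coef = make_sparse_linear_data()
    for name, metrics in compare_methods(X, y, coef).items():
        print(name, metrics)
```

High recall with low precision means a method finds the true features but also selects many irrelevant ones, which is the trade-off the comparison is meant to expose.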
Finally, we want users to be able to compare regression methods for their own application in a single step. To that end, we release a framework for regression comparison and general model building as an open-source Python package that aggregates numerous popular regression methods (sparse linear regression algorithms as well as others), feature-engineering methods, and dimensionality-reduction techniques into a single, common, easy-to-use interface. The package is intended to let users, in one step: (1) take their own data and easily build, compare, and pick models from a variety of methods, (2) investigate the relative performance of those methods using synthetic data, and (3) benchmark their novel regression methods against those in the package.
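The idea of comparing several methods through one interface can be sketched as follows. This is not the released package's API, which the abstract does not describe; it is a hedged illustration that assumes scikit-learn, and the function names (build_candidates, compare_models), the candidate set, and the synthetic nonlinear response are all hypothetical. Each candidate combines optional feature engineering or dimensionality reduction with a regressor, and every candidate is scored the same way by cross-validation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def build_candidates(seed=0):
    """Hypothetical registry mapping a name to an unfitted estimator."""
    return {
        # Sparse linear regression on standardized raw features.
        "lasso": make_pipeline(
            StandardScaler(), LassoCV(cv=3, random_state=seed)),
        # Feature engineering (degree-2 terms) followed by sparse regression.
        "poly2_lasso": make_pipeline(
            StandardScaler(),
            PolynomialFeatures(degree=2, include_bias=False),
            LassoCV(cv=3, random_state=seed, max_iter=10000)),
        # Dimensionality reduction followed by a dense linear model.
        "pca_ridge": make_pipeline(
            StandardScaler(), PCA(n_components=3), Ridge(alpha=1.0)),
        # A non-linear, non-sparse baseline.
        "random_forest": RandomForestRegressor(
            n_estimators=50, random_state=seed),
    }


def compare_models(X, y, candidates, cv=5):
    """Score every candidate by cross-validated RMSE; return scores and best."""
    scores = {}
    for name, model in candidates.items():
        neg_mse = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_squared_error")
        # Square root of the mean squared error averaged over the folds.
        scores[name] = float(np.sqrt(-neg_mse.mean()))
    best = min(scores, key=scores.get)
    return scores, best


if __name__ == "__main__":
    # Synthetic data with a nonlinear response (an illustrative assumption).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 6))
    y = X[:, 0] * X[:, 1] + X[:, 2] ** 2 + 0.1 * rng.standard_normal(150)
    scores, best = compare_models(X, y, build_candidates())
    for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{name}: RMSE = {rmse:.3f}")
    print("best:", best)
```

Because every candidate exposes the same fit and predict methods, benchmarking a novel method in this sketch only requires adding one more entry to the dictionary.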