(33c) An Open-Source Tool for Implementing and Comparing Sparse Regression Methods | AIChE

Authors 

Sarwar, O. - Presenter, Carnegie Mellon University
Sahinidis, N. - Presenter, Carnegie Mellon University
Hubbs, C. D., The Dow Chemical Company
Data-derived regression models, which predict the value of an output variable (the response) from a set of input variables (the regressors), have become popular across disciplines in science and engineering. Linear regression surrogates are powerful substitutes for physics-based models because they represent complex processes with simple equations that can be derived quickly. Sparse regression is a model-building paradigm that assumes the response can be predicted by only a few of the many candidate regressors, and penalizes overly complex models. Sparse linear regression is especially useful in engineering because the measured variables must often be nonlinearly transformed into a much larger set of regressors to capture process complexity, leading to high-dimensional problems. Many algorithms exist for sparse regression, including subset-selection methods, Lasso-based methods, and nonconvex penalties such as MCP and SCAD. Unfortunately, practitioners have little guidance for choosing among these methods, and regression in practice is usually trial and error.
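To make the setting concrete, here is a minimal sketch of sparse linear regression after nonlinear feature expansion, written with scikit-learn rather than the package described in this work: a response that truly depends on only two terms is fit over a degree-3 polynomial expansion of five measured variables, and the L1 penalty zeroes out most of the candidate regressors.

```python
# Illustration (scikit-learn, not the authors' package): sparse linear
# regression over a nonlinearly expanded regressor set. The true model
# uses only 2 of the 55 candidate regressors.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))  # 5 measured input variables
y = 3.0 * X[:, 0] ** 2 - 2.0 * X[:, 1] * X[:, 2] \
    + 0.05 * rng.standard_normal(200)

# Expand the 5 measured variables into all degree<=3 monomials,
# producing a much larger, high-dimensional regressor set.
poly = PolynomialFeatures(degree=3, include_bias=False)
Z = poly.fit_transform(X)  # 55 candidate regressors

# The L1 penalty drives most coefficients to exactly zero.
model = Lasso(alpha=0.01).fit(Z, y)
selected = np.flatnonzero(model.coef_)
print(f"{Z.shape[1]} candidate regressors, {selected.size} selected")
```

The penalty weight `alpha` is an illustrative choice; in practice it would be tuned, e.g. by cross-validation.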

In this work, we systematically study various types of data and problem settings to help users pick a sparse regression method for practical application, connecting empirical results to theory wherever possible. We first build on previous work that assumes the underlying model is linear, and compare sparse linear regression methods on their ability to recover the true feature set without selecting many irrelevant features, as well as on their predictive accuracy. We then turn to the case where the underlying model is not assumed to be linear, as in many engineering applications: after defining performance metrics, we examine a range of problem settings to determine which sparse regression method performs best. These experiments use both synthetic and real data.
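A sketch of the kind of synthetic-data experiment described above, again using scikit-learn for illustration: data are generated from a known sparse linear model, a method is fit, and its estimated support is scored for recovery of the true features and for irrelevant selections. The metric names (`recall`, `fdr`) are illustrative choices, not necessarily the metrics defined in this work.

```python
# Illustrative support-recovery experiment on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, k = 100, 50, 5  # samples, regressors, true nonzeros
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0  # true support: the first k features
y = X @ beta + 0.1 * rng.standard_normal(n)

def support_metrics(coef, true_support, tol=1e-8):
    """Score an estimated coefficient vector against the known support."""
    est = set(np.flatnonzero(np.abs(coef) > tol))
    tp = len(est & true_support)
    recall = tp / len(true_support)  # fraction of true features recovered
    fdr = 0.0 if not est else (len(est) - tp) / len(est)  # false discoveries
    return recall, fdr

true_support = set(range(k))
lasso = Lasso(alpha=0.1).fit(X, y)
recall, fdr = support_metrics(lasso.coef_, true_support)
print(f"recall={recall:.2f}, FDR={fdr:.2f}")
```

Repeating this loop over methods, noise levels, and regressor counts gives exactly the style of comparison table the study describes.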

Finally, we want users to be able to compare regression methods for their own application in a single step. To that end, we release a framework for regression comparison and general model building as an open-source Python package that aggregates numerous popular regression methods (sparse linear regression algorithms and others), feature-engineering methods, and dimensionality-reduction techniques behind a single, easy-to-use interface. The package is intended to let users, in one step: (1) take their own data and easily build, compare, and select models from different methods; (2) investigate the relative performance of various methods on synthetic data; and (3) benchmark their own novel regression methods against those in the package.
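Since the package's API is not specified in this abstract, the following is only a generic sketch, in plain scikit-learn, of what a one-step comparison across methods through a common interface can look like: several regressors are scored by cross-validation on the same data and the best performer is selected.

```python
# Generic one-step comparison across methods (scikit-learn, not the
# authors' package): fit several regressors through a common interface
# and rank them by cross-validated R^2.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((150, 20))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(150)

methods = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.05),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in methods.items()}
best = max(scores, key=scores.get)
print(f"best method: {best} (mean CV R^2 = {scores[best]:.3f})")
```

A unified interface like the one described in this work would additionally fold in feature engineering and dimensionality reduction before the comparison step.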