(753f) Randomized Rounding for Best Subset Selection Regression

Authors 

Sarwar, O. - Presenter, Carnegie Mellon University
Sahinidis, N. V., Carnegie Mellon University
Learning algebraic models from data is becoming an increasingly popular way to model complex chemical engineering systems [1]. This trend is driven by the difficulty of constructing and optimizing physics-based models, by increases in available system and experimental data, and by improved strategies for regression. In this work, we focus on predicting the value of a single output variable using a linear combination of features. These features include the original regression variables as well as nonlinear transformations of the regressors.

Well-known strategies for linear regression include Forward Stepwise Selection, the LASSO, and Best Subset Selection (BSS) [2]. BSS selects a subset of features that minimizes the sum of squared errors between the actual and model-predicted values of the output, while also penalizing the inclusion of features in the objective. This task can be posed as a mixed-integer quadratic program (MIQP) and leads to models with low complexity and good accuracy [3, 4]. In order to capture the nonlinearities in the data-generating function, it is necessary to consider a wide variety of transformations of the original variables. However, as the number of considered features grows, the resulting combinatorial explosion makes the MIQP intractable to solve in a reasonable amount of time [5].
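For concreteness, one standard big-M formulation of the penalized BSS problem as an MIQP is sketched below, where the binary indicator z_j equals 1 if feature j enters the model. This is a common textbook form, not necessarily the exact formulation used in this work:

```latex
\min_{\beta,\, z} \quad \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} X_{ij}\,\beta_j \Big)^{2}
  \;+\; \lambda \sum_{j=1}^{p} z_j
\qquad \text{s.t.} \quad
  -M z_j \le \beta_j \le M z_j, \quad
  z_j \in \{0, 1\}, \quad j = 1, \dots, p
```

Here M bounds the magnitude of any coefficient and λ controls the trade-off between fit and the number of selected features; a cardinality-constrained variant instead enforces Σ_j z_j ≤ k.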

In this work, we propose a new strategy for constructing a solution to the Best Subset Selection MIQP. Our algorithm applies a randomized rounding scheme to the continuous relaxation of the integer variables. The coefficients of the variables in the active set can then be computed quickly using basic linear algebra. The resulting algorithm is orders of magnitude faster than solving large instances to global optimality via the MIQP, and it produces near-optimal solutions on a wide variety of synthetic and real data sets.
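The Python sketch below illustrates the general randomized-rounding idea under stated assumptions: it takes relaxed indicator values `z_relaxed` (e.g., from the continuous relaxation of the MIQP above) as given, and the function name and the `penalty` and `n_trials` parameters are illustrative, not the authors' exact algorithm.

```python
import numpy as np

def randomized_rounding_bss(X, y, z_relaxed, penalty=0.0, n_trials=100, seed=0):
    """Randomized rounding heuristic for Best Subset Selection.

    A minimal sketch of the general technique. Relaxed indicator values
    ``z_relaxed`` (in [0, 1]) are treated as inclusion probabilities;
    each trial samples a support, fits it by ordinary least squares, and
    the best-scoring support is kept.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    best_score, best_support, best_beta = np.inf, None, None
    for _ in range(n_trials):
        # Round each indicator to 1 with probability equal to its relaxed value.
        support = rng.random(p) < z_relaxed
        if not support.any():
            continue
        # Coefficients on the active set come from a single least-squares solve.
        beta, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        sse = float(np.sum((y - X[:, support] @ beta) ** 2))
        # Score with the same penalized objective as the MIQP.
        score = sse + penalty * int(support.sum())
        if score < best_score:
            best_score, best_support, best_beta = score, support, beta
    return best_support, best_beta, best_score
```

Each trial costs only one least-squares solve on the sampled columns, which is why this kind of heuristic is so much cheaper than solving the exact MIQP.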

Additionally, we propose a procedure that embeds our heuristic in a larger framework. Insights gained from the approximate solution, together with techniques developed in the statistics community for solving convex quadratic programs, yield a faster branch-and-bound strategy for solving the Best Subset Selection problem to optimality.
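As a rough illustration of how such a framework can be organized, the sketch below implements a minimal branch and bound over the feature indicators. It substitutes a simple least-squares bound for the stronger convex-QP relaxations mentioned above, and it accepts the heuristic's objective value as a warm-start incumbent for pruning; the names and the bounding rule are assumptions, not the authors' method.

```python
import numpy as np

def sse(X, y, cols):
    """Sum of squared errors of an ordinary least-squares fit on cols."""
    if not cols:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.sum((y - X[:, cols] @ beta) ** 2))

def bss_branch_and_bound(X, y, penalty, incumbent_obj=np.inf):
    """Minimal branch and bound for penalized Best Subset Selection."""
    p = X.shape[1]
    best_obj, best_set = incumbent_obj, None
    # Each node fixes the indicators of features 0..j-1: `chosen` holds
    # those fixed to 1; all other features below j are fixed to 0.
    stack = [((), 0)]
    while stack:
        chosen, j = stack.pop()
        free = list(range(j, p))
        # Valid lower bound: extra columns can only reduce the SSE, and
        # undecided features contribute a nonnegative penalty if selected.
        lb = sse(X, y, list(chosen) + free) + penalty * len(chosen)
        if lb >= best_obj:
            continue  # prune this node
        if not free:  # leaf node: the bound is the exact objective here
            best_obj, best_set = lb, chosen
            continue
        # Branch on the indicator of feature j: include it or exclude it.
        stack.append((chosen + (j,), j + 1))
        stack.append((chosen, j + 1))
    return best_set, best_obj
```

Seeding `incumbent_obj` with the randomized-rounding solution's objective lets the search prune nodes from the start, which is precisely how a good heuristic accelerates branch and bound.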

  1. A. Cozad, N. V. Sahinidis, and D. C. Miller. Learning surrogate models for simulation-based optimization. AIChE Journal, 60(6):2211–2227, 2014.
  2. R. Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
  3. A. Cozad, N. V. Sahinidis, and D. C. Miller. A combined first-principles and data-driven approach to model building. Computers & Chemical Engineering, 73:116–127, 2015.
  4. Z. T. Wilson and N. V. Sahinidis. The ALAMO approach to machine learning. Computers & Chemical Engineering, 106:785–795, 2017.
  5. D. Bertsimas, A. King, and R. Mazumder. Best Subset Selection via a Modern Optimization Lens. The Annals of Statistics, 44(2):813–852, 2016.