(393a) A Metaheuristic Approach to Best Subset Selection for the Development of Regression-Based Surrogate Models

Authors 

Sarwar, O. - Presenter, Carnegie Mellon University
Sahinidis, N., Carnegie Mellon University
Learning algebraic models from data is an increasingly popular approach for modeling complex chemical engineering systems. This trend is driven by the difficulty of constructing and solving physics-based models, by increases in available system and experimental data, and by improved strategies for regression. In this work, we focus on predicting the value of a single output variable using a linear combination of features, where the features comprise the regressor variables themselves as well as nonlinear transformations of the regressors.
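
For concreteness, such a surrogate model is linear in its parameters but nonlinear in the regressors; the specific basis functions shown below (powers, square roots, logarithms, pairwise products) are illustrative of typical transformations rather than the exact feature set used in this work:

```latex
\hat{y}(\mathbf{x}) \;=\; \beta_0 \;+\; \sum_{j=1}^{p} \beta_j\, \phi_j(\mathbf{x}),
\qquad
\phi_j(\mathbf{x}) \in \left\{\, x_i,\; x_i^2,\; \sqrt{x_i},\; \log x_i,\; x_i x_k,\; \dots \right\}
```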

Well-known strategies for linear regression include Forward/Backward Selection, Ordinary Least Squares, and Best Subset Selection (BSS). BSS selects coefficient values for the features so as to minimize the least-squares error between the actual and predicted values of the output, while also penalizing the number of features included in the model. This task can be posed as a mixed-integer optimization (MIO) problem and leads to simple models with good accuracy. However, as the number of data points and the number of potential features grow, the resulting combinatorial explosion makes the MIO problem intractable to solve in a reasonable amount of time.
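
As a sketch, a standard MIO statement of the penalized BSS problem introduces a binary variable z_j that switches feature j in or out of the model and links it to the coefficient beta_j through a big-M constraint; the penalty weight lambda and bound M below are generic placeholders, not the exact formulation used in this work:

```latex
\min_{\boldsymbol{\beta},\, \mathbf{z}} \;\;
\left\lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right\rVert_2^2
\;+\; \lambda \sum_{j=1}^{p} z_j
\quad \text{s.t.} \quad
-M z_j \;\le\; \beta_j \;\le\; M z_j,
\qquad z_j \in \{0,1\}, \quad j = 1,\dots,p
```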

In this paper, we present a metaheuristic-based algorithm for solving the Best Subset Selection problem. Our strategy combines a stochastic global search with a deterministic local search. We begin by defining neighborhoods in the combinatorial space whose sizes are based on the Hamming distance from the current point. Within such a neighborhood, we randomly select a new point in the combinatorial space and optimize the continuous coefficients. We then perform a local search in the neighborhood of the current point: for small neighborhoods, the local search exhaustively enumerates combinatorial solutions and applies well-known updates to the coefficient vector; for large neighborhoods, it calls a commercial solver. If a better solution is found, the local search is re-centered at the new best point. If no improving solution is found, the neighborhood size is increased until the maximum neighborhood size is reached, at which point the algorithm returns to the global search phase. We demonstrate our approach on a wide range of test data and show that it yields accurate models in a tractable manner.
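
The Python sketch below illustrates the structure of such a two-phase search under simplifying assumptions; it is not the authors' implementation. In particular, the use of ordinary least squares (numpy.linalg.lstsq) as the coefficient update, the per-feature penalty weight lam, the neighborhood schedule, and the restriction of the local search to Hamming-distance-1 flips (in place of a commercial solver for large neighborhoods) are all assumptions made for brevity.

```python
import numpy as np

def bss_objective(X, y, support, lam):
    """Simplified BSS objective: least-squares error on the selected
    features plus a penalty of lam per included feature."""
    idx = np.flatnonzero(support)
    beta = np.zeros(X.shape[1])
    if idx.size == 0:
        return float(y @ y), beta
    beta_s, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    beta[idx] = beta_s
    resid = y - X[:, idx] @ beta_s
    return float(resid @ resid) + lam * idx.size, beta

def local_search(X, y, support, lam):
    """Deterministic local search: exhaustively try all Hamming-distance-1
    moves (flip one feature in or out), refit, and re-center whenever an
    improving solution is found."""
    best_obj, best_beta = bss_objective(X, y, support, lam)
    best_support = support.copy()
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = best_support.copy()
            cand[j] = ~cand[j]
            obj, beta = bss_objective(X, y, cand, lam)
            if obj < best_obj - 1e-10:
                best_obj, best_beta, best_support = obj, beta, cand
                improved = True  # re-center at the improved point
    return best_support, best_beta, best_obj

def metaheuristic_bss(X, y, lam=1.0, max_hamming=5, max_iters=50, seed=0):
    """Two-phase search: a stochastic perturbation of the incumbent within a
    Hamming-distance-k neighborhood (global phase), followed by a
    deterministic local search; k grows when no improvement is found and
    resets once the maximum neighborhood size is reached."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    support = rng.random(p) < 0.5                 # random starting subset
    support, beta, obj = local_search(X, y, support, lam)
    k = 1
    for _ in range(max_iters):
        # Global phase: flip k randomly chosen features of the incumbent.
        cand = support.copy()
        flips = rng.choice(p, size=min(k, p), replace=False)
        cand[flips] = ~cand[flips]
        # Local phase: polish the perturbed point.
        cand, cand_beta, cand_obj = local_search(X, y, cand, lam)
        if cand_obj < obj:
            support, beta, obj = cand, cand_beta, cand_obj
            k = 1                                 # re-center, reset neighborhood
        else:
            k = k + 1 if k < max_hamming else 1   # widen, then restart schedule
    return support, beta, obj
```

In this sketch, the alternation between random perturbations of growing Hamming radius and exhaustive one-flip refinement plays the role of the global and local phases described above; a full implementation would replace the one-flip enumeration with larger-neighborhood enumeration or a commercial solver.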