(393a) A Metaheuristic Approach to Best Subset Selection for the Development of Regression-Based Surrogate Models
AIChE Annual Meeting
2018
2018 AIChE Annual Meeting
Computing and Systems Technology Division
Data Driven Modeling and Decision Making
Tuesday, October 30, 2018 - 3:30pm to 3:49pm
Well-known strategies for linear regression include Forward/Backward Selection, Ordinary-Least Squares, and Best Subset Selection (BSS). BSS seeks to select coefficient values for the features as to minimize the least-squares error between the actual and predicted values of the output while also penalizing the inclusion of features in the objective. This task can be posed as a mixed-integer optimization (MIO) problem and leads to simple models with good accuracy. However, as the amount of data points and the number of potential features grows, the resulting combinatorial explosion makes the MIO problem intractable to solve in a reasonable amount of time.
In this paper, we present a metaheuristic-based algorithm for solving the Best-Subset-Selection problem. Our strategy includes a stochastic global search and a deterministic local search. We begin by defining neighborhoods in the combinatorial space whose sizes are based-on the Hamming distance from a current point. Within this neighborhood, we randomly select a new point in the combinatorial space and optimize the continuous variable. Then, we perform a local search in the neighborhood of the current point. The local search is done for small neighborhoods via exhaustive enumeration of combinatorial solutions and application of well-known updates for the coefficient vector or by using a commercial solver for large neighborhoods. If a better solution is found, the local search is re-centered at the current best point. If no new solution is found, the neighborhood size is increased until the maximum neighborhood size is reached and the algorithm moves back into the global search phase. We demonstrate our approach on a wide range of test data and show that our approach yields accurate models in a tractable manner.