(370s) GPU Parallel Forward and Backward Selection for Model Building
AIChE Annual Meeting
2019
2019 AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, November 12, 2019 - 3:30pm to 5:00pm
This feature selection problem is solved by known techniques like best subset selection. There are many approaches used to solve the best subset selection problem, such as branch-and-bound [4], forward and backward selection [3], the lasso [5], and mixed-integer optimization [1, 2]. As the number of data points and the number of features we consider grows, we are able to generate more accurate models at a much larger computational cost.
In this work, we address the problem of how to efficiently develop accurate models by solving the best subset selection problem with forward and backward selection algorithms with GPU computing. We begin by parallelizing the forward and backward selection algorithm. In each iteration, a number of linear least squares problems needs to be solved to identify the next best model. Fortunately, each of these linear least squares problems can be solved simultaneously, allowing us to take advantage of a batched problem structure to speed up the algorithms. We demonstrate that our GPU parallel version of the algorithm is five times faster than an equivalent CPU version.
References
[1] D. Bertsimas, A. King, and R. Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, 44:813â852, 2016.
[2] A. Cozad, N. V. Sahinidis, and D. C. Miller. Automatic learning of algebraic models for optimization. AIChE Journal, 60:2211â2227, 2014.
[3] M.A. Efroymson. Multiple regression analysis. Mathematical methods for digital computers, pages 191â203, 1960.
[4] C. Gatu and EJ Kontoghiorghes. Branch-and-bound algorithms for computing the best-subset regression models. Computational and Graphical Statistics, 15:139â156, 2006.
[5] R. Tibshirani. Regression shrinkage and selection via the lasso. Royal Statistical Society, Methodological:267â288, 1996.