(370s) GPU Parallel Forward and Backward Selection for Model Building

Conference

AIChE Annual Meeting

Year

2019

Proceeding

2019 AIChE Annual Meeting

Group

Computing and Systems Technology Division

Session

Interactive Session: Data and Information Systems

Time

Tuesday, November 12, 2019 - 3:30pm to 5:00pm

Authors

Sauk, B. - Presenter, Carnegie Mellon University

Sahinidis, N. V., Carnegie Mellon University

As data sets have become larger and richer, there has been an increase in the use of data-driven approaches to model complex systems. To answer questions about next generation technologies, there is a desire to incorporate more complexity into models, which has been driven by an increase in data availability, and improved model building strategies. However, it is not always obvious how to express an output variable in relation to a series of measurable parameters. In this work, we relate an output variable to a linear combination of features, where features include both input variables as well as nonlinear transformations of the variables.

This feature selection problem is solved by known techniques like best subset selection. There are many approaches used to solve the best subset selection problem, such as branch-and-bound [4], forward and backward selection [3], the lasso [5], and mixed-integer optimization [1, 2]. As the number of data points and the number of features we consider grows, we are able to generate more accurate models at a much larger computational cost.

In this work, we address the problem of how to efficiently develop accurate models by solving the best subset selection problem with forward and backward selection algorithms with GPU computing. We begin by parallelizing the forward and backward selection algorithm. In each iteration, a number of linear least squares problems needs to be solved to identify the next best model. Fortunately, each of these linear least squares problems can be solved simultaneously, allowing us to take advantage of a batched problem structure to speed up the algorithms. We demonstrate that our GPU parallel version of the algorithm is five times faster than an equivalent CPU version.

References

[1] D. Bertsimas, A. King, and R. Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, 44:813â€“852, 2016.

[2] A. Cozad, N. V. Sahinidis, and D. C. Miller. Automatic learning of algebraic models for optimization. AIChE Journal, 60:2211â€“2227, 2014.

[3] M.A. Efroymson. Multiple regression analysis. Mathematical methods for digital computers, pages 191â€“203, 1960.

[4] C. Gatu and EJ Kontoghiorghes. Branch-and-bound algorithms for computing the best-subset regression models. Computational and Graphical Statistics, 15:139â€“156, 2006.

[5] R. Tibshirani. Regression shrinkage and selection via the lasso. Royal Statistical Society, Methodological:267â€“288, 1996.

Topics

Computing and Systems Engineering

Other Sites & Tools

Technical Groups

Technical

Professional/Personal Growth

Societal Needs

Leadership

2025 Spring Meeting and 21st Global Congress on Process Safety

2025 AIChE Annual Meeting

Upcoming Conferences & Events

CEP: December 2024

CEP: November 2024

Explore Areas of Advancement:

Learning Center:

Want to be an Entrepreneur? Personal Stories From Three Successful Entrepreneurs Who Have Traveled This Path.

(370s) GPU Parallel Forward and Backward Selection for Model Building

AIChE Annual Meeting

2019

2019 AIChE Annual Meeting

Computing and Systems Technology Division

Interactive Session: Data and Information Systems

Tuesday, November 12, 2019 - 3:30pm to 5:00pm

Authors

Topics

More Conference Links

Cancelation Policy

Code of Conduct

Beware of Hotel and Attendee-list Scams