(578b) Surrogate-Based Optimization Using Random Forests
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Data Driven Optimization
Wednesday, November 18, 2020 - 8:15am to 8:30am
The random forest algorithm has been used successfully as a general-purpose method for classification and regression since its introduction in 2001 [1]. A random forest model is an ensemble of a large number of individual, largely uncorrelated decision trees, and its prediction is the average of the individual tree predictions [1]. It has been shown that these models can effectively fit nonlinear and complex interactions of input features [2]. We propose that random forests can also be used successfully in surrogate-based optimization to approximate models without closed analytic forms. When used to express the objective function of an optimization problem, random forest models yield mixed-integer linear programs (MILPs), which allows the use of existing powerful MILP solvers [3]. However, the corresponding MILPs are generally large and require significant computational effort to solve to optimality, especially for random forest models with many decision trees.
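To make the MILP encoding concrete, the following is a minimal Pyomo sketch (not the exact formulation of this work) of one trained tree: binary leaf-selection variables pick exactly one leaf, and big-M constraints force the input to satisfy every split on the chosen leaf's root-to-leaf path. The tiny three-leaf tree, thresholds, big-M value, and tolerance below are illustrative assumptions.

    import pyomo.environ as pyo

    # Hypothetical tree: x0 <= 0.5 -> 1.0; else x0 <= 0.8 -> 2.0; else 3.0
    leaves = {
        # leaf id: (prediction, [(feature, threshold, "le"/"gt"), ...] on path)
        0: (1.0, [(0, 0.5, "le")]),
        1: (2.0, [(0, 0.5, "gt"), (0, 0.8, "le")]),
        2: (3.0, [(0, 0.5, "gt"), (0, 0.8, "gt")]),
    }
    M, eps = 10.0, 1e-4  # big-M constant and strict-inequality tolerance (assumptions)

    m = pyo.ConcreteModel()
    m.x = pyo.Var([0], bounds=(0.0, 1.0))            # continuous input
    m.y = pyo.Var(list(leaves), domain=pyo.Binary)   # leaf-selection binaries
    m.pred = pyo.Var()                               # tree prediction

    m.one_leaf = pyo.Constraint(expr=sum(m.y[l] for l in leaves) == 1)
    m.link = pyo.Constraint(expr=m.pred == sum(leaves[l][0] * m.y[l] for l in leaves))

    # Big-M path constraints: choosing leaf l forces x to satisfy every split
    # on that leaf's root-to-leaf path
    m.path = pyo.ConstraintList()
    for l, (_, path) in leaves.items():
        for i, thr, sense in path:
            if sense == "le":
                m.path.add(m.x[i] <= thr + M * (1 - m.y[l]))
            else:
                m.path.add(m.x[i] >= thr + eps - M * (1 - m.y[l]))

    m.obj = pyo.Objective(expr=m.pred, sense=pyo.minimize)  # minimize surrogate output

For a full forest, one such block is built per tree and the ensemble prediction is the average of the per-tree m.pred variables, which is where the model size grows with the number of trees.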
In this contribution, we examine surrogate-based optimization of nonlinear programming problems employing random forest models as surrogates. We introduce a MILP formulation for a random forest model and focus on developing solution approaches that decompose the original MILP by exploiting its unique structure. The random forest MILP contains the complicating variables xi, which represent the same input value in dimension i for all individual decision trees. We introduce additional variables zi,t as the input values of each tree t in dimension i, with the equalities xi = zi,t as enforcing constraints. By removing the enforcing constraints, we can fully decompose the original random forest MILP into per-tree MILP subproblems that can be solved in parallel.
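The copy-variable reformulation can be sketched in a few lines of Pyomo; the index sizes and bounds below are illustrative assumptions, not the actual model dimensions.

    import pyomo.environ as pyo

    dims, trees = [0, 1], [0, 1, 2]  # illustrative sizes
    m = pyo.ConcreteModel()
    m.x = pyo.Var(dims, bounds=(0.0, 1.0))         # complicating (shared) inputs
    m.z = pyo.Var(dims, trees, bounds=(0.0, 1.0))  # per-tree input copies

    def enforce_rule(m, i, t):
        return m.x[i] == m.z[i, t]   # enforcing constraints x_i = z_{i,t}
    m.enforce = pyo.Constraint(dims, trees, rule=enforce_rule)

    # Relaxation used for decomposition: deactivate the enforcing constraints,
    # after which each tree's leaf-selection MILP involves only z[:, t].
    m.enforce.deactivate()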
We developed five decomposition approaches inspired by Sample Average Approximation (SAA), Lagrangian decomposition with the cutting plane method (LD), Progressive Hedging (PH), the Alternating Direction Method of Multipliers (ADMM), and Benders Decomposition (BD). The first approach, SAA, generates dual bounds for the original MILP by removing the enforcing constraints and solving each subproblem, and it obtains primal bounds using a heuristic. The second approach, LD, generates dual bounds by dualizing the enforcing constraints into the objective function as λi,t(xi − zi,t) with Lagrangian multipliers λi,t ∈ ℝ. The third and fourth approaches, PHLD (PH with LD) and ADLD (ADMM with LD), add nonlinear penalty terms ρ/2 (xi − zi,t)² with penalty factor ρ > 0 to the objective function to further drive the convergence of xi = zi,t. To keep the objective function separable, PHLD fixes xi and updates its value from the zi,t obtained in each subproblem, while ADLD generates and alternately solves two subproblems, one containing only the variables xi and the other only the variables zi,t. Both PHLD and ADLD utilize LD to generate their dual bounds, with fixed Lagrangian multipliers λi,t computed from the penalty factor ρ. Unlike SAA, LD, PHLD, and ADLD, Benders decomposition treats xi as the complicating variables directly and decomposes the original MILP into a master problem (MP) that chooses xi, and subproblems that determine the leaf-node selection of each tree for a given solution xi from the master problem.
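As an illustration of how the LD dual bound and multiplier updates interact, here is a self-contained subgradient loop on a toy problem in which the tree MILPs are replaced by trivially solvable box-constrained subproblems; the coefficients, step-size rule, and iteration count are illustrative assumptions, not the settings used in this work.

    import numpy as np

    # Toy stand-in for the ensemble MILP: minimize sum_t c[:, t] . z[:, t]
    # subject to x == z[:, t] for every tree t, with x, z in [0, 1].
    rng = np.random.default_rng(0)
    n_dims, n_trees = 2, 3
    c = rng.normal(size=(n_dims, n_trees))  # stand-in per-tree objective coefficients
    lam = np.zeros((n_dims, n_trees))       # multipliers for x_i = z_{i,t}

    for k in range(1, 101):
        # z-subproblems: min (c - lam) . z over z in [0, 1], independently per tree
        z = (c - lam < 0).astype(float)
        # x-subproblem: min (sum_t lam[:, t]) . x over x in [0, 1]
        x = (lam.sum(axis=1) < 0).astype(float)
        dual_bound = ((c - lam) * z).sum() + (lam.sum(axis=1) * x).sum()
        # Subgradient of the concave dual function is the violation x_i - z_{i,t}
        g = x[:, None] - z
        lam += (1.0 / k) * g  # diminishing step size (a common simple choice)

    print(f"LD dual bound after {k} iterations: {dual_bound:.4f}")

In the actual approaches, the z-subproblems are the per-tree leaf-selection MILPs, and PHLD/ADLD replace the plain dualized term with the quadratic penalty described above.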
With the developed approaches, the original problem over the ensemble of trees can be decomposed into one subproblem per decision tree. To reduce the impact of each individual tree's error, we also investigate clustering strategies for four of the decomposition approaches, which we refer to as SAA-group, LD-group, PHLD-group, and ADLD-group. Instead of splitting the ensemble into individual trees, the clustering strategies decompose it into a set of tree clusters, as sketched below. All of the above solution approaches are implemented in Pyomo 5.6.6 and Python 3.6.4 and will be released on our group's GitHub page (https://github.com/CremaschiLab).
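The clustering variants change only how the ensemble is partitioned; a minimal sketch follows, with an arbitrary round-robin assignment standing in as a placeholder for the actual clustering rule, which is not specified here.

    # Partition the ensemble into clusters of trees; one subproblem per cluster.
    # The round-robin assignment below is a placeholder, not the clustering
    # rule used in this work.
    n_trees, n_clusters = 100, 10
    clusters = [list(range(c, n_trees, n_clusters)) for c in range(n_clusters)]
    # Trees in clusters[c] share one input copy z[i, c], so the enforcing
    # constraints x_i == z_{i,c} are written per cluster rather than per tree.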
We applied the developed approaches (SAA, LD, PHLD, ADLD, BD, SAA-group, LD-group, PHLD-group, ADLD-group) to solve the MILPs of 95 trained random forest models. These models were developed to approximate functions from the Virtual Library of Simulation Experiments [4]. The functions in the library are grouped by shape into five main categories: multi-local-minima (25 functions), bowl-shaped (31 functions), plate-shaped (9 functions), valley-shaped (12 functions), and other-shaped (18 functions), the last of which contains functions that do not fit into the other four categories. The computational experiments revealed that the developed approaches obtained small optimality gaps within a few iterations for functions such as the Sphere, Ellipsoid, and Trid functions. The approaches that utilize clustering strategies (SAA-group, LD-group, PHLD-group, ADLD-group) obtained significantly smaller relative gaps than the original per-tree approaches.
References
- [1] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
- [2] Biau, G., & Scornet, E. (2016). A random forest guided tour. TEST, 25(2), 197–227.
- [3] Biggs, M., & Hariss, R. (2018). Optimizing objective functions determined from random forests. Available at SSRN 2986630.
- [4] Surjanovic, S., & Bingham, D. Virtual Library of Simulation Experiments: Test Functions and Datasets. https://www.sfu.ca/~ssurjano/optimization.html