(578b) Surrogate-Based Optimization Using Random Forests
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Data Driven Optimization
Wednesday, November 18, 2020 - 8:15am to 8:30am
The random forest algorithm has been used successfully as a general-purpose method for classification and regression since its introduction in 2001 [1]. A random forest model is an ensemble of a large number of individual, largely uncorrelated decision trees, and its prediction is the average of the individual tree predictions [1]. It has been shown that these models can effectively fit nonlinear and complex interactions of input features [2]. We propose that random forests can also be used successfully in surrogate-based optimization to approximate models without closed analytic forms. When used to express the objective function of an optimization problem, random forest models yield mixed-integer linear programs (MILPs), which allows the use of existing powerful MILP solvers [3]. However, the corresponding MILPs are generally large and require significant computational effort to solve to optimality, especially for random forest models with many decision trees.
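To make the MILP encoding concrete, the following is a minimal Pyomo sketch (not the exact formulation of this work) of one trained tree: binary leaf-selection variables pick exactly one leaf, and big-M constraints force the input to satisfy every split on the chosen leaf's root-to-leaf path. The tiny three-leaf tree, thresholds, big-M value, and tolerance below are illustrative assumptions.

    import pyomo.environ as pyo

    # Hypothetical tree: x0 <= 0.5 -> 1.0; else x0 <= 0.8 -> 2.0; else 3.0
    leaves = {
        # leaf id: (prediction, [(feature, threshold, "le"/"gt"), ...] on path)
        0: (1.0, [(0, 0.5, "le")]),
        1: (2.0, [(0, 0.5, "gt"), (0, 0.8, "le")]),
        2: (3.0, [(0, 0.5, "gt"), (0, 0.8, "gt")]),
    }
    M, eps = 10.0, 1e-4  # big-M constant and strict-inequality tolerance (assumptions)

    m = pyo.ConcreteModel()
    m.x = pyo.Var([0], bounds=(0.0, 1.0))            # continuous input
    m.y = pyo.Var(list(leaves), domain=pyo.Binary)   # leaf-selection binaries
    m.pred = pyo.Var()                               # tree prediction

    m.one_leaf = pyo.Constraint(expr=sum(m.y[l] for l in leaves) == 1)
    m.link = pyo.Constraint(expr=m.pred == sum(leaves[l][0] * m.y[l] for l in leaves))

    # Big-M path constraints: choosing leaf l forces x to satisfy every split
    # on that leaf's root-to-leaf path
    m.path = pyo.ConstraintList()
    for l, (_, path) in leaves.items():
        for i, thr, sense in path:
            if sense == "le":
                m.path.add(m.x[i] <= thr + M * (1 - m.y[l]))
            else:
                m.path.add(m.x[i] >= thr + eps - M * (1 - m.y[l]))

    m.obj = pyo.Objective(expr=m.pred, sense=pyo.minimize)  # minimize surrogate output

For a full forest, one such block is built per tree and the ensemble prediction is the average of the per-tree m.pred variables, which is where the model size grows with the number of trees.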
In this contribution, we examine surrogate-based optimization of nonlinear programming problems employing random forest models as surrogates. We introduce a MILP formulation for a random forest model and focus on developing solution approaches that decompose the original MILP by exploiting its unique structure. The random forest MILP contains the complicating variables xi, which represent the same input value in dimension i for all individual decision trees. We introduce additional variables zi,t as the input values of each tree t in dimension i, with the equalities xi = zi,t as enforcing constraints. By removing the enforcing constraints, we can fully decompose the original random forest MILP into per-tree MILP subproblems that can be solved in parallel.
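The copy-variable reformulation can be sketched in a few lines of Pyomo; the index sizes and bounds below are illustrative assumptions, not the actual model dimensions.

    import pyomo.environ as pyo

    dims, trees = [0, 1], [0, 1, 2]  # illustrative sizes
    m = pyo.ConcreteModel()
    m.x = pyo.Var(dims, bounds=(0.0, 1.0))         # complicating (shared) inputs
    m.z = pyo.Var(dims, trees, bounds=(0.0, 1.0))  # per-tree input copies

    def enforce_rule(m, i, t):
        return m.x[i] == m.z[i, t]   # enforcing constraints x_i = z_{i,t}
    m.enforce = pyo.Constraint(dims, trees, rule=enforce_rule)

    # Relaxation used for decomposition: deactivate the enforcing constraints,
    # after which each tree's leaf-selection MILP involves only z[:, t].
    m.enforce.deactivate()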
We developed five decomposition approaches inspired by Sample Average Approximation (SAA), Lagrangian decomposition with the cutting plane method (LD), Progressive Hedging (PH), the Alternating Direction Method of Multipliers (ADMM), and Benders Decomposition (BD). The first approach, SAA, generates dual bounds for the original MILP by removing the enforcing constraints and solving each subproblem, and it obtains primal bounds using a heuristic. The second approach, LD, generates dual bounds by dualizing the enforcing constraints into the objective function as λi,t(xi − zi,t) with Lagrangian multipliers λi,t ∈ ℝ. The third and fourth approaches, PHLD (PH with LD) and ADLD (ADMM with LD), add nonlinear penalty terms ρ/2 (xi − zi,t)² with penalty factor ρ > 0 to the objective function to further drive the convergence of xi = zi,t. To keep the objective function separable, PHLD fixes xi and updates its value from the zi,t obtained in each subproblem, while ADLD generates and alternately solves two subproblems, one containing only the variables xi and the other only the variables zi,t. Both PHLD and ADLD utilize LD to generate their dual bounds, with fixed Lagrangian multipliers λi,t computed from the penalty factor ρ. Unlike SAA, LD, PHLD, and ADLD, Benders decomposition treats xi as the complicating variables directly and decomposes the original MILP into a master problem (MP) that chooses xi, and subproblems that determine the leaf-node selection of each tree for a given solution xi from the master problem.
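As an illustration of how the LD dual bound and multiplier updates interact, here is a self-contained subgradient loop on a toy problem in which the tree MILPs are replaced by trivially solvable box-constrained subproblems; the coefficients, step-size rule, and iteration count are illustrative assumptions, not the settings used in this work.

    import numpy as np

    # Toy stand-in for the ensemble MILP: minimize sum_t c[:, t] . z[:, t]
    # subject to x == z[:, t] for every tree t, with x, z in [0, 1].
    rng = np.random.default_rng(0)
    n_dims, n_trees = 2, 3
    c = rng.normal(size=(n_dims, n_trees))  # stand-in per-tree objective coefficients
    lam = np.zeros((n_dims, n_trees))       # multipliers for x_i = z_{i,t}

    for k in range(1, 101):
        # z-subproblems: min (c - lam) . z over z in [0, 1], independently per tree
        z = (c - lam < 0).astype(float)
        # x-subproblem: min (sum_t lam[:, t]) . x over x in [0, 1]
        x = (lam.sum(axis=1) < 0).astype(float)
        dual_bound = ((c - lam) * z).sum() + (lam.sum(axis=1) * x).sum()
        # Subgradient of the concave dual function is the violation x_i - z_{i,t}
        g = x[:, None] - z
        lam += (1.0 / k) * g  # diminishing step size (a common simple choice)

    print(f"LD dual bound after {k} iterations: {dual_bound:.4f}")

In the actual approaches, the z-subproblems are the per-tree leaf-selection MILPs, and PHLD/ADLD replace the plain dualized term with the quadratic penalty described above.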
With the developed approaches, the original problem over the ensemble of trees can be decomposed into one subproblem per decision tree. To reduce the impact of each individual tree's error, we also investigate clustering strategies for four of the decomposition approaches, which we refer to as SAA-group, LD-group, PHLD-group, and ADLD-group. Instead of splitting the ensemble into individual trees, the clustering strategies decompose it into a set of tree clusters, as sketched below. All of the above solution approaches are implemented in Pyomo 5.6.6 and Python 3.6.4 and will be released on our group's GitHub page (https://github.com/CremaschiLab).
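The clustering variants change only how the ensemble is partitioned; a minimal sketch follows, with an arbitrary round-robin assignment standing in as a placeholder for the actual clustering rule, which is not specified here.

    # Partition the ensemble into clusters of trees; one subproblem per cluster.
    # The round-robin assignment below is a placeholder, not the clustering
    # rule used in this work.
    n_trees, n_clusters = 100, 10
    clusters = [list(range(c, n_trees, n_clusters)) for c in range(n_clusters)]
    # Trees in clusters[c] share one input copy z[i, c], so the enforcing
    # constraints x_i == z_{i,c} are written per cluster rather than per tree.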
We applied the developed approaches (SAA, LD, PHLD, ADLD, BD, SAA-group, LD-group, PHLD-group, ADLD-group) to solve the MILPs of 95 trained random forest models. These models were developed to approximate functions from the Virtual Library of Simulation Experiments [4]. The functions in the library are grouped by shape into five main categories: multi-local-minima (25 functions), bowl-shaped (31 functions), plate-shaped (9 functions), valley-shaped (12 functions), and other-shaped (18 functions), the last of which contains functions that do not fit into the other four categories. The computational experiments revealed that the developed approaches obtained small optimality gaps within a few iterations for functions such as the Sphere, Ellipsoid, and Trid functions. The approaches that utilize clustering strategies (SAA-group, LD-group, PHLD-group, ADLD-group) obtained significantly smaller relative gaps than the original per-tree approaches.
References
- [1] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
- [2] Biau, G., & Scornet, E. (2016). A random forest guided tour. TEST, 25(2), 197–227.
- [3] Biggs, M., & Hariss, R. (2018). Optimizing objective functions determined from random forests. Available at SSRN 2986630.
- [4] Surjanovic, S., & Bingham, D. Virtual Library of Simulation Experiments: Test Functions and Datasets. https://www.sfu.ca/~ssurjano/optimization.html