
(476h) Training and Reformulating Neural Network Surrogate Models for Optimization

Authors 

Tsay, C. - Presenter, Imperial College London
Thebelt, A., Imperial College London
Detailed mathematical models are often complex and difficult to embed in process optimization, motivating the use of simpler "surrogate" models. Data-driven surrogate modeling has been of particular interest [1-2], wherein a model that is tractable for optimization is fitted to samples obtained from a more computationally expensive, high-fidelity model. Neural networks (NNs) are the surrogate model of choice for many applications, owing to their ability to represent complex functions, their scalability to high-dimensional problems, and their accessibility via many open-source software tools [3-5]. In optimization, NNs are commonly represented as a nonlinear program [6-7] or, for piecewise-linear (e.g., ReLU) activations, as a mixed-integer linear program [8-9].
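As a concrete illustration (notation assumed here; the abstract itself does not spell out a formulation): a single ReLU node y = max(0, wᵀx + b) with known bounds LB ≤ wᵀx + b ≤ UB is commonly encoded in a MILP with one binary variable σ per node via the standard "big-M" constraints:

```latex
% Standard big-M encoding of one ReLU node, assuming valid bounds
% LB <= w'x + b <= UB are available (illustrative sketch only).
\begin{align*}
  y &\ge w^\top x + b, & y &\ge 0, \\
  y &\le w^\top x + b - LB\,(1 - \sigma), & y &\le UB\,\sigma, & \sigma &\in \{0, 1\}.
\end{align*}
```

When the bounds LB and UB are loose, the continuous relaxation of this encoding is weak, which is the motivation for the tighter formulations of Strategy (2) below.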

This work describes two advances in optimization using neural network surrogate models. First, to construct reduced-order models from limited data, NNs can be trained in Sobolev spaces, improving their performance in gradient-based optimization. Second, for mixed-integer programming applications, NNs can be encoded with tighter relaxations than the widely used "big-M" formulations, improving their performance in branch-and-bound global optimization.

Strategy (1) is based on quantifying the performance of NN models during training in terms of both prediction accuracy and the accuracy of derivatives up to an arbitrary order [10]. We examine how these training targets can be systematically scaled, and we find that this strategy improves the accuracy of surrogate-model-based optimization, measured as deviation from the true optimum. Results are presented for both black-box and grey-box optimization studies, including optimization of prototypical chemical separation process models [11].
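As a minimal sketch of such a training target (the abstract provides no implementation; PyTorch, the helper name, and the weighting scheme here are assumptions), a first-order Sobolev loss penalizes errors in both the predictions and the input gradients of the surrogate:

```python
import torch
import torch.nn.functional as F

def sobolev_loss(model, x, y_true, dydx_true, lam=1.0):
    """First-order Sobolev training loss (hypothetical helper).

    Assumes the high-fidelity model supplies gradients dy/dx at each
    sample, and that `model` maps inputs of shape (N, d) to (N, 1).
    """
    x = x.clone().requires_grad_(True)
    y_pred = model(x)
    # Input gradients of the surrogate via automatic differentiation;
    # create_graph=True keeps the derivative term itself differentiable.
    dydx_pred = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
    # `lam` weights the derivative target and could be scaled over the
    # course of training, in the spirit of the systematic scaling above.
    return F.mse_loss(y_pred, y_true) + lam * F.mse_loss(dydx_pred, dydx_true)
```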

Strategy (2) is based on partitioning the inputs to each node of a trained NN and forming the convex hull (i.e., the tightest possible formulation) over the resulting partitions via disjunctive programming [9,12]. We present computational results on challenging "verification" problems, which examine the worst-case performance of NN models and are important for, e.g., safety guarantees in process control applications. The results show that our formulations balance model size against tightness, leading to significant performance improvements over existing formulations.
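Roughly, and as a sketch of the construction in [9] (notation assumed here), each ReLU node is a two-term disjunction:

```latex
% A ReLU node y = max(0, w'x + b) as an inactive/active disjunction.
\begin{equation*}
  \bigl[\; \sigma = 0,\; y = 0,\; w^\top x + b \le 0 \;\bigr]
  \;\vee\;
  \bigl[\; \sigma = 1,\; y = w^\top x + b \ge 0 \;\bigr].
\end{equation*}
% Partitioning the inputs into subsets S_1, ..., S_P and introducing
% auxiliary variables z_k = \sum_{i \in S_k} w_i x_i, the convex hull
% of the disjunction is formed over (z_1, ..., z_P, y, \sigma).
```

The number of partitions P then acts as a dial between formulation size and tightness: a single partition is comparable to the big-M encoding, while placing each input in its own partition recovers the convex hull of the node.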

References:

[1] Bhosekar, A., & Ierapetritou, M. (2018). Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Computers & Chemical Engineering, 108, 250-267.

[2] Tsay, C., & Baldea, M. (2019). 110th anniversary: using data to bridge the time and length scales of process systems. Industrial & Engineering Chemistry Research, 58(36), 16696-16708.

[3] Zhang, D., Del Rio‐Chanona, E. A., Petsagkourakis, P., & Wagner, J. (2019). Hybrid physics‐based and data‐driven modeling for bioprocess online simulation and optimization. Biotechnology and Bioengineering, 116(11), 2919-2930.

[4] Eason, J., & Cremaschi, S. (2014). Adaptive sequential sampling for surrogate model generation with artificial neural networks. Computers & Chemical Engineering, 68, 220-232.

[5] Kim, S. H., & Boukouvala, F. (2020). Surrogate-based optimization for mixed-integer nonlinear problems. Computers & Chemical Engineering, 140, 106847.

[6] Henao, C. A., & Maravelias, C. T. (2011). Surrogate‐based superstructure optimization framework. AIChE Journal, 57(5), 1216-1232.

[7] Schweidtmann, A. M., & Mitsos, A. (2019). Deterministic global optimization with artificial neural networks embedded. Journal of Optimization Theory and Applications, 180(3), 925-948.

[8] Grimstad, B., & Andersson, H. (2019). ReLU networks as surrogate models in mixed-integer linear programs. Computers & Chemical Engineering, 131, 106580.

[9] Tsay, C., Kronqvist, J., Thebelt, A., & Misener, R. (2021). Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. arXiv preprint arXiv:2102.04373.

[10] Tsay, C. (2021). Sobolev-trained neural network surrogate models for optimization. Submitted.

[11] Schweidtmann, A. M., Bongartz, D., Huster, W. R., & Mitsos, A. (2019). Deterministic global process optimization: flash calculations via artificial neural networks. In Computer Aided Chemical Engineering (Vol. 46, pp. 937-942). Elsevier.

[12] Kronqvist, J., Misener, R., & Tsay, C. (2021). Between steps: Intermediate relaxations between big-M and convex hull formulations. International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR). Accepted, preprint arXiv:2101.12708.