(61aa) Adaptive Real-Time Exploration and Optimization for Safety-Critical Industrial Systems: The ARTEO Algorithm

Authors 

Liu, T., Rensselaer Polytechnic Institute
Korkmaz, B. S., Imperial College London
Zagorowska, M., ETH Zurich
Real-time optimization plays a key role in improving the energy efficiency and operational effectiveness of industrial systems. However, because many industrial systems are complex, their characteristics are often unknown or change over time, which poses a significant challenge to traditional model-based real-time optimization approaches [1]. To increase confidence in sequential optimization under uncertainty, a typical solution is to collect process data from different operating points through exploration and then use machine learning models to capture the unknown plant characteristics. Since exploration is costly, for example when the system fails to deliver the expected demand while exploring, there is a trade-off between exploring further to obtain more accurate estimates and exploiting the collected observations to optimize the learned function. This trade-off is well studied in the literature, and multi-armed bandit approaches have been suggested as a solution for optimally balancing exploration and exploitation [2].

In safety-critical systems, exploration is not possible in some parts of the decision space due to safety concerns, which are typically modelled as safety constraints in the optimization. Many industrial systems are safety-critical because unsafe operation risks damaging equipment, incurring substantial economic loss, or causing severe environmental damage. Therefore, to approximate the unknown characteristics and optimize a process in such a plant, we need safe exploration algorithms that allow only the exploration of feasible decision points by enforcing the safety constraints.

The safe exploration problem is well studied in the literature. In [3], the authors introduced the SafeOpt algorithm and showed that a function with an unknown functional form can be optimized safely by creating and expanding a safe decision set through safe exploration under certain assumptions. They also established the safety guarantee of SafeOpt using the confidence-bound construction of [4]. Safe exploration algorithms have been applied to many control and reinforcement learning problems and have proved successful in those domains [5, 6]. These algorithms either require a dedicated exploration phase or apply a fixed trade-off strategy between optimal decisions and exploration. However, many industrial systems cannot afford an exclusive exploration phase and must continue to pursue the optimization goals of the system even at the points chosen for exploration. For instance, any deviation from target satisfaction in an industrial process may cause high costs to the plant or damage the reputation of the responsible party. Thus, exploration needs to be handled adaptively in accordance with the requirements of the environment, an aspect that is not covered by existing safe exploration algorithms.
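To make the safe-set idea of [3, 4] concrete, the following minimal sketch (not the authors' code) certifies a candidate input as safe only if the GP upper confidence bound on the constrained output stays below the safety threshold. The kernel, the scaling factor beta, and the seed data are illustrative assumptions, and scikit-learn stands in for whatever GP implementation the cited works use.

```python
# Minimal sketch of a confidence-bound safe set, under illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

beta = 2.0       # confidence scaling (illustrative choice)
g_max = 10.0     # safety threshold on the constrained output

x_safe = np.array([[1.0], [2.0], [3.0]])   # known-safe seed inputs
g_safe = np.array([4.0, 5.0, 5.5])         # measured constraint values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(x_safe, g_safe)

# A candidate is kept only if its upper confidence bound respects the threshold.
candidates = np.linspace(0.0, 6.0, 61).reshape(-1, 1)
mu, sd = gp.predict(candidates, return_std=True)
safe_set = candidates[(mu + beta * sd) <= g_max]
print(f"{len(safe_set)} of {len(candidates)} candidates are certified safe")
```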

Motivated by this background, this work proposes a novel safe Adaptive Real-Time Exploration and Optimization (ARTEO) algorithm, in which we cast multi-armed bandits as a mathematical programming problem subject to safety constraints for the optimization of safety-critical systems. In ARTEO, we model the unknown characteristics of the system using Gaussian process (GP) regression and use the covariance function of the GPs to quantify uncertainty, ensuring that the safety constraints are satisfied with high probability. We establish the safety of ARTEO by constructing confidence bounds as in [4]. We incorporate the uncertainty into the utility function as a contribution that encourages exploration at points with high uncertainty while continuing to satisfy the optimization goals of the system. The size of this contribution is adaptively controlled by a hyperparameter in accordance with the requirements of the environment. Additionally, the GP models are updated online with new informative observations to capture changing process characteristics.
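A minimal sketch of one ARTEO-style decision step is given below, under our own assumptions rather than the paper's implementation: scikit-learn GPs model each unit's power curve, SciPy's SLSQP solves the constrained program, and the utility combines a load-tracking term, the predicted power, and an uncertainty bonus weighted by a hyperparameter z. All names and numerical values (power_gps, load_target, p_max, the bounds) are illustrative.

```python
# Sketch of an ARTEO-style decision step, assuming scikit-learn GPs and SciPy.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from scipy.optimize import minimize, NonlinearConstraint

n_units = 3
rng = np.random.default_rng(0)

# GP model of power consumption vs. load for each unit, fit on safe seed data.
power_gps = []
for _ in range(n_units):
    x_seed = rng.uniform(20.0, 40.0, size=(4, 1))          # safe seed loads
    y_seed = 0.8 * x_seed[:, 0] + rng.normal(0, 0.5, 4)    # observed power
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-2)
    power_gps.append(gp.fit(x_seed, y_seed))

beta = 2.0           # confidence-bound width used for the safety constraint
z = 5.0              # adaptive exploration weight (hyperparameter)
load_target = 150.0  # demanded total load
p_max = 160.0        # maximum total power (safety constraint)
bounds = [(10.0, 80.0)] * n_units   # per-unit operating range

def mean_std(loads):
    # Predict mean and standard deviation of each unit's power at its load.
    mu, sd = zip(*(gp.predict(np.atleast_2d(l), return_std=True)
                   for gp, l in zip(power_gps, loads)))
    return np.array(mu).ravel(), np.array(sd).ravel()

def utility(loads):
    mu, sd = mean_std(loads)
    # Track the demand, minimise predicted power, reward high uncertainty.
    return (loads.sum() - load_target) ** 2 + mu.sum() - z * sd.sum()

def ucb_total_power(loads):
    mu, sd = mean_std(loads)
    return (mu + beta * sd).sum()   # must stay below p_max with high probability

safety = NonlinearConstraint(ucb_total_power, -np.inf, p_max)
x0 = np.full(n_units, load_target / n_units)
res = minimize(utility, x0, bounds=bounds, constraints=[safety], method="SLSQP")
print("chosen loads:", res.x)
```

Raising z favours exploration at uncertain operating points, while z = 0 recovers a purely exploitative dispatch; this is how the adaptive trade-off described above could be exposed to the operator.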

In the experiments, we demonstrate the implementation of ARTEO for the power management problem of an industrial refrigeration system. The system consists of five screw compressors in a parallel configuration: one small, one medium, and three large compressors. The relationship between load and power consumption of each individual compressor is unknown and is learned by the GP models. The goal of the optimizer is to achieve the total load target whenever possible while ensuring that a maximum total power constraint, an important safety requirement, is never exceeded. The operating range of each compressor is constrained by a minimum and a maximum load. The algorithm starts from an initial set of safe seed points. Fig. 1 illustrates the expected and achieved production loads, as well as the total power consumption, over time. The expected production load is satisfied unless (1) it is higher than what the compressors can achieve without crossing the safety threshold, or (2) it would require the compressors to operate below their minimum operating points. Hence, the proposed ARTEO algorithm successfully minimizes power consumption while tracking the desired cooling load within the operational and safety constraints. Even though the process starts with very limited information about the system, online learning provides performance comparable to a solution with exact knowledge of the environment.
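The online learning loop of the case study can be sketched as follows, again under illustrative assumptions: the measure_power function is a stand-in for the plant rather than the real compressor model, and each new observation is appended to the data set before the GP is refit, so the model can track changing compressor characteristics.

```python
# Sketch of the online GP update implied by the case study (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def measure_power(load):
    # Stand-in for the plant: an assumed load-power curve plus measurement noise.
    return 0.02 * load**2 + 0.5 * load + rng.normal(0.0, 0.3)

# Start from a few safe seed observations, as in the experiment.
loads = list(rng.uniform(20.0, 40.0, size=4))
powers = [measure_power(l) for l in loads]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-2)
gp.fit(np.array(loads).reshape(-1, 1), np.array(powers))

for step in range(3):
    chosen_load = 45.0 + 5.0 * step         # load chosen by the optimizer
    powers.append(measure_power(chosen_load))
    loads.append(chosen_load)
    # Refit online so the model follows the (possibly drifting) characteristics.
    gp.fit(np.array(loads).reshape(-1, 1), np.array(powers))
    mu, sd = gp.predict(np.array([[chosen_load]]), return_std=True)
    print(f"step {step}: predicted power {mu[0]:.2f} +/- {sd[0]:.2f}")
```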

References

[1] J. O. Trierweiler, “Real-time optimization of industrial processes,” in Encyclopedia of Systems and Control. Springer, 2021, pp. 1827–1836.

[2] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, “Online optimization in X-armed bandits,” in Advances in Neural Information Processing Systems, 2009, pp. 201–208.

[3] Y. Sui et al., “Safe exploration for optimization with Gaussian processes,” in International Conference on Machine Learning, 2015, pp. 997–1005.

[4] N. Srinivas, A. Krause, S. M. Kakade, et al., “Gaussian process optimization in the bandit setting: No regret and experimental design,” in International Conference on Machine Learning, 2010.

[5] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with Gaussian processes,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 491–496.

[6] F. Berkenkamp et al., “Safe model-based reinforcement learning with stability guarantees,” in Advances in Neural Information Processing Systems, 2017.