(371k) A Reinforcement Learning Approach with Masked Agents for Chemical Process Flowsheet Design

Authors 

Reynoso-Donzelli, S. - Presenter, University of Waterloo
Ricardez-Sandoval, L., University of Waterloo
Process flowsheet design is a key topic in chemical engineering that aims to provide a visual representation of a chemical process by arranging unit operations (UOs), their interconnections via streams, and the materials involved. Process flowsheets are typically designed through a combination of heuristic methods and optimization techniques, integrating pre-established guidelines with mathematical solutions. Despite its effectiveness, this approach faces challenges such as formulating an appropriate superstructure for optimization, programming all the equations related to the unit operations and their thermodynamic models, handling constraints inherent to the problem (e.g., disjunctive constraints), and identifying suitable initial guesses to achieve convergence. In recent years, model-free optimization methods have become an attractive alternative to model-based approaches because they can incorporate commercial chemical process simulators, e.g., Aspen Plus, within their solution procedure. Among model-free optimization methods, Reinforcement Learning (RL) stands out as an attractive approach for process flowsheet design, particularly for its ability to make interrelated sequential decisions while searching for an optimal solution. Significant progress has been made in this field, with promising outcomes in the generation and design of chemical process flowsheets. The works presented in [1]–[3] leverage advanced RL techniques, such as hierarchical agents, graph neural networks, and advanced search strategies combined with competing agents, to design chemical process flowsheets. Most of these works use shortcut methods to design the UOs, while [4] and [5] have demonstrated the capability to interface with chemical process simulators. Typically, RL agents are presented during training with all available actions (unit operations with their respective sizings) so that they learn to select the optimal process arrangement. However, this approach can be counterproductive for the agent's learning, especially when the problem at hand is complex and involves cases where coupling certain actions is infeasible. This work addresses a relatively new and emerging area in RL-based chemical process flowsheet design: masking [3], [6], a technique that enhances the agent's decision-making by excluding incoherent actions from its search space.
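To make the masking mechanism concrete, the minimal sketch below (not the authors' implementation) applies invalid action masking in the style of [6]: the logits of currently invalid actions are replaced with a large negative value before sampling, so masked actions receive numerically zero probability. The action count and mask values are hypothetical.

```python
import torch
from torch.distributions import Categorical

def masked_action_distribution(logits: torch.Tensor,
                               mask: torch.Tensor) -> Categorical:
    """Suppress invalid actions by pushing their logits to a large
    negative value, so their sampling probability is effectively zero."""
    masked_logits = torch.where(mask.bool(), logits,
                                torch.full_like(logits, -1e8))
    return Categorical(logits=masked_logits)

# Example: 5 candidate unit operations; actions 1 and 3 are currently
# incoherent (hypothetical mask values for illustration only).
logits = torch.randn(5)
mask = torch.tensor([1, 0, 1, 0, 1])
dist = masked_action_distribution(logits, mask)
action = dist.sample()  # only indices 0, 2, or 4 can be drawn
```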

In this work, a framework to generate and design chemical process flowsheets using RL is presented. A key feature of this formulation lies in the integration of the masking technique into the flowsheet design problem using fully discrete and hybrid environments, i.e., mixtures of discrete and continuous decisions. The application of masking in this context can be regarded as the incorporation of a human expert's input or design rules, similar to a heuristic method that follows predefined rules. In this study, two masked Proximal Policy Optimization (PPO) agents were developed: a fully discretized PPO and a hybrid PPO; to the best of the authors' knowledge, this is the first study to present a masked hybrid PPO. The primary goal of these agents is to generate and design a chemical process flowsheet that minimizes operational costs using rigorous UO models while adhering to process and equipment operating and design constraints. In the simulation environment, the objective function and constraints were embedded in the reward function as sub-rewards, the sum of which constitutes the total reward. The underlying idea in the generation of flowsheets is the same for both agents: the proposed formulation starts from an inlet stream, from which the agent learns to build a chemical process flowsheet by choosing the order and configuration of various UOs. During training, the agent learns the correlation between adjacent UOs while seeking to optimize their sizes and operating conditions. In scenarios where the correlation is more complex, particularly when relevant UOs are not contiguous, the masking function becomes pivotal. For instance, flowsheets featuring a mixer commonly have a recycle stream originating from a non-adjacent UO; the masking technique enables the agent to capture such complex relationships between non-adjacent UOs. The agents differ in their action spaces. The hybrid masked agent's action space consists of a discrete decision, primarily the choice of a UO, alongside continuous components corresponding to the design variables associated with these operations. At each step (i.e., a full interaction between the agent and the environment), the agent samples one discrete value and one continuous value for every design variable, regardless of whether each value is used in that step. For the discrete masked agent, by contrast, the action space is fully discretized; hence, each sampled value corresponds to a specific unit operation with a specific design.
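The sketch below expresses the two action-space formulations in gym terms. All dimensions (number of candidate UOs, design variables, and discretization levels) are hypothetical placeholders, not values from the study.

```python
import numpy as np
from gym import spaces

N_UNIT_OPS = 4        # hypothetical number of candidate UOs
N_DESIGN_VARS = 3     # hypothetical number of continuous design variables
N_DESIGN_LEVELS = 10  # hypothetical discretization levels per variable

# Hybrid agent: one discrete choice (which UO to place) plus a continuous
# vector covering every design variable, sampled at every step.
hybrid_action_space = spaces.Dict({
    "unit_op": spaces.Discrete(N_UNIT_OPS),
    "design": spaces.Box(low=0.0, high=1.0,
                         shape=(N_DESIGN_VARS,), dtype=np.float32),
})

# Fully discrete agent: each action is one (UO, discretized design) pair.
discrete_action_space = spaces.Discrete(N_UNIT_OPS * N_DESIGN_LEVELS)

sample = hybrid_action_space.sample()
print(sample["unit_op"], sample["design"])
```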

The proposed RL framework was tested on two case studies. Both cases were implemented on a PC with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 32 GB of RAM, using Python as the programming language. The main libraries used were torch for the development of the neural networks and gym for creating the RL environment. The first case study aimed to design a chemical process flowsheet that produces a product from given reactants up to a certain conversion level. Three non-rigorous UOs were considered, a mixer, a reactor, and a flash tank, along with their associated design variables. In this case, the inclusion of the mixer in the flowsheet was coupled to the inclusion of the flash tank (and vice versa), so that a recycle loop could be formed. Both RL agents were able to design chemical process flowsheets that achieved the required product conversion. The masking technique was crucial to their training, since the agents learned the importance of recycling in the process, resulting in shorter and more efficient flowsheets. For this case study, the discrete masked agent produced a shorter, and thus more economically attractive, flowsheet than the hybrid agent.
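The sketch below illustrates how a state-dependent mask might be written for this case study. The action indices and the specific rule (a recycle stream can only be closed once a mixer exists to receive it) are illustrative assumptions, not the authors' exact masking logic.

```python
import numpy as np

# Hypothetical action indices for the first case study
MIXER, REACTOR, FLASH, RECYCLE = 0, 1, 2, 3

def action_mask(placed_units: list) -> np.ndarray:
    """Binary mask over candidate actions given the current flowsheet.

    Illustrative rule: closing a recycle stream is only coherent if a
    mixer already exists upstream to receive it, so RECYCLE remains
    masked until a mixer has been placed.
    """
    mask = np.ones(4, dtype=np.int8)
    if MIXER not in placed_units:
        mask[RECYCLE] = 0
    return mask

print(action_mask([REACTOR, FLASH]))  # -> [1 1 1 0]
print(action_mask([MIXER, REACTOR]))  # -> [1 1 1 1]
```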

The goal of the second case study was to design a chemical process flowsheet for the production of dimethyl ether via the dehydration of methanol. A hybrid platform combining Python with the chemical simulation software Aspen Plus was developed. The main reason for developing this platform is to use the advanced thermodynamic packages and rigorous conservation equations that govern all specified UOs as part of the simulation environment. In addition, this approach enables a wider range of UOs to be integrated into the simulation environment, leading to processes that closely resemble real design procedures. The incorporation of various types of unit operations is precisely what sets this work apart from other studies that utilize Aspen Plus but focus on a single UO. Moreover, the proposed RL framework can create chemical process flowsheets from scratch, considering constraints for specific UOs and automatically linking and simulating the chosen UOs. In this case, both agents were able to design the process while respecting the imposed operational constraints, ensuring dimethyl ether and water purities above 99% and operating the reactor(s) at a temperature below 400°C. Despite achieving a lower overall reward than the discrete agent, the hybrid agent's flowsheet revealed notable insights, such as better mass integration through recycling and lower dimethyl ether losses throughout the process. One significant drawback is the lengthy training period, especially when the RL framework is linked to external chemical simulation software such as Aspen Plus: completing 50,000 steps, which in this case equates to 50,000 evaluations of an Aspen Plus flowsheet, required around 36 hours. Nevertheless, the flexibility provided by this framework, along with its potential to discover feasible yet more creative and economically viable solutions, makes this approach an attractive tool for the future design and optimization of chemical process flowsheets.
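For readers unfamiliar with this kind of Python–Aspen Plus coupling, the sketch below shows one common way to drive Aspen Plus from Python through its COM automation interface (Windows only, via pywin32). The backup file and tree node paths are placeholders that depend entirely on the flowsheet being simulated; this is not the authors' actual interface code.

```python
import win32com.client

# Attach to Aspen Plus through its COM automation interface and load a
# backup file (placeholder path; in the framework, the environment would
# build or modify the flowsheet before each run).
aspen = win32com.client.Dispatch("Apwn.Document")
aspen.InitFromArchive2(r"C:\flowsheets\dme_base.bkp")

def evaluate(reactor_temp_c: float) -> float:
    """Write one design variable, run the flowsheet, read one result.

    Node paths assume a hypothetical block "REACTOR" and product stream
    "DME"; real paths depend on the flowsheet layout.
    """
    aspen.Tree.FindNode(r"\Data\Blocks\REACTOR\Input\TEMP").Value = reactor_temp_c
    aspen.Engine.Run2()  # one of the ~50,000 evaluations per training run
    return aspen.Tree.FindNode(
        r"\Data\Streams\DME\Output\MOLEFRAC\MIXED\DME").Value
```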

References

[1] Khan, A., & Lapkin, A. (2020). Searching for optimal process routes: A reinforcement learning approach. Computers & Chemical Engineering, 141, 107027. https://doi.org/10.1016/j.compchemeng.2020.107027

[2] Stops, L., Leenhouts, R., Gao, Q., & Schweidtmann, A. M. (2022). Flowsheet generation through hierarchical reinforcement learning and graph neural networks. AIChE Journal, 69(1). https://doi.org/10.1002/aic.17938

[3] Göttl, Q., Grimm, D. G., & Burger, J. (2022). Automated synthesis of steady-state continuous processes using reinforcement learning. Frontiers of Chemical Science and Engineering, 16(2), 288–302. https://doi.org/10.1007/s11705-021-2055-9

[4] Midgley, L. I. (2020). Deep reinforcement learning for process synthesis. arXiv. http://arxiv.org/abs/2009.13265

[5] van Kalmthout, S. C. P. A., Midgley, L. I., & Franke, M. B. (2022). Synthesis of separation processes with reinforcement learning. arXiv. http://arxiv.org/abs/2211.04327

[6] Huang, S., & Ontañón, S. (2022). A closer look at invalid action masking in policy gradient algorithms. The International FLAIRS Conference Proceedings, 35. https://doi.org/10.32473/flairs.v35i.130584