(371k) A Reinforcement Learning Approach with Masked Agents for Chemical Process Flowsheet Design
AIChE Annual Meeting
2024
Computing and Systems Technology Division
10A: Poster Session: Interactive Session: Systems and Process Design
Tuesday, October 29, 2024 - 3:30pm to 5:00pm
In this work, a framework to generate and design chemical process flowsheets using reinforcement learning (RL) is presented. A key feature of this formulation lies in the integration of the masking technique into the chemical process flowsheet design problem, using both fully discrete and hybrid environments, i.e., environments with a mixture of discrete and continuous decisions. The application of masking in this context can be regarded as the incorporation of a human expert's input or design rules, i.e., similar to a heuristic method that follows predefined rules. In this study, two masked Proximal Policy Optimization (PPO) agents were developed: a fully discretized PPO and a hybrid PPO; to the best of the authors' knowledge, this is the first study that presents a masked hybrid PPO. The primary goal of these agents is to generate and design a chemical process flowsheet that minimizes operational costs using rigorous models of the unit operations (UOs), while adhering to the process and equipment operation and design constraints. In the simulation environment, the objective function and constraints were incorporated into the reward function in the form of sub-rewards, the sum of which gives the total reward. The underlying idea in the generation of flowsheets is the same for both agents: the proposed formulation starts from an inlet stream, from which the agent learns to build a chemical process flowsheet by choosing the order and configuration of various UOs. During training, the agent learns the correlation between adjacent UOs while seeking to optimize their sizes and operating conditions. In scenarios where the correlation is more complex, particularly when the relevant UOs are not contiguous, the masking function becomes pivotal. For instance, in flowsheets featuring a mixer, it is common to have a recycle stream originating from a non-adjacent UO. Specifically, the masking technique enables the agent to capture complex relationships between non-adjacent UOs, such as recycling.
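The masking idea described above can be sketched in a few lines. This is a minimal illustration that assumes masking is applied to the policy logits before sampling, as in common invalid-action-masking schemes; the abstract does not give the agents' actual implementation, and the names `masked_softmax` and `sample_action` are hypothetical. Plain Python is used here rather than torch so that the sketch has no dependencies.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits in which masked-out actions get probability 0.

    `mask[i]` is True when action i is allowed by the design rules.
    Assumes all logits are finite.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are sent to -inf, so exp() returns exactly 0 for them.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example: three candidate unit operations, the second one forbidden.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
action = sample_action([1.0, 2.0, 3.0], [True, False, True], random.Random(0))
```

Because the masked action has zero probability, it is never sampled, which is how a design rule can be enforced without relying on a penalty in the reward.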
A difference between the agents lies in their action spaces. The hybrid masked agent's action space consists of a discrete decision, primarily the choice of a UO, alongside the continuous components for the design variables associated with these operations. During each step (i.e., one full interaction between the agent and the environment), the agent samples one discrete value and one continuous value for every design variable, regardless of whether that variable is used in that step. For the discrete masked agent, on the other hand, the action space is fully discretized; hence, at each step the sampled value corresponds to a specific unit operation with a specific design.
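The contrast between the two action spaces can be illustrated as follows. This is a sketch under stated assumptions: the unit operations, design variables, bounds, and the number of discretization levels are all hypothetical, and uniform random sampling stands in for the trained policy.

```python
import random
from itertools import product

# Hypothetical unit operations and design-variable bounds (not from the study).
DESIGN_VARS = {
    "reactor": {"volume_m3": (1.0, 10.0)},
    "flash": {"temperature_K": (300.0, 400.0)},
    "mixer": {},
}


def sample_hybrid_action(rng):
    """Hybrid agent: one discrete UO choice plus a value for EVERY design variable."""
    uo = rng.choice(list(DESIGN_VARS))
    continuous = {
        (name, var): rng.uniform(lo, hi)
        for name, variables in DESIGN_VARS.items()
        for var, (lo, hi) in variables.items()
    }
    # Only the chosen UO's variables are used; the others are sampled but ignored.
    used = {var: val for (name, var), val in continuous.items() if name == uo}
    return uo, continuous, used


def build_discrete_actions(levels=3):
    """Discrete agent: each action index is a specific UO with a specific design."""
    actions = []
    for name, variables in DESIGN_VARS.items():
        grids = [
            [lo + i * (hi - lo) / (levels - 1) for i in range(levels)]
            for lo, hi in variables.values()
        ]
        # product() of zero grids yields one empty design, e.g. for the mixer.
        for combo in product(*grids):
            actions.append((name, dict(zip(variables, combo))))
    return actions


uo, continuous, used = sample_hybrid_action(random.Random(0))
discrete_actions = build_discrete_actions(levels=3)
```

With these assumed values, the discrete action list grows with the number of levels per design variable, whereas the hybrid action keeps one discrete choice and a fixed-length continuous vector.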
The proposed RL framework has been tested on two case studies. Both cases were implemented on a PC with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 32 GB of RAM, using Python as the programming language. The main libraries used were torch for the development of the neural networks and gym for creating the RL environment. The first case study aimed to design a chemical process flowsheet for producing a product up to a certain conversion level from given reactants. Three non-rigorous UOs (a mixer, a reactor, and a flash tank), along with their associated design variables, were considered. For this case, the presence of the mixer in the flowsheet depended on the presence of the flash tank and vice versa, with the aim of enabling recycling. The outcomes obtained by both RL agents showed that they were able to design chemical process flowsheets that achieved the required product conversion. The introduction of the masking technique was crucial for their training, since they learned the importance of using recycling in the process, resulting in shorter and more efficient flowsheets. For this case study, it was observed that the discrete masked agent produced a shorter and thus more economically attractive flowsheet compared to the hybrid agent.
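The mixer and flash tank dependency in the first case study could be encoded as a mask function along the following lines. This is only one possible reading of the rule; the abstract does not state the actual masking rules, and the function name `recycle_mask` and the specific conditions below are assumptions made for illustration.

```python
# Order of the discrete actions; the same order is used for the mask entries.
UNIT_OPS = ["mixer", "reactor", "flash"]


def recycle_mask(flowsheet):
    """Return one boolean per entry of UNIT_OPS; True means the unit may be placed next.

    `flowsheet` is the list of unit names placed so far, in order.
    Assumed rule: a flash tank is allowed only once a mixer exists to receive
    the recycle stream, and each of the two units appears at most once.
    """
    has_mixer = "mixer" in flowsheet
    has_flash = "flash" in flowsheet
    return [
        not has_mixer,                # mixer: at most one, opens the recycle loop
        True,                         # reactor: always allowed
        has_mixer and not has_flash,  # flash: only after a mixer, closes the loop
    ]


print(recycle_mask([]))                    # flash is masked out at the start
print(recycle_mask(["mixer", "reactor"]))  # flash becomes available
```

A mask of this kind would be recomputed at every step from the current flowsheet and passed to the agent together with the observation.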
For the second case study, the goal was to design a chemical process flowsheet for the production of dimethyl ether from the dehydration of methanol. A hybrid platform that combines Python with the chemical simulation software Aspen Plus was developed. The main reason for developing this hybrid platform is to utilize, as part of the simulation environment, the advanced thermodynamic packages and rigorous conservation equations that govern all specified UOs. In addition, this method enables a wider range of UOs to be integrated into the simulation environment, leading to processes that closely resemble real design procedures. The incorporation of various types of unit operations is precisely what sets this work apart from other studies that utilize Aspen Plus, which focus on only a single UO. Moreover, the proposed RL framework can create chemical process flowsheets from scratch, considering constraints for specific UOs and automatically linking and simulating the chosen UOs. In this case, both agents were able to design the process while respecting the imposed operational constraints, ensuring dimethyl ether and water purities above 99% and operating the reactor(s) at a temperature below 400 °C. Despite achieving a lower overall reward than the discrete agent, the hybrid agent designed a flowsheet that offered notable insights, such as better mass integration in the system through recycling and lower losses of dimethyl ether throughout the process. One significant drawback of the framework is its lengthy training period, especially when it is linked to external chemical simulation software such as Aspen Plus. To complete 50,000 steps, which for this case equate to 50,000 evaluations of an Aspen Plus flowsheet, the simulation required around 36 hours.
However, the flexibility provided by this framework, along with the potential to discover feasible yet more creative and economically viable solutions, makes this approach an attractive tool to consider in the future of design and optimization of chemical process flowsheets.
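The training cost quoted for the second case study (50,000 flowsheet evaluations in around 36 hours) can be turned into a rough per-evaluation figure. The helper names below are hypothetical, and the projection assumes the cost per evaluation stays constant, which the abstract does not claim.

```python
def seconds_per_evaluation(total_hours, n_evaluations):
    """Average wall-clock seconds spent per flowsheet evaluation."""
    return total_hours * 3600.0 / n_evaluations


def projected_hours(n_evaluations, sec_per_eval):
    """Wall-clock hours for a training budget, assuming a constant per-evaluation cost."""
    return n_evaluations * sec_per_eval / 3600.0


per_eval = seconds_per_evaluation(36, 50_000)
print(f"{per_eval:.3f} s per evaluation")  # 36 * 3600 / 50000 = 2.592
print(f"{projected_hours(100_000, per_eval):.0f} h for 100,000 steps")
```

At roughly 2.6 seconds per evaluation, the simulator calls rather than the network updates are likely to dominate training time, so the wall-clock cost scales almost linearly with the step budget.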
References
[1] Khan A, Lapkin A. Searching for optimal process routes: A reinforcement learning approach. Computers & Chemical Engineering. 2020;141:107027. doi:10.1016/j.compchemeng.2020.107027
[2] Stops, L., Leenhouts, R., Gao, Q., & Schweidtmann, A. M. (2022). Flowsheet generation through hierarchical reinforcement learning and graph neural networks. AIChE Journal (Vol. 69, Issue 1). Wiley. https://doi.org/10.1002/aic.1793
[3] Göttl, Q., Grimm, D. G., & Burger, J. (2022). Automated synthesis of steady-state continuous processes using reinforcement learning. Frontiers of Chemical Science and Engineering, 16(2), 288–302. https://doi.org/10.1007/s11705-021-2055-9
[4] Midgley LI. Deep Reinforcement Learning for Process Synthesis. Published online September 23, 2020. Accessed September 28, 2023. http://arxiv.org/abs/2009.13265
[5] van Kalmthout SCPA, Midgley LI, Franke MB. Synthesis of separation processes with reinforcement learning. Published online November 3, 2022. http://arxiv.org/abs/2211.04327
[6] Huang, S., & Ontañón, S. (2022). A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The International FLAIRS Conference Proceedings (Vol. 35). University of Florida George A Smathers Libraries. https://doi.org/10.32473/flairs.v35i.130584