(642a) Two-Phase Optimal Design of Pharmaceutical Separation: Sampling-Based Uncertainty Analysis and Reinforcement Learning

Authors 

Hwangbo, S. - Presenter, Technical University of Denmark
Sin, G., Technical University of Denmark
The pharmaceutical industry has grown consistently, driven by diverse factors such as the growing global aging population, drug affordability, government policies, and supply-side factors. At the same time, it faces a number of challenges, including a changing health care landscape, expiring patents and generic competition, pricing pressures, and a persistent economic slowdown. One feasible response is to develop an optimal, smart pharmaceutical separation process that improves performance while maintaining system flexibility under uncertainty. Liquid-liquid extraction (LLE), also known as solvent extraction, is the process most widely employed in downstream separation. In an LLE process, a feed stream of two or more components is contacted with a second liquid stream containing a solvent. The choice of suitable solvents and an optimal operating design for the LLE process are therefore essential to sustaining a high yield of pharmaceuticals.

This research constructs an advanced LLE framework built on a two-phase strategy. The first phase covers the optimal design of the LLE process based on thermodynamic and transport properties and on uncertainty analysis. The second phase develops a smart operating plan by reinforcement learning, using the results of the first phase; specifically, a deep Q-learning algorithm for controlling the LLE process is investigated. Each phase comprises several internal steps, as follows. In the first step of the first phase, the electrolyte non-random two-liquid segment activity coefficient (eNRTL-SAC) model is employed to estimate segment parameters and activity coefficients of a target pharmaceutical. The eNRTL-SAC model is a descendant of the NRTL-SAC model that additionally uses an electrolyte conceptual segment parameter. The segment parameters and activity coefficients are then used directly for solubility modelling. In the next step, a liquid-liquid extraction column is considered as the separation process and distribution coefficients are calculated from the thermodynamic property results; thereafter, uncertainty analysis is conducted to identify optimal solvents and operating conditions. Reinforcement learning in the second phase then optimizes the operating structure using the results from the first phase. In the first step of the second phase, an agent and an environment must be precisely defined; in this research they correspond to the controller and the LLE process, respectively. In the second step, the agent applies an action to the environment, whereupon the environment runs the designed process model under that action. In the final step, the environment returns states and rewards to the agent, which is updated by deep learning algorithms. Actions, states, and rewards can be constructed in various ways, and the relationship between the agent and the environment must be considered in constructing them; a minimal sketch of this interaction loop is given below.
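
To make the agent-environment split concrete, the sketch below wraps a toy surrogate of the LLE column in a Gym-style environment with a reset/step interface. Everything here is a hypothetical stand-in rather than the process model of this work: the state, the discrete action set, the reward weights, and the one-line surrogate for the column are all illustrative assumptions.

```python
import numpy as np

class LLEExtractionEnv:
    """Hypothetical environment wrapping a designed LLE column model.

    State  : (solute fraction in raffinate, solvent-to-feed ratio)
    Action : discrete change to the solvent flow rate
    Reward : recovered product value minus a solvent-usage penalty
    The internal 'process model' is a one-line surrogate standing in
    for the column model designed in the first phase."""

    ACTIONS = np.array([-5.0, 0.0, +5.0])  # kg/h change in solvent flow

    def __init__(self, feed_flow=100.0, K=2.5):
        self.feed_flow = feed_flow
        self.K = K  # distribution coefficient, assumed fixed here
        self.reset()

    def reset(self):
        self.solvent_flow = 50.0
        return self._state()

    def _state(self):
        E = self.K * self.solvent_flow / self.feed_flow
        x_raff = 0.10 / (1.0 + E) ** 3  # crude surrogate: raffinate fraction falls as E rises
        return np.array([x_raff, self.solvent_flow / self.feed_flow])

    def step(self, action):
        self.solvent_flow = float(
            np.clip(self.solvent_flow + self.ACTIONS[action], 10.0, 150.0))
        state = self._state()
        recovery = 0.10 - state[0]
        reward = 100.0 * recovery - 0.05 * self.solvent_flow  # value minus solvent cost
        return state, reward

env = LLEExtractionEnv()
state = env.reset()
for _ in range(3):                 # a trained agent would choose actions here
    action = np.random.randint(3)  # random placeholder policy
    state, reward = env.step(action)
    print(state.round(4), round(reward, 2))
```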

Property and process modelling and uncertainty analysis

Screening for an optimal solvent or solvent mixture is an essential step in formulating an efficient pharmaceutical process. Solubility modelling is most often used as the criterion for solvent screening and requires a suitable thermodynamic model. As mentioned above, the eNRTL-SAC model is applied in this study, yielding regressed segment parameters as well as activity coefficients. The activity coefficients are determined from the regressed segment parameters, and the distribution coefficient, also known as the partition coefficient and defined as the ratio of the concentrations of a compound in the two phases of a mixture at equilibrium, is calculated from the activity coefficients. The extraction factor combines the distribution coefficient with the flow rates of the feed and solvent streams in the LLE process, and the Kremser-Souders-Brown theoretical stage equation uses the distribution coefficient and the extraction factor to estimate the number of stages in the extraction column, which ultimately drives the operating costs. Parameter uncertainties and their influence are quantified by uncertainty analysis to give the decision maker a guideline for an optimal process design. A sampling-based model solution is carried out prior to the uncertainty analysis to explore and identify the optimal process variables, and the Monte Carlo method is exploited to illustrate the influence of the parameter uncertainties.
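
As a minimal numerical sketch of this chain, the code below computes the distribution coefficient from phase activity coefficients (at liquid-liquid equilibrium, K = γ_raffinate/γ_extract), forms the extraction factor E = K·S/F, evaluates the Kremser-Souders-Brown stage count for a solute-free entering solvent, and propagates an assumed Gaussian uncertainty in the activity coefficients by Monte Carlo sampling. All numerical values (flows, compositions, uncertainty magnitudes) are illustrative assumptions, not regressed eNRTL-SAC results.

```python
import numpy as np

def distribution_coefficient(gamma_raffinate, gamma_extract):
    # At liquid-liquid equilibrium x_E*gamma_E = x_R*gamma_R,
    # so K = x_E/x_R = gamma_R/gamma_E.
    return gamma_raffinate / gamma_extract

def extraction_factor(K, solvent_flow, feed_flow):
    # E = K * S / F
    return K * solvent_flow / feed_flow

def kremser_stages(x_feed, x_raffinate, E):
    # Kremser-Souders-Brown equation for a solute-free entering solvent:
    # N = ln[(x_F/x_N)(1 - 1/E) + 1/E] / ln(E),  valid for E > 1
    return np.log((x_feed / x_raffinate) * (1.0 - 1.0 / E) + 1.0 / E) / np.log(E)

# Monte Carlo propagation: perturb the phase activity coefficients (stand-ins
# for the regressed eNRTL-SAC parameters) and watch the stage count spread.
rng = np.random.default_rng(1)
gamma_R = rng.normal(2.0, 0.10, size=10_000)   # raffinate phase (illustrative)
gamma_E = rng.normal(0.8, 0.04, size=10_000)   # extract phase (illustrative)

K = distribution_coefficient(gamma_R, gamma_E)
E = extraction_factor(K, solvent_flow=80.0, feed_flow=100.0)  # kg/h, illustrative
E = E[E > 1.0]                     # the closed form above requires E above unity
N = kremser_stages(x_feed=0.10, x_raffinate=0.005, E=E)

lo, hi = np.percentile(N, [2.5, 97.5])
print(f"stages: mean {N.mean():.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

A wider spread in the stage-count distribution signals parameter uncertainties that matter for column sizing and hence for operating cost, which is exactly the guideline the uncertainty analysis is meant to provide.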

Off-policy control with deep Q-learning

Reinforcement learning in general comprises five components: the state, the state-transition probability matrix, the action, the reward, and the discount factor. The action is determined by a policy held by the agent, and the conventional objective in reinforcement learning is to maximize the reward function under that policy; finding the optimal policy is therefore the central goal. The off-policy strategy is tied to the trade-off between exploration and exploitation. With an on-policy strategy, a specific policy can be improved through agent training, but it tends to explore only solutions in the vicinity of the current policy. At the expense of additional computational cost, an off-policy strategy can flexibly explore distinct candidates and approach the global solution more closely. The agent is designed with deep learning algorithms to suggest the optimal action in every episode. The state information and reward value from the environment serve as the agent's inputs, and suitable hyperparameters of the deep learning algorithm must be set to improve the agent's efficiency.
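
The sketch below shows the core of such an off-policy deep Q-learning agent: an epsilon-greedy behaviour policy, an experience replay buffer, and the one-step target y = r + γ·max_a' Q_target(s', a'). The network size, learning rate, and state/action dimensions (chosen to match the toy environment sketched earlier) are assumptions, not the hyperparameters of this study.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 2, 3, 0.99  # matches the toy LLE environment above

def make_q_net():
    # small fully connected Q-network: state -> one Q-value per discrete action
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())   # start the two networks in sync
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # replay buffer: the source of off-policy data

def select_action(state, epsilon):
    """Epsilon-greedy behaviour policy: explore with probability epsilon,
    otherwise exploit the greedy action of the current Q-network."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=64):
    """One off-policy update toward y = r + gamma * max_a' Q_target(s', a')."""
    if len(replay) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(replay, batch_size))  # (s, a, r, s') tuples
    s  = torch.as_tensor(np.array(s),  dtype=torch.float32)
    s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
    a  = torch.as_tensor(a)                          # int64 action indices
    r  = torch.as_tensor(r, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) actually taken
    with torch.no_grad():
        y = r + GAMMA * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # in a full training loop, sync the target network every C steps:
    # target_net.load_state_dict(q_net.state_dict())
```

Because transitions are drawn from the replay buffer rather than from the current policy alone, the update can learn from exploratory actions taken under older policies, which is what distinguishes this off-policy scheme from on-policy training.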

This study aims to design a novel extraction process for the pharmaceutical industry based on uncertainty analysis and reinforcement learning. Thermodynamic property modelling, solubility modelling, solvent screening, and uncertainty analysis are carried out consecutively to design a feasible LLE process for pharmaceuticals. Reinforcement learning is then applied to suggest an optimal operating plan based on the results of the uncertainty analysis. Moreover, the proposed two-phase framework for the LLE process could be extended to other downstream processes.