
(375r) Offline RL for Optimal Bioprocess Production Scheduling

Authors 

Wang, H. - Presenter, Imperial College London
Kontoravdi, C., Imperial College London
del Rio Chanona, A., Imperial College London
Industrial bioprocessing is gaining significant attention for its ability to create sustainable alternatives to fossil-based materials and bioengineered medications. With increasing demand, identifying flexible and cost-efficient bioprocess scheduling strategies has become essential (Vieira et al., 2016). Typically, optimal scheduling strategies are derived from mixed integer linear programming (MILP) models. However, the uncertainty inherent in bioprocesses, including fluctuating product prices, production delays, and changing demands, makes finding optimal MILP solutions challenging (Hubbs et al., 2020). Instead, recent research has explored reinforcement learning (RL) for process scheduling (Hubbs et al., 2020; Mowbray et al., 2022). This approach effectively addresses process uncertainties by formulating the decision-making problem as a Markov Decision Process (MDP). Nonetheless, the high data demands for training RL agents and the cost of data collection have led researchers to consider offline RL methods, which leverage the information contained in historical bioprocess data.

In this work we apply a novel offline RL framework for optimal bioprocess production scheduling. In contrast to traditional RL algorithms such as temporal difference (TD) learning, we utilise a transformer model and treat sequential decision-making as a sequence modelling problem. In this way, we can train a transformer to predict the next best action conditioned on a desired future reward, taking the current state, the past scheduling actions, and the cumulative reward-to-go from the current state as the input. On the one hand, this method allows us to learn the optimal scheduling strategy purely from historical data, without interaction with the bioprocess operation. On the other hand, the use of a transformer model provides the opportunity to draw on associated advances in language modelling such as GPT-x and BERT (Chen et al., 2021). A minimal sketch of such a model is given below.
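The following sketch illustrates the kind of decision-transformer-style architecture described above (Chen et al., 2021): each scheduling period is tokenised as a (return-to-go, state, action) triple, and a causal transformer predicts the next action from the state token. All class and parameter names are hypothetical and the dimensions are illustrative; this is not the authors' implementation.

```python
# Hypothetical decision-transformer-style scheduler (illustrative sketch).
# Each timestep is tokenised as (return-to-go, state, action); a causal
# transformer predicts the next scheduling action from the state token.
import torch
import torch.nn as nn


class SchedulingDecisionTransformer(nn.Module):
    def __init__(self, state_dim, n_actions, d_model=64, n_heads=4,
                 n_layers=2, max_len=50):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)              # return-to-go embedding
        self.embed_state = nn.Linear(state_dim, d_model)    # process state embedding
        self.embed_action = nn.Embedding(n_actions, d_model)  # discrete product choice
        self.embed_time = nn.Embedding(max_len, d_model)    # scheduling period index
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions/timesteps: (B, T)
        B, T = actions.shape
        t_emb = self.embed_time(timesteps)
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t_emb,
             self.embed_state(states) + t_emb,
             self.embed_action(actions) + t_emb], dim=2
        ).reshape(B, 3 * T, -1)                  # interleave (R, s, a) per step
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.encoder(tokens, mask=mask)      # causal self-attention
        state_tokens = h[:, 1::3]                # hidden representation after each s_t
        return self.action_head(state_tokens)    # logits for a_t at each step
```

In an offline setting, such a model would be trained with a cross-entropy loss between the predicted logits and the actions recorded in historical schedules; at deployment, one would condition on a desired (high) return-to-go and roll the model forward autoregressively.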

Here we showcase the capability of the framework on a continuous biomanufacturing process with a single stage and a single production unit operating under stochastic demand over a planning horizon. At the same time, the transition losses caused by product changeovers are considered and minimised in this case study (Hubbs et al., 2020). Our results show that the offline RL method can provide a near-optimal policy for bioprocess scheduling without interacting with the bioprocess environment.
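To make the case-study setup concrete, the sketch below shows one possible single-stage, single-unit scheduling environment with stochastic demand and a changeover (transition) loss whenever the scheduled product type changes. The class name, demand distribution, and all numerical parameters are illustrative assumptions, not the actual benchmark of Hubbs et al. (2020).

```python
# Hypothetical single-stage, single-unit scheduling environment (illustrative only).
import numpy as np


class SingleStageSchedulingEnv:
    def __init__(self, n_products=3, horizon=30, run_rate=100.0,
                 changeover_loss=0.3, price=10.0, seed=0):
        self.n_products = n_products
        self.horizon = horizon
        self.run_rate = run_rate                  # units produced per period at steady state
        self.changeover_loss = changeover_loss    # fraction of capacity lost on product change
        self.price = price
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.prev_product = -1                    # no product running initially
        self.backlog = np.zeros(self.n_products)
        return self._state()

    def _state(self):
        return np.concatenate([self.backlog, [self.prev_product, self.t]])

    def step(self, action):
        # Stochastic demand arrives for every product each period.
        self.backlog += self.rng.poisson(30.0, self.n_products)
        # Changing the product type incurs a transition loss in effective capacity.
        loss = self.changeover_loss if action != self.prev_product else 0.0
        produced = (1.0 - loss) * self.run_rate
        shipped = min(produced, self.backlog[action])
        self.backlog[action] -= shipped
        # Reward: revenue for shipped product minus a penalty on outstanding backlog.
        reward = self.price * shipped - self.backlog.sum()
        self.prev_product = action
        self.t += 1
        return self._state(), reward, self.t >= self.horizon
```

Historical trajectories generated by heuristic or MILP-based schedules on such an environment would form the offline dataset from which the transformer policy is learned.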

References

CHEN, L., LU, K., RAJESWARAN, A., LEE, K., GROVER, A., LASKIN, M., ABBEEL, P., SRINIVAS, A. & MORDATCH, I. 2021. Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems, 34, 15084-15097.

HUBBS, C. D., LI, C., SAHINIDIS, N. V., GROSSMANN, I. E. & WASSICK, J. M. 2020. A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 141, 106982.

MOWBRAY, M., ZHANG, D. & DEL RIO-CHANONA, E. A. 2022. Distributional reinforcement learning for scheduling of chemical production processes. arXiv preprint arXiv:2203.00636.

VIEIRA, M., PINTO-VARELA, T., MONIZ, S., BARBOSA-PÓVOA, A. P. & PAPAGEORGIOU, L. G. 2016. Optimal planning and campaign scheduling of biopharmaceutical processes using a continuous-time formulation. Computers & Chemical Engineering, 91, 422-444.