(579h) A Hybrid Deep Q-Learning/MILP Approach to Fast Online Planning & Rescheduling of Continuous Manufacturing Processes

Authors 

Charitopoulos, V. - Presenter, University College London
Johnn, S. N., University College London
Growing market competition and demand volatility are two of the most prominent factors endangering the profitability and resilience of the process industries (Badejo and Ierapetritou, 2022). Efficiently organising production to meet the long-term manufacturing demands of multiple products requires accommodating frequent modifications caused by unforeseen fluctuations in product demand or electricity prices (Gupta et al., 2016; Castro et al., 2018). Production management encompasses two key components: planning, which allocates resources over long time horizons, and scheduling, which entails the detailed assignment of production tasks to specific resources (Perez et al., 2021). Effective response mechanisms are therefore required to handle recurring re-optimisation needs and prevent financial losses (Kopanos and Puigjaner, 2019). In this work, we focus on the integrated planning and online rescheduling of multi-product continuous manufacturing systems through a novel hybrid deep Q-learning (DQN)/MILP framework, with the goal of enabling real-time rescheduling of large-scale systems.

Conventional approaches that address the full integrated planning and scheduling problem via Mixed-Integer Linear Programming (MILP) are time-consuming owing to their reliance on time-slot-based formulations, which require postulating time slots for every period within the planning horizon. In recent years, the effective integration of planning and scheduling in the process industries has attracted considerable research interest, with many studies aiming to reduce the computational burden of the resulting models. Previous works (Erdirik-Dogan and Grossmann, 2006; Erdirik-Dogan and Grossmann, 2008; Sung and Maravelias, 2007) introduced decomposition-based approaches to mitigate the exponential growth in computational effort with instance size. Liu et al. (2008) and Charitopoulos et al. (2017) proposed a hybrid time representation that avoids handling a full continuous-time formulation within each period as the planning horizon expands.

To navigate dynamic industrial environments effectively, timely and reactive responses are crucial for managing uncertainties such as equipment malfunctions and rush-order arrivals. Nevertheless, existing exact methods struggle to accommodate recurrent online re-optimisation as new information arrives, which hinders the industrial applicability of theoretical models. This underscores the need for an automated online scheduling system that can improve decision-making efficiency on large datasets amid rapid modifications; pertinent studies include Gupta et al. (2016), Gupta and Maravelias (2019) and Framinan et al. (2019). In recent years, the integration of Machine Learning (ML) into process scheduling has received increasing attention (Hubbs et al., 2020a), with recent studies showing very promising results (Chiang et al., 2022; Fuentes-Cortés et al., 2022). Reinforcement Learning (RL), a branch of ML that optimises a predefined reward function through iterative trial-and-error, offers the potential for reactive online scheduling at lower computational expense (Hubbs et al., 2020b). The contributions of this work are the introduction of online rescheduling of continuous manufacturing processes and the integration of RL with MILP to enable fast and optimal decision-making.
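For reference, the trial-and-error learning mentioned above can be made concrete through the standard tabular Q-learning update that underlies DQN; this is textbook material rather than a formulation specific to this work:

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
\]

where s_t, a_t and r_t denote the state, action and reward at step t, α is the learning rate and γ the discount factor; DQN replaces the tabular estimate Q(s, a) with a neural network trained against the same target.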

In this work, we adopt DQN, a popular approach within the RL framework, to identify optimal production sequences for a single-unit, multi-product planning and scheduling problem. The identified sequence is then passed to a Linear Programming (LP) model that determines the production amounts, backlog and inventory levels, and schedule length, at a computational cost that scales linearly. The problem is framed as a Markov Decision Process, a mathematical model of sequential decision-making over discrete time steps involving states, actions and rewards, which forms the foundation of RL. During training, the Q-learning agent receives a reward proportional to the improvement in solution quality; afterwards, the trained agent generates production sequences directly, without further optimisation. We demonstrate the effectiveness of the learning-based approach on a multi-product continuous manufacturing process: across a series of case studies, the computational time is approximately halved relative to conventional MILP formulations, with minimal loss of optimality compared to the global MILP solution obtained by the exact method.
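To make the decomposition described above concrete, the sketch below illustrates the general idea under simple, illustrative assumptions: a tabular Q-learning agent (standing in for the DQN) chooses which product to schedule next, and a small LP evaluates each partial sequence by choosing run lengths that minimise total backlog over a horizon net of sequence-dependent changeover times. All product data, the reward shaping (improvement in the LP objective) and the use of scipy's linprog are our own illustrative choices, not the formulation used in this work.

```python
# Minimal sketch of the hybrid "RL picks the sequence, LP sizes the runs" idea.
# All data and modelling choices are illustrative assumptions, not the paper's
# formulation; tabular Q-learning stands in for the DQN.
import random
import numpy as np
from scipy.optimize import linprog

PRODUCTS = [0, 1, 2]                        # hypothetical product indices
DEMAND   = np.array([80.0, 50.0, 60.0])     # demand per product over the period
RATE     = np.array([10.0, 8.0, 12.0])      # production rate [units/hour]
HORIZON  = 20.0                             # production hours available
CHANGEOVER = np.array([[0.0, 1.0, 3.0],     # hours lost switching from product i (row)
                       [2.0, 0.0, 1.0],     # to product j (column)
                       [1.0, 2.0, 0.0]])

def evaluate_sequence(seq):
    """LP sub-problem: given a production sequence, choose run lengths t >= 0 that
    minimise total backlog within the horizon net of changeover time."""
    n = len(seq)
    avail = HORIZON - sum(CHANGEOVER[seq[i], seq[i + 1]] for i in range(n - 1))
    # decision vector x = [t_0..t_{n-1}, b_0..b_{n-1}]; minimise sum of backlogs b
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.zeros((n + 1, 2 * n))
    b_ub = np.zeros(n + 1)
    for slot, p in enumerate(seq):
        # backlog constraint: b_slot >= DEMAND[p] - RATE[p] * t_slot
        A_ub[slot, slot]     = -RATE[p]
        A_ub[slot, n + slot] = -1.0
        b_ub[slot]           = -DEMAND[p]
    A_ub[n, :n] = 1.0                       # total run time within available hours
    b_ub[n] = avail
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * n))
    if not res.success:
        return float(DEMAND.sum())
    missing = float(sum(DEMAND[p] for p in PRODUCTS if p not in seq))
    return res.fun + missing                # backlog of scheduled + unscheduled products

Q, ALPHA, GAMMA, EPS = {}, 0.2, 0.95, 0.2   # value table and hyperparameters

def q(s, a):
    return Q.get((s, a), 0.0)

for episode in range(2000):
    remaining, seq = set(PRODUCTS), []
    prev_cost = float(DEMAND.sum())         # cost of producing nothing
    while remaining:
        state = (tuple(sorted(remaining)), seq[-1] if seq else None)
        a = (random.choice(list(remaining)) if random.random() < EPS
             else max(remaining, key=lambda x: q(state, x)))
        seq.append(a)
        remaining.discard(a)
        cost = evaluate_sequence(seq)
        reward = prev_cost - cost           # reward = improvement in LP objective
        prev_cost = cost
        next_state = (tuple(sorted(remaining)), a)
        best_next = max((q(next_state, b) for b in remaining), default=0.0)
        Q[(state, a)] = q(state, a) + ALPHA * (reward + GAMMA * best_next - q(state, a))

# Greedy rollout with the trained table yields the sequence handed to the LP.
remaining, seq = set(PRODUCTS), []
while remaining:
    state = (tuple(sorted(remaining)), seq[-1] if seq else None)
    a = max(remaining, key=lambda x: q(state, x))
    seq.append(a)
    remaining.discard(a)
print("learned sequence:", seq, "| total backlog:", evaluate_sequence(seq))
```

In a full-scale version, the dictionary Q would be replaced by a neural network over a richer state (remaining demands, inventories, prices), and the LP would carry the complete planning constraints; the division of labour between the learned sequencing policy and the exact continuous sub-problem remains the same.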

Acknowledgements
Financial support from EPSRC grant EP/V051008/1 is gratefully acknowledged.

References

Badejo, O. and Ierapetritou, M., 2022. Integrating tactical planning, operational planning and scheduling using data-driven feasibility analysis. Computers & Chemical Engineering, 161, p.107759.

Gupta, D., Maravelias, C.T. and Wassick, J.M., 2016. From rescheduling to online scheduling. Chemical Engineering Research and Design, 116, pp.83-97.

Castro, P.M., Grossmann, I.E. and Zhang, Q., 2018. Expanding scope and computational challenges in process scheduling. Computers & Chemical Engineering, 114, pp.14-42.

Perez, H.D., Amaran, S., Erisen, E., Wassick, J.M. and Grossmann, I.E., 2021. Optimization of extended business processes in digital supply chains using mathematical programming. Computers & Chemical Engineering, 152, p.107323.

Kopanos, G.M. and Puigjaner, L., 2019. Solving large-scale production scheduling and planning in the process industries. Cham, Switzerland: Springer International Publishing.

Erdirik-Dogan, M. and Grossmann, I.E., 2006. A decomposition method for the simultaneous planning and scheduling of single-stage continuous multiproduct plants. Industrial & Engineering Chemistry Research, 45(1), pp.299-315.

Erdirik-Dogan, M. and Grossmann, I.E., 2008. Simultaneous planning and scheduling of single-stage multi-product continuous plants with parallel lines. Computers & Chemical Engineering, 32(11), pp.2664-2683.

Sung, C. and Maravelias, C.T., 2007. An attainable region approach for production planning of multiproduct processes. AIChE Journal, 53(5), pp.1298-1315.

Liu, S., Pinto, J.M. and Papageorgiou, L.G., 2008. A TSP-based MILP model for medium-term planning of single-stage continuous multiproduct plants. Industrial & Engineering Chemistry Research, 47(20), pp.7733-7743.

Charitopoulos, V.M., Dua, V. and Papageorgiou, L.G., 2017. Traveling salesman problem-based integration of planning, scheduling, and optimal control for continuous processes. Industrial & Engineering Chemistry Research, 56(39), pp.11186-11205.

Gupta, D. and Maravelias, C.T., 2019. On the design of online production scheduling algorithms. Computers & Chemical Engineering, 129, p.106517.

Framinan, J.M., Fernandez-Viagas, V. and Perez-Gonzalez, P., 2019. Using real-time information to reschedule jobs in a flowshop with variable processing times. Computers & Industrial Engineering, 129, pp.113-125.

Hubbs, C.D., Li, C., Sahinidis, N.V., Grossmann, I.E. and Wassick, J.M., 2020a. A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 141, p.106982.

Hubbs, C.D., Perez, H.D., Sarwar, O., Sahinidis, N.V., Grossmann, I.E. and Wassick, J.M., 2020b. OR-Gym: A reinforcement learning library for operations research problems. arXiv preprint arXiv:2008.06319.

Chiang, L.H., Braun, B., Wang, Z. and Castillo, I., 2022. Towards artificial intelligence at scale in the chemical industry. AIChE Journal, 68(6), p.e17644.

Fuentes-Cortés, L.F., Flores-Tlacuahuac, A. and Nigam, K.D., 2022. Machine learning algorithms used in PSE environments: A didactic approach and critical perspective. Industrial & Engineering Chemistry Research, 61(25), pp.8932-8962.