(471a) Real-Time Chemical Production Rescheduling Via Explorative Reinforcement Learning Considering Nervousness

Authors 

Hwangbo, S. - Presenter, Ewha Womans University
Na, J., Carnegie Mellon University
Liu, J. J., Pukyong National University
Ryu, J. H., Dongguk University
Lee, H., University of Wisconsin-Madison
Production scheduling is a decision-making process that plays an essential role in the chemical process industries. Conventional methods for solving job-shop scheduling problems include simple dispatching rules, metaheuristics, and mathematical programming [1]. These approaches can provide an optimal solution, but in real-world operation, where the environment is subject to frequent disturbances, that solution quickly ceases to be optimal and a new optimization must be performed. The key to rescheduling, a re-optimization with updated system states, is to find a schedule that satisfies the scheduling objective while deviating as little as possible from the base schedule [2, 3]. Low deviation keeps nervousness low and thus improves schedule stability, preventing frequent schedule revisions. However, existing methods achieve this through repeated optimizations over a large number of variables and constraints, which increases complexity and computational cost and makes real-time rescheduling difficult.
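As an illustration, this cost-versus-stability trade-off can be written as a generic bi-objective problem; the symbols below (cost C, deviation measure D, base schedule x⁰, weight λ) are illustrative placeholders, not the specific formulation used in this work:

$$
\min_{x \in \mathcal{X}} \; C(x) + \lambda \, D\!\left(x, x^{0}\right),
$$

where $\mathcal{X}$ is the set of schedules feasible under the updated system state, $C(x)$ is the scheduling cost, $D(x, x^{0})$ quantifies deviation (nervousness) from the base schedule $x^{0}$, and $\lambda \ge 0$ sets the trade-off; sweeping $\lambda$ traces out a cost-versus-nervousness Pareto curve.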

Therefore, reinforcement learning (RL)-based scheduling methodologies have been proposed, which shift rescheduling from a repeated optimization problem to a generative model-learning problem so that the system can respond to environmental changes within a short time [4, 5]. RL is a machine learning approach in which an agent learns to maximize a cumulative reward through continuous interaction with its environment. It does not need a separate data-collection stage and, with its stochastic action policy, can operate in dynamic and uncertain environments. Moreover, a well-trained RL policy network can handle real-time disturbances by selecting appropriate actions on the basis of updated input states. This ability to make good decisions quickly and flexibly under uncertainty makes RL useful for real-time optimization in dynamic scheduling systems [6].
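For readers less familiar with the RL setting, a minimal agent-environment interaction loop looks like the sketch below; the toy environment, the state and action definitions, and the random policy are illustrative assumptions, not the scheduling environment or policy network used in this work:

```python
import random

class ToySchedulingEnv:
    """Placeholder environment: assign each of n_jobs to one of two machines."""
    def __init__(self, n_jobs=5):
        self.n_jobs = n_jobs

    def reset(self):
        self.job = 0
        return self.job  # state: index of the next job to assign

    def step(self, action):
        # Reward is a stand-in for the negative incremental cost of the assignment.
        reward = -random.uniform(1.0, 2.0) if action == 0 else -random.uniform(0.5, 2.5)
        self.job += 1
        done = self.job >= self.n_jobs
        return self.job, reward, done

env = ToySchedulingEnv()
state, done, episode_return = env.reset(), False, 0.0
while not done:
    action = random.choice([0, 1])          # a trained policy would act on the state here
    state, reward, done = env.step(action)  # environment returns the updated state and reward
    episode_return += reward                # RL learns to maximize this cumulative reward
print(f"episode return: {episode_return:.2f}")
```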

Herein, we propose a RL model for a single-stage scheduling problem that can instantly respond to unexpected changes in the environments. We applied action masking to filter infeasible actions in advance, so that only feasible actions can be considered in the action space. In addition, we have applied an intrinsic curiosity module to encourage the agent to explore new parts of the long-term environment [7]. The backstepping method in the inference phase is developed to obtain more explorative schedules. To validate the performance of the RL model, we used the scheduling problem data given in the paper of Harjunkoski et al. [8] for training and evaluation. In static scheduling environment, our RL model achieved over 95% of the cost objective within short execution time, indicating its comparable performance to conventional scheduling methods. Furthermore, several case studies have confirmed that our RL scheduling model can generate alternative schedules under various disruptions in real-time. The rescheduling results are presented simultaneously with the cost vs nervousness Pareto curve, allowing the decision maker to choose the optimal point for the desired objective.
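As a rough illustration of the two exploration-related ingredients mentioned above, the sketch below masks the logits of infeasible actions before sampling and computes a curiosity-style intrinsic reward as the prediction error of a learned forward model, in the spirit of Pathak et al. [7]; the array sizes, the feasibility mask, and the linear forward model are placeholder assumptions, not the networks or action space used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_action_distribution(logits, feasible_mask):
    """Set logits of infeasible actions to -inf so they receive zero probability."""
    masked = np.where(feasible_mask, logits, -np.inf)
    exp = np.exp(masked - masked[feasible_mask].max())
    return exp / exp.sum()

# Example: four candidate assignments, one of which is currently infeasible.
logits = np.array([1.2, 0.3, -0.5, 0.8])
feasible = np.array([True, False, True, True])
probs = masked_action_distribution(logits, feasible)
action = rng.choice(len(probs), p=probs)

def intrinsic_reward(state, action_onehot, next_state, forward_weights):
    """Curiosity-style bonus: squared error of a (placeholder linear) forward model
    that predicts the next state from the current state and action."""
    predicted = forward_weights @ np.concatenate([state, action_onehot])
    return 0.5 * float(np.sum((predicted - next_state) ** 2))

state, next_state = rng.normal(size=3), rng.normal(size=3)
action_onehot = np.eye(len(probs))[action]
W = rng.normal(scale=0.1, size=(3, 3 + len(probs)))  # placeholder forward-model weights
r_int = intrinsic_reward(state, action_onehot, next_state, W)
print(f"sampled action: {action}, intrinsic reward: {r_int:.3f}")
```

Poorly predicted (novel) transitions yield a large intrinsic reward, which pushes the agent toward unexplored parts of the schedule space.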

[1] Lee, H. and C.T. Maravelias, Combining the advantages of discrete- and continuous-time scheduling models: Part 1. Framework and mathematical formulations. Computers & Chemical Engineering, 2018. 116: p. 176-190.

[2] Ave, G.D., et al., An Explicit Online Resource-Task Network Scheduling Formulation to Avoid Scheduling Nervousness, in Computer Aided Chemical Engineering, A.A. Kiss, et al., Editors. 2019, Elsevier. p. 61-66.

[3] Atadeniz, S.N. and S.V. Sridharan, Effectiveness of nervousness reduction policies when capacity is constrained. International Journal of Production Research, 2020. 58(13): p. 4121-4137.

[4] Luo, S., Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Applied Soft Computing, 2020. 91: p. 106208.

[5] Zhou, T., et al., Multi-agent reinforcement learning for online scheduling in smart factories. Robotics and Computer-Integrated Manufacturing, 2021. 72: p. 102202.

[6] Hubbs, C.D., et al., A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 2020. 141: p. 106982.

[7] Pathak, D., et al., Curiosity-driven Exploration by Self-supervised Prediction, in Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y.W. Teh, Editors. 2017, PMLR: Proceedings of Machine Learning Research. p. 2778-2787.

[8] Harjunkoski, I. and I.E. Grossmann, Decomposition techniques for multistage scheduling problems using mixed-integer and constraint programming methods. Computers & Chemical Engineering, 2002. 26(11): p. 1533-1552.