
(362a) A Reinforcement Learning Approach for Stochastic Cutting Stock Problem

Authors 

Kang, J. L., National Yunlin University of Science and Technology
Jang, S. S., National Tsing Hua University
In production scheduling for the chemical industries, such as copper foil manufacturing, the inventory level directly affects the production cost. Factors that influence the inventory level include customer demand, scheduled production, coordination of upstream and downstream processes, and random events in the production process. The Stochastic Cutting Stock Problem (SCSP) can be viewed as a scheduling problem with complex inventory dynamics arising from these random events. Such problems require the best possible decision within a limited time. With traditional deterministic mathematical methods such as integer programming, however, solution time grows exponentially with process complexity, and often only acceptable rather than optimal solutions are obtained.
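For concreteness, the sketch below shows the deterministic integer-programming counterpart that the abstract contrasts with RL: choosing how many times to apply each pre-enumerated cutting pattern so that demand is covered with the fewest stock rolls. The instance data (stock length 10, item lengths 3/4/5, demands, patterns) are hypothetical and chosen only for illustration; the abstract does not specify the authors' instance.

```python
# Minimal deterministic cutting stock ILP sketch (hypothetical instance).
# Columns of A are feasible cutting patterns: items cut from one stock roll.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

A = np.array([
    [3, 1, 0, 2, 0],   # count of length-3 items in each pattern
    [0, 1, 2, 1, 0],   # count of length-4 items in each pattern
    [0, 0, 0, 0, 2],   # count of length-5 items in each pattern
])
demand = np.array([4, 3, 2])
n_patterns = A.shape[1]

c = np.ones(n_patterns)                        # minimize total rolls used
cover = LinearConstraint(A, lb=demand, ub=np.inf)  # A @ x >= demand
res = milp(c, constraints=cover,
           integrality=np.ones(n_patterns),    # all variables integer
           bounds=Bounds(lb=0))                # x >= 0
print(res.x, res.fun)                          # pattern usage, rolls used
```

Even on toy instances like this, the branch-and-bound search behind `milp` is what scales exponentially as patterns, items, and stochastic scenarios multiply.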

Reinforcement Learning (RL) addresses these shortcomings of traditional integer programming: by learning from interactions with real or virtual environments, a trained agent can provide a scheduling decision immediately. However, it remains doubtful whether RL can reliably handle process constraints. Pitombeira-Neto and Murta (2022) proposed a model-free, off-policy approximate policy iteration algorithm that demonstrated RL scheduling performance while ensuring that actions do not violate the constraints. Yet its heavy mathematical machinery and exhaustive random sampling led to very long training times, limiting industrial applicability.
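The interaction pattern the paragraph describes can be sketched as a simple rollout loop: once trained, the policy returns a decision in a single forward pass instead of re-solving an integer program at each decision point. Here `env` and `policy` are hypothetical placeholders, not the authors' implementation.

```python
# Generic agent-environment rollout sketch (gym-style interface assumed).
def rollout(env, policy, max_steps=1000):
    state = env.reset()
    total_reward, violations = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                 # milliseconds per decision
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:                               # e.g. a constraint was hit
            violations += 1
            state = env.reset()
    return total_reward, violations
```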

Hence, the purpose of this study was to adopt Advantage Actor-Critic (A2C) for easier industrial implementation and to solve a classic SCSP that represents most scheduling problems in continuous chemical production. Two main concepts were introduced to train an RL agent that satisfies the constraints: (1) adding constraint-violation penalties to the RL reward, and (2) borrowing from video games the idea that the round ends whenever a constraint is violated. The results showed that training with the A2C method was much faster than with the literature method while achieving a sufficiently low cost. Furthermore, and surprisingly, the A2C agent continuously provided constraint-satisfying actions for nearly a thousand interactions, suggesting a wide range of potential applications in similarly constrained problems. Compared with the literature approach, the A2C algorithm converges more easily and is easier to implement in industry. In future work, A2C will be compared with traditional integer programming to evaluate rescheduling efficiency on a real-time scheduling simulator.
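The two constraint-handling concepts can be made concrete in a gym-style `step()` function, sketched below. All names (`CuttingStockEnv`, `PENALTY`, `_violates`) and the reward terms are hypothetical: the abstract does not specify the exact state encoding, penalty magnitude, or constraint set.

```python
import numpy as np

PENALTY = -100.0  # concept (1): constraint-violation penalty in the reward

class CuttingStockEnv:
    """Toy SCSP-like environment: the state is the inventory per item type."""

    def __init__(self, demand_mean, capacity):
        self.demand_mean = np.asarray(demand_mean, dtype=float)
        self.capacity = capacity
        self.inventory = np.zeros_like(self.demand_mean)

    def reset(self):
        self.inventory[:] = 0.0
        return self.inventory.copy()

    def step(self, pattern):
        # `pattern` gives the number of each item cut this period (the action).
        self.inventory = self.inventory + np.asarray(pattern, dtype=float)
        demand = np.random.poisson(self.demand_mean)  # the random event factor
        self.inventory = self.inventory - demand

        if self._violates():
            # concepts (1) and (2): penalize the reward and end the episode
            # ("the round ends"), as when losing a life in a video game.
            return self.inventory.copy(), PENALTY, True, {}

        reward = -float(self.inventory.sum())  # hypothetical holding-cost proxy
        return self.inventory.copy(), reward, False, {}

    def _violates(self):
        # Hypothetical constraints: no backorders, bounded total inventory.
        return bool((self.inventory < 0).any()
                    or self.inventory.sum() > self.capacity)
```

A standard one-step A2C update would then be driven by the advantage r + γV(s′) − V(s) computed on these shaped rewards; early termination keeps violating trajectories short, so the agent rarely accumulates experience beyond the feasible region.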

References

Pitombeira-Neto, A. R., & Murta, A. H. F. (2022). A reinforcement learning approach to the stochastic cutting stock problem. EURO Journal on Computational Optimization, 100027.
