(345v) Reinforcement Learning of Optimization Strategy and Horizon Length Selection in Online Vehicle Routing

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Computing and Systems Technology Division

Session

Interactive Session: Data and Information Systems

Time

Tuesday, November 9, 2021 - 3:30pm to 5:00pm

Authors

Ikonen, T. - Presenter, Aalto University

Heljanko, K., University of Helsinki

Harjunkoski, I., Hitachi ABB Power Grids

Mathematical optimization methods are, at present, capable of finding the optimal solution to a vast variety of complex problems emerging in process systems engineering (PSE). In online optimization, re-optimization procedures are typically performed periodically or by using event-triggered rescheduling. However, the optimal horizon length and the optimization strategy (mathematical programming or a heuristic algorithm) depend on the prevailing state of the process. This work investigates an approach where a reinforcement learning (RL) agent [1] is trained to make these decisions via adaption to the environment.

During the recent years, the attention to RL has increased in process systems engineering and operations research. Shin et al. [2] provide a review of the progress of RL and its possibilities in process control. Hubbs et al. [3] developed an open-source library OR-Gym, including is a set of classical operations research problems formulated as a RL environments. Prouvost et al. [4] propose an open-source library ECOLE, which facilitates machine learning in similar repeatedly solved combinatorial optimization problems. Hubbs et al. [5] use deep RL to make explicit scheduling decisions on a chemical production plant. In other fields, similar explicit scheduling approaches have been proposed by Semrov et al. [6] and Atallah et al. [7] Our RL-based approach operates at a higher level. It utilizes the capability of mathematical programming and heuristic algorithms to find the optimal, or a nearly optimal, solution, and focuses on deciding the scheduling horizon length and the optimization strategy in a changing operating environment. In the literature, the effect of horizon length has been studied in process scheduling as a fixed parameter [8, 9]. However, to our knowledge, no adaptive methods have been reported for this purpose.

We evaluate the approach on the urban bike share rebalancing problem, which is a generalization of the one-commodity pickup and delivery vehicle routing problem [10]. The objective is to minimize travel time of a number of rebalancing vehicles that transport bikes between fixed stations while fulfilling the service level requirement of the system. We use the mixed-integer programming model proposed by Schuijbroek et al. [11] for the rebalancing vehicle routing. The problem is chosen as a testbed because the channels of new information (updated bike counts and deviations of a vehicle locations from those scheduled) are well-defined, and the data of such systems is publicly available. Finally, we discuss the transferability of the research to more complex processes in PSE, such as process scheduling and supply chain optimization.

References

[1] R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, MIT press, 2018.

[2] J. Shin, T. A. Badgwell, K.-H. Liu, J. H. Lee, Reinforcement learningâ€“overview of recent progress and implications for process control, Computers & Chemical Engineering 127 (2019) 282â€“294.

[3] C. D. Hubbs, H. D. Perez, O. Sarwar, N. V. Sahinidis, I. E. Grossmann, J. M. Wassick, Or-gym: A reinforcement learning library for operations research problem, arXiv preprint arXiv:2008.06319.

[4] A. Prouvost, J. Dumouchelle, L. Scavuzzo, M. Gasse, D. ChÃ©telat, A. Lodi, ECOLE: A gym-like library for machine learning in combinatorial optimization solvers, arXiv preprint arXiv:2011.06069.

[5] C. D. Hubbs, C. Li, N. V. Sahinidis, I. E. Grossmann, J. M. Wassick, A deep reinforcement learning approach for chemical production scheduling, Computers & Chemical Engineering 141 (2020) 106982.

[6] D. Å emrov, R. MarsetiÄ, M.Å½ura, L. Todorovski, A. Srdic, Reinforcement learning approach for train rescheduling on a single-track railway, Transportation Research Part B: Methodological 86 (2016) 250â€“267.

[7] R. F. Atallah, C. M. Assi, M. J. Khabbaz, Scheduling the operation of a connected vehicular network using deep reinforcement learning, IEEE Transactions on Intelligent Transportation Systems 20 (5) (2018) 1669â€“1682.

[8] D. Gupta, C. T. Maravelias, J. M. Wassick, From rescheduling to online scheduling, Chemical Engineering Research and Design 116 (2016) 83â€“97.

[9] D. Gupta, C. T. Maravelias, On the design of online production scheduling algorithms, Computers & Chemical Engineering 129 (2019) 106517.

[10] H. HernÃ¡ndez-PÃ©rez, J.-J. Salazar-GonzÃ¡lez, A branch-and-cut algorithm for a traveling salesman problem with pickup and delivery, Discrete Applied Mathematics 145 (1) (2004) 126â€“139.

[11] J. Schuijbroek, R. C. Hampshire, W.-J. Van Hoeve, Inventory rebalancing and vehicle routing in bike sharing systems, European Journal of Operational Research 257 (3) (2017) 992â€“1004.

Topics

Computing and Systems Engineering

Plant Operations

Other Sites & Tools

Technical Groups

Technical

Professional/Personal Growth

Societal Needs

Leadership

2025 Spring Meeting and 21st Global Congress on Process Safety

2025 AIChE Annual Meeting

Upcoming Conferences & Events

CEP: January 2025

CEP: December 2024

Explore Areas of Advancement:

Learning Center:

Want to be an Entrepreneur? Personal Stories From Three Successful Entrepreneurs Who Have Traveled This Path.