(402a) Or-Gym: A Reinforcement Learning Library for Operations Research Problems
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems I
Tuesday, November 17, 2020 - 8:00am to 8:15am
Christian Hubbs1,2, Owais Sarwar1, Hector Perez1, Nikolaos Sahinidis1, and Ignacio Grossmann1
1Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
2Digital Fulfillment Center, Dow Chemical, Midland, MI 48642
Keywords: Machine Learning, Reinforcement Learning, Optimization, Scheduling, Stochastic Programming
1 Abstract
We introduce OR-Gym, an open-source benchmark library in the style of OpenAI Gym for developing reinforcement learning (RL) algorithms to address operations research problems [Brockman et al., 2016]. Reinforcement learning has been widely applied to game playing and has surpassed the best human-level performance in many domains, yet there are few use cases in industrial or commercial settings.
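As a brief illustration of the Gym-style interaction loop OR-Gym adopts, the sketch below rolls out a random policy on a single environment. The or_gym.make call and the "Knapsack-v0" environment name are assumptions made for illustration; consult the repository for the definitive interface.

```python
import or_gym  # assumed package import name for the OR-Gym library

# Environment name assumed for illustration; environments are built through a
# Gym-style factory and expose the standard reset/step interface.
env = or_gym.make("Knapsack-v0")
state = env.reset()

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # placeholder random policy
    state, reward, done, info = env.step(action)  # one MDP transition
    total_reward += reward

print("Episode reward under a random policy:", total_reward)
```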
We seek to provide a standardized library for the research community that wishes to extend RL into commercial applications. Building on top of the preceding work, we release OR-Gym, a single library that relies on the familiar OpenAI Gym interface for RL but contains problems relevant to industrial use. To this end, we have incorporated the benchmarks in Balaji et al. [2019] while extending the library to address the traveling salesman problem, the knapsack problem [Kellerer et al., 2004], multi-dimensional bin packing [Coffman et al., 2013], the multi-echelon newsvendor problem, portfolio optimization [Dantzig and Infanger, 1993], and the vehicle routing problem [Pillac et al., 2013]. These problems span logistics, finance, and engineering, and are common in many business operations settings. In each case, we select a prototypical version of the problem from the literature against which to benchmark reinforcement learning and other optimization approaches.
RL problems are formulated as Markov Decision Processes (MDPs), meaning they are probabilistic, sequential decision-making problems. An MDP is defined by a state, which an agent observes in order to select an action; the agent then receives a reward as the system transitions to the subsequent state. This framework is not widely used in the optimization community, so we make explicit our thought process as we reformulate many optimization problems to fit the MDP mold without loss of generality.
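For concreteness, a minimal sketch of one such reformulation: the classic 0/1 knapsack problem cast as an MDP, where the state is the remaining capacity together with a mask of the items still available, the action is the index of the next item to pack, and the reward is that item's value. The class below is purely illustrative and is not the library's implementation.

```python
import numpy as np

class KnapsackMDP:
    """Illustrative MDP casting of the 0/1 knapsack problem (not OR-Gym code)."""

    def __init__(self, weights, values, capacity):
        self.weights = np.asarray(weights, dtype=float)
        self.values = np.asarray(values, dtype=float)
        self.capacity = float(capacity)

    def reset(self):
        # State: remaining capacity followed by a 0/1 availability mask.
        self.remaining = self.capacity
        self.available = np.ones(len(self.weights), dtype=int)
        return self._state()

    def step(self, item):
        # Action: index of the item to pack next.
        if self.available[item] == 0 or self.weights[item] > self.remaining:
            # Infeasible choice: no reward, episode ends.
            return self._state(), 0.0, True, {}
        self.available[item] = 0
        self.remaining -= self.weights[item]
        # Episode ends when no remaining item fits in the residual capacity.
        done = not np.any((self.available == 1) & (self.weights <= self.remaining))
        return self._state(), float(self.values[item]), done, {}

    def _state(self):
        return np.concatenate(([self.remaining], self.available))
```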
Many current RL libraries, such as the canonical OpenAI Gym, contain many interesting problems, but few that are directly relevant to industrial use. Moreover, many of these problems (e.g., the Atari suite) lack the structure of classic optimization problems, and thus are primarily amenable to model-free RL techniques, that is, RL algorithms that learn with little to no prior knowledge of the dynamics of the environment they operate in. Bringing well-studied optimization problems to the RL community may encourage more integration of model-based and model-free methods to reduce sample complexity and provide better overall performance.
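As a concrete, deliberately simple instance of a model-free method, the sketch below runs tabular Q-learning against any Gym-style environment with a discrete action space: it updates value estimates only from sampled transitions and never queries the environment's transition model. It is an illustrative sketch under those assumptions, not one of the benchmark algorithms used in this work.

```python
from collections import defaultdict
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=1.0, eps=0.1):
    """Model-free tabular Q-learning sketch for a Gym-style environment with a
    discrete action space and array-valued observations (illustrative only)."""
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    for _ in range(episodes):
        state = tuple(np.ravel(env.reset()))
        done = False
        while not done:
            # Epsilon-greedy action selection from current value estimates.
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_obs, reward, done, _ = env.step(action)
            next_state = tuple(np.ravel(next_obs))
            # One-step temporal-difference update from the sampled transition.
            target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```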
It is our goal that this work encourages further development and integration of RL into optimization and the OR community, while also exposing the RL community to many of the problems and challenges that the OR community has been wrestling with for decades.
We explicitly present the design decisions we made to translate these problems to MDPs, as well as the results of the benchmark algorithms. All code is open-source and available at www.github.com/hubbs5/or-gym.
References
B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. M. Garcia, A. Jain, R. Luo, A. Maggiar, B. Narayanaswamy, and C. Ye. ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems. 2019. URL http://arxiv.org/abs/1911.10641.
G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. 2016. URL http://arxiv.org/abs/1606.01540.
E. G. Coffman, J. Csirik, G. Galambos, S. Martello, and D. Vigo. Bin Packing Approximation Algorithms: Survey and Classification. In Handbook of Combinatorial Optimization, pages 455–531. Springer, Heidelberg, 2013.
G. B. Dantzig and G. Infanger. Multi-stage stochastic linear programs for portfolio optimization. Annals of Operations Research, 45:59–76, 1993.
H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer, Berlin, Heidelberg, 2004.
V. Pillac, M. Gendreau, C. Gueret, and A. L. Medaglia. A review of dynamic vehicle routing problems. European Journal of Operational Research, 225(1):1–11, 2013.