
(402a) OR-Gym: A Reinforcement Learning Library for Operations Research Problems


Christian Hubbs¹,², Owais Sarwar¹, Hector Perez¹, Nikolaos Sahinidis¹, and Ignacio Grossmann¹

¹Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213

²Digital Fulfillment Center, Dow Chemical, Midland, MI 48642



Keywords: Machine Learning, Reinforcement Learning, Optimization, Scheduling, Stochastic Programming


1 Abstract

We introduce OR-Gym, an open-source benchmark library, in the style of OpenAI Gym [Brockman et al., 2016], for developing reinforcement learning algorithms to address operations research problems. Reinforcement learning (RL) has been widely applied to game playing and has surpassed the best human-level performance in many domains, yet there are few use cases in industrial or commercial settings.

We seek to provide a standardized library for researchers who wish to extend RL into commercial applications. Building on preceding work, we release OR-Gym, a single library that relies on the familiar OpenAI Gym interface for RL but contains problems relevant to industrial use. To this end, we have incorporated the benchmarks in Balaji et al. [2019] while extending the library to address the traveling salesman problem, the knapsack problem [Kellerer et al., 2004], multi-dimensional bin packing [Coffman et al., 2013], the multi-echelon newsvendor problem, portfolio optimization [Dantzig and Infanger, 1993], and the vehicle routing problem [Pillac et al., 2013]. These problems cover logistics, finance, and engineering, and are common in many business operation settings. In each case, we select a prototypical version of the problem from the literature against which to benchmark reinforcement learning and other optimization approaches.
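For concreteness, a minimal usage sketch in the standard OpenAI Gym style is shown below. The environment ID "Knapsack-v0" and the or_gym.make helper are illustrative assumptions rather than confirmed API details, and the random policy stands in for a trained agent.

```python
# Minimal interaction loop in the standard OpenAI Gym style.
# NOTE: the environment ID "Knapsack-v0" and the or_gym.make helper are
# assumptions for illustration; consult the repository for the registered
# environment names.
import or_gym

env = or_gym.make("Knapsack-v0")   # build an OR environment
state = env.reset()                # initial state of the problem instance

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # placeholder: random policy
    state, reward, done, info = env.step(action)  # standard Gym step signature
    total_reward += reward

print("Episode return:", total_reward)
```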

RL problems are formulated as Markov Decision Processes (MDPs), i.e., probabilistic, sequential decision-making problems. An MDP is defined by a state, which informs the agent's selection of an action; the agent then receives a reward during the transition to the subsequent state. This framework is not widely used in the optimization community, so we make explicit our thought process as we reformulate many optimization problems to fit the MDP mold without loss of generality.
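In the standard notation (not specific to any particular OR-Gym environment), an MDP is the tuple (S, A, P, R, γ) of states, actions, transition probabilities, rewards, and a discount factor, and the agent seeks a policy π that maximizes the expected discounted return:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, R(s_t, a_t)\right],
\qquad a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t).
\]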

Many current RL libraries, such as the canonical OpenAI Gym, contain many interesting problems, but ones that are not directly relevant to industrial use. Moreover, many of these problems (e.g., the Atari suite) lack the structure of classic optimization problems and are thus primarily amenable to model-free RL techniques, that is, RL algorithms that learn with little to no prior knowledge of the dynamics of the environment in which they operate. Bringing well-studied optimization problems to the RL community may encourage more integration of model-based and model-free methods to reduce sample complexity and provide better overall performance.

It is our goal that this work encourages further development and integration of RL in the optimization and OR communities, while also exposing the RL community to many of the problems and challenges that the OR community has been wrestling with for decades.

We explicitly present the design decisions we made to translate these problems to MDPs, as well as the results of the benchmark algorithms. All code is open-source and available at www.github.com/hubbs5/or-gym.
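As one illustration of this kind of reformulation, the sketch below recasts the 0/1 knapsack problem as a sequential MDP. The class name, state representation, and reward design here are hypothetical choices for exposition and are not necessarily the exact design used in the library's Knapsack environments.

```python
# Hypothetical sketch (names and formulation are illustrative, not the exact
# design used in OR-Gym): the 0/1 knapsack problem recast as a sequential MDP.
# Items are offered one at a time; action 1 packs the current item if it fits,
# action 0 skips it, and the reward is the value of each packed item.
import numpy as np

class KnapsackMDP:
    def __init__(self, weights, values, capacity):
        self.weights = np.asarray(weights, dtype=float)
        self.values = np.asarray(values, dtype=float)
        self.capacity = float(capacity)

    def reset(self):
        self.t = 0                      # index of the item currently offered
        self.remaining = self.capacity  # remaining knapsack capacity
        return self._state()

    def _state(self):
        # State: the offered item's weight and value plus remaining capacity.
        return np.array([self.weights[self.t], self.values[self.t], self.remaining])

    def step(self, action):
        reward = 0.0
        if action == 1 and self.weights[self.t] <= self.remaining:
            self.remaining -= self.weights[self.t]
            reward = self.values[self.t]
        self.t += 1
        done = self.t >= len(self.weights)
        next_state = None if done else self._state()
        return next_state, reward, done

# Random rollout on a tiny instance.
env = KnapsackMDP(weights=[2, 3, 4, 5], values=[3, 4, 5, 6], capacity=5)
state, done, total = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(np.random.randint(2))
    total += reward
print("Collected value:", total)
```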


References

B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. M. Garcia, A. Jain, R. Luo, A. Maggiar, B. Narayanaswamy, and C. Ye. ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems. 2019. URL http://arxiv.org/abs/1911.10641.

G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. 2016. URL http://arxiv.org/abs/1606.01540.

E. G. Coffman, J. Csirik, G. Galambos, S. Martello, and D. Vigo. Bin Packing Approximation Algorithms: Survey and Classification. In Handbook of Combinatorial Optimization, pages 455–531. Springer, Heidelberg, 2013.

G. B. Dantzig and G. Infanger. Multi-stage stochastic linear programs for portfolio optimization. Annals of Operations Research, 45:59–76, 1993.

H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer Verlag, Heidelberg, 2004.

V. Pillac, M. Gendreau, C. Gueret, and A. L. Medaglia. A review of dynamic vehicle routing problems. European Journal of Operational Research, 225:1–11, 2013.