(402a) Or-Gym: A Reinforcement Learning Library for Operations Research Problems
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems I
Tuesday, November 17, 2020 - 8:00am to 8:15am
Christian Hubbs1,2, Owais Sarwar1, Hector Perez1, Nikolaos Sahinidis1, and Ignacio Grossmann1
1Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
2Digital Fulfillment Center, Dow Chemical, Midland, MI 48642
Keywords: Machine Learning, Reinforcement Learning, Optimization, Scheduling, Stochastic Programming
1 Abstract
We introduce OR-Gym, an open-source benchmark library in the style of OpenAI Gym for developing reinforcement learning (RL) algorithms to address operations research problems [Brockman et al., 2016]. Reinforcement learning has been widely applied to game playing and has surpassed the best human-level performance in many domains, yet there are few use cases in industrial or commercial settings.
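As a brief illustration of the Gym-style interaction loop OR-Gym adopts, the sketch below rolls out a random policy on a single environment. The or_gym.make call and the "Knapsack-v0" environment name are assumptions made for illustration; consult the repository for the definitive interface.

```python
import or_gym  # assumed package import name for the OR-Gym library

# Environment name assumed for illustration; environments are built through a
# Gym-style factory and expose the standard reset/step interface.
env = or_gym.make("Knapsack-v0")
state = env.reset()

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # placeholder random policy
    state, reward, done, info = env.step(action)  # one MDP transition
    total_reward += reward

print("Episode reward under a random policy:", total_reward)
```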
We seek to provide a standardized library for the research community that wishes to extend RL into commercial applications. Building on top of the preceding work, we release OR-Gym, a single library that relies on the familiar OpenAI Gym interface for RL but contains problems relevant to industrial use. To this end, we have incorporated the benchmarks in Balaji et al. [2019] while extending the library to address the traveling salesman problem, the knapsack problem [Kellerer et al., 2004], multi-dimensional bin packing [Coffman et al., 2013], the multi-echelon newsvendor problem, portfolio optimization [Dantzig and Infanger, 1993], and the vehicle routing problem [Pillac et al., 2013]. These problems span logistics, finance, and engineering, and are common in many business operations settings. In each case, we select a prototypical version of the problem from the literature against which to benchmark reinforcement learning and other optimization approaches.
RL problems are formulated as Markov Decision Processes (MDPs), meaning they are probabilistic, sequential decision-making problems. An MDP is defined by a state, which an agent observes in order to select an action; the agent then receives a reward as the system transitions to the subsequent state. This framework is not widely used in the optimization community, so we make explicit our thought process as we reformulate many optimization problems to fit the MDP mold without loss of generality.
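For concreteness, a minimal sketch of one such reformulation: the classic 0/1 knapsack problem cast as an MDP, where the state is the remaining capacity together with a mask of the items still available, the action is the index of the next item to pack, and the reward is that item's value. The class below is purely illustrative and is not the library's implementation.

```python
import numpy as np

class KnapsackMDP:
    """Illustrative MDP casting of the 0/1 knapsack problem (not OR-Gym code)."""

    def __init__(self, weights, values, capacity):
        self.weights = np.asarray(weights, dtype=float)
        self.values = np.asarray(values, dtype=float)
        self.capacity = float(capacity)

    def reset(self):
        # State: remaining capacity followed by a 0/1 availability mask.
        self.remaining = self.capacity
        self.available = np.ones(len(self.weights), dtype=int)
        return self._state()

    def step(self, item):
        # Action: index of the item to pack next.
        if self.available[item] == 0 or self.weights[item] > self.remaining:
            # Infeasible choice: no reward, episode ends.
            return self._state(), 0.0, True, {}
        self.available[item] = 0
        self.remaining -= self.weights[item]
        # Episode ends when no remaining item fits in the residual capacity.
        done = not np.any((self.available == 1) & (self.weights <= self.remaining))
        return self._state(), float(self.values[item]), done, {}

    def _state(self):
        return np.concatenate(([self.remaining], self.available))
```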
Many current RL libraries, such as the canonical OpenAI Gym, contain many interesting problems, but few that are directly relevant to industrial use. Moreover, many of these problems (e.g., the Atari suite) lack the structure of classic optimization problems, and thus are primarily amenable to model-free RL techniques, that is, RL algorithms that learn with little to no prior knowledge of the dynamics of the environment they operate in. Bringing well-studied optimization problems to the RL community may encourage more integration of model-based and model-free methods to reduce sample complexity and provide better overall performance.
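As a concrete, deliberately simple instance of a model-free method, the sketch below runs tabular Q-learning against any Gym-style environment with a discrete action space: it updates value estimates only from sampled transitions and never queries the environment's transition model. It is an illustrative sketch under those assumptions, not one of the benchmark algorithms used in this work.

```python
from collections import defaultdict
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=1.0, eps=0.1):
    """Model-free tabular Q-learning sketch for a Gym-style environment with a
    discrete action space and array-valued observations (illustrative only)."""
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    for _ in range(episodes):
        state = tuple(np.ravel(env.reset()))
        done = False
        while not done:
            # Epsilon-greedy action selection from current value estimates.
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_obs, reward, done, _ = env.step(action)
            next_state = tuple(np.ravel(next_obs))
            # One-step temporal-difference update from the sampled transition.
            target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```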
It is our goal that this work encourages further development and integration of RL into optimization and the OR community, while also exposing the RL community to many of the problems and challenges that the OR community has been wrestling with for decades.
We explicitly present the design decisions we made to translate these problems to MDPs, as well as the results of the benchmark algorithms. All code is open-source and available at www.github.com/hubbs5/or-gym.
References
B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. M. Garcia, A. Jain, R. Luo, A. Maggiar, B. Narayanaswamy, and C. Ye. ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems. 2019. URL http://arxiv.org/abs/1911.10641.
G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. 2016. URL http://arxiv.org/abs/1606.01540.
E. G. Coffman, J. Csirik, G. Galambos, S. Martello, and D. Vigo. Bin Packing Approximation Algorithms: Survey and Classification. In Handbook of Combinatorial Optimization, pages 455–531. Springer, Heidelberg, 2013.
G. B. Dantzig and G. Infanger. Multi-stage stochastic linear programs for portfolio optimization. Annals of Operations Research, 45:59–76, 1993.
H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer, Berlin, Heidelberg, 2004.
V. Pillac, M. Gendreau, C. Gueret, and A. L. Medaglia. A review of dynamic vehicle routing problems. European Journal of Operational Research, 225(1):1–11, 2013.