(373ai) Multi Agent Reinforcement Learning and Graph Neural Networks for Inventory Management

Authors 

del Rio Chanona, A., Imperial College London
Classic methods for the inventory control problem involve heuristics such as the (s,S) and (r,Q) policies [1, 2]. These heuristics are widely used because they are easy to implement, but they lack adaptability to changing demand patterns and coordination amongst different entities, leading to sub-optimal performance [2]. Another common method is dynamic programming, a mathematical optimization technique that solves complex problems by breaking them down into smaller subproblems; for a complex system, however, this may not be feasible due to the curse of dimensionality. Reinforcement learning (RL) offers a promising alternative by combining the strengths of dynamic programming and heuristics as an approximate dynamic programming solution. This allows RL to adapt to changing demand patterns and uncertainties, enhancing decision-making in supply chains. However, as the supply chain grows in size, traditional RL and linear programming (LP) methods may struggle due to the increased complexity and computational requirements. Moreover, while both LP and RL can provide optimal solutions, they require full online information about the system, which may not be feasible or practical in real-world scenarios where information-sharing constraints are present.
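To make the (s,S) heuristic concrete, the following is a minimal Python sketch of a single echelon simulated under this policy; the reorder point, order-up-to level, cost coefficients, Poisson demand, and zero lead time are illustrative assumptions, not values from this work.

# Minimal sketch of a single-echelon (s,S) policy under stochastic demand.
# All parameter values and the Poisson demand model are illustrative
# assumptions, not values from this work; lead time is taken as zero.
import numpy as np

rng = np.random.default_rng(0)
s, S = 20, 60            # reorder point and order-up-to level (assumed)
h, p = 1.0, 5.0          # per-unit holding and backlog costs (assumed)
inventory, total_cost = S, 0.0

for t in range(100):
    inventory -= rng.poisson(8)    # stochastic demand (assumed mean)
    # Holding cost on positive stock, penalty cost on backlog.
    total_cost += h * max(inventory, 0) + p * max(-inventory, 0)
    if inventory <= s:             # (s,S) rule: reorder up to S
        inventory = S

print(f"average cost per period: {total_cost / 100:.2f}")

The rigidity of the fixed (s,S) parameters in this sketch is precisely what limits such heuristics under non-stationary demand.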

Multi-agent reinforcement learning (MARL) offers a promising way to address the limitations of traditional RL and LP methods. MARL does not require global online information and can operate as a distributed decision-making framework, allowing individual agents to make decisions based on local observations and interactions. This decentralized approach enhances adaptability and coordination in large-scale supply chains, making it a viable alternative for optimizing inventory control. The proposed framework follows the centralized training, decentralized execution (CTDE) paradigm: central state information is shared offline during training, but only local state information is required online. This is shown in Figure 1.
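The sketch below illustrates the CTDE idea under assumed network sizes and layer widths: each agent's actor maps only its local observation to an action, while a centralized critic (used offline during training only) consumes the concatenated global state.

# Minimal CTDE sketch with assumed names and dimensions: each actor sees
# only its agent's local observation; the centralized critic, used
# offline during training only, sees the full global state.
import torch
import torch.nn as nn

OBS_DIM, N_AGENTS = 4, 3                      # illustrative sizes

class Actor(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 2), nn.Softplus())

    def forward(self, local_obs):
        # Outputs a positive (s, S) proposal; in practice one would
        # parametrize S = s + delta with delta >= 0 to enforce s <= S.
        return self.net(local_obs)

class CentralCritic(nn.Module):
    def __init__(self, global_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(global_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, global_state):          # training (offline) only
        return self.net(global_state)

actors = [Actor(OBS_DIM) for _ in range(N_AGENTS)]
critic = CentralCritic(OBS_DIM * N_AGENTS)

obs = torch.randn(N_AGENTS, OBS_DIM)          # one local observation each
actions = torch.stack([a(o) for a, o in zip(actors, obs)])
value = critic(obs.flatten())                 # critic uses global state

At execution time only the actors run, each on its own local observation, so no global online information is required.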

In this work, we develop a multi-agent RL framework with graph neural networks (GNNs) for multi-echelon inventory management, leveraging the inherent graph structure of supply chains to learn hidden interdependencies. Unlike other RL studies, we redefine the action space to parametrize the (s,S) heuristic inventory policy, enhancing adaptability, practicality, and explainability for real-world implementation. Our first framework uses the aggregated vector from the GNN to train the critic, guiding the learning process towards more effective inventory strategies. To address the increased computational complexity as the number of agents grows, our second framework, illustrated in Figure 2, employs global mean pooling to aggregate the node vectors, reducing dimensionality and computational cost without compromising the critic's effectiveness. Both frameworks leverage the supply chain's structure to learn hidden interdependencies, enhancing communication and coordination between entities for improved decision-making in the multi-agent system.
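A minimal sketch of the second framework's critic is given below, assuming PyTorch Geometric and illustrative layer sizes and graph: per-node local states are propagated through graph convolutions over the supply chain graph, and global mean pooling collapses the node embeddings into a fixed-size vector.

# Sketch of a GNN critic with global mean pooling, assuming PyTorch
# Geometric; layer sizes and the toy 3-node chain are illustrative.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class GNNCritic(nn.Module):
    def __init__(self, node_dim, hidden=32):
        super().__init__()
        self.conv1 = GCNConv(node_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        # Mean pooling yields a fixed-size graph embedding, so the
        # critic's input dimension is independent of the agent count.
        g = global_mean_pool(h, batch)
        return self.head(g)

# Toy chain: supplier (0) -> warehouse (1) -> retailer (2), assumed.
x = torch.randn(3, 4)                              # per-node local states
edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long).t()
batch = torch.zeros(3, dtype=torch.long)           # single graph
value = GNNCritic(node_dim=4)(x, edge_index, batch)

Because the pooled embedding has constant dimension, the critic's input does not grow with the supply chain, which is what keeps training tractable as agents are added.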

The effectiveness of our collaborative approach is demonstrated by testing the trained policies on a series of disruptions, such as the bullwhip effect and fluctuations in costs. Our framework shifts computational cost from online to offline, ensuring faster decision-making compared to traditional optimization methods used in inventory control. As a result, the proposed methodology shows promising scalability with the number of agents for a decentralized, online decision-making framework while maintaining collaboration between entities. In summary, the contribution of this work is two-fold: the parametrization of a heuristic policy enables explainability and early adoption of state-of-the-art methods in industry, and the synergy between multi-agent RL and GNNs highlights the value of leveraging the inherent graph structure of supply chains. This approach paves the way for more efficient, adaptable, and resilient supply chain operations.

References

[1] Jackson, I., Tolujevs, J. and Kegenbekov, Z., 2020. Review of inventory control models: a classification based on methods of obtaining optimal control parameters. Transport and Telecommunication, 21(3), pp.191-202.

[2] Brunaud, B., Laínez-Aguirre, J.M., Pinto, J.M. and Grossmann, I.E., 2019. Inventory policies and safety stock optimization for supply chain planning. AIChE Journal, 65(1), pp.99-112.

[3] Mousa, M., van de Berg, D., Kotecha, N., del Rio-Chanona, E.A. and Mowbray, M., 2023. An Analysis of Multi-Agent Reinforcement Learning for Decentralized Inventory Control Systems. arXiv preprint arXiv:2307.11432.

[4] Liu, X., Hu, M., Peng, Y. and Yang, Y., 2022. Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management. Available at SSRN.