(15c) Leveraging Reinforcement Learning and Evolutionary Algorithms for Dynamic Multi-Objective Decision Making in Supply Chain Management

Authors 

del Rio Chanona, A., Imperial College London
Qiu, Y., Imperial College London

Reinforcement learning (RL) has gained traction in supply chain management due to its adaptability in uncertain environments. Traditional inventory control methods such as heuristics and dynamic programming, however, struggle to adapt to changing demand patterns and to coordinate decisions across supply chain entities, leading to sub-optimal performance. RL combines the strengths of dynamic programming and heuristics, offering an approximate dynamic programming solution that adapts better to uncertainties in the supply chain.

However, most RL studies focus on a single financial reward, which limits their applicability to broader supply chain objectives such as environmental sustainability and social responsibility. There is therefore an increasing need for multi-objective reinforcement learning (MORL) frameworks. Unlike traditional multi-objective optimization methods, which aim to find a set of Pareto-optimal solutions but struggle with dynamic environments, uncertainty, and high complexity, MORL can adapt in real time, handle uncertainties effectively, and provide more robust and scalable solutions. MORL thus presents an opportunity to address the complexities of modern supply chains while aligning with the triple-bottom-line sustainability principle: balancing profitability, social responsibility, and environmental sustainability.
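
To make the vector-valued (MORL) reward concrete, below is a minimal sketch of a three-objective step reward for an inventory environment; the cost coefficients, emissions factor, and lead-time model are illustrative assumptions rather than those used in this work.

```python
import numpy as np

def step_reward(order_qty, sales, holding, backlog,
                emissions_per_unit=0.2, base_lead_time=3.0):
    """Illustrative vector reward for one period of an inventory environment.

    Returns a 3-vector (profit, -emissions, -lead time) so that every
    objective is to be maximized, i.e. a MORL reward vector instead of
    a single scalar reward. All coefficients are placeholders.
    """
    profit = 5.0 * sales - 1.0 * order_qty - 0.5 * holding - 2.0 * backlog
    emissions = emissions_per_unit * order_qty           # environmental objective
    lead_time = base_lead_time * (1.0 + 0.1 * backlog)   # service / lead-time objective
    return np.array([profit, -emissions, -lead_time])

print(step_reward(order_qty=40, sales=35, holding=5, backlog=2))
```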

In this work, we integrate a multi-objective evolutionary algorithm (MOEA) within a reinforcement learning framework to find a set of adaptable policies that effectively balance three conflicting objectives: financial, environmental, and lead time. We apply the evolutionary algorithm in the parameter space of the neural-network policy, so that adapting the policy parameters yields a Pareto front in policy space. AGE-MOEA is used to evaluate and sort the candidate policies, producing a final population of non-dominated solutions that forms the Pareto set of policies, as shown in Figure 1.
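
A minimal sketch of this kind of neuroevolution loop is given below; it assumes a flattened policy parameter vector, Gaussian mutation, and a hand-rolled non-dominated filter standing in for AGE-MOEA's survival operator, with the policy size, evaluation function, and hyperparameters chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS = 64         # size of the flattened policy (neural-network) parameter vector
POP, GENS = 40, 50    # population size and number of generations

def evaluate(theta):
    """Roll out the policy encoded by theta and return its objective vector
    (profit, -emissions, -lead time), all to be maximized; stand-in here."""
    return np.array([-np.sum(theta ** 2),
                     -np.sum(np.abs(theta)),
                     -np.max(np.abs(theta))])

def non_dominated(F):
    """Boolean mask of objective rows in F that no other row Pareto-dominates."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        for j in range(len(F)):
            if i != j and np.all(F[j] >= F[i]) and np.any(F[j] > F[i]):
                mask[i] = False
                break
    return mask

pop = rng.normal(size=(POP, N_PARAMS))                   # initial policy parameters
for _ in range(GENS):
    children = pop + 0.1 * rng.normal(size=pop.shape)    # Gaussian mutation in parameter space
    union = np.vstack([pop, children])
    F = np.array([evaluate(theta) for theta in union])
    front = union[non_dominated(F)]                      # keep only non-dominated policies
    idx = rng.choice(len(front), size=POP, replace=len(front) < POP)
    pop = front[idx]                                     # next generation drawn from the front

pareto_policies = pop                                    # final trade-off set of policies
```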

The effectiveness of our method is demonstrated through a series of case studies simulating various disruptions, as shown in Figure 2 and Figure 3. Across these, our approach consistently outperforms a traditional single-policy method, showing adaptability and robustness under diverse supply chain disruptions. By seamlessly switching between policies from the trained Pareto set in response to disruptions, our method achieves significantly improved performance and enables sustainability principles and conflicting objectives to be integrated into data-driven supply chain management decision-making.
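
As an illustration of the switching step, one simple scheme (an assumption here, not necessarily the selection rule used in this work) stores each trained policy together with its objective vector and, when a disruption changes priorities, picks the Pareto policy whose normalized objectives best match the new preference weights:

```python
import numpy as np

def select_policy(pareto_objectives, weights):
    """Return the index of the Pareto policy whose normalized objective vector
    best matches the current preference weights (all objectives maximized)."""
    F = np.asarray(pareto_objectives, dtype=float)
    F_norm = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0) + 1e-12)
    scores = F_norm @ np.asarray(weights, dtype=float)
    return int(np.argmax(scores))

# e.g. after a transport disruption, weight lead time more heavily than profit or emissions
objectives = [[100.0, -20.0, -5.0],   # (profit, -emissions, -lead time) per trained policy
              [80.0, -10.0, -3.0],
              [60.0, -8.0, -1.5]]
print(select_policy(objectives, weights=[0.2, 0.2, 0.6]))
```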
