(15c) Leveraging Reinforcement Learning and Evolutionary Algorithms for Dynamic Multi-Objective Decision Making in Supply Chain Management
2024 AIChE Annual Meeting
Computing and Systems Technology Division
10C: Design and Operations Under Uncertainty
Sunday, October 27, 2024 - 4:12pm to 4:33pm
However, most studies in RL focus on single-objective financial rewards, which limits their applicability to broader supply chain objectives such as environmental sustainability and social responsibility. This motivates the development of multi-objective reinforcement learning (MORL) frameworks. Unlike traditional multi-objective optimization methods, which aim to find a set of Pareto-optimal solutions but struggle with dynamic environments, uncertainty, and high complexity, MORL can adapt in real time, handle uncertainty effectively, and provide more robust and scalable solutions. MORL therefore presents an opportunity to address the complexities of modern supply chains while aligning with the triple-bottom-line sustainability principle: balancing profitability, social responsibility, and environmental sustainability.
In this work, we integrate multi-objective evolutionary algorithms (MOEAs) within a reinforcement learning framework to find a set of adaptable policies that effectively balance three conflicting objectives: financial, environmental, and lead time. We apply the evolutionary algorithm directly in the parameter space of the neural network policies, adapting the network weights and thereby producing a Pareto front in the policy space. AGE-MOEA is used to evaluate and sort the policies, yielding a final population of non-dominated policies that form a Pareto-optimal set, as shown in Figure 1.
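To illustrate this scheme, the sketch below evolves flattened policy parameters and retains the non-dominated individuals each generation. It is a minimal sketch, not the implementation used in this work: the plain dominance filter stands in for AGE-MOEA's survival step (which additionally ranks by proximity and diversity on the estimated front), and simulate_episode, N_PARAMS, and all hyperparameters are hypothetical placeholders for the real supply chain simulator and policy network.

```python
import numpy as np

N_PARAMS = 64          # flattened policy (network) parameter count -- placeholder
POP_SIZE, N_GEN = 40, 100
SIGMA = 0.05           # Gaussian mutation scale in parameter space

def simulate_episode(theta: np.ndarray) -> np.ndarray:
    """Stand-in for a supply chain rollout: returns the three objectives
    (cost, emissions, lead time), all minimized. These surrogate formulas
    merely create a trade-off; a real simulator would go here."""
    cost      = float(np.sum((theta - 1.0) ** 2))
    emissions = float(np.sum((theta + 1.0) ** 2))
    lead_time = float(np.sum(theta ** 2))
    return np.array([cost, emissions, lead_time])

def dominates(f: np.ndarray, g: np.ndarray) -> bool:
    return bool(np.all(f <= g) and np.any(f < g))

def non_dominated(F: np.ndarray) -> np.ndarray:
    """Indices of the non-dominated rows of F (simplified survival step)."""
    keep = [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]
    return np.array(keep)

rng = np.random.default_rng(0)
pop = rng.normal(size=(POP_SIZE, N_PARAMS))        # initial policy parameters
F = np.array([simulate_episode(t) for t in pop])

for gen in range(N_GEN):
    children = pop + SIGMA * rng.normal(size=pop.shape)   # mutate in parameter space
    Fc = np.array([simulate_episode(t) for t in children])
    pop_all, F_all = np.vstack([pop, children]), np.vstack([F, Fc])
    idx = non_dominated(F_all)
    if len(idx) < POP_SIZE:  # pad with dominated individuals to keep the size fixed
        rest = np.setdiff1d(np.arange(len(F_all)), idx)
        idx = np.concatenate([idx, rest[: POP_SIZE - len(idx)]])
    pop, F = pop_all[idx[:POP_SIZE]], F_all[idx[:POP_SIZE]]

pareto_policies, pareto_front = pop, F  # final non-dominated policy set
```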
The effectiveness of our method is demonstrated through a series of case studies simulating various disruptions, as shown in Figures 2 and 3. Across these scenarios, our approach consistently outperforms a traditional single-policy method, showcasing adaptability and robustness under diverse supply chain disruptions. By switching between policies from the trained Pareto set in response to disruptions, it achieves significantly improved performance and enables the integration of sustainability principles and conflicting objectives into data-driven supply chain decision-making.
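One simple way to realize the switching step is sketched below, assuming each disruption type maps to a set of objective priorities and the matching Pareto policy is selected by weighted scalarization over normalized objectives. The disruption labels, weights, and select_policy helper are illustrative assumptions, not the selection rule from this work.

```python
import numpy as np

# Illustrative priorities over (cost, emissions, lead time) per disruption
# type -- hypothetical values, chosen only to show the selection mechanism.
DISRUPTION_WEIGHTS = {
    "none":            np.array([0.5, 0.3, 0.2]),
    "transport_delay": np.array([0.2, 0.1, 0.7]),  # prioritize lead time
    "carbon_tax":      np.array([0.3, 0.6, 0.1]),  # prioritize emissions
}

def select_policy(pareto_front: np.ndarray, disruption: str) -> int:
    """Return the index of the Pareto policy whose normalized objectives
    best match the priorities implied by the current disruption."""
    w = DISRUPTION_WEIGHTS[disruption]
    span = np.ptp(pareto_front, axis=0) + 1e-12          # avoid divide-by-zero
    F_norm = (pareto_front - pareto_front.min(axis=0)) / span  # scale to [0, 1]
    return int(np.argmin(F_norm @ w))                    # lowest weighted score

# Usage with the trained set from the previous sketch:
# active = pareto_policies[select_policy(pareto_front, "transport_delay")]
```

Because the Pareto set is computed once offline, this switch is a cheap table lookup plus an argmin, which is what allows the method to respond to disruptions in real time.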