(207b) Entropy-Maximizing TD3-Based Reinforcement Learning for Controlling and Optimizing Complex Dynamical Systems
2023 AIChE Annual Meeting
Computing and Systems Technology Division
Advances in Process Control I
Tuesday, November 7, 2023 - 8:18am to 8:36am
In this work, we present an entropy-maximizing TD3 method (EMTD3) to address the challenges associated with stochastic and deterministic actor-critic methods [7]. In the proposed method, a stochastic actor with an entropy-maximizing term in its objective function is deployed at the beginning of training to ensure sufficient exploration. This entropy term injects uncertainty into the policy and drives systematic exploration of the action space, leading to better learning performance than purely deterministic methods. Afterward, a deterministic actor is employed to focus on local exploitation and discover the optimal solution. The proposed method thus combines the exploration advantages of stochastic actor-critic methods with the fast convergence of deterministic methods. As a result, the proposed EMTD3 method can outperform existing TD3 and other DRL approaches in terms of sample efficiency and speed of convergence to the global optimum in continuous state-action spaces.
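To make the two-phase idea concrete, the sketch below (PyTorch-style Python) contrasts an entropy-regularized stochastic actor objective for the exploration phase with the standard TD3 deterministic actor objective used after the switch to exploitation. The network sizes, the entropy weight alpha, and the phase-switching rule are illustrative assumptions, not the exact settings of the proposed EMTD3 method.

import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Stochastic actor used during the entropy-maximizing exploration phase."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.body(state)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        pre_tanh = dist.rsample()                      # reparameterized sample
        action = torch.tanh(pre_tanh)                  # squash to [-1, 1]
        # log-probability with the tanh change-of-variables correction
        log_prob = dist.log_prob(pre_tanh).sum(-1)
        log_prob -= torch.log(1.0 - action.pow(2) + 1e-6).sum(-1)
        return action, log_prob

def stochastic_actor_loss(critic, actor, states, alpha=0.2):
    """Exploration phase: maximize Q(s, a) + alpha * H(pi(.|s)),
    i.e. minimize alpha * log pi(a|s) - Q(s, a)."""
    actions, log_prob = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    return (alpha * log_prob - q.squeeze(-1)).mean()

def deterministic_actor_loss(critic, actor, states):
    """Exploitation phase: the usual TD3 deterministic policy-gradient
    objective, i.e. maximize Q(s, mu(s))."""
    actions = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    return -q.mean()

In this sketch the agent would minimize stochastic_actor_loss early in training and hand off to deterministic_actor_loss once exploration is deemed sufficient; the critic update (clipped double-Q with target smoothing, as in TD3) is unchanged between the two phases.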
Finally, the effectiveness of the proposed EMTD3 method is verified through two case studies. In the first case study, our proposed method is employed to facilitate the tuning of a proportional-integral-derivative (PID) controller for regulating the temperature of a nonlinear continuous stirred tank reactor (CSTR) system. Simulation results show that our approach can improve sample efficiency by almost 45% compared with other DRL methods (e.g., TD3 and DDPG) in discovering the global solution. In the second case study, the proposed EMTD3 method is applied to design superior fast-charging protocols for lithium-ion batteries. Results show that with our method the optimal charging strategy can be rapidly discovered in far fewer episodes than with other DRL methods.
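As a rough illustration of how the first case study can be cast as an RL problem, the toy sketch below treats one closed-loop run with fixed PID gains (the agent's action) as a single episode and returns the negative integral absolute error of the temperature as the reward. The first-order plant, gain values, and setpoint are placeholders standing in for the nonlinear CSTR model used in the paper.

def pid_episode_reward(plant_step, setpoint, gains, horizon=200, dt=0.1):
    """Run one closed-loop episode with fixed PID gains (the RL action)
    and return the negative integral absolute error as the reward."""
    kp, ki, kd = gains
    temperature = 300.0                 # initial reactor temperature [K]
    integral, prev_error = 0.0, 0.0
    iae = 0.0
    for _ in range(horizon):
        error = setpoint - temperature
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative   # coolant input
        temperature = plant_step(temperature, u)           # plant model step
        iae += abs(error) * dt
        prev_error = error
    return -iae   # higher reward = tighter temperature regulation

# Example with a toy first-order plant standing in for the nonlinear CSTR:
toy_plant = lambda T, u: T + 0.1 * (350.0 - T) + 0.05 * u
print(pid_episode_reward(toy_plant, setpoint=340.0, gains=(2.0, 0.5, 0.1)))

The DRL agent (EMTD3, TD3, or DDPG) would then search the continuous gain space by repeatedly proposing (kp, ki, kd) and observing this episode reward.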
References:
[1] Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P., and Gopaluni, B., "Deep reinforcement learning for process control: A primer for beginners," arXiv preprint arXiv:2004.05490, 2020.
[2] Paternina-Arboleda, C.D., Montoya-Torres, J.R., and Fabregas-Ariza, A., "Simulation-optimization using a reinforcement learning approach," in 2008 Winter Simulation Conference, IEEE, 2008.
[3] Grondman, I., et al., "A survey of actor-critic reinforcement learning: Standard and natural policy gradients," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), pp. 1291-1307, 2012.
[4] Fujimoto, S., van Hoof, H., and Meger, D., "Addressing function approximation error in actor-critic methods," in International Conference on Machine Learning, PMLR, 2018.
[5] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M., "Deterministic policy gradient algorithms," in International Conference on Machine Learning, 2014.
[6] Haarnoja, T., et al., "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in International Conference on Machine Learning, PMLR, 2018.
[7] Chowdhury, M.A. and Lu, Q., "A novel entropy-maximizing TD3-based reinforcement learning for automatic PID tuning," arXiv preprint arXiv:2210.02381, 2022.