
(371aa) Disruptive Artificial Intelligence (Reinforcement Learning) Based Predictive Control

Authors 

Srinivas, S. - Presenter, TCS Research
Masampally, V., TCS Research
Runkana, V., TCS Research
This work examines the consequences of a given set of process operating conditions over a neighborhood of the operating space with respect to the production objective, and yields optimized values of those operating conditions to achieve the desired production target. The underlying sequential decision-making problem is formulated as a Markov decision process (MDP), a discrete-time stochastic control framework, and is posed as an optimization problem solved through Reinforcement Learning (RL), an approach to automating goal-directed learning and decision-making. RL is a family of machine learning algorithms that learns to accomplish a complex objective (goal): an autonomous agent interacts with its environment (posed as an MDP) and collects rewards in order to learn optimal behavior. These algorithms leverage emerging, state-of-the-art deep neural network architectures, as used in supervised deep learning, for function approximation.

RL approaches are broadly divided into policy optimization and dynamic programming. Dynamic-programming RL algorithms are further sub-classified into policy iteration, value iteration and Q-learning. Policy iteration alternates policy evaluation (estimating the value function of the current policy) with policy improvement (deriving a better policy from that value function via the Bellman operator), repeating until the policy converges. Value iteration first finds the optimal value function and then derives the optimal policy from it. Q-learning estimates an action-value function that maps each state-action pair to a value. Deep Q-Learning (published by David Silver, Google DeepMind, 2015), Double Q-Network (published by Hado van Hasselt, Google DeepMind, 2016), Dueling Q-Network (published by Ziyu Wang, Google DeepMind, 2015) and Prioritized Experience Replay (PER-Double Q-Network, published by Tom Schaul, Google DeepMind, 2015) are extensions of vanilla Q-learning for handling large discrete state-action spaces. The Rainbow algorithm, published by Matteo Hessel, Google DeepMind, 2017, combines these improvements with multi-step returns, distributional RL and noisy nets, leading to state-of-the-art results when compared with a baseline of each individual variant alone.
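As a minimal illustration of the tabular setting described above, the Python sketch below implements vanilla Q-learning with epsilon-greedy exploration. The environment interface (reset() returning a state, step() returning next state, reward and a done flag) and all hyperparameters are illustrative assumptions, not taken from this work.

import numpy as np

# Minimal tabular Q-learning sketch for a small discrete MDP.
# The environment interface and hyperparameters below are assumptions for illustration.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))              # value of every state-action pair
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # one-step Bellman update toward the bootstrapped target
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q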


To handle continuous or stochastic action spaces, policy-based algorithms (REINFORCE with policy gradients) optimize the policy directly, without using a value function. A hybrid method, Advantage Actor-Critic (A2C), consists of two distinct deep neural networks, a critic that measures the quality of the action taken (value-based) and an actor that controls how the agent behaves (policy-based), and stabilizes learning in comparison with pure policy-gradient methods. An extension of A2C, the Asynchronous Advantage Actor-Critic (A3C) algorithm (published by Volodymyr Mnih, Google DeepMind, 2016), executes a set of environments in parallel and performs the policy-gradient updates using the advantage function. Several methods improve the stability, convergence and sample efficiency of the stochastic policy-gradient approach. Proximal Policy Optimization (PPO) applies a clipped surrogate objective to the policy update (published by John Schulman, OpenAI, 2017). Trust Region Policy Optimization (TRPO) enforces a Kullback–Leibler divergence constraint on the size of the policy update at each iteration (published by John Schulman, UC Berkeley, 2015). Actor-Critic using Kronecker-Factored Trust Region (ACKTR) applies Kronecker-Factored Approximate Curvature (K-FAC) to the gradient updates of both the critic and the actor (published by Yuhuai Wu, University of Toronto, 2017). Soft Actor-Critic (SAC) integrates the entropy of the policy into the reward to steer exploration; it is an off-policy actor-critic model (published by Tuomas Haarnoja, UC Berkeley, 2018).
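For concreteness, the sketch below shows PPO's clipped surrogate objective in PyTorch; the tensor names and the clipping coefficient are illustrative and do not reflect the exact implementation used in this work.

import torch

# Sketch of PPO's clipped surrogate loss.
# log_probs_new / log_probs_old: action log-probabilities under the current and old policies.
# advantages: estimated advantages for the sampled actions (all names are illustrative).
def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs_new - log_probs_old)   # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # maximizing the surrogate objective is implemented as minimizing its negative
    return -torch.mean(torch.min(unclipped, clipped))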


The algorithms described above model the policy as a probability distribution over actions for a known current state (stochastic). Deterministic Policy Gradients (DPG), published by David Silver, Google DeepMind, 2014, instead models the policy as deterministic rather than stochastic. Deep Deterministic Policy Gradients (DDPG) combines DPG with DQN and learns a stable Q-function via experience replay and a fixed target network; it learns a deterministic policy and extends it to continuous action spaces within the actor-critic framework (published by Lillicrap, Google DeepMind, 2015). In Distributed Distributional Deep Deterministic Policy Gradients (D4PG), a distributional critic estimates the Q value as a random variable rather than only its expectation, multiple distributed actors gather experience in parallel, and Prioritized Experience Replay (PER) is used. D4PG is a model-free, off-policy, actor-critic algorithm that learns policies in high-dimensional, continuous action spaces (published by Gabriel Barth-Maron, Google DeepMind, 2018).
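The following sketch illustrates the DDPG-style critic target and the soft (Polyak) target-network update mentioned above, assuming PyTorch modules for the actor and critic and a mini-batch (s, a, r, s2, done) sampled from a replay buffer; the module interfaces and the gamma/tau values are illustrative assumptions.

import torch
import torch.nn.functional as F

# Sketch of the DDPG critic loss with a bootstrapped target from target networks.
# actor_target(s2) returns the deterministic target action; critic(s, a) returns Q(s, a).
def ddpg_critic_loss(critic, actor_target, critic_target, s, a, r, s2, done, gamma=0.99):
    with torch.no_grad():
        a2 = actor_target(s2)                                   # deterministic action in s2
        y = r + gamma * (1.0 - done) * critic_target(s2, a2)    # bootstrapped target
    return F.mse_loss(critic(s, a), y)

# Soft (Polyak) update keeps the target networks slowly tracking the learned networks.
def soft_update(target, source, tau=0.005):
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)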


The Augmented Random Search (ARS) algorithm, published by Horia Mania, UC Berkeley, 2018, is a random search method for training linear policies on continuous control problems; it augments the basic random search method and achieves faster computations than the other baseline RL algorithms considered (a minimal sketch of its update rule is given below). Evolution Strategies (ES), a subset of black-box optimization methods, offer a competitive alternative for training function approximators such as deep neural networks for Reinforcement Learning; ES is a model-agnostic optimization approach that learns the optimal solution by imitating Darwin's theory of the evolution of species by natural selection. Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Genetic Algorithms are likewise used to train function approximators. Deep Recurrent Q-Learning for Partially Observable MDPs, published by Matthew Hausknecht, Microsoft Research, 2015, overcomes the memory limitation of RL agents. Distributional Reinforcement Learning with Quantile Regression, published by Will Dabney, Google DeepMind, 2017, examines distinct ways of learning the value distribution rather than the traditional value function. GAN Q-learning, published by Thang Doan, McGill University, 2017, uses generative adversarial networks (GANs) as an alternative way of leveraging the distributional methodology in reinforcement learning to better learn the function approximator.

Artificial Intelligence based cognitive autonomous agents are thus ready for real-time monitoring and predictive control. State-of-the-art results are obtained for a Multi-Input Multi-Output (MIMO) real-time industrial-scale problem. The implemented algorithms, their architectures and the results obtained will be discussed in comparison with a baseline of traditional model-based optimal control, with the extensive computations enabled by GPU-backed machines.
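As referenced above, the sketch below shows one update step of the basic Augmented Random Search scheme for a linear policy a = M s; the rollout interface, number of search directions, step size and noise scale are illustrative assumptions rather than the settings used in this work.

import numpy as np

# One ARS-style update step for a linear policy parameterized by the matrix M.
# rollout(policy_matrix) is assumed to return the episode return obtained with a_t = M @ s_t.
def ars_step(M, rollout, n_directions=8, step_size=0.02, noise=0.03):
    deltas = [np.random.randn(*M.shape) for _ in range(n_directions)]
    r_plus = np.array([rollout(M + noise * d) for d in deltas])    # perturb in +delta direction
    r_minus = np.array([rollout(M - noise * d) for d in deltas])   # perturb in -delta direction
    sigma_r = np.concatenate([r_plus, r_minus]).std() + 1e-8       # reward scaling
    update = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    return M + step_size / (n_directions * sigma_r) * update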