(303f) Initializing the Internal States of LSTM Neural Networks via Manifold Learning

Authors 

Kemeth, F., Johns Hopkins University
Bertalan, T., Johns Hopkins University
Evangelou, N., Johns Hopkins University
Cui, T., Johns Hopkins University
Kevrekidis, I. G., Princeton University
There has been a long-standing effort to derive dynamical systems from data, in particular for tasks such as prediction and control. One class of functions that excels at such tasks is recurrent neural networks. Long short-term memory (LSTM) networks have attracted increasing attention in recent years, in particular due to their ability to mitigate the vanishing gradient problem and their potential to model partially observed high-dimensional systems using a set of internal cell states [1]. For accurate predictions, however, the internal states have to be initialized properly, and a principled way of finding these initial values is still missing.

Here, we present a manifold-learning approach for initializing the internal state values of LSTM recurrent neural networks consistently with the initially observed input data. Our approach is based on learning the intrinsic data manifold from the observed variables as a preprocessing step. Using concepts such as generalized synchronization, we argue that the converged internal states are a function on this learned manifold. We show that the dimension of this manifold indicates the amount of observed input data required for proper initialization. We demonstrate this ansatz on a partially observed chemical model system, where initializing the internal LSTM states with our approach yields visibly improved performance compared to earlier "warm-start" initialization approaches [2]. We furthermore discuss the potential application of our approach to other recurrent neural network variants, such as reservoir computing [3]. Finally, we show that learning the data manifold can transform the problem of partially observed dynamics into a fully observed one, facilitating the identification of nonlinear dynamical systems [4].
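To make the pipeline concrete, the following Python snippet gives a minimal sketch of the steps described above; it is an illustration under stated assumptions, not the implementation of [4], and all function and variable names are hypothetical. It assumes a trained single-layer torch.nn.LSTM with batch_first=True; scikit-learn's SpectralEmbedding stands in for diffusion maps, and k-nearest-neighbor regression stands in both for the out-of-sample extension (geometric harmonics in spirit) and for the learned map from manifold coordinates to converged internal states.

import numpy as np
import torch
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import KNeighborsRegressor

def converged_states(lstm, long_sequences):
    """Run the LSTM over long warm-up sequences so its internal states
    converge; the final (h, c) pair is treated as the ground truth."""
    with torch.no_grad():
        _, (h, c) = lstm(torch.as_tensor(long_sequences, dtype=torch.float32))
    return torch.cat([h[-1], c[-1]], dim=1).numpy()  # shape (n_samples, 2*hidden)

def fit_initializer(lstm, long_sequences, windows, manifold_dim=2, k=8):
    """Learn the chain: short observation window -> manifold coordinates -> (h0, c0).
    `windows` holds short initial observation windows of shape
    (n_samples, delay, n_obs); the delay (window length) is chosen from the
    estimated manifold dimension, as argued in the abstract."""
    flat = windows.reshape(len(windows), -1)
    # Step 1: learn the intrinsic data manifold from the observed variables
    # (diffusion maps in the paper; SpectralEmbedding as a stand-in here).
    coords = SpectralEmbedding(n_components=manifold_dim).fit_transform(flat)
    # Step 2: out-of-sample extension, observation window -> manifold coordinates.
    to_coords = KNeighborsRegressor(n_neighbors=k).fit(flat, coords)
    # Step 3: the converged internal states as a function on the manifold
    # (such a function exists by the generalized-synchronization argument).
    to_states = KNeighborsRegressor(n_neighbors=k).fit(
        coords, converged_states(lstm, long_sequences))
    return to_coords, to_states

def initial_states(to_coords, to_states, window, hidden_size):
    """Map a single new observation window to consistent initial (h0, c0)."""
    coords = to_coords.predict(window.reshape(1, -1))
    hc = to_states.predict(coords)[0]
    h0 = torch.tensor(hc[:hidden_size], dtype=torch.float32).view(1, 1, -1)
    c0 = torch.tensor(hc[hidden_size:], dtype=torch.float32).view(1, 1, -1)
    return h0, c0

Prediction then starts from the learned states, lstm(inputs, (h0, c0)), rather than from the usual zero-state or "warm-start" initialization.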

[1] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997

[2] Hans-Georg Zimmermann, Christoph Tietz, and Ralph Grothmann. Forecasting with Recurrent Neural Networks: 12 Tricks, pages 687–707. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012

[3] Herbert Jaeger. The echo state approach to analysing and training recurrent neural networks. Technical report, Fraunhofer Institute for Autonomous Intelligent Systems, 2001

[4] Felix P. Kemeth, Tom Bertalan, Nikolaos Evangelou, Tianqi Cui, Saurabh Malani, and Ioannis G. Kevrekidis, Initializing LSTM internal states via manifold learning. (submitted), 2021