(246c) Pathologies of Neural Networks As Models of Discrete-Time Dynamical Systems

Authors 

Cui, T. - Presenter, Johns Hopkins University
Psarellis, G. - Presenter, Johns Hopkins University
Bertalan, T., Johns Hopkins University
Kevrekidis, I. G., Princeton University
Reich, S., University of Potsdam
Discrete-time models, which arise mostly in the processing of time series, are used to predict the evolving states of dynamical systems. However, when the data are actually generated by a continuous-time system (i.e., an autonomous ordinary differential equation), short-term predictions may be acceptable while the long-term dynamics and bifurcations are qualitatively wrong. These pathologies, including but not limited to frequency locking [1] and noninvertible dynamics [2], are also observed in neural networks iterated as discrete-time systems (see Figure). In this work, we use data from two models, the Brusselator and the Takens-Bogdanov system, to train accurate, short-term predictive discrete-time neural networks, and then demonstrate these pathologies. We then introduce existing and novel methods for detecting and quantifying such pathologies.
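For concreteness, the sketch below shows the kind of discrete-time flow-map network involved (a minimal illustration in PyTorch, not the authors' code; the Brusselator parameters a = 1, b = 3, the sampling interval h, and the network size are assumptions made for illustration):

```python
# Minimal sketch (not the authors' code): train a neural network as a
# discrete-time flow map x_{k+1} = N(x_k) on Brusselator trajectory data.
# The parameters a = 1, b = 3, the sampling interval h, and the network
# size are illustrative assumptions.
import torch
import torch.nn as nn

a, b, h = 1.0, 3.0, 0.1

def brusselator(z):
    # Brusselator vector field: dx/dt = a + x^2 y - (b+1) x, dy/dt = b x - x^2 y
    x, y = z[..., 0], z[..., 1]
    return torch.stack([a + x**2 * y - (b + 1) * x,
                        b * x - x**2 * y], dim=-1)

def rk4_step(f, z, h):
    # Classical RK4 step, used here only to generate training data.
    k1 = f(z); k2 = f(z + 0.5 * h * k1)
    k3 = f(z + 0.5 * h * k2); k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Sample (state, next state) pairs from a batch of trajectories.
z = torch.rand(1024, 2) * 3.0
X, Y = [], []
for _ in range(50):
    z_next = rk4_step(brusselator, z, h)
    X.append(z); Y.append(z_next)
    z = z_next
X, Y = torch.cat(X), torch.cat(Y)

# One-step predictor trained on mean-squared prediction error.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(X) - Y) ** 2).mean()
    loss.backward()
    opt.step()
# Iterating net(...) as a map can exhibit the pathologies above,
# e.g. frequency locking, even when one-step errors are small.
```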

An obvious remedy is to construct the neural network architecture so that it directly learns the continuous-time system (e.g., ResNet-style or neural ODE architectures [3]). We will also discuss other ways to address the individual pathologies, for example the use of RevNet architectures to avoid noninvertibility (which, however, does not avoid the wrong bifurcations) [4].
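To make the contrast concrete, here is a minimal sketch of a ResNet-style step (an illustrative parameterization, not necessarily the architecture used in this work): the identity shortcut means the trainable part approximates the timestep-scaled vector field, which is what connects this parameterization, as h → 0, to the neural ODE viewpoint.

```python
# Minimal sketch (illustrative, not the exact architecture used here):
# a ResNet-style step x_{k+1} = x_k + h * f_theta(x_k), so that f_theta
# plays the role of the ODE right-hand side rather than the full map.
import torch
import torch.nn as nn

class ResidualStep(nn.Module):
    def __init__(self, dim=2, width=64, h=0.1):
        super().__init__()
        self.h = h
        self.f = nn.Sequential(nn.Linear(dim, width), nn.Tanh(),
                               nn.Linear(width, dim))

    def forward(self, x):
        # Forward-Euler-like update; the identity shortcut keeps the
        # learned map close to the identity for small h.
        return x + self.h * self.f(x)
```

Note that a residual step of this form is still not invertible by construction, which is what motivates the RevNet-style couplings mentioned above.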

While fixed-timestep learned flow maps exhibit the pathologies listed above, this does not mean they are entirely inapplicable to continuous-time dynamics. In the second portion of this talk, we describe an approach in which we train a finite-timestep flow map on variable-timestep data, and then use automatic differentiation to extract from it an approximation of the infinitesimal generator (the ODE right-hand side) of the system. We demonstrate that this approach converges to the true ODE for a number of test cases.
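A minimal sketch of this idea, with assumed details (how the timestep enters the network, the shapes involved), is below: the network takes the state x and a timestep tau as input, is trained so that F(x, tau) approximates the tau-time flow, and the generator is then read off as the tau-derivative at tau = 0 by automatic differentiation.

```python
# Minimal sketch (assumed details): a variable-timestep flow map
# F_theta(x, tau) trained on triples (x_k, tau_k, x_{k+1}); the ODE
# right-hand side is recovered as f(x) = dF(x, tau)/dtau at tau = 0.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))

def flow_map(x, tau):
    # x: (batch, 2) states; tau: (batch, 1) time steps.
    return net(torch.cat([x, tau], dim=-1))

# ... training loop elided: minimize ||flow_map(x_k, tau_k) - x_{k+1}||^2
# over variable-timestep data, e.g. including tau = 0 samples so that
# F(x, 0) stays close to x ...

def rhs(x):
    # Approximate infinitesimal generator f(x) = dF(x, tau)/dtau at tau = 0.
    x = x.detach()
    def g(tau):
        return flow_map(x, tau.expand(x.shape[0], 1))
    # Jacobian of the (batch, 2) output w.r.t. scalar tau: shape (batch, 2, 1).
    J = torch.autograd.functional.jacobian(g, torch.zeros(1))
    return J.squeeze(-1)
```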

Finally, we consider the approach of templating the loss on a standard numerical integration algorithm (such as Runge-Kutta, or forward or backward Euler) to train a neural network that approximates the ODE directly. We demonstrate by analysis and example that the approximate ODE differs from the truth systematically, in a way that depends on the algorithm used to template the network. This "mirror" backward error analysis is intuitively related to the forward error these algorithms entail in their solutions of initial value problems.
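Schematically, such a templated loss might look like the sketch below (RK4 is chosen for illustration; the network size and step are assumptions). The integrator, not the network, supplies the discrete-time structure, and swapping in forward or backward Euler changes the systematic error just described.

```python
# Minimal sketch (illustrative): the loss is templated on a classical RK4
# step, so the network f_theta is trained directly as an ODE right-hand
# side. Backward error analysis suggests the learned f_theta matches a
# modified vector field whose RK4 solution fits the data.
import torch
import torch.nn as nn

f_theta = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))

def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def templated_loss(x_now, x_next, h):
    # One RK4 step of the learned right-hand side should reproduce the
    # observed next state; the whole expression is differentiable end to end.
    return ((rk4_step(f_theta, x_now, h) - x_next) ** 2).mean()
```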

[1] Gicquel, N., Anderson, J. S., and Kevrekidis, I. G. (1998). Noninvertibility and resonance in discrete-time neural networks for time-series processing. Physics Letters A, 238(1), 8–18. doi:10.1016/s0375-9601(97)00753-6

[2] Rico-Martinez, R., Adomaitis, R. A., and Kevrekidis, I. G. (2000). Noninvertibility in neural networks. Computers and Chemical Engineering, 24, 2417–2433. doi:10.1016/s0098-1354(00)00599-8

[3] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. (2018). Neural ordinary differential equations. arXiv preprint arXiv:1806.07366.

[4] Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B. (2017). The reversible residual network: backpropagation without storing activations. arXiv preprint arXiv:1707.04585.

Figure: Frequency locking in the neural network that approximates the discrete-time Brusselator model. In each panel, the dynamics return to their starting state after n iterations (e.g., n = 13 in the first panel); connecting successive states (red points) in time traces out a polygon with n vertices.