(182f) Recovery of Transparent Dynamic Models from Black-Box Systems Using Symbolic Regression. | AIChE

Authors 

Cohen, B. - Presenter, University of Connecticut
Bollas, G. M., University of Connecticut

Data-driven methods such as neural networks, support vector machines, and Gaussian processes can be leveraged to bridge the discrepancy between the observed and expected behavior of dynamic processes. These hybrid modeling approaches create computationally inexpensive algorithms that perform well within the bounds of their training dataset but are opaque [1], [2]. Recent work has demonstrated that symbolic regression (SR) is a machine learning approach that can reduce opacity and increase the extrapolative ability of hybrid dynamic models [1], [3], [4]. However, these SR-based methods rely on significant domain and system knowledge, or on data that may not be practically obtainable.

Here, we propose a framework for discovering transparent hybrid models of dynamic systems when limited domain and system knowledge is available. The framework, shown in Figure 1, takes an incomplete model, genetic programming (GP) hyperparameters, and measured process data as inputs and returns an improved (ideally complete) dynamic system model. GP searches an expression space for functions to include in the incomplete model. An information-theoretic criterion evaluates the fitness of each expression by comparing an augmented representation of the incomplete system model against process data. Once an expression meets the information-theoretic fitness criterion, an improved or complete model is returned. The quality of the improved model depends on the quality and richness of information in the system data and the basis functions used in SR.
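As a rough illustration of the fitness step described above, the sketch below scores a few candidate expressions by adding each one to a known partial model and computing an information criterion against data. Everything in it is an assumption made for illustration: the toy rate data, the fixed candidate list (standing in for the expression space that GP would search), the complexity counts, and the common Gaussian-residual form BIC = n ln(RSS/n) + k ln(n). It is not the authors' implementation.

```python
import math
import random


def bic(residuals, n_params):
    """BIC for Gaussian residuals, n*ln(RSS/n) + k*ln(n); lower is better."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    rss = max(rss, 1e-300)  # guard against log(0) for an exact fit
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical process data: the measured rate is a known part (-0.5*y)
# plus a missing term (-2*y**2) plus small simulated sensor noise.
random.seed(0)
ys = [0.1 * i for i in range(1, 41)]
rates = [-0.5 * y - 2.0 * y**2 + random.gauss(0.0, 0.01) for y in ys]


def known_part(y):
    """The incomplete model: only the term assumed to be known a priori."""
    return -0.5 * y


# Candidate expressions: (label, function, assumed complexity count).
# In a GP search these would be generated and mutated, not listed by hand.
candidates = [
    ("0", lambda y: 0.0, 0),
    ("-2*y", lambda y: -2.0 * y, 1),
    ("-2*y**2", lambda y: -2.0 * y**2, 1),
    ("-2*y**2 + 0.001*y**3", lambda y: -2.0 * y**2 + 0.001 * y**3, 2),
]


def score(candidate):
    """BIC of the incomplete model augmented with one candidate expression."""
    _, expr, n_params = candidate
    residuals = [r - (known_part(y) + expr(y)) for y, r in zip(ys, rates)]
    return bic(residuals, n_params)


best = min(candidates, key=score)
print("selected expression:", best[0])
```

The k ln(n) penalty is what lets the criterion prefer the simpler expression over one that carries an extra, unneeded term.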

The proposed framework is tested on a one-dimensional, single-component, dynamic plug flow reactor (PFR) with varying amounts of prior knowledge and levels of complexity. Sensors with simulated noise collect process data at the outlet of the PFR. The package DEAP [5] performs the genetic operations that augment prior reactor system knowledge, and the Bayesian Information Criterion (BIC) evaluates fitness. The framework first recovers the reaction term of an isothermal PFR with known convection and diffusion terms. It is then tasked with discovering progressively more of the isothermal PFR equation: first with the reaction and convection terms obscured, and then with the reaction, convection, and diffusion terms all obscured. Information theory is then applied to explore the amount of sensor data needed to recover the true design equations, as well as surrogate models that may predict output sensor data well with less computational burden at the cost of accurately describing the physical phenomena inside the reactor. Finally, the study is extended to a non-isothermal reactor, in which the framework must discover mass and energy balances from synthetic data.
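To make the test case more concrete, the sketch below simulates the outlet concentration of a one-dimensional isothermal PFR in which the reaction term is a pluggable function, mirroring the idea of an incomplete model with one obscured term. The abstract does not state the governing equation or numerical method, so the assumed form dC/dt = -v dC/dz + D d2C/dz2 + r(C), the explicit upwind finite-difference scheme, the boundary conditions, and all parameter values are hypothetical choices made only for illustration.

```python
def simulate_pfr_outlet(reaction, c_in=1.0, length=1.0, n_cells=50,
                        velocity=1.0, diffusivity=0.01, dt=0.002, t_end=3.0):
    """Return the outlet concentration history of an initially empty PFR.

    `reaction` maps a local concentration to a rate r(C); it plays the role
    of the term a symbolic regression search would try to recover.
    """
    dz = length / n_cells
    c = [0.0] * n_cells
    outlet = []
    for _ in range(int(round(t_end / dt))):
        new_c = c[:]
        for i in range(n_cells):
            # Assumed boundaries: fixed inlet concentration, zero-gradient outlet.
            left = c[i - 1] if i > 0 else c_in
            right = c[i + 1] if i < n_cells - 1 else c[i]
            convection = -velocity * (c[i] - left) / dz  # first-order upwind
            diffusion = diffusivity * (right - 2.0 * c[i] + left) / dz**2
            new_c[i] = c[i] + dt * (convection + diffusion + reaction(c[i]))
        c = new_c
        outlet.append(c[-1])
    return outlet


# Incomplete model (reaction term missing) versus a hypothetical
# first-order reaction r(C) = -2*C.
inert_outlet = simulate_pfr_outlet(lambda c: 0.0)
reactive_outlet = simulate_pfr_outlet(lambda c: -2.0 * c)
print("final outlet, no reaction:", round(inert_outlet[-1], 3))
print("final outlet, first-order reaction:", round(reactive_outlet[-1], 3))
```

The gap between the two outlet histories is the kind of discrepancy that a candidate reaction expression would need to explain when scored against noisy outlet measurements.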

Acknowledgements

This study was supported by the UTC Institute for Advanced Systems Engineering (UTC-IASE) at the University of Connecticut (UConn).

References

[1] H. Narayanan, M. N. Cruz Bournazou, G. Guillén Gosálbez, and A. Butté, “Functional-Hybrid modeling through automated adaptive symbolic regression for interpretable mathematical expressions,” Chemical Engineering Journal, vol. 430, p. 133032, Feb. 2022, doi: 10.1016/j.cej.2021.133032.

[2] S. Zendehboudi, N. Rezaei, and A. Lohi, “Applications of hybrid models in chemical, petroleum, and energy systems: A systematic review,” Applied Energy, vol. 228, pp. 2539–2566, Oct. 2018, doi: 10.1016/j.apenergy.2018.06.051.

[3] N. M. Mangan, T. Askham, S. L. Brunton, J. N. Kutz, and J. L. Proctor, “Model selection for hybrid dynamical systems via sparse regression,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 475, no. 2223, p. 20180534, Mar. 2019, doi: 10.1098/rspa.2018.0534.

[4] H. Vaddireddy, A. Rasheed, A. E. Staples, and O. San, “Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data,” Physics of Fluids, vol. 32, no. 1, p. 015113, Jan. 2020, doi: 10.1063/1.5136351.

[5] F.-A. Fortin, F.-M. de Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné, “DEAP: Evolutionary Algorithms Made Easy,” Journal of Machine Learning Research, vol. 13, pp. 2171–2175, 2012.