(147g) Development of Algorithms for Identification of Sparse Dynamic Models with Mass and Energy Conservation from Noisy Data | AIChE

Research Interests: mathematical modeling, Bayesian machine learning, uncertainty quantification, optimization

System identification has taken a new turn in the last decade, driven by improved computational power and by the abundance of data made available through cheaper sensors and enhanced storage capabilities. This poster discusses my research goals, which focus on three fundamental challenges in the development of data-driven models for chemical systems: 1) corruption of real plant measurements with noise of unknown characteristics; 2) the complex structure and lack of interpretability of models from many data-driven approaches; and 3) limited extrapolation capabilities, as models are not guaranteed to satisfy mass, energy, and thermodynamic constraints.

Model Development from Noisy Data

The Bayesian approach to machine learning is popular for its potential in uncertainty quantification and for the opportunity it provides to incorporate user belief and/or prior knowledge into the estimation process. Because the full approach can be computationally expensive and thus unsuitable for many online applications (e.g., model-based control and optimization), an approximate Bayesian approach with much lower computational cost is developed here and implemented in an Expectation Maximization (EM) framework for simultaneous parameter estimation and quantification of uncertainties in model parameters and predictions. The developed algorithms do not assume identically distributed noise in the measured variables, and they also account for possible noise correlations.
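To make the EM idea concrete, the following is a minimal sketch for a linear-in-parameters model, not the algorithm developed in this work. It assumes a zero-mean Gaussian prior on the parameters and independent, identically distributed Gaussian noise, which is a deliberate simplification of the non-identical, correlated noise treated above. The function name `em_bayesian_linear`, the hyperparameters `alpha` and `beta`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def em_bayesian_linear(Phi, y, n_iter=500, tol=1e-10):
    """EM for y = Phi @ theta + e with theta ~ N(0, I/alpha), e ~ N(0, I/beta).

    Assumption for this sketch: i.i.d. Gaussian noise and an isotropic prior.
    Returns the posterior mean, posterior covariance, and noise variance.
    """
    n, m = Phi.shape
    alpha, beta = 1.0, 1.0  # prior precision and noise precision (initial guesses)
    PtP, Pty = Phi.T @ Phi, Phi.T @ y

    def posterior(alpha, beta):
        # Gaussian posterior over theta for fixed hyperparameters.
        cov = np.linalg.inv(alpha * np.eye(m) + beta * PtP)
        return beta * cov @ Pty, cov

    for _ in range(n_iter):
        # E-step: posterior moments of theta under the current alpha, beta.
        mean, cov = posterior(alpha, beta)
        # M-step: maximize the expected complete-data log likelihood.
        resid = y - Phi @ mean
        alpha_new = m / (mean @ mean + np.trace(cov))
        beta_new = n / (resid @ resid + np.trace(PtP @ cov))
        done = (abs(alpha_new - alpha) <= tol * alpha
                and abs(beta_new - beta) <= tol * beta)
        alpha, beta = alpha_new, beta_new
        if done:
            break

    mean, cov = posterior(alpha, beta)
    return mean, cov, 1.0 / beta


# Synthetic example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
theta_true = np.array([1.5, -2.0, 0.5])
Phi = rng.normal(size=(500, 3))
y = Phi @ theta_true + 0.1 * rng.normal(size=500)

theta_hat, theta_cov, noise_var = em_bayesian_linear(Phi, y)
theta_std = np.sqrt(np.diag(theta_cov))  # parameter uncertainty
```

The posterior covariance gives parameter uncertainties at the cost of one small matrix inversion per iteration, which is the kind of saving over sampling-based Bayesian inference that motivates an approximate approach.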

Model Interpretability and Sparsity

The method approximates system nonlinearities by a set of nonlinear transformations of the input variables and their interaction effects. The optimal subset of the resulting large family of basis functions is selected using a Branch and Bound (B&B) algorithm, which systematically enumerates the possible combinations while efficient pruning strategies and an estimability check ensure that only promising nodes are explored, thereby minimizing the computational cost. The resulting set of hierarchically ranked candidate models is found to be consistently more parsimonious than models from existing algorithms. Sparsity is promoted by comparing candidate models using an information criterion that incorporates a measure of estimability and rewards fitness while penalizing model size and complexity.
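The selection step can be illustrated with a small sketch of branch-and-bound subset selection over a basis-function library. This is an assumption-laden stand-in: it uses the ordinary Bayesian information criterion (BIC) rather than the estimability-aware criterion described above, it omits the estimability check, and the helper names (`rss`, `bic`, `branch_and_bound`) and the synthetic data are hypothetical. Pruning relies on the fact that the residual sum of squares cannot increase when columns are added, so the fit with all undecided columns included gives a valid lower bound for a subtree.

```python
import numpy as np


def rss(Phi, y, cols):
    """Residual sum of squares of the least-squares fit on columns `cols`."""
    if not cols:
        return float(y @ y)
    A = Phi[:, cols]
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ theta
    return float(r @ r)


def bic(Phi, y, cols):
    """BIC up to an additive constant: fit term plus size penalty."""
    n = len(y)
    return n * np.log(rss(Phi, y, cols) / n) + len(cols) * np.log(n)


def branch_and_bound(Phi, y):
    """Return the column subset with the lowest BIC, pruning hopeless branches."""
    n, p = Phi.shape
    best = {"score": np.inf, "cols": []}

    def visit(j, included):
        # Columns 0..j-1 are decided; columns j..p-1 are still undecided.
        if j == p:
            score = bic(Phi, y, included)
            if score < best["score"]:
                best["score"], best["cols"] = score, list(included)
            return
        # Lower bound for every subset in this subtree: best possible fit
        # (all undecided columns included) with the smallest possible size.
        optimistic = included + list(range(j, p))
        bound = n * np.log(rss(Phi, y, optimistic) / n) + len(included) * np.log(n)
        if bound >= best["score"]:
            return  # no subset below this node can beat the incumbent
        visit(j + 1, included + [j])  # branch: include column j
        visit(j + 1, included)        # branch: exclude column j

    visit(0, [])
    return best["cols"], best["score"]


# Synthetic example (hypothetical): library of transformations of x1, x2.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
Phi = np.column_stack([np.ones(200), x1, x2, x1 * x2, x1**2, x2**2])
y = 2.0 * x1 - 3.0 * x1 * x2 + 0.05 * rng.normal(size=200)

selected, score = branch_and_bound(Phi, y)
```

Exploring the "include" branch first tends to find a good incumbent early, which makes the bound prune more of the "exclude" branches.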

Mass, Energy and Other Physics-Constraints for Data-Driven Models

Many existing approaches that pursue the satisfaction of physics constraints by data-driven models are system specific and require detailed knowledge of the system. Here, only boundary conditions are exploited in the two algorithms proposed for the enforcement of mass and energy constraints. The first approach incorporates a reconciliation step into the EM algorithm, while the second exploits the unique model structure to impose a set of equality constraints that ensure that mass constraints are exactly satisfied. The developed approaches are therefore generalizable and applicable to any chemical system. Future research goals focus on the implementation of the Bayesian brain hypothesis, where probabilistic states are used to generate predictions for a given set of inputs, after which prediction errors are used to update beliefs [1].
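As an illustration of what a reconciliation step can look like, the sketch below applies classical linear data reconciliation: measurements are adjusted by the smallest covariance-weighted correction that makes them satisfy a linear balance exactly. The steady-state balance matrix `A`, the measurement covariance `Sigma`, the function name `reconcile`, and the numbers are assumptions chosen for illustration; how the step is embedded in the EM algorithm of this work is not shown here.

```python
import numpy as np


def reconcile(y, A, Sigma):
    """Weighted least-squares projection of y onto {x : A @ x = 0}.

    Solves min (x - y)^T Sigma^-1 (x - y) subject to A @ x = 0, whose
    closed form is x = y - Sigma A^T (A Sigma A^T)^-1 A y.
    """
    imbalance = A @ y
    correction = Sigma @ A.T @ np.linalg.solve(A @ Sigma @ A.T, imbalance)
    return y - correction


# Hypothetical splitter: stream 1 enters, streams 2 and 3 leave.
A = np.array([[1.0, -1.0, -1.0]])        # mass balance: F1 - F2 - F3 = 0
y_meas = np.array([10.3, 6.1, 3.9])      # noisy flow measurements
Sigma = np.diag([0.04, 0.01, 0.01])      # assumed measurement variances

x_hat = reconcile(y_meas, A, Sigma)      # approximately [10.1, 6.15, 3.95]
```

The least trusted measurement (largest variance) absorbs most of the 0.3 imbalance, and the reconciled flows close the balance to machine precision.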

Boiler Health Monitoring Tool for Power Plant

The developed algorithms are used to identify models from actual plant data from an industrial boiler. In collaboration with the Electric Power Research Institute (EPRI) and Southern Company, the developed models are hybridized with a first-principles model for health analyses of boiler tubes. Models are developed using data from both NGCC and coal-fired power plants, for which the collected data are characterized by substantial noise due to unstable ash conditions.

The developed algorithms are compared with existing methods using both simulated and real plant data, representing situations where the truth about the system is known and where it is not. For the former case, a non-isothermal CSTR is considered, while data from an industrial boiler of a power plant and the absorber unit of a solvent-based post-combustion CO2 capture pilot plant are considered for the latter case. The performance of the resulting models is compared with that obtained from state-of-the-art approaches (namely, ALAMO [2] and SINDy [3]) that seek to identify interpretable data-driven models for process systems. The BML algorithm demonstrates superior performance in terms of sparsity, requiring on average 50% of the model parameters given by these algorithms to achieve equivalent performance. The proposed method also outperforms the existing methods in terms of robustness to noisy data, as it maintains below 5% even when the model has been identified from data corrupted with highly correlated noise.

References

  1. Bottemanne, H., Longuet, Y. & Gauld, C. The predictive mind: An introduction to Bayesian Brain Theory. L’Encéphale 48, 436–444 (2022).
  2. Cozad, A., Sahinidis, N. V. & Miller, D. C. A combined first-principles and data-driven approach to model building. Comput. Chem. Eng. 73, 116–127 (2015).
  3. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U. S. A. 113, 3932–3937 (2016).