(432e) Fast Symbolic Regression with Constraints | AIChE

Authors 

Sarwar, O. - Presenter, Carnegie Mellon University
Sahinidis, N., Georgia Institute of Technology

Symbolic regression (SR) is a technique for fitting mathematical expressions to data. It is most commonly used for knowledge discovery, whereby underlying physical relationships are learned from the regression model produced. The SR model is typically formulated as a binary expression tree whose nodes represent regression variables, constants, or mathematical operators. Traditional approaches to SR employ genetic programming, which has been shown to be efficient and to yield good models [1].
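As an illustration of the expression-tree representation, the following is a minimal sketch in Python. The `Node` class, the operator tables, and the `evaluate` function are hypothetical names chosen for this example and are not taken from any of the cited works. Unary operators are assumed to use only the left child.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Illustrative operator sets; a real SR method may use a different set.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY_OPS = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt}


@dataclass
class Node:
    """One node of a binary expression tree (hypothetical layout)."""
    kind: str                      # "var", "const", "unary", or "binary"
    value: object = None           # variable index, constant, or operator name
    left: Optional["Node"] = None
    right: Optional["Node"] = None  # unused for unary operators and leaves


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Recursively evaluate the expression rooted at `node` at the point `x`."""
    if node.kind == "var":
        return x[node.value]
    if node.kind == "const":
        return node.value
    if node.kind == "unary":
        return UNARY_OPS[node.value](evaluate(node.left, x))
    if node.kind == "binary":
        return BINARY_OPS[node.value](evaluate(node.left, x),
                                      evaluate(node.right, x))
    raise ValueError(f"unknown node kind: {node.kind}")


# The expression 2.0 * x0 + exp(x1) as a tree.
tree = Node("binary", "+",
            Node("binary", "*", Node("const", 2.0), Node("var", 0)),
            Node("unary", "exp", Node("var", 1)))

print(evaluate(tree, [3.0, 0.0]))  # 2.0 * 3.0 + exp(0.0) = 7.0
```

In this picture, fitting an SR model means choosing both the labels on the nodes (a discrete decision) and the values of the constants (a continuous one).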

There has been recent interest in applying mixed-integer nonlinear programming (MINLP) to the SR problem [2-3]. This approach has two advantages: it returns a provably optimal SR model, and it allows constraints on the model response to be enforced naturally. Unfortunately, the mathematical optimization approach requires significant computational effort and, as a result, exact MINLP approaches are limited to small problems.

We propose an SR algorithm that first relaxes the integrality constraints of the MINLP formulation in [3] and solves the resulting inexpensive NLP. We then use the values of the relaxed integer variables to probabilistically assign variables, constants, or operators to nodes in the SR expression tree, and solve a second NLP to refine the resulting expressions. Our algorithm returns SR expressions with lower error than those found by solving the MINLP, and does so orders of magnitude faster. In addition, it yields interpretable regression models with lower error than those returned by other popular machine learning packages.
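The probabilistic assignment step can be pictured with the following Python sketch. It covers only the sampling step and makes several assumptions: the relaxed values are taken to be given as one nonnegative weight per candidate label at each node, each label is drawn with probability proportional to its weight, and the two NLP solves are omitted entirely. The function name `sample_assignment` and the data layout are hypothetical. The actual formulation in [3] and the authors' sampling rule may differ.

```python
import random


def sample_assignment(relaxed, rng):
    """Draw one label per tree node from relaxed indicator values.

    relaxed: dict mapping node id -> dict mapping candidate label -> relaxed
             value in [0, 1] (assumed to come from a relaxed NLP solve).
    rng:     a random.Random instance, passed in for reproducibility.
    """
    assignment = {}
    for node_id, weights in relaxed.items():
        labels = list(weights)
        # Clip tiny negative values that a numerical solver might return.
        probs = [max(weights[label], 0.0) for label in labels]
        if sum(probs) == 0.0:
            # Fall back to a uniform draw if the relaxation gives no signal.
            probs = [1.0] * len(labels)
        assignment[node_id] = rng.choices(labels, weights=probs, k=1)[0]
    return assignment


# Made-up relaxed values for a three-node tree (illustrative only).
relaxed_values = {
    0: {"+": 0.7, "*": 0.3},
    1: {"x0": 0.9, "const": 0.1},
    2: {"x1": 0.5, "const": 0.5},
}

rng = random.Random(0)
candidates = [sample_assignment(relaxed_values, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each sampled assignment fixes the tree structure, so the remaining refinement problem involves only continuous quantities such as the constants.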

We also leverage the mathematical optimization component of our algorithm to enforce constraints on the model response surface, making ours the first SR algorithm to do so. We show that constrained SR allows users to impose domain knowledge and thereby obtain models that generalize better than those generated by unconstrained SR.
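To make the idea of a constraint on the model response concrete, the Python sketch below checks two common kinds of domain knowledge, bounds and monotonicity, on a finite set of points. This is only a post-hoc check under assumed names (`violates_bounds`, `is_nondecreasing`). The algorithm described above enforces constraints inside the optimization problem, which this sketch does not attempt.

```python
def violates_bounds(model, points, lower=None, upper=None, tol=1e-9):
    """Return the points at which the model response leaves [lower, upper]."""
    bad = []
    for x in points:
        y = model(x)
        below = lower is not None and y < lower - tol
        above = upper is not None and y > upper + tol
        if below or above:
            bad.append(x)
    return bad


def is_nondecreasing(model, grid, tol=1e-9):
    """Check that a one-dimensional model is nondecreasing along a grid."""
    values = [model(x) for x in sorted(grid)]
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical fitted expression: f(x) = x**2 - 1.
def fitted(x):
    return x * x - 1.0


grid = [0.0, 0.5, 1.0, 2.0]
print(violates_bounds(fitted, grid, lower=0.0))  # points where f(x) < 0
print(is_nondecreasing(fitted, grid))            # monotone on this grid?
```

A check of this kind only certifies the sampled points. Stating the same conditions as constraints in the fitting problem is what lets the fitted model respect them by construction.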

References

[1] Schmidt, M. and H. Lipson, Distilling free-form natural laws from experimental data, Science, 324, 81-85, 2009.

[2] Cozad, A. and N. V. Sahinidis, A global MINLP approach to symbolic regression, Mathematical Programming, 170, 97-119, 2018.

[3] Kim, J., S. Leyffer and P. Balaprakash, Learning symbolic expressions: Mixed-integer formulations, cuts, and heuristics, https://arxiv.org/abs/2102.08351, 2021.