(432e) Fast Symbolic Regression with Constraints
AIChE 2022 Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems I
Wednesday, November 16, 2022 - 9:16am to 9:35am
There has been recent interest in applying mixed-integer nonlinear programming (MINLP) to the symbolic regression (SR) problem [2-3]. This approach has the advantage of returning a provably optimal SR model and of allowing constraints on the model response to be enforced naturally. Unfortunately, the mathematical optimization approach requires significant computational effort, and as a result exact MINLP approaches are limited to small problems.
We propose an SR algorithm that first relaxes the integrality constraints of the MINLP formulation in [3] and solves the resulting inexpensive NLP. We then use the values of the relaxed integer variables to probabilistically assign variables, constants, or operators to the nodes of the SR expression tree, and solve another NLP to refine the resulting expressions. Our algorithm returns SR expressions with lower error than those found by solving the MINLP, yet runs orders of magnitude faster. In addition, it yields interpretable regression models with lower error than those returned by other popular machine learning packages.
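A minimal sketch of this relax-and-round idea follows. It is illustrative only, not the authors' implementation: the tree depth, symbol sets, and helper names (sample_symbol, refit_constants) are assumptions. Fractional node-assignment values from the relaxed NLP are treated as sampling probabilities, a discrete expression tree is drawn, and its constants are then refined by a second continuous fit.

    import numpy as np
    from scipy.optimize import minimize

    # Candidate symbols for a depth-2 expression tree: the root is an operator,
    # each leaf is either the input variable x or a fitted constant c.
    OPERATORS = ["+", "*"]
    LEAVES = ["x", "c"]

    def sample_symbol(relaxed_values, symbols, rng):
        # Treat relaxed 0-1 assignment values as (unnormalized) probabilities.
        p = np.asarray(relaxed_values, dtype=float)
        return rng.choice(symbols, p=p / p.sum())

    def evaluate(tree, x, consts):
        # Evaluate the tree (op, left_leaf, right_leaf) on the data x.
        op, left, right = tree
        vals, ci = [], 0
        for leaf in (left, right):
            if leaf == "x":
                vals.append(x)
            else:
                vals.append(np.full_like(x, consts[ci]))
                ci += 1
        return vals[0] + vals[1] if op == "+" else vals[0] * vals[1]

    def refit_constants(tree, x, y):
        # Second continuous solve: refine the constants of the sampled tree.
        n_consts = sum(leaf == "c" for leaf in tree[1:])
        if n_consts == 0:
            return np.array([]), np.sum((evaluate(tree, x, []) - y) ** 2)
        res = minimize(lambda c: np.sum((evaluate(tree, x, c) - y) ** 2),
                       np.ones(n_consts))
        return res.x, res.fun

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 2.0, 50)
    y = 3.0 * x                                  # toy data generated by y = 3x

    # Pretend the relaxed NLP returned these fractional node assignments.
    relaxed = {"root": [0.2, 0.8], "left": [0.9, 0.1], "right": [0.1, 0.9]}

    best = None
    for _ in range(20):                          # a few probabilistic rounds
        tree = (sample_symbol(relaxed["root"], OPERATORS, rng),
                sample_symbol(relaxed["left"], LEAVES, rng),
                sample_symbol(relaxed["right"], LEAVES, rng))
        consts, err = refit_constants(tree, x, y)
        if best is None or err < best[2]:
            best = (tree, consts, err)

    print("best tree:", best[0], "constants:", np.round(best[1], 3))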
We also leverage the mathematical optimization component of our algorithm to enforce constraints on the model response surface, making ours the first SR algorithm to do so. We show that constrained SR allows users to impose domain knowledge and thereby yields models that generalize better than those generated by unconstrained SR.
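Because the final constant fit is itself a continuous optimization problem, response-surface constraints can be added to it directly. The sketch below is again only illustrative: the nonnegativity requirement, the grid x_grid, and the fixed candidate form c0 + c1*x are assumptions, not the authors' formulation. It refits the constants of a candidate expression subject to the model staying nonnegative over a user-specified domain.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 30)
    y = 0.5 * x + 0.05 * rng.normal(size=x.size)   # noisy toy data

    def model(c, x):
        # Fixed candidate expression from the tree-building step: c0 + c1 * x.
        return c[0] + c[1] * x

    def sse(c):
        return np.sum((model(c, x) - y) ** 2)

    # Domain knowledge: the response must remain nonnegative on [0, 2],
    # including the region beyond the training data.
    x_grid = np.linspace(0.0, 2.0, 50)
    nonneg = {"type": "ineq", "fun": lambda c: model(c, x_grid)}

    unconstrained = minimize(sse, x0=[0.0, 1.0])
    constrained = minimize(sse, x0=[0.0, 1.0], constraints=[nonneg],
                           method="SLSQP")

    print("unconstrained coefficients:", np.round(unconstrained.x, 3))
    print("constrained coefficients:  ", np.round(constrained.x, 3))
    print("min response on [0, 2]:", round(model(constrained.x, x_grid).min(), 4))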
References
[1] Schmidt, M. and H. Lipson, Distilling free-form natural laws from experimental data, Science, 324, 81-85, 2009.
[2] Cozad, A. and N. V. Sahinidis, A global MINLP approach to symbolic regression, Mathematical Programming, 170, 97-119, 2018.
[3] Kim, J., S. Leyffer and P. Balaprakash, Learning symbolic expressions: Mixed-integer formulations, cuts, and heuristics, https://arxiv.org/abs/2102.08351, 2021.