(432e) Fast Symbolic Regression with Constraints
AIChE 2022 Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems I
Wednesday, November 16, 2022 - 9:16am to 9:35am
There has been recent interest in applying mixed-integer nonlinear programming (MINLP) to the symbolic regression (SR) problem [2-3]. This approach has the advantage of returning a provably optimal SR model and of allowing constraints on the model response to be enforced naturally. Unfortunately, the mathematical optimization approach requires significant computational effort, and as a result exact MINLP approaches are limited to small problems.
We propose an SR algorithm that first relaxes the integrality constraints of the MINLP formulation in [3] and solves the resulting inexpensive NLP. We then use the values of the relaxed integer variables to probabilistically assign variables, constants, or operators to the nodes of the SR expression tree, and solve another NLP to refine the resulting expressions. Our algorithm returns SR expressions with lower error than those found by solving the MINLP, yet runs orders of magnitude faster. In addition, it yields interpretable regression models with lower error than those returned by other popular machine learning packages.
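A minimal sketch of this relax-and-round idea follows. It is illustrative only, not the authors' implementation: the tree depth, symbol sets, and helper names (sample_symbol, refit_constants) are assumptions. Fractional node-assignment values from the relaxed NLP are treated as sampling probabilities, a discrete expression tree is drawn, and its constants are then refined by a second continuous fit.

    import numpy as np
    from scipy.optimize import minimize

    # Candidate symbols for a depth-2 expression tree: the root is an operator,
    # each leaf is either the input variable x or a fitted constant c.
    OPERATORS = ["+", "*"]
    LEAVES = ["x", "c"]

    def sample_symbol(relaxed_values, symbols, rng):
        # Treat relaxed 0-1 assignment values as (unnormalized) probabilities.
        p = np.asarray(relaxed_values, dtype=float)
        return rng.choice(symbols, p=p / p.sum())

    def evaluate(tree, x, consts):
        # Evaluate the tree (op, left_leaf, right_leaf) on the data x.
        op, left, right = tree
        vals, ci = [], 0
        for leaf in (left, right):
            if leaf == "x":
                vals.append(x)
            else:
                vals.append(np.full_like(x, consts[ci]))
                ci += 1
        return vals[0] + vals[1] if op == "+" else vals[0] * vals[1]

    def refit_constants(tree, x, y):
        # Second continuous solve: refine the constants of the sampled tree.
        n_consts = sum(leaf == "c" for leaf in tree[1:])
        if n_consts == 0:
            return np.array([]), np.sum((evaluate(tree, x, []) - y) ** 2)
        res = minimize(lambda c: np.sum((evaluate(tree, x, c) - y) ** 2),
                       np.ones(n_consts))
        return res.x, res.fun

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 2.0, 50)
    y = 3.0 * x                                  # toy data generated by y = 3x

    # Pretend the relaxed NLP returned these fractional node assignments.
    relaxed = {"root": [0.2, 0.8], "left": [0.9, 0.1], "right": [0.1, 0.9]}

    best = None
    for _ in range(20):                          # a few probabilistic rounds
        tree = (sample_symbol(relaxed["root"], OPERATORS, rng),
                sample_symbol(relaxed["left"], LEAVES, rng),
                sample_symbol(relaxed["right"], LEAVES, rng))
        consts, err = refit_constants(tree, x, y)
        if best is None or err < best[2]:
            best = (tree, consts, err)

    print("best tree:", best[0], "constants:", np.round(best[1], 3))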
We also leverage the mathematical optimization component of our algorithm to enforce constraints on the model response surface, making ours the first SR algorithm to do so. We show that constrained SR allows users to impose domain knowledge and thereby yields models that generalize better than those generated by unconstrained SR.
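Because the final constant fit is itself a continuous optimization problem, response-surface constraints can be added to it directly. The sketch below is again only illustrative: the nonnegativity requirement, the grid x_grid, and the fixed candidate form c0 + c1*x are assumptions, not the authors' formulation. It refits the constants of a candidate expression subject to the model staying nonnegative over a user-specified domain.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 30)
    y = 0.5 * x + 0.05 * rng.normal(size=x.size)   # noisy toy data

    def model(c, x):
        # Fixed candidate expression from the tree-building step: c0 + c1 * x.
        return c[0] + c[1] * x

    def sse(c):
        return np.sum((model(c, x) - y) ** 2)

    # Domain knowledge: the response must remain nonnegative on [0, 2],
    # including the region beyond the training data.
    x_grid = np.linspace(0.0, 2.0, 50)
    nonneg = {"type": "ineq", "fun": lambda c: model(c, x_grid)}

    unconstrained = minimize(sse, x0=[0.0, 1.0])
    constrained = minimize(sse, x0=[0.0, 1.0], constraints=[nonneg],
                           method="SLSQP")

    print("unconstrained coefficients:", np.round(unconstrained.x, 3))
    print("constrained coefficients:  ", np.round(constrained.x, 3))
    print("min response on [0, 2]:", round(model(constrained.x, x_grid).min(), 4))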
References
[1] Schmidt, M. and H. Lipson, Distilling free-form natural laws from experimental data, Science, 324, 81-85, 2009.
[2] Cozad, A. and N. V. Sahinidis, A global MINLP approach to symbolic regression, Mathematical Programming, 170, 97-119, 2018.
[3] Kim, J., S. Leyffer and P. Balaprakash, Learning symbolic expressions: Mixed-integer formulations, cuts, and heuristics, https://arxiv.org/abs/2102.08351, 2021.