(375n) Large Language Models for Discovering Equations
2024 AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, October 29, 2024 - 3:30pm to 5:00pm
Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance on tasks they were not explicitly trained on. We explore the potential of LLMs to perform symbolic regression by finding closed-form, interpretable models from observational data in the physical sciences. Symbolic Regression (SR) is a machine learning technique that searches through a "space of possible equations" to identify those that balance accuracy and simplicity for a given dataset. In this work, we designed an iterative workflow that uses GPT-4 as a symbolic regressor. Through prompting, we instruct GPT-4 to suggest expressions that fit the data; each suggestion is evaluated for complexity and loss, and the scores are sent back so the model can propose better candidates, optimizing for both objectives. We show how strategic prompting improves GPT-4's performance, and our observations indicate that the model can identify target expressions when they are concise and contain basic mathematical operators. Although GPT-4 does not outperform established SR programs when target equations are complex and require low loss, LLMs require zero training and minimal programming knowledge, removing barriers in interdisciplinary research. Additionally, working in natural language makes integrating background knowledge with data in SR straightforward.
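The suggest-evaluate-feed-back loop described above can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation: `suggest_expressions` is a stub standing in for a GPT-4 prompt/response call (here it returns fixed candidates so the loop runs offline), and the complexity and loss measures are simple placeholders.

```python
import math

def suggest_expressions(history):
    # Stub for the LLM call: in the real workflow this would prompt GPT-4
    # with the scored history of previous candidates and parse its reply.
    return ["x", "x**2", "2*x + 1"]

def complexity(expr):
    # Crude proxy for expression complexity: count arithmetic operators.
    return sum(expr.count(op) for op in "+-*/")

def loss(expr, data):
    # Mean squared error of the candidate expression on the observations.
    total = 0.0
    for x, y in data:
        total += (eval(expr, {"x": x, "math": math}) - y) ** 2
    return total / len(data)

def iterate(data, rounds=3):
    # Iterative workflow: score each suggestion, keep the best by
    # (loss, complexity), and record history for the next prompt.
    history = []
    best = None
    for _ in range(rounds):
        for expr in suggest_expressions(history):
            score = (loss(expr, data), complexity(expr))
            history.append((expr, score))
            if best is None or score < best[1]:
                best = (expr, score)
    return best[0]

# Synthetic observations from the target model y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(5)]
print(iterate(data))  # the candidate with the best (loss, complexity) score
```

Ranking by the tuple `(loss, complexity)` mirrors the abstract's balance between accuracy and simplicity: accuracy dominates, and complexity breaks ties among equally accurate expressions.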