(116a) Making the Most of Small Data | AIChE

Authors 

Heiber, M. - Presenter, University of Akron
Farrow, C., Enthought
For many traditional innovation-driven organizations, scientific data is generated to answer specific, immediate research questions and then archived to protect IP, with little attention paid to the future value of reusing that data to answer similar or tangential questions. Data is treated as a side product of R&D rather than as a primary output. As a result, important experimental process details and implied contextual information often go unrecorded. In addition, data is often not formatted in a consistent, well-structured manner, which makes it difficult and expensive to parse large volumes of historical data files archived in a network drive or data lake. Finally, the experimental workflows that produce this data are manual and often require coordination between multiple teams: manual sample preparation and handoff between labs, manual data transfer between computers, manual raw data analysis on instrument computers, and waiting on external analytical measurement requests all make new data generation slow and expensive. Altogether, this leaves R&D labs with surprisingly small datasets that are clean and complete enough to serve as training data for a machine learning model.

Faced with this ‘small data’ situation, researchers and managers often feel that they cannot yet benefit from pursuing data-driven approaches to new product development. They are unsure what can be done with their data in its current state, and unsure how to gather more data efficiently enough to close the gap. Given the cost of generating new data, the approach may even seem impossible, and it is not uncommon to feel stuck without a clear path forward. Other organizations push ahead with a high-level vision, implement data platforms, and hire data science and engineering teams, only to find that those teams soon struggle to generate value due to the unique challenges inherent in scientific small data problems.

At Enthought, we have tackled many small data challenges in materials and chemicals product development and have employed multiple strategies for extracting the most value from small data to meet strategic innovation goals. There is no one-size-fits-all solution, but in this talk we’ll present practical tips for how teams can make the most of what they have and set a course toward continuous improvement. We’ll discuss how teams can get started with little or no data, and how they can leverage existing domain knowledge to get further with less data through well-crafted experimental designs, feature engineering, informed model constraints and priors, and improved data quality. We’ll also discuss how to assess existing data generation workflows and prioritize the improvements that will accelerate new data generation and improve data quality, using software tools to streamline data labeling tasks and to automate or assist users with raw data analysis. We’ll conclude with a discussion of workflow innovation, where domain experts explore screening approaches that combine theory and simulation with proxy measurements to accelerate learning at lower cost.
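As a minimal sketch of one strategy mentioned above — encoding domain knowledge as an informed prior — consider a regression with only four data points. Everything here is hypothetical and not taken from the talk: the variable names, the noise levels, and the assumed prior slope of 0.5 are illustrative choices. The point is that a MAP estimate with a Gaussian prior shrinks the fitted coefficient toward the domain-informed value, which stabilizes the model when data is scarce, whereas ordinary least squares must rely on the noisy measurements alone.

```python
import numpy as np

# Hypothetical small-data scenario: 4 noisy measurements of a material
# property vs. temperature, where domain knowledge suggests the slope
# should be near 0.5 (e.g., from theory or prior studies).
rng = np.random.default_rng(0)
X = np.array([[20.0], [40.0], [60.0], [80.0]])   # temperature (degrees C)
y = 0.5 * X[:, 0] + rng.normal(0.0, 2.0, size=4)  # noisy property values

# Ordinary least squares (intercept omitted for simplicity): data only.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# MAP estimate with Gaussian prior w ~ N(w0, tau2) and noise variance
# sigma2: minimizes ||Xw - y||^2 / sigma2 + ||w - w0||^2 / tau2.
w0 = np.array([0.5])   # domain-informed prior mean for the slope
tau2 = 0.01            # prior variance: small = strong domain belief
sigma2 = 4.0           # assumed measurement noise variance
A = X.T @ X / sigma2 + np.eye(1) / tau2
b = X.T @ y / sigma2 + w0 / tau2
w_map = np.linalg.solve(A, b)

print("OLS slope:", w_ols[0], "MAP slope:", w_map[0])
```

In one dimension the MAP solution is a precision-weighted average of the OLS slope and the prior mean, so it always lies between them; as more data arrives, the data term dominates and the estimate converges to the least-squares fit. The same idea underlies off-the-shelf tools such as Bayesian ridge regression.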