(121c) Accelerating Product Development with Diverse Training Data

Conference

AIChE Spring Meeting and Global Congress on Process Safety

Year

2022

Proceeding

2022 Spring Meeting and 18th Global Congress on Process Safety Proceedings

Group

Industry 4.0 Topical Conference

Session

Data-Driven and Hybrid Approaches to Development of New Products II

Time

Tuesday, April 12, 2022 - 4:30pm to 5:00pm

Authors

Kroenlein, K. - Presenter, National Institute of Standards and Technology

Bernasek, S. M., Citrine Informatics

Kubie, L., Citrine Informatics

When designing a product, experts combine many kinds of data, including historical experiments, manufacturability limitations, and fundamental physical and chemical understanding. The breadth of these data resources makes it difficult to be aware of all the information relevant to a given design, and the growth in data volume that followed substantial digitalization efforts has exacerbated this challenge. Combining these disparate data streams is labor intensive: differences in schema, assumptions about format, and variations in taxonomy often make merging them without human intervention impracticable, even before lab notebooks and other non-digital assets are considered. The resulting data are heterogeneous in structure, sparsely populated, and often statistically small.

Citrine Informatics takes a divide-and-conquer approach both to storing all of this connected data and to leveraging it for AI. First, we use the Graphical Expression of Materials Data (GEMD) model [1] to give data sources structure and to capture detailed process history using partner-defined terminology. This allows comparison and synthesis across data sources without forcing complex records into a rigid schema. Second, we use graph queries defined in our citrine-python library [2] to normalize the data into a tabular format with consistent units. The queries are expressed in the same organization-defined terms used in GEMD. Third, because the heterogeneity of the data means that not every value is defined for every row, we use networks of models to fill empty cells through transfer learning [3]. Finally, for model validation we use leave-one-cluster-out cross-validation [4] to set realistic uncertainty expectations for the system of models, given the population imbalance common in industrial data. Combining these methods into a unified data and modeling stack has produced a tool that works with data sources as they exist today, remains forward compatible as data accumulates and evolves, and permits reuse and retraining of historical models with minimal human intervention.
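The second step, normalizing heterogeneous records into a table with consistent units, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the citrine-python query API: the record layout, the hypothetical `TERM_ALIASES` vocabulary, and the small hand-written conversion table in `to_target` stand in for organization-defined GEMD terms and a proper unit library. Cells that a source does not report stay `None`, which is the sparsity the third step is meant to address.

```python
from typing import Dict, List

# Hypothetical shared vocabulary: source-specific names -> one organization-defined term.
TERM_ALIASES: Dict[str, str] = {
    "Tg": "glass transition temperature",
    "glass_transition": "glass transition temperature",
    "rho": "density",
    "density": "density",
}

# Hypothetical target unit for each term; every column uses exactly one unit.
TARGET_UNITS: Dict[str, str] = {
    "glass transition temperature": "K",
    "density": "g/cm^3",
}


def to_target(term: str, value: float, unit: str) -> float:
    """Convert a value to its term's target unit (tiny illustrative table only)."""
    target = TARGET_UNITS[term]
    if unit == target:
        return value
    if (unit, target) == ("degC", "K"):
        return value + 273.15
    if (unit, target) == ("kg/m^3", "g/cm^3"):
        return value / 1000.0
    raise ValueError(f"no conversion from {unit} to {target} for {term!r}")


def normalize(records: List[dict]) -> List[Dict[str, object]]:
    """Flatten heterogeneous records into rows with one column per shared term."""
    columns = sorted(TARGET_UNITS)
    table = []
    for record in records:
        # Every row starts with all property cells empty; sparse sources leave gaps.
        row: Dict[str, object] = {"sample": record["sample"]}
        row.update({term: None for term in columns})
        for name, (value, unit) in record["properties"].items():
            term = TERM_ALIASES.get(name)
            if term is None:
                continue  # a property outside the shared vocabulary is skipped
            row[term] = to_target(term, value, unit)
        table.append(row)
    return table


# Two hypothetical sources that name and scale the same properties differently.
records = [
    {"sample": "A", "properties": {"Tg": (100.0, "degC"), "rho": (1200.0, "kg/m^3")}},
    {"sample": "B", "properties": {"glass_transition": (350.0, "K")}},
]

for row in normalize(records):
    print(row)
```

A real pipeline would delegate conversions to a unit library rather than hard-coding them; the explicit table only keeps the sketch self-contained.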
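The validation step can likewise be sketched. In leave-one-cluster-out cross-validation, each cluster of similar samples is held out in turn, a model is trained on the remaining clusters, and the error is measured on the held-out cluster, so the estimate reflects extrapolation to under-represented regions rather than interpolation within well-sampled ones. In this hypothetical example the cluster labels are assigned by hand and an ordinary least-squares line stands in for the networks of models described above; neither is taken from the authors' implementation.

```python
from collections import defaultdict
from math import sqrt
from typing import Callable, Dict, List, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least-squares fit of y = a + b*x; a stand-in for a real model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def leave_one_cluster_out(
    xs: Sequence[float], ys: Sequence[float], clusters: Sequence[str]
) -> Dict[str, float]:
    """Hold out each cluster in turn, train on the rest, and return per-cluster RMSE."""
    members: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(clusters):
        members[label].append(i)
    if len(members) < 2:
        raise ValueError("need at least two clusters to hold one out")
    errors: Dict[str, float] = {}
    for held_out, test_idx in members.items():
        train_idx = [i for i, label in enumerate(clusters) if label != held_out]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        errors[held_out] = sqrt(sum(squared) / len(squared))
    return errors


# Hypothetical, deliberately imbalanced data: six common samples, two rare ones.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 11.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 30.0, 33.0]
clusters = ["common"] * 6 + ["rare"] * 2

for label, rmse in leave_one_cluster_out(xs, ys, clusters).items():
    print(f"held out {label}: RMSE = {rmse:.2f}")
```

Reporting one error per held-out cluster, rather than pooling every prediction, keeps the two-point "rare" cluster from being averaged away by the six-point "common" one; on this toy data the rare cluster's error comes out roughly four times larger.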

[1] https://citrineinformatics.github.io/gemd-docs/

[2] https://citrineinformatics.github.io/citrine-python/

[3] M. Hutchinson, E. Antono, B. Gibbons, S. Paradiso, J. Ling, and B. Meredig. Overcoming data scarcity with transfer learning. arXiv preprint arXiv:1711.05099, 2017.

[4] B. Meredig, E. Antono, C. Church, M. Hutchinson, J. Ling, S. Paradiso, B. Blaiszik, et al. Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Molecular Systems Design & Engineering, 2018.

Topics

Materials

Product Design
