(101a) Dataset Considerations for Rapid Product Development Applications | AIChE

Authors 

Cardin, M., Prosensus
Nguyen, A., Prosensus

When careful attention is paid to the planning, capture, structure, evaluation, and preprocessing of an experimental dataset, data-driven and hybrid approaches to product formulation can dramatically reduce development time and cost, improve knowledge retention, and expand resource capabilities. Time spent on these steps should not be sacrificed to obtain fast preliminary modeling results, which may be misleading. Nor should a dataset be dismissed as insufficient without a meaningful, quantitative evaluation.

This presentation will discuss how to identify and overcome common pitfalls in formulation datasets, drawing on examples from several industries, including polymers, specialty chemicals, and foods. ProSensus’ FormuSense software will be used to illustrate the typical steps required to preprocess a raw formulation dataset optimally for latent variable modeling and numerical optimization. Topics covered will include:

- structuring the raw data (such as identifying ingredient classes)
- detecting and resolving data anomalies (such as misspellings and missing ingredients)
- handling categorical variables (such as those encoding subject-matter expert knowledge)
- calculating ingredient class ratios and mixture properties to evaluate the impact of new ingredients
- meaningful statistics and visualizations for evaluating whether data are suitable for modeling
- modeling approaches in the presence of missing data (such as raw material properties)
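As a rough illustration of the anomaly-detection topic listed above, the Python sketch below maps raw ingredient names onto a canonical list using fuzzy string matching (`difflib.get_close_matches` from the standard library). It also flags formulations whose mass fractions do not sum to one, which is a common symptom of a missing ingredient. This is not FormuSense's implementation. The canonical ingredient list, the 0.8 similarity cutoff, the 0.01 mass-balance tolerance, and the function names `normalize_name` and `flag_incomplete` are all hypothetical choices made for illustration.

```python
import difflib
from typing import Dict, List, Optional

# Illustrative sketch only: function names and thresholds are hypothetical,
# not a FormuSense API. The canonical list would normally come from a curated
# raw-material master list.
CANONICAL = ["acrylic resin", "calcium carbonate", "titanium dioxide", "water"]


def normalize_name(raw: str, canonical: List[str] = CANONICAL,
                   cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical name for a raw entry, or None if nothing is close."""
    # Lower-case and collapse repeated whitespace before comparing.
    cleaned = " ".join(raw.lower().split())
    if cleaned in canonical:
        return cleaned
    matches = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def flag_incomplete(recipes: Dict[str, Dict[str, float]],
                    tol: float = 0.01) -> List[str]:
    """Return IDs of formulations whose mass fractions miss 1.0 by more than tol."""
    return [rid for rid, parts in recipes.items()
            if abs(1.0 - sum(parts.values())) > tol]


if __name__ == "__main__":
    print(normalize_name("Titanium  Dioxde"))  # a likely misspelling
    print(normalize_name("xanthan gum"))       # not in the canonical list
    print(flag_incomplete({
        "F1": {"water": 0.5, "acrylic resin": 0.5},
        "F2": {"water": 0.5, "acrylic resin": 0.3},  # 20 % of the mass unaccounted for
    }))
```

Names that return `None` are best routed to a person for review rather than silently dropped, because they may be genuinely new ingredients rather than typos.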
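Categorical variables, such as subject-matter experts' labels for each formulation, generally have to be converted to numbers before latent variable modeling. One common option, sketched below in Python under the assumption that the categories are nominal (unordered), is one-hot (dummy) encoding. The `one_hot` helper and the `process_route` labels are hypothetical examples. Ordered ratings might instead be mapped to integer scores, and this sketch is not necessarily how FormuSense treats categorical data.

```python
from typing import List, Optional, Sequence, Tuple

# Illustrative sketch only: `one_hot` is a hypothetical helper, not a FormuSense API.


def one_hot(values: Sequence[str],
            categories: Optional[List[str]] = None) -> Tuple[List[str], List[List[float]]]:
    """Encode nominal labels as 0/1 dummy columns, one column per category."""
    # A fixed category order keeps columns aligned between training and new data.
    cats = list(categories) if categories is not None else sorted(set(values))
    unknown = set(values) - set(cats)
    if unknown:
        raise ValueError(f"labels not in category list: {sorted(unknown)}")
    rows = [[1.0 if v == c else 0.0 for c in cats] for v in values]
    return cats, rows


if __name__ == "__main__":
    # Hypothetical expert-assigned labels for four formulations.
    process_route = ["batch", "continuous", "batch", "semi-batch"]
    columns, encoded = one_hot(process_route)
    print(columns)
    for row in encoded:
        print(row)
```

Passing the training set's `categories` explicitly when encoding new formulations keeps the columns consistent. Raising an error on unseen labels makes a new category visible instead of encoding it silently as all zeros.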
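The ingredient-class and mixture-property calculations listed above can be sketched in a few lines of Python. The sketch assumes that each ingredient has already been assigned to a class. It also assumes that a mixture property is the mass-fraction-weighted average of the raw-material properties, which is a linear mixing rule and does not hold for every property. The ingredient names, class map, property values, and helper names are hypothetical placeholders, not real data or a FormuSense API. Because a new ingredient enters through its class totals and its properties rather than as a brand-new column, descriptors like these are one way to gauge its likely impact.

```python
from collections import defaultdict
from typing import Dict

# Illustrative sketch only: ingredient names, classes, and helper names are hypothetical.
INGREDIENT_CLASS = {
    "binder_A": "binder",
    "binder_B": "binder",
    "pigment_A": "pigment",
    "solvent_A": "solvent",
}


def class_totals(recipe: Dict[str, float]) -> Dict[str, float]:
    """Sum the mass fractions of all ingredients belonging to each class."""
    totals: Dict[str, float] = defaultdict(float)
    for ingredient, fraction in recipe.items():
        totals[INGREDIENT_CLASS[ingredient]] += fraction
    return dict(totals)


def class_ratio(recipe: Dict[str, float], numerator: str, denominator: str) -> float:
    """Ratio of two class totals, e.g. pigment-to-binder."""
    totals = class_totals(recipe)
    return totals.get(numerator, 0.0) / totals[denominator]


def mixture_property(recipe: Dict[str, float], prop: Dict[str, float]) -> float:
    """Mass-fraction-weighted average of a raw-material property (linear mixing rule)."""
    total = sum(recipe.values())
    return sum(fraction * prop[ing] for ing, fraction in recipe.items()) / total


if __name__ == "__main__":
    recipe = {"binder_A": 0.2, "binder_B": 0.1, "pigment_A": 0.3, "solvent_A": 0.4}
    # Placeholder property values, not measurements.
    placeholder_prop = {"binder_A": 10.0, "binder_B": 20.0, "pigment_A": 0.0, "solvent_A": 5.0}
    print(class_totals(recipe))
    print(class_ratio(recipe, "pigment", "binder"))
    print(mixture_property(recipe, placeholder_prop))
```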
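For the missing-data topic, one widely used approach in latent variable modeling is the NIPALS algorithm, whose regression steps can simply skip missing entries. The Python sketch below fits a single component to a matrix in which `None` marks a missing value, such as an unmeasured raw-material property. It then uses the fitted scores and loadings to fill the gaps. The sketch assumes the data have already been centered and scaled as needed, and it stops at one component, whereas a practical model would extract several components with deflation. It illustrates the general technique rather than the method used in FormuSense. The helper names `nipals_one_component` and `impute` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Illustrative sketch only: helper names are hypothetical, not a FormuSense API.
Matrix = List[List[Optional[float]]]


def nipals_one_component(X: Matrix, tol: float = 1e-12,
                         max_iter: int = 500) -> Tuple[List[float], List[float]]:
    """Fit X ~ t p^T with NIPALS, ignoring entries that are None."""
    n, k = len(X), len(X[0])
    # Start the scores from the column with the fewest missing values.
    start = min(range(k), key=lambda j: sum(X[i][j] is None for i in range(n)))
    t = [X[i][start] if X[i][start] is not None else 0.0 for i in range(n)]
    p = [0.0] * k
    for _ in range(max_iter):
        # Loadings: regress each column on t using only its observed rows.
        for j in range(k):
            rows = [i for i in range(n) if X[i][j] is not None]
            den = sum(t[i] ** 2 for i in rows)
            p[j] = sum(X[i][j] * t[i] for i in rows) / den if den > 0 else 0.0
        norm = math.sqrt(sum(v * v for v in p))
        if norm == 0.0:
            raise ValueError("no usable variation in X")
        p = [v / norm for v in p]
        # Scores: regress each row on p using only its observed columns.
        t_new = []
        for i in range(n):
            cols = [j for j in range(k) if X[i][j] is not None]
            den = sum(p[j] ** 2 for j in cols)
            t_new.append(sum(X[i][j] * p[j] for j in cols) / den if den > 0 else 0.0)
        # Stop once the scores no longer change appreciably.
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        scale = sum(a * a for a in t_new) or 1.0
        t = t_new
        if change <= tol * scale:
            break
    return t, p


def impute(X: Matrix, t: List[float], p: List[float]) -> List[List[float]]:
    """Replace each missing entry with its one-component reconstruction t_i * p_j."""
    return [[X[i][j] if X[i][j] is not None else t[i] * p[j]
             for j in range(len(p))] for i in range(len(t))]


if __name__ == "__main__":
    X = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0],
         [3.0, 6.0, None],  # missing value
         [4.0, 8.0, 12.0]]
    t, p = nipals_one_component(X)
    print(impute(X, t, p)[2][2])
```

Observed values are left untouched and only the gaps are filled. How trustworthy the filled values are depends on how well the latent structure learned from the observed entries carries over to the missing ones.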