(140d) Applying Data Science Techniques to Solubility Data for Synthetic Compounds: An Expedited End-to-End Workflow from Data Collection to Crystallization Process Design
2018 AIChE Annual Meeting
Pharmaceutical Discovery, Development and Manufacturing Forum
Data Analytics for Process Prediction
Tuesday, October 30, 2018 - 1:45pm to 2:10pm
This workflow starts with the collection of solubility data, using standardized equipment sets and approaches, into templated tables. These tables contextualize the solubility data by joining each measured value (concentration, temperature, composition, etc.) with relevant metadata (solute purity, X-ray diffraction results, equipment, date, etc.). The contextualized solubility data are ingested into a database, providing a single source for all solubility data. A templated visualization then consumes the data from this source. The data can be further filtered within the visualization as necessary (e.g., limited to a specific solute and to data collected for a specific lot of that solute).
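As an illustration of the contextualization and filtering steps described above, the following minimal Python sketch joins each measurement with the metadata for its solute lot and then filters the joined table. It is not the authors' implementation: the field names (`lot`, `purity_pct`, `xrd_form`, and so on), the helper functions, and all values are hypothetical, and plain in-memory dictionaries stand in for the database.

```python
# Hypothetical records: field names and values are illustrative only.
measurements = [
    {"lot": "LOT-A", "solvent": "ethanol", "temp_c": 20.0, "conc_mg_ml": 12.0},
    {"lot": "LOT-A", "solvent": "ethanol", "temp_c": 50.0, "conc_mg_ml": 48.0},
    {"lot": "LOT-B", "solvent": "heptane", "temp_c": 20.0, "conc_mg_ml": 0.4},
]
lot_metadata = {
    "LOT-A": {"solute": "compound-1", "purity_pct": 99.1, "xrd_form": "Form I"},
    "LOT-B": {"solute": "compound-1", "purity_pct": 97.5, "xrd_form": "Form I"},
}


def contextualize(measurements, lot_metadata):
    """Join each measured value with the metadata for its solute lot."""
    rows = []
    for m in measurements:
        meta = lot_metadata.get(m["lot"])
        if meta is None:
            # Refuse to store a measurement that cannot be contextualized.
            raise KeyError(f"no metadata for lot {m['lot']!r}")
        rows.append({**m, **meta})
    return rows


def filter_rows(rows, **criteria):
    """Keep only rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


table = contextualize(measurements, lot_metadata)
# Limit the view to one solute and one lot of that solute.
lot_a = filter_rows(table, solute="compound-1", lot="LOT-A")
```

Because every row carries its own metadata after the join, filtering by lot, purity, or crystal form reduces to a simple equality check on the joined table.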
To facilitate solvent selection as the first task in crystallization process development, this visualization automatically applies a decision tree to the collected solubility data to classify solvents, with regard to crystallization, as likely to be: "good for a thermal process", "solvent within antisolvent-driven crystallization", or "good antisolvents". Based on this classification, the scientist working with the system can either begin the task of process design or apply a predictive solubility model that has been integrated within the visualization to identify other solvent systems that may be worth investigating. The application of this model uses simple R scripts, with open-source libraries (e.g., non-linear optimization packages) doing the "heavy lifting".
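The abstract does not give the decision tree's split variables or thresholds, so the following Python sketch only illustrates the general shape such a classifier might take. The inputs (solubility at a low and a high temperature), the three threshold values, and the "unclassified" fallback are all assumptions made for illustration; only the three class labels come from the text above.

```python
def classify_solvent(sol_low_t, sol_high_t,
                     min_process_sol=50.0,     # mg/mL, hypothetical threshold
                     min_thermal_ratio=3.0,    # hypothetical threshold
                     max_antisolvent_sol=5.0): # mg/mL, hypothetical threshold
    """Classify a solvent from solubility (mg/mL) at a low and a high temperature."""
    if sol_low_t <= 0 or sol_high_t <= 0:
        raise ValueError("solubilities must be positive")
    if sol_high_t >= min_process_sol:
        # Enough solubility to run a process; check temperature dependence.
        if sol_high_t / sol_low_t >= min_thermal_ratio:
            return "good for a thermal process"
        return "solvent within antisolvent-driven crystallization"
    if sol_high_t <= max_antisolvent_sol:
        # Poor solubility even when hot: candidate antisolvent.
        return "good antisolvent"
    return "unclassified"


# Hypothetical solubility pairs (low T, high T) in mg/mL.
examples = {
    "solvent-1": (10.0, 80.0),
    "solvent-2": (60.0, 90.0),
    "solvent-3": (0.2, 0.8),
}
labels = {name: classify_solvent(lo, hi) for name, (lo, hi) in examples.items()}
```

Keeping the thresholds as named parameters makes the classification rules explicit and easy to adjust per project.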
Once a solvent system has been selected, the task of process design begins with an automated script that fits candidate models and selects the best model for the solubility data across ranges of temperature and antisolvent ratio within that system. The contextualization of the solubility data allows for visual identification and rapid exclusion of outliers within the model-fitting step. Once a solubility model has been selected and fit for a given system, a constrained optimization algorithm is applied to determine the process that affords the highest yield given user-supplied constraints (which are modified to ensure the process meets desired chemical/physical purity needs). These initial crystallization process conditions are then attempted and refined as necessary.
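The fit-then-optimize step can be sketched as follows, under several stated assumptions. The abstract does not name the candidate solubility models or the optimizer, so this example assumes a single van't Hoff-type form, ln S = a + b/T, fitted by ordinary least squares, and substitutes a brute-force grid search for the constrained optimization algorithm. It covers only a cooling crystallization; the function names, the constraint set (a temperature window and a maximum starting concentration), and the data are hypothetical, and Python is used here in place of the R scripts mentioned in the source.

```python
import math


def fit_vant_hoff(temps_c, sols):
    """Least-squares fit of ln(S) = a + b / T (T in kelvin); returns (a, b)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(s) for s in sols]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def solubility(a, b, temp_c):
    """Predicted solubility at temp_c from the fitted parameters."""
    return math.exp(a + b / (temp_c + 273.15))


def best_cooling_process(a, b, t_min_c, t_max_c, max_conc, step_c=1.0):
    """Grid search for the (start, end) temperatures giving the highest yield.

    Constraints: t_min_c <= end < start <= t_max_c, and the starting
    concentration (saturation at the start temperature) must not exceed max_conc.
    Returns (yield_fraction, start_temp_c, end_temp_c), or None if infeasible.
    """
    n_steps = int(round((t_max_c - t_min_c) / step_c))
    grid = [t_min_c + i * step_c for i in range(n_steps + 1)]
    best = None
    for t_start in grid:
        c0 = solubility(a, b, t_start)
        if c0 > max_conc:
            continue  # violates the concentration constraint
        for t_end in grid:
            if t_end >= t_start:
                continue  # a cooling process must end colder than it starts
            y = 1.0 - solubility(a, b, t_end) / c0
            if best is None or y > best[0]:
                best = (y, t_start, t_end)
    return best


# Hypothetical solubility data (temperature in deg C, solubility in mg/mL).
temps = [10.0, 25.0, 40.0, 55.0]
sols = [9.0, 22.0, 48.0, 110.0]
a_fit, b_fit = fit_vant_hoff(temps, sols)
plan = best_cooling_process(a_fit, b_fit, t_min_c=0.0, t_max_c=60.0, max_conc=100.0)
```

A grid search is used only because it is easy to verify by inspection; a workflow handling antisolvent ratios as well as temperature would more plausibly rely on a dedicated non-linear optimization package, as the text indicates.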
This end-to-end workflow has resulted in significant time savings and has allowed consistent expectations to be set for initial designs across projects. Further, it is an early demonstration of the integration of modelling and data science techniques within process development.