(208c) Self-Supervised Learning Methods for Drug Substance and Drug Product Characterization in the Pharmaceutical Industry | AIChE

Authors 

Salami, H. - Presenter, Georgia Institute of Technology
Skomski, D., Merck & Co., Inc.
Machine learning methods have been applied to a wide range of problems in chemical engineering research and development. Among them, data-driven neural networks have become the standard approach for most computer vision tasks. Typically built as convolutional networks, they are well suited to tasks such as classification and segmentation. In the pharmaceutical industry, these tasks include analyzing raw image data generated by testing drug substance and drug product samples across different modalities. Examples include microfluidic or powder-dispersed optical imaging data (for characterizing particles in sterile liquid formulations and oral formulations) as well as data from in situ cameras in crystallization vessels. Such characterizations have important implications for the regulatory aspects of product development.

Because these models are data-driven, they usually need large amounts of data to achieve goals such as classifying subvisible particles in a solution or detecting extraneous matter or impurity crystals in a vessel. However, training them for such tasks requires labeled data, and preparing those labels by hand is tedious and time-consuming. In this talk, we will discuss how a family of self-supervised or weakly supervised learning methods can speed up training and thereby accelerate practical applications. These methods include autoencoder-based and contrastive learning-based approaches. In essence, they have the network perform a pretext task through which it learns the most important features of the available data without relying on operator-provided labels. We will discuss applying such approaches to characterizing a range of systems, from protein aggregates in sterile liquid formulations to impurity particles in small-molecule crystallization processes.
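
To make the idea of a label-free pretext task concrete, the following is a minimal sketch of one widely used contrastive objective, the normalized temperature-scaled cross-entropy (NT-Xent) loss. The abstract does not say which contrastive loss, encoder, or augmentations the authors use, so the choice of NT-Xent, the function name `nt_xent_loss`, and the `temperature` default are assumptions made purely for illustration. The inputs are assumed to be embeddings of two augmented views of the same images, for example two random crops of one particle image.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return [x / norm for x in vec]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss for paired embeddings (illustrative sketch, not the authors' method).

    view_a[i] and view_b[i] are assumed to be embeddings of two augmentations
    of the same unlabeled image i. No class labels are used anywhere.
    """
    if len(view_a) != len(view_b) or len(view_a) == 0:
        raise ValueError("view_a and view_b must be non-empty and the same length")

    n = len(view_a)
    z = [_normalize(v) for v in list(view_a) + list(view_b)]

    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled cosine similarity to every other embedding (self excluded).
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits.values())
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Cross-entropy with the positive pair as the "correct" candidate.
        total += log_sum_exp - logits[pos]

    return total / (2 * n)


# Toy embeddings (made-up numbers): each row of view_b is close to its partner in view_a.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned = nt_xent_loss(view_a, view_b)
misaligned = nt_xent_loss(view_a, view_b[::-1])
print(aligned < misaligned)
```

In a real pipeline the embeddings would come from a convolutional encoder, and minimizing this loss would pull the two views of each image together while pushing other images apart. The pretrained encoder could then be fine-tuned on a much smaller labeled set.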