(345q) Classification of Cardiomyocyte Content Differentiated from Human Induced Pluripotent Stem Cells

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Computing and Systems Technology Division

Session

Interactive Session: Data and Information Systems

Time

Tuesday, November 9, 2021 - 3:30pm to 5:00pm

Authors

Mohammadi, S. - Presenter, Auburn University

Finklea, F., Auburn University

Hashemi, M., Auburn University

Williams, B., Auburn University

Lipke, E., Auburn University

Cremaschi, S., Auburn University

The human heart is one of the least regenerative organs in the body, and cardiovascular diseases are the leading cause of death in the United States.¹ Current treatments either address the symptoms of cardiovascular diseases or reduce the associated risk. Because the heart regenerates so poorly and current treatments only manage symptoms or risk, producing human cardiac muscle cells, i.e., cardiomyocytes (CMs), could enable cell therapies and high-throughput drug screening for cardiovascular diseases. CMs can be produced by differentiating human induced pluripotent stem cells (hiPSCs). Many studies have investigated the differentiation process and the impact of critical process parameters on the critical quality attributes of the resulting CMs.^2,3 However, differentiating hiPSCs into CMs is complex, expensive, and time-consuming, and its outcomes are highly variable. Many process parameters affect the CM quality attributes, and fundamental understanding is still too limited to build models for designing and optimizing a reliable manufacturing process.

To overcome these challenges, we investigated machine learning techniques to identify the critical process parameters that affect the CM content on day 10 of the differentiation of hydrogel-encapsulated hiPSCs in microspheroids. We built a classification model to predict whether the CM content on day 10 would be sufficient. Day-10 CM content is a critical quality attribute: it must be high enough for production to continue toward heart tissue maturation. We investigated two approaches for building the classification model, and this presentation discusses each method and its results in detail. The first approach uses data collected from bio-process experiments as the inputs to the classification model. In the second approach, the inputs are phase-contrast images of the microspheroids taken on day 5 of differentiation.

The machine learning techniques used in the first approach are feature engineering, feature selection, and classification (Figure 1). Feature engineering derives new features from the existing ones in order to incorporate expert knowledge. Feature selection identifies combinations of features that could form a strong set of predictors.⁴ Finally, the classifiers are trained on the selected features. Three data-driven models were trained as classifiers: Random Forest (RF)⁵, Gaussian Process (GP)⁶, and Support Vector Machine (SVM)⁷. The bio-process features, which describe the experimental conditions, include the initial cell number, cell concentration, post-freeze passage of the cells, size and axial ratio of the microspheroids, differentiation medium, CHIR molecule concentration, and PEG-fibrinogen concentration. Feature engineering produced nine new features from these: the surface area and volume of the microspheroids, the surface-to-volume ratio, the CHIR molecule concentration per surface area, the CHIR molecule concentration per volume, the ratio of CHIR molecule concentration and surface area per volume, and the inverses of the ratios. The differentiation medium, a categorical feature, was converted to numerical variables using one-hot encoding.⁸
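The feature-engineering and encoding steps can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each microspheroid is a prolate spheroid described by its equatorial diameter ("size") and its axial ratio (long axis over short axis), which the abstract does not state, and the function and medium names (`spheroid_geometry`, `engineer_features`, `one_hot`, `"medium_A"`) are hypothetical. Only the features whose definitions are unambiguous in the text are computed, and units are arbitrary.

```python
import math


def spheroid_geometry(diameter, axial_ratio):
    """Surface area and volume of a prolate spheroid (assumed microspheroid shape).

    diameter: equatorial diameter; axial_ratio: polar / equatorial length, >= 1.
    """
    if axial_ratio < 1.0:
        raise ValueError("axial_ratio must be >= 1 for a prolate spheroid")
    a = diameter / 2.0               # equatorial semi-axis
    c = axial_ratio * a              # polar semi-axis
    volume = 4.0 / 3.0 * math.pi * a * a * c
    if math.isclose(c, a):
        surface = 4.0 * math.pi * a * a          # sphere limit
    else:
        e = math.sqrt(1.0 - (a * a) / (c * c))   # eccentricity
        surface = 2.0 * math.pi * a * a * (1.0 + c / (a * e) * math.asin(e))
    return surface, volume


def engineer_features(diameter, axial_ratio, chir):
    """Derived geometric and CHIR-dosing features (hypothetical names); assumes chir > 0."""
    surface, volume = spheroid_geometry(diameter, axial_ratio)
    feats = {
        "surface": surface,
        "volume": volume,
        "surface_to_volume": surface / volume,
        "chir_per_surface": chir / surface,
        "chir_per_volume": chir / volume,
    }
    # Inverses of the ratio features.
    for name in ("surface_to_volume", "chir_per_surface", "chir_per_volume"):
        feats["inv_" + name] = 1.0 / feats[name]
    return feats


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 list over a fixed category order."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


if __name__ == "__main__":
    print(engineer_features(diameter=400.0, axial_ratio=1.5, chir=6.0))
    print(one_hot("medium_B", ["medium_A", "medium_B", "medium_C"]))  # [0, 1, 0]
```

Fixing the category order in `one_hot` keeps the encoded columns aligned across all experiments, which the downstream classifiers require.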

The feature selection methods used in this study were a filter method⁹ followed by principal component analysis (PCA)¹⁰, embedded methods^11,12, and wrapper methods¹³. With the filter method, whenever two features had a correlation above 0.85, only one of them was kept, yielding the filtered feature set. In PCA, the principal components (PCs) that together explain 90% of the input data variance were selected for building the classification model. The built-in functions of the RF and GP models were used as embedded feature selection methods to choose the features with a significant impact on the prediction. In wrapper methods, the classifier is built with different combinations of features, and the feature set with the best classification performance is selected as the final input feature set.¹⁴ We investigated forward selection, backward elimination, and bidirectional methods^15,16 as wrapper methods. In forward selection, features are added to the classifier one at a time, and the model with the best performance is selected. Backward elimination works in the opposite direction: starting from the full feature set, features are removed one at a time. The bidirectional method combines the two. All three wrapper methods were applied with both the filtered features and the PCs as inputs. The models were compared based on the Matthews correlation coefficient (MCC)¹⁷ and accuracy¹⁸.
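The correlation filter and the forward-selection wrapper described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the abstract does not say which feature of a highly correlated pair was kept, so the sketch keeps the one that appears first and compares the absolute Pearson correlation against 0.85. The `score` callable is a hypothetical placeholder for training a GP, RF, or SVM on a feature subset and returning a validation metric such as MCC; that training step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(features, threshold=0.85):
    """Keep a feature only if |r| <= threshold with every feature already kept.

    features: dict mapping name -> list of values; earlier names take priority
    (an assumption, since the tie-breaking rule is not specified).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def forward_selection(candidates, score):
    """Greedy forward selection (wrapper method).

    score(subset) -> float, higher is better (e.g. a cross-validated MCC).
    Features are added one at a time; the best subset seen on the path is returned.
    """
    selected, remaining = [], list(candidates)
    best_subset, best_score = [], float("-inf")
    while remaining:
        trial = {f: score(selected + [f]) for f in remaining}
        f_best = max(trial, key=trial.get)   # feature giving the largest score
        selected.append(f_best)
        remaining.remove(f_best)
        if trial[f_best] > best_score:
            best_subset, best_score = list(selected), trial[f_best]
    return best_subset, best_score
```

Backward elimination would mirror `forward_selection`, starting from all candidates and removing the feature whose removal gives the highest score at each step.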

In the second approach, images were used as the input for building the classification model. Discussions with our experimental collaborators suggested that cell images taken on day 5 of differentiation (Figure 2) are indicative of the final CM content on day 10. We investigated whether machine learning techniques could capture this information and compared the resulting models with those trained on the bio-process features. For preprocessing, the images were augmented to increase the number of available data points: each image was both flipped and rotated by 180°. Histogram of Oriented Gradients (HOG)¹⁹ descriptors were added as additional features. PCA was used for feature selection, and the PCs explaining 95% of the input variance were retained. The classifier was an SVM, and the models were evaluated using accuracy and MCC.
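The augmentation and the 95%-variance PCA step can be sketched with NumPy. This is a minimal sketch, not the authors' pipeline: it assumes "flipped and rotated 180°" means adding one left-right-flipped copy and one 180°-rotated copy of each image, uses raw pixel intensities in place of the HOG descriptors (which would be appended as extra columns), and computes PCA via an SVD of the centred data. The function names `augment` and `pca_select` are hypothetical.

```python
import numpy as np


def augment(images):
    """Return each image plus a flipped copy and a 180-degree-rotated copy."""
    out = []
    for img in images:
        out.append(img)
        out.append(np.fliplr(img))        # left-right flip (flip axis assumed)
        out.append(np.rot90(img, k=2))    # rotation by 180 degrees
    return out


def pca_select(X, variance=0.95):
    """Project rows of X onto the fewest PCs whose cumulative explained
    variance ratio reaches `variance`; returns (scores, n_components)."""
    Xc = X - X.mean(axis=0)                            # centre each column
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)  # rows of vt = PC directions
    ratio = s**2 / np.sum(s**2)                        # explained variance ratio
    k = min(int(np.searchsorted(np.cumsum(ratio), variance)) + 1, len(ratio))
    return Xc @ vt[:k].T, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.random((32, 32)) for _ in range(10)]    # stand-ins for day-5 images
    X = np.stack([im.ravel() for im in augment(images)])  # HOG features would be appended here
    scores, k = pca_select(X, 0.95)
    print(X.shape, scores.shape, k)
```

In practice the PCA projection would be fitted on the training images only and then applied to held-out images, so that test data do not influence the selected components.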

Eighty-six bio-process data points and 301 images used for modeling were collected from experiments in which CMs were produced by single-step cell handling in a 3D microenvironment. In this scaffold-based approach, the hiPSCs were encapsulated in a PEG-fibrinogen extracellular matrix using a novel, cost-effective microfluidic system²⁰ (Figure 1). The selected features were used to build models that classify the CM content on day 10 of differentiation into two groups: “sufficient” (CM content > 65%) and “insufficient” (CM content ≤ 65%).
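The two-class labeling and the two evaluation metrics can be written out explicitly. This sketch labels day-10 CM content above 65% as “sufficient” (1) and everything else as “insufficient” (0), then computes accuracy and the Matthews correlation coefficient from the confusion matrix. The function names are hypothetical, and returning an MCC of 0 when a marginal total is zero is a common convention rather than something the abstract specifies.

```python
import math


def label(cm_content_percent, threshold=65.0):
    """1 = "sufficient" (CM content > threshold), 0 = "insufficient"."""
    return 1 if cm_content_percent > threshold else 0


def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt(product of marginals)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC = 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Because MCC uses all four cells of the confusion matrix, it ranges from -1 to 1 and is less sensitive than accuracy to an imbalance between the “sufficient” and “insufficient” classes.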

The best classifier trained on the bio-process features was the GP model using PCs chosen by forward selection. This model had an accuracy of 75% and an MCC of 0.46. The PCs chosen by forward selection did not capture a large share of the variance in the input data, which suggests that additional features related to cell growth may be needed to improve the classifiers. The best model using images as inputs had an accuracy of 74% and an MCC of 0.49, comparable to the results obtained with the bio-process parameters. Current work focuses on combining the bio-process data and the image data in an ensemble model with higher accuracy and MCC.
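The abstract does not say how the bio-process and image models will be combined in the ensemble. One common option, shown here purely as an illustration, is weighted soft voting: average each model's predicted probability of “sufficient” and threshold the result. The sketch assumes both models score the same experiments in the same order (i.e., each sample has both a bio-process record and a day-5 image), and the weight `w`, the 0.5 cut-off, and the name `soft_vote` are hypothetical.

```python
def soft_vote(p_bio, p_img, w=0.5, cutoff=0.5):
    """Combine per-sample probabilities of "sufficient" from two classifiers.

    p_bio, p_img: probabilities in [0, 1] for the same samples, same order.
    w: weight given to the bio-process model (hypothetical; could be tuned
       on validation data). Returns (0/1 labels, combined probabilities).
    """
    if len(p_bio) != len(p_img):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    combined = [w * a + (1.0 - w) * b for a, b in zip(p_bio, p_img)]
    return [1 if p >= cutoff else 0 for p in combined], combined
```

Soft voting needs calibrated probabilities from each model; for an SVM this typically means enabling a probability-calibration step, whereas GP classifiers produce class probabilities directly.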

References

1. Murphy SL, Xu J, Kochanek KD, Arias E. Mortality in the United States, 2017. NCHS Data Brief. 2018;(328):1-8.
2. Kropp C, Kempf H, Halloin C, et al. Impact of Feeding Strategies on the Scalable Expansion of Human Pluripotent Stem Cells in Single-Use Stirred Tank Bioreactors. Stem Cells Transl Med. 2016;5(10):1289-1301. doi:10.5966/sctm.2015-0253
3. Halloin C, Schwanke K, Löbel W, et al. Continuous WNT Control Enables Advanced hPSC Cardiac Processing and Prognostic Surface Marker Identification in Chemically Defined Suspension Culture. Stem Cell Reports. 2019;13(2):366-379. doi:10.1016/j.stemcr.2019.06.004
4. Blum AL, Langley P. Selection of relevant features and examples in machine learning. Artif Intell. 1997;97(1-2):245-271.
5. Breiman L. Random Forests. Mach Learn. 2001;45(1):5-32.
6. Williams CKI, Rasmussen CE. Gaussian Processes for Machine Learning. Vol 2. Cambridge, MA: MIT Press; 2006.
7. Drucker H, Shahraray B, Gibbon DC. Support vector machines: Relevance feedback and information retrieval. Inf Process Manag. 2002;38(3):305-323. doi:10.1016/S0306-4573(01)00037-1
8. Potdar K, Pardawala TS, Pai CD. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers. Int J Comput Appl. 2017;175(4):7-9. doi:10.5120/ijca2017915495
9. Soper HE, Young AW, Cave BM, Lee A, Pearson K. On the Distribution of the Correlation Coefficient in Small Samples. Appendix II to the Papers of “Student” and R. A. Fisher. Biometrika. 1917;11(4):328. doi:10.2307/2331830
10. Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24(6):417.
11. Wah YB, Ibrahim N, Hamid HA, Abdul-Rahman S, Fong S. Feature selection methods: Case of filter and wrapper approaches for maximising classification accuracy. Pertanika J Sci Technol. 2018;26(1):329-340.
12. Naqvi S. A Hybrid Filter-Wrapper Approach for Feature Selection. 2011.
13. Kohavi R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. 1995.
14. Das S. Filters, wrappers and a boosting-based hybrid for feature selection. In: ICML. Vol 1; 2001:74-81.
15. Jović A, Brkić K, Bogunović N. A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO); 2015:1200-1205.
16. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3(Mar):1157-1182.
17. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta - Protein Struct. 1975;405(2):442-451. doi:10.1016/0005-2795(75)90109-9
18. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag. 2009;45(4):427-437. doi:10.1016/j.ipm.2009.03.002
19. Freeman WT, Roth M. Orientation Histograms for Hand Gesture Recognition. Gesture. 1994.
20. Seeto WJ, Tian Y, Pradhan S, Kerscher P, Lipke EA. Photocrosslinked Microspheres: Rapid Production of Cell‐Laden Microspheres Using a Flexible Microfluidic Encapsulation Platform (Small 47/2019). Small. 2019;15(47):1970254. doi:10.1002/smll.201970254

Topics

Biomedical Engineering

Computing and Systems Engineering
