The storage capacity and computational power of modern computers have enabled the collection and analysis of large amounts of data for prediction and decision-making in biological and environmental systems [1]. This computational power, combined with advances in big-data analytics and machine-learning algorithms [2, 3], motivated us to apply feature selection and supervised learning to an important human-health problem: classifying environmental toxicants and their mixtures that affect the estrogen receptor (ER), an important hormone pathway in human pathophysiology. We measured single-cell-level responses to selected chemical toxicants at several steps of the ER pathway using high-throughput microscopy and automated high-content image analysis [4]. Experiments were conducted with 45 compounds defined by the US Environmental Protection Agency (EPA) as a reference set with known in vivo activity as ER agonists, ER antagonists, or inactive compounds. For each condition, 77 size, shape, and intensity features of the cell and nuclear compartments were measured. Following consolidation, summary statistics of the cell-level data were calculated for each feature and compound by fitting an appropriate probability distribution (gamma or normal), which compressed the large datasets generated. Some cell-level feature measurements exhibited significant measurement noise that varied across replicates and experiments. Hence, reproducible features were identified, and the biologically relevant ones were selected. Using these features, an ER agonist or antagonist classification model was developed with quadratic discriminant analysis, which showed good predictive accuracy. Importantly, the model was developed using both data-driven and domain-knowledge-based feature selection techniques to classify the in vitro cellular impact of environmental compounds as ER agonist or antagonist.
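The compression step described above, in which per-cell measurements are replaced by the parameters of a fitted gamma or normal distribution, could be sketched as follows. This is only an illustrative sketch under stated assumptions, not the fitting procedure used in the study: the function names are hypothetical, the gamma parameters are estimated by the method of moments rather than by maximum likelihood, and the choice between the two distributions is made by comparing log-likelihoods.

```python
import math
import random
import statistics


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of the data under a normal distribution."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))


def gamma_loglik(xs, shape, scale):
    """Log-likelihood of the data under a gamma distribution."""
    n = len(xs)
    return ((shape - 1) * sum(math.log(x) for x in xs)
            - sum(xs) / scale
            - n * (shape * math.log(scale) + math.lgamma(shape)))


def summarize_feature(xs):
    """Compress cell-level values of one feature into a fitted distribution.

    Hypothetical helper: returns the better-fitting distribution name
    ("normal" or "gamma"), its parameters, and the number of cells.
    """
    mu = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu)
    if len(xs) < 2 or var == 0:
        raise ValueError("need at least two distinct measurements")

    sigma = math.sqrt(var)
    candidates = {"normal": ((mu, sigma), normal_loglik(xs, mu, sigma))}

    # The gamma distribution is only defined for strictly positive values.
    if min(xs) > 0:
        shape, scale = mu * mu / var, var / mu  # method-of-moments estimates
        candidates["gamma"] = ((shape, scale), gamma_loglik(xs, shape, scale))

    best = max(candidates, key=lambda name: candidates[name][1])
    return {"distribution": best,
            "params": candidates[best][0],
            "n_cells": len(xs)}


if __name__ == "__main__":
    # Synthetic stand-in for one feature measured in 5000 cells.
    rng = random.Random(0)
    cells = [rng.gammavariate(2.0, 3.0) for _ in range(5000)]
    print(summarize_feature(cells))
```

Each feature and compound is then represented by a few numbers (distribution name and two parameters) instead of thousands of per-cell values, which is the sense in which fitting a distribution compresses the dataset.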
In summary, our work combines high-throughput microscopy data, high-content image analysis, big-data analytics, and machine-learning techniques to classify key biological responses to endocrine-disrupting chemicals that influence estrogen receptor activity, as a case study.
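For readers unfamiliar with quadratic discriminant analysis, the classification step could be sketched roughly as below. This is a minimal from-scratch illustration using NumPy, not the model or the data from the study: the class name `SimpleQDA`, the helper `make_synthetic_data`, the two synthetic features, and the small regularization term are all assumptions made for the example.

```python
import numpy as np


class SimpleQDA:
    """Quadratic discriminant analysis: one Gaussian per class,
    each with its own mean vector and covariance matrix."""

    def __init__(self, reg=1e-6):
        self.reg = reg  # small ridge term to keep covariances invertible

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            cov = cov + self.reg * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, np.linalg.inv(cov), logdet, log_prior))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, precision, logdet, log_prior in self.params_:
            d = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,jk,ik->i", d, precision, d)
            scores.append(-0.5 * (maha + logdet) + log_prior)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]


def make_synthetic_data(n_per_class=200, seed=0):
    """Two hypothetical image-derived features for two compound classes."""
    rng = np.random.default_rng(seed)
    agonist = rng.multivariate_normal(
        [2.0, 0.0], [[1.0, 0.3], [0.3, 0.5]], n_per_class)
    antagonist = rng.multivariate_normal(
        [-2.0, 0.0], [[0.5, -0.2], [-0.2, 1.5]], n_per_class)
    X = np.vstack([agonist, antagonist])
    y = np.array(["agonist"] * n_per_class + ["antagonist"] * n_per_class)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    model = SimpleQDA().fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Because each class keeps its own covariance matrix, the decision boundary is quadratic rather than linear, which suits classes whose feature spread differs. The accuracy printed here is measured on the training data only and says nothing about the predictive accuracy reported in the abstract.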
References
[1] Yin S, Kaynak O. Big data for modern industry: challenges and trends [point of view]. Proceedings of the IEEE. 2015;103(2):143-146.
[2] Onel M, Beykal B, Ferguson K, Chiu WA, McDonald TJ, Zhou L, et al. Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization. PLoS ONE. 2019;14(10).
[3] Mukherjee R, Beykal B, Onel M, Szafran AT, Stossi F, Mancini MG, et al. Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms. PLoS Computational Biology. (Under review).
[4] Szafran AT, Stossi F, Mancini MG, Walker CL, Mancini MA. Characterizing properties of non-estrogenic substituted bisphenol analogs using high throughput microscopy and image analysis. PLoS ONE. 2017;12(7):e0180141.