(27cf) A Systems Engineering Computer-Assisted Biomarker Detection Framework for Autism Spectrum Disorder Using Proteomic Data | AIChE

Authors 

Yousefi Zowj, F. - Presenter, Auburn University
He, Q. P., Auburn University
Autism spectrum disorder (ASD) is a neurodevelopmental disease that affects approximately 1 in 44 children in the United States [1]. While behavioral criteria, such as difficulties in communication and social interaction, are used to diagnose ASD, recent proteomic analyses have detected metabolic differences in the plasma/serum of individuals with ASD [2]. For example, a group of proteins was proposed as a blood biomarker for ASD detection in [3]. However, identifying reliable biomarkers for ASD has been challenging due to significant variations in protein levels caused by confounding factors such as age, gender, diet, and comorbid diseases [4, 5].

To address this issue, we propose systematically generating novel, physically meaningful features that are more resilient to these confounding factors than the original measurements, such as protein levels. We then propose an automated computer-assisted biomarker detection framework that integrates these novel features with a hybrid feature selection technique and a linear machine learning (ML) model. The effectiveness of the framework was demonstrated using a dataset of serum samples from 76 typically developing (TD) boys and 78 boys with ASD, aged 18 months to 8 years. These samples were previously examined to identify potential biomarkers for ASD using SomaLogic’s SOMAScan™ assay 1.3K platform [3].
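As an illustration of why an engineered feature can be more resilient than a raw protein level, the sketch below builds pairwise log-ratio features (the abstract later names ratios of proteins as one such feature). It is a minimal example under stated assumptions, not the authors' implementation: the protein names, the levels, and the idea of a confounder that scales every protein in a sample by the same factor are all hypothetical.

```python
import math
from itertools import combinations


def log_ratio_features(sample):
    """Return {"A/B": log(A) - log(B)} for every unordered protein pair."""
    return {
        f"{a}/{b}": math.log(sample[a]) - math.log(sample[b])
        for a, b in combinations(sorted(sample), 2)
    }


# Hypothetical protein levels (arbitrary units) for one serum sample.
sample = {"P1": 120.0, "P2": 40.0, "P3": 300.0}

# Assumed confounder: a factor that scales every protein in the sample alike.
scaled = {name: 1.7 * level for name, level in sample.items()}

ratios = log_ratio_features(sample)
ratios_scaled = log_ratio_features(scaled)

# Largest change caused by the confounder in raw levels and in log-ratios.
raw_shift = max(abs(scaled[p] - sample[p]) for p in sample)
ratio_shift = max(abs(ratios_scaled[k] - ratios[k]) for k in ratios)

print(f"max shift in raw levels: {raw_shift:.3f}")
print(f"max shift in log-ratios: {ratio_shift:.2e}")
```

Under this assumed multiplicative confounder the common factor cancels in each ratio, so the engineered features stay essentially unchanged while the raw levels move. Real confounders need not act this uniformly, so the cancellation would generally be partial.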

Our proposed framework identifies a panel of 12 features that combines protein levels with the novel features defined in this work. On the dataset described above, the proposed method detects ASD with high accuracy, achieving an area under the curve (AUC) of 0.940 compared with 0.860 in the previous study. In addition to the proteins used as features in previous studies, we propose a novel set of engineered features, including ratios of proteins, whose resilience to confounding factors reduces within-class variation. The proposed feature selection technique combines a sequential filter and a wrapper feature selection method so that each offsets the other's limitations. A linear ML model is then developed using training samples and independently tested on a set of hold-out samples. The linear ML model is chosen for its robustness to overfitting and superior interpretability.
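The pipeline described above (filter, then wrapper, then a linear model scored by AUC on hold-out samples) can be sketched as follows. This is a minimal illustration under assumptions, not the method used in the study: the abstract does not name the filter criterion, the wrapper search, or the linear model, so the sketch substitutes a standardized mean-difference filter, greedy forward selection, and a diagonal linear discriminant. The data are synthetic, and all function names are hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve: P(positive score > negative score)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def column(X, y, j, label):
    """Values of feature j for the samples of one class."""
    return [row[j] for row, lab in zip(X, y) if lab == label]


def filter_score(X, y, j):
    """Filter step: absolute class-mean difference in units of overall spread."""
    a, b = column(X, y, j, 1), column(X, y, j, 0)
    return abs(statistics.mean(a) - statistics.mean(b)) / statistics.pstdev(a + b)


def fit_linear(X, y, feats):
    """Assumed linear model (diagonal discriminant): w_j = mean diff / variance."""
    weights = {}
    for j in feats:
        a, b = column(X, y, j, 1), column(X, y, j, 0)
        var = (statistics.pvariance(a) + statistics.pvariance(b)) / 2
        weights[j] = (statistics.mean(a) - statistics.mean(b)) / var
    return weights


def predict(weights, X):
    """Linear score for each sample."""
    return [sum(w * row[j] for j, w in weights.items()) for row in X]


def hybrid_select(X_tr, y_tr, X_val, y_val, keep=8):
    """Filter down to `keep` candidates, then greedy forward wrapper on AUC."""
    ranked = sorted(range(len(X_tr[0])),
                    key=lambda j: filter_score(X_tr, y_tr, j), reverse=True)
    candidates = ranked[:keep]
    selected, best = [], 0.0
    while True:
        trials = []
        for j in candidates:
            if j not in selected:
                w = fit_linear(X_tr, y_tr, selected + [j])
                trials.append((auc(predict(w, X_val), y_val), j))
        # Stop when no candidate is left or none improves validation AUC.
        if not trials or max(trials)[0] <= best:
            return candidates, selected
        best, pick = max(trials)
        selected.append(pick)


# Synthetic stand-in data (NOT the study's data): 20 features, first 3 informative.
rng = random.Random(0)


def make_sample(label):
    return [rng.gauss(2.0 * label if j < 3 else 0.0, 1.0) for j in range(20)]


y = [i % 2 for i in range(160)]
X = [make_sample(lab) for lab in y]

# Training, validation (used only by the wrapper), and untouched hold-out samples.
X_tr, y_tr = X[:80], y[:80]
X_val, y_val = X[80:120], y[80:120]
X_test, y_test = X[120:], y[120:]

candidates, selected = hybrid_select(X_tr, y_tr, X_val, y_val)
model = fit_linear(X_tr, y_tr, selected)
test_auc = auc(predict(model, X_test), y_test)
print(f"selected features: {selected}, hold-out AUC: {test_auc:.3f}")
```

The filter is cheap but scores each feature in isolation, while the wrapper evaluates features jointly at the cost of refitting the model many times. Running the filter first keeps the wrapper's search small, which is one plausible reading of how the two methods offset each other's limitations. The hold-out samples are never seen during selection or fitting, so the reported AUC is not inflated by the search.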

Our methodology introduces systems engineering principles and techniques to ASD detection research. Specifically, the definition of a biomarker is extended beyond the traditional physical trait to include bio-information that can only be extracted by considering interactions and correlations among the measured traits. This systems engineering perspective provides additional insights into the ASD mechanism, which may lead to further discoveries in the future.

References:

  1. Matthew J Maenner et al. “Prevalence and characteristics of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, United States, 2018”. In: MMWR Surveillance Summaries 11 (2021), p. 1.
  2. Fatir Qureshi et al. “Multivariate Analysis of Metabolomic and Nutritional Profiles among Children with Autism Spectrum Disorder”. In: Journal of Personalized Medicine 6 (2022), p. 923.
  3. Laura Hewitson et al. “Blood biomarker discovery for autism spectrum disorder: A proteomic analysis”. In: PLoS One 2 (2021), e0246581.
  4. Eleftherios P Diamandis. “Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems”. In: Journal of the National Cancer Institute 5 (2004), pp. 353–356.
  5. Keith A Baggerly et al. “Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer”. In: Journal of the National Cancer Institute 4 (2005), pp. 307–309.
