(458h) An Optimization-Based Feature Selection Methodology for the Discovery of Biomarkers from High-Dimensional Data in Clinical Applications

Conference

AIChE Annual Meeting

Year

2014

Proceeding

2014 AIChE Annual Meeting

Group

Food, Pharmaceutical & Bioengineering Division

Session

Omics in Biopharmaceutical Bioprocessing

Time

Wednesday, November 19, 2014 - 10:50am to 11:10am

Authors

Guzman, Y. A. - Presenter, Princeton University

Jayachandran, D., Purdue University

Ramkrishna, D., Purdue University

Floudas, C. A., Princeton University

Biomarkers are measurable indicators of biological processes that can be applied in clinical settings for disease diagnosis and prognosis, risk-factor assessment, disease staging, and as indicators of treatment efficacy. High-throughput -omics platforms enable large-scale studies that generate expansive datasets with thousands of candidate biomarkers (i.e., data features). These platforms provide a great opportunity for untargeted biomarker discovery in clinical applications, but the sheer volume of data creates a substantial data-analysis problem. For a biomarker to be accepted into clinical practice, it must pass large-scale, often expensive clinical validation stages [1]; the ultimate success of a discovery-phase biomarker study therefore lies in its ability to produce a small subset of biomarkers with the greatest probability of success in large-scale targeted studies [1,2]. The high data dimensionality per sample is almost always coupled with a comparatively low number of samples in the discovery phase, yielding a statistically difficult feature-selection problem with a high probability of overfitting or of selecting data artifacts as meaningful candidates [3]. In response, a number of feature-selection algorithms have been proposed for biomarker selection, with varying levels of efficacy [4-7].

Building on a previous study in which mixed-integer linear optimization models were proposed to classify healthy and diseased samples [8], we developed a novel optimization-based methodology for candidate biomarker selection. We evaluated the methodology on experimental data sets from the literature whose discriminating data features are known a priori, assessing it in terms of selection stability and accuracy [7,9,10]. The method has been applied to a proteomics dataset of plasma samples from breast cancer patients to select diagnostic biomarkers [11], and it yielded a multiple-reaction monitoring assay for further clinical validation. We also present the application of the method to a metabolomics study of patients undergoing chemotherapy. A subset of metabolites that predicts which patients will develop chemotherapy-induced toxicity can guide practitioners' treatment decisions.
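The abstract does not state how selection stability was quantified. As a minimal illustration only, the sketch below assumes one common choice: repeat feature selection on resampled subsets of the samples, then score the agreement of the selected feature sets with the average pairwise Jaccard index. The function names and the example marker labels are hypothetical and are not taken from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard index over all selected feature sets.

    1.0 means every resampling run selected the same features;
    0.0 means no two runs shared any feature.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two selected subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequency(subsets):
    """Fraction of runs in which each feature was selected."""
    counts = {}
    for subset in subsets:
        for feature in set(subset):
            counts[feature] = counts.get(feature, 0) + 1
    n_runs = len(subsets)
    return {feature: c / n_runs for feature, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical feature sets selected in three resampling runs.
    runs = [{"m1", "m2", "m3"}, {"m1", "m2", "m4"}, {"m1", "m2", "m3"}]
    print(selection_stability(runs))
    print(selection_frequency(runs))
```

Features with a high selection frequency across runs are less likely to be artifacts of a particular sample split, which matters when the number of samples is small relative to the number of features.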

References:

1. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol. 24(8):971-83 (2006).

2. Srinivas PR, Verma M, Zhao Y, Srivastava S. Proteomics for cancer biomarker discovery. Clin Chem. 48(8):1160-9 (2002).

3. Rubingh CM, Bijlsma S, Derks EP, Bobeldijk I, Verheij ER, Kochhar S, Smilde AK. Assessing the performance of statistical validation tools for megavariate metabolomics data. Metabolomics. 2:53-61 (2006).

4. Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 23(19):2507-17 (2007).

5. Hilario M, Kalousis A. Approaches to dimensionality reduction in proteomic biomarker studies. Brief Bioinform. 9(2):102-8 (2008).

6. Baek S, Tsai CA, Chen JJ. Development of biomarker classifiers from high-dimensional data. Brief Bioinform. 10(5):537-46 (2009).

7. Christin C, Hoefsloot HC, Smilde AK, Hoekman B, Suits F, Bischoff R, Horvatovich P. A critical assessment of feature selection methods for biomarker discovery in clinical proteomics. Mol Cell Proteomics. 12(1):263-76 (2013).

8. Baliban RC, Sakellari D, Li Z, Guzman YA, Garcia BA, Floudas CA. Discovery of biomarker combinations that predict periodontal health or disease with high accuracy from GCF samples based on high-throughput proteomic analysis and mixed-integer linear optimization. J Clin Periodontol. 40(2):131-9 (2013).

9. He Z, Yu W. Stable feature selection for biomarker discovery. Comput Biol Chem. 34(4):215-25 (2010).

10. Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics. 26(3):392-8 (2010).

11. Riley CP, Zhang X, Nakshatri H, Schneider B, Regnier FE, Adamec J, Buck C. A large, consistent plasma proteomics data set from prospectively collected breast cancer patient and healthy volunteer samples. J Transl Med. 9:80 (2011).

Topics

Biological Engineering
