(185j) In Silico Discovery of Biomarker Proteins for Periodontitis Using High-Throughput Proteomics and Mixed-Integer Linear Optimization | AIChE

Authors 

Baliban, R. C. - Presenter, Princeton University
Li, Z., Princeton University
Guzman, Y., Princeton University
DiMaggio, P. A. Jr., Princeton University
Garcia, B. A., Princeton University


The search for biomarkers that can predict periodontal disease at the initiation and progression stages of periodontitis has received considerable interest during the last decade [1]. The diagnostic potential of gingival crevicular fluid (GCF) has been extensively investigated because it can be collected non-invasively and contains a complex mixture of molecules [2]. GCF has been shown to be a transudate of gingival interstitial fluid, but during periodontal disease it becomes an inflammatory exudate that reflects the composition of serum and includes substances derived from the structural tissues of the periodontium and from the oral bacteria colonizing the gingival pocket [3]. Up to 90 substances, including cytokines, proteolytic enzymes, bacterial metabolites, and tissue-degradation products, have been investigated as possible indicators or predictors of disease activity, but no chairside test currently exists that can be reliably applied for accurate diagnosis or prognosis in clinical practice.

In this talk, we present a comprehensive proteomic analysis of GCF samples that uses the PILOT_PROTEIN webtool [4] to uncover candidate biomarker proteins for detecting periodontal status. A complete list of human and bacterial proteins was compiled by analyzing 24 GCF samples from 12 healthy and 12 diseased patients [5], and strong candidate human and bacterial biomarker proteins were identified for further analysis. A mixed-integer linear optimization model was then developed to identify the optimal combination of biomarkers that clearly classifies a blind subject sample as healthy or diseased. To adequately train the optimization model, an additional 31 GCF samples from 14 healthy and 17 diseased patients were collected and analyzed with PILOT_PROTEIN.
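The abstract does not give the formulation of the mixed-integer linear model, so the following is only a hedged illustration of the kind of selection problem it solves. It assumes a hypothetical setup: each sample is a vector of 0/1 presence flags for candidate proteins, each protein receives an integer weight in {-1, 0, +1} (health marker, unused, disease marker), a sample is called diseased when its signed count of detected markers is positive, and the weights are chosen to maximize training accuracy with at most `max_markers` proteins used. Instead of a MILP solver, the sketch enumerates every assignment, which is a swapped-in technique that only scales to a handful of candidates; the data, decision rule, and function names are all hypothetical.

```python
from itertools import product


def predict(sample, weights):
    """Call a sample diseased (1) if its signed marker count is positive.

    sample: 0/1 presence flags per candidate protein (hypothetical input format).
    weights: -1 = health marker, 0 = unused, +1 = disease marker.
    """
    score = sum(w * x for w, x in zip(weights, sample))
    return 1 if score > 0 else 0


def accuracy(samples, labels, weights):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(s, weights) == y for s, y in zip(samples, labels))
    return hits / len(labels)


def select_biomarkers(samples, labels, max_markers):
    """Maximize training accuracy over integer weights with a cardinality limit.

    A MILP solver would search this space implicitly; here every assignment is
    enumerated directly, which only works for a few candidate proteins.
    """
    n_proteins = len(samples[0])
    best_w, best_acc = None, -1.0
    for w in product((-1, 0, 1), repeat=n_proteins):
        used = sum(1 for wi in w if wi != 0)
        if used == 0 or used > max_markers:
            continue
        acc = accuracy(samples, labels, w)
        if acc > best_acc:  # strict comparison keeps the first optimum found
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy presence/absence data for four hypothetical proteins P0..P3.
samples = [
    [1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1],  # healthy (label 0)
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1],  # diseased (label 1)
]
labels = [0, 0, 0, 1, 1, 1]

best_w, best_acc = select_biomarkers(samples, labels, max_markers=2)
print(best_w, best_acc)  # (-1, 1, 0, 0) 1.0
```

In this toy case the search picks P0 as a health marker and P1 as a disease marker, which together separate the six samples perfectly; a real MILP formulation would express the same kind of objective and cardinality limit with binary variables and linear constraints rather than enumeration.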

The model was cross-validated by randomly selecting a training set of N of the 55 samples, optimizing the biomarker selection on that set, and testing on the remaining 55 - N samples; this procedure was repeated for 100 random training sets. The model consistently achieved high accuracy (greater than 90%) when classifying the test samples as healthy or diseased. A study of training-set size showed that the model maintained high accuracy with training sets as small as ten samples. The model was then trained on all 55 samples and tested on two blind sets of 20 and 21 samples. Using an optimal combination of 6 human proteins and 2 bacterial proteins, the model perfectly separated healthy from diseased samples in both test sets.
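The cross-validation scheme described above is repeated random subsampling: draw a training set of size N, fit, score on the held-out 55 - N samples, and repeat. A minimal sketch follows, assuming a generic `fit`/`predict` pair. Because the abstract does not specify the optimization model's decision rule, the sketch plugs in a hypothetical one-feature threshold classifier and synthetic, perfectly separable scores that only borrow the abstract's class sizes (26 healthy, 29 diseased). The helper names, the fixed seed, and the choice to redraw any training set that lacks one of the classes are assumptions for illustration.

```python
import random
from statistics import mean


def fit_threshold(xs, ys):
    """Placeholder classifier: threshold halfway between the two class means."""
    healthy = [x for x, y in zip(xs, ys) if y == 0]
    diseased = [x for x, y in zip(xs, ys) if y == 1]
    return (mean(healthy) + mean(diseased)) / 2


def predict_threshold(threshold, x):
    # Call a sample diseased (1) when its score exceeds the fitted threshold.
    return 1 if x > threshold else 0


def repeated_subsampling(xs, ys, train_size, n_repeats, fit, predict, seed=0):
    """Train on random subsets of size train_size; test on the remaining samples."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    accuracies = []
    for _ in range(n_repeats):
        # Redraw until both classes appear, so the placeholder can be fitted.
        while True:
            train = rng.sample(indices, train_size)
            if len({ys[i] for i in train}) == 2:
                break
        train_set = set(train)
        test = [i for i in indices if i not in train_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        accuracies.append(correct / len(test))
    return mean(accuracies), min(accuracies)


# Synthetic, perfectly separable scores with the abstract's class sizes:
# 26 healthy (12 + 14) and 29 diseased (12 + 17) samples, 55 in total.
xs = list(range(26)) + [100 + i for i in range(29)]
ys = [0] * 26 + [1] * 29

mean_acc, worst_acc = repeated_subsampling(
    xs, ys, train_size=10, n_repeats=100,
    fit=fit_threshold, predict=predict_threshold,
)
print(f"mean accuracy {mean_acc:.3f}, worst split {worst_acc:.3f}")
```

Replacing `fit_threshold` and `predict_threshold` with a biomarker-selection routine and sweeping `train_size` would give the kind of training-size study the abstract reports; reporting the worst split alongside the mean shows how stable the accuracy is across the random draws.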

[1] Champagne, C. M., Buchanan, W., Reddy, M. S., Preisser, J. S., Beck, J. D. & Offenbacher, S. (2003). Periodontology 2000 31, 167.

[2] Buduneli, N. & Kinane, D. F. (2011). Journal of Clinical Periodontology 38, 85.

[3] Delima, A. J. & Van Dyke, T. E. (2003). Periodontology 2000 31, 55.

[4] Baliban, R. C., DiMaggio, P. A., Plazas-Mayorca, M. D., Garcia, B. A. & Floudas, C. A. (2012). Analytical Chemistry, submitted.

[5] Baliban, R. C., Sakellari, D., Li, Z., DiMaggio, P. A., Garcia, B. A. & Floudas, C. A. (2011). Journal of Clinical Periodontology 39, 203.