(493p) Data Mining PubChem with Signature: Prediction of Biological Activity Using Cheminformatics for High-Throughput Screening of Small Molecules | AIChE

Authors 

Weis, D. C. - Presenter, Tennessee Technological University


High-throughput screening (HTS) is a technique for discovering new lead compounds by physically screening a large compound library against a specified biological target. In the past, HTS was available primarily to the pharmaceutical industry. Through the Molecular Libraries Initiative, part of the NIH Roadmap for Medical Research, HTS is now accessible to academic researchers, and the data collected are deposited in a public database called PubChem. Results from more than 1,000 HTS experiments are currently available for download from PubChem. Cheminformatic tools are crucial for interpreting and using this vast amount of data effectively.

We recently demonstrated a method that builds a model from existing HTS data in PubChem and predicts new compounds likely to be active, as candidates for additional screening.[1] PubChem bioassay 846[2] screened for potential anticoagulant therapeutics by identifying inhibitors of factor XIa, an enzyme involved in the blood coagulation mechanism. Using a support vector machine (SVM) with the Signature molecular descriptor, we created a classification model with 89% accuracy. The SVM was then used to virtually screen approximately 12 million compounds that are deposited in PubChem but were not present in the factor XIa assay. Based on metrics associated with SVM magnitudes and on the molecular descriptor overlap between candidate molecules and those from bioassay 846, we identified 296 of these previously untested compounds as predicted actives. Docking studies using the crystal structure of factor XIa were performed on the known actives and on the 296 predicted actives; all of the predicted actives showed binding energies consistent with those of the known actives. Selected compounds have been purchased and will be tested for inhibition of factor XIa in a 96-well format. In addition, the technique for data mining PubChem has been generalized for application to the many other bioassays deposited in PubChem.
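The selection step described above, which combines an SVM score with a descriptor-overlap check, can be illustrated with a minimal Python sketch. It is not the published implementation: the linear decision function, the two thresholds (`min_margin`, `min_overlap`), the feature tokens, and the function names are all assumptions made for illustration, and the abstract does not specify the kernel or cutoffs actually used.

```python
from collections import Counter


def descriptor_overlap(candidate, training_features):
    """Fraction of a candidate's descriptor occurrences also seen in training."""
    total = sum(candidate.values())
    if total == 0:
        return 0.0
    shared = sum(n for feat, n in candidate.items() if feat in training_features)
    return shared / total


def decision_value(candidate, weights, bias):
    """Linear decision function w.x + b over sparse descriptor counts (assumed form)."""
    return sum(weights.get(feat, 0.0) * n for feat, n in candidate.items()) + bias


def select_predicted_actives(candidates, weights, bias, training_features,
                             min_margin=1.0, min_overlap=1.0):
    """Keep candidates that score confidently and stay inside the training descriptor space."""
    selected = []
    for cid, feats in candidates.items():
        score = decision_value(feats, weights, bias)
        overlap = descriptor_overlap(feats, training_features)
        if score >= min_margin and overlap >= min_overlap:
            selected.append((cid, score, overlap))
    # Rank the most confidently predicted actives first.
    return sorted(selected, key=lambda row: row[1], reverse=True)


# Toy data: "a", "b", "c", "z" stand in for descriptor fragments (hypothetical tokens).
weights = {"a": 1.5, "b": -0.5, "c": 0.75}   # hypothetical trained weights
bias = -1.0
training_features = set(weights)

candidates = {
    "cand_1": Counter({"a": 2, "c": 1}),  # high score, fully covered by training features
    "cand_2": Counter({"a": 1, "b": 2}),  # covered, but scored as inactive
    "cand_3": Counter({"a": 2, "z": 2}),  # high score, but contains an unseen feature
}

hits = select_predicted_actives(candidates, weights, bias, training_features)
print(hits)
```

In this toy run only `cand_1` passes both filters. Requiring full overlap is one conservative way to avoid trusting the model on chemistry it has never seen; a looser cutoff would trade that caution for more candidates.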

1. D. C. Weis, D. P. Visco, Jr. and J. L. Faulon, J. Mol. Graph. Model., 2008.

2. Factor XIa 1536 HTS Dose Response Confirmation, http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=846, Accessed January 14, 2008.