(729e) Prediction of siRNA Functionality Using Sequence and Thermodynamic Properties | AIChE

Authors 

Roth, C. M. - Presenter, Rutgers University
Sheth, P. - Presenter, Rutgers University
Jain, A. - Presenter, Rutgers University
Shah, V. - Presenter, Rutgers University
Dunn, S. M. - Presenter, Rutgers University


Short interfering RNAs (siRNAs) are widely used in the laboratory for gene function studies and are being developed clinically to target viral genes, inflammatory targets, and oncogenes. siRNA sequences are typically selected using software that relies on heuristics and short motifs with low predictive power. We have developed an approach to predicting siRNA functionality that combines mechanistic thermodynamic parameters with a novel representation of siRNA sequence. Specifically, the siRNA sequence is represented as a directed graph. We find that several graph metrics, including the diameter and the permanent, differ significantly between sequences found in a training set to be functional and those found to be non-functional. We used these graph properties as features in a K-nearest-neighbors support vector machine classification algorithm, along with features representing the thermodynamic accessibility of the complementary region on the target mRNA and the kinetic accessibility of unwinding the antisense strand of the siRNA, the latter represented by the difference in thermodynamic stability between the 5' and 3' termini. Applying this approach to a test set consisting of 103 functional and 93 non-functional sequences, we found that these features could distinguish functional sequences from non-functional ones with a leave-one-out cross-validation accuracy as high as 84.7%, significantly greater than that afforded by other classification approaches applied to the same data. Among the three types of features, the thermodynamics of target mRNA accessibility provided the greatest individual accuracy, but the three types were synergistic overall. Thus, combining mechanistic thermodynamic features with features based on pure sequence representation is quite promising for developing accurate siRNA prediction tools.
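
To make the graph-based features concrete, the following is a minimal sketch of how a diameter and a permanent might be computed from a sequence. The abstract does not specify how the directed graph is constructed, so this sketch assumes one simple possibility: the four nucleotides are nodes, and an edge i -> j exists whenever nucleotide i is immediately followed by nucleotide j in the sequence. The function names (`transition_graph`, `diameter`, `permanent`, `graph_features`) and the toy sequence are hypothetical illustrations, not the authors' implementation. The diameter here is taken over reachable ordered pairs only, and the permanent is that of the 4 x 4 adjacency matrix.

```python
from collections import deque
from itertools import permutations

NUCLEOTIDES = "ACGU"


def transition_graph(seq):
    """Build a 4x4 adjacency matrix for an ASSUMED graph construction:
    entry [i][j] is 1 if nucleotide i is immediately followed by
    nucleotide j somewhere in seq, else 0."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    adj = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq, seq[1:]):
        adj[index[a]][index[b]] = 1
    return adj


def diameter(adj):
    """Longest shortest-path length over all reachable ordered pairs
    of nodes, found by breadth-first search from every node."""
    n = len(adj)
    longest = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest


def permanent(matrix):
    """Permanent of a square matrix by direct summation over all
    permutations (fine for a 4x4 matrix: only 24 terms)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for row, col in enumerate(perm):
            product *= matrix[row][col]
        total += product
    return total


def graph_features(seq):
    """Return the two graph metrics named in the abstract as a dict."""
    adj = transition_graph(seq)
    return {"diameter": diameter(adj), "permanent": permanent(adj)}


if __name__ == "__main__":
    toy_sequence = "ACGUA"  # arbitrary toy input, not a real siRNA
    print(graph_features(toy_sequence))
```

In a classification pipeline of the kind described above, feature values like these would be combined with the thermodynamic features before training. Those thermodynamic terms are omitted here because they depend on folding and nearest-neighbor energy parameters that the abstract does not give.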