Interpretable Deep Learning Approaches to Uncover the Sequence Determinants of Cell-Type-Specific Transcription Factor Binding
International Conference on Epigenetics and Bioengineering
2018
2nd Epigenetics and Bioengineering Conference (EpiBio 2018)
Poster Session
Thursday, October 4, 2018 - 5:00pm to 7:05pm
Transcription factors (TFs) bind specific sequence motifs in regulatory DNA elements to activate or repress gene expression. Dysregulation of TF binding and transcription by non-coding genetic variation has been implicated in several human diseases. However, traditional TF motif discovery methods fail to explain a large fraction of disease-associated non-coding variants. To address this gap, we present FelisNet, an interpretable deep learning model of cell-type-specific binding for a large collection of TFs. FelisNet trains multi-task convolutional neural networks jointly on ENCODE ChIP-seq data for each TF across multiple cell lines. These multi-cell-type models predict binding significantly better than models trained on each cell type individually. We then apply two model interpretation methods, DeepLIFT and TF-MoDISco, to extract high-quality motifs and cell-type-specific active motif instances. DeepLIFT assigns importance scores to individual nucleotides to explain the model predictions, while TF-MoDISco clusters these importance scores to identify recurring motifs implicitly learned by the models. We find that FelisNet motifs are superior to those obtained by traditional motif discovery methods, particularly in capturing cofactor interactions, positional effects, and the influence of flanking nucleotides. Finally, we use FelisNet to analyze and explain genetic variation associated with variation in TF binding, chromatin accessibility, and complex traits and diseases.
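As a rough illustration of what per-nucleotide importance scores look like, the sketch below uses in silico mutagenesis, a simpler perturbation-based technique that is not DeepLIFT and is not taken from FelisNet. The scoring function `toy_model`, the 4-bp weight matrix `pwm`, and the example sequence are all hypothetical stand-ins for a trained network and real ChIP-seq input; only NumPy is assumed.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix (columns A, C, G, T)."""
    idx = [ALPHABET.index(base) for base in seq]
    return np.eye(4)[idx]

def toy_model(x, pwm):
    """Hypothetical stand-in for a trained network:
    the best weight-matrix match score over all windows of the sequence."""
    width = pwm.shape[0]
    return max(np.sum(x[i:i + width] * pwm)
               for i in range(x.shape[0] - width + 1))

def ism_importance(seq, pwm):
    """In silico mutagenesis (not DeepLIFT): the importance of a position is the
    reference score minus the mean score over its three alternative bases."""
    x = one_hot(seq)
    ref = toy_model(x, pwm)
    importance = np.zeros(len(seq))
    for pos in range(len(seq)):
        alt_scores = []
        for b in range(4):
            if x[pos, b] == 1:
                continue  # skip the reference base
            mutated = x.copy()
            mutated[pos] = 0
            mutated[pos, b] = 1
            alt_scores.append(toy_model(mutated, pwm))
        importance[pos] = ref - np.mean(alt_scores)
    return importance

# Hypothetical weight matrix favoring "GATA" (rows: motif positions; columns: A, C, G, T).
pwm = np.array([
    [-1, -1,  2, -1],  # G
    [ 2, -1, -1, -1],  # A
    [-1, -1, -1,  2],  # T
    [ 2, -1, -1, -1],  # A
], dtype=float)

seq = "CCCCGATACCCC"
scores = ism_importance(seq, pwm)
for base, score in zip(seq, scores):
    print(base, round(float(score), 2))
```

In this toy setting only the four motif positions receive nonzero scores. Clustering high-scoring stretches like this across many sequences is, loosely, the kind of input a motif-discovery step such as TF-MoDISco works from, although the actual method is considerably more involved.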