In silico Model for Mining the Cis-Regulatory Determinants of Tissue-Specific Gene Expression | AIChE

Authors 

Fraser, V. N. - Presenter, Oregon State University
Ansariola, M., Oregon State University
Filichkin, S., Oregon State University
Ivanchenko, M. G., Oregon State University
Bright, Z. A., Oregon State University
Gould, R. A., Oregon State University
Ozguc, O. R., Oregon State University
O'Neil, S., Oregon State University
Megraw, M., Oregon State University
Gene expression across tissues is regulated by a combination of determinants, including the binding of transcription factors (TFs) and other aspects of cellular state. Recent studies emphasize the importance of both genetic and epigenetic states: TF binding sites and the chromatin accessibility of those sites have emerged as potentially causal determinants of tissue specificity. To investigate the relative contributions of these determinants, we constructed three genome-scale datasets for both root and shoot tissues of the same Arabidopsis thaliana plants: TSS-seq data to identify transcription start sites (TSSs), OC-seq data to identify regions of open chromatin, and RNA-seq data to assess gene expression levels. For genes that are differentially expressed between root and shoot, we constructed a machine learning model to predict tissue of expression from chromatin accessibility and TF binding information upstream of TSS locations. The resulting model was highly accurate (auROC and auPRC both over 90%), and our analysis of model contributions (feature weights) strongly suggests that patterns of TF binding sites within ~500 nt TSS-proximal regions are the predominant explanatory features for tissue of expression in most cases. Thus, in plants, cis-regulatory control of tissue-specific gene expression appears to be determined primarily by TSS-proximal sequences, and only rarely by distal enhancer-like regions of accessible chromatin. This study highlights the exciting future possibility of a design process based on native TF binding sites for the tissue-specific targeting of plant gene promoters.
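
The modeling idea in the abstract (predict tissue of expression from TF binding and accessibility features, then read off the feature weights) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the model type, so a plain logistic regression fitted by gradient descent stands in for it, and the feature names and gene values are invented toy data, not results from the study.

```python
import math

# Hypothetical toy features per gene (invented values, not study data):
# [proximal TF site A count, proximal TF site B count, distal open-chromatin flag]
# Label 1 = root-expressed, 0 = shoot-expressed.
GENES = [
    ([1, 0, 1], 1), ([1, 0, 0], 1), ([2, 0, 1], 1), ([2, 0, 0], 1),
    ([0, 1, 1], 0), ([0, 1, 0], 0), ([0, 2, 1], 0), ([0, 2, 0], 0),
]
FEATURE_NAMES = ["proximal_site_A", "proximal_site_B", "distal_open_chromatin"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Linear score; positive values favor the root label."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_logistic(data, steps=500, lr=0.5):
    """Plain batch gradient descent on the logistic loss (assumed model)."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = sigmoid(score(weights, bias, x)) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


weights, bias = fit_logistic(GENES)
labels = [y for _, y in GENES]
scores = [score(weights, bias, x) for x, _ in GENES]

# Rank features by absolute weight, largest first.
ranked = sorted(zip(FEATURE_NAMES, weights), key=lambda fw: abs(fw[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
print("auROC on the toy training set:", auroc(scores, labels))
```

In this toy set the distal flag is balanced across both tissues by construction, so its weight stays near zero while the two proximal site counts carry the signal. That mirrors the kind of conclusion the abstract draws from feature weights, but it is built into the invented data and says nothing about the real datasets. The score is also computed on the training genes only; a real evaluation would use held-out genes.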