In silico Model for Mining the Cis-Regulatory Determinants of Tissue-Specific Gene Expression
International Conference on Plant Synthetic Biology and Bioengineering
2020
4th International Conference on Plant Synthetic Biology, Bioengineering, and Biotechnology
General Submissions
Genome Scale Engineering
Friday, October 30, 2020 - 12:00pm to 12:25pm
Gene expression across tissues is regulated by a combination of determinants, including the binding of transcription factors (TFs), along with other aspects of cellular state. Recent studies emphasize the importance of both genetic and epigenetic states: TF binding sites and the chromatin accessibility of those sites have emerged as potentially causal determinants of tissue specificity. To investigate the relative contributions of these determinants, we constructed three genome-scale datasets for both root and shoot tissues of the same Arabidopsis thaliana plants: TSS-seq data to identify transcription start sites (TSSs), OC-seq data to identify regions of open chromatin, and RNA-seq data to assess gene expression levels. For genes that are differentially expressed between root and shoot, we constructed a machine learning model to predict the tissue of expression from chromatin accessibility and TF binding information upstream of TSS locations. The resulting model was highly accurate (auROC and auPRC both over 90%), and our analysis of model contributions (feature weights) strongly suggests that patterns of TF binding sites within ~500 nt TSS-proximal regions are the predominant explainers of tissue of expression in most cases. Thus, in plants, cis-regulatory control of tissue-specific gene expression appears to be determined primarily by TSS-proximal sequences, and only rarely by distal enhancer-like accessible chromatin regions. This study highlights the exciting future possibility of a native TF site-based design process for the tissue-specific targeting of plant gene promoters.
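The general idea of predicting tissue of expression from promoter features and then reading off feature weights can be illustrated with a small sketch. The abstract does not name the model type, the feature encoding, or the data format, so everything below is an assumption made for illustration: the data are synthetic, the feature names (`motif_A_proximal`, `motif_B_proximal`, `distal_open_chromatin`) are hypothetical, the label encoding (1 = root, 0 = shoot) is arbitrary, and plain logistic regression stands in for whatever model the authors used.

```python
import math
import random

# Hypothetical features: TF motif counts within ~500 nt upstream of the TSS,
# plus one distal open-chromatin flag. None of these names come from the study.
FEATURES = ["motif_A_proximal", "motif_B_proximal", "distal_open_chromatin"]

def make_synthetic_genes(n, rng):
    """Toy data only. Assumed encoding: label 1 = root, 0 = shoot."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # By construction, motif A is enriched in "root" genes, motif B in "shoot" genes.
        motif_a = rng.randint(0, 3) + (2 if label == 1 else 0)
        motif_b = rng.randint(0, 3) + (2 if label == 0 else 0)
        distal = rng.randint(0, 1)  # uninformative by construction
        rows.append([float(motif_a), float(motif_b), float(distal)])
        labels.append(label)
    return rows, labels

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, row):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def train_logistic(rows, labels, epochs=300, lr=0.05):
    """Batch gradient descent on the mean logistic loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, y in zip(rows, labels):
            err = predict(weights, bias, row) - y
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
train_x, train_y = make_synthetic_genes(400, rng)
test_x, test_y = make_synthetic_genes(200, rng)

weights, bias = train_logistic(train_x, train_y)
test_auc = auroc([predict(weights, bias, r) for r in test_x], test_y)
feature_weights = dict(zip(FEATURES, weights))

print(f"held-out auROC: {test_auc:.3f}")
for name, w in feature_weights.items():
    print(f"{name}: {w:+.3f}")
```

In this toy setup the proximal motif features receive the largest-magnitude weights because the synthetic labels were generated from them, while the distal flag carries no signal. That outcome is built into the simulated data and says nothing about the real Arabidopsis datasets; the sketch only shows how feature weights from a linear classifier can be inspected to compare the contributions of different feature groups.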