A Massively Parallel Assay for Studying 5' UTR Dependent Translation Regulation in Mammalian Cells
Synthetic Biology Engineering Evolution Design SEED
2017
2017 Synthetic Biology: Engineering, Evolution & Design (SEED)
Poster Session
The 5’ untranslated region (UTR) encodes sequence elements involved in translation regulation, such as translation initiation motifs, upstream start codons, upstream open reading frames (uORFs), and RNA-binding protein recognition sites. Nonetheless, the rules by which 5’ UTRs exert their effects are poorly understood and, in particular, a quantitative understanding of the sequence-function relationship is lacking. To address this problem, we developed a massively parallel assay that allows us to study millions of 5’ UTR variants and their influence on translation in a single experiment. We collect quantitative data using a combination of polysome profiling and a modified ribosome profiling technique, and then apply machine learning algorithms, such as convolutional neural networks (CNNs), to build a predictive model of the 5’ UTR. We built in vitro transcribed mRNA libraries that contain a randomized 50-mer region in the 5’ UTR followed by an EGFP coding sequence. Our modified ribosome profiling approach uses 4-thiouridine triphosphate (s4UTP) in the construction of the library. After transfection into mammalian cells, mRNA modified with s4U can be biotinylated and pulled down with streptavidin-coated beads. This enrichment procedure ensures that only ribosome footprints from our library mRNAs are sequenced, rather than those from native mRNAs. The ribosome profiling data show in detail how ribosomes interact with sequence motifs within our 5’ UTRs. In parallel, we performed polysome profiling on lysates from cells transfected with the EGFP library. The output of polysome profiling is a polysome profile, which serves as a measure of translation efficiency by indicating the number of ribosomes on each transcript. We trained a convolutional neural network on a 300,000-member library and the corresponding mean ribosome loads to predict translation efficiency from a 50-mer sequence in the 5’ UTR. 
The model reached an accuracy of 90%, and several motifs were identified from the first convolutional layer of the CNN, including AUGs, the Kozak consensus sequence, stop codons, and some potential RNA-binding protein recognition sites.
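The data preparation implied by the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that a 50-mer is presented to a CNN as a one-hot matrix over A, C, G, U, and that mean ribosome load is a read-weighted average of the ribosome count assigned to each polysome fraction; the abstract does not give either definition. The function names (`one_hot_encode`, `mean_ribosome_load`, `upstream_aug_positions`) and the example numbers are hypothetical.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGU"  # assumed channel order for the one-hot encoding


def one_hot_encode(utr: str) -> List[List[int]]:
    """Encode an RNA sequence as one 4-element one-hot row per base (A, C, G, U)."""
    rows = []
    for base in utr.upper().replace("T", "U"):  # accept DNA-style input as well
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


def mean_ribosome_load(reads_per_fraction: Sequence[float],
                       ribosomes_per_fraction: Sequence[int]) -> float:
    """Read-weighted average ribosome count across polysome fractions.

    Assumed definition: sum(reads_i * ribosomes_i) / sum(reads_i).
    """
    if len(reads_per_fraction) != len(ribosomes_per_fraction):
        raise ValueError("inputs must have the same length")
    total = sum(reads_per_fraction)
    if total <= 0:
        raise ValueError("total read count must be positive")
    weighted = sum(r * n for r, n in zip(reads_per_fraction, ribosomes_per_fraction))
    return weighted / total


def upstream_aug_positions(utr: str) -> List[int]:
    """Return 0-based start positions of every AUG in the sequence."""
    seq = utr.upper().replace("T", "U")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


if __name__ == "__main__":
    # Hypothetical variant: reads observed in fractions carrying 0, 1 and 2 ribosomes.
    print(mean_ribosome_load([10, 30, 60], [0, 1, 2]))
    print(upstream_aug_positions("GGAUGCCAUGA"))
    print(one_hot_encode("ACGU"))
```

In a setup like this, each one-hot matrix would be a model input and the corresponding mean ribosome load the regression target. A simple motif scan such as `upstream_aug_positions` gives a baseline against which filters learned by the first convolutional layer could be compared.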