Deep Learning of the Regulatory Grammar of Yeast 5’ Untranslated Regions from 500,000 Random Sequences
2017 Synthetic Biology: Engineering, Evolution & Design (SEED)
General Submissions
Session 2: High Throughput Design Space Exploration
Tuesday, June 20, 2017 - 1:45pm to 2:15pm
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. We have generated a model that predicts the translational efficiency of the 5’ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of nearly half a million random 5’ UTRs, each 50 nucleotides long, and assayed them in a single massively parallel growth selection experiment. The resulting data have allowed us to quantify the impact on translation of Kozak sequence composition, upstream open reading frames (uORFs) and secondary structure. With these data, we have trained a convolutional neural network on the random library and validated it by predicting the translational efficiency of 5’ UTRs that occur natively in yeast. The model was additionally used to computationally evolve highly translating 5’ UTRs. We have confirmed experimentally that the great majority of the evolved sequences lead to higher translation rates than the starting sequences, demonstrating the predictive utility of this model.
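The computational evolution step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the input encoding, or the search procedure, so everything here is an assumption. The function names (`one_hot`, `count_upstream_atg`, `toy_score`, `evolve`) are hypothetical, the one-hot encoding is simply a common way to present sequences to a convolutional network, and `toy_score` is a hand-written placeholder standing in for a trained model. The search shown is a plain greedy hill climb over single-nucleotide substitutions.

```python
import random

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as rows of 4 indicators in A, C, G, T order."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq]


def count_upstream_atg(utr):
    """Count ATG triplets inside the UTR; each is a potential uORF start."""
    return sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG")


def toy_score(utr):
    """Hypothetical stand-in for a trained model (NOT the model in the abstract).

    Penalizes upstream ATGs and rewards an A at position -3 relative to the
    start codon, i.e. the third-from-last nucleotide of the UTR.
    """
    score = -1.0 * count_upstream_atg(utr)
    if len(utr) >= 3 and utr[-3] == "A":
        score += 0.5
    return score


def evolve(utr, score_fn, rounds=200, seed=0):
    """Greedy hill climb: keep a random point mutation only if the score rises."""
    rng = random.Random(seed)
    best, best_score = utr, score_fn(utr)
    for _ in range(rounds):
        pos = rng.randrange(len(best))
        new_base = rng.choice([b for b in ALPHABET if b != best[pos]])
        candidate = best[:pos] + new_base + best[pos + 1:]
        cand_score = score_fn(candidate)
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best, best_score


if __name__ == "__main__":
    # Draw a random 50-nucleotide UTR, matching the library's sequence length.
    rng = random.Random(1)
    start = "".join(rng.choice(ALPHABET) for _ in range(50))
    evolved, score = evolve(start, toy_score)
    print(start, toy_score(start))
    print(evolved, score)
```

In practice `score_fn` would be replaced by the prediction of a model trained on the measured library, with `one_hot` (or an equivalent encoding) preparing each candidate for it. Because the loop only accepts improvements, the returned score is never lower than the starting score under whichever scoring function is supplied; whether that translates into higher measured translation depends entirely on how well the scoring function predicts the experiment.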