Direct RNA Sequencing to Detect Modified Nucleotides Using Oxford Nanopore | AIChE

Authors 

Timp, W. - Presenter, Johns Hopkins University
Direct RNA sequencing reads can provide exon connectivity, accurate measurement of gene fusion events, an estimate of poly-A tail length, and direct detection of RNA modifications. As an international consortium of Oxford Nanopore MinION and GridION users, we have generated a comprehensive dataset of 13M direct RNA and 24M cDNA sequences and alignments from poly-A RNA isolated from the human GM12878 reference cell line. We have made this dataset publicly available here: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/RNA.md. From this dataset, we aim to reanalyze the raw current data with Nanopolish (https://github.com/jts/nanopolish) to discriminate modified from canonical nucleotides. To accurately detect modifications on native RNA molecules, we first need to establish empirically how different modifications in variable sequence contexts modulate the nanopore current, as in our methodology for detecting 5mC in DNA (Simpson et al. 2017). This method requires the generation and sequencing of training sets. To this end, we have generated training sets using multiple methods to characterize N6-methyladenosine (m6A) in different sequence contexts, focusing our GM12878 validation on the METTL3 motif (GGm6ACU). First, we used in vitro transcription with mixes of modified and unmodified nucleotides to generate RNAs with differing levels of m6A. Next, we used commercial direct synthesis (TriLink) of RNA oligos containing precisely placed modified and unmodified METTL3 motifs, then ligated these oligos to a “handle” RNA strand with a poly-A tail to increase transcript length and enable sequencing. Using these approaches, we identified variable current signatures in regions with known modifications, and we are currently expanding our training sets to broaden the range of nucleotide contexts covered and to enable calling multiple modifications simultaneously.
We will present our current work on these training sets and their application to our GM12878 dataset.
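
As an illustration of the idea described above, the following minimal Python sketch locates candidate METTL3 motif sites (GGACU) in a transcript and scores observed current levels against two Gaussian models, one for the modified and one for the canonical nucleotide. It is not Nanopolish's implementation. The function names, the single-Gaussian-per-site simplification, and all model parameters are hypothetical and chosen only for illustration; in practice the models would be learned from training sets such as those described in the abstract.

```python
import math
import re

# Motif from the abstract (GGm6ACU); the central A is the candidate m6A site.
METTL3_MOTIF = "GGACU"


def find_motif_sites(rna_seq, motif=METTL3_MOTIF):
    """Return 0-based positions of the candidate A in every motif match.

    Assumes the candidate site is the first 'A' in the motif.
    Overlapping matches are found with a zero-width lookahead.
    """
    offset = motif.index("A")
    seq = rna_seq.upper().replace("T", "U")  # accept cDNA-style input
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() + offset for m in pattern.finditer(seq)]


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood_ratio(currents, modified_model, canonical_model):
    """Sum of log P(x | modified) - log P(x | canonical) over current levels.

    Each model is a hypothetical (mean, sd) pair in picoamperes.
    A positive result favours the modified model.
    """
    return sum(
        gaussian_logpdf(x, *modified_model) - gaussian_logpdf(x, *canonical_model)
        for x in currents
    )


if __name__ == "__main__":
    sites = find_motif_sites("AAGGACUUGGACTCC")
    print("candidate m6A positions:", sites)

    # Hypothetical current levels and model parameters, for illustration only.
    llr = log_likelihood_ratio(
        currents=[100.0, 101.0],
        modified_model=(100.0, 2.0),
        canonical_model=(105.0, 2.0),
    )
    print("log-likelihood ratio:", llr)
```

A positive log-likelihood ratio would suggest that the observed currents fit the modified model better than the canonical one. A real caller would need per-k-mer models and would have to account for signal alignment and neighbouring bases, which this sketch omits.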