(27an) Nanopore Sequencing of 8-Letter Xenonucleic Acids | AIChE

Authors 

Marchand, J. A., University of California, Berkeley
The 4-letter genetic alphabet found in Nature (A, T, G, C) is the fundamental solution for storing and transferring genetic information. Though elegant, this natural code is restricted to four building blocks. This limitation has motivated the development of chemically synthesized nucleic acid analogs known as xenonucleotides, which can form additional base pairs in DNA. While an expanded alphabet holds great promise for biotechnology, the technologies that exist for standard DNA bases are costly or limited when applied to xenonucleotides. This is particularly true for sequencing, where xenonucleotides must rely on low-throughput methods such as liquid chromatography-mass spectrometry (LC-MS) assays. In this work, we present methods for sequencing DNA containing 8 hydrogen-bonding nucleobases (A, T, G, C, B, S, P, Z). DNA libraries are used to build k-mer models for commercial nanopore sequencing platforms. We developed Xenomorph, a package for end-to-end data processing from raw nanopore data to basecalling, enabling facile sequencing of xenonucleotides. We find that although sequencing accuracy depends on sequence context, a consensus read gives >96% accuracy for all four xenonucleotides. The methods described here bring xenonucleotides to third-generation sequencing, greatly lowering the barrier to exploring expanded DNA alphabets for use in biosensors, diagnostics, and synthetic biology.
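To make the abstract's two technical ideas concrete, here is a minimal, hypothetical Python sketch. It is not Xenomorph's code or API, and every function name below is invented for illustration. It shows (1) why an 8-letter alphabet enlarges a k-mer model (a model must cover 8^k rather than 4^k distinct k-mers, so a k = 6 window, an assumed length chosen here only as an example, grows from 4,096 to 262,144 entries), (2) a per-k-mer average current as a simple stand-in for building a k-mer model from library reads whose sequence is known, and (3) a per-position majority-vote consensus over reads. The sketch assumes the reads are already aligned, gap-free, and of equal length. Real nanopore pipelines need signal segmentation and alignment steps that are omitted here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Expanded alphabet from the abstract: four standard + four xenonucleotide bases.
ALPHABET = "ATGCBSPZ"


def kmer_space(alphabet: str, k: int) -> int:
    """Number of distinct k-mers a pore model must cover for this alphabet."""
    return len(alphabet) ** k


def kmers(seq: str, k: int) -> list[str]:
    """Overlapping k-mers of seq, one per position where the pore window fits."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_model(observations):
    """Average the observed current for each k-mer (hypothetical helper).

    observations: iterable of (kmer, current) pairs, assumed to come from
    signal segments already assigned to a known library sequence.
    Returns {kmer: (mean current, population standard deviation)}.
    """
    levels = defaultdict(list)
    for kmer, current in observations:
        levels[kmer].append(current)
    return {km: (mean(v), pstdev(v)) for km, v in levels.items()}


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads.

    Ties go to the base seen first at that position, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def accuracy(called: str, reference: str) -> float:
    """Fraction of positions where the call matches the reference."""
    matches = sum(c == r for c, r in zip(called, reference))
    return matches / len(reference)
```

As a usage example, three toy reads `"ATBGC"`, `"ATSGC"`, and `"ATBGA"` each match the reference `"ATBGC"` at 4 or 5 of 5 positions, but their errors fall at different positions, so `consensus` recovers `"ATBGC"` exactly. This illustrates, in a simplified form, why a consensus can be more accurate than any individual context-dependent read.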