(357e) Third-Generation Sequencing of 12-Letter DNA (ATGCBSPZXKJV) | AIChE

Authors 

Marchand, J. A., University of California, Berkeley
Four-letter DNA (A, T, G, C), as found on Earth, is the fundamental biomolecule for the storage, transfer, and evolution of biological information. Although Nature has limited itself to these four building blocks, the Watson-Crick framework theoretically allows up to 12 DNA bases. Xenonucleic acids are chemically synthesized nucleic acid analogs that expand the existing alphabet, probing the limits of natural biology and of nucleic acid technologies. Because they offer additional biochemical properties, xenonucleic acids hold great potential for a new generation of therapeutics, diagnostics, and biomaterials. However, molecular tools for manipulating xenonucleic acids lag decades behind those available for the standard bases. In this work, we present methods for sequencing DNA with 12 hydrogen-bonding nucleobases (A, T, G, C, B, S, P, Z, X, K, J, V). We use diverse DNA libraries to build models for commercial nanopore sequencing platforms, and we develop a software package to readily basecall the individual xenonucleotides. Finally, we demonstrate the first sequencing of 12-letter DNA, showing that our models can discern this expanded alphabet. The strategies described here are versatile and cost-efficient; they expand the molecular toolset for working with modified nucleotides and bring supernumerary genetics closer to robust use in biotechnology and synthetic biology.
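Any sequence-processing tool for an expanded alphabet must at least know which letters pair with which, for example to handle reads from either strand. The sketch below is a minimal illustration, not the authors' software package. It assumes (this pairing is not stated in the abstract) that the 12 letters in the title, ATGCBSPZXKJV, are listed as six Watson-Crick pairs in order: A:T, G:C, B:S, P:Z, X:K, J:V. The names `build_complement` and `reverse_complement` are hypothetical.

```python
# Hypothetical pairing: assumes the 12 letters in the title (ATGCBSPZXKJV)
# are listed as six Watson-Crick pairs, consecutive letters pairing together.
ALPHABET = "ATGCBSPZXKJV"


def build_complement(alphabet: str) -> dict[str, str]:
    """Map each base to its partner by pairing consecutive letters."""
    if len(alphabet) % 2:
        raise ValueError("alphabet must contain an even number of bases")
    comp: dict[str, str] = {}
    for i in range(0, len(alphabet), 2):
        a, b = alphabet[i], alphabet[i + 1]
        comp[a] = b  # pairing is symmetric: a pairs with b ...
        comp[b] = a  # ... and b pairs with a
    return comp


COMPLEMENT = build_complement(ALPHABET)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 12-letter DNA string."""
    try:
        # Read the strand backwards and replace each base with its partner.
        return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unknown base: {err.args[0]}") from None


if __name__ == "__main__":
    print(reverse_complement("ATGBPXJ"))  # prints VKZSCAT
```

Under the assumed pairing, every base is its partner's partner, so applying `reverse_complement` twice returns the original sequence. A different pairing convention only requires changing `ALPHABET`.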