Researchers recently devised a method to expand DNA’s data storage capacity by artificially extending the DNA alphabet. The work is part of a broader effort to store computer information in DNA.

“DNA is 1 million times denser than the densest mainstream digital storage device,” Luis Ceze, a computer science and engineering professor who studies DNA storage at the University of Washington, told Lifewire in an email interview.

Long-Lasting Storage

DNA is built from four chemical bases, adenine, guanine, cytosine, and thymine, usually abbreviated A, G, C, and T. Their order along the famed double helix forms sequences that scientists can read through a process called sequencing. The researchers expanded DNA’s already broad capacity for information storage by adding seven synthetic nucleobases to the existing four-letter lineup, for 11 letters in total.

“Imagine the English alphabet,” Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology and a co-author on this study, said in a news release. “If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce limitless word combinations. That’s the same with DNA. Instead of converting zeroes and ones to A, G, C, and T, we can convert zeroes and ones to A, G, C, T, and the seven new letters in the storage alphabet.”

The research team was the first to use chemically modified nucleotides for information storage in DNA, but reading those modified letters back required a new decoding approach. The team used machine learning and artificial intelligence (AI) to build a method for processing DNA sequencing readouts that distinguishes the modified chemicals from the natural ones.

“We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly,” Chao Pan, a graduate student at the University of Illinois Urbana-Champaign and a co-author on this study, said in the news release. “The deep learning framework as part of our method to identify different nucleotides is universal, which enables the generalizability of our approach to many other applications.”

One point in DNA’s favor as a storage medium is its durability. “Think 1000s of years—remember the ancient DNA that’s been found,” Ceze said. Scientists can sequence fossilized strands to uncover genetic histories and breathe life into long-lost landscapes.

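Tabatabaei’s point about converting zeroes and ones into letters can be made concrete with a small sketch. It treats a message’s bytes as one large number and writes that number out in base 4 (the natural A, C, G, T alphabet) or base 11 (the extended alphabet). This is a textbook base-conversion illustration, not the researchers’ actual encoding scheme: the placeholder symbols X1 through X7 and the function names are hypothetical, and real DNA storage pipelines add error correction and sequence constraints that are ignored here.

```python
import math

# Natural DNA alphabet: four letters.
NATURAL = ["A", "C", "G", "T"]

# Extended alphabet: the four natural letters plus seven placeholder symbols.
# "X1".."X7" are hypothetical stand-ins for the study's synthetic bases.
EXTENDED = NATURAL + [f"X{i}" for i in range(1, 8)]


def bits_per_symbol(alphabet: list[str]) -> float:
    """Information capacity of one position, in bits: log2(alphabet size)."""
    return math.log2(len(alphabet))


def bytes_to_symbols(data: bytes, alphabet: list[str]) -> list[str]:
    """Write the bytes, read as one big integer, in base len(alphabet)."""
    base = len(alphabet)
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    # Most significant digit first; the value zero encodes as a single symbol.
    return digits[::-1] or [alphabet[0]]


def symbols_to_bytes(symbols: list[str], alphabet: list[str], length: int) -> bytes:
    """Invert bytes_to_symbols; the original byte length must be supplied."""
    base = len(alphabet)
    index = {s: i for i, s in enumerate(alphabet)}
    n = 0
    for s in symbols:
        n = n * base + index[s]
    return n.to_bytes(length, "big")


message = b"DNA"
for name, alphabet in [("4-letter", NATURAL), ("11-letter", EXTENDED)]:
    strand = bytes_to_symbols(message, alphabet)
    # Round trip: decoding recovers the original bytes.
    assert symbols_to_bytes(strand, alphabet, len(message)) == message
    print(f"{name}: {len(strand)} symbols, {bits_per_symbol(alphabet):.2f} bits/symbol")
```

Each position in an N-letter alphabet carries log2(N) bits, so 11 letters give about 3.46 bits per position versus 2 for A, C, G, and T. For the three-byte message above, that means 12 positions with the natural alphabet and 7 with the extended one.
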
“At a time when we are facing unprecedented climate challenges, the importance of sustainable storage technologies cannot be overestimated,” Olgica Milenkovic, a professor of electrical and computer engineering and a co-author of the study, said in the news release. “New, green technologies for DNA recording are emerging that will make molecular storage even more important in the future.”

Storing All Our Stuff

DNA could be the perfect place to keep humanity’s burgeoning data. A recent report estimates that in 2020, people generated data equivalent to 400 billion terabytes, a volume that would fit in 40 ‘shoeboxes’ of DNA data storage.

Storing data in DNA is getting closer to reality. “For storage of small valuable data, DNA is viable today—think 100s of MBs,” Ceze said. Last year, fifteen tech companies and institutions formed an alliance to advance DNA data storage. Microsoft said it had demonstrated a fully automated system that can store data in DNA and retrieve it; the company has also stored 1GB of data in DNA and recovered it.

But Ceze predicts that it will be five to 10 years before DNA can compete with mainstream backup solutions like optical discs and hard drives. Still, the last three years have seen an explosion of interest in the technology.

“DNA will never become obsolete,” Ceze said. “There is a natural ‘air gap,’ which is desirable for security. These are very desirable properties for long-term storage.”