US-based startup Catalog recently revealed that it had successfully stored all 16 gigabytes of Wikipedia’s English-language text on tiny DNA strands in a laboratory vial, the latest demonstration of the potential of synthetic DNA as a medium for storing digital data.

DNA data storage is here

The accomplishment marks a new record for the amount of digital information stored on DNA. To store the Wikipedia data, Catalog used prefabricated synthetic DNA strands and a DNA writing machine that currently writes data at 4 megabits per second, a rate Catalog wants to increase at least a thousandfold.
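To put the quoted write rate in perspective, the following back-of-the-envelope sketch estimates how long 16 gigabytes would take at 4 megabits per second, and at a rate a thousand times higher. It assumes decimal units (1 gigabyte = 10^9 bytes, 1 megabit = 10^6 bits) and a sustained rate with no overhead; Catalog's actual write time is not stated here, so the figures are illustrative only.

```python
# Rough estimate only: decimal units and a sustained, overhead-free
# write rate are assumptions, not figures reported by Catalog.
DATA_BYTES = 16 * 10**9        # 16 gigabytes of Wikipedia text
WRITE_RATE_BPS = 4 * 10**6     # 4 megabits per second
SPEEDUP = 1000                 # "at least a thousand times faster"


def write_time_seconds(num_bytes: int, bits_per_second: int) -> float:
    """Return the time needed to write num_bytes at the given bit rate."""
    return num_bytes * 8 / bits_per_second


current = write_time_seconds(DATA_BYTES, WRITE_RATE_BPS)
target = write_time_seconds(DATA_BYTES, WRITE_RATE_BPS * SPEEDUP)

print(f"At today's rate:  {current / 3600:.1f} hours")   # 8.9 hours
print(f"At 1,000x faster: {target:.0f} seconds")         # 32 seconds
```

Under these assumptions, the thousandfold speed-up would turn a job of roughly nine hours into one of about half a minute.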

Catalog is one of a growing number of technology companies that see DNA as a potential solution to the challenge of storing the rising volumes of digital data generated by smartphones, tablets and internet-connected sensors. According to technology firm Cisco, the world will generate some 4.8 zettabytes of digital data by 2022, up from 1.5 zettabytes in 2017. Existing storage technologies such as magnetic tape, disk drives, and flash memory will struggle to keep pace with this rapidly expanding requirement. The attractions of DNA as a medium for digital data storage include its longevity – DNA lasts 1,000 times longer than silicon. In addition, DNA offers much higher storage density, with a single cubic millimetre of DNA able to hold a quintillion bytes (one exabyte) of data.

DNA data storage works by taking digital content that is typically stored using a binary code of 0s and 1s, and converting it into the genetic code of As, Cs, Gs, and Ts that make up DNA’s chemical building blocks. The converted DNA code is then used to create synthetic strands of DNA, which can be put into cold storage. When needed, the DNA strands can be removed from cold storage and their information read using a DNA sequencing machine. The DNA sequence is then translated back into binary format.
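The binary-to-base conversion can be illustrated with a minimal sketch. It assumes the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T); this mapping and the function names `encode` and `decode` are hypothetical choices for illustration, not Catalog's scheme. Practical systems also add error correction and avoid sequences that are hard to synthesize or read, which this sketch omits.

```python
# Illustrative two-bits-per-base mapping (an assumption, not Catalog's method):
# index 0 -> A (00), 1 -> C (01), 2 -> G (10), 3 -> T (11).
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Convert bytes to a string of bases, four bases per byte."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)   # most significant bit pair first
    )


def decode(strand: str) -> bytes:
    """Convert a string of bases back to the original bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)            # CAGACGGC
print(decode(strand))    # b'Hi'
```

Here the encoding step stands in for synthesis and the decoding step for sequencing followed by translation back to binary.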

The high cost of DNA sequencing

However, existing DNA data storage techniques face challenges, including the prohibitively high cost of DNA sequencing technology and the slow speed at which digital data is converted to DNA and the stored DNA code is sequenced and decoded back into digital format. Catalog is addressing these challenges with a method that, it claims, is faster and cheaper than chemical synthesis approaches. First, Catalog separates the process of synthesizing DNA molecules from that of encoding the digital data. Second, Catalog relies on a relatively small pool of pre-synthesized DNA molecules – fewer than 200 – that can be combined in an exponential number of ways. The approach requires less DNA synthesis, speeding up encoding and reducing its overall cost.
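How Catalog actually combines its components is not described here, so the following is only a rough illustration of why a small pool can yield an exponential number of combinations. It assumes, hypothetically, that each assembly is an ordered chain of k components drawn from a pool of n, giving n^k distinct assemblies. The pool size of 100 is a placeholder consistent with "fewer than 200", and the function names are invented for this sketch.

```python
import math


def distinct_assemblies(pool_size: int, chain_length: int) -> int:
    """Number of ordered chains of chain_length components from the pool."""
    return pool_size ** chain_length


def addressable_bits(pool_size: int, chain_length: int) -> float:
    """Bits needed to index every distinct chain: k * log2(n)."""
    return chain_length * math.log2(pool_size)


# Hypothetical pool size; the only reported figure is "fewer than 200".
POOL_SIZE = 100

for chain_length in (1, 2, 5, 10):
    count = distinct_assemblies(POOL_SIZE, chain_length)
    bits = addressable_bits(POOL_SIZE, chain_length)
    print(f"chain of {chain_length:>2}: {count:.3e} assemblies (~{bits:.1f} bits)")
```

The point of the sketch is that the number of distinct assemblies grows exponentially with chain length while the number of molecules that must be synthesized in advance stays fixed.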

Last year, Catalog announced that it had raised US$9 million from investors to help commercialise its DNA sequencing and storage technology. Although it has said little about who it expects will use the technology, Catalog is currently in discussions with government agencies, major international science projects, oil and gas firms, and businesses from the media and entertainment, finance, and other industries, with a view to lining up pilot agreements.