DNA could offer an astounding memory data density of 215 petabytes per gram, which is orders of magnitude higher than previous reports.
Exploring the potential of DNA as a data memory has not quite yet reached the stage where a blob of DNA can have some wires attached to it to write and read its data content, although good progress has already been made.
In their latest work, published in Science magazine, Yaniv Erlich and Dina Zielinski from Columbia University and the New York Genome Centre mixed some clever biochemistry with some leading-edge communications data encoding techniques and added a dash of processing power. The result, under the heading of “DNA Fountain,” is a demonstration of the ability to use DNA to store a complete operating system of 1.4MB, a movie and other files, for a total of more than 2MB.
This is now possible because, at the same time, they have brought a new level of efficiency and reliability to the technique. If the DNA-data memory must have an acronym to fit it into the SRAM, DRAM, NVRAM memory spectrum, then biologic archival read rarely memory (BARRM) might be one choice.
Figure 1: DNA memory 2bits/link (Source: Ron Neale)
As illustrated in Figure 1, within the DNA helix each cross-linking nucleotide (nt) contains one of the four nucleobases (bases). The ability to place them selectively in order along a DNA helix backbone offers the possibility of a binary data memory of two bits per base or nucleotide (i.e. 00, 01, 10 and 11). The bonds between the bases linking the DNA spiral backbones are characterised by either two or three hydrogen bonds.
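To make the two bits per base idea concrete, here is a minimal Python sketch. The particular assignment of bit pairs to bases (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; any fixed one-to-one assignment would serve equally well.

```python
# Assumed mapping of bit pairs to bases, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_bases(data: bytes) -> str:
    """Translate each byte into four bases (two bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: four bases back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_bases(b"DNA")
print(strand)                  # 12 bases for 3 bytes
print(bases_to_bytes(strand))  # back to the original bytes
```

At two bits per base, one byte always occupies four bases, so a strand of n bases carries at most n/4 bytes before any overhead for addressing or error handling.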
The work suggests that DNA could offer an eye-catching memory data density of 215 petabytes per gram, orders of magnitude higher than previous reports.
At its core, the DNA-data memory methodology relies on a technique used in data communication where, instead of repeating the transmission when an erroneous piece of a data stream is received, enough bytes are transmitted to allow the correct data to be extracted by statistical analysis. The technique is based on what are called “Fountain” codes.
Fountain codes allow data (such as a file) to be divided into an unlimited number of encoded pieces, in a form which allows them to be reassembled into the original file given any subset of the encoded pieces of data, provided that you have a little more than the size of the original file.
In data communications, and now for memory, a “Fountain” of suitably encoded data is fired at a receiver, which is able to reassemble the file by catching enough “droplets” (the bits of encoded data). It is immaterial which bits of encoded data are received or missed. The water analogy using “fountains,” “droplets” and “buckets” is now part of the language of these techniques. A bucket full of droplets will give you enough information to extract the original data.
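The droplet idea can be sketched in a few lines of Python. This is a toy fountain code in the style of Luby transform (LT) codes, not the encoder used in the published work: each droplet is the XOR of a pseudo-random subset of file chunks, the subset is recreated from the droplet's seed, and a simple "peeling" decoder recovers the chunks. The degree list, the chunk size, the 30% loss rate and all function names are assumptions made for illustration; practical codes use a carefully tuned degree distribution.

```python
import random


def droplet_indices(seed, n_chunks):
    """Recreate, from the seed alone, which chunks a droplet combines."""
    rng = random.Random(seed)
    degree = rng.choice([1, 1, 2, 2, 2, 3, 4])  # simplified distribution (assumption)
    return set(rng.sample(range(n_chunks), min(degree, n_chunks)))


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(chunks, seed):
    """One droplet: the seed plus the XOR of the chunks that seed selects."""
    payload = bytes(len(chunks[0]))
    for i in droplet_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def decode(droplets, n_chunks):
    """Peeling decoder: return the chunks, or None if the bucket is not yet full."""
    solved = {}
    pending = [[droplet_indices(seed, n_chunks), payload] for seed, payload in droplets]
    progress = True
    while progress and len(solved) < n_chunks:
        progress = False
        for item in pending:
            indices, payload = item
            # Strip out every chunk that is already known.
            for i in [i for i in indices if i in solved]:
                payload = xor_bytes(payload, solved[i])
                indices.discard(i)
            item[1] = payload
            # A droplet reduced to a single unknown chunk reveals that chunk.
            if len(indices) == 1:
                solved[indices.pop()] = payload
                progress = True
    if len(solved) < n_chunks:
        return None
    return [solved[i] for i in range(n_chunks)]


def round_trip(data, chunk_size=8, loss=0.3, max_droplets=5000):
    """Encode data, lose a random share of droplets, decode from the rest."""
    padded = data.ljust(-(-len(data) // chunk_size) * chunk_size, b"\0")
    chunks = [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]
    channel = random.Random(2017)
    caught = []
    for seed in range(max_droplets):
        droplet = make_droplet(chunks, seed)
        if channel.random() < loss:
            continue  # this droplet missed the bucket
        caught.append(droplet)
        recovered = decode(caught, len(chunks))
        if recovered is not None:
            return b"".join(recovered)[:len(data)]
    return None


print(round_trip(b"Any bucket of droplets will do."))
```

The decoder never asks for a particular droplet to be resent. It simply keeps catching whatever arrives until enough droplets have accumulated to unravel every chunk.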
Fountain codes are only a part of the method of changing a binary data stream into a form suitable for translation into strands of DNA. This latest work adds a new twist which accommodates the special stability needs of a potential DNA data memory: emphasising the most desirable links and removing undesirable features such as too many (GC) links and long sequences of the same link, the latter called homopolymer runs (TTTT...).
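A screening step of this kind might look like the following Python sketch, which rejects candidate strands with an unbalanced share of G and C bases or with a long homopolymer run. The thresholds (GC share between 45% and 55%, runs of at most three identical bases) and the function names are illustrative assumptions, not values taken from this article.

```python
from itertools import groupby


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    return (strand.count("G") + strand.count("C")) / len(strand)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base, e.g. 4 for 'TTTT'."""
    return max(len(list(run)) for _, run in groupby(strand))


def is_acceptable(strand: str, gc_min=0.45, gc_max=0.55, max_run=3) -> bool:
    """Keep a strand only if it passes both checks (assumed thresholds)."""
    return (gc_min <= gc_content(strand) <= gc_max
            and longest_homopolymer(strand) <= max_run)


for candidate in ["ACGTACGTAC", "ACGTTTTTAC", "GCGCGCGCGC"]:
    print(candidate, is_acceptable(candidate))
```

Because a fountain code can produce as many droplets as needed, a strand that fails the screen can simply be discarded and another droplet generated in its place.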