Today, our data is stored mainly on magnetic media. Let us imagine for a moment that it could be stored in DNA, the molecule in which living organisms keep their genetic information. Storage capacity would then be virtually unlimited and would fit into an extraordinarily small volume. This is the gamble Microsoft is making, and its researchers have announced that they will be able to store our data in DNA on a large scale by the end of the decade.
DNA contains the genetic instructions that govern the development of living organisms. These instructions are encoded in the famous double helix whose structure was discovered by James Watson and Francis Crick. Microsoft engineers told MIT Technology Review that they are on track to have an operational DNA-based storage system running in data centres within three years.
Data storage is no mean feat when you consider that more data has been generated in the last two years than in the entire history of mankind. Data is accumulating exponentially and is stored on magnetic media whose reliability beyond about thirty years is doubtful. Finding another type of medium has therefore been a major challenge for decades. In the 1940s, the physicist Erwin Schrödinger, famous for his thought experiment with a cat in a box, proposed that hereditary information is carried by a "code-script" embedded in a non-repeating structure that he called an aperiodic crystal.
His suggestion inspired James Watson and Francis Crick to determine the helical structure of DNA based on Rosalind Franklin's research, triggering a revolution in the understanding of the mechanics of life.
Although nucleic acid sequences have stored the information of living cells for billions of years, their use for storing computer data was first demonstrated only five years ago, when a Harvard University geneticist encoded his book, including its illustrations as JPEG images, in just under 55,000 strands of DNA. Since then, the technology has progressed so far that scientists have been able to record 215 petabytes (215 million gigabytes) of information in a single gram of DNA.
According to ZDNet, DNA can store data at record densities. "DNA is the densest known storage medium in the universe," says Victor Zhirnov of the Semiconductor Research Corporation. DNA is also far more durable than silicon: its lifetime is estimated to be one hundred to one thousand times longer than that of silicon-based devices. The molecule is so stable that it can be recovered from mammoth bones, for example. But its most important characteristic is its density: a cubic millimetre of DNA can hold about 1,000,000,000,000,000,000 bits of data (one quintillion). At this density, all the information produced since the dawn of mankind could fit in a space of only a few cubic metres.
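Because strands floating in a DNA pool have no physical order, each one has to carry its own address so that the original file can be put back together after sequencing. The sketch below illustrates that idea under simplifying assumptions: the function names, the 12-byte payload and the 3-byte index are hypothetical choices, and this is not the Harvard team's actual scheme. Real systems also add primer sequences and error correction to every strand.

```python
import random

def split_into_strands(data: bytes, payload_size: int = 12, index_bytes: int = 3) -> list[bytes]:
    """Cut data into chunks of payload_size bytes, each prefixed with its position.

    The position is stored big-endian in index_bytes bytes, so at most
    256**index_bytes strands can be addressed.
    """
    strands = []
    for position, start in enumerate(range(0, len(data), payload_size)):
        index = position.to_bytes(index_bytes, "big")  # raises OverflowError if too many strands
        strands.append(index + data[start:start + payload_size])
    return strands

def reassemble(strands: list[bytes], index_bytes: int = 3) -> bytes:
    """Sort strands by their index prefix and concatenate the payloads."""
    ordered = sorted(strands, key=lambda s: int.from_bytes(s[:index_bytes], "big"))
    return b"".join(s[index_bytes:] for s in ordered)

if __name__ == "__main__":
    text = b"DNA strands float freely in solution, so each one carries an index."
    strands = split_into_strands(text)
    random.shuffle(strands)  # a pool of strands comes back in no particular order
    assert reassemble(strands) == text
    print(f"{len(text)} bytes stored in {len(strands)} strands")
```

Sorting on an explicit index, rather than relying on the order in which strands are read, reflects the fact that a tube of DNA has no notion of a "first" or "last" strand.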
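The density figures above can be checked with a little arithmetic. The sketch below takes the two numbers quoted in the text at face value (one quintillion bits per cubic millimetre, and 215 petabytes per gram) and converts them into the volume and mass of DNA needed for one zettabyte (10^21 bytes). The constant and function names are illustrative. The two figures describe different things, an estimate of what DNA can hold versus an experimentally demonstrated result, so they are not expected to agree.

```python
# Figures quoted in the article, used as order-of-magnitude estimates.
BITS_PER_MM3 = 10**18          # "a quintillion bits in a cubic millimetre"
BYTES_PER_GRAM = 215 * 10**15  # 215 petabytes per gram (decimal units)
ZETTABYTE = 10**21             # bytes

def volume_mm3(n_bytes: int) -> float:
    """Cubic millimetres of DNA needed to hold n_bytes at BITS_PER_MM3."""
    return n_bytes * 8 / BITS_PER_MM3

def mass_grams(n_bytes: int) -> float:
    """Grams of DNA needed to hold n_bytes at BYTES_PER_GRAM."""
    return n_bytes / BYTES_PER_GRAM

print(f"1 ZB: {volume_mm3(ZETTABYTE):,.0f} mm^3")  # 8,000 mm^3, i.e. 8 cm^3
print(f"1 ZB: {mass_grams(ZETTABYTE):,.0f} g")     # about 4,651 g
```

Even using the lower, demonstrated figure, one zettabyte corresponds to under five kilograms of DNA.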
The problem is that writing data to DNA is slow and expensive, two barriers that Microsoft says it is overcoming.
Last year the company demonstrated its DNA data storage technology by encoding about 200 megabytes of data, including 100 literary classics, into the four DNA bases in a single process. According to MIT Technology Review, the operation would have cost about US$800,000 using materials at open-market prices, meaning the process would have to become a thousand times cheaper to be a competitive option. Worse, encoding is also extremely slow: data was written at about 400 bytes per second, whereas Microsoft says a rate of roughly 100 megabytes per second is needed for commercial viability. Converting digital bits into DNA code (strings of nucleotides labelled A, G, C and T) remains complicated and expensive because of the chemical process used to synthesize DNA strands. For its demonstration, Microsoft used 13,448,372 unique pieces of DNA.
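The simplest way to turn bits into nucleotides is to assign one base to each pair of bits, which gives four bases per byte. The sketch below shows that mapping; the particular assignment (00→A, 01→C, 10→G, 11→T) and the function names are assumptions for illustration, not Microsoft's encoding. Published schemes typically also avoid long runs of the same base and add error-correcting redundancy, because synthesis and sequencing are error-prone, so in practice they store somewhat fewer than two bits per base.

```python
# Hypothetical 2-bits-per-base mapping; real encoders add constraints and error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Convert bytes to a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(encode(b"Hi"))       # CAGACGGC
print(decode("CAGACGGC"))  # b'Hi'
```

The round trip is lossless only because nothing goes wrong in between; a single misread base here corrupts a byte, which is why practical systems pay for redundancy.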
Microsoft is pursuing this research with an ecosystem of companies and startups, including Twist Bioscience, a San Francisco-based DNA manufacturer, as well as DNAScript, Nuclera Nucleics, Evonetix, Molecular Assemblies, Catalog DNA, Helixworks and Genome Foundry.
Jean Bolot, scientific director of Technicolor Research in Los Altos, told MIT Technology Review that he is funding this work at Harvard University, in the lab of genomics expert George Church. "I'm confident we'll have results we can talk about as early as this year," he says. He adds that his company has been talking to film studios about how they might use DNA storage: according to him, half of all films made before 1951 have already been lost because they were stored on celluloid. Now, new formats such as high-definition video and virtual reality are placing even greater demands on studios' ability to preserve their work.
A spokesperson for Microsoft Research told MIT Technology Review that the company could not confirm "details of a product plan" at this time. Within the company, the idea of DNA storage is widely promoted but not yet universally accepted. The current goal is to build a "proto-commercial system within three years that will store some amount of data in DNA in one of our data centers," says Doug Carmean, an architect at Microsoft Research.
Microsoft has not yet revealed how it will lower the cost of the process and speed it up, but advances in biotechnology keep driving down the cost of DNA sequencing, so the end-of-decade goal seems realistic.
Even if this objective is achieved, DNA storage would likely be used only in specific cases, by customers willing to pay for a specialized storage solution (such as archives of critical medical or legal data), rather than replacing current large-scale storage methods.
Science fiction is fast becoming reality, and nothing rules out that DNA-based data storage could one day involve living computers.
While Microsoft's DNA storage solution is based on chips, future versions of storage may involve enzymes or bacteria designed to perform calculations.
Even outside of cells, DNA potentially offers new ways of computing, opening up ways to rapidly narrow down the possible solutions to certain problems, much as quantum computers promise to do in other areas of mathematics. For now, DNA appears to have a strong role to play in solving a very real problem, managing large amounts of data, and that problem will only get worse over time.
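The text does not say which problems DNA computing speeds up. The best-known illustration is Leonard Adleman's 1994 experiment, which used DNA strands to solve a small instance of the Hamiltonian path problem (find a route through a graph that visits every vertex exactly once): vast numbers of candidate paths form at once in a test tube, and chemical steps then filter out the wrong ones. The sketch below imitates that generate-and-filter logic on an ordinary computer, in simplified form. The function name and the example graph are hypothetical, and a CPU has to enumerate candidates one by one, which is exactly the step the molecules perform in parallel.

```python
from itertools import product

def hamiltonian_paths(edges: list[tuple[int, int]], start: int, end: int, n: int) -> list[tuple[int, ...]]:
    """Generate-and-filter search over a directed graph with vertices 0..n-1."""
    edge_set = set(edges)
    # Step 1: form every walk of n vertices that follows existing edges
    # (in the test tube, strands encoding edges join up at random).
    walks = [
        seq for seq in product(range(n), repeat=n)
        if all((a, b) in edge_set for a, b in zip(seq, seq[1:]))
    ]
    # Step 2: keep only walks that begin at `start` and finish at `end`.
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # Step 3: keep only walks that visit every vertex exactly once.
    return [w for w in walks if len(set(w)) == n]

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]
print(hamiltonian_paths(edges, start=0, end=3, n=4))  # [(0, 1, 2, 3), (0, 2, 1, 3)]
```

Here the number of candidates grows exponentially with the number of vertices, so the computer version quickly becomes impractical; the appeal of the DNA version lies in the sheer number of molecules working simultaneously, not in the speed of any single operation.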