Chunking data
Break binary data into chunks stored in separate strands.

Binary:      10100011 10010001 11100111 11000101 10010100 10111101
Base 4:      2 2 0 3  2 1 0 1  3 2 1 3  3 0 1 1  2 1 1 0  2 3 3 1
Nucleotides: G G A T  G C A C  T G C T  T A C C  G C C A  G T T C

Stored strands (key identifier, data, address, key identifier):
ATGTT GGATGCAC AAAA CATCC
ATGTT TGCTTACC AAAC CATCC
ATGTT GCCAGTTC AAAG CATCC

Key identifiers (“primers”)
Addresses within the value
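The chunking shown on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's simplified mapping (0=A, 1=C, 2=G, 3=T, read off the example digits), not necessarily the encoding used in the actual system. The function names, the two-byte chunk size, and the four-nucleotide address width are assumptions chosen to reproduce the slide's strands.

```python
# Two-bits-per-base mapping read off the slide: 0=A, 1=C, 2=G, 3=T.
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def int_to_bases(value: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base-4 nucleotide string."""
    digits = []
    for _ in range(width):
        digits.append(BASES[value % 4])
        value //= 4
    return "".join(reversed(digits))


def make_strands(data, chunk_bytes, fwd_primer, rev_primer, addr_width=4):
    """Split data into chunks and wrap each as primer + payload + address + primer.

    chunk_bytes and addr_width are illustrative parameters (assumptions).
    """
    strands = []
    for index, start in enumerate(range(0, len(data), chunk_bytes)):
        payload = bytes_to_bases(data[start:start + chunk_bytes])
        address = int_to_bases(index, addr_width)
        strands.append(fwd_primer + payload + address + rev_primer)
    return strands


# The six bytes and the primer pair shown on the slide.
data = bytes([0b10100011, 0b10010001, 0b11100111,
              0b11000101, 0b10010100, 0b10111101])
strands = make_strands(data, chunk_bytes=2, fwd_primer="ATGTT", rev_primer="CATCC")
for strand in strands:
    print(strand)
```

With the slide's input, each printed strand matches one of the three strands listed above, with the address counting up from AAAA.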
Efficient reads
cat.jpg: get(key)
Pool containing stored strands for all keys & values!

ATGTT GGATGCAC AAAA CATCC
ATGTT TGCTTACC AAAC CATCC
ATTTCCATTCAAACATCCGGGG

Key identifiers (“primers”)
Addresses within the value
Random access
Address
Primers

ATGTT GGATGCAC AAAA CATCC
ATGTT TGCTTACC AAAC CATCC
ATTTCCATTCAAACATCCGGGG

• Strands with 3 different primers
• PCR: selectively amplify strands based on their primer
• Sample: almost all strands have desired primer
• Reads are destructive, so replenish when necessary
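A rough software analogy for this read path, under strong simplifying assumptions: PCR is modeled as an exact filter on the flanking primers (real PCR amplifies matching strands rather than discarding the rest), and strands are assumed error-free. The names `get_value` and `bases_to_int` and the four-nucleotide address width are hypothetical, chosen to match the strands on the slide.

```python
BASES = "ACGT"


def bases_to_int(seq: str) -> int:
    """Decode a base-4 nucleotide string (A=0, C=1, G=2, T=3) to an integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def get_value(pool, fwd_primer, rev_primer, addr_width=4):
    """Idealized random access: select by primer, then order payloads by address."""
    # Stand-in for PCR: keep only strands flanked by the requested primers.
    selected = [s for s in pool
                if s.startswith(fwd_primer) and s.endswith(rev_primer)]
    chunks = {}
    for strand in selected:
        body = strand[len(fwd_primer):len(strand) - len(rev_primer)]
        payload, address = body[:-addr_width], body[-addr_width:]
        chunks[bases_to_int(address)] = payload
    # Reassemble the value in address order.
    return "".join(chunks[addr] for addr in sorted(chunks))


# Pool from the slide, deliberately out of order; the third strand
# carries different primers and so belongs to another key.
pool = [
    "ATGTTTGCTTACCAAACCATCC",
    "ATTTCCATTCAAACATCCGGGG",
    "ATGTTGGATGCACAAAACATCC",
]
value = get_value(pool, "ATGTT", "CATCC")
print(value)
```

The address field is what lets the payloads be put back in order, since strands in a pool have no physical ordering.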
Error correction
Both synthesis and sequencing are error prone:

Original:      G G A T G C A
Insertions:    G G A T A G C A
Deletions:     G G A T G A
Substitutions: G G A T C C A

Error rates ~1% per nucleotide!
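The three error types can be reproduced on the slide's example strand with a short sketch. The helper names are hypothetical. The final calculation assumes errors hit each nucleotide independently at exactly 1%, which is a simplification of the "~1% per nucleotide" figure; the 22-nucleotide length is that of the example strands on the earlier slides.

```python
def insert(strand: str, pos: int, base: str) -> str:
    """Insertion: an extra base appears before position pos."""
    return strand[:pos] + base + strand[pos:]


def delete(strand: str, pos: int) -> str:
    """Deletion: the base at position pos is lost."""
    return strand[:pos] + strand[pos + 1:]


def substitute(strand: str, pos: int, base: str) -> str:
    """Substitution: the base at position pos is replaced."""
    return strand[:pos] + base + strand[pos + 1:]


original = "GGATGCA"
print(insert(original, 4, "A"))      # slide's insertion example
print(delete(original, 5))           # slide's deletion example
print(substitute(original, 4, "C"))  # slide's substitution example

# Assumption: independent errors at exactly 1% per nucleotide.
per_base_error = 0.01
strand_length = 22  # length of the example strands on the earlier slides
p_error_free = (1 - per_base_error) ** strand_length
print(f"P(error-free strand of {strand_length} nt) = {p_error_free:.3f}")
```

Under that independence assumption, the chance of a clean read falls as strands get longer, which is one way to see why redundancy is needed.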
Logical redundancy
[Figure: strands laid out as Primer, Data, Address]
• XOR redundancy provides simple error correction
• Reserved address space to indicate redundancy data
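A minimal sketch of the XOR idea, assuming payloads are combined in pairs: store A, B, and A XOR B, so that any two of the three recover the third. The pairing scheme and the helper name `xor_bytes` are assumptions for illustration; the slide states only that XOR redundancy is used and that part of the address space is reserved to mark redundancy strands.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two data payloads (the first four bytes from the chunking slide).
a = bytes([0b10100011, 0b10010001])
b = bytes([0b11100111, 0b11000101])

# Redundancy payload, stored in its own strand.
parity = xor_bytes(a, b)

# If the strand carrying `a` is lost, it can be rebuilt from the other two.
recovered_a = xor_bytes(parity, b)
# Likewise for `b`.
recovered_b = xor_bytes(parity, a)
print(recovered_a == a, recovered_b == b)
```

This tolerates the loss of one strand out of each group of three, at the cost of 50% more strands to synthesize.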
Wet lab results
The process
[Figure: the process, with sequence labels "catcatgg" and "catcatg c"]
Throughput: MBs/week
Photo: Tara Brown / UW
Decoding
• Encoded and synthesized 3 files (151 kB)
• Selected and PCRed one file for random access (42 kB)
• Sequenced and decoded the resulting amplified pool
Recovered every bit despite errors in synthesis and sequencing
The importance of redundancy
If we ignore redundancy data, we cannot recover the file.
Some strands are missing entirely.
[Figure: histogram of Frequency (0 to 75) against Number of copies (0 to 7500); strand legend: Primer, Data, Address]
A DNA-based archival storage system
[Figure: Write, Store, Read; Redundancy and density; Efficient retrieval; Wet lab experiments]
Also in the paper:
• Reliability-density trade-off
• Simulation of decay over time
• Error analysis
• Model of truncated strands
MBs/week GBs/second
DNA productivity is growing
[Figure: Productivity (log scale, 10^2 to 10^10) against Year (1970 to 2010) for Transistors on Chip, Reading DNA, and Writing DNA. Source: Robert Carlson]
DNA technology is miniaturizing
We’ve just barely scratched the surface
[Figure: Accuracy (0% to 100%) against Reads used (0.01% to 10%)]
Our community has seen these challenges before
• Simulation
• Cache locality
• Latency-hiding optimizations
• Scheduling
• Error correction
• Spatial addressing
• Programming with errors
• Circuit design