Symbolwise MAP Estimation for Multiple-Trace Insertion/Deletion/Substitution Channels
Ryo Sakogawa and Haruhiko Kaneko
Tokyo Institute of Technology
ISIT2020
R. Sakogawa, H. Kaneko (TokyoTech) MAP Estimation for Multiple-Trace IDS ISIT2020 1 / 25
Outline
1. Background
2. Model of multiple-trace IDS channel
3. Symbolwise MAP estimation for multiple-trace IDS channel
4. Simulation results
5. Conclusion
Section 1: Background
Background and objective
Background
- symbolwise MAP estimation for the multiple-trace channel
- application: DNA archival storage
  - high durability due to the biochemical properties of DNA
  - high capacity (e.g., $10^{15}$ to $10^{20}$ bytes per gram)
  - prone to synchronization errors
  - multiple-trace readout
Objective
- symbolwise MAP estimation using $m$ ($\ge 2$) traces
- channel: insertion/deletion/substitution (IDS) channel
Related works: DNA storage
- major sequencing platforms [1]: Illumina, Sanger, Nanopore
- insertion/deletion error probabilities in DNA storage [3]: Illumina: around $10^{-3}$; Nanopore: around $10^{-2}$
- channel model and information-theoretic bound for the nanopore sequencer [4,5]
Related works: DNA storage model
[Figure: DNA storage model]
- coverage $m$ required for reliable reconstruction in DNA storage: several tens to several hundreds [1]
Related works: IDS error correction coding
Examples of IDS error-correcting codes (single-trace decoding):
- single-IDS-error-correcting code [11]
- LDPC code + watermark [12]
- LDPC code + marker [13]
- spatially-coupled code [14]
- polar code: for the deletion channel [15], for the IDS channel [16]
Coding schemes for DNA storage (multiple-trace decoding):
- majority voting [6]
- Reed-Solomon code [7,8]
- DNA fountain architecture [9]: based on the Luby transform code
- soft-decision decoding [17]
Related works: multiple-trace channel
Minimum number of traces for perfect reconstruction:
- various types of channels, including the IDS channel [18]
- probabilistic IDS channel [19]
Symbolwise MAP estimation using $m$ traces:
- calculates the posterior probability from a limited number of traces
- the calculated probability is used as soft input to an outer error-correcting code (e.g., LDPC code, polar code)
- MAP estimation for the deletion channel [21,22]; for the IDS channel: this work
Summary:
- perfect reconstruction: [18,19]
- MAP estimation: deletion channel [21,22]; IDS channel (this work)
Section 2: Model of multiple-trace IDS channel
Channel model: outline
- error probabilities: $p_i$ (insertion), $p_d$ (deletion), $p_s$ (substitution)
- input: $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
- output: $\mathbf{Z} = \begin{pmatrix} \mathbf{z}^1 \\ \mathbf{z}^2 \\ \vdots \\ \mathbf{z}^m \end{pmatrix} = \begin{pmatrix} (z^1_1, z^1_2, \ldots, z^1_{n_1}) \\ (z^2_1, z^2_2, \ldots, z^2_{n_2}) \\ \vdots \\ (z^m_1, z^m_2, \ldots, z^m_{n_m}) \end{pmatrix}$, where $\mathbf{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$ is the $k$th trace, of length $n_k$
- at most one insertion per symbol (as in [13])
Channel model: drift vector
- maximum drift value between input and output symbols: $D$
- set of drift values: $\mathcal{D} = \{-D, \ldots, -1, 0, 1, \ldots, D\}$
- drift vector of the $k$th output: $\mathbf{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$, determined by a Markov process (with $d^k_1 = 0$):
$$p(d^k_{i+1} \mid d^k_i) = \begin{cases} p_i & (d^k_{i+1} = d^k_i + 1,\ d^k_i < D) \\ p_d & (d^k_{i+1} = d^k_i - 1,\ d^k_i > -D) \\ 1 - p_i - p_d & (d^k_{i+1} = d^k_i,\ -D < d^k_i < D) \\ 1 - p_i & (d^k_{i+1} = d^k_i,\ d^k_i = -D) \\ 1 - p_d & (d^k_{i+1} = d^k_i,\ d^k_i = D) \\ 0 & (\text{otherwise}) \end{cases}$$
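The piecewise transition probability translates directly into a function. A minimal Python sketch, assuming $D \ge 1$ (for $D = 0$ the two boundary cases coincide and the formula above would need a separate case); the name `drift_transition` is hypothetical.

```python
def drift_transition(d_next: int, d: int, p_i: float, p_d: float, D: int) -> float:
    """Return p(d^k_{i+1} = d_next | d^k_i = d) for the drift Markov process.

    Assumes D >= 1 so that the boundary cases d = -D and d = D are distinct.
    """
    if D < 1 or not -D <= d <= D:
        raise ValueError("need D >= 1 and -D <= d <= D")
    if d_next == d + 1 and d < D:
        return p_i                    # insertion: drift grows by one
    if d_next == d - 1 and d > -D:
        return p_d                    # deletion: drift shrinks by one
    if d_next == d:
        if -D < d < D:
            return 1.0 - p_i - p_d    # neither insertion nor deletion
        if d == -D:
            return 1.0 - p_i          # deletion is blocked at the lower boundary
        return 1.0 - p_d              # d == D: insertion is blocked at the upper boundary
    return 0.0                        # any other jump is impossible
```

Each row of the resulting $(2D+1) \times (2D+1)$ transition matrix sums to 1, which is a quick sanity check on the boundary cases.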
Channel model: definition of the multiple-trace IDS channel
- channel input: $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
- determine the drift vectors according to the Markov process: $\mathbf{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$ ($k \in [m]$)
- drifted vector: $\mathbf{y}^k = (y^k_1, y^k_2, \ldots, y^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, where $y^k_j = x_i$ for $j \in \{j' \mid i + d^k_i \le j' \le i + d^k_{i+1}\}$; hence $x_i$ appears $d^k_{i+1} - d^k_i + 1 \in \{0, 1, 2\}$ times in $\mathbf{y}^k$ (deleted, transmitted once, or duplicated by an insertion), and $n_k = n + d^k_{n+1}$
- channel output ($k$th trace): $\mathbf{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, with
$$p(z^k_i \mid y^k_i) = \begin{cases} 1 - p_s & (z^k_i = y^k_i) \\ p_s/(q-1) & (z^k_i \ne y^k_i) \end{cases}$$
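The three steps of the channel definition (drift vector, drifted vector, substitutions) can be simulated directly. A minimal Python sketch of one trace; the function name `ids_trace` and its argument order are hypothetical, and a substituted symbol is drawn uniformly from the other $q - 1$ symbols, matching $p_s/(q-1)$.

```python
import random
from typing import List, Tuple


def ids_trace(x: List[int], q: int, p_i: float, p_d: float, p_s: float,
              D: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one trace z^k from the multiple-trace IDS channel model (sketch).

    Returns (z, d), where d = [d_1, ..., d_{n+1}] is the sampled drift vector.
    """
    # 1) Drift vector: Markov chain started at d_1 = 0 and confined to [-D, D].
    d = [0]
    for _ in x:
        cur, u = d[-1], rng.random()
        if cur < D and u < p_i:
            d.append(cur + 1)                         # insertion
        elif cur > -D and p_i <= u < p_i + p_d:
            d.append(cur - 1)                         # deletion
        else:
            d.append(cur)                             # no synchronization error
    # 2) Drifted vector: x_i fills positions i + d_i, ..., i + d_{i+1},
    #    i.e. it appears d_{i+1} - d_i + 1 times (0, 1, or 2).
    y: List[int] = []
    for i, xi in enumerate(x):
        y.extend([xi] * (d[i + 1] - d[i] + 1))
    # 3) Substitutions: with probability p_s, replace y_j by one of the other
    #    q - 1 symbols, chosen uniformly.
    z = [(yj + rng.randrange(1, q)) % q if rng.random() < p_s else yj for yj in y]
    return z, d
```

For example, calling `ids_trace(x, 4, 0.01, 0.01, 0.01, 4, rng)` $m$ times on the same `x` yields a quaternary multiple-trace output $\mathbf{Z}$ with $D = 4$; the returned length always satisfies $n_k = n + d^k_{n+1}$.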
Section 3: Symbolwise MAP estimation for multiple-trace IDS channel
Notations
- array of drift values:
$$\mathbf{D} = \begin{pmatrix} d^1_1 & d^1_2 & \cdots & d^1_{n+1} \\ d^2_1 & d^2_2 & \cdots & d^2_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ d^m_1 & d^m_2 & \cdots & d^m_{n+1} \end{pmatrix} = \begin{pmatrix} \mathbf{d}^1 \\ \mathbf{d}^2 \\ \vdots \\ \mathbf{d}^m \end{pmatrix} = \begin{bmatrix} \mathbf{d}_1 & \mathbf{d}_2 & \cdots & \mathbf{d}_{n+1} \end{bmatrix} \in \mathcal{D}^{m \times (n+1)}$$
- $k$th row $\mathbf{d}^k$: drift vector of the $k$th trace $\mathbf{z}^k$
- $i$th column $\mathbf{d}_i$: drift values corresponding to the $i$th input symbol $x_i$
- $i$th segment of $\mathbf{Z}$ (for given $\mathbf{D}$):
$$\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} = \begin{pmatrix} (z^1_{i+d^1_i}, \ldots, z^1_{i+d^1_{i+1}}) \\ (z^2_{i+d^2_i}, \ldots, z^2_{i+d^2_{i+1}}) \\ \vdots \\ (z^m_{i+d^m_i}, \ldots, z^m_{i+d^m_{i+1}}) \end{pmatrix}$$
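The segment notation corresponds to one slice per trace. A small Python sketch, assuming traces are stored as 0-based lists while the positions above are 1-based and inclusive; the helper name `segment` is hypothetical.

```python
from typing import List, Sequence, Tuple


def segment(Z: Sequence[Sequence[int]], i: int,
            d_col: Sequence[int], d_next_col: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return the i-th segment of Z for drift columns d_i and d_{i+1}.

    Indices follow the slides (1-based, inclusive); the k-th entry is
    (z^k_{i+d^k_i}, ..., z^k_{i+d^k_{i+1}}), which is empty after a deletion.
    """
    segs = []
    for z, a, b in zip(Z, d_col, d_next_col):
        start, end = i + a, i + b                 # 1-based inclusive bounds
        if start < 1 or end > len(z):
            raise ValueError("segment runs outside the trace")
        segs.append(tuple(z[start - 1:end]))      # same span as a 0-based slice
    return segs
```

A deletion ($d^k_{i+1} = d^k_i - 1$) gives `end == start - 1`, so the slice is empty, and consecutive segments of a trace are contiguous because the next one starts at $i + 1 + d^k_{i+1}$.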
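Derivation of factor graph (1/2)
Derive $p(x_i \mid \mathbf{Z})$ using the factor graph of the joint probability $p(\mathbf{Z}, \mathbf{x}, \mathbf{D})$:
$$p(\mathbf{Z}, \mathbf{x}, \mathbf{D}) = p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{x}, \mathbf{D}) = p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{D})\, p(\mathbf{x}) = p(\mathbf{d}_1) \prod_{i=1}^{n} p\left(\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1}\right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i),$$
using the independence of $\mathbf{x}$ and $\mathbf{D}$, the Markov property of the drift vectors, independent input symbols, and independent traces, where
$$p(\mathbf{d}_1) = \prod_{k=1}^{m} p(d^k_1) = \begin{cases} 1 & (\mathbf{d}_1 = (0, \ldots, 0)) \\ 0 & (\text{otherwise}) \end{cases}$$
$$p\left(\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1}\right) = \prod_{k=1}^{m} p\left((\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1}\right)$$
$$p(\mathbf{d}_{i+1} \mid \mathbf{d}_i) = \prod_{k=1}^{m} p(d^k_{i+1} \mid d^k_i)$$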
Derivation of factor graph (2/2)
The likelihood for the $k$th trace equals that of the single-trace channel ($m = 1$):
$$p\left((\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1}\right) = \begin{cases} 1 & (d^k_{i+1} = d^k_i - 1) \\ f(x_i, z^k_{i+d^k_i}) & (d^k_{i+1} = d^k_i) \\ f(x_i, z^k_{i+d^k_i})\, f(x_i, z^k_{i+1+d^k_i}) & (d^k_{i+1} = d^k_i + 1) \end{cases}$$
Substitution error probability:
$$f(x, z) = \begin{cases} 1 - p_s & (x = z) \\ p_s/(q-1) & (x \ne z) \end{cases}$$
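The single-trace likelihood and its product over the $m$ traces can be sketched as below. The function names are hypothetical; a segment is passed as the tuple $(z^k_{i+d^k_i}, \ldots, z^k_{i+d^k_{i+1}})$, and any drift step or segment length not allowed by the model returns 0.

```python
from math import prod
from typing import Sequence, Tuple


def f(x: int, z: int, p_s: float, q: int) -> float:
    """Substitution probability f(x, z)."""
    return 1.0 - p_s if x == z else p_s / (q - 1)


def trace_likelihood(x_i: int, seg: Tuple[int, ...], d: int, d_next: int,
                     p_s: float, q: int) -> float:
    """p(segment of z^k | x_i, d^k_i, d^k_{i+1}) for a single trace."""
    step = d_next - d
    if step not in (-1, 0, 1) or len(seg) != step + 1:
        return 0.0                    # inconsistent with the drift step
    # Deletion: empty segment, empty product = 1.
    # No error: one symbol; insertion: two symbols, both emitted from x_i.
    return prod(f(x_i, z, p_s, q) for z in seg)


def multi_likelihood(x_i: int, segs: Sequence[Tuple[int, ...]],
                     d_col: Sequence[int], d_next_col: Sequence[int],
                     p_s: float, q: int) -> float:
    """p(Z segment | x_i, d_i, d_{i+1}): product over the m traces."""
    return prod(trace_likelihood(x_i, s, a, b, p_s, q)
                for s, a, b in zip(segs, d_col, d_next_col))
```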
Factor graph
Joint probability:
$$p(\mathbf{Z}, \mathbf{x}, \mathbf{D}) = p(\mathbf{d}_1) \prod_{i=1}^{n} p\left(\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1}\right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i)$$
[Figure: factor graph of $p(\mathbf{Z}, \mathbf{x}, \mathbf{D})$]
Calculation of the posterior probability $p(x_i \mid \mathbf{Z})$: perform the sum-product algorithm on the factor graph.
MAP estimation:
$$\tilde{x}_i = \arg\max_{x_i \in \mathbb{Z}_q} p(x_i \mid \mathbf{Z})$$
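Because the factor graph is a chain in the drift columns $\mathbf{d}_1, \ldots, \mathbf{d}_{n+1}$, with each $x_i$ attached to a single factor, the sum-product algorithm on it amounts to a forward-backward recursion over the joint drift state. A minimal Python sketch under assumptions not stated on the slides: uniform prior $p(x_i) = 1/q$, $D \ge 1$, and the final drift of each trace fixed to $d^k_{n+1} = n_k - n$ so that the segments exactly cover $\mathbf{z}^k$. The function `map_estimate` is hypothetical, not the authors' implementation, and its $(2D+1)^m$ states make it practical only for small $m$ and $D$.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


def map_estimate(Z: Sequence[Sequence[int]], n: int, q: int, p_i: float,
                 p_d: float, p_s: float, D: int):
    """Symbolwise MAP estimate of x from the m traces in Z (sketch).

    Forward-backward over the joint drift column d_i in {-D..D}^m, assuming a
    uniform prior p(x_i) = 1/q, D >= 1 and d^k_{n+1} = n_k - n.
    Returns (x_hat, post), where post[i][a] = p(x_{i+1} = a | Z).
    """
    m = len(Z)
    if D < 1 or any(abs(len(z) - n) > D for z in Z):
        raise ValueError("need D >= 1 and |n_k - n| <= D for every trace")
    states = list(itertools.product(range(-D, D + 1), repeat=m))
    steps = list(itertools.product((-1, 0, 1), repeat=m))

    def trans(d_next: State, d: State) -> float:       # p(d_{i+1} | d_i)
        prob = 1.0
        for b, a in zip(d_next, d):
            if b == a + 1 and a < D:
                prob *= p_i
            elif b == a - 1 and a > -D:
                prob *= p_d
            elif b == a:
                prob *= 1.0 - (p_i if a < D else 0.0) - (p_d if a > -D else 0.0)
            else:
                return 0.0
        return prob

    def seg_lik(sym: int, i: int, d: State, d_next: State) -> float:
        prob = 1.0                                     # i is 1-based
        for z, a, b in zip(Z, d, d_next):
            start, end = i + a, i + b                  # 1-based inclusive span
            if start < 1 or end > len(z):
                return 0.0
            for zj in z[start - 1:end]:
                prob *= (1.0 - p_s) if zj == sym else p_s / (q - 1)
        return prob

    # Factor weights w[a] = p(d_{i+1}|d_i) p(x_i = a) p(Z segment | a, d_i, d_{i+1}).
    edges: List[Dict[Tuple[State, State], List[float]]] = []
    for i in range(1, n + 1):
        e = {}
        for d in states:
            for s in steps:
                d_next = tuple(a + b for a, b in zip(d, s))
                t = trans(d_next, d)
                if t > 0.0:
                    w = [t * seg_lik(a, i, d, d_next) / q for a in range(q)]
                    if any(w):
                        e[(d, d_next)] = w
        edges.append(e)

    def normalized(msg: Dict[State, float]) -> Dict[State, float]:
        tot = sum(msg.values())
        if tot == 0.0:
            raise ValueError("no drift path is consistent with the traces")
        return {k: v / tot for k, v in msg.items()}    # rescale against underflow

    alpha = [{tuple([0] * m): 1.0}]                    # forward: d_1 = (0, ..., 0)
    for i in range(n):
        nxt: Dict[State, float] = {}
        for (d, d_next), w in edges[i].items():
            if d in alpha[i]:
                nxt[d_next] = nxt.get(d_next, 0.0) + alpha[i][d] * sum(w)
        alpha.append(normalized(nxt))
    beta: List[Dict[State, float]] = [dict() for _ in range(n + 1)]
    beta[n] = {tuple(len(z) - n for z in Z): 1.0}      # backward: d_{n+1} fixed
    for i in range(n - 1, -1, -1):
        prv: Dict[State, float] = {}
        for (d, d_next), w in edges[i].items():
            if d_next in beta[i + 1]:
                prv[d] = prv.get(d, 0.0) + beta[i + 1][d_next] * sum(w)
        beta[i] = normalized(prv)

    post = []
    for i in range(n):                                 # combine messages per symbol
        acc = [0.0] * q
        for (d, d_next), w in edges[i].items():
            ab = alpha[i].get(d, 0.0) * beta[i + 1].get(d_next, 0.0)
            for a in range(q):
                acc[a] += ab * w[a]
        tot = sum(acc)
        if tot == 0.0:
            raise ValueError("no drift path is consistent with the traces")
        post.append([v / tot for v in acc])
    x_hat = [max(range(q), key=p.__getitem__) for p in post]
    return x_hat, post
```

The returned posteriors `post` are the soft values that could be handed to an outer decoder, and `x_hat` is the symbolwise argmax.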
Simple heuristic estimation
- computational complexity of the MAP estimation: $O(D^{2m})$, which is impractical for a large number of traces
- simple heuristic method based on the MAP estimation for $m = 3$, expressed by a ternary tree:
  - leaf nodes: $m'$ traces $(\mathbf{z}^0, \mathbf{z}^1, \ldots)$
  - internal/root nodes: MAP estimation for $m = 3$ traces, each outputting an estimate
  - root node: outputs the final estimate $\tilde{\mathbf{x}}$
[Figure: ternary tree of 3-trace MAP estimators]
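The ternary-tree combination can be sketched independently of the 3-trace estimator, which is passed in as a function. This is a hypothetical sketch: it assumes $m'$ is a power of 3 (the slide does not show how other values of $m'$ are arranged in the tree), and `majority3` is only a stand-in that requires equal-length inputs; it is not the $m = 3$ MAP estimation, which also accepts traces of different lengths.

```python
from typing import Callable, List, Sequence

Seq = List[int]


def tree_estimate(traces: Sequence[Seq],
                  estimate3: Callable[[Seq, Seq, Seq], Seq]) -> Seq:
    """Combine m' = 3**h traces bottom-up with a ternary tree (sketch).

    Leaves are the traces; every internal node (including the root) replaces
    its three children by estimate3(child_1, child_2, child_3), and the root's
    output is the final estimate x~.
    """
    level = [list(t) for t in traces]
    if not level:
        raise ValueError("need at least one trace")
    while len(level) > 1:
        if len(level) % 3:
            raise ValueError("this sketch assumes m' is a power of 3")
        level = [estimate3(*level[j:j + 3]) for j in range(0, len(level), 3)]
    return level[0]


def majority3(a: Seq, b: Seq, c: Seq) -> Seq:
    """Stand-in for the m = 3 MAP estimator: symbolwise vote, equal lengths only."""
    return [u if u in (v, w) else v for u, v, w in zip(a, b, c)]
```

With $m' = 3^h$ leaves the tree has $(m' - 1)/2$ internal nodes, so in this sketch the number of 3-trace estimations grows linearly in $m'$, instead of the exponential growth in $m$ of the joint MAP estimation.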
Section 4: Simulation results
Simulation parameters
- block length: $n = 152$
- number of traces: $m \in \{3, 4, 11\}$
- maximum drift value: $D = 4$
- evaluated error rates:
  - word error rate
  - error rate by Levenshtein distance:
$$\frac{\sum \text{Levenshtein distance between } \mathbf{x} \text{ and } \tilde{\mathbf{x}}}{\text{total number of estimated symbols}}$$
    ($\mathbf{x}$: original word, $\tilde{\mathbf{x}}$: estimated word)
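The two error rates can be computed as follows. A minimal sketch using the standard two-row dynamic program for the Levenshtein distance; the function names are hypothetical, and "total number of estimated symbols" is taken as the summed length of the estimated words (equal to the number of words times $n$ here).

```python
from typing import Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def error_rates(originals, estimates) -> Tuple[float, float]:
    """Return (word error rate, Levenshtein-distance error rate)."""
    pairs = list(zip(originals, estimates))
    wer = sum(list(x) != list(xh) for x, xh in pairs) / len(pairs)
    lev = sum(levenshtein(x, xh) for x, xh in pairs) / sum(len(xh) for _, xh in pairs)
    return wer, lev
```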