In-place Longest Common Extensions

Nicola Prezza
University of Udine, Department of Computer Science
Dagstuhl Seminar 16431: "Computation over Compressed Structured Data"

Outline: Overview, Monte Carlo LCE structure, Deterministic data structure

Longest Common Extension queries

Position: 0 1 2 3 4 5 6 7 8 9
T =       a a b a b a b a a b

$\mathrm{LCE}(1, 5) = 3$: the suffixes starting at positions 1 and 5 share the prefix "aba" and then differ.
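
To make the definition concrete, here is a minimal Python sketch of the trivial "store only T" solution, which scans both suffixes character by character. The function name `lce_naive` is an illustrative choice, not something taken from the slides.

```python
def lce_naive(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:]."""
    n = len(text)
    length = 0
    # Extend the match while both positions stay inside the text.
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    T = "aabababaab"
    print(lce_naive(T, 1, 5))  # 3, as in the example above
```

This takes time proportional to the answer $\ell$, which is the baseline the structures discussed in the talk improve on.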

State of the art

Space (bits)                                   Query time             Build time                 Reference
$O(n \log n)$                                  $O(1)$                 $O(n)$                     ST + LCA
$O(n \log n)$                                  $O(1)$                 $O(n)$                     RMQ + LCP
$n \lceil \log_2 \sigma \rceil + O(nw/\tau)$   $O(\tau)$              $O(n^{2+\epsilon})$        [Bille2015]
$n \lceil \log_2 \sigma \rceil + O(nw/\tau)$   $O(\tau)$              $O(n^{3/2})$ expected      [Bille2015]
$n \lceil \log_2 \sigma \rceil + O(nw/\tau)$   $O(\tau \log \tau)$    $O(n\tau)$                 [Tanimura2016]
$n \lceil \log_2 \sigma \rceil$                $O(\ell)$              -                          store only $T$

Here $\ell = \mathrm{LCE}(i, j)$.

Result presented

Deterministic data structure of size $n \lceil \log_2 \sigma \rceil$ bits supporting:
- optimal $O(m \log \sigma / w)$-time extraction of $T[i, \dots, i+m-1]$
- $O(\log^2 \ell)$-time LCE queries

Construction: $O(n \log n)$ expected time and $O(n)$ words of space.
In-place data structure: no little-o terms.
LCE query time can be improved to $O(\log \ell)$ using $O(\log n)$ words of additional space.

Applications

In-place algorithms to:
- compute the suffix array in $O(n \log^2 n)$ expected time (exact)
- compute the LCP array in $O(n \log^2 n)$ expected time (exact)
- perform sparse suffix sorting (Monte Carlo)

Steps

- Replace the text with Karp-Rabin fingerprints of a subset of its prefixes.
- Choose the modulus $q$ at random, in such a way that the fingerprints can be statistically compressed down to $n \lceil \log_2 \sigma \rceil$ bits.
- De-randomize.

For simplicity, only the binary case $\sigma = 2$ is considered here. It is easy to extend to $\sigma \in O(w)$.

Monte Carlo LCE structure

1. Choose a block size $\tau \in \Theta(w)$.
2. Choose a random $\tau$-bit prime $q$ (the modulus of the KR function).
3. Choose a uniform seed $\bar{s} \in [0, q)$.
4. Left-pad $T$ with $\bar{s}$.
5. Break the text into $\tau$-bit blocks: an array $B[1, \dots, n/\tau]$ of $\tau$-bit integers.

Example: $\tau = 5$, $q = 10001$ $(= 17)$, $\bar{s} = 00101$
B = 00101 01011 11010 10101 11010 00001
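
Steps 4 and 5 can be sketched in Python as follows. The sketch assumes the text is given as a string of '0'/'1' characters whose length is a multiple of $\tau$, and that the text in the example is the concatenation of the blocks after the seed. The function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(text_bits, seed, tau):
    """Left-pad a binary text with a tau-bit seed and cut it into tau-bit integers."""
    padded = format(seed, "0{}b".format(tau)) + text_bits
    if len(padded) % tau != 0:
        # Assumption of this sketch: no partial last block.
        raise ValueError("text length must be a multiple of tau")
    return [int(padded[k:k + tau], 2) for k in range(0, len(padded), tau)]


if __name__ == "__main__":
    # Text assumed to be the example's blocks 2..6 glued together.
    T = "".join(["01011", "11010", "10101", "11010", "00001"])
    B = split_into_blocks(T, seed=0b00101, tau=5)
    print([format(b, "05b") for b in B])
    # ['00101', '01011', '11010', '10101', '11010', '00001']
```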

Build the array $P'$ of Karp-Rabin fingerprints of the prefixes ending at block boundaries, and add a bitvector $D[1, \dots, n/\tau]$ marking $P'[i] \geq q$.

Example: $\tau = 5$, $q = 10001$ $(= 17)$
B  = 00101 01011 11010 10101 11010 00001
P' = 01101 10010 01110 10101 00101 01011
D  = 0 1 0 1 0 0

Property 1
With $P'$ and $D$ we can recover $B$ (and therefore $T$):
1. If $B[i] < q$: $B[i] = (P'[i] - 2^{\tau} \cdot P'[i-1]) \bmod q$.
2. If $B[i] \geq q$, then $B[i] \bmod q = B[i] - q$, so add $q$ to the value in (1).
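
A minimal Python sketch of Property 1 follows. It rests on two assumptions that the slide does not spell out: the fingerprint recurrence is $P'[i] = (P'[i-1] \cdot 2^{\tau} + B[i]) \bmod q$ with $P'[0] = 0$, and $D[i]$ is set exactly when $B[i] \geq q$, which is the information the recovery rule needs. Values computed this way need not coincide with the illustrative figures in the example above. The function names are hypothetical.

```python
def build_fingerprints(B, q, tau):
    """Return (P, D): prefix fingerprints at block boundaries and the B[i] >= q flags."""
    P, D = [], []
    prev = 0  # fingerprint of the empty prefix (assumption)
    for block in B:
        prev = (prev * (1 << tau) + block) % q
        P.append(prev)
        D.append(1 if block >= q else 0)  # assumption: D flags B[i] >= q
    return P, D


def recover_blocks(P, D, q, tau):
    """Invert build_fingerprints. Needs B[i] < 2q, which holds when q has tau bits."""
    B = []
    prev = 0
    for fp, flag in zip(P, D):
        residue = (fp - (1 << tau) * prev) % q  # equals B[i] mod q
        B.append(residue + q if flag else residue)
        prev = fp
    return B


if __name__ == "__main__":
    B = [0b00101, 0b01011, 0b11010, 0b10101, 0b11010, 0b00001]
    P, D = build_fingerprints(B, q=17, tau=5)
    print(recover_blocks(P, D, q=17, tau=5) == B)  # True
```

The round trip works because a $\tau$-bit modulus satisfies $q \geq 2^{\tau-1}$, so every block value lies below $2q$ and a single flag bit is enough to undo the reduction.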

$P'$ and $D$ take $n + n/\tau$ bits of space and support:
- optimal-time text extraction
- computation of the Karp-Rabin fingerprint of any text substring
  ⇒ LCE queries in $O(\log \ell)$ steps of exponential + binary search ($O(\log^2 \ell)$ total time, because we need to compute powers of 2 mod $q$)

Can we reduce the space to $n$ bits?

Idea

Pick $q$ in such a way that few $P'[i]$ start with a 1.

Property: each $P'[i]$ is a uniform number in $[0, q)$ (thanks to the seed).

Combinations of block values with $\tau = 4$, $q = 1011$ $(= 11)$:
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 (= q) 1100 1101 1110 1111

$P(P'[i] \text{ begins with } 1) = \text{red} / (\text{red} + \text{black}) = (q - 2^{\tau-1}) / q$

Here red counts the values below $q$ that begin with 1 (1000, 1001, 1010) and black counts the values below $q$ that begin with 0.

Goal: few $P'[i]$ starting with 1. Solve $(q - 2^{\tau-1}) / q \leq 1/n$, which rearranges to $q \leq 2^{\tau-1} \cdot \frac{n}{n-1}$.

Result
Pick $q$ uniformly from $Z = \left[ 2^{\tau-1},\ 2^{\tau-1} \cdot \frac{n}{n-1} \right]$.

Final step: build the array $P$ by removing the first bit from each $P'[i]$, and store the ranks of the $P'$-blocks starting with 1 in an array $S$.

Example: $\tau = 5$
P = 1101 0010 1110 0101 0101 1011
D = 0 1 0 1 0 0
S = {2, 4}

$E[|S| + |P| + |D|] = n + O(w)$ bits

Construction
Pick pairs $\langle q, \bar{s} \rangle$ until the overall size is $n$ bits (+ $O(1)$ words) ⇒ $O(n)$ expected construction time.
$\tau = (8 + c)w$ for any constant $c$ (see the paper for why).
LCE failure probability $\leq n^{-c}$ (proof in the paper).
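
The bit-stripping step is easy to sketch in Python. The helper names below are hypothetical, and the sketch keeps `S` as a plain sorted list rather than a space-efficient encoding. The demonstration uses the $P'$ values from the earlier example.

```python
def drop_leading_bits(P_prime, tau):
    """Split each tau-bit value into its low tau-1 bits (P) and the 1-based ranks with top bit 1 (S)."""
    top = 1 << (tau - 1)
    P = [value & (top - 1) for value in P_prime]
    S = [rank for rank, value in enumerate(P_prime, start=1) if value & top]
    return P, S


def restore_leading_bits(P, S, tau):
    """Inverse of drop_leading_bits."""
    top = 1 << (tau - 1)
    marked = set(S)
    return [(value | top) if rank in marked else value
            for rank, value in enumerate(P, start=1)]


if __name__ == "__main__":
    P_prime = [0b01101, 0b10010, 0b01110, 0b10101, 0b00101, 0b01011]
    P, S = drop_leading_bits(P_prime, tau=5)
    print([format(v, "04b") for v in P], S)
    # ['1101', '0010', '1110', '0101', '0101', '1011'] [2, 4]
```

When $q$ is picked from the range $Z$, each fingerprint begins with 1 with probability at most $1/n$, so `S` is expected to hold a constant number of entries.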

In-place construction

We can replace $T$ with our structure in $O(n)$ expected time while using only $O(1)$ additional words of working space. The construction can be inverted within the same space and time bounds (restoring the text).

Applications: suffix sorting

It is easy to compare two text suffixes lexicographically using LCE queries.

Result 1: in-place sparse suffix sorting
Any set $S = \{i_1, \dots, i_b\}$ of $b$ suffixes of a text $T \in \Sigma^n$ can be sorted correctly with high probability in $O(n + b \log b \cdot \log^2 n)$ expected time using $O(1)$ words of space on top of $T$ and $S$.
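
The reduction from suffix comparison to LCE can be sketched as follows: once the LCE of two suffixes is known, a single character comparison right after the common prefix decides their order. In this Python sketch a naive character-scanning LCE stands in for the fingerprint-based query, and Python's `sorted` is used for brevity, so the sketch is neither in-place nor within the stated time bound. The function names are hypothetical.

```python
from functools import cmp_to_key


def lce_naive(text, i, j):
    """Stand-in LCE query: scan until the first mismatch."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def compare_suffixes(text, i, j):
    """Negative, zero or positive, like a classic comparator for text[i:] vs text[j:]."""
    if i == j:
        return 0
    n = len(text)
    common = lce_naive(text, i, j)
    if i + common == n:
        return -1  # text[i:] is a proper prefix of text[j:]
    if j + common == n:
        return 1   # text[j:] is a proper prefix of text[i:]
    return -1 if text[i + common] < text[j + common] else 1


def sparse_suffix_sort(text, positions):
    """Sort the given suffix start positions lexicographically by suffix."""
    return sorted(positions, key=cmp_to_key(lambda a, b: compare_suffixes(text, a, b)))


if __name__ == "__main__":
    print(sparse_suffix_sort("aabababaab", [1, 5, 8, 3]))  # [8, 5, 3, 1]
```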

Deterministic data structure

Important: while computing LCE queries, in the exponential/binary searches we only compare (fingerprints of) text substrings of length $2^e$.

Theorem 1
In $O(n \log n)$ expected time and $O(n)$ words of space we can check whether the KR function is collision-free over all pairs of substrings of $T$ having the same length $k = 2^e$, for all $0 \leq e \leq \log_2 n$.

Theorem 2
In $O(n \log^2 n)$ worst-case time and $n$ words of space (on top of $T$) we can check whether the KR function is collision-free over all pairs of substrings of $T$ having the same length $k = 2^e$, for all $0 \leq e \leq \log_2 n$.

⇒ Our deterministic structure can be built in $O(n \log n)$ expected time and linear space.

Applications

In-place SA construction
The suffix array $SA$ of $T \in \Sigma^n$ can be computed in $O(n \log^2 n)$ expected time using $O(1)$ words of space on top of $T$ and $SA$.

The above does not improve on the state of the art [Franceschini2007]. The following does:

In-place LCP construction
The Longest Common Prefix (LCP) array can be computed in $O(n \log^2 n)$ expected time using $O(1)$ words of space on top of the text and the LCP array.

The previous fastest in-place LCP array construction algorithm runs in quadratic time.
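
To illustrate how the LCP array relates to LCE queries, the Python sketch below fills each LCP entry with one LCE query on a pair of lexicographically adjacent suffixes. It is not the in-place algorithm from the talk: the suffix array is built by naive sorting, a character-scanning LCE stands in for the real query, and setting the first entry to 0 is a convention chosen here. All names are hypothetical.

```python
def lce_naive(text, i, j):
    """Stand-in LCE query: scan until the first mismatch."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def suffix_array_naive(text):
    """Suffix start positions in lexicographic order (simple, not space-efficient)."""
    return sorted(range(len(text)), key=lambda p: text[p:])


def lcp_array(text, sa):
    """lcp[k] is the LCE of the suffixes ranked k-1 and k; lcp[0] is 0 by convention."""
    if not sa:
        return []
    return [0] + [lce_naive(text, sa[k - 1], sa[k]) for k in range(1, len(sa))]


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array_naive(text)
    print(sa)                   # [5, 3, 1, 0, 4, 2]
    print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```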
