String Attractors: A unifying theory of repetitiveness

Dominik Kempa (1), Nicola Prezza (2)
(1) University of Helsinki, (2) University of Pisa
HALG, Amsterdam, June 4-6, 2018
Based on: D. Kempa and N. Prezza. At the roots of dictionary compression: String attractors. STOC 2018.

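Background: Dictionary compression
Definition. Dictionary compression: an encoding of a string that replaces repetitions with pointers to other occurrences.
Example: Lempel-Ziv '77 (LZ77). LZ77 is the greedy left-to-right partition of the text into longest previous factors: each phrase is either a single new character or the longest prefix of the remaining text that also occurs starting at an earlier position.
T = BABBABABBBAB
Phrases: B | A | B | BAB | ABB | BAB
Encoding: (B,0), (A,0), (1,1), (1,3), (2,3), (4,3)
A pair (c,0) introduces a new character c; a pair (p,ℓ) copies ℓ characters starting at position p of T (1-based).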
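
The greedy parse above can be sketched in a few lines. This is a minimal quadratic-time illustration, not the construction used in practice; the function names `lz77_factorize` and `lz77_decode` are hypothetical. When several earlier positions give an equally long match, this sketch picks the leftmost one, so its last phrase is (1,3) where the slide shows (4,3); both point at an occurrence of BAB and decode to the same text.

```python
def lz77_factorize(text):
    """Greedy left-to-right LZ77 parse (naive quadratic-time sketch).

    Returns phrases of the form (char, 0) for a new character, or
    (source, length) with a 1-based source position for a copy.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate sources start strictly before i
            k = 0
            # the source may run into the phrase itself (self-overlap)
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:  # strict '>' keeps the leftmost longest match
                best_len, best_src = k, j
        if best_len == 0:
            phrases.append((text[i], 0))
            i += 1
        else:
            phrases.append((best_src + 1, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Invert the encoding by replaying literals and copies in order."""
    out = []
    for first, second in phrases:
        if second == 0:
            out.append(first)  # literal character
        else:
            for k in range(second):  # copy one char at a time so overlaps work
                out.append(out[first - 1 + k])
    return "".join(out)


if __name__ == "__main__":
    print(lz77_factorize("BABBABABBBAB"))
    # [('B', 0), ('A', 0), (1, 1), (1, 3), (2, 3), (1, 3)]
```

Decoding the slide's own encoding with `lz77_decode` reproduces T, which is a quick way to confirm that the choice of source among ties does not matter.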

Background: Dictionary compression
Example: Run-length Burrows-Wheeler transform (RLBWT). RLBWT is an invertible text transformation defined as follows.
Input: text T = BANANA$
1. Build a matrix with the text rotations as rows.
2. Sort the rows.
3. Apply run-length compression to L (the last column); here L = ANNB$AA.
Rotations    Sorted rows
BANANA$      $BANANA
ANANA$B      A$BANAN
NANA$BA      ANA$BAN
ANA$BAN      ANANA$B
NA$BANA      BANANA$
A$BANAN      NA$BANA
$BANANA      NANA$BA
Output: RLBWT = (1,A), (2,N), (1,B), (1,$), (2,A)
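
The three steps can be reproduced directly, as a sketch that materializes every rotation (fine for a slide-sized example, far too slow and memory-hungry for real inputs, where suffix-array based constructions are used instead). The function names are illustrative, and the code assumes the terminator `$` sorts before every other character, which holds for ASCII letters.

```python
from itertools import groupby


def bwt(text):
    """Last column of the sorted rotation matrix of `text`."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


def run_length(s):
    """Collapse maximal runs of equal characters into (length, char) pairs."""
    return [(len(list(group)), ch) for ch, group in groupby(s)]


def rlbwt(text):
    """Run-length encoded Burrows-Wheeler transform."""
    return run_length(bwt(text))


if __name__ == "__main__":
    print(bwt("BANANA$"))    # ANNB$AA
    print(rlbwt("BANANA$"))  # [(1, 'A'), (2, 'N'), (1, 'B'), (1, '$'), (2, 'A')]
```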

Background: Dictionary compression
Other (less known) dictionary compressors: (run-length) grammars (SLP), collage systems, macro schemes, word graphs (CDAWG).
Applications
Compression: reducing the size of data before archiving or transfer, e.g., over the network. Examples: 7-zip, gzip (both LZ77-based).
Compressed computation: supporting operations on data structures that take space close to that of the dictionary-compressed text. Example operations: random access, pattern matching queries.

String Attractors
A new combinatorial object generalizing all known dictionary compressors.
Definition. A set Γ ⊆ [1..n] is a string attractor of T ∈ Σ^n if every substring of T has an occurrence containing an element of Γ.
Example: T = CDABCCDABCCA, Γ = {4, 7, 11, 12} (1-based positions, as in Γ ⊆ [1..n]; counted from 0 these are {3, 6, 10, 11}).
Theorem ("compressors are attractors"). Let T ∈ Σ^n and let α be the output size of any of the following dictionary compressors on T: (1) (RL)SLP, (2) collage system, (3) LZ77, (4) macro scheme, (5) RLBWT, (6) CDAWG. Then T has a string attractor of size O(α).
Example: T = BABBABABBBAB
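
The definition can be checked by brute force on small strings. The sketch below enumerates every occurrence of every substring, so it is only meant for slide-sized inputs; `is_attractor` is a hypothetical helper name and positions are 1-based, as in Γ ⊆ [1..n]. On T = CDABCCDABCCA the set {4, 7, 11, 12} passes, while {3, 6, 10, 11} read as 1-based positions fails, because neither occurrence of B (positions 4 and 9) is covered. The set of LZ77 phrase end positions of BABBABABBBAB, {1, 2, 3, 6, 9, 12}, also passes, which is consistent with item (3) of the theorem.

```python
def is_attractor(text, gamma):
    """Return True if `gamma` (1-based positions) is a string attractor of `text`.

    Brute force: a substring is covered when at least one of its
    occurrences spans a position in gamma.
    """
    n = len(text)
    all_substrings = set()
    covered = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = text[i:j]
            all_substrings.add(sub)
            # this occurrence spans 1-based positions i+1 .. j
            if any(i + 1 <= p <= j for p in gamma):
                covered.add(sub)
    return covered == all_substrings


if __name__ == "__main__":
    print(is_attractor("CDABCCDABCCA", {4, 7, 11, 12}))         # True
    print(is_attractor("CDABCCDABCCA", {3, 6, 10, 11}))         # False
    print(is_attractor("BABBABABBBAB", {1, 2, 3, 6, 9, 12}))    # True
```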

String Attractors
Theorem (bad news). Computing the smallest attractor is NP-complete and APX-hard.
But the reduction Compressors → Attractors can be reversed!
Theorem. Given a string T ∈ Σ^n and a string attractor Γ of size γ for T, we can build:
a macro scheme for T of size O(γ log(n/γ)),
a collage system for T of size O(γ log(n/γ)),
an SLP for T of size O(γ log^2(n/γ)).
Consequence: many new relations (and easier proofs of existing ones) between the sizes of dictionary compressors, for example z ∈ O(r log^2(n/r)), where z (resp. r) is the size of LZ77 (resp. RLBWT).
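
One plausible way to read the example bound is as a chain through the two theorems. The sketch below is not taken from the slides: it assumes the standard fact that the LZ77 parse is no larger than any SLP for T (written g here, a symbol introduced only for this sketch), and it treats x log^2(n/x) as nondecreasing in the relevant range of x.

```latex
\begin{align*}
\gamma &\in O(r)
  && \text{RLBWT is compressor (5) in ``compressors are attractors''} \\
g &\in O\!\left(\gamma \log^{2}(n/\gamma)\right)
  && \text{SLP built from the attractor} \\
z &\le g
  && \text{assumed: LZ77 is no larger than any SLP for } T \\
z &\in O\!\left(r \log^{2}(n/r)\right)
  && \text{combining the three lines above}
\end{align*}
```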

String Attractors
String attractors carry enough information about the string to design data structures.
Theorem. If T ∈ Σ^n has an attractor of size γ, then we can build a data structure of size (in w-bit words):
O(γ polylog n) that can extract any length-ℓ substring of T in O(ℓ log(σ)/w + log n / log log n) time,
O(γ log(n/γ)) that, given a pattern P[1..m], outputs all its occurrences in T in O(m log n + occ log^ε n) time.
Here σ is the alphabet size and occ is the number of occurrences of P in T.
The resulting data structures are universal thanks to reductions Attractors → Compressors, i.e., they translate to concrete data structures working on different compressed representations.

Thank You!
