Efficient Counting of Square Substrings in a Tree

Tomasz Kociumaka, Jakub Pachocki, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń. University of Warsaw. ISAAC 2012, Taipei, December 19, 2012.


  1. Efficient Counting of Square Substrings in a Tree. Tomasz Kociumaka, Jakub Pachocki, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń. University of Warsaw. ISAAC 2012, Taipei, December 19, 2012. Jakub Pachocki, Efficient Counting of Square Substrings in a Tree, 1/15

  2. Square. [Figure: the word abbbabbb, that is, abbb written twice.]

  3. Square in a string. [Figure: the string abbbabbbbabaab, which contains the square abbbabbb.]

  4. Square in a tree. [Figure: an edge-labeled tree over the letters a and b.]

  7. Number of squares in a tree. We consider unrooted, unoriented trees with edges labeled by single letters. A substring of such a tree is the value of a simple path. [Figure: an example tree T with edge labels a, b, and c.] Squares in T: aa, abaaba, bb, bcbc, cbcb. There are 5 distinct squares, i.e. sq(T) = 5.
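The definition of sq(T) can be checked on small inputs by brute force. The sketch below is only an illustration, not the method of the talk: it walks every simple path from every node and collects the words that are squares. The edge-list format `(u, v, label)` and the example tree are assumptions made for this sketch, since the tree T from the slide cannot be recovered from the transcript.

```python
from collections import defaultdict

def distinct_squares_in_tree(edges):
    """Return the set of distinct squares read along simple paths.

    edges: list of (u, v, label) triples describing an unrooted tree
    (hypothetical input format chosen for this sketch).
    """
    adj = defaultdict(list)
    for u, v, label in edges:
        adj[u].append((v, label))
        adj[v].append((u, label))

    def is_square(word):
        half = len(word) // 2
        return len(word) > 0 and len(word) % 2 == 0 and word[:half] == word[half:]

    squares = set()

    def walk(node, parent, word):
        # word is the value of the simple path from the start node to node.
        if is_square(word):
            squares.add(word)
        for nxt, label in adj[node]:
            if nxt != parent:  # in a tree this is enough to keep the path simple
                walk(nxt, node, word + label)

    for start in list(adj):
        walk(start, None, "")
    return squares

# Hypothetical deterministic tree: the path 0-1-2-3-4 labeled a, b, a, b,
# plus one extra edge labeled c hanging off node 2.
example = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 4, "b"), (2, 5, "c")]
print(sorted(distinct_squares_in_tree(example)))
```

This takes quadratic time in the number of nodes before even comparing words, so it is useful only as a reference for testing faster methods.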

  20. Previous findings. Theorem (Fraenkel & Simpson, 1998). A word of length $n$ contains at most $2n$ distinct squares. Theorem (Gusfield & Stoye, 2004). The number of distinct squares in a string can be computed in $O(n)$ time. Theorem (Crochemore et al., 2012). A tree with $n$ nodes contains $O(n^{4/3})$ distinct squares. This bound is asymptotically tight.
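For strings, the quantity bounded by Fraenkel and Simpson can be computed directly from the definition. The following is a minimal quadratic-time sketch (the function name `distinct_squares` is made up here); it is not the linear-time algorithm of Gusfield and Stoye.

```python
def distinct_squares(s):
    """Set of distinct substrings of s of the form xx with x nonempty (naive scan)."""
    found = set()
    for p in range(1, len(s) // 2 + 1):        # p = length of the root x
        for i in range(len(s) - 2 * p + 1):    # i = start of the candidate square
            if s[i:i + p] == s[i + p:i + 2 * p]:
                found.add(s[i:i + 2 * p])
    return found

word = "abbbabbbbabaab"
squares = distinct_squares(word)
# The Fraenkel-Simpson theorem guarantees this inequality for every word.
print(len(squares), "<=", 2 * len(word))
```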

  22. Our result. Theorem (this paper). An $O(n \log n)$-sized representation of all the squares in a tree can be computed in $O(n \log^2 n)$ time. The representation allows counting distinct squares in the tree. In this presentation, we assume that the trees are fully deterministic, that is, no two adjacent edges have the same label.

  23. Counting squares in strings. Main-Lorentz algorithm: we split the string at the center, r. [Figure: a string split at r, with a position v and the word val(v); SUF[v] is marked as a suffix of val(v) and PREF[v] as a prefix of val(v).] We need to efficiently compute SUF and PREF.
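The divide-and-conquer idea can be sketched for strings as follows. This is a simplified illustration, not the talk's implementation: the two extension lengths are computed by naive scanning rather than by linear-time tables, and the local names `suf` and `pref` are only an analogy to the slide's SUF and PREF. For a split s = uv and a period p, a square that starts k letters before the split and has its center at or after the split exists exactly when k is at most the longest common suffix of u and v[:p], and p - k is at most the longest common prefix of v and v[p:].

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b (naive scan)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def lcs(a, b):
    """Length of the longest common suffix of a and b."""
    return lcp(a[::-1], b[::-1])

def crossing(u, v):
    """Yield (k, p) for each square of period p in u + v that starts
    k >= 1 letters before the split and has its center at or after it."""
    for p in range(1, len(v) + 1):
        suf = lcs(u, v[:p])    # how far the period extends to the left of the split
        pref = lcp(v, v[p:])   # how far the period extends to the right
        for k in range(max(1, p - pref), min(p, suf) + 1):
            yield k, p

def distinct_squares(s):
    """Distinct squares of s: recurse on both halves, then add squares crossing the split."""
    if len(s) < 2:
        return set()
    mid = len(s) // 2
    u, v = s[:mid], s[mid:]
    found = distinct_squares(u) | distinct_squares(v)
    for k, p in crossing(u, v):
        found.add(s[mid - k:mid - k + 2 * p])
    # Squares whose center lies left of the split: mirror the string and reuse crossing.
    t = s[::-1]
    for k, p in crossing(v[::-1], u[::-1]):
        start = len(v) - k
        found.add(t[start:start + 2 * p][::-1])
    return found

print(sorted(distinct_squares("abaabaab")))
```

For a fixed p the valid values of k form an integer interval, so the squares found at one split come in groups of consecutive cyclic shifts.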

  24. Packages. Definition. A package is a substring s with an integer interval [l, r] describing cyclic shifts of s. We obtain $O(n \log n)$ (possibly intersecting) packages of squares. To remove duplicates efficiently, we group the strings with respect to cyclic equivalence. It is enough to know maxRot(s) for each s. We need to efficiently compute SUF, PREF, and maxRot.
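One way to picture the deduplication step: normalize every package to the maximal rotation of its string, so that two shifts from different packages can be compared by a (canonical string, shift) pair. The sketch below is a naive illustration under assumptions of my own: a package is modeled as a triple `(s, l, r)` standing for the shifts `rot(s, l), ..., rot(s, r)`, the maximal rotation is found by trying all rotations, and shifts are enumerated one by one instead of merging intervals.

```python
def rot(s, i):
    """Cyclic shift of s starting at position i (i is taken modulo len(s))."""
    i %= len(s)
    return s[i:] + s[:i]

def canonical(s):
    """Return (best, offset, period) with s == rot(best, offset), where best is
    the maximal rotation of s and rot(best, a) == rot(best, b) exactly when
    a % period == b % period."""
    best = max(rot(s, i) for i in range(len(s)))
    offset = next(i for i in range(len(s)) if rot(best, i) == s)
    period = next(q for q in range(1, len(s) + 1) if rot(best, q) == best)
    return best, offset, period

def count_distinct(packages):
    """Count distinct strings described by packages (s, l, r), s nonempty."""
    seen = set()
    for s, l, r in packages:
        best, offset, period = canonical(s)
        for i in range(l, r + 1):
            # rot(s, i) == rot(best, offset + i); reduce the shift modulo the period.
            seen.add((best, (offset + i) % period))
    return len(seen)

# Two overlapping packages: {ab, ba} and {ba} describe two distinct strings.
print(count_distinct([("ab", 0, 1), ("ba", 0, 0)]))
```

Grouping by the canonical string is what makes intersecting packages comparable; an efficient version would store shift intervals per group rather than individual shifts.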

  25. Main-Lorentz algorithm for trees (1). Similar approach. Rather than at the center, split at a centroid r. [Figure: centroid r with subtrees $T_1, T_2, \ldots, T_k$, each satisfying $|T_i| \le |T|/2$.]
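A centroid with the property $|T_i| \le |T|/2$ can be found with one traversal that computes subtree sizes. The sketch below assumes the tree is given as a list of `(u, v, label)` edges with at least one edge; the input format and function name are illustrative choices, not taken from the talk.

```python
from collections import defaultdict

def find_centroid(edges):
    """Return a node whose removal leaves components of at most n // 2 nodes."""
    adj = defaultdict(list)
    for u, v, _label in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)
    root = next(iter(adj))

    # Iterative DFS: every node appears in `order` after its parent.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if nxt != parent[node]:
                parent[nxt] = node
                stack.append(nxt)

    # Subtree sizes, accumulated from the leaves upward.
    size = {node: 1 for node in order}
    for node in reversed(order):
        if parent[node] is not None:
            size[parent[node]] += size[node]

    for node in order:
        parts = [size[c] for c in adj[node] if c != parent[node]]
        parts.append(n - size[node])  # the component containing the parent
        if max(parts) <= n // 2:
            return node

path = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 4, "b")]
print(find_centroid(path))
```

Recursing on the components left after removing the centroid gives a decomposition of logarithmic depth, which is where the extra logarithmic factors in the stated bounds come from.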

  26. Main-Lorentz algorithm for trees (2). [Figure: the root r and a node v, drawn twice, with SUF[v] and PREF[v] marked.] We need to efficiently compute SUF, PREF, and maxRot generalized to trees.

  27. Computation of SUF. Theorem (Shibuya, 1999). The suffix tree $S_T$ of a labeled tree $T$ can be computed in $O(n)$ time. [Figure: a tree T rooted at r with edge labels a, b, c; its suffix tree $S_T$; and the union $T \cup S_T$.] For example, SUF[acb] = $(cb)^R$.
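The example SUF[acb] = $(cb)^R$ is consistent with the following reading, which is an interpretation rather than a definition taken from the talk: SUF[v] is the longest suffix of val(v) whose reversal is the value of some node of T. Under that assumption, a naive computation on a small rooted tree looks like this. The tree is represented by the set of its root-to-node words, and the concrete tree is hypothetical.

```python
def suf_table(node_values):
    """For each root-to-node word, return the longest prefix of its reversal
    that is itself a root-to-node word (assumed reading of SUF).

    node_values must be prefix-closed, as the words of a rooted tree are.
    """
    nodes = set(node_values)
    table = {}
    for val in nodes:
        rev = val[::-1]
        k = 0
        # Prefix-closure lets us extend one letter at a time.
        while k < len(rev) and rev[:k + 1] in nodes:
            k += 1
        table[val] = rev[:k]
    return table

# Hypothetical tree rooted at r with the paths a-c-b and b-c.
tree_words = {"", "a", "ac", "acb", "b", "bc"}
table = suf_table(tree_words)
print(table["acb"])  # the reversal of the suffix "cb"
```

This takes time proportional to the total length of all words; the linear-time route on the slide goes through the suffix tree of T instead.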

  30. Computation of PREF (1). PREF[v] = x if and only if the condition shown in the figure holds. [Figure: nodes r, v, and x, with two edge labels required to differ, $c \ne c'$.] We extend Imre Simon’s automata to trees. The pair (x, c) is an essential transition if val(x)c has a nonempty border.
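The border condition can be tested with the standard prefix function. The sketch below enumerates, naively, all pairs (x, c) for which val(x)c has a nonempty border; nodes are identified with their root-to-node words, and the node set and alphabet in the example are hypothetical. It illustrates the definition only and does not reproduce the linear-time enumeration of the talk.

```python
def longest_border(w):
    """Length of the longest proper border of w (prefix that is also a suffix)."""
    fail = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = fail[k - 1]
        if w[i] == w[k]:
            k += 1
        fail[i] = k
    return fail[-1] if w else 0

def essential_transitions(node_values, alphabet):
    """All pairs (val(x), c) such that val(x) + c has a nonempty border."""
    return {(x, c) for x in node_values for c in alphabet
            if longest_border(x + c) > 0}

# Hypothetical single-path tree with node words "", "a", "ab".
print(sorted(essential_transitions(["", "a", "ab"], "abc")))
```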

  31. Computation of PREF (2). Lemma. In a deterministic tree of size $n$ there are at most $2n - 1$ essential transitions. Proof idea: there are at most $n - 1$ transitions in which an edge labeled c leaves node x, and every other transition fixes some value of PREF. By enumerating all essential transitions, we can compute all values of PREF in linear time.

  32. Computation of maxRot. We devise a general incremental algorithm for finding maximal rotations. It can be used to find maxRot of the path from r to v for all v. [Figure: a string s with a suffix t, extended by a string x; tx is marked as maxSuf(sx).] Call t a nonredundant suffix of s iff tx is the maximum suffix of sx for some string x. maxRot(s) always starts at a nonredundant suffix.
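The last claim can be illustrated by choosing x = s: the maximum suffix of ss begins inside the first copy of s, and the position where it begins is where the maximal rotation of s starts. The sketch below uses that observation with a naive maximum-suffix search; it is not the incremental algorithm from the talk, and the function names are made up for this example.

```python
def max_suffix_start(w):
    """Start index of the lexicographically maximum suffix of a nonempty w (naive)."""
    return max(range(len(w)), key=lambda i: w[i:])

def max_rot(s):
    """Return (start, rotation): the maximal rotation of nonempty s and where it starts.

    The suffix t = s[start:] is nonredundant in the slide's sense, witnessed
    by x = s, because t + s is the maximum suffix of s + s.
    """
    start = max_suffix_start(s + s)
    return start, s[start:] + s[:start]

print(max_rot("abaab"))
```

Comparing whole suffixes makes this quadratic or worse; the point of the incremental algorithm on the slide is to maintain the candidate suffixes while the path from r to v grows letter by letter.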
