String indexing in the Word RAM model, part 3
Paweł Gawrychowski
University of Wrocław & Max-Planck-Institut für Informatik
Paweł Gawrychowski String indexing in the Word RAM model III 1 / 30
We want to reduce the space usage. The goal is to construct a structure of size $(1 + \frac{1}{\epsilon})\,n + O(\frac{n}{\log\log n})$ bits that answers any lookup(i) in $O(\log^{\epsilon} n)$ time, for any $\epsilon \in (0, 1]$.

Idea
We had $\ell = \log\log n$ levels of recursion. Now we will try to simulate jumping $\epsilon\ell$ levels at once, so that we only have to store $\frac{1}{\epsilon}$ levels.
The first step is to replace $\Psi_k$ with $\Phi_k$:

$$\Phi_k(i) = \begin{cases} j & \text{if } SA_k[j] = SA_k[i] + 1, \\ 1 & \text{if } SA_k[i] = n_k. \end{cases}$$

So we store the successor for every $SA_k[i]$. If we store all $\Phi_k(i)$ in a list, then computing $\Phi_k(i)$ is just taking the $i$-th element of that list, and the vectors $B_k$ are no longer necessary.
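The definition of $\Phi_k$ can be illustrated with a minimal sketch that builds the list explicitly for a single level. The helper names `suffix_array` and `phi_from_sa` are hypothetical, the suffix array is built naively by sorting, and nothing here is compressed; the sketch only shows what the stored list contains.

```python
def suffix_array(text):
    """Naive 1-based suffix array: SA[i] is the start of the i-th smallest suffix."""
    n = len(text)
    order = sorted(range(1, n + 1), key=lambda p: text[p - 1:])
    return [None] + order  # index 0 is unused so that SA[1..n] matches the slides


def phi_from_sa(sa):
    """Phi(i) = j with SA[j] = SA[i] + 1, and Phi(i) = 1 when SA[i] = n."""
    n = len(sa) - 1
    inverse = [0] * (n + 1)  # inverse[p] = rank of the suffix starting at p
    for i in range(1, n + 1):
        inverse[sa[i]] = i
    phi = [None] * (n + 1)
    for i in range(1, n + 1):
        phi[i] = 1 if sa[i] == n else inverse[sa[i] + 1]
    return phi


sa = suffix_array("abab$")
phi = phi_from_sa(sa)
```

Once `phi` is stored as a list, evaluating $\Phi(i)$ is a single array access, which is why no auxiliary bit vector is needed in this formulation.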
Lemma
$\Phi_0$ can be stored in $n + O(\frac{n}{\log\log n})$ bits, so that accessing any entry takes $O(1)$ time.

Lemma
For $k > 0$, $\Phi_k$ can be stored in $n(1 + \frac{1}{2^{k-1}}) + O(\frac{n}{2^k \log\log n})$ bits, so that accessing any entry takes $O(1)$ time.
Now to determine $SA[i] = SA_0[i]$, we use $\Phi_0$ to walk along indices $i, i', i'', \ldots$ such that $SA_0[i] + 1 = SA_0[i']$, $SA_0[i'] + 1 = SA_0[i'']$, ... until we reach an index stored in $SA_1$. But how to detect this?

Succinct dictionary
A bit vector $B[1..n]$ in which only $n'$ elements are ones can be stored in $O(\log\binom{n}{n'})$ bits, so that a lookup and a rank take $O(1)$ time.

So we store all $i$ such that $SA_0[i]$ is divisible by $2^{\epsilon\ell}$ in such a succinct dictionary. The length of such a walk is at most $2^{\epsilon\ell} = O(\log^{\epsilon} n)$.
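The walk can be sketched as follows, under loud simplifications: a plain Python dict stands in for both the succinct dictionary and the next stored level, the parameter `step` plays the role of $2^{\epsilon\ell}$, and the rank holding text position $n$ is also sampled so that the walk always terminates. The names `build_sampled_index` and `lookup` are hypothetical.

```python
def build_sampled_index(text, step):
    """Return (phi, samples) for a 1-based suffix array of text."""
    n = len(text)
    order = sorted(range(1, n + 1), key=lambda p: text[p - 1:])
    sa = [None] + order
    inverse = {sa[i]: i for i in range(1, n + 1)}
    phi = [None] + [1 if sa[i] == n else inverse[sa[i] + 1]
                    for i in range(1, n + 1)]
    # Assumption: a dict replaces the succinct dictionary and the sampled level.
    # Ranks whose text position is divisible by step (or equals n) are kept.
    samples = {i: sa[i] for i in range(1, n + 1)
               if sa[i] % step == 0 or sa[i] == n}
    return phi, samples


def lookup(i, phi, samples):
    """Recover SA[i] by following phi until a sampled rank is reached."""
    steps = 0
    while i not in samples:
        i = phi[i]      # move to the rank of the next text position
        steps += 1
    return samples[i] - steps  # each step advanced the text position by one


phi, samples = build_sampled_index("mississippi$", 4)
values = [lookup(i, phi, samples) for i in range(1, 13)]
```

Each call follows fewer than `step` links, which mirrors the $2^{\epsilon\ell}$ bound on the walk length.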
But what is the space bound?

$$\frac{n \log n}{2^{\ell}} + n + O\Bigl(\frac{n}{\log\log n}\Bigr) + \sum_{k = i\epsilon\ell,\; 0 < i < \epsilon^{-1}} n\Bigl(1 + \frac{1}{2^{k-1}} + O\Bigl(\frac{1}{2^k \log\log n}\Bigr)\Bigr)$$

plus the space taken by the succinct dictionaries, which is $O(n_{\epsilon\ell}\,\ell) = O(\frac{n \log\log n}{\log^{\epsilon} n})$, so we get the claimed space complexity.

The space taken by the dictionaries is bounded as follows:
1. for $k = 0$, $O(\log\binom{n}{n_{\epsilon\ell}})$,
2. generally, at the $k$-th super level we need $O(\log\binom{n_{k\epsilon\ell}}{n_{(k+1)\epsilon\ell}})$, which is $O(n_{(k+1)\epsilon\ell}\,\epsilon\ell)$, because $\log\binom{N}{M} = O(M \log\frac{N}{M})$ and $n_{k\epsilon\ell} / n_{(k+1)\epsilon\ell} = 2^{\epsilon\ell}$.
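The bound can be checked term by term. The following is a sketch of the arithmetic, assuming logarithms are base 2 (so $2^{\ell} = \log n$), that $\epsilon$ is a constant with $1/\epsilon$ an integer, and that the stored levels are $k = i\epsilon\ell$ for $0 < i < \epsilon^{-1}$:

```latex
\begin{align*}
\frac{n\log n}{2^{\ell}} &= \frac{n\log n}{\log n} = n, \\
\sum_{0<i<\epsilon^{-1}} n\Bigl(1+\frac{1}{2^{i\epsilon\ell-1}}\Bigr)
  &= \Bigl(\frac{1}{\epsilon}-1\Bigr)n + 2n\sum_{0<i<\epsilon^{-1}}\frac{1}{\log^{i\epsilon} n}
   = \Bigl(\frac{1}{\epsilon}-1\Bigr)n + O\Bigl(\frac{n}{\log^{\epsilon} n}\Bigr), \\
\text{total} &= n + n + \Bigl(\frac{1}{\epsilon}-1\Bigr)n + O\Bigl(\frac{n}{\log\log n}\Bigr)
   = \Bigl(1+\frac{1}{\epsilon}\Bigr)n + O\Bigl(\frac{n}{\log\log n}\Bigr).
\end{align*}
```

The remaining terms, including the $O(\frac{n \log\log n}{\log^{\epsilon} n})$ bits for the succinct dictionaries, are all absorbed by $O(\frac{n}{\log\log n})$ when $\epsilon$ is constant.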
Succinct dictionaries

Pagh 2001
A static dictionary storing a subset of $[1, U]$ of size $n$ can be stored in $B + O(\log\log U) + o(n)$ bits of space, so that a membership query can be answered in $O(1)$ time,

where $B = \lceil \log_2 \binom{U}{n} \rceil$. We can also add $O(1)$ time rank queries, but it requires a little bit of work. We will see a small fragment of a much weaker result.

Brodnik and Munro 1999
A static dictionary storing a subset of $[1, U]$ of size $n$ can be stored in $O(B)$ bits of space, so that a membership query can be answered in $O(1)$ time.
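To get a feel for the quantity $B$, here is a small sketch that evaluates it exactly with integer arithmetic and compares it with the standard estimates $n \log_2\frac{U}{n} \le \log_2\binom{U}{n} \le n \log_2\frac{eU}{n}$. The function name `information_bound` is hypothetical.

```python
from math import ceil, comb, e, log2


def information_bound(universe, size):
    """B = ceil(log2 C(U, n)), computed exactly with integers."""
    subsets = comb(universe, size)
    # For a positive integer m, ceil(log2 m) equals (m - 1).bit_length().
    return (subsets - 1).bit_length()


U, n = 1024, 16
B = information_bound(U, n)
lower = n * log2(U / n)            # from C(U, n) >= (U / n) ** n
upper = ceil(n * log2(e * U / n))  # from C(U, n) <= (e * U / n) ** n
print(lower, B, upper)
```

The two estimates differ by roughly $n \log_2 e$ bits, which is the slack hidden in the statement $O(B) = O(n \log\frac{U}{n})$.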
We allow $O(B) = O(n \log\frac{U}{n})$ bits of space. We can clearly encode the whole set in such space, but the question is whether we can answer a membership query efficiently!

Let $r = \frac{U}{n}$. We consider four cases:
very sparse: $r \in [U^{\epsilon}, \infty)$; then we have $O(n \log U)$ bits of space, so we can explicitly list all of the elements. We use some form of perfect hashing.
moderately sparse: $r \in [\log^{\lambda} U, U^{\epsilon}]$; see the next slide.
moderately dense: $r \in [\frac{1}{\alpha}, \log^{\lambda} U]$; complicated!
dense: $r \in [2, \frac{1}{\alpha}]$; then we can use $O(U)$ bits of space, so we store a bitmap.
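The case split can be sketched as a small classifier. The slides do not fix the constants, so the defaults for `eps`, `lam` and `alpha` below are arbitrary illustrative values, and the function name `density_case` is hypothetical. Boundary values, which belong to two adjacent ranges on the slide, are assigned to the sparser case here.

```python
from math import log2


def density_case(universe, size, eps=0.5, lam=2.0, alpha=0.25):
    """Classify r = U / n into one of the four cases (constants are assumptions)."""
    r = universe / size
    if r < 2:
        # The slide's ranges start at r = 2; smaller r is not covered there.
        raise ValueError("r = U/n must be at least 2")
    if r >= universe ** eps:
        return "very sparse"        # explicit list plus perfect hashing
    if r >= log2(universe) ** lam:
        return "moderately sparse"
    if r >= 1 / alpha:
        return "moderately dense"
    return "dense"                  # plain bitmap of U bits


U = 2 ** 20
cases = [density_case(U, n) for n in (2 ** 5, 2 ** 11, 2 ** 16, 2 ** 19)]
```

With these sample constants the thresholds for $U = 2^{20}$ are $U^{\epsilon} = 1024$, $\log^{\lambda} U = 400$ and $\frac{1}{\alpha} = 4$.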