Introduction: The resulting random Tree
Kevin Leckey (Monash University), An Introduction to Tries, 21.09.2015

A recursive construction of the Trie

[Figure: animation frames of the recursive construction, showing the words $\Xi_1, \ldots, \Xi_5$ being placed in the tree step by step.]
Analysis: The Depth

Consider $n$ words $\Xi_1, \ldots, \Xi_n$. What is the depth of the vertex $\Xi_1$?

Recall: Depth $D_n$ = length of the shortest unique prefix of $\Xi_1 = \xi_1 \xi_2 \xi_3 \ldots$

$$P(D_n \le k) = P(\Xi_2, \ldots, \Xi_n \text{ do not start with } \xi_1 \ldots \xi_k) = \left(1 - \left(\tfrac{1}{2}\right)^{k}\right)^{n-1}$$

Consequence:

$$P(D_n \le \alpha \log_2(n)) = \left(1 - n^{-\alpha}\right)^{n-1} \xrightarrow{\,n \to \infty\,} \begin{cases} 1, & \text{if } \alpha > 1, \\ 0, & \text{if } \alpha < 1. \end{cases}$$
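The threshold at $\alpha = 1$ can be checked numerically. The following is a minimal sketch that simply evaluates the closed form $(1 - 2^{-k})^{n-1}$ from the slide; the function names `depth_cdf` and `depth_cdf_at_alpha` are illustrative, and $\alpha \log_2(n)$ is treated as a real number rather than rounded to an integer.

```python
def depth_cdf(n: int, k: float) -> float:
    """P(D_n <= k) = (1 - 2^{-k})^{n-1} for n independent fair-coin words."""
    return (1.0 - 2.0 ** (-k)) ** (n - 1)


def depth_cdf_at_alpha(n: int, alpha: float) -> float:
    """P(D_n <= alpha * log2(n)) = (1 - n^{-alpha})^{n-1}."""
    return (1.0 - float(n) ** (-alpha)) ** (n - 1)


if __name__ == "__main__":
    # Below the threshold (alpha < 1) the probability drifts to 0,
    # above it (alpha > 1) the probability drifts to 1.
    for n in (10**2, 10**4, 10**6):
        print(n, depth_cdf_at_alpha(n, 0.5), depth_cdf_at_alpha(n, 1.5))
```

Printing the two columns for growing $n$ shows the probabilities moving apart, which is the dichotomy stated in the consequence above.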
Analysis: The Depth. Results on $D_n$

Shown on the previous slide:

$$\frac{D_n}{\log_2(n)} \xrightarrow{\,P\,} 1 \quad (n \to \infty)$$

Considering the previous slide more carefully:

$$P(D_n - \log_2(n) < x) \approx \left(1 - \frac{2^{-x}}{n}\right)^{n-1} \xrightarrow{\,n \to \infty\,} e^{-2^{-x}}$$

(The limit is a Gumbel distribution, known from extreme value theory.)

Thm (Knuth '72): $E[D_n] = \log_2(n) + \Psi(\log_2(n)) + o(1)$ with a periodic function $\Psi$.

Thm (Szpankowski '86): $\mathrm{Var}(D_n) \sim \Phi(\log_2(n))$ with a periodic function $\Phi$.
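The step behind the Gumbel approximation can be spelled out as follows. This is a sketch that substitutes $k = \log_2(n) + x$ into $P(D_n \le k) = (1 - 2^{-k})^{n-1}$ and ignores that $k$ has to be an integer, which is presumably why the slide writes $\approx$ rather than $=$.

```latex
\begin{align*}
P\bigl(D_n \le \log_2(n) + x\bigr)
  &= \left(1 - 2^{-(\log_2(n) + x)}\right)^{n-1}
   = \left(1 - \frac{2^{-x}}{n}\right)^{n-1} \\
  &\xrightarrow{\,n \to \infty\,} \exp\!\left(-2^{-x}\right),
  \qquad \text{since } \left(1 - \frac{c}{n}\right)^{n-1} \to e^{-c}
  \text{ for fixed } c = 2^{-x}.
\end{align*}
```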
Analysis: The Height

Consider $n$ words $\Xi_1, \ldots, \Xi_n$. What is the height of the resulting Trie?

Def: Height $H_n = \max\{ D_n(\Xi_i) : i = 1, \ldots, n \}$.

The result $P(D_n \le k) = (1 - 2^{-k})^{n-1}$ implies:

$$\begin{aligned}
P(H_n > \alpha \log_2(n)) &= P(D_n(\Xi_i) > \alpha \log_2(n) \text{ for some } i \in \{1, \ldots, n\}) \\
&\le n \cdot P(D_n > \alpha \log_2(n)) \\
&\le n \cdot \left(1 - \left(1 - n^{-\alpha}\right)^{n}\right) \\
&\le n^{2-\alpha}
\end{aligned}$$

Consequence: $P(H_n > \alpha \log_2(n)) \to 0$ for $\alpha > 2$.
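The height can also be explored by simulation. The sketch below is not taken from the slides: it draws fair-coin words of a fixed, finite length (an assumption, since the model uses infinite sequences), redraws duplicates, and computes each depth as one plus the longest prefix shared with a neighbour in sorted order. The helper names `lcp`, `depths`, `height` and `random_words` are illustrative.

```python
import math
import random


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    m = 0
    for x, y in zip(a, b):
        if x != y:
            break
        m += 1
    return m


def depths(words):
    """Depth of each word = length of its shortest unique prefix.

    Assumes the words are distinct and of equal length. In sorted order the
    longest prefix shared with any other word is shared with a neighbour.
    """
    if len(words) <= 1:
        return [0] * len(words)
    order = sorted(range(len(words)), key=lambda i: words[i])
    result = [0] * len(words)
    for pos, i in enumerate(order):
        best = 0
        if pos > 0:
            best = max(best, lcp(words[i], words[order[pos - 1]]))
        if pos + 1 < len(order):
            best = max(best, lcp(words[i], words[order[pos + 1]]))
        result[i] = best + 1  # one symbol beyond the longest shared prefix
    return result


def height(words):
    """H_n = maximum depth over all words."""
    return max(depths(words), default=0)


def random_words(n, length, rng):
    """n distinct fair-coin words of the given length (duplicates are redrawn)."""
    seen = set()
    while len(seen) < n:
        seen.add(format(rng.getrandbits(length), "0{}b".format(length)))
    return list(seen)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (2**8, 2**12, 2**16):
        h = height(random_words(n, 64, rng))
        print(n, h, h / math.log2(n))  # ratio to compare against 2
```

For moderate $n$ the printed ratio is only a rough guide, since the bound above concerns the limit $n \to \infty$.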
Analysis: The Height. Results on $H_n$

Partly proven on the previous slide:

$$\frac{H_n}{2 \log_2(n)} \xrightarrow{\,P\,} 1$$

Thm (Devroye '84):

$$\lim_{n \to \infty} P(H_n - 2 \log_2(n) - 1 \le x) = \exp(-2^{-x}), \quad x \in \mathbb{R}$$

Thm (Regnier '82): $E[H_n] \sim 2 \log_2(n)$ $(n \to \infty)$
(Flajolet, Steyaert '82 → periodic second order term)
Analysis: The Height. Summary

Typical depth: $\log_2(n)$, height: $2 \log_2(n)$.

Profile (Park, Hwang, Nicodème, Szpankowski):

[Figure: profile of the trie for external nodes (leaves) and for internal nodes, with levels marked at $\log_2 \log n + O(1)$, $\log_2 n + O(1)$ and $2 \log_2 n + O(1)$.]
Analysis: The External Path Length

Consider $n$ words $\Xi_1, \ldots, \Xi_n$.

External Path Length:

$$L_n := \sum_{i=1}^{n} D_{n,i}, \qquad D_{n,i} = D_n(\Xi_i).$$

[Figure: trie with the six words $\Xi_1, \ldots, \Xi_6$ as leaves.]

Example: $L_6 = 2 + 3 + 4 \cdot 4 = 21$
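The definition of $L_n$ translates directly into code. The sketch below uses six hypothetical words chosen so that their depths are 2, 3, 4, 4, 4, 4, matching the arithmetic of the example; they are not necessarily the words drawn on the slide. Words are assumed to be distinct and of equal length.

```python
def depth(word, words):
    """Smallest k such that no other word starts with word[:k]."""
    others = [w for w in words if w != word]
    for k in range(len(word) + 1):
        prefix = word[:k]
        if not any(w.startswith(prefix) for w in others):
            return k
    raise ValueError("word is a prefix of another word")


def external_path_length(words):
    """L_n = sum of the depths D_n(word) over all stored words."""
    return sum(depth(w, words) for w in words)


if __name__ == "__main__":
    # Hypothetical example words (an assumption, not read off the figure).
    words = ["0000", "0100", "0110", "0111", "1000", "1001"]
    print([depth(w, words) for w in words])
    print(external_path_length(words))
```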
Analysis: The External Path Length. A Recursion for $L_n$

[Figure: trie with the six words $\Xi_1, \ldots, \Xi_6$, split at the root into its two subtrees.]

$K_n$ = # words starting with 0

$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n - K_n} + n$$
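For a fixed set of words, the recursion can be followed literally: split by the first symbol, recurse on both groups with that symbol removed, and add $n$ because each of the $n$ words gains one level. The sketch below assumes distinct binary words of equal length and the base case $L_0 = L_1 = 0$, which is consistent with a single word having an empty shortest unique prefix; the function name `path_length` is illustrative.

```python
def path_length(words):
    """External path length via L_n = L_{K_n} + L~_{n-K_n} + n.

    Assumes distinct binary words of equal length.
    """
    n = len(words)
    if n <= 1:
        return 0  # assumed base case: an empty or single-word trie has L = 0
    zeros = [w[1:] for w in words if w[0] == "0"]  # K_n words start with 0
    ones = [w[1:] for w in words if w[0] == "1"]   # n - K_n words start with 1
    return path_length(zeros) + path_length(ones) + n


if __name__ == "__main__":
    # Hypothetical words with depths 2, 3, 4, 4, 4, 4.
    print(path_length(["0000", "0100", "0110", "0111", "1000", "1001"]))
```

In the random model the two subtrees are independent given $K_n$, which is what makes the distributional recursion usable; the code only illustrates the deterministic bookkeeping.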
Analysis: The External Path Length. The Contraction Method in a Nutshell

Aim: Find a limit law for $L_n$ (after rescaling properly)

$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n - K_n} + n$$

1. Rescaling: $X_n = (L_n - E[L_n]) / \sqrt{\mathrm{Var}(L_n)}$

   $$X_n \stackrel{d}{=} A_{n,1} X_{K_n} + A_{n,2} \tilde{X}_{n - K_n} + b_n$$

2. Find the Limits: $(A_{n,1}, A_{n,2}, b_n) \to ((\sqrt{2})^{-1}, (\sqrt{2})^{-1}, 0)$

   $$X \stackrel{d}{=} \frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X} \qquad (1)$$

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.
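That a centred normal distribution solves (1) can be verified in one line. The sketch assumes that $X$ and $\tilde{X}$ on the right-hand side are independent copies with law $\mathcal{N}(0, \sigma^2)$, the usual reading of such fixed-point equations.

```latex
\begin{align*}
X, \tilde{X} \sim \mathcal{N}(0, \sigma^2) \text{ independent}
\;\Longrightarrow\;
\frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X}
  \sim \mathcal{N}\!\left(0,\; \tfrac{1}{2}\sigma^2 + \tfrac{1}{2}\sigma^2\right)
  = \mathcal{N}(0, \sigma^2).
\end{align*}
```

Equation (1) alone therefore does not fix $\sigma^2$; the normalisation $\mathrm{Var}(X_n) = 1$ from step 1 selects $\sigma^2 = 1$.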
Analysis: The External Path Length. Results on $L_n$

Thm (Jacquet, Regnier '88; Neininger, Rüschendorf 2004):

$$\frac{L_n - E[L_n]}{\sqrt{\mathrm{Var}(L_n)}} \xrightarrow{\,d\,} \mathcal{N}(0, 1)$$

From the analysis of $D_n$:

$$E[L_n] = E\left[\sum_{i=1}^{n} D_n(\Xi_i)\right] = n\, E[D_n] = n \log_2(n) + n \Psi(\log_2(n)) + o(n)$$

Thm (Kirschenhofer, Prodinger '86): $\mathrm{Var}(L_n) = n \tilde{\Psi}(\log_2(n)) + O(\log_2(n))$
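The identity $E[L_n] = n\,E[D_n]$ can be evaluated numerically from the depth distribution. The sketch below uses the tail-sum formula $E[D_n] = \sum_{k \ge 0} P(D_n > k)$ together with $P(D_n \le k) = (1 - 2^{-k})^{n-1}$; the truncation tolerance `tol` and the function names are assumptions made for illustration.

```python
import math


def expected_depth(n: int, tol: float = 1e-15) -> float:
    """E[D_n] = sum over k >= 0 of P(D_n > k) = 1 - (1 - 2^{-k})^{n-1}."""
    total = 0.0
    k = 0
    while True:
        tail = 1.0 - (1.0 - 2.0 ** (-k)) ** (n - 1)
        if tail < tol:  # tails are decreasing in k, so the rest is negligible
            break
        total += tail
        k += 1
    return total


def expected_path_length(n: int) -> float:
    """E[L_n] = n * E[D_n], by linearity and exchangeability of the words."""
    return n * expected_depth(n)


if __name__ == "__main__":
    for n in (2**6, 2**10, 2**14):
        # The difference to log2(n) stays bounded, in line with the
        # expansion log2(n) + Psi(log2(n)) + o(1).
        print(n, expected_depth(n) - math.log2(n), expected_path_length(n))
```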
Summary

Trie: tree-like data structure to store words
  position of a word in the tree ↔ path given by shortest unique prefix

Performance: Consider input: $n$ independent words, each word is a sequence of 'coin tosses'
  Typical search/insert time (depth): around $\log_2(n)$
  Worst search/insert time (height): around $2 \log_2(n)$
  Construction cost (path length): around $n \log_2(n)$

Input model not very realistic, what about more general input models?
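The data structure summarised above can be sketched in a few lines. This is a minimal illustration, not the implementation behind the slides: words live in leaves, and inserting into an occupied leaf pushes the resident word down until the two paths separate, so every word ends at the node given by its shortest unique prefix. The class and method names are hypothetical, and words are assumed to be distinct and of equal length.

```python
class Node:
    def __init__(self):
        self.children = {}  # symbol -> Node
        self.word = None    # set only while this node is a leaf


class Trie:
    """Words are stored in leaves at the end of their shortest unique prefix."""

    def __init__(self):
        self.root = Node()

    def insert(self, word):
        """Insert word and return the depth at which it is stored."""
        node, d = self.root, 0
        while True:
            if node.word is None and not node.children:
                node.word = word  # empty node: the word becomes a leaf here
                return d
            if node.word is not None:
                if node.word == word:
                    return d  # already present
                # Occupied leaf: move the resident word one level down.
                resident, node.word = node.word, None
                child = Node()
                child.word = resident
                node.children[resident[d]] = child
            # Internal node: follow (or create) the branch for the next symbol.
            node = node.children.setdefault(word[d], Node())
            d += 1

    def depth(self, word):
        """Depth of a stored word, or None if the word is absent."""
        node, d = self.root, 0
        while node.word is None:
            if d >= len(word) or word[d] not in node.children:
                return None
            node = node.children[word[d]]
            d += 1
        return d if node.word == word else None


if __name__ == "__main__":
    trie = Trie()
    words = ["0000", "0100", "0110", "0111", "1000", "1001"]  # hypothetical
    for w in words:
        trie.insert(w)
    print([trie.depth(w) for w in words])
```

Search and insert both walk a single root-to-leaf path, which is why the depth and the height govern the typical and the worst-case cost.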
The Markov Source Model: Markov Model

Generate $n$ words $\Xi_1, \ldots, \Xi_n$ such that