Exercise 1 (A streaming algorithm for counting the number of distinct values). [⋆]

We are given a stream of numbers x_1, …, x_n ∈ [m] and we want to compute the number of distinct values in the stream: F_0(x) = #{x_i : i ∈ [n]}. (Note that if f_a(x) = #{i : x_i = a}, we can express F_0(x) = ∑_{a=0}^{m−1} (f_a(x))^0, the zero-th moment of the frequencies of the elements of [m] in the stream.) Let us denote by S_x = {x_i : i ∈ [n]} the set of the values in the stream x; note that F_0(x) = #S_x. (We may drop the x when the context is clear.) The streaming constraint is that the algorithm sees every x_i only once, as it reads the stream from left to right, and we want to minimize the memory needed by the algorithm to accomplish this task. One can show that any deterministic algorithm that approximates the value of F_0 within 10% requires at least Ω(n) bits of memory. Here, we will design a randomized algorithm that accomplishes this task using only O(log n + log m) bits of memory. We start with a hypothetical algorithm using uniform real random numbers and a hypothetical family of hash functions, and then see how to turn it into an effective algorithm.

Assume that we are given a random function h : [m] → (0, 1], i.e. such that for every x ∈ [m], h(x) is a (fixed) independent uniform random real in (0, 1]. The algorithm proceeds as follows: while reading the stream, record in memory the minimum value µ so far of the h(x_i)'s, and output 1/µ − 1 at the end.

◮ Question 1.1) Show that Pr{µ ≥ t} = (1 − t)^{F_0}.

Answer. ◃ By independence of the values of h,

  Pr{µ ≥ t} =_(by definition of µ) Pr{∀i ∈ [n], h(x_i) ≥ t} = Pr{∀a ∈ S_x, h(a) ≥ t}
            =_(by independence of the h(a)'s) ∏_{a ∈ S_x} Pr{h(a) ≥ t} = (1 − t)^{F_0}. ▹
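As a sanity check, the idealized algorithm can be simulated directly; the code below is a sketch of ours (the function name and the dictionary simulating h are our own, not part of the exercise), where h is emulated by drawing a fresh uniform real for each distinct value:

```python
import random

def idealized_f0_estimate(stream, rng):
    """Idealized sketch: h(x) is a fresh independent uniform real in (0, 1]
    for each distinct value; track mu = min_i h(x_i), output 1/mu - 1.
    (This simulation stores h explicitly, so it does NOT meet the memory bound.)"""
    h = {}  # simulates the random function h : [m] -> (0, 1]
    mu = 1.0
    for x in stream:
        if x not in h:
            h[x] = 1.0 - rng.random()  # uniform in (0, 1]
        mu = min(mu, h[x])
    return 1.0 / mu - 1.0
```

A single run is a poor estimator; Questions 1.3 and 1.4 explain why (E[1/µ] = ∞), and Question 1.5 repairs this by taking a median of averages of µ rather than of 1/µ − 1.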
◮ Question 1.2) Show that E[µ] = 1/(F_0 + 1).

Answer. ◃ As µ ≥ 0, E[µ] = ∫_0^∞ Pr{µ ≥ t} dt = ∫_0^1 (1 − t)^{F_0} dt = 1/(F_0 + 1). ▹

◮ Question 1.3) Show that E[1/µ] = ∞.

Answer. ◃ Indeed, E[1/µ] = ∫_0^1 (−d Pr{µ ≥ t})/t = ∫_0^1 F_0 (1 − t)^{F_0 − 1}/t dt = ∞, since F_0 (1 − t)^{F_0 − 1}/t ∼ F_0/t as t → 0 and ∫_0^ε dt/t = ∞ for all ε > 0. ▹

◮ Question 1.4) Compute Var(µ) and show that Var(µ) ≤ E[µ]^2.

Answer. ◃ E[µ^2] = ∫_0^1 t^2 · F_0 · (1 − t)^{F_0 − 1} dt = 2/((F_0 + 2)(F_0 + 1)) < 2 E[µ]^2. Thus, Var(µ) = E[µ^2] − E[µ]^2 < E[µ]^2. ▹

◮ Question 1.5) Design and analyze an (ε, δ)-estimator for F_0. Still, what is the expected value of its output? Is there a paradox here?
◃ Hint. First, design an (ε, δ)-estimator for µ.

Answer. ◃ We use the standard technique: output the median ν of A = ⌈α ln(1/δ)⌉ averages of B = ⌈β/ε^2⌉ simultaneous independent evaluations of µ: µ_ij for i ∈ [A] and j ∈ [B]. Let µ_i = (µ_i1 + ⋯ + µ_iB)/B. We have E[µ_i] = E[µ] = 1/(F_0 + 1) and Var(µ_i) = Var(µ)/B. Thus, by Chebyshev's inequality, for all i ∈ [A],

  Pr{ |µ_i − 1/(F_0 + 1)| ≥ ε/(F_0 + 1) } ≤ (Var(µ)/B) / (ε^2/(F_0 + 1)^2) < 1/(B ε^2) ≤ 1/4 if we set β = 4.
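The full construction of this answer, averages boosted by a median with α = 8 and β = 4, can be sketched as follows. This is our own illustration code: the draws of µ are simulated directly from the distribution established in Question 1.1, and all names are ours.

```python
import math
import random
from statistics import median

def sample_mu(f0, rng):
    """One draw of mu: the minimum of F0 independent uniforms in (0, 1]."""
    return min(1.0 - rng.random() for _ in range(f0))

def f0_estimator(f0, eps, delta, rng, alpha=8, beta=4):
    """(eps, delta)-estimator: median nu of A = ceil(alpha*ln(1/delta))
    averages, each over B = ceil(beta/eps^2) samples of mu; output 1/nu - 1."""
    A = math.ceil(alpha * math.log(1.0 / delta))
    B = math.ceil(beta / eps ** 2)
    averages = [sum(sample_mu(f0, rng) for _ in range(B)) / B
                for _ in range(A)]
    nu = median(averages)
    return 1.0 / nu - 1.0
```

With probability at least 1 − δ the output is within relative error about ε of F_0, even though its expectation is infinite, as the rest of the answer shows.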

Now, let Y_i be the indicator variable for the event µ_i ∉ (1 ± ε)/(F_0 + 1). From the above, E[Y_i] ≤ 1/4. But then,

  Pr{ ν ∉ (1 ± ε)/(F_0 + 1) } ≤ Pr{ ∑_{i ∈ [A]} Y_i ≥ A/2 } ≤ Pr{ ∑_{i ∈ [A]} Y_i − ∑_{i ∈ [A]} E[Y_i] ≥ A/4 }
                              ≤_(Hoeffding) exp(−2(A/4)^2/A) ≤ δ if we set α = 8.

The (ε, δ)-estimator thus computes ν as above and outputs 1/ν − 1. This ensures that, with probability at least 1 − δ, the output value belongs to [(F_0 − ε)/(1 + ε), (F_0 + ε)/(1 − ε)], yielding an (ε + o(ε), δ)-estimator for F_0. Note that the expected value of each 1/µ_ij is still ∞, and thus the expected value of the output 1/ν − 1 is ∞ as well. However, with probability 1 − δ, 1/ν − 1 is within relative error ε of F_0. ▹

Unfortunately, such a random function h requires storing m reals in memory. The key to reducing the memory needed is to relax the independence of the hash values to pairwise independence only. In the following, we will also approximate the minimum of the hash keys by recording only the position of their first non-zero bit in their binary writing. We proceed as follows.

Let ℓ = ⌈log_2 m⌉, so that 2^{ℓ−1} < m ≤ 2^ℓ, and consider the field with 2^ℓ elements, F_{2^ℓ}. We identify F_{2^ℓ} through canonical bijections with the set of bit-vectors {0, 1}^ℓ and with the set of integers {0, …, 2^ℓ − 1} written in binary. For every pair (a, b) ∈ F_{2^ℓ}^2, consider the hash function h_ab : F_{2^ℓ} → F_{2^ℓ} defined as h_ab(y) = a + b · y. For every y ∈ F_{2^ℓ} ≡ {0, 1}^ℓ, we denote by ρ(y) = max{j ∈ [ℓ] : y_1 = ⋯ = y_j = 0} the largest index j such that the first j bits of y, seen as a bit-vector, are all zero. Let us now consider the following streaming algorithm:

Algorithm 2: Streaming algorithm for F_0
  Let ℓ = ⌈log_2 m⌉; identify each element x_i ∈ [m] of the stream with its corresponding element of F_{2^ℓ}.
  Pick uniformly and independently two random elements a, b ∈ F_{2^ℓ}.
  Compute R = max_{i=1..n} ρ(h_ab(x_i)).
  return 2^R.

◮ Question 1.6) Show that for all c ∈ F_{2^ℓ} and r ∈ {0, …, ℓ}, Pr_{a,b}{ρ(h_ab(c)) ≥ r} = 1/2^r.
◃ Hint. Show that h_ab(c) is uniform in F_{2^ℓ}.

Answer. ◃ Since a is chosen uniformly at random in F_{2^ℓ} and independently from b·c, the sum a + b·c is uniform in F_{2^ℓ}, so h_ab(c) is a uniform random variable for every c ∈ F_{2^ℓ}. It follows that for all c ∈ F_{2^ℓ} and r ∈ {0, …, ℓ}, the probability that the binary writing of h_ab(c) starts with r zeros is exactly 1/2^r. ▹

Let W_c^r be the indicator random variable for the event ρ(h_ab(c)) ≥ r, and let Z_r = ∑_{c ∈ S_x} W_c^r be the number of values in the stream whose r first hash-key bits are all zero.

◮ Question 1.7) Show that E[Z_r] = F_0/2^r.

Answer. ◃ E[Z_r] =_(linearity) ∑_{c ∈ S_x} E[W_c^r] =_(indicator variables) ∑_{c ∈ S_x} Pr{ρ(h_ab(c)) ≥ r} = #S_x/2^r = F_0/2^r. ▹

◮ Question 1.8) Show that the random values h_ab(0), …, h_ab(2^ℓ − 1) are uniform and pairwise independent.
◃ Hint. Show that if c ≠ d, then for all γ, δ ∈ F_{2^ℓ}, Pr_{a,b}{(h_ab(c), h_ab(d)) = (γ, δ)} = 1/#F_{2^ℓ}^2.
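Before proving it, the hint's claim can be checked exhaustively on a toy field. The sketch below is our own code; it works in GF(8) = F_2[x]/(x^3 + x + 1) and verifies that, for every c ≠ d, each target pair (γ, δ) is reached by exactly one seed (a, b):

```python
ELL = 3
RED = 0b1011  # reduction polynomial x^3 + x + 1, irreducible over F_2

def gf_mul(x, y, ell=ELL, red=RED):
    """Carry-less (shift-and-add) multiplication in F_{2^ell}."""
    r = 0
    for _ in range(ell):
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << ell):
            x ^= red
    return r

def h(a, b, y):
    """h_ab(y) = a + b*y in F_{2^ell}; addition in F_{2^ell} is XOR."""
    return a ^ gf_mul(b, y)

def pairwise_independent(ell=ELL):
    """For every c != d, each pair (gamma, delta) must be hit by exactly
    one (a, b): this is uniformity plus pairwise independence."""
    q = 1 << ell
    for c in range(q):
        for d in range(q):
            if c == d:
                continue
            counts = {}
            for a in range(q):
                for b in range(q):
                    key = (h(a, b, c), h(a, b, d))
                    counts[key] = counts.get(key, 0) + 1
            if len(counts) != q * q or any(v != 1 for v in counts.values()):
                return False
    return True
```

The exhaustive count mirrors the counting argument of the answer below: the map (a, b) ↦ (h_ab(c), h_ab(d)) is a bijection on F_{2^ℓ}^2.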

Answer. ◃ Consider c ≠ d ∈ F_{2^ℓ} and (γ, δ) ∈ F_{2^ℓ}^2. We have

  Pr_{a,b}{(h_ab(c), h_ab(d)) = (γ, δ)} = #{(a, b) ∈ F_{2^ℓ}^2 : (h_ab(c), h_ab(d)) = (γ, δ)} / #F_{2^ℓ}^2.

The condition (h_ab(c), h_ab(d)) = (γ, δ) is the linear system a + b·c = γ, a + b·d = δ in the unknowns (a, b), whose matrix ((1, c), (1, d)) is invertible as c ≠ d (its determinant is d − c ≠ 0). The system thus has exactly one solution (a, b), and the probability equals 1/#F_{2^ℓ}^2. ▹

◮ Question 1.9) Show that Var(Z_r) = (F_0/2^r)(1 − 1/2^r) < E[Z_r].

Answer. ◃ As the random variables h_ab(0), …, h_ab(2^ℓ − 1) are pairwise independent, the random variables (W_c^r)_{c ∈ S_x} are also pairwise independent. As the variance is additive for pairwise independent variables, we have Var(Z_r) = ∑_{c ∈ S_x} Var(W_c^r) = ∑_{c ∈ S_x} (1/2^r)(1 − 1/2^r) = (F_0/2^r)(1 − 1/2^r) < F_0/2^r = E[Z_r], since Var(Bernoulli(α)) = α(1 − α). ▹

Fix some η > 1.

◮ Question 1.10) Show that Pr{Z_r > 0} < 1/η for all r ∈ {0, …, ℓ} such that 2^r > ηF_0.
◃ Hint. Z_r is an integer; use Markov's inequality.

Answer. ◃ Consider r such that 2^r > ηF_0, i.e. such that E[Z_r] = F_0/2^r < 1/η. Then, Pr{Z_r > 0} = Pr{Z_r ≥ 1} ≤ E[Z_r] < 1/η by Markov's inequality. ▹

◮ Question 1.11) Show that Pr{Z_r = 0} < 1/η for all r ∈ {0, …, ℓ} such that 2^r < F_0/η.
◃ Hint. Z_r is an integer; apply Chebyshev's inequality.

Answer. ◃ Consider r such that 2^r < F_0/η, i.e. such that E[Z_r] = F_0/2^r > η. Then, Pr{Z_r = 0} ≤ Pr{|Z_r − E[Z_r]| ≥ E[Z_r]} ≤ Var(Z_r)/E[Z_r]^2 < 1/E[Z_r] < 1/η by Chebyshev's inequality. ▹

◮ Question 1.12) Conclude that for all η > 2, Pr{2^R ∈ [F_0/η, ηF_0]} > 1 − 2/η. The algorithm thus outputs an η-approximation of F_0 with probability at least 1 − 2/η, for all η > 2. How many bits of memory does it require?

Answer. ◃ Note that R = max{r : Z_r > 0}. Thus, for all r ∈ {0, …, ℓ}, Pr{R ≥ r} = Pr{Z_r > 0} and Pr{R < r} = Pr{Z_r = 0}.
With r = ⌊log_2(F_0/η)⌋, we get Pr{2^R < F_0/η} = Pr{Z_r = 0} < 1/η by Question 1.11. And with r = ⌈log_2(ηF_0)⌉, we get Pr{2^R ≥ ηF_0} = Pr{Z_r > 0} < 1/η by Question 1.10. It follows that the value 2^R output by the algorithm belongs to [F_0/η, ηF_0] with probability at least 1 − 2/η > 0, for all η > 2. The algorithm requires 2ℓ + ⌈log_2 ℓ⌉ < 2 log_2 m + log_2 log_2 m + 3 = O(log m) bits of memory to remember a, b and R. ▹

We have thus obtained an (ε, 2/(1 + ε))-estimator for F_0 using O(log m) bits of memory, for all ε > 1. Getting an (ε, δ)-estimator for F_0 in O_{ε,δ}(log m + log n) bits of memory for arbitrarily small ε, δ > 0 requires a lot more work...
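Putting the pieces together, Algorithm 2 can be implemented as sketched below. This is our own code, not part of the exercise: ℓ is fixed to 8 and the reduction polynomial x^8 + x^4 + x^3 + x + 1 is used, so it handles streams over [m] with m ≤ 256; ρ counts leading zeros of the ℓ-bit hash value.

```python
import random

ELL = 8
RED = 0b100011011  # x^8 + x^4 + x^3 + x + 1, irreducible over F_2

def gf_mul(x, y, ell=ELL, red=RED):
    """Carry-less (shift-and-add) multiplication in F_{2^ell}."""
    r = 0
    for _ in range(ell):
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << ell):
            x ^= red
    return r

def rho(y, ell=ELL):
    """Largest j such that the first j bits of y, written on ell bits, are zero."""
    j = 0
    while j < ell and not (y >> (ell - 1 - j)) & 1:
        j += 1
    return j

def algorithm2(stream, rng):
    """Streaming F0 sketch: R = max_i rho(h_ab(x_i)); output 2^R.
    Only a, b and R are kept across stream elements: O(log m) bits."""
    a = rng.randrange(1 << ELL)
    b = rng.randrange(1 << ELL)
    R = 0
    for x in stream:
        R = max(R, rho(a ^ gf_mul(b, x)))
    return 1 << R
```

By Question 1.12, the returned power of two lands in [F_0/η, ηF_0] with probability at least 1 − 2/η for every η > 2.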

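Finally, the identities E[Z_r] = F_0/2^r (Question 1.7) and Var(Z_r) = (F_0/2^r)(1 − 1/2^r) (Question 1.9) hold exactly, not just asymptotically, and can be confirmed by enumerating all 2^{2ℓ} seeds (a, b) on a toy instance. This is our own verification code over GF(8), with a hypothetical value set S = {1, 2, 5}:

```python
from fractions import Fraction

ELL = 3
RED = 0b1011  # reduction polynomial x^3 + x + 1 for GF(8)

def gf_mul(x, y, ell=ELL, red=RED):
    """Carry-less (shift-and-add) multiplication in F_{2^ell}."""
    r = 0
    for _ in range(ell):
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << ell):
            x ^= red
    return r

def rho(y, ell=ELL):
    """Number of leading zero bits of y written on ell bits."""
    j = 0
    while j < ell and not (y >> (ell - 1 - j)) & 1:
        j += 1
    return j

S, r = {1, 2, 5}, 1  # S_x with F0 = 3, and the level r = 1
# Z_r for every one of the 64 seeds (a, b)
zs = [sum(1 for c in S if rho(a ^ gf_mul(b, c)) >= r)
      for a in range(8) for b in range(8)]
mean = Fraction(sum(zs), len(zs))
var = Fraction(sum(z * z for z in zs), len(zs)) - mean ** 2
```

Exact rational arithmetic avoids any floating-point doubt: the enumeration realizes the uniform distribution on (a, b) exactly, so mean and var must equal F_0/2^r = 3/2 and (3/2)(1 − 1/2) = 3/4.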