Randomized Algorithms, Lecture 4: "Two-point Sampling, Coupon Collector's problem"
Sotiris Nikoletseas, Associate Professor
CEID - ETY Course, 2013 - 2014
Overview
A. Pairwise independence of random variables
B. The pairwise independent sampling theorem
C. Probability amplification via reduced randomness
D. The Coupon Collector's problem
A. On the Additivity of Variance
In general, the variance of a sum of random variables is not equal to the sum of their variances.
However, variances do add for independent variables (i.e. mutually independent variables).
In fact, mutual independence is not necessary: pairwise independence suffices.
This is very useful, since in many situations the random variables involved are pairwise independent but not mutually independent.
Conditional distributions
Let X, Y be discrete random variables. Their joint probability density function is
f(x, y) = Pr{(X = x) ∩ (Y = y)}
Clearly f_1(x) = Pr{X = x} = Σ_y f(x, y)
and f_2(y) = Pr{Y = y} = Σ_x f(x, y)
Also, the conditional probability density function is:
f(x | y) = Pr{X = x | Y = y} = Pr{(X = x) ∩ (Y = y)} / Pr{Y = y} = f(x, y) / f_2(y) = f(x, y) / Σ_x f(x, y)
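To make the definitions concrete, here is a minimal computational sketch (the joint pmf values are hypothetical, chosen only for illustration) that computes the marginals f_1, f_2 and the conditional f(x | y) exactly as defined above.

from collections import defaultdict

# Hypothetical joint pmf f(x, y); the four probabilities must sum to 1.
f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

f1 = defaultdict(float)  # marginal of X: f1(x) = sum over y of f(x, y)
f2 = defaultdict(float)  # marginal of Y: f2(y) = sum over x of f(x, y)
for (x, y), p in f.items():
    f1[x] += p
    f2[y] += p

def conditional(x, y):
    # f(x | y) = f(x, y) / f2(y)
    return f[(x, y)] / f2[y]

print(dict(f1), dict(f2))
print(conditional(0, 1))  # 0.2 / 0.6 = 1/3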
Pairwise independence
Let X_1, X_2, . . . , X_n be random variables. These are called pairwise independent iff for all i ≠ j it is
Pr{(X_i = x) | (X_j = y)} = Pr{X_i = x}, ∀ x, y
Equivalently,
Pr{(X_i = x) ∩ (X_j = y)} = Pr{X_i = x} · Pr{X_j = y}, ∀ x, y
Generalizing, the collection is k-wise independent iff, for every subset I ⊆ {1, 2, . . . , n} with |I| < k, for every set of values {a_i}, every value b and every j ∉ I, it is
Pr{X_j = b | ∧_{i ∈ I} (X_i = a_i)} = Pr{X_j = b}
Mutual (or "full") independence
The random variables X_1, X_2, . . . , X_n are mutually independent iff for any subset X_{i_1}, X_{i_2}, . . . , X_{i_k} (2 ≤ k ≤ n) of them, it is
Pr{(X_{i_1} = x_1) ∩ (X_{i_2} = x_2) ∩ · · · ∩ (X_{i_k} = x_k)} = Pr{X_{i_1} = x_1} · Pr{X_{i_2} = x_2} · · · Pr{X_{i_k} = x_k}
Example (for n = 3). Let A_1, A_2, A_3 be 3 events. They are mutually independent iff all four equalities hold:
Pr{A_1 A_2} = Pr{A_1} Pr{A_2}   (1)
Pr{A_2 A_3} = Pr{A_2} Pr{A_3}   (2)
Pr{A_1 A_3} = Pr{A_1} Pr{A_3}   (3)
Pr{A_1 A_2 A_3} = Pr{A_1} Pr{A_2} Pr{A_3}   (4)
They are called pairwise independent if (1), (2), (3) hold.
Mutual vs pairwise independence
Important notice: Pairwise independence does not imply mutual independence in general.
Example. Consider a probability space consisting of all permutations of a, b, c together with aaa, bbb, ccc (all 9 points equiprobable). Let A_k = "at place k there is an a" (for k = 1, 2, 3). It is
Pr{A_1} = Pr{A_2} = Pr{A_3} = (2 + 1)/9 = 1/3
Also Pr{A_1 A_2} = Pr{A_2 A_3} = Pr{A_1 A_3} = 1/9 = (1/3) · (1/3)
thus A_1, A_2, A_3 are pairwise independent.
But Pr{A_1 A_2 A_3} = 1/9 ≠ Pr{A_1} Pr{A_2} Pr{A_3} = 1/27
thus the events are not mutually independent.
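The example can also be verified by brute-force enumeration; the sketch below (illustrative only) builds the 9 equiprobable strings and checks the pairwise products as well as the failure of the triple product.

from itertools import permutations

# The 9 equiprobable sample points: the 6 permutations of "abc" plus aaa, bbb, ccc.
omega = [''.join(p) for p in permutations('abc')] + ['aaa', 'bbb', 'ccc']

def pr(event):
    # Probability of an event under the uniform measure on the 9 points.
    return sum(1 for w in omega if event(w)) / len(omega)

A = [lambda w, k=k: w[k] == 'a' for k in range(3)]  # A_k: place k holds an 'a'

print([pr(Ak) for Ak in A])                           # each equals 3/9 = 1/3
print(pr(lambda w: A[0](w) and A[1](w)))              # 1/9 = (1/3)*(1/3): pairwise holds
print(pr(lambda w: A[0](w) and A[1](w) and A[2](w)))  # 1/9, not 1/27: mutual fails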
Variance: key features
Definition: Var(X) = E[(X − µ)²] = Σ_x (x − µ)² Pr{X = x}
where µ = E[X] = Σ_x x Pr{X = x}
We call standard deviation of X the quantity σ = √Var(X)
Basic Properties:
(i) Var(X) = E[X²] − E²[X]
(ii) Var(cX) = c² Var(X), where c is a constant.
(iii) Var(X + c) = Var(X), where c is a constant.
Proof of (i): Var(X) = E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] + E[−2µX] + E[µ²] = E[X²] − 2µE[X] + µ² = E[X²] − µ² = E[X²] − E²[X]
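As a quick sanity check of property (i), both sides can be evaluated directly for any small distribution; the sketch below uses a hypothetical three-point pmf chosen only for illustration.

# Hypothetical pmf: values and probabilities.
xs = [1, 2, 5]
ps = [0.5, 0.3, 0.2]

mu   = sum(x * p for x, p in zip(xs, ps))                # E[X]
var1 = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))    # E[(X - mu)^2]
var2 = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2  # E[X^2] - E^2[X]

print(mu, var1, var2)  # both variance formulas give 2.29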
The additivity of variance
Theorem: If X_1, X_2, . . . , X_n are pairwise independent random variables, then:
Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i)
Proof:
Var(X_1 + · · · + X_n) = E[(X_1 + · · · + X_n)²] − E²[X_1 + · · · + X_n]
= E[Σ_{i=1}^{n} X_i² + Σ_{1 ≤ i ≠ j ≤ n} X_i X_j] − Σ_{i=1}^{n} µ_i² − Σ_{1 ≤ i ≠ j ≤ n} µ_i µ_j
= Σ_{i=1}^{n} (E[X_i²] − µ_i²) + Σ_{1 ≤ i ≠ j ≤ n} (E[X_i X_j] − µ_i µ_j) = Σ_{i=1}^{n} Var(X_i)
(since the X_i are pairwise independent, for all 1 ≤ i ≠ j ≤ n it is E[X_i X_j] = E[X_i] E[X_j] = µ_i µ_j, so the cross terms cancel) □
Note: As we see in the proof, pairwise independence suffices; mutual (full) independence is not needed.
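A classical illustration of the theorem: take two independent fair bits X_1, X_2 and set X_3 = X_1 XOR X_2. The three bits are pairwise independent but certainly not mutually independent, yet the variance of their sum still equals the sum of the variances. The simulation sketch below is a rough empirical check, not a proof.

import random

def sample():
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    x3 = x1 ^ x2  # pairwise independent of x1 and of x2, but fully determined by the pair
    return x1, x2, x3

N = 200_000
sums = []
cols = [[], [], []]
for _ in range(N):
    x = sample()
    sums.append(sum(x))
    for i in range(3):
        cols[i].append(x[i])

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

print(var(sums))                  # approximately 0.75
print(sum(var(c) for c in cols))  # approximately 3 * 1/4 = 0.75, matching the theorem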
B. The pairwise independent sampling theorem
Another Example. Birthday matching: Let us try to estimate the number of pairs of people in a room having their birthday on the same day.
Note 1: Matching birthdays for different pairs of students are pairwise independent, since knowing that (George, Takis) have a match tells us nothing about whether (George, Petros) match.
Note 2: However, the events are not mutually independent. Indeed they are not even 3-wise independent, since if (George, Takis) match and (Takis, Petros) match, then (George, Petros) match!
Birthday matching
Let us calculate the probability of having a certain number of birthday matches.
Let B_1, B_2, . . . , B_n be the birthdays of n independently chosen people, and let E_{i,j} be the indicator variable for the event of an (i, j) match (i.e. B_i = B_j). As said, the events E_{i,j} are pairwise independent but not mutually independent.
Clearly, Pr{E_{i,j}} = Pr{B_i = B_j} = 365 · (1/365) · (1/365) = 1/365 (for i ≠ j).
Let D be the number of matching pairs. Then
D = Σ_{1 ≤ i < j ≤ n} E_{i,j}
By linearity of expectation we have
E[D] = E[Σ_{1 ≤ i < j ≤ n} E_{i,j}] = Σ_{1 ≤ i < j ≤ n} E[E_{i,j}] = (n choose 2) · (1/365)
Birthday matching
Since the variances of the pairwise independent variables E_{i,j} add up, it is:
Var[D] = Var[Σ_{1 ≤ i < j ≤ n} E_{i,j}] = Σ_{1 ≤ i < j ≤ n} Var[E_{i,j}] = (n choose 2) · (1/365) · (1 − 1/365)
As an example, for a class of n = 100 students, it is E[D] ≃ 14 and Var[D] < 14 · (1 − 1/365) < 14.
So by Chebyshev's inequality we have
Pr{|D − 14| ≥ x} ≤ 14 / x²
Letting x = 6, we conclude that with more than 50% chance the number of matching birthdays will be between 8 and 20.
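These figures can be checked empirically; the following simulation sketch (parameters chosen to match the example: n = 100 people, 365 equally likely birthdays) estimates E[D] and the fraction of classes in which D falls strictly between 8 and 20.

import random
from collections import Counter

def matching_pairs(n=100, days=365):
    # Count pairs (i, j) with the same birthday: for each day, add C(count, 2).
    counts = Counter(random.randrange(days) for _ in range(n))
    return sum(c * (c - 1) // 2 for c in counts.values())

trials = 10_000
samples = [matching_pairs() for _ in range(trials)]

print(sum(samples) / trials)                           # close to C(100,2)/365, about 13.6
print(sum(1 for d in samples if 8 < d < 20) / trials)  # well above 1/2, as Chebyshev guarantees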
The Pairwise Independent Sampling Theorem (I)
We can actually generalize: we need not restrict to sums of zero-one (indicator) valued variables, nor to variables with the same distribution. Below we state the theorem for possibly different distributions having the same mean and variance (this is done only for simplicity; the result holds for distributions with different means and/or variances as well).
Theorem. Let X_1, . . . , X_n be pairwise independent variables with the same mean µ and variance σ². Let S_n = Σ_{i=1}^{n} X_i. Then
Pr{|S_n/n − µ| ≥ x} ≤ (1/n) · (σ/x)²
Proof. Note that E[S_n/n] = nµ/n = µ and Var[S_n/n] = (1/n)² · nσ² = σ²/n, and apply Chebyshev's inequality. □
The Pairwise Independent Sampling Theorem (II)
Note: This Theorem actually provides a precise, general evaluation of how the average of pairwise independent random samples approaches their mean. If the number n of samples becomes large enough (n > σ²/x²), we can approach the mean arbitrarily closely, with confidence arbitrarily close to 100%; i.e. a large number of samples is needed for distributions of large variance and when we want to ensure high concentration around the mean.
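For instance, to guarantee Pr{|S_n/n − µ| ≥ x} ≤ δ, the theorem's bound (1/n)(σ/x)² ≤ δ gives n ≥ σ²/(δ x²); the small sketch below evaluates this sample size for some hypothetical values of σ, x and δ.

import math

def samples_needed(sigma, x, delta):
    # Smallest n with (1/n) * (sigma/x)**2 <= delta, i.e. n >= sigma^2 / (delta * x^2).
    return math.ceil(sigma ** 2 / (delta * x ** 2))

# Hypothetical numbers: standard deviation 10, accuracy 1, confidence 95% (delta = 0.05).
print(samples_needed(sigma=10, x=1, delta=0.05))    # 2000 samples suffice
print(samples_needed(sigma=10, x=0.5, delta=0.05))  # 8000: tighter accuracy needs more samples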
C. Reduced randomness in probability amplification
Motivation: Randomized algorithms, for a given input x, actually choose n random numbers ("witnesses") and run a deterministic algorithm on the input, using each of these random numbers.
Intuitively, if the deterministic algorithm has a probability of error ϵ (e.g. 1/2), then t independent runs reduce the error probability to ϵ^t (e.g. 1/2^t) and amplify the correctness probability from 1/2 to 1 − 1/2^t.
However, true randomness is quite expensive! What happens if we are constrained to use no more than a constant number c of random numbers?
The simplest case is c = 2, i.e. we choose just 2 random numbers (thus the name two-point sampling).
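A standard way to realize two-point sampling (a preview sketch under the usual assumption that p is a prime larger than the witness range; the construction itself is not given on this slide) is to draw only two truly random numbers a, b modulo p and derive the t witnesses as a + i·b mod p. These values are uniformly distributed and pairwise independent, which is exactly the property the sampling theorem above needs.

import random

def two_point_witnesses(t, p):
    # p is assumed to be a prime larger than the desired witness range.
    # Only two truly random numbers are drawn; the returned values are
    # pairwise independent and uniform on {0, ..., p-1}, but not mutually independent.
    a = random.randrange(p)
    b = random.randrange(p)
    return [(a + i * b) % p for i in range(1, t + 1)]

# Hypothetical use: generate t = 10 pseudo-witnesses modulo the prime 101.
print(two_point_witnesses(10, 101))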