Exercise 1 (Missing element & distinct elements). Assume we are reading a stream of $n$ distinct integers in $\{1, \dots, n+1\}$.

◮ Question 1.1) Assume first that all of the elements in the stream are indeed distinct (elements of $\{1, \dots, n+1\}$) and design for this case a deterministic $O(\log n)$ bits-memory algorithm that outputs the missing element.

Answer. ◃ Just compute $S = \sum_{i=1}^{n} x_i$ and output $\frac{(n+1)(n+2)}{2} - S$. This requires at most $\left\lceil \log_2 \frac{(n+1)(n+2)}{2} \right\rceil \le \lceil 2 \log_2 n \rceil$ bits of memory to store $S$. ▹

Let us now waive the assumption that the integers are distinct and let us design an algorithm to check this property.

◮ Question 1.2) Consider a prime number $p \ge n^2$ and a non-zero polynomial $U(X)$ of degree at most $n$ over the field $\mathbb{Z}_p$. Show that $\Pr_a\{U(a) = 0 \bmod p\} \le \frac{1}{n}$ when $a$ is chosen uniformly at random in $\mathbb{Z}_p$.

◃ Hint. How many solutions are there to $U(a) = 0$ in the field $\mathbb{Z}_p$?

Answer. ◃ As $\mathbb{Z}_p$ is a field, a non-zero polynomial of degree $d$ has at most $d$ roots. It follows that $U(a) = 0$ admits at most $n$ solutions. Thus, $\Pr_a\{U(a) = 0 \bmod p\} \le \frac{n}{p} \le \frac{1}{n}$. ▹

Consider the following algorithm: Pick a prime number $p$ such that $n^2 \le p < 2n^2$ (for $n \ge 2$ there is always one, by Bertrand's postulate). Pick an integer $a \in \{0, \dots, p-1\}$ uniformly at random.
Compute $S := \sum_{i=1}^{n} x_i$, $y := \frac{(n+1)(n+2)}{2} - S$, $U := \sum_{i=1}^{n} a^{x_i - 1} \bmod p$ and $V := \sum_{i=0}^{n} a^{i} \bmod p$. If $U \equiv V - a^{y-1} \pmod p$, then answer « $y$ is the missing element », and answer « the stream does not contain $n$ distinct integers in $\{1, \dots, n+1\}$ » otherwise.

◮ Question 1.3) Show that this is a $O(\log n)$ bits-memory streaming algorithm that always outputs the right answer when the stream matches the specification, and that detects every erroneous stream with probability at least $1 - 1/n$.

Answer. ◃ Assume that all the elements in the stream are distinct integers in $\{1, \dots, n+1\}$. Then, by Question 1.1, $y$ is indeed the missing element and the difference between $V$ and $U$ is indeed $a^{y-1}$. Assume now that the elements in the stream are not all distinct. Then, the difference of the polynomials $U(X) = \sum_{i=1}^{n} X^{x_i - 1}$ and $V_y(X) = \sum_{i=0}^{n} X^{i} - X^{y-1}$ is a non-zero polynomial over $\mathbb{Z}_p$ whatever $y$ is in $\mathbb{Z}_p$. It follows that $U(a) = U \ne V - a^{y-1} = V_y(a)$ with probability at least $1 - 1/n$ by Question 1.2. ▹

Exercise 2 (Traffic monitoring: uniformity detection). Imagine that we are running a huge website. Over time, clients connect and then disconnect from the website, and we want to prevent attacks by keeping track of the origins of the various clients currently connected to the server. But we do not want to slow down the server and wish to dedicate to this task only a constant memory, i.e. only a constant number of integers. And we want to detect if all the clients connected are from the same IP address. We model the problem as follows: we are given an infinite stream of events $e_1, e_2, \dots, e_n, \dots$ where each $e_i$ is either connect($x$) or disconnect($x$), where $x$ is a positive integer standing for the IP address of the client (dis-)connecting. We assume that the stream is well-formed, i.e. that there are always at least as many events connect($x$) as disconnect($x$) from the beginning of the stream to any position, for every integer $x$. We want to detect when all the clients connected have the same IP address $x$.

◮ Question 2.1) Spot when to set the alarm on in the following sequence, where $x$ denotes the event connect($x$) and $\bar{x}$ the event disconnect($x$):

$1, 2, 3, \bar{2}, \bar{3}, 1, 1, \bar{1}, 4, 6, 7, \bar{1}, \bar{6}, \bar{1}, 2, \bar{2}, \bar{4}, 8, 3, \bar{3}, \bar{7}, 9$
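A hand computation on the sequence above can be cross-checked by replaying it with a brute-force multiset. The following is only a minimal sketch under stated assumptions: the function names `parse` and `uniform_times` and the encoding of $\bar{x}$ as the negative integer $-x$ are illustrative choices, not part of the exercise. It stores every connected client, so it does not meet the constant-memory requirement and only serves as a reference check.

```python
from collections import Counter

# Assumed encoding (hypothetical): x > 0 means connect(x), -x means disconnect(x).
SEQUENCE = [1, 2, 3, -2, -3, 1, 1, -1, 4, 6, 7, -1, -6, -1,
            2, -2, -4, 8, 3, -3, -7, 9]


def parse(seq):
    """Turn signed integers into (kind, ip) events."""
    return [("connect", x) if x > 0 else ("disconnect", -x) for x in seq]


def uniform_times(events):
    """Return the 1-based positions after which all connected clients
    share a single IP address.

    Brute force: the whole multiset is kept, so memory is NOT constant.
    The stream is assumed well-formed (no disconnect without a matching
    earlier connect).
    """
    connected = Counter()
    alarms = []
    for t, (kind, ip) in enumerate(events, start=1):
        if kind == "connect":
            connected[ip] += 1
        else:
            connected[ip] -= 1
            if connected[ip] == 0:
                del connected[ip]  # drop IPs with no client left
        # Exactly one distinct IP among at least one connected client.
        if len(connected) == 1:
            alarms.append(t)
    return alarms


if __name__ == "__main__":
    print(uniform_times(parse(SEQUENCE)))
```

Running the script prints the positions in the sequence at which the alarm should be on, which can then be compared against a table built by hand.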
Answer. ◃ The sets of currently connected clients are (an ∗ spots every date when all the clients connected have the same IP address):

event : connected clients
$1$ : 1 ∗
$2$ : 1 2
$3$ : 1 2 3
$\bar{2}$ : 1 3
$\bar{3}$ : 1 ∗
$1$ : 1 1 ∗
$1$ : 1 1 1 ∗
$\bar{1}$ : 1 1 ∗
$4$ : 1 1 4
$6$ : 1 1 4 6
$7$ : 1 1 4 6 7
$\bar{1}$ : 1 4 6 7
$\bar{6}$ : 1 4 7
$\bar{1}$ : 4 7
$2$ : 2 4 7
$\bar{2}$ : 4 7
$\bar{4}$ : 7 ∗
$8$ : 7 8
$3$ : 3 7 8
$\bar{3}$ : 7 8
$\bar{7}$ : 8 ∗
$9$ : 8 9
▹

We consider the following algorithm that uses only three integer variables:
• start with $n := 0$, $a := 0$ and $b := 0$ at $t = 0$;
• on event connect($x$): do $n := n + 1$, $a := a + x$ and $b := b + x^2$;
• on event disconnect($x$): do $n := n - 1$, $a := a - x$ and $b := b - x^2$;
• set on the alarm every time that $n > 0$ and $b = a^2 / n$.

The right way to prove the correctness of this deterministic algorithm passes through the analysis of a random variable. Consider a random variable $X$ taking positive integer values. We denote by $\mathrm{supp}(X) = \{x : \Pr\{X = x\} > 0\}$ its support and assume that $|\mathrm{supp}(X)| < \infty$. We denote by $E[X]$ and $\mathrm{Var}[X] = E[(X - E[X])^2]$ respectively the expectation and the variance of $X$.

◮ Question 2.2) Show that $|\mathrm{supp}(X)| = 1$ if and only if $\mathrm{Var}[X] = 0$.

Answer. ◃ First remark that for every integer-valued random variable $X$, $\mathrm{supp}(X) \ne \emptyset$. If $\mathrm{supp}(X) = \{x\}$, then $\Pr\{X = x\} = 1$, $E[X] = x$ and $\mathrm{Var}[X] = 0$. Assume now that $|\mathrm{supp}(X)| \ge 2$: there are $x, x' \in \mathrm{supp}(X)$ such that $x \ne x'$, and thus either $E[X] \ne x$ or $E[X] \ne x'$ (or both). Assume that $E[X] \ne x$. The random variable $Z = (X - E[X])^2$ only takes non-negative values. Thus,
$\mathrm{Var}[X] = E[Z] = \sum_{y} \Pr\{X = y\} \cdot (y - E[X])^2 \ge \underbrace{\Pr\{X = x\}}_{> 0} \cdot \underbrace{(x - E[X])^2}_{> 0} > 0$. ▹

◮ Question 2.3) Conclude that the algorithm is correct.

Answer. ◃ Let us fix some time $t$, and let $T$ denote the multiset of the IP addresses of the clients currently connected to the server at time $t$. Assume that $|T| \ge 1$. We want to decide if $T$ contains only the same integer.
Let $X$ be the uniform random variable over the multiset $T$. Since $\mathrm{supp}(X) = T$, by the previous question, $\mathrm{Var}(X) = 0$ if and only if $T$ contains only the same integer. But, at time $t$, $n = |T|$ and $E[X] = \frac{1}{n} \sum_{x \in T} x = a/n$, and
$\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2 - 2E[X]X + E[X]^2] = E[X^2] - E[X]^2 = \frac{1}{n} \sum_{x \in T} x^2 - a^2/n^2 = b/n - a^2/n^2$,
which is $0$ if and only if $b = a^2/n$. The algorithm thus correctly detects when all the clients connected have the same IP address. ▹

Exercise 3 (($\varepsilon, \delta$)-estimator). Suppose we want to compute a value $\mu$ from some data. Assume that we have a randomized algorithm $\mathcal{A}$ that computes a random variable $Z$ such that $E[Z] = \mu$ and $\mathrm{Var}(Z) \le A \cdot \mu^2$ for some constant $A > 0$.

◮ Question 3.1) Design a ($\varepsilon, \delta$)-estimator for $\mu$ for all $\varepsilon > 0$ and $\delta > 0$ making $O\!\left(\frac{\log(1/\delta)}{\varepsilon^2}\right)$ calls to the randomized algorithm $\mathcal{A}$. Give exact bounds on the number of calls, explain how you proceed.

Answer. ◃ We proceed as usual by outputting the median $Y$ of $k$ averages of $\ell$ independent runs $Z_{ij}$ of $\mathcal{A}$ for $i \in [k]$ and $j \in [\ell]$. Let us denote by $\mu_i = \frac{Z_{i1} + \cdots + Z_{i\ell}}{\ell}$. We have $E[\mu_i] = E[Z] = \mu$ and $\mathrm{Var}(\mu_i) = \frac{\mathrm{Var}(Z_{i1}) + \cdots + \mathrm{Var}(Z_{i\ell})}{\ell^2} = \frac{\mathrm{Var}(Z)}{\ell} \le \frac{A\mu^2}{\ell}$. By Chebyshev's inequality, $\Pr\{|\mu_i - \mu| \ge \varepsilon\mu\} \le \frac{\mathrm{Var}(\mu_i)}{\varepsilon^2\mu^2} \le \frac{A}{\ell\varepsilon^2} \le \frac{1}{4}$ as soon as $\ell \ge \frac{4A}{\varepsilon^2}$. Now, let $X_i$ be the indicator random variable for the event $\mu_i \notin (1 \pm \varepsilon)\mu$. Then, $E[X_i] = \Pr\{|\mu_i - \mu| \ge \varepsilon\mu\} \le \frac{1}{4}$. Note that if the median $Y \notin (1 \pm \varepsilon)\mu$ then at least $k/2$