Whither the Binomial…
• Recall example of sending bit string over network
  n = 4 bits sent over network, where each bit had independent probability of corruption p = 0.1
  X = number of bits corrupted. X ~ Bin(4, 0.1)
• In real networks, we send large bit strings (length n ≈ 10^4), and the probability of bit corruption is very small (p ≈ 10^-6)
  X ~ Bin(10^4, 10^-6) is unwieldy to compute
• Extreme n and p values arise in many cases
  # bit errors in file written to disk (# of typos in a book)
  # of elements in a particular bucket of a large hash table
  # of server crashes in a day in a giant data center
  # of Facebook login requests that go to a particular server

Binomial in the Limit
• Recall the Binomial distribution:
  P(X = i) = (n choose i) p^i (1 − p)^(n−i) = [n! / (i! (n − i)!)] p^i (1 − p)^(n−i)
• Let λ = np (equivalently: p = λ/n). Then:
  P(X = i) = [n! / (i! (n − i)!)] (λ/n)^i (1 − λ/n)^(n−i)
           = [n(n−1)⋯(n−i+1) / n^i] · (λ^i / i!) · (1 − λ/n)^n / (1 − λ/n)^i
• When n is large, p is small, and λ is “moderate”:
  n(n−1)⋯(n−i+1) / n^i → 1,   (1 − λ/n)^n → e^(−λ),   (1 − λ/n)^i → 1
• Yielding: P(X = i) → e^(−λ) λ^i / i!

Poisson Random Variable
• X is a Poisson Random Variable: X ~ Poi(λ)
  X takes on values 0, 1, 2, … and, for a given parameter λ > 0, has distribution (PMF):
  P(X = i) = e^(−λ) λ^i / i!
• Note the Taylor series: e^λ = λ^0/0! + λ^1/1! + λ^2/2! + … = Σ_{i=0}^∞ λ^i / i!
• So: Σ_{i=0}^∞ P(X = i) = e^(−λ) Σ_{i=0}^∞ λ^i / i! = e^(−λ) e^λ = 1

Sending Data on Network Redux
• Recall example of sending bit string over network
  Send bit string of length n = 10^4
  Probability of (independent) bit corruption p = 10^-6
  X ~ Poi(λ = 10^4 · 10^-6 = 0.01)
• What is the probability that the message arrives uncorrupted?
  P(X = 0) = e^(−0.01) (0.01)^0 / 0! = e^(−0.01) ≈ 0.990049834
• Using Y ~ Bin(10^4, 10^-6): P(Y = 0) ≈ 0.990049829
• Caveat emptor: the Binomial was computed with a built-in function in the R software package, so some approximation may have occurred. Approximations are closer than they may appear in some software packages.
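The two values on the last slide are easy to check numerically. A minimal Python sketch (computing the Binomial P(Y = 0) directly as (1 − p)^n rather than via R):

```python
import math

# Network example from the slides: a 10^4-bit message where each bit
# is independently corrupted with probability p = 10^-6.
n, p = 10**4, 10**-6
lam = n * p  # lambda = np = 0.01

binom_p0 = (1 - p) ** n       # exact Binomial P(Y = 0): all bits uncorrupted
poisson_p0 = math.exp(-lam)   # Poisson approximation P(X = 0)

print(round(binom_p0, 9))     # ≈ 0.990049829
print(round(poisson_p0, 9))   # ≈ 0.990049834
```

The two answers agree to seven decimal places, which is the point of the approximation.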
Siméon-Denis Poisson
• Siméon-Denis Poisson (1781–1840) was a prolific French mathematician
• Published his first paper at 18, became a professor at 21, and published over 300 papers in his life
• He reportedly said: “Life is good for only two things, discovering mathematics and teaching mathematics.”
• Definitely did not look like Charlie Sheen

Poisson Random Variable is Binomial in the Limit
• Poisson approximates Binomial when n is large, p is small, and λ = np is “moderate”
• Different interpretations of “moderate”:
  n > 20 and p < 0.05
  n > 100 and p < 0.1
• Really, Poisson is Binomial as n → ∞ and p → 0, where np = λ
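The limit statement can be illustrated numerically: hold λ = np fixed while n grows, and the Binomial PMF approaches the Poisson PMF. A short Python sketch (the helper function names are mine):

```python
import math

def binom_pmf(n, p, k):
    """Binomial PMF: P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """Poisson PMF: P(X = k) for X ~ Poi(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Fix lambda = 3 and watch Bin(n, 3/n) converge to Poi(3) at k = 2
lam, k = 3.0, 2
for n in (10, 100, 1000, 10000):
    print(n, round(binom_pmf(n, lam / n, k), 6))
print("Poi:", round(poisson_pmf(lam, k), 6))
```

By n = 10000 the Binomial and Poisson values agree to several decimal places, matching the rule-of-thumb thresholds on the slide.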
Bin(10, 0.3), Bin(100, 0.03) vs. Poi(3)
• [Figure: plot of P(X = k) against k, comparing the PMFs of Bin(10, 0.3) and Bin(100, 0.03) with Poi(3)]

Tender (Central) Moments with Poisson
• Recall: Y ~ Bin(n, p)
  E[Y] = np,  Var(Y) = np(1 − p)
• X ~ Poi(λ), where λ = np (n → ∞ and p → 0)
  E[X] = np = λ
  Var(X) = np(1 − p) → λ(1 − 0) = λ
  Yes, the expectation and variance of a Poisson are the same
  o It brings a tear to my eye…
• Recall: Var(X) = E[X^2] − (E[X])^2, so
  E[X^2] = Var(X) + (E[X])^2 = λ + λ^2 = λ(1 + λ)

It’s Really All About Raisin Cake
• Bake a cake using many raisins and lots of batter
  Cake is enormous (in fact, infinitely so…)
  Cut slices of “moderate” size (w.r.t. # raisins/slice)
  Probability p that a particular raisin is in a certain slice is very small (p = 1/# cake slices)
• Let X = number of raisins in a certain cake slice
• X ~ Poi(λ), where λ = # raisins / # cake slices

CS = Baking Raisin Cake With Code
• Hash tables
  strings = raisins, buckets = cake slices
• Server crashes in data center
  servers = raisins, list of crashed machines = particular slice of cake
• Facebook login requests (i.e., web server requests)
  requests = raisins, server receiving request = cake slice

Defective Chips
• Computer chips are produced with probability p = 0.1 that a chip is defective
• Consider a sample of n = 10 chips
• What is P(sample contains ≤ 1 defective chip)?
• Using Y ~ Bin(10, 0.1):
  P(Y ≤ 1) = (10 choose 0)(0.1)^0(0.9)^10 + (10 choose 1)(0.1)^1(0.9)^9 ≈ 0.7361
• Using X ~ Poi(λ = (0.1)(10) = 1):
  P(X ≤ 1) = e^(−1) 1^0/0! + e^(−1) 1^1/1! = 2e^(−1) ≈ 0.7358

Efficiently Computing Poisson
• Let X ~ Poi(λ)
• Want to compute P(X = i) for multiple values of i
  E.g., computing P(X ≤ a) = Σ_{i=0}^{a} P(X = i)
• Iterative formulation: compute P(X = i + 1) from P(X = i)
  P(X = i + 1) = e^(−λ) λ^(i+1) / (i + 1)!
  P(X = i) = e^(−λ) λ^i / i!
• Use the recurrence relation:
  P(X = i + 1) = [λ / (i + 1)] · P(X = i)
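The recurrence on the last slide avoids recomputing factorials and powers for every i. A minimal Python sketch of the idea (the function name is illustrative), checked against the defective-chips example:

```python
import math

def poisson_cdf(lam, a):
    """P(X <= a) for X ~ Poi(lam), using the recurrence
    P(X = i+1) = P(X = i) * lam / (i + 1)."""
    pmf = math.exp(-lam)  # start from P(X = 0) = e^-lam
    total = pmf
    for i in range(a):
        pmf *= lam / (i + 1)  # P(X = i+1) from P(X = i)
        total += pmf
    return total

# Defective-chips example: X ~ Poi(1), P(X <= 1) = 2/e
print(round(poisson_cdf(1.0, 1), 4))  # 0.7358
```

Each step costs one multiply and one add, so P(X ≤ a) takes O(a) arithmetic operations total.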
Approximately Poisson
• Poisson can still provide a good approximation even when its assumptions are “mildly” violated
• This is the “Poisson Paradigm”
• Can apply the Poisson approximation when…
  “Successes” in trials are not entirely independent
  o Example: # entries in each bucket in a large hash table
  Probability of “success” in each trial varies (slightly)
  o Small relative change in a very small p
  o Example: average # requests to a web server/sec. may fluctuate slightly due to load on the network

Birthday Problem Redux
• What is the probability that of n people, none share the same birthday (regardless of year)?
• (n choose 2) = n(n − 1)/2 trials, one for each pair of people (x, y), x ≠ y
  Let E_{x,y} = x and y have the same birthday (trial “success”)
  P(E_{x,y}) = p = 1/365 (note: all E_{x,y} are not independent)
• X ~ Poi(λ), where λ = [n(n − 1)/2] · (1/365) = n(n − 1)/730
  P(X = 0) = e^(−n(n−1)/730) (n(n−1)/730)^0 / 0! = e^(−n(n−1)/730)
• Solve for the smallest integer n s.t.: e^(−n(n−1)/730) ≤ 0.5
  ln(e^(−n(n−1)/730)) ≤ ln(0.5)  ⇒  n(n − 1) ≥ −730 ln(0.5)  ⇒  n ≥ 23
• Same as before!

Poisson Processes
• Consider “rare” events that occur over time
  Earthquakes, radioactive decay, hits to a web server, etc.
  Have a time interval for events (1 year, 1 sec, whatever…)
  Events arrive at rate: λ events per interval of time
• Split the time interval into n sub-intervals
  Assume at most one event per sub-interval
  Event occurrences in sub-intervals are independent
  With many sub-intervals, the probability of an event occurring in any given sub-interval is small
• N(t) = # events in the original time interval ~ Poi(λ)

Web Server Load
• Consider requests to a web server in 1 second
  In the past, server load averages 2 hits/second
  X = # hits the server receives in a second
  What is P(X = 5)?
• Model
  Assume the server cannot acknowledge > 1 hit/msec.
  1 sec = 1000 msec. (= large n)
  P(hit server in 1 msec) = 2/1000 (= small p)
  X ~ Poi(λ = 2)
  P(X = 5) = e^(−2) 2^5 / 5! ≈ 0.0361
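Both worked examples above are easy to verify numerically. This Python sketch reproduces the n = 23 birthday threshold and the P(X = 5) ≈ 0.0361 web-server value:

```python
import math

# Birthday problem via the Poisson paradigm: find the smallest n with
# P(no shared birthday) = exp(-n(n-1)/730) <= 0.5
n = 2
while math.exp(-n * (n - 1) / 730) > 0.5:
    n += 1
print(n)  # 23

# Web server load: X ~ Poi(2), P(X = 5) = e^-2 * 2^5 / 5!
p_five = math.exp(-2) * 2**5 / math.factorial(5)
print(round(p_five, 4))  # 0.0361
```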
Geometric Random Variable
• X is a Geometric Random Variable: X ~ Geo(p)
  X is the number of independent trials until the first success
  p is the probability of success on each trial
  X takes on values 1, 2, 3, …, with probability:
  P(X = n) = (1 − p)^(n−1) p
  E[X] = 1/p,  Var(X) = (1 − p)/p^2
• Examples:
  Flipping a fair (p = 0.5) coin until the first “heads” appears
  Urn with N black and M white balls: draw balls (with replacement, p = N/(N + M)) until the first black ball is drawn
  Generate bits with P(bit = 1) = p until the first 1 is generated

Negative Binomial Random Variable
• X is a Negative Binomial RV: X ~ NegBin(r, p)
  X is the number of independent trials until r successes
  p is the probability of success on each trial
  X takes on values r, r + 1, r + 2, …, with probability:
  P(X = n) = (n − 1 choose r − 1) p^r (1 − p)^(n−r), where n = r, r + 1, …
  E[X] = r/p,  Var(X) = r(1 − p)/p^2
• Note: Geo(p) ~ NegBin(1, p)
• Examples:
  # of coin flips until the r-th “heads” appears
  # of strings to hash into a table until bucket 1 has r entries
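A sketch checking the Negative Binomial PMF, the Geo(p) ~ NegBin(1, p) identity, and the E[X] = r/p formula against a simulation (all function names are mine):

```python
import math
import random

def negbin_pmf(r, p, n):
    """P(X = n) for X ~ NegBin(r, p): trial n is the r-th success."""
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def geo_pmf(p, n):
    """P(X = n) for X ~ Geo(p): trial n is the first success."""
    return (1 - p) ** (n - 1) * p

# Geo(p) ~ NegBin(1, p): the two PMFs agree
assert abs(negbin_pmf(1, 0.5, 3) - geo_pmf(0.5, 3)) < 1e-12

def trials_until_r_successes(r, p):
    """Simulate one NegBin(r, p) draw: count trials until r successes."""
    n = successes = 0
    while successes < r:
        n += 1
        if random.random() < p:
            successes += 1
    return n

# Check E[X] = r/p by simulation
random.seed(0)
r, p = 3, 0.25
samples = [trials_until_r_successes(r, p) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to r/p = 12
```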