ADVANCED ALGORITHMS Lecture 16: hashing (fin), sampling


  1. ADVANCED ALGORITHMS Lecture 16: hashing (fin), sampling

  2. ANNOUNCEMENTS ➤ HW 3 is due tomorrow! ➤ Send project topics ➤ Send email to utah-algo-ta@googlegroups.com, with subject “Project topic”; one email per group; names and UIDs

  3. LAST CLASS ➤ Hashing ➤ place n balls into n bins, independently and uniformly at random ➤ expected size of a bin = 1 ➤ number of bins with k balls ≈ n/k! ➤ max size of a bin = O(log n / log log n)
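A quick empirical check of these facts (my sketch, not part of the slides): the code below throws n balls into n bins uniformly at random and reports the maximum bin size. For n = 10⁶ the answer is typically 8 to 10, matching the n/k! heuristic (the max load is roughly the k where k! ≈ n).

    import random
    from collections import Counter

    def max_load(n):
        # throw n balls into n bins, independently and uniformly at random
        loads = Counter(random.randrange(n) for _ in range(n))
        return max(loads.values())

    n = 10**6
    print(max_load(n))  # typically 8 to 10 for n = 10**6: the k with k! ~ n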

  4. MAIN IDEAS ➤ Random variables as sums of “simple” random variables ➤ Linearity of expectation ➤ Theorem (Markov’s inequality). Let X be a non-negative random variable. For any t > 0, Pr[ X > t ⋅ 𝔼[X] ] ≤ 1/t. (Handwritten note: random variables tend not to deviate too much from their expectations.) ➤ Markov’s inequality is usually not tight ➤ Union bound

  5. MAIN IDEAS ➤ Random variables as sums of “simple” random variables ➤ Linearity of expectation ➤ Theorem (Markov’s inequality). Let X be a non-negative random variable. For any t > 0, Pr[ X > t ⋅ 𝔼[X] ] ≤ 1/t. ➤ Markov’s inequality is usually not tight ➤ Union bound ➤ Theorem (Union bound). Suppose E₁, E₂, …, Eₙ are n events in a probability space. Then Pr[ E₁ ∪ E₂ ∪ … ∪ Eₙ ] ≤ Pr[E₁] + Pr[E₂] + … + Pr[Eₙ].
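A sanity check on both claims (a hypothetical experiment, not from the lecture): for X = the size of a fixed bin when n balls go into n bins, 𝔼[X] = 1, so Markov gives Pr[X > 4] ≤ 1/4; simulation shows the true probability is about 0.004, illustrating how far from tight Markov usually is.

    import random

    def bin0_load(n):
        # number of the n balls that land in bin 0
        return sum(1 for _ in range(n) if random.randrange(n) == 0)

    n, trials, t = 1000, 10000, 4
    hits = sum(1 for _ in range(trials) if bin0_load(n) > t)
    print(hits / trials, "vs. Markov bound", 1 / t)
    # empirical Pr[X > 4] is about 0.004; Markov only promises <= 0.25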

  6. THOUGHTS ➤ When hashing n balls to n bins, outcomes are not “as uniform” as one would like: the max load of a bin is ≈ log n / log log n ➤ Many empty bins (HW) ➤ What happens if there are more balls? Hashing m balls, where m ≫ n, behaves much better ➤ “Power of two choices” (Broder et al. 91): balancing load this way, the max load becomes O(log log n)
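A hedged simulation of the two-choice scheme (my code; parameters are illustrative): each ball samples d candidate bins uniformly at random and goes into the least loaded one. With d = 1 this is ordinary hashing; with d = 2 the max load drops to O(log log n).

    import random

    def max_load(n, d):
        # n balls into n bins; each ball probes d random bins and
        # is placed in the least-loaded of them
        loads = [0] * n
        for _ in range(n):
            candidates = [random.randrange(n) for _ in range(d)]
            best = min(candidates, key=lambda b: loads[b])
            loads[best] += 1
        return max(loads)

    n = 10**5
    print("one choice :", max_load(n, 1))  # ~ log n / log log n (around 8)
    print("two choices:", max_load(n, 2))  # ~ log log n (around 4)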

  7. ESTIMATION Question: suppose each person votes R or B. Can we predict the winner without counting all votes? Answer: sample m of the people, ask who they will vote for, and output the winner in the sample. (Want: the winner in the sample to match the winner in the full population.)

  8. THINGS THAT MATTER ➤ the sampling ought to be truly uniform ➤ everyone answering truthfully ➤ the number n of samples ➤ how close the true votes are (the margin) ➤ the confidence in our prediction

  9. ANALYZING SAMPLING Natural formalism: ➤ each person has a choice of 0 or 1; the entire population has size N, with N₀ people voting 0 and N₁ voting 1 ➤ Choose n people uniformly at random. ➤ Let X i (0/1) be the outcome of the i’th sampled person, so Pr[ X i = 0 ] = N₀/N ➤ let n₀ and n₁ be the number of sampled people voting 0 and 1 ➤ Predicted winner: 0 if n₀ > n/2, 1 otherwise

  10. Write n₀ as a sum of indicators: n₀ = Σ i 1[ X i = 0 ]. What is the expectation of each term? Each indicator has expectation N₀/N, so by linearity of expectation 𝔼[n₀] = n ⋅ N₀/N. Thus n₀/n, the fraction of votes for 0 in the sample, has expectation N₀/N, the fraction of votes for 0 in the population. Estimation error = | n₀/n − N₀/N |. We just argued that the two expressions agree in expectation.

  11. We just argued (from the corresponding expressions) that 𝔼[n₀] = n ⋅ N₀/N, and similarly for n₁. If 1 is the true winner, when is 1 also the winner in the sample? ➤ What if N₀ = 0.4N and N₁ = 0.6N? Then 𝔼[n₀] = 0.4n, and our prediction is right iff n₀ < n/2, i.e., iff n₀ < 𝔼[n₀] + 0.1n. ➤ What if N₀ = 0.49N and N₁ = 0.51N? Then our prediction is right iff n₀ < 𝔼[n₀] + 0.01n. Goal: if we take n samples, what is the probability that n₀ exceeds 𝔼[n₀] by more than 0.01n?

  12. ANALYZING SAMPLING Natural formalism: ➤ Choose n people uniformly at random. ➤ Let X i (0/1) be outcome of i’th person ➤ Error in estimation: || empirical mean − true expectation ||? ➤ “Confidence” Ideal guarantee: || empirical mean − true expectation || < 0.001 w.p. 0.999

  13. MARKOV? We want Pr[ n₀ ≥ 𝔼[n₀] + 0.01n ] to be small. Markov gives Pr[ n₀ > t ⋅ 𝔼[n₀] ] ≤ 1/t. With 𝔼[n₀] = 0.49n and threshold 0.5n, we can only take t = 0.5/0.49, so the bound on the failure probability is 0.49/0.5 = 0.98: useless, no matter how large n is (even n = 10⁶).

  14. CAN WE USE THE “NUMBER OF SAMPLES”? Variance of a random variable X: Var[X] = 𝔼[(X − μ)²], where μ = 𝔼[X]. If X = X₁ + … + Xₙ, then Var[X] = Var[X₁] + … + Var[Xₙ] if the X i are independent.
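One step the handwritten slide compresses (standard, supplied here for completeness): for two random variables, Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y), and independence forces Cov(X, Y) = 𝔼[XY] − 𝔼[X]𝔼[Y] = 0; applying this pairwise gives additivity of variance for independent sums. The restriction matters: without independence the cross terms can dominate.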

  15. CHEBYSHEV’S INEQUALITY If a random variable has low variance, then Markov can be improved. Theorem. Let X be a random variable whose variance is σ². Then for any t > 0, Pr[ | X − 𝔼[X] | ≥ t ] ≤ σ²/t².
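A one-line proof, standard and worth recording next to the statement: apply Markov to the non-negative random variable (X − 𝔼[X])². Then Pr[ | X − 𝔼[X] | ≥ t ] = Pr[ (X − 𝔼[X])² ≥ t² ] ≤ 𝔼[(X − 𝔼[X])²] / t² = σ²/t².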

  16. BACK TO SAMPLING We wanted Pr[ | n₀ − 𝔼[n₀] | ≥ 0.01n ] to be small. Idea: compute Var[n₀] and apply Chebyshev.

  17. VARIANCE OF AVERAGE
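The slide's derivation is image-only in the transcript; the standard computation under the formalism above (assuming the n samples are independent): each indicator 1[ X i = 0 ] is Bernoulli with parameter p = N₀/N, so it has variance p(1 − p) ≤ 1/4. Hence Var[n₀] = n ⋅ p(1 − p) ≤ n/4, and Var[n₀/n] = Var[n₀]/n² ≤ 1/(4n): the variance of the average shrinks linearly with the number of samples.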

  18. BOUND VIA CHEBYSHEV
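Again the slide body is image-only; the standard calculation: Chebyshev with t = 0.01n gives Pr[ | n₀ − 𝔼[n₀] | ≥ 0.01n ] ≤ Var[n₀] / (0.01n)² ≤ (n/4) / (10⁻⁴ n²) = 2500/n. So roughly 2.5 × 10⁶ samples suffice for failure probability 10⁻³, independent of the population size N.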

  19. WHAT IF WE TAKE HIGHER POWERS? 𝔼[( X − 𝔼X )⁴] ≤ … “Moment methods” ➤ Usually get improved bounds
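The step the slide is gesturing at (standard, supplied here): Markov applied to the non-negative variable ( X − 𝔼X )⁴ yields Pr[ | X − 𝔼X | ≥ t ] ≤ 𝔼[( X − 𝔼X )⁴] / t⁴, a 1/t⁴ tail in place of Chebyshev's 1/t²; higher even moments sharpen this further, at the price of bounding those moments.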

  20. CHERNOFF BOUND
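The theorem is on the slide image, which the transcript omits; one standard form that fits this lecture's setting: if X₁, …, Xₙ are independent random variables taking values in [0, 1], S = X₁ + … + Xₙ, and μ = 𝔼[S], then for any 0 < ε ≤ 1, Pr[ | S − μ | ≥ εμ ] ≤ 2 exp( −ε²μ/3 ). The failure probability now decays exponentially in n, not polynomially as with Chebyshev.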

  21. INTERPRETING THE CHERNOFF BOUND

  22. INTERPRETING THE CHERNOFF BOUND Useful heuristic: ➤ Sums of independent random variables don’t deviate much more than their standard deviation (the square root of the variance)
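To make the heuristic concrete, a small hypothetical computation (mine, using the bounds stated above) comparing the two bounds on the polling problem with margin 0.01n and p = 0.49:

    import math

    def chebyshev_fail(n):
        # Pr[|n0 - E[n0]| >= 0.01 n] <= Var[n0]/(0.01 n)^2 <= 2500/n
        return min(1.0, 2500 / n)

    def chernoff_fail(n, p=0.49):
        # two-sided Chernoff with mu = p*n and eps chosen so eps*mu = 0.01*n
        mu = p * n
        eps = 0.01 * n / mu
        return min(1.0, 2 * math.exp(-eps * eps * mu / 3))

    for n in (10**4, 10**5, 10**6):
        print(n, chebyshev_fail(n), chernoff_fail(n))
    # Chebyshev needs n ~ 2.5e6 for failure probability 1e-3;
    # Chernoff reaches it around n ~ 1.1e5.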

  23. MCDIARMID’S INEQUALITY
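The statement is image-only in the transcript; the standard form: let X₁, …, Xₙ be independent, and let f satisfy the bounded-differences condition, i.e., changing only the i’th argument changes f by at most c i. Then Pr[ | f(X₁, …, Xₙ) − 𝔼[f] | ≥ t ] ≤ 2 exp( −2t² / (c₁² + … + cₙ²) ). This extends Chernoff-type concentration from plain sums to any function that no single input can move by much.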

  24. ESTIMATING THE SUM OF NUMBERS

  25. ESTIMATING THE SUM OF NUMBERS
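Both slides are image-only in the transcript. As a hedged sketch of the natural estimator they presumably discuss: sample n of the N numbers uniformly at random, average, and scale by N. The estimator is unbiased, and the concentration bounds above control its error when the numbers lie in a bounded range (the names and parameters below are illustrative):

    import random

    def estimate_sum(values, n):
        # unbiased: E[N * sample average] equals the true sum
        N = len(values)
        sample = [values[random.randrange(N)] for _ in range(n)]
        return N * sum(sample) / n

    values = [random.random() for _ in range(10**6)]  # numbers in [0, 1]
    print("true sum:", sum(values))
    print("estimate:", estimate_sum(values, 10**4))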
