Information Theory and Statistical Inference
Samuel Cheng, School of ECE, University of Oklahoma
August 23, 2018
Lecture 2: Introduction to probabilistic inference


  1. Information Theory and Statistical Inference. Samuel Cheng, School of ECE, University of Oklahoma. August 23, 2018. Slide 1 of 45.

  2. Lecture 2: Introduction to probabilistic inference. Inference notation: o is the (observed) evidence, θ the parameter, x the prediction.
     Maximum Likelihood (ML): θ̂ = arg max_θ p(o | θ), then x̂ = arg max_x p(x | θ̂).
     Maximum A Posteriori (MAP): θ̂ = arg max_θ p(θ | o), then x̂ = arg max_x p(x | θ̂).
     Bayesian: x̂ = Σ_x x Σ_θ p(x | θ) p(θ | o) = Σ_x x p(x | o),
     where p(θ | o) = p(o | θ) p(θ) / p(o) ∝ p(o | θ) p(θ), and p(θ) is the prior.
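The three strategies on this summary slide can be sketched for a discrete parameter grid. A minimal illustration in Python; the biases, prior, and observation sequence are placeholders (they happen to match the coin example that follows), not part of this slide:

```python
# A minimal sketch of ML, MAP, and Bayesian inference on a discrete
# parameter grid. The numbers below are illustrative placeholders.

thetas = [0.1, 0.5, 0.9]       # candidate coin biases (the parameter theta)
prior = [1 / 3, 1 / 3, 1 / 3]  # p(theta): uniform prior

def likelihood(obs, theta):
    """p(o | theta) for an i.i.d. sequence of 'H'/'T' flips."""
    p = 1.0
    for flip in obs:
        p *= theta if flip == "H" else (1 - theta)
    return p

o = ["H", "T"]  # the observed evidence

# ML: theta_hat = arg max_theta p(o | theta)
theta_ml = max(thetas, key=lambda t: likelihood(o, t))

# MAP: theta_hat = arg max_theta p(theta | o), proportional to p(o | theta) p(theta)
unnorm = [likelihood(o, t) * w for t, w in zip(thetas, prior)]
theta_map = thetas[unnorm.index(max(unnorm))]

# Bayesian: average the prediction over the full posterior p(theta | o)
z = sum(unnorm)
posterior = [w / z for w in unnorm]
p_heads_bayes = sum(t * w for t, w in zip(thetas, posterior))
```

Under a uniform prior ML and MAP coincide; the Bayesian answer averages over all candidate parameters instead of committing to one.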

  3. Coin Flip. Three coins C1, C2, C3 with P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9. Which coin will I use? P(C1) = P(C2) = P(C3) = 1/3. Prior: the probability of a hypothesis before we make any observations. (Slide credit: University of Washington CSE473)

  4. Coin Flip (continued). Same three coins, with P(C1) = P(C2) = P(C3) = 1/3. Uniform prior: all hypotheses are equally likely before we make any observations.

  5. Experiment 1: Heads. Which coin did I use? P(C1 | H) = ?, P(C2 | H) = ?, P(C3 | H) = ?. By Bayes' rule, P(C1 | H) = P(H | C1) P(C1) / P(H), where P(H) = Σ_{i=1}^{3} P(H | C_i) P(C_i). As before, P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9 and P(C1) = P(C2) = P(C3) = 1/3.

  6. Experiment 1: Heads. Which coin did I use? P(C1 | H) = 0.066, P(C2 | H) = 0.333, P(C3 | H) = 0.600. Posterior: the probability of a hypothesis given the data.
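These posteriors follow directly from Bayes' rule on the previous slide; a small sketch that reproduces the numbers:

```python
# Reproducing this slide's posterior: one observed head, uniform prior.

p_h = {"C1": 0.1, "C2": 0.5, "C3": 0.9}          # P(H | coin)
prior = {"C1": 1 / 3, "C2": 1 / 3, "C3": 1 / 3}  # P(coin), uniform

# Normalizer: P(H) = sum_i P(H | C_i) P(C_i)
p_heads = sum(p_h[c] * prior[c] for c in p_h)

# Bayes' rule: P(C_i | H) = P(H | C_i) P(C_i) / P(H)
posterior = {c: p_h[c] * prior[c] / p_heads for c in p_h}
```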

  7. Experiment 2: Tails. Which coin did I use now? P(C1 | HT) = ?, P(C2 | HT) = ?, P(C3 | HT) = ?. With α = 1 / P(HT) and independent flips, P(C1 | HT) = α P(HT | C1) P(C1) = α P(H | C1) P(T | C1) P(C1), and likewise for C2 and C3.

  8. Experiment 2: Tails. Results: P(C1 | HT) = 0.21, P(C2 | HT) = 0.58, P(C3 | HT) = 0.21, computed from P(C_i | HT) = α P(H | C_i) P(T | C_i) P(C_i).
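The same numbers can be obtained by applying Bayes' rule one flip at a time, renormalizing after each observation. A sketch (the `update` helper is illustrative, not from the slides):

```python
# The HT posterior via one Bayes update per flip: multiply each coin's
# weight by the likelihood of the new flip, then renormalize.

p_h = {"C1": 0.1, "C2": 0.5, "C3": 0.9}  # P(H | coin)

def update(belief, flip):
    """One Bayes step over the three coins for a single 'H'/'T' flip."""
    unnorm = {c: w * (p_h[c] if flip == "H" else 1 - p_h[c])
              for c, w in belief.items()}
    z = sum(unnorm.values())  # plays the role of 1/alpha on the slide
    return {c: w / z for c, w in unnorm.items()}

uniform = {"C1": 1 / 3, "C2": 1 / 3, "C3": 1 / 3}
post = update(update(uniform, "H"), "T")  # observe H, then T
```

Because the flips are independent, updating sequentially gives the same posterior as conditioning on the whole sequence HT at once.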

  9. Experiment 2: Tails (summary). P(C1 | HT) = 0.21, P(C2 | HT) = 0.58, P(C3 | HT) = 0.21.

  10. Your Estimate? What is the probability of heads after the two experiments? Most likely coin: C2, so the best estimate for P(H) is P(H | C2) = 0.5.

  11. Your Estimate? Maximum Likelihood Estimate: the hypothesis that best fits the observed data, assuming a uniform prior. Most likely coin: C2, with P(H | C2) = 0.5 and P(C2) = 1/3.

  12. Using Prior Knowledge. Should we always use a uniform prior? Background knowledge: heads means you go first in Abalone against the TA; TAs are nice people; therefore the TA is more likely to use a coin biased in your favor.

  13. Using Prior Knowledge. We can encode this knowledge in the prior: P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70, with P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9 as before.

  14. Experiment 1: Heads. Which coin did I use under the new prior? P(C1 | H) = ?, P(C2 | H) = ?, P(C3 | H) = ?. As before, P(C1 | H) = α P(H | C1) P(C1) with α = 1 / P(H), now with P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  15. Experiment 1: Heads. Results under the new prior: P(C1 | H) = 0.006, P(C2 | H) = 0.165, P(C3 | H) = 0.829. For comparison, the posterior under the uniform prior after Experiment 1 was P(C1 | H) = 0.066, P(C2 | H) = 0.333, P(C3 | H) = 0.600.

  16. Experiment 2: Tails. Which coin did I use? P(C1 | HT) = ?, P(C2 | HT) = ?, P(C3 | HT) = ?. Again P(C1 | HT) = α P(HT | C1) P(C1) = α P(H | C1) P(T | C1) P(C1), with the prior P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  17. Experiment 2: Tails. Results: P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485, computed from P(C_i | HT) = α P(H | C_i) P(T | C_i) P(C_i).
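The posterior on this slide can be reproduced directly from the informative prior and the two flips:

```python
# The HT posterior under the informative prior from the slides.

p_h = {"C1": 0.1, "C2": 0.5, "C3": 0.9}       # P(H | coin)
prior = {"C1": 0.05, "C2": 0.25, "C3": 0.70}  # informative P(coin)

# Independent flips: P(HT | C_i) = P(H | C_i) * P(T | C_i)
unnorm = {c: p_h[c] * (1 - p_h[c]) * prior[c] for c in prior}
z = sum(unnorm.values())  # 1/alpha, i.e. P(HT)
posterior = {c: w / z for c, w in unnorm.items()}
```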

  18. Experiment 2: Tails (summary). P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485.

  19. Your Estimate? What is the probability of heads after the two experiments? Most likely coin: C3, so the best estimate for P(H) is P(H | C3) = 0.9.

  20. Your Estimate? Maximum A Posteriori (MAP) Estimate: the hypothesis that best fits the observed data under a (possibly non-uniform) prior, i.e. the one with the highest posterior probability. Most likely coin: C3, with P(H | C3) = 0.9 and P(C3) = 0.70.

  21. Did We Do The Right Thing? P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485.

  22. Did We Do The Right Thing? P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485. C2 and C3 are almost equally likely.
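The near-tie between C2 and C3 is exactly where MAP and the fully Bayesian approach from the opening summary slide diverge: MAP commits to one coin, while the Bayesian predictive averages all three coins' predictions under the posterior. A sketch contrasting the two, using this slide's rounded posterior values (so the numbers are approximate):

```python
# MAP plug-in prediction vs. the Bayesian predictive P(H | HT).
# Posterior values are taken from the slide, rounded to three decimals.

p_h = {"C1": 0.1, "C2": 0.5, "C3": 0.9}              # P(H | coin)
posterior = {"C1": 0.035, "C2": 0.481, "C3": 0.485}  # P(coin | HT)

# MAP plug-in: predict with the single most probable coin
map_coin = max(posterior, key=posterior.get)
p_heads_map = p_h[map_coin]

# Bayesian predictive: P(H | HT) = sum_i P(H | C_i) P(C_i | HT)
p_heads_bayes = sum(p_h[c] * posterior[c] for c in p_h)
```

The MAP prediction (0.9) ignores the 48% posterior mass on the fair coin; the Bayesian average lands near 0.68, reflecting the remaining uncertainty.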
