

  1. Introduction to Bayesian Methods from a Cognitive Perspective Tejas D Kulkarni (tejask@mit.edu) MIT 9.S915 Sunday, September 21, 14

  2. Everyday Inductive Leaps

     How do we learn so much from so little data?
     • Properties of natural kinds
     • One-shot recognition of novel objects
     • Meanings of words
     • Future outcomes of dynamic processes
     • Hidden causal properties of objects
     • Causes of a person's actions (beliefs, goals)
     • Causal laws governing a domain

  3. Learning concepts and words

     "tufa"  "tufa"  "tufa"
     Can you pick out the tufas?

  4. Why Probability?

     • Our internal models of reality are often incomplete, so we need a mathematical language for handling uncertainty.
     • Probability theory is a framework that extends logic to reasoning over uncertain information.
     • Probability need not have anything to do with randomness. "Probabilities do not describe reality -- only our information about reality." - E. T. Jaynes
     • Bayesian statistics describes epistemological uncertainty (epistemology: the study of the nature and scope of knowledge) using the mathematical language of probability.
     • Start with prior beliefs and update them using data to obtain posterior beliefs.

  5. Fundamentals

     Given data: D = {x_1, x_2, ..., x_n}
     Prior probability: P(H)
     Likelihood: P(D | H = h)
     Posterior (Bayes' rule):
       P(H = h | D) = P(D | H = h) P(H = h) / Σ_i P(D | H = h_i) P(H = h_i)
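The posterior update above fits in a few lines of Python. This is a minimal sketch (the helper name is mine, not from the slides), applied to the coin example that follows:

```python
def posterior(prior, likelihood):
    """P(h | D) = P(D | h) P(h) / sum_i P(D | h_i) P(h_i)."""
    unnorm = {h: likelihood[h] * prior[h] for h in prior}
    z = sum(unnorm.values())  # the normalizing constant P(D)
    return {h: p / z for h, p in unnorm.items()}

# Coin example: D = H H T H T under "fair" vs. "always heads".
# A single tails makes the data impossible under "always heads".
post = posterior({"fair": 0.5, "heads": 0.5},
                 {"fair": (1 / 2) ** 5, "heads": 0.0})
```

The observed tails drives all posterior mass onto the fair-coin hypothesis, matching the infinite posterior odds on the next slide.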

  6. Hypothesis Testing: Coin Flipping

     Data (D): H H T H T
     Hypotheses: H_1 => fair coin, H_2 => always heads
     Priors: P(H_1) = 0.5, P(H_2) = 0.5
     Likelihoods: P(D | H_1) = 1/2^5, P(D | H_2) = 0 (a tails is impossible under H_2)
     Posterior odds:
       P(H_1 | D) / P(H_2 | D) = [P(D | H_1) P(H_1)] / [P(D | H_2) P(H_2)] = ∞

  7. Hypothesis Testing: Coin Flipping

     Data (D): H H H H H
     Hypotheses: H_1 => fair coin, H_2 => always heads
     Priors: P(H_1) = 999/1000, P(H_2) = 1/1000
     Likelihoods: P(D | H_1) = 1/2^5, P(D | H_2) = 1
     Posterior odds:
       P(H_1 | D) / P(H_2 | D) = [P(D | H_1) P(H_1)] / [P(D | H_2) P(H_2)] = 999/32 ≈ 31.2

  8. Hypothesis Testing: Coin Flipping

     Data (D): H H H H H H H H H H
     Hypotheses: H_1 => fair coin, H_2 => always heads
     Priors: P(H_1) = 999/1000, P(H_2) = 1/1000
     Likelihoods: P(D | H_1) = 1/2^10, P(D | H_2) = 1
     Posterior odds:
       P(H_1 | D) / P(H_2 | D) = [P(D | H_1) P(H_1)] / [P(D | H_2) P(H_2)] = 999/1024 ≈ 0.98
     Five more heads tip the odds in favor of the trick coin despite its tiny prior.
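The three coin-flip slides reduce to a single posterior-odds computation. A minimal sketch (the function name is mine):

```python
def posterior_odds(prior1, prior2, lik1, lik2):
    # P(H1 | D) / P(H2 | D) = [P(D | H1) P(H1)] / [P(D | H2) P(H2)]
    return (lik1 * prior1) / (lik2 * prior2)

# Five heads in a row, with a 999/1000 prior on the fair coin:
odds_5 = posterior_odds(999 / 1000, 1 / 1000, (1 / 2) ** 5, 1.0)    # 999/32

# Ten heads in a row: the odds now favor the "always heads" coin.
odds_10 = posterior_odds(999 / 1000, 1 / 1000, (1 / 2) ** 10, 1.0)  # 999/1024
```

Note how the likelihood ratio 1/2^n shrinks geometrically with each head, so even a 999:1 prior is overwhelmed after ten flips.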

  9. Example: Vision as Inverse Graphics

     [Figure: a generative model of face images. Latent variables: a face-id with Eyes, Nose, Mouth, and Outline parts, each having Shape (S) and Texture (T); Light (L) controls shading; Affine (A) controls pose. A graphics simulator renders them into an image I.]

     Inference problem:
       P(S, T, L, A | I) ∝ P(I | S, T, L, A) P(L) P(S) P(T) P(A)
                         ∝ N(I − O; 0, 0.1) P(L) P(A) Π_i P(S_i) P(T_i)
     where O is the simulator's rendered output and i ranges over the face parts.
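The likelihood term above scores how well a rendered image matches the observation. A toy sketch of that scoring step, where `render` is a stand-in for the graphics simulator (the real one renders face parts; everything here is a hypothetical placeholder):

```python
import math

def gaussian_logpdf(x, mu, sigma):
    # log of the normal density N(x; mu, sigma)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def render(scene):
    # stand-in "simulator": maps scene parameters to three pixel intensities
    return [sum(scene) * w for w in (0.2, 0.5, 0.3)]

def log_score(scene, image, sigma=0.1):
    # log P(I | scene): per-pixel Gaussian on the residual I - O, as on the slide
    out = render(scene)
    return sum(gaussian_logpdf(i, o, sigma) for i, o in zip(image, out))

# a hypothesis matching the observed image scores higher than a mismatched one
observed = render((0.5, 0.5))
```

Combined with log-priors over the scene variables, this gives the unnormalized log-posterior that inference (e.g. the MCMC algorithm at the end of the deck) explores.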

  10-13. Example: Vision as Inverse Graphics

     [Figures: four slides of random draws from the prior. In each, the latent variables (part shapes and textures, shading, affine pose) are sampled and passed through the simulator to produce a different rendered face image.]

  14-17. [Figures: four slides running the model in the other direction. An observed image is given, the latent variables (shape, texture, shading, affine) are unknown ("?"), and inference progressively fills them in so that the simulator's render matches the observed image.]

  18. Example: Vision as Inverse Graphics

     Reference: Aldrian et al., Inverse Rendering with a Morphable Model: A Multilinear Approach, ECCV 2011

  19. Optimal Predictions in Everyday Cognition

     • How well do cognitive judgements compare with optimal statistical inferences in real-world settings?
     • In Griffiths & Tenenbaum [06], people were asked to predict the duration or extent of everyday phenomena, such as human life spans or the box-office gross of movies.
     • Across experiments, the phenomenon and the amount of data given for each phenomenon were parametrically varied, to test the predictions of an optimal Bayesian model against reported human predictions.

  20. Optimal Predictions in Everyday Cognition

     • Let t_total denote, e.g., the total amount of time a person will live, and t his or her current age.
     • The Bayesian predictor computes a probability distribution over t_total given t by applying Bayes' rule:
         P(t_total | t) ∝ p(t | t_total) p(t_total)
     • The likelihood p(t | t_total) is the probability of first encountering a person at age t, given that their total life span is t_total.
     • E.g., when we are equally likely to meet a person at any point in their life, the likelihood is uniform: p(t | t_total) = 1/t_total for 0 < t < t_total.
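A discretized sketch of this predictor. The roughly-Gaussian prior over life spans below is a hypothetical stand-in (the paper estimates priors from real data), and the posterior-median prediction rule follows the paper's setup:

```python
import math

def predict_total(t, prior):
    """Posterior median of t_total given current age t.
    prior: dict mapping t_total -> (unnormalized) p(t_total).
    Likelihood is uniform, p(t | t_total) = 1/t_total for t < t_total."""
    post = {tt: (1.0 / tt) * p for tt, p in prior.items() if tt > t}
    z = sum(post.values())
    acc = 0.0
    for tt in sorted(post):
        acc += post[tt] / z
        if acc >= 0.5:
            return tt

# hypothetical roughly-Gaussian prior over life spans: mean 75, sd 15
prior = {tt: math.exp(-((tt - 75) ** 2) / (2 * 15 ** 2)) for tt in range(1, 121)}
```

For a young person (t = 18) the prediction stays near the prior's center, while for someone already 90 the model predicts only a few more years: the prior dominates when the data is unsurprising, and the data dominates when it falls in the prior's tail.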

  21. Sample Questions

  22. Comparing model with humans ...

  23. Comparing model with humans ...

     • These results are inconsistent with claims that cognitive judgments are based on non-Bayesian heuristics that are insensitive to priors (Kahneman et al., 1982; Tversky & Kahneman, 1974).
     • The results are also inconsistent with simpler Bayesian prediction models that adopt a single uninformative prior p(t_total) ∝ 1/t_total, regardless of the phenomenon to be predicted (Gott, 1993, 1994; Jaynes, 2003; Jeffreys, 1961; Ledford et al., 2001).
     • Why is the variance high for the pharaoh experiment?

  24. Comparing model with humans ...

     • Given an unfamiliar prediction task, people might be able to identify the appropriate form of the distribution by making an analogy to more familiar phenomena in the same broad class, even if they do not have sufficient direct experience to set the parameters of that distribution accurately.
     • If participants predicted the reign of the pharaoh by drawing an analogy to modern monarchs, adjusting the mean reign duration downward by some uncertain but insufficient factor, that would be entirely consistent with the pattern of errors observed. Such a strategy of prediction by analogy could be an adaptive way of making judgments that would otherwise lie beyond people's limited base of knowledge and experience.

     Ref: http://web.mit.edu/cocosci/Papers/Griffiths-Tenenbaum-PsychSci06.pdf

  25. Graphical Models: Bayes Nets

     • A compact way to represent probabilities.
     • A mental model of the causal information flow that gives rise to data/observations.

  26. Graphical Models: Bayes Nets

     Joint (chain rule, no independence assumptions):
       P(C, S, R, W) = P(C) P(S | C) P(R | C, S) P(W | C, S, R)
     Space required to represent the probability table: O(2^N)

  27. Graphical Models: Bayes Nets

     Joint (with the conditional independencies encoded by the graph):
       P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)
     Space required to represent the probability tables (K = max fan-in of a node): O(N 2^K)
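The space savings can be checked by counting free parameters for the Cloudy/Sprinkler/Rain/WetGrass network. A small sketch (function names are mine):

```python
def full_table_params(n):
    # full joint over n binary variables: 2^n entries, minus one for normalization
    return 2 ** n - 1

def factored_params(parents):
    # parents: dict node -> list of parents; each binary CPT has one free
    # parameter per configuration of its parents, i.e. 2^|parents| rows
    return sum(2 ** len(pa) for pa in parents.values())

# the sprinkler network's structure: C -> S, C -> R, (S, R) -> W
sprinkler = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
```

For N = 4 this is 15 free parameters for the full table versus 1 + 2 + 2 + 4 = 9 for the factored form; the gap widens exponentially as N grows while K stays small.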

  28. Inference

     Suppose we observe that the grass is wet (W = 1). There are two possible explanations: (1) it is raining, or (2) the sprinkler is on. Which is more likely?

       P(W=1) = Σ_{c,s,r} P(C=c, S=s, R=r, W=1) = 0.6471
       P(S=1 | W=1) = P(S=1, W=1) / P(W=1) = Σ_{c,r} P(C=c, S=1, R=r, W=1) / P(W=1)
                    = 0.2781 / 0.6471 ≈ 0.43
       P(R=1 | W=1) = P(R=1, W=1) / P(W=1) = Σ_{c,s} P(C=c, S=s, R=1, W=1) / P(W=1)
                    = 0.4581 / 0.6471 ≈ 0.71

     So rain is the more likely explanation.
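These numbers come from inference by enumeration over the factored joint. The slide's CPT figure is not reproduced in the text, so the tables below are the standard sprinkler-network values (an assumption, but they reproduce the slide's numbers exactly):

```python
from itertools import product

# CPTs for the classic Cloudy/Sprinkler/Rain/WetGrass network (assumed values)
P_C = {1: 0.5, 0: 0.5}                                        # P(C=c)
P_S = {(1, 1): 0.1, (0, 1): 0.9, (1, 0): 0.5, (0, 0): 0.5}    # P(S=s | C=c), keyed (s, c)
P_R = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.2, (0, 0): 0.8}    # P(R=r | C=c), keyed (r, c)
P_W1 = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.0}  # P(W=1 | S=s, R=r)

def joint_w1(c, s, r):
    # P(C=c, S=s, R=r, W=1) under the factored joint from the previous slide
    return P_C[c] * P_S[(s, c)] * P_R[(r, c)] * P_W1[(s, r)]

p_w = sum(joint_w1(c, s, r) for c, s, r in product((0, 1), repeat=3))
p_s = sum(joint_w1(c, 1, r) for c, r in product((0, 1), repeat=2)) / p_w
p_r = sum(joint_w1(c, s, 1) for c, s in product((0, 1), repeat=2)) / p_w
print(round(p_w, 4), round(p_s, 4), round(p_r, 4))  # 0.6471 0.4298 0.7079
```

Enumeration is exponential in the number of hidden variables, which is why the deck later turns to approximate methods such as MCMC.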

  29. Bayes Nets: Explaining Away

     • Two causes (R=1 and S=1) compete to explain the data. Therefore, S and R become conditionally dependent given that W is observed (even though they are marginally independent).
     • Suppose the grass is wet (W=1) but we also know that it is raining (R=1). Then the posterior probability that the sprinkler is on goes down:
         P(S=1 | W=1, R=1) = 0.19
     • Compare with the earlier result:
         P(S=1 | W=1) = 0.2781 / 0.6471 ≈ 0.43
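The explaining-away number follows from the same enumeration, now conditioning on rain as well. The CPTs are the standard sprinkler values (assumed, since the slide's figure is not reproduced in the text):

```python
from itertools import product

P_C = {1: 0.5, 0: 0.5}                                        # P(C=c)
P_S = {(1, 1): 0.1, (0, 1): 0.9, (1, 0): 0.5, (0, 0): 0.5}    # P(S=s | C=c), keyed (s, c)
P_R = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.2, (0, 0): 0.8}    # P(R=r | C=c), keyed (r, c)
P_W1 = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.0}  # P(W=1 | S=s, R=r)

def joint_w1(c, s, r):
    return P_C[c] * P_S[(s, c)] * P_R[(r, c)] * P_W1[(s, r)]

# Condition on rain as well: P(S=1 | W=1, R=1) = P(S=1, R=1, W=1) / P(R=1, W=1)
num = sum(joint_w1(c, 1, 1) for c in (0, 1))
den = sum(joint_w1(c, s, 1) for c, s in product((0, 1), repeat=2))
p_s_given_wr = num / den
print(round(p_s_given_wr, 2))  # 0.19
```

Observing rain "explains away" the wet grass, so the sprinkler's posterior drops from about 0.43 to 0.19 even though nothing about the sprinkler itself was observed.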

  30. More complex models ...

  31. Inference

     • There are many inference strategies for generative models: MCMC, variational methods, message passing, particle filtering, etc.
     • Today we will discuss an algorithm that is simple and general (though not always efficient).

  32. Inference

     • Simplest MCMC algorithm: Metropolis-Hastings.
     • For simplicity, fix the light and affine variables. The remaining latents (shape and texture for each face part) get Gaussian priors:
         S_Nose ~ randn(50), T_Nose ~ randn(50), ..., S_Mouth ~ randn(50), T_Mouth ~ randn(50)
     • Likelihood: P(I | S, T) ∝ Normal(O − R; 0, σ_0), where O is the observed image and R is the simulator's render.
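A minimal Metropolis-Hastings sketch. The target here is a 1-D standard normal standing in for the slide's render-based log-posterior (swapping in `log_target(x) = log P(I | S, T) + log prior` would recover the face model; everything else is generic):

```python
import math
import random

random.seed(0)

def log_target(x):
    # unnormalized log density we want to sample from (standard normal here)
    return -0.5 * x * x

def metropolis_hastings(n_steps, step=1.0):
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = x + random.uniform(-step, step)  # symmetric random-walk proposal
        # accept with probability min(1, target(prop) / target(x));
        # the symmetric proposal cancels out of the Hastings ratio
        if math.log(random.random()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)  # rejected moves repeat the current state
    return samples

samples = metropolis_hastings(50_000)
```

The chain only ever evaluates the unnormalized target, which is exactly why the method suits simulator-based posteriors: rendering gives the likelihood up to a constant, and no normalizer is ever needed.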
