(How) does the brain do Bayesian inference? Sampling, search, and conditional probability in the mind
Kim Scott, Probcomp tutorial, 11/1/2012
Marr’s levels of analysis for Bayesian inference
• Computation
• Algorithm: Markov chain? Monte Carlo?
• Implementation: a case for biological plausibility; inspiration and encouragement for hardware (Buesing et al.)
Today: a review of literature relevant to the algorithmic level, & discussion of potential directions.
Hypotheses: from conscious states to percepts
“I appeal to anyone’s experience whether upon sight of an OBJECT he computes its distance by the bigness of the ANGLE made by the meeting of the two OPTIC AXES? […] In vain shall all the MATHEMATICIANS in the world tell me, that I perceive certain LINES and ANGLES which introduce into my mind the various IDEAS of DISTANCE, so long as I myself am conscious of no such thing.” (Berkeley, 1709, “An essay towards a new theory of vision”)
“In the ordinary acts of vision this knowledge of optics is lacking. Still it may be permissible to speak of the psychic acts of ordinary perception as unconscious conclusions, thereby making a distinction of some sort between them and the common so-called conscious conclusions. And while it is true that there has been […] a measure of doubt as to the similarity of the psychic activity in the two cases, there can be no doubt as to the similarity between the results […]” (Helmholtz, 1924, Treatise on Physiological Optics)
MC(?)MC(?) in the mind: overview
1. Brief motivation
2. Examples of people “doing Bayesian inference”
3. Evidence for computational framing
4. MCMC for Bayes net demo
5. Evidence for sampling
6. Evidence for Markov chains
Why movement through a hypothesis space?
“Yet I say again that learning must be nondemonstrative inference; there is nothing else for it to be. And the only model of a nondemonstrative inference that has ever been proposed anywhere by anyone is hypothesis formation and confirmation.” (Fodor, “Fixation of Belief and Concept Acquisition”)
1. We really don’t have anything else
2. Subjective familiarity of the analogy for explicit problem-solving
3. “One state at a time”
Why care about algorithms?
“[In] most distributional learning procedures there are vast numbers of properties that a learner could record, and since the child is looking for correlations among these properties, he or she faces a combinatorial explosion of possibilities. […] To be sure, the inappropriate properties will correlate with no others and hence will eventually be ignored […], but only after astronomical amounts of memory space, computation, or both.” (Pinker, Language Learnability and Language Development)
In addition to standard curiosity…
1. Getting from behavioral data to a representation of hypotheses, and to what is actually being learned, requires assumptions about algorithms.
2. As inspiration for engineering systems for inference.
3. To find out whether Bayesian inference is actually applied to varied problems in the same way.
Examples of “doing Bayesian inference”
• Word learning: preschoolers constrain generalization of a new label when more examples are given (Xu & Tenenbaum 2007; figure panels: Bayesian model vs. preschoolers)
• Property generalization: toddlers use both the sample and the sampling process to generalize properties (Gweon, Tenenbaum, & Schulz 2010)
• Physical events: graded infant looking times show effects of both frequency and arrangement, dependent on time (Téglás et al. 2011)
Causal inference (Gopnik et al. 2004; Griffiths et al. 2004; Griffiths & Tenenbaum 2007)
Further examples (figures): Griffiths & Tenenbaum 2006; Baker, Saxe, & Tenenbaum 2009; Tenenbaum & Griffiths 2001
Computational-level evidence: psychological reality of priors (Griffiths & Tenenbaum 2009)
Computational-level evidence: MCMC with people
Idea: use people’s 2AFC category-membership choices as the acceptance function for a Markov chain, so that it converges to P(x|c).
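A minimal sketch of how a 2AFC choice can serve as an MCMC acceptance rule. Everything below is illustrative rather than the published procedure: the unnormalized Gaussian density stands in for a real participant’s category representation, and all names are mine.

```python
import math
import random

def f(x, mu=0.0, sigma=1.0):
    # Hypothetical stand-in for the participant's category density p(x|c):
    # here, an unnormalized Gaussian over a one-dimensional stimulus.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def simulated_choice(x_current, x_proposal):
    # One 2AFC trial: pick the better category member. Choosing the
    # proposal with this Barker/Luce probability leaves p(x|c) invariant
    # under a symmetric proposal, so it works as an acceptance rule.
    p_pick = f(x_proposal) / (f(x_proposal) + f(x_current))
    return x_proposal if random.random() < p_pick else x_current

def mcmc_with_people(x0, n_trials=5000, step=0.5):
    chain = [x0]
    for _ in range(n_trials):
        proposal = chain[-1] + random.gauss(0.0, step)  # symmetric proposal
        chain.append(simulated_choice(chain[-1], proposal))
    return chain

samples = mcmc_with_people(x0=3.0)[500:]   # drop burn-in
print(sum(samples) / len(samples))         # ~ category mean (0 here)
```

The Barker rule f(x′)/(f(x′)+f(x)) satisfies detailed balance with respect to f under a symmetric proposal, which is why a person’s pairwise preferences suffice to make the chain converge on their category distribution.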
Computational-level evidence
• Priming affects spontaneously generated explanations, but not evaluation of given hypotheses (Bonawitz & Griffiths 2008, “Deconfounding hypothesis generation and evaluation in Bayesian models”)
• Reading time ~ log probability of word (Smith & Levy 2008)
Algorithmic level: plausibility of MCMC
• Alternatives?
  – Importance sampling
  – Magic to represent a hypothesis space exponential in the number of parameters, in parallel… phase relative to a vector of frequencies?
• To model exact Bayesian inference (computing the posterior distribution), we have to make approximations, e.g. MCMC methods.
  – …maybe the system we’re modeling does exactly the same thing.
  – Unfounded, but maybe still true.
  – And that would be great news about samplers!
• If we buy into this framework enough to consider specific algorithms, we want to be able to identify (see the sketch after this list)…
  – What is the hypothesis space?
  – How do we move from one state to another?
  – What does a percept or judgment correspond to; how many samples does it use?
Plan:
1. Demo
2. Monte Carlo: evidence for sampling
3. Markov chain: evidence for movement through a hypothesis space
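To make “moving from one state to another” concrete, here is a minimal, generic Metropolis-Hastings loop. It is a sketch, not anyone’s proposed neural algorithm: the standard-normal target and Gaussian proposal are placeholders.

```python
import math
import random

def metropolis_hastings(log_p, propose, x0, n_steps):
    # Markov chain whose stationary distribution is the (unnormalized)
    # target exp(log_p); `propose` is assumed symmetric, so the
    # Hastings correction term drops out of the acceptance ratio.
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = propose(x)
        if math.log(random.random()) < log_p(x_new) - log_p(x):
            x = x_new          # accept the proposed hypothesis
        samples.append(x)      # on rejection the current state repeats
    return samples

# Example target: a standard normal posterior over one parameter.
chain = metropolis_hastings(log_p=lambda x: -0.5 * x * x,
                            propose=lambda x: x + random.gauss(0, 1),
                            x0=0.0, n_steps=10000)
```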
Demo: diagnosis net
• Gibbs sampler for a “medical diagnosis” Bayes net
• Binary nodes, single layer: causes A, B, C; effects X, Y
• Priors: P(A) = 0.01, P(B) = 0.01, P(C) = 0.0001
• CPTs, P(effect = 1 | parents):
  X (parents A, B):  ~A,~B: 0.001 | A,~B: 0.99 | ~A,B: 0.99 | A,B: 0.995
  Y (parents A, C):  ~A,~C: 0.001 | A,~C: 0.99 | ~A,C: 0.99 | A,C: 0.995
• Observes effects, uses (correct!) structure of net to wander towards posterior distribution
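A runnable sketch of this demo’s Gibbs sampler, using the CPTs above. Two caveats: the slide’s prior labels are ambiguous in the source, so reading the 0.0001 value as P(C) is an assumption, and the function and variable names are mine rather than the original demo’s.

```python
import random

# Priors over the binary causes (the value of P(C) is an assumption).
PRIOR = {'A': 0.01, 'B': 0.01, 'C': 0.0001}

# P(effect = 1 | parents), keyed by (first parent, second parent) values.
CPT_X = {(0, 0): 0.001, (1, 0): 0.99, (0, 1): 0.99, (1, 1): 0.995}  # parents (A, B)
CPT_Y = {(0, 0): 0.001, (1, 0): 0.99, (0, 1): 0.99, (1, 1): 0.995}  # parents (A, C)

def lik(obs, p_on):
    # Probability of the observed effect value given P(effect = 1).
    return p_on if obs == 1 else 1.0 - p_on

def gibbs(x_obs, y_obs, n_sweeps=20000):
    # Resample each cause from its full conditional given the other causes
    # and the observed effects; visit counts estimate P(cause = 1 | X, Y).
    state = {'A': 0, 'B': 0, 'C': 0}
    counts = {'A': 0, 'B': 0, 'C': 0}
    for _ in range(n_sweeps):
        for var in ('A', 'B', 'C'):
            w = []
            for val in (0, 1):
                state[var] = val
                p = PRIOR[var] if val else 1.0 - PRIOR[var]
                p *= lik(x_obs, CPT_X[(state['A'], state['B'])])
                p *= lik(y_obs, CPT_Y[(state['A'], state['C'])])
                w.append(p)
            state[var] = 1 if random.random() < w[1] / (w[0] + w[1]) else 0
            counts[var] += state[var]
    return {v: c / n_sweeps for v, c in counts.items()}

# Both effects observed on: A alone explains both, so it should dominate.
print(gibbs(x_obs=1, y_obs=1))
```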
Diagnosis net example (demo figure)
Diagnosis net: simple “causal” net
15 causes, 50 effects, ~4 causes/effect. P(effect | no cause) = 0.1, P(cause) = 0.01
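One way to wire up a net with these statistics, as a sketch. The slide gives P(cause) and P(effect | no cause) but not the strength of each cause-effect edge, so the 0.9 per-edge strength and the noisy-OR combination rule are assumptions.

```python
import random

N_CAUSES, N_EFFECTS, CAUSES_PER_EFFECT = 15, 50, 4
P_CAUSE, LEAK = 0.01, 0.1   # P(cause) and P(effect | no cause), from the slide
STRENGTH = 0.9              # per-edge noisy-OR strength: an assumed value

# Wire each effect to ~4 randomly chosen parent causes.
parents = [random.sample(range(N_CAUSES), CAUSES_PER_EFFECT)
           for _ in range(N_EFFECTS)]

def p_effect_on(effect_idx, causes):
    # Noisy-OR: the effect stays off only if the leak and every active
    # parent all independently fail to turn it on; with no active parent
    # this reduces to P(effect | no cause) = LEAK.
    p_off = 1.0 - LEAK
    for c in parents[effect_idx]:
        if causes[c]:
            p_off *= 1.0 - STRENGTH
    return 1.0 - p_off

# Forward-sample one world: which causes are present, which effects fire.
causes = [random.random() < P_CAUSE for _ in range(N_CAUSES)]
effects = [random.random() < p_effect_on(j, causes) for j in range(N_EFFECTS)]
print(sum(causes), "causes on;", sum(effects), "effects on")
```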
Sampling in human cognition
• Interpretations:
  – Explicit responses are individual samples
  – Monte Carlo: approximate a distribution by a finite number of samples
• Probability matching
  – Phylogenetically old foraging behavior: bees in two-armed bandits (Keasar et al. 2002)
  – Adults often probability-match rather than maximizing (Gardner 1957); children tend to maximize more (e.g. Hudson Kam & Newport 2009, in language learning)
  – But even ten-month-olds are capable of probability matching (Davis, Newport, & Aslin 2009)
  – Evidence of sampling, or a separate faculty? (A simulation contrasting the two policies follows.)
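To make the matching-versus-maximizing contrast concrete, here is a small simulation; the p = 0.75 payoff and the names are illustrative, not from any of the cited studies. Maximizing always picks the better arm and earns p; matching picks it with probability p and earns p² + (1 − p)², which is strictly lower whenever p ≠ 0.5. That gap is why matching looks more like posterior sampling than like reward maximization.

```python
import random

def hit_rate(p_pick_better, p=0.75, n=100000):
    # Expected accuracy when the better arm pays off with probability p
    # and the agent picks that arm with probability p_pick_better.
    hits = 0
    for _ in range(n):
        picked_better = random.random() < p_pick_better
        pay = p if picked_better else 1 - p
        hits += random.random() < pay
    return hits / n

print(hit_rate(1.0))   # maximizing: ~0.75
print(hit_rate(0.75))  # probability matching: ~0.75**2 + 0.25**2 = 0.625
```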
Population responses as samples
• Sampling hypothesis: variation in judgments reflects the true distribution
• Population level: graded fractions of correct responses as indirect evidence (Schulz, Bonawitz, & Griffiths 2007)
Within-subject responses as samples
• Vul & Pashler 2008, “the crowd within”: “What percentage of the world’s airports are in the United States?”
• Denison et al. 2009, “Preschoolers sample from probability distributions”; Bonawitz et al., “Rational randomness”:
  – Follow-up experiments showed children were not just doing probability matching to chip frequencies
  – Correlation between hypotheses consistent with a win-stay lose-shift mechanism, but not with independent sampling (sketch below)
• Analogous results for visual attention (Vul, Hanus, & Kanwisher 2010)
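A sketch of the win-stay lose-shift (“lose-sample”) idea that the correlation finding points to. The three-hypothesis chip space and all of its numbers are invented for illustration; the mechanism (stay with probability equal to the newest datum’s likelihood, otherwise resample from the posterior over the data so far) follows the description in Bonawitz et al. Marginally each response tracks the posterior, but successive responses are correlated, unlike independent samples.

```python
import random

HYPS = {'red-biased': 0.8, 'uniform': 0.5, 'blue-biased': 0.2}  # P(red chip), assumed

def posterior(data):
    # Exact posterior over the toy hypothesis space given chips seen so far.
    w = {}
    for h, p_red in HYPS.items():
        w[h] = 1.0
        for d in data:
            w[h] *= p_red if d == 'red' else 1.0 - p_red
    z = sum(w.values())
    return {h: w[h] / z for h in w}

def draw(dist):
    r, acc = random.random(), 0.0
    for h, p in dist.items():
        acc += p
        if r < acc:
            return h
    return h  # guard against rounding

def wsls(data):
    # Win-stay, lose-sample: keep the current hypothesis with probability
    # equal to the likelihood of the newest datum; otherwise resample from
    # the posterior on all data so far.
    seen, h, trace = [], draw(posterior([])), []
    for d in data:
        seen.append(d)
        p_stay = HYPS[h] if d == 'red' else 1.0 - HYPS[h]
        if random.random() >= p_stay:
            h = draw(posterior(seen))
        trace.append(h)
    return trace

print(wsls(['red', 'red', 'blue', 'red', 'red']))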
Sampling in intuitive physics? (Hamrick, Battaglia, & Tenenbaum 2011)
What would sampling (more uniquely) predict?
• Dropoff in accuracy with limited resources, consistent with discrete jumps from n to n−1 samples
• Rare outcomes should (rarely) skew or (usually) not affect estimates
• Precision of post hoc judgment of a conditional probability should depend on the conditional probability
• Potential improved precision over time if objects are pulled toward some location, in contrast with simple propagation of uncertainty
Monte Carlo estimates: a caveat
“One and Done” (Vul, Goodman, Griffiths, & Tenenbaum 2008)
• Setup: samples are from a Bernoulli distribution, p ~ uniform; the action is prediction of the next outcome
• Often just a few samples are plenty for practical purposes
• Adding any cost to sampling can even make taking just one rational
• So how can we situate ourselves to grab a good “just one”?
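A simulation of exactly this setup (the function name and trial counts are mine): p is drawn uniformly, k Bernoulli samples are observed, and the agent predicts the majority outcome for one more draw. One sample already yields 2/3 accuracy against a ceiling of 3/4 for infinitely many samples; that is the sense in which one can be “done”.

```python
import random

def expected_accuracy(k, n_worlds=200000):
    # p ~ Uniform(0, 1); observe k Bernoulli(p) samples; predict the
    # majority outcome (fair tie-break); score against one more draw.
    hits = 0
    for _ in range(n_worlds):
        p = random.random()
        heads = sum(random.random() < p for _ in range(k))
        guess = 1 if heads * 2 > k else 0 if heads * 2 < k else random.randint(0, 1)
        hits += (random.random() < p) == guess
    return hits / n_worlds

for k in (1, 2, 5, 10, 100):
    print(k, round(expected_accuracy(k), 3))   # 1 -> ~0.667, 100 -> ~0.75
```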
Hypothesis space search example (Ullman, Goodman, & Tenenbaum 2012)
Hypothesis space search: explicit hypotheses
• MCMC with an appropriate grammar can capture some qualitative features of children’s learning. What sort of evidence would admit differential predictions?
  – Basic: temporal correlation of hypotheses (often demonstrated; see the sketch below)
  – Dependence of likely paths (and perhaps thereby the posterior) on the grammar used to generate hypotheses
  – Lack of effect of having considered and rejected a hypothesis already (a special case of the Markov property: no history is used)
  – Effects of steepness around an attractive solution, rather than just its likelihood?
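The “basic” prediction, temporal correlation, shows up in even the simplest chain. A sketch with arbitrary choices of target (standard normal) and step size: lag-1 autocorrelation is high for a Metropolis sampler but near zero for independent draws from the same distribution.

```python
import math
import random

def mh_chain(n, step=0.3):
    # Metropolis sampler on a standard normal: successive hypotheses
    # are correlated, unlike independent draws from the same posterior.
    x, xs = 0.0, []
    for _ in range(n):
        y = x + random.gauss(0, step)
        if math.log(random.random()) < 0.5 * (x * x - y * y):
            x = y
        xs.append(x)
    return xs

def lag1_autocorr(xs):
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

print(lag1_autocorr(mh_chain(20000)))                             # high (~0.9+)
print(lag1_autocorr([random.gauss(0, 1) for _ in range(20000)]))  # ~0
```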
Markov chain example in perception: multistable percepts (Gershman, Vul, & Tenenbaum 2012)
• Used a Markov random field (MRF) lattice model; MCMC to infer the hidden cause of the image
• Recovered…
  – gamma-distributed dominance times,
  – bias due to context,
  – situations that lead to fusion,
  – switches occurring in travelling waves
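Not the paper’s model, but a toy illustration of the core dynamic: Gibbs sampling on a small binary MRF with near-critical coupling and no net evidence alternates between the two globally coherent states, so run lengths between sign flips play the role of dominance durations. Lattice size and coupling strength are arbitrary assumed values.

```python
import math
import random

SIZE, COUPLING = 8, 0.45   # lattice side; near-critical coupling (assumed)

def dominance_durations(n_sweeps=20000):
    # Binary (Ising-style) MRF with smoothness coupling and no external
    # evidence: Gibbs sweeps alternate between the two coherent
    # interpretations, like switches between rival percepts.
    s = [[random.choice([-1, 1]) for _ in range(SIZE)] for _ in range(SIZE)]
    durations, current, run = [], 0, 0
    for _ in range(n_sweeps):
        for i in range(SIZE):
            for j in range(SIZE):
                nb = sum(s[(i + di) % SIZE][(j + dj) % SIZE]
                         for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
                p_up = 1.0 / (1.0 + math.exp(-2.0 * COUPLING * nb))
                s[i][j] = 1 if random.random() < p_up else -1
        percept = 1 if sum(map(sum, s)) >= 0 else -1  # dominant interpretation
        if percept == current:
            run += 1
        else:
            if current != 0:
                durations.append(run)
            current, run = percept, 1
    return durations

print(dominance_durations()[:20])   # run lengths between perceptual switches
```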