Beyond calculation: Probabilistic Computing Machines and Universal Stochastic Inference
Vikash K. Mansinghka
December 17, 2011
NIPS Workshop on Philosophy and Machine Learning
Acknowledgements
Noah Goodman, Eric Jonas, Keith Bonawitz, Josh Tenenbaum, Dan Roy, Cap Petschulat, Max Gasner, Beau Cronin, Cameron Freer, Tom Knight, Gerry Sussman, Tomaso Poggio
Computers were built for calculation and deduction
compute, v.: to determine by mathematical means; to calculate
Probabilistic inference seems central to intelligence, but also cumbersome and intractable, so we simplify and approximate.

Simulation is easy: a generative process produces data. Inference is hard:

P(model | data) = P(model) P(data | model) / P(data)

The hypothesis space (the domain) is exponentially large, and the normalizer P(data) sums over all of it.
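To make the asymmetry concrete, here is a minimal Python sketch (a toy model of my own, not from the talk): forward simulation is one cheap pass of ancestral sampling, while the exact posterior requires the normalizer P(data), a sum over all 2**N hypotheses.

    import itertools
    import random

    N = 10  # number of binary latent causes

    def prior_sample():
        # ancestral sampling from P(model): one cheap forward pass
        return tuple(random.random() < 0.5 for _ in range(N))

    def likelihood(model, data):
        # toy P(data | model): data is the parity of the latent bits, observed noisily
        return 0.9 if sum(model) % 2 == data else 0.1

    def simulate():
        # simulation is easy: sample causes, then run the generative process
        model = prior_sample()
        return model, sum(model) % 2

    def posterior(model, data):
        # inference is hard: the normalizer P(data) sums over all 2**N models
        prior = 0.5 ** N
        z = sum(prior * likelihood(m, data)
                for m in itertools.product((0, 1), repeat=N))
        return prior * likelihood(model, data) / z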
The mind and brain accomplish far more than our smartest computer systems, and they do it with far less. We need greater expressiveness and tractability, for making both inferences and decisions.

Watson: 80 kilowatts, 3.55 GHz; Jeopardy! champion, via calculation. A function f(), as a program that calculates: a Universal Turing Machine maps input x to output f(x).

The brain: 30 watts, ~100 Hz; sees, hears, navigates, and negotiates in the world, via statistical relationships. Genetic & physical constraints plus sense data yield cognition, perception, and action.
Outline
• Motivation: the capability and efficiency gaps with biology
• Phenomenon: examples of probabilistic programming systems
• Philosophy: a new mathematical model of computation
• Potential: computing machines for which induction and abduction are natural
CAPTCHAs are easy to make, hard to break
Generating CAPTCHAs: easy (e.g. the glyphs {N,A,V,I,A})
Breaking CAPTCHAs: hard (e.g. Google CAPTCHAs vs. OCR, CVPR 2010)
Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards
[Figure: input CAPTCHA images alongside the guessed explanations]
Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards
[Diagram: a generator program that outputs random CAPTCHAs, plus an observed CAPTCHA, feed into a probabilistic programming system, which reports how the generator ran: glyphs={N,A,V,I,A}, ... (a different sample each time)]
Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards

1. Randomly choose some glyphs (with font and location):

   Glyph           Font           X              Y
   ~uniform(A,Z)   ~uniform(0,2)  ~uniform(0,W)  ~uniform(0,H)
   A               1              98             19
   J               2              140            10
   Q               1              43             7
   S               0              98             3
   J               1              80             15

2. Render to an image
3. Add some noise (spatial noise + pixelwise errors)
4. Observe that the output = the image we're interpreting

Inference runs this generative process backwards.
Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards

(define w 200)
(define h 70)
(define rate 0.5)
(define maxglyphs 12)
(define blank (image/make w h))

;; 1. Randomly choose some glyphs
(define maybe_sample_glyph
  (lambda ()
    (if (bernoulli rate)
        (image/glyph/sample w h)
        #f)))
(define glyphs
  (vector/remove (vector/draw maybe_sample_glyph maxglyphs) #f))

;; 2. Render to an image
(define rendered (image/scene/draw blank glyphs))

;; 3. Add some noise
(define blur_radius (continuous-uniform:double-drift 0 10))
(define blurred (image/blur rendered blur_radius))
(define constant0 (discrete-uniform 1 127))
(define constant255 (discrete-uniform 128 254))
(define blurred_with_noise
  (image/interpolate blurred constant0 constant255))

;; 4. Observe that the output matches the target image
(define observed (image/load "image.png"))
(observe (image/stochastic_binarize blurred_with_noise) observed)
Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards

The probabilistic programming system performs a random walk (MCMC) over the generator's execution histories; the landscape is defined locally by P(history, data). Given the observed CAPTCHA, it reports how the generator ran: glyphs={N,A,V,I,A}, ... (a different sample each time).

It converges well due to the inclusion of (otherwise overlooked) randomness, and iterations are fast due to conditional independence (asymptotics, locality, parallelism) and software+systems engineering (small state, fast updates).
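A minimal Python sketch of this idea (a hypothetical toy, much simpler than the actual CAPTCHA model): represent an execution history as a dictionary of the random choices the generator made, and random-walk over it by re-running one choice at a time, accepting by the local change in P(history, data).

    import random
    import string

    OBSERVED = "NAVIA"  # the "CAPTCHA" we want to explain

    def score(trace):
        # toy P(data | trace): each glyph matches the observed letter
        # with prob 0.9, mismatches with prob 0.1
        p = 1.0
        for i, ch in enumerate(OBSERVED):
            p *= 0.9 if trace[i] == ch else 0.1
        return p

    def mh(steps=5000):
        # initialize by running the generator forward: uniform glyphs
        trace = {i: random.choice(string.ascii_uppercase)
                 for i in range(len(OBSERVED))}
        for _ in range(steps):
            site = random.randrange(len(OBSERVED))   # pick one random choice
            proposal = dict(trace)
            proposal[site] = random.choice(string.ascii_uppercase)
            # uniform priors cancel, so the MH ratio is just the likelihood ratio
            if random.random() < score(proposal) / score(trace):
                trace = proposal
        return trace

    print("".join(mh().values()))  # converges to strings near "NAVIA"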
Computer vision as “inverse Pixar”, using stochastic MATLAB
[Figure: target image (known lighting, unknown mesh, weak smoothness prior) beside the posterior geometry, rendered] (Wingate et al., 2011)
Applications of Probabilistic Programming Systems, including Church
1. Nuclear safety via CTBT monitoring (Arora, Russell, Sudderth et al., 2009-2011)
2. Tracking multiple targets from video, radar (Arora et al., 2010; Oh et al., 2009)
3. Automatic control system synthesis (Wingate et al., 2011)
4. Information extraction from unstructured text (McCallum et al., 2009)
5. Automatic statistical analyst (Mansinghka et al., 2009; 2011 in prep)
6. Clinical reasoning and pharmacokinetics (Mansinghka et al., in prep)
7. Human learning as probabilistic program induction (Stuhlmuller et al., 2011; Tenenbaum; Kemp; Griffiths; Goodman)
Probabilistic computing: computation as stochastic inference, not deterministic calculation

A Universal Turing Machine is a calculator: a function f(), as a program that calculates, maps input x to output f(x).

A Universal Stochastic Inference Machine instead takes a space of possibilities ~P(H), as a probabilistic program that guesses possible explanations and actions, and data D, as a probabilistic predicate that checks constraints. It outputs a sampled probable explanation ~P(H|D) or a satisficing decision (key idea: different each time).
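The crudest correct instance of such a machine is guess-and-check, i.e. rejection sampling. The Python sketch below, with hypothetical names of my own, runs the probabilistic program forward until the data predicate accepts, returning a sample from P(H|D) that is different each time.

    import random

    def stochastic_inference(guess, check):
        """Sample H ~ P(H | check(H)) by rejection: rerun the probabilistic
        program until its output satisfies the data predicate."""
        while True:
            h = guess()       # run the generative program forward
            if check(h):      # does this explanation fit the data?
                return h      # a sampled probable explanation

    # usage: condition two dice on their sum
    roll = lambda: (random.randint(1, 6), random.randint(1, 6))
    print(stochastic_inference(roll, lambda h: sum(h) == 10))  # different each time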
Probabilistic computing: computation as stochastic inference, not deterministic calculation

Turing embedding: H = (x, f(x)); D checks that the x in H equals the given input.
Universality: ~P(H) and D can contain arbitrary stochastic inference.
Probabilistic program induction: H = <prob. program text>; D = <spec checker>.
Meta-reasoning: H = <model of agent>; D = <agent's behavior>.
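Continuing the rejection-sampling sketch above, the Turing embedding is the degenerate case: the program guesses (x, f(x)) pairs and the predicate pins down the input, so inference reproduces ordinary calculation (toy example of my own).

    f = lambda x: x * x          # any "calculator" program

    def guess_run():
        x = random.randint(0, 99)   # guess an input ...
        return (x, f(x))            # ... and run the calculation forward

    x0 = 7
    print(stochastic_inference(guess_run, lambda h: h[0] == x0))  # always (7, 49)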
Probabilistic computing: computation as stochastic inference, not deterministic calculation

The same machine, as a stack (Mansinghka 2009), from top to bottom:
AI systems; models of cognition, perception and action
Specialized Inference Modules | Universal Inference Machines
Parallel Stochastic Finite State Machines
Probabilistic Hardware | Commodity Hardware
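As an illustration of the "parallel stochastic finite state machines" layer (my gloss, not the cited designs), here is a Gibbs sampler for a small Ising ring in Python: each variable is a tiny state machine that flips stochastically given only its neighbors, so updates need only local wiring, and non-interacting sites could run in parallel.

    import math
    import random

    J, N = 1.0, 16  # coupling strength, ring length
    state = [random.choice([-1, 1]) for _ in range(N)]

    def gibbs_step(state):
        for i in range(N):  # hardware would update odd/even sites in parallel
            field = J * (state[i - 1] + state[(i + 1) % N])  # neighbors only
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field))      # P(s_i = +1 | neighbors)
            state[i] = 1 if random.random() < p_up else -1

    for _ in range(100):
        gibbs_step(state)
    print(state)  # an approximate sample from the Ising ring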
Snapshot of the field: new languages, systems, architectures, theory

10+ research prototype languages, 2 universal: Figaro, BLOG, ..., Church (5 implementations).

New probabilistic architectures for universal inference: Wingate, Stuhlmuller, Goodman 2011; Arora, Russell et al. 2010; Goodman, Mansinghka et al. 2008.

New theorems in probabilistic computability and complexity: Ackerman, Freer, Roy 2011 (& in prep); Freer, Mansinghka, Roy 2010; Haeupler, Saha, Srinivasan 2010; Propp & Wilson 1996.
These machines make sampling more natural than optimization and integration

Claim: sampling is easier than both optimization and integration. Against optimization: the sampled typical set dominates the optimum, which can have negligible mass. Against integration: many expectations (e.g. test functions for rare events) are hard to estimate.

Explaining away becomes natural (and a question of convergence), but calculating low probabilities exactly may be nearly impossible.
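A numeric sketch of both claims, with a toy distribution of my own choosing: over 100 biased bits, the single most probable string (the optimum) carries negligible mass, and the expectation of a rare-event indicator is essentially invisible to Monte Carlo, while sampling typical explanations is trivial.

    import random

    N, p = 100, 0.3  # 100 bits, each 1 with prob 0.3

    def sample():
        return [random.random() < p for _ in range(N)]

    # Optimization: the mode is the all-zeros string ...
    mode_mass = (1 - p) ** N          # ... with mass ~3e-16: negligible
    # Sampling: typical draws have ~ p*N ones and are easy to produce
    print(sum(sample()), mode_mass)

    # Integration: E[indicator of "all zeros"] equals mode_mass, but a
    # Monte Carlo estimate from feasibly many samples is almost surely 0.
    est = sum(all(not b for b in sample()) for _ in range(10000)) / 10000
    print(est)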
What is the computational complexity of stochastic inference? (not P, NP, #P, BPP, ...)

The usual targets of Bayes reductions (SAT, k-coloring, ...) are usually easy in practice: see e.g. phase transitions (Selman et al.), semi-random sources (Vazirani et al.), smoothed analysis (Spielman et al.).

What is usually hard is the basis of crypto, via generation of instances that are hard in practice: factoring, graph isomorphism.