Probabilistic Programming Practical
Frank Wood, Brooks Paige
{fwood,brooks}@robots.ox.ac.uk
MLSS 2015
Setup
Java (v. 1.5 or later)
Java Installation
Mac and Windows: download and run the installer from https://www.java.com/en/download/manual.jsp
Linux:
# Debian/Ubuntu
sudo apt-get install default-jre
# Fedora
sudo yum install java-1.7.0-openjdk
Leiningen (v. 2.0 or later)
Leiningen Installation
# Download lein to ~/bin
mkdir ~/bin
cd ~/bin
wget http://git.io/XyijMQ
# Make executable
chmod a+x ~/bin/lein
# Add ~/bin to path
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
# Run lein
lein
Further details: http://leiningen.org/
Practical Materials
• Download https://bitbucket.org/probprog/mlss2015/get/master.zip
• cd mlss2015
• lein gorilla
• open the URL
Schedule
• 15:35 - 16:05 Intro / Hello World!
• 16:05 - 16:30 Gaussian (you code)
• 16:30 - 16:40 Discuss / intro to physics problem
• 16:40 - 16:55 Physics (you code)
• 16:55 - 17:00 Share / discuss solutions
• 17:00 - 17:20 Inference explanation
• 17:20 - 17:45 Poisson (you code)
• 17:45 - 17:50 Inference Q/A
• 17:50 - 18:05 Coordination (you code)
What is probabilistic programming?
An Emerging Field
[Diagram: probabilistic programming at the intersection of three fields]
• ML: algorithms & applications
• Statistics: inference & theory
• PL: compilers, semantics, analysis
Conceptualization
[Diagram: two views of the same object]
• CS view: parameters are inputs to a program, which produces output.
• Statistics view: a model p(y | x) p(x) relates parameters x to observations y; inference computes p(x | y).
Probabilistic programming connects the two: the program is the model, and inference inverts it.
Operative Definition “Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations.” Gordon et al, 2014
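These two constructs can be illustrated with a minimal sketch (in Python rather than a real probabilistic language; the toy model, a Gaussian with unknown mean, and all function names are assumptions for illustration, not part of the quoted definition). Drawing the latent from its prior and scoring the observation yields a weighted execution, and importance weighting recovers the posterior:

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # log-density of N(x; mu, sigma^2), used to score observed values
    return -0.5 * math.log(2 * math.pi * sigma * sigma) \
           - (x - mu) ** 2 / (2 * sigma * sigma)

def program(y):
    # (1) draw a value at random from a distribution
    mu = random.gauss(0.0, 1.0)
    # (2) condition via an observation: accumulate a log-weight
    log_w = normal_logpdf(y, mu, 0.5)
    return mu, log_w

random.seed(0)
traces = [program(0.8) for _ in range(20000)]
# importance-weighted posterior mean estimate of mu given y = 0.8
post_mean = (sum(mu * math.exp(lw) for mu, lw in traces)
             / sum(math.exp(lw) for _, lw in traces))
```

For this conjugate model the exact posterior mean is 0.8 · 1/(1 + 0.5²) = 0.64, so the weighted estimate should land close to that value.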
What are the goals of probabilistic programming?
Simplify Machine Learning…
[Flowchart: the traditional workflow — identify and formalize the problem and gather data; search for a usable existing implementation or a high-level modeling tool with the required features; otherwise design the model, read papers and do the math, derive updates and an inference algorithm, code, test, and deploy — iterating on whether the model performs well statistically, scales computationally, and is feasible. Legend: edge colors indicate the skills required to traverse each edge — non-specialist, PhD-level machine learning or statistics, or PhD-level computer science.]
To This
[Flowchart: the same workflow with the model-design and inference-derivation steps collapsed into "write probabilistic program; debug, test, profile" — inference is automated, so non-specialist skills suffice to traverse most edges.]
Automate Inference
[Diagram: a stack with models / stochastic simulators (graphical models such as HMMs, topic models, and nonparametric mixtures) at the top, a programming language representation / abstraction layer in the middle, and interchangeable inference engine(s) at the bottom.]
Hello World!
First Exercise: Gaussian Unknown Mean
• Learning objectives
1. Clojure
2. Gorilla REPL
3. Anglican
4. Automatic inference over generative models expressed as programs, via query
• Resources
• https://clojuredocs.org/
• https://bitbucket.org/probprog/anglican/
• http://www.robots.ox.ac.uk/~fwood/anglican/index.html
Simulation
Second Exercise
Learning objectives
1. Develop experience thinking about expressing problems as inference over program executions
2. Understand how to perform inference over a complex deterministic generative process, here a 2D physics simulator
Second Exercise
Use inference to solve a mechanism-design optimization task: get all balls safely into the bin.
Inference
Trace Probability
The joint probability of a program execution trace factors as

p(y_{1:N}, x_{1:N}) = ∏_{n=1}^{N} g(y_n | x_{1:n}) f(x_n | x_{1:n−1})

• y_n are the observed data points; x_n are the internal random choices
• simulate from f(x_n | x_{1:n−1}) by running the program forward
• weight traces by the observation densities g(y_n | x_{1:n})
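As a sketch of this factorization (Python, with an assumed toy model: a Gaussian random walk for f and a Gaussian observation density for g — both choices are illustrative, not from the slides), running the program forward samples each x_n from f while each observation contributes a factor of g to the trace's log-weight:

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # log-density of N(x; mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma * sigma) \
           - (x - mu) ** 2 / (2 * sigma * sigma)

def simulate_trace(ys):
    # Run the program forward: x_n ~ f(x_n | x_{1:n-1}) is a Gaussian
    # random walk; each observe y_n adds log g(y_n | x_{1:n}) to the
    # trace's log-weight.  Returns (x_{1:N}, log-weight).
    x, xs, log_w = 0.0, [], 0.0
    for y in ys:
        x = random.gauss(x, 1.0)           # simulate from f
        xs.append(x)
        log_w += normal_logpdf(y, x, 1.0)  # weight by the observe
    return xs, log_w

random.seed(1)
xs, log_w = simulate_trace([0.5, 1.0, 1.5])
```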
Trace
[Diagram: a tree of possible execution traces, branching on the sampled values of x_{1,2} and x_{2,1}.]

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (sample (poisson x-2-1)))))
Observe
[Diagram: the same trace tree, now with an observation constraining which executions are probable.]

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (observe (gaussian x-2-1 0.0001) 7)
      (sample (poisson x-2-1)))))
“Single Site” MCMC = LMH
The posterior distribution over execution traces is proportional to the trace score with observed values plugged in:

p(x | y) ∝ p̃(y = observes, x)

Metropolis-Hastings acceptance rule:

min(1, [p(y | x′) p(x′) q(x | x′)] / [p(y | x) p(x) q(x′ | x)])

• Need: a proposal q
• Have: likelihoods (via observe statement restrictions) and the prior (the sequence of ERP returns, scored in the interpreter)

Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation [Wingate, Stuhlmüller et al, 2011]
LMH Proposal

q(x′ | x) = (1 / |x|) κ(x′_{m,j} | x_{m,j}) p(x′ \ x | x′ ∩ x)

• |x| — number of stochastic procedure (SP) applications in the original trace
• κ(x′_{m,j} | x_{m,j}) — probability of the new SP output (sample) at the single updated site
• p(x′ \ x | x′ ∩ x) — probability of the new part of the proposed execution trace given the shared trace prefix

[Wingate, Stuhlmüller et al, 2011]
LMH Implementation
“Single site update” = sample from the prior = run the program forward, so that κ(x′_{m,j} | x_{m,j}) = p(x′_{m,j} | x′ ∩ x). The acceptance ratio then simplifies to

[p(y | x′) p(x′) |x| p(x \ x′ | x ∩ x′)] / [p(y | x) p(x) |x′| p(x′ \ x | x′ ∩ x)]

• |x|, |x′| — number of SP applications in the original and proposed traces
• p(x \ x′ | x ∩ x′) — probability of regenerating the current trace continuation given the proposal trace beginning
• p(x′ \ x | x′ ∩ x) — probability of generating the proposal trace continuation given the current trace beginning
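For a model with a single random choice, the continuation and trace-size terms cancel and the simplified ratio reduces to a likelihood ratio. A sketch of that special case (Python; the Gaussian-unknown-mean model and all names here are illustrative assumptions, not the paper's implementation):

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # log-density of N(x; mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma * sigma) \
           - (x - mu) ** 2 / (2 * sigma * sigma)

def lmh_gaussian_mean(y, n_iters):
    # Single-site update: resample the sole random choice mu from its
    # prior ("run the program forward").  Prior and proposal terms
    # cancel, so acceptance uses only the likelihood ratio
    # p(y | mu') / p(y | mu).
    mu = random.gauss(0.0, 1.0)
    samples = []
    for _ in range(n_iters):
        mu_prop = random.gauss(0.0, 1.0)  # propose from the prior
        log_alpha = (normal_logpdf(y, mu_prop, 0.5)
                     - normal_logpdf(y, mu, 0.5))
        if math.log(random.random()) < log_alpha:
            mu = mu_prop
        samples.append(mu)
    return samples

random.seed(2)
samples = lmh_gaussian_mean(0.8, 20000)
post_mean = sum(samples) / len(samples)
```

For this conjugate model the exact posterior mean is 0.64, which the chain's average should approach.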
Introduction: Sequential Monte Carlo
Sequential Monte Carlo targets

p(x_{1:N} | y_{1:N}) ∝ p̃(y_{1:N}, x_{1:N}) ≡ ∏_{n=1}^{N} g(y_n | x_{1:n}) f(x_n | x_{1:n−1})

with a weighted set of particles

p(x_{1:N} | y_{1:N}) ≈ Σ_{ℓ=1}^{L} w_N^ℓ δ_{x_{1:N}^ℓ}(x_{1:N}).

Noting the identity

p(x_{1:n} | y_{1:n}) ∝ g(y_n | x_{1:n}) f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1}),

we can use importance sampling to generate samples from p(x_{1:n} | y_{1:n}) given a sample-based approximation to p(x_{1:n−1} | y_{1:n−1}).
SMC
[Diagram: particles evolving from n = 1 to n = 2 across an observe.]
Iteratively:
• simulate
• weight each particle at the observe
• resample
SMC for Probabilistic Programming
Parallel executions approximate the previous target:

p(x_{1:n−1} | y_{1:n−1}) ≈ Σ_{ℓ=1}^{L} w_{n−1}^ℓ δ_{x_{1:n−1}^ℓ}(x_{1:n−1})

The sequence of environments is

p(x_{1:n} | y_{1:n}) ∝ g(y_n | x_{1:n}) f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1})

with proposal

q(x_{1:n} | y_{1:n}) = f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1}).

Running each program forward until the next observe draws x_n^ℓ ∼ f, extending x_{1:n}^ℓ = (x_{1:n−1}^{a_{n−1}^ℓ}, x_n^ℓ); the weight of each particle is the observation likelihood, giving

p(x_{1:n} | y_{1:n}) ≈ Σ_{ℓ=1}^{L} g(y_n | x_{1:n}^ℓ) δ_{x_{1:n}^ℓ}(x_{1:n}).

W., van de Meent, and Mansinghka, “A New Approach to Probabilistic Programming Inference,” AISTATS 2014
Fischer, Kiselyov, and Shan, “Purely Functional Lazy Non-deterministic Programming,” ACM SIGPLAN 2009
Paige and W., “A Compilation Target for Probabilistic Programming Languages,” ICML 2014
SMC Methods Only Require
• Initialization: sampling from p(x_1)
• Forward simulation: sampling from f(x_n | x_{1:n−1})
• Observation likelihood computation: pointwise evaluation of g(y_n | x_{1:n}), up to normalization
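A bootstrap SMC sketch that uses only these three operations (Python; the Gaussian random-walk model, its parameters, and multinomial resampling at every step are illustrative assumptions):

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # log-density of N(x; mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma * sigma) \
           - (x - mu) ** 2 / (2 * sigma * sigma)

def smc(ys, n_particles):
    # Initialization: x_1 ~ p(x_1)
    particles = [[random.gauss(0.0, 1.0)] for _ in range(n_particles)]
    for n, y in enumerate(ys):
        # Forward simulation: x_n ~ f(x_n | x_{1:n-1})
        if n > 0:
            for trace in particles:
                trace.append(random.gauss(trace[-1], 1.0))
        # Observation likelihood: pointwise evaluation of g(y_n | x_{1:n})
        log_ws = [normal_logpdf(y, trace[-1], 1.0) for trace in particles]
        m = max(log_ws)
        ws = [math.exp(lw - m) for lw in log_ws]  # stabilized weights
        # Multinomial resampling
        particles = [list(t) for t in
                     random.choices(particles, weights=ws, k=n_particles)]
    return particles

random.seed(3)
particles = smc([0.5, 1.0, 1.5], 2000)
filter_mean = sum(t[-1] for t in particles) / len(particles)
```

For this linear-Gaussian model the exact filtering mean of the last state (via the Kalman recursions) is about 1.19, so the particle estimate should be nearby.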
SMC for Probabilistic Programming
Algorithm 1: Parallel SMC program execution
Assume: N observations, L particles
launch L copies of the program (parallel)
for n = 1 … N do
  wait until all L reach observe y_n (barrier)
  update unnormalized weights w̃_n^{1:L} (serial)
  if ESS < τ then
    sample numbers of offspring O_n^{1:L} (serial)
    set weights w̃_n^{1:L} = 1 (serial)
    for ℓ = 1 … L do
      fork or exit (parallel)
    end for
  else
    set all numbers of offspring O_n^ℓ = 1 (serial)
  end if
  continue program execution (parallel)
end for
wait until L program traces terminate (barrier)
predict from L samples from p̂(x_{1:N}^{1:L} | y_{1:N}) (serial)

Paige and W., “A Compilation Target for Probabilistic Programming Languages,” ICML 2014
SMC for Probabilistic Programming
Intuitively: run threads forward, wait at each observe (which acts as a delimiter and barrier), then fork continuations of the surviving particles.