  1. Probabilistic Programming Practical Frank Wood, Brooks Paige {fwood,brooks}@robots.ox.ac.uk MLSS 2015

  2. Setup

  3. Java (> v. 1.5)
     Java Installation
     Mac and Windows: download and run the installer from
     https://www.java.com/en/download/manual.jsp
     Linux:
       # Debian/Ubuntu
       sudo apt-get install default-jre
       # Fedora
       sudo yum install java-1.7.0-openjdk

  4. Leiningen (v. > 2.0)
     Leiningen Installation
       # Download lein to ~/bin
       mkdir ~/bin
       cd ~/bin
       wget http://git.io/XyijMQ -O lein
       # Make executable
       chmod a+x ~/bin/lein
       # Add ~/bin to path
       echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
       # Run lein
       lein
     Further details: http://leiningen.org/

  5. Practical Materials
     • Download https://bitbucket.org/probprog/mlss2015/get/master.zip
     • cd mlss2015
     • lein gorilla
     • open the URL it prints

  6. Schedule
     • 15:35 - 16:05 Intro / Hello World!
     • 16:05 - 16:30 Gaussian (you code)
     • 16:30 - 16:40 Discuss / intro to physics problem
     • 16:40 - 16:55 Physics (you code)
     • 16:55 - 17:00 Share / discuss solutions
     • 17:00 - 17:20 Inference explanation
     • 17:20 - 17:45 Poisson (you code)
     • 17:45 - 17:50 Inference Q&A
     • 17:50 - 18:05 Coordination (you code)

  7. What is probabilistic programming?

  8. An Emerging Field
     ML: Algorithms & Applications
     STATS: Inference & Theory
     PL: Compilers, Semantics, Analysis
     Probabilistic Programming sits at their intersection.

  9. Conceptualization
     CS view: Parameters → Program → Output
     Statistics view: Parameters p(x) → Program p(y | x) → Observations y
     Probabilistic Programming: invert the program to infer p(x | y).

  10. Operative Definition
      “Probabilistic programs are usual functional or imperative programs with two added
      constructs: (1) the ability to draw values at random from distributions, and (2) the
      ability to condition values of variables in a program via observations.”
      Gordon et al., 2014
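As a minimal illustration of these two constructs (a Python sketch of the idea, not Anglican's actual machinery), a probabilistic program is an ordinary function in which a sample draws a random value and an observe scores data under a distribution, accumulating a log-weight for the execution:

```python
import math
import random

def gaussian_logpdf(y, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def program(data):
    """One execution trace of a tiny probabilistic program."""
    log_weight = 0.0
    mu = random.gauss(0.0, 1.0)                      # construct (1): sample
    for y in data:
        log_weight += gaussian_logpdf(y, mu, 1.0)    # construct (2): observe
    return mu, log_weight
```

Running `program` repeatedly yields weighted samples of the latent `mu`; inference methods differ in how they use those weights.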

  11. What are the goals of probabilistic programming?

  12. Simplify Machine Learning…
      [Flowchart: the traditional ML workflow. Identify and formalize the problem and gather
      data; read papers and do math; design a model, derive inference updates, code and test
      the inference algorithm; check whether it performs well statistically and
      computationally and whether scale is sufficient; then deploy, or loop back. A side
      path: if a simple model suffices and a tool supports the required features, implement
      using a high-level modeling tool and choose an approximate inference algorithm.
      Legend: edge colors indicate the skills required to traverse each edge: non-specialist,
      PhD-level machine learning or statistics, or PhD-level machine learning or computer
      science.]

  13. To This
      [Flowchart: the same workflow, but the “derive updates and code inference algorithm”
      step is replaced by “write probabilistic program; debug, test, profile”, a step a
      non-specialist can traverse.]

  14. Automate Inference
      [Diagram: a collection of models / stochastic simulators, drawn as graphical models
      (state-space chains, mixed-membership and nonparametric mixture models), all compile
      down to a common programming-language representation / abstraction layer, which is
      served by generic inference engine(s).]

  15. Hello World!

  16. First Exercise: Gaussian Unknown Mean
      • Learning objectives
        1. Clojure
        2. Gorilla REPL
        3. Anglican
        4. Automatic inference over generative models expressed as programs, via query
      • Resources
        • https://clojuredocs.org/
        • https://bitbucket.org/probprog/anglican/
        • http://www.robots.ox.ac.uk/~fwood/anglican/index.html

  17. Simulation

  18. Second Exercise
      Learning objectives:
      1. Develop experience expressing problems as inference over program executions
      2. Understand how to perform inference over a complex deterministic generative
         process, here a 2D physics simulator

  19. Second Exercise
      Use inference to solve a mechanism-design optimization task:
      • get all the balls safely into the bin

  20. Inference

  21. Trace Probability
      • observed data points y_n; internal random choices x_n
      • the probability of an execution trace factors as
        p(y_{1:N}, x_{1:N}) = ∏_{n=1}^{N} g(y_n | x_{1:n}) f(x_n | x_{1:n−1})
      • simulate from f(x_n | x_{1:n−1}) by running the program forward
      • weight traces by the observes, g(y_n | x_{1:n})
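These two ingredients already give the simplest inference method: run the program forward many times and average under the observe weights. A Python sketch for the Gaussian-unknown-mean exercise (a toy likelihood-weighting estimator of my own, not the practical's Anglican code):

```python
import math
import random

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def likelihood_weighting(data, num_traces=20000):
    """Estimate E[mu | data] for mu ~ N(0,1), y_n ~ N(mu, 1):
    simulate mu from f (the prior), then weight each trace by the
    product of observe densities g."""
    total_w, total_wx = 0.0, 0.0
    for _ in range(num_traces):
        mu = random.gauss(0.0, 1.0)                          # forward simulate
        logw = sum(logpdf_normal(y, mu, 1.0) for y in data)  # observe weights
        w = math.exp(logw)
        total_w += w
        total_wx += w * mu
    return total_wx / total_w
```

For this conjugate model the exact posterior mean is sum(data) / (len(data) + 1), which the weighted average approaches as the number of traces grows.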

  22. Trace
      [Trace tree: x_{1,1} = 3; x_{1,2} ∈ {0, 1, 2}; if x_{1,2} ≠ 1, x_{2,1} = x_{1,2} + 7
      (7 or 9); x_{2,2} = 0, 1, …]
      (let [x-1-1 3
            x-1-2 (sample (discrete (range x-1-1)))]
        (if (not= x-1-2 1)
          (let [x-2-1 (+ x-1-2 7)]
            (sample (poisson x-2-1)))))

  23. Observe
      [Trace tree: x_{1,1} = 3; x_{1,2} ∈ {0, 1, 2}; if x_{1,2} ≠ 1, x_{2,1} = x_{1,2} + 7
      (7 or 9); x_{2,2} = 0, 1, …]
      (let [x-1-1 3
            x-1-2 (sample (discrete (range x-1-1)))]
        (if (not= x-1-2 1)
          (let [x-2-1 (+ x-1-2 7)]
            (observe (gaussian x-2-1 0.0001) 7)
            (sample (poisson x-2-1)))))

  24. “Single Site” MCMC = LMH
      The posterior distribution over execution traces is proportional to the trace score
      with the observed values plugged in:
        p(x | y) ∝ p̃(y = observes, x)
      Metropolis-Hastings acceptance rule:
        min( 1, [ p(y | x′) p(x′) q(x | x′) ] / [ p(y | x) p(x) q(x′ | x) ] )
      ▪ Need: a proposal q
      ▪ Have: likelihoods (via observe-statement restrictions);
        prior (sequence of ERP returns, scored in the interpreter)
      Lightweight Implementations of Probabilistic Programming Languages Via
      Transformational Compilation [Wingate, Stuhlmüller et al., 2011]
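The single-site update can be sketched in Python for a toy model whose trace structure is fixed (my own illustration, not the Anglican implementation; with a fixed structure the trace-regeneration terms of the general LMH ratio cancel):

```python
import math
import random

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_joint(trace, data):
    # prior: x_i ~ N(0,1); likelihood: y_i ~ N(x_i, 1)
    return (sum(logpdf_normal(x, 0.0, 1.0) for x in trace)
            + sum(logpdf_normal(y, x, 1.0) for x, y in zip(trace, data)))

def lmh(data, num_iters=6000):
    """Single-site MH: pick one random choice, resample it from its prior,
    accept or reject with the MH ratio."""
    trace = [random.gauss(0.0, 1.0) for _ in data]
    samples = []
    for _ in range(num_iters):
        i = random.randrange(len(trace))         # choose a single site
        proposal = list(trace)
        proposal[i] = random.gauss(0.0, 1.0)     # run forward: sample from the prior
        # the proposal is the prior at site i, so prior and proposal terms
        # cancel, leaving the likelihood ratio at the changed site
        log_alpha = (log_joint(proposal, data) - log_joint(trace, data)
                     + logpdf_normal(trace[i], 0.0, 1.0)
                     - logpdf_normal(proposal[i], 0.0, 1.0))
        if math.log(random.random()) < log_alpha:
            trace = proposal
        samples.append(trace[0])
    return samples
```

For data point y_0 the exact posterior mean of the first latent is y_0 / 2, which the chain's averages converge to.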

  25. LMH Proposal
        q(x′ | x) = [ κ(x′_{m,j} | x_{m,j}) / |x| ] · p(x′ \ x | x′ ∩ x)
      ▪ |x|: number of stochastic procedure (SP) applications in the original trace
      ▪ κ(x′_{m,j} | x_{m,j}): probability of the new SP output (sample) given the trace prefix
      ▪ p(x′ \ x | x′ ∩ x): probability of the new part of the proposed execution trace
      [Wingate, Stuhlmüller et al., 2011]

  26. LMH Implementation
      “Single site update” = sample from the prior = run the program forward, so that
        κ(x′_{m,j} | x_{m,j}) = p(x′_{m,j} | x′ ∩ x)
      The acceptance ratio then simplifies to
        [ p(y | x′) p(x′) |x| p(x \ x′ | x ∩ x′) ] / [ p(y | x) p(x) |x′| p(x′ \ x | x′ ∩ x) ]
      ▪ |x|, |x′|: number of SP applications in the original / proposal trace
      ▪ p(x \ x′ | x ∩ x′): probability of regenerating the current trace continuation
        given the proposal trace beginning
      ▪ p(x′ \ x | x′ ∩ x): probability of generating the proposal trace continuation
        given the current trace beginning

  27. Introduction: Sequential Monte Carlo
      Sequential Monte Carlo targets
        p(x_{1:N} | y_{1:N}) ∝ p̃(y_{1:N}, x_{1:N}) ≡ ∏_{n=1}^{N} g(y_n | x_{1:n}) f(x_n | x_{1:n−1})
      with a weighted set of particles:
        p(x_{1:N} | y_{1:N}) ≈ ∑_{ℓ=1}^{L} w^ℓ_N δ_{x^ℓ_{1:N}}(x_{1:N})
      Noting the identity
        p(x_{1:n} | y_{1:n}) ∝ g(y_n | x_{1:n}) f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1})
      we can use importance sampling to generate samples from p(x_{1:n} | y_{1:n}),
      given a sample-based approximation to p(x_{1:n−1} | y_{1:n−1}).

  28. SMC
      [Diagram: particles evolving over n = 1, n = 2, …, weighted at each observe]
      Iteratively:
      - simulate
      - weight
      - resample

  29. SMC for Probabilistic Programming
      Parallel executions over a sequence of environments.
      Given the approximation
        p(x_{1:n−1} | y_{1:n−1}) ≈ ∑_{ℓ=1}^{L} w^ℓ_{n−1} δ_{x^ℓ_{1:n−1}}(x_{1:n−1})
      and the target
        p(x_{1:n} | y_{1:n}) ∝ g(y_n | x_{1:n}) f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1})
      take the proposal
        q(x_{1:n} | y_{1:n}) = f(x_n | x_{1:n−1}) p(x_{1:n−1} | y_{1:n−1}),
      i.e. run each program forward until the next observe,
        x^ℓ_n ∼ f(x_n | x^{a^ℓ_{n−1}}_{1:n−1}),  x^ℓ_{1:n} = (x^{a^ℓ_{n−1}}_{1:n−1}, x^ℓ_n),
      and weight each particle by the observation likelihood:
        p(x_{1:n} | y_{1:n}) ≈ ∑_{ℓ=1}^{L} g(y_n | x^ℓ_{1:n}) δ_{x^ℓ_{1:n}}(x_{1:n})
      Wood, van de Meent, and Mansinghka, “A New Approach to Probabilistic Programming Inference,” AISTATS 2014
      Fischer, Kiselyov, and Shan, “Purely functional lazy non-deterministic programming,” ACM SIGPLAN 2009
      Paige and Wood, “A Compilation Target for Probabilistic Programming Languages,” ICML 2014

  30. SMC Methods Only Require
      • Initialization (sample): p(x_1)
      • Forward simulation (sample): f(x_n | x_{1:n−1})
      • Observation likelihood computation: pointwise evaluation of g(y_n | x_{1:n}),
        up to normalization

  31. SMC for Probabilistic Programming
      Algorithm 1: Parallel SMC program execution
      Assume: N observations, L particles
        launch L copies of the program                             (parallel)
        for n = 1 … N do
          wait until all L reach observe y_n                       (barrier)
          update unnormalized weights w̃^{1:L}_n                   (serial)
          if ESS < τ then
            sample numbers of offspring O^{1:L}_n                  (serial)
            set weights w̃^{1:L}_n = 1                             (serial)
            for ℓ = 1 … L do
              fork or exit                                         (parallel)
            end for
          else
            set all numbers of offspring O^ℓ_n = 1                 (serial)
          end if
          continue program execution                               (parallel)
        end for
        wait until L program traces terminate                      (barrier)
        predict from L samples from p̂(x^{1:L}_{1:N} | y_{1:N})     (serial)
      Paige and Wood, “A Compilation Target for Probabilistic Programming Languages,” ICML 2014
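The loop above can be sketched serially in Python for a toy state-space model (illustrative only: Algorithm 1's fork/exit of program continuations is replaced by copying particle histories, and resampling happens at every observe rather than under an ESS test):

```python
import math
import random

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def smc(data, num_particles=1000):
    """SMC for a toy model: x_1 ~ N(0,1), x_n ~ N(x_{n-1},1), y_n ~ N(x_n,1)."""
    particles = [[] for _ in range(num_particles)]
    for y in data:
        # run each "program" forward until the next observe
        for p in particles:
            prev = p[-1] if p else 0.0
            p.append(random.gauss(prev, 1.0))
        # weight at the observe barrier (log-sum-exp style stabilization)
        logw = [logpdf_normal(y, p[-1], 1.0) for p in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # resample: choose offspring, i.e. fork or exit continuations
        idx = random.choices(range(num_particles), weights=w, k=num_particles)
        particles = [list(particles[i]) for i in idx]
    return particles
```

After the final resampling step the particles are (approximately) unweighted draws from the posterior over full traces.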

  32. SMC for Probabilistic Programming
      Intuitively, threads:
      - run
      - wait at each observe (the delimiter)
      - fork continuations (at resampling)
