
Probabilistic Programming. Hongseok Yang, University of Oxford.



  1. Probabilistic Programming Hongseok Yang University of Oxford

  2. Manchester Univ. 1953

  3. Manchester Univ. 1953

  4. Manchester Univ. 1953

  5. Manchester Univ. 1953. A letter produced by Strachey’s “Love Letter” program (1952) on the Manchester Univ. computer; the example shown was generated by the reimplementation at http://www.gingerbeardman.com/loveletter/

  6. Strachey’s program. It implements a simple randomised algorithm: 1. Randomly pick two opening words. 2. Repeat the following five times: • Pick a sentence structure randomly. • Fill the structure with random words. 3. Randomly pick closing words.
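A minimal sketch of this recipe in Clojure. The word lists below are invented stand-ins, not Strachey’s originals, and the two sentence structures are simplifications:

    (def salutations ["DARLING" "DEAR" "HONEY" "JEWEL" "LOVE" "SWEETHEART"])
    (def adjectives  ["AFFECTIONATE" "AMOROUS" "BEAUTIFUL" "PRECIOUS" "TENDER"])
    (def nouns       ["DESIRE" "DEVOTION" "FANCY" "HEART" "LONGING" "PASSION"])
    (def verbs       ["ADORES" "CHERISHES" "DESIRES" "TREASURES" "WANTS"])

    (defn sentence []
      ;; Pick one of two sentence structures at random and fill it with random words.
      (if (< (rand) 0.5)
        (str "MY " (rand-nth adjectives) " " (rand-nth nouns) " "
             (rand-nth verbs) " YOUR " (rand-nth adjectives) " " (rand-nth nouns) ".")
        (str "YOU ARE MY " (rand-nth adjectives) " " (rand-nth nouns) ".")))

    (defn love-letter []
      (str (rand-nth salutations) " " (rand-nth salutations) ",\n"   ;; 1. two opening words
           (apply str (interpose " " (repeatedly 5 sentence)))       ;; 2. five random sentences
           "\nYOURS " (rand-nth adjectives) ", M.U.C."))             ;; 3. closing words

    (println (love-letter))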

  7. Strachey’s program, and two ways to go further: 1. more randomness (e.g. repeat a random N times rather than exactly five); 2. adjust the randomness using data. The algorithm: 1. Randomly pick two opening words. 2. Repeat the following five times: • Pick a sentence structure randomly. • Fill the structure with random words. 3. Randomly pick closing words.

  8. (Same as slide 7.)

  9. What is probabilistic programming?

  10. (Bayesian) probabilistic modelling of data 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  11. (Bayesian) probabilistic modelling of data in a prob. prog. language 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  12. (Bayesian) probabilistic modelling of data in a prob. prog. language as a program 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  13. (Bayesian) probabilistic modelling of data in a prob. prog. language as a program 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data. a generic inference algo. of the language

  14. Line fitting. [scatter plot of data points; axes X and Y]

  15. Line fitting. [the same plot with a fitted line f(x) = s*x + b; axes X and Y]

  16. Bayesian generative model. [graphical model: parameter nodes s and b feeding the observation nodes y_i, plate i = 1..5]

  17. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 .. y_5?

  18. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 .. y_5?

  19. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 = 2.5, …, y_5 = 10.1?
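To make the model concrete, here is a plain-Clojure forward simulation of it (not Anglican). sample-normal is a helper defined here via the Box-Muller transform, and normal(m, s) is read as mean m and standard deviation s:

    (defn sample-normal [mu sigma]
      ;; One draw from normal(mu, sigma) via the Box-Muller transform.
      (+ mu (* sigma
               (Math/sqrt (* -2.0 (Math/log (- 1.0 (rand)))))
               (Math/cos (* 2.0 Math/PI (rand))))))

    (defn simulate-once []
      ;; Draw s and b from their priors, then the five noisy observations y_1..y_5.
      (let [s (sample-normal 0 10)
            b (sample-normal 0 10)
            f (fn [x] (+ (* s x) b))]
        {:s s :b b :ys (mapv #(sample-normal (f %) 1) (range 1 6))}))

Running simulate-once repeatedly shows what the prior believes before any data arrives: lines with widely varying slopes and intercepts.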

  20. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  21. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  22. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  23. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  24. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)
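Spelled out for this model (reading normal(m, s) as mean m and standard deviation s), the unnormalised posterior is

    P(s, b | y_1, .., y_5)  ∝  N(s; 0, 10^2) × N(b; 0, 10^2) × ∏_{i=1..5} N(y_i; s·i + b, 1)

and the denominator P(y_1, .., y_5) is the integral of this product over all s and b.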

  25. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        f)

  26. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :f f))

  27. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :sb [s b]))

  28. Anglican program (same as slide 27, predicting [s b]).
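What a generic inference algorithm has to approximate here can also be sketched without Anglican: the simplest scheme is importance sampling with the prior as proposal, weighting each prior draw of (s, b) by the likelihood of the five observed values. A rough sketch in plain Clojure, reusing sample-normal from the earlier sketch; it illustrates the idea rather than Anglican’s actual inference engines:

    (def ys [2.5 3.8 4.5 8.9 10.1])

    (defn log-normal-pdf [x mu sigma]
      ;; Log density of normal(mu, sigma) at x.
      (- (/ (Math/pow (/ (- x mu) sigma) 2) -2.0)
         (Math/log (* sigma (Math/sqrt (* 2.0 Math/PI))))))

    (defn log-likelihood [s b]
      ;; log P(y_1 .. y_5 | s, b) with y_i ~ normal(s*i + b, 1).
      (reduce + (map-indexed (fn [i y] (log-normal-pdf y (+ (* s (inc i)) b) 1.0)) ys)))

    (defn weighted-posterior-samples [n]
      ;; Prior draws of (s, b), each carrying a log importance weight.
      (repeatedly n
        (fn []
          (let [s (sample-normal 0 10)
                b (sample-normal 0 10)]
            {:s s :b b :log-weight (log-likelihood s b)}))))

    ;; Posterior mean of s, estimated with self-normalised weights.
    (let [samples (weighted-posterior-samples 100000)
          max-lw  (apply max (map :log-weight samples))
          ws      (map #(Math/exp (- (:log-weight %) max-lw)) samples)]
      (/ (reduce + (map * ws (map :s samples)))
         (reduce + ws)))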

  29. Samples from the posterior. [plot of fitted lines through the data; axes X and Y]

  30. Why should one care about prob. programming?

  31. My favourite answer “Because probabilistic programming is a good way to build an AI.” (My ML colleague)

  32. Procedural modelling SOSMC-Controlled Sampling Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  33. Procedural modelling Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  34. Procedural modelling Asynchronous function call via future Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  35. Captcha solving. [figure: captcha examples with varying noise] Le, Baydin, Wood [2016]

  36. [diagram] Compilation (expensive / slow): training data simulated from the probabilistic program p(x, y), together with an NN architecture, are used to train a compilation artifact q(x | y; φ) by minimising D_KL(p(x | y) || q(x | y; φ)). Inference (cheap / fast): given test data y, the artifact q(x | y; φ) drives SIS to approximate the posterior. Le, Baydin, Wood [2016]

  37. Approximating prob. programs by neural nets. [same diagram as slide 36: compilation trains q(x | y; φ) from the program p(x, y); inference runs SIS with q on the test data y] Le, Baydin, Wood [2016]
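Written out (with the notation reconstructed: x the latent variables, y the observations, φ the network weights), the compilation objective in the diagram reduces to maximum likelihood on data simulated from the program itself:

    E_{p(y)}[ D_KL( p(x | y) || q(x | y; φ) ) ]
      = E_{p(x,y)}[ log p(x | y) − log q(x | y; φ) ]
      = E_{p(x,y)}[ −log q(x | y; φ) ] + const,

so training only needs (x, y) pairs obtained by running the probabilistic program forward; at test time the trained q(x | y; φ) serves as the proposal for sequential importance sampling (SIS) on the observed y.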

  38. Nonparametric Bayes: the Indian buffet process
      (define (ibp-stick-breaking-process concentration base-measure)
        (let ((sticks (mem (lambda (j) (random-beta 1.0 concentration))))
              (atoms  (mem (lambda (j) (base-measure)))))
          (lambda ()
            (let loop ((j 1) (dualstick (sticks 1)))
              (append
               (if (flip dualstick)   ;; with prob. dualstick
                   (atoms j)          ;; add feature j
                   '())               ;; otherwise, next stick
               (loop (+ j 1) (* dualstick (sticks (+ j 1)))))))))
      Roy et al. 2008

  39. Nonparametric Bayes: the Indian buffet process. mem gives a lazy infinite array: sticks and atoms are memoised random functions of the index j, sampled on first use and then fixed. (Code as on slide 38.) Roy et al. 2008

  40. Nonparametric Bayes: a higher-order Indian buffet process. The base measure is passed in as a parameter (base-measure), so the process is itself a higher-order function. (Code as on slide 38.) Roy et al. 2008

  41. My research : Denotational semantics Joint work with Chris Heunen, Ohad Kammar, Sam Staton, Frank Wood [LICS 2016]

  42. (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :sb [s b]))

  43. The same program as on slide 42, now also with (predict :f f) to predict the function f itself.

  44. The same program with (predict :f f): it generates a random function of type R → R, but its mathematical meaning is not clear.

  45. Measurability issue • Measure theory is the foundation of probability theory that avoids paradoxes. • It is silent about higher-order functions. • [Halmos] ev(f, a) = f(a) is not measurable. • The category of measurable spaces (Meas) is not cartesian closed (not a CCC). • But Anglican supports higher-order functions.
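To make the third bullet precise (a standard formulation of the result, not a quotation from the slide): there is no σ-algebra on the set M of measurable functions f : R → R for which the evaluation map

    ev : M × R → R,   ev(f, a) = f(a)

is measurable. In particular Meas has no exponential object R^R, which is what “not a CCC” means here.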

  46. Use category theory to extend measure theory. [diagram: the monad, drawn as an arrow Meas → Meas]

  47. Use category theory to extend measure theory. [diagram: the monad Meas → Meas, with the Yoneda embedding Meas → [Meas^op, Set]_∏ on each side]

  48. Use category theory to extend measure theory. [diagram: the monad on Meas is carried over to [Meas^op, Set]_∏ by left Kan extension along the Yoneda embedding]

  49. Use category theory to extend measure theory. [same diagram as slide 48]
