Exact Bayesian inference by symbolic disintegration

Exact Bayesian inference by symbolic disintegration
Chung-chieh Shan (Indiana University) · Norman Ramsey (Tufts University)
POPL, 18 January 2017

1. Probabilistic programs denote distributions
2. Exact inference by transforming terms


  1–3. Observation, inference, and query in core Hakaru

(Figure: the unit square [0,1] × [0,1], with the observation y = 2·x drawn as a line.)

    m0 = do { x ← uniform 0 1;
              y ← uniform 0 1;
              return (x, y) }

    m2 = do { x ← uniform 0 1;
              y ← uniform 0 1;
              observe y = 2·x;
              return (x, y) }

Querying the expectation of x under m2 is ambiguous, because the observed line has measure zero under m0:

    E_m2 (λ(x,y). x) = ∫_m2 x d(x,y) / ∫_m2 1 d(x,y)
                     = ∫_m0 [y = 2·x] · x d(x,y) / ∫_m0 [y = 2·x] · 1 d(x,y)
                     = 0 / 0
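Concretely, no continuous sampler ever hits the line exactly, so reading observe as rejection accepts nothing. A minimal Haskell sketch of this failure mode (the sampling setup and all names are ours, not Hakaru's):

    import System.Random (mkStdGen, randomRs)

    -- Naive rejection sampling of m2: draw (x, y) uniformly from the
    -- unit square and keep only the pairs with y == 2*x exactly.
    -- The accepted list is (essentially surely) empty: the event has
    -- probability zero, which is the 0/0 above.
    acceptedFromM2 :: [(Double, Double)]
    acceptedFromM2 = [(x, y) | (x, y) <- pairs, y == 2 * x]
      where
        xs    = randomRs (0, 1) (mkStdGen 2017)
        ys    = randomRs (0, 1) (mkStdGen 42)
        pairs = take 1000000 (zip xs ys)

    main :: IO ()
    main = print (length acceptedFromM2)  -- prints 0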

  4–9. Observation of measure-zero sets is paradoxical

(Figures: the unit square with the line y = 2·x; the prior mass is squeezed onto the line in two different ways.)

Two equally natural ways of approximating the event y = 2·x by thickened events disagree in the limit:

◮ Conditioning on |y − 2·x| < ε and letting ε → 0 makes x uniform on (0, 1/2), so E(x) = 1/4.
◮ Conditioning on |y/x − 2| < ε instead gives x a density proportional to x on (0, 1/2), so E(x) = 1/3.
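The disagreement is easy to reproduce numerically. A minimal Haskell sketch, conditioning by Monte Carlo on the two thickened events (ε, the sample count, and all names are ours):

    import System.Random (mkStdGen, randomRs)

    -- One million draws from the prior: (x, y) uniform on the unit square.
    samples :: [(Double, Double)]
    samples = take 1000000 (zip (randomRs (0, 1) (mkStdGen 1))
                                (randomRs (0, 1) (mkStdGen 2)))

    -- Posterior mean of x, given a thickened observation.
    condMean :: ((Double, Double) -> Bool) -> Double
    condMean ok = sum xs / fromIntegral (length xs)
      where xs = [x | (x, y) <- samples, ok (x, y)]

    main :: IO ()
    main = do
      let eps = 0.005
      print (condMean (\(x, y) -> abs (y - 2 * x) < eps))  -- ≈ 1/4
      print (condMean (\(x, y) -> abs (y / x - 2) < eps))  -- ≈ 1/3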

  10–11. Resolving the paradox via disintegration

Name the observed expression explicitly: observing y − 2·x at 0 yields E(x) = 1/4, while observing y/x at 2 yields E(x) = 1/3. The two answers condition on different random variables whose zero sets happen to coincide, so they need not agree.

  12–15. Resolving the paradox via disintegration

(Diagram: disintegrate transforms a prior term into a posterior term.)

Soundness: if the disintegrator succeeds, then the result is correct.

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation

  16–25. Specifying disintegration by semantics

disintegrate consumes a joint measure ξ : M (α × β) and produces a measure µ : M α together with a kernel κ : α → M β such that ξ = µ ⊗ κ, i.e.,

    ξ = do { a ← µ;
             b ← κ a;
             return (a, b) }

For each observed value a, the measure κ a : M β is the conditional (posterior) measure.
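Spelled out with the integrator semantics given below (slides 34–36), the product ξ = µ ⊗ κ unfolds by the do-rule into nested integration. This spelling is standard measure theory rather than a slide from the deck:

    ⟦ξ⟧ f = ⟦µ⟧ (λa. ⟦κ a⟧ (λb. f (a, b)))

or, in conventional LaTeX notation,

    \xi(f) \;=\; \int_\alpha \int_\beta f(a, b) \; (\kappa\, a)(\mathrm{d}b) \; \mu(\mathrm{d}a)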

  26–31. Example: disintegrating along y − 2·x

Instantiate α = ℝ and β = ℝ × ℝ. The prior and the observation y − 2·x : α are paired up into the joint measure ξ:

    prior : M β
    prior = do { x ← uniform 0 1;
                 y ← uniform 0 1;
                 return (x, y) }

    ξ : M (α × β)
    ξ = do { x ← uniform 0 1;
             y ← uniform 0 1;
             let a = y − 2·x;
             return (a, (x, y)) }

Disintegration gives µ = lebesgue and the kernel

    κ a = do { x ← uniform 0 1;
               observe 0 < a + 2·x < 1;
               return (x, a + 2·x) }

so the posterior at the observed value 0 is

    κ 0 = do { x ← uniform 0 1;
               observe 0 < 0 + 2·x < 1;
               return (x, 0 + 2·x) }

  32. Example: disintegrating along y/x

For the observation y/x : α (so let a = y/x in ξ), disintegration again gives µ = lebesgue, but the kernel picks up the Jacobian of the substitution y = a·x as a density factor:

    κ a = do { x ← uniform 0 1;
               observe 0 < a·x < 1;
               factor x;
               return (x, a·x) }

    κ 2 = do { x ← uniform 0 1;
               observe 0 < 2·x < 1;
               factor x;
               return (x, 2·x) }
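Both kernels can be verified by a change of variables inside the integrator for ξ; the following short derivation is ours, not from the slides. For y − 2·x, substitute a = y − 2·x (so y = a + 2·x and dy = da at fixed x):

    \int_0^1 \int_0^1 f(y - 2x,\, (x, y)) \, dy \, dx
      \;=\; \int_{-\infty}^{\infty} \int_0^1 [0 < a + 2x < 1] \; f(a,\, (x, a + 2x)) \, dx \, da

which is exactly lebesgue ⊗ κ. For y/x, substitute a = y/x (so y = a·x and dy = x da at fixed x); the extra x is the factor x in the kernel:

    \int_0^1 \int_0^1 f(y/x,\, (x, y)) \, dy \, dx
      \;=\; \int_{-\infty}^{\infty} \int_0^1 [0 < a x < 1] \; x \, f(a,\, (x, a x)) \, dx \, da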

  33. Measure semantics

⋆ Compositional denotation! ⋆ Equational reasoning! ⋆ Integrator formulation! ⋆

  34–36. Integrator semantics

A measure denotes an integrator: a map from an integrand to a real.

    ⟦M α⟧ = (⟦α⟧ → ℝ) → ℝ

    ⟦uniform 0 1⟧     = λf. ∫₀¹ f(x) dx
    ⟦lebesgue⟧        = λf. ∫_{−∞}^{∞} f(x) dx
    ⟦return (x, y)⟧   = λf. f(x, y)
    ⟦do { x ← m; M }⟧ = λf. ⟦m⟧ (λx. ⟦M⟧ f)

For example:

    ⟦do { x ← uniform 0 1;
          y ← uniform 0 1;
          return (x, y) }⟧ = λf. ∫₀¹ ∫₀¹ f(x, y) dy dx
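This semantics is directly executable. A minimal Haskell sketch, assuming a crude midpoint-rule quadrature in place of true integration (the names Measure, integrate, midpointRule, and uniform01 are ours, not Hakaru's):

    -- A measure over a, interpreted as an integrator: it maps an
    -- integrand (a -> Double) to the value of the integral.
    newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

    -- 'pure' is a point mass; '>>=' nests integrals, exactly as in
    -- the rule for do { x <- m; M } above.
    instance Functor Measure where
      fmap g m = Measure (\f -> integrate m (f . g))
    instance Applicative Measure where
      pure x    = Measure (\f -> f x)
      mf <*> mx = Measure (\f -> integrate mf (\g -> integrate mx (f . g)))
    instance Monad Measure where
      m >>= k = Measure (\f -> integrate m (\x -> integrate (k x) f))

    -- Midpoint-rule approximation of the integral of f over (lo, hi).
    midpointRule :: Double -> Double -> (Double -> Double) -> Double
    midpointRule lo hi f =
        sum [f (lo + (fromIntegral i + 0.5) * dx) * dx | i <- [0 .. steps - 1]]
      where
        steps = 10000 :: Int
        dx    = (hi - lo) / fromIntegral steps

    uniform01 :: Measure Double
    uniform01 = Measure (midpointRule 0 1)

    -- The double integral from the slide: the expectation of x * y
    -- under two independent uniforms; prints approximately 0.25.
    main :: IO ()
    main = print (integrate joint (\(x, y) -> x * y))
      where joint = do { x <- uniform01; y <- uniform01; return (x, y) }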

  37–38. Reviews

★ “fantastic introduction!” ★ “a pleasure to read!” ★ “very polished!” ★ “loved reading!” ★ “best written of the last 30 papers I have read!” ★ “deft!” ★ “self contained!” ★ “gentle!” ★ “easy to follow!” ★ “beautifully explained!”

“PLDI readers without lots of background in probability theory should be able to follow; this is impressive.”

  39–43. Recap

1. Probabilistic programs denote distributions
2. Exact inference by transforming terms

    do { a ← µ;
         b ← κ a;
         return (a, b) }

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation

  44–45. When it works

◮ Observations such as y − 2·x, y/x, max(x, y), …: an invertible function of the latent variables, as in

      do { x ← ···; y ← ···; z ← ···;
           return (f(x, y, z), …) }    where f is invertible

◮ multivariate Gaussian distributions (for regression and dynamics)
◮ mixtures of distributions (for classifying points and documents)
◮ seismic event detection (Arora et al.)
◮ point masses’ total momentum (Afshar et al.)

  46–51. Where it helps

(Diagram: disintegrate maps the prior to the posterior; the posterior then feeds an inference procedure, such as maximum likelihood, Markov chain Monte Carlo, …; disintegrate applies again further down the pipeline, even where µ ≠ lebesgue or arrays are involved.)

  52. Conclusion

1. Probabilistic programs denote distributions
2. Exact inference by transforming terms

The observed condition ( : α) can be: the dependent variable of a regression, a noisy measurement of a location, the total momentum of point masses, the detected amplitude of a seismic event, …

(Diagram: given a concrete condition value such as 71.4, disintegrate turns a distribution into a conditional distribution.)

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation
