Observation, inference, and query in core Hakaru

m₀ = do { x ← uniform 0 1;
          y ← uniform 0 1;
          return (x, y) }

m₂ = do { x ← uniform 0 1;
          y ← uniform 0 1;
          observe y = 2·x;
          return (x, y) }

[Figure: the unit square (0,1) × (0,1); the observation y = 2·x is a line through it.]

E_{m₂}(λ(x, y). x) = ∫_{m₂} x d(x, y) / ∫_{m₂} 1 d(x, y)
                   = ∫_{m₀} [y = 2·x] · x d(x, y) / ∫_{m₀} [y = 2·x] · 1 d(x, y)
                   = 0 / 0 — ambiguous
Observation of measure-zero sets is paradoxical

[Figure: two copies of the unit square with the line y = 2·x, each shrinking a different neighborhood of the line onto it. One limit gives E(x) = 1/4; the other gives E(x) = 1/3.]
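The two answers correspond to two ways of relaxing the measure-zero observation to width ε. A minimal Monte Carlo sketch reproduces both limits (my illustration, not from the talk; it assumes GHC with the random package, and the names randomPoint and condMean are hypothetical):

import Control.Monad (replicateM)
import System.Random (randomRIO)

-- One sample from the prior: (x, y) uniform on the unit square.
randomPoint :: IO (Double, Double)
randomPoint = (,) <$> randomRIO (0, 1) <*> randomRIO (0, 1)

-- Conditional mean of x given a relaxed, width-eps observation.
condMean :: ((Double, Double) -> Bool) -> [(Double, Double)] -> Double
condMean ok pts = sum xs / fromIntegral (length xs)
  where xs = [x | p@(x, _) <- pts, ok p]

main :: IO ()
main = do
  let eps = 1e-3
  pts <- replicateM 1000000 randomPoint
  -- Relax "y = 2·x" to a horizontal band: E(x) tends to 1/4.
  print (condMean (\(x, y) -> abs (y - 2*x) < eps) pts)
  -- Relax "y/x = 2" to a wedge: E(x) tends to 1/3.
  print (condMean (\(x, y) -> abs (y/x - 2) < eps) pts)

With ε = 10⁻³ only a few hundred of the million samples survive each test, so the printed means are rough, but they separate clearly around 1/4 and 1/3.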
Resolving the paradox via disintegration

[Figure: the same two squares, now labeled by what is observed: observing y − 2·x at 0 gives E(x) = 1/4; observing y / x at 2 gives E(x) = 1/3.]
Resolving the paradox via disintegration

prior —disintegrate→ posterior

Soundness: if the disintegrator succeeds, then the result is correct.

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation
Specifying disintegration by semantics

Given ξ : M (α × β), disintegrate produces μ : M α and κ : α → M β
such that ξ = μ ⊗ κ; that is,

ξ = do { a ← μ;
         b ← κ a;
         return (a, b) }
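Spelled out in the integrator form introduced two slides below, ξ = μ ⊗ κ says that integrating any f against ξ equals integrating first over b and then over a. This is the standard product of a measure and a kernel; the explicit integral notation here is mine, not the slides':

\[
\int_{\alpha\times\beta} f(a,b)\,\xi\bigl(d(a,b)\bigr)
  \;=\;
\int_{\alpha} \left( \int_{\beta} f(a,b)\,(\kappa\,a)(db) \right) \mu(da)
\qquad \text{for every integrand } f .
\]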
Example: α = R, β = R × R, and the observation is y − 2·x : α.

prior : M β = do { x ← uniform 0 1;
                   y ← uniform 0 1;
                   return (x, y) }

ξ : M (α × β) = do { x ← uniform 0 1;
                     y ← uniform 0 1;
                     let a = y − 2·x;
                     return (a, (x, y)) }

Disintegration yields μ = lebesgue and

κ a : M β = do { x ← uniform 0 1;
                 observe 0 < a + 2·x < 1;
                 return (x, a + 2·x) }

κ 0 : M β = do { x ← uniform 0 1;
                 observe 0 < 0 + 2·x < 1;
                 return (x, 0 + 2·x) }

Observing y / x : α instead, we have

ξ : M (α × β) = do { x ← uniform 0 1;
                     y ← uniform 0 1;
                     let a = y / x;
                     return (a, (x, y)) }

κ a : M β = do { x ← uniform 0 1;
                 observe 0 < a·x < 1;
                 factor x;
                 return (x, a·x) }

κ 2 : M β = do { x ← uniform 0 1;
                 observe 0 < 2·x < 1;
                 factor x;
                 return (x, 2·x) }
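Where does factor x come from? Disintegrating along a = y / x solves for y = a·x, and the change of variables pays the Jacobian |∂y/∂a| = x; for a = y − 2·x the Jacobian is 1, so no factor appears. That is my gloss; the talk derives κ mechanically. A quick weighted-sampling check (assuming GHC and the random package; weightedMeanX is a hypothetical helper) confirms the two posterior means from the paradox slide:

import Control.Monad (replicateM)
import System.Random (randomRIO)

-- Weighted mean of x under draws x ~ uniform 0 1, where `weight x`
-- encodes both `observe` (weight 0 on failure) and `factor`.
weightedMeanX :: (Double -> Double) -> IO Double
weightedMeanX weight = do
  xs <- replicateM 1000000 (randomRIO (0, 1))
  let ws = map weight xs
  return (sum (zipWith (*) ws xs) / sum ws)

main :: IO ()
main = do
  -- κ 0 for observation y − 2·x: observe 0 < 2·x < 1, weight 1.
  weightedMeanX (\x -> if 0 < 2*x && 2*x < 1 then 1 else 0) >>= print  -- ≈ 1/4
  -- κ 2 for observation y / x: observe 0 < 2·x < 1, factor x.
  weightedMeanX (\x -> if 0 < 2*x && 2*x < 1 then x else 0) >>= print  -- ≈ 1/3

The weight function plays both roles at once: observe contributes weight 0 or 1, and factor scales the weight.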
Measure semantics:
⋆ Compositional denotation!
⋆ Equational reasoning!
⋆ Integrator formulation!
Integrator semantics

⟦M α⟧ = (⟦α⟧ → R) → R        (the argument is the integrand)

⟦uniform 0 1⟧     = λf. ∫₀¹ f(x) dx
⟦lebesgue⟧        = λf. ∫₋∞^∞ f(x) dx
⟦return (x, y)⟧   = λf. f(x, y)
⟦do { x ← m; M }⟧ = λf. ⟦m⟧ (λx. ⟦M⟧ f)

⟦do { x ← uniform 0 1;
      y ← uniform 0 1;
      return (x, y) }⟧     = λf. ∫₀¹ ∫₀¹ f(x, y) dy dx
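The integrator semantics is directly executable: ⟦M α⟧ is the continuation monad with answer type R, so Haskell do-notation mirrors core Hakaru exactly. A minimal sketch follows (my code, not the talk's; the midpoint-rule quadrature in uniform01 is an arbitrary numerical choice):

-- A measure over a is an integrator: it maps an integrand to a real.
newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

instance Functor Measure where
  fmap g m = Measure $ \f -> integrate m (f . g)

instance Applicative Measure where
  pure x = Measure $ \f -> f x                   -- ⟦return x⟧ = λf. f x
  mg <*> mx = Measure $ \f ->
    integrate mg (\g -> integrate mx (f . g))

instance Monad Measure where
  m >>= k = Measure $ \f ->                      -- ⟦do {x ← m; M}⟧
    integrate m (\x -> integrate (k x) f)        --   = λf. ⟦m⟧ (λx. ⟦M⟧ f)

-- ⟦uniform 0 1⟧, approximated by an n-panel midpoint rule.
uniform01 :: Int -> Measure Double
uniform01 n = Measure $ \f ->
  sum [ f ((fromIntegral i + 0.5) / fromIntegral n)
      | i <- [0 .. n - 1] ] / fromIntegral n

-- ⟦do {x ← uniform 0 1; y ← uniform 0 1; return (x,y)}⟧ (λ(x,y). x·y)
example :: Double
example = integrate square (\(x, y) -> x * y)
  where square = do { x <- uniform01 200
                    ; y <- uniform01 200
                    ; return (x, y) }

Running example returns ≈ 0.25, matching ∫₀¹ ∫₀¹ x·y dy dx = 1/4.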
★ "fantastic introduction!"   ★ "a pleasure to read!"   ★ "very polished!"
★ "loved reading!"   ★ "best written of the last 30 papers I have read!"
★ "deft!"   ★ "self contained!"   ★ "gentle!"   ★ "easy to follow!"
★ "beautifully explained!"

"PLDI readers without lots of background in probability theory
should be able to follow; this is impressive"
1. Probabilistic programs denote distributions
2. Exact inference by transforming terms

do { a ← μ;
     b ← κ a;
     return (a, b) }

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation
When it works

do { x ← ···;
     y ← ···;
     z ← ···;
     return (f(x, y, z), …) }     where f is invertible
                                  (see the sketch after this list)

◮ y − 2·x,  y / x,  max(x, y),  …
◮ multivariate Gaussian distributions (for regression and dynamics)
◮ mixtures of distributions (for classifying points and documents)
◮ seismic event detection (Arora et al.)
◮ point masses' total momentum (Afshar et al.)
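The invertibility requirement licenses the key rewriting step: solve the observed expression for one latent variable and pay the Jacobian. As a one-variable sketch (my paraphrase of the recipe, not a formula from the talk), for an observation a = f(y) with f monotone on (0, 1):

\[
\int_0^1 h\bigl(f(y)\bigr)\,dy
  \;=\; \int_{\mathbb{R}} h(a)\,\bigl|(f^{-1})'(a)\bigr|\,
        \bigl[\,0 < f^{-1}(a) < 1\,\bigr]\,da .
\]

Instantiating f(y) = y − 2·x gives (f⁻¹)'(a) = 1 and the constraint 0 < a + 2·x < 1; instantiating f(y) = y / x gives (f⁻¹)'(a) = x, which is exactly the factor x above.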
Where it helps

prior —disintegrate→ posterior —···→ inference procedure
(maximum likelihood, Markov chain Monte Carlo, …)

Further disintegrate steps apply along the pipeline, even where
μ ≠ lebesgue or the programs involve arrays.
1. Probabilistic programs denote distributions
2. Exact inference by transforming terms

condition : α — e.g. the dependent variable of a regression,
a noisy measurement of location, the total momentum of point
masses, the detected amplitude of a seismic event, …

distribution —disintegrate→ conditional distribution

1. Motivate by puzzle
2. Specify by semantics
3. Implement by derivation