  1. Semantics for Probabilistic Programming Chris Heunen 1 / 27

  3. Bayes’ law: P(A|B) = P(B|A) × P(A) / P(B)
  Bayesian reasoning:
  - predict the future, based on the model and prior evidence
  - infer causes, based on the model and posterior evidence
  - learn a better model, based on the prior model and evidence
  2 / 27
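As a quick sanity check of Bayes’ law, here is a small Python sketch; the disease-testing numbers are illustrative assumptions, not from the talk:

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 95% specificity
p_disease = 0.01                  # prior P(A)
p_pos_given_disease = 0.90        # likelihood P(B|A)
# total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = p_pos_given_disease * p_disease + 0.05 * (1 - p_disease)

posterior = bayes(p_pos_given_disease, p_disease, p_pos)
print(round(posterior, 3))  # 0.154
```

Even a fairly accurate test yields a small posterior here, because the prior P(A) is small — exactly the interplay the formula makes explicit.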

  4. Bayesian networks 3 / 27

  5. Bayesian inference 4 / 27

  6. Bayesian data modelling
  1. Develop a probabilistic (generative) model
  2. Design an inference algorithm for the model
  3. Use the algorithm to fit the model to data
  Example: find the effect of a drug on a patient, given data
  5 / 27

  7. Linear regression
  Generative model:
  s ∼ normal(0, 2)
  b ∼ normal(0, 6)
  f(x) = s · x + b
  y_i ∼ normal(f(i), 0.5) for i = 0 … 6
  Conditioning:
  y_0 = 0.6, y_1 = 0.7, y_2 = 1.2, y_3 = 3.2, y_4 = 6.8, y_5 = 8.2, y_6 = 8.4
  Predict f
  6 / 27
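The generative model above can be simulated directly. This Python sketch (an illustration, not the talk's code) draws one prior-predictive dataset:

```python
import random

random.seed(0)

def prior_predictive():
    """One draw from the slide's generative model:
    s ~ normal(0, 2), b ~ normal(0, 6), y_i ~ normal(s*i + b, 0.5)."""
    s = random.gauss(0.0, 2.0)   # slope prior
    b = random.gauss(0.0, 6.0)   # intercept prior
    f = lambda x: s * x + b
    return [random.gauss(f(i), 0.5) for i in range(7)]

ys = prior_predictive()
print(len(ys))  # 7 simulated observations y_0 .. y_6
```

Conditioning then amounts to keeping only those (s, b) that make the simulated y_i agree with the observed values, which is what the inference algorithm approximates.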

  8. Linear regression 7 / 27

  10. Probabilistic programming
  1. Develop a probabilistic (generative) model → write a program
  2. Design an inference algorithm for the model → use a built-in algorithm to fit the model to data
  P(A|B) ∝ P(B|A) × P(A)
  posterior ∝ likelihood × prior
  functional programming + observe + sample
  8 / 27

  12. Linear regression
  (defquery Bayesian-linear-regression
    (let [f (let [s (sample (normal 0.0 3.0))
                  b (sample (normal 0.0 3.0))]
              (fn [x] (+ (* s x) b)))]
      (observe (normal (f 1.0) 0.5) 2.5)
      (observe (normal (f 2.0) 0.5) 3.8)
      (observe (normal (f 3.0) 0.5) 4.5)
      (observe (normal (f 4.0) 0.5) 6.2)
      (observe (normal (f 5.0) 0.5) 8.0)
      (predict :f f)))
  9 / 27
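A rough Python analogue of this query, using plain likelihood weighting rather than Anglican's built-in inference (the function names and estimator are assumptions of this sketch, not the talk's):

```python
import math
import random

random.seed(1)

# (x, y) observation pairs from the query above
DATA = [(1.0, 2.5), (2.0, 3.8), (3.0, 4.5), (4.0, 6.2), (5.0, 8.0)]

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_mean_slope(n=20_000):
    """Likelihood weighting: draw (s, b) from the normal(0, 3) priors,
    weight each draw by the product of observation densities."""
    num = den = 0.0
    for _ in range(n):
        s = random.gauss(0.0, 3.0)
        b = random.gauss(0.0, 3.0)
        w = 1.0
        for x, y in DATA:
            w *= normal_pdf(y, s * x + b, 0.5)
        num += w * s
        den += w
    return num / den

est = posterior_mean_slope()
print(round(est, 1))  # close to the least-squares slope ≈ 1.34
```

The weighted average over prior draws approximates the posterior mean of the slope; with this nearly linear data it lands near the ordinary least-squares fit.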

  13. Linear regression 10 / 27

  14. Linear regression 11 / 27

  17. Measure theory
  It is impossible to sample exactly 0.5 from the standard normal distribution, but a sample lands in the interval (0, 1) with probability around 0.34.
  A measurable space is a set X with a family Σ_X of subsets that is closed under countable unions and complements.
  A (probability) measure on X is a function p : Σ_X → [0, ∞] that satisfies p(⋃ U_n) = Σ p(U_n) for pairwise disjoint U_n (and has p(X) = 1).
  A function f : X → Y is measurable if f⁻¹(U) ∈ Σ_X for all U ∈ Σ_Y.
  A random variable is a measurable function R → X.
  12 / 27
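The opening claim is easy to check numerically. This Python sketch estimates the probability that a standard normal sample lands in (0, 1); the true value is Φ(1) − Φ(0) ≈ 0.3413:

```python
import random

random.seed(0)

# No single point (like 0.5) is hit with positive probability,
# but intervals are: estimate P(0 < Z < 1) for Z ~ N(0, 1).
n = 100_000
hits = sum(1 for _ in range(n) if 0.0 < random.gauss(0.0, 1.0) < 1.0)
print(round(hits / n, 2))  # ≈ 0.34
```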

  19. Function types
  Currying: a morphism f : Z × X → Y should correspond to a morphism f̂ : Z → [X → Y] such that ev ∘ (f̂ × id_X) = f, where ev : [X → Y] × X → Y is evaluation.
  But [R → R] cannot be a measurable space!
  13 / 27

  25. Quasi-Borel spaces
  A quasi-Borel space is a set X together with M_X ⊆ [R → X] satisfying:
  - α ∈ M_X if α : R → X is constant
  - α ∘ φ ∈ M_X if α ∈ M_X and φ : R → R is measurable
  - if R = ⋃_{n ∈ N} S_n, with each set S_n Borel, and α_1, α_2, … ∈ M_X, then β is in M_X, where β(r) = α_n(r) for r ∈ S_n
  A morphism is a function f : X → Y with f ∘ α ∈ M_Y whenever α ∈ M_X.
  - has product types
  - has sum types
  - has function types! M_[X→Y] = { α : R → [X → Y] | α̂ : R × X → Y is a morphism }
  14 / 27
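The countable-gluing axiom can be illustrated concretely. This Python sketch uses the hypothetical Borel partition S_n = [n, n+1) of R (an illustrative choice, not from the talk) to glue a family of maps α_n into one map β:

```python
import math

def glue(alphas):
    """Glue maps along the partition S_n = [n, n+1) of R.
    alphas: function n -> (R -> X). Returns beta with
    beta(r) = alphas(n)(r) for r in S_n."""
    def beta(r):
        n = math.floor(r)   # r lies in S_n = [n, n+1)
        return alphas(n)(r)
    return beta

# Gluing the constant maps alpha_n = (lambda r: n) yields the floor map,
# which the axiom says must again belong to M_X.
beta = glue(lambda n: (lambda r: n))
print(beta(1.5), beta(-0.25))  # 1 -1
```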

  28. Example quasi-Borel spaces
  The forgetful functor Qbs → Set, (X, M_X) ↦ X, has a left adjoint sending X to (X, { case S_n . x_n | {S_n} a Borel partition of R, x_n ∈ X }), the piecewise-constant maps, and a right adjoint sending X to (X, { all α : R → X }).
  There is also an adjunction between Meas and Qbs: (X, Σ_X) ↦ (X, { measurable α : R → X }) one way, and (X, M_X) ↦ (X, { U | ∀ α ∈ M_X : α⁻¹(U) measurable }) the other.
  15 / 27

  30. Distribution types
  A measure on a quasi-Borel space (X, M_X) consists of
  - α ∈ M_X and
  - a probability measure μ on R
  Two measures are identified when they induce the same pushforward μ(α⁻¹(−)).
  This gives a monad P for distribution types:
  - P(X, M_X) = { (α, μ) measure on (X, M_X) } / ∼
  - return x = [λr. x, μ]_∼ for an arbitrary μ
  - bind uses the integral ∫ f d(α, μ) := ∫ (f ∘ α) dμ for f : (X, M_X) → R
  16 / 27
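The integral ∫ f d(α, μ) = ∫ (f ∘ α) dμ can be approximated by Monte Carlo. This Python sketch represents a measure by a sampler for μ together with α (the uniform pushforward example is illustrative, not from the talk):

```python
import random

random.seed(0)

def integrate(f, alpha, mu_sample, n=100_000):
    """Estimate int f d(alpha, mu) = int (f . alpha) dmu:
    push mu-samples through alpha, then average f."""
    return sum(f(alpha(mu_sample())) for _ in range(n)) / n

# Pushforward of the uniform measure on [0, 1] along alpha(r) = 2r
# is the uniform measure on [0, 2]; its mean is 1.
mean = integrate(lambda x: x, lambda r: 2 * r, random.random)
print(round(mean, 1))  # ≈ 1.0
```

Identifying (α, μ) pairs with the same pushforward is exactly what makes estimates like this well defined: only μ(α⁻¹(−)) matters.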

  31. Example: facts about distributions
  ⟦ let x = sample(gauss(0.0,1.0)) in return (x < 0) ⟧ = ⟦ sample(bern(0.5)) ⟧
  17 / 27
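A quick Monte Carlo check of this equation in Python: the indicator of a negative standard normal draw should be a fair coin flip:

```python
import random

random.seed(0)

# Sampling gauss(0,1) and returning (x < 0) denotes the same
# distribution as sampling bern(0.5), by symmetry of the normal.
n = 100_000
p = sum(1 for _ in range(n) if random.gauss(0.0, 1.0) < 0.0) / n
print(round(p, 2))  # ≈ 0.5
```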

  32. Example: importance sampling
  ⟦ sample(exp(2)) ⟧ = ⟦ let x = sample(gauss(0,1)) in observe(exp-pdf(2,x)/gauss-pdf(0,1,x)); return x ⟧
  18 / 27
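The right-hand program can be run as self-normalized importance sampling. A Python sketch (note that the normal proposal's light right tail makes this a high-variance estimator in theory, which is fine for a demo):

```python
import math
import random

random.seed(0)

def exp_pdf(rate, x):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def gauss_pdf(mu, sigma, x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Sample from gauss(0,1), weight by exp-pdf(2,x)/gauss-pdf(0,1,x);
# the reweighted samples should recover Exp(2) statistics,
# e.g. its mean 1/rate = 0.5.
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ws = [exp_pdf(2.0, x) / gauss_pdf(0.0, 1.0, x) for x in xs]
est = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
print(round(est, 1))  # ≈ 0.5
```

Negative draws get weight zero (the exponential density vanishes there), so the observe statement effectively restricts the proposal to the target's support.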

  33. Example: conjugate priors
  ⟦ let x = sample(beta(1,1)) in observe(bern(x), true); return x ⟧ = ⟦ observe(bern(0.5), true); let x = sample(beta(2,1)) in return x ⟧
  19 / 27
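A numerical check of this conjugacy in Python: weighting uniform samples (Beta(1,1)) by the likelihood x of observing true should reproduce Beta(2,1) statistics, whose mean is 2/3 (the observe(bern(0.5), true) on the right is the model evidence 1/2, which normalization cancels):

```python
import random

random.seed(0)

n = 200_000
xs = [random.random() for _ in range(n)]   # Beta(1,1) = Uniform(0,1) prior
ws = xs                                    # likelihood of observing true given x
post_mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
print(round(post_mean, 2))  # ≈ 0.67, the mean of Beta(2,1)
```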

  34. Linear regression
  (defquery Bayesian-linear-regression
    ;; Prior:
    (let [f (let [s (sample (normal 0.0 3.0))
                  b (sample (normal 0.0 3.0))]
              (fn [x] (+ (* s x) b)))]
      ;; Likelihood:
      (observe (normal (f 1.0) 0.5) 2.5)
      (observe (normal (f 2.0) 0.5) 3.8)
      (observe (normal (f 3.0) 0.5) 4.5)
      (observe (normal (f 4.0) 0.5) 6.2)
      (observe (normal (f 5.0) 0.5) 8.0)
      ;; Posterior:
      (predict :f f)))
  20 / 27

  35. Linear regression: prior
  Define a prior measure on [R → R]:
  ⟦ (let [f (let [s (sample (normal 0.0 3.0))
                  b (sample (normal 0.0 3.0))]
              (fn [x] (+ (* s x) b)))] ⟧
  = [α, ν ⊗ ν]_∼ ∈ P([R → R])
  where ν is the normal distribution with mean 0 and standard deviation 3, and α : R × R → [R → R] is (s, b) ↦ λr. s·r + b
  21 / 27

  36. Linear regression: likelihood
  Define the likelihood of the observations (with some noise):
  ⟦ (observe (normal (f 1.0) 0.5) 2.5)
    (observe (normal (f 2.0) 0.5) 3.8)
    (observe (normal (f 3.0) 0.5) 4.5)
    (observe (normal (f 4.0) 0.5) 6.2)
    (observe (normal (f 5.0) 0.5) 8.0) ⟧
  = d(f(1), 2.5) · d(f(2), 3.8) · d(f(3), 4.5) · d(f(4), 6.2) · d(f(5), 8.0)
  where f is a free variable of type [R → R], and d : R² → [0, ∞) is the density of the normal distribution with standard deviation 0.5:
  d(μ, x) = √(2/π) · exp(−2(x − μ)²)
  22 / 27

  37. Linear regression: posterior
  Normalise the combined prior and likelihood:
  ⟦ (predict :f f))) ⟧ ∈ P([R → R])
  23 / 27

  38. Piecewise linear regression: posterior
  Normalise the combined prior and likelihood:
  ⟦ (predict :f f))) ⟧ ∈ P([R → R])
  24 / 27

  39. Modular inference algorithms
  An inference representation is a monad (T, return, >>=) together with a morphism T X → P X, sample : 1 → T [0, 1], and score : [0, ∞) → T 1.
  - Discrete weighted sampler (e.g. coin flip)
  - Continuous sampler
  25 / 27
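A minimal Python sketch of such an inference representation, taking T X to be a weighted sampler: a thunk returning a value together with an accumulated score (all names here are illustrative, not from the talk):

```python
import random

random.seed(0)

# T X = () -> (value, weight), with monad structure plus sample and score.

def ret(x):
    return lambda: (x, 1.0)

def bind(t, f):
    def run():
        x, w1 = t()
        y, w2 = f(x)()
        return y, w1 * w2      # weights multiply along the run
    return run

def sample():
    return lambda: (random.random(), 1.0)   # uniform draw on [0, 1]

def score(w):
    return lambda: ((), w)                  # unit value, multiply in weight w

# Tiny program: sample x, score by likelihood x**2, return x
prog = bind(sample(), lambda x: bind(score(x * x), lambda _: ret(x)))

n = 200_000
pairs = [prog() for _ in range(n)]
post_mean = sum(w * x for x, w in pairs) / sum(w for _, w in pairs)
print(round(post_mean, 2))  # ≈ 0.75 (= E[x^3]/E[x^2] for uniform x)
```

The morphism T X → P X is what the weighted average implements here: normalizing the scores turns a run of the sampler into (an approximation of) the denoted distribution.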
