
Probabilistic Programming. Hongseok Yang, University of Oxford.



  1. Probabilistic Programming Hongseok Yang University of Oxford

  2. Manchester Univ. 1953

  3. Manchester Univ. 1953

  4. Manchester Univ. 1953

  5. Manchester Univ. 1953. A letter produced by Strachey’s “Love Letter” program (1952) on the Manchester Univ. computer; the example shown was generated by the reimplementation at http://www.gingerbeardman.com/loveletter/

  6. Strachey’s program. It implements a simple randomised algorithm: 1. Randomly pick two opening words. 2. Repeat the following five times: • Pick a sentence structure randomly. • Fill the structure with random words. 3. Randomly pick closing words.
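A minimal sketch of this recipe in Clojure. The word lists below are invented stand-ins, not Strachey’s originals, and the two sentence structures are simplifications:

    (def salutations ["DARLING" "DEAR" "HONEY" "JEWEL" "LOVE" "SWEETHEART"])
    (def adjectives  ["AFFECTIONATE" "AMOROUS" "BEAUTIFUL" "PRECIOUS" "TENDER"])
    (def nouns       ["DESIRE" "DEVOTION" "FANCY" "HEART" "LONGING" "PASSION"])
    (def verbs       ["ADORES" "CHERISHES" "DESIRES" "TREASURES" "WANTS"])

    (defn sentence []
      ;; Pick one of two sentence structures at random and fill it with random words.
      (if (< (rand) 0.5)
        (str "MY " (rand-nth adjectives) " " (rand-nth nouns) " "
             (rand-nth verbs) " YOUR " (rand-nth adjectives) " " (rand-nth nouns) ".")
        (str "YOU ARE MY " (rand-nth adjectives) " " (rand-nth nouns) ".")))

    (defn love-letter []
      (str (rand-nth salutations) " " (rand-nth salutations) ",\n"   ;; 1. two opening words
           (apply str (interpose " " (repeatedly 5 sentence)))       ;; 2. five random sentences
           "\nYOURS " (rand-nth adjectives) ", M.U.C."))             ;; 3. closing words

    (println (love-letter))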

  7. Strachey’s program, and two ways to go further: 1. more randomness (e.g. repeat a random N times rather than exactly five); 2. adjust the randomness using data. The algorithm: 1. Randomly pick two opening words. 2. Repeat the following five times: • Pick a sentence structure randomly. • Fill the structure with random words. 3. Randomly pick closing words.

  8. (Same as slide 7.)

  9. What is probabilistic programming?

  10. (Bayesian) probabilistic modelling of data 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  11. (Bayesian) probabilistic modelling of data in a prob. prog. language 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  12. (Bayesian) probabilistic modelling of data in a prob. prog. language as a program 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data.

  13. (Bayesian) probabilistic modelling of data in a prob. prog. language as a program 1. Develop a new probabilistic (generative) model. 2. Design an inference algorithm for the model. 3. Using the algo., fit the model to the data. a generic inference algo. of the language

  14. Line fitting. [scatter plot of data points; axes X and Y]

  15. Line fitting. [the same plot with a fitted line f(x) = s*x + b; axes X and Y]

  16. Bayesian generative model. [graphical model: parameter nodes s and b feeding the observation nodes y_i, plate i = 1..5]

  17. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 .. y_5?

  18. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 .. y_5?

  19. Bayesian generative model. s ~ normal(0, 10); b ~ normal(0, 10); f(x) = s*x + b; y_i ~ normal(f(i), 1) for i = 1..5. Q: posterior of (s, b) given y_1 = 2.5, …, y_5 = 10.1?
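To make the model concrete, here is a plain-Clojure forward simulation of it (not Anglican). sample-normal is a helper defined here via the Box-Muller transform, and normal(m, s) is read as mean m and standard deviation s:

    (defn sample-normal [mu sigma]
      ;; One draw from normal(mu, sigma) via the Box-Muller transform.
      (+ mu (* sigma
               (Math/sqrt (* -2.0 (Math/log (- 1.0 (rand)))))
               (Math/cos (* 2.0 Math/PI (rand))))))

    (defn simulate-once []
      ;; Draw s and b from their priors, then the five noisy observations y_1..y_5.
      (let [s (sample-normal 0 10)
            b (sample-normal 0 10)
            f (fn [x] (+ (* s x) b))]
        {:s s :b b :ys (mapv #(sample-normal (f %) 1) (range 1 6))}))

Running simulate-once repeatedly shows what the prior believes before any data arrives: lines with widely varying slopes and intercepts.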

  20. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  21. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  22. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  23. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)

  24. Posterior of s and b given the y_i's:  P(s, b | y_1, .., y_5) = P(y_1, .., y_5 | s, b) × P(s, b) / P(y_1, .., y_5)
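Spelled out for this model (reading normal(m, s) as mean m and standard deviation s), the unnormalised posterior is

    P(s, b | y_1, .., y_5)  ∝  N(s; 0, 10^2) × N(b; 0, 10^2) × ∏_{i=1..5} N(y_i; s·i + b, 1)

and the denominator P(y_1, .., y_5) is the integral of this product over all s and b.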

  25. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        f)

  26. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :f f))

  27. Anglican program
      (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :sb [s b]))

  28. Anglican program (same as slide 27, predicting [s b]).
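What a generic inference algorithm has to approximate here can also be sketched without Anglican: the simplest scheme is importance sampling with the prior as proposal, weighting each prior draw of (s, b) by the likelihood of the five observed values. A rough sketch in plain Clojure, reusing sample-normal from the earlier sketch; it illustrates the idea rather than Anglican’s actual inference engines:

    (def ys [2.5 3.8 4.5 8.9 10.1])

    (defn log-normal-pdf [x mu sigma]
      ;; Log density of normal(mu, sigma) at x.
      (- (/ (Math/pow (/ (- x mu) sigma) 2) -2.0)
         (Math/log (* sigma (Math/sqrt (* 2.0 Math/PI))))))

    (defn log-likelihood [s b]
      ;; log P(y_1 .. y_5 | s, b) with y_i ~ normal(s*i + b, 1).
      (reduce + (map-indexed (fn [i y] (log-normal-pdf y (+ (* s (inc i)) b) 1.0)) ys)))

    (defn weighted-posterior-samples [n]
      ;; Prior draws of (s, b), each carrying a log importance weight.
      (repeatedly n
        (fn []
          (let [s (sample-normal 0 10)
                b (sample-normal 0 10)]
            {:s s :b b :log-weight (log-likelihood s b)}))))

    ;; Posterior mean of s, estimated with self-normalised weights.
    (let [samples (weighted-posterior-samples 100000)
          max-lw  (apply max (map :log-weight samples))
          ws      (map #(Math/exp (- (:log-weight %) max-lw)) samples)]
      (/ (reduce + (map * ws (map :s samples)))
         (reduce + ws)))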

  29. Samples from the posterior. [plot of fitted lines through the data; axes X and Y]

  30. Why should one care about prob. programming?

  31. My favourite answer “Because probabilistic programming is a good way to build an AI.” (My ML colleague)

  32. Procedural modelling SOSMC-Controlled Sampling Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  33. Procedural modelling Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  34. Procedural modelling Asynchronous function call via future Ritchie, Mildenhall, Goodman, Hanrahan [SIGGRAPH’15]

  35. Captcha solving. [figure: captcha examples with varying noise] Le, Baydin, Wood [2016]

  36. [diagram] Compilation (expensive / slow): training data simulated from the probabilistic program p(x, y), together with an NN architecture, are used to train a compilation artifact q(x | y; φ) by minimising D_KL(p(x | y) || q(x | y; φ)). Inference (cheap / fast): given test data y, the artifact q(x | y; φ) drives SIS to approximate the posterior. Le, Baydin, Wood [2016]

  37. Approximating prob. programs by neural nets. [same diagram as slide 36: compilation trains q(x | y; φ) from the program p(x, y); inference runs SIS with q on the test data y] Le, Baydin, Wood [2016]
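Written out (with the notation reconstructed: x the latent variables, y the observations, φ the network weights), the compilation objective in the diagram reduces to maximum likelihood on data simulated from the program itself:

    E_{p(y)}[ D_KL( p(x | y) || q(x | y; φ) ) ]
      = E_{p(x,y)}[ log p(x | y) − log q(x | y; φ) ]
      = E_{p(x,y)}[ −log q(x | y; φ) ] + const,

so training only needs (x, y) pairs obtained by running the probabilistic program forward; at test time the trained q(x | y; φ) serves as the proposal for sequential importance sampling (SIS) on the observed y.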

  38. Nonparametric Bayes: the Indian buffet process
      (define (ibp-stick-breaking-process concentration base-measure)
        (let ((sticks (mem (lambda (j) (random-beta 1.0 concentration))))
              (atoms  (mem (lambda (j) (base-measure)))))
          (lambda ()
            (let loop ((j 1) (dualstick (sticks 1)))
              (append
               (if (flip dualstick)   ;; with prob. dualstick
                   (atoms j)          ;; add feature j
                   '())               ;; otherwise, next stick
               (loop (+ j 1) (* dualstick (sticks (+ j 1)))))))))
      Roy et al. 2008

  39. Nonparametric Bayes: the Indian buffet process. mem gives a lazy infinite array: sticks and atoms are memoised random functions of the index j, sampled on first use and then fixed. (Code as on slide 38.) Roy et al. 2008

  40. Nonparametric Bayes: a higher-order Indian buffet process. The base measure is passed in as a parameter (base-measure), so the process is itself a higher-order function. (Code as on slide 38.) Roy et al. 2008

  41. My research : Denotational semantics Joint work with Chris Heunen, Ohad Kammar, Sam Staton, Frank Wood [LICS 2016]

  42. (let [s (sample (normal 0 10))
            b (sample (normal 0 10))
            f (fn [x] (+ (* s x) b))]
        (observe (normal (f 1) 1) 2.5)
        (observe (normal (f 2) 1) 3.8)
        (observe (normal (f 3) 1) 4.5)
        (observe (normal (f 4) 1) 8.9)
        (observe (normal (f 5) 1) 10.1)
        (predict :sb [s b]))

  43. The same program as on slide 42, now also with (predict :f f) to predict the function f itself.

  44. The same program with (predict :f f): it generates a random function of type R → R, but its mathematical meaning is not clear.

  45. Measurability issue • Measure theory is the foundation of probability theory that avoids paradoxes. • It is silent about higher-order functions. • [Halmos] ev(f, a) = f(a) is not measurable. • The category of measurable spaces (Meas) is not cartesian closed (not a CCC). • But Anglican supports higher-order functions.
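To make the third bullet precise (a standard formulation of the result, not a quotation from the slide): there is no σ-algebra on the set M of measurable functions f : R → R for which the evaluation map

    ev : M × R → R,   ev(f, a) = f(a)

is measurable. In particular Meas has no exponential object R^R, which is what “not a CCC” means here.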

  46. Use category theory to extend measure theory. [diagram: the monad, drawn as an arrow Meas → Meas]

  47. Use category theory to extend measure theory. [diagram: the monad Meas → Meas, with the Yoneda embedding Meas → [Meas^op, Set]_∏ on each side]

  48. Use category theory to extend measure theory. [diagram: the monad on Meas is carried over to [Meas^op, Set]_∏ by left Kan extension along the Yoneda embedding]

  49. Use category theory to extend measure theory. [same diagram as slide 48]
