DEPARTMENT OF ENGINEERING SCIENCE Information, Control, and Vision Engineering Probabilistic Programming Frank Wood fwood@robots.ox.ac.uk http://www.robots.ox.ac.uk/~fwood MLSS 2014 April, 2014 Reykjavik TA : Yura Perov perov@robots.ox.ac.uk
Other People Who Could Give This Tutorial van de Meent Paige Mansinghka Pfeffer Perov Wingate Goodman Ritchie Stuhlmüller Russell Roy And others, with apologies …
What is Probabilistic Programming? [Diagram relating Computer Science and Statistics through Probabilistic Programming. Computer Science: Parameters, Program, Output. Statistics: Parameters θ with prior p(θ), Program p(X | θ), Observations X.]
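The relationship in the diagram can be written out as Bayes' rule. This is a standard identity added here for reference, using the slide's symbols θ (parameters) and X (observations); it is not taken from the slide itself.

```latex
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)},
\qquad
p(X) = \int p(X \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```

Running the program forward samples X given θ; inference runs it "backward" to characterize θ given X.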
Overarching Goals (i) Accelerate iteration over models - Inference is automatic - Writing generative code is easier than deriving model inverses - Lower technical barrier of entry to development of new models (ii) Accelerate iteration over inference procedures - Computer language is an abstraction barrier - Inference procedures can be tested against a library of models - Inference procedures become “compiler optimizations” (iii) Enable development of more expressive models - Probabilistic programs can express a superset of graphical models - Modern machine learning models are tens of lines of code
Tutorial Outline § Programming § Understanding § Practicum / Bayesian Nonparametrics
Programming § Systems § Problem Template § Syntax § Semantics § Simple examples § Interpreting output § Limitations § Demonstration § Exercises
Systems § Application driven § BUGS [Spiegelhalter et al, 1996] § STAN [Stan Dev. Team, 2013] § Infer.NET [Minka, Winn et al, 2010] § Other § IBAL/Figaro [Pfeffer, 2001/2009] § BLOG [Milch et al, 2004] § Turing-complete § Church [Goodman, Mansinghka, et al, 2008/2012] § Random Database [Wingate, Stuhlmüller et al, 2011] § Anglican [W. et al, AISTATS, 2014] § Probabilistic-C [Paige, W., to appear @ICML, 2014] § Venture [Mansinghka, et al, arXiv, 2014] And others, with apologies …
Anglican : A “Church” of England “Venture” (Perov, van de Meent, Mansinghka) http://www.robots.ox.ac.uk/~fwood/anglican/ Please report bugs to https://bitbucket.org/fwood/anglican/issues W., van de Meent, Mansinghka “A New Approach to Probabilistic Programming Inference” AISTATS 2014
Anglican § Applicability § Turing-complete probabilistic research programming language § Supports accurate inference in programs that make use of complex control flow, including stochastic recursion, and primitives from Bayesian nonparametric statistics § Actually useful now for small models! § Introduced Particle MCMC for prob. prog. inference § Theory suggests PMCMC, particularly particle Gibbs, has nice convergence properties * § Probabilistic programming violates most of the theory's assumptions § Improved performance over a wide variety of programs anyway § Opens path to massive scalability § Very simple to implement § Requires simple machine layer abstraction * Andrieu, Lee, and Vihola, “Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs Samplers”, 2013
Paige Next Step : Probabilistic-C

    #include <math.h>            /* NAN */
    #include "probabilistic.h"

    #define K 3
    #define N 11

    /* Markov transition matrix */
    static double T[K][K] = { { 0.1,  0.5,  0.4 },
                              { 0.2,  0.2,  0.6 },
                              { 0.15, 0.15, 0.7 } };

    /* Observed data; data[0] is NAN because the first state is not observed */
    static double data[N] = { NAN, .9, .8, .7, 0, -.025, -5, -2, -.1, 0, 0.13 };

    /* Prior distribution on initial state */
    static double initial_state[K] = { 1.0/3, 1.0/3, 1.0/3 };

    /* Per-state mean of Gaussian emission distribution */
    static double state_mean[K] = { -1, 1, 0 };

    /* Generative program for a HMM */
    int main(int argc, char **argv) {
        int states[N];
        for (int n = 0; n < N; n++) {
            /* Sample the state from the prior, or from the transition row of the previous state */
            states[n] = (n == 0) ? discrete_rng(initial_state, K)
                                 : discrete_rng(T[states[n-1]], K);
            /* Condition on the observation under a unit-variance Gaussian emission */
            if (n > 0) {
                observe(normal_lnp(data[n], state_mean[states[n]], 1));
            }
            /* Report the sampled state */
            predictf("state[%d],%d\n", n, states[n]);
        }
        return 0;
    }

Paige and W. “A Compilation Target for Probabilistic Programming Languages.” ICML, 2014
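To make the roles of observe and predict in the HMM program above concrete, here is a minimal plain-Python sketch under stated assumptions. It is not Probabilistic-C and not the PMCMC inference the slides describe: it swaps in simple likelihood weighting (run the program forward from the prior, sum the observe log-densities into a weight, and average the predicted states by weight). The names `run_once` and `posterior_state_marginals` are hypothetical; the constants are copied from the slide.

```python
import math
import random

K = 3
T = [[0.1, 0.5, 0.4], [0.2, 0.2, 0.6], [0.15, 0.15, 0.7]]
DATA = [float("nan"), 0.9, 0.8, 0.7, 0.0, -0.025, -5.0, -2.0, -0.1, 0.0, 0.13]
INITIAL_STATE = [1.0 / 3, 1.0 / 3, 1.0 / 3]
STATE_MEAN = [-1.0, 1.0, 0.0]


def normal_lnp(x, mean, std):
    """Log density of a Gaussian (same role as normal_lnp in the C program)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def run_once(rng):
    """One forward run: sample states from the prior, sum the observe terms."""
    states, log_weight = [], 0.0
    for n in range(len(DATA)):
        probs = INITIAL_STATE if n == 0 else T[states[n - 1]]
        states.append(rng.choices(range(K), weights=probs)[0])
        if n > 0:  # DATA[0] is NaN, i.e. not observed
            log_weight += normal_lnp(DATA[n], STATE_MEAN[states[n]], 1.0)
    return states, log_weight


def posterior_state_marginals(num_runs=20000, seed=0):
    """Weighted frequency of each state at each time step (likelihood weighting)."""
    rng = random.Random(seed)
    runs = [run_once(rng) for _ in range(num_runs)]
    max_lw = max(lw for _, lw in runs)  # subtract the max for numerical stability
    totals = [[0.0] * K for _ in DATA]
    norm = 0.0
    for states, lw in runs:
        w = math.exp(lw - max_lw)
        norm += w
        for n, s in enumerate(states):
            totals[n][s] += w
    return [[t / norm for t in row] for row in totals]


if __name__ == "__main__":
    for n, row in enumerate(posterior_state_marginals()):
        print(n, [round(p, 3) for p in row])
```

Likelihood weighting degrades quickly as sequences grow, which is one motivation for the particle-based methods the slides discuss.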
Paige Probabilistic-C = Compiled PMCMC ≈ 100 × Speedup § HMM 10-states, 50 observations § CRP 10 observation mixture of 1-D Gaussian Ritchie Compiled MH - https://github.com/dritchie/probabilistic-js
Paige Systems Research Path to Scalability Time to produce 10,000 samples running Probabilistic-C HMM code on multi-core EC2 instances with identical processor type while varying the number of particles (bars). Both more cores and more particles eventually degrade performance, suggesting that system-level optimizations exist for high-performance probabilistic programming inference.
Mansinghka Venture http://probcomp.csail.mit.edu/venture/ § Programming Language and Platform § Interactive § Programmable Inference § Compositional language for custom inference strategies § Path to scalability § Efficient execution trace re-use § Details § Introduced “directive” syntax and semantics § Tight Python integration § Syntax inspired Anglican’s; semantics currently differ slightly Mansinghka, Selsam, and Perov “Venture: a higher-order probabilistic programming platform with programmable inference” arXiv, 2014
Problem Template § Deterministic simulator exists as code § Parameter uncertainties exist § Varying parameters to simulator = stochastic simulator § What to do with observations? § Update estimates of parameters § Posterior predictions
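The template above can be sketched in a few lines of Python. Everything here is an illustrative assumption: `simulate` is a hypothetical stand-in for a black-box deterministic simulator (a linear spring), the prior is uniform on [1, 10], the observation noise is Gaussian, and inference is plain importance sampling from the prior rather than any method named in the slides.

```python
import math
import random


def simulate(stiffness, load):
    """Hypothetical deterministic simulator: displacement of a linear spring."""
    return load / stiffness


def posterior_over_stiffness(loads, observed, noise_std=0.1,
                             num_samples=20000, seed=0):
    """Sample stiffness from the prior and weight each sample by the data fit."""
    rng = random.Random(seed)
    samples, log_weights = [], []
    for _ in range(num_samples):
        k = rng.uniform(1.0, 10.0)  # prior on the uncertain parameter
        lw = sum(-0.5 * ((y - simulate(k, load)) / noise_std) ** 2
                 for load, y in zip(loads, observed))
        samples.append(k)
        log_weights.append(lw)
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return samples, [w / total for w in weights]


def posterior_mean(samples, weights):
    """Updated estimate of the parameter."""
    return sum(w * s for s, w in zip(samples, weights))


def posterior_prediction(samples, weights, new_load):
    """Posterior predictive mean of the simulator output at a new input."""
    return sum(w * simulate(s, new_load) for s, w in zip(samples, weights))


if __name__ == "__main__":
    ks, ws = posterior_over_stiffness([2.0, 4.0, 8.0], [0.5, 1.0, 2.0])
    print(posterior_mean(ks, ws), posterior_prediction(ks, ws, 12.0))
```

The simulator is only ever called, never differentiated or inspected, which matches the black-box setting of the template.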
Houlsby Example : Jack-Up Units [Figure: photographs of jack-up units, labelled 60m; credits Keppel FELS, Maersk] Slide from Houlsby
Jack-up operations [Figure: sequence of sketches after Poulos (1988): Float to site, Lower legs, Light ship load, Preload, Dump preload, Climb to air-gap and operate, Storm] Slide from Houlsby
Spudcan Simulator + Probabilistic-C -> Inference § Deterministic simulation § ~750 lines of C code § 10s to 100s of parameters § Black-box § Not differentiable § Stochastic simulation § +150 lines of C code § Priors on parameters § Automatic inference § +15 lines of Probabilistic-C § ~1000 samples / second
Parameter Posterior vs. Expert [Figure: undrained strength (kPa) plotted against depth (m); legend: UU, Mini Vane, Torvane, Pocket penetrometer, Expert's fit, Probabilistic Programming]
Inverse Graphics via Venture • Fits Template • Generative scene model as program • Deterministic simulator (renderer) • Automatic inversion [Figure: panels (a) to (e)] Mansinghka, Kulkarni, Perov, and Tenenbaum “Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs.” NIPS, 2013
Basic Probabilistic Programming Concepts § Procedures “sample” § Programs are generative models § Mixed deterministic and stochastic procedures § Not generally differentiable w.r.t. parameters § No factor graph correspondence in general § May have nondeterministic random variable cardinality
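As a small illustration of the last point, here is a Python sketch of a stochastic recursion. The function `geometric` is a hypothetical example, not code from the slides: each call makes a fresh random choice, so the number of random variables in a run is itself random and cannot be fixed in advance as in a static graphical model.

```python
import random


def geometric(rng, p):
    """Number of failures before the first success, by stochastic recursion.

    A run that returns k has made k + 1 random choices.
    """
    if rng.random() < p:
        return 0
    return 1 + geometric(rng, p)


def draw_many(p=0.5, num_draws=10000, seed=0):
    """Repeat the recursive sampler to see how the recursion depth varies."""
    rng = random.Random(seed)
    return [geometric(rng, p) for _ in range(num_draws)]


if __name__ == "__main__":
    draws = draw_many()
    print("distinct depths:", sorted(set(draws)))
    print("mean:", sum(draws) / len(draws))  # expectation is (1 - p) / p
```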
Writing Probabilistic Programs § Syntax § Directives § Expressions § Semantics § Via examples
Syntax : Directives [assume symbol expr] [observe expr value] [predict expr] [observe-csv url csv-expr csv-value] [import url] http://www.robots.ox.ac.uk/~fwood/anglican/language/
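One way to read the three core directives is sketched below in plain Python: assume introduces a random variable, observe conditions the run on data, and predict reports a quantity of interest. The Anglican-style directives in the comments are illustrative only (in particular, treating the second argument of `normal` as a standard deviation is an assumption), and the inference shown is simple likelihood weighting, not Anglican's PMCMC. The function names are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def run_program(rng):
    """One forward run of a three-directive program."""
    mu = rng.gauss(0.0, 1.0)                   # [assume mu (normal 0 1)]
    log_weight = normal_logpdf(2.0, mu, 1.0)   # [observe (normal mu 1) 2.0]
    return mu, log_weight                      # [predict mu]


def predicted_posterior_mean(num_runs=20000, seed=0):
    """Average the predicted value over runs, weighted by the observe terms."""
    rng = random.Random(seed)
    runs = [run_program(rng) for _ in range(num_runs)]
    top = max(lw for _, lw in runs)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for _, lw in runs]
    return sum(w * mu for w, (mu, _) in zip(weights, runs)) / sum(weights)


if __name__ == "__main__":
    # For this conjugate model the exact posterior mean is 1.0.
    print(predicted_posterior_mean())
```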
Syntax : Expressions (Lisp/Scheme) expr = literal | symbol | list literal = boolean | long | rational | double | string | nil list = () | (keyword & exprs) | (proc & exprs) keyword = quote | define | lambda | if | cond | let | begin http://www.robots.ox.ac.uk/~fwood/anglican/language/
Keyword : quote 'expr <=> (quote expr) => expr A quoted expression. Yields an unevaluated expression. http://www.robots.ox.ac.uk/~fwood/anglican/language/
Keyword : lambda (lambda (& symbols) body) => compound procedure (lambda symbol body) => compound procedure Constructs a compound procedure. Example ((lambda (n m) (* (+ n 1) m)) 1 2) => 4 http://www.robots.ox.ac.uk/~fwood/anglican/language/
Keyword : if (if bool-expr cons-expr alt-expr) Example (if (= 1 1) "the predicate is true" "the predicate is false") => "the predicate is true" http://www.robots.ox.ac.uk/~fwood/anglican/language/
Keywords : cond, let, begin (cond (pred-1 cons-1) (pred-2 cons-2) (else alt)) (let ((a 1) (b 2)) (prn "hello world") (+ a b)) (begin & exprs) http://www.robots.ox.ac.uk/~fwood/anglican/language/
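The expression grammar and keywords above are compact enough to sketch as a toy evaluator in Python. This is an illustrative assumption-laden sketch, not Anglican's implementation: it handles only numeric literals, symbols, and lists; it omits strings, `prn`, `define`, and the `'expr` shorthand; and it evaluates `let` bindings sequentially, which may differ from Anglican's semantics. All function names are hypothetical.

```python
import operator


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of the token list (consumed in place)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # anything else is a symbol


GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "/": operator.truediv, "=": operator.eq,
              "<": operator.lt, ">": operator.gt}


def evaluate(expr, env):
    if isinstance(expr, str):        # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):   # literal: evaluates to itself
        return expr
    if not expr:                     # the empty list ()
        return []
    head = expr[0]
    if head == "quote":
        return expr[1]
    if head == "if":
        _, test, cons, alt = expr
        return evaluate(cons if evaluate(test, env) else alt, env)
    if head == "cond":
        for pred, cons in expr[1:]:
            if pred == "else" or evaluate(pred, env):
                return evaluate(cons, env)
        return None
    if head == "lambda":
        _, params, body = expr
        if isinstance(params, str):  # (lambda symbol body): bind all arguments as a list
            return lambda *args: evaluate(body, {**env, params: list(args)})
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    if head == "let":
        _, bindings, *body = expr
        local = dict(env)
        for name, value_expr in bindings:
            local[name] = evaluate(value_expr, local)
        return evaluate(["begin", *body], local)
    if head == "begin":
        result = None
        for sub in expr[1:]:
            result = evaluate(sub, env)
        return result
    proc = evaluate(head, env)       # procedure application
    return proc(*[evaluate(arg, env) for arg in expr[1:]])


def run(source):
    return evaluate(parse(tokenize(source)), GLOBAL_ENV)


if __name__ == "__main__":
    print(run("((lambda (n m) (* (+ n 1) m)) 1 2)"))  # → 4
    print(run("(let ((a 1) (b 2)) (+ a b))"))         # → 3
```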
Primitives tests: nil?, some?, symbol?, number?, ratio?, long?, float?, boolean?, even?, odd?, proc? relational: and, or, not=, =, >, >=, <, <= casting: long, double, boolean, str, read-string sequences: list, car, cdr, first, second, nth, rest, count, cons, unique arithmetic: +, -, *, / math: log, log10, exp, pow, sqrt, cbrt, floor, ceil, round, rint, abs, signum, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, inc, dec, mod, sum, cumsum, mean, normalize io: prn http://www.robots.ox.ac.uk/~fwood/anglican/language/