Stan: Probabilistic Modeling Language, MCMC Sampler, and Optimizer
Development Team: Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell
MCMski 2014
mc-stan.org
Goals / Aims
• Scalability – model complexity, number of parameters, data size
• Efficiency – fast iterations, low memory, high effective sample sizes
• Robustness – numerical routines, model structure (i.e., posterior geometry)
• Usability – general purpose, clear modeling language, integration (R, Python, command line), exposed log prob & gradients/Hessians & I/O
History
• Derived from BUGS
• declarative → imperative
• untyped → strong static typing
• Gibbs sampling → adaptive (R)HMC & optimization
• interpreted → compiled
• restrictive licenses (proprietary/GPL) → liberal (BSD)
Technical Implementation
• Model Specification
  – (trans) data, (trans) parameters, log prob, generated quantities (see the sketch after this list)
• Sampling via Adaptive Hamiltonian Monte Carlo
  – warmup converges & estimates mass matrix and step size
  – (Geo)NUTS adapts the number of steps
• Optimization via BFGS Quasi-Newton
• Translated to C++ with Template Metaprogramming
  – constraints to transforms + Jacobians; declarations to I/O
  – automatic differentiation for gradients & Hessians
  – custom probability and special functions
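As a concrete illustration of the block structure above, here is a minimal Bernoulli program (a generic sketch in present-day Stan syntax, not code from the slides): the constrained declaration of theta is the kind of declaration the compiler turns into an unconstrained transform plus Jacobian adjustment, the model block defines the log probability, and generated quantities produces a posterior predictive draw.

data {
  int<lower=0> N;                    // number of trials
  array[N] int<lower=0, upper=1> y;  // binary outcomes
}
parameters {
  real<lower=0, upper=1> theta;      // constraint becomes a logit transform + Jacobian
}
model {
  theta ~ beta(1, 1);                // prior adds to the log probability
  y ~ bernoulli(theta);              // likelihood adds to the log probability
}
generated quantities {
  int y_rep = bernoulli_rng(theta);  // posterior predictive draw per iteration
}

The same program can be run through the sampler or the optimizer from R, Python, or the command line.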
Strengths
• high effective sample size/second (HMC / RHMC)
• expressive language vs. BUGS; extensible like JAGS
• extensive documentation & example models
• active, helpful user community
• large, diverse development team
• integrated into R, Python, command line (shell)
• reusable template library (auto-diff, distributions & functions, models)
Limitations
• no discrete parameters (but they can be marginalized out; see the sketch after this list)
• no implicit missing data (missing values must be coded as parameters; see the sketch after this list)
• not parallelized within chains
• language limited relative to black-box samplers (cf. emcee)
• limited data types and constraints
• C++ template code is complex for user extension
• sampling can be slow and does not scale to very large problems; optimization is brittle or approximate
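The first two limitations have standard workarounds, sketched below (an illustrative two-component normal mixture in present-day Stan syntax; the model and variable names are not from the slides): the discrete component indicator is marginalized out with log_sum_exp, and missing observations are declared explicitly as parameters so they are sampled along with everything else.

data {
  int<lower=1> N_obs;             // observed outcomes
  int<lower=0> N_miss;            // number of missing outcomes
  vector[N_obs] y_obs;
}
parameters {
  real<lower=0, upper=1> theta;   // mixing proportion
  ordered[2] mu;                  // component locations
  real<lower=0> sigma;
  vector[N_miss] y_miss;          // missing data coded explicitly as parameters
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  // discrete component indicator summed out of the log probability
  for (n in 1:N_obs)
    target += log_sum_exp(log(theta) + normal_lpdf(y_obs[n] | mu[1], sigma),
                          log1m(theta) + normal_lpdf(y_obs[n] | mu[2], sigma));
  for (n in 1:N_miss)
    target += log_sum_exp(log(theta) + normal_lpdf(y_miss[n] | mu[1], sigma),
                          log1m(theta) + normal_lpdf(y_miss[n] | mu[2], sigma));
}

Declaring mu as ordered breaks the label-switching symmetry of the mixture so the two components stay identified.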
Current and Future Development
• (stiff) differential equation solving by numerical integration
• Riemann manifold HMC (more complex geometry)
• approximate inference: [stochastic] VB, EP, max marginal
• structured matrices: Cholesky correlation, sparse
• L-BFGS optimization (more scalable)
• more robust adaptation (cross-chain?)
• parallelization within and across chains
• better probabilistic testing for correctness
• faster, cleaner C++ code & more useful interfaces
How Stan Got its Name
• “Stan” is not an acronym; Gelman mashed up
  1. the Eminem song about a stalker fan, and
  2. Stanislaw Ulam (1909–1984), co-inventor of the Monte Carlo method (and the hydrogen bomb).
[Photo: Ulam holding the Fermiac, Enrico Fermi’s physical Monte Carlo simulator for random neutron diffusion]