



  1. 5/22/2016 Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces
     April 18, 2016
     William Herlands
     Committee: Daniel Neill, Alex Smola, Wilbert Van Panhuis
     Chair: Dave Choi

     Outline
     1. Motivation
     2. Gaussian process introduction
     3. Change surface model
     4. Analysis of measles in the United States

  2. Complex Changes
     • In human systems changes are often complex
       – Policy interventions take time to trickle through government bureaucracy
       – Environmental hazards affect populations differentially
     • Simple changepoint models are not sufficiently expressive

     Why do we care?
     • Understand past changes
       – Explore spatio-temporal heterogeneity
       – Model the rate of changes in different areas
     • Enable more accurate or equitable policies
     • Applications
       – Measles incidence in the U.S.
       – Concerns about lead-tainted water in NYC
     [Figure: panels dated 08 Jul 2014 and 21 Oct 2015]

  3. Our objectives
     • Model complex changes in real-world data
       – Multiple, flexible function regimes (Gaussian processes for flexible functions)
       – Non-discrete changes
       – Non-monotonic changes
       – Heterogeneous changes over space, time, etc.
     • “Change surfaces” for complex changes (non-discrete, non-monotonic, heterogeneous)

     Gaussian Processes (GP)
     • Non-parametric prior over smooth functions
       $f(x) \sim GP(m(x), k(x, x'))$
       $m(x) = E[f(x)]$
       $k(x, x') = \mathrm{cov}(f(x), f(x'))$
     • The covariance function is a kernel. It defines the covariance of function values.

  4. Gaussian Processes (GP)
     • Any finite set of f(x) is Normally distributed
       $[f(x_1), \ldots, f(x_m)] \sim N(m(x), K)$
     • Observation model
       $y(x) = f(x) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
     • Marginal log likelihood optimization
       $\log p(y \mid \theta) = -\tfrac{1}{2}\left[\log|K + \sigma^2 I| + y^\top (K + \sigma^2 I)^{-1} y\right] + \text{const}$

     Full Model
     • Our model is a convex combination of $f_i$
       $y(x) = s_1(x) f_1(x) + \ldots + s_r(x) f_r(x) + \varepsilon_n$
       – $s_i(x)$: switching functions, with $\sum_{i=1}^{r} s_i(x) = 1$
       – $f_i(x)$: functional regimes
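The marginal log likelihood on the Gaussian Processes slide can be evaluated directly for small n. The following is a minimal sketch, not the author's code: it assumes a zero mean function, 1-D inputs, and a squared exponential kernel, and the names `rbf_kernel`, `log_marginal_likelihood`, and `noise_var` are illustrative. It includes the factor 1/2 and the constant term that the slide leaves implicit.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel on 1-D inputs (an illustrative choice of k)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def log_marginal_likelihood(x, y, noise_var=0.1, **kernel_args):
    """Exact log p(y | theta) for a zero-mean GP; the Cholesky step is O(n^3)."""
    n = len(x)
    K = rbf_kernel(x, x, **kernel_args) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                              # K + noise*I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K + noise*I)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))             # log|K + noise*I|
    return -0.5 * (y @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

# Synthetic data, made up for the example.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(30)
lml = log_marginal_likelihood(x, y, lengthscale=1.0)
```

Hyperparameters such as the lengthscale would be chosen by maximizing this quantity; the cubic cost of the Cholesky factorization is what the scalable inference slides aim to avoid.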

  5. Model part 1: Functional Regimes
     • GP prior for each functional regime
       – Use flexible stationary kernels
       $f_i \sim GP(0, K_i), \quad i = 1, \ldots, r$

     Model part 2: Change Surfaces
     • Changepoint: $s_i = I(t > T_i)$
     • Non-discrete changepoint: $s_i = \mathrm{softmax}(t - T_i)$
     • Change surface: $s_i = \mathrm{softmax}(w_i(t))$, written $\sigma(w_i(t))$
     [Plots: each $s_i$ against $t$ from −10 to 10, taking values between 0 and 1]
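The three switching functions on the Change Surfaces slide can be compared numerically. This is a sketch under assumptions: the indicator is taken to switch on for t > T, the smooth changepoint is written as a logistic function of t − T, and `rate` is a hypothetical steepness parameter that does not appear on the slide.

```python
import math

def indicator_switch(t, T):
    """Hard changepoint: the regime weight jumps from 0 to 1 at t = T."""
    return 1.0 if t > T else 0.0

def sigmoid_switch(t, T, rate=1.0):
    """Non-discrete changepoint: a smooth transition centred at T."""
    return 1.0 / (1.0 + math.exp(-rate * (t - T)))

def softmax(scores):
    """Softmax over r regime scores; the outputs are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

ts = [-10, -1, 0, 1, 10]
hard = [indicator_switch(t, 0.0) for t in ts]
soft = [sigmoid_switch(t, 0.0) for t in ts]
two_regime = softmax([2.0, 0.0])  # weights for r = 2 regimes with scores (2, 0)
```

With two regimes and scores (w, 0), the first softmax weight equals the logistic function of w, which is why the two-regime case can be read as a single sigmoid.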

  6. Model part 2: Change Surfaces
     • Random Kitchen Sink features for $w_i(x)$
       – Variable rate of change
       – Non-monotonic
       – Heterogeneous over input
       $w_i(x) = \sum_{j=1}^{q} a_j \cos(\omega_j^\top x + b_j)$

     Full Model
     • Gaussian process change surface model
       $y(x) = \sum_{i=1}^{r} \sigma(w_i(x)) f_i(x) + \varepsilon_n$
       $f_i(x) \sim GP(0, K_i)$
     • Can depict this as a single Gaussian process with covariance function
       $k_{all}(x, x') = \sum_{i=1}^{r} \sigma(w_i(x)) \, k_i(x, x') \, \sigma(w_i(x'))$
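The combined covariance function on the Full Model slide can be evaluated pointwise. The sketch below rests on several assumptions: σ is read as a softmax across the r regimes, each regime kernel k_i is a squared exponential with its own lengthscale, and the RKS parameters are drawn at random rather than learned. All function names and parameter values are illustrative.

```python
import math
import random

def draw_rks_params(q, dim, rng):
    """Draw hypothetical RKS parameters: amplitude a_j, frequency omega_j, phase b_j."""
    return [
        (rng.gauss(0.0, 1.0),
         [rng.gauss(0.0, 1.0) for _ in range(dim)],
         rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(q)
    ]

def w(x, params):
    """w(x) = sum_j a_j * cos(omega_j^T x + b_j)."""
    return sum(
        a * math.cos(sum(o * xi for o, xi in zip(omega, x)) + b)
        for a, omega, b in params
    )

def regime_weights(x, params_per_regime):
    """Softmax over w_1(x), ..., w_r(x); the weights are positive and sum to 1."""
    scores = [w(x, p) for p in params_per_regime]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rbf(x, xp, lengthscale):
    """Squared exponential kernel, standing in for each regime's k_i."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def k_all(x, xp, params_per_regime, lengthscales):
    """k_all(x, x') = sum_i s_i(x) * k_i(x, x') * s_i(x')."""
    s_x = regime_weights(x, params_per_regime)
    s_xp = regime_weights(xp, params_per_regime)
    return sum(
        s_x[i] * rbf(x, xp, ls) * s_xp[i]
        for i, ls in enumerate(lengthscales)
    )

rng = random.Random(0)
params_per_regime = [draw_rks_params(q=5, dim=3, rng=rng) for _ in range(2)]
lengthscales = [0.5, 2.0]
x_a, x_b = [0.1, 0.2, 0.3], [0.4, 0.1, 0.9]
weights_a = regime_weights(x_a, params_per_regime)
cov_ab = k_all(x_a, x_b, params_per_regime, lengthscales)
```

Because each term has the form s_i(x) k_i(x, x') s_i(x'), the sum stays symmetric and positive semi-definite whenever every k_i is, so it is a valid kernel even though it is non-stationary.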

  7. Scalable Inference
     • Log likelihood naively $O(n^3)$
       $\log p(y \mid \theta) = -\tfrac{1}{2}\left[\log|K + \sigma^2 I| + y^\top (K + \sigma^2 I)^{-1} y\right] + \text{const}$
     • We develop scalable Kronecker inference using the Weyl bound, $O(D n^{(D+1)/D})$

     Measles in the United States
     • Data
       – Monthly incidence rates, 1935–2003
       – Continental United States and D.C.
       – $x \in \mathbb{R}^3$: 2D space and 1D time
       – Measles vaccine introduced in 1963

  8. Measles in 3 states
     [Figure: panels for California (CA), Maine (ME), and Michigan (MI); x-axis: year, 1940 to 2010; axes: Incidence (1000s) and s(w(x)), 0 to 1]

  9. Measles in 3 states
     • GP change surface
       – 2 functional regimes
       – $w_i(x)$ as RKS with 5 features
     • Not a causal model!

     Measles in 3 states
     [Figure: panels for California (CA), Maine (ME), and Michigan (MI); x-axis: year, 1940 to 2010; axes: Incidence (1000s) and s(w(x)), 0 to 1]

  10. Measles in 3 states
      [Figure: panels for California (CA), Maine (ME), and Michigan (MI); x-axis: year, 1940 to 2010; axes: Incidence (1000s) and s(w(x)), 0 to 1]

      Measles in 3 states
      • “Change slope”: slope of σ(w(x)) between 0.25 and 0.75
      • “Change date” per state: σ(w(x)) = 0.5
      [Figure: Michigan (MI) panel; x-axis: year, 1940 to 2010; axes: Incidence (1000s) and s(w(x)), 0 to 1]

  11. Change date for measles in U.S.
      • For each state, date where σ(w(x)) = 0.5
      [Figure: scale from 1961.5 to 1967.2]

      Change slope for measles in U.S.
      • For each state, slope of σ(w(x)) between 0.75 and 0.25
      [Figure: scale from 0.156 to 0.297]
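The per-state change date and change slope can be read off a fitted σ(w(t)) curve numerically. The sketch below is one possible reading, not the author's procedure: it assumes the curve is increasing over the search window, takes the slope as the secant between the 0.25 and 0.75 crossings, and uses a made-up logistic surface centred on 1964 in place of a fitted one.

```python
import math

def sigma(w_value):
    """Logistic function, used here as the two-regime switching function."""
    return 1.0 / (1.0 + math.exp(-w_value))

def crossing_time(surface, level, lo, hi, tol=1e-9):
    """Bisection for the t in [lo, hi] where surface(t) reaches `level`.
    Assumes surface is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surface(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def change_date_and_slope(surface, lo, hi):
    """Change date: where the surface equals 0.5.
    Change slope: secant slope between the 0.25 and 0.75 crossings."""
    t25 = crossing_time(surface, 0.25, lo, hi)
    t50 = crossing_time(surface, 0.50, lo, hi)
    t75 = crossing_time(surface, 0.75, lo, hi)
    return t50, (0.75 - 0.25) / (t75 - t25)

def example_surface(t):
    """Hypothetical surface for one state; the numbers are invented."""
    return sigma(0.8 * (t - 1964.0))

change_date, change_slope = change_date_and_slope(example_surface, 1935.0, 2003.0)
```

A non-monotonic surface, which the RKS features allow, could cross 0.5 more than once, so a real analysis would need a rule for choosing which crossing counts as the change date.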

  12. Regression Analysis
      • Explore factors that affect the change date
        – Birth and death rates
        – Population numbers per age segment
        – Income information
        – Government hospital and health workers
        – Slope of change surface
        – Average temperature

      Demographic Analysis

  13. Regression Analysis
      • Gini of family income
        – Economically depressed communities
        – Rural regions
      • Slope of change surface
        – Fewer cases nationwide enable more effective immunization later
      [Figure: scale from 1961.5 to 1967.2]

      Conclusions
      • Introduced model for “change surfaces” in real-world data
      • Developed scalable inference for additive, non-stationary Gaussian processes
      • Identified heterogeneity in first years of the measles vaccine
      • Used the results of the change surface model for policy-relevant conclusions

  14. Acknowledgements
      • Committee
        – Daniel Neill, Alex Smola, Wilbert van Panhuis
      • Chair
        – Dave Choi
      • Collaborators*
        – Andrew Wilson
        – Seth Flaxman
        – Hannes Nickisch
      *Subset of paper accepted to AISTATS 2016

      Questions?
      Fin.

  15. Backup slides

  16. Spectral Mixture Kernels

      Inference
      • Compute log marginal likelihood
      • General Kronecker methods for scalability
        – Assume:
        – Assume: multiplicative kernel across D
        – Then we can decompose kernel matrix,
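The Kronecker decomposition mentioned on the Inference slide can be illustrated with the log determinant. This sketch assumes the inputs lie on a full Cartesian grid, so that a multiplicative kernel gives K = K_1 ⊗ ... ⊗ K_D; the grid sizes, lengthscales, and function names are illustrative. The eigenvalues of K are then products of the per-dimension eigenvalues, so log|K + σ²I| never requires forming K.

```python
import numpy as np

def rbf_gram(x, lengthscale):
    """Squared exponential Gram matrix for one input dimension."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def kron_logdet(factors, noise_var):
    """log|kron(K_1, ..., K_D) + noise_var * I| from per-dimension eigenvalues."""
    eigs = np.ones(1)
    for K_d in factors:
        lam = np.linalg.eigvalsh(K_d)          # eigenvalues of one small factor
        eigs = np.outer(eigs, lam).ravel()     # all products of eigenvalues so far
    return np.sum(np.log(eigs + noise_var))

# Two grid dimensions with 6 and 7 points, so K would be 42 x 42.
x1 = np.linspace(0.0, 1.0, 6)
x2 = np.linspace(0.0, 2.0, 7)
K1, K2 = rbf_gram(x1, 0.5), rbf_gram(x2, 0.7)

fast = kron_logdet([K1, K2], noise_var=0.1)
_, slow = np.linalg.slogdet(np.kron(K1, K2) + 0.1 * np.eye(42))
```

Only the small factors are decomposed. This is the source of the savings over the naive cubic cost, and it is also why a sum of Kronecker products (additive kernels) needs a different treatment: the eigenvalues of a sum are not determined by the eigenvalues of the summands.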

  17. Inference
      • For additive kernels
      • $K^{-1}$ can be computed efficiently using LCG*
      • But how can we compute the $\log|K|$?
      *See Flaxman et al. (2015)

      Inference
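Assuming LCG on the slide refers to linear conjugate gradients, the solve K⁻¹y needs only matrix-vector products, which is what makes it compatible with additive kernels. The sketch below is a textbook conjugate gradient loop applied to a small dense example; in the scalable setting each product would instead use the Kronecker structure of the summands. The matrices and names here are illustrative.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given only v -> A v."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def rbf_gram(x, lengthscale):
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

# Additive kernel: K = K_a + K_b + noise * I, never inverted explicitly.
grid = np.linspace(0.0, 4.0, 40)
K_a, K_b, noise = rbf_gram(grid, 0.3), rbf_gram(grid, 1.5), 0.1
y = np.cos(grid)

def additive_matvec(v):
    return K_a @ v + K_b @ v + noise * v

solution = conjugate_gradient(additive_matvec, y)
```

Conjugate gradients yields K⁻¹y but gives no direct handle on log|K|, which is the gap the slide's closing question points to.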

  18. Inference
      • Choosing indices i, j

      Method                                                    Complexity
      Minimization for best pair                                $O(n^2)$
      “Middle” heuristic, i = j or i = j + 1                    $O(n)$
      Greedy search of s pairs below and above previous pair    $O(2sn)$

      Inference
      • Scaling functions, σ(w(x))
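The index choices in the table can be illustrated with Weyl's inequality for symmetric matrices, λ_{i+j-1}(A + B) ≤ λ_i(A) + λ_j(B) with eigenvalues sorted in descending order. The sketch below assumes this is the bound the slides refer to and shows only the exhaustive and “middle” choices of (i, j); the greedy variant is omitted. The random test matrices and the function name are illustrative.

```python
import numpy as np

def weyl_logdet_bound(eig_a, eig_b, noise_var, method="exact"):
    """Upper bound on log|A + B + noise_var * I| from the eigenvalues of A and B.

    With 0-based descending eigenvalues, Weyl's inequality reads
    lambda_k(A + B) <= a[i] + b[j] for any i + j = k.
    """
    a = np.sort(eig_a)[::-1]
    b = np.sort(eig_b)[::-1]
    n = len(a)
    bounds = np.empty(n)
    for k in range(n):
        if method == "exact":
            i = np.arange(k + 1)
            bounds[k] = np.min(a[i] + b[k - i])   # best pair, O(n) per k
        else:
            i = (k + 1) // 2                      # "middle": i = j or i = j + 1
            bounds[k] = a[i] + b[k - i]
    return np.sum(np.log(bounds + noise_var))

# Two random positive semi-definite matrices, made up for the example.
rng = np.random.default_rng(1)
M1, M2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
A, B = M1 @ M1.T / 8.0, M2 @ M2.T / 8.0
eig_a, eig_b = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)

exact_bound = weyl_logdet_bound(eig_a, eig_b, 0.1, method="exact")
middle_bound = weyl_logdet_bound(eig_a, eig_b, 0.1, method="middle")
_, true_logdet = np.linalg.slogdet(A + B + 0.1 * np.eye(8))
```

The exhaustive search is never looser than the middle heuristic, since the middle pair is one of the candidates it minimizes over. The trade-off is the quadratic versus linear cost listed in the table.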

  19. Inference
      [Figure: 3 kernels; two panels against Observations (#): Time (sec) and Log determinant approximation ratio; legend: Weyl exact, Weyl middle, Weyl greedy, Cheb−Hutch, True log det]

      Inference – so what?!
      • Linear complexity for additive kernels
        – $O(D n^{(D+1)/D})$
      • Scalable inference for non-separable kernels in space and time
      • Scalable inference for non-stationary kernels

  20. Numerical Experiments
      • 2500 points of synthetic data
      • 2 functional regimes defined by squared exponential kernels
      • Change surface defined by

      Results - Numerical

  21. Demographic Analysis
