Some Areas of Recent Research Michael Stein University of Chicago - PowerPoint PPT Presentation

Some Areas of Recent Research Michael Stein University of Chicago Department Retreat, October 2012 Michael Stein Some Areas of Recent Research

Funders & Collaborators ◮ NSF (STATMOS), US Department of Energy ◮ Faculty: Mihai Anitescu, Liz Moyer ◮ Postdocs: Jie Chen, Bill Leeds, Ying Sun ◮ Grad students: Stefano Castruccio, Michael Horrell, Andy Poppick ◮ Undergrads: Peter Hansen, Grant Wilder Michael Stein Some Areas of Recent Research

Preconditioning and fitting Gaussian process models Gaussian process Z determined by its mean and covariance functions: ◮ EZ ( x ) = ✖ ( x ) ◮ ❝♦✈ ❢ Z ( x ) ❀ Z ( y ) ❣ = K ( x ❀ y ) Assume mean is 0 and covariance structure known up to parameter θ . Let K θ be covariance matrix for observations Z ( x 1 ) ❀ ✿ ✿ ✿ ❀ Z ( x n ) given θ . Then Then the loglik is (ignoring an additive constant) ❵ ( θ ) = � 1 2 log ❥ K ( θ ) ❥ � 1 2 Z ✵ K ( θ ) � 1 Z ✿ Problem: How to compute ❵ ( θ )? Particularly log ❥ K ( θ ) ❥ ? Michael Stein Some Areas of Recent Research

Important aside: Even when loglik can be computed exactly, maximizing it (or sampling from a posterior) may not be easy. Consider 400 evenly spaced observations on R and Z is fractional Brownian motion with variogram 1 � ☛ 2 E ❢ Z ( x ) � Z ( y ) ❣ 2 = Γ “ ” ❵ ❥ x � y ❥ ☛ 2 with ❵ = 10 and ☛ = 1 ✿ 5. ◮ Neither parameter is estimated well although there is strong evidence parameters lie along a curve in ( ❵❀ ☛ ) space. ◮ Problem is worse if leave out Γ ` � ☛ ´ . 2 ◮ I am unaware of any transformation independent of observation locations that would give concave loglikelihood. This kind of function makes some people in the optimization community unhappy. ◮ Things only get worse with more complex models. Michael Stein Some Areas of Recent Research

60 40 log−likelihood 20 0 −20 −40 2 −60 1.8 0 1.6 10 20 1.4 30 1.2 40 1 50 α ℓ Michael Stein Some Areas of Recent Research

Computing exact MLE Exact computations of likelihood function for n irregularly sited observations generally requires O ( n 3 ) computation and O ( n 2 ) memory to compute Cholesky decomposition of covariance matrix. ◮ Computation is becoming cheap much faster than memory. ◮ Increasing emphasis on “matrix-free” methods in which never have to store an n ✂ n matrix, even if requires more computation. Michael Stein Some Areas of Recent Research

Iterative solution of linear equations Computing quadratic form in likelihood best done by solving systems like K x = y , not by finding K � 1 . Iterative methods: for K positive definite, equivalent to minimizing 1 2 x ✵ K x � x ✵ y , which can solve by, for example, conjugate gradient. Main computation requires multiplying vectors by K . This is fast for ◮ sparse K ◮ some structured (e.g., Toeplitz) matrices But even for dense unstructured matrices, iterative solution is matrix-free and may require many fewer flops than Cholesky decomposition: O ( n 2 ✂ # iterations) O ( n 3 ) v ✿ Number of iterations for accurate solution related to condition number (ratio of largest to smallest eigenvalue) ✔ ( K ) of K . Michael Stein Some Areas of Recent Research

When nearby observations strongly correlated, ✔ ( K ) can be very large, so need to precondition: ◮ Find a matrix P such that P ✵ K ( θ ) P is well-conditioned for θ in vicinity of MLE and multiplying a vector by P is fast. ◮ Let Y = P ✵ Z . Then the loglik (with Z as data, but written in terms of Y ) equals ❵ ( θ ) = � 1 2 log ❥ P ✵ K ( θ ) P ❥ + log ❥ P ❥ � 1 2 Y ✵ ❢ P ✵ K ( θ ) P ❣ � 1 Y ✿ ◮ Okay for P to depend on θ as long as use this formula. ◮ Can ignore log ❥ P ❥ if it doesn’t depend on θ (even if P does). ◮ What to do about log ❥ P ✵ K ( θ ) P ❥ ? Michael Stein Some Areas of Recent Research

Solve score equations instead? (Ignore preconditioning for convenience.) ❅ Writing K i ( θ ) for ❅✒ i K ( θ ), score equations are (assume mean is 0) n o Z ✵ K ( θ ) � 1 K i ( θ ) K ( θ ) � 1 Z = tr K ( θ ) � 1 K i ( θ ) for i = 1 ❀ ✿ ✿ ✿ ❀ p . First term requires only one solve. Instead of log determinant, need, for each component of θ , n o K ( θ ) � 1 K i ( θ ) tr ❀ which requires n solves for exact calculation. Approximate by the unbiased estimate (Hutchinson, 1990) N 1 X j K ( θ ) � 1 K i ( θ ) U j ❀ U ✵ N j =1 where U j = ( U j 1 ❀ ✿ ✿ ✿ ❀ U jn ) ✵ is random vector with U jk ’s iid and Pr( U jk = 1) = Pr( U jk = � 1) = 1 2 . ◮ Yields unbiased estimating equations. Michael Stein Some Areas of Recent Research

Can bound statistical inefficiency of procedure in terms of ✔ ( K ). Thus, if can find a decent preconditioner for K , moderate N works well. ◮ Don’t need N comparable to n ! Preconditioning helps in two ways: ◮ Reduces number of iterations needed in iterative solver. ◮ Reduces need for large N . Scope for further improvement by choosing U j ’s not independent. ◮ Design of experiments! Stein, Chen and Anitescu (under revision). Michael Stein Some Areas of Recent Research

Some other interests ◮ When low rank approximations to covariance matrices don’t work. ◮ Won’t discuss this here, but work likely to annoy some who have been advocating this approach for massive spatial datasets. ◮ Modeling and computation for massive (as opposed to large) space-time datasets. ◮ Without assuming covariance (or inverse covariance) matrices are low rank or sparse. ◮ Climate model emulation. Michael Stein Some Areas of Recent Research

One-pass methods Look at data block by block and summarize the information about K ( θ ) from that block so that don’t have to go back to raw data again (Anitescu, Horrell). Simple example: ◮ Divide data into B blocks. ◮ Within each block, approximate the loglik (or score) function. ◮ Mle of θ and observed information matrix an adequate approximation? ◮ If not, store more complete representation of loglik function. Adding loglik across blocks reduces storage with little loss of information? ◮ Save a few observations (or other summaries) from each block. ◮ Add within block approximate logliks to loglik of sparse observations. For truly massive (petascale, exascale) data, will need more than two “layers.” Michael Stein Some Areas of Recent Research

Michael Stein Some Areas of Recent Research

+ Michael Stein Some Areas of Recent Research

+ + Michael Stein Some Areas of Recent Research

+ + + + + + + + + + + + + + + Michael Stein Some Areas of Recent Research

Climate model emulation Reproducing some of the output of a GCM under some forcing scenario without actually running it (Castruccio, Leeds, Moyer, Wilder). ◮ Or, better yet, producing accurate simulations of actual climate under some forcing scenario. GCM runs we have: ◮ NCAR Community Climate System Model version 3 (CCSM3), T31 resolution (approx 3 ✿ 75 ✍ ✂ 3 ✿ 75 ✍ grid cells) ◮ Input is CO 2 , output is temperature T ( t ) and precipitation P ( t ), t is year ◮ 18 forcing scenarios, 53 realizations, ❃ 10 ❀ 000 model years Michael Stein Some Areas of Recent Research

Statistical emulation of mean Separate time series model for each of 47 regions: ☞ 0 + ☞ 1 1 T ( t ) = 2 ❢ log[CO 2 ]( t ) + log[CO 2 ]( t � 1) ❣ ✶ X + ☞ 2 w i � 2 log[CO 2 ]( t � i ) + ✧ ( t ) i =2 where ◮ ✧ ( t ) is an autoregressive model of order 1 ◮ w i = (1 � ✚ ) ✚ i . Fit with small number of scenarios and a few realizations per scenario. Compare to standard computer model emulation approach in which view (CO 2 (1) ❀ ✿ ✿ ✿ ❀ CO 2 ( n )) as input and ( T (1) ❀ ✿ ✿ ✿ ❀ T ( n )) as output. Michael Stein Some Areas of Recent Research

Total column ozone OMI (Ozone Monitoring Instrument, successor to TOMS) is aboard the satellite EOS Aura: ◮ Polar-orbiting. ◮ Sun-synchronous, so satellite always at local noon. ◮ Each orbit about 100 minutes, or 14.1 orbits a day. ◮ From raw data (photon counts in multiple frequency bands), levels of many trace constituents of atmosphere are deduced, including ozone. ◮ Over 80,000 observations per orbit, so over 10 6 a day. ◮ Near global coverage (no data during polar nights, some missing data). How might statistical models be used to produce better Level-3 (gridded) product than what NASA currently does? Michael Stein Some Areas of Recent Research

Observation locations from 2 orbits 60 30 0 latitude −30 −60 −90 90 Date line −90 0 90 longitude Michael Stein Some Areas of Recent Research

Scope for fruitful interaction between statistics and numerical analysis. Information flow in both directions. ◮ Statistical problems produce new challenges in applied/computational math. ◮ Statistical/probabilistic thinking can yield new algorithms and theory for numerical analysis. Michael Stein Some Areas of Recent Research

Some Areas of Recent Research Michael Stein University of Chicago - PowerPoint PPT Presentation

Some Areas of Recent Research Michael Stein University of Chicago Department Retreat, October 2012 Michael Stein Some Areas of Recent Research Funders & Collaborators NSF (STATMOS), US Department of Energy Faculty: Mihai Anitescu,

FOCUS AREAS FOCUS AREAS FOCUS AREAS FOCUS AREAS Our Our Vision Vision Our Our Vision

Designation of Areas under Evacuation Orders under Evacuation Orders Legend Legend Areas where

Recent Development in India Recent Development in India Recent Development in India Recent

Seminar on Seminar on Recent Developments in Project Management Recent Developments in Project

COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER

How smart APIs are different. @berndruecker Some Service Some Some Service Service Some

Some Recent Development Some Recent Development in RFID Privacy Models Robert H. Deng School of

The Good Samaritan Luke 10:25-37 Here is some test text Here is some test text Here is some

The God Who Whispers 1 Kings 19 Here is some test text Here is some test text Here is some test

God Reveals His HOLINESS Isaiah 6 Here is some test text Here is some test text Here is some

For Such a Time as This Esther 4 Here is some test text Here is some test text Here is some

Nehemiah Prays Nehemiah 1-2 Here is some test text Here is some test text Here is some test

Higher areas Modeling Extrastriate Areas Many higher areas beyond V1 Dr. James A. Bednar

Higher areas Modeling Extrastriate Areas Many higher areas beyond V1 Dr. James A. Bednar

God Rescues Daniel from the Lions Daniel 6 Here is some test text Here is some test text Here

Esther and the Great Reversal Esther 6-9 Here is some test text Here is some test text Here is

Quadratic forms and Ellipses Max Turgeon 19/09/2019 In these notes, I want to clarify a few

LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University

Lecture 22 Computational Methods for GPs Colin Rundel 04/12/2017 1 GPs and Computational

SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks Ernie Chan, Field

Introduction to Mobile Robotics A Compact Course on Linear Algebra Wolfram Burgard, Cyrill

Lecture 4: Goemans-Williamson Algorithm for MAX-CUT Lecture Outline Part I: Analyzing

On Cholesky structures on real symmetric matrices and their applications Hideyuki ISHI (Osaka

CS475/CS675 Lecture 2: May 3, 2016 Cholesky factorization, tridiagonal, band matrices Reading:

Some Areas of Recent Research Michael Stein University of Chicago - PowerPoint PPT Presentation

Some Areas of Recent Research Michael Stein University of Chicago Department Retreat, October 2012 Michael Stein Some Areas of Recent Research Funders & Collaborators NSF (STATMOS), US Department of Energy Faculty: Mihai Anitescu,

FOCUS AREAS FOCUS AREAS FOCUS AREAS FOCUS AREAS Our Our Vision Vision Our Our Vision

Designation of Areas under Evacuation Orders under Evacuation Orders Legend Legend Areas where

Recent Development in India Recent Development in India Recent Development in India Recent

Seminar on Seminar on Recent Developments in Project Management Recent Developments in Project

COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER

How smart APIs are different. @berndruecker Some Service Some Some Service Service Some

Some Recent Development Some Recent Development in RFID Privacy Models Robert H. Deng School of

The Good Samaritan Luke 10:25-37 Here is some test text Here is some test text Here is some

The God Who Whispers 1 Kings 19 Here is some test text Here is some test text Here is some test

God Reveals His HOLINESS Isaiah 6 Here is some test text Here is some test text Here is some

For Such a Time as This Esther 4 Here is some test text Here is some test text Here is some

Nehemiah Prays Nehemiah 1-2 Here is some test text Here is some test text Here is some test

Higher areas Modeling Extrastriate Areas Many higher areas beyond V1 Dr. James A. Bednar

Higher areas Modeling Extrastriate Areas Many higher areas beyond V1 Dr. James A. Bednar

God Rescues Daniel from the Lions Daniel 6 Here is some test text Here is some test text Here

Esther and the Great Reversal Esther 6-9 Here is some test text Here is some test text Here is

Quadratic forms and Ellipses Max Turgeon 19/09/2019 In these notes, I want to clarify a few

LU Factorization Marco Chiarandini Department of Mathematics &amp; Computer Science University

Lecture 22 Computational Methods for GPs Colin Rundel 04/12/2017 1 GPs and Computational

SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks Ernie Chan, Field

Introduction to Mobile Robotics A Compact Course on Linear Algebra Wolfram Burgard, Cyrill

Lecture 4: Goemans-Williamson Algorithm for MAX-CUT Lecture Outline Part I: Analyzing

On Cholesky structures on real symmetric matrices and their applications Hideyuki ISHI (Osaka

CS475/CS675 Lecture 2: May 3, 2016 Cholesky factorization, tridiagonal, band matrices Reading:

LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University