Some Areas of Recent Research

Michael Stein
University of Chicago
Department Retreat, October 2012
Funders & Collaborators

◮ NSF (STATMOS), US Department of Energy
◮ Faculty: Mihai Anitescu, Liz Moyer
◮ Postdocs: Jie Chen, Bill Leeds, Ying Sun
◮ Grad students: Stefano Castruccio, Michael Horrell, Andy Poppick
◮ Undergrads: Peter Hansen, Grant Wilder
Preconditioning and fitting Gaussian process models

A Gaussian process Z is determined by its mean and covariance functions:
◮ E Z(x) = μ(x)
◮ cov{Z(x), Z(y)} = K(x, y)

Assume the mean is 0 and the covariance structure is known up to a parameter θ. Let K(θ) be the covariance matrix for the observations Z(x₁), …, Z(xₙ) given θ. Then the loglik is (ignoring an additive constant)

    ℓ(θ) = −(1/2) log|K(θ)| − (1/2) Z′ K(θ)⁻¹ Z.

Problem: How to compute ℓ(θ)? Particularly log|K(θ)|?
Important aside: Even when the loglik can be computed exactly, maximizing it (or sampling from a posterior) may not be easy. Consider 400 evenly spaced observations on ℝ, with Z fractional Brownian motion with variogram

    (1/2) E{Z(x) − Z(y)}² = Γ(−α/2) (|x − y|/ℓ)^α

with ℓ = 10 and α = 1.5.
◮ Neither parameter is estimated well, although there is strong evidence the parameters lie along a curve in (ℓ, α) space.
◮ The problem is worse if the factor Γ(−α/2) is left out.
◮ I am unaware of any transformation independent of the observation locations that would give a concave loglikelihood. This kind of function makes some people in the optimization community unhappy.
◮ Things only get worse with more complex models.
[Figure: perspective plot of the log-likelihood surface over (ℓ, α), with ℓ from 0 to 50 and α from 1 to 2; the likelihood is high along a curved ridge rather than at an isolated peak.]
Computing the exact MLE

Exact computation of the likelihood function for n irregularly sited observations generally requires O(n³) flops and O(n²) memory to compute the Cholesky decomposition of the covariance matrix.
◮ Computation is becoming cheap much faster than memory is.
◮ Hence the increasing emphasis on "matrix-free" methods, which never store an n × n matrix, even if that requires more computation.
Iterative solution of linear equations

Computing the quadratic form in the likelihood is best done by solving systems like Kx = y, not by forming K⁻¹. Iterative methods: for K positive definite, solving Kx = y is equivalent to minimizing (1/2) x′Kx − x′y, which can be done by, for example, conjugate gradient. The main computation is multiplying vectors by K. This is fast for
◮ sparse K
◮ some structured (e.g., Toeplitz) matrices

But even for dense unstructured matrices, iterative solution is matrix-free and may require many fewer flops than a Cholesky decomposition:

    O(n² × # iterations) vs. O(n³).

The number of iterations needed for an accurate solution is related to the condition number (ratio of largest to smallest eigenvalue) κ(K) of K.
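A toy sketch of such a solve using SciPy's conjugate gradient through a LinearOperator, so the solver only ever sees matrix-vector products (here K is formed explicitly for simplicity; a genuinely matrix-free code would compute K @ v on the fly; the covariance and grid are illustrative assumptions):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Illustrative setup: exponential covariance on a 1-D grid
n = 200
x = np.linspace(0.0, 1.0, n)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.05)

# CG touches K only through matvecs -- the matrix-free interface
A = LinearOperator((n, n), matvec=lambda v: K @ v)
y = np.ones(n)
sol, info = cg(A, y, maxiter=1000)   # info == 0 signals convergence
```

Each CG iteration costs one matvec (O(n²) for a dense K, far less for sparse or structured K), so the total cost is O(n² × # iterations) rather than the O(n³) of a factorization.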
When nearby observations are strongly correlated, κ(K) can be very large, so we need to precondition:
◮ Find a matrix P such that P′K(θ)P is well-conditioned for θ in the vicinity of the MLE and multiplying a vector by P is fast.
◮ Let Y = P′Z. Then the loglik (with Z as data, but written in terms of Y) equals

    ℓ(θ) = −(1/2) log|P′K(θ)P| + log|P| − (1/2) Y′ {P′K(θ)P}⁻¹ Y.

◮ It is okay for P to depend on θ as long as this formula is used.
◮ Can ignore log|P| if it doesn't depend on θ (even if P does).
◮ What to do about log|P′K(θ)P|?
Solve score equations instead? (Ignore preconditioning for convenience.) Writing K_i(θ) for (∂/∂θ_i) K(θ), the score equations are (assuming the mean is 0)

    Z′ K(θ)⁻¹ K_i(θ) K(θ)⁻¹ Z = tr{K(θ)⁻¹ K_i(θ)}   for i = 1, …, p.

The left side requires only one solve. Instead of the log determinant, we need, for each component of θ,

    tr{K(θ)⁻¹ K_i(θ)},

which requires n solves for exact calculation. Approximate it by the unbiased estimate (Hutchinson, 1990)

    (1/N) Σ_{j=1}^N U_j′ K(θ)⁻¹ K_i(θ) U_j,

where U_j = (U_{j1}, …, U_{jn})′ is a random vector with the U_{jk} iid and Pr(U_{jk} = 1) = Pr(U_{jk} = −1) = 1/2.
◮ This yields unbiased estimating equations.
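A minimal sketch of Hutchinson's trace estimator, with a generic symmetric matrix A standing in for K(θ)⁻¹K_i(θ) (the matrix, sample sizes, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 100, 500
W = rng.standard_normal((n, n))
A = W @ W.T / n                      # stand-in for K(theta)^{-1} K_i(theta)

# Hutchinson (1990): if the entries of U are iid +/-1 with prob 1/2,
# then E[U' A U] = tr(A), so averaging N such quadratic forms
# gives an unbiased estimate of the trace.
U = rng.choice([-1.0, 1.0], size=(N, n))
quad_forms = np.einsum('ij,jk,ik->i', U, A, U)   # U_j' A U_j for each j
est = np.mean(quad_forms)
exact = np.trace(A)
```

In the score-equation setting each quadratic form U_j′ K(θ)⁻¹ K_i(θ) U_j costs one linear solve plus one matvec, so N solves replace the n solves an exact trace would need.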
One can bound the statistical inefficiency of the procedure in terms of κ(K). Thus, if we can find a decent preconditioner for K, moderate N works well.
◮ Don't need N comparable to n!

Preconditioning helps in two ways:
◮ Reduces the number of iterations needed in the iterative solver.
◮ Reduces the need for large N.

There is scope for further improvement by choosing the U_j's not independent.
◮ Design of experiments!

Stein, Chen and Anitescu (under revision).
Some other interests

◮ When low rank approximations to covariance matrices don't work.
  ◮ Won't discuss this here, but the work is likely to annoy some who have been advocating this approach for massive spatial datasets.
◮ Modeling and computation for massive (as opposed to large) space-time datasets.
  ◮ Without assuming covariance (or inverse covariance) matrices are low rank or sparse.
◮ Climate model emulation.
One-pass methods

Look at the data block by block and summarize the information about K(θ) from each block so that we never have to go back to the raw data (Anitescu, Horrell). Simple example:
◮ Divide the data into B blocks.
◮ Within each block, approximate the loglik (or score) function.
  ◮ Are the MLE of θ and the observed information matrix an adequate approximation?
  ◮ If not, store a more complete representation of the loglik function. Does adding logliks across blocks reduce storage with little loss of information?
◮ Save a few observations (or other summaries) from each block.
◮ Add the within-block approximate logliks to the loglik of the sparse observations.

For truly massive (petascale, exascale) data, we will need more than two "layers."
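The first step above can be sketched as follows: a crude one-pass approximation that sums within-block Gaussian logliks, ignoring cross-block dependence (a simplification of the scheme on this slide; the block count, covariance, and function names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def block_loglik(Z, x, cov, B):
    """Sum of within-block Gaussian logliks (up to constants).
    Each block is visited once and only O((n/B)^2) memory is needed,
    at the price of discarding cross-block dependence."""
    total = 0.0
    for xb, zb in zip(np.array_split(x, B), np.array_split(Z, B)):
        K = cov(xb[:, None], xb[None, :])
        c = cho_factor(K, lower=True)
        # -0.5 log|K| = -sum(log diag(L)) for K = L L'
        total += -np.sum(np.log(np.diag(c[0]))) - 0.5 * zb @ cho_solve(c, zb)
    return total

cov = lambda s, t: np.exp(-np.abs(s - t) / 0.2)
x = np.linspace(0.0, 1.0, 400)
rng = np.random.default_rng(3)
Z = np.linalg.cholesky(cov(x[:, None], x[None, :])) @ rng.standard_normal(400)
ll_blocks = block_loglik(Z, x, cov, B=8)   # B = 1 recovers the exact loglik
```

Storing the per-block MLE and observed information (or a few held-out observations per block, as the slide suggests) would recover some of the discarded cross-block information.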
Climate model emulation

Reproducing some of the output of a GCM under some forcing scenario without actually running it (Castruccio, Leeds, Moyer, Wilder).
◮ Or, better yet, producing accurate simulations of the actual climate under some forcing scenario.

GCM runs we have:
◮ NCAR Community Climate System Model version 3 (CCSM3), T31 resolution (approx. 3.75° × 3.75° grid cells)
◮ Input is CO₂; output is temperature T(t) and precipitation P(t), where t is the year
◮ 18 forcing scenarios, 53 realizations, > 10,000 model years
Statistical emulation of the mean

A separate time series model for each of 47 regions:

    T(t) = β₀ + β₁ (1/2){log CO₂(t) + log CO₂(t−1)} + β₂ Σ_{i=2}^∞ w_{i−2} log CO₂(t−i) + ε(t)

where
◮ ε(t) is an autoregressive model of order 1
◮ w_i = (1 − ρ) ρ^i.

Fit with a small number of scenarios and a few realizations per scenario. Compare to the standard computer model emulation approach, which views (CO₂(1), …, CO₂(n)) as input and (T(1), …, T(n)) as output.
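A sketch of just the mean function of this model (the AR(1) noise and the fitting step are omitted; the truncation lag, coefficient values, and CO₂ path are illustrative assumptions, not values from the talk):

```python
import numpy as np

def emulator_mean(logco2, beta0, beta1, beta2, rho, lag_max=200):
    """Mean of the regional temperature emulator:
    beta0 + beta1*(log CO2(t) + log CO2(t-1))/2
          + beta2*sum_{i>=2} w_{i-2} log CO2(t-i), with w_i = (1-rho) rho^i.
    The infinite sum is truncated at lag_max (an implementation choice)."""
    n = len(logco2)
    w = (1.0 - rho) * rho ** np.arange(lag_max)
    mean = np.full(n, np.nan)                 # undefined for t < 2
    for t in range(2, n):
        lags = logco2[t - 2 :: -1][:lag_max]  # log CO2(t-2), t-3, ..., 0
        mean[t] = (beta0
                   + beta1 * 0.5 * (logco2[t] + logco2[t - 1])
                   + beta2 * np.sum(w[: len(lags)] * lags))
    return mean

logco2 = np.log(np.linspace(285.0, 570.0, 100))  # a made-up CO2 ramp
m = emulator_mean(logco2, beta0=0.0, beta1=2.0, beta2=1.0, rho=0.9)
```

The geometric weights w_i sum to at most 1, so the β₂ term is a damped moving average of past forcing: the emulator responds to recent CO₂ immediately (the β₁ term) and to older CO₂ with exponentially fading memory.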
Total column ozone

OMI (Ozone Monitoring Instrument, successor to TOMS) is aboard the satellite EOS Aura:
◮ Polar-orbiting.
◮ Sun-synchronous, so the satellite crosses each location at about the same local solar time (near local noon).
◮ Each orbit takes about 100 minutes, giving 14.1 orbits a day.
◮ From the raw data (photon counts in multiple frequency bands), levels of many trace constituents of the atmosphere are deduced, including ozone.
◮ Over 80,000 observations per orbit, so over 10⁶ a day.
◮ Near global coverage (no data during polar nights, some missing data).

How might statistical models be used to produce a better Level-3 (gridded) product than what NASA currently does?
[Figure: observation locations from 2 orbits, plotted as latitude vs. longitude with the date line marked; each orbit traces a swath from pole to pole.]
There is scope for fruitful interaction between statistics and numerical analysis, with information flowing in both directions.
◮ Statistical problems produce new challenges in applied/computational math.
◮ Statistical/probabilistic thinking can yield new algorithms and theory for numerical analysis.