Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis


  1. Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis
     Stephan R. Sain, Department of Mathematics, University of Colorado at Denver
     Reinhard Furrer, Geophysical Statistics Project, National Center for Atmospheric Research
     Outline
     • Microarrays and Climate
     • An Additive Spatial Model
     • Model Fitting
     • Examples

  2. Introduction
     • Many spatial problems are inherently multivariate: more than one measurement or observation at each spatial location.
       – Advent of GIS, modern computing, and methodological advances.
     • Many spatial problems involve lots of spatial locations.
       – Problems in constructing and working with design and covariance matrices.
     • We propose a simple multivariate spatial model and discuss some strategies for fitting it and examining the results.

  3. Microarrays and Climate
     • Combining observed data and climate models
       – Examine model behavior as well as predictions of climate change.
       – Precipitation and temperature for sixteen models on a 5° grid.
         ⇒ 2 × 36 × 72 ≈ 5000 observations per chip/model.
     • Microarray analysis
       – Build a profile of differentially expressed genes relating to cerebral vascular malformations.
       – Roughly 20 chips with three disease groups (control, AVM, CCM), with each chip a 640 × 640 array.
         ⇒ 2 × 16 × 12K ≈ 400K observations per chip.
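
The rounded counts above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 5° grid is global (180/5 latitude bands by 360/5 longitude bands) and reads "12K" as 12,000; neither is stated explicitly on the slide.

```python
# Recompute the observation counts quoted on the slide from their factors.
# Assumption: a global 5-degree grid, i.e. 180/5 latitude and 360/5 longitude cells.
lat_cells = 180 // 5
lon_cells = 360 // 5
climate_obs = 2 * lat_cells * lon_cells   # factor 2 as given on the slide

# Assumption: "12K" is read as 12,000.
microarray_obs = 2 * 16 * 12_000
array_cells = 640 * 640                   # cells on a 640 x 640 chip

print(climate_obs)      # 5184, quoted as "about 5000"
print(microarray_obs)   # 384000, quoted as "about 400K"
print(array_cells)      # 409600
```

Both products are consistent with the rounded figures on the slide.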

  4. A Multivariate, Additive Spatial Model
     • Let $Y = [Y_1' \; \ldots \; Y_k']'$, where each $Y_i$ denotes one spatial variable (one climate model, one microarray chip, etc.).
     • Then $Y = X\beta + h + \epsilon$, where
       – $X\beta$ represents fixed effects
       – $h$ represents a random, zero-mean spatial process
       – $\epsilon$ represents a random error process orthogonal to (uncorrelated with) $h$.

  5. A Multivariate, Additive Spatial Model
     • The structure of $X$ includes both chip-specific and gene-specific (across-chip) terms:
       $$X = \begin{bmatrix} 1\;R\;C & 0 & \cdots & 0 & G \\ 0 & 1\;R\;C & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & 1\;R\;C & G \end{bmatrix},$$
       where
       – $1$ is a vector of 1s
       – $R$ and $C$ represent row and column effects
       – $G$ indicates gene effects.

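  6. A Multivariate, Additive Spatial Model
     • Further, the model suggests
       $$E[Y] = X\beta, \qquad \mathrm{Var}[Y] = \Sigma_h + \Sigma_\epsilon,$$
       where
       $$\Sigma_h = \begin{bmatrix} K_1 & 0 & \cdots & 0 \\ 0 & K_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & K_k \end{bmatrix}, \qquad \Sigma_\epsilon = \begin{bmatrix} \sigma_1^2 I & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_k^2 I \end{bmatrix},$$
       and
       – $K_i = K(\theta_i)$ represents a chip-specific spatial covariance matrix parameterized by $\theta_i$
       – $\sigma_i^2$ are chip-specific variances (nugget).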

  7. Backfitting
     • Ideally, one could use REML to fit the covariance parameters; estimates of $\beta$ and predictions of the random effects then follow directly:
       $$\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}Y \quad \text{(generalized least squares)}$$
       $$\hat h = \Sigma_h V^{-1}(Y - X\hat\beta)$$
       where $V = \Sigma_h + \Sigma_\epsilon$.
     • Direct computation of the design and covariance matrices is impractical, not to mention the matrix computations...
     • Quick overview of backfitting from additive models...

  8. Backfitting
     • We iteratively estimate the fixed effects and the spatial process.
     • Algorithm:
       [1] Let $\hat h^{(0)}$ be an initial guess and put $j = 1$.
       [2a] $\hat\beta^{(j)} = (X'X)^{-1}X'\,(Y - \hat h^{(j-1)})$
       [2b] Estimate covariance parameters, then $\hat h^{(j)} = \Sigma_h V^{-1}\,(Y - X\hat\beta^{(j)})$
       [3] Put $j = j + 1$ and repeat [2a] and [2b] until convergence.
     • To prove equivalence at convergence, plug [2b] into [2a]. Straightforward manipulations lead to the generalized least-squares estimator.
     • Equivalent to universal kriging.
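
A minimal numerical sketch of steps [2a] and [2b] may help make the equivalence claim concrete. Everything here is an illustrative assumption rather than the talk's setup: a single "chip" on a 1-D transect of 12 points, an intercept-only design ($X = 1$), a known exponential covariance with a common nugget (so the covariance-estimation part of [2b] is skipped), and the helper names `solve` and `matvec`. Under those assumptions the backfitting iterate should agree with the generalized least-squares intercept.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * w for a, w in zip(row, v)) for row in A]

# Toy setup: all numbers below are illustrative assumptions.
n = 12
sill, rng, nugget = 1.0, 1.5, 0.5
K = [[sill * math.exp(-abs(i - j) / rng) for j in range(n)] for i in range(n)]
V = [[K[i][j] + (nugget if i == j else 0.0) for j in range(n)] for i in range(n)]
Y = [2.0 + math.sin(i) for i in range(n)]
ones = [1.0] * n

# Reference: GLS intercept, (1' V^-1 1)^-1 1' V^-1 Y.
beta_gls = sum(solve(V, Y)) / sum(solve(V, ones))

# Backfitting with the covariance held fixed.
h = [0.0] * n          # [1] initial guess for the spatial process
beta = 0.0
for j in range(1, 501):
    beta_new = sum(y - hi for y, hi in zip(Y, h)) / n   # [2a] with X = 1
    resid = [y - beta_new for y in Y]
    h = matvec(K, solve(V, resid))                      # [2b] h = K V^-1 (Y - X beta)
    converged = abs(beta_new - beta) < 1e-13
    beta = beta_new
    if converged:
        break

print(beta, beta_gls)
```

With a common nugget, the fixed point satisfies $X'V^{-1}(Y - X\beta) = 0$, which is the GLS normal equation; the two printed values should therefore agree to numerical precision.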

  9. Backfitting
     • Goal: computation in R on a computer with 2 GB of RAM.
     • We perform the regression step iteratively on the chip-specific effects and the gene effects:
       [2a'] $\hat\beta^{(j)}_{\text{Chip}} = \big((1RC)'(1RC)\big)^{-1}(1RC)'\,(Y - \hat h^{(j-1)})$
       [2a''] $\hat\beta^{(j)}_{\text{Gene}} = (G'G)^{-1}G'\,(Y - \hat h^{(j-1)})$
       [2a'''] Repeat [2a'] and [2a''] until convergence.
     • We need to reconstruct the individual design matrices for each step.
     • Calculation time for the entire model ≈ 2 minutes per iteration (Xeon i686, Linux).
     • Quick convergence: $\mathrm{MSE}\big(\hat\beta^{(j)} - \hat\beta^{(j-1)}\big) < 10^{-4}$ for $j \ge 4$.
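
The inner loop [2a'] to [2a'''] can be sketched for 0/1 indicator designs, where each least-squares step reduces to taking group means. The sketch makes several assumptions that are not on the slide: a toy layout of 3 array rows by 4 genes with one cell missing, row indicators standing in for the chip-specific block, and each step regressing on the working response minus the other block's current fit (the usual backfitting partial residual; the slide's formulas show only $Y - \hat h^{(j-1)}$). The helper `group_means` is hypothetical.

```python
# Toy layout: 3 array rows x 4 genes, one cell missing (hypothetical data).
row_eff = [0.3, -0.1, 0.2]
gene_eff = [1.0, -0.5, 0.0, 2.0]
cells = [(r, g) for r in range(3) for g in range(4) if (r, g) != (0, 0)]

# Working response Y - h; exactly additive here, so the fit should reproduce it.
z = {(r, g): row_eff[r] + gene_eff[g] for (r, g) in cells}

def group_means(values, key_index, n_groups):
    """Least-squares coefficients for a 0/1 indicator design: the mean of each group."""
    total = [0.0] * n_groups
    count = [0] * n_groups
    for cell, v in values.items():
        total[cell[key_index]] += v
        count[cell[key_index]] += 1
    return [t / c for t, c in zip(total, count)]

a = [0.0] * 3   # row (chip-specific) effects
b = [0.0] * 4   # gene effects
for sweep in range(50):
    # [2a']: regress the response minus the current gene fit on the row indicators
    a = group_means({c: z[c] - b[c[1]] for c in cells}, 0, 3)
    # [2a'']: regress the response minus the current row fit on the gene indicators
    b = group_means({c: z[c] - a[c[0]] for c in cells}, 1, 4)

max_resid = max(abs(z[c] - a[c[0]] - b[c[1]]) for c in cells)
print(max_resid)
```

The individual coefficients are only determined up to a constant shifted between the two blocks, so the check is on the fitted values `a[r] + b[g]`, not on `a` and `b` separately.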

  10. Sparse Matrix Manipulation
      • Matrices are stored in a sparse format.
      • The design matrix $X$ contains only $\{0, 1\}$, and $\Sigma_h$ has a lot of zeros due to tapering.
      • Illustration with a "small" sub-design matrix $C$ (base R and the SparseM library):
        [Table, values lost in extraction: sparse format versus full format, with sum contrasts and treatment contrasts. Rows: storage of $C$; calculating $C'C$; storage of $C'C$; solving $(C'C)x = v$ (time in seconds).]
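
The storage argument can be illustrated without any sparse-matrix library. The sketch below is not the SparseM comparison from the slide: it assumes a plain dummy-coded column factor (no sum or treatment contrasts), a made-up assignment of 12 observations to 4 array columns, and a simple coordinate list as the "sparse format".

```python
# Hypothetical column factor: the array column of each of 12 observations.
col_of_obs = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
n_obs, n_cols = len(col_of_obs), 4

# Full format: every entry of the n_obs x n_cols indicator matrix C is stored.
C_full = [[1 if col_of_obs[i] == j else 0 for j in range(n_cols)] for i in range(n_obs)]
full_entries = n_obs * n_cols

# Sparse (coordinate) format: only the positions of the ones are stored.
C_sparse = [(i, j) for i, j in enumerate(col_of_obs)]
sparse_entries = len(C_sparse)

# C'C from the sparse entries: each row of C holds a single 1,
# so C'C is diagonal and its diagonal holds the group counts.
CtC_diag = [0] * n_cols
for _, j in C_sparse:
    CtC_diag[j] += 1

# Solving (C'C) x = v is then an elementwise division.
v = [3.0, 6.0, 9.0, 12.0]
x = [vj / d for vj, d in zip(v, CtC_diag)]

print(full_entries, sparse_entries)   # 48 12
print(CtC_diag)                       # [3, 3, 3, 3]
print(x)                              # [1.0, 2.0, 3.0, 4.0]
```

With contrast coding, $C'C$ is no longer diagonal, but the matrix still has far fewer nonzero entries than its full size, which is what a sparse format exploits.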

  11. Covariance Tapering
      • Introduce a sparseness structure in the covariance matrix $K_i$ with some taper function.
      • A carefully chosen taper preserves asymptotic optimality with kriging (Furrer et al. 2004, submitted).
        [Figure: covariance against lag. Exponential (green), spherical (yellow), tapered = exponential × spherical (red).]
      • Tapering with range of 2 (eight neighbors):
        – Nonzero elements in $K_i$: 3,614,762 (0.002%)
        – Nonzero elements in the Cholesky factor of $K_i$: 67,070,820 (0.040%)
        – With 12 or more neighbors, more than $2^{30}$ nonzero elements in the Cholesky factor.
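
The "eight neighbors" figure for a taper range of 2 can be reproduced on a small grid. This is a sketch under stated assumptions: unit grid spacing, the standard spherical correlation as the taper, an exponential range of 1.5 chosen only for illustration, and a 7 × 7 grid instead of the 640 × 640 chip.

```python
import math

def spherical(d, taper_range):
    """Spherical correlation: exactly zero at and beyond taper_range."""
    if d >= taper_range:
        return 0.0
    u = d / taper_range
    return 1.0 - 1.5 * u + 0.5 * u ** 3

def exponential(d, rng):
    """Exponential correlation exp(-d / rng)."""
    return math.exp(-d / rng)

def tapered(d, rng, taper_range):
    """Tapered correlation: exponential times spherical taper."""
    return exponential(d, rng) * spherical(d, taper_range)

side = 7                      # small stand-in grid (assumption)
points = [(i, j) for i in range(side) for j in range(side)]
centre = (3, 3)

# Neighbors of an interior point that keep a nonzero tapered covariance.
neighbours = [
    p for p in points
    if p != centre
    and tapered(math.hypot(p[0] - centre[0], p[1] - centre[1]), 1.5, 2.0) > 0.0
]

# Nonzero entries of the full tapered matrix (diagonal included).
nnz = sum(
    1 for p in points for q in points
    if tapered(math.hypot(p[0] - q[0], p[1] - q[1]), 1.5, 2.0) > 0.0
)

print(len(neighbours))               # 8
print(nnz, len(points) ** 2)         # 361 2401
```

Only lags 1 and $\sqrt{2}$ fall strictly inside the taper range, which gives the four axis neighbors and four diagonal neighbors; the share of nonzero entries shrinks quickly as the grid grows.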

  12. Microarray Example – A Single Chip
      • Few missing values (1.5%).
      • Add blurring (jitter) to compensate for rounding.
      • . . .

  13. Microarray Example – A Single Chip
      • Estimate the covariance structure.
      • Fit an exponential covariance: (range, sill, nugget) = (1.528, 0.487, 0.061).
      • Taper with a spherical covariance with range 2.
        [Figure: empirical horizontal, empirical vertical, and empirical off-axis estimates against lag (0 to 7), with the fitted exponential, the taper covariance, and the tapered covariance (exp × spher).]
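
As a rough sketch, the fitted and tapered covariances can be written out explicitly. The parameterization is an assumption: the range $\rho$ is taken to enter as $\exp(-d/\rho)$ (some software scales the range differently), the nugget $\sigma^2$ is added at lag zero only, $s$ denotes the sill, and $\tau = 2$ is the taper range.

```latex
C(d) = s \exp(-d/\rho) + \sigma^2 \, \mathbf{1}\{d = 0\},
\qquad (\rho,\ s,\ \sigma^2) = (1.528,\ 0.487,\ 0.061)

T(d) = \left(1 - \tfrac{3}{2}\,\tfrac{d}{\tau} + \tfrac{1}{2}\,\tfrac{d^3}{\tau^3}\right) \mathbf{1}\{d < \tau\},
\qquad \tau = 2

C_{\mathrm{tap}}(d) = C(d)\, T(d)

% Worked values at unit lag under these assumptions:
C(1) = 0.487\, e^{-1/1.528} \approx 0.253, \qquad
T(1) = 1 - 0.75 + 0.0625 = 0.3125, \qquad
C_{\mathrm{tap}}(1) \approx 0.079

% Beyond the taper range the product vanishes exactly:
C_{\mathrm{tap}}(d) = 0 \quad \text{for } d \ge 2
```

Under this reading, the taper cuts the covariance at unit lag to roughly a third of its fitted value and removes it entirely from lag 2 onward.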

  14. Microarray Example – A Single Chip
      [Figure: row/column effects. Two panels, column effects and row effects, each plotted against index (0 to 600) on a vertical scale of about −0.2 to 0.2.]

  15. Microarray Example – A Single Chip
      [Figure: gene effects (normed on the right). Mismatch plotted against perfect match.]

  16. Microarray Example – A Single Chip
      [Figure: QQ-plots of gene effects (normed on the right). Sample quantiles against theoretical quantiles, with the 1%, 5%, 25%, 75%, 95%, and 99% quantiles marked.]

  17. Microarray Example – A Single Chip Y = mean + row effects + column effects + spatial process + gene effects + error

  26. Microarray Example – More Chips...
      • The algorithm extends directly:
        [2a*] $\hat\beta^{(j)}_{\text{Chip } i} = \big((1RC)'(1RC)\big)^{-1}(1RC)'\,\big(Y_{\text{Chip } i} - \hat h^{(j-1)}_{\text{Chip } i}\big), \quad i = 1, \ldots, k$
        [2a**] $\hat\beta^{(j)}_{\text{Gene}} = (G'G)^{-1}G'\,\big(Y_{\text{Chip } i} - \hat h^{(j-1)}_{\text{Chip } i}\big)$
        [2b*] For $i = 1, \ldots, k$, estimate covariance parameters, then
              $\hat h^{(j)}_{\text{Chip } i} = K_i \big(K_i + \sigma_i^2 I\big)^{-1} \big(Y_{\text{Chip } i} - (1RC)\,\hat\beta^{(j)}_{\text{Chip } i} - G\,\hat\beta^{(j)}_{\text{Gene}}\big)$
      • Careful programming in R and a few Fortran routines allow the calculation on a Xeon processor with 2 GB of RAM. Results within a few minutes: ≈ 2 × k minutes per iteration.

  27. Microarray Example – Two Chips
      Y − mean = row effects + column effects + spatial process + gene effects + error
      [Figure panels: Chip 1, Difference, Chip 2.]

  32. Microarray Example – Two Chips
      [Figure: QQ-plots of gene effects. Sample quantiles against theoretical quantiles, with the 1%, 5%, 25%, 75%, 95%, and 99% quantiles marked.]

  33. Microarray Example – Two Chips Difference in gene effects
