solving large dense linear systems with covariance
play

Solving Large Dense Linear Systems with Covariance Matrices Jie - PowerPoint PPT Presentation

Solving Large Dense Linear Systems with Covariance Matrices Jie Chen Mathematics and Computer Science Division Argonne National Laboratory Solving Large Dense Linear Systems with Covariance Matrices p. 1/30 SIAM LA, June 19, 2012


  1. Solving Large Dense Linear Systems with Covariance Matrices Jie Chen Mathematics and Computer Science Division Argonne National Laboratory Solving Large Dense Linear Systems with Covariance Matrices – p. 1/30 SIAM LA, June 19, 2012

  2. Introduction Covariance matrices appear in every corner of statistical analysis: Multivariate statistics Stochastic processes Sampling Max likelihood fitting Interpolation; kriging Regression and classification Prediction; forecasting The handling of covariance matrices incurs many matrix computations Solving Large Dense Linear Systems with Covariance Matrices – p. 2/30 SIAM LA, June 19, 2012

  3. Introduction Example: Sampling Generate a random vector from an n -dimensional normal distribution with mean µ and covariance matrix K . Steps: 1. Compute a Cholesky factorization K = LL T 2. Generate a random vector z with i.i.d. standard normal variables 3. The vector y = Lz + µ is one such sample, because ... E [ y ] = L · E [ z ] + µ = µ cov[ y ] = E [( y − µ )( y − µ ) T ] = E [( Lz )( z T L T )] = K Can replace L by K 1 / 2 , so need to compute K 1 / 2 z . Solving Large Dense Linear Systems with Covariance Matrices – p. 3/30 SIAM LA, June 19, 2012

  4. Introduction Example: Maximum likelihood estimation [Opposite of sampling:] Given a vector y , what is the most likely normal distribution it comes from? Assuming that mean is zero and that the covariance matrix K is parameterized by θ , then maximize the likelihood � � − 1 2 y T K − 1 y L ( θ ) := (2 π ) − n 2 (det K ) − 1 2 exp Can use any optimization method to solve max log L or ∇ log L = 0 Need to evaluate log(det K ) and K − 1 y � N 1 log(det K ) = tr(log K ) ≈ i =1 u i (log K ) u i N [log(det K )] ′ = tr( K − 1 K ′ K − 1 ) ≈ � N 1 i =1 u i ( K − 1 K ′ K − 1 ) u i N Solving Large Dense Linear Systems with Covariance Matrices – p. 4/30 SIAM LA, June 19, 2012

  5. Introduction Example: Interpolation Given some points x i ( i = 1 , . . . , n ) and their function values f ( x i ) , what is the function value of an unknown point x 0 ? If we assume that f is a sample path of a stochastic process with covariance function φ f ( x 0 ) = � n i =1 w i f ( x i ) , then the weights w i are computed as − 1       w 1 φ ( x 1 , x 1 ) · · · φ ( x 1 , x n ) φ ( x 1 , x 0 ) . . . .    ...    . . . .  =       . . . .      w n φ ( x n , x 1 ) · · · φ ( x n , x n ) φ ( x n , x 0 ) Where does the above formula come from? Recall our old friend, least squares: w = ( A T A ) − 1 ( A T y ) . min w � y − Aw � = ⇒ Solving Large Dense Linear Systems with Covariance Matrices – p. 5/30 SIAM LA, June 19, 2012

  6. Introduction We focus on Solving linear system with covariance matrix K , where K ij = φ ( x i − x j ) What is so special/challenging about covariance matrices? Can be very large Can be fully dense Can be increasingly ill-conditioned as matrix size grows Can be associated with a large number of random right-hand sides Positive definite Solving Large Dense Linear Systems with Covariance Matrices – p. 6/30 SIAM LA, June 19, 2012

  7. Linear Solver Consider the conjugate gradient method for solving Kx = b Require: Initial guess x 0 , preconditioner M ≈ K 1: Compute r 0 = b − Kx 0 , z 0 = M − 1 r 0 and p 0 = z 0 2: for j = 0 , 1 , . . . until convergence do 3: α j = ( r j , r j ) / ( Kp j , p j ) 4: x j +1 = x j + α j p j 5: r j +1 = r j − α j Kp j Check list: z j +1 = M − 1 r j +1 6: Matrix-vector mult? 7: β j = ( r j +1 , z j +1 ) / ( r j , z j ) Preconditioner? 8: p j +1 = z j +1 + β j p j Parallelism? 9: end for Solving Large Dense Linear Systems with Covariance Matrices – p. 7/30 SIAM LA, June 19, 2012

  8. A Simple Case: Regular Grid But not at all trivial... Consider that K ij = φ ( x i − x j ) where the x i ’s are on a regular grid K is (multilevel) Toeplitz Multiplying K with vector p requires embedding K to a larger (multilevel) circulant matrix, and padding p with zeros Multiplying (multilevel) circulant matrix needs (multi-dimensional) FFT A (multilevel) circulant preconditioner M can be constructed CG can be extended by using the seed method or block method to handle multiple right-hand sides Parallel implementation? ... is a headache ... because too many data transfers and global synchronizations Solving Large Dense Linear Systems with Covariance Matrices – p. 8/30 SIAM LA, June 19, 2012

  9. Preconditioning In general, how do we precondition K ? Need some knowledge of φ ... Mat´ ern: � √ � √ � ν � 1 2 ν � x � 2 ν � x � φ ( x ) = K ν 2 ν − 1 Γ( ν ) θ θ 1 θ : Scale parameter. Can also make θ = 1, ν = 0.5 θ = 1, ν = 2 function anisotropic. 0.8 ν : Smoothness. Controls the 0.6 φ (x) differentiability of φ at 0 . 0.4 More flexible than Gaussian kernel 0.2 (infinitely differentiable). 0 When ν → ∞ , it is Gaussian. 0 1 2 3 4 |x| Solving Large Dense Linear Systems with Covariance Matrices – p. 9/30 SIAM LA, June 19, 2012

  10. Spectral Density (Covariance, spectral density) pair: � R d f ( ω ) exp( i ω T x ) dω, φ ( x ) = with f ( ω ) > 0 . Spectral density of Mat´ ern kernel 2 ν + θ 2 � ω � 2 � − ( ν + d/ 2) . � f ( ω ) ∝ If regular grid, that is, x j = j/n , then write � [0 , 2 π ) d f n ( ω ) exp( i ω T j ) dω φ ( j/n ) = � ω ∈ [0 , 2 π ) d . f n ( ω ) = n f ( n ◦ ( ω + 2 πl )) , l ∈ Z d Solving Large Dense Linear Systems with Covariance Matrices – p. 10/30 SIAM LA, June 19, 2012

  11. Spectral Density 0.016 f n , n = 10 0.014 0.012 0.01 0.008 0.006 0.004 0.002 f 0 −50 0 50 Solving Large Dense Linear Systems with Covariance Matrices – p. 11/30 SIAM LA, June 19, 2012

  12. Spectrum Bilinear form: � � 2 � � � � � a T Ka = a j exp( i ω T x j ) � � a j a l φ ( x j − x l ) = R d f ( ω ) dω. � � � � j j,l If regular grid, that is, x j = j/n , then write � � 2 � � � � a T Ka = a j exp( i ω T j ) � � [0 , 2 π ) d f n ( ω ) dω � � � � j � � 2 ≈ (2 π ) d � � a j exp( i (2 πk/n ) T j ) � � f n (2 πk/n ) . � � n � � 0 ≤ k ≤ n − 1 0 ≤ j ≤ n − 1 Intuitively, eigenvalues of K “similar to” (2 π ) d f n (2 πk/n ) . Solving Large Dense Linear Systems with Covariance Matrices – p. 12/30 SIAM LA, June 19, 2012

  13. Spectrum Definition: Two sets of real numbers { a ( n ) } j =1 ,...,n and { b ( n ) } j =1 ,...,n are j j equally distributed in the interval [ M 1 , M 2 ] if for any continuous function F : [ M 1 , M 2 ] → R , n 1 � [ F ( a ( n ) ) − F ( b ( n ) lim )] = 0 . j j n n →∞ j =1 Theorem: If φ ∈ L 1 ∩ L 2 , then the set of eigenvalues of K/n and the set { (2 π ) d f n (2 πj/n ) /n } are equally distributed. Message: Loosely speaking, the spectrum of K is like f n evenly sampled on [0 , 2 π ) . Solving Large Dense Linear Systems with Covariance Matrices – p. 13/30 SIAM LA, June 19, 2012

  14. Spectrum ern kernel ( d = 2 , ν = 3 ). Blue: eigenvalues of K . Red: (2 π ) d f n (2 πj/n ) . Mat´ 2 10 1 10 0 10 −1 10 −2 10 −3 10 −4 10 −200 0 200 400 600 800 1000 1200 Solving Large Dense Linear Systems with Covariance Matrices – p. 14/30 SIAM LA, June 19, 2012

  15. Preconditioning Preconditioning idea: suppress the variation of f . � R d −� ω � 2 f ( ω ) exp( i ω T x ) dω ∆ φ ( x ) = Covariance Spectral density 2 ν + θ 2 � ω � 2 � − ( ν + d/ 2) ≍ (1 + � ω � ) − 8 � φ f ( ω ) ∝ � ω � 2 f ( ω ) ≍ � ω � 2 (1 + � ω � ) − 8 ∆ φ ∆ 2 φ � ω � 4 f ( ω ) ≍ � ω � 4 (1 + � ω � ) − 8 ∆ 3 φ � ω � 6 (1 + � ω � ) − 8 ∆ 4 φ � ω � 8 (1 + � ω � ) − 8 . . . . . . Solving Large Dense Linear Systems with Covariance Matrices – p. 15/30 SIAM LA, June 19, 2012

  16. � �� s p sin 2 � ω p Preconditioning − 4 � d f [ s ] p =1 n 2 n ( ω ) = f n ( ω ) 2 Discrete case: ( L , D : discrete Laplacian) Matrix Entry � [0 , 2 π ) d f n ( ω ) exp( i ω T j ) dω K φ ( j/n ) = � n ( ω ) exp( i ω T j ) dω [0 , 2 π ) d f [1] D φ ( j/n ) = � n ( ω ) exp( i ω T j ) dω K [2] = LKL T D 2 φ ( j/n ) = [0 , 2 π ) d f [2] � D 3 φ ( j/n ) = n ( ω ) exp( i ω T j ) dω [0 , 2 π ) d f [3] � n ( ω ) exp( i ω T j ) dω K [4] = LK [2] L T D 4 φ ( j/n ) = [0 , 2 π ) d f [4] Solving Large Dense Linear Systems with Covariance Matrices – p. 16/30 SIAM LA, June 19, 2012

  17. Preconditioning Same Mat´ ern kernel as in page 14. Left: original. Right: after Laplacian. 2 10 1 10 0 10 −1 10 10 7 −2 10 10 6 −3 10 10 5 −4 10 10 4 −200 0 200 400 600 800 1000 1200 −200 0 200 400 600 800 1000 1200 Solving Large Dense Linear Systems with Covariance Matrices – p. 17/30 SIAM LA, June 19, 2012

Recommend


More recommend