Solving Large Dense Linear Systems with Covariance Matrices

Jie Chen
Mathematics and Computer Science Division, Argonne National Laboratory

SIAM LA, June 19, 2012
Introduction

Covariance matrices appear in every corner of statistical analysis:
- Multivariate statistics
- Stochastic processes
- Sampling
- Maximum likelihood fitting
- Interpolation; kriging
- Regression and classification
- Prediction; forecasting

The handling of covariance matrices incurs many matrix computations.
Introduction

Example: Sampling

Generate a random vector from an n-dimensional normal distribution with mean µ and covariance matrix K. Steps:

1. Compute a Cholesky factorization K = LL^T.
2. Generate a random vector z with i.i.d. standard normal entries.
3. The vector y = Lz + µ is one such sample, because

    E[y] = L · E[z] + µ = µ,
    cov[y] = E[(y − µ)(y − µ)^T] = E[(Lz)(z^T L^T)] = L E[zz^T] L^T = LL^T = K.

L can also be replaced by K^{1/2}, in which case one needs to compute K^{1/2} z.
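A minimal Python sketch of these three steps. The exponential covariance (a Matérn kernel with ν = 1/2) is only an illustrative stand-in for whatever kernel defines K:

    import numpy as np

    rng = np.random.default_rng(0)

    # Grid points and a covariance matrix K_ij = phi(x_i - x_j);
    # the exponential kernel here is an assumed example
    n = 500
    x = np.linspace(0.0, 1.0, n)
    K = np.exp(-np.abs(x[:, None] - x[None, :]))
    mu = np.zeros(n)

    # Step 1: Cholesky factorization K = L L^T
    L = np.linalg.cholesky(K)

    # Steps 2-3: z ~ N(0, I), then y = L z + mu has cov[y] = L L^T = K
    z = rng.standard_normal(n)
    y = L @ z + mu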
Introduction

Example: Maximum likelihood estimation

Opposite of sampling: given a vector y, what is the most likely normal distribution it comes from? Assuming that the mean is zero and that the covariance matrix K is parameterized by θ, maximize the likelihood

    L(\theta) := (2\pi)^{-n/2} (\det K)^{-1/2} \exp\!\left( -\tfrac{1}{2} y^T K^{-1} y \right).

Any optimization method can be used to solve max log L or ∇ log L = 0. One needs to evaluate log(det K) and K^{−1} y:

    \log(\det K) = \operatorname{tr}(\log K) \approx \frac{1}{N} \sum_{i=1}^{N} u_i^T (\log K)\, u_i,

    [\log(\det K)]' = \operatorname{tr}(K^{-1} K') \approx \frac{1}{N} \sum_{i=1}^{N} u_i^T K^{-1} K' u_i.
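A hedged Python sketch of these two quantities. The dense Cholesky solves below are for brevity only (the point of the talk is to replace them with iterative solves), the probe vectors u_i follow Hutchinson's trace estimator, and K' = dK/dθ must be supplied by the caller:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def neg_log_likelihood(K, y):
        # -log L = (n/2) log(2 pi) + (1/2) log det K + (1/2) y^T K^{-1} y
        c = cho_factor(K)
        logdet = 2.0 * np.sum(np.log(np.diag(c[0])))
        n = len(y)
        return 0.5 * (n * np.log(2 * np.pi) + logdet + y @ cho_solve(c, y))

    def logdet_derivative(K, Kprime, N=50, rng=None):
        # [log det K]' = tr(K^{-1} K'), estimated with Hutchinson probes:
        # tr(A) ~ (1/N) sum_i u_i^T A u_i, with u_i having +/-1 entries
        rng = rng or np.random.default_rng(0)
        c = cho_factor(K)
        est = 0.0
        for _ in range(N):
            u = rng.choice([-1.0, 1.0], size=K.shape[0])
            est += u @ cho_solve(c, Kprime @ u)
        return est / N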
Introduction

Example: Interpolation

Given some points x_i (i = 1, ..., n) and their function values f(x_i), what is the function value at an unknown point x_0? If we assume that f is a sample path of a stochastic process with covariance function φ and write f(x_0) = Σ_{i=1}^n w_i f(x_i), then the weights w_i are computed as

    \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}
    =
    \begin{bmatrix}
      \phi(x_1, x_1) & \cdots & \phi(x_1, x_n) \\
      \vdots & \ddots & \vdots \\
      \phi(x_n, x_1) & \cdots & \phi(x_n, x_n)
    \end{bmatrix}^{-1}
    \begin{bmatrix} \phi(x_1, x_0) \\ \vdots \\ \phi(x_n, x_0) \end{bmatrix}.

Where does the above formula come from? Recall our old friend, least squares:

    \min_w \|y - Aw\| \;\Longrightarrow\; w = (A^T A)^{-1} (A^T y).
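A small sketch of this computation in Python; the exponential covariance and the test function are illustrative assumptions, not part of the slide:

    import numpy as np

    def kriging_weights(phi, X, x0):
        # Solve [phi(x_i, x_j)] w = [phi(x_i, x0)] for the weights w,
        # so that f(x0) is predicted by sum_i w_i f(x_i)
        K = phi(X[:, None], X[None, :])
        k0 = phi(X, x0)
        return np.linalg.solve(K, k0)

    # Exponential covariance (Matern, nu = 1/2) as an assumed example
    phi = lambda s, t: np.exp(-np.abs(s - t))
    X = np.linspace(0.0, 1.0, 20)
    w = kriging_weights(phi, X, 0.37)
    f0 = w @ np.sin(2 * np.pi * X)   # interpolate f = sin(2 pi x) at 0.37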
Introduction

We focus on solving linear systems with a covariance matrix K, where K_ij = φ(x_i − x_j).

What is so special/challenging about covariance matrices?
- Can be very large
- Can be fully dense
- Can be increasingly ill-conditioned as the matrix size grows
- Can be associated with a large number of random right-hand sides
- Positive definite
Linear Solver

Consider the preconditioned conjugate gradient method for solving Kx = b.

Require: initial guess x_0, preconditioner M ≈ K
1: Compute r_0 = b − Kx_0, z_0 = M^{−1} r_0, and p_0 = z_0
2: for j = 0, 1, ... until convergence do
3:   α_j = (r_j, z_j) / (Kp_j, p_j)
4:   x_{j+1} = x_j + α_j p_j
5:   r_{j+1} = r_j − α_j Kp_j
6:   z_{j+1} = M^{−1} r_{j+1}
7:   β_j = (r_{j+1}, z_{j+1}) / (r_j, z_j)
8:   p_{j+1} = z_{j+1} + β_j p_j
9: end for

Check list:
- Matrix-vector multiplication?
- Preconditioner?
- Parallelism?
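Rendered in Python, the iteration looks as follows (a sketch; K_mv and M_inv are caller-supplied functions that apply K and M^{−1}, so neither matrix needs to be formed explicitly):

    import numpy as np

    def pcg(K_mv, b, M_inv, x0, tol=1e-8, maxit=500):
        # Preconditioned conjugate gradients, mirroring steps 1-9 above
        x = x0.copy()
        r = b - K_mv(x)
        z = M_inv(r)
        p = z.copy()
        rz = r @ z
        for _ in range(maxit):
            Kp = K_mv(p)
            alpha = rz / (p @ Kp)
            x += alpha * p
            r -= alpha * Kp
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            z = M_inv(r)
            rz_new = r @ z
            beta = rz_new / rz
            p = z + beta * p
            rz = rz_new
        return x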
A Simple Case: Regular Grid

...But not at all trivial.

Consider K_ij = φ(x_i − x_j) where the x_i's are on a regular grid:
- K is (multilevel) Toeplitz.
- Multiplying K with a vector p requires embedding K into a larger (multilevel) circulant matrix and padding p with zeros (see the sketch after this list).
- Multiplying by a (multilevel) circulant matrix needs a (multi-dimensional) FFT.
- A (multilevel) circulant preconditioner M can be constructed.
- CG can be extended by the seed method or the block method to handle multiple right-hand sides.
- Parallel implementation? ...is a headache, because of too many data transfers and global synchronizations.
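For d = 1 the multiplication step looks as follows (a sketch; the multilevel case embeds along each dimension and uses a multi-dimensional FFT):

    import numpy as np

    def toeplitz_matvec(col, row, p):
        # Embed the n x n Toeplitz matrix (first column `col`, first row
        # `row`) into a 2n x 2n circulant matrix, pad p with zeros, and
        # multiply via FFT diagonalization in O(n log n)
        n = len(p)
        c = np.concatenate([col, [0.0], row[:0:-1]])   # circulant's first column
        p_pad = np.concatenate([p, np.zeros(n)])
        y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(p_pad))
        return y[:n].real

    # Covariance case: K is symmetric Toeplitz, so col == row == phi(offsets)
    x = np.arange(512) / 512.0
    col = np.exp(-np.abs(x))                 # assumed exponential covariance
    p = np.random.default_rng(0).standard_normal(512)
    Kp = toeplitz_matvec(col, col, p)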
Preconditioning

In general, how do we precondition K? Need some knowledge of φ...

Matérn:

    \phi(x) = \frac{1}{2^{\nu-1} \Gamma(\nu)}
              \left( \frac{\sqrt{2\nu}\,\|x\|}{\theta} \right)^{\nu}
              K_\nu\!\left( \frac{\sqrt{2\nu}\,\|x\|}{\theta} \right)

- θ: scale parameter. Can also make the function anisotropic.
- ν: smoothness. Controls the differentiability of φ at 0. More flexible than the Gaussian kernel (which is infinitely differentiable); when ν → ∞, the Matérn kernel becomes the Gaussian.

[Figure: φ(x) versus |x| for θ = 1, ν = 0.5 and θ = 1, ν = 2.]
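A direct transcription of this formula, using SciPy's modified Bessel function kv for K_ν; the small-radius guard implements the limit φ(0) = 1:

    import numpy as np
    from scipy.special import gamma, kv

    def matern(r, nu=1.5, theta=1.0):
        # Matern covariance phi as a function of the distance r = ||x||
        r = np.atleast_1d(np.asarray(r, dtype=float))
        s = np.sqrt(2.0 * nu) * r / theta
        out = np.ones_like(s)                # phi(0) = 1 in the limit
        m = s > 1e-10
        out[m] = 2.0 ** (1.0 - nu) / gamma(nu) * s[m] ** nu * kv(nu, s[m])
        return out

    # e.g. the two curves in the figure: matern(r, nu=0.5), matern(r, nu=2.0)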
Spectral Density

(Covariance, spectral density) pair:

    \phi(x) = \int_{\mathbb{R}^d} f(\omega) \exp(i \omega^T x)\, d\omega,
    \quad \text{with } f(\omega) > 0.

Spectral density of the Matérn kernel:

    f(\omega) \propto \left( 2\nu + \theta^2 \|\omega\|^2 \right)^{-(\nu + d/2)}.

If on a regular grid, that is, x_j = j/n, then write

    \phi(j/n) = \int_{[0,2\pi)^d} f_n(\omega) \exp(i \omega^T j)\, d\omega,
    \quad
    f_n(\omega) = \sum_{l \in \mathbb{Z}^d} n\, f\big( n \circ (\omega + 2\pi l) \big),
    \quad \omega \in [0, 2\pi)^d.
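A sketch of the folding for d = 1, truncating the lattice sum over l at a finite L and using the Matérn density up to its normalizing constant:

    import numpy as np

    def f_matern(w, nu=1.5, theta=1.0, d=1):
        # f(omega) ~ (2 nu + theta^2 omega^2)^(-(nu + d/2)), up to a constant
        return (2.0 * nu + theta ** 2 * w ** 2) ** (-(nu + d / 2.0))

    def f_folded(w, n, L=100, **kw):
        # f_n(omega) = sum over l of n * f(n (omega + 2 pi l)), truncated
        return sum(n * f_matern(n * (w + 2.0 * np.pi * l), **kw)
                   for l in range(-L, L + 1))

    w = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
    fn = f_folded(w, n=10)        # the density governing phi(j/n)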
Spectral Density

[Figure: the spectral density f, and the folded density f_n for n = 10.]
Spectrum

Bilinear form:

    a^T K a = \sum_{j,l} a_j a_l\, \phi(x_j - x_l)
            = \int_{\mathbb{R}^d} f(\omega)
              \Big| \sum_j a_j \exp(i \omega^T x_j) \Big|^2 d\omega.

If on a regular grid, that is, x_j = j/n, then write

    a^T K a = \int_{[0,2\pi)^d} f_n(\omega)
              \Big| \sum_j a_j \exp(i \omega^T j) \Big|^2 d\omega
            \approx \frac{(2\pi)^d}{n} \sum_{0 \le k \le n-1} f_n(2\pi k/n)
              \Big| \sum_{0 \le j \le n-1} a_j \exp\big( i (2\pi k/n)^T j \big) \Big|^2.

Intuitively, the eigenvalues of K are "similar to" (2π)^d f_n(2πk/n).
Spectrum

Definition: Two sets of real numbers {a_j^{(n)}}_{j=1,...,n} and {b_j^{(n)}}_{j=1,...,n} are equally distributed in the interval [M_1, M_2] if for any continuous function F : [M_1, M_2] → R,

    \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n}
    \left[ F\big(a_j^{(n)}\big) - F\big(b_j^{(n)}\big) \right] = 0.

Theorem: If φ ∈ L^1 ∩ L^2, then the set of eigenvalues of K/n and the set {(2π)^d f_n(2πj/n)/n} are equally distributed.

Message: Loosely speaking, the spectrum of K is like f_n evenly sampled on [0, 2π)^d.
Spectrum

[Figure: Matérn kernel (d = 2, ν = 3). Blue: eigenvalues of K. Red: (2π)^d f_n(2πj/n).]
Preconditioning

Preconditioning idea: suppress the variation of f.

    \Delta \phi(x) = \int_{\mathbb{R}^d} -\|\omega\|^2 f(\omega)
                     \exp(i \omega^T x)\, d\omega.

    Covariance | Spectral density
    -----------|--------------------------------------------------
    φ          | f(ω) ∝ (2ν + θ²‖ω‖²)^{−(ν+d/2)} ≍ (1 + ‖ω‖)^{−8}
    Δφ         | ‖ω‖² f(ω) ≍ ‖ω‖² (1 + ‖ω‖)^{−8}
    Δ²φ        | ‖ω‖⁴ f(ω) ≍ ‖ω‖⁴ (1 + ‖ω‖)^{−8}
    Δ³φ        | ‖ω‖⁶ (1 + ‖ω‖)^{−8}
    Δ⁴φ        | ‖ω‖⁸ (1 + ‖ω‖)^{−8}

(The exponent −8 corresponds to the running example d = 2, ν = 3, for which ν + d/2 = 4.)
Preconditioning

Discrete case (L, D: discrete Laplacian), with filtered densities

    f_n^{[s]}(\omega) = f_n(\omega)
    \left( -4 \sum_{p=1}^{d} \sin^2 \frac{\omega_p}{2} \right)^{s}.

    Matrix                  | Entry
    ------------------------|---------------------------------------------------
    K                       | φ(j/n)   = ∫_{[0,2π)^d} f_n(ω) exp(iω^T j) dω
                            | Dφ(j/n)  = ∫_{[0,2π)^d} f_n^{[1]}(ω) exp(iω^T j) dω
    K^{[2]} = L K L^T       | D²φ(j/n) = ∫_{[0,2π)^d} f_n^{[2]}(ω) exp(iω^T j) dω
                            | D³φ(j/n) = ∫_{[0,2π)^d} f_n^{[3]}(ω) exp(iω^T j) dω
    K^{[4]} = L K^{[2]} L^T | D⁴φ(j/n) = ∫_{[0,2π)^d} f_n^{[4]}(ω) exp(iω^T j) dω
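A small d = 1 experiment of this filtering (a sketch: the tridiagonal second-difference matrix stands in for the discrete Laplacian L, the Matérn parameters are assumed, and the condition numbers of K and K^{[2]} = LKL^T are compared):

    import numpy as np
    from scipy.special import gamma, kv

    # Matern covariance matrix on a 1-D regular grid (assumed parameters)
    n, nu, theta = 256, 1.5, 0.1
    x = np.arange(n) / n
    s = np.sqrt(2 * nu) * np.abs(x[:, None] - x[None, :]) / theta
    K = np.where(s > 1e-10,
                 2.0 ** (1 - nu) / gamma(nu)
                 * np.maximum(s, 1e-10) ** nu * kv(nu, np.maximum(s, 1e-10)),
                 1.0)

    # Tridiagonal second-difference matrix as the discrete Laplacian L
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    K2 = L @ K @ L.T                  # K^[2], whose spectrum is much flatter

    print(np.linalg.cond(K), np.linalg.cond(K2))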
Preconditioning

[Figure: same Matérn kernel as on the spectrum slide above (d = 2, ν = 3). Left: original spectrum, spanning roughly 10^{−4} to 10^{2}. Right: spectrum after applying the Laplacian, spanning roughly 10^{4} to 10^{7}.]