Probabilistic Graphical Models: Gaussian Network Models
Fall 2019, Siamak Ravanbakhsh
Learning objectives
- multivariate Gaussian density: different parametrizations
- marginalization and conditioning
- expression as Markov & Bayesian networks
Univariate Gaussian density

p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right), \qquad \mu \in \mathbb{R},\; \sigma > 0

- motivated by the central limit theorem
- max-entropy distribution with a fixed variance
- E[X] = \mu, \quad E[X^2] - E[X]^2 = \sigma^2
Multivariate Gaussian

x \in \mathbb{R}^n is a column vector (convention)

p(x; \mu, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)

compare to the univariate density above; note that |2\pi\Sigma| = (2\pi)^n |\Sigma|, so the normalizer is (2\pi)^{-n/2} |\Sigma|^{-1/2}
Multivariate Gaussian: sufficient statistics

p(x; \mu, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)

- \mu = E[X]
- \Sigma is the n \times n covariance matrix: \Sigma = E[XX^T] - E[X]E[X]^T, with \Sigma_{i,i} = Var(X_i) and \Sigma_{i,j} = Cov(X_i, X_j)
- the density only captures these two statistics
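Since the rest of the lecture builds on this density, a small numpy sketch may help (added here, not part of the original slides). It evaluates the moment-form density directly from the formula above and cross-checks it against scipy.stats.multivariate_normal; the particular values of μ and Σ are arbitrary placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

# arbitrary illustrative parameters (any symmetric positive-definite Sigma works)
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 2.0]])

def gaussian_density(x, mu, Sigma):
    """Moment-form density: |2*pi*Sigma|^{-1/2} exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    norm_const = np.sqrt(np.linalg.det(2 * np.pi * Sigma))  # = (2*pi)^{n/2} |Sigma|^{1/2}
    quad = diff @ np.linalg.solve(Sigma, diff)               # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm_const

x = np.array([0.5, 0.0])
print(gaussian_density(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```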
Multivariate Gaussian: covariance matrix

\Sigma is symmetric positive definite (PD), \Sigma \succ 0: since

y^T \Sigma y = y^T E[(X - E[X])(X - E[X])^T]\, y = E\!\left[ \left( y^T (X - E[X]) \right)^2 \right] > 0 \quad \forall y;\; \|y\| > 0

(move the expectation out of the quadratic form)

- the inverse of a PD matrix is PD: the precision matrix \Lambda = \Sigma^{-1} \succ 0
- \Sigma is diagonalized by orthonormal matrices: \Sigma = Q D Q^T, with D diagonal and Q orthogonal (rows & columns of unit norm, Q^T Q = Q Q^T = I: a rotation and/or reflection)
Multivariate Gaussian: covariance matrix

\Sigma = Q D Q^T: D diagonal (scaling), Q orthogonal (rotation and reflection; rows & columns of unit norm, Q^T Q = Q Q^T = I)

interpretation: scaling along the axes of some rotated/reflected coordinate system
Multivariate Gaussian: example

p(x; \mu, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)

\Sigma = \begin{bmatrix} 4 & 2 \\ 2 & 1.5 \end{bmatrix} \approx \begin{bmatrix} -.87 & -.48 \\ -.48 & .87 \end{bmatrix} \begin{bmatrix} 5.1 & 0 \\ 0 & .39 \end{bmatrix} \begin{bmatrix} -.87 & -.48 \\ -.48 & .87 \end{bmatrix}^T = Q\, D\, Q^T

columns of Q are the new bases

Alternatively, Q \approx \begin{bmatrix} \cos(208°) & \sin(208°) \\ \sin(208°) & -\cos(208°) \end{bmatrix}: a reflection of the coordinates about the line making an angle of \theta/2 = 104°
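A quick numerical check of this example (an added sketch, not from the slides) using numpy's symmetric eigendecomposition; eigh returns the eigenvalues in ascending order and the eigenvectors only up to sign, so the output may differ from the slide's Q and D by ordering and sign flips.

```python
import numpy as np

Sigma = np.array([[4.0, 2.0],
                  [2.0, 1.5]])

# symmetric eigendecomposition: Sigma = Q D Q^T
eigvals, Q = np.linalg.eigh(Sigma)       # ascending order: approx [0.39, 5.11]
D = np.diag(eigvals)

print(eigvals)
print(Q)                                  # orthonormal columns (the new bases), up to sign
print(np.allclose(Q @ D @ Q.T, Sigma))    # True: reconstruction of Sigma
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: Q is orthogonal
```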
Multivariate Gaussian: from univariates

given n univariate standard Gaussians, X \sim N(0, I):

- scale them by \sqrt{D_{ii}}: \quad D^{1/2} X \sim N(0, D)
- rotate/reflect using Q: \quad Q D^{1/2} X \sim N(0, Q D Q^T) = N(0, \Sigma)

more generally: X \sim N(\mu, \Sigma) \;\Rightarrow\; A X + b \sim N(A\mu + b,\; A \Sigma A^T)
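As an added illustration (not from the slides), this construction gives a sampler: draw standard normals, scale by D^{1/2}, rotate/reflect with Q, and shift by μ; the empirical mean and covariance should approach μ and Σ. The specific μ and Σ below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 1.5]])

# Sigma = Q D Q^T  (symmetric eigendecomposition)
eigvals, Q = np.linalg.eigh(Sigma)
A = Q @ np.diag(np.sqrt(eigvals))         # A A^T = Sigma

# Z ~ N(0, I)  ->  A Z + mu ~ N(mu, A A^T) = N(mu, Sigma)
Z = rng.standard_normal(size=(100_000, 2))
samples = Z @ A.T + mu

print(samples.mean(axis=0))               # approx mu
print(np.cov(samples, rowvar=False))      # approx Sigma
```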
Parametrization

moment form (mean parametrization):

p(x; \mu, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)

information form (canonical parametrization):

p(x; \eta, \Lambda) = \sqrt{\frac{|\Lambda|}{(2\pi)^n}} \exp\left( -\frac{1}{2} x^T \Lambda x + \eta^T x - \frac{1}{2} \eta^T \Lambda^{-1} \eta \right)

- precision matrix: \Lambda = \Sigma^{-1}, equivalently \Sigma = \Lambda^{-1}
- local potential: \eta = \Sigma^{-1}\mu, equivalently \mu = \Lambda^{-1}\eta

the relationship between the two types of parametrization goes beyond Gaussians
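A minimal added sketch (with arbitrary numbers) of converting between the two parametrizations and checking that both forms give the same density value at a test point.

```python
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 1.5]])
n = len(mu)

# moment form -> information form
Lam = np.linalg.inv(Sigma)        # precision matrix
eta = Lam @ mu                    # local potential

# round trip back to the moment form
assert np.allclose(np.linalg.inv(Lam), Sigma)
assert np.allclose(np.linalg.solve(Lam, eta), mu)

def moment_form(x):
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

def information_form(x):
    z = np.sqrt(np.linalg.det(Lam) / (2 * np.pi) ** n)   # sqrt(|Lam| / (2 pi)^n)
    return z * np.exp(-0.5 * x @ Lam @ x + eta @ x - 0.5 * eta @ np.linalg.solve(Lam, eta))

x = np.array([0.5, 0.0])
print(moment_form(x), information_form(x))   # the two values should agree
```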
Marginalization

the moment form X \sim N(\mu, \Sigma) is useful for marginalization: partition

X = [X_A, X_B]^T, \quad \mu = [\mu_A, \mu_B]^T, \quad \Sigma = \begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix}

then X_A \sim N(\mu_m, \Sigma_m) with \mu_m = \mu_A and \Sigma_m = \Sigma_{AA}

marginalization as a linear transformation: with A = [I_{AA}, 0],

X \sim N(\mu, \Sigma) \;\Rightarrow\; A X \sim N(A\mu, A \Sigma A^T) = N(\mu_A, \Sigma_{AA})
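An added sketch of marginalization as block selection: with A the first block of indices, the marginal is simply N(μ_A, Σ_AA). The numbers are arbitrary placeholders.

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

A = [0, 1]            # indices of X_A; the rest is X_B

# marginal of X_A: just pick out the corresponding blocks
mu_m = mu[A]
Sigma_m = Sigma[np.ix_(A, A)]
print(mu_m)           # [0., 1.]
print(Sigma_m)        # [[2. , 0.5], [0.5, 1. ]]

# equivalently, as a linear map A X with A = [I, 0]
P = np.hstack([np.eye(2), np.zeros((2, 1))])
print(np.allclose(P @ mu, mu_m), np.allclose(P @ Sigma @ P.T, Sigma_m))   # True True
```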
Marginal independencies: moment form

covariance means dependence & vice versa:

X_i \perp X_j \mid \emptyset \;\Leftrightarrow\; \Sigma_{i,j} = Cov(X_i, X_j) = 0

why? marginalize to get

\begin{bmatrix} X_i \\ X_j \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_i \\ \mu_j \end{bmatrix}, \begin{bmatrix} \sigma_i^2 & 0 \\ 0 & \sigma_j^2 \end{bmatrix} \right) = N(x_i; \mu_i, \sigma_i^2)\, N(x_j; \mu_j, \sigma_j^2)

the Gaussian is special in this sense

correlation (normalized covariance): \rho(X_i, X_j) = \frac{Cov(X_i, X_j)}{\sqrt{Var(X_i)\, Var(X_j)}}

[figure: scatter plots of datasets with different correlation coefficients; image from Wikipedia]
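A short added example computing the correlation matrix from a covariance matrix; for a Gaussian, a zero entry is equivalent to marginal independence of the corresponding pair. The covariance values are placeholders.

```python
import numpy as np

Sigma = np.array([[2.0, 0.0, 0.6],
                  [0.0, 1.0, 0.0],
                  [0.6, 0.0, 1.5]])

std = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(std, std)      # rho_ij = Cov(X_i, X_j) / sqrt(Var(X_i) Var(X_j))
print(rho)

# zero covariance <=> marginal independence (a property special to Gaussians)
print(np.isclose(Sigma[0, 1], 0.0))   # True: X_0 and X_1 are marginally independent
```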
Conditional independencies: information form

zeros of the precision matrix mean conditional independence:

X_i \perp X_j \mid X - \{X_i, X_j\} \;\Leftrightarrow\; \Lambda_{i,j} = 0

\Lambda_{i,j} = 0 \Leftrightarrow no edge between i and j: \Lambda plays the role of the adjacency matrix of the Markov network (Gaussian MRF)

example:

\Lambda = \begin{bmatrix} \Lambda_{1,1} & 0 & \Lambda_{1,3} & 0 \\ 0 & \Lambda_{2,2} & \Lambda_{2,3} & 0 \\ \Lambda_{3,1} & \Lambda_{3,2} & \Lambda_{3,3} & \Lambda_{3,4} \\ 0 & 0 & \Lambda_{4,3} & \Lambda_{4,4} \end{bmatrix}

[figure: the corresponding Markov network, with edges X_1 - X_3, X_2 - X_3, X_3 - X_4]

why? write the information-form density

p(x; \eta, \Lambda) = \sqrt{\frac{|\Lambda|}{(2\pi)^n}} \exp\left( -\frac{1}{2} x^T \Lambda x + \eta^T x - \frac{1}{2} \eta^T \Lambda^{-1} \eta \right)

as a product of factors, with the corresponding (log-)potentials:

\psi_{i,j}(x_i, x_j) = -\Lambda_{i,j}\, x_i x_j, \qquad \psi_i(x_i) = -\frac{1}{2} \Lambda_{i,i}\, x_i^2 + \eta_i x_i
Gaussian MRF: information form

p(x; \eta, \Lambda) = \sqrt{\frac{|\Lambda|}{(2\pi)^n}} \exp\left( -\frac{1}{2} x^T \Lambda x + \eta^T x - \frac{1}{2} \eta^T \Lambda^{-1} \eta \right)

\Lambda = \begin{bmatrix} \Lambda_{1,1} & 0 & \Lambda_{1,3} & 0 \\ 0 & \Lambda_{2,2} & \Lambda_{2,3} & 0 \\ \Lambda_{3,1} & \Lambda_{3,2} & \Lambda_{3,3} & \Lambda_{3,4} \\ 0 & 0 & \Lambda_{4,3} & \Lambda_{4,4} \end{bmatrix}

corresponding (log-)potentials: \psi_{i,j}(x_i, x_j) = -\Lambda_{i,j}\, x_i x_j, \quad \psi_i(x_i) = -\frac{1}{2}\Lambda_{i,i}\, x_i^2 + \eta_i x_i

[figure: the Markov network with edges X_1 - X_3, X_2 - X_3, X_3 - X_4]

\Lambda should be positive definite; otherwise the partition function

Z = \int_{-\infty}^{\infty} \exp\left( -\frac{1}{2} x^T \Lambda x + \eta^T x \right) \mathrm{d}x

is not well-defined.
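An added sketch reading the Markov network structure off the sparsity pattern of the precision matrix in the example above; the nonzero numeric values are placeholders chosen only so that Λ is positive definite.

```python
import numpy as np

# precision matrix with the zero pattern from the slide;
# placeholder values, scaled so that Lam is positive definite
Lam = np.array([[2.0, 0.0, 0.5, 0.0],
                [0.0, 2.0, 0.5, 0.0],
                [0.5, 0.5, 2.0, 0.5],
                [0.0, 0.0, 0.5, 2.0]])

assert np.all(np.linalg.eigvalsh(Lam) > 0)   # PD, so the density normalizes

# edges of the Gaussian MRF: i -- j  iff  Lam[i, j] != 0  (for i < j)
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)
         if not np.isclose(Lam[i, j], 0.0)]
print(edges)   # [(0, 2), (1, 2), (2, 3)]  i.e.  X1-X3, X2-X3, X3-X4
```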
Conditioning: information form

- marginalization: easy in the moment form
- conditioning: easy in the information form

partition X = [X_A, X_B]^T, \quad \eta = [\eta_A, \eta_B]^T, \quad \Lambda = \begin{bmatrix} \Lambda_{AA} & \Lambda_{AB} \\ \Lambda_{BA} & \Lambda_{BB} \end{bmatrix}

then, conditioned on X_B = x_B,

X_A \mid X_B \sim N(\eta_{A|B}, \Lambda_{A|B}) \quad \text{(information form)}

with \Lambda_{A|B} = \Lambda_{AA} and \eta_{A|B} = \eta_A - \Lambda_{AB}\, x_B

why? in the exponent, the terms involving only x_B become constants, and the cross term -x_A^T \Lambda_{AB} x_B is absorbed into the linear term in x_A.

[figure: Markov network over X_{A_1}, X_{A_2}, X_{A_3} and X_B]
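An added numerical check of the conditioning rule, using arbitrary placeholder parameters: condition on X_B = x_B in the information form and compare against the standard moment-form conditional mean and covariance.

```python
import numpy as np

# arbitrary PD precision matrix over (X_A, X_B) with A = {0, 1}, B = {2}
Lam = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.5, 0.4],
                [0.3, 0.4, 1.0]])
eta = np.array([0.2, -0.1, 0.5])
A, B = [0, 1], [2]
x_B = np.array([1.0])

# conditioning in the information form: Lam_{A|B} = Lam_AA, eta_{A|B} = eta_A - Lam_AB x_B
Lam_cond = Lam[np.ix_(A, A)]
eta_cond = eta[A] - Lam[np.ix_(A, B)] @ x_B

# compare with the moment-form conditional:
#   mu_{A|B} = mu_A + Sigma_AB Sigma_BB^{-1} (x_B - mu_B)
#   Sigma_{A|B} = Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA
Sigma = np.linalg.inv(Lam)
mu = Sigma @ eta
S_AB = Sigma[np.ix_(A, B)]
S_BB = Sigma[np.ix_(B, B)]
mu_cond_moment = mu[A] + S_AB @ np.linalg.solve(S_BB, x_B - mu[B])
Sigma_cond_moment = Sigma[np.ix_(A, A)] - S_AB @ np.linalg.solve(S_BB, S_AB.T)

print(np.allclose(np.linalg.solve(Lam_cond, eta_cond), mu_cond_moment))   # True
print(np.allclose(np.linalg.inv(Lam_cond), Sigma_cond_moment))            # True
```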