Lecture 12: Gaussian Process Models
Colin Rundel
02/27/2017
Multivariate Normal
Multivariate Normal Distribution

For an $n$-dimensional multivariate normal distribution with covariance $\Sigma$ (positive semidefinite), $Y$ can be written as

$$\underset{n \times 1}{Y} \sim \mathcal{N}\left(\underset{n \times 1}{\mu},\ \underset{n \times n}{\Sigma}\right) \quad \text{where} \quad \{\Sigma\}_{ij} = \sigma^2_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j$$

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix},\ \begin{pmatrix} \rho_{11}\sigma_1\sigma_1 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \cdots & \rho_{nn}\sigma_n\sigma_n \end{pmatrix} \right)$$
Density

For the $n$-dimensional multivariate normal given on the last slide, its density is given by

$$(2\pi)^{-n/2}\,\det(\Sigma)^{-1/2}\,\exp\left(-\frac{1}{2}\,\underset{1 \times n}{(Y-\mu)'}\,\underset{n \times n}{\Sigma^{-1}}\,\underset{n \times 1}{(Y-\mu)}\right)$$

and its log density is given by

$$-\frac{n}{2}\log 2\pi - \frac{1}{2}\log \det(\Sigma) - \frac{1}{2}\,\underset{1 \times n}{(Y-\mu)'}\,\underset{n \times n}{\Sigma^{-1}}\,\underset{n \times 1}{(Y-\mu)}$$
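To make the formula concrete, here is a minimal NumPy sketch of the log density. The function name and the use of `slogdet`/`solve` are implementation choices, not from the slides; `scipy.stats.multivariate_normal.logpdf` computes the same quantity.

```python
import numpy as np

def mvn_logpdf(y, mu, Sigma):
    """Log density of an n-dimensional multivariate normal, per the formula above."""
    n = len(mu)
    diff = y - mu
    sign, logdet = np.linalg.slogdet(Sigma)     # log det(Sigma), computed stably
    quad = diff @ np.linalg.solve(Sigma, diff)  # (Y - mu)' Sigma^{-1} (Y - mu)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
```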
Sampling

To generate draws from an $n$-dimensional multivariate normal with mean $\mu$ and covariance matrix $\Sigma$:

• Find a matrix $A$ such that $\Sigma = A\,A^t$; most often we use the Cholesky decomposition, $A = \text{Chol}(\Sigma)$.
• Draw $n$ iid unit normals ($\mathcal{N}(0, 1)$) as $z$.
• Construct multivariate normal draws using $Y = \mu + A\,z$ (see the sketch after this list).
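A minimal sketch of this recipe in Python with NumPy; the 2-dimensional $\mu$ and $\Sigma$ and the seed are illustrative values, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Illustrative mean and covariance (assumed values)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])

# Find A such that Sigma = A A^t via the Cholesky decomposition
A = np.linalg.cholesky(Sigma)   # lower triangular

# Draw n iid unit normals
z = rng.standard_normal(2)

# Construct the multivariate normal draw
y = mu + A @ z
```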
Bivariate Example

$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

[Figure: scatter plots of bivariate normal draws ($x$ vs. $y$) for $\rho$ = 0.9, 0.7, 0.5, 0.1 and $\rho$ = $-$0.9, $-$0.7, $-$0.5, $-$0.1.]
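The draws behind a figure like this can be generated with the sampling recipe above; a sketch, with an assumed sample size and seed (plotting omitted):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu = np.zeros(2)

# One panel's worth of draws for each correlation shown in the figure
for rho in [0.9, 0.7, 0.5, 0.1, -0.1, -0.5, -0.7, -0.9]:
    Sigma = np.array([[1.0, rho],
                      [rho, 1.0]])
    A = np.linalg.cholesky(Sigma)
    draws = mu + (A @ rng.standard_normal((2, 500))).T  # 500 (x, y) pairs
```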
Marginal distributions

Proposition - For an $n$-dimensional multivariate normal with mean $\mu$ and covariance matrix $\Sigma$, any of the possible marginal distributions will also be (multivariate) normal.

For a univariate marginal distribution,

$$y_i \sim \mathcal{N}(\mu_i,\ \gamma_{ii})$$

For a bivariate marginal distribution,

$$y_{ij} \sim \mathcal{N}\left( \begin{pmatrix} \mu_i \\ \mu_j \end{pmatrix},\ \begin{pmatrix} \gamma_{ii} & \gamma_{ij} \\ \gamma_{ji} & \gamma_{jj} \end{pmatrix} \right)$$

For a $k$-dimensional marginal distribution,

$$y_{i_1, \cdots, i_k} \sim \mathcal{N}\left( \begin{pmatrix} \mu_{i_1} \\ \vdots \\ \mu_{i_k} \end{pmatrix},\ \begin{pmatrix} \gamma_{i_1 i_1} & \cdots & \gamma_{i_1 i_k} \\ \vdots & \ddots & \vdots \\ \gamma_{i_k i_1} & \cdots & \gamma_{i_k i_k} \end{pmatrix} \right)$$
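In code, marginalization is just subsetting the mean vector and covariance matrix; a sketch assuming an illustrative 3-dimensional $\mu$ and $\Sigma$ (values hypothetical):

```python
import numpy as np

# Assumed 3-dimensional example (values chosen for illustration)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# A k-dimensional marginal keeps the corresponding rows/columns
idx = [0, 2]                          # keep components i_1 = 1 and i_2 = 3
mu_marg = mu[idx]                     # marginal mean
Sigma_marg = Sigma[np.ix_(idx, idx)]  # marginal covariance
```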
Conditional Distributions

If we partition the $n$ dimensions into two pieces such that $Y = (Y_1, Y_2)^t$, then

$$\underset{n \times 1}{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$

$$\underset{k \times 1}{Y_1} \sim \mathcal{N}\left(\underset{k \times 1}{\mu_1},\ \underset{k \times k}{\Sigma_{11}}\right) \qquad \underset{(n-k) \times 1}{Y_2} \sim \mathcal{N}\left(\underset{(n-k) \times 1}{\mu_2},\ \underset{(n-k) \times (n-k)}{\Sigma_{22}}\right)$$

and the conditional distributions are given by

$$Y_1 \mid Y_2 = a \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\,\Sigma_{22}^{-1}\,(a - \mu_2),\ \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}\right)$$

$$Y_2 \mid Y_1 = b \sim \mathcal{N}\left(\mu_2 + \Sigma_{21}\,\Sigma_{11}^{-1}\,(b - \mu_1),\ \Sigma_{22} - \Sigma_{21}\,\Sigma_{11}^{-1}\,\Sigma_{12}\right)$$
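A sketch of these formulas in NumPy; the function name and the partition convention ($Y_1$ = first $k$ components) are assumptions for illustration:

```python
import numpy as np

def mvn_conditional(mu, Sigma, k, a):
    """Parameters of Y1 | Y2 = a, where Y1 is the first k components.

    A direct transcription of the textbook formulas; not optimized for large n.
    """
    mu1, mu2 = mu[:k], mu[k:]
    S11 = Sigma[:k, :k]
    S21 = Sigma[k:, :k]

    # Sigma_12 Sigma_22^{-1} via a linear solve (more stable than inverting)
    W = np.linalg.solve(Sigma[k:, k:], S21).T   # = Sigma_12 @ inv(Sigma_22)

    cond_mean = mu1 + W @ (a - mu2)             # mu_1 + Sigma_12 Sigma_22^{-1} (a - mu_2)
    cond_cov = S11 - W @ S21                    # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return cond_mean, cond_cov
```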
Gaussian Processes

From Shumway,

A process, $Y = \{Y_t : t \in T\}$, is said to be a Gaussian process if all possible finite dimensional vectors $y = (y_{t_1}, y_{t_2}, \ldots, y_{t_n})^t$, for every collection of time points $t_1, t_2, \ldots, t_n$, and every positive integer $n$, have a multivariate normal distribution.

So far we have only looked at examples of time series where $T$ is discrete (and evenly spaced & contiguous); things get a lot more interesting when we explore the case where $T$ is defined on a continuous space (e.g. $\mathbb{R}$ or some subset of $\mathbb{R}$).
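Equivalently, in the notation of the earlier slides, the definition requires that for every finite collection of points

$$\begin{pmatrix} y_{t_1} \\ \vdots \\ y_{t_n} \end{pmatrix} \sim \mathcal{N}(\mu,\ \Sigma), \qquad \{\Sigma\}_{ij} = \operatorname{Cov}(y_{t_i}, y_{t_j})$$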
Gaussian Process Regression
Parameterizing a Gaussian Process

Imagine we have a Gaussian process defined such that $Y = \{Y_t : t \in [0, 1]\}$.

• We now have an uncountably infinite set of possible $Y_t$'s.
• We will only have a (small) finite number of observations $Y_1, \ldots, Y_n$ with which to say something useful about this infinite dimensional process.
• The unconstrained covariance matrix for the observed data can have up to $n(n+1)/2$ unique values ($p \gg n$).
• It is therefore necessary to make some simplifying assumptions (see the sketch after this list):
  • Stationarity
  • Simple parameterization of $\Sigma$
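As a concrete sketch of such a simple, stationary parameterization, the following draws from a Gaussian process at a finite grid of points using a squared exponential covariance; the choice of covariance function and the values of `sigma2` and `l` are illustrative assumptions, not something defined on these slides.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Finite set of observation points in [0, 1]
t = np.linspace(0, 1, 50)

# Assumed stationary covariance: Cov(Y_s, Y_t) = sigma2 * exp(-(|s - t| / l)^2)
# (squared exponential; sigma2 and l are illustrative values)
sigma2, l = 1.0, 0.1
d = np.abs(t[:, None] - t[None, :])   # pairwise distances |s - t|
Sigma = sigma2 * np.exp(-((d / l) ** 2))

# Draw using the Cholesky recipe from earlier (jitter added for numerical stability)
A = np.linalg.cholesky(Sigma + 1e-10 * np.eye(len(t)))
y = A @ rng.standard_normal(len(t))   # one mean-zero GP draw over the grid
```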