Lecture 12 Gaussian Process Models 10/16/2018
Multivariate Normal
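Multivariate Normal Distribution
An $n$-dimensional multivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ (positive semidefinite) can be written as

$$
\underset{n \times 1}{\mathbf{Y}} \sim N\left(\underset{n \times 1}{\boldsymbol{\mu}},\ \underset{n \times n}{\boldsymbol{\Sigma}}\right)
\quad \text{where} \quad
\{\boldsymbol{\Sigma}\}_{ij} = \sigma^2_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j
$$

$$
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix},
\begin{pmatrix}
\rho_{11}\sigma_1\sigma_1 & \cdots & \rho_{1n}\sigma_1\sigma_n \\
\vdots & \ddots & \vdots \\
\rho_{n1}\sigma_n\sigma_1 & \cdots & \rho_{nn}\sigma_n\sigma_n
\end{pmatrix}
\right)
$$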
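Density
For the $n$-dimensional multivariate normal given on the last slide, its density is given by

$$
(2\pi)^{-n/2} \det(\boldsymbol{\Sigma})^{-1/2} \exp\left(-\frac{1}{2}\, \underset{1 \times n}{(\mathbf{Y} - \boldsymbol{\mu})'}\ \underset{n \times n}{\boldsymbol{\Sigma}^{-1}}\ \underset{n \times 1}{(\mathbf{Y} - \boldsymbol{\mu})}\right)
$$

and its log density is given by

$$
-\frac{n}{2} \log 2\pi - \frac{1}{2} \log \det(\boldsymbol{\Sigma}) - \frac{1}{2}\, \underset{1 \times n}{(\mathbf{Y} - \boldsymbol{\mu})'}\ \underset{n \times n}{\boldsymbol{\Sigma}^{-1}}\ \underset{n \times 1}{(\mathbf{Y} - \boldsymbol{\mu})}
$$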
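The log density can be evaluated numerically without forming $\boldsymbol{\Sigma}^{-1}$ explicitly. The following is a minimal sketch, assuming NumPy and a strictly positive definite $\boldsymbol{\Sigma}$ (the density is not defined when $\boldsymbol{\Sigma}$ is singular). The function name `mvn_logpdf` and the example values are illustrative and not from the lecture.

```python
import numpy as np

def mvn_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma) at y, computed via a Cholesky factor."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = y.shape[0]
    L = np.linalg.cholesky(sigma)           # sigma = L @ L.T, L lower triangular
    z = np.linalg.solve(L, y - mu)          # z = L^{-1} (y - mu)
    quad = z @ z                            # (y - mu)' sigma^{-1} (y - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log det(sigma)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * quad

# Illustrative values (assumed, not from the slides)
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
value = mvn_logpdf(np.array([0.0, 0.0]), mu, sigma)
```

Evaluated at the mean, the quadratic form vanishes, so `value` reduces to $-\log 2\pi - \frac{1}{2}\log\det(\boldsymbol{\Sigma})$, which is a convenient sanity check.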
Sampling
To generate draws from an $n$-dimensional multivariate normal with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$:
• Find a matrix $\mathbf{A}$ such that $\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^t$; most often we use $\mathbf{A} = \text{Chol}(\boldsymbol{\Sigma})$, where $\mathbf{A}$ is a lower triangular matrix.
• Draw $n$ iid unit normals ($\mathcal{N}(0, 1)$) as $\mathbf{Z}$.
• Obtain multivariate normal draws using $\mathbf{Y} = \boldsymbol{\mu} + \mathbf{A}\mathbf{Z}$.
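The three sampling steps can be sketched as follows, assuming NumPy and a positive definite $\boldsymbol{\Sigma}$ (which `np.linalg.cholesky` requires). The function name `rmvnorm`, the seed, and the example parameters are illustrative choices, not part of the lecture.

```python
import numpy as np

def rmvnorm(n_draws, mu, sigma, rng):
    """Draw n_draws samples from N(mu, sigma) using Y = mu + A Z."""
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(sigma)                     # lower triangular, sigma = A @ A.T
    Z = rng.standard_normal((mu.shape[0], n_draws))   # iid N(0, 1), one column per draw
    return (mu[:, None] + A @ Z).T                    # one row per draw, shape (n_draws, n)

# Illustrative values (assumed, not from the slides)
rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
draws = rmvnorm(20000, mu, sigma, rng)
sample_cov = np.cov(draws, rowvar=False)              # should be close to sigma
```

With a large number of draws, `sample_cov` should be close to `sigma`, which gives a quick empirical check that $\text{Cov}(\boldsymbol{\mu} + \mathbf{A}\mathbf{Z}) = \mathbf{A}\mathbf{A}^t = \boldsymbol{\Sigma}$.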
Bivariate Example

$$
\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\qquad
\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
$$

[Figure: y against x, one panel for each of rho = 0.9, 0.7, 0.5, 0.1, −0.9, −0.7, −0.5, −0.1.]
Marginal distributions
Proposition - For an $n$-dimensional multivariate normal with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, any marginal or conditional distribution of the $y$'s will also be (multivariate) normal.

For a univariate marginal distribution,

$$
y_i \sim \mathcal{N}(\boldsymbol{\mu}_i,\ \boldsymbol{\Sigma}_{ii})
$$

For a bivariate marginal distribution,

$$
\mathbf{y}_{ij} \sim \mathcal{N}\left(
\begin{pmatrix} \boldsymbol{\mu}_i \\ \boldsymbol{\mu}_j \end{pmatrix},
\begin{pmatrix} \boldsymbol{\Sigma}_{ii} & \boldsymbol{\Sigma}_{ij} \\ \boldsymbol{\Sigma}_{ji} & \boldsymbol{\Sigma}_{jj} \end{pmatrix}
\right)
$$

For a $k$-dimensional marginal distribution,

$$
\mathbf{y}_{i,\cdots,k} \sim \mathcal{N}\left(
\begin{pmatrix} \boldsymbol{\mu}_i \\ \vdots \\ \boldsymbol{\mu}_k \end{pmatrix},
\begin{pmatrix}
\boldsymbol{\Sigma}_{ii} & \cdots & \boldsymbol{\Sigma}_{ik} \\
\vdots & \ddots & \vdots \\
\boldsymbol{\Sigma}_{ki} & \cdots & \boldsymbol{\Sigma}_{kk}
\end{pmatrix}
\right)
$$
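In practice, a marginal distribution is obtained by selecting the matching entries of $\boldsymbol{\mu}$ and the matching rows and columns of $\boldsymbol{\Sigma}$. A minimal sketch, assuming NumPy; the helper name `marginal` and the numbers are made up for illustration, and indices are zero-based as usual in Python.

```python
import numpy as np

def marginal(mu, sigma, idx):
    """Mean and covariance of the marginal distribution of the components in idx."""
    idx = np.asarray(idx)
    # np.ix_ selects the sub-matrix with rows and columns both given by idx
    return mu[idx], sigma[np.ix_(idx, idx)]

# Illustrative 4-dimensional example (assumed values, not from the slides)
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([
    [4.0, 1.0, 0.5, 0.2],
    [1.0, 3.0, 0.8, 0.3],
    [0.5, 0.8, 2.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
])
mu_m, sigma_m = marginal(mu, sigma, [0, 2])   # bivariate marginal of (y_1, y_3)
```

No integration is needed: the remaining components are simply dropped from both the mean vector and the covariance matrix.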
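Conditional Distributions
If we partition the $n$ dimensions into two pieces such that $\mathbf{Y} = (\mathbf{Y}_1, \mathbf{Y}_2)^t$ then

$$
\underset{n \times 1}{\mathbf{Y}} \sim \mathcal{N}\left(
\underset{n \times 1}{\begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}},\
\underset{n \times n}{\begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}}
\right)
$$

$$
\underset{k \times 1}{\mathbf{Y}_1} \sim \mathcal{N}\left(\underset{k \times 1}{\boldsymbol{\mu}_1},\ \underset{k \times k}{\boldsymbol{\Sigma}_{11}}\right)
\qquad
\underset{n-k \times 1}{\mathbf{Y}_2} \sim \mathcal{N}\left(\underset{n-k \times 1}{\boldsymbol{\mu}_2},\ \underset{n-k \times n-k}{\boldsymbol{\Sigma}_{22}}\right)
$$

then the conditional distributions are given by

$$
\mathbf{Y}_1 \mid \mathbf{Y}_2 = \mathbf{a} \sim \mathcal{N}\left(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{a} - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)
$$

$$
\mathbf{Y}_2 \mid \mathbf{Y}_1 = \mathbf{b} \sim \mathcal{N}\left(\boldsymbol{\mu}_2 + \boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}(\mathbf{b} - \boldsymbol{\mu}_1),\ \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\right)
$$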
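The conditional mean and covariance of $\mathbf{Y}_1$ given $\mathbf{Y}_2 = \mathbf{a}$ can be computed directly from the partitioned blocks. A minimal sketch, assuming NumPy, an invertible $\boldsymbol{\Sigma}_{22}$, and that $\mathbf{Y}_1$ holds the first $k$ components; the helper name `conditional` and the bivariate example with $\rho = 0.9$ are illustrative assumptions.

```python
import numpy as np

def conditional(mu, sigma, k, a):
    """Mean and covariance of Y_1 | Y_2 = a, where Y_1 is the first k components."""
    mu1, mu2 = mu[:k], mu[k:]
    s11, s12 = sigma[:k, :k], sigma[:k, k:]
    s21, s22 = sigma[k:, :k], sigma[k:, k:]
    # Solve linear systems against Sigma_22 rather than forming its inverse.
    cond_mean = mu1 + s12 @ np.linalg.solve(s22, a - mu2)
    cond_cov = s11 - s12 @ np.linalg.solve(s22, s21)
    return cond_mean, cond_cov

# Illustrative bivariate case (assumed values): unit variances, correlation rho
rho = 0.9
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, rho], [rho, 1.0]])
cond_mean, cond_cov = conditional(mu, sigma, k=1, a=np.array([2.0]))
```

For this bivariate case the formulas reduce to a conditional mean of $\rho a$ and a conditional variance of $1 - \rho^2$, so observing $Y_2$ both shifts the mean of $Y_1$ and shrinks its variance.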
Gaussian Processes
From Shumway,

A process, $\mathbf{Y} = \{Y(t) : t \in T\}$, is said to be a Gaussian process if all possible finite dimensional vectors $\mathbf{y} = (y_{t_1}, y_{t_2}, \ldots, y_{t_n})^t$, for every collection of time points $t_1, t_2, \ldots, t_n$, and every positive integer $n$, have a multivariate normal distribution.

So far we have only looked at examples of time series where $T$ is discrete (and evenly spaced & contiguous). It turns out things get a lot more interesting when we explore the case where $T$ is defined on a continuous space (e.g. $\mathbb{R}$ or some subset of $\mathbb{R}$).
Gaussian Process Regression
Parameterizing a Gaussian Process
Imagine we have a Gaussian process defined such that $\mathbf{Y} = \{Y(t) : t \in [0, 1]\}$,
• We now have an uncountably infinite set of possible $t$'s and $Y(t)$'s.
• We will only have a (small) finite number of observations $Y(t_1), \ldots, Y(t_n)$ with which to say something useful about this infinite dimensional process.
• The unconstrained covariance matrix for the observed data can have up to $n(n + 1)/2$ unique values.
• Necessary to make some simplifying assumptions:
  • Stationarity
  • Simple parameterization of $\Sigma$
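To make the contrast concrete, the sketch below compares the number of free values in an unconstrained covariance matrix with a two-parameter stationary alternative. It assumes NumPy, and the exponential covariance function $\text{Cov}(Y(t_i), Y(t_j)) = \sigma^2 \exp(-|t_i - t_j|\, l)$ is one possible choice picked here for illustration; the slide itself does not name a specific form. The function name `exp_cov`, the observation times, and the parameter values are all hypothetical.

```python
import numpy as np

def exp_cov(t, sigma2, l):
    """Stationary covariance (assumed form): sigma2 * exp(-|t_i - t_j| * l)."""
    d = np.abs(t[:, None] - t[None, :])   # pairwise distances |t_i - t_j|
    return sigma2 * np.exp(-d * l)

# Hypothetical observation times in [0, 1]
t = np.array([0.05, 0.2, 0.45, 0.7, 0.95])
n = t.shape[0]

n_unconstrained = n * (n + 1) // 2        # unique values in a symmetric n x n matrix
Sigma = exp_cov(t, sigma2=1.0, l=5.0)     # same n x n matrix from only 2 parameters
```

Because the covariance depends on the time points only through $|t_i - t_j|$, the process is stationary, and the whole $n \times n$ matrix is determined by `sigma2` and `l` instead of `n_unconstrained` separate values.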