

  1. Lecture 13: Even more dimension reduction techniques
     Felix Held, Mathematical Sciences
     MSA220/MVE440 Statistical Learning for Big Data
     10th May 2019

  2. Recap: kernel PCA
     Given a set of feature vectors 𝐱_1, …, 𝐱_n ∈ ℝ^p and a kernel k(𝐱, 𝐲), perform kernel PCA:
     ▶ Form the Gram matrix 𝐊 = (k(𝐱_i, 𝐱_j))_{ij}
     ▶ Solve the eigenvalue problem 𝐊 𝐚_i = n λ_i 𝐚_i for λ_i and 𝐚_i
     ▶ Scale 𝐚_i such that 𝐚_iᵀ 𝐊 𝐚_i = 1
     The projection of a feature vector 𝐱 onto the i-th principal component in the implicit space of the 𝝓(𝐱_l) is
     θ_i(𝐱) = ∑_{l=1}^{n} a_{il} k(𝐱, 𝐱_l)
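
As a concrete illustration of these steps, here is a minimal NumPy sketch (not from the lecture; the RBF kernel and names such as `kernel_pca_scores` are chosen purely for illustration):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian/RBF kernel, used only as a concrete example of k(., .)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_pca_scores(X, kernel, n_components=2):
    """Project the rows of X onto the leading kernel principal components."""
    n = X.shape[0]
    # Gram matrix K = (k(x_i, x_j))_ij; centring of K is handled on the next slide
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    # Eigenvalue problem K a_i = n * lambda_i * a_i: the a_i are eigenvectors of K
    # with eigenvalues mu_i = n * lambda_i
    mu, A = np.linalg.eigh(K)                     # ascending eigenvalues, unit-norm columns
    top = np.argsort(mu)[::-1][:n_components]
    mu, A = mu[top], A[:, top]
    # Rescale so that a_i^T K a_i = 1 (for a unit-norm eigenvector, a_i^T K a_i = mu_i)
    A = A / np.sqrt(mu)
    # theta_i(x_j) = sum_l a_{il} k(x_j, x_l), computed for all training points at once
    return K @ A

# Illustrative usage on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(kernel_pca_scores(X, rbf_kernel).shape)     # (50, 2)
```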

  3. Centring and kernel PCA
     ▶ The derivation assumed that the implicitly defined feature vectors 𝝓(𝐱_l) were centred. What if they are not?
     ▶ In the derivation we look at scalar products 𝝓(𝐱_i)ᵀ 𝝓(𝐱_l). Centring in the implicit space leads to
       (𝝓(𝐱_i) − (1/n) ∑_{j=1}^{n} 𝝓(𝐱_j))ᵀ (𝝓(𝐱_l) − (1/n) ∑_{j=1}^{n} 𝝓(𝐱_j))
         = K_{il} − (1/n) ∑_{j=1}^{n} K_{ji} − (1/n) ∑_{j=1}^{n} K_{jl} + (1/n²) ∑_{j=1}^{n} ∑_{m=1}^{n} K_{jm}
     ▶ Using the centring matrix 𝐉 = 𝐈_n − (1/n) 𝟏𝟏ᵀ, centring in the implicit space is equivalent to transforming 𝐊 into 𝐊′ = 𝐉𝐊𝐉.
     ▶ The algorithm stays the same, apart from using 𝐊′ instead of 𝐊.
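
A short sketch of this centring step, assuming NumPy; the helper `centre_gram` and the sanity check below are illustrative only:

```python
import numpy as np

def centre_gram(K):
    """Double-centre a Gram matrix: K' = J K J with J = I_n - (1/n) * 1 1^T.
    This is equivalent to centring the implicit feature vectors phi(x_l)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

# Sanity check against the element-wise formula
# K'_{il} = K_{il} - mean_j K_{ji} - mean_j K_{jl} + mean_{j,m} K_{jm}
rng = np.random.default_rng(1)
Z = rng.normal(size=(6, 4))
K = Z @ Z.T                                   # some valid Gram matrix (linear kernel)
elementwise = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
print(np.allclose(centre_gram(K), elementwise))   # True
```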

  4. Dimension reduction while preserving distances

  5. Preserving distance
     Like in cartography, the goal of dimension reduction can be subject to different sub-criteria, e.g. PCA preserves the directions of largest variance. What if we want to preserve distances while reducing the dimension?
     For given vectors 𝐱_1, …, 𝐱_n ∈ ℝ^p we want to find 𝐲_1, …, 𝐲_n ∈ ℝ^m where m < p such that
     ‖𝐱_i − 𝐱_j‖_2 ≈ ‖𝐲_i − 𝐲_j‖_2

  6. Distance matrices and the linear kernel
     Given a data matrix 𝐗 ∈ ℝ^{n×p}, note that 𝐗𝐗ᵀ = (𝐱_iᵀ 𝐱_j)_{ij}, which is also the Gram matrix 𝐊 of the linear kernel.
     Let 𝐃 = (‖𝐱_i − 𝐱_j‖_2)_{ij} be the distance matrix in the Euclidean norm. Note that
     ‖𝐱_i − 𝐱_j‖_2² = 𝐱_iᵀ 𝐱_i − 2 𝐱_iᵀ 𝐱_j + 𝐱_jᵀ 𝐱_j
     and (with element-wise exponentiation)
     −(1/2) 𝐃^(2) = 𝐗𝐗ᵀ − (1/2) 𝟏 diag(𝐗𝐗ᵀ)ᵀ − (1/2) diag(𝐗𝐗ᵀ) 𝟏ᵀ.
     Through calculation it can be shown that, with the centring matrix 𝐉 = 𝐈_n − (1/n) 𝟏𝟏ᵀ,
     𝐉 (−(1/2) 𝐃^(2)) 𝐉 = 𝐉 𝐊 𝐉,
     i.e. the centred Gram matrix of the linear kernel can be recovered from the distance matrix alone.
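
This identity can be checked numerically; the following NumPy snippet is an illustrative sketch, not lecture code:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 5
X = rng.normal(size=(n, p))

# Gram matrix of the linear kernel and squared distance matrix D^(2)
K = X @ X.T
sq = np.diag(K)
D2 = sq[:, None] - 2 * K + sq[None, :]        # D2_{ij} = ||x_i - x_j||^2

# Centring matrix J = I_n - (1/n) 1 1^T
J = np.eye(n) - np.ones((n, n)) / n

# Double-centring -1/2 D^(2) recovers the centred Gram matrix J K J
print(np.allclose(J @ (-0.5 * D2) @ J, J @ K @ J))   # True
```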

  7. Finding an exact embedding
     ▶ It can be shown that if 𝐊 is positive semi-definite, then there exists an exact embedding in m = rank(𝐊) ≤ rank(𝐗) ≤ min(n, p) dimensions:
        1. Perform PCA on 𝐊 = 𝐔𝚲𝐔ᵀ
        2. If m = rank(𝐊), set 𝐘 = (√λ_1 𝐮_1, …, √λ_m 𝐮_m) ∈ ℝ^{n×m}
        3. The rows of 𝐘 are the sought-after embedding, i.e. for 𝐲_l = 𝐘_{l⋅} it holds that ‖𝐱_i − 𝐱_j‖_2 = ‖𝐲_i − 𝐲_j‖_2
     ▶ Note: This is not guaranteed to lead to dimension reduction, i.e. m = p is possible. However, usually the internal structure of the data is lower-dimensional and m < p.
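
A sketch of this construction, assuming NumPy and starting from a distance matrix as on the previous slide; the name `exact_embedding` is made up for illustration:

```python
import numpy as np

def exact_embedding(D, tol=1e-10):
    """Exact Euclidean embedding from a distance matrix D, assuming that the
    doubly-centred matrix K = J(-1/2 D^(2))J is positive semi-definite."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ (-0.5 * D ** 2) @ J                # centred Gram matrix from distances
    lam, U = np.linalg.eigh(K)                 # eigendecomposition K = U Lambda U^T
    keep = lam > tol                           # the m = rank(K) positive eigenvalues
    # Rows of Y = (sqrt(lambda_1) u_1, ..., sqrt(lambda_m) u_m) give the embedding
    return U[:, keep] * np.sqrt(lam[keep])

# The embedding reproduces all pairwise distances (up to numerical error)
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = exact_embedding(D)
DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(Y.shape, np.allclose(D, DY))             # (10, 4) True
```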

  8. Multi-dimensional scaling
     ▶ Keeping only the first q < m components of the 𝐲_l is known as classical scaling or multi-dimensional scaling (MDS) and minimizes the so-called stress or strain
       d(𝐃, 𝐘) = (∑_{i≠j} (D_{ij} − ‖𝐲_i − 𝐲_j‖_2)²)^{1/2}
     ▶ The results also hold for general distance matrices 𝐃 as long as λ_1, …, λ_m > 0 for m = rank(𝐊). This is called metric MDS.
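
An illustrative NumPy sketch of classical/metric MDS and of evaluating the stress criterion; the helpers `classical_mds` and `stress` are hypothetical names, not lecture code:

```python
import numpy as np

def classical_mds(D, q=2):
    """Classical (metric) MDS: keep only the first q columns of the exact embedding."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ (-0.5 * D ** 2) @ J
    lam, U = np.linalg.eigh(K)
    top = np.argsort(lam)[::-1][:q]            # q largest (assumed positive) eigenvalues
    return U[:, top] * np.sqrt(lam[top])

def stress(D, Y):
    """Stress criterion: (sum over i != j of squared distance discrepancies)^(1/2)."""
    DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    return np.sqrt(np.sum((D - DY) ** 2))      # diagonal terms are zero and do not contribute

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y2 = classical_mds(D, q=2)
print(stress(D, Y2))                           # small if the data are nearly 2-dimensional
```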

  9. Lower-dimensional data in a high-dimensional space
