Lecture 13: Even more dimension reduction techniques




  1. Lecture 13: Even more dimension reduction techniques Felix Held, Mathematical Sciences MSA220/MVE440 Statistical Learning for Big Data 10th May 2019

  2. Recap: kernel PCA
     The projection of a feature vector $\mathbf{x}$ onto the $j$-th principal component in the implicit space of $\Phi(\mathbf{x})$ is
     $$\theta_j(\mathbf{x}) = \sum_{m=1}^{n} a_{jm} k(\mathbf{x}, \mathbf{x}_m)$$
     Given a set of $p$-dimensional feature vectors $\mathbf{x}_1, \dots, \mathbf{x}_n$ and a kernel $k(\mathbf{x}, \mathbf{z})$, form the Gram matrix $\mathbf{K} = (k(\mathbf{x}_i, \mathbf{x}_j))_{ij}$ and
     ▶ Solve the eigenvalue problem $\mathbf{K}\mathbf{a}_j = \lambda_j n \mathbf{a}_j$ for $\lambda_j$ and $\mathbf{a}_j$
     ▶ Scale $\mathbf{a}_j$ such that $\mathbf{a}_j^T \mathbf{K} \mathbf{a}_j = 1$
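
     As a concrete illustration of the recap, here is a minimal numpy sketch of these steps. It is not the lecture's code: the RBF kernel, the names `rbf_kernel` and `kernel_pca_scores`, and the toy data are illustrative choices, and the Gram matrix is used uncentred (the centring correction is discussed on the next slide).

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

def kernel_pca_scores(K, n_components=2):
    """Projections of the training points onto the first principal
    components in the implicit space, following the recap above."""
    n = K.shape[0]
    # K a_j = lambda_j * n * a_j  <=>  (K / n) a_j = lambda_j a_j
    eigvals, eigvecs = np.linalg.eigh(K / n)
    order = np.argsort(eigvals)[::-1]            # decreasing eigenvalues
    A = eigvecs[:, order[:n_components]]
    # Scale a_j such that a_j^T K a_j = 1
    A = A / np.sqrt(np.sum(A * (K @ A), axis=0))
    # theta_j(x_i) = sum_m a_{jm} k(x_i, x_m), i.e. the rows of K A
    return K @ A

# Note: for exact correspondence with PCA in the implicit space,
# K should first be centred (next slide).
X = np.random.default_rng(0).normal(size=(50, 5))
scores = kernel_pca_scores(rbf_kernel(X, gamma=0.5))   # 50 x 2 score matrix
```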

  3. Centring and kernel PCA
     ▶ The derivation assumed that the implicitly defined feature vectors $\Phi(\mathbf{x}_m)$ were centred. What if they are not?
     ▶ In the derivation we look at scalar products $\Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_m)$. Centring in the implicit space leads to
     $$\left(\Phi(\mathbf{x}_i) - \frac{1}{n}\sum_{k=1}^{n}\Phi(\mathbf{x}_k)\right)^T \left(\Phi(\mathbf{x}_m) - \frac{1}{n}\sum_{k=1}^{n}\Phi(\mathbf{x}_k)\right) = K_{im} - \frac{1}{n}\sum_{k=1}^{n} K_{ki} - \frac{1}{n}\sum_{k=1}^{n} K_{km} + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}$$
     ▶ Using the centring matrix $\mathbf{C} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, centring in the implicit space is equivalent to transforming $\mathbf{K}$ as $\mathbf{K}' = \mathbf{C}\mathbf{K}\mathbf{C}$
     ▶ The algorithm is the same, apart from using $\mathbf{K}'$ instead of $\mathbf{K}$.
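
     A small sketch of this centring step in the same numpy setting as above (the helper name `centre_gram` is illustrative, not from the lecture):

```python
import numpy as np

def centre_gram(K):
    """Replace K by K' = C K C with C = I_n - (1/n) 1 1^T."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    return C @ K @ C

# Usage: run kernel PCA on centre_gram(K) instead of K when the implicit
# feature vectors cannot be assumed to be centred.
```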

  4. Dimension reduction while preserving distances

  5. Preserving distance
     Like in cartography, the goal of dimension reduction can be subject to different sub-criteria, e.g. PCA preserves the directions of largest variance. What if we want to preserve distances while reducing the dimension?
     For given vectors $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^p$ we want to find $\mathbf{z}_1, \dots, \mathbf{z}_n \in \mathbb{R}^d$ where $d < p$ such that
     $$\|\mathbf{x}_i - \mathbf{x}_j\|_2 \approx \|\mathbf{z}_i - \mathbf{z}_j\|_2$$

  6. Distance matrices and the linear kernel
     Given a data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, note that
     $$\mathbf{X}\mathbf{X}^T = \begin{pmatrix} \mathbf{x}_1^T\mathbf{x}_1 & \cdots & \mathbf{x}_1^T\mathbf{x}_n \\ \vdots & & \vdots \\ \mathbf{x}_n^T\mathbf{x}_1 & \cdots & \mathbf{x}_n^T\mathbf{x}_n \end{pmatrix}$$
     which is also the Gram matrix $\mathbf{K}$ of the linear kernel. Let $\mathbf{D} = (\|\mathbf{x}_i - \mathbf{x}_j\|_2)_{ij}$ be the distance matrix in the Euclidean norm. Note that
     $$\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = \mathbf{x}_i^T\mathbf{x}_i - 2\mathbf{x}_i^T\mathbf{x}_j + \mathbf{x}_j^T\mathbf{x}_j$$
     and (with element-wise exponentiation)
     $$-\frac{1}{2}\mathbf{D}^2 = \mathbf{X}\mathbf{X}^T - \frac{1}{2}\mathbf{1}\,\mathrm{diag}(\mathbf{X}\mathbf{X}^T)^T - \frac{1}{2}\,\mathrm{diag}(\mathbf{X}\mathbf{X}^T)\,\mathbf{1}^T.$$
     Through calculation it can be shown that with $\mathbf{C} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$
     $$\mathbf{K} = \mathbf{C}\left(-\frac{1}{2}\mathbf{D}^2\right)\mathbf{C}$$
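
     The identity can be checked numerically. The following sketch assumes column-centred toy data (so that the right-hand side recovers $\mathbf{K}$ exactly); the data and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
X = X - X.mean(axis=0)                 # column-centre the data

K = X @ X.T                            # Gram matrix of the linear kernel
# Euclidean distance matrix D with D[i, j] = ||x_i - x_j||_2
sq_norms = np.sum(X**2, axis=1)
D = np.sqrt(np.maximum(np.add.outer(sq_norms, sq_norms) - 2.0 * X @ X.T, 0.0))

n = X.shape[0]
C = np.eye(n) - np.ones((n, n)) / n    # centring matrix
K_from_D = C @ (-0.5 * D**2) @ C       # element-wise square of D

print(np.allclose(K, K_from_D))        # True up to floating point error
```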

  7. Finding an exact embedding
     ▶ It can be shown that if $\mathbf{K}$ is positive semi-definite then there exists an exact embedding in $d = \mathrm{rank}(\mathbf{K}) \le \mathrm{rank}(\mathbf{X}) \le \min(n, p)$ dimensions.
     1. Perform PCA on $\mathbf{K} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$
     2. If $d = \mathrm{rank}(\mathbf{K})$, set $\mathbf{Z} = (\sqrt{\lambda_1}\,\mathbf{v}_1, \dots, \sqrt{\lambda_d}\,\mathbf{v}_d) \in \mathbb{R}^{n \times d}$
     3. The rows of $\mathbf{Z}$ are the sought-after embedding, i.e. for $\mathbf{z}_i = \mathbf{Z}_{i\cdot}$ it holds that $\|\mathbf{x}_i - \mathbf{x}_j\|_2 = \|\mathbf{z}_i - \mathbf{z}_j\|_2$
     ▶ Note: This is not guaranteed to lead to dimension reduction, i.e. $d = p$ is possible. However, usually the internal structure of the data is lower-dimensional and $d < p$.
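
     A numpy sketch of these three steps, under the assumption that `K` is positive semi-definite; the helper name `exact_embedding` and the rank tolerance `tol` are illustrative:

```python
import numpy as np

def exact_embedding(K, tol=1e-10):
    """Distance-preserving embedding from a positive semi-definite K."""
    # 1. Eigendecomposition K = V Lambda V^T (the "PCA on K" step)
    eigvals, V = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # 2. Keep the d = rank(K) strictly positive eigenvalues
    d = int(np.sum(eigvals > tol))
    # 3. Z = (sqrt(lambda_1) v_1, ..., sqrt(lambda_d) v_d); its rows are the z_i
    return V[:, :d] * np.sqrt(eigvals[:d])

# With K_from_D from the previous sketch, the pairwise distances between the
# rows of exact_embedding(K_from_D) reproduce the Euclidean distances in D.
```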

  8. Multi-dimensional scaling
     ▶ Keeping only the first $r < d$ components of $\mathbf{z}_i$ is known as classical scaling or multi-dimensional scaling (MDS) and minimizes the so-called stress or strain
     $$S(\mathbf{D}, \mathbf{Z}) = \left(\sum_{i \ne j} \left(D_{ij} - \|\mathbf{z}_i - \mathbf{z}_j\|_2\right)^2\right)^{1/2}$$
     ▶ The results also hold for general distance matrices $\mathbf{D}$ as long as $\lambda_1, \dots, \lambda_d > 0$ for $d = \mathrm{rank}(\mathbf{K})$. This is called metric MDS.
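
     One possible way to evaluate a truncated configuration, reusing the sketches above (the function name `stress` and the choice $r = 2$ are illustrative):

```python
import numpy as np

def stress(D, Z):
    """(sum_{i != j} (D_ij - ||z_i - z_j||_2)^2)^(1/2) for a configuration Z."""
    diffs = Z[:, None, :] - Z[None, :, :]
    D_Z = np.sqrt(np.sum(diffs**2, axis=2))
    off_diag = ~np.eye(D.shape[0], dtype=bool)
    return np.sqrt(np.sum((D[off_diag] - D_Z[off_diag])**2))

# Classical/metric MDS keeps the first r columns of the exact embedding:
# Z_r = exact_embedding(K_from_D)[:, :2]   # r = 2
# stress(D, Z_r) quantifies how much the truncation distorts the distances;
# it is 0 for the full embedding.
```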

  9. Lower-dimensional data in a high-dimensional space
