Lecture 13: Even more dimension reduction techniques
Felix Held, Mathematical Sciences
MSA220/MVE440 Statistical Learning for Big Data
10th May 2019
Recap: kernel PCA

Given a set of $p$-dimensional feature vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$ and a kernel $k(\mathbf{x}, \mathbf{z})$, form the Gram matrix $\mathbf{K} = (k(\mathbf{x}_i, \mathbf{x}_j))_{ij}$ and

▶ Solve the eigenvalue problem $\mathbf{K}\mathbf{a}_l = \lambda_l n \mathbf{a}_l$ for $\lambda_l$ and $\mathbf{a}_l$
▶ Scale $\mathbf{a}_l$ such that $\mathbf{a}_l^T \mathbf{K} \mathbf{a}_l = 1$

The projection of a feature vector $\mathbf{x}$ onto the $l$-th principal component in the implicit space of the $\varphi(\mathbf{x}_i)$ is
$$z_l(\mathbf{x}) = \sum_{i=1}^n a_{li}\, k(\mathbf{x}, \mathbf{x}_i)$$
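As an illustration of this recipe, here is a minimal NumPy sketch (not from the lecture; the RBF kernel, the function names and the parameter gamma are illustrative choices). It assumes the implicit features are already centred; the next slide covers the general case.

import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) (illustrative choice)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_pca_scores(X, n_components=2, gamma=1.0):
    # Gram matrix K_ij = k(x_i, x_j)
    K = rbf_kernel(X, X, gamma)
    # Solve the eigenvalue problem; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, A = np.clip(eigvals[idx], 1e-12, None), eigvecs[:, idx]
    # eigh gives a_l^T a_l = 1, hence a_l^T K a_l = mu_l; rescale so that a_l^T K a_l = 1
    A = A / np.sqrt(lam)
    # Projections of the training points: z_l(x_i) = sum_j a_lj k(x_i, x_j)
    return K @ A

scores = kernel_pca_scores(np.random.default_rng(0).normal(size=(50, 3)))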
Centring and kernel PCA

▶ The derivation assumed that the implicitly defined feature vectors $\varphi(\mathbf{x}_i)$ were centred. What if they are not?
▶ In the derivation we look at scalar products $\varphi(\mathbf{x}_i)^T \varphi(\mathbf{x}_j)$. Centring in the implicit space leads to
$$\left(\varphi(\mathbf{x}_i) - \frac{1}{n}\sum_{l=1}^n \varphi(\mathbf{x}_l)\right)^T \left(\varphi(\mathbf{x}_j) - \frac{1}{n}\sum_{m=1}^n \varphi(\mathbf{x}_m)\right) = K_{ij} - \frac{1}{n}\sum_{l=1}^n K_{lj} - \frac{1}{n}\sum_{m=1}^n K_{im} + \frac{1}{n^2}\sum_{l,m=1}^n K_{lm}$$
▶ Using the centring matrix $\mathbf{C} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$, centring in the implicit space is equivalent to transforming $\mathbf{K}$ as $\mathbf{K}' = \mathbf{C}\mathbf{K}\mathbf{C}$
▶ The algorithm is the same, apart from using $\mathbf{K}'$ instead of $\mathbf{K}$.
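To connect the element-wise centring formula with the matrix form $\mathbf{K}' = \mathbf{C}\mathbf{K}\mathbf{C}$, a small NumPy check (illustrative, not from the lecture):

import numpy as np

def centre_gram(K):
    """Matrix form: K' = C K C with C = I_n - (1/n) 1 1^T."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

def centre_gram_elementwise(K):
    """Element-wise form: K'_ij = K_ij - row mean_i - column mean_j + grand mean."""
    return K - K.mean(axis=1, keepdims=True) - K.mean(axis=0, keepdims=True) + K.mean()

A = np.random.default_rng(1).normal(size=(5, 3))
K = A @ A.T                      # some positive semi-definite Gram matrix
assert np.allclose(centre_gram(K), centre_gram_elementwise(K))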
Dimension reduction while preserving distances
Preserving distance

Like in cartography, the goal of dimension reduction can be subject to different sub-criteria; PCA, for example, preserves the directions of largest variance.

What if we want to preserve pairwise distances while reducing the dimension?

For given vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^p$ we want to find $\mathbf{z}_1, \ldots, \mathbf{z}_n \in \mathbb{R}^d$, where $d < p$, such that
$$\|\mathbf{x}_i - \mathbf{x}_j\|_2 \approx \|\mathbf{z}_i - \mathbf{z}_j\|_2$$
Distance matrices and the linear kernel

Given a data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, note that
$$\mathbf{X}\mathbf{X}^T = \begin{pmatrix} \mathbf{x}_1^T\mathbf{x}_1 & \cdots & \mathbf{x}_1^T\mathbf{x}_n \\ \vdots & \ddots & \vdots \\ \mathbf{x}_n^T\mathbf{x}_1 & \cdots & \mathbf{x}_n^T\mathbf{x}_n \end{pmatrix}$$
which is also the Gram matrix $\mathbf{K}$ of the linear kernel.

Let $\mathbf{D} = (\|\mathbf{x}_i - \mathbf{x}_j\|_2)_{ij}$ be the distance matrix in the Euclidean norm. Note that
$$\|\mathbf{x}_i - \mathbf{x}_j\|_2^2 = \mathbf{x}_i^T\mathbf{x}_i - 2\,\mathbf{x}_i^T\mathbf{x}_j + \mathbf{x}_j^T\mathbf{x}_j$$
and (with element-wise exponentiation)
$$-\frac{1}{2}\mathbf{D}^{(2)} = \mathbf{X}\mathbf{X}^T - \frac{1}{2}\mathbf{1}_n\,\mathrm{diag}(\mathbf{X}\mathbf{X}^T)^T - \frac{1}{2}\,\mathrm{diag}(\mathbf{X}\mathbf{X}^T)\,\mathbf{1}_n^T,$$
where $\mathrm{diag}(\mathbf{X}\mathbf{X}^T)$ denotes the vector of diagonal entries. Through calculation it can be shown that, with the centring matrix $\mathbf{C} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$,
$$\mathbf{K} = \mathbf{C}\left(-\frac{1}{2}\mathbf{D}^{(2)}\right)\mathbf{C}$$
(the identity is exact when the columns of $\mathbf{X}$ are centred; in general the left-hand side is $\mathbf{C}\mathbf{K}\mathbf{C}$).
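A quick numerical check of this identity (illustrative, not from the lecture), using column-centred data:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)                       # centre the columns of X

# Element-wise squared distance matrix D^(2)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

n = X.shape[0]
C = np.eye(n) - np.ones((n, n)) / n          # centring matrix

K = X @ X.T                                  # Gram matrix of the linear kernel
B = C @ (-0.5 * D2) @ C                      # double-centred squared distances

assert np.allclose(K, B)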
Finding an exact embedding

▶ It can be shown that if $\mathbf{K}$ is positive semi-definite then there exists an exact embedding in $q = \mathrm{rank}(\mathbf{K}) \le \mathrm{rank}(\mathbf{X}) \le \min(n, p)$ dimensions:
  1. Perform PCA on $\mathbf{K}$, i.e. compute the eigendecomposition $\mathbf{K} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$
  2. If $q = \mathrm{rank}(\mathbf{K})$, set $\mathbf{Z} = (\sqrt{\lambda_1}\,\mathbf{v}_1, \ldots, \sqrt{\lambda_q}\,\mathbf{v}_q) \in \mathbb{R}^{n \times q}$
  3. The rows of $\mathbf{Z}$ are the sought-after embedding, i.e. for $\mathbf{z}_i = \mathbf{Z}_{i\cdot}$ it holds that $\|\mathbf{x}_i - \mathbf{x}_j\|_2 = \|\mathbf{z}_i - \mathbf{z}_j\|_2$
▶ Note: This is not guaranteed to lead to dimension reduction, i.e. $q = p$ is possible. However, the internal structure of the data is usually lower-dimensional and $q < p$.
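A NumPy sketch of these three steps (illustrative; the tolerance used for the numerical rank is an assumption):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 4))
X = X - X.mean(axis=0)                       # centred data, so K = X X^T

K = X @ X.T
eigvals, eigvecs = np.linalg.eigh(K)         # step 1: eigendecomposition of K
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

q = int(np.sum(eigvals > 1e-10))             # numerical rank of K
Z = eigvecs[:, :q] * np.sqrt(eigvals[:q])    # step 2: Z = (sqrt(lambda_1) v_1, ..., sqrt(lambda_q) v_q)

# Step 3: the rows of Z reproduce all pairwise distances exactly
pdist = lambda A: np.sqrt(((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1))
assert np.allclose(pdist(X), pdist(Z))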
Multi-dimensional scaling

▶ Keeping only the first $d < q$ components of the $\mathbf{z}_i$ is known as classical scaling or multi-dimensional scaling (MDS) and minimizes the so-called stress or strain
$$S(\mathbf{Z}, \mathbf{D}) = \left(\sum_{i \ne j} \left(D_{ij} - \|\mathbf{z}_i - \mathbf{z}_j\|_2\right)^2\right)^{1/2}$$
▶ The results also hold for general distance matrices $\mathbf{D}$ as long as $\lambda_1, \ldots, \lambda_q > 0$ for $q = \mathrm{rank}(\mathbf{K})$. This is called metric MDS.
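Putting the last two slides together, a compact classical-MDS sketch (illustrative; the function names and the eigenvalue clipping are assumptions):

import numpy as np

def classical_mds(D, d=2):
    """D: (n, n) matrix of pairwise Euclidean distances; returns an (n, d) embedding."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = C @ (-0.5 * D ** 2) @ C               # recovered (double-centred) Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)   # keep only the first d components

def stress(D, Z):
    """Stress S(Z, D) = sqrt(sum over i != j of (D_ij - ||z_i - z_j||)^2)."""
    dZ = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    off_diag = ~np.eye(D.shape[0], dtype=bool)
    return np.sqrt(((D - dZ)[off_diag] ** 2).sum())

X = np.random.default_rng(4).normal(size=(30, 5))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
Z = classical_mds(D, d=2)
print(stress(D, Z))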
Lower-dimensional data in a high-dimensional space