Dimension Reduction
CSE 6242 / CX 4242
Thanks: Prof. Jaegul Choo, Dr. Ramakrishnan Kannan, Prof. Le Song
What is Dimension Reduction?
[Figure: a high-dimensional data matrix, with rows indexed by dimension (d) and columns as data items (index n), is reduced to a low-dimensional representation.]
Example: images as high-dimensional data
[Figure: raw images → pixel values → serialized/rasterized pixel values, e.g., 3, 80, 24, 58, 63, 45, 5, 34, 78, …]
A 4096×2160 image has 8,847,360 pixel dimensions. At 30 fps, a 2-minute video therefore generates a matrix of size 8,847,360 × 3,600.
Example: documents as high-dimensional data
[Figure: term–document matrix. Rows are vocabulary terms (e.g., "life", "pi", "movies", "book", "won", "also"); columns are documents (Doc 1, Doc 2); an entry of 1 marks that the term occurs in the document.]
Convention: rows are indexed by dimension (d), columns are data items (index n). We will use this convention during the lecture.
Dimension Reduction
Input: high-dimensional data (dimension index d), the target number of dimensions k (user-specified), additional info about the data, and other parameters.
Output: data in the reduced dimension (k), plus a dim-reducing transformation for new data.
Computing distances: 100,000-dim vs. 10-dim vectors
Separates the data into general pattern + sparse component + noise. (Is the noise the important signal?)
Works as pre-processing for better performance, e.g., microarray data analysis, information retrieval, face recognition, protein disorder prediction, network intrusion detection, document categorization, speech recognition.
Feature = Variable = Dimension
Selection criteria: information gain, correlation, …
Forward selection: start from the empty set and add one variable at a time.
Backward elimination: start from the entire set and remove one variable at a time.
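A minimal sketch of both strategies, assuming scikit-learn's SequentialFeatureSelector and an illustrative k-NN classifier (the dataset and estimator are not from the slides):

```python
# Sketch: forward selection vs. backward elimination with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward selection: start from the empty set, add one variable at a time.
forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward").fit(X, y)

# Backward elimination: start from the full set, remove one variable at a time.
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward").fit(X, y)

print(forward.get_support())   # boolean mask of selected features
print(backward.get_support())
```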
Linear transformation: of the form Y = AX + b, where A is a matrix and X, b are vectors/matrices.
e.g., Y1 = 3*X1 – 4*X2 + 0.3*X3 – 1.5*X4
      Y2 = 2*X1 + 3.2*X2 – X3 + 2*X4
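The example above written as a single matrix multiply; a NumPy sketch, with the coefficient matrix A taken from the two equations (the sample point x is arbitrary):

```python
import numpy as np

# Rows of A are the coefficients of Y1 and Y2 from the example.
A = np.array([[3.0, -4.0,  0.3, -1.5],
              [2.0,  3.2, -1.0,  2.0]])

x = np.array([1.0, 1.0, 0.0, 1.0])  # an arbitrary 4-dim data point
y = A @ x                           # reduced 2-dim representation
print(y)                            # [Y1, Y2]
```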
Example: applying the mapping to a small term–document matrix.
[Table: input X (rows X1–X4, columns D1, D2) and output Y (rows Y1, Y2, columns D1, D2), e.g., Y1(D1) = 1.75, Y2(D1) = 0.58.]
e.g., Y1 = 3*X1 – 4*X2 + 0.3*X3 – 1.5*X4, Y2 = 2*X1 + 3.2*X2 – X3 + 2*X4
Additional info about the data: e.g., a grouping label (used by supervised methods).
Other parameters: e.g., whether to focus on preserving large distances.
Output dimensions: the reduced dimension (k) replaces the original dimension index (d).
The input can instead be a similarity matrix: its (i, j)-th component indicates the similarity between the i-th and j-th data items. Assuming the distance is a metric, the similarity matrix is symmetric.
Input types: high-dimensional data (d×n) or a similarity matrix (n×n).
Dimension reduction on a similarity matrix is closely related to graph embedding (the similarity matrix defines a weighted graph).
Principal component analysis (PCA)
Multidimensional scaling (MDS)
Linear discriminant analysis (LDA)
Isometric feature mapping (Isomap)
* Matlab code is available at http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
[Figure: PCA example showing principal components PC1 and PC2. Image source: http://en.wikipedia.org/wiki/Principal_component_analysis]
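A minimal PCA sketch using scikit-learn on illustrative random data (not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 items, 10 dimensions (rows as items, sklearn-style)

pca = PCA(n_components=2)
Z = pca.fit_transform(X)              # 200 x 2: coordinates along PC1, PC2
print(pca.explained_variance_ratio_)  # fraction of variance captured by each PC
```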
Metric MDS: preserves the given distance values.
Nonmetric MDS: for when you only know (or care about) the ordering of distances; preserves only the orderings of the distance values.
Objective (stress): $\sum_{i<j} \left( \delta_{ij} - \lVert y_i - y_j \rVert \right)^2$, where $\delta_{ij}$ is the ideal distance and $\lVert y_i - y_j \rVert$ is the low-dim distance.
Nonmetric MDS is much slower than metric MDS. Fast algorithms are available: the Barnes-Hut algorithm and GPU-based implementations.
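A sketch contrasting metric and nonmetric MDS with scikit-learn; the data and parameters are illustrative:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# Metric MDS: tries to preserve the actual distance values.
Z_metric = MDS(n_components=2, metric=True, random_state=0).fit_transform(X)

# Nonmetric MDS: preserves only the ordering of distances (slower in practice).
Z_nonmetric = MDS(n_components=2, metric=False, random_state=0).fit_transform(X)
```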
LDA uses additional info about the data: e.g., a grouping (class) label.
For a C-class problem, we seek C − 1 projections z = [z_1, z_2, …, z_{C−1}] via projection vectors w_j, collected as W = [w_1 | w_2 | … | w_{C−1}]:

$z_j = w_j^T y \;\Rightarrow\; z = W^T y$

Within-class scatter matrix:

$S_W = \sum_{j=1}^{C} \sum_{y \in C_j} (y - \nu_j)(y - \nu_j)^T, \qquad \nu_j = \frac{1}{N_j} \sum_{y \in C_j} y$

Between-class scatter matrix:

$S_B = \sum_{j=1}^{C} N_j (\nu_j - \nu)(\nu_j - \nu)^T, \qquad \nu = \frac{1}{N} \sum_{y} y = \frac{1}{N} \sum_{j=1}^{C} N_j \nu_j$

Total scatter matrix:

$S_T = S_B + S_W$
[Figure: three classes (1, 2, 3) in the (x1, x2) plane, illustrating within-class scatters S_W1, S_W2, S_W3 and between-class scatters S_B2, S_B3.]
*http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
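A sketch that computes the scatter matrices above with NumPy and checks S_T = S_B + S_W; the three-class random data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 4, 3
ys = [rng.normal(loc=j, size=(30, d)) for j in range(C)]   # C classes of points (rows)

nu_j = [Y.mean(axis=0) for Y in ys]                        # class means
nu = np.vstack(ys).mean(axis=0)                            # global mean

# Within-class, between-class, and total scatter matrices.
S_W = sum(sum(np.outer(y - m, y - m) for y in Y) for Y, m in zip(ys, nu_j))
S_B = sum(len(Y) * np.outer(m - nu, m - nu) for Y, m in zip(ys, nu_j))
S_T = sum(np.outer(y - nu, y - nu) for y in np.vstack(ys))

assert np.allclose(S_T, S_B + S_W)                         # total = between + within
```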
Principal component analysis (PCA)
Multidimensional scaling (MDS)
Linear discriminant analysis (LDA)
Isometric feature mapping (Isomap)
* Eigen-decomposition (of the covariance matrix) is the main algorithm of PCA.
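A sketch of PCA via eigen-decomposition of the covariance matrix, on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(X) - 1)          # covariance matrix (10 x 10)

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
W = eigvecs[:, order[:2]]               # top-2 principal directions

Z = Xc @ W                              # projected 2-dim coordinates
```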
[Figure: scatterplots of the data in dimensions (1, 2) vs. dimensions (3, 4).]
It is the method that empirically gives the best result. A supervised approach is sometimes the only viable option when your data do not have clearly separable clusters.
Subtract the global mean from each vector.
Make each vector have unit Euclidean norm; otherwise, a few outliers can affect the dimension reduction significantly.
Documents: TF-IDF weighting; remove overly rare and/or short terms. Images: histogram normalization.
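A sketch of the centering and unit-norm steps in NumPy, with columns as data items per the lecture convention:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))                     # d x n, columns as data items

X = X - X.mean(axis=1, keepdims=True)             # subtract the global mean from each vector
X = X / np.linalg.norm(X, axis=0, keepdims=True)  # give each column unit Euclidean norm
```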
The results may even be better, thanks to the noise removed by PCA (see the sketch after the list below).
Landmark versions (using only a subset of data items), e.g., landmark Isomap.
Linearized versions (same criterion, but allowing only a linear mapping), e.g., Laplacian eigenmaps → locality preserving projection.
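A sketch of the PCA-first pipeline mentioned above, feeding PCA output into Isomap via scikit-learn (dimensions and parameters are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))  # 500 items in 1000 dimensions

# Reduce to 50 dims with (fast, linear) PCA first, then run the slower Isomap.
pipe = make_pipeline(PCA(n_components=50),
                     Isomap(n_neighbors=10, n_components=2))
Z = pipe.fit_transform(X)         # 500 x 2 embedding
```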
See if you can incorporate label information. Restrict the number of iterations to save computation time.
            PCA   MDS   LDA   Isomap
Supervised  ✖    ✖    ✔    ✖
Linear      ✔    ✖    ✔    ✖
Global      ✔    ✔    ✔    ✔
Feature     ✔    ✖    ✔    ✔
http://www.iai.uni-bonn.de/~jz/dimensionality_reduction_a_comparative_review.pdf
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
http://www.math.ucla.edu/~wittman/mani/