Dimension Reduction
CS 6242, Ramakrishnan Kannan
Thanks: Prof. Jaegul Choo and Prof. Le Song
What is Dimension Reduction?
[Figure: a high-dim data matrix (dimension index d by data item index n, columns as data items) is reduced to a low-dim matrix.]
Why do it? And how big is this data, anyway?
Attribute = Feature = Variable = Dimension
Image Data
Serialized/rasterized pixel values: each raw image becomes one long vector of pixel values (serialized pixels).
In a 4K (4096x2160) image there are about 8.8 million pixels in total.
Video Data
Serialized/rasterized pixel values, one vector per frame.
Huge dimensions: a 4096x2160 frame gives 8,847,360 dimensions. At 30 fps, a 2-minute video generates a matrix of size 8,847,360 x 3,600.
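A two-line check of the arithmetic above:

```python
pixels_per_frame = 4096 * 2160   # = 8,847,360 pixel values per frame
frames = 30 * 60 * 2             # 30 fps * 120 s = 3,600 frames
print(pixels_per_frame, frames)  # -> matrix of size 8,847,360 x 3,600
```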
Text Documents
Bag-of-words vector
Document 1 = "Life of Pi won Oscar"
Document 2 = "Life of Pi is also a book."

Vocabulary   Doc 1   Doc 2
Life           1       1
Pi             1       1
movies         0       0
...
also           0       1
oscar          1       0
book           0       1
won            1       0
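A minimal sketch of building the bag-of-words matrix above in Python (the naive whitespace tokenization is an assumption for illustration; a real pipeline would handle stop words, stemming, etc.):

```python
# Build a bag-of-words matrix: rows = vocabulary terms, columns = documents.
docs = ["Life of Pi won Oscar", "Life of Pi is also a book."]
vocab = ["Life", "Pi", "movies", "also", "oscar", "book", "won"]

# Naive tokenization: split on whitespace, lowercase, strip punctuation.
tokens = [set(w.strip(".,").lower() for w in d.split()) for d in docs]

# matrix[i][j] = 1 if vocabulary term i occurs in document j, else 0.
matrix = [[int(term.lower() in t) for t in tokens] for term in vocab]
for term, row in zip(vocab, matrix):
    print(f"{term:8s} {row}")   # reproduces the table above
```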
Two Axes of a Data Set
Data items: how many data items are there?
Dimensions: how many dimensions represent each item?
Columns as data items vs. rows as data items: we will use columns as data items (dimension index d, data item index n) during the lecture.
Dimension Reduction
High-dim data (d x n) -> Dimension Reduction -> low-dim data (k x n), where the reduced dimension k is user-specified.
Inputs may also include additional info about the data and other parameters; outputs may include a dim-reducing transformation for new data.
Benefits of Dimension Reduction
Obviously:
Compression
Visualization
Faster computation, e.g., computing distances between 100,000-dim vs. 10-dim vectors
More importantly:
Noise removal (improving data quality): separates the data into general pattern + sparse part + noise. (Or is the noise the important signal?)
Works as pre-processing for better performance, e.g., microarray data analysis, information retrieval, face recognition, protein disorder prediction, network intrusion detection, document categorization, speech recognition.
Two Main Techniques
1. Feature selection: selects a subset of the original variables as the reduced dimensions, relevant for a particular task; e.g., the number of genes responsible for a particular disease may be small.
2. Feature extraction: each reduced dimension combines multiple original dimensions; the original data are transformed into new values.
Feature = Variable = Dimension
Feature Selection
What is the optimal subset of m features that maximizes a given criterion?
Widely-used criteria: information gain, correlation, ...
These are typically combinatorial optimization problems, so greedy methods are popular:
Forward selection: start from the empty set, add one variable at a time (see the sketch below).
Backward elimination: start from the entire set, remove one variable at a time.
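A minimal sketch of greedy forward selection. The scoring function `score` is a placeholder for whatever criterion you choose (information gain, cross-validated accuracy, ...), not something prescribed by the slide:

```python
def forward_selection(X, y, score, m):
    """Greedy forward selection: start empty, add one feature at a time.

    X: (n_samples, n_features) NumPy array; y: labels/targets;
    score: callable judging a feature subset; m: number of features to keep.
    """
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(m):
        # Pick the candidate feature whose addition maximizes the criterion.
        best_j = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```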
Feature Extraction
Aspects of Dimension Reduction
Linear vs. Nonlinear
Unsupervised vs. Supervised
Global vs. Local
Feature vectors vs. Similarity (as an input)
Linear vs. Nonlinear
Linear: represents each reduced dimension as a linear combination of the original dimensions, i.e., of the form aX + b where a, X, and b are vectors/matrices.
e.g., Y1 = 3*X1 - 4*X2 + 0.3*X3 - 1.5*X4
      Y2 = 2*X1 + 3.2*X2 - X3 + 2*X4
Naturally capable of mapping new data to the same space.

        D1   D2                            D1     D2
X1       1    1     Dimension        Y1   1.75  -0.27
X2       1    0     Reduction  -->   Y2  -0.21   0.58
X3       0    2
X4       1    1
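A linear reduction is just a matrix product. A minimal NumPy sketch using the example coefficients above and the 4x2 data from the table (note the Y values shown in the table come from a different, unspecified linear map, so the outputs here will differ):

```python
import numpy as np

# Coefficients of Y1 and Y2 as rows of a 2x4 matrix W.
W = np.array([[3.0, -4.0,  0.3, -1.5],
              [2.0,  3.2, -1.0,  2.0]])

# Columns are data items (D1, D2); rows are original dimensions X1..X4.
X = np.array([[1, 1],
              [1, 0],
              [0, 2],
              [1, 1]], dtype=float)

Y = W @ X   # each reduced dimension is a linear combination of X1..X4
print(Y)    # 2x2: rows are Y1, Y2; columns are D1, D2
```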
Nonlinear
More complicated, but generally more powerful; a recently popular topic.
Unsupervised vs. Supervised
Unsupervised: uses only the input data.
[Diagram: high-dim data (+ other parameters) -> Dimension Reduction -> low-dim data and a transformer for new data; no additional info about the data is used.]
Supervised: uses the input data + additional info about the data, e.g., grouping labels.
[Diagram: same pipeline, but the additional info also feeds into Dimension Reduction.]
Global vs. Local
Dimension reduction typically tries to preserve all the relationships/distances in the data, but information loss is unavoidable! So what should we emphasize more?
Global: treats all pairwise distances as equally important; focuses on preserving large distances.
Local: focuses on small distances and neighborhood relationships; an active research area, e.g., manifold learning.
Feature vectors vs. Similarity (as an input)
Typical setup: feature vectors as the input.
[Diagram: high-dim data (d x n) -> Dimension Reduction (reduced dimension k) -> low-dim data (k x n).]
Alternatively, a similarity matrix can be taken as the input instead: its (i, j)-th component indicates the similarity between the i-th and j-th data items. Assuming the underlying distance is a metric, the similarity matrix is symmetric.
[Diagram: similarity matrix -> Dimension Reduction -> low-dim data.]
Some methods internally convert the feature vectors to a similarity matrix before performing dimension reduction:
high-dim data (d x n) -> similarity matrix (n x n) -> Dimension Reduction -> low-dim data (k x n).
This similarity-matrix route is also called graph embedding.
Why is it called graph embedding? The similarity matrix can be viewed as a graph in which each similarity value is the weight of an edge between two data items; embedding that graph in low dimensions yields the low-dim data.
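A minimal sketch of the feature-vectors-to-similarity-matrix conversion. The Gaussian (RBF) kernel and its width sigma are one common choice, assumed here for illustration rather than prescribed by the slides:

```python
import numpy as np

def similarity_matrix(X, sigma=1.0):
    """X: d x n, columns as data items. Returns the n x n similarity matrix."""
    # Squared Euclidean distances between all pairs of columns.
    sq = (X ** 2).sum(axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    # Gaussian kernel: larger value = more similar = heavier edge weight.
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.random.rand(5, 10)          # 5 dimensions, 10 data items
S = similarity_matrix(X)
assert np.allclose(S, S.T)         # symmetric, as noted above
```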
Methods
Traditional: principal component analysis (PCA), multidimensional scaling (MDS), linear discriminant analysis (LDA)
Advanced (nonlinear, kernelized, manifold learning): isometric feature mapping (Isomap)
* Matlab codes are available at http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
Principal Component Analysis
Finds the axis showing the largest variation (PC1) and projects all points onto this axis; the reduced dimensions (PC1, PC2, ...) are orthogonal.
Algorithm: eigendecomposition
Pros: fast. Cons: limited performance.
Linear | Unsupervised | Global | Feature vectors
Image source: http://en.wikipedia.org/wiki/Principal_component_analysis
PCA – Some Questions
Algorithm:
1. Subtract the mean from the dataset (X - µ).
2. Form the covariance matrix (X - µ)'(X - µ).
3. Perform SVD on this covariance matrix to find the leading eigenvectors.
4. Project the data onto these leading eigenvectors, i.e., multiply.
Key questions: Why the covariance matrix? Can't we perform SVD on the original matrix?
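A minimal NumPy sketch of this recipe, assuming rows are data points so that (X - µ)'(X - µ) is the d x d covariance matrix (a comment also addresses the second key question):

```python
import numpy as np

def pca(X, k):
    """PCA per the slide's recipe. X: (n, d) array, rows are data points."""
    Xc = X - X.mean(axis=0)                 # 1. subtract the mean
    C = Xc.T @ Xc / (len(X) - 1)            # 2. d x d covariance matrix
    # 3. SVD of a symmetric PSD matrix is its eigendecomposition; columns
    #    of U are the principal directions, ordered by explained variance.
    U, S, _ = np.linalg.svd(C)
    return Xc @ U[:, :k]                    # 4. project onto the top-k axes

# (Answering the second question: SVD of the mean-centered Xc itself gives
#  the same directions as its right singular vectors, skipping step 2.)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca(X, 2)                               # 100 points in 2 reduced dims
```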
Multidimensional Scaling (MDS)
Main idea: tries to preserve the given pairwise distances in the low-dimensional space, e.g., by minimizing the stress
    sum over i < j of ( d_ij - ||y_i - y_j|| )^2,
where d_ij is the ideal (given) distance and ||y_i - y_j|| is the low-dim distance.
Nonlinear | Unsupervised | Global | Similarity input
Metric MDS: preserves the given distance values.
Nonmetric MDS: when you only know/care about the ordering of distances, preserves only the orderings of the distance values.
Algorithm: gradient-descent type. Cf. classical MDS, which is equivalent to PCA.
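A minimal sketch of classical MDS via double centering, the closed-form variant noted above to be equivalent to PCA (the stress-minimizing metric/nonmetric variants are iterative instead):

```python
import numpy as np

def classical_mds(D, k):
    """D: (n, n) matrix of pairwise distances. Returns n points in k dims."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered squared distances
    # Top-k eigenpairs of B give the embedding Y = V * sqrt(lambda).
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    w, V = w[::-1][:k], V[:, ::-1][:, :k]   # take the largest k
    return V * np.sqrt(np.maximum(w, 0))

X = np.random.rand(10, 5)                   # 10 points in 5 dimensions
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, 2)                     # 2-D embedding of the distances
```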
Multidimensional Scaling
Pros: widely used (works well in general).
Cons: slow (an n-body problem); nonmetric MDS is much slower than metric MDS.
Fast algorithms are available: the Barnes-Hut algorithm and GPU-based implementations.
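For the iterative variants, scikit-learn's MDS makes a quick illustration; a sketch assuming scikit-learn is available (its metric=False flag switches to nonmetric MDS, which only preserves distance orderings):

```python
import numpy as np
from sklearn.manifold import MDS

X = np.random.rand(50, 20)                  # 50 items, 20 dimensions

# Metric MDS: iteratively minimizes stress over the actual distance values.
Y_metric = MDS(n_components=2).fit_transform(X)

# Nonmetric MDS: preserves only the ordering of the distances (slower).
Y_nonmetric = MDS(n_components=2, metric=False).fit_transform(X)
```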
Linear Discriminant Analysis
What if clustering information is available? LDA tries to separate the clusters by
placing different clusters as far apart as possible, and
making each cluster as compact as possible.
Linear Discriminant Analysis (LDA) vs. Principal Component Analysis
2D visualizations of a mixture of 7 Gaussians in 1000 dimensions:
[Figure: linear discriminant analysis (supervised) vs. principal component analysis (unsupervised).]
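A sketch of this kind of comparison using scikit-learn, assuming it is available; the cluster count matches the slide, but the dimensionality and sample sizes are scaled-down placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 7 Gaussian clusters in 50 dimensions, 100 points each.
centers = rng.normal(scale=5, size=(7, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])
y = np.repeat(np.arange(7), 100)            # cluster labels (additional info)

Y_pca = PCA(n_components=2).fit_transform(X)                 # ignores y
Y_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
# Plotting Y_lda vs. Y_pca colored by y shows LDA's label-aware separation.
```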