

SLIDE 1


Dimension Reduction

CSE 6242 / CX 4242

Thanks: Prof. Jaegul Choo, Dr. Ramakrishnan Kannan, Prof. Le Song

SLIDE 2

What is Dimension Reduction?

[Diagram: a data matrix with data item index (n) and dimension index (d), columns as data items, transformed by dimension reduction into low-dim data.]

How big is this? Why?

Attribute = Feature = Variable = Dimension

SLIDE 3

Image Data

[Figure: raw images → pixel values → serialized/rasterized pixel values.]

A 4K (4096 × 2160) image contains about 8.8 million pixels in total.
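As a rough sketch of how an image becomes one high-dimensional vector (a minimal sketch assuming NumPy; the pixel values are stand-ins):

```python
# Serializing a grayscale 4K image into a single data vector.
import numpy as np

img = np.random.randint(0, 256, size=(2160, 4096))  # stand-in 4K image
x = img.reshape(-1)        # serialized/rasterized pixel values
print(x.shape)             # (8847360,) -> 8,847,360 dimensions
```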

SLIDE 4

Video Data

Huge dimensions:
- A 4096 × 2160 frame → 8,847,360 dimensions
- At 30 fps, a 2-minute video yields 3,600 frames, i.e., a matrix of size 8,847,360 × 3,600

[Figure: raw frames → pixel values → serialized/rasterized pixel values.]

SLIDE 5

Text Documents

Bag-of-words vectors:
- Document 1 = “Life of Pi won Oscar”
- Document 2 = “Life of Pi is also a book.”

Vocabulary   Doc 1   Doc 2
Life           1       1
Pi             1       1
movies         0       0
also           0       1
Oscar          1       0
book           0       1
won            1       0
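A quick sketch of building such vectors (a minimal sketch assuming scikit-learn; note its default tokenizer lowercases and drops one-letter tokens such as “a”):

```python
# Bag-of-words vectors for the two documents above.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Life of Pi won Oscar", "Life of Pi is also a book."]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)         # documents x vocabulary counts
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # one row of counts per document
```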

SLIDE 6

Two Axes of a Data Set

- Data items: how many data items?
- Dimensions: how many dimensions represent each item?

[Diagram: a matrix with data item index (n) and dimension index (d).]

Columns as data items vs. rows as data items: we will use columns as data items during the lecture.

SLIDE 7

Dimension Reduction

[Diagram: high-dim data (dimension index d, data item index n) enters the dimension reduction box, producing low-dim data (reduced dimension k, n items) and a dim-reducing transformation for new data. User-specified inputs: no. of dimensions (k), additional info about the data, other parameters.]

SLIDE 8

Benefits of Dimension Reduction

Obviously:
- Compression
- Visualization
- Faster computation (e.g., computing distances between 100,000-dim vs. 10-dim vectors)

More importantly:
- Noise removal (improving data quality): separates the data into general pattern + sparse + noise. (Or is the noise the important signal?)
- Works as pre-processing for better performance, e.g., in microarray data analysis, information retrieval, face recognition, protein disorder prediction, network intrusion detection, document categorization, speech recognition

SLIDE 9

Two Main Techniques

1. Feature selection
- Selects a subset of the original variables as the reduced dimensions
- The relevant subset can be small for a particular task, e.g., the number of genes responsible for a particular disease may be small

2. Feature extraction
- Each reduced dimension combines multiple original dimensions
- The original data values are transformed into new values

Feature = Variable = Dimension

SLIDE 10

Feature Selection

What is the optimal subset of m features that maximizes a given criterion?

Widely-used criteria: information gain, correlation, …

These are typically combinatorial optimization problems, so greedy methods are popular (a sketch follows the list):
- Forward selection: start from the empty set and add one variable at a time
- Backward elimination: start from the entire set and remove one variable at a time
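A minimal sketch of greedy selection (assuming scikit-learn; the dataset and estimator are placeholder choices):

```python
# Greedy forward selection / backward elimination.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,        # the m features to keep
    direction="forward",           # "backward" = backward elimination
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the chosen features
```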

SLIDE 11

Feature Extraction

SLIDE 12

Aspects of Dimension Reduction

- Linear vs. nonlinear
- Unsupervised vs. supervised
- Global vs. local
- Feature vectors vs. similarity (as the input)

SLIDE 13

Linear vs. Nonlinear

Linear: represents each reduced dimension as a linear combination of the original dimensions

Of the form aX + b, where a, X, and b are vectors/matrices, e.g.,
Y1 = 3*X1 - 4*X2 + 0.3*X3 - 1.5*X4
Y2 = 2*X1 + 3.2*X2 - X3 + 2*X4

Naturally capable of mapping new data to the same space

[Example: a 4 × 2 data matrix (dimensions X1–X4, items D1–D2) is mapped by dimension reduction to a 2 × 2 matrix (reduced dimensions Y1–Y2).]
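As a sketch of such a linear mapping applied to columns-as-items data (a minimal sketch assuming NumPy; the input values are placeholders):

```python
# Applying the linear mapping above: Y = A X, columns as data items.
import numpy as np

A = np.array([[3.0, -4.0,  0.3, -1.5],   # coefficients for Y1
              [2.0,  3.2, -1.0,  2.0]])  # coefficients for Y2
X = np.random.rand(4, 2)                 # 4 original dims, 2 data items
Y = A @ X                                # 2 reduced dims, 2 data items
print(Y.shape)                           # (2, 2)
```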

SLIDE 14

Linear vs. Nonlinear

Linear: represents each reduced dimension as a linear combination of the original dimensions, e.g., Y1 = 3*X1 - 4*X2 + 0.3*X3 - 1.5*X4, Y2 = 2*X1 + 3.2*X2 - X3 + 2*X4; naturally capable of mapping new data to the same space

Nonlinear: more complicated, but generally more powerful; a recently popular topic

SLIDE 15

Unsupervised vs. Supervised

Unsupervised: uses only the input data

[Diagram: the dimension reduction pipeline (high-dim data → low-dim data, with no. of dimensions, other parameters, and a dim-reducing transformer for new data); no additional info about the data is used.]

SLIDE 16

Unsupervised vs. Supervised

Supervised: uses the input data + additional info

[Diagram: the same pipeline, now with additional info about the data as an input.]

SLIDE 17

Unsupervised vs. Supervised

Supervised: uses the input data + additional info, e.g., grouping labels

[Diagram: the same pipeline, with grouping labels as the additional info.]

SLIDE 18

Global vs. Local

Dimension reduction typically tries to preserve all the relationships/distances in the data, but information loss is unavoidable! So what should we emphasize more?

Global: treats all pairwise distances as equally important; focuses on preserving large distances

Local: focuses on small distances and neighborhood relationships; an active research area, e.g., manifold learning

SLIDE 19

Feature vectors vs. Similarity (as an input)

Typical setup: feature vectors as the input

[Diagram: high-dim data (d × n) → dimension reduction → low-dim data (k × n), with user-specified no. of dimensions (k), other parameters, additional info about the data, and a dim-reducing transformer for new data.]

SLIDE 20

Feature vectors vs. Similarity (as an input)

Alternatively, a similarity matrix can be taken as the input instead:

[Diagram: similarity matrix → dimension reduction → low-dim data.]

The (i, j)-th component indicates the similarity between the i-th and j-th data items. Assuming the distance is a metric, the similarity matrix is symmetric.

SLIDE 21

Feature vectors vs. Similarity (as an input)

Internally, a method may first convert the feature vectors into a similarity matrix and then perform dimension reduction:

[Diagram: high-dim data (d × n) → similarity matrix (n × n) → dimension reduction (graph embedding) → low-dim data (k × n).]
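A rough sketch of the feature-vectors-to-similarity-matrix step (assuming NumPy/SciPy; the Gaussian kernel and its bandwidth are illustrative choices, not the deck's prescribed ones):

```python
# Converting feature vectors into a symmetric similarity matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.rand(100, 10)        # 100 items, 10 dims (rows as items)
D = squareform(pdist(X))           # pairwise Euclidean distances (n x n)
sigma = 1.0                        # illustrative kernel bandwidth
S = np.exp(-D**2 / (2 * sigma**2)) # Gaussian-kernel similarities
assert np.allclose(S, S.T)         # symmetric, as noted above
```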

SLIDE 22

Feature vectors vs. Similarity (as an input)

Why is it called graph embedding? A similarity matrix can be viewed as a graph in which each similarity value is an edge weight.

[Diagram: high-dim data (d × n) → similarity matrix, drawn as a weighted graph → dimension reduction (graph embedding) → low-dim data.]

SLIDE 23

Methods

Traditional:
- Principal component analysis (PCA)
- Multidimensional scaling (MDS)
- Linear discriminant analysis (LDA)

Advanced (nonlinear, kernelized, manifold learning):
- Isometric feature mapping (Isomap)

* Matlab code is available at
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

SLIDE 24

Principal Component Analysis

Finds the axes showing the largest variation and projects all points onto them; the reduced dimensions are orthogonal

Algorithm: eigendecomposition
Pros: fast
Cons: limited performance

Properties: linear, unsupervised, global, feature vectors as input

[Figure: a 2D point cloud with principal components PC1 and PC2. Image source: http://en.wikipedia.org/wiki/Principal_component_analysis]
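A minimal usage sketch (assuming scikit-learn; the data is a placeholder):

```python
# PCA down to 2 dimensions, plus mapping new data to the same space.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 100)            # 500 items, 100 dims (rows as items)
pca = PCA(n_components=2)
Y = pca.fit_transform(X)                # 500 x 2 embedding
print(pca.explained_variance_ratio_)    # variance captured per component
Y_new = pca.transform(np.random.rand(1, 100))  # new data, same space
```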

SLIDE 25

PCA – Some Questions

Algorithm:
1. Subtract the mean from the dataset: X − μ
2. Form the covariance matrix (X − μ)'(X − μ)
3. Perform eigendecomposition on this covariance matrix

Key questions:
- Why the covariance matrix?
- SVD on the original matrix vs. eigendecomposition on the covariance matrix?
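A sketch of these steps and of the SVD connection (assuming NumPy; the data is a placeholder):

```python
# PCA via eigendecomposition of the covariance matrix.
import numpy as np

X = np.random.rand(500, 100)            # rows as items here
Xc = X - X.mean(axis=0)                 # step 1: subtract the mean
C = Xc.T @ Xc / (len(Xc) - 1)           # step 2: covariance matrix (d x d)
vals, vecs = np.linalg.eigh(C)          # step 3: eigendecomposition
top2 = vecs[:, np.argsort(vals)[::-1][:2]]  # two largest eigenvalues
Y = Xc @ top2                           # 2D embedding

# The same subspace comes from SVD of the centered matrix itself:
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
# Vt[:2] spans the same space as top2 (up to sign).
```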

SLIDE 26

Multidimensional Scaling (MDS)

Main idea: tries to preserve the given pairwise distances in the low-dimensional space

Metric MDS: preserves the given distance values

Nonmetric MDS: for when you only know/care about the ordering of distances; preserves only the orderings of the distance values

Algorithm: gradient-descent type
c.f. classical MDS is the same as PCA

Properties: nonlinear, unsupervised, global, similarity input

[Figure: the objective compares each ideal distance with the corresponding low-dim distance.]
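A minimal usage sketch (assuming scikit-learn; the distance matrix is a placeholder):

```python
# Metric MDS from a precomputed pairwise distance matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

X = np.random.rand(100, 50)
D = squareform(pdist(X))                     # pairwise distances (n x n)
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Y = mds.fit_transform(D)                     # 100 x 2 embedding
# metric=False would give nonmetric MDS (preserves only orderings).
```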

SLIDE 27

Multidimensional Scaling

Pros: widely used (works well in general)
Cons: slow (an n-body problem)
- Nonmetric MDS is much slower than metric MDS
- Fast algorithms are available: the Barnes-Hut algorithm, GPU-based implementations

SLIDE 28

Linear Discriminant Analysis

What if clustering information is available? LDA tries to separate the clusters by
- putting different clusters as far apart as possible, and
- making each cluster as compact as possible.

[Figure: panels (a) and (b) illustrate the two criteria.]

SLIDE 29

Aspects of Dimension Reduction

Unsupervised vs. Supervised (recap)

Supervised: uses the input data + additional info, e.g., grouping labels

[Diagram: the dimension reduction pipeline with grouping labels as the additional info.]

SLIDE 30

Linear Discriminant Analysis (LDA)

vs. Principal Component Analysis

[Figure: 2D visualizations of a 7-component Gaussian mixture in 1,000 dimensions, by linear discriminant analysis (supervised) and by principal component analysis (unsupervised).]

SLIDE 31

LDA

1. Compute the means of the two classes and the global mean (μ1, μ2, μ)
2. Compute the within-class covariance matrix S_W
3. Compute the between-class covariance matrix S_B from the means
4. Find the eigenvectors corresponding to the largest eigenvalues of inv(S_W) * S_B

Fisher’s LDA generalizes gracefully to $C$ classes, projecting the data onto at most $C - 1$ dimensions via $y = W^T x$. In standard notation (following the lecture notes linked below), the scatter matrices are

$$S_W = \sum_{j=1}^{C} \sum_{x \in \omega_j} (x - \mu_j)(x - \mu_j)^T, \qquad \mu_j = \frac{1}{N_j} \sum_{x \in \omega_j} x,$$

$$S_B = \sum_{j=1}^{C} N_j (\mu_j - \mu)(\mu_j - \mu)^T, \qquad \mu = \frac{1}{N} \sum_{x} x = \frac{1}{N} \sum_{j=1}^{C} N_j \mu_j,$$

and the total scatter satisfies $S_T = S_B + S_W$.

[Figure: three classes in the (x1, x2) plane with within-class scatters S_W1, S_W2, S_W3 and between-class scatters S_B1, S_B2, S_B3.]

*http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf

SLIDE 32

Linear Discriminant Analysis

Maximally separates clusters by
- putting different clusters far apart, and
- shrinking each cluster compactly.

Algorithm: generalized eigendecomposition
Pros: better at showing cluster structure
Cons: may distort the original relationships in the data

Properties: linear, supervised, global, feature vectors as input
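A minimal usage sketch (assuming scikit-learn; Iris is just a convenient labeled dataset):

```python
# Supervised LDA using class labels, reduced to 2 dimensions.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)      # y holds the grouping labels
lda = LinearDiscriminantAnalysis(n_components=2)
Y = lda.fit_transform(X, y)            # at most (n_classes - 1) dims
print(Y.shape)                         # (150, 2)
```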

SLIDE 33

Methods

Traditional:
- Principal component analysis (PCA)
- Multidimensional scaling (MDS)
- Linear discriminant analysis (LDA)

Advanced (nonlinear, kernelized, manifold learning):
- Isometric feature mapping (Isomap)

* Matlab code is available at
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

SLIDE 34

Manifold Learning

Swiss Roll Data

Swiss roll data: originally in 3D. What is its intrinsic dimensionality (allowing flattening)?

SLIDE 35

Manifold Learning

Swiss Roll Data

Swiss roll data: originally in 3D; its intrinsic dimensionality (allowing flattening) is 2D.

What if your data has low intrinsic dimensionality but resides in a high-dimensional space?

SLIDE 36

Isomap

(Isometric Feature Mapping)

Idea: preserve the pairwise geodesic distances (along the manifold).
1. Compute geodesic distances as shortest-path lengths on a k-nearest-neighbor (k-NN) graph
2. Eigendecompose* the pairwise geodesic distance matrix to obtain the embedding that best preserves the given distances

* Eigendecomposition is also the main algorithmic step of PCA

SLIDE 37

Isomap

(Isometric Feature Mapping)

Algorithm: all-pairs shortest-path computation + eigendecomposition
Pros: performs well in general
Cons: slow (shortest paths), sensitive to parameters

Properties: nonlinear, unsupervised, global (all pairwise distances are considered), feature vectors as input
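A minimal usage sketch (assuming scikit-learn, which bundles a swiss roll generator):

```python
# Isomap: k-NN graph + geodesic distances + embedding, on a swiss roll.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3D swiss roll
iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)               # 2D "unrolled" coordinates
print(Y.shape)                         # (1000, 2)
```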

SLIDE 38

Practitioner’s Guide

Caveats

- Trustworthiness of dimension reduction results: distortion/information loss in 2D/3D is inevitable
- The best result of a method may not align with what we want, e.g., PCA visualization of facial image data

[Figure: PCA visualizations of facial images using dimensions (1, 2) vs. dimensions (3, 4).]

SLIDE 39

Practitioner’s Guide

General Recommendation

- Want something simple and fast to visualize data? PCA, force-directed layout
- Want to try a manifold learning method first? Isomap; empirically, it often gives the best results
- Have cluster labels to use (pre-given or computed)? LDA (supervised)
  - A supervised approach is sometimes the only viable option when your data do not have clearly separable clusters

SLIDE 40

Practitioner’s Guide

Results Still Not Good?

Try various pre-processing steps (a sketch follows the list):

- Data centering: subtract the global mean from each vector
- Normalization: make each vector have unit Euclidean norm; otherwise, a few outliers can affect the dimension reduction significantly
- Application-specific pre-processing:
  - Documents: TF-IDF weighting; remove terms that are too rare and/or too short
  - Images: histogram normalization
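A minimal sketch of the first two steps (assuming NumPy; the data is a placeholder):

```python
# Data centering and unit-norm normalization, rows as data items.
import numpy as np

X = np.random.rand(500, 100)
Xc = X - X.mean(axis=0)                          # subtract the global mean
norms = np.linalg.norm(Xc, axis=1, keepdims=True)
Xn = Xc / np.maximum(norms, 1e-12)               # unit Euclidean norm per item
```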

SLIDE 41

Practitioner’s Guide

Too Slow?

Apply PCA to reduce to an intermediate number of dimensions before the main dimension reduction step; the results may even improve because PCA removes noise (see the sketch below).

See if an approximate but faster version exists:
- Landmark versions (use only a subset of the data items), e.g., landmark Isomap
- Linearized versions (the same criterion, but only linear mappings allowed), e.g., Laplacian eigenmaps → locality preserving projection
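A minimal sketch of PCA as the intermediate step (assuming scikit-learn; the sizes and the choice of Isomap as the main method are illustrative):

```python
# PCA to an intermediate dimensionality before a slower method (Isomap).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.pipeline import make_pipeline

X = np.random.rand(1000, 5000)                # very high-dimensional data
pipe = make_pipeline(PCA(n_components=50),    # fast intermediate reduction
                     Isomap(n_neighbors=10, n_components=2))
Y = pipe.fit_transform(X)                     # final 2D embedding
```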

SLIDE 42

Practitioner’s Guide

Still need more?

Tweak dimension reduction for your own purposes:
- Play with the algorithm, convergence criteria, etc.
- See if you can impose label information
- Restrict the number of iterations to save computation time

The main purpose of DR is to serve us in exploring data and solving complicated real-world problems.

SLIDE 43

Take Away

               PCA   MDS   LDA   Isomap
Supervised      ✖     ✖     ✔      ✖
Linear          ✔     ✖     ✔      ✖
Global          ✔     ✔     ✔      ✔
Feature input   ✔     ✖     ✔      ✔

SLIDE 44

Useful Resources

- Tutorial on PCA: http://arxiv.org/pdf/1404.1100.pdf
- Tutorial on LDA: http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
- Review article: http://www.iai.uni-bonn.de/~jz/dimensionality_reduction_a_comparative_review.pdf
- Matlab toolbox for dimension reduction: http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
- Matlab manifold learning demo: http://www.math.ucla.edu/~wittman/mani/