Introduction to Big Data and Machine Learning
Dimensionality Reduction: Continuous Latent Variables
Dr. Mihail
October 8, 2019
Data Dimensionality

Idea
Many datasets have the property that the data points all lie close to a manifold of much lower dimensionality than that of the original data space.
Consider the MNIST digits: each image lives in a 784-dimensional space (28 × 28 pixels), yet the images of any one digit lie close to a much lower-dimensional manifold.
Data Dimensionality

Idea
Goal: "summarize" the ways in which the 3's (the observed variables) vary using only a few continuous variables (latent variables).
Non-probabilistic Principal Component Analysis: express each observation as a projection onto a lower-dimensional subspace.
Principal Component Analysis

Basics
PCA is a technique widely used for dimensionality reduction, lossy data compression, feature extraction, and data visualization.
Also known as the "Karhunen-Loève" transform.
There are two formulations of PCA that give rise to the same algorithm:
1. An orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.
2. A linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections.
Maximum variance formulation

PCA derivation
Consider a dataset of observations $\{x_n\}$, where $n = 1, \dots, N$ and each $x_n$ is a Euclidean variable with dimensionality $D$.
Goal: project the data onto a space with dimensionality $M < D$ while maximizing the variance of the projected data. We shall assume that $M$ is given.
To start, imagine projecting onto a space with $M = 1$. We define the direction of this one-dimensional space by a $D$-dimensional vector $u_1$, chosen to be a unit vector: $u_1^T u_1 = 1$.
Data Dimensionality

PCA derivation
Each data point $x_n$ is projected onto a scalar value $u_1^T x_n$. The mean of the projected data is $u_1^T \bar{x}$, where $\bar{x}$ is the data set mean given by:

$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n \qquad (1)$$

and the variance of the projected data is:

$$\frac{1}{N} \sum_{n=1}^{N} \{u_1^T x_n - u_1^T \bar{x}\}^2 = u_1^T S u_1 \qquad (2)$$

where $S$ is the covariance matrix given by:

$$S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T \qquad (3)$$
Data Dimensionality

PCA derivation
We now maximize the projected variance $u_1^T S u_1$ with respect to $u_1$.
This must be a constrained maximization to prevent the trivial solution $\|u_1\| \to \infty$.
The appropriate constraint is the unit-norm condition $u_1^T u_1 = 1$. To enforce it, we introduce a Lagrange multiplier $\lambda_1$ and solve the unconstrained maximization of:

$$u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1) \qquad (4)$$

Setting the derivative with respect to $u_1$ to zero gives

$$S u_1 = \lambda_1 u_1 \qquad (5)$$

which says that $u_1$ must be an eigenvector of $S$.
Data Dimensionality

PCA derivation
If we left-multiply by $u_1^T$ and make use of $u_1^T u_1 = 1$, the variance is given by:

$$u_1^T S u_1 = \lambda_1 \qquad (6)$$

so the variance is maximized when we set $u_1$ equal to the eigenvector with the largest eigenvalue $\lambda_1$.
This eigenvector is known as the first principal component.
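A minimal NumPy sketch (not part of the slides) illustrating this result: for a toy dataset, the projected variance $u^T S u$ of a random unit direction never exceeds the largest eigenvalue of $S$, which is attained by the corresponding eigenvector. The data and tolerance are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: N points in D dimensions with unequal variance per direction
N, D = 500, 5
X = rng.normal(size=(N, D)) * np.array([3.0, 1.0, 0.5, 0.2, 0.1])

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / N           # covariance matrix, Eq. (3)

eigenvals, eigenvecs = np.linalg.eigh(S)    # eigh: S is symmetric, ascending order
u1 = eigenvecs[:, -1]                       # eigenvector with the largest eigenvalue

# Projected variance u1^T S u1 equals the largest eigenvalue, Eq. (6)
print(u1 @ S @ u1, eigenvals[-1])

# No random unit direction achieves a larger projected variance
for _ in range(1000):
    u = rng.normal(size=D)
    u /= np.linalg.norm(u)
    assert u @ S @ u <= eigenvals[-1] + 1e-9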
Data Dimensionality

Summary
PCA involves computing the mean $\bar{x}$ and the covariance matrix $S$ of a dataset, and then finding the $M$ eigenvectors of $S$ corresponding to the $M$ largest eigenvalues.
Potential concern: finding all the eigenvectors and eigenvalues of a $D \times D$ matrix is $O(D^3)$. If we only need $M \ll D$ eigenvectors, more efficient methods exist that compute only the leading eigenpairs.
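One such method is a Lanczos-type iterative solver, e.g. scipy.sparse.linalg.eigsh, which computes only the k requested eigenpairs. A minimal sketch, not from the slides; the random data and sizes are illustrative assumptions:

import numpy as np
from scipy.sparse.linalg import eigsh   # Lanczos-type iterative eigensolver

rng = np.random.default_rng(0)
N, D, M = 1000, 784, 10

# Illustrative random data; in practice this would be the centered dataset
X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N                        # D x D covariance matrix

# Compute only the M eigenpairs with largest magnitude ('LM'), not all D
eigenvals, eigenvecs = eigsh(S, k=M, which='LM')

# Sort descending (eigsh does not guarantee an ordering)
order = np.argsort(eigenvals)[::-1]
eigenvals, eigenvecs = eigenvals[order], eigenvecs[:, order]

print(eigenvals.shape, eigenvecs.shape)   # (10,) (784, 10)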
Data Dimensionality

Minimum-error formulation of PCA
Let the basis vectors $\{u_i\}$, $i = 1, \dots, D$, form a complete D-dimensional orthonormal set.
Because this basis is complete, each data point can be represented as a linear combination of the basis vectors:

$$x_n = \sum_{i=1}^{D} \alpha_{ni} u_i \qquad (7)$$

where the coefficients $\alpha_{ni}$ differ from data point to data point.
Since the basis is orthonormal, this is simply a rotation: the original $D$ components $\{x_{n1}, \dots, x_{nD}\}$ are replaced by an equivalent set $\{\alpha_{n1}, \dots, \alpha_{nD}\}$.
Taking the inner product with $u_j$ and making use of orthonormality, we obtain $\alpha_{nj} = x_n^T u_j$.
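A small numerical check, not from the slides, of Equations (7)-(8): with any complete orthonormal basis, the coefficients $\alpha_{nj} = x_n^T u_j$ reconstruct $x_n$ exactly. The random basis obtained via QR decomposition is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
D = 4
x_n = rng.normal(size=D)

# A random complete orthonormal basis: the columns u_1, ..., u_D of Q
U, _ = np.linalg.qr(rng.normal(size=(D, D)))

alpha = U.T @ x_n          # coefficients alpha_nj = x_n^T u_j
x_rebuilt = U @ alpha      # sum_i alpha_ni u_i, Eq. (8)
print(np.allclose(x_n, x_rebuilt))   # True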
Data Dimensionality

Minimum-error formulation of PCA
Therefore, we can write each data point as:

$$x_n = \sum_{i=1}^{D} (x_n^T u_i) u_i \qquad (8)$$

Our goal is to reduce the dimensionality to $M < D$, so each point is approximated by:

$$\tilde{x}_n = \sum_{i=1}^{M} z_{ni} u_i + \sum_{i=M+1}^{D} b_i u_i \qquad (9)$$
Data Dimensionality

Minimum-error formulation of PCA

$$\tilde{x}_n = \sum_{i=1}^{M} z_{ni} u_i + \sum_{i=M+1}^{D} b_i u_i$$

where the $\{z_{ni}\}$ depend on the particular data point, and the $\{b_i\}$ are constants shared by all data points.
We are free to choose $\{u_i\}$, $\{z_{ni}\}$, and $\{b_i\}$ so as to minimize the distortion introduced by the reduction in dimensionality:

$$J = \frac{1}{N} \sum_{n=1}^{N} \|x_n - \tilde{x}_n\|^2 \qquad (10)$$
Data Dimensionality

Minimum-error formulation of PCA
Consider first the $\{z_{ni}\}$. Substituting for $\tilde{x}_n$ and setting the derivative of $J$ with respect to $z_{nj}$ to zero, we obtain:

$$z_{nj} = x_n^T u_j \qquad (11)$$

Similarly, setting the derivative of $J$ with respect to $b_j$ to zero, we obtain:

$$b_j = \bar{x}^T u_j \qquad (12)$$

where $j = M+1, \dots, D$. Substituting $z_{ni}$ and $b_i$ into Equation (9), we obtain:

$$x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \{(x_n - \bar{x})^T u_i\} u_i \qquad (13)$$
Data Dimensionality

Minimum-error formulation of PCA
We obtain a formulation of $J$ purely as a function of $\{u_i\}$:

$$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} (x_n^T u_i - \bar{x}^T u_i)^2 = \sum_{i=M+1}^{D} u_i^T S u_i \qquad (14)$$

The constrained minimization of $J$ leads to the eigenvalue problem:

$$S u_i = \lambda_i u_i \qquad (15)$$

where $i = 1, \dots, D$ and the eigenvectors are orthonormal.
The distortion is then minimized by assigning the $D - M$ smallest eigenvalues to the discarded directions, giving $J = \sum_{i=M+1}^{D} \lambda_i$; the retained principal subspace is spanned by the eigenvectors with the $M$ largest eigenvalues, in agreement with the maximum-variance formulation.
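A minimal sketch, not from the slides, that verifies this numerically: projecting onto the $M$ leading eigenvectors and reconstructing gives a distortion $J$ equal to the sum of the discarded eigenvalues. The synthetic correlated data is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(1)
N, D, M = 1000, 6, 2

# Synthetic correlated data
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / N

eigenvals, eigenvecs = np.linalg.eigh(S)   # ascending order
U = eigenvecs[:, ::-1]                     # columns sorted high to low
lam = eigenvals[::-1]

# Project onto the M leading eigenvectors and reconstruct (Eq. 9 with optimal z, b)
Z = (X - xbar) @ U[:, :M]
X_tilde = Z @ U[:, :M].T + xbar

# Distortion J (Eq. 10) matches the sum of the discarded eigenvalues
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
print(J, lam[M:].sum())   # the two values agree up to floating-point error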
Data Dimensionality

PCA algorithm shown on MNIST
Compute the mean $\bar{x}$.
Data Dimensionality

Code to find $\bar{x}$

import scipy.io
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST from a .mat file and keep only the images of the digit 3
mat = scipy.io.loadmat('mnist.mat')
X = mat['trainX'][:, :]
y = mat['trainY'][:, :][0]
threes = X[np.where(y == 3)]

# Mean image of the 3's
xbar = np.mean(threes, axis=0)

plt.subplots(1, 1)
plt.imshow(np.reshape(xbar, (28, 28)))
Data Dimensionality

PCA algorithm
Subtract the mean from all $x_n$:

xzeromean = threes - xbar
Data Dimensionality

Algorithm
Compute the covariance matrix $\frac{1}{N-1} X^T X$ of the zero-mean data and its eigendecomposition:

# Compute covariance matrix
cov_mat = xzeromean.T.dot(xzeromean) / (xzeromean.shape[0] - 1)

# Compute eigenvalue decomposition
eigenvals, eigenvecs = np.linalg.eig(cov_mat)

# Arrange as (eigenvalue, eigenvector) pairs (tuples)
eig_pairs = [(eigenvals[i], eigenvecs[:, i]) for i in range(len(eigenvals))]

# Sort the (eigenvalue, eigenvector) tuples from high to low
eig_pairs.sort(key=lambda x: x[0], reverse=True)
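Since the covariance matrix is symmetric, an alternative not used in the slides is np.linalg.eigh, which guarantees real eigenvalues and returns them in ascending order; it is generally preferred for symmetric matrices. A minimal sketch, assuming the same xzeromean array as above:

import numpy as np

# Assumes xzeromean from the previous slide
cov_mat = xzeromean.T.dot(xzeromean) / (xzeromean.shape[0] - 1)

# eigh exploits symmetry: real eigenvalues, returned in ascending order
eigenvals, eigenvecs = np.linalg.eigh(cov_mat)

# Reverse so the largest eigenvalues (and their eigenvectors) come first
eigenvals = eigenvals[::-1]
eigenvecs = eigenvecs[:, ::-1]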
Data Dimensionality

Project to subspace and reconstruct

fig, ax = plt.subplots(5, 9, figsize=(25, 15))
for digit in range(5):
    onethree = xzeromean[digit, :]
    ax[digit, 0].imshow(np.reshape(onethree + xbar, (28, 28)))
    ax[digit, 0].set_title('Original')
    for (basis_ix, basis) in enumerate([1, 2, 5, 10, 100, 200, 600, 28 * 28]):
        # Stack the leading 'basis' eigenvectors as columns of the subspace
        subspace = np.array([eig_pairs[i][1] for i in range(basis)]).T
        # Project onto the subspace, then reconstruct and add back the mean
        X_pca = np.dot(onethree, subspace)
        X_recon = np.dot(subspace, X_pca) + xbar
        ax[digit, basis_ix + 1].imshow(np.reshape(np.abs(X_recon), (28, 28)))
        ax[digit, basis_ix + 1].set_title(str(basis) + ' components')
        ax[digit, basis_ix + 1].tick_params(labelbottom=False, labelleft=False)
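Not covered in the slides, but a common follow-up: choose $M$ from the cumulative fraction of variance explained by the leading eigenvalues. A minimal sketch, reusing eig_pairs from the earlier slide; the 95% threshold is an arbitrary illustrative choice:

import numpy as np

# Assumes eig_pairs from the earlier slide (sorted high to low);
# take the real part in case np.linalg.eig returned a complex dtype
eigenvals_sorted = np.real(np.array([pair[0] for pair in eig_pairs]))

# Fraction of total variance captured by the first M components
explained = np.cumsum(eigenvals_sorted) / np.sum(eigenvals_sorted)

# Smallest M that retains at least 95% of the variance
M = int(np.searchsorted(explained, 0.95) + 1)
print(M, explained[M - 1])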