  1. Machine Learning: Dimensionality reduction. Hamid Beigy, Sharif University of Technology, Fall 1396.

  2. Table of contents
     1. Introduction
     2. High-dimensional space
     3. Dimensionality reduction methods
     4. Feature selection methods
     5. Feature extraction
     6. Feature extraction methods:
        Principal component analysis
        Kernel principal component analysis
        Factor analysis
        Multidimensional scaling
        Locally linear embedding
        Isomap
        Linear discriminant analysis

  4. Introduction. The complexity of any classifier or regressor depends on the number of input variables or features. These complexities include:
     - Time complexity: in most learning algorithms, the time complexity depends on the number of input dimensions ($D$) as well as on the size of the training set ($N$). Decreasing $D$ decreases the time complexity of both the training and testing phases.
     - Space complexity: decreasing $D$ also decreases the amount of memory needed for the training and testing phases.
     - Sample complexity: usually the number of training examples ($N$) is a function of the length of the feature vectors ($D$), so decreasing the number of features also decreases the number of training examples needed. As a rule of thumb, the number of training patterns should be 10 to 20 times the number of features.

  5. Introduction (cont.) There are several reasons why we are interested in reducing dimensionality as a separate preprocessing step:
     - Decreasing the time complexity of classifiers or regressors.
     - Decreasing the cost of extracting/producing unnecessary features.
     - Simpler models are more robust on small data sets: they have lower variance and are therefore less dependent on noise and outliers.
     - The description of the classifier or regressor becomes simpler and shorter.
     - Visualization of the data becomes easier.

  6. Peaking phenomenon. In practice, for a finite $N$, increasing the number of features at first improves performance, but past a critical value, adding further features increases the probability of error. This is known as the peaking phenomenon. If the number of samples increases ($N_2 \gg N_1$), the peaking phenomenon occurs at a larger number of features ($l_2 > l_1$).
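To make the phenomenon concrete, here is a minimal simulation sketch (not from the slides; the data model, sample sizes, and classifier choice are illustrative assumptions). Class-discriminative signal is concentrated in the first few features and decays from there, so with a small fixed training set, adding features eventually hurts test accuracy.

```python
# Hypothetical peaking-phenomenon simulation: feature j carries class
# separation that decays as 1/(j+1), so early features help but later
# ones mostly add estimation noise for a fixed, small training set.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N = 40  # small, fixed number of training examples

for D in [1, 2, 5, 10, 20, 40]:
    mean = 1.0 / (np.arange(D) + 1)  # decaying per-feature class separation
    X = np.vstack([rng.normal(0.0, 1.0, size=(500, D)),
                   rng.normal(mean, 1.0, size=(500, D))])
    y = np.repeat([0, 1], 500)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=N, stratify=y, random_state=0)
    acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
    print(f"D = {D:2d}  test error = {1 - acc:.3f}")
```

In runs of this sketch the test error tends to fall for the first few added features and then climb again; raising $N$ should push the minimum toward larger $D$, in line with the $N_2 \gg N_1$, $l_2 > l_1$ statement above.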

  8. High-dimensional space. In most applications of data mining/machine learning, the data is typically very high-dimensional (the number of features can easily be in the hundreds or thousands). Understanding the nature of high-dimensional space (hyperspace) is very important, because hyperspace does not behave like the more familiar geometry in two or three dimensions. Consider the $N \times D$ data matrix
     $$S = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1D} \\ x_{21} & x_{22} & \dots & x_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \dots & x_{ND} \end{pmatrix}$$
     Let the minimum and maximum values for each feature $x_j$ be given as $\min(x_j) = \min_i \{x_{ij}\}$ and $\max(x_j) = \max_i \{x_{ij}\}$. The data hyperspace can be considered as a $D$-dimensional hyper-rectangle, defined as
     $$R_D = \prod_{j=1}^{D} [\min(x_j), \max(x_j)].$$
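As a small illustrative sketch (the toy data and variable names are my own, not from the slides), the hyper-rectangle $R_D$ can be computed directly from a data matrix with NumPy:

```python
# Compute the data hyper-rectangle R_D = prod_j [min(x_j), max(x_j)]
# from an N x D data matrix (toy random data for illustration).
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(100, 4))      # N = 100 points, D = 4 features

lo = S.min(axis=0)                 # min(x_j) for each feature j
hi = S.max(axis=0)                 # max(x_j) for each feature j
R_D = list(zip(lo, hi))            # one [min, max] interval per feature
print(R_D)
```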

  9. High-dimensional space (cont.)
     Hypercube: assume the data is centered to have mean $\mu = 0$. Let $m$ denote the largest absolute value in $S$:
     $$m = \max_{j=1}^{D} \max_{i=1}^{N} \{|x_{ij}|\}.$$
     The data hyperspace can be represented as a hypercube $H_D(l)$, centered at 0, with all sides of length $l = 2m$:
     $$H_D(l) = \left\{ x = (x_1, \dots, x_D)^T \;\middle|\; \forall i \; x_i \in \left[-\tfrac{l}{2}, \tfrac{l}{2}\right] \right\}.$$
     Hypersphere: assume the data is centered to have mean $\mu = 0$. Let $r$ denote the largest magnitude among all points in $S$:
     $$r = \max_i \{\|x_i\|\}.$$
     The data hyperspace can also be represented as a $D$-dimensional hyperball centered at 0 with radius $r$:
     $$B_D(r) = \{x \mid \|x\| \le r\}.$$
     The surface of the hyperball is called a hypersphere, and it consists of all points exactly at distance $r$ from the center of the hyperball:
     $$S_D(r) = \{x \mid \|x\| = r\}.$$
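Continuing the same kind of sketch (toy data again; an assumption, not slide code), the enclosing hypercube side length $l = 2m$ and hyperball radius $r$ follow directly from the centered data:

```python
# Enclosing hypercube and hyperball of centered data.
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(100, 4))

Xc = S - S.mean(axis=0)                  # center the data so that mu = 0
m = np.abs(Xc).max()                     # largest absolute coordinate
l = 2 * m                                # side length of hypercube H_D(l)
r = np.linalg.norm(Xc, axis=1).max()     # radius of hyperball B_D(r)
print(f"l = {l:.3f}, r = {r:.3f}")
```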

  10. High-dimensional space (cont.) Consider two features of the Iris data set. [Figure: the centered data plotted as $X_1$ (sepal length) versus $X_2$ (sepal width), with the enclosing circle of radius $r$.]

  11. High-dimensional volumes. The volume of a hypercube with edge length $l$ equals
     $$vol(H_D(l)) = l^D.$$
     The volume of a hyperball and its corresponding hypersphere equals
     $$vol(S_D(r)) = \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2} + 1\right)} r^D,$$
     where the gamma function for $\alpha > 0$ is defined as
     $$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx.$$
     The surface area of the hypersphere can be obtained by differentiating its volume with respect to $r$:
     $$area(S_D(r)) = \frac{d}{dr} vol(S_D(r)) = \frac{2\pi^{D/2}}{\Gamma\left(\frac{D}{2}\right)} r^{D-1}.$$
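These two formulas can be checked numerically; here is a small sketch using scipy.special.gamma (the helper names are my own, not from the slides):

```python
# Hyperball volume and hypersphere surface area in D dimensions.
import numpy as np
from scipy.special import gamma

def ball_volume(D, r=1.0):
    """vol(S_D(r)) = pi^(D/2) / Gamma(D/2 + 1) * r^D"""
    return np.pi ** (D / 2) / gamma(D / 2 + 1) * r ** D

def sphere_area(D, r=1.0):
    """area(S_D(r)) = 2 pi^(D/2) / Gamma(D/2) * r^(D-1)"""
    return 2 * np.pi ** (D / 2) / gamma(D / 2) * r ** (D - 1)

print(ball_volume(2), np.pi)          # unit disk area:     pi
print(ball_volume(3), 4 / 3 * np.pi)  # unit ball volume:   4/3 pi
print(sphere_area(2), 2 * np.pi)      # unit circle length: 2 pi
```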

  12. Asymptotic Volume. An interesting observation about the hypersphere volume is that as dimensionality increases, the volume first increases up to a point and then starts to decrease, ultimately vanishing. For the unit hypersphere ($r = 1$),
     $$\lim_{D \to \infty} vol(S_D(r)) = \lim_{D \to \infty} \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2} + 1\right)} r^D \to 0.$$
     [Figure: $vol(S_d(1))$ plotted for $d = 1, \dots, 50$; the volume peaks near $d = 5$ and then decays toward zero.]
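Evaluating the same formula across dimensions reproduces the rise-and-fall behavior in the figure (a quick self-contained check):

```python
# Unit-hyperball volume vs. dimension: rises to a peak, then vanishes.
import numpy as np
from scipy.special import gamma

vols = [np.pi ** (D / 2) / gamma(D / 2 + 1) for D in range(1, 51)]
print(1 + int(np.argmax(vols)))   # dimension of maximum volume: 5
print(vols[4])                    # vol(S_5(1))  ~ 5.264
print(vols[49])                   # vol(S_50(1)) ~ 1.7e-13
```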

  13. Hypersphere Inscribed Within Hypercube. Consider the space enclosed within the largest hypersphere that can be accommodated within a hypercube: a hypersphere of radius $r$ inscribed in a hypercube with sides of length $2r$. The ratio of the volume of the hypersphere of radius $r$ to the volume of the hypercube with side length $l = 2r$ equals
     $$\frac{vol(S_2(r))}{vol(H_2(2r))} = \frac{\pi r^2}{4 r^2} = \frac{\pi}{4} \approx 0.785,$$
     $$\frac{vol(S_3(r))}{vol(H_3(2r))} = \frac{\frac{4}{3}\pi r^3}{8 r^3} = \frac{\pi}{6} \approx 0.524,$$
     $$\lim_{D \to \infty} \frac{vol(S_D(r))}{vol(H_D(2r))} = \lim_{D \to \infty} \frac{\pi^{D/2}}{2^D \, \Gamma\left(\frac{D}{2} + 1\right)} \to 0.$$
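A numeric check of these ratios (an illustrative sketch, not slide code):

```python
# Fraction of the hypercube [-r, r]^D occupied by the inscribed hyperball.
import numpy as np
from scipy.special import gamma

for D in [2, 3, 5, 10, 20]:
    ratio = np.pi ** (D / 2) / (2 ** D * gamma(D / 2 + 1))
    print(f"D = {D:2d}  ratio = {ratio:.6f}")
# D = 2 gives pi/4 ~ 0.785398, D = 3 gives pi/6 ~ 0.523599,
# and the ratio collapses toward 0 as D grows.
```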

  14. Hypersphere Inscribed Within Hypercube (cont.) [Figure: a hypersphere inscribed inside a hypercube for two and three dimensions, and a conceptual view of high-dimensional space for two, three, four, and higher dimensions, panels (a)-(d).] In $d$ dimensions there are $2^d$ corners and $2^{d-1}$ diagonals, so as $d$ grows, most of the hypercube's volume lies in its corners, outside the inscribed hypersphere.

  15. Volume of Thin Hypersphere Shell. Consider the volume of a thin hypersphere shell of width $\epsilon$ bounded by an outer hypersphere of radius $r$ and an inner hypersphere of radius $r - \epsilon$. The volume of the thin shell equals the difference between the volumes of the two bounding hyperspheres. Let $S_D(r, \epsilon)$ denote the thin hypersphere shell of width $\epsilon$. Its volume equals
     $$vol(S_D(r, \epsilon)) = vol(S_D(r)) - vol(S_D(r - \epsilon)) = K_D r^D - K_D (r - \epsilon)^D, \qquad K_D = \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2} + 1\right)}.$$

  16. Volume of Thin Hypersphere Shell (cont.) The ratio of the volume of the thin shell to the volume of the outer sphere equals
     $$\frac{vol(S_D(r, \epsilon))}{vol(S_D(r))} = \frac{K_D r^D - K_D (r - \epsilon)^D}{K_D r^D} = 1 - \left(1 - \frac{\epsilon}{r}\right)^D.$$
     For $r = 1$ and $\epsilon = 0.01$:
     $$\frac{vol(S_2(1, 0.01))}{vol(S_2(1))} = 1 - \left(1 - \frac{0.01}{1}\right)^2 \approx 0.02,$$
     and similarly $\approx 0.03$ for $D = 3$, $\approx 0.04$ for $D = 4$, and $\approx 0.05$ for $D = 5$. As $D$ increases, in the limit we obtain
     $$\lim_{D \to \infty} \frac{vol(S_D(r, \epsilon))}{vol(S_D(r))} = \lim_{D \to \infty} \left[ 1 - \left(1 - \frac{\epsilon}{r}\right)^D \right] \to 1.$$
     Almost all of the volume of the hypersphere is contained in the thin shell as $D \to \infty$.
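The shell fraction $1 - (1 - \epsilon/r)^D$ is trivial to tabulate; this sketch reproduces the slide's numbers and extends them to large $D$:

```python
# Fraction of a unit hyperball's volume lying in a thin outer shell
# of width eps = 0.01: 1 - (1 - eps)^D.
eps = 0.01
for D in [2, 3, 4, 5, 100, 1000]:
    frac = 1 - (1 - eps) ** D
    print(f"D = {D:4d}  shell fraction = {frac:.4f}")
# D = 2..5 give ~0.02, 0.03, 0.04, 0.05; by D = 1000 essentially all
# of the volume sits in the shell.
```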
