T-61.3050 Machine Learning: Basic Principles: Dimensionality Reduction

T-61.3050 Machine Learning: Basic Principles
Dimensionality Reduction
Kai Puolamäki
Laboratory of Computer and Information Science (CIS), Department of Computer Science and Engineering, Helsinki University of Technology (TKK)
Autumn 2007

Outline
1. Multivariate Methods: Bayes Classifier; Discrete Variables; Multivariate Regression
2. Dimensionality Reduction: Subset Selection; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)

Bayes Classifier
Data are real vectors.
Idea: vectors are drawn from class-specific multivariate normal distributions.
Full model: the class-specific covariance matrices have $O(Kd^2)$ parameters.
[Figure: graphical model with nodes $P(C)$, $C$, $\mu, \Sigma$ and $x$, repeated over $N$ observations. From Figure 5.3 of Alpaydin (2004).]

Bayes Classifier: Common covariance matrix
Idea: the means are class-specific, the covariance matrix $\Sigma$ is common.
$O(d^2)$ parameters in the covariance matrix.
Figure 5.4: Covariances may be arbitrary but shared by both classes. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
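The parameter counts quoted on the Bayes classifier slides can be checked with a few lines of Python. This is a minimal sketch; the function name `covariance_param_counts` is made up for illustration, and it counts only the free entries of the covariance matrices (not the means or priors), assuming each full matrix is symmetric.

```python
def covariance_param_counts(K, d):
    """Free covariance parameters for K classes in d dimensions."""
    per_matrix = d * (d + 1) // 2  # free entries of a symmetric d x d matrix
    return {
        "full": K * per_matrix,  # one matrix per class: O(K d^2)
        "shared": per_matrix,    # one common matrix: O(d^2)
        "diagonal": d,           # common diagonal matrix (Naive Bayes)
        "spherical": 1,          # sigma^2 times the unit matrix
    }


counts = covariance_param_counts(K=3, d=10)
print(counts)
```

With 3 classes in 10 dimensions, sharing the covariance matrix cuts the count by a factor of 3, and the diagonal and spherical restrictions reduce it much further.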

Bayes Classifier: Common diagonal covariance matrix
Idea: the means are class-specific, the covariance matrix $\Sigma$ is common and diagonal (Naive Bayes).
$d$ parameters in the covariance matrix.
Discriminant: $g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} (x_j^t - m_{ij})^2 / s_j^2 + \log \hat{P}(C_i)$.
Figure 5.5: All classes have equal, diagonal covariance matrices but variances are not equal. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
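The discriminant for the common diagonal covariance case can be written out directly. The sketch below assumes the class means, the shared per-dimension variances and the priors have already been estimated; the function names are illustrative only.

```python
import math


def naive_bayes_discriminant(x, mean, var, prior):
    """g_i(x) = -1/2 * sum_j (x_j - m_ij)^2 / s_j^2 + log P(C_i)."""
    quad = sum((xj - mj) ** 2 / sj2 for xj, mj, sj2 in zip(x, mean, var))
    return -0.5 * quad + math.log(prior)


def classify(x, means, var, priors):
    """Return the index of the class with the largest discriminant."""
    scores = [naive_bayes_discriminant(x, m, var, p)
              for m, p in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


means = [[0.0, 0.0], [2.0, 2.0]]  # one mean vector per class (made-up values)
var = [1.0, 4.0]                  # shared variances s_j^2, one per dimension
priors = [0.5, 0.5]
print(classify([1.9, 2.5], means, var, priors))
```

Because the variances differ per dimension, a deviation along the second axis is penalised less than the same deviation along the first.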

Bayes Classifier: Nearest mean classifier
Idea: the means are class-specific, the covariance matrix $\Sigma$ is common and proportional to the unit matrix, $\Sigma = \sigma^2 \mathbf{1}$.
1 parameter in the covariance matrix.
Discriminant: $g_i(x) = -\|x - m_i\|^2$.
Nearest mean classifier. Each mean is a prototype.
Figure 5.6: All classes have equal, diagonal covariance matrices of equal variances on both dimensions. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
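A nearest mean classifier takes only a few lines. The following is a sketch in plain Python under the slide's assumptions (spherical common covariance, and priors left out of the discriminant); the helper names are hypothetical.

```python
def fit_class_means(X, y):
    """Return {label: mean vector} computed from the training data."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_nearest_mean(x, means):
    """Pick the class whose prototype minimises ||x - m_i||^2."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda c: sq_dist(means[c]))


X = [[0, 0], [0, 2], [4, 4], [6, 4]]
y = [0, 0, 1, 1]
means = fit_class_means(X, y)
print(means, predict_nearest_mean([1, 1], means))
```

Maximising $g_i(x) = -\|x - m_i\|^2$ is the same as minimising the squared distance, which is what `min` does here.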

Discrete Features
Most straightforward using Naive Bayes (replace Gaussian with Bernoulli).
Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$.
If the $x_j$ are independent (Naive Bayes'): $p(x \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{(1 - x_j)}$.
The discriminant is linear: $g_i(x) = \log p(x \mid C_i) + \log P(C_i) = \sum_j [x_j \log p_{ij} + (1 - x_j) \log(1 - p_{ij})] + \log P(C_i)$.
Estimated parameters: $\hat{p}_{ij} = \sum_t x_j^t r_i^t / \sum_t r_i^t$.
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
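The Bernoulli variant of Naive Bayes can be sketched as follows. The estimate of each $p_{ij}$ is the fraction of class-$i$ training vectors with feature $j$ equal to 1. Two things are assumptions of this sketch rather than part of the slide: the estimates are clipped away from 0 and 1 (the `eps` parameter) so the logarithms stay finite, and every class is assumed to occur at least once in the training data.

```python
import math


def estimate_bernoulli(X, y, n_classes, eps=1e-9):
    """Return (p, priors) with p[i][j] estimating P(x_j = 1 | C_i)."""
    d = len(X[0])
    counts = [0] * n_classes
    ones = [[0] * d for _ in range(n_classes)]
    for x, c in zip(X, y):
        counts[c] += 1
        for j, v in enumerate(x):
            ones[c][j] += v
    priors = [counts[c] / len(X) for c in range(n_classes)]
    # Clipping is an added safeguard (assumption), not part of the estimator.
    p = [[min(max(ones[c][j] / counts[c], eps), 1 - eps) for j in range(d)]
         for c in range(n_classes)]
    return p, priors


def discriminant(x, p_i, prior_i):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    log_lik = sum(xj * math.log(pj) + (1 - xj) * math.log(1 - pj)
                  for xj, pj in zip(x, p_i))
    return log_lik + math.log(prior_i)


def classify(x, p, priors):
    scores = [discriminant(x, p_i, pr) for p_i, pr in zip(p, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [0, 0, 1, 1]
p, priors = estimate_bernoulli(X, y, n_classes=2)
print(classify([1, 0], p, priors))
```

In this tiny data set the first feature separates the classes perfectly and the second carries no information, so the decision depends on the first feature alone.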

Multivariate Regression
$r^t = g(x^t \mid w_0, w_1, \ldots, w_d) + \epsilon$
Multivariate linear model: $w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$, with error $E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2} \sum_t [r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t]^2$.
Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$ and use the linear model in this new $z$ space (basis functions, kernel trick, SVM: Chapter 10).
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
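Both models on this slide reduce to ordinary least squares once the inputs are mapped to the right variables. The sketch below minimises the squared error by solving the normal equations with a small hand-written Gaussian elimination, so that it runs without third-party libraries; in practice a numerical library would be used instead. All function names are made up for this illustration, and the data are assumed to give a non-singular system.

```python
def quadratic_features(x1, x2):
    """Map (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2)."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * w[c] for c in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_linear(Z, r):
    """Least-squares weights [w0, w1, ..., wd] from the normal equations."""
    D = [[1.0] + list(z) for z in Z]  # prepend the constant input for w0
    n = len(D[0])
    A = [[sum(row[i] * row[j] for row in D) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(D, r)) for i in range(n)]
    return solve_linear_system(A, b)


# Linear model: targets generated (noise-free) from r = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
r = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
print(fit_linear(X, r))

# Polynomial model: the same fit, applied in the z space.
grid = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
Z = [quadratic_features(x1, x2) for x1, x2 in grid]
r2 = [2 + x1 * x2 - x1 * x1 for x1, x2 in grid]
print(fit_linear(Z, r2))
```

The polynomial fit reuses `fit_linear` unchanged, which is the point of the slide: the model stays linear in the weights, only the inputs change.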

Outline, part 2: Dimensionality Reduction
Subset Selection; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)

Why Reduce Dimensionality?
1. Reduces time complexity: less computation.
2. Reduces space complexity: fewer parameters.
3. Saves the cost of observing the feature.
4. Simpler models are more robust on small datasets.
5. More interpretable; simpler explanation.
6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions.
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Feature Selection vs. Extraction
Feature selection: choosing $k < d$ important features, ignoring the remaining $d - k$. Subset selection algorithms.
Feature extraction: project the original $x_i$, $i = 1, \ldots, d$ dimensions to new $k < d$ dimensions, $z_j$, $j = 1, \ldots, k$. Principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA).
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Subset Selection
There are $2^d$ subsets of $d$ features.
Forward search: add the best feature at each step.
Set of features $F$ initially ∅.
At each iteration, find the best new feature $j = \arg\min_i E(F \cup x_i)$.
Add $x_j$ to $F$ if $E(F \cup x_j) < E(F)$.
Hill-climbing $O(d^2)$ algorithm.
Backward search: start with all features and remove one at a time, if possible.
Floating search (add $k$, remove $l$).
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
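Forward search can be sketched as a greedy loop around an error function. In this sketch `error` is a hypothetical callable that returns the validation error $E(F)$ for a list of feature indices, and it is assumed to be defined for the empty set as well (for example as the error of a baseline that ignores all features). The toy error function at the bottom is invented purely to make the example runnable.

```python
def forward_selection(n_features, error):
    """Greedy forward search; returns (selected feature indices, final error)."""
    selected = []
    current = error(selected)  # E(F) for the empty set (assumed defined)
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        # j = argmin_i E(F u x_i)
        best = min(candidates, key=lambda j: error(selected + [j]))
        best_err = error(selected + [best])
        if best_err >= current:  # add x_j only if E(F u x_j) < E(F)
            break
        selected.append(best)
        current = best_err
    return selected, current


def toy_error(F):
    """Invented error: features 0 and 1 help, every feature has a small cost."""
    return 1.0 - 0.4 * (0 in F) - 0.3 * (1 in F) + 0.01 * len(F)


print(forward_selection(4, toy_error))
```

Each pass evaluates at most $d$ candidate subsets and there are at most $d$ passes, which gives the $O(d^2)$ number of error evaluations mentioned on the slide. Being a hill-climbing method, it can miss features that are only useful in combination.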

Subset Selection: Example
Toy data set consists of 100 10-dimensional vectors from two classes (1 and 0).
First two dimensions $x_1^t$ and $x_2^t$: drawn from a Gaussian with unit variance and mean of 1 or -1 for the classes 1 and 0, respectively.
Remaining eight dimensions: drawn from a Gaussian with zero mean and unit variance, that is, they contain no information about the class.
Optimal classifier: if $x_1 + x_2$ is positive the class is 1, otherwise the class is 0.
Use the nearest mean classifier.
Split the data at random into a training set of 30+30 items and a validation set of 20+20 items.
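The toy experiment can be approximated with the standard library alone. The sketch below is not the lecture's own code: it draws the training set (30+30) and the validation set (20+20) as two independent samples, which has the same distribution as a random split of 100 vectors, and the seed and function names are arbitrary choices. It then compares the validation error of a nearest mean classifier on the two informative features with the error on all ten.

```python
import random


def make_toy_data(n_per_class, rng, d=10):
    """Features 0 and 1 have mean +1 (class 1) or -1 (class 0); the rest are noise."""
    X, y = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(mean, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(d - 2)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def class_mean(X, y, label, features):
    rows = [x for x, c in zip(X, y) if c == label]
    return [sum(row[j] for row in rows) / len(rows) for j in features]


def validation_error(train, val, features):
    """Error rate of the nearest mean classifier restricted to `features`."""
    (X_tr, y_tr), (X_va, y_va) = train, val
    means = {c: class_mean(X_tr, y_tr, c, features) for c in (0, 1)}
    errors = 0
    for x, c in zip(X_va, y_va):
        dist = {k: sum((x[j] - m) ** 2 for j, m in zip(features, means[k]))
                for k in (0, 1)}
        errors += min(dist, key=dist.get) != c
    return errors / len(y_va)


rng = random.Random(0)            # arbitrary seed
train = make_toy_data(30, rng)    # 30 + 30 training items
val = make_toy_data(20, rng)      # 20 + 20 validation items
print("features 0 and 1:", validation_error(train, val, [0, 1]))
print("all 10 features: ", validation_error(train, val, list(range(10))))
```

The printed error rates depend on the random draw, so no particular values are claimed here. Passing `validation_error` with a fixed `train` and `val` as the error function of a forward search would reproduce the selection experiment itself.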
