T-61.3050 Machine Learning: Basic Principles
Dimensionality Reduction
Kai Puolamäki
Laboratory of Computer and Information Science (CIS), Department of Computer Science and Engineering, Helsinki University of Technology (TKK)
Autumn 2007
Outline
1 Multivariate Methods: Bayes Classifier; Discrete Variables; Multivariate Regression
2 Dimensionality Reduction: Subset Selection; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)
Bayes Classifier
Data are real vectors.
Idea: the vectors are drawn from class-specific multivariate normal distributions.
Full model: the class covariance matrices have $O(Kd^2)$ parameters in total, for $K$ classes and $d$ dimensions.
Figure: graphical model of the classifier, from Figure 5.3 of Alpaydın (2004).
Bayes Classifier: Common covariance matrix
Idea: the means are class-specific; the covariance matrix $\Sigma$ is common to all classes.
$O(d^2)$ parameters in the covariance matrix.
Figure 5.4: Covariances may be arbitrary but shared by both classes. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
Bayes Classifier: Common diagonal covariance matrix
Idea: the means are class-specific; the covariance matrix $\Sigma$ is common and diagonal (Naive Bayes).
$d$ parameters in the covariance matrix.
Discriminant: $g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} (x_j^t - m_{ij})^2 / s_j^2 + \log \hat{P}(C_i)$.
Figure 5.5: All classes have equal, diagonal covariance matrices but variances are not equal. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
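The shared diagonal covariance discriminant can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's own code: the function names, the tiny data set, and the choice to pool the variance estimate over all classes (residuals taken around each class mean) are assumptions made for the example.

```python
import numpy as np

def fit_shared_diagonal(X, y):
    """Estimate class means m_ij, shared per-feature variances s_j^2, and priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Assumption: pooled variance, computed from residuals around each class mean.
    resid = X - means[np.searchsorted(classes, y)]
    var = (resid ** 2).mean(axis=0)
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, var, priors

def discriminants(x, means, var, priors):
    """g_i(x) = -1/2 * sum_j (x_j - m_ij)^2 / s_j^2 + log P(C_i), one value per class."""
    return -0.5 * (((x - means) ** 2) / var).sum(axis=1) + np.log(priors)

# Hypothetical toy data: two classes separated along the first feature.
X = np.array([[0.0, 0.0], [0.2, 1.0], [3.0, 0.1], [3.2, 1.1]])
y = np.array([0, 0, 1, 1])
classes, means, var, priors = fit_shared_diagonal(X, y)
g = discriminants(np.array([0.5, 0.5]), means, var, priors)
predicted = classes[np.argmax(g)]  # class with the largest discriminant
```

Only `d` variances are stored regardless of the number of classes, which is the parameter saving this slide points out.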
Bayes Classifier: Nearest mean classifier
Idea: the means are class-specific; the covariance matrix $\Sigma$ is common and proportional to the unit matrix, $\Sigma = \sigma^2 I$.
1 parameter in the covariance matrix.
Discriminant: $g_i(x) = -\|x - m_i\|^2$.
Nearest mean classifier: each mean is a prototype.
Figure 5.6: All classes have equal, diagonal covariance matrices of equal variances on both dimensions. From: E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press.
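A nearest mean classifier is short enough to sketch in full. The version below is an illustration under stated assumptions: the function names and the four-point training set are hypothetical, and the prior term is dropped, as in the discriminant on this slide, which amounts to treating the class priors as equal.

```python
import numpy as np

def fit_means(X, y):
    """Return the class labels and one prototype (mean vector) per class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(X, classes, means):
    """Assign each row of X to the class whose mean is closest in Euclidean distance."""
    # Squared distances, shape (n_samples, n_classes).
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

# Hypothetical training data: class 1 around (1.5, 1.5), class 0 around (-1.5, -1.5).
X_train = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y_train = np.array([1, 1, 0, 0])
classes, means = fit_means(X_train, y_train)
pred = predict_nearest_mean(np.array([[0.5, 1.0], [-3.0, 0.0]]), classes, means)
```

Maximizing $-\|x - m_i\|^2$ is the same as minimizing the distance to the prototype, which is why the code uses `argmin`.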
Discrete Features
Most straightforward using Naive Bayes (replace Gaussian with Bernoulli).
Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$.
If the $x_j$ are independent (Naive Bayes): $p(x \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{(1 - x_j)}$.
The discriminant is linear: $g_i(x) = \log p(x \mid C_i) + \log P(C_i) = \sum_j [x_j \log p_{ij} + (1 - x_j) \log(1 - p_{ij})] + \log P(C_i)$.
Estimated parameters: $\hat{p}_{ij} = \sum_t x_j^t r_i^t / \sum_t r_i^t$, where $r_i^t = 1$ if $x^t$ belongs to class $C_i$ and 0 otherwise.
From: Lecture Notes for E. Alpaydın. 2004. Introduction to Machine Learning. © The MIT Press (V1.1).
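Replacing the Gaussian with a Bernoulli gives a Naive Bayes classifier for binary features, which might be sketched as below. The function names and the small data set are hypothetical, and clipping the estimated probabilities away from 0 and 1 is an added assumption (to avoid taking the logarithm of zero), not something the slide specifies.

```python
import numpy as np

def fit_bernoulli_nb(X, y, eps=1e-9):
    """Estimate p_ij = P(x_j = 1 | C_i) and the class priors from binary data."""
    classes = np.unique(y)
    # Fraction of class-i samples in which feature j equals 1.
    p_hat = np.array([X[y == c].mean(axis=0) for c in classes])
    p_hat = np.clip(p_hat, eps, 1 - eps)  # assumption: keep log() finite
    priors = np.array([(y == c).mean() for c in classes])
    return classes, p_hat, priors

def discriminants(x, p_hat, priors):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    log_lik = (x * np.log(p_hat) + (1 - x) * np.log(1 - p_hat)).sum(axis=1)
    return log_lik + np.log(priors)

# Hypothetical binary data: feature 0 is always 1 in class 1 and always 0 in class 0.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
classes, p_hat, priors = fit_bernoulli_nb(X, y)
g = discriminants(np.array([1, 0, 1]), p_hat, priors)
predicted = classes[np.argmax(g)]
```

Because each term is a weighted sum of the $x_j$, the discriminant is linear in $x$.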
Multivariate Regression
$r^t = g(x^t \mid w_0, w_1, \ldots, w_d) + \epsilon$
Multivariate linear model: $g(x^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$, with error $E(w_0, w_1, \ldots, w_d \mid X) = \frac{1}{2} \sum_t [r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t]^2$.
Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$ and use the linear model in this new $z$ space (basis functions, kernel trick, SVM: Chapter 10).
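The polynomial model reduces to ordinary least squares once the inputs are mapped to the new variables. The sketch below illustrates this with NumPy; the function names, the random seed, and the noise-free target are assumptions chosen so that the fitted weights can be checked against known values.

```python
import numpy as np

def fit_linear(X, r):
    """Least-squares weights (w_0, ..., w_d) for r ~ w_0 + sum_j w_j x_j."""
    A = np.column_stack([np.ones(len(X)), X])  # column of ones carries w_0
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w

def quadratic_features(X):
    """Map (x_1, x_2) to z = (x_1, x_2, x_1^2, x_2^2, x_1 x_2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
X = rng.normal(size=(50, 2))
# Hypothetical noise-free target with a constant, a linear, and a cross term.
r = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 0] * X[:, 1]
w = fit_linear(quadratic_features(X), r)  # weights for (1, z_1, ..., z_5)
```

The model is nonlinear in $x$ but still linear in the weights, so the same solver handles both cases.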
Why Reduce Dimensionality?
1. Reduces time complexity: less computation.
2. Reduces space complexity: fewer parameters.
3. Saves the cost of observing the feature.
4. Simpler models are more robust on small data sets.
5. More interpretable; simpler explanation.
6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions.
Feature Selection vs. Extraction
Feature selection: choosing $k < d$ important features, ignoring the remaining $d - k$. Subset selection algorithms.
Feature extraction: project the original $x_i$, $i = 1, \ldots, d$ dimensions to new $k < d$ dimensions, $z_j$, $j = 1, \ldots, k$. Principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA).
Subset Selection
There are $2^d$ subsets of $d$ features.
Forward search: add the best feature at each step.
  Set of features $F$ is initially $\emptyset$.
  At each iteration, find the best new feature: $j = \arg\min_i E(F \cup x_i)$.
  Add $x_j$ to $F$ if $E(F \cup x_j) < E(F)$.
Hill-climbing $O(d^2)$ algorithm.
Backward search: start with all features and remove one at a time, if possible.
Floating search: add $k$, remove $l$.
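Forward search can be sketched as a short greedy loop. In the version below the `error` callback, which maps a list of feature indices to an error value $E(F)$, is a hypothetical interface chosen for the example, and `toy_error` is an invented stand-in for a real validation error.

```python
def forward_selection(d, error):
    """Greedy forward search over feature indices 0..d-1.

    `error` maps a list of feature indices to E(F); lower is better.
    """
    selected = []
    best_err = error(selected)
    while len(selected) < d:
        candidates = [j for j in range(d) if j not in selected]
        # Best new feature: j = argmin_i E(F ∪ {x_i}).
        errs = {j: error(selected + [j]) for j in candidates}
        j = min(errs, key=errs.get)
        if errs[j] >= best_err:  # no candidate lowers the error: stop
            break
        selected.append(j)
        best_err = errs[j]
    return selected, best_err

def toy_error(features):
    """Invented error: features 0 and 1 help, every other feature hurts slightly."""
    gain = {0: 0.3, 1: 0.15}
    return 0.5 - sum(gain.get(j, -0.01) for j in features)

selected, best_err = forward_selection(5, toy_error)
```

Each pass evaluates at most `d` candidates and there are at most `d` passes, which gives the $O(d^2)$ evaluations of the hill-climbing search instead of all $2^d$ subsets.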
Subset Selection: Example
Toy data set consists of 100 10-dimensional vectors from two classes (1 and 0).
First two dimensions $x_1^t$ and $x_2^t$: drawn from a Gaussian with unit variance and mean 1 or -1 for classes 1 and 0, respectively.
Remaining eight dimensions: drawn from a Gaussian with zero mean and unit variance; that is, they carry no information about the class.
Optimal classifier: if $x_1 + x_2$ is positive the class is 1, otherwise the class is 0.
Use the nearest mean classifier.
Split the data at random into a training set of 30+30 items and a validation set of 20+20 items.
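The toy experiment can be reproduced roughly as follows. This is a sketch under stated assumptions: the function names and the random seed are arbitrary, and the training and validation sets are generated separately instead of splitting one set of 100 vectors, which is equivalent in distribution because the samples are independent.

```python
import numpy as np

def make_toy_data(n_per_class, rng, d=10):
    """Two informative dimensions (mean +1 for class 1, -1 for class 0), the rest noise."""
    X = rng.normal(size=(2 * n_per_class, d))
    y = np.repeat([1, 0], n_per_class)
    X[:, :2] += np.where(y == 1, 1.0, -1.0)[:, None]
    return X, y

def nearest_mean_error(Xtr, ytr, Xva, yva, features):
    """Validation error of the nearest mean classifier restricted to `features`."""
    f = list(features)
    m1 = Xtr[ytr == 1][:, f].mean(axis=0)
    m0 = Xtr[ytr == 0][:, f].mean(axis=0)
    d1 = ((Xva[:, f] - m1) ** 2).sum(axis=1)
    d0 = ((Xva[:, f] - m0) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred != yva).mean())

rng = np.random.default_rng(0)     # arbitrary seed
Xtr, ytr = make_toy_data(30, rng)  # training set: 30 + 30 items
Xva, yva = make_toy_data(20, rng)  # validation set: 20 + 20 items

err_informative = nearest_mean_error(Xtr, ytr, Xva, yva, [0, 1])
err_all = nearest_mean_error(Xtr, ytr, Xva, yva, range(10))
# Optimal rule from the slide: class 1 if x_1 + x_2 > 0, otherwise class 0.
err_optimal = float(((Xva[:, 0] + Xva[:, 1] > 0).astype(int) != yva).mean())
```

Comparing `err_informative` with `err_all` shows how much the eight noise dimensions cost on a given draw; the exact numbers depend on the seed, so none are quoted here.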