Big Picture
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 2nd, 2005
What you have learned thus far
• Learning is function approximation
• Point estimation
• Regression
• Naïve Bayes
• Logistic regression
• Bias-Variance tradeoff
• Neural nets
• Decision trees
• Cross validation
• Boosting
• Instance-based learning
• SVMs
• Kernel trick
• PAC learning
• VC dimension
• Margin bounds
• Mistake bounds
Review material in terms of…
• Types of learning problems
• Hypothesis spaces
• Loss functions
• Optimization algorithms
Text Classification
Company home page vs Personal home page vs University home page vs …
Function fitting
[Figure: floor plan with numbered locations and room labels; temperature data]
Monitoring a complex system
• Reverse water gas shift system (RWGS)
• Learn model of system from data
• Use model to predict behavior and detect faults
Types of learning problems
• Classification
• Regression
• Density estimation
Input – Features
Output?
[Figure: plot]
The learning problem
[Diagram labels: Data <x_1,…,x_n,y>; Learning task; Features/Function approximator; Loss function; Optimization algorithm; Learned function]
Comparing learning algorithms
• Hypothesis space
• Loss function
• Optimization algorithm
Naïve Bayes versus Logistic regression
[Figure panels: Logistic regression; Naïve Bayes]
Naïve Bayes versus Logistic regression – Classification as density estimation
• Choose class with highest probability
• In addition to class, we get certainty measure
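The two bullets above can be sketched in a few lines. This is a minimal illustration, not a method taken from the slides: the function name `classify` and the example probabilities are hypothetical, and the class probabilities are assumed to come from an already trained model such as Naïve Bayes or logistic regression.

```python
def classify(class_probs):
    """Return (label, probability) for the most probable class.

    class_probs: dict mapping a class label to P(class | x).
    The probabilities are assumed to come from a trained model.
    """
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


# Hypothetical posterior for one web page (illustrative numbers only).
probs = {"company": 0.7, "personal": 0.2, "university": 0.1}
label, certainty = classify(probs)
print(label, certainty)
```

The second return value is the certainty measure: the same predicted class can come with a probability of 0.51 or 0.99.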
Logistic regression versus Boosting
Classifier            Loss
Logistic regression   Log-loss
Boosting              Exponential loss
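To make the comparison concrete, here is a small sketch of the two losses as functions of the margin. It assumes labels y in {-1, +1} and a margin m = y · f(x); these conventions and the function names are assumptions for illustration, since the slide only names the losses.

```python
import math


def log_loss(margin):
    """Log-loss ln(1 + exp(-m)) for margin m = y * f(x), with y in {-1, +1}."""
    return math.log1p(math.exp(-margin))


def exp_loss(margin):
    """Exponential loss exp(-m) for the same margin convention."""
    return math.exp(-margin)


# Compare the two losses on a badly wrong, a borderline, and a confident point.
for m in (-2.0, 0.0, 2.0):
    print(f"margin={m:+.1f}  log-loss={log_loss(m):.3f}  exp-loss={exp_loss(m):.3f}")
```

Both losses decrease as the margin grows, but for badly misclassified points (large negative margin) the exponential loss grows exponentially while the log-loss grows roughly linearly.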
Linear classifiers – Logistic regression versus SVMs
Decision boundary: w · x + b = 0
What’s the difference between SVMs and Logistic Regression? (Revisited again)

                                         SVMs         Logistic Regression
Loss function                            Hinge loss   Log-loss
High dimensional features with kernels   Yes!         Yes!
Solution sparse                          Often yes!   Almost always no!
Type of learning
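One way to see the link between the loss and sparsity is to evaluate both losses at a few margins. This is a sketch under assumed conventions (labels y in {-1, +1}, margin m = y · (w · x + b)); the function names are illustrative and not from the slides.

```python
import math


def hinge_loss(margin):
    """Hinge loss max(0, 1 - m); exactly zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def log_loss(margin):
    """Log-loss ln(1 + exp(-m)); strictly positive for every finite margin."""
    return math.log1p(math.exp(-margin))


for m in (-1.0, 0.0, 1.0, 3.0):
    print(f"margin={m:+.1f}  hinge={hinge_loss(m):.3f}  log-loss={log_loss(m):.3f}")
```

Points with margin at least 1 contribute nothing to the hinge loss, so only the remaining points (the support vectors) shape the SVM solution. Under the log-loss every point contributes a little, which is consistent with the "almost always no" entry for sparsity.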
SVMs and instance-based learning
Data: <x_1,…,x_n,y>
SVMs: Classify as
Instance-based learning: Classify as
Instance-based learning versus Decision trees
[Figure panels: Decision trees; 1-Nearest neighbor]
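For reference, 1-nearest-neighbor classification can be sketched as below. The Euclidean distance and the toy data are assumptions made for this example; the slide does not fix a distance measure.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`.

    train: list of (features, label) pairs, features being equal-length tuples.
    Uses Euclidean distance, which is an assumption of this sketch.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Hypothetical toy data: two labelled points in the plane.
train = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
print(nearest_neighbor_predict(train, (1.0, 1.0)))
```

There is no training step beyond storing the data; all the work happens at query time, which is the main contrast with a decision tree, where the partition of the input space is built in advance.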
Logistic regression versus Neural nets
[Figure panels: Neural Nets; Logistic regression]
Linear regression versus Kernel regression
[Figure panels: Linear regression; Kernel regression; Kernel-weighted linear regression]
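As a point of comparison with a global linear fit, here is a minimal sketch of kernel regression in its Nadaraya-Watson form. The Gaussian kernel, the `bandwidth` parameter, and the one-dimensional inputs are assumptions chosen for illustration, not details given on the slide.

```python
import math


def kernel_regression(xs, ys, x_query, bandwidth=1.0):
    """Predict y at x_query as a kernel-weighted average of the training ys.

    Uses a Gaussian kernel (an assumption of this sketch). If x_query is
    extremely far from all xs, the weights can underflow to zero.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical 1-D data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
print(kernel_regression(xs, ys, 1.5, bandwidth=0.5))
```

Unlike linear regression, no global weight vector is learned: each prediction is a local average, and the bandwidth controls how local it is.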
Kernel-weighted linear regression
Local basis functions for each region
Kernels average between regions
[Figure: floor plan with numbered locations and room labels]
SVM regression
w · x + b + ε
w · x + b
w · x + b − ε
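The three lines w · x + b − ε, w · x + b, and w · x + b + ε bound a tube around the regression function. A common way to express this is the ε-insensitive loss, sketched below; the function name and the example values are hypothetical.

```python
def eps_insensitive_loss(w, b, x, y, eps):
    """Loss max(0, |y - (w . x + b)| - eps).

    Zero whenever the target y lies inside the tube of half-width eps
    around the prediction w . x + b; linear in the excess otherwise.
    """
    prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, abs(y - prediction) - eps)


# Hypothetical 1-D model with prediction 2.0 at x = 2.0.
w, b, eps = [1.0], 0.0, 0.5
print(eps_insensitive_loss(w, b, [2.0], 2.25, eps))  # inside the tube
print(eps_insensitive_loss(w, b, [2.0], 3.0, eps))   # outside the tube
```

Points inside the tube incur no loss, so, as with the hinge loss in classification, only points on or outside the tube influence the fit.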
BIG PICTURE (a few points of comparison)

Legend, learning task: DE = density estimation; Cl = Classification; Reg = Regression
Legend, loss function: LL = Log-loss/MLE; Mrg = Margin-based; RMS = Squared error

Boosting                  Cl, exp-loss
Naïve Bayes               DE, LL
Logistic regression       DE, LL
SVMs                      Cl, Mrg
SVM regression            Reg, Mrg
Kernel regression         Reg, RMS
Linear regression         Reg, RMS
Instance-based learning   DE, Cl, Reg
Neural nets               DE, Cl, Reg, RMS
Decision trees            DE, Cl, Reg

This is a very incomplete view!!!