Learning From Data Lecture 8: Linear Classification and Regression


  1. Learning From Data, Lecture 8: Linear Classification and Regression. Topics: linear classification; linear regression. M. Magdon-Ismail, CSCI 4100/6100.

  2. Recap: Approximation Versus Generalization.
     VC Analysis: E_out ≤ E_in + Ω(d_vc).
       1. Did you fit your data well enough (E_in)?
       2. Are you confident your E_in will generalize to E_out?
       [Figure: in-sample and out-of-sample error versus model complexity (VC dimension d_vc), with the optimal d*_vc where out-of-sample error is smallest.]
     Bias-Variance Analysis: E_out = bias + var.
       1. How well can you fit your data (bias)?
       2. How close to that best fit can you get (var)?
       [Figure: fitting sin(x) with H0 (constants) and H1 (lines), showing g(x) and the average hypothesis ḡ(x). H0: bias = 0.50, var = 0.25, E_out = 0.75. H1: bias = 0.21, var = 1.69, E_out = 1.90.]
     The VC Insurance Co.: the VC warranty had conditions for becoming void: you can't look at your data before choosing H; the data must be generated i.i.d. from P(x); the data and the test case come from the same P(x) (same bin).
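Not on the slide: a rough Monte Carlo sketch of where the quoted bias/variance numbers come from, assuming the textbook setup behind this example: f(x) = sin(pi x) on [-1, 1], data sets of N = 2 points, H0 the constant hypotheses and H1 the lines. The variable names and trial count are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, grid = 10_000, np.linspace(-1, 1, 201)
f = np.sin(np.pi * grid)                       # assumed target f(x) = sin(pi x)

g0 = np.empty((trials, grid.size))             # fitted constants (H0)
g1 = np.empty((trials, grid.size))             # fitted lines (H1)
for t in range(trials):
    x = rng.uniform(-1, 1, size=2)             # a 2-point data set
    y = np.sin(np.pi * x)
    g0[t] = y.mean()                           # best constant: the midpoint
    a = (y[1] - y[0]) / (x[1] - x[0])          # line through the two points
    g1[t] = a * (grid - x[0]) + y[0]

for name, g in [("H0", g0), ("H1", g1)]:
    g_bar = g.mean(axis=0)                     # average hypothesis g-bar(x)
    bias = np.mean((g_bar - f) ** 2)           # E_x[(g_bar(x) - f(x))^2]
    var = np.mean(g.var(axis=0))               # E_x[E_D[(g(x) - g_bar(x))^2]]
    print(name, round(bias, 2), round(var, 2)) # roughly 0.50/0.25 and 0.21/1.69
```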

  3. Recap: Decomposing the Learning Curve.
     [Figure: expected error versus number of data points N, shown two ways.]
     VC Analysis: E_out splits into E_in plus the generalization error. Pick H that can generalize and has a good chance to fit the data.
     Bias-Variance Analysis: E_out splits into bias plus variance. Pick (H, A) to approximate f and not behave wildly after seeing the data.

  4. Three Learning Problems (credit analysis).
     Classification: approve or deny credit, y = ±1.
     Regression: credit amount (dollar figure), y ∈ R.
     Logistic regression: probability of default, y ∈ [0, 1].
     Linear models are perhaps the fundamental model; the linear model is the first model to try.

  5. The Linear Signal: s = w^T x.
     Linear in x: gives the line/hyperplane separator.
     Linear in w: makes the algorithms work.
     x is the augmented vector: x ∈ {1} × R^d.

  6. The Linear Signal. The same signal s = w^T x is used three ways, with y = θ(s) for an output transform θ:
     Classification: sign(w^T x) ∈ {−1, +1}.
     Regression: w^T x ∈ R.
     Logistic regression: θ(w^T x) ∈ [0, 1].
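A minimal sketch, not from the slides, of the three uses of the same signal; it assumes NumPy and takes θ to be the logistic function used for logistic regression later in the course.

```python
import numpy as np

def augment(X):
    """Prepend the constant coordinate x0 = 1 to each input (x in {1} x R^d)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def signal(w, X):
    """Linear signal s = w^T x for every row of the augmented data matrix."""
    return X @ w

def classify(w, X):      # linear classification: outputs in {-1, +1}
    return np.sign(signal(w, X))

def predict(w, X):       # linear regression: outputs in R
    return signal(w, X)

def probability(w, X):   # logistic regression: theta = logistic function, outputs in [0, 1]
    return 1.0 / (1.0 + np.exp(-signal(w, X)))
```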

  7. Linear Classification.
     H_lin = { h(x) = sign(w^T x) }.
     1. E_in ≈ E_out because d_vc = d + 1:
        E_out(h) ≤ E_in(h) + O( sqrt( (d/N) log N ) ).
     2. If the data is linearly separable, PLA will find a separator, so E_in = 0. The PLA update is
        w(t + 1) = w(t) + y* x*,
        where (x*, y*) is a misclassified data point.
     E_in = 0 implies E_out ≈ 0 (f is well approximated by a linear fit).
     What if the data is not separable (E_in = 0 is not possible)? The pocket algorithm.
     How to ensure E_in ≈ 0 is possible? Select good features.
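A minimal PLA sketch in NumPy; the function name and the iteration cap are my own, not from the slides. It assumes X is the augmented N × (d + 1) data matrix and y contains labels ±1.

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm; on linearly separable data it stops with E_in = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:      # a separator was found
            break
        i = misclassified[0]             # pick any misclassified point (x*, y*)
        w = w + y[i] * X[i]              # PLA update: w(t+1) = w(t) + y* x*
    return w
```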

  8. Non-Separable Data. [Figure: a data set that no linear separator classifies perfectly.]

  9. The Pocket Algorithm.
     Minimizing E_in is a hard combinatorial problem.
     The Pocket Algorithm: run PLA, and at each step keep the best E_in (and the corresponding w) seen so far.
     (It's not rocket science, but it works.)
     (Other approaches: linear regression, logistic regression, linear programming, ...)
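A sketch of the pocket idea, with the same assumptions and hypothetical names as the PLA sketch above: take PLA steps, but only update the pocketed weights when the in-sample error improves.

```python
import numpy as np

def pocket(X, y, max_iters=1000):
    """Pocket algorithm: PLA updates, keeping the weights with the best E_in so far."""
    def e_in(w):
        return np.mean(np.sign(X @ w) != y)

    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), e_in(w)
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            return w                          # perfectly separated
        i = np.random.choice(misclassified)   # PLA step on a misclassified point
        w = w + y[i] * X[i]
        err = e_in(w)
        if err < best_err:                    # ...but only pocket an improvement
            best_w, best_err = w.copy(), err
    return best_w
```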

  10. Digits Data. Each digit is a 16 × 16 image.

  11. Digits Data. Each digit is a 16 × 16 image, i.e. 256 grayscale pixel values in [−1, 1]. [Figure: one digit shown both as an image and as its raw 16 × 16 grid of intensities.]
      Using the raw pixels as input: x = (1, x_1, ..., x_256), so d_vc = 257, and the linear model has weights w = (w_0, w_1, ..., w_256).

  12. Intensity and Symmetry Features.
      Feature: an important property of the input that you think is useful for classification. (dictionary.com: a prominent or conspicuous part or characteristic.)
      With the two features, intensity and symmetry: x = (1, x_1, x_2), so d_vc = 3, and the linear model has weights w = (w_0, w_1, w_2).
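The slide does not give formulas for the two features; a common choice, assumed here, is average pixel intensity and (negative) asymmetry under a left-right flip. The function names are my own.

```python
import numpy as np

def intensity(img):
    """Average pixel value of a 16 x 16 grayscale image (an assumed definition)."""
    return img.mean()

def symmetry(img):
    """Negative mean absolute difference between the image and its left-right mirror
    (an assumed definition; more symmetric digits get values closer to zero)."""
    return -np.mean(np.abs(img - np.fliplr(img)))

def to_feature_vector(img):
    """Map a 16 x 16 image to the augmented feature vector x = (1, x1, x2)."""
    return np.array([1.0, intensity(img), symmetry(img)])
```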

  13. PLA on Digits Data. [Figure: E_in and E_out (log scale, roughly 1% to 50%) versus iteration number t, from 0 to 1000.]

  14. Pocket on Digits Data. [Figure: the same error-versus-iteration plots for PLA (left) and the pocket algorithm (right); E_in and E_out on a log scale versus iteration number t, from 0 to 1000.]

  15. Linear Regression (credit line example). Applicant information:
      age: 32 years; gender: male; salary: 40,000; debt: 26,000; years in job: 1 year; years at home: 3 years; ...
      Classification: approve/deny. Regression: credit line (dollar amount). Regression means y ∈ R, and the linear hypothesis is
      h(x) = Σ_{i=0}^{d} w_i x_i = w^T x.

  16. Linear Regression (continued; repeats the applicant table and the hypothesis h(x) = Σ_{i=0}^{d} w_i x_i = w^T x from the previous slide).

  17. Least Squares Linear Regression. [Figure: a line fit to (x, y) data in one dimension and a plane fit to (x_1, x_2, y) data in two dimensions.]

  18. Least Squares Linear Regression.
      The target is noisy: y = f(x) + ε, i.e. y is generated by P(y | x).
      Hypothesis: h(x) = w^T x.
      In-sample error: E_in(h) = (1/N) Σ_{n=1}^{N} (h(x_n) − y_n)^2.
      Out-of-sample error: E_out(h) = E_x[(h(x) − y)^2].
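A small sketch, not from the slides, of how these two errors are computed in practice: E_in is an average over the training data, while E_out is an expectation and can only be estimated, for example on held-out test data.

```python
import numpy as np

def squared_error_in(w, X, y):
    """In-sample error E_in(w) = (1/N) * sum_n (w^T x_n - y_n)^2."""
    return np.mean((X @ w - y) ** 2)

def squared_error_estimate(w, X_test, y_test):
    """Estimate of E_out(w) using a held-out set drawn from the same P(x, y)."""
    return np.mean((X_test @ w - y_test) ** 2)
```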

  19. Using Matrices for Linear Regression.
      Data matrix X (N × (d + 1)) with rows x_n^T; target vector y = (y_1, ..., y_N)^T; in-sample predictions ŷ = (w^T x_1, ..., w^T x_N)^T = Xw.
      E_in(w) = (1/N) Σ_{n=1}^{N} (ŷ_n − y_n)^2
              = (1/N) ||ŷ − y||^2
              = (1/N) ||Xw − y||^2
              = (1/N) (w^T X^T X w − 2 w^T X^T y + y^T y).
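A quick numerical check of my own (random data, not from the slides) that the three forms of E_in(w) agree: the per-point sum, the squared norm, and the expanded quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])   # augmented data matrix
y = rng.normal(size=N)
w = rng.normal(size=d + 1)

e_sum  = np.mean((X @ w - y) ** 2)                           # per-point average
e_norm = np.linalg.norm(X @ w - y) ** 2 / N                  # (1/N) ||Xw - y||^2
e_quad = (w @ X.T @ X @ w - 2 * w @ X.T @ y + y @ y) / N     # expanded quadratic
assert np.allclose(e_sum, e_norm) and np.allclose(e_sum, e_quad)
```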

  20. Linear Regression Solution.
      E_in(w) = (1/N) (w^T X^T X w − 2 w^T X^T y + y^T y).
      Vector calculus: to minimize E_in(w), set ∇_w E_in(w) = 0. Using ∇_w(w^T A w) = (A + A^T) w and ∇_w(w^T b) = b, with A = X^T X and b = X^T y:
      ∇_w E_in(w) = (2/N) (X^T X w − X^T y).
      Setting ∇_w E_in(w) = 0 gives the normal equations X^T X w = X^T y, so
      w_lin = (X^T X)^{−1} X^T y   (when X^T X is invertible).
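A minimal sketch of solving the normal equations numerically; it assumes X^T X is invertible and uses np.linalg.solve rather than forming an explicit inverse. The gradient helper can be used to verify that ∇_w E_in vanishes at w_lin.

```python
import numpy as np

def solve_normal_equations(X, y):
    """w_lin from the normal equations X^T X w = X^T y (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient(w, X, y):
    """Gradient of E_in: (2/N) (X^T X w - X^T y); should be ~0 at w_lin."""
    return 2.0 / len(y) * (X.T @ X @ w - X.T @ y)
```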

  21. Linear Regression Algorithm.
      1. Construct the matrix X and the vector y from the data set (x_1, y_1), ..., (x_N, y_N), where each x includes the x_0 = 1 coordinate: X is the N × (d + 1) data matrix with rows x_n^T, and y = (y_1, ..., y_N)^T is the target vector.
      2. Compute the pseudoinverse X† of the matrix X. If X^T X is invertible, X† = (X^T X)^{−1} X^T.
      3. Return w_lin = X† y.
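A sketch following these three steps, with my own function name; np.linalg.pinv computes the pseudoinverse and also handles the case where X^T X is not invertible.

```python
import numpy as np

def linear_regression(X_raw, y):
    """X_raw is the N x d input matrix without the constant coordinate; y has length N."""
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])   # step 1: add x0 = 1
    X_dagger = np.linalg.pinv(X)                            # step 2: pseudoinverse of X
    return X_dagger @ y                                     # step 3: w_lin = X† y
```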
