Learning From Data, Lecture 10: Nonlinear Transforms
• The Z-space
• Polynomial transforms
• Be careful
M. Magdon-Ismail, CSCI 4100/6100
Recap: The Linear Model
The signal $s = \mathbf{w}^{\mathsf T}\mathbf{x}$ is linear in $\mathbf{x}$, which gives the line/hyperplane separator, and linear in $\mathbf{w}$, which makes the algorithms work.
Credit analysis with the linear model (task: error measure, model, algorithm):
• Approve or deny: classification error, perceptron, PLA, Pocket, ...
• Amount of credit: squared error, linear regression, pseudo-inverse.
• Probability of default: cross-entropy error, logistic regression, gradient descent.
© AML, Creator: Malik Magdon-Ismail
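All three rows share one signal $s = \mathbf{w}^{\mathsf T}\mathbf{x}$ and differ only in what is done with it. A minimal Python sketch, assuming $\mathbf{x}$ already includes the constant coordinate $x_0 = 1$ and taking $\operatorname{sign}(0) = -1$; the weights and the applicant's features are made up for illustration.

```python
import math


def signal(w, x):
    """Signal s = w^T x; x already contains the constant coordinate x0 = 1."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, x):
    """Approve (+1) or deny (-1): h(x) = sign(s), with sign(0) taken as -1."""
    return 1 if signal(w, x) > 0 else -1


def regress(w, x):
    """Amount of credit: h(x) = s."""
    return signal(w, x)


def probability(w, x):
    """Probability of default: h(x) = theta(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-signal(w, x)))


# Hypothetical weights and applicant (x0 = 1, then two made-up features).
w = [-1.0, 0.5, 2.0]
x = [1.0, 3.0, 0.25]

print(classify(w, x))     # 1
print(regress(w, x))      # 1.0
print(probability(w, x))  # about 0.731
```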
The Linear Model Has Its Limits
[Figure: (a) linear with outliers; (b) essentially nonlinear.]
To address (b), we need something more than linear.
Change Your Features
[Figure: Income versus years in residence, Y.]
The effect of Y saturates at both ends: Y ≫ 3 years has no additional effect beyond Y = 3, and Y ≪ 0.3 years has no additional effect below Y = 0.3.
Change Your Features Using a Transform
[Figure: three panels with axes Income vs. years in residence Y, Income vs. the transformed feature $z_1$, and $z_1$ vs. Y.]
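The slide does not spell out the exact shape of the transform. One simple choice consistent with the saturation described above is to clip Y to the interval [0.3, 3]; the function name and the clipping itself are assumptions for illustration, and a smooth saturating curve would serve equally well.

```python
def years_feature(y, low=0.3, high=3.0):
    """Hypothetical transform z1 = clip(Y, low, high).

    Below `low` or above `high`, extra years in residence have no additional
    effect, so they are mapped onto the nearest boundary value.
    """
    return min(max(y, low), high)


for y in [0.1, 0.3, 1.5, 3.0, 10.0]:
    print(y, "->", years_feature(y))
# 0.1 -> 0.3, 0.3 -> 0.3, 1.5 -> 1.5, 3.0 -> 3.0, 10.0 -> 3.0
```

A model that is linear in $z_1$ is nonlinear in Y, yet the learning algorithm still only sees a linear problem.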
Mechanics of the Feature Transform I
Transform the data to a Z-space in which the data is separable.
$$\mathbf{x} = \begin{bmatrix}1\\ x_1\\ x_2\end{bmatrix} \longrightarrow \mathbf{z} = \Phi(\mathbf{x}) = \begin{bmatrix}1\\ \Phi_1(\mathbf{x})\\ \Phi_2(\mathbf{x})\end{bmatrix} = \begin{bmatrix}1\\ x_1^2\\ x_2^2\end{bmatrix}$$
[Figure: the data in X-space (axes $x_1$, $x_2$) and in Z-space (axes $z_1 = x_1^2$, $z_2 = x_2^2$).]
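A minimal sketch of this particular $\Phi$, assuming (for illustration only) that the classes are separated by a circle $x_1^2 + x_2^2 = 0.6$ in X-space. Because $z_1 + z_2 = x_1^2 + x_2^2$, that circle becomes the straight line $z_1 + z_2 = 0.6$ in Z-space. The two points are made up.

```python
def phi(x):
    """z = Phi(x) = (1, x1^2, x2^2) for x = (1, x1, x2)."""
    _, x1, x2 = x
    return (1.0, x1 ** 2, x2 ** 2)


# Made-up points on either side of the assumed circle x1^2 + x2^2 = 0.6.
inside = (1.0, 0.3, -0.4)    # x1^2 + x2^2 = 0.25
outside = (1.0, -0.8, 0.7)   # x1^2 + x2^2 = 1.13

for x in (inside, outside):
    z = phi(x)
    # In Z-space the circular boundary is the line z1 + z2 = 0.6.
    side = "below" if z[1] + z[2] < 0.6 else "above"
    print(x, "->", z, side, "the line z1 + z2 = 0.6")
```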
Mechanics of the Feature Transform II
Separate the data in the Z-space with $\tilde{\mathbf{w}}$:
$$\tilde g(\mathbf{z}) = \operatorname{sign}(\tilde{\mathbf{w}}^{\mathsf T}\mathbf{z})$$
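Any linear algorithm can be run unchanged on the transformed data. A sketch using the perceptron learning algorithm (PLA) with the transform $\Phi(\mathbf{x}) = (1, x_1^2, x_2^2)$; the eight data points are invented so that they are separable in Z-space, and $\operatorname{sign}(0)$ is taken as $-1$.

```python
def phi(x1, x2):
    """z = (1, x1^2, x2^2)."""
    return [1.0, x1 * x1, x2 * x2]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(s):
    return 1 if s > 0 else -1


def pla(Z, y, max_updates=10_000):
    """Perceptron learning algorithm run in Z-space; returns w_tilde."""
    w = [0.0] * len(Z[0])
    for _ in range(max_updates):
        wrong = [(z, yn) for z, yn in zip(Z, y) if sign(dot(w, z)) != yn]
        if not wrong:
            return w
        z, yn = wrong[0]                             # a misclassified point
        w = [wi + yn * zi for wi, zi in zip(w, z)]   # w <- w + y_n z_n
    raise RuntimeError("no separator found; data may not be separable in Z")


# Invented data: -1 near the origin, +1 farther out.
X = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.3), (-0.1, -0.1),
     (0.9, 0.2), (-0.8, -0.6), (0.1, 1.0), (-0.5, 0.9)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

Z = [phi(x1, x2) for x1, x2 in X]   # transform once, then learn in Z-space
w_tilde = pla(Z, y)
print(w_tilde)
```

The data are not linearly separable in X-space, but PLA converges in Z-space because the transformed points are.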
Mechanics of the Feature Transform III
To classify a new $\mathbf{x}$, first transform $\mathbf{x}$ to $\Phi(\mathbf{x}) \in$ Z-space and classify there with $\tilde g$, where $\tilde g(\mathbf{z}) = \operatorname{sign}(\tilde{\mathbf{w}}^{\mathsf T}\mathbf{z})$:
$$g(\mathbf{x}) = \tilde g(\Phi(\mathbf{x})) = \operatorname{sign}(\tilde{\mathbf{w}}^{\mathsf T}\Phi(\mathbf{x}))$$
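Classifying a new point therefore needs only $\Phi$ and $\tilde{\mathbf{w}}$. A sketch with hypothetical weights $\tilde{\mathbf{w}} = (-0.6, 1, 1)$ for the transform $\Phi(\mathbf{x}) = (1, x_1^2, x_2^2)$, which corresponds to the circle $x_1^2 + x_2^2 = 0.6$ back in X-space.

```python
def phi(x):
    """Phi(x) = (1, x1^2, x2^2) for a raw point x = (x1, x2)."""
    x1, x2 = x
    return [1.0, x1 * x1, x2 * x2]


def g_tilde(w_tilde, z):
    """Classifier in Z-space: sign(w_tilde^T z), with sign(0) taken as -1."""
    s = sum(wi * zi for wi, zi in zip(w_tilde, z))
    return 1 if s > 0 else -1


def g(w_tilde, x):
    """Final hypothesis in X-space: g(x) = g_tilde(Phi(x))."""
    return g_tilde(w_tilde, phi(x))


w_tilde = [-0.6, 1.0, 1.0]   # hypothetical learned weights
for x in [(0.2, 0.2), (1.0, 0.0), (-0.5, 0.7)]:
    print(x, g(w_tilde, x))
# (0.2, 0.2) -1   (1.0, 0.0) 1   (-0.5, 0.7) 1
```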
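The General Feature Transform
X-space is $\mathbb{R}^d$; Z-space is $\mathbb{R}^{\tilde d}$.
$$\mathbf{x} = \begin{bmatrix}1\\ x_1\\ \vdots\\ x_d\end{bmatrix} \longrightarrow \mathbf{z} = \Phi(\mathbf{x}) = \begin{bmatrix}1\\ \Phi_1(\mathbf{x})\\ \vdots\\ \Phi_{\tilde d}(\mathbf{x})\end{bmatrix} = \begin{bmatrix}1\\ z_1\\ \vdots\\ z_{\tilde d}\end{bmatrix}$$
• Data: $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ in X-space become $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N$ in Z-space; the labels $y_1, y_2, \ldots, y_N$ are unchanged.
• Weights: there are no weights in X-space; in Z-space, $\tilde{\mathbf{w}} = (w_0, w_1, \ldots, w_{\tilde d})$.
• Final hypothesis: $g(\mathbf{x}) = \operatorname{sign}(\tilde{\mathbf{w}}^{\mathsf T}\Phi(\mathbf{x}))$.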
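Nothing in the learning step depends on which components are chosen: the data pass through $\Phi$ once, the labels are copied, and the weight vector lives only in Z-space with $\tilde d + 1$ entries. A Python sketch under that reading; `make_transform` and `transform_data` are hypothetical helper names, and the two components in the example are arbitrary.

```python
from typing import Callable, List, Sequence

Feature = Callable[[Sequence[float]], float]


def make_transform(components: Sequence[Feature]) -> Callable[[Sequence[float]], List[float]]:
    """Return Phi with Phi(x) = (1, Phi_1(x), ..., Phi_dtilde(x)).

    `x` holds the raw features (x_1, ..., x_d) without the leading 1.
    """
    def phi(x: Sequence[float]) -> List[float]:
        return [1.0] + [f(x) for f in components]
    return phi


def transform_data(X, y, phi):
    """Map x_1..x_N to z_1..z_N; the labels y_1..y_N are carried over unchanged."""
    return [phi(x) for x in X], list(y)


# Hypothetical example: d = 3 raw features mapped to d_tilde = 2 features.
phi = make_transform([lambda x: x[0] * x[1], lambda x: x[2] ** 2])
X = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
y = [1, -1]

Z, y_z = transform_data(X, y, phi)
print(Z)              # [[1.0, 2.0, 9.0], [1.0, -0.5, 4.0]]
print(len(Z[0]) - 1)  # d_tilde = 2
```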
Generalization
For the perceptron, the VC dimension goes from $d_{vc} = d + 1$ in X-space to (at most) $d_{vc} = \tilde d + 1$ in Z-space. Choose the feature transform with the smallest $\tilde d$.
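As a worked check with $d = 2$, compare the transform $(1, x_1^2, x_2^2)$ from the feature-transform mechanics with the 2nd-order polynomial transform $(1, x_1, x_2, x_1^2, x_1x_2, x_2^2)$:
$$
\begin{aligned}
\Phi(\mathbf{x}) = (1, x_1^2, x_2^2): &\quad \tilde d = 2, \quad d_{vc} \le \tilde d + 1 = 3 = d + 1,\\
\Phi_2(\mathbf{x}) = (1, x_1, x_2, x_1^2, x_1x_2, x_2^2): &\quad \tilde d = 5, \quad d_{vc} \le \tilde d + 1 = 6.
\end{aligned}
$$
The first transform costs nothing extra in VC dimension; the second raises the bound from 3 to 6 in exchange for its extra flexibility.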
Many Nonlinear Features May Work
[Figure: the data in X-space, separated by the circle $x_1^2 + x_2^2 = 0.6$, and the same data in Z-spaces built from candidate features such as $z_1 = x_1^2$, $z_1 = (x_1 + 0.05)^2$, $z_2 = x_2^2$, $z_2 = x_2$, and the single feature $z_1 = x_1^2 + x_2^2 - 0.6$.]
A rat! A rat! This is called data snooping: looking at your data and tailoring your $\mathcal{H}$ to it.
Must Choose Φ BEFORE You Look at the Data
After constructing features carefully, and before seeing the data: if you think linear is not enough, try the 2nd-order polynomial transform.
$$\mathbf{x} = \begin{bmatrix}1\\ x_1\\ x_2\end{bmatrix} \longrightarrow \Phi(\mathbf{x}) = \begin{bmatrix}1\\ \Phi_1(\mathbf{x})\\ \Phi_2(\mathbf{x})\\ \Phi_3(\mathbf{x})\\ \Phi_4(\mathbf{x})\\ \Phi_5(\mathbf{x})\end{bmatrix} = \begin{bmatrix}1\\ x_1\\ x_2\\ x_1^2\\ x_1 x_2\\ x_2^2\end{bmatrix}$$
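A direct Python sketch of this transform; the function name `phi2` is ours, and the weights in the second half are hypothetical, chosen to show that the 2nd-order transform already contains the circular boundaries that the $(1, x_1^2, x_2^2)$ transform could express.

```python
def phi2(x1, x2):
    """2nd-order polynomial transform for d = 2:
    Phi(x) = (1, x1, x2, x1^2, x1*x2, x2^2)."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


print(phi2(2.0, -3.0))  # [1.0, 2.0, -3.0, 4.0, -6.0, 9.0]

# With hypothetical weights (-0.6, 0, 0, 1, 0, 1), the signal is
# x1^2 + x2^2 - 0.6, so this transform can also express circular boundaries.
w_circle = [-0.6, 0.0, 0.0, 1.0, 0.0, 1.0]
s = sum(w * z for w, z in zip(w_circle, phi2(1.0, 0.0)))
print(s > 0)  # True: (1, 0) lies outside the circle x1^2 + x2^2 = 0.6
```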
The General Polynomial Transform $\Phi_k$
We can get even fancier with the degree-$k$ polynomial transform:
$$\begin{aligned}
\Phi_1(\mathbf{x}) &= (1, x_1, x_2),\\
\Phi_2(\mathbf{x}) &= (1, x_1, x_2, x_1^2, x_1x_2, x_2^2),\\
\Phi_3(\mathbf{x}) &= (1, x_1, x_2, x_1^2, x_1x_2, x_2^2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3),\\
\Phi_4(\mathbf{x}) &= (1, x_1, x_2, x_1^2, x_1x_2, x_2^2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3, x_1^4, x_1^3x_2, x_1^2x_2^2, x_1x_2^3, x_2^4),\\
&\ \ \vdots
\end{aligned}$$
– The dimensionality of the feature space (and with it $d_{vc}$) increases rapidly!
– Similar transforms exist for a $d$-dimensional original space.
– Approximation-generalization tradeoff: a higher degree gives lower (even zero) $E_{\text{in}}$ but worse generalization.
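The pattern generalizes to any $d$: $\Phi_k$ lists all monomials of degree at most $k$, and there are $\binom{d+k}{k}$ of them, so $\tilde d = \binom{d+k}{k} - 1$. A Python sketch using only the standard library (Python 3.8+ for `math.comb` and `math.prod`); the function names are ours.

```python
import math
from itertools import combinations_with_replacement


def poly_transform(x, k):
    """Degree-k polynomial transform Phi_k(x) of a raw point x = (x_1, ..., x_d).

    Lists every monomial of degree 0, 1, ..., k, in the same order as the
    slide for d = 2 (the empty product for degree 0 gives the constant 1).
    """
    z = []
    for degree in range(k + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            z.append(math.prod(x[i] for i in idx))
    return z


def feature_dim(d, k):
    """d_tilde of Phi_k on R^d: C(d + k, k) monomials minus the constant."""
    return math.comb(d + k, k) - 1


print(poly_transform([2.0, 3.0], 2))  # [1, 2.0, 3.0, 4.0, 6.0, 9.0]
for d in (2, 10):
    for k in (1, 2, 3, 4):
        dt = feature_dim(d, k)
        print(f"d={d:2d}  k={k}  d_tilde={dt:4d}  d_vc <= {dt + 1}")
```

For $d = 2$ this reproduces the 2, 5, 9, 14 non-constant features listed above; for $d = 10$ and $k = 4$ it already gives $\tilde d = 1000$, which illustrates how quickly the dimensionality, and the bound on $d_{vc}$, grows.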
Be Careful with Feature Transforms
Be Careful with Feature Transforms
A high-order polynomial transform leads to “nonsense”.
Digits Data: “1” Versus “All”
[Figure: two panels with axes average intensity and symmetry, one for each model below.]
• Linear model: $E_{\text{in}} = 2.13\%$, $E_{\text{out}} = 2.38\%$.
• 3rd-order polynomial model: $E_{\text{in}} = 1.75\%$, $E_{\text{out}} = 1.87\%$.
Use the Linear Model!
• First try a linear model: it is simple, robust, and works.
• The algorithms can tolerate error, and you also have nonlinear feature transforms.
• Choose a feature transform before seeing the data. Stay simple. Data snooping is hazardous to your $E_{\text{out}}$.
• Linear models are fundamental in their own right; they are also the building blocks of many more complex models, like support vector machines.
• Nonlinear transforms also apply to regression and logistic regression.
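To illustrate the last point, here is a sketch of linear regression run on transformed data. The five data points are generated without noise from the made-up target $y = 1 + 2x - 3x^2$; the normal equations are solved directly (equivalent to the pseudo-inverse when $Z^{\mathsf T}Z$ is invertible) so the example needs only the standard library. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def linear_regression(Z, y):
    """w = (Z^T Z)^{-1} Z^T y via the normal equations."""
    m = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(m)] for i in range(m)]
    Zty = [sum(z[i] * yn for z, yn in zip(Z, y)) for i in range(m)]
    return solve(ZtZ, Zty)


def e_in(w, Z, y):
    """In-sample squared error (1/N) * sum (w^T z_n - y_n)^2."""
    return sum((sum(wi * zi for wi, zi in zip(w, z)) - yn) ** 2
               for z, yn in zip(Z, y)) / len(y)


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1 + 2 * x - 3 * x * x for x in xs]   # made-up noiseless quadratic target

Z_lin = [[1.0, x] for x in xs]             # no transform: (1, x)
Z_quad = [[1.0, x, x * x] for x in xs]     # 2nd-order transform: (1, x, x^2)

w_lin = linear_regression(Z_lin, ys)
w_quad = linear_regression(Z_quad, ys)
print(w_lin, e_in(w_lin, Z_lin, ys))      # about [-0.5, 2.0], E_in = 1.575
print(w_quad, e_in(w_quad, Z_quad, ys))   # about [1.0, 2.0, -3.0], E_in near 0
```

The algorithm is the same linear regression in both cases; only the input to it changes.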