Nonlinear Classifiers II


  1. Nonlinear Classifiers: Introduction
     • Classifiers
     • Supervised Classifiers
     • XOR problem
     • Linear Classifiers: Perceptron, Least Squares Methods, Linear Support Vector Machine
     • Nonlinear Classifiers
       - Part I: Multilayer Neural Networks
       - Part II: Polynomial Classifier, RBF, Nonlinear SVM
       - Decision Trees
     • Unsupervised Classifiers

  2. Nonlinear Classifiers: Introduction
     • An example: suppose we are in 1 dimension. What would a linear SVM do with this data? [Figure: 1-D sample points on the x-axis around x = 0.]
     • Not a big surprise: in 1-D the maximum-margin separator is a single threshold, with a positive "plane" on one side and a negative "plane" on the other.

  3. Nonlinear Classifiers: Introduction
     • A harder 1-dimensional dataset: here the points of one class lie between the points of the other, so no single threshold separates them. What can be done about this? [Figure: interleaved 1-D data around x = 0.]
     • Allow a non-linear basis function: map each point x_k to z_k = (x_k, x_k^2).

  4. Nonlinear Classifiers: Introduction
     • With the non-linear basis function z_k = (x_k, x_k^2), the data become linearly separable in the (x, x^2) plane, as the sketch below shows.
     • Linear classifiers are simple and computationally efficient; however, for nonlinearly separable features they can lead to very inaccurate decisions.
     • We may then trade simplicity and efficiency for accuracy by using a nonlinear classifier.
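     To make this concrete, here is a minimal MATLAB sketch; the 1-D dataset and the threshold 2.25 are made up for illustration:

     % Illustrative 1-D dataset: one class near the origin, the other outside.
     x_pos = [-1.0 -0.5 0.0 0.5 1.0];   % class A (hypothetical values)
     x_neg = [-3.0 -2.5 2.5 3.0];       % class B (hypothetical values)

     % Non-linear basis function: map each point x_k to z_k = (x_k, x_k^2).
     z_pos = [x_pos; x_pos.^2];         % 2 x N matrix of mapped class-A points
     z_neg = [x_neg; x_neg.^2];

     % In the (x, x^2) plane the horizontal line z2 = 2.25 separates the
     % classes: g(z) = 2.25 - z2 is positive for A and negative for B.
     disp(2.25 - z_pos(2,:))            % all > 0
     disp(2.25 - z_neg(2,:))            % all < 0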

  5. The XOR problem

     | x1 | x2 | XOR | Class |
     |----|----|-----|-------|
     |  0 |  0 |  0  |   B   |
     |  0 |  1 |  1  |   A   |
     |  1 |  0 |  1  |   A   |
     |  1 |  1 |  0  |   B   |

     • There is no single line (hyperplane) that separates class A from class B. By contrast, the AND and OR operations are linearly separable problems.

     Nonlinear Classifiers: Agenda, Part II
     • Polynomial Classifier: a special case of a two-layer perceptron, with an activation function applied to a nonlinear (polynomial) input
     • Radial Basis Function Network: a special case of a two-layer network, with radial basis activation functions; training is simpler and faster
     • Nonlinear Support Vector Machine

  6. Polynomial Classifier: XOR problem
     • Attack the XOR problem with a polynomial function: with nonlinear polynomial functions, the classes can be separated.
     • In the original space X the XOR classes are not linearly separable. [Figure: the four XOR points in the (x1, x2) plane.]
     • Introduce a mapping φ: X → H, z = φ(x). In the new space H the classes become linearly separable, which corresponds to separating them in X with a polynomial function. [Figure: the mapped points in the (z1, z2, z3) space.]

  7. Polynomial Classifier: XOR problem
     • Choose the mapping z = φ(x) = (x1, x2, x1*x2)^T. The four XOR points map to
       (0,0) → (0,0,0),   (0,1) → (0,1,0),   (1,0) → (1,0,0),   (1,1) → (1,1,1),
       ... and that is separable in H by the hyperplane
       g(z) = z1 + z2 - 2*z3 - 1/4 = 0.
     • A hyperplane in H has the general form g(z) = w^T z + w0 = 0. Substituting z = φ(x), the function
       g(x) = x1 + x2 - 2*x1*x2 - 1/4
       is a hyperplane in H but a polynomial in X. The table below summarizes the mapping; a numerical check follows the table.

     | x1 | x2 | x1*x2 | g    | Class |
     |----|----|-------|------|-------|
     |  0 |  0 |   0   | -1/4 |   B   |
     |  0 |  1 |   0   | +3/4 |   A   |
     |  1 |  0 |   0   | +3/4 |   A   |
     |  1 |  1 |   1   | -1/4 |   B   |
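     A quick MATLAB check of this mapping and hyperplane, using only the numbers from the slide:

     % The four XOR inputs and the mapping z = phi(x) = (x1, x2, x1*x2)'.
     X = [0 0; 0 1; 1 0; 1 1];              % one input pattern per row
     Z = [X(:,1), X(:,2), X(:,1).*X(:,2)];  % mapped points in H

     % Hyperplane in H from the slide: g(z) = z1 + z2 - 2*z3 - 1/4.
     g = Z(:,1) + Z(:,2) - 2*Z(:,3) - 1/4;
     disp([X, g])   % g < 0 for (0,0) and (1,1) (class B), g > 0 for class A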

  8. Polynomial Classifier: XOR problem
     • Decision surface in X:
       g(x) = x1 + x2 - 2*x1*x2 - 1/4,   with g(x) > 0 ⇒ x ∈ A and g(x) < 0 ⇒ x ∈ B.
     • Setting g(x) = 0 and solving for x2 gives the boundary curve x2 = (x1 - 0.25)/(2*x1 - 1), which can be plotted in MATLAB:

       >> x1 = [-0.5:0.1:1.5];
       >> x2 = (x1 - 0.25) ./ (2*x1 - 1);
       >> plot(x1, x2);

     • Summary: with nonlinear polynomial functions, classes can be separated in the original space X. The XOR pattern set was not linearly separable in X, but it is linearly separable in H, and hence separable in X with a polynomial function.
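     Extending the slide's snippet slightly (the ranges and markers are my own choices), the four XOR points can be overlaid and the pole of the boundary at x1 = 0.5 masked:

     % Boundary x2 = (x1 - 0.25)/(2*x1 - 1) with the XOR points overlaid.
     x1 = -0.5:0.01:1.5;
     x2 = (x1 - 0.25) ./ (2*x1 - 1);
     x2(abs(2*x1 - 1) < 1e-6) = NaN;    % mask the pole so plot() leaves a gap
     plot(x1, x2, 'k-'); hold on;
     plot([0 1], [1 0], 'ro');          % class A: (0,1) and (1,0)
     plot([0 1], [0 1], 'bx');          % class B: (0,0) and (1,1)
     axis([-0.5 1.5 -2 3]); xlabel('x_1'); ylabel('x_2');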

  9. Polynomial Classifier, more general
     • The decision function is approximated by a polynomial function g(x) of order p. For p = 2:
       g(x) = w0 + Σ_{i=1..l} w_i x_i + Σ_{i=1..l-1} Σ_{m=i+1..l} w_im x_i x_m + Σ_{i=1..l} w_ii x_i^2
            = w^T z + w0,
       where, for l = 2,
       w = (w1, w2, w12, w11, w22)^T,   z = (x1, x2, x1*x2, x1^2, x2^2)^T,   x = (x1, x2)^T.
     • This is a special case of a two-layer perceptron: an activation function with a polynomial input. A small sketch of the feature map follows below.

     Nonlinear Classifiers: Agenda, Part II
     • Polynomial Classifier
     • Radial Basis Function Network
     • Nonlinear Support Vector Machine
     • Application: ZIP Code, OCR, FD (W-RVM)
     • Demo: libSVM, DHS or Hlavac
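     A sketch of the quadratic feature map for l = 2; the test point is arbitrary, and the weights are hypothetical (they happen to reproduce the XOR polynomial above):

     % Quadratic (p = 2) feature map for l = 2, matching the slide:
     % z = (x1, x2, x1*x2, x1^2, x2^2)', so that g(x) = w'*z + w0.
     quadmap = @(x) [x(1); x(2); x(1)*x(2); x(1)^2; x(2)^2];

     w  = [1; 1; -2; 0; 0];             % weights recovering the XOR polynomial
     w0 = -1/4;
     x  = [0.3; -1.2];                  % an arbitrary test point
     g  = w' * quadmap(x) + w0;         % evaluates x1 + x2 - 2*x1*x2 - 1/4
     disp(g)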

 10. Radial Basis Function
     • Radial Basis Function Networks (RBF): choose
       g(x) = w0 + Σ_{i=1..k} w_i g_i(x),   with g_i(x) = exp( -‖x - c_i‖^2 / (2 σ_i^2) ).
     • Examples (1-D): centers c_i ∈ {-2.5, 0.0, 1.0, 1.5, 2.0}, i = 1, ..., k, k = 5, once with σ = 1/2 and once with σ = 1/12. [Figure: the resulting bell-shaped basis functions for the two widths.]
     • How to choose c_i, σ_i, and k?
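     A small MATLAB sketch of such an expansion with the centers from the slide; the weights and w0 below are invented purely to draw one possible g(x):

     % 1-D RBF expansion g(x) = w0 + sum_i w_i * exp(-(x - c_i)^2/(2*sigma^2)).
     c  = [-2.5 0.0 1.0 1.5 2.0];       % k = 5 centers (from the slide)
     sigma = 1/2;                       % common width
     w  = [1 -1 0.5 0.5 -1];            % hypothetical weights
     w0 = 0;

     x = linspace(-4, 4, 200)';
     G = exp(-(x - c).^2 / (2*sigma^2));   % 200 x 5 (implicit expansion, R2016b+)
     g = w0 + G * w';                      % the resulting decision function
     plot(x, g); xlabel('x'); ylabel('g(x)');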

 11. Radial Basis Function
     • Radial Basis Function Networks (RBF) are equivalent to a network with a single hidden layer of RBF activations and a linear output node.
     • XOR problem with RBF: map x via
       z(x) = ( exp(-‖x - c1‖^2), exp(-‖x - c2‖^2) )^T,   c1 = (1, 1)^T,   c2 = (0, 0)^T.
       The four XOR points map to
       (0,0) → (0.135, 1),   (1,1) → (1, 0.135),   (0,1) → (0.368, 0.368),   (1,0) → (0.368, 0.368).
     • In H the hyperplane g(z) = z1 + z2 - 1 = 0 separates the classes: (0,0) and (1,1) give g > 0, while (0,1) and (1,0) give g < 0. Back in X this is the nonlinear decision function
       g(x) = exp(-‖x - c1‖^2) + exp(-‖x - c2‖^2) - 1.
     • So the pattern set, not linearly separable in X, is separable using a nonlinear function (RBF) in X that separates the set in H with a linear decision hyperplane; this is verified numerically below.
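     A numerical check of the RBF mapping, using only the slide's centers and hyperplane:

     % RBF mapping for XOR with c1 = (1,1)' and c2 = (0,0)', as on the slide.
     X  = [0 0; 0 1; 1 0; 1 1];         % the four XOR inputs, one per row
     c1 = [1 1];  c2 = [0 0];
     z1 = exp(-sum((X - c1).^2, 2));    % implicit expansion, R2016b+
     z2 = exp(-sum((X - c2).^2, 2));

     g = z1 + z2 - 1;                   % hyperplane g(z) = z1 + z2 - 1 in H
     disp([X, z1, z2, g])               % (0,0),(1,1): g > 0;  (0,1),(1,0): g < 0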

 12. Radial Basis Function
     • Decision function as a summation of k RBFs:
       g(x) = w0 + Σ_{i=1..k} w_i exp( -(x - c_i)^T (x - c_i) / (2 σ_i^2) ).
     • Training of RBF networks:
       1. Fixed centers: choose the centers randomly among the data points and also fix the σ_i's. Then g(x) = w0 + w^T z is a typical linear classifier design; a sketch of this variant follows below.
       2. Training of the centers c_i: this is a nonlinear optimization task.
       3. Combine supervised and unsupervised learning procedures.
       4. The unsupervised part reveals clustering tendencies of the data and assigns the centers to the cluster representatives.

     Nonlinear Classifiers: Agenda, Part II
     • Polynomial Classifier
     • Radial Basis Function Network
     • Nonlinear Support Vector Machine
     • Application: ZIP Code, OCR, FD (W-RVM)
     • Demo: libSVM, DHS or Hlavac
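     A minimal sketch of the "fixed centers" variant, assuming a least-squares fit for the linear weights; the dataset reuses XOR and sigma is chosen by hand:

     % "Fixed centers" training: pick centers among the data points, fix
     % sigma, then solve for the linear weights by least squares.
     X = [0 0; 0 1; 1 0; 1 1];          % training inputs (the XOR set)
     y = [-1; 1; 1; -1];                % +1 for class A, -1 for class B
     C = [1 1; 0 0];                    % centers chosen among the data points
     sigma = 1/sqrt(2);                 % fixed width, chosen by hand

     % Design matrix: Phi(i,j) = exp(-||x_i - c_j||^2 / (2*sigma^2)).
     Phi = zeros(size(X,1), size(C,1));
     for j = 1:size(C,1)
         Phi(:,j) = exp(-sum((X - C(j,:)).^2, 2) / (2*sigma^2));
     end
     wfull = [Phi, ones(size(X,1),1)] \ y;          % least-squares [w; w0]
     disp(sign([Phi, ones(size(X,1),1)] * wfull))   % reproduces the labels y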

 13. Nonlinear Classifiers: SVM
     • XOR problem recap: linear separation in a high-dimensional space H via nonlinear functions (polynomial and RBFs) in the original space X.
     • For this we found explicit nonlinear mappings φ: X → H, z = φ(x), and then separated linearly in H. Is that possible without knowing the mapping function φ?!?
     • Recall that the probability of having linearly separable classes increases as the dimensionality of the feature vectors increases. Assume the mapping
       x ∈ R^l → z ∈ R^k,   k > l,
       then use a linear SVM in R^k.

 14. Non-linear SVM
     • Support Vector Machines with x → z ∈ R^k: the dual problem formulation becomes
       max_λ  Σ_{i=1..N} λ_i - (1/2) Σ_{i,j} λ_i λ_j y_i y_j z_i^T z_j
       subject to   Σ_{i=1..N} λ_i y_i = 0,   λ_i ≥ 0,
       where z_i ∈ R^k and y_i ∈ {-1, 1} are the class labels.
     • The classifier is
       g(z) = w^T z + w0,   with w = Σ_{i=1..N_s} λ_i y_i z_i   (N_s support vectors).
     • Thus, only inner products in the high-dimensional space are needed! This enables something clever, the kernel trick: compute the inner products in the high-dimensional space as functions of inner products performed in the low-dimensional space. A quick check follows below.
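     A quick check of the kernel-trick idea for the polynomial kernel K(x, y) = (x'*y + 1)^2, a standard choice used here only as an example; the two test points are arbitrary:

     % Kernel trick check: for the polynomial kernel K(x,y) = (x'*y + 1)^2,
     % the kernel equals an inner product after an explicit 6-D quadratic map.
     phi = @(x) [1; sqrt(2)*x(1); sqrt(2)*x(2); x(1)^2; x(2)^2; sqrt(2)*x(1)*x(2)];

     x = [0.5; -1.0];  y = [2.0; 0.3];  % arbitrary test points
     lhs = (x'*y + 1)^2;                % inner product done in the low-dim space
     rhs = phi(x)' * phi(y);            % inner product in the high-dim space
     disp([lhs, rhs])                   % identical up to round-off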
