  1. Lecture 18: Recognition IV Thursday, Nov 15 Prof. Kristen Grauman

  2. Outline • Discriminative classifiers – SVMs • Learning categories from weakly supervised images – Constellation model • Shape matching – Shape context, visual CAPTCHA application

  3. Recall: boosting • Want to select the single feature that best separates positive and negative examples, in terms of weighted error. Each dimension: output of a possible rectangle feature on faces and non-faces.

  4. Recall: boosting • Want to select the single feature that best separates positive and negative examples, in terms of weighted error. Each dimension: output of a possible rectangle feature on faces and non-faces.

  5. Recall: boosting • Want to select the single feature that best separates positive and negative examples, in terms of weighted error. Each dimension: output of a possible rectangle feature on faces and non-faces. Optimal threshold: the one that results in the fewest misclassifications of the image subwindows. Notice that any threshold giving the same error rate would be equally good here.
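
To make this weak-learner selection step concrete, here is a minimal brute-force sketch in Python/NumPy (my own illustration, not code from the lecture) that picks the threshold and polarity minimizing the weighted error for a single feature dimension:

```python
import numpy as np

def best_threshold(feature_vals, labels, weights):
    """Pick the threshold on one feature that minimizes weighted error.

    feature_vals: 1-D array, one rectangle-feature output per example
    labels:       +1 (face) / -1 (non-face) per example
    weights:      current boosting weights per example
    Returns (threshold, polarity, error); classify +1 when
    polarity * feature_vals > polarity * threshold.
    """
    best = (None, 1, np.inf)
    for thr in np.unique(feature_vals):
        for polarity in (+1, -1):
            pred = np.where(polarity * feature_vals > polarity * thr, 1, -1)
            err = np.sum(weights[pred != labels])
            if err < best[2]:
                best = (thr, polarity, err)
    return best
```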

  6. Lines in R²: ax + by + d = 0

  7. Lines in R²: Let w = [a b]ᵀ and x = [x y]ᵀ. Then the line is ax + by + d = 0.

  8. Lines in R²: Let w = [a b]ᵀ and x = [x y]ᵀ. Then ax + by + d = 0, i.e. w · x + d = 0, with w normal to the line.

  9. Lines in R²: Let w = [a b]ᵀ and x = [x y]ᵀ, so the line is w · x + d = 0. Consider a point (x₀, y₀) at distance D from the line.

  10. Lines in R²: Distance from the point (x₀, y₀) to the line ax + by + d = 0: D = (ax₀ + by₀ + d) / √(a² + b²) = (wᵀx₀ + d) / ||w||.

  11. Lines in R² (summary): with w = [a b]ᵀ and x₀ = [x₀ y₀]ᵀ, the line is w · x + d = 0 and the distance from x₀ to the line is D = (wᵀx₀ + d) / ||w||.

  12. Planes in R³: Let w = [a b c]ᵀ and x = [x y z]ᵀ, so the plane is ax + by + cz + d = 0, i.e. w · x + d = 0, with w normal to the plane. Distance from a point (x₀, y₀, z₀) to the plane: D = (ax₀ + by₀ + cz₀ + d) / √(a² + b² + c²) = (wᵀx₀ + d) / ||w||.

  13. Hyperplanes in Rⁿ: A hyperplane H is the set of all vectors x ∈ Rⁿ which satisfy w₁x₁ + w₂x₂ + … + wₙxₙ + b = 0, i.e. wᵀx + b = 0. Distance from a point x to the hyperplane (w, b): D(H, x) = (wᵀx + b) / ||w||.
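
As a quick sanity check, here is a minimal sketch of the point-to-hyperplane distance formula above (Python/NumPy is my choice of tooling here, not part of the lecture):

```python
import numpy as np

def signed_distance(w, b, x0):
    """Signed distance from point x0 to the hyperplane w.x + b = 0,
    following the slide's formula: D = (w^T x0 + b) / ||w||."""
    return (np.dot(w, x0) + b) / np.linalg.norm(w)

# Example in R^2: the line x + y - 1 = 0 and the point (1, 1)
w, b = np.array([1.0, 1.0]), -1.0
print(signed_distance(w, b, np.array([1.0, 1.0])))  # 1/sqrt(2) ≈ 0.707
```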

  14. Support Vector Machines (SVMs) • Discriminative classifier based on optimal separating hyperplane • What hyperplane is optimal?

  15. Linear Classifiers: x → f(x, w, b) → y_est, where f(x, w, b) = sign(w · x + b). The boundary w · x + b = 0 separates the region w · x + b > 0 (labeled +1) from w · x + b < 0 (labeled -1). How would you classify this data? Slides from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html

  16. Linear Classifiers: f(x, w, b) = sign(w · x + b). How would you classify this data?

  17. Linear Classifiers: f(x, w, b) = sign(w · x + b). How would you classify this data?

  18. Linear Classifiers: f(x, w, b) = sign(w · x + b). Any of these would be fine... but which is best?

  19. Linear Classifiers: f(x, w, b) = sign(w · x + b). How would you classify this data? (The boundary shown misclassifies points into the +1 class.)

  20. Classifier Margin: f(x, w, b) = sign(w · x + b). Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

  21. Maximum Margin: The maximum margin linear classifier is the linear classifier with maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM). Support vectors are those datapoints that the margin pushes up against. 1. Maximizing the margin is good according to intuition and theory. 2. It implies that only the support vectors are important; other training examples are ignorable. 3. Empirically it works very, very well.

  22. Linear SVM Mathematically: M = margin width. The “Predict Class = +1” zone lies where wx + b ≥ 1 and the “Predict Class = -1” zone where wx + b ≤ -1; the decision boundary is wx + b = 0. For the support vectors, wᵀx + b = ±1, so their distance to the hyperplane is +1/||w|| for positives and -1/||w|| for negatives, and hence M = 1/||w|| - (-1/||w||) = 2/||w||.

  23. Question • How should we choose values for w, b? 1. We want the training data separated by the hyperplane so it classifies them correctly. 2. We want the margin width M to be as large as possible.

  24. Linear SVM Mathematically. Goal: 1) Correctly classify all training data: wxᵢ + b ≥ 1 if yᵢ = +1, and wxᵢ + b ≤ -1 if yᵢ = -1; equivalently, yᵢ(wxᵢ + b) ≥ 1 for all i. 2) Maximize the margin M = 2/||w||, which is the same as minimizing (1/2)wᵀw. Formulated as a quadratic optimization problem, solve for w and b: minimize Φ(w) = (1/2)wᵀw subject to yᵢ(wxᵢ + b) ≥ 1 ∀i.
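
The following is a minimal sketch of that hard-margin quadratic program on a tiny separable toy set, using cvxpy (my choice of solver library, not something the lecture uses):

```python
import cvxpy as cp
import numpy as np

# toy linearly separable data: rows of X with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# minimize (1/2) w^T w  subject to  y_i (w . x_i + b) >= 1 for all i
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value,
      "margin =", 2 / np.linalg.norm(w.value))  # M = 2 / ||w||
```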

  25. The Optimization Problem Solution • The solution has the form (omitting the derivation): w = Σᵢ αᵢ yᵢ xᵢ and b = yₖ - wᵀxₖ for any xₖ such that αₖ ≠ 0. • Each non-zero αᵢ indicates that the corresponding xᵢ is a support vector. • The classifying function then has the form f(x) = Σᵢ αᵢ yᵢ xᵢᵀx + b. • Notice that it relies on an inner product between the test point x and the support vectors xᵢ. • Solving the optimization problem also involves computing the inner products xᵢᵀxⱼ between all pairs of training points.
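
A fitted off-the-shelf SVM exposes exactly these quantities. Here is a hedged sketch using scikit-learn (an assumption of mine, not a tool from the lecture) that recovers the support vectors, the αᵢyᵢ terms, and b, and evaluates f(x) = Σᵢ αᵢyᵢ xᵢᵀx + b by hand:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates hard margin

alphas_times_y = clf.dual_coef_[0]    # alpha_i * y_i, support vectors only
sv = clf.support_vectors_             # the x_i with non-zero alpha_i
b = clf.intercept_[0]

# f(x) = sum_i alpha_i y_i x_i^T x + b, using only the support vectors
x_test = np.array([1.0, 0.5])
f = np.sum(alphas_times_y * (sv @ x_test)) + b
print(f, clf.decision_function([x_test])[0])  # the two values agree
```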

  26. Non-linear SVMs • Datasets that are linearly separable with some noise work out great. • But what are we going to do if the dataset is just too hard? • How about mapping the data to a higher-dimensional space, e.g. x → (x, x²)?

  27. Non-linear SVMs: Feature spaces • General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)

  28. The “Kernel Trick” • The linear classifier relies on the dot product between vectors, K(xᵢ, xⱼ) = xᵢᵀxⱼ. • If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ). • A kernel function is a similarity function that corresponds to an inner product in some expanded feature space. • Example: 2-dimensional vectors x = [x₁ x₂]; let K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)². We need to show that K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ):
K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)²
= 1 + xᵢ₁²xⱼ₁² + 2xᵢ₁xⱼ₁xᵢ₂xⱼ₂ + xᵢ₂²xⱼ₂² + 2xᵢ₁xⱼ₁ + 2xᵢ₂xⱼ₂
= [1, xᵢ₁², √2 xᵢ₁xᵢ₂, xᵢ₂², √2 xᵢ₁, √2 xᵢ₂]ᵀ [1, xⱼ₁², √2 xⱼ₁xⱼ₂, xⱼ₂², √2 xⱼ₁, √2 xⱼ₂]
= φ(xᵢ)ᵀφ(xⱼ), where φ(x) = [1, x₁², √2 x₁x₂, x₂², √2 x₁, √2 x₂]
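
A quick numeric check of the identity on this slide (a small sketch of my own, not lecture code): computing the kernel directly and via the explicit feature map gives the same value.

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(xi, xj) = (1 + xi^T xj)^2 in 2-D."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi, xj = np.array([0.7, -1.3]), np.array([2.0, 0.5])
k_direct = (1 + xi @ xj) ** 2
k_mapped = phi(xi) @ phi(xj)
print(np.isclose(k_direct, k_mapped))  # True
```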

  29. Examples of General Purpose Kernel Functions • Linear: K(xᵢ, xⱼ) = xᵢᵀxⱼ • Polynomial of power p: K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)ᵖ • Gaussian (radial-basis function network): K(xᵢ, xⱼ) = exp(-||xᵢ - xⱼ||² / (2σ²))
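
These three kernels translate directly into code; the sketch below (NumPy, my own illustration) mirrors the formulas on the slide.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, p=2):
    return (1 + xi @ xj) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))
```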

  30. SVMs for object recognition 1. Define your representation for each example. 2. Select a kernel function. 3. Compute pairwise kernel values between labeled examples, identify support vectors. 4. Compute kernel values between new inputs and support vectors to classify.
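
As a hedged end-to-end sketch of these four steps (scikit-learn and the toy data below are my stand-ins; extract_features is a hypothetical placeholder for whatever image representation you choose in step 1):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def extract_features(image):
    # placeholder for step 1: flatten and L2-normalize the pixels
    v = np.asarray(image, dtype=float).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

# toy stand-ins for labeled training images and their labels
train_images = rng.random((40, 16, 16))
train_labels = rng.integers(0, 2, size=40)

# Steps 1-3: represent the labeled examples, pick a kernel, fit; the support
# vectors are identified as part of solving the optimization problem.
X_train = np.stack([extract_features(im) for im in train_images])
clf = SVC(kernel="rbf", gamma="scale").fit(X_train, train_labels)

# Step 4: kernel values between a new input and the support vectors classify it.
test_image = rng.random((16, 16))
print(clf.predict([extract_features(test_image)]))
```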

  31. Example: learning gender with SVMs Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture 2000.

  32. Face alignment processing Processed faces Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

  33. Learning gender with SVMs • Training examples: – 1044 males – 713 females • Experiment with various kernels, select Gaussian RBF
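
One plausible way to run the "experiment with various kernels" step is cross-validated comparison; the sketch below uses scikit-learn and synthetic stand-in data (both my assumptions, not the paper's protocol or its face vectors).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((200, 64))            # stand-in for aligned face feature vectors
y = rng.integers(0, 2, size=200)     # stand-in for male/female labels

for kernel in ["linear", "poly", "rbf"]:
    scores = cross_val_score(SVC(kernel=kernel, gamma="scale"), X, y, cv=5)
    print(kernel, scores.mean())
```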

  34. Support Faces Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

  35. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

  36. Gender perception experiment: How well can humans do? • Subjects: – 30 people (22 male, 8 female) – Ages mid-20’s to mid-40’s • Test data: – 254 face images (6 males, 4 females) – Low res and high res versions • Task: – Classify as male or female, forced choice – No time limit Moghaddam and Yang, Face & Gesture 2000.

  37. Gender perception experiment: How well can humans do? (Figure: human error rates.) Moghaddam and Yang, Face & Gesture 2000.

  38. Human vs. Machine • SVMs perform better than any single human test subject.

  39. Hardest examples for humans Moghaddam and Yang, Face & Gesture 2000.

  40. Summary: SVM classifiers • Discriminative classifier • Effective for high-dimensional data • Flexibility/modularity due to kernel choice • Very good performance in practice, widely used in vision applications
