 
9/19/2017

Recognition wrap-up & Self-supervised representation learning
Kristen Grauman
UT-Austin
Wed Sept 20, 2017

Announcements
• Assignment 1 due Sept 22 11:59 pm on Canvas
• Hw2 is out and due Wed Oct 11
• Next week: CNN hands-on tutorial with Ruohan Gao and Tushar Nagarajan
• Bring laptop
• Set up your TACC portal account in advance

Last time: Three landmark case studies for image classification
• Boosting + face detection: Viola & Jones
• SVM + person detection: e.g., Dalal & Triggs
• NN + scene Gist classification: e.g., Hays & Efros
Slide credit: Kristen Grauman

Outline
• Last time
    • Spatial verification for instance recognition
    • Recognizing categories
• Today
    • Wrap up on categories/classifiers
    • Self-supervised learning
    • External papers & assigned paper discussion
        • Shuffle and Learn (Yu-Chuan)
        • Colorization (Keivaun)
        • Curious Robot (Ginevra)
    • Experiment
        • Network dissection (Thomas and Wonjoon)

Last time
• Intro to categorization problem
• Object categorization as discriminative classification
    • Boosting + fast face detection example
    • Nearest neighbors + scene recognition example
    • Support vector machines + pedestrian detection example
    • Pyramid match kernels, spatial pyramid match
    • Convolutional neural networks + ImageNet example

Linear classifiers
• Find linear function to separate positive and negative examples
    $x_i$ positive: $x_i \cdot w + b \geq 0$
    $x_i$ negative: $x_i \cdot w + b < 0$
Which line is best?

Support Vector Machines (SVMs)
• Discriminative classifier based on optimal separating hyperplane
• Maximize the margin between the positive and negative training examples

Support vector machines
• Want line that maximizes the margin.
    $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \geq 1$
    $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \leq -1$
    For support vectors, $x_i \cdot w + b = \pm 1$
• Distance between point and line: $\frac{|x_i \cdot w + b|}{\|w\|}$
• For support vectors, $w^T x + b = \pm 1$, so
    $M = \left| \frac{1}{\|w\|} - \frac{-1}{\|w\|} \right| = \frac{2}{\|w\|}$
• Therefore, the margin is $2 / \|w\|$
[Figure: separating line with support vectors and margin $M$]
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line
1. Maximize margin $2 / \|w\|$
2. Correctly classify all training data points:
    $x_i$ positive ($y_i = 1$): $x_i \cdot w + b \geq 1$
    $x_i$ negative ($y_i = -1$): $x_i \cdot w + b \leq -1$
Quadratic optimization problem:
    Minimize $\frac{1}{2} w^T w$
    Subject to $y_i (w \cdot x_i + b) \geq 1$
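
The margin and constraint formulas above can be checked numerically. The following is a minimal sketch, not a solver: it assumes a candidate `w` and `b` are already given, and the toy points, labels, and function names are hypothetical values chosen for illustration.

```python
import math

def decision_value(w, b, x):
    """Compute w . x + b for one point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Margin of the separating line: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def satisfies_constraints(w, b, points, labels, tol=1e-9):
    """Check y_i (w . x_i + b) >= 1 for every training point."""
    return all(y * decision_value(w, b, x) >= 1.0 - tol
               for x, y in zip(points, labels))

# Hypothetical toy data: two positives, two negatives.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Hypothetical candidate line; (2, 2) and (0, 0) lie exactly on the margin.
w, b = (0.5, 0.5), -1.0

print(satisfies_constraints(w, b, points, labels))
print(margin_width(w))
```

Here the margin equals the distance between the two points that meet the constraint with equality, which is what one would expect of support vectors on opposite sides of the line.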

Finding the maximum margin line
• Solution: $w = \sum_i \alpha_i y_i x_i$ (learned weight $\alpha_i$, support vector $x_i$)
    $b = y_i - w \cdot x_i$ (for any support vector)
    $w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b$
• Classification function:
    $f(x) = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\left( \sum_i \alpha_i y_i \, x_i \cdot x + b \right)$
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Person detection with HoG's & linear SVM's
• Map each grid cell in the input window to a histogram counting the gradients per orientation.
• Train a linear SVM using training set of pedestrian vs. non-pedestrian windows.
Dalal & Triggs, CVPR 2005
Code available: http://pascal.inrialpes.fr/soft/olt/

Non-linear SVMs
• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about … mapping data to a higher-dimensional space.
[Figure: points along the $x$ axis; the same points plotted against $x$ and $x^2$]

Nonlinear SVMs
• The kernel trick: instead of explicitly computing the lifting transformation $\varphi(x)$, define a kernel function $K$ such that
    $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$
• This gives a nonlinear decision boundary in the original feature space:
    $\sum_i \alpha_i y_i K(x_i, x) + b$

Example
2-dimensional vectors $x = [x_1 \; x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$.
Need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$:
    $K(x_i, x_j) = (1 + x_i^T x_j)^2$
    $= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$
    $= [1 \;\; x_{i1}^2 \;\; \sqrt{2} x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2} x_{i1} \;\; \sqrt{2} x_{i2}]^T \, [1 \;\; x_{j1}^2 \;\; \sqrt{2} x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2} x_{j1} \;\; \sqrt{2} x_{j2}]$
    $= \varphi(x_i)^T \varphi(x_j)$,
where $\varphi(x) = [1 \;\; x_1^2 \;\; \sqrt{2} x_1 x_2 \;\; x_2^2 \;\; \sqrt{2} x_1 \;\; \sqrt{2} x_2]$
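
The identity in the example above can also be confirmed numerically. This is a small sketch under the slide's own definitions of $K$ and $\varphi$; the function names and the two test vectors are hypothetical.

```python
import math

def poly_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2 for 2-dimensional vectors."""
    dot = x[0] * z[0] + x[1] * z[1]
    return (1.0 + dot) ** 2

def phi(x):
    """Explicit lifting [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

def lifted_dot(x, z):
    """Dot product computed in the 6-dimensional lifted space."""
    return sum(a * c for a, c in zip(phi(x), phi(z)))

# Hypothetical test vectors.
x = (1.0, 2.0)
z = (3.0, -1.0)

print(poly_kernel(x, z))  # kernel evaluated in the original 2-D space
print(lifted_dot(x, z))   # same value up to floating-point rounding
```

The kernel needs one 2-D dot product, while the explicit route builds two 6-D vectors first; that saving is the point of the kernel trick.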

Examples of kernel functions
• Linear: $K(x_i, x_j) = x_i^T x_j$
• Gaussian RBF: $K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$
• Histogram intersection: $K(x_i, x_j) = \sum_k \min(x_i(k), x_j(k))$

SVMs for recognition
1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this "kernel matrix" to solve for SVM support vectors & weights.
5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output.
Kristen Grauman

Partially matching sets of features
• Optimal match: $O(m^3)$
• Greedy match: $O(m^2 \log m)$
• Pyramid match: $O(m)$
($m$ = num pts)
We introduce an approximate matching kernel that makes it practical to compare large sets of features based on their partial correspondences.
[Previous work: Indyk & Thaper, Bartal, Charikar, Agarwal & Varadarajan, …]
Kristen Grauman

What about a matching kernel?
• Local feature correspondence is a useful similarity measure for generic object categories.
Kristen Grauman

Pyramid match: main idea
• Feature space partitions serve to "match" the local descriptors within successively wider regions.
• Histogram intersection counts number of possible matches at a given partitioning.
[Figure: descriptor space]
Kristen Grauman
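
Steps 3 and 5 of "SVMs for recognition" above, together with the three kernels listed, can be sketched as follows. This is an illustrative sketch only: step 4 (solving for the support vectors and weights) is skipped, so the support vectors, labels, `alphas`, and `b` below are hypothetical values assumed to have come from some solver, and all function names are made up for this example.

```python
import math

def linear_kernel(a, c):
    return sum(p * q for p, q in zip(a, c))

def rbf_kernel(a, c, sigma=1.0):
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def hist_intersection_kernel(a, c):
    return sum(min(p, q) for p, q in zip(a, c))

def kernel_matrix(examples, kernel):
    """Step 3: pairwise kernel values between labeled examples."""
    return [[kernel(a, c) for c in examples] for a in examples]

def classify(x, support_vectors, labels, alphas, b, kernel):
    """Step 5: weighted kernel values against support vectors, then the sign.

    A score of exactly 0 is mapped to +1 here (an arbitrary choice).
    """
    score = sum(al * y * kernel(sv, x)
                for al, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1

# Hypothetical bag-of-words histograms acting as support vectors.
support_vectors = [[4, 0, 0], [0, 4, 0]]
labels = [1, -1]
alphas = [0.5, 0.5]   # assumed solver output, not computed here
b = 0.0               # assumed solver output, not computed here

gram = kernel_matrix(support_vectors, hist_intersection_kernel)
print(gram)
print(classify([3, 1, 0], support_vectors, labels, alphas, b, hist_intersection_kernel))
print(classify([0, 3, 1], support_vectors, labels, alphas, b, hist_intersection_kernel))
```

Passing the kernel as a function argument keeps the classifier unchanged when the kernel is swapped, which mirrors step 2 being a separate choice.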

Pyramid match kernel
• Optimal match: $O(m^3)$
• Pyramid match: $O(mL)$
• The kernel approximates the optimal partial matching. At each level it combines a weight that measures the difficulty of a match at that level with the number of newly matched pairs at that level.
• For similarity, weights inversely proportional to bin size (or may be learned)
• Normalize these kernel values to avoid favoring large sets
[Grauman & Darrell, ICCV 2005]
Kristen Grauman

Unordered sets of local features: No spatial layout preserved!
Too much? Too little?

Spatial pyramid match
• Make a pyramid of bag-of-words histograms.
• Provides some loose (global) spatial layout information.
• Sum over PMKs computed in image coordinate space, one per word.
• Can capture scene categories well: texture-like patterns but with some variability in the positions of all the local pieces.
[Lazebnik, Schmid & Ponce, CVPR 2006]
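
The pyramid match idea described above (histogram intersection per level, counting only newly matched pairs, weighting by inverse bin size) can be sketched for a toy case. This is a simplified illustration, not the authors' implementation: it assumes 1-D non-negative features, bin sizes that double at each level (1, 2, 4, ...), and normalization by the self-match values as one possible choice; all names and data are hypothetical.

```python
import math
from collections import Counter

def level_histogram(points, bin_size):
    """Count points per bin for one pyramid level (1-D features assumed)."""
    return Counter(int(p // bin_size) for p in points)

def intersection(h1, h2):
    """Histogram intersection: number of possible matches at this level."""
    return sum(min(count, h2[key]) for key, count in h1.items())

def pyramid_match(xs, ys, num_levels):
    """Sum over levels of (1 / bin size) * (newly matched pairs)."""
    score = 0.0
    previous = 0  # matches already found at finer levels
    for level in range(num_levels):
        bin_size = 2 ** level
        current = intersection(level_histogram(xs, bin_size),
                               level_histogram(ys, bin_size))
        score += (current - previous) / bin_size
        previous = current
    return score

def normalized_pyramid_match(xs, ys, num_levels):
    """Divide by self-match values so larger sets are not favored."""
    denom = math.sqrt(pyramid_match(xs, xs, num_levels) *
                      pyramid_match(ys, ys, num_levels))
    return pyramid_match(xs, ys, num_levels) / denom

# Hypothetical 1-D feature sets.
xs = [0.5, 3.2, 6.9]
ys = [0.7, 2.1, 7.5]

print(pyramid_match(xs, ys, 3))
print(normalized_pyramid_match(xs, ys, 3))
```

In this toy case one pair matches at the finest level and two more match once the bins double, so the coarser matches contribute with half the weight.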

Spatial pyramid match
• Can capture scene categories well: texture-like patterns but with some variability in the positions of all the local pieces.
• Sensitive to global shifts of the view
[Figure: confusion table]

SVMs: Pros and cons
• Pros
    • Kernel-based framework is very powerful, flexible
    • Often a sparse set of support vectors – compact at test time
    • Work very well in practice, even with very small training sample sizes
• Cons
    • No "direct" multi-class SVM, must combine two-class SVMs
    • Can be tricky to select best kernel function for a problem
    • Computation, memory
        – During training time, must compute matrix of kernel values for every pair of examples
        – Learning can take a very long time for large-scale problems
Adapted from Lana Lazebnik

Recall: Evolution of methods
• Hand-crafted models
    • 3D geometry
    • Hypothesize and align
• Hand-crafted features
    • Learned models
    • Data-driven
• "End-to-end" learning of features and models*,**

Traditional Image Categorization: Training phase
[Diagram: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier]
Slide credit: Jia-Bin Huang

Traditional Image Categorization: Testing phase
[Diagram, training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier]
[Diagram, testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., Outdoor)]
Slide credit: Jia-Bin Huang

Learning a Hierarchy of Feature Extractors
• Each layer of hierarchy extracts features from output of previous layer
• All the way from pixels → classifier
• Layers have the (nearly) same structure
• Train all layers jointly
[Diagram: Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier → Image/Video Labels]
Slide: Rob Fergus
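
The pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier chain above can be sketched as a forward pass. This is a rough illustration only: the slide does not specify the layer type, so a linear map followed by ReLU is assumed here, the final classifier is assumed to be a linear sign test, all weights are hypothetical hand-picked values, and joint training is not shown.

```python
def layer(inputs, weights, biases):
    """One layer (assumed form): linear map followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]

def forward(pixels, layers, classifier_w, classifier_b):
    """Feed each layer the previous layer's output, then apply a linear classifier."""
    features = pixels
    for weights, biases in layers:
        features = layer(features, weights, biases)
    score = sum(w * f for w, f in zip(classifier_w, features)) + classifier_b
    return 1 if score >= 0 else -1

# Hypothetical weights; every layer has the same structure (2 inputs, 2 outputs).
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # Layer 1
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # Layer 2
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),    # Layer 3
]
classifier_w, classifier_b = [1.0, -1.0], 0.0

print(forward([1.0, 2.0], layers, classifier_w, classifier_b))
print(forward([2.0, 1.0], layers, classifier_w, classifier_b))
```

Unlike the traditional pipeline, nothing here is a fixed hand-crafted feature: every entry of `layers` and the classifier weights would be learned together.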