Lecture: Visual Bag of Words, Juan Carlos Niebles and Ranjay Krishna (PowerPoint presentation)

  1. Lecture: Visual Bag of Words. Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab, Stanford University, 07-Nov-2019

  2. CS 131 Roadmap: Pixels, Segments, Images, Videos, Web. Topics: convolutions, edges, descriptors, resizing, segmentation, clustering, recognition, detection, machine learning, motion, tracking, neural networks, convolutional neural networks

  3. What we will learn today • Visual bag of words (BoW) • Spatial Pyramid Matching • Naïve Bayes

  5. Object / Bag of ‘words’ [figure]

  6. Origin 1: Texture Recognition [figure: example textures (from Wikipedia)]

  7. Origin 1: Texture Recognition • Texture is characterized by the repetition of basic elements, or textons (Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003)

  8. Origin 1: Texture recognition • Each texture is represented as a histogram over a universal texton dictionary [figure]

  10. Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983) [figure: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/]
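
To make "orderless" concrete, here is a minimal Python sketch that turns a document into word counts over a fixed dictionary. The dictionary, the example sentences, and the lowercase whitespace tokenizer are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def bag_of_words(text, dictionary):
    """Return word counts over a fixed dictionary, ignoring word order.

    Tokenization is a naive lowercase whitespace split (an assumption for
    illustration); words outside the dictionary are simply dropped.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

dictionary = ["freedom", "war", "economy", "people"]
doc_a = "People want freedom and the people want peace"
doc_b = "peace people want and freedom want the people"  # same words, shuffled

print(bag_of_words(doc_a, dictionary))  # [1, 0, 0, 2]
print(bag_of_words(doc_a, dictionary) == bag_of_words(doc_b, dictionary))  # True
```

Both sentences map to the same vector, which is exactly the information a bag-of-words model throws away: word order.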

  13. Bags of features for object recognition [figure: face, flowers, building] • Works pretty well for image-level classification and for recognizing object instances (Csurka et al., 2004; Willamowski et al., 2005; Grauman & Darrell, 2005; Sivic et al., 2003, 2005)

  14. Bags of features for object recognition [figure: Caltech6 dataset results, bag of features vs. parts-and-shape model]

  15. Bag of features • First, take a bunch of images, extract features, and build up a “dictionary” or “visual vocabulary”: a list of common features • Given a new image, extract features and build a histogram: for each feature, find the closest visual word in the dictionary

  19. Bag of features: outline • 1. Extract features • 2. Learn a “visual vocabulary” • 3. Quantize features using the visual vocabulary • 4. Represent images by frequencies of “visual words”
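
The four steps can be strung together in a toy end-to-end sketch using NumPy. It rests on several assumptions: the "features" are raw pixel values of 4×4 patches on a regular grid (a stand-in for a real descriptor such as SIFT), the vocabulary comes from a few iterations of plain k-means, the images are random arrays, and every function name is hypothetical.

```python
import numpy as np

def extract_features(image, patch=4, stride=4):
    """Step 1 (toy): flatten patches on a regular grid into descriptor vectors."""
    h, w = image.shape
    feats = [image[r:r + patch, c:c + patch].ravel()
             for r in range(0, h - patch + 1, stride)
             for c in range(0, w - patch + 1, stride)]
    return np.array(feats, dtype=float)

def nearest_word(features, vocab):
    """Index of the closest visual word (squared Euclidean distance) per feature."""
    d2 = ((features[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def learn_vocabulary(features, k, iters=20, seed=0):
    """Step 2: plain k-means; centers start at k randomly chosen features."""
    rng = np.random.default_rng(seed)
    vocab = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_word(features, vocab)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                vocab[j] = members.mean(axis=0)
    return vocab

def bow_histogram(image, vocab):
    """Steps 3-4: quantize each feature, then count visual-word frequencies."""
    words = nearest_word(extract_features(image), vocab)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(42)
train_images = [rng.random((32, 32)) for _ in range(5)]  # synthetic training set
all_feats = np.vstack([extract_features(im) for im in train_images])
vocab = learn_vocabulary(all_feats, k=8)
h = bow_histogram(rng.random((32, 32)), vocab)  # synthetic "new image"
print(h.shape, round(h.sum(), 6))
```

Swapping in a real detector/descriptor and a much larger k changes only `extract_features` and the `k` argument; the structure of the four steps stays the same.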

  22. 1. Feature extraction • Regular grid – Vogel & Schiele, 2003 – Fei-Fei & Perona, 2005 • Interest point detector – Csurka et al., 2004 – Fei-Fei & Perona, 2005 – Sivic et al., 2005 • Other methods – Random sampling (Vidal-Naquet & Ullman, 2002) – Segmentation-based patches (Barnard et al., 2003)
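
The regular-grid and random-sampling strategies differ only in where patches are taken. The sketch below assumes a grayscale NumPy image and square patches with illustrative sizes; it generates top-left corners for each strategy and then cuts out the patches. An interest point detector would replace these location generators with its own detections, and the helper names are hypothetical.

```python
import numpy as np

def grid_locations(h, w, patch, stride):
    """Top-left corners of patches on a regular grid (dense sampling)."""
    return [(r, c) for r in range(0, h - patch + 1, stride)
                   for c in range(0, w - patch + 1, stride)]

def random_locations(h, w, patch, n, seed=0):
    """Top-left corners of n patches drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, h - patch + 1, size=n)  # upper bound is exclusive
    cols = rng.integers(0, w - patch + 1, size=n)
    return list(zip(rows.tolist(), cols.tolist()))

def cut_patches(image, locations, patch):
    """Stack the patches at the given corners as flattened vectors."""
    return np.array([image[r:r + patch, c:c + patch].ravel()
                     for r, c in locations])

image = np.arange(100, dtype=float).reshape(10, 10)
grid = grid_locations(10, 10, patch=4, stride=3)
print(grid)
print(cut_patches(image, grid, 4).shape)  # (9, 16)
print(cut_patches(image, random_locations(10, 10, 4, n=5), 4).shape)  # (5, 16)
```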

  25. 2. Learning the visual vocabulary [figure: extracted features are clustered; the cluster centers form the visual vocabulary] (slide credit: Josef Sivic)

  26. K-means clustering recap • Want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$: $D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \lVert x_i - m_k \rVert^2$ • Algorithm: – Randomly initialize K cluster centers – Iterate until convergence: (a) assign each data point to the nearest center; (b) recompute each cluster center as the mean of all points assigned to it
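
The two alternating steps can be written directly in NumPy. This sketch assumes the data fit in memory as an (N, d) array and initializes the centers by picking K distinct data points at random (one common reading of "randomly initialize"). It stops when no assignment changes and also returns the objective D(X, M); the function name is hypothetical.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Lloyd's k-means: returns centers M (K, d), labels (N,), and D(X, M)."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Assignment step: squared Euclidean distance to every center.
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                M[k] = X[labels == k].mean(axis=0)
    D = ((X - M[labels]) ** 2).sum()  # sum of squared distances to own center
    return M, labels, D

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
M, labels, D = kmeans(X, K=2)
print(sorted(M.tolist()), D)  # [[0.0, 0.5], [10.0, 10.5]] 1.0
```

Each step can only lower D(X, M), which is why the loop converges, though only to a local minimum; in practice k-means is often restarted from several random initializations.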

  27. From clustering to vector quantization • Clustering is a common method for learning a visual vocabulary or codebook – Unsupervised learning process – Each cluster center produced by k-means becomes a codevector – The codebook can be learned on a separate training set – Provided the training set is sufficiently representative, the codebook will be “universal” • The codebook is used for quantizing features – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook – Codebook = visual vocabulary – Codevector = visual word
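
A vector quantizer as described on the slide is a nearest-neighbor lookup into the codebook. The sketch below assumes the codebook is a (K, d) NumPy array whose row k is codevector (visual word) k; the tiny 2-D codebook and features are made up for illustration, and `quantize` is a hypothetical name.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector (row) to the index of its nearest codevector."""
    # ||f - c||^2 = ||f||^2 - 2 f.c + ||c||^2, computed for all pairs at once.
    d2 = ((features ** 2).sum(axis=1)[:, None]
          - 2.0 * features @ codebook.T
          + (codebook ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.array([[0.4, -0.2], [4.6, 5.1], [0.2, 4.0], [4.0, 4.0]])
print(quantize(features, codebook))  # [0 1 2 1]
```

Expanding the squared distance avoids building an (N, K, d) temporary array, which matters once the codebook has thousands of words.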

  28. Example visual vocabulary [figure] (Fei-Fei et al., 2005)

  29. Image patch examples of visual words [figure] (Sivic et al., 2005)

  30. Visual vocabularies: Issues • How to choose the vocabulary size? – Too small: visual words are not representative of all patches – Too large: quantization artifacts, overfitting • Computational efficiency – Vocabulary trees (Nister & Stewenius, 2006)
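
A rough count suggests why vocabulary trees help with efficiency. Assume, for illustration only (the symbols and numbers are not from the slide), a tree built by hierarchical k-means with branch factor $b$ and $L$ levels, whose leaves are the visual words. Quantizing one descriptor then compares it with only $b$ children per level instead of with every word:

$$K = b^{L}, \qquad \text{cost}_{\text{flat}} = K \ \text{distance computations}, \qquad \text{cost}_{\text{tree}} = b \cdot L \ \text{distance computations}$$

$$b = 10,\ L = 6: \qquad K = 10^{6}\ \text{words}, \qquad \text{cost}_{\text{tree}} = 10 \times 6 = 60$$

The saving comes from descending the tree greedily, so the leaf reached is not guaranteed to be the nearest word in the flat-codebook sense.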

  31. 3. Image representation [figure: histogram of frequency (y-axis) over codewords (x-axis)]
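
Once each feature has a visual-word index, the image representation is a histogram over the vocabulary. This sketch assumes the word indices are already computed and optionally L1-normalizes the counts so that images with different numbers of features are comparable; normalization is a common choice, not something the slide specifies, and `bow_vector` is a hypothetical name.

```python
import numpy as np

def bow_vector(word_ids, vocab_size, normalize=True):
    """Histogram of visual-word indices, optionally L1-normalized."""
    counts = np.bincount(word_ids, minlength=vocab_size).astype(float)
    if normalize and counts.sum() > 0:
        counts /= counts.sum()
    return counts

word_ids = np.array([2, 0, 2, 3, 2, 0])
print(bow_vector(word_ids, vocab_size=5, normalize=False))  # [2. 0. 3. 1. 0.]
print(bow_vector(word_ids, vocab_size=5))
```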

  32. Image classification • Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

  33. Uses of BoW representation • Treat as a feature vector for a standard classifier – e.g., k-nearest neighbors, support vector machine • Cluster BoW vectors over an image collection – Discover visual themes
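
As one example of treating BoW vectors as ordinary feature vectors, here is a k-nearest-neighbor classifier. The similarity measure is an assumption: it uses histogram intersection, a common choice for normalized histograms, though Euclidean distance would also work. The training histograms and labels are made up for illustration.

```python
import numpy as np
from collections import Counter

def histogram_intersection(h1, h2):
    """Similarity of two L1-normalized histograms: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

def knn_predict(query, train_hists, train_labels, k=3):
    """Majority label among the k most similar training histograms."""
    sims = np.array([histogram_intersection(query, h) for h in train_hists])
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    votes = Counter(train_labels[i] for i in top)
    return votes.most_common(1)[0][0]

train_hists = np.array([
    [0.7, 0.2, 0.1, 0.0],   # "face"
    [0.6, 0.3, 0.1, 0.0],   # "face"
    [0.1, 0.1, 0.2, 0.6],   # "building"
    [0.0, 0.2, 0.2, 0.6],   # "building"
    [0.1, 0.0, 0.3, 0.6],   # "building"
])
train_labels = ["face", "face", "building", "building", "building"]
query = np.array([0.65, 0.25, 0.1, 0.0])
print(knn_predict(query, train_hists, train_labels, k=3))  # face
```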

  34. Large-scale image matching • Bag-of-words models have been useful in matching an image to a large database of object instances [figure: 11,400 images of game covers (Caltech games dataset); “how do I find this image in the database?”]

  35. Large-scale image search • Build the database: – Extract features from the database images – Learn a vocabulary using k-means (typical k: 100,000) – Compute weights for each word – Create an inverted file mapping words → images
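
An inverted file lists, for each visual word, the database images that contain it, so a query only touches images that share at least one word with it. A minimal sketch with Python dictionaries, assuming each database image has already been reduced to a list of visual-word ids (the ids and image names below are hypothetical):

```python
from collections import defaultdict

def build_inverted_file(database):
    """Map visual word id -> {image id: number of occurrences}."""
    inverted = defaultdict(dict)
    for image_id, words in database.items():
        for w in words:
            inverted[w][image_id] = inverted[w].get(image_id, 0) + 1
    return inverted

def candidate_images(query_words, inverted):
    """Images sharing at least one visual word with the query."""
    candidates = set()
    for w in set(query_words):
        candidates.update(inverted.get(w, {}))  # adds the image ids (dict keys)
    return candidates

database = {
    "cover_a": [3, 3, 17, 42],
    "cover_b": [5, 17],
    "cover_c": [8, 9],
}
inverted = build_inverted_file(database)
print(dict(inverted[17]))                            # {'cover_a': 1, 'cover_b': 1}
print(sorted(candidate_images([17, 99], inverted)))  # ['cover_a', 'cover_b']
```

With a vocabulary of around 100,000 words, most words appear in only a small fraction of images, which keeps the candidate set small.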

  36. Weighting the words • Just as with text, some visual words are more discriminative than others: the, and, or vs. cow, AT&T, Cher • The larger the fraction of documents a word appears in, the less useful it is for matching – e.g., a word that appears in all documents does not help us
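
One standard way to turn this intuition into weights is tf-idf, borrowed from text retrieval. A word's frequency in an image is multiplied by log(N / n_w), where N is the number of database images and n_w is the number of images containing word w, so a word present in every image gets weight log(1) = 0. The sketch assumes this particular variant (term frequency normalized by the number of words in the image); the slide itself does not fix a formula, and the database below is made up.

```python
import math
from collections import Counter

def idf_weights(database):
    """idf[w] = log(N / n_w), with n_w = number of images containing word w."""
    N = len(database)
    doc_freq = Counter(w for words in database.values() for w in set(words))
    return {w: math.log(N / n) for w, n in doc_freq.items()}

def tfidf_vector(words, idf):
    """Sparse tf-idf vector: (count / total words) * idf; unknown words skipped."""
    counts = Counter(words)
    total = len(words)
    return {w: (c / total) * idf[w] for w, c in counts.items() if w in idf}

database = {
    "img1": [1, 2, 2, 7],
    "img2": [1, 3, 7],
    "img3": [1, 4, 4, 7],
}
idf = idf_weights(database)
print(idf[1], round(idf[2], 3))  # 0.0 1.099
print(tfidf_vector(database["img1"], idf))
```

Words 1 and 7 occur in every image, so they contribute nothing to matching, just like "the" or "and" in text.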

  37. Large-scale image search [figure: query image and top 6 results] • Cons: – performance degrades as the database grows
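
The ranking behind a "query image, top 6 results" display can be sketched as cosine similarity between weighted BoW vectors, accumulated through an inverted file. Assumptions: the word weights (for example, tf-idf) are already computed, vectors are stored as sparse Python dicts, and the CD-cover ids and function names are hypothetical. Each query word visits its full posting list, so per-query work grows with the number of database images containing that word.

```python
import heapq
import math
from collections import defaultdict

def l2_normalize(vec):
    """Scale a sparse {word: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else dict(vec)

def build_index(db_vectors):
    """Inverted file over normalized vectors: word -> [(image id, weight)]."""
    index = defaultdict(list)
    for image_id, vec in db_vectors.items():
        for w, v in l2_normalize(vec).items():
            index[w].append((image_id, v))
    return index

def search(query_vec, index, top_n=6):
    """Cosine-similarity ranking that only visits images sharing a word with the query."""
    scores = defaultdict(float)
    for w, q in l2_normalize(query_vec).items():
        for image_id, v in index.get(w, []):
            scores[image_id] += q * v
    return heapq.nlargest(top_n, scores.items(), key=lambda item: item[1])

db_vectors = {
    "cd_1": {10: 1.0, 20: 2.0},
    "cd_2": {20: 1.0, 30: 1.0},
    "cd_3": {40: 3.0},
}
index = build_index(db_vectors)
results = search({10: 1.0, 20: 2.0}, index, top_n=2)
print([name for name, _ in results])  # ['cd_1', 'cd_2']
```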

  38. Large-scale image search • Pros: – Works well for CD covers, movie posters – Real-time performance possible [figure: real-time retrieval from a database of 40,000 CD covers; Nister & Stewenius, Scalable Recognition with a Vocabulary Tree]

  39. Example bag-of-words matches [figure]
