Bag-of-Visual-Words 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University
What object do these parts belong to?
Some local features are very informative. An object as a collection of local features (bag-of-features): • deals well with occlusion • scale invariant • rotation invariant
(not so) crazy assumption spatial information of local features can be ignored for object recognition (i.e., verification)
CalTech6 dataset Works pretty well for image-level classification Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Bag-of-features: represent a data item (document, texture, image) as a histogram over features. An old idea (e.g., texture recognition and information retrieval).
Texture recognition: represent a texture as a histogram over a universal texton dictionary (Julesz, 1981; Mori, Belongie and Malik, 2001)
Vector Space Model
G. Salton, 'Mathematics and Information Retrieval', Journal of Documentation, 1979
Example: two newspaper snippets represented as word-count vectors over the vocabulary {Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor}:
[1 6 2 1 0 0 0 1] and [0 4 0 1 4 5 3 2]
(snippet images generated with http://www.fodey.com/generators/newspaper/snippet.asp)
A document (datapoint) is a vector of counts over each word (feature):

v_d = [ n(w_1, d), n(w_2, d), ..., n(w_T, d) ]

where n(·) counts the number of occurrences; this is just a histogram over words.

What is the similarity between two documents? Use any distance you want, but the cosine distance is fast:

d(v_i, v_j) = cos θ = (v_i · v_j) / (||v_i|| ||v_j||)
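In code, the cosine similarity between the two count vectors from the newspaper example can be sketched as follows (a minimal NumPy illustration):

```python
import numpy as np

# The two word-count vectors from the newspaper-snippet example
v_i = np.array([1, 6, 2, 1, 0, 0, 0, 1], dtype=float)
v_j = np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float)

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||)"""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(v_i, v_j))
```

Identical documents give a similarity of 1; documents with no words in common give 0.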
but not all words are created equal
TF-IDF: Term Frequency - Inverse Document Frequency

v_d = [ n(w_1, d), n(w_2, d), ..., n(w_T, d) ]

Weigh each word by a heuristic:

v_d = [ n(w_1, d) α_1, n(w_2, d) α_2, ..., n(w_T, d) α_T ]

where n(w_i, d) is the term frequency and

n(w_i, d) α_i = n(w_i, d) log( |D| / Σ_{d'} 1[w_i ∈ d'] )

is the inverse-document-frequency weighting (down-weights common terms).
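A minimal TF-IDF sketch (NumPy; the toy corpus and vocabulary indices below are made up for illustration):

```python
import numpy as np

# Toy corpus: each document is a list of word indices into a vocabulary of size T
T = 5
docs = [[0, 0, 1, 2], [1, 1, 3], [0, 2, 2, 4], [1, 4]]

def tf(doc, T):
    """Term frequency n(w_i, d) for one document d."""
    counts = np.zeros(T)
    for w in doc:
        counts[w] += 1
    return counts

# Document frequency: number of documents containing each word
df = np.zeros(T)
for doc in docs:
    for w in set(doc):
        df[w] += 1

# Inverse document frequency: log(|D| / df)
idf = np.log(len(docs) / df)

# TF-IDF weighted vector for document 0
v = tf(docs[0], T) * idf
```

Rare words get a large weight, while a word appearing in every document gets weight log(1) = 0.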
Standard BOW pipeline (for image classification)
Dictionary Learning: Learn Visual Words using clustering Encode: build Bags-of-Words (BOW) vectors for each image Classify: Train and test data using BOWs
Dictionary Learning: Learn Visual Words using clustering 1. Extract features (e.g., SIFT) from images
Dictionary Learning: Learn Visual Words using clustering 2. Learn visual dictionary (e.g., K-means clustering)
Encode: build Bags-of-Words (BOW) vectors for each image 1. Quantization: each image feature is assigned to the nearest visual word (cluster center)
Encode: build Bags-of-Words (BOW) vectors for each image 2. Histogram: count the number of visual word occurrences
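The two encoding steps, quantization and histogram, can be sketched together (NumPy; the function name `encode_bow` and the random toy data are illustrative, not from the slides):

```python
import numpy as np

def encode_bow(features, codebook):
    """Map each local feature to its nearest code vector, then histogram.

    features: (N, d) array of descriptors from one image (e.g., SIFT)
    codebook: (K, d) array of visual words (cluster centers)
    returns:  length-K normalized histogram (the BOW vector)
    """
    # Squared distance between every feature and every code vector
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                                   # 1. quantization
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                                    # 2. histogram

# Toy example with a random codebook and random "descriptors"
rng = np.random.default_rng(0)
bow = encode_bow(rng.normal(size=(100, 8)), rng.normal(size=(16, 8)))
```

Normalizing the histogram makes BOW vectors comparable across images with different numbers of detected features.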
Feature Extraction What kinds of features can we extract?
• Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
• Interest point detector
  • Csurka et al., 2004
  • Fei-Fei & Perona, 2005
  • Sivic et al., 2005
• Other methods
  • Random sampling (Vidal-Naquet & Ullman, 2002)
  • Segmentation-based patches (Barnard et al., 2003)
Detect patches [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03] → Normalize patch → Compute SIFT descriptor [Lowe '99]
Visual Vocabulary (coding and vector quantization)
Alternative perspective… visual vocabulary = code book visual word = code vector The codebook is used for quantizing features A vector quantizer takes a feature vector and maps it to the index of the nearest code vector in a codebook
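The quantizer described above can be sketched in a few lines (NumPy; `quantize` is an illustrative name):

```python
import numpy as np

def quantize(feature, codebook):
    """Return the index of the code vector nearest to `feature`."""
    d2 = ((codebook - feature) ** 2).sum(axis=1)  # squared distance to each code vector
    return int(d2.argmin())
```

For large codebooks, this linear scan is usually replaced by an approximate nearest-neighbor structure such as a tree.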
Visual vocabulary: clustering
K-means Clustering
Given k:
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat previous 2 steps until no change.
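A minimal NumPy sketch of these four steps (illustrative, not an optimized implementation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means; assumes X is an (N, d) array and k <= N."""
    rng = np.random.default_rng(seed)
    # 1. Select initial centroids at random (from the data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 2. Assign each object to the cluster with the nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 3. Compute each centroid as the mean of its assigned objects
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        # 4. Repeat until no change
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

For dictionary learning, X holds all extracted descriptors and the k centroids become the visual words.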
(Illustration: k-means on 2D points. 1. Select initial centroids at random. 2. Assign each object to the cluster with the nearest centroid. 3. Compute each centroid as the mean of the objects assigned to it; go to 2. Repeat the previous two steps until no change.)
From what data should I learn the code book? • Codebook can be learned on separate training set • Provided the training set is sufficiently representative, the codebook will be “universal”
Example visual vocabulary (Fei-Fei et al. 2005)
Example codebook: appearance codebook (source: B. Leibe)
Another codebook: appearance codebook (source: B. Leibe)
Visual vocabularies: Issues • How to choose vocabulary size? • Too small: visual words not representative of all patches • Too large: quantization artifacts, overfitting • Computational efficiency • Vocabulary trees (Nister & Stewenius, 2006)
Histogram
(Histogram of visual word counts: codewords on the x-axis, frequency on the y-axis.)
Classification
Given the bag-of-features representations of images from different classes, learn a classifier using machine learning (more on this soon)
Extension to bag-of-words models
All of these images have the same color histogram! How can we encode the spatial layout?
Spatial Pyramid representation: level 0, level 1, level 2. Lazebnik, Schmid & Ponce (CVPR 2006)
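A sketch of the spatial pyramid encoding (NumPy; this version omits the per-level weighting used by Lazebnik et al. and assumes feature coordinates normalized to [0, 1)):

```python
import numpy as np

def spatial_pyramid(positions, words, K, levels=3):
    """Concatenate BOW histograms over a 1x1, 2x2, 4x4, ... grid.

    positions: (N, 2) feature coordinates normalized to [0, 1)
    words:     (N,) visual-word index of each feature
    K:         vocabulary size
    """
    hists = []
    for level in range(levels):
        cells = 2 ** level                       # grid is cells x cells
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)  # features falling in this cell
                hists.append(np.bincount(words[in_cell], minlength=K))
    return np.concatenate(hists)
```

With 3 levels this yields (1 + 4 + 16) K = 21K dimensions; level 0 is the plain BOW histogram, and the finer levels recover coarse spatial layout.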