  1. Joo Hyun Kim Visual Recognition and Search March 7, 2008

  2. Outline
  - Introduction
  - Basics of co-training and how it works
  - Datasets
  - Experiment results
  - Conclusion
  March 7, 2008. Utilizing text captions to classify images.

  3. Images with text captions

  4. Introduction
  - Images often come with text captions
  - Large amounts of unlabeled data are available on the internet

  5. Introduction: Motivation
  - How can we use text captions for visual object recognition?
  - Use text captions and image contents as two separate, redundant views
  - Use many unlabeled training examples with text captions to improve classification accuracy

  6. Introduction: Goal
  - Exploit a multi-modal representation (text captions and image contents) and unlabeled data (usually easily available): co-training
  - Learn more accurate image classifiers than standard supervised learning by using abundant unlabeled data

  7. Outline
  - Introduction
  - Basics of co-training and how it works
  - Datasets
  - Experiment results
  - Conclusion

  8. Co-training
  - First proposed by Blum and Mitchell (1998)
  - A semi-supervised learning paradigm that exploits two distinct, redundant views
  - The features of the dataset can be divided into two sets:
    - The instance space: X = X1 × X2
    - Each example: x = (x1, x2)
  - Proven effective in several domains:
    - Web page classification (content and hyperlinks)
    - E-mail classification (header and body)

  9. Two Assumptions of Co-training
  - The instance distribution D is compatible with the target function f = (f1, f2): each set of features is by itself sufficient to classify examples.
  - The features in one set are conditionally independent of the features in the other set, given the class: given its class, one view of an example is as informative as that of a randomly drawn document of the same class.

  10. How Co-training Works: Training Process
  [Diagram: two classifiers are trained, one per feature view, on the initially labeled instances; each classifier then labels unlabeled instances (+/-), and both classifiers are retrained on the augmented labeled set.]

  11. How Co-training Works: Testing Process
  [Diagram: both classifiers predict on each test instance; the prediction of the more confident classifier is taken as the output.]
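This confidence-based combination can be sketched in Python (a minimal illustration with made-up predictions and confidence scores; the actual system compares the per-view SVM confidences):

```python
def combine(preds1, confs1, preds2, confs2):
    """For each test instance, output the prediction of whichever view's
    classifier reports the higher confidence (the rule shown on the slide)."""
    return [p1 if c1 >= c2 else p2
            for p1, c1, p2, c2 in zip(preds1, confs1, preds2, confs2)]

# Image classifier vs. text classifier on two test instances (toy numbers):
# the image view wins on the first instance, the text view on the second.
labels = combine([1, 0], [0.9, 0.2], [0, 1], [0.3, 0.8])
```

On ties, this version arbitrarily prefers the first view.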

  12. Why Does Co-training Work?
  - Intuitive explanation: one classifier finds an easily classified example (one it labels with high confidence) that may be difficult for the other classifier
  - The two classifiers thus provide useful information to each other, improving overall accuracy

  13. Simple Example on Image Classification
  [Table: initially labeled instances: an image of a red apple with text "red apple", class Apple; an image of a pear with text "Korean pear", class Pear. A new unlabeled instance (an image of a green apple with text "green apple") is labeled Apple.]

  14. Co-training Algorithm
  Given labeled data L and unlabeled data U:
  - Create a pool U' of examples chosen at random from U
  - Loop for k iterations:
    - Train classifier C1 on (the first view of) L
    - Train classifier C2 on (the second view of) L
    - Allow C1 to label p positive and n negative examples from U'
    - Allow C2 to label p positive and n negative examples from U'
    - Add these self-labeled examples to L
    - Randomly choose 2p + 2n examples from U to replenish U'
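A minimal Python sketch of this loop, with a hypothetical threshold learner standing in for the per-view classifiers (the experiments later use SVMs), and with each view labeling its single most confident example per round instead of p positive and n negative:

```python
def fit_view(view, labeled):
    # Toy one-view learner (a stand-in for a real classifier): threshold at
    # the midpoint between the two class means; confidence is the distance
    # of the example's feature from that threshold.
    pos = [x[view] for x, y in labeled if y == 1]
    neg = [x[view] for x, y in labeled if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return (lambda x: int(x[view] > mid)), (lambda x: abs(x[view] - mid))

def co_train(L, U, rounds=5):
    # Each round, each view self-labels its single most confident unlabeled
    # example and adds it to the shared labeled set L.
    L, U = list(L), list(U)
    for _ in range(rounds):
        for view in (0, 1):
            if not U:
                return L
            predict, confidence = fit_view(view, L)
            best = max(U, key=confidence)
            U.remove(best)
            L.append((best, predict(best)))
    return L

# Two-view toy data: in both views, class 1 has large values, class 0 small.
L0 = [((5.0, 4.5), 1), ((1.0, 0.5), 0)]
U0 = [(6.0, 5.5), (0.5, 1.0), (4.8, 5.2), (1.2, 0.8)]
grown = co_train(L0, U0)
```

Each self-labeled example retrains both views, which is what lets a confident view teach the other.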

  15. Modified Algorithm in the Experiment
  Inputs: a labeled example set L and an unlabeled example set U, represented by two feature sets: f1 for the image view and f2 for the text view.
  - Train image classifier C1 on the f1 portion of L and text classifier C2 on the f2 portion of L
  - Loop until |U| = 0:
    1. Compute the predictions and confidences of both classifiers for all instances in U
    2. For each view, choose the m unlabeled instances for which that view's classifier has the highest confidence. For each such instance, if the confidence is below the view's threshold, ignore the instance and stop labeling with this view; otherwise label the instance and add it to L
    3. Retrain both classifiers on the augmented L
  Outputs: two classifiers, C1 and C2, whose predictions are combined to classify new test instances. A test instance is labeled with the class predicted by the more confident classifier.
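The modified loop can be sketched as follows; `toy_fit` is a hypothetical stand-in for retraining the per-view SVMs, and the data and thresholds are made up for illustration:

```python
def co_train_threshold(L, U, fit, m=2, thresholds=(0.5, 0.5)):
    # Each iteration, each view labels up to its m most confident unlabeled
    # instances, stopping as soon as confidence drops below that view's
    # threshold; the loop ends when U is empty or no view can label anything.
    L, U = list(L), list(U)
    while U:
        labeled_any = False
        for view in (0, 1):
            predict, confidence = fit(view, L)
            for x in sorted(U, key=confidence, reverse=True)[:m]:
                if confidence(x) < thresholds[view]:
                    break  # below threshold: stop labeling with this view
                U.remove(x)
                L.append((x, predict(x)))
                labeled_any = True
        if not labeled_any:
            break  # neither view is confident enough on what remains
    return L

def toy_fit(view, labeled):
    # Hypothetical stand-in for an SVM: threshold midway between class means.
    pos = [x[view] for x, y in labeled if y == 1]
    neg = [x[view] for x, y in labeled if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return (lambda x: int(x[view] > mid)), (lambda x: abs(x[view] - mid))

L0 = [((5.0, 5.0), 1), ((1.0, 1.0), 0)]
U0 = [(6.0, 6.0), (0.4, 0.4), (3.1, 3.0)]  # the last sits near the boundary
grown = co_train_threshold(L0, U0, toy_fit, m=2, thresholds=(1.0, 1.0))
```

Unlike the basic algorithm, this variant can terminate with instances still unlabeled: the near-boundary example above never clears either view's threshold.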

  16. Outline
  - Introduction
  - Basics of co-training and how it works
  - Datasets
  - Experiment results
  - Conclusion

  17. Datasets Used
  - IsraelImages dataset (www.israelimages.com)
    - Classes: Desert and Trees
    - 362 images in total
    - 25 image features and 363 text features
    - Image contents are relatively ambiguous and general
    - Text captions are natural language and do not contain particular words that directly name the class
  - Flickr dataset (www.flickr.com)
    - Images crawled from the web with text captions and tags
    - Class pairs: Cars vs. Motorbike, and Calculator vs. Motorbike
    - 907 images in total (Cars and Motorbike), 953 images (Calculator and Motorbike)
    - Image contents are more distinguishable between classes
    - Texts usually contain particular tags that name the class

  18. Image Features: IsraelImages
  [Pipeline: each image is divided into a 4-by-6 grid; RGB and Lab color representations and Gabor texture filter responses yield per-cell μ, σ, and skewness statistics, forming 30-dimensional vectors; k-means clustering with k = 25 maps these to the 25-dimensional image features.]
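The per-cell color-moment part of this pipeline might look like the following sketch (the Gabor texture responses and the final k-means quantization to 25 dimensions are omitted; the grid size follows the slide):

```python
import numpy as np

def color_moments(img, rows=4, cols=6):
    """Mean, standard deviation, and skewness of each channel, computed per
    cell of a rows-by-cols grid, as a single flat feature vector."""
    h, w, c = img.shape
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = img[i*h//rows:(i+1)*h//rows, j*w//cols:(j+1)*w//cols]
            px = cell.reshape(-1, c).astype(float)
            mu = px.mean(axis=0)
            sigma = px.std(axis=0)
            # Guard against zero variance in flat cells.
            skew = ((px - mu) ** 3).mean(axis=0) / np.where(sigma > 0, sigma, 1.0) ** 3
            feats.extend(np.concatenate([mu, sigma, skew]))
    return np.array(feats)

# A flat dummy image: 24 cells x 3 channels x 3 moments = 216 features.
img = np.zeros((8, 12, 3), dtype=np.uint8)
f = color_moments(img)
```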

  19. Image Features: Flickr
  [Pipeline: SIFT features are extracted from the images downloaded from flickr.com and quantized by k-means clustering with k = 75, giving 75-dimensional image features.]
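Given precomputed local descriptors and a learned codebook, the quantization step can be sketched as a bag-of-visual-words histogram (SIFT extraction itself is omitted; the 2-D descriptors and k = 2 codebook below are toy stand-ins for 128-D SIFT and the talk's k = 75):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    # Assign each local descriptor to its nearest codebook center and build
    # a normalized histogram of the assignments.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.1, 0.0], [9.0, 10.0], [10.0, 9.0]])
h = bovw_histogram(descs, codebook)
```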

  20. Text Features
  [Pipeline: natural captions and tags (from JPEG IPTC info) are converted to a bag-of-words representation after filtering out stop words and stemming.]
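A minimal sketch of this text pipeline; the suffix-stripping `stem` below is a crude stand-in for a real stemmer (e.g. Porter), and the stop-word list is abbreviated:

```python
import re
from collections import Counter

STOP = {"a", "an", "and", "in", "of", "on", "the"}

def stem(word):
    # Crude suffix stripping in place of a real stemmer.
    for suf in ("ing", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[:-len(suf)]
    return word

def bag_of_words(caption):
    """Caption/tag text -> bag-of-words counts after lowercasing,
    stop-word removal, and stemming."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    return Counter(stem(t) for t in tokens if t not in STOP)

bow = bag_of_words("Motorbikes race in the streets")
```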

  21. An Example: IsraelImages dataset
  [Two example images: one of class Desert, one of class Trees.]

  22. An Example: Flickr dataset
  - Caption: "Arguably one of the most energy efficient pocket calculators ever made"; tags: pocket, calculator, casio, macro, 2902, ddmm, daily, … (class: Calculator)
  - Caption: "2008 Paeroa Battle of the Streets"; tags: elm-pbos4, Paeroa battle of the streets, paeroa, motocycle, motorbike, race, racing, speed, … (class: Motorbike)

  23. Outline
  - Introduction
  - Basics of co-training and how it works
  - Datasets
  - Experiment results
  - Conclusion

  24. Experiments
  - Using WEKA (Witten, 2000), experiments are conducted with 10-fold cross-validation, 1 run
  - The co-training experiment uses SVMs as the base classifiers for both the image and text views
  - Co-training is compared with supervised SVM classifiers trained on the concatenated features, on image features only, and on text features only
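For clarity, the 10-fold split can be sketched as follows (WEKA performs this internally; the strided index scheme here is an assumption for illustration):

```python
def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation over
    n examples; each example appears in exactly one test fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(20))
```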

  25. Experiments
  - The datasets are manually labeled
  - Graphs plot classification accuracy against the number of labeled examples
  - Labeled examples are picked from the training set; the remaining examples are used as unlabeled examples

  26. Results: IsraelImages dataset
  [Graph: co-training vs. supervised SVM.]

  27. Results: Flickr dataset, Cars & Motorbike
  [Graph: co-training vs. supervised SVM.]

  28. Results: Flickr dataset, Calculator & Motorbike
  [Graph: co-training vs. supervised SVM.]

  29. Discussion
  - Why does only the IsraelImages set show improvement with co-training?
    - The image and text classifiers are each sufficient to classify
    - The two classifiers complement each other well
  - Why does the Flickr set show worse performance?
    - The text classifier is too good on its own (the tag information is nearly as good as the actual labels)
    - The image classifier actually harms the overall classification

  30. Outline
  - Introduction
  - Basics of co-training and how it works
  - Datasets
  - Experiment results
  - Conclusion

  31. Conclusion
  - Using both image contents and textual data helps image classification
  - Exploiting redundant, separate views improves classification accuracy for visual object recognition
  - Using unlabeled data improves on purely supervised learning
  - For co-training to be effective, its two assumptions (compatibility and conditional independence) should hold
