Multimodal semi-supervised learning for image classification

Matthieu Guillaumin, Jakob Verbeek, Cordelia Schmid
LEAR team, INRIA Grenoble, France

Motivation and goal
- Images often come with additional textual information.
- Videos come with scripts and subtitles, ...
Matthieu Guillaumin, INRIA Grenoble

Goal of this work
- Visual object category recognition,
- Leveraging user tags available on Flickr.
Example tags: wow San Fransisco Golden Gate Bridge SBP2005 top-f50 fog SF Chronicle 96 hours

Overview of the talk
(A) Data sets and features
(B) Learning scenarios using images with tags
    (1) Supervised multimodal classification
    (2) Multimodal semi-supervised scenario
    (3) Weakly supervised learning

Data sets of images with tags
- PASCAL VOC’07: ≈ 10000 images, 804 Flickr tags, 20 classes.
    Example 1: Flickr tags: india. Class labels: cow.
    Example 2: Flickr tags: aviation, airplane, airport. Class labels: aeroplane.
- MIR Flickr: 25000 images, 457 Flickr tags, 38 classes.
    Example 1: Flickr tags: desert, nature, landscape, sky. Class labels: clouds, plant life, sky, tree.
    Example 2: Flickr tags: rose, pink. Class labels: flower, plant life.

Flickr tags as textual features
- Restrict to the most frequent tags.
[Figure: PASCAL VOC’07 tags, tag frequency versus sorted tag index, log-log axes]
- Binary vector of tag presence/absence.
- Linear kernel counts the number of shared tags.
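The tag representation on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy images, the helper names `frequent_tags` and `tag_matrix`, and the frequency threshold `min_count` are all assumptions made for the example.

```python
from collections import Counter

import numpy as np

def frequent_tags(image_tags, min_count):
    """Vocabulary of tags that appear on at least min_count images."""
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    return sorted(tag for tag, c in counts.items() if c >= min_count)

def tag_matrix(image_tags, vocabulary):
    """Binary presence/absence matrix: one row per image, one column per tag."""
    index = {tag: j for j, tag in enumerate(vocabulary)}
    T = np.zeros((len(image_tags), len(vocabulary)), dtype=int)
    for i, tags in enumerate(image_tags):
        for tag in set(tags):
            if tag in index:  # tags outside the vocabulary are ignored
                T[i, index[tag]] = 1
    return T

# Hypothetical toy data, loosely modeled on the example tags in the slides.
image_tags = [
    ["dog", "puppy", "pet"],
    ["dog", "pet", "cute"],
    ["train", "locomotive"],
    ["train", "locomotive", "puppy"],
]

vocabulary = frequent_tags(image_tags, min_count=2)  # drops the rare tag "cute"
T = tag_matrix(image_tags, vocabulary)
K_tags = T @ T.T  # linear kernel: K_tags[i, j] = number of tags shared by i and j
```

Because the vectors are binary, the dot product of two rows is exactly the size of the intersection of the two tag sets, and the diagonal holds each image's number of in-vocabulary tags.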

Combination of several visual features
RBF kernel on the average distance between 15 image representations:
- Bag-of-features histograms:
    - Harris interest points and dense grid,
    - SIFT [Lowe, 2004] and Hue [van de Weijer & Schmid, 2006],
    - K-means quantization.
- Color histograms:
    - RGB, HSV and Lab colorspaces,
    - 16 bins per channel.
- GIST [Oliva & Torralba, 2001].
- 2 spatial layouts:
    - Global,
    - 3 horizontal regions [Lazebnik et al., 2006],
    - Only global for GIST.
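A rough sketch of an RBF kernel built on an average of distances follows. The slide does not state which distance is used per representation, how the distances are normalized, or how the bandwidth is set, so the L1 distance, the per-representation mean normalization, and the mean-distance bandwidth below are assumptions, as are the random stand-in descriptors.

```python
import numpy as np

def pairwise_l1(H):
    """L1 distance between every pair of rows of H (one descriptor per image)."""
    return np.abs(H[:, None, :] - H[None, :, :]).sum(axis=2)

def rbf_on_average_distance(distance_matrices):
    """RBF kernel computed on the average of several distance matrices."""
    # Assumption: scale each distance by its mean so that representations
    # with large raw distances do not dominate the average.
    normalized = [D / D.mean() for D in distance_matrices]
    average = np.mean(normalized, axis=0)
    # Assumption: the bandwidth is the mean of the averaged distances.
    bandwidth = average.mean()
    return np.exp(-average / bandwidth)

# Hypothetical stand-ins for three of the 15 representations of 6 images.
rng = np.random.default_rng(0)
representations = [rng.random((6, dim)) for dim in (16, 48, 100)]
distances = [pairwise_l1(H) for H in representations]
K_visual = rbf_on_average_distance(distances)
```

Averaging distances before applying a single RBF yields one visual kernel, which keeps the later combination with the tag kernel down to two kernels.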

Learning scenarios using images with tags
(1) Supervised multimodal classification
(2) Multimodal semi-supervised scenario
(3) Weakly supervised learning

Supervised multimodal classification
- Flickr tags = additional features for classification.
- Tags are also available at test time.
- Multiple kernel learning (MKL) to combine visual and textual kernels.
[Figure: tagged training images labeled DOG (+1) or not DOG (−1), and a tagged test image to classify (DOG?). Example tags in the figure include greyhound, rottweiler, puppy, canine, computer, monitor, locomotive and yacht.]
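To make the idea of combining a visual and a textual kernel concrete, here is a small sketch. It does not implement MKL: the kernel weights come from kernel-target alignment, a simpler heuristic swapped in for the learned weights, and the classifier is regularized least squares on ±1 labels rather than an SVM. The toy data and all function names are hypothetical.

```python
import numpy as np

def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(kernels, y):
    """Convex combination with weights proportional to the alignment.

    Stand-in for MKL, which would learn the weights jointly with the classifier.
    """
    a = np.array([max(alignment(K, y), 0.0) for K in kernels])
    weights = a / a.sum()
    return weights, sum(w * K for w, K in zip(weights, kernels))

def fit_least_squares(K, y, reg=1.0):
    """Regularized least squares on +/-1 labels (stand-in for the SVM solver)."""
    return np.linalg.solve(K + reg * np.eye(len(y)), y)

# Hypothetical toy data: 10 positive and 10 negative images.
rng = np.random.default_rng(0)
y = np.array([1.0] * 10 + [-1.0] * 10)
visual = rng.normal(size=(20, 5)) + 2.0 * y[:, None]
tag_prob = np.where(y[:, None] > 0, [0.7, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.7])
tags = (rng.random((20, 4)) < tag_prob).astype(float)

K_visual = visual @ visual.T   # linear kernel on visual features
K_tags = tags @ tags.T         # linear kernel on binary tag vectors
weights, K = combine_kernels([K_visual, K_tags], y)
alpha = fit_least_squares(K, y)
scores = K @ alpha             # training scores; the sign gives the prediction
accuracy = np.mean(np.sign(scores) == y)
```

At test time the same weights would be applied to the test-versus-train kernels, which is only possible in this scenario because tags are available for the test images too.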

Results of multimodal classification on PASCAL VOC 2007
[Figure: Average Precision per class on PASCAL VOC’07 for the tags, image, and image+tags classifiers]
- Tags (0.43) < Image (0.53) < Image+tags (0.67).
- Winner of PASCAL VOC’07: 0.59.
- Similar observation for MIR Flickr.
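For readers unfamiliar with the metric, Average Precision for one class can be sketched as below. This is the plain non-interpolated definition; the slides do not say which variant produced the reported numbers, and the official PASCAL VOC’07 protocol uses an 11-point interpolated version, so values from this sketch may differ slightly. The scores and labels are made up.

```python
import numpy as np

def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels)[order] > 0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

# Hypothetical classifier scores for four test images of one class.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, -1, 1, -1]
ap = average_precision(scores, labels)  # positives at ranks 1 and 3
```

Mean AP is then the average of this quantity over the classes of the data set.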

Multimodal semi-supervised scenario
- Large pool of additional unlabeled images with tags.
- Tags NOT available at test time: visual categorization.
[Figure: tagged training images labeled DOG (+1) or not DOG (−1), tagged unlabeled images, and an untagged test image to classify (DOG?)]

Three-step learning process
In a nutshell, predict labels for the unlabeled images:
(1) Train an MKL classifier on labeled images and tags.
(2) Score the unlabeled data.
(3) Train an image-only classifier. Two options:
    (a) SVM: use the unlabeled data with labels taken from the sign of the MKL score. Using only the sign discards the confidence of the classification.
    (b) LSR: least-squares regression of the MKL scores using the visual kernel, regularized using a KPCA projection.
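The three steps can be sketched end to end on toy data. Several parts are assumptions rather than the authors' method: an unweighted sum of linear kernels with a regularized least-squares classifier stands in for MKL, the regression is fit on the scores of both labeled and unlabeled images, and the number of KPCA components (`d=5`) is arbitrary. All data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_split(n):
    """Hypothetical toy data: labels, visual features, binary tag vectors."""
    y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
    visual = rng.normal(size=(n, 10)) + 0.5 * y[:, None]
    p_tag = np.where(y[:, None] > 0, [0.6] * 4 + [0.1] * 4, [0.1] * 4 + [0.6] * 4)
    tags = (rng.random((n, 8)) < p_tag).astype(float)
    return y, visual, tags

def kpca_fit(K, d):
    """Kernel PCA on a training kernel: centering statistics and top-d basis."""
    col_mean, total_mean = K.mean(axis=0), K.mean()
    Kc = K - col_mean[None, :] - col_mean[:, None] + total_mean
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:d]  # assumes the top-d eigenvalues are positive
    return col_mean, total_mean, vecs[:, top] / np.sqrt(vals[top])

def kpca_project(K_new, model):
    """Project images given their kernel values against the training images."""
    col_mean, total_mean, A = model
    Kc = K_new - col_mean[None, :] - K_new.mean(axis=1, keepdims=True) + total_mean
    return Kc @ A

y_lab, V_lab, T_lab = make_split(20)      # labeled images with tags
_, V_unl, T_unl = make_split(100)         # unlabeled images with tags
y_test, V_test, _ = make_split(30)        # test images: tags are not used

# Step 1: multimodal classifier on labeled data (stand-in for MKL).
K_lab = V_lab @ V_lab.T + T_lab @ T_lab.T
alpha = np.linalg.solve(K_lab + np.eye(len(y_lab)), y_lab)

# Step 2: score all training images with the multimodal classifier.
V_train, T_train = np.vstack([V_lab, V_unl]), np.vstack([T_lab, T_unl])
scores = (V_train @ V_lab.T + T_train @ T_lab.T) @ alpha

# Step 3, option (a): hard labels for a visual-only SVM (confidence is lost).
pseudo_labels = np.sign(scores[len(y_lab):])

# Step 3, option (b): regress the scores on a KPCA projection of the visual kernel.
model = kpca_fit(V_train @ V_train.T, d=5)
Z = kpca_project(V_train @ V_train.T, model)
design = np.hstack([Z, np.ones((len(Z), 1))])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)

# Visual-only prediction on test images.
Z_test = kpca_project(V_test @ V_train.T, model)
test_scores = np.hstack([Z_test, np.ones((len(Z_test), 1))]) @ coef
```

Restricting the regression to a few KPCA components acts as the regularizer: the visual-only regressor can only reproduce the part of the multimodal scores that lies in the leading directions of the visual kernel.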

Experimental comparison
Baselines:
(1) Supervised, image-only: SVM,
(2) Semi-supervised, image-only: SVM+SVM,
(3) Semi-supervised, multimodal: Co-training, with SVM on images and SVM on tags [Blum & Mitchell, 98].
Our three-step learning approach (semi-supervised, multimodal):
(1) MKL learned on labeled images with tags, followed by visual-only SVM trained on labeled and unlabeled images: MKL+SVM,
(2) MKL, followed by LSR: MKL+LSR.

Results of semi-supervised learning
[Figure: mean AP versus number of labeled training examples (40, 100, 200) on PASCAL VOC’07 and MIR Flickr, for SVM, SVM+SVM, Co-training, MKL+SVM and MKL+LSR]
- SVM+SVM is worse than the baseline.
- With little supervision, MKL+LSR is significantly better.
- With more supervision, differences shrink.

Weakly supervised scenario
- For learning: no manual annotation, only Flickr tags used as labels; the other tags are used as additional features.
- For evaluation: ground-truth labels.
[Figure: tagged training images without manual labels, and an untagged test image to classify (DOG?)]

Weakly supervised setting
Tags are noisy annotations:
- Tag presence is relatively clean (82.0% precision),
- Tag absence is relatively uninformative (17.8% recall).
Our approach, modified:
(1) Learn a multimodal MKL with tag annotations,
(2) Rank training images and remove the images that yield the highest MKL scores but do not have the tag,
(3) Fit LSR.
Baseline: visual-only SVM learned on images with tag annotations.
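The removal step can be sketched as follows, assuming the multimodal scores are already computed. The function name `remove_suspect_negatives`, the scores, and the number of removed images are hypothetical; the results slide varies the number of removed negatives rather than fixing it.

```python
import numpy as np

def remove_suspect_negatives(scores, has_tag, n_remove):
    """Mask of images to keep after dropping the n_remove highest-scoring
    images that do not carry the tag (likely missing-tag positives)."""
    scores = np.asarray(scores, dtype=float)
    has_tag = np.asarray(has_tag, dtype=bool)
    negatives = np.flatnonzero(~has_tag)
    ranked = negatives[np.argsort(-scores[negatives], kind="stable")]
    keep = np.ones(len(scores), dtype=bool)
    keep[ranked[:n_remove]] = False
    return keep

# Hypothetical multimodal scores and tag annotations for six training images.
scores = [2.0, 1.5, -0.3, 0.9, -1.2, 0.1]
has_tag = [True, False, False, False, False, True]
keep = remove_suspect_negatives(scores, has_tag, n_remove=2)
# Images 1 and 3 are untagged but score highest among the negatives, so they go.
```

Tagged images are never removed, which matches the observation that tag presence is fairly reliable while tag absence is not.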

Results on 18 classes of MIR Flickr
[Figure: mean AP on 18 MIR Flickr classes versus number of removed training negatives (2000 to 10000), baseline and MKL+LSR]
On average, MKL+LSR outperforms the SVM baseline:
- SVM baseline better for 4 classes (up to +5.6%),
- MKL+LSR better for 14 classes (up to +9.8%).

Conclusion
We considered using Flickr tags for 3 scenarios:
(1) Supervised classification,
(2) Semi-supervised learning of visual classifiers,
(3) Weakly supervised learning of visual classifiers.
We proposed a three-step learning process:
(1) Training of a multimodal classifier on labeled data,
(2) Classification of the unlabeled data,
(3) Regression of the multimodal classifier.
Our multimodal approach using Flickr tags improves over:
- Visual-only SVM on all three scenarios,
- Co-training for semi-supervised learning.

