
Exploiting Multimodal Data for Image Understanding
Matthieu Guillaumin
Supervised by Cordelia Schmid and Jakob Verbeek
27/09/2010

Multimodal data
- Webpages with images, videos, ...
- Videos with sound, scripts and subtitles, ...
Matthieu Guillaumin, PhD defense 2/55

Images with user tags
Leverage user tags available online or from other sources.
Tags: wow San Fransisco Golden Gate Bridge SBP2005 top-f50 fog SF Chronicle 96 hours

News images with captions
Exploit captions to identify persons, retrieve images, ...
- "An Iranian reads the last issue of the Farsi-language Nowruz in Tehran, Iran Wednesday, July 24, 2002. An appeals court on Wednesday confirmed the sentence banning Iran’s leading reformist daily Nowruz from publishing for six months and its publisher, Mohsen Mirdamadi, who is President Mohammad Khatami’s ally, from reporting for four years. Mirdamadi is head of the National Security and Foreign Policy Committee of the Iranian parliament. (AP Photo/Hasan Sarbakhshian)"
- "Chanda Rubin of the United States returns a shot during her match against Elena Dementieva of Russia at the Hong Kong Ladies Challenge January 1, 2003. Rubin beat Dementieva 6-4 6-1. (REUTERS/Bobby Yip)"

Use of multimodal data
- As additional features for classification,
- As labels for training (weak supervision),
- Or to build large collections of images automatically.

Outline
1 Introduction
2 Face verification
  Logistic discriminant metric learning
  Experiments
3 News images with captions
  Graph-based approach for face naming
  Multiple-instance metric learning
4 Images with user tags
  Nearest neighbor image auto-annotation
  Experiments
5 Multimodal classification
6 Conclusion

Visual verification
Decide whether two face images depict the same individual.


Related work
On face recognition:
- Eigenfaces [Turk and Pentland, 1991]
- Fisherfaces [Belhumeur et al., 1997]
On visual verification:
- Patch sampling + Forest + SVM [Nowak and Jurie, 2007]
- One-shot similarities [Wolf et al., 2008]
- Many low-level kernels + MKL [Pinto et al., 2009]
- "Is that you? Metric learning approaches for face identification" [Guillaumin, Verbeek and Schmid, ICCV 2009]

Mahalanobis metric learning
Make positive pairs closer than negative pairs.
[Figure: points A, B and C; the learned metric expands one direction and compresses another]
Mahalanobis metrics: $d_M(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$, where $M$ is positive semidefinite (PSD).
LMNN [Weinberger et al., 2005], ITML [Davis et al., 2007], MCML [Globerson and Roweis, 2005], ...
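
A minimal NumPy sketch of the distance $d_M$ and of a PSD check, for illustration only; the function names `mahalanobis_sq` and `is_psd` are hypothetical. Following the slide, $d_M$ is used without a square root, and $M = I$ recovers the squared L2 distance.

```python
import numpy as np


def mahalanobis_sq(x_i: np.ndarray, x_j: np.ndarray, M: np.ndarray) -> float:
    """d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j), without a square root, as on the slide."""
    diff = x_i - x_j
    return float(diff @ M @ diff)


def is_psd(M: np.ndarray, tol: float = 1e-10) -> bool:
    """A symmetric matrix is PSD iff all of its eigenvalues are >= 0 (up to tol)."""
    sym = (M + M.T) / 2
    return bool(np.all(np.linalg.eigvalsh(sym) >= -tol))


x_a = np.array([1.0, 2.0])
x_b = np.array([4.0, 6.0])

# With M = I, d_M is the squared Euclidean (L2) distance: 3^2 + 4^2.
print(mahalanobis_sq(x_a, x_b, np.eye(2)))  # 25.0

# A diagonal PSD M expands the first axis and compresses the second.
M = np.diag([4.0, 0.25])
print(mahalanobis_sq(x_a, x_b, M))  # 4*9 + 0.25*16 = 40.0
print(is_psd(M))  # True
```

Metric learning methods such as those listed above differ in how they choose $M$; the distance itself is always this quadratic form.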

Logistic discriminant metric learning (LDML)
Model the probability that $(x_i, x_j)$ have the same label as
$p_{ij} = p(y_i = y_j \mid x_i, x_j; M, b) = \sigma(b - d_M(x_i, x_j))$, where $\sigma(z) = 1/(1 + \exp(-z))$.
[Figure: $p = \sigma(b - d)$ plotted against $d$; $p = 0.5$ at $d = b$]
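
A short sketch of the LDML pair probability, assuming a given $M$ and bias $b$ (the function names and the toy values $M = I$, $b = 2$ are illustrative). It shows the behaviour of the plot: pairs closer than $b$ get $p_{ij} > 0.5$, pairs farther than $b$ get $p_{ij} < 0.5$.

```python
import numpy as np


def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def same_label_prob(x_i, x_j, M, b):
    """p_ij = sigma(b - d_M(x_i, x_j)): pairs with d < b get p > 0.5."""
    diff = x_i - x_j
    d = diff @ M @ diff
    return float(sigmoid(b - d))


x_i = np.array([0.0, 0.0])
M = np.eye(2)
b = 2.0
# d_M = 0, 2, 8 for the three partners below, so p is about 0.881, 0.5, 0.002.
for x_j in (np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])):
    print(round(same_label_prob(x_i, x_j, M, b), 3))
```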

Logistic discriminant metric learning (LDML)
Find $M$ and $b$ that maximize the likelihood of the training pairs:
$\mathcal{L}(M, b) = \prod_{(i,j)} p_{ij}^{[y_i = y_j]} (1 - p_{ij})^{[y_i \neq y_j]}$
- Convex and smooth objective (the negative log-likelihood) and a convex PSD constraint: very effective optimization methods apply.
- Kernelizable: can handle very high-dimensional data.
- Low-rank regularization (writing $M = L^\top L$ with $L$ a $d \times D$ matrix, $d \ll D$):
  - reduces the number of parameters (linear in the input dimension $D$),
  - defines a PSD matrix,
  - performs supervised dimensionality reduction,
  - but the objective becomes non-convex.
- Desktop machine: about $10^4$ instances of 3500 dimensions in an hour.
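
The slide gives the objective but not the optimizer. A minimal NumPy sketch, assuming plain projected gradient ascent on the full-rank model (an illustrative choice, not necessarily the method used in the thesis). With $t_{ij} = [y_i = y_j]$, the gradient of $\log \mathcal{L}$ is $-\sum_{ij} (t_{ij} - p_{ij})(x_i - x_j)(x_i - x_j)^\top$ with respect to $M$ and $\sum_{ij} (t_{ij} - p_{ij})$ with respect to $b$; after each step, $M$ is projected back onto the PSD cone by clipping negative eigenvalues. The function names, the initialization $M = I$, $b = 1$, and the step size are assumptions, and the step size is only known to suit this toy data.

```python
import numpy as np


def _pair_diffs(X, pairs):
    """Rows x_i - x_j for every training pair (i, j)."""
    I = np.array([i for i, _ in pairs])
    J = np.array([j for _, j in pairs])
    return X[I] - X[J]


def log_likelihood(X, pairs, same, M, b):
    """log L(M, b) = sum_ij t_ij log p_ij + (1 - t_ij) log(1 - p_ij), t_ij = [y_i = y_j]."""
    diffs = _pair_diffs(X, pairs)
    z = b - np.einsum("pd,de,pe->p", diffs, M, diffs)  # b - d_M(x_i, x_j)
    t = np.asarray(same, dtype=float)
    # log sigma(z) = -log(1 + e^-z) and log(1 - sigma(z)) = -log(1 + e^z), computed stably.
    return float(np.sum(-t * np.logaddexp(0.0, -z) - (1 - t) * np.logaddexp(0.0, z)))


def ldml_fit(X, pairs, same, n_iter=500, lr=0.5):
    """Projected gradient ascent on the (pair-averaged) LDML log-likelihood."""
    diffs = _pair_diffs(X, pairs)
    t = np.asarray(same, dtype=float)
    M, b = np.eye(X.shape[1]), 1.0
    for _ in range(n_iter):
        d = np.einsum("pd,de,pe->p", diffs, M, diffs)
        p = 1.0 / (1.0 + np.exp(-(b - d)))          # p_ij = sigma(b - d_ij)
        g = t - p                                   # derivative w.r.t. z_ij = b - d_ij
        grad_M = -(diffs * g[:, None]).T @ diffs    # -sum_ij g_ij (x_i - x_j)(x_i - x_j)^T
        M = M + lr * grad_M / len(t)
        b = b + lr * g.sum() / len(t)
        # Projection onto the PSD cone: clip negative eigenvalues to zero.
        w, V = np.linalg.eigh((M + M.T) / 2)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M, b


# Toy data (hypothetical): dimension 0 separates the classes, dimension 1 does not.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = [0, 0, 1, 1]
pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y))]
same = [int(y[i] == y[j]) for i, j in pairs]

M, b = ldml_fit(X, pairs, same)
print(log_likelihood(X, pairs, same, M, b) > log_likelihood(X, pairs, same, np.eye(2), 1.0))  # True
print(M[0, 0] > M[1, 1])  # True: the metric stretches the discriminative dimension
```

Because the projection is onto a convex set and the log-likelihood is concave in $(M, b)$, small enough steps never decrease the objective; the low-rank variant would instead optimize $L$ directly and lose this guarantee.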

Data set of uncontrolled face images
- Labeled Faces in the Wild data set: 13233 images, 5749 individuals, standard evaluation protocol.
- Features: 9 locations × 3 scales × 128d SIFT → 3456d [Everingham et al., 2006].
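
The feature dimension follows from concatenating one 128-d SIFT descriptor per location and scale: 9 × 3 × 128 = 3456. A sketch of that concatenation, assuming the descriptors have already been computed; the random array only stands in for real SIFT output, and the function name is hypothetical.

```python
import numpy as np

N_POINTS, N_SCALES, SIFT_DIM = 9, 3, 128


def face_descriptor(sift_per_point_scale: np.ndarray) -> np.ndarray:
    """Concatenate one 128-d SIFT descriptor per (location, scale) into one vector."""
    if sift_per_point_scale.shape != (N_POINTS, N_SCALES, SIFT_DIM):
        raise ValueError("expected an array of shape (9, 3, 128)")
    return sift_per_point_scale.reshape(-1)


# Hypothetical stand-in for descriptors computed at 9 locations and 3 scales.
rng = np.random.default_rng(0)
desc = face_descriptor(rng.random((N_POINTS, N_SCALES, SIFT_DIM)))
print(desc.shape)  # (3456,)
```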

Comparison to other metric learning
[Figure: accuracy (0.65 to 0.9) versus projection dimensionality (35 to 500) for L2, Eigenfaces, PCA-LMNN [Weinberger], PCA-ITML [Davis], PCA-LDML [ours] and LDML low rank [ours]]

Comparison to the state of the art
Method                           Setting        Accuracy
Eigenfaces                       restricted     0.600 ± 0.8
[Nowak, 2007]                    restricted     0.739 ± 0.5
[Wolf, 2008]                     restricted     0.785 ± 0.5
[Pinto, 2009]                    restricted     0.794 ± 0.6
LDML [ours]                      restricted     0.793 ± 0.6
[Kumar, 2009]                    restricted*    0.853 ± 1.2
[Wolf, 2008]                     unrestricted   0.793 ± 0.3
LDML [ours]                      unrestricted   0.838 ± 0.6
LDML+MkNN [ours]                 unrestricted   0.875 ± 0.4
Combined multishot [Wolf, 2009]  aligned        0.895 ± 0.5
* relies on additional training data.
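
With LDML, a pair is declared "same person" when $p_{ij} > 0.5$, i.e. when $d_M(x_i, x_j) < b$. A short sketch of that decision rule and of pair-level accuracy; the distances and labels are made up for illustration, the function names are hypothetical, and the cross-validation folds of the LFW protocol are not reproduced.

```python
import numpy as np


def verify(d, b):
    """Predict 'same person' when p_ij = sigma(b - d_ij) > 0.5, i.e. when d_ij < b."""
    return np.asarray(d) < b


def accuracy(d, is_same, b):
    """Fraction of test pairs whose same/different decision is correct."""
    return float(np.mean(verify(d, b) == np.asarray(is_same, dtype=bool)))


# Hypothetical distances d_M for six test pairs and their ground truth.
d = np.array([0.4, 1.2, 2.5, 3.0, 0.9, 4.1])
is_same = np.array([True, True, False, False, False, True])
print(accuracy(d, is_same, b=2.0))  # 4 of the 6 decisions are correct
```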

Face naming from news images
The goal is to recover the names of the faces:
- "German Chancellor Angela Merkel shakes hands with Chinese President Hu Jintao (...)"
- "Kate Hudson and Naomi Watts, Le Divorce, Venice Film Festival - 8/31/2003."
Images as sets of faces (using a face detector [Viola and Jones, 2004]); captions as sets of labels (using NLP [Deschacht and Moens, 2006]).

Face naming from news images
Recovered names of the detected faces: Hu Jintao, Angela Merkel, Kate Hudson and Naomi Watts, each taken from the caption of its own image.

Related work
On associating names and faces (videos):
- Name-It system [Satoh et al., 1999]
- Video Google Faces and automatic naming in videos [Everingham, Sivic and Zisserman, 2006–2009]
For still images:
- Gaussian mixture model (GMM) [Berg et al., 2004–2007]
- Multimodal clustering [Pham et al., 2008–2010]
- Identities and actions [Luo et al., 2009]
- Graph-based method for retrieval [Ozkan and Duygulu, 2006–2010]
- "Automatic face naming using caption-based supervision" [Guillaumin, Mensink, Verbeek and Schmid, CVPR 2008]

Graph-based approach
Build a similarity graph:
- one vertex $f_i$ per face image,
- edges weighted with a similarity $w_{ij}$,
- one sub-graph $Y_n$ for each name $n$.
Find the sub-graphs $Y_n$ that maximize the sum of inner similarities:
$\max_{\{Y_n\}} \sum_n \sum_{f_i \in Y_n} \sum_{f_j \in Y_n} w_{ij}$
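
A direct transcription of the objective, assuming a symmetric similarity matrix with zero diagonal and representing the sub-graphs $Y_n$ as a face-to-name dictionary (both are illustrative choices, as are the toy similarities and the function name). The double sum runs over ordered pairs, so each edge inside a sub-graph is counted twice.

```python
def naming_objective(W, assignment):
    """sum_n sum_{f_i in Y_n} sum_{f_j in Y_n} w_ij for a face -> name assignment.

    W is a symmetric similarity matrix (list of lists); faces missing from
    `assignment` belong to no sub-graph Y_n.
    """
    subgraphs = {}
    for face, name in assignment.items():
        subgraphs.setdefault(name, []).append(face)
    total = 0.0
    for faces in subgraphs.values():
        for i in faces:
            for j in faces:  # ordered pairs: each edge counted twice
                total += W[i][j]
    return total


# Hypothetical similarities: faces 0 and 1 look alike, as do faces 2 and 3.
W = [[0.0, 0.9, 0.1, 0.2],
     [0.9, 0.0, 0.2, 0.1],
     [0.1, 0.2, 0.0, 0.8],
     [0.2, 0.1, 0.8, 0.0]]
good = {0: "A", 1: "A", 2: "B", 3: "B"}
bad = {0: "A", 2: "A", 1: "B", 3: "B"}
print(round(naming_objective(W, good), 3))  # 3.4
print(round(naming_objective(W, bad), 3))   # 0.4
```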

Optimization
As formulated, the global problem is intractable. In general, the following constraints hold:
1. A face can be assigned to at most one name.
2. A face can only be assigned to a name detected in its caption.
3. Within a document, a name can be assigned to at most one face.
Approximate solution:
- at the document level, match detected faces with detected names;
- this matching can be solved exactly and efficiently;
- iterate over documents until convergence.
[Figure: sub-graphs $Y_1$ to $Y_5$ over face vertices $f$, with the faces $f_1$ and $f_2$ of one document]
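
The slide states that the per-document matching can be solved exactly and efficiently without naming a solver. The sketch below is a hypothetical illustration, not the thesis implementation. Because a name labels at most one face per document, the objective change from one document's assignment is a plain sum of per-face gains, so the document-level problem is a matching. Here it is solved exactly by enumerating every admissible assignment, which is only practical for the few faces in a news photo; the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`) is the polynomial-time alternative. The caption-order initialization, the toy similarities, and all function names are assumptions.

```python
from itertools import product


def doc_gains(W, doc_faces, names, assignment):
    """gain[(i, n)]: objective increase when face i of this document joins Y_n,
    given the current assignment of all *other* documents' faces.
    Assumes a symmetric W, so w_ij + w_ji = 2 * w_ij."""
    subgraphs = {}
    for face, name in assignment.items():
        if face not in doc_faces and name is not None:
            subgraphs.setdefault(name, []).append(face)
    return {(i, n): 2 * sum(W[i][j] for j in subgraphs.get(n, [])) + W[i][i]
            for i in doc_faces for n in names}


def best_doc_assignment(gains, doc_faces, names):
    """Exact document-level matching by exhaustive search (fine for a few faces):
    each face gets one caption name or None, and no name labels two faces."""
    best, best_score = None, float("-inf")
    for choice in product(list(names) + [None], repeat=len(doc_faces)):
        used = [n for n in choice if n is not None]
        if len(used) != len(set(used)):
            continue  # constraint 3: a name labels at most one face per document
        score = sum(gains[(i, n)] for i, n in zip(doc_faces, choice) if n is not None)
        if score > best_score:
            best, best_score = dict(zip(doc_faces, choice)), score
    return best, best_score


def name_faces(W, documents, max_iter=100):
    """Coordinate ascent: re-solve one document at a time until nothing changes."""
    # Initialization (an assumption): pair faces and names in caption order.
    assignment = {}
    for faces, names in documents:
        for k, f in enumerate(faces):
            assignment[f] = names[k] if k < len(names) else None
    for _ in range(max_iter):
        changed = False
        for faces, names in documents:
            gains = doc_gains(W, faces, names, assignment)
            current = sum(gains[(f, assignment[f])] for f in faces
                          if assignment[f] is not None)
            best, best_score = best_doc_assignment(gains, faces, names)
            if best_score > current + 1e-12:  # move only on strict improvement
                assignment.update(best)
                changed = True
        if not changed:
            break
    return assignment


# Hypothetical toy data: faces 0, 2, 4 show person "A"; faces 1, 3, 5 show "B".
person = ["A", "B", "A", "B", "A", "B"]
W = [[0.0 if i == j else (0.9 if person[i] == person[j] else 0.1)
      for j in range(6)] for i in range(6)]
# Each document: (face ids, names detected in its caption).
documents = [([0, 1], ["A", "B"]), ([2, 3], ["A", "B"]), ([4, 5], ["B", "A"])]
print(name_faces(W, documents))
# {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A', 5: 'B'}
```

Moving only on strict improvement keeps the objective increasing, so the loop cannot cycle; like any coordinate ascent, it may still stop in a local optimum for less clear-cut similarities.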

Data set and features
- Labeled Yahoo! News, with around 28,000 documents, manually annotated.
- Same features as in the previous section.
- Study the influence of LDML on both the GMM and the graph-based approach.

Results
[Figure: precision of naming (0.5 to 1) versus number of named faces (0 to 12 × 10^3) for Graph, LDML-Graph, GMM [Berg] and LDML-GMM]
