Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines
  Gunhee Kim, Eric P. Xing
  School of Computer Science, Carnegie Mellon University
  June 19, 2013
Outline
  • Problem Statement
  • Algorithm
      Dataset and preprocessing
      Alignment of Multiple Photo Streams
      Large-scale Cosegmentation
  • Experiments
  • Conclusion
Background
  Query: scuba+diving from Flickr
  Any meaningful structural summary?
  • Taken from different spatial, temporal, and personal perspectives
  • Likely to share common storylines
Our Ultimate Goal
  Reconstructing photo storylines from large-scale online images
  An example of a scuba+diving storyline
  [Figure labels: beach, on boat, diving, underwater, dinner, boat, coral, sunset]
  cf) ranking and retrieval by ...
  Narrative structural summary vs. independently retrieved images
Objective of This Paper
  As a first technical step, jointly perform two crucial tasks that are mutually rewarding:
  • Alignment: match images from different photo streams
  • Cosegmentation: segment K common regions from M aligned images
  [Figure: example photo streams. PS2: User 1 at 10/19/2008 (Cayman Islands); PS14: User 2 at 03/19/2005 (Phuket, Thailand); PS3: User 3 at 08/27/2008 (Cozumel, Mexico)]
Objective of This Paper (alignment helps cosegmentation)
  • Online images are too diverse to segment together at once
  • The alignment discovers the images that share common regions
Objective of This Paper (cosegmentation helps alignment)
  • Improve image matching by a better image similarity measure
  Closing a loop between the two tasks
Flickr Dataset
  Flickr dataset of 15 outdoor recreational activities
  • Experiments with more than 100K images of 1K photo streams
  • Larger than those of previous work by orders of magnitude
  [Figure: # photo streams (13,157) and # of images (1,514,976)]
  Classes: Surfing Beach, Horse Riding, Air Ballooning, Rafting, Yacht, Rowing, Scuba Diving, Formula One, Snowboarding, Safari Park, Mountain Camping, Rock Climbing, London Marathon, Tour de France, Fly Fishing
Image Descriptor and Similarity Measure
  Image description
  • HSV color SIFT and HOG features on a regular grid
  • L1-normalized spatial pyramid histogram (SPH) using 300 visual words
  Image similarity measure
  • (Our assumption) Segmentation enhances the image alignment.
  1. No segmentation available
     • Histogram intersection on SPH
     • Not robust against location/pose changes
  2. Segmentation available
     • Histogram intersection on the best assignment of segments
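  The two similarity cases on this slide can be illustrated with a minimal Python sketch. It assumes histograms are plain lists of non-negative numbers, and it reads "best assignment of segments" as the best one-to-one matching, found here by brute force and averaged over the matched segments. The function names and the averaging are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations


def l1_normalize(hist):
    """Scale a histogram so that its entries sum to 1 (unchanged if all zero)."""
    total = float(sum(hist))
    return [v / total for v in hist] if total > 0 else list(hist)


def histogram_intersection(h1, h2):
    """Sum of element-wise minima; equals 1.0 for identical L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_assignment_similarity(segs_a, segs_b):
    """Mean histogram intersection under the best one-to-one segment assignment.

    Assumption: brute force over assignments, feasible only for a few segments.
    """
    if len(segs_a) > len(segs_b):
        segs_a, segs_b = segs_b, segs_a  # always assign the smaller set
    if not segs_a:
        return 0.0
    best = 0.0
    for perm in permutations(range(len(segs_b)), len(segs_a)):
        score = sum(histogram_intersection(segs_a[i], segs_b[j])
                    for i, j in enumerate(perm))
        best = max(best, score)
    return best / len(segs_a)
```

  Because segments are matched by appearance rather than by grid position, the second measure is less sensitive to where a region sits in the image, which is the robustness argument made on the slide.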
Alignment of Photo Streams
  Input: A set of photo streams (PS): $P = \{P^1, \ldots, P^L\}$
  • Photo stream: a set of photos taken in sequence by a single user in a single day
  Idea: Align all photo streams at once after building a K-NN graph
  • Naïve-Bayes Nearest Neighbor (NBNN) [Boiman et al. 08] as the similarity metric
  [Figure: photo streams $P^1$, $P^2$, $P^3$, $P^4$]
Alignment of Photo Streams
  For simplicity, first consider pairwise alignment of two photo streams
  [Figure: a pairwise alignment highlighted among $P^1$, $P^2$, $P^3$, $P^4$]
Pairwise Alignment
  Goal of alignment: find a matching between a pair of photo streams, $f : P^1 \to P^2 \cup \{\emptyset\}$
  • $f(I) = \emptyset$ means $I$ in $P^1$ has no match in $P^2$.
  Optimization: MRF-based energy minimization
  • Flexibility: Various energy terms
  • Solved by discrete BP
  $E(P^1, P^2) = \sum_{I_i \in P^1} d(I_i, \hat{I}_i) + \eta \sum_{I_i \in P^1} \min(|t(I_i) - t(\hat{I}_i)|, \tau) + \rho \sum_{(I_i, I_j) \in \delta} \min(|t(\hat{I}_i) - t(\hat{I}_j)|, \nu)$
Pairwise Alignment
  Objective function
  $E(P^1, P^2) = \sum_{I_i \in P^1} d(I_i, \hat{I}_i) + \eta \sum_{I_i \in P^1} \min(|t(I_i) - t(\hat{I}_i)|, \tau) + \rho \sum_{(I_i, I_j) \in \delta} \min(|t(\hat{I}_i) - t(\hat{I}_j)|, \nu)$
  where $\hat{I}_i = f(I_i)$ is the image in $P^2$ matched to $I_i$.
  • Data term (first sum): The matched image pairs should be visually similar.
  • Time term (second sum): The matched image pairs should be temporally similar.
  • Smoothness term (third sum): Images matched to neighbors in $P^1$ should be neighbors in $P^2$.
  [Figure: timeline example with 9AM, 6PM, 10AM]
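  To make the three energy terms concrete, the following Python sketch evaluates the energy of a given matching. It assumes absolute time differences inside the truncated penalties, a precomputed visual distance table `dist`, and that unmatched images (matched to the empty set) contribute nothing; the slides do not specify the cost of an empty match, so that part is purely an illustrative assumption. It only scores a matching and does not perform the belief propagation used to find one.

```python
def pairwise_energy(times1, times2, match, dist, neighbors,
                    eta, tau, rho, nu):
    """Energy of a matching between two photo streams.

    times1, times2: timestamps of the images in stream 1 and stream 2
    match[i]:       index in stream 2 matched to image i, or None (no match)
    dist[i][j]:     visual distance between image i (stream 1) and j (stream 2)
    neighbors:      pairs (i, j) of neighboring images in stream 1
    """
    matched = [(i, m) for i, m in enumerate(match) if m is not None]

    # Data term: matched pairs should be visually similar.
    data = sum(dist[i][m] for i, m in matched)

    # Time term: matched pairs should be temporally similar (truncated at tau).
    time = sum(min(abs(times1[i] - times2[m]), tau) for i, m in matched)

    # Smoothness term: neighbors in stream 1 should map to temporally
    # close images in stream 2 (truncated at nu).
    smooth = sum(min(abs(times2[match[i]] - times2[match[j]]), nu)
                 for i, j in neighbors
                 if match[i] is not None and match[j] is not None)

    return data + eta * time + rho * smooth
```

  The truncation constants cap the penalty of any single pair, so one badly timed match cannot dominate the total energy.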
Alignment of Multiple Photo Streams
  Objective: MRF-based energy minimization
  $E_{All} = \sum_{(P^i, P^j) \in \Xi} E(P^i, P^j)$, where $\Xi$ is the set of all pairs of NN photo streams
  Message-passing based optimization
  • until convergence or for a fixed number of iterations
  [Figure: pairwise alignments among $P^1$, $P^2$, $P^3$, $P^4$]
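  The structure of the multi-stream objective can be sketched in Python as follows. The stream-level similarity is assumed to be given as a matrix (the slides use NBNN for this), pairs are treated as unordered, and `pair_energy` is a hypothetical callback standing in for the pairwise energy; all three are illustrative assumptions.

```python
def knn_stream_pairs(similarity, k):
    """Unordered pairs (i, j) such that j is among the k most similar streams to i."""
    n = len(similarity)
    pairs = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: similarity[i][j], reverse=True)
        for j in others[:k]:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def total_energy(pairs, pair_energy):
    """Sum of pairwise energies over all nearest-neighbor stream pairs."""
    return sum(pair_energy(i, j) for i, j in pairs)
```

  Restricting the sum to nearest-neighbor pairs keeps the number of pairwise alignments roughly linear in the number of streams rather than quadratic.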
Build an Image Graph
  Idea: Connect the images that are similar enough to be cosegmented
  Image graph $G = (I, E)$
  • $I$: The set of images. $E$: The set of edges.
  • $E = E_B \cup E_W$
      $E_B$: Edges between different photo streams (results of alignment)
      $E_W$: Edges within a photo stream. For each image $I$, consider the images $I_i$ such that $|t(I) - t(I_i)| \le \delta$, and link $I$ with its K-NN among them.
  [Figure: example $E_W$ and $E_B$ edges]
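  A minimal Python sketch of this graph construction is given below. It assumes each stream is a list of (image id, timestamp) pairs, that alignment results arrive as a list of matched image-id pairs, and that `similarity` is any image similarity function; these data layouts and names are illustrative assumptions.

```python
def build_image_graph(streams, matches, similarity, delta, k):
    """Return (E_B, E_W) as sets of undirected edges between image ids.

    streams:    dict mapping stream id -> list of (image_id, timestamp)
    matches:    iterable of (image_a, image_b) pairs produced by alignment
    similarity: function (image_a, image_b) -> float, higher is more similar
    delta:      maximum time gap for within-stream candidates
    k:          number of nearest neighbors linked per image
    """
    # E_B: edges between different photo streams, taken from the alignment.
    e_b = {tuple(sorted(pair)) for pair in matches}

    # E_W: within each stream, link an image to its k most similar images
    # among those taken within delta of it.
    e_w = set()
    for photos in streams.values():
        for img, t in photos:
            window = [other for other, t_other in photos
                      if other != img and abs(t - t_other) <= delta]
            window.sort(key=lambda other: similarity(img, other), reverse=True)
            for other in window[:k]:
                e_w.add(tuple(sorted((img, other))))
    return e_b, e_w
```

  The time window keeps within-stream edges between photos of the same scene, while the alignment edges carry information across users.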
Scalable Cosegmentation
  Iteratively run the MFC algorithm [Kim and Xing, 2012] on the image graph
  Review of MFC algorithm
  • Cosegmentation: Jointly segment M images into K+1 regions (K foregrounds (FG) + background (BG))
  Iterate between two steps:
  Foreground Modeling
  • Learn appearance models of K FGs and BG
  • Any region classifiers or their combination
  • Ex. Gaussian mixture on RGB, linear SVM on SPH
  Region Assignment
  • Allocate the regions of an image into one of K FGs or BG
  • Solved very efficiently using the idea of combinatorial auction
Scalable Cosegmentation on Image Graph
  Message-passing based optimization. Iteratively solve:
  • Learn FG models from neighbors of $I_i$.
  • Run region assignment on $I_i$.
  [Figure: example with FG 1 (car), FG 2 (road), FG 3 (BG)]
Scalable Cosegmentation on Image Graph
  Initialization
  • Supervised: start from seed labels
  • Unsupervised: use the algorithm of CoSand [Kim et al. 2011].
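  The alternation between model learning and region assignment over the image graph can be illustrated with a toy Python sketch. It deliberately swaps in much simpler components than MFC: each region is a single scalar feature, a model is the mean feature of a label, and assignment picks the nearest mean instead of solving a combinatorial auction. It starts from given initial labels, as in the supervised case. All names are hypothetical.

```python
def learn_models(labeled_regions):
    """Map each label to the mean feature of the regions carrying it."""
    sums = {}
    for feature, label in labeled_regions:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + feature, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def assign_regions(features, models):
    """Give each region the label whose model mean is closest (toy stand-in)."""
    return [min(models, key=lambda label: abs(f - models[label]))
            for f in features]


def cosegment(graph, features, initial_labels, iterations=3):
    """Iterate over images: learn models from neighbors, then relabel the image.

    graph:          dict image -> list of neighboring images
    features:       dict image -> list of scalar region features
    initial_labels: dict image -> list of region labels (seed labels)
    """
    labels = {img: list(lab) for img, lab in initial_labels.items()}
    for _ in range(iterations):
        for img in graph:
            pool = [(f, lab)
                    for nb in graph[img]
                    for f, lab in zip(features[nb], labels[nb])]
            if pool:  # images without neighbors keep their labels
                labels[img] = assign_regions(features[img], learn_models(pool))
    return labels
```

  Because each image only consults its graph neighbors, the cost per iteration grows with the number of edges rather than with all image pairs, which is the point of running cosegmentation on the graph.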
Evaluation: Two Experiments
  Evaluation for alignment
  • Very hard to obtain ground truth! Correspondences between two sets of thousands of images?
  • Task: Temporal localization (inspired by geo-location estimation [Hays and Efros. 2008])
  [Figure: geo-location, "Where is it likely to be taken?", vs. temporal localization on the timeline of $P^1$, "When are they likely to be taken?"]
  Evaluation for cosegmentation
  • Task: Foreground detection
  • We manually annotate 100 images per class
  • Accuracy is measured by intersection-over-union: $Acc = |GT_i \cap R_i| / |GT_i \cup R_i|$
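  The intersection-over-union accuracy can be computed as in the short Python sketch below, which assumes binary masks stored as equally sized nested lists. Returning 1.0 when both masks are empty is a convention chosen here, not something the slides state.

```python
def intersection_over_union(gt_mask, pred_mask):
    """IoU between a ground-truth mask and a predicted foreground mask."""
    intersection = 0
    union = 0
    for gt_row, pred_row in zip(gt_mask, pred_mask):
        for g, p in zip(gt_row, pred_row):
            if g and p:
                intersection += 1
            if g or p:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```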
Evaluation of Alignment
  Procedure of temporal localization
  1. Given a set of photo streams, randomly split them into training (80%) and test (20%) sets
  2. Run alignment
  3. Estimate timestamps of all images in test photo streams
  4. Temporal localization is correct if $|t_{gt} - t_{est}| \le \varepsilon$
  Better temporal localization ≠ Better Alignment
  Baselines
  • BPS: Our alignment + cosegmentation
  • BP: Our alignment only
    (BPS vs. BP: justify closing a loop)
  • KNN: K-nearest neighbors, image similarity only (the simplest)
  • HMM: Hidden Markov Models
  • DTW: Dynamic Time Warping
    (HMM and DTW: popular multiple sequence alignment)
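  The split and the correctness criterion of this protocol can be sketched in Python as follows. It assumes timestamps are expressed in minutes and that accuracy is the fraction of test images localized within the tolerance; the function names and the seeded shuffle are illustrative assumptions.

```python
import random


def split_streams(stream_ids, train_fraction=0.8, seed=0):
    """Randomly split photo stream ids into training and test sets."""
    ids = list(stream_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(train_fraction * len(ids)))
    return ids[:cut], ids[cut:]


def localization_accuracy(true_times, estimated_times, epsilon):
    """Fraction of images whose estimated time is within epsilon of the truth."""
    correct = sum(1 for t_gt, t_est in zip(true_times, estimated_times)
                  if abs(t_gt - t_est) <= epsilon)
    return correct / len(true_times)
```

  For example, with a tolerance of 60 minutes, an image taken at 9:00 and estimated at 9:20 counts as correct, while one taken at 10:00 and estimated at 11:40 does not.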
Evaluation of Alignment
  • Temporal localization is correct if $|t_{gt} - t_{est}| \le \varepsilon$; results shown for $\varepsilon$ = 60 min.
  [Figure: temporal localization results of BPS, BP, KNN, HMM, and DTW; axis label: $\varepsilon$]
Evaluation of Cosegmentation
  Task: Foreground detection
  • BP+MFC: (Proposed) Alignment + cosegmentation
  • MFC: Our cosegmentation without alignment
  • COS: Submodular optimization [Kim et al. ICCV11]
  • LDA: LDA-based localization [Russell et al. CVPR06]
  [Figure: examples]
Conclusion
  Ultimate goal: building photo storylines from large-scale online images
  [Figure: examples for horse+riding and safari+park]