Leveraging Internet Data
IM2GPS: Estimating Geographic Information from a Single Image (by James Hays and Alexei Efros)
Adriana Kovashka, CS PhD Student
Where is this? Italy
… and this? Wales
Overview of IM2GPS
• Intuition
  • “What is it like?” vs. “What is it?”
• Data
  • 6 million geo-tagged images from Flickr
• Method
  • Represent images in 6 ways, compare
• Result
  • Estimated image location
Representations in IM2GPS
• Tiny Images
• Color histograms
• Texton histograms
• Line features
• Gist descriptor with color
• Geometric context
IM2GPS Results
Hays 2008
Note on the Task
• This is not scene categorization
  • Specific locations used
  • “Urban vs. natural” insufficient
• Can think of current task as place recognition*
Demo Overview
• Data
  • 50096 images (incl. 237 test images)
  • 100 most populated cities in the world
• Representations
  • Gist, color, Tiny Images
• Comparison
  • k-NN
Procedure
• Use code by Hays to query/download Flickr images
  • about 3 days
• Download, modify, run Gist code
  • about 30 hours
• Test
  • about 6 hours for 7000 images
  • 10 min for 237 test images
Representations
• Gist (512 dim)
  • Used Torralba’s scene recognition code
• Color (32 dim)
  • Computed histograms in L*a*b* color space
  • 4 bins for L, 14 for a and b
• Tiny Images (768 dim)
  • Resized images to 16x16x3
  • Vectors of color pixels
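A minimal sketch of how the color and Tiny Images vectors could be built, assuming the image is already available as an L*a*b* array of shape (height, width, 3). The function names, the channel ranges, the histogram normalization, and the nearest-neighbor resizing are all assumptions made for illustration; the slides only give the bin counts and the 16x16x3 target size. The Gist descriptor is not sketched because it came from Torralba's code.

```python
import numpy as np

def color_histogram(lab, l_bins=4, ab_bins=14):
    """Concatenate per-channel histograms: 4 + 14 + 14 = 32 dimensions."""
    lab = np.asarray(lab, dtype=float)
    # Assumed channel ranges: L in [0, 100], a and b in [-128, 127].
    h_l, _ = np.histogram(lab[..., 0], bins=l_bins, range=(0, 100))
    h_a, _ = np.histogram(lab[..., 1], bins=ab_bins, range=(-128, 127))
    h_b, _ = np.histogram(lab[..., 2], bins=ab_bins, range=(-128, 127))
    hist = np.concatenate([h_l, h_a, h_b]).astype(float)
    # Normalization is an assumption, so that image size does not matter.
    return hist / max(hist.sum(), 1.0)

def tiny_image(img, size=16):
    """Downsample to size x size x 3 and flatten: 16 * 16 * 3 = 768 dimensions."""
    img = np.asarray(img, dtype=float)
    # Nearest-neighbor sampling on a regular grid (assumed resizing method).
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols].reshape(-1)
```

Concatenating per-channel histograms is what makes the color descriptor 32-dimensional; a joint histogram over the same bins would have 4 x 14 x 14 = 784 entries instead.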
Comparison Methods
• Method One
  • Sim(x, y) = inner product between the concatenations of the three representations of x and y
• Method Two*
  • Sim(x, y) = exp(-dist_A / σ_A) * exp(-dist_B / σ_B) * exp(-dist_C / σ_C)
  • dist_A = Euclidean distance between representations A of x and y
  • σ_A = mean of distances for representation A
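The two similarity measures and the k-NN lookup can be sketched as follows. This is an illustration under assumptions, not the code used for the demo: each image is represented here as a list of NumPy vectors (one per representation), and the names `sim_method_one`, `sim_method_two`, and `k_nearest` are hypothetical.

```python
import numpy as np

def sim_method_one(feats_x, feats_y):
    """Method One: inner product of the concatenated representations."""
    return float(np.dot(np.concatenate(feats_x), np.concatenate(feats_y)))

def sim_method_two(feats_x, feats_y, sigmas):
    """Method Two: product of exp(-dist / sigma) over the representations."""
    sim = 1.0
    for fx, fy, sigma in zip(feats_x, feats_y, sigmas):
        dist = np.linalg.norm(np.asarray(fx) - np.asarray(fy))  # Euclidean
        sim *= np.exp(-dist / sigma)
    return float(sim)

def k_nearest(query_feats, database_feats, k, sim):
    """Indices of the k database images most similar to the query."""
    scores = np.array([sim(query_feats, d) for d in database_feats])
    return np.argsort(-scores)[:k]  # highest similarity first
```

Note that Method One depends on the scale of each representation, since a longer or larger-valued vector dominates the inner product, whereas Method Two rescales each distance by its own σ before combining.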
Note on the Computation of σ
• Current computation
  • X: matrix of n-dim features for all m images
  • Subtract mean(X) from all rows of X
  • Square result
  • Sum rows
  • Take square roots of sums
  • Take mean of resulting column
• Better computation
  • Average of Euclidean distance between i and j for each pair of images (i, j)
  • Computationally very expensive
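Both ways of computing σ can be written in a few lines. The sketch below assumes X is an m x n NumPy array with one image per row and that mean(X) means the per-column mean, as in MATLAB; the function names are hypothetical.

```python
import numpy as np

def sigma_from_mean(X):
    """Current computation: mean distance of each image to the mean feature."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)             # subtract mean(X) from all rows
    row_norms = np.sqrt((centered ** 2).sum(axis=1))  # square, sum rows, sqrt
    return float(row_norms.mean())            # mean of the resulting column

def sigma_pairwise(X):
    """Better computation: mean Euclidean distance over all pairs (i, j)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    total = 0.0
    for i in range(m - 1):
        # Distances from image i to every later image, so each pair counts once.
        total += np.linalg.norm(X[i + 1:] - X[i], axis=1).sum()
    return float(total / (m * (m - 1) / 2))
```

The first version costs O(mn), while the second visits all m(m-1)/2 pairs; for the 50096 images in this dataset that is roughly 1.25 billion pairs per representation, which is why it is described as very expensive.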
Dataset
• Queried for 104 city tags
  • Negative tags to remove duplicates, noise
• Downloaded images uploaded over 2 weeks
• 50096 images from Flickr (237 test)
  • 6M in IM2GPS (more tags, time)
• Disproportionate image set sizes per city!
'Abidjan' [0]           'Chongqing' [37]        'London' [2891]         'RiodeJaneiro' [1135]
'Ahmedabad' [3]         'Dallas' [459]          'LosAngeles' [1442]     'Riverside' [215]
'Alexandria' [152]      'Delhi' [169]           'Madras' [1]            'Riyadh' [1]
'Ankara' [10]           'Detroit' [263]         'Madrid' [1822]         'Rome' [1328]
'Athens' [213]          'Dhaka' [55]            'Manila' [230]          'Ruhr' [53]
'Atlanta' [843]         'Dongguan' [0]          'Medellin' [0]          'Saigon' [252]
'Baghdad' [3]           'Guadalajara' [71]      'Melbourne' [529]       'SaintPetersburg' [44]
'Bandung' [114]         'Guangzhou' [68]        'MexicoCity' [59]       'Salvador' [867]
'Bangalore' [477]       'Guiyang' [0]           'Miami' [1280]          'SanFrancisco' [2204]
'Bangkok' [428]         'Hanoi' [158]           'Milan' [362]           'Santiago' [365]
'Barcelona' [2221]      'Harbin' [76]           'Monterrey' [26]        'SaoPaulo' [229]
'Beijing' [658]         'HoChiMinhCity' [9]     'Montreal' [0]          'Seoul' [364]
'BeloHorizonte' [3]     'HongKong' [835]        'Moscow' [291]          'Shanghai' [118]
'Berlin' [1655]         'Houston' [461]         'Mumbai' [270]          'Shenyang' [0]
'Bogota' [404]          'Hyderabad' [19]        'NYC' [2383]            'Shenzhen' [12]
'Bombay' [16]           'Istanbul' [681]        'Nagoya' [23]           'Singapore' [1118]
'Boston' [1631]         'Jakarta' [50]          'Nanjing' [17]          'Surat' [0]
'Brasilia' [97]         'Johannesburg' [300]    'NewYorkCity' [483]     'Sydney' [1541]
'BuenosAires' [132]     'Karachi' [9]           'Osaka' [222]           'Taipei' [546]
'Busan' [0]             'Khartoum' [6]          'Paris' [3052]          'Tehran' [19]
'Cairo' [107]           'Kinshasa' [0]          'Philadelphia' [883]    'Tianjin' [8]
'Calcutta' [4]          'Kolkata' [91]          'Phoenix' [504]         'Tokyo' [1992]
'Chengdu' [225]         'KualaLumpur' [56]      'PortoAlegre' [69]      'Toronto' [2009]
'Chennai' [114]         'Lagos' [25]            'Pune' [5]              'WashingtonDC' [2031]
'Chicago' [2796]        'Lahore' [8]            'Pyongyang' [13]        'Wuhan' [18]
'Chittagong' [0]        'Lima' [97]             'Recife' [221]          'Yangon' [3]
Bangalore
Boston
Boston
Cairo
Istanbul
London
London
Los Angeles
Madrid
Milan
Moscow
Mumbai
Paris
Rome
San Francisco
San Francisco
Sao Paulo
Tokyo
Tokyo
Query 1 - Greece
Query 2 - Arizona
Query 3 - Switzerland
Overview of Results
• Evaluation
  • Percentage of correct classifications
  • Percentage of top m neighbors within n km of query image
  • Average distance of neighbors
• Tests
  • on 237 test images
  • on 7000 images from dataset
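The two distance-based measures can be sketched as below. The slides do not say how distances between image locations were computed, so the haversine great-circle formula, the Earth radius value, and all function names here are assumptions made for illustration. Locations are (latitude, longitude) pairs in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def fraction_within(query, neighbors, n_km):
    """Fraction of the retrieved neighbors lying within n_km of the query."""
    hits = sum(haversine_km(*query, *nb) <= n_km for nb in neighbors)
    return hits / len(neighbors)

def average_distance(query, neighbors):
    """Mean distance in km from the query to its retrieved neighbors."""
    return sum(haversine_km(*query, *nb) for nb in neighbors) / len(neighbors)
```

For example, with a query in Paris and retrieved neighbors in London and Tokyo, `fraction_within(paris, [london, tokyo], 1000)` counts only London as a hit.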
Chance for Test Images (200km)
[Plot: chance per image over all k (y-axis) vs. images 1 to 237 (x-axis)]
Chance is pretty low for this data.
Chance for Test Images (cont’d)
[Plot: average chance per run over all k (y-axis) vs. run number (x-axis)]
Chance is pretty low for this data.
Test Images, % w/in 200km, M1
[Bar chart: % within 200km (y-axis, 0 to 0.2) by feature type (Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All); series: k=1, k=4, k=8, k=12, k=16]
Gist seems to perform best with M1.
Test Images, % w/in 200km, M2
[Bar chart: % within 200km (y-axis, 0 to 0.2) by feature type (Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All); series: k=1, k=4, k=8, k=12, k=16]
M2 works worse than M1.
Test Images, % w/in 1000km, M1
[Bar chart: % within 1000km (y-axis, 0 to 0.2) by feature type (Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All); series: k=1, k=4, k=8, k=12, k=16]
Results are naturally much better with larger distance allowed.
IM2GPS Results
Hays 2008
Dataset, Accuracy, M1
[Bar chart: accuracy (y-axis, 0 to 0.2) for k=1, k=4, k=8, k=12, k=16, all feature types; series: Images 501-4000, Images 4001-7500]
Results are much better with more test images.
Dataset, Accuracy, M2
[Bar chart: accuracy (y-axis, 0 to 0.2) for k=1, k=4, k=8, k=12, k=16, all feature types; series: Images 501-4000]
M2 performs worse than M1.
Dataset, % w/in 200km, M1
[Bar chart: % within 200km (y-axis, 0 to 0.2) for k=1, k=4, k=8, k=12, k=16, all feature types; series: Images 501-4000, Images 4001-7500]
Again, with more test images, results are more similar to the authors’.
Dataset, % w/in 500km, M1
[Bar chart: % within 500km (y-axis, 0 to 0.2) for k=1, k=4, k=8, k=12, k=16, all feature types; series: Images 501-4000, Images 4001-7500]
As expected, results improve when a larger distance is allowed.
Dataset, % w/in 1000km, M1
[Bar chart: % within 1000km (y-axis, 0 to 0.2) for k=1, k=4, k=8, k=12, k=16, all feature types; series: Images 501-4000, Images 4001-7500]
As expected, results improve when a larger distance is allowed.
Query Image (Argentina/Paraguay/Brazil)
Images shown: Sydney, Cairo
Features: Tiny Images
Query Image (Barcelona)
Images shown: Chicago, Toronto
Features: Tiny Images
Query Image (Barcelona)
Images shown: Recife, Tokyo
Features: Tiny Images
Query Image (Nassau, near Havana)
Images shown: Sydney, Sydney
Features: Tiny Images