Sparse similarity-preserving hashing. Jonathan Masci, Alex M. Bronstein, Michael M. Bronstein, Pablo Sprechmann, Guillermo Sapiro. The Swiss AI Lab IDSIA; University of Lugano, Switzerland; Tel Aviv University; Duke University. ICLR 2014 1 / 30
Visual world in numbers [infographic] Source: Mashed, Tech Radar, YouTube, 2012; Google 2010; Instagram 2013. 2 / 30
Similarity-sensitive hashing
[Figure: an embedding h maps the feature space X ⊆ R^n to the Hamming space H^m = {0,1}^m; pairs are labeled similar (s_X = 1) or dissimilar (s_X = 0). Example codes: 1101, 1100, 0000, 1111, 1001, 1011; a Hamming ball of radius r = 1 is drawn.]
h can be any parametric function, e.g., h(x) = sign(Ax + b)
Hamming ball of radius r: all items with d_H ≤ r
r = 0: most efficient search (LUT)
3-6 / 30
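A minimal sketch of the linear hash h(x) = sign(Ax + b) from this slide, with a Hamming-ball query and a lookup table for r = 0. The random A and b, the toy sizes, and the helper names (hash_codes, hamming_ball, build_lut) are assumptions for illustration, not part of the talk.

```python
import numpy as np
from collections import defaultdict

def hash_codes(X, A, b):
    """Binary codes h(x) = sign(Ax + b), with the sign mapped to {0, 1}."""
    return (X @ A.T + b > 0).astype(np.uint8)

def hamming_ball(query_code, codes, r):
    """Indices of all items whose Hamming distance to the query is <= r."""
    d = np.count_nonzero(codes != query_code, axis=1)
    return np.flatnonzero(d <= r)

def build_lut(codes):
    """Lookup table from a code (as bytes) to item indices, for r = 0 search."""
    lut = defaultdict(list)
    for i, c in enumerate(codes):
        lut[c.tobytes()].append(i)
    return lut

rng = np.random.default_rng(0)
n, m = 8, 4                           # toy feature dimension and code length
A = rng.standard_normal((m, n))       # assumed random projection, not learned
b = rng.standard_normal(m)
X = rng.standard_normal((100, n))     # toy database
codes = hash_codes(X, A, b)

lut = build_lut(codes)
q = codes[0]
collisions = lut[q.tobytes()]         # r = 0: one table lookup
neighbors = hamming_ball(q, codes, 1) # r = 1: linear scan, for comparison
```

The table lookup returns the same items as hamming_ball(q, codes, 0) without scanning the database.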
Motivation
Typical hash behavior for different code lengths m and Hamming radii r
[Figure: two log-scale panels versus code length m (50 to 300), with curves for radii 1 to 4 and a brute-force reference: mean # neighbors per query, and speed (sec / query). Plot: Grauman, Fergus 2013]
Search complexity for Hamming radius 0 (collisions): O(m)
Search complexity for Hamming radius r: $O\left(\binom{m}{r}\right)$
7-8 / 30
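To make the binomial growth concrete, the sketch below counts how many distinct codes lie at distance exactly r from a query, and within radius r. It assumes that a table-based search probes every code in the ball; the function names are illustrative.

```python
from math import comb

def codes_at_radius(m, r):
    """Number of m-bit codes at Hamming distance exactly r from a query."""
    return comb(m, r)

def codes_within_radius(m, r):
    """Number of m-bit codes at Hamming distance <= r from a query."""
    return sum(comb(m, k) for k in range(r + 1))

for r in range(4):
    print(r, codes_at_radius(48, r), codes_within_radius(48, r))
```

For m = 48, the ball of radius 2 already holds 1 + 48 + 1128 = 1177 codes, against a single code for r = 0.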
SparseHash
Loss function promoting sparsity of the codes:
$$L = s(x,x')\,\|h(x)-h(x')\|_1 + \lambda\,(1-s(x,x'))\,\max\{0,\; M-\|h(x)-h(x')\|_1\} + \alpha\,\big(\|h(x)\|_1+\|h(x')\|_1\big)$$
Siamese architecture: two copies of h, applied to similar (s = 1) and dissimilar (s = 0) pairs from the feature space X ⊆ R^n, feed the loss L.
9-10 / 30
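The three terms of the loss can be sketched for a single pair as follows. The default values of λ (lam), M and α (alpha) are placeholders chosen for the example, not values from the talk.

```python
import numpy as np

def sparsehash_loss(hx, hx2, s, lam=1.0, M=2.0, alpha=0.1):
    """Pairwise loss for codes hx = h(x), hx2 = h(x') and similarity s in {0, 1}."""
    d = np.abs(hx - hx2).sum()                         # ||h(x) - h(x')||_1
    pull = s * d                                       # similar pairs: shrink the distance
    push = lam * (1 - s) * max(0.0, M - d)             # dissimilar pairs: margin M
    sparsity = alpha * (np.abs(hx).sum() + np.abs(hx2).sum())
    return float(pull + push + sparsity)

hx = np.array([1.0, 0.0, 0.0, 1.0])
hx2 = np.array([1.0, 0.0, 0.0, 1.0])
print(sparsehash_loss(hx, hx2, s=1))   # identical codes, similar pair
print(sparsehash_loss(hx, hx2, s=0))   # identical codes, dissimilar pair
```

With identical codes, the similar pair pays only the sparsity penalty, while the dissimilar pair additionally pays the full margin term lam * M.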
SparseHash
Siamese architecture: the same network h embeds both inputs of a pair (s = 1 or s = 0) from the feature space (R^n, L_2), and the loss L compares the two outputs.
[Figure: h is an ISTA network followed by a binarizer. With parameters W, S and τ, each ISTA layer computes z_out = σ_τ(b_in) and b_out = b_in + S(z_out − z_in), starting from z_in = 0; a final σ_τ followed by tanh yields the code ξ(x).]
Hashing: Masci, Migliore, Bronstein, Schmidhuber 2011; Siamese: Bromley et al. 1993; Hadsell et al. 2006; ISTA net: Gregor et al. 2010, Sprechmann et al. 2012
11 / 30
Effect of sparsity on the codes
Table: Total number of unique codes for the entire CIFAR10 dataset and average number of retrieved results for searches at various Hamming radii. Hashes of length 48.

Method   Unique codes   Avg. # of r-neighbors
                        r = 0     r = 1     r = 2
KSH      57368          3.95      12.38     27.21
AGH2     55863          1.42      2.33      4.62
SSH      59733          1.01      1.12      1.88
DH       59999          1.00      1.00      1.00
NN       54259          4.83      20.12     56.70
Sparse   9828           798.47    2034.73   3249.86

Methods: Shakhnarovich 2005; Liu et al. 2011; Liu et al. 2012; Masci, Bronstein, Bronstein, Schmidhuber 2012; Data: Torralba et al. 2008, Krizhevsky 2009
12 / 30
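The two statistics in the table can be computed along these lines. The sketch assumes that every database item is used as a query and that the query counts among its own neighbors; the talk does not spell out either convention. The pairwise distance matrix is only practical for small demos.

```python
import numpy as np

def code_statistics(codes, radii=(0, 1, 2)):
    """Number of unique codes and mean number of items within each Hamming radius."""
    n_unique = len(np.unique(codes, axis=0))
    # Pairwise Hamming distances: O(N^2) memory, suitable for a toy example only.
    d = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
    avg = {r: float((d <= r).sum(axis=1).mean()) for r in radii}
    return n_unique, avg

toy = np.array([[0, 0], [0, 0], [1, 1]], dtype=np.uint8)
n_unique, avg = code_statistics(toy)
```

Fewer unique codes means more items per bucket, which is why the sparse row combines the smallest code count with the largest neighbor counts.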
Precision-Recall on CIFAR10
[Figure: log-log precision vs. recall curves on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, shown for three settings: r = m = 48 (full length), r = 2, and r = 0.]
13-15 / 30
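For reference, precision and recall at a fixed Hamming radius can be computed as in the sketch below. It pools counts over all queries (micro-averaging) and returns 0 when nothing is retrieved; both conventions are assumptions, since the slides do not state how the curves were aggregated.

```python
import numpy as np

def precision_recall_at_radius(query_codes, db_codes, relevant, r):
    """relevant[i, j] is True when database item j is a true match for query i."""
    d = (query_codes[:, None, :] != db_codes[None, :, :]).sum(axis=2)
    retrieved = d <= r
    tp = int((retrieved & relevant).sum())
    n_retrieved = int(retrieved.sum())
    n_relevant = int(relevant.sum())
    precision = tp / n_retrieved if n_retrieved else 0.0
    recall = tp / n_relevant if n_relevant else 0.0
    return precision, recall

queries = np.array([[0, 0]], dtype=np.uint8)
database = np.array([[0, 0], [0, 1], [1, 1]], dtype=np.uint8)
relevant = np.array([[True, True, False]])
for r in range(3):
    print(r, precision_recall_at_radius(queries, database, relevant, r))
```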
Recall vs r on CIFAR10
[Figure: recall (log scale) vs. Hamming radius (0 to 40) on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, shown for code lengths m = 48 and m = 128.]
16-17 / 30
Time vs Precision/Recall
[Figure: time plotted against precision and recall (log scales) for SparseHash, KSH, AGH2, SSH and NNhash, with radii r = 0, 1, 2 marked.]
18 / 30
Retrieval examples CIFAR10 Top-10 nearest neighbors 19 / 30
Multimodal data: create a mutually comparable representation 20-21 / 30
Multimodal SparseHash
[Figure: two modalities, image and text, with feature spaces X ⊆ R^n and Y ⊆ R^{n'}, are mapped by the embeddings h and g into a common Hamming space H^m = {0,1}^m. Example items: notredame, paris, france, NYC, rome, italy, coliseum; example codes: 1101, 1111, 1001. Pairs are labeled similar (= 1) or dissimilar (= 0) by s_X and s_Y within a modality and by s_XY across modalities.]
Intra- and inter-modal binary similarities
Two coupled siamese ISTA networks
Embeddings h and g jointly learned
22-25 / 30
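One way the coupling might look in code is sketched below: a cross-modal pair term plus weighted intra-modal terms, each reusing the single-modality pair loss. The additive combination and the weights mu_x and mu_y are hypothetical; the slides only state that intra- and inter-modal similarities are used and that h and g are learned jointly.

```python
import numpy as np

def pair_loss(a, b, s, lam=1.0, M=2.0, alpha=0.1):
    """Single-pair SparseHash-style loss on two codes a, b with similarity s."""
    d = np.abs(a - b).sum()
    sparsity = alpha * (np.abs(a).sum() + np.abs(b).sum())
    return float(s * d + lam * (1 - s) * max(0.0, M - d) + sparsity)

def multimodal_loss(hx, hx2, gy, gy2, s_x, s_y, s_xy, mu_x=0.5, mu_y=0.5):
    """Hypothetical combination: inter-modal term plus weighted intra-modal terms."""
    cross = pair_loss(hx, gy, s_xy)       # image code vs. text code
    intra_x = pair_loss(hx, hx2, s_x)     # two codes from modality X
    intra_y = pair_loss(gy, gy2, s_y)     # two codes from modality Y
    return cross + mu_x * intra_x + mu_y * intra_y

z = np.zeros(4)
print(multimodal_loss(z, z, z, z, s_x=1, s_y=1, s_xy=0))
```

Because all three terms act on the outputs of h and g, minimizing the sum updates both embeddings together.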
Text-based image retrieval on NUS dataset. Query: people, portrait, art. [Figure: images retrieved by CM-SSH and SparseHash.] Data: Chua et al. 2009 26 / 30
Text-based image retrieval on NUS dataset. Query: flower, art. [Figure: images retrieved by CM-SSH and SparseHash.] Data: Chua et al. 2009 27 / 30
Image annotation on NUS dataset
Query
CM-SSH: sunset, tree, orange, old, abandoned, car, autumn, road, forest, fall, truck, rust, colourful, woods, antique, vehicle, halloween
MM-SparseHash: clouds, sunset, sea, beach, sun, ocean, summer, sand, rocks, evening, holiday, peace, happy, dunes
england, italy, island, ship, italia, hawaii, interesting, cow, islands, elephants, maui
nature, sky, blue, water, clouds, red, sea, yellow, beach, california, winter, ocean, building, old, sand, sunrise, spain, cloud, wall, coast, sepia, stone, eaves, mist, perspective, fence, school, fly, oregon, jump, monument, perfect, surf, alley
nature, sky, water, landscape, sunset, light, white, trees, color, reflection, black, animal, tree, sun, orange, winter, snow, beautiful, river, wildlife, photography, lake, bird, dark, forest, birds, ice, reflections, wood, flying, evening, outdoors, photographer, dusk
nature, sky, water, clouds, green, explore, sunset, people, sea, art, beach, ocean, asia, sand, rocks, airplane, aircraft, boats, flying, plane, rural, waves, flight, aviation, breathtaking, bush, thailand, vivid, twilight, glow, cliff, landscapes, airplanes
28 / 30
Conclusions
Sparsity improves recall without compromising precision by restricting the number of degrees of freedom of the codes
High recall at small radii allows fast retrieval using LUTs
SparseHash scales well to large databases
By coupling several SparseHash nets, multimodal embeddings can be learned
29 / 30
Thank you! 30 / 30