Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu. School of Software, Tsinghua University. The Annual ACM International Conference on Multimedia Retrieval (ICMR 2016).


  1. Correlation Autoencoder Hashing for Supervised Cross-Modal Search. Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu. School of Software, Tsinghua University. The Annual ACM International Conference on Multimedia Retrieval, ICMR 2016.
     Slide footer: Y. Cao et al. (Tsinghua University), Correlation Autoencoder Hashing, ICMR 2016, 1 / 23

  2. Motivation: Cross-Modal Retrieval. Background
     In the big data era, the amount of multimedia data has exploded.
     An object or topic can be described by data of multiple modalities.

  3. Motivation: Cross-Modal Retrieval. Cross-Modal Similarity Search
     Use a query from one modality to search for semantically relevant items from another modality, e.g. search for animal images using textual tags ‘bear, deer …’.
     [Figure: example images annotated with tags such as Building, Blue, Sky, Downtown, Deer, Grass, Green, Bear, Night, Animal.]

  4. Motivation: Cross-Modal Retrieval. Challenges
     Features from different modalities are heterogeneous: different dimensions, distinct distributions, ...
     Trillions of images and texts are generated.
     [Figure: an image represented as a dense vector [0.3, 0.5, -0.2, … , 0.4]; its tags (Bear, Soil, Grass, Green) represented as a sparse vector [0, 0, 1, … , 0, 1, … , 0].]

  5. Motivation: Hashing Methods. Cross-Modal Hashing
     [Figure: images → generate image descriptors (SIFT, GIST, DeCAF) → generate image hash codes (1 / -1); tags (Deer, Grass, Green) → generate tag descriptors (Tag Occurrence, Word2Vec) → generate tag hash codes (1 / -1); the two pipelines are linked by Semantic Correlations and feed Approximate Nearest Neighbor Retrieval.]
     Memory: 128-d float: 512 bytes → 16 bytes; 1 billion items: 512 GB → 16 GB.
     Time: computation x10 - x100 faster; transmission (disk / web) x30 faster.
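The memory and speed figures on this slide can be illustrated with a small NumPy sketch. It assumes 128-bit codes with entries in {-1, +1}, stored one bit per entry; the helper names `pack_codes` and `hamming_distances` are hypothetical and are not taken from the presentation.

```python
import numpy as np

def pack_codes(codes_pm1):
    """Pack {-1, +1} codes of shape (n, bits) into uint8 rows of shape (n, bits // 8)."""
    bits = (np.asarray(codes_pm1) > 0).astype(np.uint8)  # +1 -> bit 1, -1 -> bit 0
    return np.packbits(bits, axis=1)

def hamming_distances(query_packed, db_packed):
    """Hamming distance between one packed query and every packed database row."""
    xor = np.bitwise_xor(db_packed, query_packed)  # differing bits are set to 1
    return np.unpackbits(xor, axis=1).sum(axis=1)  # count the differing bits per row

# Storage comparison for one item: 128 float32 values versus 128 bits.
feature = np.zeros(128, dtype=np.float32)
code = pack_codes(np.ones((1, 128)))
print(feature.nbytes, code.nbytes)  # 512 bytes versus 16 bytes
```

Retrieval then reduces to XOR and bit counting over 16-byte rows, which is where the computation speed-up quoted on the slide comes from.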

  6. Method: Model. Homogeneous Architecture
     Key Points: Homogeneous Architecture: image and text can use the same deep architecture.
     [Figure: two parallel autoencoders. Representation X → Encoder → Hash Code h_x(x) → Decoder → Reconstruction X̂; Representation Y → Encoder → Hash Code h_y(y) → Decoder → Reconstruction Ŷ.]

  7. Method: Model. Feature Correlations
     Key Points: Feature correlations can be maximized to reduce heterogeneity across modalities, using pairwise correlations (solid lines).
     [Figure: the two autoencoders from the previous slide (X → h_x(x) → X̂, Y → h_y(y) → Ŷ), now with Pairwise Correlation links across modalities and the similarity S between X and Y.]

  8. Method: Model. Feature Correlation Maximization
     Key Points: Use pairwise correlations for reconstructive embedding.
     Within-modal Reconstructive Embedding:
     $$\min_{V_x, V_y} \sum_{i=1}^{n} \left( \| x_i - V_x h_x(x_i) \|_2^2 + \| y_i - V_y h_y(y_i) \|_2^2 \right) \tag{1}$$
     Cross-modal Reconstructive Embedding:
     $$\min_{V_x, V_y} L = \sum_{i=1}^{n} \left( \| x_i - V_x h_y(y_i) \|_2^2 + \| y_i - V_y h_x(x_i) \|_2^2 \right) \tag{2}$$
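The difference between Eq. (1) and Eq. (2) is only which modality's hash code feeds each decoder. A minimal NumPy sketch, assuming samples are stored as rows and the decoders V_x, V_y are plain matrices of shape (feature dimension, code length); the function names are hypothetical:

```python
import numpy as np

def within_modal_loss(X, Y, Hx, Hy, Vx, Vy):
    """Eq. (1): each modality is reconstructed from its own hash code."""
    # Row i of Hx @ Vx.T is V_x h_x(x_i).
    return np.sum((X - Hx @ Vx.T) ** 2) + np.sum((Y - Hy @ Vy.T) ** 2)

def cross_modal_loss(X, Y, Hx, Hy, Vx, Vy):
    """Eq. (2): each modality is reconstructed from the other modality's hash code."""
    return np.sum((X - Hy @ Vx.T) ** 2) + np.sum((Y - Hx @ Vy.T) ** 2)
```

When the two modalities produce identical codes for every pair (Hx equal to Hy), the two losses coincide, which is one way to read the cross-modal loss as pushing paired codes toward each other.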

  9. Method: Model. Semantic Correlations
     Key Points: Due to the semantic gap, semantic correlations (dashed lines) need to be maximized.
     [Figure: the two autoencoders (X → h_x(x) → X̂, Y → h_y(y) → Ŷ) with Semantic Correlation links between the hash codes and the similarity S between X and Y.]

  10. Method: Model. Semantic Correlation Maximization
     Key Points: Construct a Nearest Neighbor Affinity Matrix A.
     Nearest Neighbor Affinity Matrix:
     $$A_{ij} = \begin{cases} d(x_i, y_j), & \text{if } l_i = l_j \wedge \begin{array}{l} x_i \in N_k(x_j) \vee x_j \in N_k(x_i) \\ y_i \in N_k(y_j) \vee y_j \in N_k(y_i) \end{array} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
     $$d(x_i, y_j) = e^{-\| x_i - x_j \|_2^2 / 2\sigma_x^2} + e^{-\| y_i - y_j \|_2^2 / 2\sigma_y^2} \tag{4}$$
     where N_k(x) represents the k-nearest neighbors of x.
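Eqs. (3) and (4) can be sketched in NumPy as follows. Several choices here are assumptions rather than statements from the slide: the two stacked neighborhood conditions in Eq. (3) are joined with a logical OR, neighbors are found by Euclidean distance within each modality, every point counts as a member of its own neighborhood (so paired items keep a nonzero affinity), and the bandwidths `sigma_x`, `sigma_y` are supplied by the caller. All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding

def knn_mask(D, k):
    """mask[i, j] is True when i is among j's k nearest neighbors or vice versa."""
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, np.inf)          # search neighbors among the other points
    idx = np.argsort(D, axis=1)[:, :k]   # k nearest other points for each row
    M = np.zeros((n, n), dtype=bool)
    M[np.repeat(np.arange(n), k), idx.ravel()] = True
    M = M | M.T                          # symmetric "either direction" condition
    np.fill_diagonal(M, True)            # assumption: a point neighbors itself
    return M

def affinity_matrix(X, Y, labels, k, sigma_x, sigma_y):
    """Eq. (3) with d from Eq. (4), under the OR assumption stated above."""
    labels = np.asarray(labels)
    Dx, Dy = pairwise_sq_dists(X), pairwise_sq_dists(Y)
    d = np.exp(-Dx / (2.0 * sigma_x ** 2)) + np.exp(-Dy / (2.0 * sigma_y ** 2))
    same_label = labels[:, None] == labels[None, :]
    neighbors = knn_mask(Dx, k) | knn_mask(Dy, k)
    return np.where(same_label & neighbors, d, 0.0)
```

Replacing the `|` between the two `knn_mask` calls with `&` would give the stricter reading in which a pair must be neighbors in both modalities.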

  11. Method: Model. Semantic Correlation Maximization
     Key Points: Construct a within-category and a between-category similarity matrix.
     Similarity Matrices:
     $$S^b_{ij} = \begin{cases} A_{ij} (1/n - 1/n_c), & \text{if } l_i = l_j = c \\ A_{ij} / n, & \text{if } l_i \neq l_j \end{cases} \tag{5}$$
     $$S^w_{ij} = \begin{cases} A_{ij} / n_c, & \text{if } l_i = l_j = c \\ 0, & \text{if } l_i \neq l_j \end{cases}$$
     where n_c is the number of objects within the c-th category.
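As a rough illustration of Eq. (5), the two similarity matrices can be computed from an affinity matrix A and a label vector. The sketch assumes each object carries a single category label; the function name is hypothetical.

```python
import numpy as np

def similarity_matrices(A, labels):
    """Return (S_b, S_w): between-category and within-category similarity matrices."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    same = labels[:, None] == labels[None, :]
    n_c = same.sum(axis=1).astype(float)  # n_c[i] = size of the category of object i
    S_b = np.where(same, A * (1.0 / n - 1.0 / n_c[:, None]), A / n)
    S_w = np.where(same, A / n_c[:, None], 0.0)
    return S_b, S_w
```

Subtracting the two gives S_w - S_b = A_ij (2/n_c - 1/n) for same-category pairs and -A_ij / n otherwise, which matches the weights S_ij in Eq. (7).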

  12. Method: Model. Semantic Correlation Maximization
     Key Points: Maximize the inter-category separation margin. Circumvent the large intra-class variance.
     Cross-modal Semantic Correlation:
     $$\min_{W_x, W_y} R = \sum_{i=1}^{n} \sum_{j=1}^{n} S_{ij} \| h_x(x_i) - h_y(y_j) \|_2^2 \tag{6}$$
     $$S_{ij} = \begin{cases} A_{ij} (2/n_c - 1/n), & \text{if } l_i = l_j = c \\ -A_{ij} / n, & \text{if } l_i \neq l_j \end{cases} \tag{7}$$
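A small NumPy sketch of Eqs. (6) and (7), assuming hash codes are stored as rows and each object has a single category label. The function names are hypothetical, and the affinity matrix A is taken as given.

```python
import numpy as np

def semantic_weights(A, labels):
    """Eq. (7): positive weights within a category, negative weights across categories."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    same = labels[:, None] == labels[None, :]
    n_c = same.sum(axis=1).astype(float)  # category size for each object
    return np.where(same, A * (2.0 / n_c[:, None] - 1.0 / n), -A / n)

def semantic_correlation_loss(Hx, Hy, S):
    """Eq. (6): sum over i, j of S_ij * ||h_x(x_i) - h_y(y_j)||^2."""
    D = (np.sum(Hx ** 2, axis=1)[:, None]
         + np.sum(Hy ** 2, axis=1)[None, :]
         - 2.0 * Hx @ Hy.T)  # D[i, j] = squared distance between codes
    return np.sum(S * D)
```

Minimizing R pulls same-category codes together (positive S_ij) and pushes different-category codes apart (negative S_ij), so R becomes more negative as the categories separate.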

  13. Method: Model. Correlation Autoencoder Hashing
     Key Points: Correlation Autoencoder Hashing
     enhances feature correlation by cross-modal reconstructive embedding;
     maximizes the inter-category separation margin for learning more discriminative hash codes;
     minimizes the intra-category variance by further exploring the cross-modal locality information.
     [Figure: the full model. Two autoencoders (X → h_x(x) → X̂, Y → h_y(y) → Ŷ) joined by both Pairwise Correlation and Semantic Correlation links, with the similarity S between X and Y.]

  14. Method: Model. Correlation Autoencoder Hashing
     Unified Optimization Problem:
     $$\min_{V_x, V_y, W_x, W_y} O = L + \lambda R \tag{8}$$
     $$h_x(x) = \operatorname{sgn}\left( W_x^T x \right), \quad h_y(y) = \operatorname{sgn}\left( W_y^T y \right)$$
     where λ is a penalty parameter for trading off the relative importance of feature correlation and semantic correlation.
     Learning Algorithm: by back-propagation (BP) using mini-batch SGD,
     $$\frac{\partial O(x_i, y_i)}{\partial W^x_{pq}} = \frac{\partial L(y_i)}{\partial W^x_{pq}} + \lambda \frac{\partial R(x_i)}{\partial W^x_{pq}} \tag{9}$$
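The forward evaluation of the objective in Eq. (8) can be sketched as below. This is only an evaluation of O for given parameters, not the authors' training procedure: it assumes single-layer linear encoders W_x, W_y of shape (feature dimension, code length), matrix decoders V_x, V_y, samples stored as rows, and the convention sgn(0) = +1. How gradients are passed through the sign function during back-propagation is not stated on the slide and is not modeled here. The function names are hypothetical.

```python
import numpy as np

def encode(W, Z):
    """h(z) = sgn(W^T z) for each row z of Z; sgn(0) is taken as +1 (assumption)."""
    return np.where(Z @ W >= 0, 1.0, -1.0)

def objective(X, Y, S, Wx, Wy, Vx, Vy, lam):
    """O = L + lam * R, with L from Eq. (2) and R from Eq. (6)."""
    Hx, Hy = encode(Wx, X), encode(Wy, Y)
    # L: cross-modal reconstruction, each modality decoded from the other's code.
    L = np.sum((X - Hy @ Vx.T) ** 2) + np.sum((Y - Hx @ Vy.T) ** 2)
    # R: S-weighted squared distances between image codes and text codes.
    D = (np.sum(Hx ** 2, axis=1)[:, None]
         + np.sum(Hy ** 2, axis=1)[None, :]
         - 2.0 * Hx @ Hy.T)
    R = np.sum(S * D)
    return L + lam * R
```

Setting `lam` to 0 recovers the pure feature-correlation objective, while larger values weight the semantic term more heavily, which is the trade-off the slide attributes to λ.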
