
IMAGE REPRESENTATION, Xinyi Fan, COS598c, Spring 2014

Monday, April 7, 2014. APPROACHES: Bag of Words; Spatial Pyramid Matching; Descriptor Encoding.


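  1. Encoding SIFT: From VQ to SC
     Vector quantization (VQ):
     $$\min_{V} \sum_{m=1}^{M} \min_{k=1,\dots,K} \|x_m - v_k\|^2$$
     Equivalent matrix form:
     $$\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 \quad \text{s.t. } \mathrm{Card}(u_m) = 1,\ |u_m| = 1,\ u_m \succeq 0,\ \forall m$$
     Sparse coding (SC):
     $$\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda |u_m| \quad \text{s.t. } \|v_k\| \le 1,\ \forall k$$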
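
The VQ constraints (one nonzero entry, equal to 1, per code) can be illustrated with a small NumPy sketch. The function name `vq_encode` and the toy sizes are assumptions for illustration; the codebook here is random rather than learned.

```python
import numpy as np

def vq_encode(X, V):
    """Hard-assign each descriptor (row of X) to its nearest codeword (row of V).

    Returns one-hot codes U of shape (M, K), so that U @ V is the VQ
    reconstruction of X.
    """
    # Squared Euclidean distances between all descriptors and codewords: (M, K)
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    U = np.zeros((X.shape[0], V.shape[0]))
    U[np.arange(X.shape[0]), nearest] = 1.0   # Card(u_m) = 1, |u_m| = 1, u_m >= 0
    return U

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 4))   # toy codebook: K = 8 codewords, D = 4 (assumed sizes)
X = rng.normal(size=(5, 4))   # M = 5 toy descriptors
U = vq_encode(X, V)
```

Sparse coding relaxes the cardinality constraint, so each row of `U` may use several codewords with real-valued weights.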

  2. Why L1 encourages sparsity
     $$\min \|x - uV\|_2^2 + \lambda \|u\|_1 \qquad \text{vs.} \qquad \min \|x - uV\|_2^2 + \lambda \|u\|_2$$
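
A scalar example makes the difference concrete. This is a minimal sketch under two assumptions: the dictionary is reduced to the scalar case `V = 1`, and the L2 comparison uses the squared penalty `lam * u**2` (ridge), which has a closed form. The function names are illustrative.

```python
import numpy as np

def l1_shrink(x, lam):
    """Minimizer of (x - u)**2 + lam * |u| over scalar u (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam / 2.0, 0.0)

def l2_shrink(x, lam):
    """Minimizer of (x - u)**2 + lam * u**2 over scalar u (ridge shrinkage)."""
    return x / (1.0 + lam)

x = np.array([-2.0, -0.3, 0.1, 0.4, 1.5])
lam = 1.0
u_l1 = l1_shrink(x, lam)   # entries with |x| <= lam / 2 become exactly zero
u_l2 = l2_shrink(x, lam)   # every entry is scaled down but stays nonzero
```

The L1 penalty zeroes out small coefficients, while the squared L2 penalty only rescales them.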


  4. Encoding SIFT: From VQ to SC (same formulation as slide 1)
     Implementation: feature-sign search algorithm [Lee et al 2006]
     http://ai.stanford.edu/~hllee/softwares/nips06-sparsecoding.htm
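
For readers who want to experiment without the feature-sign search code, the coding step (codebook V fixed) can be approximated with ISTA, a simpler proximal-gradient method. This is a swapped-in technique, not the algorithm of Lee et al; the function names, iteration count, and toy sizes are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_ista(x, V, lam, n_iter=500):
    """Approximately minimize ||x - u V||^2 + lam * ||u||_1 over u, with V fixed."""
    u = np.zeros(V.shape[0])
    # Lipschitz constant of the gradient of the quadratic term
    L = 2.0 * np.linalg.norm(V, ord=2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * (u @ V - x) @ V.T
        u = soft_threshold(u - grad / L, lam / L)
    return u

def objective(x, u, V, lam):
    return np.sum((x - u @ V) ** 2) + lam * np.sum(np.abs(u))

rng = np.random.default_rng(0)
V = rng.normal(size=(16, 8))                     # toy codebook: K = 16, D = 8
V /= np.linalg.norm(V, axis=1, keepdims=True)    # satisfies ||v_k|| <= 1
x = rng.normal(size=8)
u = sparse_code_ista(x, V, lam=0.5)
```

In the full problem, this coding step alternates with a codebook update over V.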

  5. Algorithm Architecture
     Image credit: Yang et al 2009

  6. Linear SPM
     $U = [u_1, \dots, u_M]^\top$, $z = F(U)$; define $F$ as:
     $$z_j = \max\{|u_{1j}|, |u_{2j}|, \dots, |u_{Mj}|\}$$
     Linear kernel over the pyramid:
     $$\kappa(z_i, z_j) = z_i^\top z_j = \sum_{l=0}^{2} \sum_{s=1}^{2^l} \sum_{t=1}^{2^l} \langle z_i^l(s,t),\, z_j^l(s,t) \rangle$$
     Image credit: Yang et al 2009
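
The pooling function and the kernel can be sketched as follows. The layout is an assumption made for illustration: descriptor positions `xy` are normalized to [0, 1), and the pyramid uses 1x1, 2x2 and 4x4 grids (l = 0, 1, 2). The function names are hypothetical.

```python
import numpy as np

def max_pool(U):
    """z_j = max_m |u_mj| over the descriptors (rows) of U."""
    return np.abs(U).max(axis=0)

def spm_max_pool(U, xy, levels=3):
    """Concatenate max-pooled codes over the cells of a spatial pyramid.

    U  : (M, K) codes, one row per descriptor
    xy : (M, 2) descriptor positions, normalized to [0, 1)
    """
    K = U.shape[1]
    pooled = []
    for l in range(levels):
        n = 2 ** l
        cell = np.minimum((xy * n).astype(int), n - 1)   # (M, 2) cell indices
        for s in range(n):
            for t in range(n):
                mask = (cell[:, 0] == s) & (cell[:, 1] == t)
                # An empty cell contributes a zero vector
                pooled.append(max_pool(U[mask]) if mask.any() else np.zeros(K))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
M, K = 50, 6
U_i, U_j = rng.normal(size=(M, K)), rng.normal(size=(M, K))
xy_i, xy_j = rng.random(size=(M, 2)), rng.random(size=(M, 2))
z_i, z_j = spm_max_pool(U_i, xy_i), spm_max_pool(U_j, xy_j)
kappa = float(z_i @ z_j)   # linear kernel: sum of per-cell inner products
```

Because the kernel is a plain dot product of the concatenated vectors, a linear classifier can be trained directly on `z`.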

  7. APPROACHES
     • Bag of Words
     • Spatial Pyramid Matching
     • Descriptor Encoding
       - Linear SPM using Sparse Coding
       - Locality-constrained Linear Coding
       - Fisher Vector

  8. Locality-constrained Linear Coding
     Image credit: Wang et al 2010


  11. Locality-constrained Linear Coding
      SC:
      $$\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda |u_m| \quad \text{s.t. } \|v_k\| \le 1,\ \forall k$$
      LLC:
      $$\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda \|d_m \odot u_m\|^2 \quad \text{s.t. } \mathbf{1}^\top u_m = 1,\ \forall m$$
      Locality adaptor:
      $$d_m = \exp\left( \frac{\mathrm{dist}(x_m, V)}{\sigma} \right)$$
      where $\mathrm{dist}(x_m, V) = [\mathrm{dist}(x_m, v_1), \dots, \mathrm{dist}(x_m, v_K)]^\top$, $\mathrm{dist}(x_m, v_k)$ is the Euclidean distance between $x_m$ and $v_k$, and $\sigma$ adjusts the decay speed.


  13. Properties of LLC
      • Better reconstruction
      • Local smooth sparsity
      • Analytical solution:
        $$\tilde{u}_m = \left( (V - \mathbf{1} x_m^\top)(V - \mathbf{1} x_m^\top)^\top + \lambda\, \mathrm{diag}(d) \right) \backslash \mathbf{1}, \qquad u_m = \tilde{u}_m / (\mathbf{1}^\top \tilde{u}_m)$$
      Figure: three panels, each with codebook $V = \{v_k\}$. Image credit: Wang et al 2010
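
The analytical solution translates almost line by line into NumPy. This is a toy-scale sketch that follows the slide's formula as written; `llc_encode` and the parameter values are assumptions, and `np.exp(dist / sigma)` can overflow for large distances, so a practical implementation would likely rescale the adaptor.

```python
import numpy as np

def llc_encode(x, V, lam=0.1, sigma=1.0):
    """LLC code of one descriptor x (D,) over a codebook V (K, D)."""
    Z = V - x                                   # V - 1 x^T, shape (K, D)
    dist = np.linalg.norm(Z, axis=1)            # Euclidean distance to each codeword
    d = np.exp(dist / sigma)                    # locality adaptor
    C = Z @ Z.T                                 # (K, K) data covariance
    u_tilde = np.linalg.solve(C + lam * np.diag(d), np.ones(V.shape[0]))
    return u_tilde / u_tilde.sum()              # enforce 1^T u = 1

rng = np.random.default_rng(0)
V = rng.normal(size=(10, 4))    # toy codebook: K = 10, D = 4
x = rng.normal(size=4)
u = llc_encode(x, V, lam=0.1, sigma=1.0)
```

No iterative optimization is needed: each descriptor costs one K x K linear solve.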


  16. Approximated LLC for Fast Encoding
      $$\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda \|d_m \odot u_m\|^2 \quad \text{s.t. } \mathbf{1}^\top u_m = 1,\ \forall m$$
      Select the local bases of each descriptor to form a local coordinate system: the K nearest neighbors of $x_m$ form the local basis $V_m$.
      $$\min_{\tilde{U}} \sum_{m=1}^{M} \|x_m - \tilde{u}_m V_m\|^2 \quad \text{s.t. } \mathbf{1}^\top \tilde{u}_m = 1,\ \forall m$$
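
A minimal sketch of the approximated encoding follows. Two assumptions are made here: the number of neighbors is written `k` to avoid a clash with the codebook size K, and a small ridge term `reg` is added so the local system stays solvable when the neighbors are nearly collinear. Neither the ridge nor the function name comes from the slides.

```python
import numpy as np

def llc_encode_knn(x, V, k=5, reg=1e-4):
    """Approximated LLC: solve the constrained least squares on the k nearest codewords."""
    K = V.shape[0]
    dist2 = ((V - x) ** 2).sum(axis=1)
    idx = np.argsort(dist2)[:k]              # indices of the k nearest codewords
    Z = V[idx] - x                           # local basis shifted to x, shape (k, D)
    C = Z @ Z.T
    C += reg * np.trace(C) * np.eye(k)       # small ridge for numerical stability (assumption)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                             # enforce 1^T u = 1
    u = np.zeros(K)
    u[idx] = w                               # all other coefficients stay zero
    return u

rng = np.random.default_rng(0)
V = rng.normal(size=(32, 8))    # toy codebook: K = 32, D = 8
x = rng.normal(size=8)
u = llc_encode_knn(x, V, k=5)
```

The linear system shrinks from K x K to k x k, which is where the speed-up comes from.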


  18. Fisher Kernel
      $X = \{x_1, \dots, x_T\}$: a sample of $T$ observations $x_t \in \mathcal{X}$
      $u_\lambda$: the pdf, with parameters $\lambda = [\lambda_1, \dots, \lambda_M]^\top \in \mathbb{R}^M$
      Score function: $G_\lambda^X = \nabla_\lambda \log u_\lambda(X)$
      Similarity measurement [Jaakkola and Haussler, 1998]:
      $$K_{FK}(X, Y) = {G_\lambda^X}^\top F_\lambda^{-1} G_\lambda^Y$$


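  20. Fisher Kernel
      Similarity measurement:
      $$K_{FK}(X, Y) = {G_\lambda^X}^\top F_\lambda^{-1} G_\lambda^Y$$
      Fisher Information Matrix:
      $$F_\lambda = E_{x \sim u_\lambda}\left[ G_\lambda^X {G_\lambda^X}^\top \right], \qquad F_\lambda^{-1} = L_\lambda^\top L_\lambda$$
      Fisher Kernel re-written as:
      $$K_{FK}(X, Y) = {\mathcal{G}_\lambda^X}^\top \mathcal{G}_\lambda^Y$$
      where the Fisher Vector is $\mathcal{G}_\lambda^X = L_\lambda \nabla_\lambda \log u_\lambda(X)$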
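
The rewriting step can be checked numerically. In this sketch the Fisher information matrix and the score vectors are random stand-ins (assumptions for illustration, not derived from any model); the point is only that a Cholesky factor of the inverse turns the kernel into a plain dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
A = rng.normal(size=(M, M))
F = A @ A.T + M * np.eye(M)          # toy symmetric positive-definite stand-in for the FIM
g_x = rng.normal(size=M)             # toy score vectors G^X and G^Y
g_y = rng.normal(size=M)

F_inv = np.linalg.inv(F)
L = np.linalg.cholesky(F_inv).T      # upper-triangular L with L.T @ L == F_inv

k_direct = g_x @ F_inv @ g_y         # G^X' F^{-1} G^Y
fv_x, fv_y = L @ g_x, L @ g_y        # Fisher vectors L G^X and L G^Y
k_dot = fv_x @ fv_y                  # dot product of the Fisher vectors
```

Since `k_direct` and `k_dot` agree, learning a kernel classifier with the Fisher kernel is equivalent to learning a linear classifier on the Fisher vectors.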


  22. Fisher Vector on Images
      Fisher Vector:
      $$\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)$$
      GMM:
      $$u_\lambda(x) = \sum_{k=1}^{K} w_k u_k(x), \quad \text{where } u_k(x) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}$$
      EM algorithm to estimate the parameters: $\lambda = \{w_k, \mu_k, \Sigma_k,\ k = 1, \dots, K\}$

  23. Soft Assignment
      Fisher Vector:
      $$\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)$$
      Gradients (diagonal covariances, $\Sigma_k = \mathrm{diag}(\sigma_k^2)$, operations taken per dimension):
      $$\nabla_{\alpha_k} \log u_\lambda(x_t) = \gamma_t(k) - w_k$$
      $$\nabla_{\mu_k} \log u_\lambda(x_t) = \gamma_t(k) \left( \frac{x_t - \mu_k}{\sigma_k^2} \right)$$
      $$\nabla_{\sigma_k} \log u_\lambda(x_t) = \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]$$

  24. Soft Assignment: posterior probability
      $$\gamma_t(k) = \frac{w_k u_k(x_t)}{\sum_{j=1}^{K} w_j u_j(x_t)}$$

  25. Soft Assignment: mixture weights
      $$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$

  26. Soft Assignment: the gradient with respect to $\alpha_k$, $\gamma_t(k) - w_k$, is labeled "BoW" on the slide.
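
The posterior and the three gradients can be sketched for a single descriptor as below, assuming a diagonal-covariance GMM with weights given by a softmax over `alpha`. The function names and the toy parameters are assumptions; a finite-difference comparison against `log_gmm` is a convenient way to check the formulas.

```python
import numpy as np

def log_components(x, w, mu, sigma):
    """log(w_k * u_k(x)) for each component; mu and sigma have shape (K, D)."""
    z = (x - mu) / sigma
    return np.log(w) - 0.5 * np.sum(z ** 2 + 2 * np.log(sigma) + np.log(2 * np.pi), axis=1)

def log_gmm(x, w, mu, sigma):
    """log u_lambda(x), computed with the log-sum-exp trick."""
    a = log_components(x, w, mu, sigma)
    return a.max() + np.log(np.exp(a - a.max()).sum())

def posterior(x, w, mu, sigma):
    """gamma(k) = w_k u_k(x) / sum_j w_j u_j(x)."""
    a = log_components(x, w, mu, sigma)
    p = np.exp(a - a.max())
    return p / p.sum()

def gradients(x, w, mu, sigma):
    """Gradients of log u_lambda(x) with respect to alpha, mu and sigma."""
    g = posterior(x, w, mu, sigma)
    grad_alpha = g - w
    grad_mu = g[:, None] * (x - mu) / sigma ** 2
    grad_sigma = g[:, None] * ((x - mu) ** 2 / sigma ** 3 - 1.0 / sigma)
    return grad_alpha, grad_mu, grad_sigma

rng = np.random.default_rng(0)
K, D = 3, 2                                   # toy sizes
alpha = rng.normal(size=K)
w = np.exp(alpha) / np.exp(alpha).sum()       # softmax parameterization of the weights
mu = rng.normal(size=(K, D))
sigma = rng.uniform(0.5, 1.5, size=(K, D))
x = rng.normal(size=D)
grad_alpha, grad_mu, grad_sigma = gradients(x, w, mu, sigma)
```

With hard assignment, `posterior` would return a one-hot vector and `grad_alpha` would reduce to a visual-word count minus the prior weight.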


  28. Soft Assignment
      Fisher Vector:
      $$\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)$$
      Fisher Information Matrix:
      $$F_\lambda = E_{x \sim u_\lambda}\left[ G_\lambda^X {G_\lambda^X}^\top \right], \qquad F_\lambda^{-1} = L_\lambda^\top L_\lambda$$
      • Assume FIM diagonal
      • Almost hard assignment
      • Coordinate-wise normalization on gradient vectors


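  30. Fisher Vector
      • Assume FIM diagonal
      • Almost hard assignment
      • Coordinate-wise normalization on gradient vectors
      Normalized gradients:
      $$\mathcal{G}_{\alpha_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
      $$\mathcal{G}_{\mu_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \left( \frac{x_t - \mu_k}{\sigma_k} \right)$$
      $$\mathcal{G}_{\sigma_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
      Concatenate the gradient vectors: Dimension = (2D+1)K
      Normalize by sample size: $\mathcal{G}_\lambda^X \leftarrow \frac{1}{T} \mathcal{G}_\lambda^X$, where $T$ is the number of patches.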
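
The normalized gradients can be assembled into a Fisher vector as sketched below. The GMM parameters here are random toy values rather than the output of EM, and `fisher_vector` is an illustrative name; the code assumes diagonal covariances stored as per-dimension standard deviations.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Fisher vector of descriptors X (T, D) under a diagonal GMM.

    w: (K,) mixture weights; mu, sigma: (K, D). Returns a vector of length (2D+1)K.
    """
    T = X.shape[0]
    z = (X[:, None, :] - mu[None]) / sigma[None]                 # (T, K, D)
    # log(w_k u_k(x_t)), shape (T, K), then normalized to posteriors gamma_t(k)
    log_p = np.log(w) - 0.5 * np.sum(z ** 2 + 2 * np.log(sigma)[None] + np.log(2 * np.pi), axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    g_alpha = (gamma - w).sum(axis=0) / np.sqrt(w)                                      # (K,)
    g_mu = (gamma[:, :, None] * z).sum(axis=0) / np.sqrt(w)[:, None]                    # (K, D)
    g_sigma = (gamma[:, :, None] * (z ** 2 - 1)).sum(axis=0) / np.sqrt(2 * w)[:, None]  # (K, D)
    # Concatenate and normalize by the number of descriptors T
    return np.concatenate([g_alpha, g_mu.ravel(), g_sigma.ravel()]) / T

rng = np.random.default_rng(0)
T, K, D = 200, 3, 2                          # toy sizes
w = np.array([0.5, 0.3, 0.2])
mu = rng.normal(size=(K, D))
sigma = rng.uniform(0.5, 1.5, size=(K, D))
X = rng.normal(size=(T, D))
fv = fisher_vector(X, w, mu, sigma)          # length (2 * 2 + 1) * 3 = 15
```

Dividing by `T` makes the representation comparable across images with different numbers of patches.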

  31. Posterior probability:
      $$\gamma_t(k) = \frac{w_k u_k(x_t)}{\sum_{j=1}^{K} w_j u_j(x_t)}$$
      To make it work with a linear classifier
      Image credit: Sanchez et al 2013

  32. Extensions of FV
      • Spatial Pyramid [Sanchez et al 2013]
      • Deep Fisher Networks [Simonyan et al 2013]
      • Other methods that account for scene geometry in the FV framework [Krapac et al, 2011, Sanchez et al, 2012]


  35. Deep Fisher Networks
      Image credit: Simonyan et al 2013

  36. Single Fisher Layer
      Image credit: Simonyan et al 2013

  37. Single Fisher Layer
      More details: Deep Fisher Networks for Large-Scale Image Classification
      http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13b/simonyan13b.pdf
      Image credit: Simonyan et al 2013

  38. Evaluations - on Pascal VOC 2007
      Results from: Chatfield et al 2011

  39. Evaluations - on Pascal VOC 2007
      Results from: Chatfield et al 2011

  40. Evaluations - on SUN 397
      Image credit: Sanchez et al 2013

  41. BoW Summary: Issues
      • Sampling strategy: dense uniform, interest points, random...
      • Codebook learning: supervised/unsupervised, size...
      • Similarity measurement: SVM, Pyramid Matching
      • Spatial information
      • Scalability

  42. From Vectors to Codes
      Given a global image representation, we want to learn compact binary codes for image retrieval on large datasets.

  43. From Vectors to Codes
      Motivation
      • Tractable memory usage
      • Constant lookup time
      • Similarity preserved by Hamming distance
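
The three motivations can be illustrated with a small sketch. The codes here come from the signs of random projections, a stand-in chosen for brevity and not a learned hashing method; the sizes, names, and noise level are assumptions.

```python
import numpy as np

def binary_codes(Z, W):
    """Map vectors Z (N, D) to B-bit codes using the signs of projections onto W (D, B)."""
    return (Z @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
D, B = 64, 32                                # 64-dim representation, 32-bit codes
W = rng.normal(size=(D, B))                  # random hyperplanes (not learned)
z = rng.normal(size=D)
z_near = z + 0.01 * rng.normal(size=D)       # slightly perturbed copy of z
z_far = -z                                   # opposite direction
codes = binary_codes(np.stack([z, z_near, z_far]), W)
packed = np.packbits(codes, axis=1)          # 32 bits -> 4 bytes per image

d_near = hamming(codes[0], codes[1])
d_far = hamming(codes[0], codes[2])
```

Nearby vectors end up with a small Hamming distance, while each image is stored in a few bytes and compared with bitwise operations.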
