
Dictionaries, Manifolds and Domain Adaptation for Image and Video-based Recognition (PowerPoint PPT Presentation)



  1. Dictionaries, Manifolds and Domain Adaptation for Image and Video-based Recognition
  Rama Chellappa, University of Maryland

  2. Student and the teachers

  3. Major points
  • Training and testing data come from different distributions.
    – Distributions are complex due to variations in patterns
    – Domain adaptation
  • Robust representations and distance measures
    – Vector spaces vs. manifolds
    – Euclidean distances vs. geodesics
  • These points will be developed for two representations of images and videos:
    – Dictionaries
    – Manifolds

  4. Outline of the talk
  • Dictionaries
    – Learning and applications to image and video-based recognition
  • Manifolds
    – Representation, inference and applications to image and video-based recognition
    – Analytical and empirical
  • Domain adaptation
    – How to adapt representations to new domains
    – Domain shifts could be due to pose, illumination, rate, time lapse, views, ...
    – Semi-supervised and unsupervised settings
  • Relies on the works of Prof. Amari and Chikuse.

  5. Motivation - 1

  6. Motivation - 2
  • Task: Given a probe video of one or more subjects, retrieve their IDs from a gallery of still face images or face videos.
  • Challenges: getting a face image is more than half the problem
    – Low resolution
    – Pose variation
    – Uncontrolled illumination
    – Blur
    – Camera motion

  7. Dictionaries for signal and image analysis
  • Matching pursuit algorithms, Mallat (early 90's)
  • Orthogonal matching pursuit (Pati et al., 1993; Tropp, 2004); a sketch follows below
  • Saito and Coifman, 1997
  • Etemad and Chellappa, 1997
  • Represent signals using wavelets, wavelet packets, ...
  • Learn the dictionary from data instead of using off-the-shelf bases (Olshausen and Field, 1997), ...
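Orthogonal matching pursuit greedily adds the dictionary atom most correlated with the current residual, then re-fits all selected coefficients by least squares. A minimal NumPy sketch, assuming unit-norm atoms and a fixed sparsity level (both choices are illustrative, not taken from the slides):

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: approximate y as D @ x with at most
    `sparsity` nonzero entries in x. Columns of D are assumed unit-norm."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit all selected coefficients jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```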

  8. Modern-day dictionaries
  • Represent signals and images using signals and images.
  • Sparse coding has neural underpinnings.
  • Allows compositional representations.
  • Dictionary updates (a batch-update sketch follows below):
    – Batch (Method of Optimal Directions)
    – K-SVD
  • Dictionaries for images are more complicated:
    – Need to account for pose, illumination and resolution variations.
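The Method of Optimal Directions alternates a sparse-coding step with a closed-form least-squares update of the whole dictionary, D = Y X^T (X X^T)^(-1). A hedged sketch of this batch update (the sparsity level, iteration count, and use of scikit-learn's OMP coder are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def mod_update(D, Y, sparsity=5, n_iter=10):
    """Method of Optimal Directions: alternate sparse coding of Y over D
    with a closed-form least-squares update of D.
    Y: (d, n) matrix holding training signals as columns."""
    for _ in range(n_iter):
        # Sparse coding step (batch OMP from scikit-learn).
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)
        # Dictionary update: D = Y X^T (X X^T)^(-1), via pinv for stability.
        D = Y @ np.linalg.pinv(X)
        # Renormalize atoms to unit norm.
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D
```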

  9. Basic formulation
  • Assume L classes and n images per class in the gallery.
  • The training images of the k-th class are represented as $D_k = [x_{k,1}, \ldots, x_{k,n}]$.
  • Dictionary D is obtained by concatenating all the training images: $D = [D_1, D_2, \ldots, D_L]$.
  • The unknown test vector y can be represented as a linear combination of the training images as $y = D\alpha$ (Wright et al., 2009).
  • The coefficient vector $\alpha$ is sparse (Wagner et al., 2011).

  10. Dictionary-based face recognition
  • $\alpha$ can be recovered by basis pursuit as $\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \ \text{subject to} \ y = D\alpha$.
  • Find the reconstruction error while representing the test image with the coefficients of each class separately: $r_k(y) = \|y - D\,\delta_k(\hat{\alpha})\|_2$, where $\delta_k$ zeroes out all coefficients not associated with class k.
  • Select the class giving the minimum reconstruction error.
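In code, this pipeline amounts to one l1 solve followed by per-class residuals. A minimal sketch using scikit-learn's Lasso as a stand-in for the basis-pursuit solver (the penalty value and the bookkeeping via `class_ids` are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(D, class_ids, y, lam=0.01):
    """Sparse-representation classification (Wright et al., 2009):
    code y over the gallery D with an l1 penalty, then assign the class
    whose coefficients alone best reconstruct y.
    D: (d, N) gallery with unit-norm columns; class_ids: (N,) labels."""
    alpha = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(D, y).coef_
    residuals = {}
    for c in np.unique(class_ids):
        # Keep only the coefficients belonging to class c.
        delta = np.where(class_ids == c, alpha, 0.0)
        residuals[c] = np.linalg.norm(y - D @ delta)
    return min(residuals, key=residuals.get)
```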

  11. Learning dictionaries - K-SVD
  [Figure: training faces are fed to K-SVD, producing a learned dictionary.]
  M. Aharon, M. Elad, and A. M. Bruckstein, 2006

  12. Outlier rejection

  13. The illumination problem
  • Robust albedo estimation (Biswas et al., PAMI 2009)
    – Estimate the albedo
    – Relight images with different light source directions
    – Use the relighted images for training

  14. Robust estimation of albedo
  [Figure: the forward model combines albedo, surface normals and the light source into an intensity image; the inverse problem recovers albedo and shape from a single intensity image.]
  Biswas et al., ICCV 2007; PAMI 2009

  15. Albedo estimation
  • Lambertian assumption: intensity $I = \rho \, n^{T} s$, with albedo $\rho$, surface normal $n$, and light source $s$.
  • With an estimated light source $\hat{s}$ and initial surface normals $\hat{n}$, the initial albedo estimate is $\hat{\rho} = I / (\hat{n}^{T}\hat{s})$.
  • Errors in $\hat{n}$ and $\hat{s}$ produce an error in the initial albedo estimate.

  16. Albedo estimation
  • The initial estimate can be written as the true albedo plus signal-dependent additive noise: $\hat{\rho}(x,y) = \rho(x,y) + w(x,y)$.
  • A non-stationary mean, non-stationary variance (NMNV) model is assumed for the true unknown albedo.
  • Further assumptions: unbiased source, uncorrelated noise.
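Under the NMNV model, the initial estimate is typically refined with a pointwise linear minimum mean-squared error (LMMSE) filter. The standard LMMSE form for this signal-plus-noise model is sketched below; the exact weighting used by Biswas et al. may differ in detail:

```latex
% LMMSE refinement of the initial albedo estimate \hat{\rho}:
% shrink toward the local mean, with a gain set by the ratio of
% signal variance to total (signal + noise) variance.
\tilde{\rho}(x,y) = \mathbb{E}[\rho(x,y)]
  + \frac{\operatorname{Var}(\rho(x,y))}
         {\operatorname{Var}(\rho(x,y)) + \operatorname{Var}(w(x,y))}
    \left( \hat{\rho}(x,y) - \mathbb{E}[\rho(x,y)] \right)
```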

  17. Estimated albedo – PIE dataset

  18. Relighting using the estimated albedo

  19. Experimental results (Yale B dataset)
  • DFR: 99.17%
  • SRC: 98.1%
  • CDPCA: 98.83%
  V. M. Patel, T. Wu, S. Biswas, P. J. Phillips, and R. Chellappa, "Dictionary-based face recognition under variable lighting and pose," IEEE Trans. Information Forensics and Security, 2011.

  20. Outdoor face dataset
  • An outdoor dataset with 18 subjects, 5 gallery images each, and 90 low-resolution probe images.
  • Gallery: 120 x 120; probe: 20 x 20.
  • Recognition rates:
    – SLRFR: 67%
    – Reg. LDA+SVM (BTAS 2011): 60%
    – CLPM: 16.1%

  21. Video dictionaries for face recognition
  Pipeline:
  1. Preprocessing (extract frames and detect/crop face regions)
  2. Partition the cropped face images using a summarization algorithm [1]
  3. Dictionary learning for each partition, yielding sequence-specific dictionaries
  4. Construct distance/similarity matrices
  5. Recognition / verification (ECCV 2012)
  [1] N. Shroff, P. Turaga, and R. Chellappa, "Video précis: Highlighting diverse aspects of videos," IEEE Transactions on Multimedia, 2010; NIPS 2011.

  22. Dictionary learning (build sequence-specific dictionaries)
  • Let $Y_{i,j}^{k}$ be the gallery matrix of the k-th partition of the j-th video sequence of subject i.
  • Given $Y_{i,j}^{k}$, use the K-SVD [2] algorithm to build a (partition-level) sub-dictionary $D_{i,j}^{k}$ such that $Y_{i,j}^{k} \approx D_{i,j}^{k} X_{i,j}^{k}$, with each column of $X_{i,j}^{k}$ sparse.
  • Concatenate the (partition-level) sub-dictionaries to form a sequence-specific dictionary $D_{i,j} = [D_{i,j}^{1}, \ldots, D_{i,j}^{K}]$.
  [2] M. Aharon, M. Elad, and A. M. Bruckstein, "The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.
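For reference, a compact K-SVD sketch (sparse coding via scikit-learn's OMP; the sparsity level, iteration count, random initialization, and the handling of unused atoms are illustrative simplifications of Aharon et al.'s algorithm):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity=5, n_iter=10, seed=0):
    """K-SVD dictionary learning: alternate OMP sparse coding with
    rank-1 SVD updates of one atom (and its coefficients) at a time.
    Y: (d, n) matrix of training signals as columns."""
    rng = np.random.default_rng(seed)
    # Initialize atoms from randomly chosen training signals.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)
        for k in range(n_atoms):
            users = np.flatnonzero(X[k])      # signals that use atom k
            if users.size == 0:
                continue
            X[k, users] = 0.0
            # Residual of those signals without atom k's contribution.
            E = Y[:, users] - D @ X[:, users]
            # Best rank-1 fit: new atom and coefficients from leading SVD pair.
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0]
    return D
```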

  23. Recognition / identification
  • Given the m-th query video sequence, we generate its partitions in the same way as for the gallery sequences.
  • The distance between a query partition and $D_p$ (the dictionary of the p-th gallery video sequence) is computed from the sparse-reconstruction residuals of the query frames over $D_p$.
  • We select the best match as the gallery sequence whose dictionary yields the minimum distance.
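A hedged sketch of this matching step: score a query partition against each sequence-specific dictionary by its average sparse-reconstruction residual and pick the smallest. The residual-based distance is the natural reading of the slide; the exact formula in the paper may differ, and the function names here are placeholders:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sequence_distance(Q, D, sparsity=5):
    """Average reconstruction residual of query frames Q (columns)
    when sparsely coded over the sequence dictionary D."""
    X = orthogonal_mp(D, Q, n_nonzero_coefs=sparsity)
    return float(np.mean(np.linalg.norm(Q - D @ X, axis=0)))

def identify(Q, gallery_dicts):
    """Return the id of the gallery sequence whose dictionary best
    reconstructs the query partition Q."""
    return min(gallery_dicts, key=lambda p: sequence_distance(Q, gallery_dicts[p]))
```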

  24. MBGC recognition results
  MBGC dataset:
  • 397 walking (frontal-face) videos: 198 SD + 199 HD
  • 371 activity (profile-face) videos: 185 SD + 186 HD

  25.
  • Facial expression analysis using AUs (action units) and the high-level knowledge available in FACS regarding AU composition and expression decomposition.
  • AUs have ambiguous semantic descriptions, so it is difficult to model them accurately.
  • AU-dictionary: we use local features to model each AU.

  26. [Image slide]

  27.
  • We learn separate dictionaries for each AU.
  • The AU-dictionary is then formed by concatenating the individual AU dictionaries: D = [D_AU-1 | D_AU-2 | D_AU-5 | D_AU-10 | D_AU-12 | D_AU-23].
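A minimal sketch of this construction, assuming a K-SVD-style learner such as the `ksvd` sketch above (`build_au_dictionary`, the per-AU feature matrices, and the atom count are hypothetical placeholders):

```python
import numpy as np

def build_au_dictionary(au_features, learn_dict, n_atoms=64):
    """Learn one sub-dictionary per AU and concatenate them.
    au_features: dict mapping AU name -> (d, n) local-feature matrix.
    learn_dict:  any dictionary learner, e.g. the ksvd sketch above."""
    sub_dicts = {au: learn_dict(Y, n_atoms) for au, Y in au_features.items()}
    # Columns stay grouped by AU, so coefficients index back to their AU.
    D = np.hstack([sub_dicts[au] for au in sorted(sub_dicts)])
    return D, sub_dicts
```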

  28. [Image slide]

  29.
  • Objective function to be minimized: [equation on slide]

  30.
  • Goal:
    – Simultaneously learn structures on the expressive face and the corresponding subspace representations
    – We want the final subspaces to be as separated as possible
  • Objective: structures are disjoint subsets of local patch descriptors, with a dictionary learned for each structure.
  • Learned structures for the universal expressions from the CK+ dataset.

  31.
  • Min residual error

  32. [Image slide]

  33. Some additional results
  • Competitive results for iris recognition; enables cancelability (PAMI 2011).
  • Non-linear dictionaries through kernelization produce improvements of 5-10%, depending on the problem (ICASSP 2012).
    – Illustrated using the USPS, Caltech 101 and Caltech 256 datasets.
  • Building dictionaries in the Radon transform domain yields robustness to in-plane rotation and scale in CBIR applications (IEEE TIP).
  • Characteristic views (Chakravarty and Freeman) can be built using sparse representation theory (ICIP 2012).
  • Joint-sparsity-driven dictionary learning produces improvements in multi-modal biometrics applications (under review).
  • Reconstruction from sparse gradients (IEEE TIP 2012), in collaboration with Anna Gilbert.

  34. Domain adaptation: Motivation
  • Source domain: data X, labels Y. Target domain: data X', labels Y'.
  • Transfer learning [1]: P(Y|X) ≠ P(Y'|X'), P(X) ≈ P(X')
  • Domain adaptation [1]: P(X) ≠ P(X'), P(Y|X) ≈ P(Y'|X')
  Image credit: Saenko et al., ECCV 2010; Bergamo et al., NIPS 2010.
  [1] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowledge and Data Engineering, 22:1345-1359, October 2010.

  35. Domain adaptation - Related work
  • Semi-supervised (learns the domain change through correspondences):
    – Daume and Marcu, JAIR '06
    – Duan et al., ICML '09
    – Xing et al., KDD '07
    – Saenko et al., ECCV 2010; Kulis et al., CVPR 2011
    – Bergamo and Torresani, NIPS 2010
    – Lai and Fox, IJRR 2010
  • Unsupervised (no correspondence, no knowledge of the domain change):
    – Ben-David et al., AISTATS '10
    – Blitzer et al., NIPS '08
    – Wang and Mahadevan, IJCAI '09
    – Gopalan, Li and Chellappa, ICCV 2011
    – Gong et al., CVPR 2012
    – Zheng and Chellappa, ICPR 2012
    – D. Xu's group, 2012

  36. Unsupervised domain adaptation*
  • The labeled source domain X yields a generative subspace $S_1$; the unlabeled target domain $\tilde{X}$ yields a generative subspace $S_2$ (no labels). Both are points on the Grassmann manifold $G_{N,d}$.
  • Intermediate domains (e.g., $S_{1.3}$, $S_{1.6}$) are sampled along the geodesic from $S_1$ to $S_2$, enabling incremental learning from Domain 1 (labeled) to Domain 2 (unlabeled).
  * R. Gopalan, R. Li, and R. Chellappa, "Domain adaptation for object recognition: An unsupervised approach," International Conference on Computer Vision (ICCV), 2011 (oral).
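A minimal NumPy sketch of the core idea: sample intermediate subspaces along the Grassmann geodesic between the source and target PCA subspaces. The geodesic formula is the standard one from Grassmann geometry; the step count and the PCA-based subspace construction are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def grassmann_geodesic(S1, S2, steps=4):
    """Sample subspaces along the geodesic on the Grassmann manifold
    from span(S1) to span(S2). S1, S2: (N, d) orthonormal bases.
    Assumes S1^T S2 is invertible (subspaces not orthogonal).
    Returns a list of (N, d) orthonormal bases, endpoints included."""
    M = S1.T @ S2
    # Tangent direction: (I - S1 S1^T) S2 (S1^T S2)^(-1), via thin SVD.
    Q = (S2 - S1 @ M) @ np.linalg.inv(M)
    U, sigma, Vt = np.linalg.svd(Q, full_matrices=False)
    theta = np.arctan(sigma)                 # principal angles
    path = []
    for t in np.linspace(0.0, 1.0, steps + 1):
        # Geodesic: S(t) = S1 V cos(t*Theta) + U sin(t*Theta).
        St = S1 @ (Vt.T * np.cos(t * theta)) + U * np.sin(t * theta)
        path.append(St)
    return path

def pca_subspace(X, d):
    """Orthonormal basis of the top-d PCA subspace of data rows X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T
```

Features projected through each sampled subspace in turn can then be used to train incrementally, mirroring the incremental-learning arrow from Domain 1 to Domain 2 in the slide.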
