Compressed Sensing Meets Machine Learning: Classification of Mixture Subspace Models via Sparse Representation
Allen Y. Yang <yang@eecs.berkeley.edu>, UC Berkeley, Feb. 25, 2008
Outline: Introduction · Classification via Sparse Representation · Distributed Pattern Recognition · Conclusion
What is Sparsity?
Sparsity: a signal is sparse if most of its coefficients are (approximately) zero.
Figure: 2-D DCT transform. (a) Harmonic basis functions; (b) magnitude spectrum.
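As an illustrative aside (not from the original slides), a short Python sketch of this idea: take the 2-D DCT of a smooth synthetic image, keep only the largest few coefficients, and the reconstruction error stays small.

    import numpy as np
    from scipy.fft import dctn, idctn

    # Smooth synthetic "image"; natural images behave similarly.
    x = np.linspace(0, 1, 64)
    img = np.outer(np.cos(2 * np.pi * x), np.sin(4 * np.pi * x))

    coeffs = dctn(img, norm='ortho')                       # 2-D DCT
    thresh = np.quantile(np.abs(coeffs), 0.95)             # keep the top 5% of coefficients
    sparse_coeffs = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
    approx = idctn(sparse_coeffs, norm='ortho')
    print("relative error:", np.linalg.norm(img - approx) / np.linalg.norm(img))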
Sparsity in the spatial domain: gene microarray data [Drmanac et al. 1993].
Sparsity in the human visual cortex [Olshausen & Field 1997, Serre & Poggio 2006]
1. Feed-forward: no iterative feedback loop.
2. Redundancy: on average 80-200 neurons per feature representation.
3. Recognition: information exchange between stages is not about individual neurons, but about how many neurons fire together as a group.
Sparsity and ℓ1-Minimization
The "black gold" age [Claerbout & Muir 1973; Taylor, Banks & McCoy 1979]
Figure: Deconvolution of a spike train.
Sparse Support Estimators
Sparse support estimation [Donoho 1992; Meinshausen & Buhlmann 2006; Yu 2006; Wainwright 2006; Ramchandran 2007; Gastpar 2007]
Basis pursuit [Chen & Donoho 1999]: given y = A x with x unknown,
    x* = arg min ‖x‖_1   subject to   y = A x.
The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:
    x* = arg min ‖y − A x‖_2   subject to   ‖x‖_1 ≤ k.
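A minimal sketch (mine, not the speaker's) of a Lasso-style solve with scikit-learn; note that sklearn's Lasso uses the penalized form min (1/2m)‖y − A x‖² + α‖x‖_1 rather than the constrained form on the slide.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    d, n, k = 50, 200, 5                           # measurements, unknowns, sparsity
    A = rng.standard_normal((d, n))
    x_true = np.zeros(n)
    support = rng.choice(n, k, replace=False)
    x_true[support] = rng.standard_normal(k)
    y = A @ x_true

    lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000)
    lasso.fit(A, y)
    print("true support:     ", np.sort(support))
    print("recovered support:", np.flatnonzero(np.abs(lasso.coef_) > 1e-2))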
Taking Advantage of Sparsity
What generates sparsity? (d'après Emmanuel Candès): measure first, analyze later; the curse of dimensionality.
1. Numerical analysis: sparsity reduces the cost of storage and computation.
2. Regularization in classification.
Figure: Linear support vector machine (SVM). (a) decision boundary; (b) maximal margin.
Our Contributions
1. Classification via compressed sensing.
2. Performance in face recognition.
3. Extensions: outlier rejection; occlusion compensation.
4. Distributed pattern recognition in sensor networks.
Problem Formulation in Face Recognition
1. Notation.
   Training: for K classes, collect training samples {v_{1,1}, ..., v_{1,n_1}}, ..., {v_{K,1}, ..., v_{K,n_K}} ⊂ R^D.
   Test: given a new sample y ∈ R^D, solve for label(y) ∈ {1, 2, ..., K}.
2. Construct the R^D sample space by stacking pixels.
   Figure: for a 3-channel 640 × 480 image, D = 3 · 640 · 480 ≈ 10^6.
3. Assume y belongs to Class i [Belhumeur et al. 1997, Basri & Jacobs 2003]:
       y = α_{i,1} v_{i,1} + α_{i,2} v_{i,2} + ... + α_{i,n_i} v_{i,n_i} = A_i α_i,
   where A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}].
Nevertheless, i is exactly the variable we need to solve for.
1. Global representation:
       y = [A_1, A_2, ..., A_K] [α_1; α_2; ...; α_K] = A x_0.
2. Over-determined system: A ∈ R^{D×n}, where D ≫ n = n_1 + ... + n_K.
   x_0 encodes the membership of y: if y belongs to Subject i, then x_0 = [0, ..., 0, α_i^T, 0, ..., 0]^T ∈ R^n.
Problems to face
1. Solving for x_0 directly in the R^D ambient space is intractable.
2. The true solution x_0 is sparse: only the coefficients of one class are nonzero, about n/K entries on average.
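A small sketch of this construction (variable names are mine): stack the per-class training matrices A_i into one global dictionary A, so a test sample from class i admits a representation A x_0 whose nonzero entries lie in the i-th block.

    import numpy as np

    def build_dictionary(class_samples):
        """class_samples[i] is a D x n_i array of training samples for class i."""
        A = np.concatenate(class_samples, axis=1)               # A in R^{D x n}
        block_index = np.concatenate([np.full(S.shape[1], i)    # class label of each column
                                      for i, S in enumerate(class_samples)])
        return A, block_index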
Dimensionality Reduction
1. Construct a linear projection R ∈ R^{d×D}, where d is the feature dimension and d ≪ D:
       ỹ := R y = R A x_0 = Ã x_0 ∈ R^d.
   Now Ã ∈ R^{d×n}, but x_0 is unchanged.
2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005].
3. Partial features.
4. Unconventional features: downsampled faces, random projections.
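A sketch of the random-projection option (a Gaussian R is assumed here; the slides do not specify the distribution):

    import numpy as np

    def make_projector(D, d, seed=0):
        """Random projection R in R^{d x D}; apply the same R to both A and y."""
        rng = np.random.default_rng(seed)
        return rng.standard_normal((d, D)) / np.sqrt(d)

    # Usage (hypothetical shapes): R = make_projector(D, d=120)
    # A_tilde = R @ A;  y_tilde = R @ y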
ℓ0-Minimization
1. Solve for the sparsest solution via ℓ0-minimization:
       x_0 = arg min_x ‖x‖_0   subject to   ỹ = Ã x,
   where ‖·‖_0 simply counts the number of nonzero terms.
2. ℓ0-ball: the ℓ0-ball is not convex, and ℓ0-minimization is NP-hard.
ℓ1/ℓ0 Equivalence
1. Compressed sensing: if x_0 is sparse enough, ℓ0-minimization is equivalent to
       (P1)  min ‖x‖_1   subject to   ỹ = Ã x,
   where ‖x‖_1 = |x_1| + |x_2| + ... + |x_n|.
2. ℓ1-ball: ℓ1-minimization is convex, and its solution equals that of ℓ0-minimization.
3. ℓ1/ℓ0 equivalence [Donoho 2002, 2004; Candes et al. 2004; Baraniuk 2006]: given ỹ = Ã x_0, there exists an equivalence breakdown point (EBP) ρ(Ã) such that if ‖x_0‖_0 < ρ, the ℓ1-solution is unique and x_1 = x_0.
ℓ1-Minimization Routines
Matching pursuit [Mallat 1993]
1. Find the vector v_i in Ã most correlated with y: i = arg max_j ⟨y, v_j⟩.
2. Update: Â ← [Â, v_i], x_i ← ⟨y, v_i⟩, y ← y − x_i v_i.
3. Repeat until ‖y‖ < ε.
Basis pursuit [Chen 1998]
1. Assume x_0 is m-sparse.
2. Select m linearly independent vectors B_m in Ã as a basis: x_m = B_m† y.
3. Repeatedly swap one basis vector in B_m for another vector in Ã if doing so improves ‖y − B_m x_m‖.
4. Stop when ‖y − B_m x_m‖_2 < ε.
Quadratic solvers: for noisy measurements ỹ = Ã x_0 + z ∈ R^d with ‖z‖_2 < ε, solve
    x* = arg min { ‖x‖_1 + λ ‖ỹ − Ã x‖_2 }
(Lasso, second-order cone programming); more expensive.
Matlab toolboxes: ℓ1-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
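As a concrete illustration, a minimal matching-pursuit sketch following the three steps above (a greedy variant of my own writing, assuming unit-norm dictionary columns; not the authors' implementation):

    import numpy as np

    def matching_pursuit(A, y, eps=1e-6, max_iter=1000):
        """A: d x n dictionary with unit-norm columns; y: d-dimensional signal."""
        x = np.zeros(A.shape[1])
        residual = y.astype(float).copy()
        for _ in range(max_iter):
            corr = A.T @ residual                  # <residual, v_j> for every column
            i = int(np.argmax(np.abs(corr)))       # most correlated atom
            x[i] += corr[i]                        # accumulate its coefficient
            residual -= corr[i] * A[:, i]          # peel off its contribution
            if np.linalg.norm(residual) < eps:
                break
        return x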
Classification
1. Project x_1 onto the face subspaces:
       δ_1(x_1) = [α_1; 0; ...; 0],  δ_2(x_1) = [0; α_2; 0; ...; 0],  ...,  δ_K(x_1) = [0; ...; 0; α_K].    (1)
2. Define the residual r_i = ‖ỹ − Ã δ_i(x_1)‖_2 for Subject i, and assign
       id(y) = arg min_{i=1,...,K} r_i.
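A short sketch of this classification rule (it reuses the block_index bookkeeping from the earlier dictionary sketch; names are mine):

    import numpy as np

    def src_classify(A_tilde, y_tilde, x1, block_index, num_classes):
        """Return arg min_i || y_tilde - A_tilde @ delta_i(x1) ||_2."""
        residuals = []
        for i in range(num_classes):
            delta_i = np.where(block_index == i, x1, 0.0)   # keep only class-i coefficients
            residuals.append(np.linalg.norm(y_tilde - A_tilde @ delta_i))
        return int(np.argmin(residuals))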
AR Database: 100 subjects (illumination and expression variation)

Table I. Nearest Neighbor
Dimension       30     54     130    540
Eigen [%]       68.1   74.8   79.3   80.5
Laplacian [%]   73.1   77.1   83.8   89.7
Random [%]      56.7   63.7   71.4   75.0
Down [%]        51.7   60.9   69.2   73.7
Fisher [%]      83.4   86.8   N/A    N/A

Table II. Nearest Subspace
Dimension       30     54     130    540
Eigen [%]       64.1   77.1   82.0   85.1
Laplacian [%]   66.0   77.5   84.3   90.3
Random [%]      59.2   68.2   80.0   83.3
Down [%]        56.2   67.7   77.0   82.1
Fisher [%]      80.3   85.8   N/A    N/A

Table III. Linear SVM
Dimension       30     54     130    540
Eigen [%]       73.0   84.3   89.0   92.0
Laplacian [%]   73.4   85.8   90.8   95.7
Random [%]      54.1   70.8   81.6   88.8
Down [%]        51.4   73.0   83.4   90.3
Fisher [%]      86.3   93.3   N/A    N/A

Table IV. ℓ1-Minimization
Dimension       30     54     130    540
Eigen [%]       71.1   80.0   85.7   92.0
Laplacian [%]   73.7   84.7   91.0   94.3
Random [%]      57.8   75.5   87.6   94.7
Down [%]        46.8   67.0   84.6   93.9
Fisher [%]      87.0   92.3   N/A    N/A