Coordinate Descent for Mixed-Norm NMF

Vamsi K. Potluru, Dept. of Computer Science, UNM, and Mitsubishi Electric Research Labs, Cambridge, MA
December 2013
Joint work with Jonathan Le Roux, Barak A. Pearlmutter, John R. Hershey, and Matthew E. Brand
Contents
Nonnegative Matrix Factorization

Factor a nonnegative matrix as follows:

    X ≈ W H,  where X is m × n, W is m × r, and H is r × n.

Applications: collaborative filtering, hyperspectral image analysis, and music transcription, among others.

Prior information: the problem is under-determined. Additional requirements are imposed by the problem domain:
- Sparsity
- Orthogonality
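For context, the basic (unconstrained) NMF objective above can be minimized with the classic multiplicative updates of Lee and Seung. This is a minimal sketch of that baseline, not the coordinate-descent scheme presented in this talk; the function name and defaults are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative X (m x n) as W @ H with W (m x r), H (r x n),
    using Lee-Seung multiplicative updates on the Frobenius objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Each update keeps factors nonnegative and never increases the objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates are element-wise multiplications by nonnegative ratios, nonnegativity of W and H is preserved automatically.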
Sparsity measures

The L0 norm corresponds to our intuitive notion of sparsity.

Axioms (Hurley and Rickard 2009):
- Robin Hood — stealing from the rich decreases sparsity.
- Scaling — sparsity is scale-invariant.
- Rising tide — adding a constant decreases sparsity.
- Cloning — sparsity is invariant under cloning.
- Bill Gates — one very wealthy individual increases sparsity.
- Babies — newborns (added zeros) increase sparsity.

Hoyer's sparsity measure:

    sp(x) = (√d − ‖x‖₁ / ‖x‖₂) / (√d − 1),  where d = dim(x).

Observe that sp(·) lies between 0 and 1; higher values correspond to sparser vectors.
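Hoyer's measure is straightforward to compute directly from the formula above; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer's sparsity measure: 1 for a 1-sparse vector, 0 for a constant vector."""
    x = np.asarray(x, dtype=float)
    d = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x * x).sum())
    return (np.sqrt(d) - l1 / l2) / (np.sqrt(d) - 1)
```

For example, a one-hot vector attains the maximum value 1, and a constant vector attains the minimum value 0, since ‖x‖₁/‖x‖₂ equals 1 and √d in those two cases respectively.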
Sparse NMF

Sparse NMF formulation (Hoyer 2004; Heiler and Schnorr 2006):

    min_{W,H} f(W, H) = ½ ‖X − WH‖²_F                      (1)
    s.t. W ≥ 0, H ≥ 0,
         ‖Wᵢ‖₂ = 1, ‖Wᵢ‖₁ = α  ∀ i ∈ {1, …, r}

Figure: 25 features each. Sparsity of 0.5 (left), 0.6 (middle), and 0.75 (right).
Group Sparse NMF

Our sparse NMF formulation (includes Mørup et al., 2008):

    min_{W,H} f(W, H) = ½ ‖X − WH‖²_F
    s.t. W ≥ 0, H ≥ 0,
         ‖Wᵢ‖₂ = 1  ∀ i ∈ {1, …, r},
         Σ_{i ∈ I_g} ‖Wᵢ‖₁ = α_g  ∀ g ∈ {1, …, G}

User-friendly sparsity formulation (implicit version: Kim et al., 2012). Optimizing one column at a time reduces to:

    max_{y ≥ 0} bᵀy  s.t.  1ᵀy = k,  ‖y‖₂ = 1,

where dim(b) = m. Sparsity does not mix!
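The column subproblem above admits a simple characterization: stationarity gives y ∝ (b − λ1)₊ for some threshold λ, and the resulting ratio ‖(b − λ1)₊‖₁ / ‖(b − λ1)₊‖₂ decreases monotonically in λ, so λ can be found by bisection. This is a generic numerical sketch under those assumptions, not necessarily the exact routine used in the paper; the function name is illustrative. It requires 1 ≤ k ≤ √m for feasibility.

```python
import numpy as np

def max_linear_on_sphere_simplex(b, k, tol=1e-12, iters=200):
    """Solve max b^T y  s.t.  y >= 0, 1^T y = k, ||y||_2 = 1.
    KKT conditions imply y = (b - lam)_+ / ||(b - lam)_+||_2;
    1^T y is monotone decreasing in lam, so bisect on lam."""
    b = np.asarray(b, dtype=float)

    def y_of(lam):
        z = np.maximum(b - lam, 0.0)
        nrm = np.linalg.norm(z)
        return z / nrm if nrm > 0 else z

    lo, hi = b.min() - 1.0, b.max() - tol
    # Expand the lower bracket until 1^T y(lo) >= k (1^T y -> sqrt(m) as lam -> -inf).
    while y_of(lo).sum() < k:
        lo = 2 * lo - hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if y_of(mid).sum() > k:
            lo = mid
        else:
            hi = mid
    return y_of(0.5 * (lo + hi))
```

Larger k corresponds to a denser solution (lower Hoyer sparsity), and k near 1 drives the solution toward a one-hot vector at the largest entry of b.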
Update Schemes for W

[Figure: schematic of update schemes for the columns of W, contrasting a single shared sparsity level s₁ ("Sparsity does not mix") with mixed per-group sparsity levels s₁ and s₂ ("This paper").]
Results on the ORL Faces Dataset

Optimizing two columns at a time:

    max_{y ≥ 0} bᵀy  s.t.  1ᵀy = k,  ‖y₁‖₂ = 1,  ‖y₂‖₂ = 1,

where y = [y₁ᵀ, y₂ᵀ]ᵀ, b = [b₁ᵀ, b₂ᵀ]ᵀ, dim(b₁) = m₁, and dim(b₂) = m₂.

Figure: 25 features each. Sparsity of 0.4 (left), 0.6 (middle), and {0.2, 0.5, 0.8} (right).