Multiplicative Updates for Nonnegative Least Squares


1. Multiplicative Updates for Nonnegative Least Squares

Donghui Chen
School of Securities and Futures, Southwestern University of Finance and Economics
November 18, 2013
Joint work with Matt Brand, Mitsubishi Electric Research Laboratories

2. "what really matters is the wisdom he teaches you, ..." – Sofia Pauca

3. Outline

1 Introduction
2 Multiplicative NNLS Iteration
  The Algorithm
  Properties
  Convergence Analysis
  Sparse Solution
  Acceleration
3 Numerical Experiments: Image Labelling
4 Concluding Remarks

4. Objective function

Nonnegative Least Squares:
\[ \operatorname*{argmin}_x F(x) = \operatorname*{argmin}_x \|Ax - b\|_2^2 \quad \text{s.t.} \quad x \ge 0. \tag{1} \]

Because
\[
\begin{aligned}
\|Ax - b\|_2^2 &= (Ax - b)^T (Ax - b) \\
&= x^T (A^T A)\, x - \underbrace{b^T (Ax)}_{\text{scalar}} - \underbrace{(Ax)^T b}_{\text{scalar}} + \underbrace{b^T b}_{\text{constant}} \\
&= x^T (A^T A)\, x - x^T (A^T b) - x^T (A^T b) + b^T b \\
&= x^T (A^T A)\, x - 2\, x^T (A^T b) + b^T b,
\end{aligned}
\]
solving Equation (1) is hence equivalent to solving
\[ \operatorname*{argmin}_x F(x) = \operatorname*{argmin}_x \frac{1}{2} x^T Q x - x^T h \quad \text{s.t.} \quad x \ge 0, \tag{2} \]
with $Q = A^T A$ and $h = A^T b$.
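As a quick numerical sanity check, here is a short NumPy sketch (the random test problem is an illustrative assumption, not from the slides) confirming that the objective in (1) equals twice the objective in (2) plus the constant $b^T b$, so the two problems share the same minimizers over $x \ge 0$:

```python
import numpy as np

# Check: ||Ax - b||^2 = 2 * (0.5 x^T Q x - x^T h) + b^T b.
rng = np.random.default_rng(42)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
x = rng.uniform(size=3)

Q, h = A.T @ A, A.T @ b
F1 = np.linalg.norm(A @ x - b) ** 2   # objective in (1)
F2 = 0.5 * x @ Q @ x - x @ h          # objective in (2)
assert np.isclose(F1, 2 * F2 + b @ b)
```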

5. Outline

1 Introduction
2 Multiplicative NNLS Iteration
  The Algorithm
  Properties
  Convergence Analysis
  Sparse Solution
  Acceleration
3 Numerical Experiments: Image Labelling
4 Concluding Remarks

6. Multiplicative NNLS Iteration

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective function $F(x)$ in Equation (2) is monotonically decreasing under the multiplicative update
\[ x_i^{k+1} = x_i^k \left[ \frac{2(Q^- x^k)_i + h_i^+ + \delta}{(|Q|\, x^k)_i + h_i^- + \delta} \right], \tag{3} \]
with $\delta > 0$, $Q^- = -\min(Q, 0)$, $|Q| = \operatorname{abs}(Q)$, $h^+ = \max(h, 0)$, $h^- = -\min(h, 0)$.

Remark: If $Q$ and $h$ have only nonnegative components and $\delta = 0$, the above iteration reduces to
\[ x_i^{k+1} = x_i^k \left[ \frac{h_i}{(Q x^k)_i} \right], \]
which is called the image space reconstruction algorithm (ISRA). Lee and Seung generalized the ISRA idea to NMF.

M. E. Daube-Witherspoon and G. Muehllehner, IEEE Trans. on Medical Imaging, 1986.
D. D. Lee and H. S. Seung, Nature, 1999.
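A minimal NumPy sketch of update (3) follows; the function name, the strictly positive initialization, and the fixed iteration count are illustrative assumptions rather than anything prescribed by the slides. On small well-conditioned problems the result can be compared against an active-set solver such as scipy.optimize.nnls.

```python
import numpy as np

def nnls_multiplicative(A, b, delta=1e-3, iters=1000):
    """Multiplicative update (3) for min ||Ax - b||^2 s.t. x >= 0 (a sketch)."""
    Q, h = A.T @ A, A.T @ b
    Q_minus = -np.minimum(Q, 0.0)      # Q^- = -min(Q, 0), element-wise
    Q_abs = np.abs(Q)                  # |Q| = abs(Q)
    h_plus = np.maximum(h, 0.0)        # h^+ = max(h, 0)
    h_minus = -np.minimum(h, 0.0)      # h^- = -min(h, 0)
    x = np.ones(A.shape[1])            # strictly positive starting point
    for _ in range(iters):
        x *= (2.0 * Q_minus @ x + h_plus + delta) / (Q_abs @ x + h_minus + delta)
    return x
```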

7. Gradient Descent Property

The multiplicative update (3) is an element-wise iterative gradient descent method. Using $|Q| - 2Q^- = Q$ and $h^+ - h^- = h$,
\[
\begin{aligned}
x_i^{k+1} - x_i^k &= x_i^k \left[ \frac{2(Q^- x^k)_i + h_i^+ + \delta}{(|Q|\, x^k)_i + h_i^- + \delta} \right] - x_i^k \\
&= x_i^k \left[ \frac{2(Q^- x^k)_i + h_i^+ - (|Q|\, x^k)_i - h_i^-}{(|Q|\, x^k)_i + h_i^- + \delta} \right] \\
&= -\frac{x_i^k \left( (Q x^k)_i - h_i \right)}{(|Q|\, x^k)_i + h_i^- + \delta} \\
&= -\gamma_i^k \, \nabla F(x^k)_i,
\end{aligned}
\]
where the step size $\gamma_i^k = \dfrac{x_i^k}{(|Q|\, x^k)_i + h_i^- + \delta}$ and $\nabla F(x) = Qx - h$.
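The identity can be verified numerically; the sketch below uses arbitrary test data (an assumption, not from the slides) and checks that one multiplicative step equals an element-wise gradient step with the step size above.

```python
import numpy as np

# One multiplicative step == x - gamma * grad F(x), element-wise.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
Q, h = A.T @ A, A.T @ b
Q_minus, Q_abs = -np.minimum(Q, 0.0), np.abs(Q)
h_plus, h_minus = np.maximum(h, 0.0), -np.minimum(h, 0.0)
delta = 0.1
x = rng.uniform(0.5, 1.5, size=3)                 # strictly positive point

x_next = x * (2.0 * Q_minus @ x + h_plus + delta) / (Q_abs @ x + h_minus + delta)
gamma = x / (Q_abs @ x + h_minus + delta)         # element-wise step size
grad = Q @ x - h                                  # gradient of F
assert np.allclose(x_next - x, -gamma * grad)
```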

8. What if δ = 0?

Suppose
\[ Q = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \quad h = 0, \]
with initial guess $x^0 = (2/3, 4/3)$. Then
\[ x^1 = (4/3, 2/3), \quad x^2 = (2/3, 4/3), \quad \cdots \]
so the iterates oscillate and never converge. However, the optimal solutions are $x^* = (r, r)$, $r \ge 0$.

[Figure: iterations by (3) with δ = 0]

9. Positive δ

Suppose
\[ Q = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \quad h = 0, \]
with initial guess $x^0 = (2/3, 4/3)$; then $\ldots$, $x^\infty = (1, 1)$.

[Figure: iterations by (3) with δ = 1]
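A small harness (a sketch) for the 2×2 example on this slide and the previous one, running update (3) as reconstructed above; since formula (3) was recovered from a garbled source, the printed iterates may not match the slide's reported values exactly.

```python
import numpy as np

# Run update (3) on the slide's 2x2 example for delta = 0 and delta = 1.
Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
h = np.zeros(2)
Q_minus, Q_abs = -np.minimum(Q, 0.0), np.abs(Q)
h_plus, h_minus = np.maximum(h, 0.0), -np.minimum(h, 0.0)

for delta in (0.0, 1.0):
    x = np.array([2.0 / 3.0, 4.0 / 3.0])   # initial guess from the slide
    for _ in range(20):
        x = x * (2.0 * Q_minus @ x + h_plus + delta) / (Q_abs @ x + h_minus + delta)
    print(f"delta = {delta}: x after 20 iterations = {x}")
```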

10. Convergence Analysis

Definition (Auxiliary Function)
For positive vectors $x$, $y$, an auxiliary function $G(x, y)$ of $F(x)$ has the following two properties:
  • $F(x) < G(x, y)$ if $x \ne y$;
  • $F(x) = G(x, x)$.

11. Convergence Analysis contd.

Lemma
Assume $G(x, y)$ is an auxiliary function of $F(x)$. Then $F(x)$ is strictly decreasing under the update
\[ x^{k+1} = \operatorname*{argmin}_x G(x, x^k) \]
if and only if $x^{k+1} \ne x^k$.

Proof: By the definition of an auxiliary function $G(x, y)$, if $x^{k+1} \ne x^k$ we have
\[ F(x^{k+1}) < G(x^{k+1}, x^k) \le G(x^k, x^k) = F(x^k). \]
Equality is attained if and only if $x^{k+1} = x^k$.

12. Convergence Analysis contd.

Lemma
For any positive vectors $x$, $y$, define the diagonal matrix $D(y)$ with diagonal elements
\[ D_{ii} = \frac{(|Q|\, y)_i + h_i^- + \delta}{y_i}, \quad i = 1, 2, \cdots, n, \]
where $\delta > 0$. The function
\[ G(x, y) = F(y) + (x - y)^T \nabla F(y) + \frac{1}{2} (x - y)^T D(y) (x - y) \]
is an auxiliary function for $F(x) = \frac{1}{2} x^T Q x - x^T h$.
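Since $G(\cdot, y)$ is quadratic with Hessian $D(y)$, its minimizer has the closed form $x = y - D(y)^{-1} \nabla F(y)$, which reproduces update (3). A numerical sanity check (the random test problem is an assumption, not from the slides):

```python
import numpy as np

# argmin_x G(x, y) = y - D(y)^{-1} grad F(y) should equal update (3).
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
b = rng.normal(size=6)
Q, h = A.T @ A, A.T @ b
delta = 0.5
y = rng.uniform(0.2, 2.0, size=4)            # positive vector

h_minus = -np.minimum(h, 0.0)
D = (np.abs(Q) @ y + h_minus + delta) / y    # diagonal of D(y)
x_from_G = y - (Q @ y - h) / D               # closed-form minimizer of G(., y)

num = 2.0 * (-np.minimum(Q, 0.0)) @ y + np.maximum(h, 0.0) + delta
den = np.abs(Q) @ y + h_minus + delta
x_from_update = y * num / den                # multiplicative update (3)

assert np.allclose(x_from_G, x_from_update)
```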

13. Review

Theorem (Multiplicative NNLS Iteration)
The nonnegative least squares objective function
\[ \operatorname*{argmin}_x F(x) = \operatorname*{argmin}_x \frac{1}{2} x^T Q x - x^T h \quad \text{s.t.} \quad x \ge 0 \]
is monotonically decreasing under the multiplicative update
\[ x_i^{k+1} = x_i^k \left[ \frac{2(Q^- x^k)_i + h_i^+ + \delta}{(|Q|\, x^k)_i + h_i^- + \delta} \right], \]
with $\delta > 0$, $Q^- = -\min(Q, 0)$, $|Q| = \operatorname{abs}(Q)$, $h^+ = \max(h, 0)$, $h^- = -\min(h, 0)$.

14. Review contd.

Suppose
\[ Q = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \quad h = 0, \]
with initial guess $x^0 = (2/3, 4/3)$; then $\ldots$, $x^\infty = (1, 1)$.

[Figure: iterations by (3) with δ = 1]

15. Sparse Solution?

If a sparse solution is expected, it is recommended to add a regularization term to the original least squares problem:
\[ \operatorname*{argmin}_x \hat{F}(x) = \operatorname*{argmin}_x \|Ax - b\|_2^2 + \lambda \|x\|_1, \quad x \ge 0, \; \lambda > 0, \tag{4} \]
with nonnegative $\lambda$ as the regularization parameter.

Theorem
The objective function $\hat{F}(x)$ in (4) is monotonically decreasing under the multiplicative update
\[ x_i^{k+1} = x_i^k \left[ \frac{2(Q^- x^k)_i + h_i^+}{(|Q|\, x^k)_i + h_i^- + \lambda} \right], \tag{5} \]
with $\lambda > 0$.
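A sketch of update (5), mirroring the earlier solver; note that $\lambda$ takes the place of $\delta$ in the denominator only, so the ratio stays below one for small components and shrinks them toward exact zero. The function name and defaults are illustrative assumptions.

```python
import numpy as np

def nnls_multiplicative_sparse(A, b, lam=0.1, iters=1000):
    """Sparse multiplicative update (5) for (4): a minimal sketch."""
    Q, h = A.T @ A, A.T @ b
    Q_minus, Q_abs = -np.minimum(Q, 0.0), np.abs(Q)
    h_plus, h_minus = np.maximum(h, 0.0), -np.minimum(h, 0.0)
    x = np.ones(A.shape[1])                  # strictly positive start
    for _ in range(iters):
        x *= (2.0 * Q_minus @ x + h_plus) / (Q_abs @ x + h_minus + lam)
    return x
```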

16. Sparse Solution contd.

Suppose
\[ Q = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \quad h = 0, \]
with initial guess $x^0 = (2/3, 4/3)$; then $\ldots$, $x^\infty = (0, 0)$.

[Figure: iterations by (5) with λ = 2]

17. Outline

1 Introduction
2 Multiplicative NNLS Iteration
  The Algorithm
  Properties
  Convergence Analysis
  Sparse Solution
  Acceleration
3 Numerical Experiments: Image Labelling
4 Concluding Remarks

18. Image Labelling

\[ f(x) := \sum_{a=1}^{K} \sum_i \left[ \frac{\eta}{2} \sum_{j \in \mathcal{N}(i)} \omega_{ij} (x_{ia} - x_{ja})^2 + d_{ia}\, x_{ia} \right] \]
with constraints
\[ \sum_{a=1}^{K} x_{ia} = 1 \quad \forall i, \qquad x_{ia} \ge 0, \]
where
  • $x_{ia}$ is the probability that pixel $i$ belongs to labelling set $a$
  • $K$ is the number of labelling sets
  • $\omega_{ij}$ is the weight between adjacent pixels $i$ and $j$, $\omega_{ij} := \frac{I_i^T I_j}{|I_i| \cdot |I_j|} = \cos(\theta)$, where $I_\cdot$ is the image value
  • $\mathcal{N}(i)$ represents the neighbours of pixel $i$
  • $\eta$ is a parameter controlling the spatial smoothness
  • $d_{ia}$ is the cost of label $a$ at each pixel

M. Rivera, O. Dalmau, and J. Tago, in ICPR, pp. 1-5, 2008.
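A small sketch of how the $\omega_{ij}$ weights could be computed: cosine similarity between the values of adjacent pixels. The (H, W, C) image layout, the 4-neighbour structure, and the function name are assumptions for illustration, not prescribed by the slide.

```python
import numpy as np

def cosine_weights(img):
    """Cosine weights omega_ij for right/down neighbour pairs (a sketch)."""
    H, W, _ = img.shape
    flat = img.reshape(H * W, -1).astype(float)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    weights = {}
    for i in range(H * W):
        r, c = divmod(i, W)
        for dr, dc in ((0, 1), (1, 0)):          # right and down neighbours
            rr, cc = r + dr, c + dc
            if rr < H and cc < W:
                j = rr * W + cc
                weights[(i, j)] = float(unit[i] @ unit[j])   # cos(theta)
    return weights
```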
