Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
Tao Wu, Institute for Mathematics and Scientific Computing, Karl-Franzens-University of Graz
Joint work with Prof. Michael Hintermüller


  1. Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
     Tao Wu
     Institute for Mathematics and Scientific Computing, Karl-Franzens-University of Graz
     Joint work with Prof. Michael Hintermüller

  2. Low-rank paradigm. Low-rank matrices arise in one way or another:
     ◮ low-degree statistical processes, e.g. collaborative filtering, latent semantic indexing.
     ◮ regularization on complex objects, e.g. manifold learning, metric learning.
     ◮ approximation of compact operators, e.g. proper orthogonal decomposition.
     Fig.: Collaborative filtering (courtesy of wikipedia.org).

  3. Robust principal component pursuit.
     ◮ The sparse component corresponds to pattern-irrelevant outliers.
     ◮ Robustifies classical principal component analysis.
     ◮ Carries important information in certain applications, e.g. moving objects in surveillance video.
     ◮ Robust principal component pursuit: Z = A + B + N, with data Z, low-rank A, sparse B, and noise N.
     ◮ Introduced in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11].
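
An illustrative aside (not from the slides): a NumPy sketch that generates synthetic data of the form Z = A + B + N, with a rank-r component A and an s-sparse component B. All sizes and magnitudes below are assumptions made for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r, s = 200, 200, 5, 400                   # assumed demo sizes: rank r, sparsity s

    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # low-rank component
    B = np.zeros((m, n))                            # sparse component (outliers)
    idx = rng.choice(m * n, size=s, replace=False)
    B.flat[idx] = 10.0 * rng.standard_normal(s)
    N = 1e-3 * rng.standard_normal((m, n))          # small dense noise
    Z = A + B + N                                   # observed data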

  4. Convex-relaxation approach.
     ◮ A popular (convex) variational model:
         min ‖A‖_nuclear + λ‖B‖_ℓ1   s.t. ‖A + B − Z‖ ≤ ε.
     ◮ Considered in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11], ...
     ◮ rank(A) relaxed by the nuclear norm; ‖B‖_0 relaxed by the ℓ1 norm.
     ◮ Numerical solvers: proximal gradient method, augmented Lagrangian method, ...
       ⇒ Efficiency is constrained by an SVD in full dimension at each iteration.
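
For intuition (not part of the slides), here is a NumPy sketch of the two proximal operators such convex solvers repeatedly apply: singular-value thresholding for the nuclear norm and entrywise soft-thresholding for the ℓ1 term. Note the full-dimension SVD, which is the bottleneck mentioned above; the threshold tau is a generic parameter.

    import numpy as np

    def svt(X, tau):
        # prox of tau * nuclear norm: full SVD, then soft-threshold the singular values
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)   # full-dimension SVD each call
        return (U * np.maximum(sig - tau, 0.0)) @ Vt

    def soft_threshold(X, tau):
        # prox of tau * l1 norm: entrywise shrinkage
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)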

  5. Manifold constrained least-squares model.
     ◮ Our variational model:
         min (1/2)‖A + B − Z‖²
         s.t. A ∈ M(r) := {A ∈ R^{m×n} : rank(A) ≤ r},
              B ∈ N(s) := {B ∈ R^{m×n} : ‖B‖_0 ≤ s}.
     ◮ Our goal is to develop an algorithm that:
       ◮ globally converges to a stationary point (often a local minimizer);
       ◮ provides exact decomposition with high probability for noiseless data;
       ◮ outperforms solvers based on the convex-relaxation approach, especially at large scale.

  6. Existence of solution and optimality condition.
     ◮ A little quadratic regularization (0 < µ ≪ 1) is included for the (theoretical) sake of existence of a solution; i.e.
         min f(A, B) := (1/2)‖A + B − Z‖² + (µ/2)‖A‖²   s.t. (A, B) ∈ M(r) × N(s).
       In numerics, choosing µ = 0 seems fine.
     ◮ Stationarity condition as variational inequalities:
         ⟨∆, (1 + µ)A* + B* − Z⟩ ≥ 0   for any ∆ ∈ T_M(r)(A*),
         ⟨∆, A* + B* − Z⟩ ≥ 0   for any ∆ ∈ T_N(s)(B*).
       T_M(r)(A*) and T_N(s)(B*) refer to tangent cones.

  7. Constraints of Riemannian manifolds.
     ◮ M(r) is a Riemannian manifold around A* if rank(A*) = r; N(s) is a Riemannian manifold around B* if ‖B*‖_0 = s.
     ◮ The optimality condition then reduces to:
         P_{T_M(r)(A*)}((1 + µ)A* + B* − Z) = 0,
         P_{T_N(s)(B*)}(A* + B* − Z) = 0.
       P_{T_M(r)(A*)} and P_{T_N(s)(B*)} are orthogonal projections onto subspaces.
     ◮ Tangent space formulae:
         T_M(r)(A*) = {U M V⊤ + U_p V⊤ + U V_p⊤ : A* = U Σ V⊤ as compact SVD, M ∈ R^{r×r}, U_p ∈ R^{m×r}, U_p⊤ U = 0, V_p ∈ R^{n×r}, V_p⊤ V = 0},
         T_N(s)(B*) = {∆ ∈ R^{m×n} : supp(∆) ⊂ supp(B*)}.
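
A small sketch (NumPy assumed; helper names are mine) of the two tangent-space projections implied by these formulae. For the fixed-rank manifold, the standard form P_T(X) = UU⊤X + XVV⊤ − UU⊤XVV⊤ is used; for the sparsity constraint, the projection simply restricts X to the support of B*.

    import numpy as np

    def proj_tangent_rank(X, U, V):
        # projection onto T_M(r)(A) at A = U Sigma V^T (U, V have orthonormal columns)
        UtX = U.T @ X
        XV = X @ V
        return U @ UtX + XV @ V.T - U @ (UtX @ V) @ V.T

    def proj_tangent_sparse(X, B):
        # projection onto T_N(s)(B): keep entries on supp(B), zero out the rest
        return np.where(B != 0, X, 0.0)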

  8. A conceptual alternating minimization scheme.
     Initialize A^0 ∈ M(r), B^0 ∈ N(s). Set k := 0 and iterate:
     1. A^{k+1} ≈ argmin_{A ∈ M(r)} (1/2)‖A + B^k − Z‖² + (µ/2)‖A‖².
     2. B^{k+1} ≈ argmin_{B ∈ N(s)} (1/2)‖A^{k+1} + B − Z‖².
     Theorem (sufficient decrease + stationarity ⇒ convergence).
     Let {(A^k, B^k)} be generated as above. Suppose that there exist δ > 0, ε_a^k ↓ 0, and ε_b^k ↓ 0 such that for all k:
         f(A^{k+1}, B^k) ≤ f(A^k, B^k) − δ‖A^{k+1} − A^k‖²,
         f(A^{k+1}, B^{k+1}) ≤ f(A^{k+1}, B^k) − δ‖B^{k+1} − B^k‖²,
         ⟨∆, (1 + µ)A^{k+1} + B^k − Z⟩ ≥ −ε_a^k ‖∆‖   for any ∆ ∈ T_M(r)(A^{k+1}),
         ⟨∆, A^{k+1} + B^{k+1} − Z⟩ ≥ −ε_b^k ‖∆‖   for any ∆ ∈ T_N(s)(B^{k+1}).
     Then any non-degenerate limit point (A*, B*), i.e. with rank(A*) = r and ‖B*‖_0 = s, satisfies the first-order optimality condition.
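
A self-contained sketch (not the authors' implementation) of this conceptual scheme with the simplest choice of subproblem solvers: the A-step uses the metric projection onto M(r) via a truncated SVD (the Eckart-Young solution discussed on slide 10) and the B-step the sorting-based projection onto N(s) (slide 9). The Riemannian machinery of the later slides is what replaces the expensive full SVD in practice.

    import numpy as np

    def proj_rank_r(X, r):
        # metric projection onto M(r): keep the r dominant singular triplets (full SVD here)
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)
        return (U[:, :r] * sig[:r]) @ Vt[:r, :]

    def proj_sparse_s(X, s):
        # metric projection onto N(s): keep the s entries of largest magnitude
        B = np.zeros_like(X)
        idx = np.argpartition(np.abs(X), -s, axis=None)[-s:]
        B.flat[idx] = X.flat[idx]
        return B

    def alternating_rpcp(Z, r, s, mu=0.0, max_iter=100, tol=1e-8):
        A, B = np.zeros_like(Z), np.zeros_like(Z)
        for _ in range(max_iter):
            A_new = proj_rank_r((Z - B) / (1.0 + mu), r)   # low-rank subproblem (global solution)
            B_new = proj_sparse_s(Z - A_new, s)            # sparse subproblem (global solution)
            done = np.linalg.norm(A_new - A) + np.linalg.norm(B_new - B) < tol
            A, B = A_new, B_new
            if done:
                break
        return A, B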

  9. Sparse matrix subproblem.
     ◮ The global solution P_N(s)(Z − A^{k+1}) (as metric projection) can be efficiently calculated by “sorting”.
     ◮ The global solution may not necessarily fulfill the sufficient decrease condition.
     ◮ Whenever necessary, safeguard by a local solution:
         B^{k+1}_{ij} = (Z − A^{k+1})_{ij}  if B^k_{ij} ≠ 0,   B^{k+1}_{ij} = 0  otherwise.
     ◮ Given non-degeneracy of B^{k+1}, i.e. ‖B^{k+1}‖_0 = s, exact stationarity holds.
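
A sketch (NumPy; function names are mine) of the two updates on this slide: the metric projection that keeps the s largest-magnitude entries of Z − A^{k+1}, obtained by partial sorting, and the support-restricted safeguard used when the global solution violates the sufficient decrease condition.

    import numpy as np

    def proj_sparse_s(X, s):
        # global solution: keep the s entries of X of largest magnitude (via argpartition)
        B = np.zeros_like(X)
        idx = np.argpartition(np.abs(X), -s, axis=None)[-s:]
        B.flat[idx] = X.flat[idx]
        return B

    def sparse_safeguard(X, B_old):
        # local solution: copy X only on the current support of B_old, zero elsewhere
        return np.where(B_old != 0, X, 0.0)

    # usage sketch: R = Z - A_next; B_next = proj_sparse_s(R, s),
    # falling back to sparse_safeguard(R, B_k) if the decrease test fails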

  10. Low-rank matrix subproblem: a Riemannian perspective.
     ◮ Global solution P_M(r)((Z − B^k)/(1 + µ)) as metric projection:
       ◮ available due to the Eckart-Young theorem; i.e. if (Z − B^k)/(1 + µ) = Σ_{j=1}^{n} σ_j u_j v_j⊤, then P_M(r)((Z − B^k)/(1 + µ)) = Σ_{j=1}^{r} σ_j u_j v_j⊤;
       ◮ but requires an SVD in full dimension ⇒ expensive for large-scale problems (e.g. m, n ≥ 2000).
     ◮ Alternatively resolved by a single Riemannian optimization step on the matrix manifold.
     ◮ Riemannian optimization applied to low-rank matrix/tensor problems; see [Simonsson and Eldén, '10], [Savas and Lim, '10], [Vandereycken, '13], ...
     ◮ Our goal: the subproblem solver should activate the convergence criteria, i.e. sufficient decrease + stationarity.

  11. Riemannian optimization: an overview.
     Fig.: minimizing a smooth objective f : M → R over a manifold M.
     ◮ References: [Smith, '93], [Edelman, Arias, and Smith, '98], [Absil, Mahony, and Sepulchre, '08], ...
     ◮ Why Riemannian optimization?
       ◮ Local homeomorphism is computationally infeasible/expensive.
       ◮ Intrinsically low dimensionality of the underlying manifold.
       ◮ Further dimension reduction via quotient manifolds.
     ◮ Typical Riemannian manifolds in applications:
       ◮ finite-dimensional (matrix manifolds): Stiefel manifold, Grassmann manifold, fixed-rank matrix manifold, ...
       ◮ infinite-dimensional: shape/curve spaces, ...

  12. Riemannian optimization: a conceptual algorithm.
     Fig.: a tangential step ∆^k in T_M̄(r)(A^k) is mapped back to the manifold by the retraction retract_M̄(r)(A^k, ∆^k).
     At the current iterate:
     1. Build a quadratic model in the tangent space using the Riemannian gradient and Riemannian Hessian.
     2. Based on the quadratic model, build a tangential search path.
     3. Perform a backtracking path search via retraction to determine the step size.
     4. Generate the next iterate.

  13. Riemannian gradient and Hessian.
     ◮ f_A^k : A ∈ M̄(r) := {A : rank(A) = r} ↦ f(A, B^k).
     ◮ The Riemannian gradient, grad f_A^k(A) ∈ T_M̄(r)(A), is defined s.t.
         ⟨grad f_A^k(A), ∆⟩ = Df_A^k(A)[∆],   ∀ ∆ ∈ T_M̄(r)(A).
         grad f_A^k(A) = P_{T_M̄(r)(A)}(∇f_A^k(A)).
     ◮ The Riemannian Hessian, Hess f_A^k(A) : T_M̄(r)(A) → T_M̄(r)(A), is defined s.t.
         Hess f_A^k(A)[∆] = ∇_∆ grad f_A^k(A),   ∀ ∆ ∈ T_M̄(r)(A).
         Hess f_A^k(A)[∆] = (I − UU⊤) ∇f_A^k(A) (I − VV⊤) ∆⊤ U Σ⁻¹ V⊤
                            + U Σ⁻¹ V⊤ ∆⊤ (I − UU⊤) ∇f_A^k(A) (I − VV⊤) + (1 + µ)∆.
     See, e.g., [Vandereycken, '12].
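
A minimal sketch of the Riemannian gradient formula above for the subproblem objective f_A^k(A) = (1/2)‖A + B^k − Z‖² + (µ/2)‖A‖²: form the Euclidean gradient (1 + µ)A + B^k − Z and project it onto the tangent space (NumPy; the factored input (U, sig, Vt) is an assumption of this sketch).

    import numpy as np

    def riemannian_gradient(U, sig, Vt, B, Z, mu=0.0):
        # A = U @ diag(sig) @ Vt is the compact SVD of the current rank-r iterate
        A = (U * sig) @ Vt
        G = (1.0 + mu) * A + B - Z              # Euclidean gradient of f_A^k at A
        # Riemannian gradient = projection of G onto the tangent space at A
        UtG = U.T @ G
        GV = G @ Vt.T
        return U @ UtG + GV @ Vt - U @ (UtG @ Vt.T) @ Vt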

  14. Dogleg search path and projective retraction.
     Fig. (left): trust region in the tangent space T_M̄(r)(A^k), showing the optimal trajectory ∆(σ), the full step, the unconstrained minimizer along −g, and the dogleg path. Fig. (right): the retraction mapping A^k + ∆^k back onto M̄(r).
     ◮ “Dogleg” path ∆^k(τ^k) as an approximation of the optimal trajectory of the tangential trust-region subproblem (left figure):
         min f_A^k(A^k) + ⟨g^k, ∆⟩ + (1/2)⟨∆, H^k[∆]⟩   s.t. ∆ ∈ T_M̄(r)(A^k), ‖∆‖ ≤ σ.
     ◮ Metric projection as retraction (right figure):
         retract_M̄(r)(A^k, ∆^k(τ^k)) = P_M̄(r)(A^k + ∆^k(τ^k)).
       Computationally efficient: a “reduced” SVD of a 2r-by-2r matrix!
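
A sketch of why the projective retraction only needs a small SVD, assuming the tangent vector is stored in the factored form ∆ = U M V⊤ + U_p V⊤ + U V_p⊤ of slide 7 (NumPy; the QR-based reduction is standard, but the code is illustrative rather than the authors' implementation).

    import numpy as np

    def retract(U, sig, Vt, M, Up, Vp):
        # A = U @ diag(sig) @ Vt (compact SVD); Delta = U M Vt + Up Vt + U Vp^T
        r = sig.size
        Qu, Ru = np.linalg.qr(Up)                  # Up = Qu Ru, with Qu^T U = 0
        Qv, Rv = np.linalg.qr(Vp)                  # Vp = Qv Rv, with Qv^T V = 0
        # A + Delta = [U Qu] K [V Qv]^T with a 2r-by-2r core K
        K = np.block([[np.diag(sig) + M, Rv.T],
                      [Ru,               np.zeros((r, r))]])
        Uk, Sk, Vkt = np.linalg.svd(K)             # "reduced" SVD: only 2r-by-2r
        U_new = np.hstack([U, Qu]) @ Uk[:, :r]     # keep the r dominant triplets
        Vt_new = Vkt[:r, :] @ np.vstack([Vt, Qv.T])
        return U_new, Sk[:r], Vt_new               # = P_Mbar(r)(A + Delta) in factored form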

  15. Low-rank matrix subproblem: projected dogleg step.
     Given A^k ∈ M̄(r), B^k ∈ N(s):
     1. Compute g^k, H^k, and build the dogleg search path ∆^k(τ^k) in T_M̄(r)(A^k).
     2. Whenever non-positive definiteness of H^k is detected, replace the dogleg search path by the line-search path along the steepest-descent direction, i.e. ∆^k(τ^k) = −τ^k g^k.
     3. Perform a backtracking path/line search, i.e. find the largest step size τ^k ∈ {2, 3/2, 1, 1/2, 1/4, 1/8, ...} s.t. the sufficient decrease condition is satisfied:
         f_A^k(A^k) − f_A^k(P_M̄(r)(A^k + ∆^k(τ^k))) ≥ δ ‖A^k − P_M̄(r)(A^k + ∆^k(τ^k))‖².
     4. Return A^{k+1} = P_M̄(r)(A^k + ∆^k(τ^k)).
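
A simplified sketch of steps 2-4 only (without the dogleg path of step 1): a steepest-descent path with the backtracking rule above, using a plain truncated-SVD projection as the retraction. The step-size grid and the form of the decrease test follow the slide; using the Euclidean gradient as the search direction and the dense projection are simplifications of mine.

    import numpy as np

    def proj_rank_r(X, r):
        # metric projection onto rank-<=r matrices (Eckart-Young; full SVD for simplicity)
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)
        return (U[:, :r] * sig[:r]) @ Vt[:r, :]

    def projected_descent_step(A, B, Z, r, mu=0.0, delta=1e-4):
        f = lambda X: 0.5 * np.linalg.norm(X + B - Z)**2 + 0.5 * mu * np.linalg.norm(X)**2
        G = (1.0 + mu) * A + B - Z                    # search direction is -G (steepest descent)
        for tau in (2.0, 1.5, 1.0, 0.5, 0.25, 0.125, 0.0625):
            A_trial = proj_rank_r(A - tau * G, r)     # retract A + Delta(tau) back onto M(r)
            if f(A) - f(A_trial) >= delta * np.linalg.norm(A - A_trial)**2:
                return A_trial                        # sufficient decrease achieved
        return A                                      # no acceptable step in the grid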
