Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
Tao Wu
Institute for Mathematics and Scientific Computing, Karl-Franzens-University of Graz
joint work with Prof. Michael Hintermüller
tao.wu@uni-graz.at
Low-rank paradigm.
Low-rank matrices arise in one way or another:
◮ low-degree statistical processes, e.g. collaborative filtering, latent semantic indexing;
◮ regularization on complex objects, e.g. manifold learning, metric learning;
◮ approximation of compact operators, e.g. proper orthogonal decomposition.
Fig.: Collaborative filtering (courtesy of wikipedia.org).
Robust principal component pursuit.
◮ The sparse component corresponds to pattern-irrelevant outliers.
◮ It robustifies classical principal component analysis.
◮ It carries important information in certain applications, e.g. moving objects in surveillance video.
◮ Robust principal component pursuit decomposes the data into low-rank + sparse + noise:
    Z = A + B + N.
◮ Introduced in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11].
Convex-relaxation approach.
◮ A popular (convex) variational model:
    min ‖A‖_nuclear + λ‖B‖_ℓ1   s.t.   ‖A + B − Z‖ ≤ ε.
◮ Considered in [Candès, Li, Ma, and Wright, '11], [Chandrasekaran, Sanghavi, Parrilo, and Willsky, '11], ...
◮ rank(A) is relaxed by the nuclear norm; ‖B‖_0 is relaxed by the ℓ1-norm.
◮ Numerical solvers: proximal gradient method, augmented Lagrangian method, ...
  ⇒ Efficiency is constrained by an SVD in full dimension at each iteration.
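To make the "SVD in full dimension at each iteration" concrete, here is a minimal Python sketch of one proximal-gradient iteration for a penalized version of the convex model. This is not the authors' solver; the step size `step` and weight `lam` are illustrative names.

import numpy as np

def svt(X, tau):
    # Singular value thresholding: prox of tau*||.||_nuclear.
    # Requires a full-dimension SVD -- the bottleneck noted on this slide.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    # Entrywise soft thresholding: prox of tau*||.||_1.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def prox_grad_step(A, B, Z, lam, step):
    # One forward-backward step on 0.5*||A+B-Z||^2 + ||A||_nuclear + lam*||B||_1.
    R = A + B - Z                      # gradient of the smooth coupling term
    A_new = svt(A - step * R, step)    # nuclear-norm prox
    B_new = soft(B - step * R, step * lam)
    return A_new, B_new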
Manifold constrained least-squares model.
◮ Our variational model:
    min (1/2)‖A + B − Z‖²   s.t.   A ∈ M(r) := {A ∈ R^{m×n} : rank(A) ≤ r},
                                   B ∈ N(s) := {B ∈ R^{m×n} : ‖B‖_0 ≤ s}.
◮ Our goal is to develop an algorithm that:
  ◮ converges globally to a stationary point (often a local minimizer);
  ◮ provides an exact decomposition with high probability for noiseless data;
  ◮ outperforms solvers based on the convex-relaxation approach, especially at large scale.
Existence of solution and optimality condition.
◮ A little quadratic regularization (0 < µ ≪ 1) is included for the (theoretical) sake of existence of a solution, i.e.
    min f(A, B) := (1/2)‖A + B − Z‖² + (µ/2)‖A‖²   s.t.   (A, B) ∈ M(r) × N(s).
  In numerics, choosing µ = 0 seems fine.
◮ Stationarity condition as variational inequalities:
    ⟨∆, (1 + µ)A* + B* − Z⟩ ≥ 0   for any ∆ ∈ T_{M(r)}(A*),
    ⟨∆, A* + B* − Z⟩ ≥ 0          for any ∆ ∈ T_{N(s)}(B*).
  Here T_{M(r)}(A*) and T_{N(s)}(B*) denote tangent cones.
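For reference, a tiny sketch of the regularized objective f and feasibility checks for the two constraint sets; the function names are illustrative only.

import numpy as np

def f(A, B, Z, mu=0.0):
    # f(A, B) = 0.5*||A + B - Z||^2 + 0.5*mu*||A||^2 (Frobenius norms).
    return 0.5 * np.linalg.norm(A + B - Z, 'fro') ** 2 \
         + 0.5 * mu * np.linalg.norm(A, 'fro') ** 2

def in_M(A, r):
    return np.linalg.matrix_rank(A) <= r      # A in M(r): rank(A) <= r

def in_N(B, s):
    return np.count_nonzero(B) <= s           # B in N(s): ||B||_0 <= s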
Constraints as Riemannian manifolds.
◮ M(r) is a Riemannian manifold around A* if rank(A*) = r; N(s) is a Riemannian manifold around B* if ‖B*‖_0 = s.
◮ The optimality condition then reduces to
    P_{T_{M(r)}(A*)}((1 + µ)A* + B* − Z) = 0,
    P_{T_{N(s)}(B*)}(A* + B* − Z) = 0,
  where P_{T_{M(r)}(A*)} and P_{T_{N(s)}(B*)} are orthogonal projections onto subspaces.
◮ Tangent space formulae:
    T_{M(r)}(A*) = {U M V^⊤ + U_p V^⊤ + U V_p^⊤ : A* = U Σ V^⊤ (compact SVD), M ∈ R^{r×r},
                    U_p ∈ R^{m×r}, U_p^⊤ U = 0, V_p ∈ R^{n×r}, V_p^⊤ V = 0},
    T_{N(s)}(B*) = {∆ ∈ R^{m×n} : supp(∆) ⊂ supp(B*)}.
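The two tangent-space projections can be sketched directly from these formulae. Below, U and V are the compact-SVD factors of A* and `support` is the Boolean support mask of B*; this is a minimal sketch, not the authors' implementation.

import numpy as np

def proj_tangent_lowrank(X, U, V):
    # P_{T_M(r)(A*)}(X) = Pu X + X Pv - Pu X Pv, with Pu = U U^T, Pv = V V^T.
    PuX = U @ (U.T @ X)
    XPv = (X @ V) @ V.T
    return PuX + XPv - U @ ((U.T @ X) @ V) @ V.T

def proj_tangent_sparse(X, support):
    # P_{T_N(s)(B*)}(X): keep the entries on supp(B*), zero elsewhere.
    return np.where(support, X, 0.0)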
A conceptual alternating minimization scheme.
Initialize A⁰ ∈ M(r), B⁰ ∈ N(s). Set k := 0 and iterate:
1. A^{k+1} ≈ argmin_{A ∈ M(r)} (1/2)‖A + B^k − Z‖² + (µ/2)‖A‖².
2. B^{k+1} ≈ argmin_{B ∈ N(s)} (1/2)‖A^{k+1} + B − Z‖².

Theorem (sufficient decrease + stationarity ⇒ convergence).
Let {(A^k, B^k)} be generated as above. Suppose that there exist δ > 0, ε_a^k ↓ 0, and ε_b^k ↓ 0 such that for all k:
    f(A^{k+1}, B^k) ≤ f(A^k, B^k) − δ‖A^{k+1} − A^k‖²,
    f(A^{k+1}, B^{k+1}) ≤ f(A^{k+1}, B^k) − δ‖B^{k+1} − B^k‖²,
    ⟨∆, (1 + µ)A^{k+1} + B^k − Z⟩ ≥ −ε_a^k‖∆‖   for any ∆ ∈ T_{M(r)}(A^{k+1}),
    ⟨∆, A^{k+1} + B^{k+1} − Z⟩ ≥ −ε_b^k‖∆‖      for any ∆ ∈ T_{N(s)}(B^{k+1}).
Then any non-degenerate limit point (A*, B*), i.e. one with rank(A*) = r and ‖B*‖_0 = s, satisfies the first-order optimality condition.
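A minimal sketch of the outer loop, with the two subproblem solvers passed in as callables; the names solve_lowrank, solve_sparse and the stopping rule (maxit, tol) are illustrative, not part of the slides.

import numpy as np

def alternating_rpcp(Z, solve_lowrank, solve_sparse, r, s, mu=0.0,
                     maxit=200, tol=1e-8):
    A = np.zeros_like(Z)                         # A^0 in M(r)
    B = np.zeros_like(Z)                         # B^0 in N(s)
    for k in range(maxit):
        A_new = solve_lowrank(A, B, Z, r, mu)    # step 1: (approx.) min over A
        B_new = solve_sparse(A_new, B, Z, s)     # step 2: (approx.) min over B
        done = (np.linalg.norm(A_new - A, 'fro')
                + np.linalg.norm(B_new - B, 'fro')) < tol
        A, B = A_new, B_new
        if done:
            break
    return A, B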
Sparse matrix subproblem.
◮ The global solution P_{N(s)}(Z − A^{k+1}) (a metric projection) can be computed efficiently by sorting.
◮ The global solution may not necessarily fulfill the sufficient decrease condition.
◮ Whenever necessary, safeguard by a local solution:
    B^{k+1}_{ij} = (Z − A^{k+1})_{ij} if B^k_{ij} ≠ 0,   and   B^{k+1}_{ij} = 0 otherwise.
◮ Given non-degeneracy of B^{k+1}, i.e. ‖B^{k+1}‖_0 = s, exact stationarity holds.
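A sketch of both variants on this slide: the metric projection by "sorting" (keep the s entries of largest magnitude) and the local safeguard that reuses the previous support. The function names are illustrative.

import numpy as np

def proj_sparse(X, s):
    # Metric projection onto N(s): keep the s largest-magnitude entries.
    flat = np.asarray(X).flatten()
    out = np.zeros_like(flat)
    if s > 0:
        idx = np.argpartition(np.abs(flat), -s)[-s:]   # the "sorting" step
        out[idx] = flat[idx]
    return out.reshape(np.asarray(X).shape)

def sparse_step(A_new, B_old, Z, s):
    # Try the global solution first; the safeguard (same support as B^k) is
    # used whenever the sufficient decrease test fails.
    B_global = proj_sparse(Z - A_new, s)
    B_local = np.where(B_old != 0, Z - A_new, 0.0)
    return B_global, B_local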
Low-rank matrix subproblem: a Riemannian perspective.
◮ Global solution P_{M(r)}((Z − B^k)/(1 + µ)) as a metric projection:
  ◮ available due to the Eckart-Young theorem, i.e.
      (Z − B^k)/(1 + µ) = ∑_{j=1}^{n} σ_j u_j v_j^⊤   ⇒   P_{M(r)}((Z − B^k)/(1 + µ)) = ∑_{j=1}^{r} σ_j u_j v_j^⊤;
  ◮ but it requires an SVD in full dimension ⇒ expensive for large-scale problems (e.g. m, n ≥ 2000).
◮ Alternatively, the subproblem is resolved by a single Riemannian optimization step on the matrix manifold.
◮ Riemannian optimization has been applied to low-rank matrix/tensor problems; see [Simonsson and Eldén, '10], [Savas and Lim, '10], [Vandereycken, '13], ...
◮ Our goal: the subproblem solver should activate the convergence criteria, i.e. sufficient decrease + stationarity.
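For completeness, the "expensive" global solution is just a rank-r truncated SVD. A sketch follows; this is exactly the full-dimension SVD the Riemannian step is meant to avoid, and the signature is chosen only to match the outer-loop callable sketched earlier.

import numpy as np

def proj_rank(X, r):
    # Eckart-Young: best rank-r approximation via truncated SVD.
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * sig[:r]) @ Vt[:r, :]

def lowrank_step_global(A_old, B_k, Z, r, mu=0.0):
    # Global solution P_{M(r)}((Z - B^k)/(1 + mu)); A_old is unused here.
    return proj_rank((Z - B_k) / (1.0 + mu), r)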
Riemannian optimization: an overview.
Fig.: a smooth function f : M → R defined on a manifold M.
◮ References: [Smith, '93], [Edelman, Arias, and Smith, '98], [Absil, Mahony, and Sepulchre, '08], ...
◮ Why Riemannian optimization?
  ◮ Local homeomorphisms (charts) are computationally infeasible or expensive.
  ◮ It exploits the intrinsically low dimensionality of the underlying manifold.
  ◮ Further dimension reduction is possible via quotient manifolds.
◮ Typical Riemannian manifolds in applications:
  ◮ finite-dimensional (matrix manifolds): Stiefel manifold, Grassmann manifold, fixed-rank matrix manifold, ...
  ◮ infinite-dimensional: shape/curve spaces, ...
Riemannian optimization: a conceptual algorithm.
Fig.: a tangent step ∆^k ∈ T_{M̄(r)}(A^k) followed by the retraction retract_{M̄(r)}(A^k, ∆^k) back onto M̄(r).
At the current iterate:
1. Build a quadratic model in the tangent space using the Riemannian gradient and Riemannian Hessian.
2. Based on the quadratic model, build a tangential search path.
3. Perform a backtracking path search via retraction to determine the step size.
4. Generate the next iterate.
Riemannian gradient and Hessian.
◮ Let M̄(r) := {A : rank(A) = r} and f_A^k : A ∈ M̄(r) ↦ f(A, B^k).
◮ The Riemannian gradient grad f_A^k(A) ∈ T_{M̄(r)}(A) is defined such that
    ⟨grad f_A^k(A), ∆⟩ = Df_A^k(A)[∆]   for all ∆ ∈ T_{M̄(r)}(A),
  which gives grad f_A^k(A) = P_{T_{M̄(r)}(A)}(∇f_A^k(A)).
◮ The Riemannian Hessian Hess f_A^k(A) : T_{M̄(r)}(A) → T_{M̄(r)}(A) is defined such that
    Hess f_A^k(A)[∆] = ∇_∆ grad f_A^k(A)   for all ∆ ∈ T_{M̄(r)}(A).
  For A = U Σ V^⊤ (compact SVD) it reads
    Hess f_A^k(A)[∆] = (I − UU^⊤)∇f_A^k(A)(I − VV^⊤)∆^⊤ U Σ^{−1} V^⊤
                       + U Σ^{−1} V^⊤ ∆^⊤ (I − UU^⊤)∇f_A^k(A)(I − VV^⊤)
                       + (1 + µ)∆.
  See, e.g., [Vandereycken, '12].
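A direct transcription of these formulas into Python, assuming A is held as its compact-SVD factors U, sig, Vt and G = ∇f_A^k(A) = (1 + µ)A + B^k − Z is the Euclidean gradient; a sketch only, with illustrative names.

import numpy as np

def riem_grad(G, U, Vt):
    # grad f(A) = P_{T}(G): projection of the Euclidean gradient G onto the
    # tangent space at A = U diag(sig) Vt.
    V = Vt.T
    return U @ (U.T @ G) + (G @ V) @ V.T - U @ ((U.T @ G) @ V) @ V.T

def riem_hess_vec(Delta, G, U, sig, Vt, mu=0.0):
    # Hess f(A)[Delta] as written on this slide; Delta must be a tangent vector.
    V = Vt.T
    Gp = G - U @ (U.T @ G)                # (I - U U^T) G
    Gp = Gp - (Gp @ V) @ V.T              # ... times (I - V V^T)
    Core = (U / sig) @ Vt                 # U Sigma^{-1} V^T
    return Gp @ Delta.T @ Core + Core @ Delta.T @ Gp + (1.0 + mu) * Delta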
Dogleg search path and projective retraction.
Fig. (left): trust region in T_{M̄(r)}(A^k) with the optimal trajectory ∆(σ), the unconstrained minimizer ∆_U along −g, the full step ∆_B, and the dogleg path. Fig. (right): the retraction retract_{M̄(r)}(A^k, ∆^k) onto M̄(r).
◮ The "dogleg" path ∆^k(τ^k) approximates the optimal trajectory of the tangential trust-region subproblem (left figure):
    min f_A^k(A^k) + ⟨g^k, ∆⟩ + (1/2)⟨∆, H^k[∆]⟩   s.t.   ∆ ∈ T_{M̄(r)}(A^k), ‖∆‖ ≤ σ.
◮ Metric projection as retraction (right figure):
    retract_{M̄(r)}(A^k, ∆^k(τ^k)) = P_{M̄(r)}(A^k + ∆^k(τ^k)).
  Computationally efficient: a "reduced" SVD of a 2r-by-2r matrix suffices!
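The "2r-by-2r" claim can be made concrete: if A^k = U diag(sig) V^⊤ and the tangent step ∆ is re-expanded into its (M, U_p, V_p) components, then A^k + ∆ has rank at most 2r, so its best rank-r approximation needs only two thin QR factorizations and an SVD of a 2r×2r core. A sketch under these assumptions (factor layout and names are illustrative):

import numpy as np

def retract_projective(U, sig, Vt, Delta, r):
    # P_{M(r)}(A + Delta) for A = U diag(sig) Vt and a tangent Delta,
    # computed with an SVD of a 2r-by-2r matrix only.
    V = Vt.T
    M = U.T @ Delta @ V                   # r x r block of the tangent vector
    Up = Delta @ V - U @ M                # component orthogonal to range(U)
    Vp = Delta.T @ U - V @ M.T            # component orthogonal to range(V)
    Qu, Ru = np.linalg.qr(Up)             # thin QR factors
    Qv, Rv = np.linalg.qr(Vp)
    K = np.block([[np.diag(sig) + M, Rv.T],
                  [Ru, np.zeros((r, r))]])
    Us, S, Vst = np.linalg.svd(K)         # small (2r x 2r) SVD
    U_new = np.hstack([U, Qu]) @ Us[:, :r]
    V_new = np.hstack([V, Qv]) @ Vst[:r, :].T
    return U_new, S[:r], V_new.T          # factors of the retracted iterate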
Low-rank matrix subproblem: projected dogleg step.
Given A^k ∈ M̄(r), B^k ∈ N(s):
1. Compute g^k, H^k, and build the dogleg search path ∆^k(τ^k) in T_{M̄(r)}(A^k).
2. Whenever non-positive definiteness of H^k is detected, replace the dogleg search path by the line-search path along the steepest descent direction, i.e. ∆^k(τ^k) = −τ^k g^k.
3. Perform a backtracking path/line search, i.e. find the largest step size τ^k ∈ {2, 3/2, 1, 1/2, 1/4, 1/8, ...} such that the sufficient decrease condition is satisfied:
    f_A^k(A^k) − f_A^k(P_{M̄(r)}(A^k + ∆^k(τ^k))) ≥ δ‖A^k − P_{M̄(r)}(A^k + ∆^k(τ^k))‖².
4. Return A^{k+1} = P_{M̄(r)}(A^k + ∆^k(τ^k)).
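Step 3 can be sketched as follows, assuming callables path(tau) for the tangential trial step, retract(A, D) for P_{M̄(r)}(A + D), and f_A for the smooth objective; delta and the step-size grid follow the slide, while max_halvings is an illustrative cutoff.

import numpy as np

def backtracking_path_search(A, f_A, path, retract, delta, max_halvings=20):
    taus = [2.0, 1.5, 1.0] + [2.0 ** (-j) for j in range(1, max_halvings + 1)]
    f0 = f_A(A)
    for tau in taus:                                  # try largest steps first
        A_trial = retract(A, path(tau))               # P_{M(r)}(A^k + Delta^k(tau))
        decrease = f0 - f_A(A_trial)
        if decrease >= delta * np.linalg.norm(A - A_trial, 'fro') ** 2:
            return A_trial                            # sufficient decrease holds
    return A                                          # no admissible step found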