

  1. A 2-phase augmented Lagrangian approach for large scale matrix optimization
Defeng Sun, Department of Mathematics, National University of Singapore
September 5, 2014 (Presentation at the 2014 Workshop on Optimization for Modern Computation, Beijing University)
Joint work with: Kim-Chuan Toh, National University of Singapore
Students/postdocs: Caihua Chen (Nanjing), Junfeng Yang (Nanjing), Chao Ding (CAS), Kaifeng Jiang (DBS), Yongjin Liu (Shenyang Aerospace U), Chengjing Wang (Southwest Jiaotong U), Liuqin Yang (NUS), Xudong Li (NUS), Xinyuan Zhao (Beijing U Tech.)

  2. Outline
- Matrix optimization problem (MOP)
- Examples: linear semidefinite programming (SDP), etc.
- General framework of the proximal-point algorithm (PPA)
- 2-phase PPA applied to SDP and SDP+ (matrix variable is positive semidefinite and nonnegative)
- A majorized semismooth Newton-CG (SNCG) method for solving PPA subproblems
- SDPNAL+: practical implementation of PPA for SDP+
- Numerical experiments

  3. Convex conic matrix optimization
X = R^{p×n} or S^n (n × n symmetric matrices), endowed with the trace inner product ⟨·,·⟩ and Frobenius norm ‖·‖

(MOP)   min { f(X) | A(X) − b ∈ Q, X ∈ X }

- f : X → (−∞, ∞] is a proper closed convex function
- Q is a closed convex cone in R^m, b ∈ R^m
- A : X → R^m is a given (onto) linear map, e.g., A(X) = diag(X)
- A* = the adjoint of A
- dual cone: Q* = { X ∈ X | ⟨Y, X⟩ ≥ 0 ∀ Y ∈ Q }
- indicator function: δ_Q(X) = 0 if X ∈ Q, ∞ otherwise

  4. Dual of MOP
- conjugate function: f*(Z) = sup_{X ∈ X} { ⟨Z, X⟩ − f(X) }
- subdifferential: ∂f(X) = conv{ subgradients of f at X }

The dual problem of (MOP) is given by

  max_{y ∈ Q*} ⟨b, y⟩ − f*(A*y)

The KKT conditions for (MOP) are:

  A(X) − b ∈ Q,   y ∈ Q*,   A*y ∈ ∂f(X)

  5. MOP covers many important classes of problems
S^n_+ = cone of positive semidefinite matrices. Write X ⪰ 0 if X ∈ S^n_+.

MOP includes linear semidefinite programming (SDP):

(SDP)   min { ⟨C, X⟩ | A(X) = b, X ∈ S^n_+ }
      = min { f(X) := ⟨C, X⟩ + δ_{S^n_+}(X) | A(X) − b ∈ Q := {0}^m }

where δ_{S^n_+} = indicator function of S^n_+: δ_{S^n_+}(X) = 0 if X ∈ S^n_+, ∞ otherwise.

SDP is solvable by powerful interior-point methods if n and m are not too large, say n ≤ 2,000, m ≤ 10,000. Current research interests focus on n ≤ 10,000 but m ≫ 10,000.
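
In the augmented Lagrangian / PPA methods discussed later, the indicator δ_{S^n_+} enters through its proximal map: the projection onto S^n_+, computed from a single eigendecomposition. A minimal NumPy sketch (the function name and the Moreau-decomposition check are ours, not from the slides):

```python
import numpy as np

def proj_psd(X):
    """Project a symmetric matrix onto the PSD cone S^n_+ by
    clipping negative eigenvalues (the prox of delta_{S^n_+})."""
    w, Q = np.linalg.eigh((X + X.T) / 2)
    return (Q * np.clip(w, 0.0, None)) @ Q.T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = (A + A.T) / 2
P = proj_psd(X)
# Moreau decomposition for the self-dual cone S^n_+:
# P is PSD and the residual X - P is negative semidefinite.
assert np.min(np.linalg.eigvalsh(P)) >= -1e-10
assert np.max(np.linalg.eigvalsh(X - P)) <= 1e-10
```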

  6. SDP and MOP have lots of applications
SDP (and more generally MOP) is a powerful modelling tool! Applications are growing rapidly, and driving developments in algorithms and software.
- LMI in control
- Combinatorial optimization
- Robust optimization: project management, revenue management
- Polynomial optimization: option pricing, queueing systems
- Moment problems, applied probability
- Engineering: signal processing, communication, structural optimization, computer vision
- Statistics/finance: correlation/covariance matrix estimation
- Machine learning: kernel estimation, dimensionality reduction/manifold unfolding
- Euclidean metric embedding: sensor network localization, molecular conformation
- Quantum chemistry, quantum information
- Many others ...

  7. Maximum stable set problem
Given a graph G = (V, E), a stable set S is a subset of V such that no two vertices in S are adjacent. Maximum stable set problem: find S with maximum cardinality. Let

  x_i = 1 if i ∈ S, 0 otherwise   ⇒   |S| = Σ_{i=1}^n x_i.

A common formulation of the max-stable-set problem:

  |S| = α(G) := max { (1/|S|) Σ_{i,j} x_i x_j | x_i x_j = 0 ∀ (i,j) ∈ E, x ∈ {0,1}^n }

With X := xx^T / |S|, this becomes

  max { ⟨E, X⟩ | X_ij = 0 ∀ (i,j) ∈ E, ⟨I, X⟩ = 1 }

SDP relaxation: X = xx^T / |S| ⇒ X ⪰ 0, giving

  θ(G) := max { ⟨E, X⟩ | X_ij = 0 ∀ (i,j) ∈ E, ⟨I, X⟩ = 1, X ⪰ 0 }

  θ_+(G) := θ(G) with the n(n+1)/2 additional constraints X ≥ 0
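
For intuition on the quantities above, α(G) can be brute-forced on toy graphs (the helper below is ours and only feasible at toy sizes, since the problem is NP-hard). For the 5-cycle C5, α(C5) = 2, while the SDP bound is the Lovász number θ(C5) = √5 ≈ 2.236:

```python
from itertools import combinations

def alpha(n, edges):
    """Brute-force maximum stable set size alpha(G)
    for a graph on vertices {0, ..., n-1} (toy sizes only)."""
    E = set(map(frozenset, edges))
    for k in range(n, 0, -1):           # try sizes from largest down
        for S in combinations(range(n), k):
            if all(frozenset(p) not in E for p in combinations(S, 2)):
                return k                # first stable set found is maximum
    return 0

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the 5-cycle
print(alpha(5, c5))  # → 2
```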

  8. Quadratic assignment problem (QAP)
Assign n facilities to n locations [Koopmans and Beckmann (1957)]
- A = (a_ij), where a_ij = flow from facility i to facility j
- B = (b_kl), where b_kl = distance from location k to location l
- cost of assignment π = Σ_{i=1}^n Σ_{j=1}^n a_ij b_{π(i)π(j)}

  min_P { ⟨B ⊗ A, vec(P) vec(P)^T⟩ | P is an n × n permutation matrix }

SDP+ relaxation [Povh and Rendl, 09]: relax vec(P) vec(P)^T to the n² × n² variable X ∈ S^{n²}_+ with X ≥ 0:

(QAP)   min { ⟨B ⊗ A, X⟩ | A(X) − b = 0, X ∈ S^{n²}_+, X ≥ 0 }

where the linear constraints (with m = 3n(n+1)/2) encode the conditions P^T P = I_n, P ≥ 0.
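
To see that the permutation-sum cost and the Kronecker form ⟨B ⊗ A, vec(P) vec(P)^T⟩ agree, here is a toy check (function names and the 3 × 3 data are ours; brute force over all n! permutations is of course only for illustration):

```python
import itertools
import numpy as np

def qap_cost(A, B, pi):
    """Direct cost: sum_ij a_ij * b_{pi(i) pi(j)}."""
    n = A.shape[0]
    return sum(A[i, j] * B[pi[i], pi[j]] for i in range(n) for j in range(n))

def qap_cost_kron(A, B, pi):
    """Same cost as <B (x) A, vec(P) vec(P)^T>, with P[i, pi(i)] = 1
    and vec(.) the column-stacking vectorization."""
    n = A.shape[0]
    P = np.zeros((n, n))
    P[np.arange(n), list(pi)] = 1.0
    v = P.flatten(order="F")            # column-stacking vec(P)
    return v @ np.kron(B, A) @ v

A = np.array([[0., 2., 3.], [2., 0., 1.], [3., 1., 0.]])   # flows
B = np.array([[0., 1., 5.], [1., 0., 2.], [5., 2., 0.]])   # distances
best = min(itertools.permutations(range(3)), key=lambda p: qap_cost(A, B, p))
for pi in itertools.permutations(range(3)):
    assert abs(qap_cost(A, B, pi) - qap_cost_kron(A, B, pi)) < 1e-9
```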

  9. Relaxations of rank-1 tensor approximations
Consider a symmetric 4-tensor [Nie, Lasserre, Lim, De Lathauwer et al]:

  f(x) = Σ_{1 ≤ i,j,k,l ≤ n} F_ijkl x_i x_j x_k x_l   →   F ≈ λ (u ⊗ u ⊗ u ⊗ u)

for some scalar λ and u ∈ R^n with ‖u‖ = 1. Need to solve:

  max_{x ∈ R^n} { ±f(x) | g(x) := x_1² + · · · + x_n² = 1 }.

Let [x]_d = monomial vector of degree at most d. Then

  [x]_d [x]_d^T = Σ_{|α| ≤ 2d} A_α x^α   ⇒   M_d(y) := Σ_{|α| ≤ 2d} A_α y_α
  f(x) = Σ_α f_α x^α   ⇒   ⟨f, y⟩
  g(x) = Σ_α g_α x^α   ⇒   ⟨g, y⟩

SDP relaxation is given by:

  max { ⟨f, y⟩ | ⟨g, y⟩ = 1, M_d(y) ⪰ 0 }

The relaxation is tight if rank(M_d(y*)) = 1.
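
For a feel of the objective f(x) = Σ F_ijkl x_i x_j x_k x_l, the sketch below builds an exactly rank-1 symmetric 4-tensor and recovers (λ, u) by a plain tensor power iteration — a simple local heuristic for illustration only, not the Lasserre-type SDP relaxation from the slide (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
lam = 2.5
# exactly rank-1 symmetric 4-tensor: F = lam * u (x) u (x) u (x) u
F = lam * np.einsum("i,j,k,l->ijkl", u, u, u, u)

def f(x):
    """f(x) = sum_{ijkl} F_ijkl x_i x_j x_k x_l."""
    return np.einsum("ijkl,i,j,k,l->", F, x, x, x, x)

# tensor power iteration: x <- F(., x, x, x), renormalized
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(20):
    x = np.einsum("ijkl,j,k,l->i", F, x, x, x)
    x /= np.linalg.norm(x)

assert abs(abs(x @ u) - 1.0) < 1e-8   # x converged to +/- u
assert abs(f(x) - lam) < 1e-8         # and f attains lam on the sphere
```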

  10. Molecular conformation and sensor localization
Given sparse and noisy distance data { d_ij | (i,j) ∈ E } for n atoms, find coordinates v_1, ..., v_n in R³ such that ‖v_i − v_j‖ ≈ d_ij. Typically E consists of 20–50% of all pairs of atoms which are ≤ 6 Å apart.

Consider the model:

  min { Σ_{(i,j) ∈ E} | ‖v_i − v_j‖² − d_ij² | : v_1, ..., v_n ∈ R³ }

Let V = [v_1, ..., v_n] and X = V^T V. Relaxing X = V^T V to X ⪰ 0 leads to an SDP:

  min_X { Σ_{(i,j) ∈ E} | ⟨A_ij, X⟩ − d_ij² | : ⟨E, X⟩ = 0, X ⪰ 0 }

where A_ij = e_i e_i^T + e_j e_j^T − e_i e_j^T − e_j e_i^T
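
The key identity behind the relaxation is that ⟨A_ij, X⟩ equals the squared distance ‖v_i − v_j‖² when X = V^T V is the Gram matrix of the coordinates. A small NumPy check (names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
V = rng.standard_normal((3, n))   # columns v_1, ..., v_n in R^3
X = V.T @ V                       # Gram matrix; X >= 0 by construction

def A_mat(i, j, n):
    """A_ij = e_i e_i^T + e_j e_j^T - e_i e_j^T - e_j e_i^T."""
    M = np.zeros((n, n))
    M[i, i] = M[j, j] = 1.0
    M[i, j] = M[j, i] = -1.0
    return M

for i in range(n):
    for j in range(i + 1, n):
        # <A_ij, X> recovers the squared distance ||v_i - v_j||^2
        d2 = np.sum((V[:, i] - V[:, j]) ** 2)
        assert abs(np.sum(A_mat(i, j, n) * X) - d2) < 1e-9
```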

  11. Protein molecule 1PTQ from the Protein Data Bank:
- number of atoms n = 402
- number of pairwise distances given |E| ≈ 3700 (50% of the distances ≤ 6 Å; ≈ 4.5% of all pairwise distances)

[Figure: actual vs. reconstructed conformation]

  12. Nuclear norm minimization problem
Given a partially observed matrix M ∈ R^{n×n}, find a minimum-rank matrix Y ∈ R^{n×n} to complete M:

  min_{Y ∈ R^{n×n}} { rank(Y) | Y_ij = M_ij ∀ (i,j) ∈ E }   (NP-hard)

[Candes, Parrilo, Recht, Tao, ...] For a given rank-r matrix M ∈ R^{n×n} that satisfies certain properties, if enough entries (∝ r n polylog(n)) are sampled randomly, then with very high probability M can be recovered from the following nuclear norm minimization problem (an easier problem, but still nontrivial to solve!):

  min_{Y ∈ R^{n×n}} { ‖Y‖_* | Y_ij = M_ij ∀ (i,j) ∈ E }

where ‖Y‖_* = sum of singular values of Y.
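
A minimal sketch of the two ingredients first-order methods use for this problem: the nuclear norm itself, and its proximal map (singular-value soft-thresholding). The helper names are ours; this is not the solver discussed in the talk:

```python
import numpy as np

def nuclear_norm(Y):
    """||Y||_* = sum of singular values of Y."""
    return np.linalg.svd(Y, compute_uv=False).sum()

def prox_nuclear(Y, t):
    """Proximal map of t * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

# rank-1 sanity check: ||a b^T||_* = ||a|| ||b||
rng = np.random.default_rng(3)
a, b = rng.standard_normal(6), rng.standard_normal(6)
assert abs(nuclear_norm(np.outer(a, b))
           - np.linalg.norm(a) * np.linalg.norm(b)) < 1e-9

# soft-thresholding: singular values (3, 1, 0.2) -> (2.5, 0.5, 0) at t = 0.5
P = prox_nuclear(np.diag([3.0, 1.0, 0.2]), 0.5)
assert np.allclose(np.linalg.svd(P, compute_uv=False), [2.5, 0.5, 0.0])
```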

  13. Based on a partially observed matrix, predict the unobserved entries: will customer i like movie j?

[Figure: a users × movies ratings matrix with entries in {1, ..., 5} and unobserved entries marked "?"]

  14. Sparse covariance selection problems
Given i.i.d. observations drawn from an n-dimensional Gaussian distribution N(µ, Σ), let Σ̂ be the sample covariance matrix. Want to estimate Σ, whose inverse X := Σ⁻¹ is sparse.

Dempster (1972) proved that x_i and x_j are conditionally independent (given all other x_k) if and only if X_ij = 0.

Typically, we estimate X via the log-likelihood function:

  max { log det X − ⟨Σ̂, X⟩ − ⟨W, |X|⟩ | X ≻ 0 }

where the weighted L1-term is added to encourage sparsity in X.

Many papers: d'Aspremont, M. Yuan, Lu, Meinshausen, Bühlmann, Wang-Sun-Toh, Yang-Sun-Toh
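
The objective above is easy to evaluate directly; the sketch below does so and checks the textbook fact that, with no penalty, it is maximized at X = Σ̂⁻¹ (all variable names and the toy data are ours):

```python
import numpy as np

def objective(X, S, W):
    """Weighted-L1-penalized Gaussian log-likelihood:
    log det X - <S, X> - <W, |X|>, for X positive definite."""
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    return logdet - np.sum(S * X) - np.sum(W * np.abs(X))

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, 2 * n))
S = A @ A.T / (2 * n)                    # a sample covariance matrix
W = 0.1 * (np.ones((n, n)) - np.eye(n))  # penalize off-diagonal entries only

# with W = 0 the objective is maximized at X = S^{-1} (gradient X^{-1} - S = 0)
X_star = np.linalg.inv(S)
f_star = objective(X_star, S, np.zeros((n, n)))
assert f_star >= objective(np.eye(n), S, np.zeros((n, n))) - 1e-9
```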

  15. Convex quadratic SDP
(MOP) also contains the important case of convex quadratic SDP:

(QSDP)   min_{X ∈ S^n} { (1/2)⟨X, Q(X)⟩ + ⟨C, X⟩ | A(X) − b = 0, X ∈ S^n_+ }

where Q : S^n → S^n is a self-adjoint positive semidefinite linear operator.

A well-studied example is the nearest correlation matrix problem: given a data matrix U ∈ S^n and a weight matrix W ≻ 0, we want to solve the W-weighted NCM problem:

(W-NCM)   min_X { (1/2)‖W(X − U)W‖² | Diag(X) = 1, X ⪰ 0 }

1. The alternating projection method [Higham 02]
2. The quasi-Newton method [Malick 04]
3. An inexact semismooth Newton-CG method [Qi and Sun 06]
4. An inexact interior-point method [Toh, Tütüncü and Todd 07]
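
The alternating projection idea cited above is easy to sketch for the unweighted case W = I: alternate projections onto S^n_+ and onto {X : Diag(X) = 1}. This plain version (names ours) produces a feasible correlation matrix but, unlike Higham's method with Dykstra's correction, does not guarantee the *nearest* one:

```python
import numpy as np

def proj_psd(X):
    """Project onto S^n_+ via eigenvalue clipping."""
    w, Q = np.linalg.eigh((X + X.T) / 2)
    return (Q * np.clip(w, 0.0, None)) @ Q.T

def proj_unit_diag(X):
    """Project onto the affine set {X : Diag(X) = 1}."""
    Y = X.copy()
    np.fill_diagonal(Y, 1.0)
    return Y

def ncm_alternating(U, iters=2000):
    """Plain alternating projections (simplified sketch of [Higham 02])."""
    X = U.copy()
    for _ in range(iters):
        X = proj_psd(proj_unit_diag(X))
    return X

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
X = ncm_alternating((A + A.T) / 2)
assert np.min(np.linalg.eigvalsh(X)) >= -1e-8    # PSD
assert np.max(np.abs(np.diag(X) - 1.0)) < 1e-4   # (nearly) unit diagonal
```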

  16. H-weighted NCM problem

(H-NCM)   min_X { (1/2)‖H ∘ (X − U)‖² | Diag(X) = 1, X ⪰ 0 }

where H ∈ S^n has nonnegative entries and "∘" denotes the Hadamard product.

1. An inexact IPM for convex QSDP [Toh 08]
2. An ALM [Qi and Sun 10]
3. A semismooth Newton-CG ALM for convex quadratic programming over symmetric cones [Zhao 09]
4. A modified alternating direction method for convex quadratically constrained QSDPs [J. Sun and Zhang 10]
