Reformulations
The (MSSC) formulation has the same optima as:
     min_{x,y,P}  ∑_{i≤n} ∑_{j≤d} P_ij x_ij
     ∀ i ≤ n, j ≤ d    ‖p_i − y_j‖²₂ ≤ P_ij
     ∀ j ≤ d           ∑_{i≤n} p_i x_ij = y_j ∑_{i≤n} x_ij
     ∀ i ≤ n           ∑_{j≤d} x_ij = 1
     ∀ j ≤ d           y_j ∈ ([min_{i≤n} p_ia , max_{i≤n} p_ia] | a ≤ d)
     x ∈ {0,1}^{nd},   P ∈ [0, P^U]^{nd}
◮ Only nonconvexities: products of bounded variables by binary variables
◮ Caveat: cannot have empty clusters
26 / 160
Products of binary and continuous vars.
◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0,1} and y ∈ [0,1] is bounded
◮ Replace xy by a new variable z
  means “either z = 0 or z = y”
◮ Adjoin the following constraints:
     z ∈ [0,1]
     y − (1 − x) ≤ z ≤ y + (1 − x)
     −x ≤ z ≤ x
◮ ⇒ Everything’s linear now!
[Fortet 1959]
27 / 160
Products of binary and continuous vars.
◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0,1} and y ∈ [y^L, y^U] is bounded
◮ Replace xy by a new variable z
  means “either z = 0 or z = y”
◮ Adjoin the following constraints:
     z ∈ [min(y^L, 0), max(y^U, 0)]
     y − (1 − x) max(|y^L|, |y^U|) ≤ z ≤ y + (1 − x) max(|y^L|, |y^U|)
     −x max(|y^L|, |y^U|) ≤ z ≤ x max(|y^L|, |y^U|)
◮ ⇒ Everything’s linear now!
[L. et al. 2009]
28 / 160
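A minimal numpy sketch (not part of the slides) that checks the linearization above on random samples: for any fixed x ∈ {0,1} and y ∈ [y^L, y^U], the constraints force z = xy. The function name `fortet_bounds` is made up for illustration.

```python
import numpy as np

def fortet_bounds(x, y, yL, yU):
    """Interval allowed for z (standing for x*y) by the linearization above."""
    M = max(abs(yL), abs(yU))
    lo = max(min(yL, 0.0), y - (1 - x) * M, -x * M)
    hi = min(max(yU, 0.0), y + (1 - x) * M,  x * M)
    return lo, hi

rng = np.random.default_rng(0)
yL, yU = -2.0, 5.0
for _ in range(1000):
    x = rng.integers(0, 2)      # binary variable
    y = rng.uniform(yL, yU)     # bounded continuous variable
    lo, hi = fortet_bounds(x, y, yL, yU)
    z = x * y                   # the product being linearized
    assert lo - 1e-9 <= z <= hi + 1e-9
    # with x fixed, the interval collapses to the single point x*y
    assert abs(hi - lo) <= 1e-9
print("linearization forces z = x*y for all sampled (x, y)")
```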
MSSC is a convex MINLP
     min_{x,y,P,χ,ξ}  ∑_{i≤n} ∑_{j≤d} χ_ij
     ∀ i ≤ n, j ≤ d    0 ≤ χ_ij ≤ P_ij
     ∀ i ≤ n, j ≤ d    P_ij − (1 − x_ij) P^U ≤ χ_ij ≤ x_ij P^U
     ∀ i ≤ n, j ≤ d    ‖p_i − y_j‖²₂ ≤ P_ij        ⇐ convex
     ∀ j ≤ d           ∑_{i≤n} p_i x_ij = ∑_{i≤n} ξ_ij
     ∀ i ≤ n, j ≤ d    y_j − (1 − x_ij) max(|y^L|, |y^U|) ≤ ξ_ij ≤ y_j + (1 − x_ij) max(|y^L|, |y^U|)
     ∀ i ≤ n, j ≤ d    −x_ij max(|y^L|, |y^U|) ≤ ξ_ij ≤ x_ij max(|y^L|, |y^U|)
     ∀ i ≤ n           ∑_{j≤d} x_ij = 1
     ∀ j ≤ d           y_j ∈ [y^L, y^U]
     x ∈ {0,1}^{nd},   P ∈ [0, P^U]^{nd},   χ ∈ [0, P^U]^{nd}
     ∀ i ≤ n, j ≤ d    ξ_ij ∈ [min(y^L, 0), max(y^U, 0)]
y_j, ξ_ij, y^L, y^U are vectors in R^m
29 / 160
Solving the MSSC
◮ k-means
  ◮ heuristic (optimum not guaranteed)
  ◮ fast, well-known, lots of analyses
  ◮ scales reasonably well
  ◮ implemented in practically all languages
◮ convex MINLP
  ◮ exact (guaranteed global optima)
  ◮ reasonably fast only for small sizes
  ◮ scales exponentially
  ◮ Solvers: KNITRO (commercial), Bonmin (free); need an MP language interpreter (AMPL)
30 / 160
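For illustration, a minimal numpy sketch of Lloyd's k-means heuristic (not the convex MINLP), following the slide's notation: p is the point set, d the number of clusters, y the centroids. This is an assumed, illustrative implementation, not course code.

```python
import numpy as np

def kmeans(p, d, iters=100, seed=0):
    """Lloyd's heuristic for MSSC: p is an (n, m) array of points,
    d the number of clusters; returns assignments and centroids."""
    rng = np.random.default_rng(seed)
    n = p.shape[0]
    y = p[rng.choice(n, size=d, replace=False)]        # initial centroids
    for _ in range(iters):
        # assign each point to its closest centroid
        dist2 = ((p[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        assign = dist2.argmin(axis=1)
        # recompute centroids (keep the old one if a cluster empties)
        new_y = np.array([p[assign == j].mean(axis=0) if np.any(assign == j)
                          else y[j] for j in range(d)])
        if np.allclose(new_y, y):
            break
        y = new_y
    return assign, y

# toy example: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
p = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
assign, y = kmeans(p, d=2)
print(y)
```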
Outline
◮ Reasoning
◮ Relations, graphs, distances
◮ Clustering: Clustering in graphs; Clustering in Euclidean spaces
◮ Metric embeddings: Fréchet embeddings in ℓ∞; Embeddings in ℓ2; Classic MDS; PCA
◮ Distance Geometry: DGP applications; Complexity of the DGP; Number of solutions; Solution methods (Direct methods, Semidefinite Programming, Diagonal Dominance, Barvinok’s naive algorithm, Isomap for the DGP)
◮ Distance resolution limit: When to start worrying
◮ Random projections: More efficient clustering; Random projections in LP; Projecting feasibility; Projecting optimality; Solution retrieval; Quantile regression
◮ The end
31 / 160
Metric embeddings Mapping metric spaces into each other 32 / 160
Graphs with shortest path metric E.g. mathematical genealogy skeleton 33 / 160
The distance matrix (only for a subgraph)
Vertices: Euler, Thibaut, Pfaff, Lagrange, Laplace, Möbius, Gudermann, Dirksen, Gauss, Kästner (in this order)

D =
      0 10  1  1  9  8  2  2  2  2
     10  0 11  9  1  3 10 12 12  8
      1 11  0  2 10 10  3  1  1  3
      1  9  2  0  8  8  1  3  3  1
      9  1 10  8  0  2  9 11 11  7
      8  3 10  8  2  0  9 11 11  7
      2 10  3  1  9  9  0  4  4  2
      2 12  1  3 11 11  4  0  2  4
      2 12  1  3 11 11  4  2  0  4
      2  8  3  1  7  7  2  4  4  0
34 / 160
Subsection 1 Fréchet embeddings in ℓ ∞ 35 / 160
ℓ∞ embeddings
◮ Given a metric space (X, d) with distance matrix D = (d_ij)
◮ Consider the i-th row δ_i = (d_i1, ..., d_in) of D
◮ Embed i ∈ X by the vector δ_i ∈ R^n
◮ Define f(X) = {δ_1, ..., δ_n}, f(d(i,j)) = ‖f(i) − f(j)‖∞
◮ Thm.: (f(X), ℓ∞) is a metric space with distance matrix D
◮ If (X, d) is not a metric space, ∃ i, j ∈ X (d(i,j) ≠ ‖δ_i − δ_j‖∞)
◮ Issue: embedding is high-dimensional (R^n)
[Kuratowski 1935]
36 / 160
Proof
◮ Consider i, j ∈ X with distance d(i,j) = d_ij
◮ Then f(d(i,j)) = ‖δ_i − δ_j‖∞ = max_{k≤n} |d_ik − d_jk| ≤ max_{k≤n} d_ij = d_ij,
  where |d_ik − d_jk| ≤ d_ij holds by the triangle inequality
◮ The max of |d_ik − d_jk| over k ≤ n is achieved when k ∈ {i, j}
  (e.g. k = i gives |d_ii − d_ji| = d_ij)
◮ ⇒ f(d(i,j)) = d_ij
37 / 160
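A small numpy check of the Fréchet/Kuratowski embedding: embed each point as the corresponding row of D and verify that ℓ∞ distances reproduce D. The 4×4 metric below is a toy example of mine, not the genealogy data.

```python
import numpy as np

# a small distance matrix of a metric space (symmetric, zero diagonal,
# triangle inequality holds); any valid D would do
D = np.array([[0, 2, 3, 4],
              [2, 0, 1, 2],
              [3, 1, 0, 3],
              [4, 2, 3, 0]], dtype=float)

# Fréchet embedding: point i is mapped to row delta_i of D (a vector in R^n)
delta = D.copy()

# l_infinity distance matrix of the embedded points
Dinf = np.max(np.abs(delta[:, None, :] - delta[None, :, :]), axis=2)

print(np.allclose(Dinf, D))   # True: the embedding is isometric in l_inf
```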
Genealogical similarity example Fréchet embedding in the 3 most significant (rotated) dimensions 38 / 160
Subsection 2 Approximate embeddings in ℓ 2 39 / 160
Lower dimensional (approximate) embeddings
◮ Given a distance matrix, find an approximate Euclidean embedding
◮ Application: visualize a metric space
  e.g. embed the genealogy tree in R³ (some errors allowed)
◮ For visualization purposes, K ∈ {1, 2, 3}; for other purposes, K < n
Classical methods
◮ Multi-Dimensional Scaling (MDS)
◮ Principal Component Analysis (PCA)
40 / 160
Classic Multidimensional Scaling 41 / 160
Definition and history
◮ [I. Schoenberg, Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espaces distanciés vectoriellement applicable sur l’espace de Hilbert”, Ann. Math., 1935]
◮ Question: Given an n × n symmetric matrix D, what are necessary and sufficient conditions s.t. D is an EDM¹ corresponding to n points x_1, ..., x_n ∈ R^K with minimum K?
◮ Main theorem:
  Thm. D = (d_ij) is an EDM iff the matrix (½ (d²_1i + d²_1j − d²_ij) | 2 ≤ i, j ≤ n) is PSD of rank K
◮ PSD: positive semidefinite, all eigenvalues ≥ 0
¹ Euclidean Distance Matrix
42 / 160
Gram in function of EDM
◮ x = (x_1, ..., x_n) ⊆ R^K, written as an n × K matrix
◮ the matrix G = xx⊤ = (x_i · x_j) is the Gram matrix of x
◮ Schoenberg’s theorem: relation between EDMs and Gram matrices
     G = −½ J D² J    (§)
◮ where D² = (d²_ij), J = I_n − (1/n) 11⊤
43 / 160
Multidimensional scaling (MDS)
◮ Often get approximate EDMs D̃ from raw data (dissimilarities, discrepancies, differences)
◮ G̃ = −½ J D̃² J is an approximate Gram matrix
◮ Approximate Gram ⇒ spectral decomposition P Λ̃ P⊤ has Λ̃ ≱ 0
◮ Let Λ be the closest PSD diagonal matrix to Λ̃: zero the negative components of Λ̃
◮ x = P √Λ is an “approximate realization” of D̃
44 / 160
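A numpy sketch of classic MDS exactly as described above (form G̃, take the spectral decomposition, zero the negative eigenvalues, set x = P√Λ); the function name and the sanity check are my own additions.

```python
import numpy as np

def classic_mds(D, K=None):
    """Classic MDS: D is an (approximate) matrix of Euclidean distances.
    Returns an n x K realization x = P * sqrt(Lambda) after zeroing
    negative eigenvalues (and, if K is given, all but the K largest)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J            # approximate Gram matrix
    lam, P = np.linalg.eigh(G)             # spectral decomposition
    lam = np.clip(lam, 0.0, None)          # closest PSD diagonal matrix
    idx = np.argsort(lam)[::-1]            # largest eigenvalues first
    if K is not None:
        idx = idx[:K]
    return P[:, idx] * np.sqrt(lam[idx])

# sanity check on an exact EDM: distances are reproduced
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
D = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
x_mds = classic_mds(D, K=3)
D_rec = np.linalg.norm(x_mds[:, None, :] - x_mds[None, :, :], axis=2)
print(np.allclose(D, D_rec))               # True up to numerical error
```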
Classic MDS: Main result
1. Prove Schoenberg’s theorem: G = −½ J D² J
2. Prove that a matrix is Gram iff it is PSD
45 / 160
Classic MDS: Proof 1/3
◮ Assume zero centroid WLOG (can translate x as needed)
◮ Expand: d²_ij = ‖x_i − x_j‖²₂ = (x_i − x_j)·(x_i − x_j) = x_i·x_i + x_j·x_j − 2 x_i·x_j   (∗)
◮ Aim at “inverting” (∗) to express x_i·x_j in function of d²_ij
◮ Sum (∗) over i: ∑_i d²_ij = ∑_i x_i·x_i + n x_j·x_j − 2 x_j·∑_i x_i, where the last term vanishes by the zero centroid
◮ Similarly for j, and divide by n, get:
     (1/n) ∑_{i≤n} d²_ij = (1/n) ∑_{i≤n} x_i·x_i + x_j·x_j   (†)
     (1/n) ∑_{j≤n} d²_ij = x_i·x_i + (1/n) ∑_{j≤n} x_j·x_j   (‡)
◮ Sum (†) over j, get:
     (1/n) ∑_{i,j} d²_ij = ∑_i x_i·x_i + ∑_j x_j·x_j = 2 ∑_i x_i·x_i
◮ Divide by n, get:
     (1/n²) ∑_{i,j} d²_ij = (2/n) ∑_i x_i·x_i   (∗∗)
[Borg 2010]
46 / 160
Classic MDS: Proof 2/3
◮ Rearrange (∗), (†), (‡) as follows:
     2 x_i·x_j = x_i·x_i + x_j·x_j − d²_ij                      (1)
     x_i·x_i = (1/n) ∑_j d²_ij − (1/n) ∑_j x_j·x_j              (2)
     x_j·x_j = (1/n) ∑_i d²_ij − (1/n) ∑_i x_i·x_i              (3)
◮ Replace the LHS of Eq. (2)–(3) in Eq. (1), get
     2 x_i·x_j = (1/n) ∑_k d²_ik + (1/n) ∑_k d²_kj − d²_ij − (2/n) ∑_k x_k·x_k
◮ By (∗∗) replace (2/n) ∑_i x_i·x_i with (1/n²) ∑_{i,j} d²_ij, get
     2 x_i·x_j = (1/n) ∑_k (d²_ik + d²_kj) − d²_ij − (1/n²) ∑_{h,k} d²_hk   (§)
  which expresses x_i·x_j in function of D
47 / 160
Classic MDS: Proof 3/3
◮ Gram ⊆ PSD
  ◮ x is an n × K real matrix
  ◮ G = xx⊤ its Gram matrix
  ◮ For each y ∈ R^n we have yGy⊤ = y(xx⊤)y⊤ = (yx)(x⊤y⊤) = (yx)(yx)⊤ = ‖yx‖²₂ ≥ 0
  ◮ ⇒ G ⪰ 0
◮ PSD ⊆ Gram
  ◮ Let G ⪰ 0 be n × n
  ◮ Spectral decomposition: G = P Λ P⊤ (P orthogonal, Λ ≥ 0 diagonal)
  ◮ Λ ≥ 0 ⇒ √Λ exists
  ◮ G = P Λ P⊤ = (P √Λ)(√Λ⊤ P⊤) = (P √Λ)(P √Λ)⊤
  ◮ Let x = P √Λ; then G is the Gram matrix of x
48 / 160
Principal Component Analysis 49 / 160
Principal Component Analysis (PCA)
◮ MDS with fixed K
◮ Motivation: “draw” x = P √Λ in 2D or 3D, but rank(Λ) = K > 3
◮ Only keep the 2 or 3 largest components of Λ, zero the rest
◮ Get a realization in the desired space
[Pearson 1901]
50 / 160
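In code this is just the earlier `classic_mds` sketch (an assumption of mine, not course code) called with a fixed small K, so that only the largest eigenvalue components are kept:

```python
# PCA in the MDS sense: keep only the 2 or 3 largest eigenvalue components.
# D is a distance matrix as in the classic_mds sketch above.
x2d = classic_mds(D, K=2)   # planar drawing of the metric space
x3d = classic_mds(D, K=3)   # 3D drawing
```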
Mathematicians genealogy in 2D/3D In 2D In 3D 51 / 160
Outline
◮ Reasoning
◮ Relations, graphs, distances
◮ Clustering: Clustering in graphs; Clustering in Euclidean spaces
◮ Metric embeddings: Fréchet embeddings in ℓ∞; Embeddings in ℓ2; Classic MDS; PCA
◮ Distance Geometry: DGP applications; Complexity of the DGP; Number of solutions; Solution methods (Direct methods, Semidefinite Programming, Diagonal Dominance, Barvinok’s naive algorithm, Isomap for the DGP)
◮ Distance resolution limit: When to start worrying
◮ Random projections: More efficient clustering; Random projections in LP; Projecting feasibility; Projecting optimality; Solution retrieval; Quantile regression
◮ The end
52 / 160
Distance Geometry Embedding weighted graphs in Euclidean spaces 53 / 160
Distance Geometry Problem (DGP)
Given K ∈ N and G = (V, E, d) with d : E → R₊, find x : V → R^K s.t.
     ∀{i,j} ∈ E   ‖x_i − x_j‖²₂ = d²_ij
Given a weighted graph, draw it so edges are drawn as segments with lengths = weights
[Cayley 1841, Menger 1928, Schoenberg 1935, Yemini 1978]
54 / 160
Subsection 1 DGP applications 55 / 160
Some applications ◮ clock synchronization ( K = 1 ) ◮ sensor network localization ( K = 2 ) ◮ molecular structure from distance data ( K = 3 ) ◮ autonomous underwater vehicles ( K = 3 ) ◮ distance matrix completion (whatever K ) 56 / 160
Clock synchronization
From [Singer, Appl. Comput. Harmon. Anal. 2011]
Determine a set of unknown timestamps from partial measurements of their time differences
◮ K = 1
◮ V: timestamps
◮ {u, v} ∈ E if the time difference between u, v is known
◮ d: values of the time differences
Used in time synchronization of distributed networks
57 / 160
Clock synchronization 58 / 160
Sensor network localization
From [Yemini, Proc. CDSN, 1978]
The positioning problem arises when it is necessary to locate a set of geographically distributed objects using measurements of the distances between some object pairs
◮ K = 2
◮ V: (mobile) sensors
◮ {u, v} ∈ E iff the distance between u, v is measured
◮ d: distance values
Used whenever GPS not viable (e.g. underwater)
d_uv is roughly proportional to battery consumption in P2P communication between u, v
59 / 160
Sensor network localization 60 / 160
Molecular structure from distance data
From [L. et al., SIAM Rev., 2014]
◮ K = 3
◮ V: atoms
◮ {u, v} ∈ E iff the distance between u, v is known
◮ d: distance values
Used whenever X-ray crystallography does not apply (e.g. liquid)
Covalent bond lengths and angles known precisely
Distances ≲ 5.5 measured approximately by NMR
61 / 160
Subsection 2 Complexity of the DGP 62 / 160
Complexity class
◮ DGP₁ with d : E → Q₊ is in NP
  ◮ if the instance is YES, ∃ realization x ∈ R^{n×1}
  ◮ if some component x_i ∉ Q, translate x so that x_i ∈ Q
  ◮ consider some other x_j
  ◮ let ℓ = (length of shortest path p : i → j) = ∑_{{u,v}∈p} d_uv ∈ Q
  ◮ then x_j = x_i ± ℓ ⇒ x_j ∈ Q
  ◮ ⇒ verification of ∀{i,j} ∈ E |x_i − x_j| = d_ij in polytime
◮ DGP_K may not be in NP for K > 1
  don’t know how to verify ‖x_i − x_j‖₂ = d_ij for x ∉ Q^{nK}
63 / 160
Hardness
◮ Want to show DGP₁ is NP-hard by reduction from Partition
  Given a = (a_1, ..., a_n) ∈ N^n, ∃ I ⊆ {1, ..., n} s.t. ∑_{i∈I} a_i = ∑_{i∉I} a_i?
◮ a → cycle C with V(C) = {1, ..., n}, E(C) = {{1,2}, ..., {n,1}}
◮ For i < n let d_{i,i+1} = a_i, and d_{n,n+1} = d_{n1} = a_n
◮ E.g. for a = (1, 4, 1, 3, 3), get cycle graph:
[Saxe, 1979]
64 / 160
Partition is YES ⇒ DGP₁ is YES (1/2)
◮ Given: I ⊂ {1, ..., n} s.t. ∑_{i∈I} a_i = ∑_{i∉I} a_i
◮ Construct: realization x of C in R
  1. x_1 = 0   // start
  2. induction step: suppose x_i known
     if i ∈ I let x_{i+1} = x_i + d_{i,i+1}   // go right
     else let x_{i+1} = x_i − d_{i,i+1}   // go left
◮ Correctness proof: by the same induction,
  but careful when i = n: have to show x_{n+1} = x_1
65 / 160
Partition is YES ⇒ DGP₁ is YES (2/2)
(1) = ∑_{i∈I} (x_{i+1} − x_i) = ∑_{i∈I} d_{i,i+1} = ∑_{i∈I} a_i
    = ∑_{i∉I} a_i = ∑_{i∉I} d_{i,i+1} = ∑_{i∉I} (x_i − x_{i+1}) = (2)
(1) = (2) ⇒ ∑_{i∈I} (x_{i+1} − x_i) = ∑_{i∉I} (x_i − x_{i+1}) ⇒ ∑_{i≤n} (x_{i+1} − x_i) = 0
⇒ (x_{n+1} − x_n) + (x_n − x_{n−1}) + ··· + (x_3 − x_2) + (x_2 − x_1) = 0
⇒ x_{n+1} = x_1
66 / 160
Partition is NO ⇒ DGP₁ is NO
◮ By contradiction: suppose DGP₁ is YES, x a realization of C
◮ F = {{u,v} ∈ E(C) | x_u ≤ x_v},  E(C) ∖ F = {{u,v} ∈ E(C) | x_u > x_v}
◮ Trace x_1, ..., x_n: follow edges in F (→) and in E(C) ∖ F (←)
     ∑_{{u,v}∈F} (x_v − x_u) = ∑_{{u,v}∉F} (x_u − x_v)
     ∑_{{u,v}∈F} |x_u − x_v| = ∑_{{u,v}∉F} |x_u − x_v|
     ∑_{{u,v}∈F} d_uv = ∑_{{u,v}∉F} d_uv
◮ Let J = {i < n | {i, i+1} ∈ F} ∪ {n | {n,1} ∈ F}
     ⇒ ∑_{i∈J} a_i = ∑_{i∉J} a_i
◮ So J solves the Partition instance, contradiction
◮ ⇒ DGP is NP-hard; since ∈ NP, DGP₁ is also NP-complete
67 / 160
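A small numpy illustration (my own, assuming the slide's example instance) of the constructive direction of the reduction: realize the cycle on the real line by going right on edges indexed by I and left otherwise, and check that the cycle closes.

```python
import numpy as np

def realize_cycle(a, I):
    """Realize the cycle C (edge {i,i+1} has length a_i, edge {n,1} length a_n)
    on the real line: go right on edges indexed by I, left otherwise."""
    x = [0.0]                          # x_1 = 0
    for i, ai in enumerate(a):         # index i stands for vertex i+1
        step = ai if i in I else -ai
        x.append(x[-1] + step)
    return np.array(x)                 # x_1, ..., x_n, x_{n+1}

a = [1, 4, 1, 3, 3]                    # Partition instance from the slide
I = {0, 1, 2}                          # a_1 + a_2 + a_3 = 6 = a_4 + a_5

x = realize_cycle(a, I)
print(np.isclose(x[-1], x[0]))         # True: the cycle closes, DGP_1 is YES
# every edge length is respected by construction:
print(np.allclose(np.abs(np.diff(x)), a))
```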
Subsection 3 Number of solutions 68 / 160
Number of solutions
◮ (G, K): DGP instance
◮ X̃ ⊆ R^{Kn}: set of solutions
◮ Congruence: composition of translations, rotations, reflections
◮ C = set of congruences in R^K
◮ x ∼ y means ∃ ρ ∈ C (y = ρx): distances in x are preserved in y through ρ
◮ ⇒ if |X̃| > 0, then |X̃| = 2^{ℵ₀}
69 / 160
Number of solutions modulo congruences
◮ Congruence is an equivalence relation ∼ on X̃ (reflexive, symmetric, transitive)
◮ It partitions X̃ into equivalence classes
◮ X = X̃/∼: set of representatives of the equivalence classes
◮ Focus on |X| rather than |X̃|
70 / 160
Examples 71 / 160
Rigidity, flexibility and |X|
◮ infeasible ⇔ |X| = 0
◮ rigid graph ⇔ |X| < ℵ₀
◮ globally rigid graph ⇔ |X| = 1
◮ flexible graph ⇔ |X| = 2^{ℵ₀}
◮ DMDGP graphs ⇔ |X| a power of 2
◮ |X| = ℵ₀: impossible by Milnor’s theorem
[Milnor 1964, L. et al. 2013]
72 / 160
Milnor’s theorem implies |X| ≠ ℵ₀
◮ System S of polynomial equations of degree 2:
     ∀ i ≤ m   p_i(x_1, ..., x_{nK}) = 0
◮ Let X be the set of x ∈ R^{nK} satisfying S
◮ Number of connected components of X is O(3^{nK}) [Milnor 1964]
◮ If |X| is countably ∞ then G cannot be flexible
  ⇒ incongruent elements of X are separate connected components
  ⇒ by Milnor’s theorem, there are finitely many of them
73 / 160
Subsection 4 MP based solution methods 74 / 160
Direct methods 75 / 160
System of quadratic equations
     ∀{u,v} ∈ E   ‖x_u − x_v‖² = d²_uv   (4)
Computationally: useless (less than 10 vertices with K = 3 using Octave)
76 / 160
Unconstrained Global Optimization
     min_x ∑_{{u,v}∈E} (‖x_u − x_v‖² − d²_uv)²   (5)
Globally optimal obj. fun. value of (5) is 0 iff x solves (4)
Computational experiments in [L. et al., 2006]:
◮ GO solvers from 10 years ago
◮ randomly generated protein data: ≤ 50 atoms
◮ cubic crystallographic grids: ≤ 64 atoms
77 / 160
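A sketch of objective (5) minimized with a local NLP method (scipy's L-BFGS-B) from a random start; a local solver is not a GO solver, so this only illustrates the formulation, not the experiments in [L. et al., 2006]. The toy instance and names are my own.

```python
import numpy as np
from scipy.optimize import minimize

def dgp_penalty(xflat, edges, d2, n, K):
    """Objective (5): sum over edges of (||x_u - x_v||^2 - d_uv^2)^2."""
    x = xflat.reshape(n, K)
    return sum((np.sum((x[u] - x[v]) ** 2) - duv2) ** 2
               for (u, v), duv2 in zip(edges, d2))

# toy instance: a realizable 4-vertex graph in K = 2 (unit square + diagonal)
K, n = 2, 4
xtrue = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
d2 = [np.sum((xtrue[u] - xtrue[v]) ** 2) for u, v in edges]

rng = np.random.default_rng(0)
res = minimize(dgp_penalty, rng.normal(size=n * K),
               args=(edges, d2, n, K), method="L-BFGS-B")
print(res.fun)   # near 0 if the local solver found a realization
```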
Constrained global optimization
◮ min_x ∑_{{u,v}∈E} | ‖x_u − x_v‖² − d²_uv | exactly reformulates (4)
◮ Relax the objective f to its concave part, remove the constant term, rewrite min −f as max f
◮ Reformulate the convex part of the obj. fun. to convex constraints
◮ Exact reformulation:
     max_x ∑_{{u,v}∈E} ‖x_u − x_v‖²   (6)
     s.t. ‖x_u − x_v‖² ≤ d²_uv   ∀{u,v} ∈ E
Theorem (Activity)
At a glob. opt. x∗ of a YES instance, all constraints of (6) are active
[Mencarelli et al. 2017]
78 / 160
Semidefinite Programming 79 / 160
Linearization
     ‖x_i − x_j‖²₂ = d²_ij   ∀{i,j} ∈ E
⇒  ‖x_i‖²₂ + ‖x_j‖²₂ − 2 x_i · x_j = d²_ij   ∀{i,j} ∈ E
⇒  X_ii + X_jj − 2 X_ij = d²_ij   ∀{i,j} ∈ E   ∧   X = xx⊤
where X = xx⊤ ⇔ ∀ i, j  X_ij = x_i · x_j
80 / 160
Relaxation
     X = xx⊤  ⇒  X − xx⊤ = 0
(relax)  ⇒  X − xx⊤ ⪰ 0
(Schur)  ⇔  Schur(X, x) = ( I_K  x⊤ ; x  X ) ⪰ 0
If x does not appear elsewhere ⇒ get rid of it (e.g. choose x = 0):
replace Schur(X, x) ⪰ 0 by X ⪰ 0
Reason for this weird relaxation: there are efficient solvers for Semidefinite Programming (SDP)
81 / 160
SDP relaxation
     min F • X
     X_ii + X_jj − 2 X_ij = d²_ij   ∀{i,j} ∈ E
     X ⪰ 0
How do we choose F?
82 / 160
Some possible objective functions
◮ For protein conformation: “push-and-pull” the realization
     max ∑_{{i,j}∈E} (X_ii + X_jj − 2 X_ij)
  with = changed to ≤ in the constraints (or min and ≥) [Dias & L. 2016]
◮ [Ye, 2003], application to wireless sensors localization:
     min tr(X)
  improves covariance estimator accuracy
◮ How about “just random”? [Barvinok 1995]
83 / 160
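A cvxpy sketch of the SDP relaxation with the push-and-pull objective (max ∑(X_ii + X_jj − 2X_ij) with ≤ constraints), followed by PCA on X̄ to recover points. The slides use Mosek; here cvxpy with any installed SDP-capable solver (e.g. SCS) stands in, and all names are illustrative.

```python
import numpy as np
import cvxpy as cp

def sdp_push_and_pull(n, edges, d2):
    """SDP relaxation of the DGP with the push-and-pull objective."""
    X = cp.Variable((n, n), PSD=True)
    cons = [X[i, i] + X[j, j] - 2 * X[i, j] <= dij2
            for (i, j), dij2 in zip(edges, d2)]
    obj = cp.Maximize(sum(X[i, i] + X[j, j] - 2 * X[i, j] for (i, j) in edges))
    cp.Problem(obj, cons).solve()
    return X.value

def pca_from_gram(G, K):
    """Rank-K realization from the (relaxed) Gram matrix via PCA."""
    lam, P = np.linalg.eigh(G)
    idx = np.argsort(lam)[::-1][:K]
    return P[:, idx] * np.sqrt(np.clip(lam[idx], 0, None))

# toy instance: unit square with one diagonal, K = 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
d2 = [1, 1, 1, 1, 2]
Xbar = sdp_push_and_pull(4, edges, d2)
x = pca_from_gram(Xbar, K=2)
print(x)
```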
Computational evaluation
◮ Download protein files from the Protein Data Bank (PDB): they contain atom realizations
◮ Mimick a Nuclear Magnetic Resonance experiment: keep only pairwise distances < 5.5
◮ Try and reconstruct the protein shape from those weighted graphs
◮ Quality evaluation of results:
  ◮ LDE(x) = max_{{i,j}∈E} | ‖x_i − x_j‖ − d_ij |
  ◮ MDE(x) = (1/|E|) ∑_{{i,j}∈E} | ‖x_i − x_j‖ − d_ij |
84 / 160
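The two quality measures transcribe directly into numpy; the helper names below are mine.

```python
import numpy as np

def lde(x, edges, d):
    """Largest distance error: max over edges of | ||x_i - x_j|| - d_ij |."""
    return max(abs(np.linalg.norm(x[i] - x[j]) - dij)
               for (i, j), dij in zip(edges, d))

def mde(x, edges, d):
    """Mean distance error: average over edges of | ||x_i - x_j|| - d_ij |."""
    return sum(abs(np.linalg.norm(x[i] - x[j]) - dij)
               for (i, j), dij in zip(edges, d)) / len(edges)
```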
Objective function tests
SDP solved with Mosek

SDP + PCA
Instance                     LDE                  MDE                  CPU
Name           |V|   |E|   PP     Ye    Rnd     PP    Ye    Rnd     PP    Ye    Rnd
C0700odd.1      15    39   3.31  4.57  4.44    1.92  2.52  2.50    0.13  0.07  0.08
C0700odd.C      36   242  10.61  4.85  4.85    3.02  3.02  3.02    0.69  0.43  0.44
C0700.odd.G     36   308   4.57  4.77  4.77    2.41  2.84  2.84    0.86  0.54  0.54
C0150alter.1    37   335   4.66  4.88  4.86    2.52  3.00  3.00    0.97  0.59  0.58
C0080create.1   60   681   7.17  4.86  4.86    3.08  3.19  3.19    2.48  1.46  1.46
tiny            37   335   4.66  4.88  4.88    2.52  3.00  3.00    0.97  0.60  0.60
1guu-1         150   959  10.20  4.93  4.93    3.43  3.43  3.43    9.23  5.68  5.70

SDP + PCA + NLP
Instance                     LDE                  MDE                  CPU
Name           |V|   |E|   PP     Ye    Rnd     PP    Ye    Rnd     PP    Ye    Rnd
1b03            89   456   0.00  0.00  0.00    0.00  0.00  0.00    8.69  6.28  9.91
1crn           138   846   0.81  0.81  0.81    0.07  0.07  0.07   33.33 31.32 44.48
1guu-1         150   959   0.97  4.93  0.92    0.10  3.43  0.08   56.45  7.89 65.33

[Dias & L., 2016]
85 / 160
Empirical considerations
◮ Ye: very fast but often imprecise
◮ Random: good but nondeterministic
◮ Push-and-Pull: relaxes X_ii + X_jj − 2 X_ij = d²_ij to X_ii + X_jj − 2 X_ij ≥ d²_ij, easier to satisfy feasibility
  ...will be useful in DDP later on
Focus on the Push-and-Pull objective
86 / 160
Diagonally Dominant Programming 87 / 160
When SDP solvers hit their size limit
◮ SDP solver: technological bottleneck
◮ How can we best use an LP solver?
◮ Diagonally Dominant (DD) matrices are PSD
◮ Not vice versa: inner approximation of the PSD cone {Y ⪰ 0}
◮ Idea by A.A. Ahmadi and co-authors
[Ahmadi & Majumdar 2014, Ahmadi & Hall 2015]
88 / 160
Diagonally dominant matrices
An n × n matrix X is DD if
     ∀ i ≤ n   X_ii ≥ ∑_{j≠i} |X_ij|
E.g.
      1      0.1   −0.2    0      0.04   0
      0.1    1     −0.05   0.1    0      0
     −0.2   −0.05   1      0.1    0.01   0
      0      0.1    0.1    1      0.2    0.3
      0.04   0      0.01   0.2    1     −0.3
      0      0      0      0.3   −0.3    1
89 / 160
DD Linearization
     ∀ i ≤ n   X_ii ≥ ∑_{j≠i} |X_ij|   (∗)
◮ introduce a “sandwiching” variable T
◮ write |X| as T
◮ add constraints −T ≤ X ≤ T
◮ by the ≥ constraint sense, write (∗) as
     X_ii ≥ ∑_{j≠i} T_ij
90 / 160
DD Programming (DDP)
     X_ii + X_jj − 2 X_ij = d²_ij   ∀{i,j} ∈ E
     X is DD
⇒
     X_ii + X_jj − 2 X_ij = d²_ij   ∀{i,j} ∈ E
     ∑_{j≤n, j≠i} T_ij ≤ X_ii       ∀ i ≤ n
     −T ≤ X ≤ T
91 / 160
DDP formulation for the DGP
     min ∑_{{i,j}∈E} (X_ii + X_jj − 2 X_ij)
     X_ii + X_jj − 2 X_ij ≥ d²_ij   ∀{i,j} ∈ E
     ∑_{j≤n, j≠i} T_ij ≤ X_ii       ∀ i ≤ n
     −T ≤ X ≤ T
     T ≥ 0
[Dias & L., 2016]
92 / 160
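Since every constraint above is linear, the DDP is an LP. A cvxpy sketch of the formulation (illustration only; the slides solve it with CPLEX, here cvxpy's default LP-capable solver stands in):

```python
import numpy as np
import cvxpy as cp

def ddp_dgp(n, edges, d2):
    """Diagonally-dominant (LP) inner approximation of the SDP relaxation."""
    X = cp.Variable((n, n), symmetric=True)
    T = cp.Variable((n, n))
    cons = [X[i, i] + X[j, j] - 2 * X[i, j] >= dij2
            for (i, j), dij2 in zip(edges, d2)]
    cons += [cp.sum(T[i, :]) - T[i, i] <= X[i, i] for i in range(n)]  # DD rows
    cons += [-T <= X, X <= T, T >= 0]
    obj = cp.Minimize(sum(X[i, i] + X[j, j] - 2 * X[i, j] for (i, j) in edges))
    cp.Problem(obj, cons).solve()
    return X.value

# toy instance: unit square with one diagonal
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
d2 = [1, 1, 1, 1, 2]
Xbar = ddp_dgp(4, edges, d2)
print(np.round(Xbar, 3))
```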
SDP vs. DDP: tests
Using the “push-and-pull” objective in the SDP
SDP solved with Mosek, DDP with CPLEX

SDP/DDP + PCA
                          SDP                            DDP
Instance        LDE   MDE   CPU modl/soln      LDE   MDE   CPU modl/soln
C0700odd.1      0.79  0.34  0.06/0.12          0.38  0.30  0.15/0.15
C0700.odd.G     2.38  0.89  0.57/1.16          1.86  0.58  1.11/0.95
C0150alter.1    1.48  0.45  0.73/1.33          1.54  0.55  1.23/1.04
C0080create.1   2.49  0.82  1.63/7.86          0.98  0.67  3.39/4.07
1guu-1          0.50  0.15  6.67/684.89        1.00  0.85  37.74/153.17
93 / 160
Subsection 5 Barvinok’s naive algorithm 94 / 160
Concentration of measure From [Barvinok, 1997] The value of a “well behaved” function at a random point of a “big” probability space X is “very close” to the mean value of the function. and In a sense, measure concentration can be considered as an extension of the law of large numbers. 95 / 160
Concentration of measure
Given a Lipschitz function f : X → R s.t.
     ∀ x, y ∈ X   |f(x) − f(y)| ≤ L ‖x − y‖₂
for some L ≥ 0, there is concentration of measure if ∃ constants c, C s.t.
     ∀ ε > 0   P_x(|f(x) − E(f)| > ε) ≤ c e^{−Cε²/L²}
≡ “discrepancy from mean is unlikely”
96 / 160
Barvinok’s theorem
Consider:
◮ for each k ≤ m, manifolds X_k = {x ∈ R^n | x⊤ Q_k x = a_k}
◮ a feasibility problem x ∈ ∩_{k≤m} X_k
◮ its SDP relaxation ∀ k ≤ m (Q_k • X = a_k) with soln. X̄
Let T = factor(X̄), y ∼ N^n(0, 1) and x′ = Ty
Then ∃ c and n₀ ∈ N s.t. if n ≥ n₀,
     Prob( ∀ k ≤ m   dist(x′, X_k) ≤ c √(‖X̄‖₂ ln n) ) ≥ 0.9
IDEA: since x′ is “close” to each X_k, try local Nonlinear Programming (NLP)
97 / 160
Application to the DGP
◮ ∀{i,j} ∈ E   X_ij = {x ∈ R^{nK} | ‖x_i − x_j‖²₂ = d²_ij}
◮ DGP can be written as x ∈ ∩_{{i,j}∈E} X_ij
◮ SDP relaxation: X_ii + X_jj − 2 X_ij = d²_ij ∧ X ⪰ 0, with soln. X̄
◮ Difference with Barvinok: x ∈ R^{Kn}, rk(X̄) ≤ K
◮ IDEA: sample y ∼ N^{nK}(0, 1/√K)
◮ Thm. Barvinok’s theorem works in rank K [L. & Vu, unpublished]
98 / 160
The heuristic
1. Solve the SDP relaxation of the DGP, get soln. X̄
   (use DDP+LP if SDP+IPM too slow)
2. a. T = factor(X̄)
   b. y ∼ N^{nK}(0, 1/√K)
   c. x′ = Ty
3. Use x′ as a starting point for a local NLP solver on the formulation
     min_x ∑_{{i,j}∈E} (‖x_i − x_j‖² − d²_ij)²
   and return the improved solution x
[Dias & L., 2016]
99 / 160
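A numpy/scipy sketch of steps 2–3: T is obtained here from the eigendecomposition of X̄, the Gaussian sample follows the rank-K variant shown above (entries N(0, 1/√K) is my reading of the slide), and the local NLP step minimizes the squared-violation objective. All names and choices are illustrative, not the implementation of [Dias & L., 2016].

```python
import numpy as np
from scipy.optimize import minimize

def barvinok_start(Xbar, K, seed=0):
    """Step 2: T = factor(Xbar), y Gaussian, x' = T y (an n x K matrix)."""
    lam, P = np.linalg.eigh(Xbar)
    T = P * np.sqrt(np.clip(lam, 0, None))   # T T^t = Xbar (up to PSD clipping)
    y = np.random.default_rng(seed).normal(0.0, 1.0 / np.sqrt(K),
                                           size=(Xbar.shape[0], K))
    return T @ y

def refine(x0, edges, d2):
    """Step 3: local NLP descent on the sum of squared distance violations."""
    n, K = x0.shape
    def obj(xflat):
        x = xflat.reshape(n, K)
        return sum((np.sum((x[u] - x[v]) ** 2) - duv2) ** 2
                   for (u, v), duv2 in zip(edges, d2))
    res = minimize(obj, x0.ravel(), method="L-BFGS-B")
    return res.x.reshape(n, K)

# usage, assuming Xbar, edges, d2 come from the SDP (or DDP) sketches above:
# x0 = barvinok_start(Xbar, K=2)
# x  = refine(x0, edges, d2)
```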
SDP+Barvinok vs. DDP+Barvinok
                         SDP                      DDP
Instance        LDE   MDE    CPU         LDE   MDE    CPU
C0700odd.1      0.00  0.00     0.63      0.00  0.00     1.49
C0700.odd.G     0.00  0.00    21.67      0.42  0.01    30.51
C0150alter.1    0.00  0.00    29.30      0.00  0.00    34.13
C0080create.1   0.00  0.00   139.52      0.00  0.00   141.49
1b03            0.18  0.01   132.16      0.38  0.05   101.04
1crn            0.78  0.02   800.67      0.76  0.04   522.60
1guu-1          0.79  0.01  1900.48      0.90  0.04   667.03
Most of the CPU time is taken by the local NLP solver
100 / 160