distance geometry in data science


Distance Geometry in Data Science
Leo Liberti, CNRS LIX, École Polytechnique, liberti@lix.polytechnique.fr
CNMAC 2017
1 / 160

Line of reasoning for this talk: 1. Graphs and weighted graphs necessary to model data; 2. Computers can reason by


1. Reformulations
The (MSSC) formulation has the same optima as:

    min_{x,y,P}  ∑_{i≤n} ∑_{j≤d} P_ij x_ij
    s.t.  ∀ i≤n, j≤d    ‖p_i − y_j‖²₂ ≤ P_ij
          ∀ j≤d         ∑_{i≤n} p_i x_ij = y_j ∑_{i≤n} x_ij
          ∀ i≤n         ∑_{j≤d} x_ij = 1
          ∀ j≤d         y_j ∈ ( [min_{i≤n} p_ia , max_{i≤n} p_ia] | a ≤ d )
          x ∈ {0,1}^{nd}
          P ∈ [0, P^U]^{nd}

◮ Only nonconvexities: products of bounded continuous variables by binary variables
◮ Caveat: cannot have empty clusters
26 / 160

2. Products of binary and continuous vars.
◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0,1} and y ∈ [0,1] is bounded
◮ Replace xy by a new variable z; means "either z = 0 or z = y"
◮ Adjoin the following constraints:
    z ∈ [0,1]
    y − (1 − x) ≤ z ≤ y + (1 − x)
    −x ≤ z ≤ x
◮ ⇒ Everything's linear now!  [Fortet 1959]
27 / 160

3. Products of binary and continuous vars.
◮ Suppose term xy appears in a formulation
◮ Assume x ∈ {0,1} and y ∈ [y^L, y^U] is bounded
◮ Replace xy by a new variable z; means "either z = 0 or z = y"
◮ Adjoin the following constraints:
    z ∈ [min(y^L, 0), max(y^U, 0)]
    y − (1 − x) max(|y^L|, |y^U|) ≤ z ≤ y + (1 − x) max(|y^L|, |y^U|)
    −x max(|y^L|, |y^U|) ≤ z ≤ x max(|y^L|, |y^U|)
◮ ⇒ Everything's linear now!  [L. et al. 2009]
28 / 160
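A minimal numerical sketch of this linearization (plain Python/NumPy; the helper name fortet_bounds is only illustrative): for x ∈ {0,1} and y ∈ [yL, yU], the adjoined constraints collapse the feasible interval for z to exactly the product xy.

    import numpy as np

    def fortet_bounds(x, y, yL, yU):
        # feasible interval for z replacing x*y (x binary, y in [yL, yU]),
        # using the constraints above with M = max(|yL|, |yU|)
        M = max(abs(yL), abs(yU))
        lo = max(min(yL, 0.0), y - (1 - x) * M, -x * M)
        hi = min(max(yU, 0.0), y + (1 - x) * M,  x * M)
        return lo, hi

    yL, yU = -2.0, 3.0
    for x in (0, 1):
        for y in np.linspace(yL, yU, 7):
            lo, hi = fortet_bounds(x, y, yL, yU)
            # the interval is exactly {x*y}, so the reformulation is exact
            assert abs(lo - x * y) < 1e-12 and abs(hi - x * y) < 1e-12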

4. MSSC is a convex MINLP

    min_{x,y,P,χ,ξ}  ∑_{i≤n} ∑_{j≤d} χ_ij
    s.t.  ∀ i≤n, j≤d   0 ≤ χ_ij ≤ P_ij
          ∀ i≤n, j≤d   P_ij − (1 − x_ij) P^U ≤ χ_ij ≤ x_ij P^U
          ∀ i≤n, j≤d   ‖p_i − y_j‖²₂ ≤ P_ij        ⇐ convex
          ∀ j≤d        ∑_{i≤n} p_i x_ij = ∑_{i≤n} ξ_ij
          ∀ i≤n, j≤d   y_j − (1 − x_ij) max(|y^L|, |y^U|) ≤ ξ_ij ≤ y_j + (1 − x_ij) max(|y^L|, |y^U|)
          ∀ i≤n, j≤d   −x_ij max(|y^L|, |y^U|) ≤ ξ_ij ≤ x_ij max(|y^L|, |y^U|)
          ∀ i≤n        ∑_{j≤d} x_ij = 1
          ∀ j≤d        y_j ∈ [y^L, y^U]
          x ∈ {0,1}^{nd},  P ∈ [0, P^U]^{nd},  χ ∈ [0, P^U]^{nd}
          ∀ i≤n, j≤d   ξ_ij ∈ [min(y^L, 0), max(y^U, 0)]

    (y_j, ξ_ij, y^L, y^U are vectors in R^m)
29 / 160

5. Solving the MSSC
◮ k-means
  ◮ heuristic (optimum not guaranteed)
  ◮ fast, well-known, lots of analyses
  ◮ scales reasonably well
  ◮ implemented in practically all languages
◮ convex MINLP
  ◮ exact (guaranteed global optima)
  ◮ reasonably fast only for small sizes
  ◮ scales exponentially
  ◮ Solvers: KNITRO (commercial), Bonmin (free); need an MP language interpreter (AMPL)
30 / 160
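For comparison with the exact MINLP route, a bare-bones Lloyd-style k-means sketch in NumPy (an illustrative implementation, not the one referred to in the talk):

    import numpy as np

    def kmeans(points, k, iters=100, seed=0):
        # plain Lloyd iteration for MSSC: alternate assignment and centroid steps;
        # heuristic, so it returns a local optimum, not a guaranteed global one
        rng = np.random.default_rng(seed)
        centroids = points[rng.choice(len(points), k, replace=False)]
        for _ in range(iters):
            dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        return labels, centroids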

6. Outline
◮ Reasoning: Relations, graphs, distances
◮ Clustering: Clustering in graphs; Clustering in Euclidean spaces
◮ Metric embeddings: Fréchet embeddings in ℓ∞; Embeddings in ℓ2 (Classic MDS, PCA)
◮ Distance Geometry: DGP applications; Complexity of the DGP; Number of solutions; Solution methods (Direct methods, Semidefinite Programming, Diagonal Dominance); Barvinok's naive algorithm; Isomap for the DGP
◮ Distance resolution limit: When to start worrying
◮ Random projections: More efficient clustering; Random projections in LP (Projecting feasibility, Projecting optimality, Solution retrieval); Quantile regression
◮ The end
31 / 160

  7. Metric embeddings Mapping metric spaces into each other 32 / 160

  8. Graphs with shortest path metric E.g. mathematical genealogy skeleton 33 / 160

9. The distance matrix (only for a subgraph)

                 Eul  Thi  Pfa  Lag  Lap  Möb  Gud  Dir  Gau  Käs
    Euler          0   10    1    1    9    8    2    2    2    2
    Thibaut       10    0   11    9    1    3   10   12   12    8
    Pfaff          1   11    0    2   10   10    3    1    1    3
    Lagrange       1    9    2    0    8    8    1    3    3    1
    Laplace        9    1   10    8    0    2    9   11   11    7
    Möbius         8    3   10    8    2    0    9   11   11    7
    Gudermann      2   10    3    1    9    9    0    4    4    2
    Dirksen        2   12    1    3   11   11    4    0    2    4
    Gauss          2   12    1    3   11   11    4    2    0    4
    Kästner        2    8    3    1    7    7    2    4    4    0

D is this symmetric matrix (rows/columns: Euler, Thibaut, Pfaff, Lagrange, Laplace, Möbius, Gudermann, Dirksen, Gauss, Kästner)
34 / 160

  10. Subsection 1 Fréchet embeddings in ℓ ∞ 35 / 160

11. ℓ∞ embeddings
◮ Given a metric space (X, d) with distance matrix D = (d_ij)
◮ Consider the i-th row δ_i = (d_i1, ..., d_in) of D
◮ Embed i ∈ X by the vector δ_i ∈ R^n
◮ Define f(X) = {δ_1, ..., δ_n}, f(d(i,j)) = ‖f(i) − f(j)‖∞
◮ Thm.: (f(X), ℓ∞) is a metric space with distance matrix D
◮ If (X, d) is not a metric space, ∃ i, j ∈ X  (d(i,j) ≠ ‖δ_i − δ_j‖∞)
◮ Issue: embedding is high-dimensional (R^n)
[Kuratowski 1935]
36 / 160
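A short sketch of this embedding (NumPy; assumes the distance matrix D is given as a square array): point i is embedded as the i-th row of D, and for a metric D the ℓ∞ distances between rows reproduce D exactly.

    import numpy as np

    def frechet_embedding(D):
        # embed point i as delta_i = the i-th row of D (Kuratowski/Frechet embedding)
        return D.copy()

    def linf_distances(F):
        # pairwise l_infinity distances between the rows of F
        return np.abs(F[:, None, :] - F[None, :, :]).max(axis=2)

    # toy metric: shortest-path distances on the path graph 1-2-3-4
    D = np.array([[0., 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]])
    assert np.allclose(linf_distances(frechet_embedding(D)), D)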

12. Proof
◮ Consider i, j ∈ X with distance d(i,j) = d_ij
◮ Then f(d(i,j)) = ‖δ_i − δ_j‖∞ = max_{k≤n} |d_ik − d_jk| ≤ max_{k≤n} |d_ij| = d_ij
  (the inequality |d_ik − d_jk| ≤ d_ij is the triangle inequality)
◮ max_{k≤n} |d_ik − d_jk| is achieved when k ∈ {i, j}  ⇒  f(d(i,j)) = d_ij
37 / 160

  13. Genealogical similarity example Fréchet embedding in the 3 most significant (rotated) dimensions 38 / 160

  14. Subsection 2 Approximate embeddings in ℓ 2 39 / 160

15. Lower dimensional (approximate) embeddings
◮ Given distance matrix, find approximate Euclidean embedding
◮ Application: visualize a metric space, e.g. embed genealogy tree in R³ (some errors allowed)
◮ For visualization purposes, K ∈ {1, 2, 3}; for other purposes, K < n
Classical methods:
◮ Multi-Dimensional Scaling (MDS)
◮ Principal Component Analysis (PCA)
40 / 160

  16. Classic Multidimensional Scaling 41 / 160

17. Definition and history
◮ [I. Schoenberg, Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicable sur l'espace de Hilbert", Ann. Math., 1935]
◮ Question: Given an n × n symmetric matrix D, what are necessary and sufficient conditions s.t. D is an EDM (Euclidean Distance Matrix) corresponding to n points x_1, ..., x_n ∈ R^K with minimum K?
◮ Main theorem:
  Thm. D = (d_ij) is an EDM iff (1/2)(d²_{1i} + d²_{1j} − d²_{ij} | 2 ≤ i, j ≤ n) is PSD of rank K
◮ PSD: positive semidefinite, all eigenvalues ≥ 0
42 / 160

18. Gram in function of EDM
◮ x = (x_1, ..., x_n) ⊆ R^K, written as an n × K matrix
◮ the matrix G = x x⊤ = (x_i · x_j) is the Gram matrix of x
◮ Schoenberg's theorem: relation between EDMs and Gram matrices
    G = −(1/2) J D² J    (§)
◮ where D² = (d²_ij), J = I_n − (1/n) 1 1⊤
43 / 160

19. Multidimensional scaling (MDS)
◮ Often get approximate EDMs D̃ from raw data (dissimilarities, discrepancies, differences)
◮ G̃ = −(1/2) J D̃² J is an approximate Gram matrix
◮ Approximate Gram ⇒ spectral decomposition P Λ̃ P⊤ has Λ̃ ≱ 0
◮ Let Λ be the closest PSD diagonal matrix to Λ̃: zero the negative components of Λ̃
◮ x = P √Λ is an "approximate realization" of D̃
44 / 160
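The whole pipeline on this slide fits in a few lines of NumPy; a sketch assuming D̃ is a symmetric array with zero diagonal (not the talk's own code):

    import numpy as np

    def classic_mds(D, K=3):
        # Gram matrix via G = -1/2 J D^2 J, spectral decomposition,
        # negative eigenvalues zeroed, realization x = P sqrt(Lambda), top K columns
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ (D ** 2) @ J
        lam, P = np.linalg.eigh(G)
        lam = np.clip(lam, 0.0, None)            # closest PSD diagonal matrix
        order = np.argsort(lam)[::-1][:K]        # K largest eigenvalues
        return P[:, order] * np.sqrt(lam[order])

With K fixed to 2 or 3, the truncation in the last two lines is exactly the PCA step described a few slides below.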

20. Classic MDS: Main result
1. Prove Schoenberg's theorem: G = −(1/2) J D² J
2. Prove a matrix is Gram iff it is PSD
45 / 160

21. Classic MDS: Proof 1/3
◮ Assume zero centroid WLOG (can translate x as needed)
◮ Expand: d²_ij = ‖x_i − x_j‖²₂ = (x_i − x_j)(x_i − x_j) = x_i x_i + x_j x_j − 2 x_i x_j   (∗)
◮ Aim at "inverting" (∗) to express x_i x_j in function of d²_ij
◮ Sum (∗) over i:  ∑_i d²_ij = ∑_i x_i x_i + n x_j x_j − 2 x_j ∑_i x_i, where the last term vanishes by the zero centroid
◮ Similarly for j, and divide by n, to get:
    (1/n) ∑_{i≤n} d²_ij = (1/n) ∑_{i≤n} x_i x_i + x_j x_j    (†)
    (1/n) ∑_{j≤n} d²_ij = x_i x_i + (1/n) ∑_{j≤n} x_j x_j    (‡)
◮ Sum (†) over j, get:  (1/n) ∑_{i,j} d²_ij = ∑_i x_i x_i + ∑_j x_j x_j = 2 ∑_i x_i x_i
◮ Divide by n, get:  (1/n²) ∑_{i,j} d²_ij = (2/n) ∑_i x_i x_i    (∗∗)
[Borg 2010]
46 / 160

22. Classic MDS: Proof 2/3
◮ Rearrange (∗), (†), (‡) as follows:
    2 x_i x_j = x_i x_i + x_j x_j − d²_ij                     (1)
    x_i x_i = (1/n) ∑_j d²_ij − (1/n) ∑_j x_j x_j             (2)
    x_j x_j = (1/n) ∑_i d²_ij − (1/n) ∑_i x_i x_i             (3)
◮ Replace the LHS of Eq. (2)-(3) in Eq. (1), get
    2 x_i x_j = (1/n) ∑_k d²_ik + (1/n) ∑_k d²_kj − d²_ij − (2/n) ∑_k x_k x_k
◮ By (∗∗) replace (2/n) ∑_k x_k x_k with (1/n²) ∑_{h,k} d²_hk, get
    2 x_i x_j = (1/n) ∑_k (d²_ik + d²_kj) − d²_ij − (1/n²) ∑_{h,k} d²_hk    (§)
  which expresses x_i x_j in function of D
47 / 160

23. Classic MDS: Proof 3/3
◮ Gram ⊆ PSD
  ◮ x is an n × K real matrix, G = x x⊤ its Gram matrix
  ◮ For each y ∈ R^n we have y G y⊤ = y (x x⊤) y⊤ = (y x)(x⊤ y⊤) = (y x)(y x)⊤ = ‖y x‖²₂ ≥ 0
  ◮ ⇒ G ⪰ 0
◮ PSD ⊆ Gram
  ◮ Let G ⪰ 0 be n × n
  ◮ Spectral decomposition: G = P Λ P⊤ (P orthogonal, Λ ≥ 0 diagonal)
  ◮ Λ ≥ 0 ⇒ √Λ exists
  ◮ G = P Λ P⊤ = (P √Λ)(√Λ P⊤) = (P √Λ)(P √Λ)⊤
  ◮ Let x = P √Λ; then G is the Gram matrix of x
48 / 160

  24. Principal Component Analysis 49 / 160

25. Principal Component Analysis (PCA)
◮ MDS with fixed K
◮ Motivation: "draw" x = P √Λ in 2D or 3D, but rank(Λ) = K > 3
◮ Only keep the 2 or 3 largest components of Λ, zero the rest
◮ Get realization in desired space
[Pearson 1901]
50 / 160

  26. Mathematicians genealogy in 2D/3D In 2D In 3D 51 / 160

27. Outline
◮ Reasoning: Relations, graphs, distances
◮ Clustering: Clustering in graphs; Clustering in Euclidean spaces
◮ Metric embeddings: Fréchet embeddings in ℓ∞; Embeddings in ℓ2 (Classic MDS, PCA)
◮ Distance Geometry: DGP applications; Complexity of the DGP; Number of solutions; Solution methods (Direct methods, Semidefinite Programming, Diagonal Dominance); Barvinok's naive algorithm; Isomap for the DGP
◮ Distance resolution limit: When to start worrying
◮ Random projections: More efficient clustering; Random projections in LP (Projecting feasibility, Projecting optimality, Solution retrieval); Quantile regression
◮ The end
52 / 160

  28. Distance Geometry Embedding weighted graphs in Euclidean spaces 53 / 160

29. Distance Geometry Problem (DGP)
Given K ∈ N and G = (V, E, d) with d: E → R₊, find x: V → R^K s.t.
    ∀ {i, j} ∈ E    ‖x_i − x_j‖²₂ = d²_ij
Given a weighted graph, draw it so edges are drawn as segments with lengths = weights
[Cayley 1841, Menger 1928, Schoenberg 1935, Yemini 1978]
54 / 160

  30. Subsection 1 DGP applications 55 / 160

31. Some applications
◮ clock synchronization (K = 1)
◮ sensor network localization (K = 2)
◮ molecular structure from distance data (K = 3)
◮ autonomous underwater vehicles (K = 3)
◮ distance matrix completion (whatever K)
56 / 160

32. Clock synchronization
From [Singer, Appl. Comput. Harmon. Anal. 2011]:
  Determine a set of unknown timestamps from partial measurements of their time differences
◮ K = 1
◮ V: timestamps
◮ {u, v} ∈ E if the time difference between u, v is known
◮ d: values of the time differences
Used in time synchronization of distributed networks
57 / 160

  33. Clock synchronization 58 / 160

34. Sensor network localization
From [Yemini, Proc. CDSN, 1978]:
  The positioning problem arises when it is necessary to locate a set of geographically distributed objects using measurements of the distances between some object pairs
◮ K = 2
◮ V: (mobile) sensors
◮ {u, v} ∈ E iff the distance between u, v is measured
◮ d: distance values
Used whenever GPS not viable (e.g. underwater)
d_uv approximately proportional to battery consumption in P2P communication between u, v
59 / 160

  35. Sensor network localization 60 / 160

36. Molecular structure from distance data
From [L. et al., SIAM Rev., 2014]
◮ K = 3
◮ V: atoms
◮ {u, v} ∈ E iff the distance between u, v is known
◮ d: distance values
Used whenever X-ray crystallography does not apply (e.g. liquid)
Covalent bond lengths and angles known precisely
Distances ≲ 5.5 measured approximately by NMR
61 / 160

  37. Subsection 2 Complexity of the DGP 62 / 160

38. Complexity class
◮ DGP₁ with d: E → Q₊ is in NP
  ◮ if the instance is YES, ∃ realization x ∈ R^{n×1}
  ◮ if some component x_i ∉ Q, translate x so that x_i ∈ Q
  ◮ consider some other x_j
  ◮ let ℓ = (length of a shortest path p: i → j) = ∑_{{u,v}∈p} d_uv ∈ Q
  ◮ then x_j = x_i ± ℓ ⇒ x_j ∈ Q
  ◮ ⇒ verification of ∀ {i, j} ∈ E  |x_i − x_j| = d_ij in polytime
◮ DGP_K may not be in NP for K > 1:
  don't know how to verify ‖x_i − x_j‖₂ = d_ij for x ∉ Q^{nK}
63 / 160

39. Hardness
◮ Want to show DGP₁ is NP-hard by reduction from Partition:
  Given a = (a_1, ..., a_n) ∈ N^n, is there I ⊆ {1, ..., n} s.t. ∑_{i∈I} a_i = ∑_{i∉I} a_i ?
◮ a → cycle C with V(C) = {1, ..., n}, E(C) = {{1,2}, ..., {n,1}}
◮ For i < n let d_{i,i+1} = a_i; and d_{n,n+1} = d_{n,1} = a_n
◮ E.g. for a = (1, 4, 1, 3, 3), get a cycle graph on 5 vertices with these edge weights
[Saxe, 1979]
64 / 160

40. Partition is YES ⇒ DGP₁ is YES (1/2)
◮ Given: I ⊂ {1, ..., n} s.t. ∑_{i∈I} a_i = ∑_{i∉I} a_i
◮ Construct: realization x of C in R
  1. x_1 = 0                                       // start
  2. induction step: suppose x_i known;
     if i ∈ I let x_{i+1} = x_i + d_{i,i+1}        // go right
     else let x_{i+1} = x_i − d_{i,i+1}            // go left
◮ Correctness proof: by the same induction, but careful when i = n: have to show x_{n+1} = x_1
65 / 160
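The construction in steps 1-2 is easy to script; a sketch (0-based indices, the helper name realize_cycle is only illustrative), using the example a = (1, 4, 1, 3, 3) from the previous slide:

    def realize_cycle(a, I):
        # walk the cycle: step right by a_i if i is in the certificate I, else left
        x = [0.0]
        for i, ai in enumerate(a):
            x.append(x[-1] + ai if i in I else x[-1] - ai)
        return x          # x[n] must equal x[0] = 0 when I solves Partition

    a = [1, 4, 1, 3, 3]
    I = {0, 1, 2}         # 1 + 4 + 1 = 3 + 3
    x = realize_cycle(a, I)
    assert x[-1] == x[0]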

41. Partition is YES ⇒ DGP₁ is YES (2/2)
    ∑_{i∈I} (x_{i+1} − x_i) = ∑_{i∈I} d_{i,i+1} = ∑_{i∈I} a_i          (1)
    ∑_{i∉I} a_i = ∑_{i∉I} d_{i,i+1} = ∑_{i∉I} (x_i − x_{i+1})          (2)
    (1) = (2)  ⇒  ∑_{i∈I} (x_{i+1} − x_i) = ∑_{i∉I} (x_i − x_{i+1})  ⇒  ∑_{i≤n} (x_{i+1} − x_i) = 0
    ⇒ (x_{n+1} − x_n) + (x_n − x_{n−1}) + ··· + (x_3 − x_2) + (x_2 − x_1) = 0
    ⇒ x_{n+1} = x_1
66 / 160

42. Partition is NO ⇒ DGP₁ is NO
◮ By contradiction: suppose DGP₁ is YES, x a realization of C
◮ F = {{u,v} ∈ E(C) | x_u ≤ x_v},  E(C) ∖ F = {{u,v} ∈ E(C) | x_u > x_v}
◮ Trace x_1, ..., x_n: follow edges in F (→) and in E(C) ∖ F (←)
    ∑_{{u,v}∈F} (x_v − x_u) = ∑_{{u,v}∉F} (x_u − x_v)
    ∑_{{u,v}∈F} |x_u − x_v| = ∑_{{u,v}∉F} |x_u − x_v|
    ∑_{{u,v}∈F} d_uv = ∑_{{u,v}∉F} d_uv
◮ Let J = {i < n | {i, i+1} ∈ F} ∪ {n | {n, 1} ∈ F}
    ⇒ ∑_{i∈J} a_i = ∑_{i∉J} a_i
◮ So J solves the Partition instance, contradiction
◮ ⇒ DGP is NP-hard; since DGP₁ ∈ NP, DGP₁ is also NP-complete
67 / 160

  43. Subsection 3 Number of solutions 68 / 160

44. Number of solutions
◮ (G, K): DGP instance
◮ X̃ ⊆ R^{Kn}: set of solutions
◮ Congruence: composition of translations, rotations, reflections
◮ C = set of congruences in R^K
◮ x ∼ y means ∃ ρ ∈ C (y = ρx): distances in x are preserved in y through ρ
◮ ⇒ if |X̃| > 0, |X̃| = 2^ℵ₀
69 / 160

45. Number of solutions modulo congruences
◮ Congruence is an equivalence relation ∼ on X̃ (reflexive, symmetric, transitive)
◮ Partitions X̃ into equivalence classes
◮ X = X̃ / ∼: set of representatives of the equivalence classes
◮ Focus on |X| rather than |X̃|
70 / 160

  46. Examples 71 / 160

47. Rigidity, flexibility and |X|
◮ infeasible ⇔ |X| = 0
◮ rigid graph ⇔ |X| < ℵ₀
◮ globally rigid graph ⇔ |X| = 1
◮ flexible graph ⇔ |X| = 2^ℵ₀
◮ DMDGP graphs ⇔ |X| a power of 2
◮ |X| = ℵ₀: impossible by Milnor's theorem
[Milnor 1964, L. et al. 2013]
72 / 160

48. Milnor's theorem implies |X| ≠ ℵ₀
◮ System S of polynomial equations of degree 2:
    ∀ i ≤ m    p_i(x_1, ..., x_{nK}) = 0
◮ Let X be the set of x ∈ R^{nK} satisfying S
◮ The number of connected components of X is O(3^{nK}) [Milnor 1964]
◮ If |X| is countably ∞ then G cannot be flexible
  ⇒ incongruent elements of X are separate connected components
  ⇒ by Milnor's theorem, there are finitely many of them
73 / 160

  49. Subsection 4 MP based solution methods 74 / 160

  50. Direct methods 75 / 160

51. System of quadratic equations
    ∀ {u, v} ∈ E    ‖x_u − x_v‖² = d²_uv    (4)
Computationally: useless (less than 10 vertices with K = 3 using Octave)
76 / 160

52. Unconstrained Global Optimization
    min_x  ∑_{{u,v}∈E} (‖x_u − x_v‖² − d²_uv)²    (5)
Globally optimal obj. fun. value of (5) is 0 iff x solves (4)
Computational experiments in [L. et al., 2006]:
◮ GO solvers from 10 years ago
◮ randomly generated protein data: ≤ 50 atoms
◮ cubic crystallographic grids: ≤ 64 atoms
77 / 160
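Formulation (5) can be attacked directly with a local NLP solver; a sketch with SciPy (illustrative only: the multistart and global-optimization machinery used in the cited experiments is left out; edges is assumed to be a dict {(u, v): d_uv}):

    import numpy as np
    from scipy.optimize import minimize

    def dgp_penalty(n, edges, K=3, seed=0):
        # minimize sum_{uv in E} (||x_u - x_v||^2 - d_uv^2)^2 from a random start;
        # only a local optimum in general
        rng = np.random.default_rng(seed)
        def obj(flat):
            x = flat.reshape(n, K)
            return sum((np.sum((x[u] - x[v]) ** 2) - d ** 2) ** 2
                       for (u, v), d in edges.items())
        res = minimize(obj, rng.standard_normal(n * K), method="L-BFGS-B")
        return res.x.reshape(n, K), res.fun    # objective 0 iff x solves (4)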

53. Constrained global optimization
◮ min_x ∑_{{u,v}∈E} | ‖x_u − x_v‖² − d²_uv | exactly reformulates (4)
◮ Relax objective f to concave part, remove constant term, rewrite min −f as max f
◮ Reformulate convex part of obj. fun. to convex constraints
◮ Exact reformulation:
    max_x  ∑_{{u,v}∈E} ‖x_u − x_v‖²    (6)
    s.t.   ∀ {u, v} ∈ E    ‖x_u − x_v‖² ≤ d²_uv
◮ Theorem (Activity): At a glob. opt. x∗ of a YES instance, all constraints of (6) are active
[Mencarelli et al. 2017]
78 / 160

  54. Semidefinite Programming 79 / 160

55. Linearization
    ∀ {i, j} ∈ E    ‖x_i − x_j‖²₂ = d²_ij
⇒  ∀ {i, j} ∈ E    ‖x_i‖²₂ + ‖x_j‖²₂ − 2 x_i · x_j = d²_ij
⇒  ∀ {i, j} ∈ E    X_ii + X_jj − 2 X_ij = d²_ij   ∧   X = x x⊤
where X = x x⊤ ⇔ ∀ i, j  X_ij = x_i x_j
80 / 160

56. Relaxation
    X = x x⊤  ⇒  X − x x⊤ = 0  ⇒ (relax)  X − x x⊤ ⪰ 0
⇒  Schur(X, x) = ( I_K  x⊤ ; x  X ) ⪰ 0
If x does not appear elsewhere ⇒ get rid of it (e.g. choose x = 0): replace Schur(X, x) ⪰ 0 by X ⪰ 0
Reason for this weird relaxation: there are efficient solvers for Semidefinite Programming (SDP)
81 / 160

57. SDP relaxation
    min  F • X
    s.t. ∀ {i, j} ∈ E    X_ii + X_jj − 2 X_ij = d²_ij
         X ⪰ 0
How do we choose F?
82 / 160

58. Some possible objective functions
◮ For protein conformation: "push-and-pull" the realization
    max ∑_{{i,j}∈E} (X_ii + X_jj − 2 X_ij)
  with = changed to ≤ in the constraints (or min and ≥)  [Dias & L. 2016]
◮ [Ye, 2003], application to wireless sensors localization: min tr(X)
  improves covariance estimator accuracy
◮ How about "just random"? [Barvinok 1995]
83 / 160
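A sketch of the push-and-pull SDP followed by the PCA-style rounding used in the tests below, written with cvxpy (illustrative, not the code behind the tables; it assumes an SDP-capable solver such as Mosek or SCS is installed):

    import numpy as np
    import cvxpy as cp

    def dgp_sdp_pushpull(n, edges, K=3):
        # max sum_{ij in E} (X_ii + X_jj - 2 X_ij)
        # s.t. X_ii + X_jj - 2 X_ij <= d_ij^2 for {i,j} in E, X PSD
        X = cp.Variable((n, n), PSD=True)
        cons = [X[i, i] + X[j, j] - 2 * X[i, j] <= d ** 2 for (i, j), d in edges.items()]
        obj = cp.Maximize(sum(X[i, i] + X[j, j] - 2 * X[i, j] for (i, j) in edges))
        cp.Problem(obj, cons).solve()
        # rank-K rounding (PCA on the solution): x = P sqrt(Lambda), top K eigenpairs
        lam, P = np.linalg.eigh(X.value)
        lam = np.clip(lam, 0.0, None)
        idx = np.argsort(lam)[::-1][:K]
        return P[:, idx] * np.sqrt(lam[idx])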

59. Computational evaluation
◮ Download protein files from the Protein Data Bank (PDB): they contain atom realizations
◮ Mimick a Nuclear Magnetic Resonance experiment: keep only pairwise distances < 5.5
◮ Try and reconstruct the protein shape from those weighted graphs
◮ Quality evaluation of results:
  ◮ LDE(x) = max_{{i,j}∈E} | ‖x_i − x_j‖ − d_ij |
  ◮ MDE(x) = (1/|E|) ∑_{{i,j}∈E} | ‖x_i − x_j‖ − d_ij |
84 / 160
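The two error measures as straightforward functions (again with edges as a dict {(i, j): d_ij} and x an array of point coordinates):

    import numpy as np

    def lde(x, edges):
        # largest distance error over the edges
        return max(abs(np.linalg.norm(x[i] - x[j]) - d) for (i, j), d in edges.items())

    def mde(x, edges):
        # mean distance error over the edges
        return sum(abs(np.linalg.norm(x[i] - x[j]) - d)
                   for (i, j), d in edges.items()) / len(edges)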

60. Objective function tests
SDP solved with Mosek

SDP + PCA
  Instance                       LDE                    MDE                  CPU
  Name            |V|   |E|     PP     Ye    Rnd      PP    Ye    Rnd      PP    Ye    Rnd
  C0700odd.1       15    39    3.31   4.57   4.44    1.92  2.52  2.50    0.13  0.07  0.08
  C0700odd.C       36   242   10.61   4.85   4.85    3.02  3.02  3.02    0.69  0.43  0.44
  C0700.odd.G      36   308    4.57   4.77   4.77    2.41  2.84  2.84    0.86  0.54  0.54
  C0150alter.1     37   335    4.66   4.88   4.86    2.52  3.00  3.00    0.97  0.59  0.58
  C0080create.1    60   681    7.17   4.86   4.86    3.08  3.19  3.19    2.48  1.46  1.46
  tiny             37   335    4.66   4.88   4.88    2.52  3.00  3.00    0.97  0.60  0.60
  1guu-1          150   959   10.20   4.93   4.93    3.43  3.43  3.43    9.23  5.68  5.70

SDP + PCA + NLP
  Instance                       LDE                    MDE                  CPU
  Name            |V|   |E|     PP     Ye    Rnd      PP    Ye    Rnd      PP     Ye     Rnd
  1b03             89   456    0.00   0.00   0.00    0.00  0.00  0.00    8.69   6.28   9.91
  1crn            138   846    0.81   0.81   0.81    0.07  0.07  0.07   33.33  31.32  44.48
  1guu-1          150   959    0.97   4.93   0.92    0.10  3.43  0.08   56.45   7.89  65.33

(PP = push-and-pull objective, Ye = Ye's objective, Rnd = random objective)
[Dias & L., 2016]
85 / 160

61. Empirical considerations
◮ Ye: very fast but often imprecise
◮ Random: good but nondeterministic
◮ Push-and-Pull: relaxes X_ii + X_jj − 2 X_ij = d²_ij to X_ii + X_jj − 2 X_ij ≥ d²_ij, easier to satisfy feasibility
  ...will be useful in DDP later on
Focus on the Push-and-Pull objective
86 / 160

  62. Diagonally Dominant Programming 87 / 160

63. When SDP solvers hit their size limit
◮ SDP solver: technological bottleneck
◮ How can we best use an LP solver?
◮ Diagonally Dominant (DD) matrices are PSD
◮ Not vice versa: inner approximation of the PSD cone {Y ⪰ 0}
◮ Idea by A.A. Ahmadi and co-authors
[Ahmadi & Majumdar 2014, Ahmadi & Hall 2015]
88 / 160

64. Diagonally dominant matrices
An n × n matrix X is DD if
    ∀ i ≤ n    X_ii ≥ ∑_{j≠i} |X_ij|.
E.g.
     1      0.1   −0.2    0      0.04   0
     0.1    1     −0.05   0.1    0      0
    −0.2   −0.05   1      0.1    0.01   0
     0      0.1    0.1    1      0.2    0.3
     0.04   0      0.01   0.2    1     −0.3
     0      0      0      0.3   −0.3    1
89 / 160
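A quick numerical check of the definition on the example above (NumPy); the second assertion illustrates that DD implies PSD:

    import numpy as np

    def is_dd(X):
        # X_ii >= sum_{j != i} |X_ij| for every row i
        A = np.abs(X)
        return bool(np.all(np.diag(X) >= A.sum(axis=1) - np.diag(A)))

    X = np.array([[ 1.  ,  0.1 , -0.2 ,  0.  ,  0.04,  0.  ],
                  [ 0.1 ,  1.  , -0.05,  0.1 ,  0.  ,  0.  ],
                  [-0.2 , -0.05,  1.  ,  0.1 ,  0.01,  0.  ],
                  [ 0.  ,  0.1 ,  0.1 ,  1.  ,  0.2 ,  0.3 ],
                  [ 0.04,  0.  ,  0.01,  0.2 ,  1.  , -0.3 ],
                  [ 0.  ,  0.  ,  0.  ,  0.3 , -0.3 ,  1.  ]])
    assert is_dd(X)
    assert np.all(np.linalg.eigvalsh(X) >= -1e-9)   # hence PSD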

65. DD Linearization
    ∀ i ≤ n    X_ii ≥ ∑_{j≠i} |X_ij|    (∗)
◮ introduce a "sandwiching" variable T
◮ write |X| as T
◮ add constraints −T ≤ X ≤ T
◮ by the ≥ constraint sense, write (∗) as X_ii ≥ ∑_{j≠i} T_ij
90 / 160

66. DD Programming (DDP)
    ∀ {i, j} ∈ E    X_ii + X_jj − 2 X_ij = d²_ij
    X is DD
⇒
    ∀ {i, j} ∈ E    X_ii + X_jj − 2 X_ij = d²_ij
    ∀ i ≤ n         ∑_{j≤n, j≠i} T_ij ≤ X_ii
    −T ≤ X ≤ T
91 / 160

67. DDP formulation for the DGP
    min  ∑_{{i,j}∈E} (X_ii + X_jj − 2 X_ij)
    s.t. ∀ {i, j} ∈ E    X_ii + X_jj − 2 X_ij ≥ d²_ij
         ∀ i ≤ n         ∑_{j≤n, j≠i} T_ij ≤ X_ii
         −T ≤ X ≤ T
         T ≥ 0
[Dias & L., 2016]
92 / 160
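Since everything above is linear, the DDP can be handed to any LP solver; a cvxpy sketch (illustrative only; the tests below used CPLEX):

    import cvxpy as cp

    def dgp_ddp(n, edges):
        # LP inner approximation: X restricted to the DD cone via the sandwich variable T
        X = cp.Variable((n, n), symmetric=True)
        T = cp.Variable((n, n), nonneg=True)
        cons = [X[i, i] + X[j, j] - 2 * X[i, j] >= d ** 2 for (i, j), d in edges.items()]
        cons += [-T <= X, X <= T]
        cons += [cp.sum(T[i, :]) - T[i, i] <= X[i, i] for i in range(n)]
        obj = cp.Minimize(sum(X[i, i] + X[j, j] - 2 * X[i, j] for (i, j) in edges))
        cp.Problem(obj, cons).solve()
        return X.value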

68. SDP vs. DDP: tests
Using the "push-and-pull" objective in the SDP
SDP solved with Mosek, DDP with CPLEX

SDP/DDP + PCA
                          SDP                               DDP
  Instance        LDE    MDE    CPU (modl/soln)     LDE    MDE    CPU (modl/soln)
  C0700odd.1      0.79   0.34     0.06/0.12         0.38   0.30     0.15/0.15
  C0700.odd.G     2.38   0.89     0.57/1.16         1.86   0.58     1.11/0.95
  C0150alter.1    1.48   0.45     0.73/1.33         1.54   0.55     1.23/1.04
  C0080create.1   2.49   0.82     1.63/7.86         0.98   0.67     3.39/4.07
  1guu-1          0.50   0.15     6.67/684.89       1.00   0.85    37.74/153.17
93 / 160

  69. Subsection 5 Barvinok’s naive algorithm 94 / 160

  70. Concentration of measure From [Barvinok, 1997] The value of a “well behaved” function at a random point of a “big” probability space X is “very close” to the mean value of the function. and In a sense, measure concentration can be considered as an extension of the law of large numbers. 95 / 160

71. Concentration of measure
Given a Lipschitz function f: X → R s.t.
    ∀ x, y ∈ X    |f(x) − f(y)| ≤ L ‖x − y‖₂
for some L ≥ 0, there is concentration of measure if ∃ constants c, C s.t.
    ∀ ε > 0    P_x(|f(x) − E(f)| > ε) ≤ c e^{−Cε²/L²}
≡ "discrepancy from mean is unlikely"
96 / 160

72. Barvinok's theorem
Consider:
◮ for each k ≤ m, manifolds X_k = {x ∈ R^n | x⊤ Q_k x = a_k}
◮ a feasibility problem x ∈ ∩_{k≤m} X_k
◮ its SDP relaxation ∀ k ≤ m (Q_k • X = a_k), with soln. X̄
Let T = factor(X̄), y ∼ N^n(0, 1) and x′ = Ty
Then ∃ c and n₀ ∈ N s.t. if n ≥ n₀,
    Prob( ∀ k ≤ m   dist(x′, X_k) ≤ c √(‖X̄‖₂ ln n) ) ≥ 0.9
IDEA: since x′ is "close" to each X_k, try local Nonlinear Programming (NLP)
97 / 160

73. Application to the DGP
◮ ∀ {i, j} ∈ E   X_ij = {x ∈ R^{nK} | ‖x_i − x_j‖²₂ = d²_ij}
◮ DGP can be written as x ∈ ∩_{{i,j}∈E} X_ij
◮ SDP relaxation X_ii + X_jj − 2 X_ij = d²_ij ∧ X ⪰ 0, with soln. X̄
◮ Difference with Barvinok: x ∈ R^{Kn}, rk(X̄) ≤ K
◮ IDEA: sample y ∼ N^{nK}(0, 1/√K)
◮ Thm.: Barvinok's theorem works in rank K
[L. & Vu, unpublished]
98 / 160

74. The heuristic
1. Solve the SDP relaxation of the DGP, get soln. X̄
   (use DDP+LP if SDP+IPM too slow)
2. a. T = factor(X̄)
   b. y ∼ N^{nK}(0, 1/√K)
   c. x′ = Ty
3. Use x′ as a starting point for a local NLP solver on the formulation
       min_x ∑_{{i,j}∈E} (‖x_i − x_j‖² − d²_ij)²
   and return the improved solution x
[Dias & L., 2016]
99 / 160
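Putting steps 1-3 together, a compact sketch (NumPy + SciPy for the local NLP step; Xbar is assumed to come from the SDP or DDP code sketched earlier, and edges is again a dict {(i, j): d_ij}):

    import numpy as np
    from scipy.optimize import minimize

    def barvinok_heuristic(Xbar, edges, K=3, seed=0):
        # 2a. T = factor(Xbar) via the spectral decomposition
        lam, P = np.linalg.eigh(Xbar)
        T = P @ np.diag(np.sqrt(np.clip(lam, 0.0, None)))
        n = Xbar.shape[0]
        # 2b-c. sample y ~ N^{nK}(0, 1/sqrt(K)) and set x' = T y
        rng = np.random.default_rng(seed)
        x0 = T @ rng.normal(0.0, 1.0 / np.sqrt(K), size=(n, K))
        # 3. refine x' with a local NLP solver on the squared-penalty formulation
        def obj(flat):
            x = flat.reshape(n, K)
            return sum((np.sum((x[u] - x[v]) ** 2) - d ** 2) ** 2
                       for (u, v), d in edges.items())
        res = minimize(obj, x0.ravel(), method="L-BFGS-B")
        return res.x.reshape(n, K)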

75. SDP+Barvinok vs. DDP+Barvinok
                          SDP                        DDP
  Instance        LDE    MDE      CPU        LDE    MDE      CPU
  C0700odd.1      0.00   0.00      0.63      0.00   0.00      1.49
  C0700.odd.G     0.00   0.00     21.67      0.42   0.01     30.51
  C0150alter.1    0.00   0.00     29.30      0.00   0.00     34.13
  C0080create.1   0.00   0.00    139.52      0.00   0.00    141.49
  1b03            0.18   0.01    132.16      0.38   0.05    101.04
  1crn            0.78   0.02    800.67      0.76   0.04    522.60
  1guu-1          0.79   0.01   1900.48      0.90   0.04    667.03
Most of the CPU time taken by the local NLP solver
100 / 160
