Recovering a Hidden Hamiltonian Cycle via Linear Programming
Yihong Wu
Department of Statistics and Data Science, Yale University
Joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David Tse (Stanford) and Jiaming Xu (Purdue → Duke)
Workshop on Local Algorithms, MIT, June 13, 2018
Mathematical problem: Hidden Hamiltonian cycle model
• Observe: a weighted undirected complete graph on $n$ vertices with weighted adjacency matrix $W$
• Latent: a Hamiltonian cycle $C^*$
• Edge weights: independently across edges $e$,
  $$W_e \overset{\text{ind.}}{\sim} \begin{cases} P & e \in C^* \\ Q & e \notin C^* \end{cases}$$
• Goal: observe $W$, recover $C^*$ with high probability
Remarks:
• $P$ and $Q$ depend on the graph size $n$
• For this talk, $Q = N(0,1)$ and $P = N(\mu,1)$, so that $W = \mu \cdot (\text{adjacency matrix of } C^*) + \text{noise}$, where the first term is the "signal"
• Hidden Hamiltonian cycle planted in an Erdős-Rényi graph [Broder-Frieze-Shamir '94]
Yihong Wu (Yale) Recovery Threshold for TSP LP
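The Gaussian version of the model above can be simulated in a few lines. This is only a sketch: the function name `sample_hidden_cycle`, the use of NumPy, and the choice to draw the cycle from a uniformly random vertex ordering are assumptions for illustration, not something stated in the talk.

```python
import numpy as np

def sample_hidden_cycle(n, mu, seed=None):
    """Sample W = mu * A + noise, where A is the adjacency matrix of a hidden cycle.

    Hypothetical helper: the cycle visits the vertices in a random order.
    Requires n >= 3 so that every vertex has two distinct neighbours.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)               # visiting order defining C*
    A = np.zeros((n, n))
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]  # consecutive vertices on the cycle
        A[i, j] = A[j, i] = 1.0
    noise = np.triu(rng.standard_normal((n, n)), k=1)
    noise = noise + noise.T                  # symmetric N(0,1) noise, zero diagonal
    return mu * A + noise, A

W, A = sample_hidden_cycle(8, 3.0, seed=0)
```

Edges on the cycle then have weights distributed as N(mu, 1) and all other edges as N(0, 1), matching the choice of P and Q above.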
Link information in Chicago datasets
1. Reconstitute chromatin in vitro upon naked DNA
2. Produce cross-links by fixing chromatin with formaldehyde
Chicago datasets generate cross-links among contigs [Putnam et al. '16].
On average, more cross-links exist between adjacent contigs.
Ordering DNA contigs with Chicago cross-links
DNA scaffolding reduces to the traveling salesman problem (TSP): find a path (tour) that visits every contig exactly once with the maximum number of cross-links.
Key challenges for DNA scaffolding with Chicago data
• Computational: TSP is NP-hard in the worst case
• Statistical: spurious cross-links between contigs that are far apart
Key questions:
• How to efficiently order hundreds of thousands of contigs?
• How much noise can be tolerated for accurate DNA scaffolding?
Mathematical model for DNA scaffolding
[Figure: two panels, each with both axes running from 0 to 200. Panel titles: "Chicago dataset [Putnam et al. '16]" and "Simulated Poisson data".]
What is known information-theoretically
Maximum likelihood estimator reduces to TSP:
$$\hat{X}_{\rm TSP} = \arg\max_X \langle L, X \rangle \quad \text{s.t. } X \text{ is the adjacency matrix of some Hamiltonian cycle}$$
where $L$ is the log-likelihood ratio matrix, $L_{ij} = \log \frac{dP}{dQ}(W_{ij})$. For Gaussian or Poisson, simply take $L = W$.
Theorem (Sharp threshold)
• If $\mu^2 < 4 \log n$, exact recovery is information-theoretically impossible
• If $\mu^2 > 4 \log n$, MLE succeeds in exact recovery
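The reduction to $L = W$ in the Gaussian case can be checked directly. The short derivation below assumes $\mu > 0$ and the convention $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$, neither of which is spelled out on the slide. It uses the fact that every Hamiltonian cycle adjacency matrix has row sums equal to 2.

```latex
\begin{align*}
L_{ij} &= \log\frac{dP}{dQ}(W_{ij})
        = -\tfrac{1}{2}(W_{ij}-\mu)^2 + \tfrac{1}{2}W_{ij}^2
        = \mu W_{ij} - \tfrac{\mu^2}{2}, \\
\langle L, X\rangle
       &= \mu\,\langle W, X\rangle - \tfrac{\mu^2}{2}\sum_{i,j} X_{ij}
        = \mu\,\langle W, X\rangle - \mu^2 n .
\end{align*}
```

Since the last term is the same for every Hamiltonian cycle, maximizing $\langle L, X \rangle$ and maximizing $\langle W, X \rangle$ select the same cycle.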
What is known algorithmically
• Spectral methods fail miserably:
  ◮ $\mu \gg n^{2.5}$ (spectral gap of cycle is too small)
• Thresholding:
  ◮ $\mu > \sqrt{8 \log n}$
• Greedy merging [Motahari-Bresler-Tse '13]:
  ◮ $\mu > \sqrt{6 \log n}$
• This talk: linear programming achieves the sharp threshold
  ◮ $\frac{\mu^2}{\log n} > 4$: LP succeeds
  ◮ $\frac{\mu^2}{\log n} < 4$: everything fails
In general
Thresholds are determined by the Rényi divergence of order $\rho > 0$ from $P$ to $Q$:
$$D_\rho(P \| Q) \triangleq \frac{1}{\rho - 1} \log \int (dP)^\rho (dQ)^{1-\rho}.$$
• LP works when $D_{1/2}(P \| Q) - \log n \to \infty$ (optimal under mild assumptions)
• Thresholding works when $D_{1/2}(P \| Q) - 2 \log n \to \infty$
• Greedy works when $D_{1/3}(Q \| P) - \log n \to \infty$
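For the Gaussian pair $P = N(\mu,1)$, $Q = N(0,1)$ the Rényi divergence has the closed form $\rho\mu^2/2$ (and the same value with $P$ and $Q$ swapped), so the three conditions above become $\mu^2/4 - \log n \to \infty$, $\mu^2/4 - 2\log n \to \infty$ and $\mu^2/6 - \log n \to \infty$, in line with the constants 4, 8 and 6 quoted earlier. The sketch below checks the closed form against a midpoint Riemann sum of the defining integral. The function names and integration grid are illustrative assumptions.

```python
import math

def renyi_gaussian(rho, mu):
    """Closed form of D_rho(N(mu,1) || N(0,1)) = rho * mu^2 / 2."""
    return rho * mu ** 2 / 2

def renyi_numeric(rho, mu, lo=-20.0, hi=20.0, steps=40_000):
    """Midpoint-rule approximation of (1/(rho-1)) * log of the integral of p^rho * q^(1-rho)."""
    dx = (hi - lo) / steps
    log_norm = -0.5 * math.log(2 * math.pi)          # log of the Gaussian normalizer
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        log_p = log_norm - 0.5 * (x - mu) ** 2       # log density of N(mu, 1)
        log_q = log_norm - 0.5 * x ** 2              # log density of N(0, 1)
        total += math.exp(rho * log_p + (1 - rho) * log_q) * dx
    return math.log(total) / (rho - 1)

# Order 1/2 with mu = 2: closed form gives mu^2 / 4.
print(renyi_gaussian(0.5, 2.0), renyi_numeric(0.5, 2.0))
```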
Convex relaxations of TSP
Integer Linear Programming reformulation of TSP
$$\begin{aligned}
\hat{X}_{\rm TSP} = \arg\max_X \ & \langle W, X \rangle \\
\text{s.t. } & \sum_j X_{ij} = 2, \quad \forall i \\
& X_{ij} \in \{0, 1\} \\
& \sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \quad \forall\, \emptyset \ne I \subset [n]
\end{aligned}$$
• The last constraint: subtour elimination
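To see what the subtour elimination constraint rules out, note that an integral solution of the degree constraints alone is a union of disjoint cycles, and it satisfies every cut constraint exactly when it is connected. The sketch below tests this by graph traversal. The function name and the neighbour-list representation are assumptions made for illustration.

```python
def is_single_tour(adj):
    """adj maps each vertex to its two neighbours in an integral 2-factor.

    Returns True when the 2-factor is connected, i.e. a single Hamiltonian cycle.
    """
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:                      # depth-first traversal from an arbitrary vertex
        v = stack.pop()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(adj)

hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(is_single_tour(hexagon), is_single_tour(two_triangles))
```

In the two-triangle example, taking $I = \{0, 1, 2\}$ gives a cut with no crossing edges, so the constraint $\sum_{i \in I, j \notin I} X_{ij} \ge 2$ is violated even though every vertex has degree 2.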
Subtour LP
$$\begin{aligned}
\hat{X}_{\rm SUB} = \arg\max_X \ & \langle W, X \rangle \\
\text{s.t. } & \sum_j X_{ij} = 2, \quad \forall i \\
& X_{ij} \in [0, 1] \\
& \sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \quad \forall\, \emptyset \ne I \subset [n]
\end{aligned}$$
• Replacing the integrality constraint with a box constraint: SUBTOUR LP relaxation [Dantzig-Fulkerson-Johnson '54, Held-Karp '70]
• Exponentially many linear constraints, nevertheless solvable using the interior point method
F2F LP
$$\begin{aligned}
\hat{X}_{\rm F2F} = \arg\max_X \ & \langle W, X \rangle \\
\text{s.t. } & \sum_j X_{ij} = 2, \quad \forall i \\
& X_{ij} \in [0, 1]
\end{aligned}$$
• Further dropping the subtour elimination constraints $\Longrightarrow$ Fractional 2-factor (F2F) LP
• Extensively studied in the worst case [Boyd-Carr '99, Schalekamp-Williamson-van Zuylen '14]
  ◮ The integrality gap $\frac{\text{2F}}{\text{F2F}} \le \frac{4}{3}$ for metric TSP (min formulation)
• What is the integrality gap with high probability (whp) in our random instance?
Optimality of Fractional 2-Factor LP
Theorem
If $\mu^2 - 4 \log n \to \infty$, then $\hat{X}_{\rm F2F} = X^*$ with high probability, where $X^*$ denotes the adjacency matrix of $C^*$.
Remarks
• The integrality gap is 1 whp!
• Achieving the IT-limit $\mu^2 = 4 \log n$
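As a small numerical illustration of the theorem, the F2F LP can be handed to an off-the-shelf solver. The sketch below uses `scipy.optimize.linprog` with one variable per edge $i < j$. The function name `solve_f2f`, the toy parameters (n = 12, mu = 10) and the fixed cycle 0-1-...-11-0 are assumptions chosen so that the signal is far above the threshold. This is not the implementation used by the authors.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def solve_f2f(W):
    """Maximize <W, X> over symmetric X with row sums 2 and entries in [0, 1]."""
    n = W.shape[0]
    edges = list(itertools.combinations(range(n), 2))  # one LP variable per edge i < j
    A_eq = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        A_eq[i, k] = A_eq[j, k] = 1.0                  # edge k is incident to i and j
    weights = np.array([W[i, j] for i, j in edges])
    # linprog minimizes, so negate the weights; summing over i < j only halves
    # the objective <W, X>, which does not change the maximizer.
    res = linprog(-weights, A_eq=A_eq, b_eq=np.full(n, 2.0),
                  bounds=(0.0, 1.0), method="highs")
    X = np.zeros((n, n))
    for k, (i, j) in enumerate(edges):
        X[i, j] = X[j, i] = res.x[k]
    return X

# Toy instance (illustrative parameters): hidden cycle 0-1-...-(n-1)-0.
n, mu = 12, 10.0
rng = np.random.default_rng(0)
X_star = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    X_star[i, j] = X_star[j, i] = 1.0
noise = np.triu(rng.standard_normal((n, n)), k=1)
W = mu * X_star + noise + noise.T
X_hat = solve_f2f(W)
print(np.allclose(X_hat, X_star, atol=1e-6))
```

Lowering `mu` toward $\sqrt{4 \log n}$ and repeating over many seeds is one way to observe where fractional or multi-cycle solutions start to appear.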