Recovering a Hidden Hamiltonian Cycle via Linear Programming

Yihong Wu, Department of Statistics and Data Science, Yale University
Joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David Tse (Stanford) and Jiaming Xu (Purdue → Duke)
Workshop on Local Algorithms, MIT, June 13, 2018

Mathematical problem: Hidden Hamiltonian cycle model
• Observe: a weighted undirected complete graph on $n$ vertices with weighted adjacency matrix $W$
• Latent: a Hamiltonian cycle $C^*$
• Edge weights, drawn independently:
$$W_e \overset{\text{ind.}}{\sim} \begin{cases} P, & e \in C^* \\ Q, & e \notin C^* \end{cases}$$
• Goal: observe $W$, recover $C^*$ with high probability
Remarks:
• $P$ and $Q$ depend on the graph size $n$
• For this talk, $Q = N(0,1)$ and $P = N(\mu,1)$, so that $W = \mu \cdot (\text{adjacency matrix of } C^*) + \text{noise}$, where the first term is the "signal"
• Hidden Hamiltonian cycle planted in Erdős-Rényi graph [Broder-Frieze-Shamir '94]
Yihong Wu (Yale) Recovery Threshold for TSP LP
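The Gaussian version of the model above can be simulated in a few lines. This is a minimal sketch, not code from the talk: the function name `sample_hidden_cycle_instance`, the seed, and the choice to store `W` as a list of lists are all assumptions made for illustration.

```python
import random

def sample_hidden_cycle_instance(n, mu, seed=0):
    """Sample (W, cycle_edges) from the Gaussian hidden Hamiltonian cycle model.

    Edges on the hidden cycle get weight N(mu, 1); all other edges get N(0, 1).
    Requires n >= 3 so that the cycle has n distinct edges.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random cyclic ordering of the vertices
    cycle_edges = {frozenset((order[k], order[(k + 1) % n])) for k in range(n)}

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean = mu if frozenset((i, j)) in cycle_edges else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, 1.0)  # symmetric weights
    return W, cycle_edges

W, cycle_edges = sample_hidden_cycle_instance(n=8, mu=5.0)
```

The diagonal of `W` is left at zero because the graph has no self-loops.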

Link information in Chicago datasets
1. Reconstitute chromatin in vitro upon naked DNA
2. Produce cross-links by fixing chromatin with formaldehyde
• Chicago datasets generate cross-links among contigs [Putnam et al. '16]
• On average, more cross-links exist between adjacent contigs

Ordering DNA contigs with Chicago cross-links
• DNA scaffolding reduces to the traveling salesman problem (TSP)
• Find a path (tour) that visits every contig exactly once with the maximum number of cross-links

Key challenges for DNA scaffolding with Chicago data
• Computational: TSP is NP-hard in the worst case
• Statistical: spurious cross-links between contigs that are far apart
Key questions:
• How to efficiently order hundreds of thousands of contigs?
• How much noise can be tolerated for accurate DNA scaffolding?

Mathematical model for DNA scaffolding
[Figure: two heat maps with both axes running from 0 to 200. Left panel: Chicago dataset [Putnam et al. '16]. Right panel: simulated Poisson data.]

What is known information-theoretically
Maximum likelihood estimator reduces to TSP:
$$\hat{X}_{\mathrm{TSP}} = \arg\max_{X} \langle L, X \rangle \quad \text{s.t. } X \text{ is the adjacency matrix of some Hamiltonian cycle}$$
where $L$ is the log likelihood ratio matrix, $L_{ij} = \log \frac{dP}{dQ}(W_{ij})$. For Gaussian or Poisson, simply take $L = W$.
Theorem (Sharp threshold)
• If $\mu^2 < 4 \log n$, exact recovery is information-theoretically impossible
• If $\mu^2 > 4 \log n$, MLE succeeds in exact recovery
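For very small $n$ the maximum likelihood estimator can be computed by enumerating every Hamiltonian cycle, which makes the reduction to TSP concrete. The sketch below is illustrative only and assumes the Gaussian model with $L = W$; the helper names `cycle_edges_of` and `brute_force_mle` and the parameters `n = 7`, `mu = 10.0` are hypothetical choices, not taken from the talk. Enumeration costs $(n-1)!$ evaluations, so this is a definition check, not a practical algorithm.

```python
import itertools
import random

def cycle_edges_of(order):
    """Edge set of the Hamiltonian cycle that visits vertices in the given order."""
    n = len(order)
    return frozenset(frozenset((order[k], order[(k + 1) % n])) for k in range(n))

def brute_force_mle(W):
    """Return the edge set of the maximum-weight Hamiltonian cycle by enumeration."""
    n = len(W)
    best_edges, best_score = None, float("-inf")
    # Fix vertex 0 as the start; each cycle then appears once per direction.
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        score = sum(W[order[k]][order[(k + 1) % n]] for k in range(n))
        if score > best_score:
            best_edges, best_score = cycle_edges_of(order), score
    return best_edges

# Small Gaussian instance with a strong signal (mu is far above the threshold).
rng = random.Random(1)
n, mu = 7, 10.0
hidden_order = list(range(n))
rng.shuffle(hidden_order)
truth = cycle_edges_of(hidden_order)
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        mean = mu if frozenset((i, j)) in truth else 0.0
        W[i][j] = W[j][i] = rng.gauss(mean, 1.0)

estimate = brute_force_mle(W)
```

Summing weights along the tour gives half of $\langle W, X \rangle$ for the symmetric adjacency matrix $X$, so the maximizer is the same.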

What is known algorithmically
• Spectral methods fail miserably:
  ◮ need $\mu \gg n^{2.5}$ (spectral gap of cycle is too small)
• Thresholding:
  ◮ $\mu > \sqrt{8 \log n}$
• Greedy merging [Motahari-Bresler-Tse '13]:
  ◮ $\mu > \sqrt{6 \log n}$
• This talk: linear programming achieves the sharp threshold
  ◮ $\mu^2 / \log n > 4$: LP succeeds
  ◮ $\mu^2 / \log n < 4$: everything fails
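The slides do not spell out the thresholding rule. One natural reading, assumed here, is a per-vertex rule that keeps the two heaviest edges at every vertex; a pairwise comparison argument for this rule gives the same $\sqrt{8 \log n}$ condition, but treat the match as an assumption. The function name `top_two_neighbors` and the instance parameters are hypothetical.

```python
import random

def top_two_neighbors(W):
    """For each vertex keep its two heaviest incident edges; return the union."""
    n = len(W)
    kept = set()
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: W[i][j], reverse=True)[:2]
        kept.update(frozenset((i, j)) for j in neighbors)
    return kept

# Gaussian instance with a signal far above sqrt(8 log n), so recovery is easy.
rng = random.Random(2)
n, mu = 30, 20.0
order = list(range(n))
rng.shuffle(order)
truth = {frozenset((order[k], order[(k + 1) % n])) for k in range(n)}
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        mean = mu if frozenset((i, j)) in truth else 0.0
        W[i][j] = W[j][i] = rng.gauss(mean, 1.0)

estimate = top_two_neighbors(W)
```

When the rule succeeds, the kept edges are exactly the hidden cycle; when it fails, the output need not be a cycle at all, which is one reason an optimization-based estimator is more robust.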

In general
Thresholds are determined by the Rényi divergence of order $\rho > 0$ from $P$ to $Q$:
$$D_\rho(P \| Q) \triangleq \frac{1}{\rho - 1} \log \int (dP)^{\rho} (dQ)^{1-\rho}.$$
• LP works when $D_{1/2}(P \| Q) - \log n \to \infty$ (optimal under mild assumptions)
• Thresholding works when $D_{1/2}(P \| Q) - 2 \log n \to \infty$
• Greedy works when $D_{1/3}(Q \| P) - \log n \to \infty$
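As a consistency check, the general conditions can be specialized to the Gaussian case $P = N(\mu, 1)$, $Q = N(0, 1)$ used in the rest of the talk. The derivation below is a worked example added for illustration; it uses only the definition of $D_\rho$ given above and completing the square.

$$\int (dP)^{\rho} (dQ)^{1-\rho} = \int \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{\rho (x-\mu)^2 + (1-\rho) x^2}{2} \right) dx = \exp\!\left( -\frac{\rho(1-\rho)\mu^2}{2} \right),$$

since $\rho (x-\mu)^2 + (1-\rho) x^2 = (x - \rho\mu)^2 + \rho(1-\rho)\mu^2$. Therefore

$$D_\rho(P \| Q) = \frac{1}{\rho - 1} \cdot \left( -\frac{\rho(1-\rho)\mu^2}{2} \right) = \frac{\rho \mu^2}{2},$$

and by symmetry $D_\rho(Q \| P) = \rho\mu^2/2$ as well. Plugging in:

$$D_{1/2}(P \| Q) = \frac{\mu^2}{4} > \log n \iff \mu^2 > 4 \log n, \qquad D_{1/2}(P \| Q) = \frac{\mu^2}{4} > 2 \log n \iff \mu^2 > 8 \log n, \qquad D_{1/3}(Q \| P) = \frac{\mu^2}{6} > \log n \iff \mu^2 > 6 \log n.$$

These are the LP, thresholding, and greedy conditions stated for the Gaussian model.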

Convex relaxations of TSP

Integer Linear Programming reformulation of TSP
$$\hat{X}_{\mathrm{TSP}} = \arg\max_{X} \langle W, X \rangle$$
$$\text{s.t.} \quad \sum_{j} X_{ij} = 2, \ \forall i$$
$$X_{ij} \in \{0, 1\}$$
$$\sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$$
• The last constraint: subtour elimination

Subtour LP
$$\hat{X}_{\mathrm{SUB}} = \arg\max_{X} \langle W, X \rangle$$
$$\text{s.t.} \quad \sum_{j} X_{ij} = 2, \ \forall i$$
$$X_{ij} \in [0, 1]$$
$$\sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$$
• Replacing the integrality constraint with a box constraint: SUBTOUR LP relaxation [Dantzig-Fulkerson-Johnson '54, Held-Karp '70]
• Exponentially many linear constraints, nevertheless solvable using an interior point method

F2F LP
$$\hat{X}_{\mathrm{F2F}} = \arg\max_{X} \langle W, X \rangle$$
$$\text{s.t.} \quad \sum_{j} X_{ij} = 2, \ \forall i$$
$$X_{ij} \in [0, 1]$$
• Further dropping subtour elimination constraints $\Longrightarrow$ Fractional 2-factor (F2F) LP
• Extensively studied in the worst case [Boyd-Carr '99, Schalekamp-Williamson-van Zuylen '14]
  ◮ The integrality gap $\frac{\mathrm{2F}}{\mathrm{F2F}} \le \frac{4}{3}$ for metric TSP (min formulation)
• What is the integrality gap whp in our random instance?
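The relationship between the three feasible sets (integer program, subtour LP, F2F LP) can be checked by hand on tiny examples. The sketch below is illustrative and assumes nothing beyond the constraints written above; the helper names (`matrix_from`, `degree_ok`, `box_ok`, `subtour_ok`, `is_integral`) are hypothetical, and the subtour check enumerates all subsets, so it is only usable for very small $n$.

```python
from itertools import combinations

def matrix_from(n, weighted_edges):
    """Symmetric n x n matrix from (i, j, value) triples."""
    X = [[0.0] * n for _ in range(n)]
    for i, j, value in weighted_edges:
        X[i][j] = X[j][i] = value
    return X

def degree_ok(X, tol=1e-9):
    """Degree constraint: sum_j X_ij = 2 for every vertex i."""
    n = len(X)
    return all(abs(sum(X[i][j] for j in range(n) if j != i) - 2) <= tol
               for i in range(n))

def box_ok(X, tol=1e-9):
    """Box constraint: 0 <= X_ij <= 1."""
    n = len(X)
    return all(-tol <= X[i][j] <= 1 + tol
               for i in range(n) for j in range(n) if i != j)

def subtour_ok(X, tol=1e-9):
    """Subtour elimination: every nonempty proper subset I has cut weight >= 2."""
    n = len(X)
    for size in range(1, n):
        for subset in combinations(range(n), size):
            inside = set(subset)
            cut = sum(X[i][j] for i in inside for j in range(n) if j not in inside)
            if cut < 2 - tol:
                return False
    return True

def is_integral(X):
    return all(value in (0.0, 1.0) for row in X for value in row)

# A Hamiltonian cycle on 6 vertices: feasible for all three programs.
six_cycle = matrix_from(6, [(k, (k + 1) % 6, 1.0) for k in range(6)])

# Two disjoint triangles: an integral 2-factor that violates subtour elimination.
two_triangles = matrix_from(6, [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0),
                                (3, 4, 1.0), (4, 5, 1.0), (5, 3, 1.0)])

# Two half-weight triangles joined by three unit edges: F2F-feasible but fractional.
half_prism = matrix_from(6, [(0, 1, 0.5), (1, 2, 0.5), (2, 0, 0.5),
                             (3, 4, 0.5), (4, 5, 0.5), (5, 3, 0.5),
                             (0, 3, 1.0), (1, 4, 1.0), (2, 5, 1.0)])
```

Both `two_triangles` and `half_prism` satisfy the F2F constraints without being Hamiltonian cycles, which is why exact recovery by the F2F LP on random instances is not automatic.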

Optimality of Fractional 2-Factor LP
Theorem
If $\mu^2 - 4 \log n \to \infty$, then $\hat{X}_{\mathrm{F2F}} = X^*$ with high probability.
Remarks
• The integrality gap is 1 whp!
• Achieving the IT-limit $\mu^2 = 4 \log n$
