Multiplicative Weights Method: Basic Version

Target problem:  Ay ≤ b,  y ≥ 0.
Oracle's relaxed problem:  Ay ≤ ρ b,  y ≥ 0.
Output (the averaged solution):  Ay ≤ (1 + 3ǫ) b,  y ≥ 0.

Number of rounds depends on ρ, ǫ and other specifics of updating u. ρ = width.
How does the proof work?

Scale the RHS to get Ay ≤ 1. Let the solution for iteration t be y(t); assume −ρ ≤ −ℓ ≤ A_i y(t) ≤ ρ.

"Violation" of constraint i:  V_i(y(t)) = A_i y(t) − 1;  recall  u_i(t+1) ≈ u_i(t) e^{ǫ V_i(y(t))/ρ}.

"Average violation":  av(t) = Σ_j (u_j / Σ_i u_i) V_j(y(t)).
On the same side: av(t) ≤ 0 (easier case). For approximation: av(t) ≤ δ.

"Potential" at iteration t:  Φ(t) = Σ_i u_i(t).  Now  Φ(t+1) ≤ Φ(t) e^{ǫ av(t)/ρ}.  Telescopes.

Upper bound:  ln u_i(T) ≤ ln(final fractional wt of i) + ǫ Σ_t av(t)/ρ.
Lower bound (from the update rule):  ln u_i(T) ≥ ǫ Σ_t V_i(y(t))/ρ − 2ǫ² ℓ T/ρ.
Combining:  ǫ Σ_t V_i(y(t))/ρ − 2ǫ² ℓ T/ρ ≤ ln(final fractional wt of i) + ǫ Σ_t av(t)/ρ,  hence  (1/T) Σ_t V_i(y(t)) ≤ · · · ≤ δ.
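To make the loop concrete, here is a minimal Python sketch of the basic version (an illustration, not the slides' exact algorithm; mw_feasibility and oracle are made-up names, and the oracle is assumed to return some y ≥ 0 whose u-weighted average violation is ≤ 0, with width at most ρ):

import numpy as np

def mw_feasibility(A, b, oracle, rho, eps, T):
    # Approximate feasibility of Ay <= b, y >= 0 (RHS scaled as in the proof).
    # oracle(u) is assumed to return y >= 0 with u . (Ay - b) <= 0
    # ("average violation" av(t) <= 0) and |A_i y - b_i| <= rho (width).
    m = A.shape[0]
    u = np.ones(m)                  # one weight per constraint
    ys = []
    for _ in range(T):
        y = oracle(u / u.sum())     # oracle sees normalized weights
        ys.append(y)
        V = A @ y - b               # violations V_i(y(t))
        u *= np.exp(eps * V / rho)  # u_i(t+1) = u_i(t) e^{eps V_i / rho}
    return np.mean(ys, axis=0)      # the running average is the output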
Dantzig Decompositions

A (weighted) running average view (primal space).

[Figure: points produced for the Easy decision problem, whose (weighted) running average moves through an Apx decision problem toward the Hard decision problem.]
Multiplicative Weights: Optimization and Duals

Instead of tracking violations and averaging solutions at the end, consider the process from the perspective of u.

Dual of a hyperplane/constraint? A point in dual space.
Dual of a point? A hyperplane/constraint in dual space.

Primal:  Ay ≤ b,  c^T y ≥ β,  y ≥ 0.   Relaxed:  Ay ≤ ρ b,  c^T y ≥ β.

Suppose we prove [*]: ∃ u s.t. A^T u ≥ c and ρ b^T u < β.
Providing a y corresponds to: we have not yet proved [*].

Think trajectories. Decompositions on the dual. What does y mean then?
Dual:  A^T u ≥ c,  ρ b^T u < β,  u ≥ 0.
So the Dual or the Primal? How do we choose which to start from?
Which set of constraints would you rather solve?

The one with more variables! Many more degrees of freedom. Easier to approximate. Maybe sparse solutions exist.

Rewrite relaxations to introduce freedom!
(b) Application to Min. Correlation Clustering

Exponentially many constraints. How to design an Oracle.
A drag-and-drop application of Graph Sparsification/Sketching!
Correlation Clustering: Motivation

Tutorial in KDD 2014: Bonchi, Garcia-Soriano, Liberty.
Clustering of objects known only through relationships. (Can have wide ranges of edge weights, +ve/-ve.)

Consider an Entity Resolution example. News article 1: Mr Smith is devoted to mountain climbing. ... Mrs Smith is a diver and said that she finds diving to be a sublime experience. ... The goal is to reach new heights, said Smith. Now consider a stream of such articles, with new as well as old entities.

Likely Mr Smith ≠ Mrs Smith: large -ve weight. The other references can be either: small weights depending on context. Weights are not a metric and have a large range.
Correlation Clustering: A Formulation

[Figure: a signed, weighted graph (edge weights such as 1.1, −2, −3, −1, 10, 12, 0.3) partitioned into two clusters C1 and C2.]

Find a grouping that disagrees least with the graph.
◮ Count +ve edges out of clusters. Count -ve edges in clusters.
◮ Use as many clusters as you like.

Alternatively we can find a grouping that agrees most. NP-Hard: Bansal, Blum, Chawla '04. Many approximation algorithms are known, for many variants. Approximation factors were known before; we will not focus on the factor.
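To pin down the objective, a small Python sketch of the disagreement count (the instance and names are made up for illustration, not the figure above):

def disagreement(weights, cluster):
    # A +ve edge pays w if its endpoints are split across clusters;
    # a -ve edge pays |w| if its endpoints share a cluster.
    cost = 0.0
    for (i, j), w in weights.items():
        same = cluster[i] == cluster[j]
        if w > 0 and not same:
            cost += w
        elif w < 0 and same:
            cost += -w
    return cost

w = {(0, 1): 10.0, (0, 2): -2.0, (1, 2): -3.0}
print(disagreement(w, {0: 0, 1: 0, 2: 1}))  # 0.0: this grouping fully agrees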
Global Sparsification: There and back again

Think of a problem on graph cuts.

[Figure: a weighted graph (edge weights such as 0.5, 1, 3, 10, 12, 2, 0.3) with terminals s and t marked.]

Min s-t Cut? Max s-t Cut? Max Cut? NP-Hard; ≥ 0.5-apx uses SDPs.
Sparsification preserves all cuts within (1 ± ǫ).
(a) Does not imply anything about finding specific cuts. Yet.
(b) Does not obviously save space either!
We will see examples of both (a)–(b) and how to overcome them. Let's return to correlation clustering.
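For intuition, a minimal sketch of sampling-based sparsification, assuming the sampling probability p is handed to us (in Karger's uniform scheme p depends on the min cut; Benczúr–Karger replace it with per-edge strengths):

import random

def sparsify_uniform(edges, p, seed=0):
    # Keep each edge independently with probability p and scale its weight
    # by 1/p, so every cut is preserved in expectation. For an unweighted
    # graph with min cut c, p = Theta(log n / (eps^2 c)) preserves all cuts
    # within (1 +/- eps) w.h.p.
    rng = random.Random(seed)
    return [(i, j, w / p) for (i, j, w) in edges if rng.random() < p]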
Min Correlation Clustering

Equivalent to Max-Agreement at optimality. Not in approximation.
x_ij = 1 if in same group, and 0 otherwise. E(+/−) = +/−ve edge sets.

min   Σ_{(i,j) ∈ E(+)} w_ij (1 − x_ij)  +  Σ_{(i,j) ∈ E(−)} |w_ij| x_ij
s.t.  x_ij ≤ 1,  x_ij ≥ 0                         ∀ i, j
      (1 − x_ij) + (1 − x_jk) ≥ (1 − x_ik)        ∀ i, j, k   (triangle constraints)

A linear program. Θ(n³) constraints, Θ(n²) variables.
1-pass lower bound of |E(−)| for any apx via Communication Complexity.
Sparsify E(+), store E(−)? Would have Õ(n) + |E(−)| variables.
Does not work: the triangle constraints need all (n choose 2) variables.
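To see the Θ(n³) blow-up concretely, here is a sketch that builds and solves this LP for a toy instance with scipy.optimize.linprog (the helper name and instance are illustrative; only sensible for small n):

import itertools
import numpy as np
from scipy.optimize import linprog

def correlation_lp(n, weights):
    # weights maps pairs (i, j) with i < j to signed w_ij.
    pairs = list(itertools.combinations(range(n), 2))
    idx = {p: k for k, p in enumerate(pairs)}
    c = np.zeros(len(pairs))
    const = 0.0
    for p, w in weights.items():
        if w > 0:
            const += w           # from w_ij (1 - x_ij)
            c[idx[p]] -= w
        else:
            c[idx[p]] += abs(w)  # |w_ij| x_ij for a -ve edge
    rows, rhs = [], []
    for i, j, k in itertools.combinations(range(n), 3):
        # Three triangle constraints per triple, e.g. x_ij + x_jk - x_ik <= 1.
        for a, b, far in [((i, j), (j, k), (i, k)),
                          ((i, j), (i, k), (j, k)),
                          ((i, k), (j, k), (i, j))]:
            row = np.zeros(len(pairs))
            row[idx[a]] = row[idx[b]] = 1.0
            row[idx[far]] = -1.0
            rows.append(row)
            rhs.append(1.0)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=(0, 1))
    return res.fun + const, res.x  # LP value and fractional x_ij

val, x = correlation_lp(4, {(0, 1): 10.0, (2, 3): 5.0, (0, 2): -4.0})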
Min Correlation Clustering

Equivalent to Max-Agreement at optimality. Not in approximation.
x_ij = 1 if in same group, and 0 otherwise. E(+/−) = +/−ve edge sets.
Set y_ij = 1 − x_ij for +ve edges, z_ij = x_ij for -ve edges.

min   Σ_{(i,j) ∈ E(+)} w_ij y_ij  +  Σ_{(i,j) ∈ E(−)} |w_ij| z_ij
s.t.  y_ij, z_ij ≥ 0                          ∀ (i,j) ∈ E
      Σ_{(u,v) ∈ P(ij)} y_uv + z_ij ≥ 1       ∀ i, j, and i-j path P(ij)

[Figure: a −ve edge (i, j) of weight |w_ij| closed by a path of +ve edges.]

Sparsify E(+). Store E(−). Θ(n²) → Õ(n) + |E(−)| variables.
Θ(n³) constraints → exponentially many constraints!
Solve LP (ellipsoid) & Ball Growing: Garg, Vazirani, Yannakakis '93.
MWM on the dual: Õ(n + |E(−)|) space and Õ(n²) time.
Round the infeasible primal (the running average). Success → done.
Failure → violated constraint(s) → the point needed for MWM on the dual.
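The "failure → violated constraint(s)" step can be sketched as a shortest-path separation oracle: the constraint for a -ve edge (i, j) is violated exactly when the shortest i-j path under lengths y (over +ve edges) is shorter than 1 − z_ij. A hand-rolled sketch, with illustrative names and dict-keyed y, z:

import heapq

def violated_path_constraints(n, pos_edges, neg_edges, y, z, tol=1e-9):
    adj = {v: [] for v in range(n)}
    for (u, v) in pos_edges:
        adj[u].append((v, y[(u, v)]))
        adj[v].append((u, y[(u, v)]))

    def dijkstra(src):
        dist = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist

    violations = []
    for (i, j) in neg_edges:
        d = dijkstra(i).get(j, float("inf"))
        if d + z[(i, j)] < 1.0 - tol:   # sum_P y + z_ij < 1: violated
            violations.append((i, j, d))
    return violations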
Algorithm in a Picture?

[Figure: pipeline: Reformulation → Duality → Graph Sparsification → Duality.]