Improved Convergence for ℓ∞ and ℓ1 Regression via Iteratively Reweighted Least Squares
Alina Ene, Adrian Vladu
IRLS Method

Basic primitive:
    min Σ r_i x_i²  s.t.  Ax = b
solution given by one linear system solve:
    x = R⁻¹Aᵀ(AR⁻¹Aᵀ)⁻¹b, where R = diag(r)

"Hard" problem:
    min |x|_p  s.t.  Ax = b,  p ∈ {1, ∞}
equivalent to linear programming
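The basic primitive can be sketched in a few lines. This is a hypothetical illustration (the function name, matrix shapes, and variable names are our own, not the authors' code), assuming A has full row rank:

```python
import numpy as np

def weighted_least_squares(A, b, r):
    """Solve min sum_i r_i * x_i^2  subject to  A x = b.

    By Lagrangian duality the minimizer is
        x = R^{-1} A^T (A R^{-1} A^T)^{-1} b,  R = diag(r),
    i.e. one linear system solve per call.
    """
    R_inv = 1.0 / r                # diagonal of R^{-1}
    M = (A * R_inv) @ A.T          # A R^{-1} A^T (scales columns of A by 1/r_i)
    y = np.linalg.solve(M, b)      # the single linear system solve
    return R_inv * (A.T @ y)
```

One call costs a single n×n solve, which is what makes counting "linear system solves" the natural complexity measure in the rest of the talk.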
Benchmark: Optimization on Graphs

    min |x|_∞  s.t.  Ax = b
minimize congestion of flow x
boundary condition: x routes demand from s to t
➜ Maximum flow
[figure: s–t graph; the routed demand splits so that each used edge carries flow .5, achieving congestion .5]
Benchmark: Optimization on Graphs

    min |x|_1  s.t.  Ax = b
minimize cost of flow x
boundary condition: x routes demand from +1 to -1
➜ Minimum cost flow
[figure: graph with demands +1/−1; a unit of flow is routed along a shortest path, with flow 1 on its edges and 0 elsewhere]
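Both flow problems fit the min |x|_p, Ax = b template with A the graph's edge–vertex incidence matrix. A minimal sketch on a hypothetical 3-node graph (the node and edge choices are ours, not from the slides):

```python
import numpy as np

# Edge-vertex incidence matrix of a toy graph
# (nodes s, v, t; edges s->v, v->t, s->t).
# Column j has +1 at the tail and -1 at the head of edge j.
A = np.array([
    [ 1.0,  0.0,  1.0],   # s
    [-1.0,  1.0,  0.0],   # v
    [ 0.0, -1.0, -1.0],   # t
])

# Routing one unit of demand from s to t: A x = b with
b = np.array([1.0, 0.0, -1.0])

# A feasible flow: half a unit along each of the two s-t routes.
x = np.array([0.5, 0.5, 0.5])
assert np.allclose(A @ x, b)

# min |x|_inf over {Ax = b} is the min-congestion (max-flow) problem;
# min |x|_1 over the same set is min cost flow with unit costs.
print(np.max(np.abs(x)), np.sum(np.abs(x)))  # congestion 0.5, cost 1.5
```

This feasible flow happens to be the congestion-optimal one here: any single-path routing would put a full unit on some edge.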
Benchmark: Optimization on Graphs

    min |x|_∞, Ax = b (max flow)        min |x|_1, Ax = b (min cost flow)

Q: Are these problems really that hard?

First order methods (gradient descent)
➜ running time strongly depends on matrix structure
➜ in general, takes time at least Ω(m^{1.5}/poly(ε))

Second order methods (Newton's method, IRLS)
➜ interior point method: Õ(m^{1/2}) linear system solves
➜ can be made Õ(n^{1/2}) with a lot of work [Lee-Sidford '14]

"Hybrid" method
➜ [Christiano-Kelner-Madry-Spielman-Teng, STOC '11]: Õ(m^{1/3}/ε^{11/3}) linear system solves
➜ ~30 pages of description and proofs for a complicated method
This work

Natural IRLS method runs in Õ(m^{1/3}/ε^{2/3} + 1/ε²) iterations*
* no matter what the structure of the underlying matrix is

    min |x|_∞  s.t.  Ax = b

Algorithm (illustrated on an s–t flow instance):
➜ Guess OPT value (.5 in the example)
➜ Initialize r = 1
➜ Solve least squares problem: min Σ r_i x_i²  s.t.  Ax = b
➜ Update r: r_i ← r_i · max{(x_i/OPT)², 1}
➜ Repeat the last two steps

[figure: successive iterations on the example graph; the weight of the overloaded edge grows (1 → 1.44 → 1.75 → 2) while the flow values move from .4/.6/.2 toward the optimal congestion-.5 solution]
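The loop above can be sketched directly. This is a hedged illustration assuming a full-row-rank constraint matrix; `irls_linf` and its parameters are our own naming, and this plain update omits the refinements (e.g. the handling of wide iterates) behind the stated iteration bound:

```python
import numpy as np

def irls_linf(A, b, opt_guess, num_iters=100):
    """IRLS sketch for min |x|_inf s.t. A x = b, following the slide's update."""
    m = A.shape[1]
    r = np.ones(m)                            # Initialize r = 1
    for _ in range(num_iters):
        # Solve the least-squares primitive: min sum r_i x_i^2, A x = b
        R_inv = 1.0 / r
        y = np.linalg.solve((A * R_inv) @ A.T, b)
        x = R_inv * (A.T @ y)
        # Update r: r_i <- r_i * max{(x_i / OPT)^2, 1}
        r *= np.maximum((x / opt_guess) ** 2, 1.0)
    return x
```

Intuition: entries exceeding the guessed OPT get their weights multiplicatively increased, which pushes the next least-squares solution away from them; on a small flow instance with opt_guess set to the true value, the iterates approach the min-congestion flow.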
Nonstandard Optimization Primitive

➜ Objective function is max_{r≥0} min_{Ax=b} Σ r_i x_i² / Σ r_i
➜ Similar analysis to packing/covering LP [Young '01]
➜ ℓ1 version is a type of "slime mold dynamics" [Straszak-Vishnoi '16, '17]
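For any fixed r ≥ 0 the inner minimum is at most OPT² (plug the ℓ∞-optimal point into Σ r_i x_i² / Σ r_i), so every weight vector certifies a lower bound √value ≤ OPT, and maximizing over r closes the gap. A quick numerical check, using the closed form for the inner minimum (a sketch under our own naming):

```python
import numpy as np

def saddle_value(A, b, r):
    """min_{Ax=b} sum r_i x_i^2 / sum r_i, via the closed form
       min sum r_i x_i^2 = b^T (A R^{-1} A^T)^{-1} b with R = diag(r)."""
    y = np.linalg.solve((A * (1.0 / r)) @ A.T, b)
    return (b @ y) / np.sum(r)
```

On a toy system where the ℓ∞ optimum is 0.5, the value stays below 0.25 for every r tried, and a suitably balanced r attains it; this is how the IRLS weights double as an optimality certificate.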