Improved Convergence for ℓ∞ and ℓ1 Regression via Iteratively Reweighted Least Squares
Alina Ene, Adrian Vladu
IRLS Method

Basic primitive:
    min Σ r_i x_i²  s.t.  Ax = b
solution given by one linear system solve:
    x = R⁻¹Aᵀ(AR⁻¹Aᵀ)⁻¹b, where R = diag(r)

"Hard" problem:
    min |x|_p  s.t.  Ax = b,  p ∈ {1, ∞}
equivalent to linear programming
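The basic primitive can be sketched in a few lines. This is a hypothetical illustration (the function name, matrix shapes, and variable names are our own, not the authors' code), assuming A has full row rank:

```python
import numpy as np

def weighted_least_squares(A, b, r):
    """Solve min sum_i r_i * x_i^2  subject to  A x = b.

    By Lagrangian duality the minimizer is
        x = R^{-1} A^T (A R^{-1} A^T)^{-1} b,  R = diag(r),
    i.e. one linear system solve per call.
    """
    R_inv = 1.0 / r                # diagonal of R^{-1}
    M = (A * R_inv) @ A.T          # A R^{-1} A^T (scales columns of A by 1/r_i)
    y = np.linalg.solve(M, b)      # the single linear system solve
    return R_inv * (A.T @ y)
```

One call costs a single n×n solve, which is what makes counting "linear system solves" the natural complexity measure in the rest of the talk.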
Benchmark: Optimization on Graphs

    min |x|_∞  s.t.  Ax = b
minimize congestion of flow x
boundary condition: x routes demand from s to t
➜ Maximum flow
[figure: s–t graph; the routed demand splits so that each used edge carries flow .5, achieving congestion .5]
Benchmark: Optimization on Graphs

    min |x|_1  s.t.  Ax = b
minimize cost of flow x
boundary condition: x routes demand from +1 to -1
➜ Minimum cost flow
[figure: graph with demands +1/−1; a unit of flow is routed along a shortest path, with flow 1 on its edges and 0 elsewhere]
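Both flow problems fit the min |x|_p, Ax = b template with A the graph's edge–vertex incidence matrix. A minimal sketch on a hypothetical 3-node graph (the node and edge choices are ours, not from the slides):

```python
import numpy as np

# Edge-vertex incidence matrix of a toy graph
# (nodes s, v, t; edges s->v, v->t, s->t).
# Column j has +1 at the tail and -1 at the head of edge j.
A = np.array([
    [ 1.0,  0.0,  1.0],   # s
    [-1.0,  1.0,  0.0],   # v
    [ 0.0, -1.0, -1.0],   # t
])

# Routing one unit of demand from s to t: A x = b with
b = np.array([1.0, 0.0, -1.0])

# A feasible flow: half a unit along each of the two s-t routes.
x = np.array([0.5, 0.5, 0.5])
assert np.allclose(A @ x, b)

# min |x|_inf over {Ax = b} is the min-congestion (max-flow) problem;
# min |x|_1 over the same set is min cost flow with unit costs.
print(np.max(np.abs(x)), np.sum(np.abs(x)))  # congestion 0.5, cost 1.5
```

This feasible flow happens to be the congestion-optimal one here: any single-path routing would put a full unit on some edge.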
Benchmark: Optimization on Graphs

    min |x|_∞, Ax = b (max flow)        min |x|_1, Ax = b (min cost flow)

Q: Are these problems really that hard?

First order methods (gradient descent)
➜ running time strongly depends on matrix structure
➜ in general, takes time at least Ω(m^{1.5}/poly(ε))

Second order methods (Newton's method, IRLS)
➜ interior point method: Õ(m^{1/2}) linear system solves
➜ can be made Õ(n^{1/2}) with a lot of work [Lee-Sidford '14]

"Hybrid" method
➜ [Christiano-Kelner-Madry-Spielman-Teng, STOC '11]: Õ(m^{1/3}/ε^{11/3}) linear system solves
➜ ~30 pages of description and proofs for a complicated method
This work

Natural IRLS method runs in Õ(m^{1/3}/ε^{2/3} + 1/ε²) iterations*
* no matter what the structure of the underlying matrix is

    min |x|_∞  s.t.  Ax = b

Algorithm (illustrated on an s–t flow instance):
➜ Guess OPT value (.5 in the example)
➜ Initialize r = 1
➜ Solve least squares problem: min Σ r_i x_i²  s.t.  Ax = b
➜ Update r: r_i ← r_i · max{(x_i/OPT)², 1}
➜ Repeat the last two steps

[figure: successive iterations on the example graph; the weight of the overloaded edge grows (1 → 1.44 → 1.75 → 2) while the flow values move from .4/.6/.2 toward the optimal congestion-.5 solution]
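The loop above can be sketched directly. This is a hedged illustration assuming a full-row-rank constraint matrix; `irls_linf` and its parameters are our own naming, and this plain update omits the refinements (e.g. the handling of wide iterates) behind the stated iteration bound:

```python
import numpy as np

def irls_linf(A, b, opt_guess, num_iters=100):
    """IRLS sketch for min |x|_inf s.t. A x = b, following the slide's update."""
    m = A.shape[1]
    r = np.ones(m)                            # Initialize r = 1
    for _ in range(num_iters):
        # Solve the least-squares primitive: min sum r_i x_i^2, A x = b
        R_inv = 1.0 / r
        y = np.linalg.solve((A * R_inv) @ A.T, b)
        x = R_inv * (A.T @ y)
        # Update r: r_i <- r_i * max{(x_i / OPT)^2, 1}
        r *= np.maximum((x / opt_guess) ** 2, 1.0)
    return x
```

Intuition: entries exceeding the guessed OPT get their weights multiplicatively increased, which pushes the next least-squares solution away from them; on a small flow instance with opt_guess set to the true value, the iterates approach the min-congestion flow.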
Nonstandard Optimization Primitive

➜ Objective function is max_{r≥0} min_{Ax=b} Σ r_i x_i² / Σ r_i
➜ Similar analysis to packing/covering LP [Young '01]
➜ ℓ1 version is a type of "slime mold dynamics" [Straszak-Vishnoi '16, '17]
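For any fixed r ≥ 0 the inner minimum is at most OPT² (plug the ℓ∞-optimal point into Σ r_i x_i² / Σ r_i), so every weight vector certifies a lower bound √value ≤ OPT, and maximizing over r closes the gap. A quick numerical check, using the closed form for the inner minimum (a sketch under our own naming):

```python
import numpy as np

def saddle_value(A, b, r):
    """min_{Ax=b} sum r_i x_i^2 / sum r_i, via the closed form
       min sum r_i x_i^2 = b^T (A R^{-1} A^T)^{-1} b with R = diag(r)."""
    y = np.linalg.solve((A * (1.0 / r)) @ A.T, b)
    return (b @ y) / np.sum(r)
```

On a toy system where the ℓ∞ optimum is 0.5, the value stays below 0.25 for every r tried, and a suitably balanced r attains it; this is how the IRLS weights double as an optimality certificate.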