Randomized sparse Kaczmarz methods Dirk Lorenz, joint with Frank - PowerPoint PPT Presentation

Randomized sparse Kaczmarz methods
Dirk Lorenz, joint with Frank Schöpfer
Feb 9, 2018
Inverse Problems and Machine Learning, Caltech 2018

The Kaczmarz method
Randomization
Sparsity
Split feasibility problems
Convergence rates
Feb 9, 2018, Dirk Lorenz, Randomized sparse Kaczmarz methods

Just solving systems of linear equations
Solve $Ax = b$ with $A$ pretty arbitrary (but the system consistent), $m$ rows, $n$ columns.
Solve only one row $\langle a_i, x\rangle = b_i$ at a time by projecting onto its hyperplane of solutions:
$$x^{k+1} = x^k - \frac{\langle x^k, a_i\rangle - b_i}{\|a_i\|^2}\, a_i$$
Each projection needs just $O(n)$ operations.
One pass through all rows costs about as much as applying $A$ once.
Stefan Kaczmarz [1937]: convergent to some solution for every consistent system.
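A minimal NumPy sketch of the cyclic sweep described above, assuming a dense consistent system; the function name `kaczmarz_cyclic` and the random test system are illustrative, not taken from the talk. Each inner iteration is exactly one projection, so one sweep over all $m$ rows costs $O(mn)$, the same order as one product with $A$.

```python
import numpy as np

def kaczmarz_cyclic(A, b, x0=None, sweeps=100):
    """Cyclic Kaczmarz: project onto the hyperplanes <a_i, x> = b_i in the order i = 0, ..., m-1."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)  # copy, so x0 is not modified
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # ||a_i||^2 for every row, computed once
    for _ in range(sweeps):
        for i in range(m):
            # O(n) work per projection: one inner product and one scaled vector update
            residual = A[i] @ x - b[i]
            x -= (residual / row_norms_sq[i]) * A[i]
    return x

# Usage on a hypothetical consistent 20 x 5 system
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x = kaczmarz_cyclic(A, b, sweeps=200)
print("error:", np.linalg.norm(x - x_true))
```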

Learning with Kaczmarz
Unknown distribution $\rho$ on $X \times Y = \mathbb{R}^d \times \mathbb{R}$, regression function $f_\rho(a) = \int_Y y \, d\rho(y \mid a)$.
Hypothesis space $\mathcal{H} = \{ f_x \in L^2_{\rho_X} : x \in \mathbb{R}^d \}$, $f_x(a) = \langle a, x\rangle$.
Learning: obtain samples $a \in X$, $b \in Y$ sequentially and try to learn $x$.
Kaczmarz: update $x^k$ by $x^{k+1} = x^k - \frac{\langle x^k, a\rangle - b}{\|a\|^2}\, a$.
Goal: show that $x^k$ converges to some $x^*$ such that
$$f_{x^*} = \operatorname{argmin}_{f \in \mathcal{H}} \mathcal{E}(f) = \operatorname{argmin}_{f \in \mathcal{H}} \int_{X \times Y} (b - f(a))^2 \, d\rho$$
[Lin, Zhou 2015]
Here: focus on Kaczmarz as an algorithm for solving systems.
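To make the sample-by-sample setting concrete, here is a hedged sketch that feeds a stream of pairs $(a, b)$ through the same update. The data source is a hypothetical assumption: $a \sim N(0, I_d)$ with noise-free labels $b = \langle a, x_{\text{true}}\rangle$, so the stream behaves like a consistent system and the iterates approach $x_{\text{true}}$. With noisy labels the plain update does not converge but keeps fluctuating; this sketch sidesteps that case.

```python
import numpy as np

def kaczmarz_online(samples, d):
    """Process (a, b) pairs one at a time with the update x <- x - (<x, a> - b) / ||a||^2 * a."""
    x = np.zeros(d)
    for a, b in samples:
        x -= ((x @ a - b) / (a @ a)) * a
    return x

def synthetic_samples(x_true, num_samples, rng):
    """Hypothetical data source: a ~ N(0, I_d) and noise-free labels b = <a, x_true>."""
    for _ in range(num_samples):
        a = rng.standard_normal(x_true.size)
        yield a, a @ x_true

rng = np.random.default_rng(1)
x_true = rng.standard_normal(3)
x = kaczmarz_online(synthetic_samples(x_true, 500, rng), d=3)
print("error:", np.linalg.norm(x - x_true))
```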

Convergence speed? $m = 6$ rows, $n = 2$ columns
[Figure: two panels, "linear order" and "random order"; both axes from −1 to 1]

Convergence speed? $m = 12$ rows, $n = 2$ columns
[Figure: two panels, "linear order" and "random order"; both axes from −1 to 1]
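The figure cannot be recovered here, but the effect it illustrates can be reproduced in a small hedged experiment. The row set is an assumption, not the one in the plot: $m = 12$ unit normals spaced $15^\circ$ apart, visited either in linear order (consecutive rows nearly parallel) or in uniformly random order. The helper name `kaczmarz_error_after` is hypothetical.

```python
import numpy as np

def kaczmarz_error_after(A, b, x_hat, order):
    """Run Kaczmarz from x^0 = 0 with the given row order; return the final error ||x - x_hat||."""
    x = np.zeros(A.shape[1])
    for i in order:
        a = A[i]
        x -= ((a @ x - b[i]) / (a @ a)) * a
    return np.linalg.norm(x - x_hat)

m = 12
angles = np.pi * np.arange(m) / m                      # hypothetical rows: unit normals 15 degrees apart
A = np.column_stack([np.cos(angles), np.sin(angles)])
x_hat = np.array([1.0, 0.5])
b = A @ x_hat                                          # consistent 12 x 2 system
steps = 60
cyclic_err = kaczmarz_error_after(A, b, x_hat, [k % m for k in range(steps)])
random_err = kaczmarz_error_after(A, b, x_hat, np.random.default_rng(0).integers(0, m, size=steps))
print(f"linear order: {cyclic_err:.2e}, random order: {random_err:.2e}")
```

In linear order every projection after the first shrinks the error by only $\cos 15^\circ \approx 0.97$, while a randomly chosen next row is on average much less aligned with the previous one.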

Btw: randomized Kaczmarz is stochastic gradient descent for $\frac12\sum_i (\langle a_i, x\rangle - b_i)^2$, with row $i$ sampled with probability $p_i = \|a_i\|^2/\|A\|_F^2$, gradient estimate $(\langle a_i, x\rangle - b_i)\,a_i/p_i$ and step size $1/\|A\|_F^2$.
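The SGD remark can be checked numerically. This sketch assumes the objective $F(x) = \frac12\sum_j (\langle a_j, x\rangle - b_j)^2$, row $i$ drawn with probability $p_i = \|a_i\|^2/\|A\|_F^2$, the importance-weighted gradient $(\langle a_i, x\rangle - b_i)\,a_i/p_i$ and step size $1/\|A\|_F^2$; the helper names are hypothetical. Under these assumptions the SGD step and the Kaczmarz step coincide for every row.

```python
import numpy as np

def kaczmarz_step(A, b, x, i):
    """One Kaczmarz projection onto the hyperplane <a_i, x> = b_i."""
    a = A[i]
    return x - ((a @ x - b[i]) / (a @ a)) * a

def sgd_step(A, b, x, i):
    """One SGD step on F(x) = 0.5 * sum_j (<a_j, x> - b_j)^2, assuming row i was drawn w.p. ||a_i||^2 / ||A||_F^2."""
    fro_sq = np.sum(A * A)                            # ||A||_F^2
    p_i = (A[i] @ A[i]) / fro_sq
    grad_estimate = (A[i] @ x - b[i]) * A[i] / p_i    # unbiased: sum_i p_i * (this) = A^T (A x - b)
    return x - (1.0 / fro_sq) * grad_estimate         # step size 1 / ||A||_F^2

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
x = rng.standard_normal(4)
for i in range(A.shape[0]):
    print(i, np.allclose(kaczmarz_step(A, b, x, i), sgd_step(A, b, x, i)))
```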

Randomization leads to linear convergence
In each iteration, choose index $i$ with probability $p_i$. If $\hat x$ solves the system (i.e. $\langle \hat x, a_i\rangle = b_i$ for all $i$), then
$$\|x^{k+1} - \hat x\|^2 = \|x^k - \hat x\|^2 - \frac{\langle x^k - \hat x, a_i\rangle^2}{\|a_i\|^2}.$$
Taking the expectation over the choice of $i$ (with $x^k$ fixed) gives
$$\mathbb{E}\big(\|x^{k+1} - \hat x\|^2\big) = \|x^k - \hat x\|^2 - \sum_i p_i \frac{\langle x^k - \hat x, a_i\rangle^2}{\|a_i\|^2} = \|x^k - \hat x\|^2 - \langle A(x^k - \hat x), DA(x^k - \hat x)\rangle$$
with $D = \operatorname{diag}(p_i/\|a_i\|^2)$. Since $\langle Ae, DAe\rangle = \langle e, A^T D A e\rangle \ge \lambda_{\min}(A^T D A)\,\|e\|^2$, this gives the uniform improvement
$$\mathbb{E}\big(\|x^{k+1} - \hat x\|^2\big) \le (1 - \lambda)\,\|x^k - \hat x\|^2, \quad \lambda = \lambda_{\min}(A^T D A).$$
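The expectation identity can be verified exactly on a small example by averaging the squared error over all $m$ possible row choices. A hedged sketch; the probabilities `p`, the random test system and the helper names are assumptions for illustration. Since $\operatorname{tr}(A^T D A) = \sum_i p_i = 1$, the rate also satisfies $\lambda \le 1/n$.

```python
import numpy as np

def kaczmarz_step(A, b, x, i):
    a = A[i]
    return x - ((a @ x - b[i]) / (a @ a)) * a

def expected_sq_error(A, b, x, x_hat, p):
    """Exact E ||x^{k+1} - x_hat||^2: average over all m row choices, weighted by p_i."""
    return sum(p[i] * np.sum((kaczmarz_step(A, b, x, i) - x_hat) ** 2) for i in range(A.shape[0]))

def formula_sq_error(A, x, x_hat, p):
    """Right-hand side ||e||^2 - <A e, D A e> with e = x - x_hat and D = diag(p_i / ||a_i||^2)."""
    e = x - x_hat
    d = p / np.einsum("ij,ij->i", A, A)   # diagonal of D
    Ae = A @ e
    return e @ e - Ae @ (d * Ae)

def contraction_lambda(A, p):
    """lambda = lambda_min(A^T D A); eigvalsh returns eigenvalues in ascending order."""
    d = p / np.einsum("ij,ij->i", A, A)
    return np.linalg.eigvalsh(A.T @ (d[:, None] * A))[0]

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))
x_hat = rng.standard_normal(3)
b = A @ x_hat                  # consistent system with solution x_hat
p = rng.random(8)
p /= p.sum()                   # hypothetical sampling probabilities
x = rng.standard_normal(3)     # current iterate x^k
lam = contraction_lambda(A, p)
print(expected_sq_error(A, b, x, x_hat, p), formula_sq_error(A, x, x_hat, p),
      (1 - lam) * np.sum((x - x_hat) ** 2))
```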

Theorem. Let $A \in \mathbb{R}^{m \times n}$, $m \ge n$, have full rank and $A\hat x = b$. Then the iterates of randomized Kaczmarz fulfill
$$\mathbb{E}\big(\|x^k - \hat x\|^2\big) \le (1 - \lambda)^k \|x^0 - \hat x\|^2$$
with $\lambda = \lambda_{\min}(A^T D A)$, $D = \operatorname{diag}(p_i/\|a_i\|^2)$.
Result due to [Strohmer, Vershynin 2009].
The choice $p_i = \|a_i\|^2/\|A\|_F^2$ gives $D = \|A\|_F^{-2} I$, i.e.
$$\lambda = \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2} = \frac{\sigma_{\min}(A)^2}{\|A\|_F^2} =: \kappa(A).$$
Experimentally, this $p$ is not optimal: other choices of $p$ give larger $\lambda$.
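A sketch of randomized Kaczmarz with the sampling $p_i = \|a_i\|^2/\|A\|_F^2$ from the theorem, together with the predicted rate $\kappa(A)$. The function names and the test matrix (Gaussian rows rescaled to different norms) are assumptions; with this $p$, $\lambda_{\min}(A^T D A)$ should agree with $\sigma_{\min}(A)^2/\|A\|_F^2$ up to rounding.

```python
import numpy as np

def randomized_kaczmarz(A, b, num_iters, rng, x0=None):
    """Randomized Kaczmarz with row i drawn w.p. p_i = ||a_i||^2 / ||A||_F^2."""
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    p = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for i in rng.choice(m, size=num_iters, p=p):   # draw all row indices up front
        x -= ((A[i] @ x - b[i]) / row_norms_sq[i]) * A[i]
    return x

def kappa(A):
    """kappa(A) = sigma_min(A)^2 / ||A||_F^2, the rate lambda for p_i = ||a_i||^2 / ||A||_F^2."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
    return s[-1] ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 4)) * rng.uniform(0.5, 2.0, size=(30, 1))  # rows of varying norm
x_hat = rng.standard_normal(4)
b = A @ x_hat
x = randomized_kaczmarz(A, b, num_iters=2000, rng=rng)
print("kappa:", kappa(A), "error:", np.linalg.norm(x - x_hat))
```

The bound suggests roughly $\log(1/\varepsilon)/\kappa(A)$ iterations to reduce the expected squared error by a factor $\varepsilon$.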

Underdetermined systems
Consider $Ax = b$, underdetermined but consistent. Which solution does Kaczmarz pick?
Initialization $x^0 = 0$ (or $x^0 \in \operatorname{rg} A^T$): then all iterates satisfy $x^k \in \operatorname{rg} A^T$, since each update adds a multiple of a row $a_i$.
Assume $\hat x$ is the solution in $\operatorname{rg} A^T$.
Let $Z \in \mathbb{R}^{n \times m}$ have columns forming an ONB of $\operatorname{rg} A^T$ (assuming $A$ has full row rank $m$); then $x^k = ZZ^T x^k$ and $ZZ^T \hat x = \hat x$.
As above:
$$\mathbb{E}\big(\|x^k - \hat x\|^2\big) \le (1 - \lambda)^k \|x^0 - \hat x\|^2, \quad \lambda = \lambda_{\min}(Z^T A^T D A Z), \quad D = \operatorname{diag}(p_i/\|a_i\|^2).$$
Convergence to the minimum-norm solution $\hat x$.
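A hedged numerical check of the underdetermined case, assuming a random $5 \times 12$ matrix with full row rank (so every right-hand side is consistent); the function name and sizes are illustrative. Starting from $x^0 = 0$, the iterate should match `np.linalg.pinv(A) @ b`, the minimum-norm solution, and adding a unit null-space direction only increases the norm, because $\operatorname{rg} A^T$ is orthogonal to $\ker A$.

```python
import numpy as np

def randomized_kaczmarz(A, b, num_iters, rng):
    """Randomized Kaczmarz from x^0 = 0 with p_i = ||a_i||^2 / ||A||_F^2; iterates stay in rg A^T."""
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    x = np.zeros(n)
    for i in rng.choice(m, size=num_iters, p=row_norms_sq / row_norms_sq.sum()):
        x -= ((A[i] @ x - b[i]) / row_norms_sq[i]) * A[i]   # adds a multiple of the row a_i
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 12))     # hypothetical underdetermined system, full row rank
b = rng.standard_normal(5)           # consistent, since A has full row rank
x = randomized_kaczmarz(A, b, num_iters=3000, rng=rng)
x_min_norm = np.linalg.pinv(A) @ b   # minimum-norm solution
null_dir = np.linalg.svd(A)[2][-1]   # last right singular vector: a unit vector in ker A
print(np.linalg.norm(x - x_min_norm), np.linalg.norm(x), np.linalg.norm(x + null_dir))
```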

Kaczmarz converging to sparse solutions?
Kaczmarz converges to the (unique) solution in $x^0 + \operatorname{rg} A^T$ (if the system is consistent).
For $x^0 = 0$ (or $x^0 \in \operatorname{rg} A^T$) this is the solution with minimal $\|x\|_2$.
Convergence to other solutions? (e.g. the one with minimal $\|x\|_1$)
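To see why the question arises, this sketch (hypothetical sizes, support and values) builds an underdetermined system with a 3-sparse solution and runs plain cyclic Kaczmarz from $x^0 = 0$. The limit is the minimum-2-norm solution, which in such an example is generically dense even though a sparse solution exists; the printed 1-norms allow a direct comparison.

```python
import numpy as np

def kaczmarz_cyclic(A, b, sweeps):
    """Cyclic Kaczmarz from x^0 = 0, so the limit is the minimum-2-norm solution."""
    x = np.zeros(A.shape[1])
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            x -= ((A[i] @ x - b[i]) / row_norms_sq[i]) * A[i]
    return x

rng = np.random.default_rng(8)
A = rng.standard_normal((10, 30))          # hypothetical underdetermined system
x_sparse = np.zeros(30)
x_sparse[[3, 11, 24]] = [1.0, -2.0, 0.5]   # a 3-sparse solution
b = A @ x_sparse
x = kaczmarz_cyclic(A, b, sweeps=500)
print("nonzeros in Kaczmarz limit:", np.count_nonzero(np.abs(x) > 1e-6))
print("1-norms (Kaczmarz, sparse):", np.abs(x).sum(), np.abs(x_sparse).sum())
```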
