Randomized sparse Kaczmarz methods

  1. Randomized sparse Kaczmarz methods. Dirk Lorenz, joint work with Frank Schöpfer. Feb 9, 2018, Inverse Problems and Machine Learning, Caltech 2018.

  2. Outline: The Kaczmarz method; Randomization; Sparsity; Split feasibility problems; Convergence rates. (Feb 9, 2018, Dirk Lorenz, Randomized sparse Kaczmarz methods, page 2 of 30)

  7. Just solving systems of linear equations. $Ax = b$, pretty arbitrary (but consistent), $m$ rows, $n$ columns, where $a_i$ denotes the $i$-th row of $A$. Solve only one row $\langle a_i, x\rangle = b_i$ by projecting onto the hyperplane of solutions: $x_{k+1} = x_k - \frac{\langle x_k, a_i\rangle - b_i}{\|a_i\|^2}\, a_i$. Each projection needs just $O(n)$ operations. The cost of one pass through all rows is the same as applying $A$. Stefan Kaczmarz [1937]: convergent to some solution for all consistent systems.
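The projection step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function names `kaczmarz_step` and `kaczmarz_cyclic` and the small 3x2 test system are assumptions made for the example, and rows are assumed to be nonzero.

```python
def kaczmarz_step(x, a, b_i):
    """Project x onto the hyperplane {z : <a, z> = b_i} (a must be nonzero)."""
    residual = sum(aj * xj for aj, xj in zip(a, x)) - b_i
    norm_sq = sum(aj * aj for aj in a)
    # O(n) work: one inner product and one scaled vector update.
    return [xj - residual / norm_sq * aj for aj, xj in zip(a, x)]


def kaczmarz_cyclic(A, b, x0, sweeps):
    """Run `sweeps` passes over the rows of A in their given (linear) order."""
    x = list(x0)
    for _ in range(sweeps):
        for a, b_i in zip(A, b):
            x = kaczmarz_step(x, a, b_i)
    return x


# Hypothetical consistent system (m = 3 rows, n = 2 columns) with solution (1, 2).
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [1.0, 3.0, 2.0]
x = kaczmarz_cyclic(A, b, [0.0, 0.0], sweeps=50)
```

For this particular system the iterate reaches the solution (1, 2) within the first few sweeps; in general, a pass over all rows only brings the iterate closer to a solution.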

  8. Learning with Kaczmarz. Unknown distribution $\rho$ on $X \times Y = \mathbb{R}^d \times \mathbb{R}$, regression function $f_\rho(a) = \int y \, d\rho(y \mid a)$. Hypothesis space $\mathcal{H} = \{ f_x \in L^2_{\rho_X},\ x \in \mathbb{R}^d \}$, $f_x(a) = \langle a, x\rangle$. Learning: obtain samples $a \in X'$, $b \in Y$ sequentially and try to learn $x$. Kaczmarz: update $x_k$ by $x_{k+1} = x_k - \frac{\langle x_k, a\rangle - b}{\|a\|^2}\, a$. Goal: show that $x_k$ converges to some $x^*$ such that $f_{x^*} = \operatorname{argmin}_{f \in \mathcal{H}} \mathcal{E}(f) = \operatorname{argmin}_{f \in \mathcal{H}} \int_{X \times Y} (b - f(a))^2 \, d\rho$ [Lin, Zhou 2015]. Here the focus is on Kaczmarz as an algorithm for solving systems.

  9. Convergence speed? $m = 6$ rows, $n = 2$ columns. [Figure with two panels, "linear order" and "random order"; both axes range from −1 to 1.]

  10. Convergence speed? $m = 12$ rows, $n = 2$ columns. [Figure with two panels, "linear order" and "random order"; both axes range from −1 to 1.]

  11. Convergence speed? $m$ rows, $n = 2$ columns. [Figure with two panels, "linear order" and "random order"; both axes range from −1 to 1.] Btw: randomized Kaczmarz is stochastic gradient descent for $\sum_i (\langle a_i, x\rangle - b_i)^2$.
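The remark about stochastic gradient descent can be checked with a short calculation. The step size $t_i$ below is not stated on the slide; it is the choice that makes the two updates coincide.

```latex
f_i(x) = (\langle a_i, x\rangle - b_i)^2,
\qquad
\nabla f_i(x) = 2\,(\langle a_i, x\rangle - b_i)\, a_i .

% SGD step on the sampled term f_i with step size t_i = 1/(2\|a_i\|^2):
x_{k+1} = x_k - t_i \nabla f_i(x_k)
        = x_k - \frac{\langle x_k, a_i\rangle - b_i}{\|a_i\|^2}\, a_i ,
% which is exactly the Kaczmarz projection onto the i-th hyperplane.
```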

  12. Outline: The Kaczmarz method; Randomization; Sparsity; Split feasibility problems; Convergence rates.

  13. Randomization leads to linear convergence. In each iteration, choose index $i$ with probability $p_i$. If $\hat x$ solves the system (i.e. $\langle \hat x, a_i\rangle = b_i$), then $\|x_{k+1} - \hat x\|^2 = \|x_k - \hat x\|^2 - \frac{\langle x_k - \hat x, a_i\rangle^2}{\|a_i\|^2}$. Taking the expectation over the choice of $i$ gives $\mathbb{E}(\|x_{k+1} - \hat x\|^2) = \|x_k - \hat x\|^2 - \sum_i p_i \frac{\langle x_k - \hat x, a_i\rangle^2}{\|a_i\|^2} = \|x_k - \hat x\|^2 - \langle A(x_k - \hat x), D A(x_k - \hat x)\rangle$ with $D = \operatorname{diag}(p_i / \|a_i\|^2)$. This gives the uniform improvement $\mathbb{E}(\|x_{k+1} - \hat x\|^2) \le (1 - \lambda)\, \|x_k - \hat x\|^2$, $\lambda = \lambda_{\min}(A^T D A)$.
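The two steps that the slide states without proof can be filled in as follows. The shorthand $e_k = x_k - \hat x$ is introduced here only for this derivation and does not appear on the slides.

```latex
% Since \langle \hat x, a_i\rangle = b_i, the residual is \langle x_k, a_i\rangle - b_i = \langle e_k, a_i\rangle, so
e_{k+1} = e_k - \frac{\langle e_k, a_i\rangle}{\|a_i\|^2}\, a_i .

% Expanding the squared norm:
\|e_{k+1}\|^2
  = \|e_k\|^2 - 2\,\frac{\langle e_k, a_i\rangle^2}{\|a_i\|^2}
    + \frac{\langle e_k, a_i\rangle^2}{\|a_i\|^4}\,\|a_i\|^2
  = \|e_k\|^2 - \frac{\langle e_k, a_i\rangle^2}{\|a_i\|^2} .

% For the final bound, A^T D A is symmetric positive semidefinite, hence
\langle A e_k, D A e_k\rangle = \langle e_k, A^T D A\, e_k\rangle
  \ge \lambda_{\min}(A^T D A)\, \|e_k\|^2 .
```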

  17. Theorem: Let $A \in \mathbb{R}^{m \times n}$, $m \ge n$, have full rank and let $A\hat x = b$. Then the iterates of randomized Kaczmarz fulfill $\mathbb{E}(\|x_k - \hat x\|^2) \le (1 - \lambda)^k \|x_0 - \hat x\|^2$ with $\lambda = \lambda_{\min}(A^T D A)$, $D = \operatorname{diag}(p_i / \|a_i\|^2)$. Result due to [Strohmer, Vershynin 2009]. The choice $p_i = \|a_i\|^2 / \|A\|_F^2$ gives $D = \|A\|_F^{-2} I$, i.e. $\lambda = \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2} = \frac{\sigma_{\min}(A)^2}{\|A\|_F^2} =: \kappa(A)$. Experimentally: the above $p$ is not optimal, other $p$ give larger $\lambda$.
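The one-step bound behind the theorem can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not code from the talk: the helper names and the 3x2 matrix are hypothetical, rows are assumed nonzero, and the closed-form smallest eigenvalue only works for $n = 2$ columns. It computes the exact expectation over $i$ for $p_i = \|a_i\|^2 / \|A\|_F^2$ and compares it with $(1 - \lambda)\|x_k - \hat x\|^2$.

```python
import math


def expected_sq_error_after_step(A, b, x, x_hat):
    """Exact E||x_next - x_hat||^2 when row i is drawn with p_i = ||a_i||^2 / ||A||_F^2."""
    frob_sq = sum(v * v for a in A for v in a)
    total = 0.0
    for a, b_i in zip(A, b):
        norm_sq = sum(v * v for v in a)  # assumed nonzero
        p_i = norm_sq / frob_sq
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b_i
        x_next = [xj - residual / norm_sq * aj for aj, xj in zip(a, x)]
        total += p_i * sum((u - v) ** 2 for u, v in zip(x_next, x_hat))
    return total


def contraction_lambda_2col(A):
    """lambda_min(A^T A) / ||A||_F^2 for a matrix with exactly n = 2 columns."""
    p = sum(a[0] * a[0] for a in A)  # entries of the 2x2 matrix A^T A
    q = sum(a[0] * a[1] for a in A)
    r = sum(a[1] * a[1] for a in A)
    lam_min = (p + r) / 2 - math.sqrt(((p - r) / 2) ** 2 + q * q)
    return lam_min / (p + r)  # trace(A^T A) equals ||A||_F^2


# Hypothetical full-rank example with known solution x_hat.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
x_hat = [1.0, 2.0]
b = [sum(aj * xj for aj, xj in zip(a, x_hat)) for a in A]
x = [0.0, 0.0]

lam = contraction_lambda_2col(A)
lhs = expected_sq_error_after_step(A, b, x, x_hat)
rhs = (1 - lam) * sum((u - v) ** 2 for u, v in zip(x, x_hat))
```

Here `lhs` is the expected squared error after one randomized step and `rhs` is the bound from the theorem; the bound need not be tight for a particular starting point.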

  24. Underdetermined systems. Consider $Ax = b$, underdetermined but consistent. Which solution does Kaczmarz pick? With initialization $x_0 = 0$ (or $x_0 \in \operatorname{rg} A^T$), all iterates satisfy $x_k \in \operatorname{rg} A^T$. Assume $\hat x$ is a solution in $\operatorname{rg} A^T$. Let $Z \in \mathbb{R}^{n \times m}$ have columns that form an orthonormal basis of $\operatorname{rg} A^T$; then $x_k = Z Z^T x_k$ and $Z Z^T \hat x = \hat x$. As above: $\mathbb{E}(\|x_k - \hat x\|^2) \le (1 - \lambda)^k \|x_0 - \hat x\|^2$ with $\lambda = \lambda_{\min}(Z^T A^T D A Z)$, $D = \operatorname{diag}(p_i / \|a_i\|^2)$. Convergence to the minimum-norm solution $\hat x$.
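The role of the initialization can be illustrated on a tiny underdetermined system. The sketch below is only an illustration under stated assumptions: the 2x3 matrix, the right-hand side, and the helper names are hypothetical, and the reference solution $A^T (A A^T)^{-1} b$ is computed with an explicit 2x2 inverse that assumes $m = 2$ linearly independent rows. Rows are visited cyclically here for reproducibility; randomized row selection is not used.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def kaczmarz_cyclic(A, b, x0, sweeps):
    """Cyclic Kaczmarz: project onto each row's hyperplane in turn."""
    x = list(x0)
    for _ in range(sweeps):
        for a, b_i in zip(A, b):
            step = (dot(a, x) - b_i) / dot(a, a)
            x = [xj - step * aj for aj, xj in zip(a, x)]
    return x


def min_norm_solution_2rows(A, b):
    """x = A^T (A A^T)^{-1} b for a matrix with m = 2 independent rows."""
    g11, g12, g22 = dot(A[0], A[0]), dot(A[0], A[1]), dot(A[1], A[1])
    det = g11 * g22 - g12 * g12
    y1 = (g22 * b[0] - g12 * b[1]) / det
    y2 = (g11 * b[1] - g12 * b[0]) / det
    return [y1 * u + y2 * v for u, v in zip(A[0], A[1])]


# Hypothetical system with m = 2 rows, n = 3 columns.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [3.0, 3.0]

x_ref = min_norm_solution_2rows(A, b)                      # (1, 2, 1)
x_zero = kaczmarz_cyclic(A, b, [0.0, 0.0, 0.0], sweeps=60)
# (1, -1, 1) lies in the null space of A, so this start is outside rg A^T.
x_shift = kaczmarz_cyclic(A, b, [1.0, -1.0, 1.0], sweeps=60)
```

Starting from zero, the iterates approach the minimum-norm solution `x_ref`. Starting from the null-space vector, they approach a different solution, namely `x_ref` shifted by that vector, which matches the statement that the limit lies in $x_0 + \operatorname{rg} A^T$.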

  25. Outline: The Kaczmarz method; Randomization; Sparsity; Split feasibility problems; Convergence rates.

  28. Kaczmarz converging to sparse solutions? Kaczmarz converges to the (unique) solution in $x_0 + \operatorname{rg} A^T$ (if the system is consistent). For $x_0 = 0$ this is the solution with minimal $\|x\|_2$. Convergence to other solutions (e.g. minimal $\|x\|_1$)?
