Convergence of Iterative Hard Thresholding Variants with Application to Asynchronous Parallel Methods for Sparse Recovery
Jamie Haddock
Asilomar Conference on Signals, Systems, and Computers, November 4, 2019
Computational and Applied Mathematics, UCLA
joint with Deanna Needell, Nazanin Rahnavard, and Alireza Zaeemzadeh
Sparse Recovery Problem
Sparse Recovery: reconstruct approximately sparse $x \in \mathbb{R}^N$ from few nonadaptive, linear, and noisy measurements,
$y = Ax + e$
$A \in \mathbb{R}^{m \times N}$: measurement matrix
$e \in \mathbb{R}^m$: noise
Approach:
$\min_{x \in \mathbb{R}^N} \|x\|_1 \ \text{s.t.}\ \|Ax - y\| \le \epsilon$   or   $\min_{x \in \mathbb{R}^N} \tfrac{1}{m}\|Ax - y\|^2 \ \text{s.t.}\ \|x\|_0 \le s$
Applications:
⊲ image reconstruction
⊲ hyperspectral imaging
⊲ wireless communications
⊲ analog-to-digital conversion
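To make the setup concrete, here is a minimal NumPy sketch of the measurement model $y = Ax + e$ above; the dimensions m and N, the sparsity s, the Gaussian measurement matrix, and the noise level are hypothetical choices for illustration, not values from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    m, N, s = 80, 256, 10                           # hypothetical sizes: m measurements, dimension N, sparsity s

    A = rng.standard_normal((m, N)) / np.sqrt(m)    # Gaussian measurement matrix
    x = np.zeros(N)
    support = rng.choice(N, size=s, replace=False)
    x[support] = rng.standard_normal(s)             # s-sparse signal
    e = 0.01 * rng.standard_normal(m)               # small measurement noise
    y = A @ x + e                                   # noisy linear measurements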
Algorithmic Approaches
Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)
Greedy pursuits:
⊲ orthogonal matching pursuit (OMP)
⊲ regularized OMP (ROMP)
⊲ compressive sampling matching pursuit (CoSaMP)
⊲ iterative hard thresholding (IHT)
IHT: $x^{(n+1)} = H_k\big(x^{(n)} + A^T(y - Ax^{(n)})\big)$
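A minimal sketch of the IHT update above, assuming a unit step size and a fixed iteration count (both simplifications); hard_threshold keeps only the k largest-magnitude entries of its input.

    import numpy as np

    def hard_threshold(z, k):
        """Keep the k largest-magnitude entries of z; zero out the rest."""
        out = np.zeros_like(z)
        keep = np.argsort(np.abs(z))[-k:]
        out[keep] = z[keep]
        return out

    def iht(A, y, k, n_iters=100):
        """Iterative hard thresholding: x <- H_k(x + A^T (y - A x))."""
        x = np.zeros(A.shape[1])
        for _ in range(n_iters):
            x = hard_threshold(x + A.T @ (y - A @ x), k)
        return x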
StoIHT¹
¹ Nguyen, Needell, Woolf, IEEE Transactions on Information Theory '17
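The algorithm listing on this slide did not survive extraction; following the cited paper at a high level, this is a rough sketch of StoIHT with M equal-sized row blocks drawn uniformly at random. The block partition, the uniform selection probability p(i) = 1/M, the step size gamma, and the iteration count are all assumptions for illustration.

    import numpy as np

    def stoiht(A, y, s, M, gamma=1.0, n_iters=200, rng=None):
        """Rough sketch of StoIHT: each step uses one randomly chosen row block."""
        rng = rng or np.random.default_rng()
        m, N = A.shape
        blocks = np.array_split(np.arange(m), M)    # assumed equal-size row blocks
        x = np.zeros(N)
        for _ in range(n_iters):
            i = rng.integers(M)                     # randomize: block i chosen uniformly
            p_i = 1.0 / M                           # uniform selection probability p(i)
            Ai, yi = A[blocks[i]], y[blocks[i]]
            b = x + (gamma / (M * p_i)) * (Ai.T @ (yi - Ai @ x))   # proxy step
            keep = np.argsort(np.abs(b))[-s:]       # identify: support of best s-term approximation
            x = np.zeros(N)
            x[keep] = b[keep]                       # estimate: restrict b to that support
        return x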
Asynchronous Parallelization
Asynchronous approaches: popular when the objective functions are sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates idle time of synchronous approaches
Challenge: the objective of $\min_{x \in \mathbb{R}^N} \tfrac{1}{m}\|Ax - y\|^2 \ \text{s.t.}\ \|x\|_0 \le s$ is dense in x
⊲ likely that the same non-zero entries are updated from one iteration to the next
⊲ a slow core could easily "undo" the progress of previous updates by faster cores
Asynchronous StoIHT²
² Needell, Woolf, Proc. Information Theory and Applications Workshop '17
Bayesian Asynchronous StoIHT²
Require: Number of subproblems, M, and probability of selection p(B). The reliability score distribution parameters, $\hat\beta^1_i$ and $\hat\beta^0_i$, and the tally score parameters, $\hat a^1_n$ and $\hat a^0_n$, are available to each processor.
Each processor performs the following at each iteration:
1: randomize: select $B_t \in [M]$ with probability $p(B_t)$
2: proxy: $b^{(t)} = x^{(t)} + \frac{\gamma}{M p(B_t)} A^*_{B_t}\big(y_{B_t} - A_{B_t} x^{(t)}\big)$
3: identify: $S^{(t)} = \mathrm{supp}_s(b^{(t)})$ and $\tilde T^{(t)} = \mathrm{supp}_s(\hat\phi)$
4: estimate: $x^{(t+1)} = b^{(t)}_{S^{(t)} \cup \tilde T^{(t)}}$
5: repeat
6:   update $\mathbb{E}_{Q\{u_{ni}\}}\{u_{ni}\} = Q\{u_{ni} = 1\}$
7:   update $\hat\beta^1_i$ and $\hat\beta^0_i$, $\hat a^1_n$ and $\hat a^0_n$
8: until convergence
9: update $\hat\phi$
10: t = t + 1
² Zaeemzadeh, H., Rahnavard, Needell, Proc. 49th Asilomar Conf. on Signals, Systems and Computers '18
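For readability, a rough sketch of one worker's steps 1-4 above (randomize, proxy, identify, estimate); the Bayesian reliability scores phi_hat and the block selection probabilities p are taken as given inputs, and their update rules (steps 5-10) are omitted.

    import numpy as np

    def bastoiht_step(x, A, y, blocks, p, s, phi_hat, gamma=1.0, rng=None):
        """One worker iteration following steps 1-4 of the listing above."""
        rng = rng or np.random.default_rng()
        M = len(blocks)
        i = rng.choice(M, p=p)                                 # randomize: draw B_t with probability p(B_t)
        Ai, yi = A[blocks[i]], y[blocks[i]]
        b = x + (gamma / (M * p[i])) * (Ai.T @ (yi - Ai @ x))  # proxy
        S = np.argsort(np.abs(b))[-s:]                         # identify: greedy support supp_s(b)
        T = np.argsort(phi_hat)[-s:]                           # identify: Bayesian support supp_s(phi_hat)
        keep = np.union1d(S, T)
        x_new = np.zeros_like(x)
        x_new[keep] = b[keep]                                  # estimate: restrict b to S ∪ T
        return x_new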
Experimental Convergence
Tools for Analysis
First step: analyze IHT variant running on each node of parallel system
IHT$_{k,\tilde k}$: $x^{(n+1)} = H_{k,\tilde k}\big(x^{(n)} + A^T(y - Ax^{(n)})\big)$
Non-Symmetric Isometry Property: $(1 - \beta_k)\|z\|_2^2 \le \|Az\|_2^2 \le \|z\|_2^2$ for all $k$-sparse $z$
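One plausible reading of the operator $H_{k,\tilde k}$, sketched under the assumption used in the following theorem that the extra $\tilde k$ indices are chosen non-greedily, here uniformly at random from the remaining coordinates; this is an illustration, not the paper's exact definition.

    import numpy as np

    def H_k_ktilde(z, k, k_tilde, rng=None):
        """Keep the k largest-magnitude entries of z plus k_tilde extra entries
        chosen non-greedily (here: uniformly at random from the remaining indices)."""
        rng = rng or np.random.default_rng()
        out = np.zeros_like(z)
        order = np.argsort(np.abs(z))
        top_k, rest = order[-k:], order[:-k]
        extra = rng.choice(rest, size=min(k_tilde, rest.size), replace=False)
        keep = np.concatenate([top_k, extra])
        out[keep] = z[keep]
        return out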
Convergence of IHT$_{k,\tilde k}$
Theorem (H., Needell, Zaeemzadeh, Rahnavard '19+)
If A has the non-symmetric restricted isometry property with $\beta_{3k+2\tilde k} < \frac{1}{8}$, then in iteration n, the IHT$_{k,\tilde k}$ algorithms with input observations $y = Ax + e$ recover the approximation $x^{(n)}$ with
$\|x - x^{(n)}\| \le 2^{-n}\|x_k\| + 5\|x - x_k\| + \frac{4}{\sqrt{k}}\|x - x_k\|_1 + 4\|e\|.$
An Improved Scenario
Theorem (H., Needell, Zaeemzadeh, Rahnavard '19+)
Suppose the signal x has constant values on its support, and the $\tilde k$ indices selected (non-greedily) by the IHT$_{k,\tilde k}$ algorithm each lie uniformly in the support of x with probability p. If A has the non-symmetric restricted isometry property with $\beta_{3k+2\tilde k} < \frac{1}{8}$, then in iteration n, the IHT$_{k,\tilde k}$ algorithms with input observations $y = Ax + e$ recover the approximation $x^{(n)}$ with
$\mathbb{E}_{\tilde k}\|x - x^{(n)}\| \le 2^{-n}\|x\| + 5\,\mathbb{E}_{\tilde k}\|x - \tilde x^{(n)}\| + \frac{4}{\sqrt{k}}\,\mathbb{E}_{\tilde k}\|x - \tilde x^{(n)}\|_1 + 4\|e\|$
$\le 2^{-n}\|x\| + \left(5\alpha + \frac{4\alpha}{\sqrt{k}}\right)\|x\|_1 + 4\|e\|,$
where $\alpha = \frac{|\mathrm{supp}(x)| - k}{|\mathrm{supp}(x)|} \cdot \frac{|\mathrm{supp}(x)| - p\tilde k}{|\mathrm{supp}(x)|}$.
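For intuition, a small worked example with hypothetical values, assuming the expression for $\alpha$ above: with $|\mathrm{supp}(x)| = 20$, $k = 10$, $\tilde k = 10$, and $p = 0.8$, one gets $\alpha = \frac{20-10}{20} \cdot \frac{20 - 0.8 \cdot 10}{20} = 0.5 \cdot 0.6 = 0.3$, so increasing the probability p that the extra indices land in $\mathrm{supp}(x)$ directly shrinks the $\|x\|_1$ term in the bound.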
Experimental Convergence of IHT$_{k,\tilde k}$
Figure 1: Plot of error $\|x - x^{(n)}\|$ vs. iteration for 100 iterations of IHT$_{k,\tilde k}$ with various probabilities p that the $\tilde k$ indices lie in supp(x).
Rate of Support Intersection
Figure 2: The rate at which the shared indices between nodes lie in the true support of signal x for iterations of (a) AStoIHT and (b) BAStoIHT.
Conclusions and Future Work
⊲ provided a convergence analysis for an IHT variant
⊲ identified scenario when IHT variant has potentially faster convergence
⊲ provided heuristic for why asynchronous versions of StoIHT converge faster than non-parallelized version
⊲ analyze StoIHT$_{k,\tilde k}$
⊲ extend to non-heuristic analysis of Asynchronous StoIHT
Thanks for listening! Questions?
[1] J. Haddock, D. Needell, N. Rahnavard, and A. Zaeemzadeh. Convergence of iterative hard thresholding variants with application to asynchronous parallel methods for sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2019.
[2] D. Needell and T. Woolf. An asynchronous parallel approach to sparse recovery. In Proc. Information Theory and Applications Workshop (ITA), pages 1-5. IEEE, 2017.
[3] N. Nguyen, D. Needell, and T. Woolf. Linear convergence of stochastic iterative greedy algorithms with sparse constraints. IEEE Transactions on Information Theory, 63(11):6869-6895, 2017.
[4] A. Zaeemzadeh, J. Haddock, N. Rahnavard, and D. Needell. A Bayesian approach for asynchronous parallel sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2018.