From Supervised to Unsupervised Computational Sensing




  1. From Supervised to Unsupervised Computational Sensing. Ali Mousavi, Aug 12th 2019, Brain Vision Summit.

  2. Collaborators: Rich Baraniuk (Rice University), Arian Maleki (Columbia University), Chris Metzler (Stanford University), Reinhard Heckel (Rice University), Gautam Dasarathy (Arizona State University).

  3. Computational Sensing. • Conventional sensing: subject → expensive hardware (Ψ) → measurements. • Computational sensing: reduce costs in acquisition systems by replacing expensive hardware with cheap hardware + computation: subject → simpler hardware (Φ) → measurements → software (computation).

  4. Large-Scale Datasets.

  5. Data-Driven Computational Sensing. Subject → simpler hardware → measurements → computational software → recovered subject.

  6. Model. The pipeline maps a subject $x \in \mathbb{R}^N$ through hardware to measurements $y = \Phi(x) \in \mathbb{R}^M$, and software computes the recovery $\hat{x} \in \mathbb{R}^N$ via $\Phi^{-1}(\cdot)$. For the linear system $y = \Phi x$ there are three regimes: overdetermined ($M > N$), determined ($M = N$), and underdetermined ($M < N$).

  7. Model. This talk focuses on the underdetermined regime: $y = \Phi x$ with $\Phi \in \mathbb{R}^{M \times N}$ and $M < N$, so the measurements alone do not determine $x$.
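To make the underdetermined regime concrete, here is a minimal numpy sketch (not from the talk; the dimensions and the sparse test signal are arbitrary choices) showing that without a structural prior, plain least squares does not recover $x_o$ when $M < N$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 30                        # ambient dimension vs. measurements, M < N
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

x_o = np.zeros(N)                     # sparse ground truth: 10 nonzero entries
x_o[rng.choice(N, size=10, replace=False)] = rng.standard_normal(10)
y = Phi @ x_o                         # underdetermined system y = Phi x

x_ls = np.linalg.pinv(Phi) @ y        # minimum-norm solution among all solutions
print(np.linalg.norm(x_ls - x_o))     # large error: sparsity was never used
```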

  8. Applications.

  9. Data-Driven Computational Sensing. Recover $x_o$ from $y = \Phi x_o$, $\Phi \in \mathbb{R}^{M \times N}$, by solving $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$.

  10. Iterative Algorithms. Solve $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$ with $M \ll N$ by a loop: start from an initial estimate, calculate the residual, update the estimate, and repeat until convergence.
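One classical instance of this loop is ISTA for the $\ell_1$ case that appears a few slides later; a minimal numpy sketch (the step size, iteration count, and scaling conventions are illustrative assumptions, with constant factors folded into lam):

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise shrinkage: zero out [-tau, tau], shrink the rest toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(y, Phi, lam=0.1, n_iters=200):
    """min_x ||y - Phi x||_2^2 + lam ||x||_1 via the residual/update loop."""
    eta = 1.0 / np.linalg.norm(Phi, 2) ** 2   # step size from the Lipschitz constant
    x = np.zeros(Phi.shape[1])                # initial estimate
    for _ in range(n_iters):
        z = y - Phi @ x                       # calculate the residual
        x = soft_threshold(x + eta * Phi.T @ z, eta * lam)  # update the estimate
    return x
```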

  11. Iterative Algorithms. Geometric picture: $x_o$ lies in the structure set $C$ and satisfies $y = \Phi x_o$; the iterates approach this intersection.

  12. Data-Driven Computational Sensing. Recover $x_o$ from $y = \Phi x_o$ by solving $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$.

  13. Data-Driven Computational Sensing. Same setup: $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$.

  14. Sparse Regression. Specialize $f$ to the $\ell_1$ norm: $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, with $y = \Phi x_o$ and $M \ll N$.

  15. Approximate Message Passing. For $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$ with $M \ll N$: • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]: $x^{t+1} = \eta(x^t + \Phi^\top z^t;\, \tau^t)$, $z^t = y - \Phi x^t + \frac{1}{\delta} z^{t-1} \langle \eta'(x^{t-1} + \Phi^\top z^{t-1}) \rangle$, where $\delta = M/N$ and $\langle \cdot \rangle$ is the average over coordinates.

  16. Approximate Message Passing. In the iteration above, $z^t$ is the residual.

  17. Approximate Message Passing. The inner update $x^t + \Phi^\top z^t$ is a gradient step.

  18. Approximate Message Passing. The nonlinearity $\eta(\cdot\,;\tau^t)$ is a projection operator.

  19. Approximate Message Passing. Concretely, for the $\ell_1$ problem $\eta$ is soft thresholding: $\eta(x, \tau) = \mathrm{sign}(x)\max(|x| - \tau, 0)$. [Figure: plot of $\eta(x, \tau)$, flat on $[-\tau, \tau]$.]
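Putting slides 15-19 together, a numpy sketch of AMP for the $\ell_1$ problem (the threshold schedule $\tau^t = \|z^t\|_2/\sqrt{M}$ is a common heuristic, not something these slides fix):

```python
import numpy as np

def eta(v, tau):                          # projection operator: soft thresholding
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp(y, Phi, n_iters=30):
    M, N = Phi.shape
    delta = M / N                         # undersampling ratio
    x, z = np.zeros(N), y.copy()
    tau = np.linalg.norm(z) / np.sqrt(M)
    for _ in range(n_iters):
        r = x + Phi.T @ z                 # gradient step: behaves like x_o + noise
        x_new = eta(r, tau)
        # Onsager correction (1/delta) z^{t-1} <eta'(r)>; for soft thresholding,
        # <eta'(r)> is the fraction of coordinates above threshold
        z = y - Phi @ x_new + z * np.mean(np.abs(r) > tau) / delta
        tau = np.linalg.norm(z) / np.sqrt(M)
        x = x_new
    return x
```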

  20. Sparse Regression. AMP's key property [Donoho, Maleki, Montanari 2009]: the pseudo-data decouples as $x^t + \Phi^\top z^t = x_o + v^t$, where $v^t$ is effective noise, so the update $x^{t+1} = \eta(x^t + \Phi^\top z^t;\, \tau^t)$ applies $\eta$ to a noisy version of $x_o$.

  21. Structured Regression. For general structure $f$: $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$, $M \ll N$. • Denoising Approximate Message Passing (D-AMP) [Metzler, Maleki, Baraniuk 2015] replaces soft thresholding with an off-the-shelf denoiser $D^t$: $x^{t+1} = D^t(x^t + \Phi^\top z^t)$.
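A hedged sketch of this iteration: `denoise(r, sigma)` is a placeholder for an off-the-shelf denoiser such as BM3D or a CNN, and the Onsager term uses a Monte-Carlo divergence estimate in place of the analytic $\langle \eta' \rangle$ (the single-probe estimator is an assumption here; it reappears on the Monte-Carlo SURE slide below):

```python
import numpy as np

def mc_divergence(denoise, r, sigma, eps=1e-3, rng=None):
    """Estimate div D(r) with a single random probe b ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    b = rng.standard_normal(r.shape)
    return b @ (denoise(r + eps * b, sigma) - denoise(r, sigma)) / eps

def damp(y, Phi, denoise, n_iters=10):
    M, N = Phi.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(n_iters):
        sigma = np.linalg.norm(z) / np.sqrt(M)  # effective noise level
        r = x + Phi.T @ z                       # noisy pseudo-image: x_o + noise
        x = denoise(r, sigma)                   # denoiser replaces soft thresholding
        z = y - Phi @ x + z * mc_divergence(denoise, r, sigma) / M
    return x
```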

  22. Unrolling Iterative Algorithms [Gregor and LeCun, 2010]. The iterative loop (initial estimate, calculate the residual, update the estimate, until convergence) is unrolled into a feed-forward network: a fixed number of layers, each producing an updated residual and an updated estimate.

  23. Learned-Denoising-AMP. Learned D-AMP (LDAMP) [Metzler, Mousavi, Baraniuk, NIPS 2017] makes the denoisers learnable, one per unrolled layer: $x^{l+1} = D^l(x^l + \Phi^\top z^l)$, $z^l = y - \Phi x^l + \frac{1}{\delta} z^{l-1} \,\mathrm{div}\, D^{l-1}(x^{l-1} + \Phi^\top z^{l-1})$. • We use a 20-layer convolutional network as the denoiser [Zhang et al. 2017]. [Figure: two layers of the LDAMP network.]
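A minimal PyTorch sketch of the unrolled network. This is an assumption-laden toy: the 3-layer CNN stands in for the 20-layer DnCNN denoiser, the image size and channel counts are placeholders, and the Onsager correction is dropped for brevity, which makes it closer to the LDIT variant named on the next slide:

```python
import torch
import torch.nn as nn

class DenoiserCNN(nn.Module):
    """Tiny stand-in for the 20-layer DnCNN denoiser used in the talk."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, r):
        return r - self.net(r)            # residual learning: the CNN predicts noise

class LDAMP(nn.Module):
    """Unrolled iterations, one learnable denoiser per layer (Onsager omitted)."""
    def __init__(self, Phi, n_layers=10, img_shape=(40, 40)):
        super().__init__()
        self.register_buffer("Phi", Phi)  # fixed M x N measurement matrix
        self.denoisers = nn.ModuleList(DenoiserCNN() for _ in range(n_layers))
        self.img_shape = img_shape

    def forward(self, y):                 # y: batch of measurements, shape (B, M)
        h, w = self.img_shape
        x = y.new_zeros(y.shape[0], h * w)
        z = y.clone()
        for D in self.denoisers:
            r = x + z @ self.Phi          # x^l + Phi^T z^l (row-vector convention)
            x = D(r.view(-1, 1, h, w)).view(-1, h * w)
            z = y - x @ self.Phi.T        # plain residual (no Onsager term here)
        return x.view(-1, 1, h, w)
```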

  24. Training LDAMP and LDIT. Three strategies (sketched in code below): • End-to-end: train all layers jointly on the final output. • Layer-by-layer: train layer $L_1$, freeze it, append and train $L_2$, and so on. • Denoiser-by-denoiser: train the denoisers $D_1, D_2, \ldots, D_q$ independently as standalone denoisers.
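A sketch of the three strategies, assuming the toy LDAMP module above, loaders yielding measurement/image batches, and Adam with placeholder hyperparameters:

```python
import torch

def forward_k(model, y, k):
    """Run only the first k+1 layers of the LDAMP module above."""
    h, w = model.img_shape
    x, z = y.new_zeros(y.shape[0], h * w), y.clone()
    for D in model.denoisers[: k + 1]:
        r = x + z @ model.Phi
        x = D(r.view(-1, 1, h, w)).view(-1, h * w)
        z = y - x @ model.Phi.T
    return x.view(-1, 1, h, w)

def train_end_to_end(model, loader, epochs=10):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for y, x in loader:
            loss = torch.mean((model(y) - x) ** 2)       # MSE on the final output
            opt.zero_grad(); loss.backward(); opt.step()

def train_layer_by_layer(model, loader, epochs=10):
    for k, D in enumerate(model.denoisers):              # grow one layer at a time
        opt = torch.optim.Adam(D.parameters(), lr=1e-4)  # earlier layers stay fixed
        for _ in range(epochs):
            for y, x in loader:
                loss = torch.mean((forward_k(model, y, k) - x) ** 2)
                opt.zero_grad(); loss.backward(); opt.step()

def train_denoiser_by_denoiser(denoisers, clean_loader, sigmas, epochs=10):
    for D, s in zip(denoisers, sigmas):    # each D: standalone Gaussian denoiser
        opt = torch.optim.Adam(D.parameters(), lr=1e-4)
        for _ in range(epochs):
            for x in clean_loader:
                noisy = x + s * torch.randn_like(x)
                loss = torch.mean((D(noisy) - x) ** 2)
                opt.zero_grad(); loss.backward(); opt.step()
```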

  25. Training LDAMP. • Lemma 1 [Metzler, Mousavi, Baraniuk, NIPS 2017]: layer-by-layer training of LDAMP is MMSE optimal. • Lemma 2 [Metzler, Mousavi, Baraniuk, NIPS 2017]: denoiser-by-denoiser training of LDAMP is MMSE optimal.

  26. Training LDAMP. Lemmas 1 and 2 hold as stated on the previous slide. [Table: average PSNR (dB) of one hundred 40×40 images recovered from i.i.d. Gaussian measurements, comparing the training strategies.] • Noise discretization degrades the performance. • Denoiser-by-denoiser training is more generalizable.

  27. Compressive Image Recovery. 512×512 images, 20× undersampling, noiseless measurements. Results against the original image: • TVAL3: 26.4 dB, 6.85 sec • BM3D-AMP: 27.2 dB, 75.04 sec • LDAMP: 28.1 dB, 1.22 sec.

  28. Summary so far. Two formulations for recovering $x_o$ from $y = \Phi x_o$, $M < N$: constrained, $\arg\min_x \|y - \Phi x\|_2^2$ subject to $x \in C$, and regularized, $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$, with the structure learned from training signals $x_1, x_2, \ldots, x_L$.

  29. Data-Driven Computational Sensing. $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$, with $y = \Phi x_o$ and $M < N$.

  30. Data-Driven Computational Sensing. $\min_x \|y - \Phi x\|_2^2 + \lambda f(x)$. • Mousavi, Maleki, Baraniuk, 'Consistent Parameter Estimation', Annals of Statistics 2017. • Mousavi, Dasarathy, Baraniuk, 'Data-Driven Sparse Representation', ICLR 2019.

  31. Summary so far. Both formulations, constrained ($x \in C$) and regularized ($\lambda f(x)$), rely on training signals $x_1, x_2, \ldots, x_L$: they are supervised.

  32. Next Step. The same recovery formulations, but unsupervised: no access to clean training signals.

  33. Stein's Unbiased Risk Estimator (SURE) [Stein '81]. • A statistical model selection technique. Observe $y = x + w$ with $x$ unknown and $w \sim \mathcal{N}(0, \sigma^2 I)$, and let $f_\theta(\cdot)$ be weakly differentiable. Then $\mathrm{SURE} = \frac{1}{N}\|f_\theta(y) - y\|_2^2 - \sigma^2 + \frac{2\sigma^2}{N}\,\mathrm{div}\, f_\theta(y)$ is an unbiased estimate of the risk $\frac{1}{N}\mathbb{E}\|f_\theta(y) - x\|_2^2$, computable without access to $x$.
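A worked numpy check of this identity, using soft thresholding as $f_\theta$ since its divergence is analytic (the number of coordinates above threshold); the signal and parameters are arbitrary test choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, tau = 10_000, 1.0, 1.5

x = np.zeros(N); x[:500] = 5.0               # "unknown" signal, used only to check
y = x + sigma * rng.standard_normal(N)       # observation y = x + w

f_y = np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)   # f_theta: soft thresholding
div = np.count_nonzero(np.abs(y) > tau)               # analytic divergence

sure = np.mean((f_y - y) ** 2) - sigma**2 + 2 * sigma**2 * div / N
risk = np.mean((f_y - x) ** 2)               # oracle risk (needs x)
print(f"SURE = {sure:.4f}  true risk = {risk:.4f}")   # these should agree closely
```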

  34. Monte-Carlo SURE [Ramani, Blu, Unser, 2008]. • Challenge: computing the divergence $\mathrm{div}\, f_\theta(y)$ for a black-box $f_\theta$. • For bounded functions: $\mathrm{div}\, f_\theta(y) = \lim_{\epsilon \to 0} \mathbb{E}_b\!\left[\, b^\top \tfrac{f_\theta(y + \epsilon b) - f_\theta(y)}{\epsilon} \right]$, $b \sim \mathcal{N}(0, I)$. • Approximation: a single probe $b$ and a small fixed $\epsilon$.
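A numpy sketch of the single-probe estimator, checked against the analytic divergence of soft thresholding (the probe scale $\epsilon$ and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, tau, eps = 10_000, 1.5, 1e-4

f = lambda v: np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
y = 2.0 * rng.standard_normal(N)

b = rng.standard_normal(N)                       # random probe b ~ N(0, I)
div_mc = b @ (f(y + eps * b) - f(y)) / eps       # single-probe approximation
div_exact = np.count_nonzero(np.abs(y) > tau)    # analytic divergence
print(div_mc, div_exact)                         # close up to Monte-Carlo error
```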

  35. Denoising with Noisy Data. • Denoiser: DnCNN. • Training data: noisy images only, with no clean ground truth. • Loss function: MSE (needs clean images) vs. SURE (computable from noisy data alone).
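A hedged PyTorch sketch of training with SURE in place of MSE: `model` is any denoising network (a DnCNN-style CNN in the talk), only noisy images are used, the divergence is the Monte-Carlo estimate from the previous slide, and $\sigma$ is assumed known:

```python
import torch

def mc_sure_loss(model, y, sigma, eps=1e-3):
    """Per-pixel SURE for a batch of noisy images y; no clean targets needed."""
    n = y[0].numel()
    f_y = model(y)
    b = torch.randn_like(y)                      # Monte-Carlo probe
    div = (b * (model(y + eps * b) - f_y)).flatten(1).sum(1) / eps
    fidelity = ((f_y - y) ** 2).flatten(1).sum(1) / n
    return (fidelity - sigma**2 + 2 * sigma**2 * div / n).mean()

# Assumed usage in a standard loop, given only noisy batches y:
#   loss = mc_sure_loss(model, y, sigma)
#   opt.zero_grad(); loss.backward(); opt.step()
```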

  36. Denoising with Noisy Data: Results. Against the original and noisy images: • BM3D: 26.0 dB, 4.01 sec • DnCNN MSE: 26.7 dB, 0.04 sec • DnCNN SURE: 26.5 dB, 0.04 sec.

  37. Compressive Image Recovery with Noisy Data. • Problem formulation: $y = \Phi x + w$, with image $x \in \mathbb{R}^N$, measurements $y \in \mathbb{R}^M$, measurement operator $\Phi \in \mathbb{R}^{M \times N}$, and noise $w \in \mathbb{R}^M$. • Setting: $M \ll N$.

  38. Recovery Algorithm. • Learned Denoising-based AMP (LDAMP) network, for $k = 1, \ldots, K$: $x^{k+1} = D^k_{\theta^k}(x^k + \Phi^* z^k)$, $z^k = y - \Phi x^k + \frac{1}{m} z^{k-1}\,\mathrm{div}\, D^{k-1}_{\theta^{k-1}}(x^{k-1} + \Phi^* z^{k-1})$, $\hat{\sigma}^k = \frac{\|z^k\|_2}{\sqrt{m}}$. • Decouples image recovery into a series of denoising problems: $x^k + \Phi^* z^k = x_o + \sigma^k v$ [Donoho et al. 2009, 2011; Bayati and Montanari, 2011]. • Layer-by-layer training of the LDAMP network, with MSE or SURE as the per-layer loss.
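A short sketch of how this decoupling makes layerwise training unsupervised: the pseudo-data $r^k = x^k + \Phi^* z^k$ plays the role of the noisy image and $\hat{\sigma}^k = \|z^k\|_2/\sqrt{m}$ the noise level, so each denoiser $D^k$ can be trained with the SURE loss sketched earlier, never touching $x_o$ (shapes and the per-sample $\sigma$ handling are assumptions):

```python
import torch

def layer_denoising_problem(x_k, z_k, Phi):
    """Effective denoising problem seen by layer k: r_k ~ x_o + sigma_k * noise."""
    m = z_k.shape[-1]
    r_k = x_k + z_k @ Phi                        # pseudo-data (row-vector convention)
    sigma_k = z_k.norm(dim=-1) / m ** 0.5        # estimated noise level per sample
    return r_k, sigma_k

# Per layer k (assumed loop, with h, w the image shape):
#   r_k, sigma_k = layer_denoising_problem(x_k, z_k, model.Phi)
#   loss = mc_sure_loss(model.denoisers[k], r_k.view(-1, 1, h, w), sigma_k.mean())
```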

  39. Compressive Image Recovery. 5× undersampling, against the original image: • BM3D-AMP: 31.3 dB, 13.2 sec • LDAMP MSE: 34.6 dB, 0.4 sec • LDAMP SURE: 31.9 dB, 0.4 sec.

  40. Take-away Messages. • There are three major paradigms for signal acquisition: Nyquist-rate sampling (~1900), compressive sensing (~2007), and our data-driven work. • Each paradigm puts its resources on one of the sampling, modeling, or reconstruction tasks. • There seems to be a preservation of computation between the different paradigms.
