Dictionary learning - fast and dirty
Karin Schnass
Department of Mathematics, University of Innsbruck
karin.schnass@uibk.ac.at
Supported by the FWF (Der Wissenschaftsfonds)
Dagstuhl, August 31
why do we care about sparsity again?

A sparse representation of the data is the basis for
- efficient data processing, e.g. denoising, compressed sensing, inpainting (example: inpainting [1])
- efficient data analysis, e.g. source separation, anomaly detection, sparse components (example: sparse components [2])

In all examples: the sparser, the more efficient.

[1] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online learning for matrix factorization and sparse coding.
[2] B.A. Olshausen, D.J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images.
why do we care about dictionary learning?

data: $Y = (y_1, \dots, y_N)$, that is, $N$ vectors $y_n \in \mathbb{R}^d$, with $d$ and $N$ large

Learning the dictionary from the data instead of designing it by hand means:
- no need for intuition
- time (days vs. years)
Let's do it.

We have:
- data $Y$
- a model ($Y$ is $S$-sparse in a $d \times K$ dictionary $\Phi$)

We want:
- an algorithm (fast, cheap)
- guarantees that the algorithm will find $\Phi$

Promising directions:
- graph clustering algorithms (not so cheap)
- tensor methods (not so cheap) - later today!
- (alternating) optimisation (not so many guarantees)
warm up - a bit of K-SVD.

Since our signals are $S$-sparse, let's minimise

$$\min_{\Psi \in \mathcal{D},\ X \in \mathcal{X}_S} \| Y - \Psi X \|_F^2$$

($\Psi \in \mathcal{D}$ has normalised columns, $X \in \mathcal{X}_S$ has $S$-sparse columns) or, equivalently, maximise

$$\max_{\Psi \in \mathcal{D}} \sum_n \max_{|I| \leq S} \| \Psi_I \Psi_I^\dagger y_n \|_2^2, \qquad (1)$$

since for a fixed support $I$ the best approximation to $y_n$ is the orthogonal projection $\Psi_I \Psi_I^\dagger y_n$, so minimising the residual energy is the same as maximising the captured energy. But this leads to K-SVD, which is slow. So let's modify the optimisation programme, replacing the projection energy by the $\ell_1$-norm of the responses:

$$\max_{\Psi \in \mathcal{D}} \sum_n \max_{|I| \leq S} \| \Psi_I^\star y_n \|_1. \qquad (2)$$
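What makes (2) attractive is that the inner maximisation is solved exactly by simple thresholding: the optimal support consists of the $S$ atoms with the largest absolute inner products. A minimal NumPy sketch of evaluating (2) this way (function name, argument shapes, and vectorisation are my own, not from the slides):

```python
import numpy as np

def itkm_objective(Psi, Y, S):
    """Evaluate objective (2) for a dictionary Psi (d x K) on signals Y (d x N).

    The inner max over supports |I| <= S is attained by thresholding:
    keep the S atoms with largest |<psi_i, y_n>| for each signal.
    """
    C = np.abs(Psi.T @ Y)      # K x N matrix of responses |<psi_i, y_n>|
    C.sort(axis=0)             # sort each signal's responses in ascending order
    return C[-S:, :].sum()     # sum the S largest responses per signal
```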
Iterative Thresholding and K signal means (ITKsM)

To optimise:

$$\max_{\Psi \in \mathcal{D}} \sum_n \max_{|I| = S} \| \Psi_I^\star y_n \|_1 \qquad (3)$$

Algorithm (ITKsM, one iteration)
Given an input dictionary $\Psi$ and $N$ training signals $y_n$, do:
- For all $n$ find $I^t_{\Psi,n} = \arg\max_{I : |I| = S} \| \Psi_I^\star y_n \|_1$.
- For all $k$ calculate
$$\bar\psi_k = \frac{1}{N} \sum_n y_n \cdot \operatorname{sign}(\langle \psi_k, y_n \rangle) \cdot \chi(I^t_{\Psi,n}, k), \qquad (4)$$
where $\chi(I, k) = 1$ if $k \in I$ and $0$ otherwise.
- Output $\bar\Psi = (\bar\psi_1 / \|\bar\psi_1\|_2, \dots, \bar\psi_K / \|\bar\psi_K\|_2)$.
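A compact NumPy sketch of one ITKsM iteration (my own vectorisation of the update above; it assumes every atom is selected by at least one signal, otherwise the final normalisation divides by zero):

```python
import numpy as np

def itksm_iteration(Psi, Y, S):
    """One ITKsM iteration: Psi is the d x K input dictionary
    (unit-norm columns), Y holds the N training signals as a d x N matrix."""
    K, N = Psi.shape[1], Y.shape[1]
    C = Psi.T @ Y                                 # K x N inner products <psi_k, y_n>
    # thresholding: I_n = indices of the S largest |<psi_k, y_n>|
    I = np.argsort(np.abs(C), axis=0)[-S:, :]     # S x N support matrix
    chi = np.zeros((K, N), dtype=bool)
    chi[I, np.arange(N)] = True                   # chi[k, n] = 1 iff k is in I_n
    # signed signal means, eq. (4)
    Psi_bar = (Y @ (np.sign(C) * chi).T) / N      # d x K
    return Psi_bar / np.linalg.norm(Psi_bar, axis=0)   # renormalise columns
```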
ITKsM
- is ridiculously cheap: $O(dKN)$ (parallelisable, online version)
- is robust to noise and to non-exactly sparse signals, at sparsity levels $S = O(\mu^{-2})$
- is locally convergent (radius $1/\sqrt{\log K}$) for sparsity $S = O(\mu^{-2})$, and needs only $O(K \log K \, \varepsilon^{-2})$ samples,

but is not globally convergent.

Algorithm (ITKrM, one iteration)
Given an input dictionary $\Psi$ and $N$ training signals $y_n$, do:
- For all $n$ find $I^t_{\Psi,n} = \arg\max_{I : |I| = S} \| \Psi_I^\star y_n \|_1$.
- For all $k$ calculate
$$\bar\psi_k = \sum_{n : k \in I^t_{\Psi,n}} \operatorname{sign}(\langle \psi_k, y_n \rangle) \cdot \big[ \mathbb{1} - P(\Psi_{I^t_{\Psi,n}}) + P(\psi_k) \big] y_n,$$
where $P(\cdot)$ denotes the orthogonal projection onto the column span of its argument.
- Output $\bar\Psi = (\bar\psi_1 / \|\bar\psi_1\|_2, \dots, \bar\psi_K / \|\bar\psi_K\|_2)$.
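The same sketch, adapted to the residual update of ITKrM (loop form for clarity; again an illustration of the slide's formula, not reference code):

```python
import numpy as np

def itkrm_iteration(Psi, Y, S):
    """One ITKrM iteration: each selected signal contributes its residual
    outside span(Psi_I) plus its component along psi_k (assumed unit norm)."""
    d, K = Psi.shape
    Psi_bar = np.zeros((d, K))
    C = Psi.T @ Y                                    # K x N inner products
    for n in range(Y.shape[1]):
        I = np.argsort(np.abs(C[:, n]))[-S:]         # thresholding support I_n
        P_I = Psi[:, I] @ np.linalg.pinv(Psi[:, I])  # projection onto span(Psi_I)
        residual = Y[:, n] - P_I @ Y[:, n]           # (1 - P(Psi_I)) y_n
        for k in I:
            back_proj = Psi[:, k] * C[k, n]          # P(psi_k) y_n for unit-norm psi_k
            Psi_bar[:, k] += np.sign(C[k, n]) * (residual + back_proj)
    return Psi_bar / np.linalg.norm(Psi_bar, axis=0)  # renormalise columns
```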
intermediate quiz

Are we going to recover the dictionary? No, no, no!!

We need a sparse model.
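To see why the sparse model matters, one can test the algorithms on synthetic data that actually follows it. A toy generator (my own; the slides do not specify a signal model at this point):

```python
import numpy as np

def sparse_signals(Phi, S, N, seed=0):
    """Draw N signals y = Phi_I x with a random support I of size S,
    so the data is exactly S-sparse in the dictionary Phi (d x K)."""
    rng = np.random.default_rng(seed)
    d, K = Phi.shape
    Y = np.zeros((d, N))
    for n in range(N):
        I = rng.choice(K, size=S, replace=False)    # random S-sparse support
        Y[:, n] = Phi[:, I] @ rng.standard_normal(S)
    return Y
```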