Toward Fast Transform Learning

Olivier Chabiron (1), François Malgouyres (2), Jean-Yves Tourneret (1), Nicolas Dobigeon (1)
(1) Institut de Recherche en Informatique de Toulouse (IRIT)
(2) Institut de Mathématiques de Toulouse (IMT)

This work is supported by the CIMI Excellence Laboratory.
Curves and Surfaces 2014
Outline

1. Introduction
2. Problem studied
3. ALS Algorithm
4. Approximation experiments
5. Convergence experiments
Introduction to sparse representation

Notations: objects $u$ live in $\mathbb{R}^P$, where $P$ is a set of pixels (such as $\{1, \dots, N\}^2$).

In image processing, many problems are underdetermined. For example, in sparse representation, we want to solve
$$\min \|\alpha\|_* \quad \text{subject to} \quad \|D\alpha - u\|_2 \le \tau.$$

Principle of sparse representation/approximation: for many applications, $\|\cdot\|_*$ should be $\|\cdot\|_0$, where
$$\|\alpha\|_0 = \#\{j \,;\, \alpha_j \neq 0\}.$$

Issue: the sparse representation problem is (in general) NP-hard. However, successful algorithms exist when the columns of $D$ are almost orthogonal.
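To make the coding stage concrete, here is a minimal NumPy sketch of a greedy, OMP-style solver for the problem above. It is an illustration only, not one of the specific algorithms alluded to on the slide; the function name, the fixed number of nonzeros, and the least-squares refit are choices made for this example.

```python
import numpy as np

def greedy_omp(D, u, n_nonzero):
    """Toy orthogonal-matching-pursuit sketch: greedily select atoms of D
    until n_nonzero coefficients are used, refitting the selected columns
    by least squares at each step."""
    residual = u.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit on the selected atoms
        coeffs, *_ = np.linalg.lstsq(D[:, support], u, rcond=None)
        residual = u - D[:, support] @ coeffs
    alpha[support] = coeffs
    return alpha
```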
Dictionary learning

Choosing a dictionary (cosines, wavelets, curvelets, ...):
  + fast transform
  - limited sparsity
Learning the dictionary from the data:
  + better sparsity
  - no fast transform

The DL problem: learn an efficient representation frame for an image class by solving
$$\operatorname*{argmin}_{D, \alpha} \sum_u \mu \|D\alpha - u\|_2^2 + \|\alpha\|_*.$$

DL problems are often solved in two steps:
  argmin over $\alpha$  ->  sparse coding stage,
  argmin over $D$       ->  dictionary update stage.
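As a rough sketch of this two-stage alternation (not the authors' method, and not a production DL algorithm), the following loop alternates the greedy coder from the previous sketch with a pseudo-inverse dictionary update; the helper `greedy_omp` and the update rule are assumptions made for illustration.

```python
import numpy as np

def learn_dictionary(U, n_atoms, n_nonzero, n_iter=20, seed=0):
    """Illustrative alternation of sparse coding and dictionary update.
    U: (#P, n_images) matrix whose columns are the training images."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((U.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                      # unit-norm atoms
    for _ in range(n_iter):
        # sparse coding stage: code each image against the current D
        A = np.column_stack([greedy_omp(D, u, n_nonzero) for u in U.T])
        # dictionary update stage: least-squares fit of D to U given the codes A
        D = U @ np.linalg.pinv(A)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, A
```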
Motivations (1)

Figure: the synthesis model $u = D\alpha$, where $D$ is a $\#P \times \#D$ matrix ($\#P$: image size, $\#D$: number of atoms).

Usually, $\#D \gg \#P$.
Computing $D\alpha$ costs $O(\#D\,\#P) > O(\#P^2)$ operations.
Computing sparse codes is very expensive.
Storing $D$ is very expensive.
Motivations (2)

Our objectives:
  Define a fast transform to compute $D\alpha$.
  Ensure a fast update so that larger atoms can be learned.
Model

Model for a dictionary update with a single atom $H \in \mathbb{R}^P$. How to include every possible translation of $H$?
$$\sum_{p' \in P} \alpha_{p'} H_{p - p'} = (\alpha * H)_p$$

Model: the image is a sum of weighted translations of one atom,
$$u = \alpha * H + b, \qquad (1)$$
where $u \in \mathbb{R}^P$ is the image data, $\alpha \in \mathbb{R}^P$ is the code, $H \in \mathbb{R}^P$ is the target atom and $b$ is noise.
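A small simulation of model (1), generating an observation u from a sparse code and a single Gaussian atom. The grid size, atom shape, noise level and the "same" boundary handling of fftconvolve are all assumptions of this sketch, not choices made in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
N = 64

# sparse code alpha: a few random nonzero weights on an N x N grid
alpha = np.zeros((N, N))
idx = rng.choice(N * N, size=30, replace=False)
alpha.flat[idx] = rng.standard_normal(30)

# target atom H: a small Gaussian bump
x = np.arange(-8, 9)
g = np.exp(-(x ** 2) / (2 * 2.0 ** 2))
H = np.outer(g, g)

# observed image: weighted translations of H plus noise, as in equation (1)
u = fftconvolve(alpha, H, mode="same") + 0.01 * rng.standard_normal((N, N))
```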
Fast Transform

How? Atoms are computed as a composition of $K$ convolutions:
$$H \approx h^1 * h^2 * \dots * h^K.$$

The kernels $(h^k)_{1 \le k \le K}$ have constrained supports defined by a mapping $S^k$:
$$\forall k \in \{1, \dots, K\}, \quad \operatorname{supp}(h^k) \subset \operatorname{rg}(S^k) = \{S^k(1), \dots, S^k(S)\},$$
where $\operatorname{rg}(S^k)$ contains all the possible locations of the non-zero elements of $h^k$.

Figure: Tree structure for a dictionary.

Notation: $h = (h^k)_{1 \le k \le K} \in (\mathbb{R}^P)^K$.
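The composed atom can be sketched by chaining convolutions: with K small kernels of size S each, the support of h^1 * ... * h^K grows while the parameter count stays at K·S. The kernel sizes and random values below are illustrative only.

```python
import numpy as np
from functools import reduce
from scipy.signal import fftconvolve

def compose_kernels(kernels):
    """Compose K small kernels into a single larger atom h1 * h2 * ... * hK.
    Each convolution is 'full', so the composed support grows with K."""
    return reduce(lambda a, b: fftconvolve(a, b, mode="full"), kernels)

# e.g. four 3x3 kernels compose into a 9x9 atom
rng = np.random.default_rng(1)
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
H_approx = compose_kernels(kernels)
print(H_approx.shape)   # (9, 9): (3 - 1) * 4 + 1 in each dimension
```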
Example of support mapping

Figure: Supports $(S^k)_{1 \le k \le 4}$ of size $S = 3 \times 3$ upsampled by a factor $k$.
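The figure is not reproduced here; one plausible reading of "3 x 3 supports upsampled by a factor k" is an à-trous style dilation of the pixel offsets, sketched below. The exact mapping S^k used by the authors may differ.

```python
def dilated_support(k, size=3):
    """Offsets of a size x size neighbourhood dilated by a factor k
    (one plausible reading of the slide's 'upsampled by a factor k')."""
    half = size // 2
    return [(k * i, k * j) for i in range(-half, half + 1)
                           for j in range(-half, half + 1)]

# S^1 is the usual 3x3 neighbourhood, S^2 skips every other pixel, and so on
for k in range(1, 5):
    print(k, dilated_support(k)[:3], "...")
```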
2. Problem studied
(P0): First formulation

$$(P_0): \quad \operatorname*{argmin}_{(h^k)_{1 \le k \le K} \in (\mathbb{R}^P)^K} \|\alpha * h^1 * \dots * h^K - u\|_2^2 \quad \text{s.t.} \quad \operatorname{supp}(h^k) \subset \operatorname{rg}(S^k).$$

Energy gradient:
$$\frac{\partial E_0(h)}{\partial h^k} = 2\,\widetilde{H^k} * (\alpha * h^1 * \dots * h^K - u), \qquad (2)$$
where
$$H^k = \alpha * h^1 * \dots * h^{k-1} * h^{k+1} * \dots * h^K, \qquad (3)$$
and where the $\widetilde{\cdot}$ operator is defined for any $h \in \mathbb{R}^P$ as
$$\widetilde{h}_p = h_{-p}, \quad \forall p \in P. \qquad (4)$$
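Equations (2)-(4) translate directly into code: H^k is the composition that leaves out h^k, and the tilde operator turns the outer convolution into a correlation. The sketch below uses "same"-mode FFT convolutions as a stand-in for the paper's exact boundary convention, which is an assumption.

```python
import numpy as np
from scipy.signal import fftconvolve

def grad_E0(u, alpha, kernels, k):
    """Gradient of E0 with respect to h^k, following equations (2)-(4)."""
    # H^k: alpha convolved with every kernel except h^k (equation (3))
    Hk = alpha
    for j, h in enumerate(kernels):
        if j != k:
            Hk = fftconvolve(Hk, h, mode="same")
    # residual of the full composition
    residual = fftconvolve(Hk, kernels[k], mode="same") - u
    # the tilde operator (4) is a coordinate flip, so H̃^k * r is a correlation
    Hk_flipped = Hk[::-1, ::-1]
    return 2.0 * fftconvolve(Hk_flipped, residual, mode="same")
```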
(P0): Shortcoming

Shortcoming: if $h^1 = h^2 = 0$, then $\nabla E_0(h) = 0$, but this is not a global minimum.

Another view: for all $(\mu_k)_{1 \le k \le K} \in \mathbb{R}^K$ such that $\prod_{k=1}^K \mu_k = 1$, we have
$$E_0\big((\mu_k h^k)_{1 \le k \le K}\big) = E_0(h),$$
and, for any $k \in \{1, \dots, K\}$,
$$\frac{\partial E_0}{\partial h^k}\big((\mu_k h^k)_{1 \le k \le K}\big) = \frac{1}{\mu_k} \frac{\partial E_0}{\partial h^k}(h).$$

The gradient depends on quantities which are irrelevant regarding the value of the objective function.
New formulation: Problem (P1)

Second formulation:
$$(P_1): \quad \operatorname*{argmin}_{\lambda \ge 0,\, h \in \mathcal{D}} \|\lambda\, \alpha * h^1 * \dots * h^K - u\|_2^2,$$
with
$$\mathcal{D} = \big\{ h \in (\mathbb{R}^P)^K \mid \forall k \in \{1, \dots, K\},\ \|h^k\|_2 = 1 \text{ and } \operatorname{supp}(h^k) \subset \operatorname{rg}(S^k) \big\}.$$

Reminder: $h = (h^k)_{1 \le k \le K} \in (\mathbb{R}^P)^K$.

See: L. De Lathauwer, B. De Moor, J. Vandewalle, "On the best rank-1 and rank-$(R_1, R_2, \dots, R_N)$ approximation of higher-order tensors", SIAM Journal on Matrix Analysis and Applications, 21(4), 1324-1342, 2000.
Existence of a solution of (P1)

Proposition [Existence of a solution]. For any $(u, \alpha, (S^k)_{1 \le k \le K}) \in \mathbb{R}^P \times \mathbb{R}^P \times (P^S)^K$, if
$$\alpha * h^1 * \dots * h^K \neq 0, \quad \forall h \in \mathcal{D}, \qquad (5)$$
then the problem $(P_1)$ has a minimizer.

Proof idea: use the compactness of $\mathcal{D}$ and the $\lambda$-coercivity of the objective function.
Link between (P0) and (P1)

Proposition [(P1) is equivalent to (P0)]. Let $(u, \alpha, (S^k)_{1 \le k \le K}) \in \mathbb{R}^P \times \mathbb{R}^P \times (P^S)^K$ be such that (5) holds. For any $(\lambda, h) \in \mathbb{R} \times (\mathbb{R}^P)^K$, consider the kernels $g = (g^k)_{1 \le k \le K} \in (\mathbb{R}^P)^K$ defined by
$$g^1 = \lambda h^1 \quad \text{and} \quad g^k = h^k, \ \forall k \in \{2, \dots, K\}. \qquad (6)$$

The following statements hold:
1. If $(\lambda, h) \in \mathbb{R} \times (\mathbb{R}^P)^K$ is a stationary point of $(P_1)$ and $\lambda > 0$, then $g$ is a stationary point of $(P_0)$.
2. If $(\lambda, h) \in \mathbb{R} \times (\mathbb{R}^P)^K$ is a global minimizer of $(P_1)$, then $g$ is a global minimizer of $(P_0)$.
3. ALS Algorithm
   Principle of the algorithm
   Computations
   Initialization and restart
Block formulation of (P1)

Problem $(P_k)$:
$$(P_k): \quad \operatorname*{argmin}_{\lambda \ge 0,\, h \in \mathbb{R}^P} \|\lambda\, \alpha * h^1 * \dots * h^{k-1} * h * h^{k+1} * \dots * h^K - u\|_2^2, \quad \text{s.t.} \quad \operatorname{supp}(h) \subset \operatorname{rg}(S^k) \ \text{and} \ \|h\|_2 = 1,$$
where the kernels $(h^{k'})_{k' \neq k}$ are fixed.
Algorithm overview

Algorithm 1: Overview of the ALS algorithm
  Input: u: target measurements; alpha: known coefficients; (S^k)_{1<=k<=K}: supports of the kernels (h^k)_{1<=k<=K}.
  Output: lambda and kernels (h^k)_{1<=k<=K} such that lambda h^1 * ... * h^K ≈ H.
  begin
    Initialize the kernels (h^k)_{1<=k<=K};
    while not converged do
      for k = 1, ..., K do
        Update lambda and h^k with a minimizer of (P_k).
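A structural sketch of Algorithm 1 in NumPy. The callback `solve_Pk` (returning a minimizer of (P_k) with the other kernels fixed) is a hypothetical placeholder, and the stopping test on the movement of the kernels is one possible convergence criterion, not necessarily the authors'.

```python
import numpy as np

def als(u, alpha, supports, solve_Pk, n_outer=50, tol=1e-8):
    """Structural sketch of Algorithm 1: sweep over the K kernels, each time
    replacing (lambda, h^k) by a minimizer of the block problem (P_k)."""
    K = len(supports)
    # start from the constant unit-norm kernel on each support (cf. equation (8))
    kernels = [np.full(len(S), 1.0 / np.sqrt(len(S))) for S in supports]
    lam = 1.0
    for _ in range(n_outer):
        previous = [h.copy() for h in kernels]
        for k in range(K):
            lam, kernels[k] = solve_Pk(u, alpha, kernels, supports, k)
        change = sum(np.linalg.norm(h - p) for h, p in zip(kernels, previous))
        if change < tol:   # stop when the kernels stop moving
            break
    return lam, kernels
```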
Matrix formulation of (Pk)

$$(P_k): \quad \operatorname*{argmin}_{\lambda \ge 0,\, h \in \mathbb{R}^S} \|\lambda\, C_k h - u\|_2^2 \quad \text{s.t.} \quad \|h\|_2 = 1.$$

Alternative, $(P'_k)$:
$$(P'_k): \quad \operatorname*{argmin}_{h \in \mathbb{R}^S} \|C_k h - u\|_2^2.$$

$(P'_k)$ has a minimizer $h^* \in \mathbb{R}^S$. Computing a stationary point yields
$$h^* = (C_k^T C_k)^{-1} C_k^T u. \qquad (7)$$
Update rule

Find $h^*$, a solution of $(P'_k)$, then update
$$h^k = \begin{cases} \dfrac{h^*}{\|h^*\|_2}, & \text{if } \|h^*\|_2 \neq 0, \\[1ex] \dfrac{1}{\sqrt{S}}\,\mathbb{1}_{\{1, \dots, S\}}, & \text{otherwise}, \end{cases} \qquad \text{and} \qquad \lambda = \|h^*\|_2. \qquad (8)$$
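Putting (7) and (8) together, once C_k is available the inner update reduces to a small S x S linear solve followed by a normalization. The sketch below assumes C_k has full column rank (so the normal equations are invertible) and sets lambda = 0 in the degenerate case, which the slide leaves implicit.

```python
import numpy as np

def update_kernel(Ck, u):
    """Solve (P'_k) via the normal equations (7), then renormalize as in (8)."""
    G = Ck.T @ Ck                      # S x S Gram matrix
    b = Ck.T @ u.ravel()               # correlation of the columns with the data
    h_star = np.linalg.solve(G, b)     # h* = (C_k^T C_k)^{-1} C_k^T u
    norm = np.linalg.norm(h_star)
    if norm > 0:
        return norm, h_star / norm     # lambda = ||h*||_2, h^k = h* / ||h*||_2
    S = Ck.shape[1]
    return 0.0, np.full(S, 1.0 / np.sqrt(S))   # fallback case of equation (8)
```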
Matrix C_k

$C_k$ is the $\#P \times S$ matrix whose column $s$ is the translate of $H^k$ by $S^k(s)$, i.e.
$$(C_k h)_p = \sum_{s=1}^{S} h_s\, H^k_{p - S^k(s)}.$$

$C_k^T u$ is of size $S \times 1$, with $(C_k^T u)_s = \sum_{p \in P} H^k_{p - S^k(s)}\, u_p$; complexity $O(S\, \#P)$.

$C_k^T C_k$ is of size $S \times S$, with $(C_k^T C_k)_{s,s'} = \sum_{p \in P} H^k_{p - S^k(s)}\, H^k_{p - S^k(s')}$; complexity $O(S^2\, \#P)$.
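Since only C_k^T u and C_k^T C_k are needed in (7), both can be assembled directly from shifted copies of H^k without ever forming the #P x S matrix. The sketch below uses np.roll, i.e. circular translations, which is an assumption about the boundary convention.

```python
import numpy as np

def normal_equations(Hk, u, offsets):
    """Form C_k^T u and C_k^T C_k directly from translates of H^k.
    Column s of C_k is H^k shifted by the offset S^k(s)."""
    shifted = [np.roll(Hk, shift, axis=(0, 1)) for shift in offsets]          # S translates
    CtU = np.array([np.sum(Hs * u) for Hs in shifted])                        # O(S #P)
    CtC = np.array([[np.sum(Ha * Hb) for Hb in shifted] for Ha in shifted])   # O(S^2 #P)
    return CtC, CtU
```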