Prototypes and Matrix Relevance Learning in Complex Fourier Space
M. Straat, M. Kaden, M. Gay, T. Villmann, A. Lampe, U. Seiffert, M. Biehl, and F. Melchert
June 26, 2017
Overview
- A study of the classification of time series.
- In Fourier space: vectors in $\mathbb{C}^n$.
- Generalized Matrix Learning Vector Quantization (GMLVQ) on complex-valued data.
- Evaluation and interpretation of the Fourier-space classifiers.
Figure: Plane examples (feature value vs. feature index).
Learning Vector Quantization (LVQ)
- Dataset of vectors $x^m \in \mathbb{R}^N$, each carrying a class label $\sigma^m \in \{1, 2, \ldots, C\}$.
- Training: for each class $\sigma$, identify prototype(s) $w^i \in \mathbb{R}^N$ in feature space that are typical representatives of that class.
- Aim: classify novel vectors $x^\mu$, assigning them to the class of the nearest prototype.
Figure: LVQ with 5 prototypes per class, initialized with k-means on each class. Black line: piece-wise linear decision boundary.
$d(x, w) = (x - w)^T (x - w)$, the squared Euclidean distance.

procedure LVQ
  for each training epoch do
    for each labeled vector $\{x, \sigma\}$ do
      $\{w^*, S^*\} \leftarrow \operatorname{argmin}_i \{ d(x, w^i) \}$
      $w^* \leftarrow w^* + \eta\, \Psi(S^*, \sigma)\,(x - w^*)$

where $\Psi(S, \sigma) = +1$ if $S = \sigma$, and $-1$ otherwise.

Classification of a novel data point $x^\mu$: find the closest prototype $\{w^*, S^*\} \leftarrow \operatorname{argmin}_i \{ d(x^\mu, w^i) \}$ and assign $x^\mu$ to class $S^*$: $\{x^\mu, \sigma^\mu = S^*\}$. A runnable sketch of this procedure follows below.
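The LVQ1 scheme above is a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the learning rate, epoch count, and function names are illustrative assumptions.

```python
import numpy as np

def lvq1_train(X, y, W, proto_labels, eta=0.01, epochs=50):
    """LVQ1: attract the winning prototype if its label matches, repel otherwise."""
    W = W.copy()
    for _ in range(epochs):
        for x, sigma in zip(X, y):
            d = np.sum((W - x) ** 2, axis=1)      # squared Euclidean distances
            i = np.argmin(d)                      # winner {w*, S*}
            psi = 1.0 if proto_labels[i] == sigma else -1.0
            W[i] += eta * psi * (x - W[i])        # Psi-signed update
    return W

def lvq1_classify(X, W, proto_labels):
    """Assign each vector the class of its nearest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return proto_labels[np.argmin(d, axis=1)]
```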
GMLVQ
Learn feature relevances and adapt $d$ accordingly.
- Adaptive quadratic distance measure: $d^\Omega(x, w) = (x - w)^T \Omega^T \Omega (x - w)$.
- Update two prototypes upon presentation of $\{x, \sigma\}$:
  - $w^+$: closest prototype of the same class as $x$.
  - $w^-$: closest prototype of a different class than $x$.
Cost for one example $x^m$:
$e^m = \dfrac{d^\Omega[w^+] - d^\Omega[w^-]}{d^\Omega[w^+] + d^\Omega[w^-]} \in [-1, 1]$.
Learning is minimization of the cost by gradient descent:
$w^\pm \leftarrow w^\pm - \eta_w \nabla_{w^\pm} e^m$, $\quad \Omega \leftarrow \Omega - \eta_\Omega \nabla_\Omega e^m$.
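As a concrete reference for the cost above, here is a sketch of the real-valued GMLVQ distance and single-example cost in NumPy; prototypes are assumed to be stored row-wise in W, and the function names are illustrative.

```python
import numpy as np

def gmlvq_distance(x, w, Omega):
    """d_Omega(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = Omega @ (x - w)
    return diff @ diff

def gmlvq_cost(x, sigma, W, proto_labels, Omega):
    """e = (d[w+] - d[w-]) / (d[w+] + d[w-]) for one example (x, sigma)."""
    d = np.array([gmlvq_distance(x, w, Omega) for w in W])
    same = (proto_labels == sigma)
    d_plus = d[same].min()          # closest prototype of the correct class
    d_minus = d[~same].min()        # closest prototype of a wrong class
    return (d_plus - d_minus) / (d_plus + d_minus)
```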
Time series
- Sampling: $f(t) \to f(i \Delta T)$, $i = 0, 1, \ldots, N-1$.
- Vectors $x \in \mathbb{R}^N$.
- Temporal order of the dimensions.
Figure: example time series (magnitude vs. feature index / sample index).
Training in coefficient space
Approximate $f(t) = \sum_{i=1}^{n} c_i g_i(t)$:
- Using the Chebyshev basis.
- Using the Fourier basis: $x \in \mathbb{R}^N \to x_f \in \mathbb{C}^n$.
- Prototypes $w^i \in \mathbb{C}^n$ and relevance matrix $\Lambda$ Hermitian.
Figure: 5 Chebyshev basis functions. Figure: Fourier complex sinusoid.
F. Melchert, U. Seiffert, and M. Biehl, "Polynomial Approximation of Spectral Data in LVQ and Relevance Learning," in Workshop on New Challenges in Neural Computation, 2015.
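For the Chebyshev variant, NumPy's polynomial module gives the coefficient representation directly; the signal, dimensions, and noise level below are made-up illustrations, not the paper's data.

```python
import numpy as np

N, n = 144, 21                              # samples, coefficients (illustrative)
t = np.linspace(-1, 1, N)                   # Chebyshev polynomials live on [-1, 1]
rng = np.random.default_rng(0)
x = np.cos(4 * np.pi * t) + 0.1 * rng.standard_normal(N)   # toy sampled series

c = np.polynomial.chebyshev.chebfit(t, x, deg=n - 1)   # n real coefficients
x_hat = np.polynomial.chebyshev.chebval(t, c)          # smooth reconstruction
```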
Fourier: Time ⇆ Frequency
Matrix $F \in \mathbb{C}^{n \times N}$ with entries $F_{k\ell} = e^{-j 2\pi k \ell / N}$, $k = 0, 1, \ldots, n-1$, $\ell = 0, 1, \ldots, N-1$.
- Forward (DFT): $x_f = F x \in \mathbb{C}^n$.
- Backward (iDFT): $x = \frac{1}{N} F^H x_f \in \mathbb{R}^N$ (exact for $n = N$).
Figure: example time series (magnitude vs. sample index) and its frequency magnitudes (magnitude vs. frequency).
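A numerical check of the transform pair, assuming the truncated DFT matrix defined above; for $n = N$ the backward map recovers $x$ exactly, while $n < N$ keeps only low-frequency content.

```python
import numpy as np

def dft_matrix(n, N):
    """First n rows of the N-point DFT matrix: F[k, l] = exp(-2j*pi*k*l/N)."""
    k = np.arange(n)[:, None]
    l = np.arange(N)[None, :]
    return np.exp(-2j * np.pi * k * l / N)

N = 1000
x = np.sin(2 * np.pi * 3 * np.arange(N) / N)    # toy periodic signal

F = dft_matrix(N, N)
x_f = F @ x                                      # forward DFT
x_back = np.real(F.conj().T @ x_f) / N           # backward iDFT
assert np.allclose(x, x_back)                    # exact when n = N

F21 = dft_matrix(21, N)                          # keep only 21 coefficients
x_lowpass = np.real(F21.conj().T @ (F21 @ x)) / N
# note: for real signals the dropped conjugate-symmetric bins halve the
# amplitude of this crude reconstruction; it is only a low-frequency sketch
```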
GMLVQ for complex-valued data
Quadratic distance measure:
$d^\Lambda[x_f, w_f] = (x_f - w_f)^H \Omega^H \Omega (x_f - w_f) \in \mathbb{R}_{\geq 0}$.
Cost for one example $x_f^m$:
$e^m = \dfrac{d^\Lambda[w_f^+] - d^\Lambda[w_f^-]}{d^\Lambda[w_f^+] + d^\Lambda[w_f^-]} \in [-1, 1]$.
Compute gradients w.r.t. $w_f^+$, $w_f^-$ and $\Omega$ for learning, via the chain rule:
$\nabla_{w_f^+} e^\mu = \dfrac{\partial e^\mu}{\partial d^\Lambda_+} \, \dfrac{\partial d^\Lambda_+}{\partial w_f^+}$.
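The Hermitian form makes the distance real and non-negative; in NumPy (a sketch under the slide's definitions, with an illustrative function name):

```python
import numpy as np

def complex_distance(x_f, w_f, Omega):
    """d_Lambda[x_f, w_f] = (x_f - w_f)^H Omega^H Omega (x_f - w_f) >= 0."""
    diff = Omega @ (x_f - w_f)
    return np.real(np.vdot(diff, diff))   # vdot conjugates its first argument
```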
Wirtinger derivatives
For $f(z): \mathbb{C} \to \mathbb{R}$, define the operators
$\dfrac{\partial}{\partial z} = \dfrac{1}{2}\left(\dfrac{\partial}{\partial x} - i \dfrac{\partial}{\partial y}\right)$ and $\dfrac{\partial}{\partial z^*} = \dfrac{1}{2}\left(\dfrac{\partial}{\partial x} + i \dfrac{\partial}{\partial y}\right)$.
For $f(z) = z \cdot z^*$: $\dfrac{\partial f}{\partial z} = z^*$ and $\dfrac{\partial f}{\partial z^*} = z$.
Wirtinger gradients:
$\dfrac{\partial}{\partial z} = \left(\dfrac{\partial}{\partial z_1}, \ldots, \dfrac{\partial}{\partial z_N}\right)^T$ and $\dfrac{\partial}{\partial z^*} = \left(\dfrac{\partial}{\partial z_1^*}, \ldots, \dfrac{\partial}{\partial z_N^*}\right)^T$.
Using the Wirtinger gradient: $\dfrac{\partial}{\partial z^*}(z^H A z) = A z$.
M. Gay, M. Kaden, M. Biehl, A. Lampe, and T. Villmann, "Complex variants of GLVQ based on Wirtinger's calculus."
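The closed form $\partial (z^H A z)/\partial z^* = A z$ can be verified numerically by applying the operator definitions coordinate-wise with finite differences; the random Hermitian $A$, step size, and tolerance below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, h = 4, 1e-6
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = B + B.conj().T                          # Hermitian, so z^H A z is real
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)

f = lambda z: np.real(z.conj() @ A @ z)     # f(z) = z^H A z : C^N -> R

grad = np.empty(N, dtype=complex)
for k in range(N):
    e = np.zeros(N); e[k] = 1.0
    df_dx = (f(z + h * e) - f(z - h * e)) / (2 * h)            # d/dx_k
    df_dy = (f(z + 1j * h * e) - f(z - 1j * h * e)) / (2 * h)  # d/dy_k
    grad[k] = 0.5 * (df_dx + 1j * df_dy)    # Wirtinger derivative w.r.t. z_k^*

assert np.allclose(grad, A @ z, atol=1e-4)  # matches the closed form A z
```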
Learning rules
Complex-valued GMLVQ (Wirtinger):
$\nabla_{w_f^*} d^\Lambda[x_f, w_f] = -\Omega^H \Omega (x_f - w_f)$,
$\nabla_{\Omega^*} d^\Lambda[x_f, w_f] = \Omega (x_f - w_f)(x_f - w_f)^H$.
The relevance matrix $\Lambda = \Omega^H \Omega$ is Hermitian.
Real-valued GMLVQ:
$\nabla_{w} d^\Lambda[x, w] = -2 \Omega^T \Omega (x - w)$,
$\nabla_{\Omega} d^\Lambda[x, w] = \Omega (x - w)(x - w)^T$.
The relevance matrix $\Lambda = \Omega^T \Omega$ is symmetric (also Hermitian).
After each epoch, normalize $\Lambda$ such that $\mathrm{tr}(\Lambda) = 1$.
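Combining these gradients with the chain-rule factors of the cost $e$ gives one update step. The sketch below uses per-step trace normalization for brevity (the slides normalize once per epoch) and illustrative learning rates; it is not the authors' implementation.

```python
import numpy as np

def cgmlvq_step(x_f, w_plus, w_minus, Omega, eta_w=0.01, eta_om=1e-3):
    """One complex-GMLVQ gradient step on a single example x_f."""
    dv_p, dv_m = x_f - w_plus, x_f - w_minus
    Lam = Omega.conj().T @ Omega                   # Hermitian relevance matrix
    dp = np.real(dv_p.conj() @ Lam @ dv_p)         # d_Lambda[w+]
    dm = np.real(dv_m.conj() @ Lam @ dv_m)         # d_Lambda[w-]
    # chain-rule factors of e = (d+ - d-)/(d+ + d-)
    mu_p = 2.0 * dm / (dp + dm) ** 2
    mu_m = -2.0 * dp / (dp + dm) ** 2
    # descent on e via the Wirtinger gradients of d
    w_plus = w_plus + eta_w * mu_p * (Lam @ dv_p)
    w_minus = w_minus + eta_w * mu_m * (Lam @ dv_m)
    Omega = Omega - eta_om * (mu_p * Omega @ np.outer(dv_p, dv_p.conj())
                              + mu_m * Omega @ np.outer(dv_m, dv_m.conj()))
    Omega /= np.sqrt(np.real(np.trace(Omega.conj().T @ Omega)))  # tr(Lambda) = 1
    return w_plus, w_minus, Omega
```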
The testing scenarios
1. GMLVQ in the original time domain on vectors $x \in \mathbb{R}^N$.
2. GMLVQ (Wirtinger) in complex Fourier space on vectors $x_f \in \mathbb{C}^n$ with $n \in \{6, 11, \ldots, 51\}$.
3. GMLVQ in Fourier space on vectors $x_f \in \mathbb{R}^{2n}$, real and imaginary parts concatenated.
4. GMLVQ on smoothed time-domain vectors $\hat{x} \in \mathbb{R}^N$.
Before training (a setup sketch follows below):
- All dimensions z-score transformed.
- One prototype per class.
- Prototype of class $i$ initialized near the class mean: $w^i \approx \operatorname{mean}(\{x \mid y = i\})$.
- $\Lambda = cI$.
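A sketch of the scenario-2 preprocessing and initialization; np.fft.fft stands in for the matrix form of the DFT, and the z-scoring of complex dimensions (complex mean shift, real magnitude-based scale) is one plausible reading of the slide, not a confirmed detail.

```python
import numpy as np

def prepare_fourier(X, y, n, c=1.0):
    """Truncated DFT, z-scoring, class-mean prototypes, Lambda = c*I."""
    X_f = np.fft.fft(X, axis=1)[:, :n]            # first n Fourier coefficients
    # z-score each dimension; np.std of complex data is real and non-negative
    X_f = (X_f - X_f.mean(axis=0)) / X_f.std(axis=0)
    classes = np.unique(y)
    W = np.array([X_f[y == k].mean(axis=0) for k in classes])  # one per class
    Omega = np.sqrt(c) * np.eye(n, dtype=complex)              # Lambda = c*I
    return X_f, W, Omega, classes
```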
Plane dataset
- 210 labeled vectors $(x, y) \in \mathbb{R}^{144} \times \{1, 2, \ldots, 7\}$.
- 105/105 train/validation vectors.
Figure: Plane examples (feature value vs. feature index).
Plane - Classification performance
Figure: accuracies of the four testing scenarios on the validation set.
Interpreting the classifier
- Prototypes $w_f^i \in \mathbb{C}^n$.
- Matrix $\Lambda_f$ is Hermitian: $\Lambda_f = \Lambda_f^H$.
Figure: Plane, 21-coefficient Fourier space; magnitudes of the 2 prototypes.
- Map prototypes to the time domain with the iDFT: $w^i = \frac{1}{N} F^H w_f^i$.
- Relevance matrix to the time domain: $d[x_f, w_f] = (x - w)^H F^H \Lambda_f F (x - w)$.
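Both backtransforms in matrix form, assuming prototypes stored row-wise; the real part is taken for the prototypes since the original signals are real. A sketch, not the authors' code:

```python
import numpy as np

def backtransform(W_f, Lam_f, N):
    """Map Fourier-space prototypes and relevances to the time domain."""
    n = W_f.shape[1]
    k = np.arange(n)[:, None]
    l = np.arange(N)[None, :]
    F = np.exp(-2j * np.pi * k * l / N)       # truncated DFT matrix (n x N)
    W_time = np.real(W_f @ F.conj()) / N      # rows: w^i = (1/N) F^H w_f^i
    Lam_time = F.conj().T @ Lam_f @ F         # effective time-domain relevances
    return W_time, Lam_time
```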
Plane - Prototypes and feature relevance
Time-domain training vs. 21-coefficient Fourier space.
Figure: prototypes (Plane) and backtransformed prototypes (value vs. feature).
Figure: relevances (Plane) and backtransformed relevances (relevance vs. feature).
Symbols dataset
- 1020 feature vectors $(x, y) \in \mathbb{R}^{398} \times \{1, 2, \ldots, 6\}$.
- 25/995 train/validation vectors.
Figure: Symbols examples (feature value vs. feature index).
Symbols - Classification performance
Figure: accuracies of the four testing scenarios on the validation set.
Mallat dataset
- 2400 feature vectors $(x, y) \in \mathbb{R}^{1024} \times \{1, 2, \ldots, 8\}$.
- 55/2345 train/validation vectors.
Figure: Mallat examples (feature value vs. feature index).
Mallat - Classification performance
Figure: accuracies of the four testing scenarios on the validation set.
Mallat - Classification error curves
Error development on the training and validation sets.
Figure: train error and test error vs. epoch for GMLVQ in the original space, GMLVQ in complex Fourier space, and GMLVQ in concatenated Fourier space.
Discussion
Learning in complex Fourier-coefficient space...
- can be an effective method for the classification of periodic functional data.
- can provide an efficient low-dimensional representation.
- has the potential to improve classification accuracy.
For future research: how to obtain close-to-optimal accuracy with the smallest number of adaptive parameters.