Uncertainty in compositional models of alignment

Ieva Kazlauskaite, University of Bath
Neill D.F. Campbell, University of Bath
Carl Henrik Ek, University of Bristol
Ivan Ustyuzhaninov, University of Tübingen
Tom Waterson, Electronic Arts

September, 2019
Motivation

Data:
• Motion capture sequences, e.g. a jump or a golf swing.
• Each motion corresponds to a different style or mood.

Goal: Generate new motions by interpolating between the captured clips.

Pre-processing: The clips need to be temporally aligned.
Motivation

Assume we are given some time-series data with inputs x ∈ R^N and J output sequences {y_j ∈ R^N}. We know that there are multiple underlying functions that generated this data, say K such functions f_k(·), and that the observed data was generated by warping the inputs to the true functions using some warping functions g_j(x) such that:

y_j = f_k(g_j(x)) + noise.   (1)

Two groups of unknowns (to be found automatically): the unknown warps g_j and the unknown latent functions f_k.
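As a concrete illustration, here is a minimal sketch of the generative model in Eq. (1). The two latent functions and the polynomial warps below are illustrative choices, not the ones from the talk; the point is the structure: J sequences, each a warped, noisy copy of one of K latent functions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 100, 6, 2
x = np.linspace(-1.0, 1.0, N)                       # shared, evenly spaced inputs

f = [lambda t: np.sin(3.0 * t),                     # latent function f_1
     lambda t: np.sin(3.0 * t) * np.exp(-t ** 2)]   # latent function f_2

Y = np.empty((J, N))
for j in range(J):
    k = j % K                                       # true (hidden) cluster label
    a = rng.uniform(0.0, 0.9)                       # warp strength
    g_x = (1.0 - a) * x + a * x ** 3                # monotone warp g_j(x)
    Y[j] = f[k](g_x) + 0.05 * rng.standard_normal(N)
```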
Motivation

Unknowns:
• Number of underlying functions K
• Underlying functions f_k(·)
• Warps g_j(·) for each sequence

[Figure: the observed data sequences.]
Motivation

Let's try to find K using K-means clustering:

[Figure: K-means initialisation with 2 clusters and with 3 clusters.]
Motivation

K-means clustering vs. correct labels:

[Figure: K-means initialisation with 2 clusters and with 3 clusters, compared to the correct clustering of the inputs.]
Motivation

A PCA scatter plot of the data:

[Figure: PCA initialisation with correct labels.]
Alignment model

Three constituent parts:
• Model of transformations (warps), g_j
• Model of sequences, f_k
• Alignment objective
Model of transformations (warps)

[Figure: observed sequences and an example warp g(x).]

• Parametric warps, with weights constrained as Σ_{i∈I} w_i = 1, w_i ≥ 0 ∀ i ∈ I (see the sketch below).
• Nonparametric warps, for example monotonic GPs.

In general, we prefer warps that are close to the identity.

Riihimäki & Vehtari. Gaussian processes with monotonicity information (2010)
K. et al. Monotonic Gaussian Process Flow (2019)
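Below is a sketch of one way to realise such a parametric warp, assuming a convex combination of fixed monotone basis functions on [-1, 1] (the particular basis is an illustrative assumption, not the one from the talk). Since every basis function is increasing and the weights satisfy w_i ≥ 0 and Σ_i w_i = 1, the combination is itself a monotone warp, and it stays close to the identity when most of the weight sits on φ(x) = x.

```python
import numpy as np

def warp(x, w):
    basis = np.stack([x,                                  # identity
                      np.tanh(2.0 * x) / np.tanh(2.0),    # squashing warp
                      np.sign(x) * np.abs(x) ** 1.5])     # expanding warp
    w = np.asarray(w)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)    # convex combination
    return w @ basis

x = np.linspace(-1.0, 1.0, 100)
g = warp(x, [0.6, 0.3, 0.1])    # a warp biased towards the identity
```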
Model of sequences

Option 1: interpolate sequences using linear interpolation or splines.

Option 2: fit GPs to the sequences (a minimal sketch follows below).
• principled way to handle observational noise
• can impose priors on f_k

[Figure: observed sequences and the corresponding GP regression fits.]
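A minimal GP regression sketch for Option 2: fit a GP with an RBF kernel to one observed sequence. The kernel and its hyperparameters are illustrative; in the alignment model they are learned jointly with the warps.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3, variance=1.0):
    # Squared-exponential kernel between two vectors of inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    # Standard GP regression posterior mean with Gaussian noise.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train) @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(1)
x = np.linspace(-1.5, 1.5, 50)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(50)
mu = gp_posterior_mean(x, y, np.linspace(-1.5, 1.5, 200))
```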
Notation

Assume that the observed data was generated as:

$$y_j = f_k(g_j(x)) + \epsilon_j, \qquad \epsilon_j \sim \mathcal{N}(0, \beta_j^{-1}) \quad (2)$$

where x are fixed, linearly spaced input locations (or evenly sampled time). Then the corresponding aligned sequences are:

$$s_j := f_k(x) \quad (3)$$

The joint conditional likelihood is:

$$p\!\left(\begin{bmatrix} s_j \\ y_j \end{bmatrix} \,\middle|\, G_j, X, \theta_j\right) = \mathcal{N}\!\left(0,\; \begin{bmatrix} k_{\theta_j}(X, X) & k_{\theta_j}(X, G_j) \\ k_{\theta_j}(G_j, X) & k_{\theta_j}(G_j, G_j) + \beta_j^{-1} I \end{bmatrix}\right) \quad (4)$$
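The block structure of Eq. (4) can be built directly. In this sketch, the pseudo-observations s_j live at the evenly spaced inputs X, the observations y_j at the warped inputs G_j = g_j(X), and the noise β_j⁻¹ enters only the observed block; `kern` is any kernel function of two input vectors, e.g. the rbf() from the previous sketch.

```python
import numpy as np

def joint_cov(X, G, beta, kern):
    K_xx = kern(X, X)                                   # pseudo-observation block
    K_xg = kern(X, G)                                   # cross-covariance
    K_gg = kern(G, G) + (1.0 / beta) * np.eye(len(G))   # noisy observation block
    return np.block([[K_xx, K_xg], [K_xg.T, K_gg]])
```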
Model of sequences

[Figure: pseudo-observations S at evenly spaced inputs X, and observations Y at warped inputs g(X).]

Then the goal is to:
• Fit GPs to observations and pseudo-observations {[g(X), X], [Y, S]} for each sequence
• Impose an alignment constraint on the pseudo-observations {X, S}
Alignment objective

We want an alignment objective that:
• infers the number of clusters (underlying functions) K
• aligns sequences within these clusters

We aim to design a clustering or dimensionality reduction objective that is invariant to the warping of the inputs.
Pairwise distance alignment objective

Minimise the pairwise distance between all sequences (irrespective of the underlying clusters of functions):

$$\mathcal{L} = \sum_{n=1}^{J} \sum_{m=n+1}^{J} \| s_n(x) - s_m(x) \|^2 \quad (5)$$

[Figure: warps and aligned functions under this objective; complexity 1.845, alignment error 1.735.]
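A sketch of Eq. (5), with the aligned sequences as the rows of S. Note that the sum runs over all pairs, so sequences generated by different latent functions are pulled together too, which is what motivates the GP-LVM objective on the following slides.

```python
import numpy as np

def pairwise_loss(S):
    # Summed squared distance between every pair of rows of S.
    J = S.shape[0]
    return sum(np.sum((S[n] - S[m]) ** 2)
               for n in range(J) for m in range(n + 1, J))
```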
Traditional GP-LVM

• Observe high-dimensional data S.
• Find a low-dimensional representation Z that captures the structure of S.
• Find a mapping h from Z to S.

[Figure: latent space Z mapped to the inputs S via h.]

s_j = h(z_j, θ) + noise, where θ are the parameters of h.
Traditional GP-LVM

In a GP-LVM, the GPs are taken to be independent across the features, and the likelihood function is:

$$p(S \mid x) = \prod_{d=1}^{D} p(s_d \mid x) = \prod_{d=1}^{D} \mathcal{N}(s_d \mid 0, K + \gamma^{-1} I) \quad (6)$$

[Figure: observed data Y and aligned data S in matrix form.]
GP-LVM as alignment objective

We impose the alignment objective by learning a low-dimensional representation Z of the pseudo-observations S.

$$\mathcal{L}_{\text{GP-LVM}} = \log p(S \mid Z, \theta_h, \theta_z, \beta) = \underbrace{-\tfrac{N}{2} \log |K_{zz}|}_{\text{complexity term}} \; \underbrace{-\tfrac{1}{2} \operatorname{Tr}\!\left(K_{zz}^{-1} S S^{T}\right)}_{\text{data-fitting term}} + \underbrace{\log p(Z \mid \theta_z)}_{\text{prior over latent variables}} + \underbrace{\log p(\theta_h)}_{\text{prior over GP mappings}} + \text{const} \quad (7)$$

As an alignment objective, it is controlled by:
1. the prior over the latent variables Z, p(Z) ∼ N(0, θ_z I)
2. the lengthscale of the GP-LVM mapping (part of θ_h)
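A sketch of the two data-dependent terms of Eq. (7) for fixed latents Z: here K_zz is the J × J kernel matrix over the latent locations and S is the J × N matrix of aligned sequences. The priors over Z and the hyperparameters would be added on top of this.

```python
import numpy as np

def gplvm_loglik(S, K_zz):
    N = S.shape[1]
    _, logdet = np.linalg.slogdet(K_zz)
    complexity = -0.5 * N * logdet                               # -N/2 log|K_zz|
    data_fit = -0.5 * np.trace(np.linalg.solve(K_zz, S @ S.T))   # -1/2 Tr(K_zz^-1 S S^T)
    return complexity + data_fit
```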
Aside: Pairwise distance alignment objective

[Figure: observations and their transformed versions for regularisation weights γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}.]

y_i^transformed = y_i^input + w_i, with y_i, w_i ∈ R^8 and penalty γ||w||², i = 1, 2, 3, 4.
Aside: GP-LVM as alignment objective

[Figure: observations and their transformed versions for regularisation weights γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}.]

y_i^transformed = y_i^input + w_i, with y_i, w_i ∈ R^8 and penalty γ||w||², i = 1, 2, 3, 4.
Aside: Bayesian Mixture Model as alignment objective

[Figure: observations, transformed versions, and cluster assignments for regularisation weights γ ∈ {0.000, 0.100, 0.464, 2.154, 10.000}.]

y_i^transformed = y_i^input + w_i, with y_i, w_i ∈ R^8 and penalty γ||w||², i = 1, 2, 3, 4.
Full objective for sequence alignment

1. For each of the J sequences we perform standard GP regression on the observed data y_j and the pseudo-observations s_j by learning the hyperparameters of the GPs and the parameters of the warps.
2. Impose the alignment objective on the pseudo-observations S.

The sum of the log-likelihoods is:

$$\mathcal{L} = \sum_{j=1}^{J} \mathcal{L}_{\mathrm{GP}_j} + \mathcal{L}_{\text{GP-LVM}} + \sum_{j=1}^{J} \log p(g_j) = \sum_{j=1}^{J} \log p([s_j, y_j]^{T} \mid x, g_j, \theta_j, \beta_j) + \mathcal{L}_{\text{GP-LVM}}(Z, \psi_h, \psi_z, \gamma) + \sum_{j=1}^{J} \log p(g_j) \quad (8)$$
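A schematic of Eq. (8), assuming `gp_loglik`, `gplvm_loglik` and `log_warp_prior` implement the per-sequence GP likelihood of Eq. (4), the GP-LVM objective of Eq. (7) and a prior over the warps; only the additive structure that is optimised jointly is shown here.

```python
def full_objective(S, Y, warps, Z, gp_loglik, gplvm_loglik, log_warp_prior):
    # Per-sequence GP fit terms, one for each (s_j, y_j, g_j) triple.
    fit = sum(gp_loglik(s, y, g) for s, y, g in zip(S, Y, warps))
    # Alignment term on S plus the warp priors.
    return fit + gplvm_loglik(S, Z) + sum(log_warp_prior(g) for g in warps)
```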
Results on ECG data

Input data and alignment with the GP-LVM objective:

[Figure: warps, aligned functions (complexity 6.447, alignment error 0.411), and manifold locations.]
Competing objectives and joint model

[Graphical model over X, Z, g, f, h, β, γ, S and Y, with a plate over the J sequences.]
Competing objectives and joint model

[Graphical model as on the previous slide.]

Likelihood p(S | H, F_X) as an equal mixture (where S_j and S_n refer to the rows and columns of S):

$$p(S \mid H, F_X) = \frac{1}{2} \left( \prod_{n} \mathcal{N}(S_n \mid H_n, \gamma^{-1} I_J) + \prod_{j} \mathcal{N}(S_j \mid F_{X_j}, \beta_j^{-1} I_N) \right)$$
Multi-task learning and matrix distributions

Given data Y ∈ R^{J×N}:
1. each sequence (row) has a GP prior, and there is a free-form matrix C that models the covariances between the sequences¹.
2. learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples using a GP-LVM²:

$$p(Y \mid R, C^{-1}) = \mathcal{N}\!\left(\mathrm{vec}(Y) \mid 0_{N \times D},\; C \otimes R + \sigma^2 I_{N \times D}\right) \quad (9)$$

¹ Bonilla et al. Multi-task Gaussian Process Prediction (2008)
² Stegle et al. Efficient inference in matrix-variate Gaussian models with iid observation noise (2011)
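A sketch of the Kronecker-structured covariance in Eq. (9): a between-sample covariance C combined with a between-feature covariance R, plus iid observation noise. The inputs and shapes are illustrative.

```python
import numpy as np

def matrix_variate_cov(C, R, sigma2):
    K = np.kron(C, R)                        # C ⊗ R
    return K + sigma2 * np.eye(K.shape[0])   # add iid observation noise
```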
More generally...

These types of constructions are useful when:
1. The data has a hierarchical structure with additional constraints:
   y_j = f_k(g_j(x)) + ε_j,   ε_j ∼ N(0, β_j⁻¹)
2. We want to perform dimensionality reduction or clustering that is invariant to a specific transformation.
Uncertainty in alignment model
Uncertainty in alignment model

While the alignment model is probabilistic, so far we have only considered point estimates and ignored the uncertainties associated with the warpings and the group assignments.

Uncertainty in the alignment model has three sources:
1. Observation noise: the observed sequences are often noisy
2. Warping uncertainty
3. Ambiguity in the assignment of sequences to groups