
Probabilistic Dimensionality Reduction, Neil D. Lawrence, University of Sheffield (PowerPoint presentation)

Probabilistic Dimensionality Reduction
Neil D. Lawrence, University of Sheffield
Facebook, London, 14th April 2016

Outline
◮ Probabilistic Linear Dimensionality Reduction
◮ Non-Linear Probabilistic Dimensionality Reduction
◮ Examples
◮ Conclusions


  1. Linear Dimensionality Reduction: Linear Latent Variable Model
◮ Represent data, $Y$, with a lower dimensional set of latent variables $X$.
◮ Assume a linear relationship of the form $y_{i,:} = W x_{i,:} + \epsilon_{i,:}$, where $\epsilon_{i,:} \sim \mathcal{N}\left(0, \sigma^2 I\right)$.

  2. Linear Latent Variable Model: Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
[Graphical model with nodes $X$, $W$, $\sigma^2$ and $Y$.]

  3. Linear Latent Variable Model: Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Standard latent variable approach:
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$

  4. Linear Latent Variable Model: Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Standard latent variable approach:
◮ Define Gaussian prior over latent space, $X$.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
$$p(X) = \prod_{i=1}^{n} \mathcal{N}\left(x_{i,:} \mid 0, I\right)$$

  5. Linear Latent Variable Model: Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Standard latent variable approach:
◮ Define Gaussian prior over latent space, $X$.
◮ Integrate out latent variables.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
$$p(X) = \prod_{i=1}^{n} \mathcal{N}\left(x_{i,:} \mid 0, I\right)$$
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, W W^{\top} + \sigma^2 I\right)$$

  6. Computation of the Marginal Likelihood
$$y_{i,:} = W x_{i,:} + \epsilon_{i,:}, \qquad x_{i,:} \sim \mathcal{N}(0, I), \qquad \epsilon_{i,:} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$

  7. Computation of the Marginal Likelihood
$$y_{i,:} = W x_{i,:} + \epsilon_{i,:}, \qquad x_{i,:} \sim \mathcal{N}(0, I), \qquad \epsilon_{i,:} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$
$$W x_{i,:} \sim \mathcal{N}\left(0, W W^{\top}\right)$$

  8. Computation of the Marginal Likelihood
$$y_{i,:} = W x_{i,:} + \epsilon_{i,:}, \qquad x_{i,:} \sim \mathcal{N}(0, I), \qquad \epsilon_{i,:} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$
$$W x_{i,:} \sim \mathcal{N}\left(0, W W^{\top}\right)$$
$$W x_{i,:} + \epsilon_{i,:} \sim \mathcal{N}\left(0, W W^{\top} + \sigma^2 I\right)$$
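
The result on this slide is easy to verify by simulation. Below is a minimal sketch (my own illustration; the dimensions, seed and variable names are arbitrary) that samples from the generative model and compares the empirical second moment of $y_{i,:}$ against $W W^{\top} + \sigma^2 I$:

```python
import numpy as np

# Minimal numerical check of the marginal covariance W W^T + sigma^2 I
# (dimensions and seed chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
p, q, sigma2, n_samples = 5, 2, 0.1, 200_000

W = rng.standard_normal((p, q))                             # fixed linear mapping
X = rng.standard_normal((n_samples, q))                     # x_i ~ N(0, I)
E = np.sqrt(sigma2) * rng.standard_normal((n_samples, p))   # eps_i ~ N(0, sigma^2 I)
Y = X @ W.T + E                                             # y_i = W x_i + eps_i

empirical = Y.T @ Y / n_samples                             # empirical second moment of y
analytic = W @ W.T + sigma2 * np.eye(p)                     # W W^T + sigma^2 I
print(np.max(np.abs(empirical - analytic)))                 # small for large n_samples
```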

  9. Linear Latent Variable Model II: Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, W W^{\top} + \sigma^2 I\right)$$

  10. Linear Latent Variable Model II: Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, C\right), \qquad C = W W^{\top} + \sigma^2 I$$

  11. Linear Latent Variable Model II: Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, C\right), \qquad C = W W^{\top} + \sigma^2 I$$
$$\log p(Y \mid W) = -\frac{n}{2} \log |C| - \frac{1}{2} \mathrm{tr}\left(C^{-1} Y^{\top} Y\right) + \text{const.}$$

  12. Linear Latent Variable Model II: Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, C\right), \qquad C = W W^{\top} + \sigma^2 I$$
$$\log p(Y \mid W) = -\frac{n}{2} \log |C| - \frac{1}{2} \mathrm{tr}\left(C^{-1} Y^{\top} Y\right) + \text{const.}$$
If $U_q$ are the first $q$ principal eigenvectors of $n^{-1} Y^{\top} Y$ and the corresponding eigenvalues are $\Lambda_q$,

  13. Linear Latent Variable Model II: Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, C\right), \qquad C = W W^{\top} + \sigma^2 I$$
$$\log p(Y \mid W) = -\frac{n}{2} \log |C| - \frac{1}{2} \mathrm{tr}\left(C^{-1} Y^{\top} Y\right) + \text{const.}$$
If $U_q$ are the first $q$ principal eigenvectors of $n^{-1} Y^{\top} Y$ and the corresponding eigenvalues are $\Lambda_q$,
$$W = U_q L R^{\top}, \qquad L = \left(\Lambda_q - \sigma^2 I\right)^{\frac{1}{2}},$$
where $R$ is an arbitrary rotation matrix.
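
A direct sketch of this solution (my own code, assuming centred data and taking $R = I$; the clipping of negative values is a numerical guard, not part of the slide):

```python
import numpy as np

def ppca_ml_W(Y, q, sigma2):
    """Maximum likelihood W for probabilistic PCA, following the slide above.

    A sketch under the assumption that Y (shape n x p) is already centred:
    take the top-q eigenvectors U_q and eigenvalues Lambda_q of n^{-1} Y^T Y
    and set W = U_q (Lambda_q - sigma^2 I)^{1/2}, choosing R = I.
    """
    n = Y.shape[0]
    S = Y.T @ Y / n                                  # n^{-1} Y^T Y
    eigvals, eigvecs = np.linalg.eigh(S)             # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:q]              # indices of the top-q eigenvalues
    U_q, Lambda_q = eigvecs[:, top], eigvals[top]
    L = np.sqrt(np.maximum(Lambda_q - sigma2, 0.0))  # (Lambda_q - sigma^2 I)^{1/2}
    return U_q * L                                   # W = U_q L R^T with R = I
```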

  14. Linear Latent Variable Model III: Dual Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
[Graphical model with nodes $W$, $X$, $\sigma^2$ and $Y$.]

  15. Linear Latent Variable Model III: Dual Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Novel latent variable approach:
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$

  16. Linear Latent Variable Model III: Dual Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Novel latent variable approach:
◮ Define Gaussian prior over parameters, $W$.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
$$p(W) = \prod_{i=1}^{p} \mathcal{N}\left(w_{i,:} \mid 0, I\right)$$

  17. Linear Latent Variable Model III: Dual Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Novel latent variable approach:
◮ Define Gaussian prior over parameters, $W$.
◮ Integrate out parameters.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
$$p(W) = \prod_{i=1}^{p} \mathcal{N}\left(w_{i,:} \mid 0, I\right)$$
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, X X^{\top} + \sigma^2 I\right)$$

  18. Computation of the Marginal Likelihood
$$y_{:,j} = X w_{:,j} + \epsilon_{:,j}, \qquad w_{:,j} \sim \mathcal{N}(0, I), \qquad \epsilon_{:,j} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$

  19. Computation of the Marginal Likelihood
$$y_{:,j} = X w_{:,j} + \epsilon_{:,j}, \qquad w_{:,j} \sim \mathcal{N}(0, I), \qquad \epsilon_{:,j} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$
$$X w_{:,j} \sim \mathcal{N}\left(0, X X^{\top}\right)$$

  20. Computation of the Marginal Likelihood
$$y_{:,j} = X w_{:,j} + \epsilon_{:,j}, \qquad w_{:,j} \sim \mathcal{N}(0, I), \qquad \epsilon_{:,j} \sim \mathcal{N}\left(0, \sigma^2 I\right)$$
$$X w_{:,j} \sim \mathcal{N}\left(0, X X^{\top}\right)$$
$$X w_{:,j} + \epsilon_{:,j} \sim \mathcal{N}\left(0, X X^{\top} + \sigma^2 I\right)$$

  21. Linear Latent Variable Model IV: Dual Probabilistic PCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, X X^{\top} + \sigma^2 I\right)$$

  22. Linear Latent Variable Model IV: Dual PPCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$

  23. Linear Latent Variable Model IV: Dual PPCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$
$$\log p(Y \mid X) = -\frac{p}{2} \log |K| - \frac{1}{2} \mathrm{tr}\left(K^{-1} Y Y^{\top}\right) + \text{const.}$$

  24. Linear Latent Variable Model IV: Dual PPCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$
$$\log p(Y \mid X) = -\frac{p}{2} \log |K| - \frac{1}{2} \mathrm{tr}\left(K^{-1} Y Y^{\top}\right) + \text{const.}$$
If $U'_q$ are the first $q$ principal eigenvectors of $p^{-1} Y Y^{\top}$ and the corresponding eigenvalues are $\Lambda_q$,

  25. Linear Latent Variable Model IV: Dual PPCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$
$$\log p(Y \mid X) = -\frac{p}{2} \log |K| - \frac{1}{2} \mathrm{tr}\left(K^{-1} Y Y^{\top}\right) + \text{const.}$$
If $U'_q$ are the first $q$ principal eigenvectors of $p^{-1} Y Y^{\top}$ and the corresponding eigenvalues are $\Lambda_q$,
$$X = U'_q L R^{\top}, \qquad L = \left(\Lambda_q - \sigma^2 I\right)^{\frac{1}{2}},$$
where $R$ is an arbitrary rotation matrix.

  26. Linear Latent Variable Model IV: Dual PPCA Max. Likelihood Soln (Lawrence, 2004, 2005)
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$
$$\log p(Y \mid X) = -\frac{p}{2} \log |K| - \frac{1}{2} \mathrm{tr}\left(K^{-1} Y Y^{\top}\right) + \text{const.}$$
If $U'_q$ are the first $q$ principal eigenvectors of $p^{-1} Y Y^{\top}$ and the corresponding eigenvalues are $\Lambda_q$,
$$X = U'_q L R^{\top}, \qquad L = \left(\Lambda_q - \sigma^2 I\right)^{\frac{1}{2}},$$
where $R$ is an arbitrary rotation matrix.

  27. Linear Latent Variable Model IV: PPCA Max. Likelihood Soln (Tipping and Bishop, 1999)
$$p(Y \mid W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid 0, C\right), \qquad C = W W^{\top} + \sigma^2 I$$
$$\log p(Y \mid W) = -\frac{n}{2} \log |C| - \frac{1}{2} \mathrm{tr}\left(C^{-1} Y^{\top} Y\right) + \text{const.}$$
If $U_q$ are the first $q$ principal eigenvectors of $n^{-1} Y^{\top} Y$ and the corresponding eigenvalues are $\Lambda_q$,
$$W = U_q L R^{\top}, \qquad L = \left(\Lambda_q - \sigma^2 I\right)^{\frac{1}{2}},$$
where $R$ is an arbitrary rotation matrix.

  28. Equivalence of Formulations
The eigenvalue problems are equivalent:
◮ Solution for Probabilistic PCA (solves for the mapping): $Y^{\top} Y U_q = U_q \Lambda_q$, $\quad W = U_q L R^{\top}$
◮ Solution for Dual Probabilistic PCA (solves for the latent positions): $Y Y^{\top} U'_q = U'_q \Lambda_q$, $\quad X = U'_q L R^{\top}$
◮ Equivalence is from $U_q = Y^{\top} U'_q \Lambda_q^{-\frac{1}{2}}$
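
The equivalence can also be checked numerically. The sketch below (my own illustration with synthetic data and variable names) solves both eigenvalue problems and recovers $U_q$ from the dual eigenvectors, up to the sign of each column:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 20, 8, 3
Y = rng.standard_normal((n, q)) @ rng.standard_normal((q, p))   # low-rank synthetic data
Y -= Y.mean(axis=0)

# Probabilistic PCA eigenproblem: Y^T Y U_q = U_q Lambda_q
vals_p, vecs_p = np.linalg.eigh(Y.T @ Y)
U_q = vecs_p[:, np.argsort(vals_p)[::-1][:q]]

# Dual probabilistic PCA eigenproblem: Y Y^T U'_q = U'_q Lambda_q
vals_d, vecs_d = np.linalg.eigh(Y @ Y.T)
order = np.argsort(vals_d)[::-1][:q]
U_dash, Lambda_q = vecs_d[:, order], vals_d[order]

# Equivalence: U_q = Y^T U'_q Lambda_q^{-1/2} (columns agree up to sign).
U_from_dual = Y.T @ U_dash / np.sqrt(Lambda_q)
print(np.allclose(np.abs(U_q), np.abs(U_from_dual)))             # True
```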

  29. Gaussian Processes: Extremely Short Overview [Figure: Gaussian process plot.]

  30. Gaussian Processes: Extremely Short Overview [Figure: Gaussian process plot.]

  31. Gaussian Processes: Extremely Short Overview [Figure: Gaussian process plot.]

  32. Gaussian Processes: Extremely Short Overview [Figure: two Gaussian process plots, side by side.]

  33. Non-Linear Latent Variable Model: Dual Probabilistic PCA
◮ Define linear-Gaussian relationship between latent variables and data.
◮ Novel latent variable approach:
◮ Define Gaussian prior over parameters, $W$.
◮ Integrate out parameters.
$$p(Y \mid X, W) = \prod_{i=1}^{n} \mathcal{N}\left(y_{i,:} \mid W x_{i,:}, \sigma^2 I\right)$$
$$p(W) = \prod_{i=1}^{p} \mathcal{N}\left(w_{i,:} \mid 0, I\right)$$
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, X X^{\top} + \sigma^2 I\right)$$

  34. Non-Linear Latent Variable Model: Dual Probabilistic PCA
◮ Inspection of the marginal likelihood shows ...
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, X X^{\top} + \sigma^2 I\right)$$

  35. Non-Linear Latent Variable Model: Dual Probabilistic PCA
◮ Inspection of the marginal likelihood shows ...
◮ The covariance matrix is a covariance function.
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$

  36. Non-Linear Latent Variable Model: Dual Probabilistic PCA
◮ Inspection of the marginal likelihood shows ...
◮ The covariance matrix is a covariance function.
◮ We recognise it as the ‘linear kernel’.
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = X X^{\top} + \sigma^2 I$$
This is a product of Gaussian processes with linear kernels.

  37. Non-Linear Latent Variable Model: Dual Probabilistic PCA
◮ Inspection of the marginal likelihood shows ...
◮ The covariance matrix is a covariance function.
◮ We recognise it as the ‘linear kernel’.
◮ We call this the Gaussian Process Latent Variable Model (GP-LVM).
$$p(Y \mid X) = \prod_{j=1}^{p} \mathcal{N}\left(y_{:,j} \mid 0, K\right), \qquad K = \, ?$$
Replace the linear kernel with a non-linear kernel for a non-linear model.

  38. Non-linear Latent Variable Models: Exponentiated Quadratic (EQ) Covariance
◮ The EQ covariance has the form $k_{i,j} = k\left(x_{i,:}, x_{j,:}\right)$, where
$$k\left(x_{i,:}, x_{j,:}\right) = \alpha \exp\left(-\frac{\left\lVert x_{i,:} - x_{j,:} \right\rVert_2^2}{2 \ell^2}\right).$$
◮ No longer possible to optimise wrt $X$ via an eigenvalue problem.
◮ Instead find gradients with respect to $X$, $\alpha$, $\ell$ and $\sigma^2$ and optimise using conjugate gradients (see the sketch below).
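
As a concrete sketch of this procedure (my own code, not the original implementation): the slides use analytic gradients with conjugate gradients, while for brevity this version hands the negative log likelihood to L-BFGS with finite-difference gradients and a PCA-style initialisation.

```python
import numpy as np
from scipy.optimize import minimize

def eq_kernel(X, alpha, ell):
    """EQ covariance: k(x_i, x_j) = alpha * exp(-||x_i - x_j||^2 / (2 ell^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-sq / (2.0 * ell ** 2))

def gplvm_neg_log_lik(params, Y, q):
    """-log p(Y|X) = (p/2) log|K| + (1/2) tr(K^{-1} Y Y^T) + const, K = K_EQ + sigma^2 I."""
    n, p = Y.shape
    X = params[: n * q].reshape(n, q)
    alpha, ell, sigma2 = np.exp(params[n * q:])          # log-parameterised for positivity
    K = eq_kernel(X, alpha, ell) + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * p * logdet + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))

# Toy run: random data, PCA-style initialisation, joint optimisation of X, alpha, ell, sigma^2.
rng = np.random.default_rng(2)
Y = rng.standard_normal((30, 5)); Y -= Y.mean(axis=0)
q = 2
X0 = np.linalg.svd(Y, full_matrices=False)[0][:, :q]
params0 = np.concatenate([X0.ravel(), np.log([1.0, 1.0, 0.1])])
result = minimize(gplvm_neg_log_lik, params0, args=(Y, q), method="L-BFGS-B")
X_opt = result.x[: 30 * q].reshape(30, q)                # optimised latent positions
```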

  39. Outline
◮ Probabilistic Linear Dimensionality Reduction
◮ Non-Linear Probabilistic Dimensionality Reduction
◮ Examples
◮ Conclusions

  40. Applications
◮ Style Based Inverse Kinematics: facilitating animation through modeling human motion (Grochow et al., 2004)
◮ Tracking: tracking using human motion models (Urtasun et al., 2005, 2006)
◮ Assisted Animation: generalizing drawings for animation (Baxter and Anjyo, 2006)
◮ Shape Models: inferring shape (e.g. pose from silhouette) (Ek et al., 2008b,a; Prisacariu and Reid, 2011a,b)

  41. Example: Latent Doodle Space (Baxter and Anjyo, 2006)

  42. Example: Latent Doodle Space (Baxter and Anjyo, 2006)
Generalization with much less data than dimensions:
◮ Powerful uncertainty handling of GPs leads to surprising properties.
◮ Non-linear models can be used where there are fewer data points than dimensions without overfitting.

  43. Prior for Supervised Learning (Urtasun and Darrell, 2007)
◮ We introduce a prior that is based on the Fisher criterion,
$$p(X) \propto \exp\left\{-\frac{1}{\sigma_d^2} \, \mathrm{tr}\left(S_w^{-1} S_b\right)\right\},$$
with $S_b$ the between-class matrix and $S_w$ the within-class matrix.

  44. Prior for Supervised Learning (Urtasun and Darrell, 2007)
◮ We introduce a prior that is based on the Fisher criterion,
$$p(X) \propto \exp\left\{-\frac{1}{\sigma_d^2} \, \mathrm{tr}\left(S_w^{-1} S_b\right)\right\},$$
with $S_b$ the between-class matrix and $S_w$ the within-class matrix:
$$S_b = \sum_{i=1}^{L} \frac{n_i}{n} \left(M_i - M_0\right)\left(M_i - M_0\right)^{\top}$$
where $X^{(i)} = [x_1^{(i)}, \cdots, x_{n_i}^{(i)}]$ are the $n_i$ training points of class $i$, $M_i$ is the mean of the elements of class $i$, and $M_0$ is the mean of all the training points of all classes.

  45. Prior for Supervised Learning (Urtasun and Darrell, 2007)
◮ We introduce a prior that is based on the Fisher criterion,
$$p(X) \propto \exp\left\{-\frac{1}{\sigma_d^2} \, \mathrm{tr}\left(S_w^{-1} S_b\right)\right\},$$
with $S_b$ the between-class matrix and $S_w$ the within-class matrix:
$$S_b = \sum_{i=1}^{L} \frac{n_i}{n} \left(M_i - M_0\right)\left(M_i - M_0\right)^{\top}$$
$$S_w = \sum_{i=1}^{L} \frac{n_i}{n} \left(\frac{1}{n_i} \sum_{k=1}^{n_i} \left(x_k^{(i)} - M_i\right)\left(x_k^{(i)} - M_i\right)^{\top}\right)$$
where $X^{(i)} = [x_1^{(i)}, \cdots, x_{n_i}^{(i)}]$ are the $n_i$ training points of class $i$, $M_i$ is the mean of the elements of class $i$, and $M_0$ is the mean of all the training points of all classes.
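
A short sketch of these quantities (my own function and variable names; the between/within assignments follow the standard generalised discriminant analysis definitions stated on the slide):

```python
import numpy as np

def fisher_scatter(X, labels):
    """Between-class matrix S_b, within-class matrix S_w and tr(S_w^{-1} S_b)
    for latent points X (shape n x q) with one class label per point.
    """
    n, q = X.shape
    M0 = X.mean(axis=0)                       # mean of all training points
    S_b = np.zeros((q, q))
    S_w = np.zeros((q, q))
    for c in np.unique(labels):
        Xc = X[labels == c]                   # the n_i points of class i
        n_i = len(Xc)
        Mi = Xc.mean(axis=0)                  # class mean M_i
        d = (Mi - M0)[:, None]
        S_b += (n_i / n) * (d @ d.T)          # between-class contribution
        D = Xc - Mi
        S_w += (n_i / n) * (D.T @ D) / n_i    # within-class contribution
    # Assumes S_w is non-singular; add a small ridge otherwise.
    return S_b, S_w, np.trace(np.linalg.solve(S_w, S_b))
```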

  46. Prior for Supervised Learning (Urtasun and Darrell, 2007)
◮ We introduce a prior that is based on the Fisher criterion,
$$p(X) \propto \exp\left\{-\frac{1}{\sigma_d^2} \, \mathrm{tr}\left(S_w^{-1} S_b\right)\right\},$$
with $S_b$ the between-class matrix and $S_w$ the within-class matrix.
[Figure: example latent space embeddings.]

  47. GaussianFace (Lu and Tang, 2014)
◮ First system to surpass human performance on the cropped Labeled Faces in the Wild (LFW) data. http://tinyurl.com/nkt9a38
◮ Lots of feature engineering, followed by a Discriminative GP-LVM.
[Figure: ROC curves on LFW. GaussianFace-FE + GaussianFace-BC (98.52%) beats Human, cropped (97.53%) [Kumar et al. 2009], DeepFace-ensemble (97.35%) [Taigman et al. 2014], TL Joint Bayesian (96.33%) [Cao et al. 2013], High-dimensional LBP (95.17%) [Chen et al. 2013], Fisher Vector Faces (93.03%) [Simonyan et al. 2013] and ConvNet-RBM (92.52%) [Sun et al. 2013].]
[Figure: examples of matched and mismatched LFW pairs incorrectly classified by the GaussianFace model.]

  48. Continuous Character Control (Levine et al., 2012)
◮ Graph diffusion prior for enforcing connectivity between motions:
$$\log p(X) = w_c \sum_{i,j} \log K^d_{ij}$$
with the graph diffusion kernel $K^d = \exp(\beta H)$ obtained from $H = -T^{-1/2} L T^{-1/2}$, where $L$ is the graph Laplacian and $T$ is a diagonal matrix with $T_{ii} = \sum_j w(x_i, x_j)$,
$$L_{ij} = \begin{cases} \sum_k w(x_i, x_k) & \text{if } i = j \\ -w(x_i, x_j) & \text{otherwise,} \end{cases}$$
and $w(x_i, x_j) = \lVert x_i - x_j \rVert^{-p}$ measures similarity.
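
A sketch of this prior (my own code; `scipy.linalg.expm` is used for the matrix exponential, and the small `eps` guards on the self-distance and the log are implementation choices, not taken from the slides):

```python
import numpy as np
from scipy.linalg import expm

def diffusion_log_prior(X, beta=1.0, p=2, w_c=1.0, eps=1e-8):
    """log p(X) = w_c * sum_{i,j} log K^d_{ij}, with K^d = expm(beta * H)
    and H = -T^{-1/2} L T^{-1/2} built from w(x_i, x_j) = ||x_i - x_j||^{-p}.
    """
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    off_diag = ~np.eye(len(X), dtype=bool)
    W = np.zeros_like(dists)
    W[off_diag] = 1.0 / (dists[off_diag] ** p + eps)    # similarities w(x_i, x_j)

    T_diag = W.sum(axis=1)                              # T_ii = sum_j w(x_i, x_j)
    L = np.diag(T_diag) - W                             # graph Laplacian
    T_inv_sqrt = np.diag(1.0 / np.sqrt(T_diag))
    H = -T_inv_sqrt @ L @ T_inv_sqrt
    K_d = expm(beta * H)                                # graph diffusion kernel
    return w_c * np.sum(np.log(np.maximum(K_d, eps)))   # clip to keep the log finite
```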

  49. Character Control: Results

  50. GPLVM for Character Animation
◮ Learn a GPLVM from a small mocap sequence.
◮ Pose synthesis by solving an optimization problem: $\arg\min_{x, y} -\log p(y \mid x)$ such that $C(y) = 0$ (see the sketch below).
◮ These handle constraints may come from a user in an interactive session, or from a mocap system.
◮ Smooth the latent space by adding noise in order to reduce the number of local minima.
◮ Optimization proceeds in an annealed fashion over different annealed versions of the latent space.
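
The optimization on this slide can be sketched as follows (my own code and names, assuming a fixed EQ kernel with unit variance and lengthscale; the annealing and latent-space smoothing described above are omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def eq_kernel(A, B, ell=1.0):
    """Unit-variance EQ covariance between rows of A and rows of B."""
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * ell ** 2))

def synthesize_pose(X_train, Y_train, constraint, q, sigma2=0.01):
    """Solve arg min_{x,y} -log p(y | x) subject to C(y) = 0 under a trained GP-LVM.

    X_train: learned latent points (n x q); Y_train: centred poses (n x p);
    constraint: callable C(y) returning 0 when the handle constraint is satisfied.
    """
    n, p = Y_train.shape
    K_inv = np.linalg.inv(eq_kernel(X_train, X_train) + sigma2 * np.eye(n))

    def neg_log_pred(z):
        x, y = z[:q], z[q:]
        k_x = eq_kernel(x[None, :], X_train)[0]       # k(x, X_train)
        mu = Y_train.T @ (K_inv @ k_x)                # GP predictive mean for the pose
        var = 1.0 + sigma2 - k_x @ K_inv @ k_x        # shared predictive variance
        return np.sum((y - mu) ** 2) / (2 * var) + 0.5 * p * np.log(var)

    z0 = np.concatenate([X_train.mean(axis=0), Y_train.mean(axis=0)])
    cons = {"type": "eq", "fun": lambda z: constraint(z[q:])}    # C(y) = 0
    res = minimize(neg_log_pred, z0, method="SLSQP", constraints=[cons])
    return res.x[:q], res.x[q:]

# Example handle constraint: pin the first pose coordinate to 1.0.
# x_new, y_new = synthesize_pose(X_train, Y_train, lambda y: y[0] - 1.0, q=2)
```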

  51. Application: Replay same motion (Grochow et al., 2004)

  52. Application: Keyframing joint trajectories (Grochow et al., 2004)

  53. Application: Deal with missing data in mocap (Grochow et al., 2004)

  54. Application: Style Interpolation (Grochow et al., 2004)

  55. Shape Priors in Level Set Segmentation
◮ Represent contours with elliptic Fourier descriptors.
◮ Learn a GPLVM on the parameters of those descriptors.
◮ We can now generate closed contours from the latent space.
◮ Segmentation is done by non-linear minimization of an image-driven energy which is a function of the latent space.
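
As a rough sketch of the contour parameterisation step (my own code; it uses plain complex Fourier descriptors of a closed contour as a simplified stand-in for the elliptic Fourier descriptors used in the paper):

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=10):
    """Truncated Fourier descriptors of a closed contour given as (n_points, 2) samples.
    A GP-LVM could then be trained on the real and imaginary parts of these coefficients.
    """
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z) / len(z)
    # Drop the DC term (translation) and keep the n_coeffs lowest frequencies on each side.
    return np.concatenate([coeffs[1:n_coeffs + 1], coeffs[-n_coeffs:]])

def reconstruct_contour(descriptors, n_points=200):
    """Generate a closed contour back from the truncated descriptors."""
    n_coeffs = len(descriptors) // 2
    t = np.arange(n_points)
    freqs = np.concatenate([np.arange(1, n_coeffs + 1), np.arange(-n_coeffs, 0)])
    z = sum(c * np.exp(2j * np.pi * f * t / n_points) for c, f in zip(descriptors, freqs))
    return np.stack([z.real, z.imag], axis=1)
```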

  56. GPLVM on Contours [ V. Prisacariu and I. Reid, ICCV 2011]

  57. Segmentation Results [ V. Prisacariu and I. Reid, ICCV 2011]
