Non-Local Manifold Parzen Windows


  1. Non-Local Manifold Parzen Windows. Yoshua Bengio, Hugo Larochelle and Pascal Vincent. Département d'informatique et de recherche opérationnelle, Université de Montréal. July 15th, 2005

  2. Plan: 1. Introduction; 2. Local vs Non-Local learning; 3. Experiments and Results; 4. Conclusion

  3. Plan (section: Introduction)

  4-9. Introduction: About this talk...
     What: density estimation of high-dimensional continuous data lying on a lower-dimensional manifold
     How: using the Manifold Parzen Windows model, and learning the model's parameters with a neural network
     Why: ... because my supervisor wants me to work on that; ... to publish papers; but mostly to use, and make a point about, non-local learning

  10. Introduction: Manifold Parzen Windows (Vincent and Bengio, 2003)
     Extension of the Parzen Windows model (a mixture of spherical Gaussians centered on the training points)
     The Gaussians are parametrized so that most of the density is situated on the underlying manifold
     Fig.: Parzen Windows vs Manifold Parzen Windows

  11-13. Introduction: Manifold Parzen Windows (Vincent and Bengio, 2003)
     Density estimator: $p(x) = \frac{1}{n} \sum_{t=1}^{n} \mathcal{N}\!\left(x;\, \mu(x_t), \Sigma(x_t)\right)$
     Parametrization: $\Sigma(x_t) = \sigma_{\mathrm{noise}}^{2}(x_t)\, I + \sum_{j=1}^{d} s_j(x_t)\, v_j(x_t)\, v_j(x_t)^{\top}$
     Training: $\mu(x_t) = x_t$ is fixed for each $x_t$; the principal eigenvalues $s_j(x_t)$ and eigenvectors $v_j(x_t)$ come from the covariance matrix of the $k$ nearest neighbors of $x_t$; $\sigma_{\mathrm{noise}}(x_t)$ is a hyperparameter
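A minimal NumPy/SciPy sketch of this (local) estimator follows; the values of k, d and sigma_noise, and the choice to center the neighborhood covariance on x_t, are illustrative assumptions rather than the paper's settings.

import numpy as np
from scipy.stats import multivariate_normal

def manifold_parzen_fit(X, k=10, d=1, sigma_noise=0.05):
    """For each training point x_t, build Sigma(x_t) = sigma_noise^2 * I + sum_j s_j v_j v_j'
    from the covariance of its k nearest neighbors (illustrative hyperparameter values)."""
    n, D = X.shape
    covs = np.empty((n, D, D))
    for t in range(n):
        dists = np.sum((X - X[t]) ** 2, axis=1)
        nn = np.argsort(dists)[1:k + 1]          # k nearest neighbors, excluding x_t itself
        diffs = X[nn] - X[t]                     # centered on x_t, since mu(x_t) = x_t
        C = diffs.T @ diffs / k                  # local covariance estimate
        eigval, eigvec = np.linalg.eigh(C)       # eigenvalues in ascending order
        s, V = eigval[-d:], eigvec[:, -d:]       # keep the d principal eigenvalues/vectors
        covs[t] = sigma_noise ** 2 * np.eye(D) + (V * s) @ V.T
    return covs

def manifold_parzen_density(x, X, covs):
    """p(x) = (1/n) * sum_t N(x; mu = x_t, Sigma(x_t))."""
    return np.mean([multivariate_normal.pdf(x, mean=X[t], cov=covs[t])
                    for t in range(len(X))])

Everything the model knows around x_t comes from the k points nearest to it; this locality is what the non-local variant removes.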

  14-17. Introduction: Non-Local Manifold Parzen Windows
     In Manifold Parzen Windows, $\mu(x_t)$, $\sigma_{\mathrm{noise}}(x_t)$, $s_j(x_t)$ and $v_j(x_t)$ are stored in memory for every training point $x_t$
     In Non-Local Manifold Parzen Windows, $\mu(x_t)$, $\sigma_{\mathrm{noise}}(x_t)$, $s_j(x_t)$ and $v_j(x_t)$ are functions of $x_t$, modeled by a neural network
     The neural network can capture global information about the underlying manifold, and share it among all training points
     The neural network is trained using stochastic gradient descent on the average negative log-likelihood of the training set (sketched in code below)
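A minimal PyTorch sketch of the non-local variant (not the authors' implementation): a small network maps x to mu(x), sigma_noise(x), s_j(x) and v_j(x), and is trained by gradient descent on the average negative log-likelihood. The architecture, the offset parametrization of mu, the default d = 1, the un-orthogonalized directions v_j, and the full-batch training loop are simplifying assumptions.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalMP(nn.Module):
    def __init__(self, D, d=1, hidden=64):
        super().__init__()
        self.D, self.d = D, d
        self.trunk = nn.Sequential(nn.Linear(D, hidden), nn.Tanh())
        self.mu_head = nn.Linear(hidden, D)        # mu(x), predicted as an offset from x
        self.noise_head = nn.Linear(hidden, 1)     # sigma_noise(x), made positive by softplus
        self.s_head = nn.Linear(hidden, d)         # s_j(x), made positive by softplus
        self.v_head = nn.Linear(hidden, d * D)     # v_j(x), normalized but not orthogonalized

    def params(self, x):
        h = self.trunk(x)
        mu = x + self.mu_head(h)
        sigma = F.softplus(self.noise_head(h)) + 1e-4
        s = F.softplus(self.s_head(h)) + 1e-4
        V = F.normalize(self.v_head(h).view(-1, self.d, self.D), dim=-1)
        return mu, sigma, s, V

    def log_density(self, x, centers):
        """log p(x) under a mixture of Gaussians centered on `centers`, with all
        parameters produced by the same network (the non-local part)."""
        mu, sigma, s, V = self.params(centers)      # (n, D), (n, 1), (n, d), (n, d, D)
        cov = (sigma[:, :, None] ** 2 * torch.eye(self.D)
               + torch.einsum('nj,nja,njb->nab', s, V, V))
        comp = torch.distributions.MultivariateNormal(mu, covariance_matrix=cov)
        log_p = comp.log_prob(x[:, None, :].expand(-1, len(centers), -1))
        return torch.logsumexp(log_p, dim=1) - math.log(len(centers))

def train(model, X, epochs=50, lr=1e-2):
    # Gradient descent on the average negative log-likelihood of the training set
    # (full batch here for brevity; the paper uses stochastic updates).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = -model.log_density(X, X).mean()
        loss.backward()
        opt.step()

Because one set of weights produces the Gaussian parameters everywhere, what is learned in densely sampled regions of the manifold transfers to sparsely sampled ones.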

  18. Plan (section: Local vs Non-Local learning)

  19-20. Local vs Non-Local learning: Informal definitions
     What is local learning: a learning algorithm is said to be local if it uses mostly the training points near x to make a prediction at x. Examples: k nearest neighbors, SVMs, most popular dimensionality reduction algorithms, Manifold Parzen Windows
     What is non-local learning: a learning algorithm is said to be non-local if it is able to use information from training points far from x to generalize at x

  21-22. Local vs Non-Local learning: Toy example
     We are trying to learn a density using this training set
     Fig.: Samples from a spiral distribution
     Let's train a Manifold Parzen Windows model and look at the first principal direction of variance of the training-point Gaussians

  23. Local vs Non-Local learning: Toy example
     Because the training of Manifold Parzen Windows uses only local information, some of the principal directions of variance are badly estimated.
     Fig.: On the left: training points. On the right: first principal direction of variance.
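A small sketch in the same spirit as this toy example: sample a 2D spiral (the exact spiral parametrization and noise level are assumptions) and compute, at each training point, the purely local estimate of the first principal direction of variance from its k nearest neighbors.

import numpy as np

rng = np.random.default_rng(0)

def sample_spiral(n=300, noise=0.01):
    # A 2D spiral roughly in the plotted range; the parametrization is an assumption.
    t = rng.uniform(0.5, 3.0 * np.pi, size=n)
    X = np.stack([0.04 * t * np.cos(t), 0.04 * t * np.sin(t)], axis=1)
    return X + noise * rng.standard_normal(X.shape)

def first_principal_direction(X, t, k=8):
    """Leading eigenvector of the covariance of the k nearest neighbors of X[t]:
    the purely local estimate of the manifold's tangent direction at X[t]."""
    dists = np.sum((X - X[t]) ** 2, axis=1)
    nn = np.argsort(dists)[1:k + 1]
    diffs = X[nn] - X[t]
    eigval, eigvec = np.linalg.eigh(diffs.T @ diffs / k)
    return eigvec[:, -1]                      # eigenvector of the largest eigenvalue

X = sample_spiral()
tangents = np.array([first_principal_direction(X, t) for t in range(len(X))])
# Where training points are sparse, these local tangents can point off the spiral,
# which is the failure mode illustrated in the figure above.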

  24. Local vs Non-Local learning: Real-life examples
     Many large-scale, real-life problems are likely to benefit from non-local learning:
     Vision: pixels at a given position in very different images share the same properties with respect to certain transformations (e.g. translation, rotation)
     Natural Language Processing: words that are very different in some respect still usually share many properties (e.g. two nouns, even with very different meanings, will still obey the same grammatical rules)

  25. Plan (section: Experiments and Results)

  26. Experiments and Results: Toy 2D data experiments
     Fig.: The two toy training sets, a sinusoidal distribution and a spiral distribution

  27. Experiments and Results: Toy 2D data experiments
     Results:
     Algorithm         sinus    spiral
     Non-Local MP      1.144    -1.346
     Manifold Parzen   1.345    -0.914
     Gauss Mix Full    1.567    -0.857
     Parzen Windows    1.841    -0.487
     Tab.: Average out-of-sample negative log-likelihood on two toy problems, for Non-Local Manifold Parzen, a Gaussian mixture with full covariance, Manifold Parzen and Parzen Windows. The non-local algorithm dominates all the others.
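The score in the table is simply the average negative log-likelihood on held-out points; a tiny sketch follows (the manifold_parzen_fit / manifold_parzen_density names refer to the earlier sketch and are assumptions, not the paper's code).

import numpy as np

def average_nll(density_fn, X_test):
    """Average out-of-sample negative log-likelihood: -(1/m) * sum_i log p(x_i)."""
    return -np.mean([np.log(density_fn(x)) for x in X_test])

# Example with the earlier local estimator:
# covs = manifold_parzen_fit(X_train, k=10, d=1, sigma_noise=0.05)
# score = average_nll(lambda x: manifold_parzen_density(x, X_train, covs), X_test)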

  28. Experiments and Results: Toy 2D data experiments
     Fig.: From left to right, top to bottom, densities learned by Non-Local Manifold Parzen, a Gaussian mixture with full covariance, Manifold Parzen and Parzen Windows.
