

  1. Manifold-Adaptive Dimension Estimation Amir massoud Farahmand (1) , Csaba Szepesvári (1) , Jean-Yves Audibert (2) (1) Department of Computing Science, University of Alberta, Canada (2) CERTIS, Ecole Nationale des Ponts, France

  2. High-Dimensional Data Everywhere • Vision • Sensor Fusion • Feature Expansion • Kernel • ...

  3. Curse of Dimensionality [log-log plot: mean squared error vs. number of samples for D = 1, D = 5, D = 100]

  4. Practical Implications • Thou shalt reduce the dimension of the data before working with it! • Thou shalt not add features unnecessarily! • Thou shalt not accept projects with high-dimensional data! • ...

  5. Regularities of Data • Smoothness • Sparsity • Low noise at boundary • ✓ Lower-dimensional submanifold [3-D scatter plot: data lying on a curved 2-D surface] • LLE, IsoMap, Laplacian Eigenmap, Hessian Eigenmap, ... • Semi-supervised Learning, Reinforcement Learning, ...

  6. Goal • Manifold-adaptive machine learning methods • Convergence rate independent of the dimension of the input space

  7. Many open questions! Here: dimension estimation (:

  8. Why? • Needed in various learning methods • Not known a priori

  9. New? • Many existing methods [Pettis et al. (1979), Kégl (2002), Costa & Hero (2004), Levina & Bickel (2005), Hein & Audibert (2005)] • No rigorous finite-sample analysis • Only an asymptotic result [Levina & Bickel (2005)]

  10. Our Contribution • New algorithm • K-NN • Manifold-adaptive convergence rate

  11. General Idea: P(X_i ∈ B(x, r)) = η(x, r) · r^d

  12. P(X_i ∈ B(x, r)) = η(x, r) · r^d  ⟹  ln P(X_i ∈ B(x, r)) = ln η(x, r) + d · ln r
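The scaling law on this slide can be checked numerically. The sketch below is an illustration I added (the flat square, the radii, and the sample size are arbitrary choices, not from the slides): it samples a 2-D manifold embedded in R³ and verifies that the ball probability grows like r², not r³.

```python
import numpy as np

# Sample a 2-D manifold (a flat square) embedded in R^3.
rng = np.random.default_rng(0)
n = 100_000
pts = np.column_stack([rng.uniform(-1, 1, n),
                       rng.uniform(-1, 1, n),
                       np.zeros(n)])

# Empirical P(X ∈ B(x, r)) at the origin for a few radii.
x = np.zeros(3)
sq_dists = np.sum((pts - x) ** 2, axis=1)
probs = {r: np.mean(sq_dists <= r * r) for r in (0.05, 0.1, 0.2)}

# Doubling r roughly quadruples the ball probability: the exponent is
# the intrinsic dimension d = 2, not the embedding dimension D = 3.
print(probs[0.2] / probs[0.1])  # close to 4 = 2^2
```

Taking logarithms of this relation, as the next slides do, turns the unknown exponent d into the slope of a line, which is what makes it estimable.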

  16. P(X_i ∈ B(x, r)) = η(x, r) · r^d  ⟹  ln P(X_i ∈ B(x, r)) = ln η(x, r) + d · ln r. Substituting the empirical k-NN radii:
      ln(k/n) ≈ ln η₀ + d · ln r̂_k(x)
      ln(k/(2n)) ≈ ln η₀ + d · ln r̂_⌈k/2⌉(x)
      ⟹  d̂(x) = ln 2 / ln( r̂_k(x) / r̂_⌈k/2⌉(x) )
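A direct implementation of this point estimator is short. The sketch below is illustrative (the function name is mine, and it assumes the query x is not itself one of the sample points):

```python
import numpy as np

def knn_dimension_estimate(data, x, k):
    """d_hat(x) = ln 2 / ln( r_k(x) / r_ceil(k/2)(x) ),
    where r_j(x) is the distance from x to its j-th nearest sample."""
    dists = np.sort(np.linalg.norm(data - x, axis=1))
    r_k = dists[k - 1]                       # k-th NN distance
    r_half = dists[int(np.ceil(k / 2)) - 1]  # ceil(k/2)-th NN distance
    return np.log(2.0) / np.log(r_k / r_half)

# Uniform samples filling a 3-D cube: the estimate should be near 3.
rng = np.random.default_rng(0)
data = rng.uniform(size=(20_000, 3))
d_hat = knn_dimension_estimate(data, np.full(3, 0.5), k=200)
```

Note that D, the embedding dimension, never appears in the formula; only distances do, which is why the method adapts to the manifold.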

  17. Finite Sample Convergence Rate
      d̂(X_i) = ln 2 / ln( r̂^(k)(X_i) / r̂^(⌈k/2⌉)(X_i) )
      Theorem: Under some regularity assumptions on η, provided that n/k > Ω(2^d), with probability at least 1 − δ,
      |d̂(X_i) − d| ≤ O( d · ( (k/n)^{1/d} + √( ln(4/δ) / k ) ) ).

  18. Issues
      d̂(X_i) = ln 2 / ln( r̂^(k)(X_i) / r̂^(⌈k/2⌉)(X_i) )
      • High variance of d̂(X_i)
      • Inefficient use of data: r ≪ 1 ⟹ k ≪ n

  19. Aggregation
      • Averaging: d̂_avg = (1/n) Σ_{i=1}^{n} d̂(X_i)
      • Voting: d̂_vote = argmax_{d′} Σ_{i=1}^{n} I{ [d̂(X_i)] = d′ }
      Theorem:
      P( d̂_vote ≠ d ) ≤ c′ · n · e^{−(c_d k)²},
      P( d̂_avg ≠ d ) ≤ c″ · n · e^{−(c_d k / D)²}.
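The two aggregation rules can be sketched as follows. This is illustrative code, not the authors' implementation; rounding to the nearest integer stands in for the slide's bracket operator [·].

```python
import numpy as np

def aggregate_dimension(data, k):
    """Per-point k-NN dimension estimates combined by averaging and voting."""
    ests = []
    for i in range(len(data)):
        dists = np.sort(np.linalg.norm(data - data[i], axis=1))
        # dists[0] == 0 is the point itself, so neighbour j is dists[j].
        r_k = dists[k]
        r_half = dists[int(np.ceil(k / 2))]
        ests.append(np.log(2.0) / np.log(r_k / r_half))
    ests = np.asarray(ests)
    d_avg = ests.mean()
    votes = np.clip(np.rint(ests).astype(int), 0, None)
    d_vote = np.bincount(votes).argmax()   # most common rounded estimate
    return d_avg, d_vote

# A 2-D square embedded in R^5 (three padded zero coordinates).
rng = np.random.default_rng(0)
flat = rng.uniform(size=(2_000, 2))
data = np.hstack([flat, np.zeros((2_000, 3))])
d_avg, d_vote = aggregate_dimension(data, k=50)
```

Voting has the advantage that a single badly off per-point estimate cannot shift the final answer, which matches the better error bound the slide states for d̂_vote.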

  20. Experiments

  21. Varying the Manifold Dimension [log-log plot: mean absolute dimension-estimation error vs. number of samples for S4 and S8, with 3-D scatter plots of the sampled manifolds]

  22. Varying Embedding Space Dimension [log-log plot: mean absolute dimension-estimation errors vs. number of samples (100–20000) for X (D = 3), X′ (D = 6), X″ (D = 12)]

  23. Other Datasets

      Data set     n=50      n=100      n=500      n=1000     n=5000
      S1           98 (99)   100 (100)  100 (100)  100 (100)  100 (100)
      S3           75 (19)   95 (20)    100 (15)   100 (19)   100 (62)
      S5           33 (5)    50 (10)    100 (9)    98 (2)     100 (0)
      S7           18 (2)    17 (3)     57 (1)     54 (1)     100 (0)
      Sinusoid     92 (98)   100 (100)  100 (100)  100 (100)  100 (100)
      10-Möbius    69 (47)   13 (74)    100 (98)   100 (99)   100 (100)
      Swiss roll   62 (71)   49 (91)    88 (96)    100 (100)  100 (100)

  24. Conclusions and Future Work • New algorithm • Competitive results • Manifold-adaptive convergence rate • Other ML methods? • K-NN regression can! • Penalized least squares in the works • Dimension Reduction?

  25. Questions?

  26. Curse of Dimensionality • High-dimensional data increases the complexity of the function space • Higher variance with the same number of samples • More samples needed for the same precision

  27. Lower Bound Assume that m_n is a regression function that estimates the random variable Y based on X and D_n = {(X_1, Y_1), ..., (X_n, Y_n)}, and m(X) = E[Y | X]. What is the best possible performance of m_n in the L² sense, i.e. E{‖m_n(X) − m(X)‖²}? For the class D^(p,C) of (X, Y) distributions, when X ∈ R^D, we have the following behavior:
      E{‖m_n(X) − m(X)‖²} ≥ c · n^{−2p/(2p+D)} for some constant c > 0.

  28. Two sources of error: • Approximation Error : assuming fixed η ( x, r ) • Estimation Error : estimating P ( X ∈ B ( x, r )) with the empirical estimate k/n . Both of them can be controlled by changing the size of neighborhood r (which is related to k/n ).
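This tradeoff is easy to see empirically. In the sketch below (an illustration I added; the dataset and the k grid are my choices), a very small k gives noisy per-point estimates (estimation error), while a k that is large relative to n makes the neighbourhood radius r large, so treating η(x, r) as fixed becomes a poor approximation.

```python
import numpy as np

def point_estimates(data, k):
    """Per-point k-NN dimension estimates d_hat(X_i)."""
    ests = []
    for i in range(len(data)):
        dists = np.sort(np.linalg.norm(data - data[i], axis=1))
        # dists[0] == 0 is the point itself; neighbour j is dists[j].
        ests.append(np.log(2.0) /
                    np.log(dists[k] / dists[int(np.ceil(k / 2))]))
    return np.asarray(ests)

rng = np.random.default_rng(1)
data = rng.uniform(size=(1_000, 2))  # true intrinsic dimension d = 2

# Mean absolute error of the per-point estimates for several k.
mae = {k: np.mean(np.abs(point_estimates(data, k) - 2.0))
       for k in (4, 16, 64)}
```

Increasing k shrinks the variance term but eventually inflates the approximation term, which is the tension the heat maps on the next slide explore.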

  29. Effect of k and n [heat maps over k (1–1000) and number of samples n (10–10000): estimation quality for S4 and S8, averaging vs. voting]

  30. Experiments: Noise Effect [plot: mean absolute estimation error vs. noise level (standard deviation, 0–0.1) for Möbius embedded in R³, Möbius embedded in R¹², S4 embedded in R⁵, S4 embedded in R²⁰]

  31.–34. Effect of Noise [four scatter plots in [−2, 2]² illustrating the effect of noise on the sampled manifold]

  35. Exponential Rate [log-log plot: probability of error of averaging aggregation vs. number of samples on S4]
