  1. The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
     Jiantao Jiao (Berkeley EECS), Weihao Gao (UIUC ECE), Yanjun Han (Stanford EE)
     NIPS 2018, Montréal, Canada

  2. Differential Entropy Estimation

     Differential entropy of a continuous density $f$ on $\mathbb{R}^d$:

     $$h(f) = \int_{\mathbb{R}^d} f(x) \log \frac{1}{f(x)} \, dx$$

     ◮ machine learning tasks, e.g., classification, clustering, feature selection
     ◮ causal inference
     ◮ sociology
     ◮ computational biology
     ◮ · · ·

     Our Task: Given empirical samples $X_1, \cdots, X_n \sim f$, estimate $h(f)$.
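     Two standard closed-form instances of this definition (textbook facts, not on the slides) make the target concrete: the uniform density on the unit cube and the multivariate Gaussian.

     $$h\big(\mathrm{Unif}([0,1]^d)\big) = 0, \qquad h\big(\mathcal{N}(\mu, \Sigma)\big) = \frac{1}{2} \log\big((2\pi e)^d \det \Sigma\big).$$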

  3. Ideas of Nearest Neighbor

     Notations:
     ◮ $n$: number of samples
     ◮ $d$: dimensionality
     ◮ $k$: number of nearest neighbors
     ◮ $R_{i,k}$: $\ell_2$ distance of the $i$-th sample to its $k$-th nearest neighbor
     ◮ $\mathrm{vol}_d(r)$: volume of the $d$-dimensional ball with radius $r$

     Idea: estimate the expectation by an empirical average, and the unknown density at each sample by a $k$-nearest-neighbor density estimate:

     $$h(f) = \mathbb{E}[-\log f(X)] \approx -\frac{1}{n} \sum_{i=1}^{n} \log f(X_i), \qquad f(X_i) \cdot \mathrm{vol}_d(R_{i,k}) \approx \frac{k}{n}.$$
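     Combining the two approximations (a step the slide leaves implicit): solving the second one for $f(X_i)$ and substituting into the first gives the main term of the estimator defined on the next slide:

     $$h(f) \approx -\frac{1}{n} \sum_{i=1}^{n} \log \frac{k}{n \, \mathrm{vol}_d(R_{i,k})} = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{k} \, \mathrm{vol}_d(R_{i,k})\right).$$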

  4. Kozachenko–Leonenko Estimator

     Definition (Kozachenko–Leonenko Estimator):

     $$\hat{h}^{\mathrm{KL}}_{n,k} = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{k} \, \mathrm{vol}_d(R_{i,k})\right) + \underbrace{\log(k) - \psi(k)}_{\text{bias correction term}}$$

     ◮ Easy to implement: no numerical integration
     ◮ Only tuning parameter: $k$
     ◮ Good empirical performance, but no theoretical guarantee prior to this work, especially when the density may be close to zero
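     A minimal Python sketch of this estimator (my illustration, assuming SciPy is available; not the authors' code). It uses a k-d tree for the nearest-neighbor distances $R_{i,k}$, the closed form $\mathrm{vol}_d(r) = \pi^{d/2} r^d / \Gamma(d/2+1)$ for the ball volume, and scipy.special.digamma for $\psi$:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(X, k=1):
    """Kozachenko-Leonenko differential entropy estimate, in nats.

    X: (n, d) array of i.i.d. samples; k: number of nearest neighbors.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # R[i] = l2 distance from X[i] to its k-th nearest neighbor.
    # Query k+1 neighbors because the nearest one is the point itself.
    R = cKDTree(X).query(X, k=k + 1)[0][:, k]
    # log vol_d(R_i) = log( pi^{d/2} / Gamma(d/2 + 1) ) + d * log(R_i),
    # computed in log scale to avoid overflow in high dimensions.
    log_vol = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(R)
    # (1/n) sum_i log( (n/k) vol_d(R_{i,k}) ) + log(k) - psi(k)
    return np.mean(np.log(n / k) + log_vol) + np.log(k) - digamma(k)
```

     Sanity check against a distribution with known entropy, here a standard Gaussian in $d = 2$ dimensions, where $h = \frac{d}{2}\log(2\pi e) \approx 2.838$ nats:

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 2))
print(kl_entropy(X, k=1))        # should be close to 2.838
print(np.log(2 * np.pi * np.e))  # true value for d = 2
```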

  5. Main Result

     Let $\mathcal{H}^s_d$ be the class of probability densities supported on $[0,1]^d$ which are Hölder smooth with parameter $s \ge 0$.

     Theorem (Main Result): For fixed $k$ and $s \in (0, 2]$,

     $$\sup_{f \in \mathcal{H}^s_d} \mathbb{E}_f\left(\hat{h}^{\mathrm{KL}}_{n,k} - h(f)\right)^2 \lesssim \left(n^{-\frac{s}{s+d}} \log n + n^{-\frac{1}{2}}\right)^2.$$

     This is the first theoretical guarantee for the Kozachenko–Leonenko estimator that does not assume the density is bounded away from zero.
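     A worked instance of the rate (my arithmetic, not on the slides): for Lipschitz densities ($s = 1$) in $d = 2$ dimensions, $\frac{s}{s+d} = \frac{1}{3}$, the nonparametric term dominates the parametric $n^{-1/2}$ term, and the mean squared error bound reads

     $$\sup_{f \in \mathcal{H}^1_2} \mathbb{E}_f\left(\hat{h}^{\mathrm{KL}}_{n,k} - h(f)\right)^2 \lesssim \left(n^{-1/3} \log n + n^{-1/2}\right)^2 \asymp n^{-2/3} \log^2 n.$$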

  6. Matching Lower Bound

     Theorem (Han–Jiao–Weissman–Wu '17): For any $s \ge 0$,

     $$\inf_{\hat{h}} \sup_{f \in \mathcal{H}^s_d} \mathbb{E}_f\left(\hat{h} - h(f)\right)^2 \gtrsim \left(n^{-\frac{s}{s+d}} (\log n)^{-\frac{s+2d}{s+d}} + n^{-\frac{1}{2}}\right)^2.$$

     Take-home Message:
     ◮ The nearest neighbor estimator is nearly minimax rate-optimal
     ◮ The nearest neighbor estimator adapts to the unknown smoothness $s$
     ◮ A maximal inequality plays a central role in dealing with small densities
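     Putting the two theorems side by side (my observation, spelling out the first take-home point): the upper and lower bounds share the polynomial rate $n^{-\frac{s}{s+d}}$ and differ only by a polylogarithmic factor,

     $$\frac{n^{-\frac{s}{s+d}} \log n}{n^{-\frac{s}{s+d}} (\log n)^{-\frac{s+2d}{s+d}}} = (\log n)^{\frac{2s+3d}{s+d}},$$

     and the estimator attains this rate for every $s \in (0, 2]$ with a fixed $k$, without knowing $s$; this is the sense in which it is adaptively near minimax rate-optimal.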
