The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
Jiantao Jiao (Berkeley EECS), Weihao Gao (UIUC ECE), Yanjun Han (Stanford EE)
NIPS 2018, Montréal, Canada
Differential Entropy Estimation

Differential entropy of a continuous density f on $\mathbb{R}^d$:
\[
h(f) = \int_{\mathbb{R}^d} f(x) \log \frac{1}{f(x)} \, dx
\]
◮ machine learning tasks, e.g., classification, clustering, feature selection
◮ causal inference
◮ sociology
◮ computational biology
◮ · · ·

Our Task
Given i.i.d. samples $X_1, \dots, X_n \sim f$, estimate $h(f)$.
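(Not on the original slides: a quick worked instance of the definition, for a density one can integrate by hand.) For the uniform density on the cube $[0,a]^d$,
\[
f(x) = a^{-d} \ \text{on}\ [0,a]^d
\quad\Longrightarrow\quad
h(f) = \int_{[0,a]^d} a^{-d} \log\big(a^{d}\big)\, dx = d \log a ,
\]
which is negative whenever $a < 1$: unlike discrete entropy, differential entropy can take either sign.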
Ideas of Nearest Neighbor

Notation:
◮ n: number of samples
◮ d: dimensionality
◮ k: number of nearest neighbors
◮ $R_{i,k}$: $\ell_2$ distance of the i-th sample to its k-th nearest neighbor
◮ $\mathrm{vol}_d(r)$: volume of the d-dimensional ball with radius r

Idea
\[
h(f) = \mathbb{E}[-\log f(X)] \approx -\frac{1}{n} \sum_{i=1}^{n} \log f(X_i),
\qquad
f(X_i) \cdot \mathrm{vol}_d(R_{i,k}) \approx \frac{k}{n}
\]
Kozachenko–Leonenko Estimator

Definition (Kozachenko–Leonenko Estimator)
\[
\hat{h}^{\mathrm{KL}}_{n,k} = \frac{1}{n} \sum_{i=1}^{n} \log\!\Big( \frac{n}{k}\, \mathrm{vol}_d(R_{i,k}) \Big) \;+\; \underbrace{\log(k) - \psi(k)}_{\text{bias correction term}}
\]
◮ Easy to implement: no numerical integration
◮ Only tuning parameter: k
◮ Good empirical performance, but no theoretical guarantee, especially when the density may be close to zero.
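(Added for concreteness, not from the slides: a minimal Python sketch of the estimator exactly as defined above, assuming NumPy/SciPy are available. The function name kl_entropy and the use of cKDTree for the nearest-neighbor search are illustrative choices, not prescribed by the paper.)

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(samples, k=1):
    # Kozachenko-Leonenko estimate of differential entropy h(f), in nats.
    # samples: (n, d) array of i.i.d. draws from the unknown density f
    # k: number of nearest neighbors (the only tuning parameter)
    # Assumes the samples are distinct, so that R_{i,k} > 0 for all i.
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    # R_{i,k}: l2 distance from sample i to its k-th nearest neighbor;
    # query k+1 neighbors because the closest point to x_i is x_i itself.
    dist, _ = cKDTree(x).query(x, k=k + 1)
    r = dist[:, k]
    # log vol_d(r) for the l2 ball: vol_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(r)
    # (1/n) sum_i log((n/k) * vol_d(R_{i,k})) + log(k) - psi(k)
    return np.mean(np.log(n / k) + log_ball) + np.log(k) - digamma(k)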
Main Result

Let $\mathcal{H}^s_d$ be the class of probability densities supported on $[0,1]^d$ which are Hölder smooth with parameter $s \ge 0$.

Theorem (Main Result)
For fixed k and $s \in (0,2]$,
\[
\sup_{f \in \mathcal{H}^s_d} \mathbb{E}_f\Big( \hat{h}^{\mathrm{KL}}_{n,k} - h(f) \Big)^2
\;\lesssim\;
\Big( n^{-\frac{s}{s+d}} \log n \;+\; n^{-\frac{1}{2}} \Big)^2 .
\]
This is the first theoretical guarantee for the Kozachenko–Leonenko estimator that does not assume the density is bounded away from zero.
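(A toy numerical check, not from the slides, using the kl_entropy sketch above on a density that touches zero. Beta(2,2) has density 6x(1-x) on [0,1], which vanishes at the boundary, and its differential entropy has a closed form of roughly -0.125 nats; the exact estimate will vary with the seed and sample size.)

import numpy as np
from scipy.special import betaln, digamma

rng = np.random.default_rng(0)
x = rng.beta(2.0, 2.0, size=(100_000, 1))   # Beta(2,2) samples; density vanishes at 0 and 1
est = kl_entropy(x, k=1)                     # kl_entropy from the sketch above
# Closed-form entropy of Beta(a,b): ln B(a,b) - (a-1)psi(a) - (b-1)psi(b) + (a+b-2)psi(a+b)
true = betaln(2, 2) - digamma(2) - digamma(2) + 2 * digamma(4)
print(f"KL estimate: {est:.3f}, closed form: {true:.3f}")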
Matching Lower Bound

Theorem (Han–Jiao–Weissman–Wu'17)
For any $s \ge 0$,
\[
\inf_{\hat{h}} \sup_{f \in \mathcal{H}^s_d} \mathbb{E}_f\Big( \hat{h} - h(f) \Big)^2
\;\gtrsim\;
\Big( n^{-\frac{s}{s+d}} (\log n)^{-\frac{s+2d}{s+d}} \;+\; n^{-\frac{1}{2}} \Big)^2 .
\]
Take-home Message
◮ The nearest neighbor estimator is nearly minimax rate-optimal.
◮ The nearest neighbor estimator adapts to the unknown smoothness s.
◮ A maximal inequality plays a central role in dealing with small densities.