Inference and Estimation Using Nearest Neighbors
The Second Korea-Japan Machine Learning Workshop, 2019. 2. 22 (Fri.)
Yung-Kyun Noh, Seoul National University → Hanyang University
Nearest Neighbors
• Similar data share similar properties (= labels? = behavior).
[Figure: two-class scatter plot; markers denote class 1 and class 2.]
For classification as an example: the nearest neighbor x_NN converges to the query point x_0 uniformly as N increases; in the limit, the neighbor's label follows the local class-conditional densities p_1(x), p_2(x) at x_0 [T. Cover and P. Hart, IEEE TIT, 1967].
[Figure: data space with densities p_1(x), p_2(x), the query point x_0, and its nearest neighbor x_NN.]
Applications of Nearest Neighbors
• Prediction using k-nearest neighbor information
  – k-nearest neighbor classification
  – k-nearest neighbor regression
• Estimation using k-nearest neighbor information [Leonenko, N., Pronzato, L., & Savani, V., 2008], which uses the distance from a query point to its nearest neighbor in each class c.
[Figure: two-class scatter plot illustrating prediction with nearest neighbors.]
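As a concrete illustration of the prediction side, here is a minimal sketch of k-NN classification and regression with NumPy; the two-Gaussian toy data, the value k = 5, and the Euclidean distance are illustrative assumptions, not details from the slides.

```python
import numpy as np

def knn_classify(X_train, y_train, x_query, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to every training point
    nn_idx = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nn_idx], return_counts=True)
    return labels[np.argmax(counts)]

def knn_regress(X_train, y_train, x_query, k=5):
    """Average of the targets of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argsort(dists)[:k]].mean()

# Toy usage: two Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(knn_classify(X, y, np.array([1.0, 1.0]), k=5))
```

For the estimation side, the class-conditional nearest neighbor distance used by estimators such as [Leonenko et al., 2008] can be obtained the same way, e.g. `np.linalg.norm(X[y == c] - x_query, axis=1).min()` for class c.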
Similar Formulations
• Nadaraya-Watson estimator for kernel classification/regression: each training target receives a kernel weight that depends on its distance to the query, controlled by a bandwidth.
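A hedged sketch of the Nadaraya-Watson estimator in its standard form, $\hat{y}(x) = \sum_i K_h(\|x - x_i\|)\, y_i \,/\, \sum_j K_h(\|x - x_j\|)$, where the kernel weight depends only on the distance and the bandwidth h; the Gaussian kernel and the bandwidth value below are assumptions for illustration.

```python
import numpy as np

def nadaraya_watson(X_train, y_train, x_query, h=0.3):
    """Kernel-weighted average of the training targets: each weight is a
    Gaussian kernel of the distance to the query, with bandwidth h."""
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * h ** 2))   # kernel weight w.r.t. the distance
    return np.sum(weights * y_train) / np.sum(weights)
```

k-NN regression corresponds to replacing the smooth kernel with a uniform weight over the k nearest neighbors, i.e. an adaptive bandwidth equal to the k-th neighbor distance.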
Bias Analysis
• k-Nearest Neighbor Classification [R. R. Snapp et al., The Annals of Statistics, 1998] [Y.-K. Noh et al., IEEE TPAMI, 2018]
The expected finite-sample error decomposes into ① the asymptotic NN error plus ② a residual due to finite sampling.
Change of Metric [Y.-K. Noh et al., IEEE TPAMI, 2018]
A linear transformation z = Lx changes the metric; the Euclidean metric (A = I) is compared with the optimal metric (A = A_opt).
[Figure: class-conditional densities p_1(x), p_2(x) and nearest neighbor distances under the Euclidean and the optimal metric.]
Nearest Neighbor Classification with Metric [Y.-K. Noh et al., IEEE TPAMI, 2018]
Obtain ∇²p_1, ∇²p_2, p_1, and p_2 from generative models.
[Figure: experimental results; the slide notes a 20% increase.]
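The papers derive the optimal metric from generative-model quantities (p_1, p_2, ∇²p_1, ∇²p_2); that derivation is not reproduced here. The sketch below only shows the mechanics of nearest neighbor classification under a changed metric, i.e. measuring distances in the transformed space z = Lx (equivalently the metric A = LᵀL); the particular L is an arbitrary illustrative choice, not a learned one.

```python
import numpy as np

def knn_classify_with_metric(X_train, y_train, x_query, L, k=5):
    """k-NN classification under the metric A = L^T L: distances are Euclidean
    distances computed after the linear transformation z = L x."""
    Z_train = X_train @ L.T                 # transform every training point
    z_query = L @ x_query                   # transform the query point
    dists = np.linalg.norm(Z_train - z_query, axis=1)
    nn_idx = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nn_idx], return_counts=True)
    return labels[np.argmax(counts)]

# Illustrative (not learned) metric: stretch one coordinate, shrink the other.
L = np.diag([2.0, 0.5])
```

With L = I this reduces to the Euclidean case; in the slides, the metric is instead chosen to reduce the finite-sampling residual ② from the bias analysis.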
Bandwidth and Nadaraya-Watson Regression
Bias Analysis
• k-Nearest Neighbor Classification → minimizes mean square error (MSE) → metric-independent asymptotic property
• Bias:
$\mathbb{E}[\hat{y}(x) - y(x)] = h^2 \left( \frac{\nabla p(x)^\top \nabla y(x)}{p(x)} + \frac{\nabla^2 y(x)}{2} \right) + o(h^4)$
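A small numerical check of the leading h² term in the bias expansion above, under toy assumptions (1-D inputs x ~ N(0,1), target y(x) = sin x, Gaussian noise): halving the bandwidth should shrink the averaged estimation gap at a fixed query point by roughly a factor of four.

```python
import numpy as np

rng = np.random.default_rng(0)
x0, n, trials, noise = 1.0, 2000, 300, 0.1

def mean_nw_gap(h):
    """Average (estimate - truth) of the Nadaraya-Watson estimator at x0 over
    many resampled datasets; the average is essentially the bias."""
    gaps = []
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, n)
        y = np.sin(x) + rng.normal(0.0, noise, n)
        w = np.exp(-(x - x0) ** 2 / (2.0 * h ** 2))   # Gaussian kernel weights
        gaps.append(np.sum(w * y) / np.sum(w) - np.sin(x0))
    return np.mean(gaps)

for h in (0.4, 0.2):
    print(h, mean_nw_gap(h))   # the gap should scale roughly like h^2
```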
For x & y Jointly Gaussian
• The learned metric is not sensitive to the bandwidth [Y.-K. Noh, et al., NeurIPS, 2017].
[Figure: Y.-K. Noh, et al., NeurIPS, 2017.]
Variance Reduction is Not Critical in High Dimensions [Y.-K. Noh, et al., NeurIPS, 2017]
Proposition: In a high-dimensional space, once the bias is minimized and the bandwidth is then selected, reducing the variance is not important.
Information-theoretic Measure Estimation
The estimators use the distance from a query point to its nearest neighbor in each class c. The quantity being estimated (①) is metric invariant, while its nearest neighbor estimator (②) is metric dependent; in the limit, ① = ②.
Increasing the KL-Divergence of Two Gaussians and its Estimation [Y.-K. Noh, et al., Neural Computation, 2018]
MAKING GENERAL ESTIMATORS FOR f-DIVERGENCES
Estimation of the General f-Divergences
• Shannon Entropy Estimation [D. Lombardi and S. Pant, Phys. Rev. E, 2016] [A. Kraskov, H. Stögbauer, and P. Grassberger, Phys. Rev. E, 2004]
[Equation: k-nearest neighbor Shannon entropy estimator built from nearest neighbor distances in d dimensions.]
Density Estimator and Entropy Estimator
• k-NN density estimator [Loftsgaarden and Quesenberry, 1965]
• Shannon entropy estimator
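A sketch of both pieces under their standard textbook forms, which may differ in correction terms from the slides' exact expressions: the Loftsgaarden-Quesenberry k-NN density estimate p̂(x) = k / (N V_d R_k(x)^d), with R_k(x) the distance to the k-th nearest neighbor and V_d the volume of the d-dimensional unit ball, and the k-NN Shannon entropy estimator with the usual digamma correction (Kozachenko-Leonenko form).

```python
import numpy as np
from scipy.special import gamma, digamma

def knn_radii(X, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.sqrt(np.sort(d2, axis=1)[:, k - 1])

def knn_density(X, x_query, k=5):
    """Loftsgaarden-Quesenberry estimate: k points fall inside the ball of
    radius R_k around x_query, so p ~ k / (N * V_d * R_k^d)."""
    n, d = X.shape
    r = np.sort(np.linalg.norm(X - x_query, axis=1))[k - 1]
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the d-dimensional unit ball
    return k / (n * v_d * r ** d)

def knn_entropy(X, k=5):
    """k-NN Shannon entropy estimator (Kozachenko-Leonenko form)."""
    n, d = X.shape
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)
    r = knn_radii(X, k)
    return digamma(n) - digamma(k) + np.log(v_d) + d * np.mean(np.log(r))

# Sanity check: the entropy of a 2-D standard Gaussian is log(2*pi*e) ~ 2.84 nats.
X = np.random.default_rng(0).normal(size=(2000, 2))
print(knn_entropy(X, k=5))
```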
Historical Remarks on Making Plug-in Estimators
[N. Leonenko, L. Pronzato, & V. Savani, Annals of Statistics, 2008] [B. Poczos and J. Schneider, AISTATS, 2011]
• Shannon entropy: plug-in and correction
• Rényi and Tsallis entropies: plug-in and correction
• [Moon, K. & Hero, A., 2014] considers the general f-divergence plug-in estimator.
Plug-in Nearest Neighbor f-Divergence Estimator
• Kullback-Leibler divergence
• Tsallis-alpha divergence
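A hedged sketch of the Kullback-Leibler plug-in estimator in its common k-NN form, D̂(p‖q) = (d/n) Σ_i log(ν_k(x_i)/ρ_k(x_i)) + log(m/(n−1)), where ρ_k is the k-NN distance within the sample from p and ν_k is the k-NN distance to the sample from q; this is the generic plug-in construction, not necessarily the exact variant on the slide.

```python
import numpy as np

def kth_nn_dist(queries, refs, k, exclude_self=False):
    """Distance from each query point to its k-th nearest neighbor in refs."""
    d2 = np.sum((queries[:, None, :] - refs[None, :, :]) ** 2, axis=-1)
    if exclude_self:
        np.fill_diagonal(d2, np.inf)   # a point is not its own neighbor
    return np.sqrt(np.sort(d2, axis=1)[:, k - 1])

def knn_kl_divergence(X_p, X_q, k=5):
    """Plug-in k-NN estimate of KL(p || q) from samples X_p ~ p and X_q ~ q."""
    n, d = X_p.shape
    m = X_q.shape[0]
    rho = kth_nn_dist(X_p, X_p, k, exclude_self=True)   # within-sample distances
    nu = kth_nn_dist(X_p, X_q, k)                        # cross-sample distances
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Sanity check: KL between N(0,1) and N(1,1) in 1-D is 0.5.
rng = np.random.default_rng(0)
Xp = rng.normal(0.0, 1.0, (2000, 1))
Xq = rng.normal(1.0, 1.0, (2000, 1))
print(knn_kl_divergence(Xp, Xq, k=5))
```

The Tsallis-alpha (and Rényi) versions follow the same plug-in pattern, replacing the logarithm of the distance ratio with a power of it plus a bias-correction factor [Leonenko, Pronzato, & Savani, 2008].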
Plug-in methods do not work for general f-divergences
[Figure: the true f-divergence compared with what the plug-in estimator yields; nearest neighbor classification result of Cover [Cover, T., 1968]; plug-in estimator analysis [Noh, Y.-K., Ph.D. thesis, 2011].]
Obtaining the General f-Divergence Estimator
• Inverse Laplace transform
arXiv:1805.08342
Summary
• Asymptotically, nearest neighbor methods are very nice (in terms of theory!).
• With finite samples, treating the bias through a change of geometry can significantly improve conventional nonparametric methods, especially in high-dimensional spaces.
• A general and systematic way of obtaining f-divergence estimators using nearest neighbor information.
THANK YOU
Yung-Kyun Noh
nohyung@snu.ac.kr