Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
Maxim Raginsky and Svetlana Lazebnik
Beckman Institute and University of Illinois
Dimensionality Estimation from Samples
Problem: estimate the intrinsic dimensionality d of a manifold M embedded in D-dimensional space (d < D), given n i.i.d. samples from M.
[Figure: a d = 2 manifold embedded in D = 3 dimensions.]
Previous Work
Key idea: for data uniformly distributed on a d-dimensional smooth compact submanifold of $\mathbb{R}^D$, the probability of a small ball of radius $\varepsilon$ around any point on the manifold is $\Theta(\varepsilon^d)$. When only finitely many samples are available, these probabilities must be estimated, e.g., from nearest-neighbor distances.
References: Bennett (1969); Grassberger and Procaccia (1983); Camastra and Vinciarelli (2002); Brand (2003); Kégl (2003); Costa and Hero (2004); Levina and Bickel (2005).
Shortcomings: negative bias (especially in high extrinsic dimensions); behavior in the presence of noise is poorly understood.
Our Approach: High-Resolution VQ
Key idea: when data lying on a d-dimensional submanifold of $\mathbb{R}^D$ are optimally vector-quantized with a large number k of codevectors, the quantizer error scales approximately as $C \cdot k^{-1/d}$.
Advantages:
• Simple and efficient techniques can be used for empirical VQ design.
• Statistical consistency can be ensured by using an independent test sequence.
• The effects of additive noise can be simply analyzed and understood.
The Basics of Vector Quantization
A D-dimensional k-point quantizer $Q_k$ maps a vector $x \in \mathbb{R}^D$ to one of k codevectors $y_i \in \mathbb{R}^D$, $1 \le i \le k$; $\log k$ is the quantizer rate (bits/vector).
Average distortion of a VQ $Q_k$: $\delta_r(Q_k \mid \mu) = E_\mu[\|X - Q_k(X)\|^r] \equiv E_\mu[\rho^r(X, Q_k(X))]$, $r \in [1, \infty)$.
Average error: $e_r(Q_k \mid \mu) = [\delta_r(Q_k \mid \mu)]^{1/r}$.
Optimality: $e_r^*(k \mid \mu) = \inf_{Q_k \in \mathcal{Q}_k} e_r(Q_k \mid \mu)$, where $\mathcal{Q}_k$ is the set of all D-dimensional k-point VQ's.
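To make these definitions concrete, here is a minimal sketch (not the authors' code) of the empirical average error $e_r$ of a fixed codebook; the function name and array shapes are illustrative.

```python
import numpy as np

def empirical_vq_error(X, codebook, r=2.0):
    """Return e_r = (mean_i min_j ||x_i - y_j||^r)^(1/r) for samples X (n x D)
    and codevectors codebook (k x D)."""
    # Pairwise distances between every sample and every codevector.
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    nearest = dists.min(axis=1)  # distance to the nearest codevector
    return np.mean(nearest ** r) ** (1.0 / r)
```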
Quantization Dimension
Setting: M is a smooth compact d-dimensional manifold embedded in $\mathbb{R}^D$, and $\mu$ is a regular probability distribution on M: $\Pr(\text{ball of radius } \varepsilon) = \Theta(\varepsilon^d)$.
High-rate approximation (HRA): VQ cells can be approximated by balls, and the optimal VQ error satisfies $e_r^*(k \mid \mu) = \Theta(k^{-1/d})$.
Quantization dimension of $\mu$ of order r (Zador, 1982; Graf & Luschgy, 2000): $d_r(\mu) = -\lim_{k \to \infty} \frac{\log k}{\log e_r^*(k \mid \mu)}$. It exists for all $r \in [1, \infty)$ and in the limit as $r \to \infty$, and is equal to the intrinsic dimension d of the manifold M.
VQ literature: assume d is known and study the asymptotics of $e_r^*(k \mid \mu)$. Our work: observe the empirical VQ error in the high-rate regime and estimate d.
Estimating the Quantization Dimension
Given a training sequence $X^n = (X_1, X_2, \ldots, X_n)$ of i.i.d. samples from M and an independent test sequence $Z^m = (Z_1, \ldots, Z_m)$:
1. For each k in a range of codebook sizes for which the HRA is valid:
- Training: use $X^n$ to learn a k-point VQ $\hat{Q}_k$ that minimizes $\frac{1}{n} \sum_{i=1}^n \rho^r(X_i, \hat{Q}_k(X_i))$.
- Testing: run $\hat{Q}_k$ on $Z^m$ and approximate $e_r^*(k \mid \mu)$ by $\hat{e}_r(k) = \left[ \frac{1}{m} \sum_{i=1}^m \rho^r(Z_i, \hat{Q}_k(Z_i)) \right]^{1/r}$.
2. Plot $-\log \hat{e}_r(k)$ vs. $\log k$ and estimate d from the slope of the (linear) plot over the chosen range of k (see the sketch below).
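A minimal end-to-end sketch of this procedure for r = 2, using k-means (Lloyd's algorithm) as a convenient stand-in for empirical VQ design; the authors' actual design method may differ, and the helper name is ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_dimension(X_train, Z_test, ks):
    """Estimate intrinsic dimension from the slope of -log e_hat(k) vs. log k."""
    log_k, neg_log_e = [], []
    for k in ks:
        # Training: k-means minimizes the empirical r = 2 distortion.
        km = KMeans(n_clusters=k, n_init=10).fit(X_train)
        # Testing: RMS distance from each held-out point to its nearest codevector.
        d = np.linalg.norm(Z_test[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
        e_hat = np.sqrt(np.mean(d.min(axis=1) ** 2))
        log_k.append(np.log(k))
        neg_log_e.append(-np.log(e_hat))
    # If e*_r(k) ~ C k^(-1/d), then -log e_hat ~ (1/d) log k + const.
    slope, _ = np.polyfit(log_k, neg_log_e, 1)
    return 1.0 / slope
```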
Statistical Consistency
Minimizing the training error $\frac{1}{n} \sum_{i=1}^n \rho^r(X_i, \hat{Q}_k(X_i))$ is necessary to approximate the optimal quantizer for $\mu$.
However, the training error is an optimistically biased estimate of $e_r^*(k \mid \mu)$: the empirical VQ overfits the training sequence.
Therefore, to obtain a statistically consistent estimate of $e_r^*(k \mid \mu)$, we need to measure the VQ error on an independent test sequence: $\hat{e}_r(k) = \left[ \frac{1}{m} \sum_{i=1}^m \rho^r(Z_i, \hat{Q}_k(Z_i)) \right]^{1/r} \approx e_r(\hat{Q}_k \mid \mu)$ by the law of large numbers.
Statistical consistency follows since $e_r(\hat{Q}_k \mid \mu) \to e_r^*(k \mid \mu)$ almost surely as $n \to \infty$.
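The optimistic bias can be seen directly in a toy sketch of ours (assumed setup: uniform samples on a circle in $\mathbb{R}^3$): the training distortion of a fitted quantizer typically falls below its held-out distortion.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
data = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
X_train, Z_test = data[:1000], data[1000:]

km = KMeans(n_clusters=32, n_init=10).fit(X_train)

def rms_err(P):
    # RMS distance from points P to their nearest codevector.
    d = np.linalg.norm(P[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    return np.sqrt(np.mean(d.min(axis=1) ** 2))

# The training error is typically the smaller of the two (overfitting).
print(rms_err(X_train), rms_err(Z_test))
```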
Results on Synthetic Data, r = 2
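An experiment in this spirit can be reproduced with a short sketch, assuming the estimate_dimension helper from the earlier sketch. The swiss roll is a 2-dimensional manifold embedded in $\mathbb{R}^3$, so the estimate should come out near 2.

```python
from sklearn.datasets import make_swiss_roll

X, _ = make_swiss_roll(n_samples=4000, random_state=0)
X_train, Z_test = X[:2000], X[2000:]   # independent train/test split
ks = [16, 32, 64, 128, 256]            # codebook sizes in the high-rate regime
print(estimate_dimension(X_train, Z_test, ks))  # expect a value near d = 2
```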
The Limit $r = \infty$ and Packing Numbers
M is compact, so $e_\infty(Q_k \mid \mu) = \lim_{r \to \infty} e_r(Q_k \mid \mu)$ exists, and $e_\infty(Q_k \mid \mu) = \max_{x \in M} \|x - Q_k(x)\|$ (the worst-case quantization error of X by $Q_k$, independent of $\mu$).
The optimum $e_\infty^*(k \mid \mu)$ is the smallest radius of the most economical covering of M by k or fewer balls.
Worst-case VQ error: $d_\infty = -\lim_{k \to \infty} \frac{\log k}{\log e_\infty^*(k \mid \mu)}$. Covering numbers: $d_{\mathrm{cap}} = -\lim_{\varepsilon \to 0} \frac{\log N_M(\varepsilon)}{\log \varepsilon}$.
The $r \to \infty$ limit of our scheme is equivalent to Kégl's method based on covering/packing numbers (NIPS 2003).
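For intuition, the $r = \infty$ case is the k-center problem, for which greedy farthest-point selection is a classical 2-approximation. This is an illustrative sketch of ours, not Kégl's actual packing construction.

```python
import numpy as np

def farthest_point_codebook(X, k):
    """Greedy k-center: each new codevector is the sample farthest from the codebook."""
    idx = [0]  # arbitrary starting point
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        idx.append(int(np.argmax(d)))
        d = np.minimum(d, np.linalg.norm(X - X[idx[-1]], axis=1))
    return X[idx]

def worst_case_error(X, codebook):
    """Empirical e_inf: maximum distance from a sample to its nearest codevector."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).max()
```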
Choice of Distortion Exponent
For finite $r \ne 2$, empirical VQ design is hard: the optimal codevectors for a particular VQ partition are not given by the centroids of the partition regions. This makes r = 2 the preferred choice.
The limiting case $r = \infty$ is attractive because of its robustness against variations in the sampling density, but this is offset by increased sensitivity to noise.
Effect of Noise
Additive isotropic Gaussian noise: $X \sim \mu$ is a point on the manifold, $W \sim \mathcal{N}(0, \sigma^2 I)$ is independent of X, and we observe noisy samples $Y = X + W$.
For large n, the estimation error for $e_r^*(k \mid \mu)$ is bounded by $\sqrt{2}\,\sigma \left( \frac{\Gamma((r + D)/2)}{\Gamma(D/2)} \right)^{1/r} + o(1)$.
For r = 2, the bound becomes $\sigma \sqrt{D}$.
[Figure: the bound compared with the actual error.]
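A short sketch for evaluating this bound numerically; only the constant from the slide is implemented, and log-gamma is used to keep the ratio stable for large D.

```python
import numpy as np
from scipy.special import gammaln

def noise_bound(sigma, D, r):
    """sqrt(2) * sigma * (Gamma((r+D)/2) / Gamma(D/2))**(1/r), via log-gamma."""
    log_ratio = gammaln((r + D) / 2.0) - gammaln(D / 2.0)
    return np.sqrt(2.0) * sigma * np.exp(log_ratio / r)

# Sanity check: for r = 2 the bound reduces to sigma * sqrt(D).
assert np.isclose(noise_bound(0.1, 50, 2.0), 0.1 * np.sqrt(50))
```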
Results on Real Data
Handwritten digits: MNIST data set, http://yann.lecun.com/exdb/mnist
Faces: http://www.cs.toronto.edu/~roweis/data.html, courtesy of B. Frey and S. Roweis
Summary
• The use of an independent test set helps to avoid negative bias.
• The limiting case ($r = \infty$) is equivalent to a previous method based on packing numbers.
• Our method can be seamlessly integrated with a VQ-based technique for dimensionality reduction (Raginsky, ISIT 2005).
• Application: find a clustering of the data that follows the local neighborhood structure of the manifold (i.e., clusters are locally d-dimensional).