Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint

Xia Hong¹, Sheng Chen², Chris J. Harris²

¹ School of Systems Engineering, University of Reading, Reading RG6 6AY, UK
E-mail: x.hong@reading.ac.uk

² School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
E-mails: {sqc,cjh}@ecs.soton.ac.uk

International Joint Conference on Neural Networks 2010
Outline

1. Motivations
   - Existing Regularisation Approaches
   - Our Contributions
2. Proposed Sparse Kernel Density Estimator
   - Problem Formulation
   - Approximate Zero-Norm Regularisation
   - D-Optimality Based Subset Selection
3. Numerical Examples
   - Experimental Set-Up
   - Experimental Results
4. Conclusions
Regularisation Methods

Two-norm of weight vector
- Combines naturally with a quadratic main cost function, giving a computationally efficient implementation
- Only drives many weights to small near-zero values, never exactly to zero

One-norm of weight vector
- Can drive many weights exactly to zero, and hence should achieve sparser results than the two-norm based method
- Harder to minimise, with a higher-complexity implementation

Zero-norm of weight vector
- Ultimate model sparsity and generalisation performance
- Intractable to implement; even with approximation, very difficult to minimise and imposes very high complexity

Two-norm and one-norm based regularisations have been combined with the OLS algorithm, with the former approach providing highly efficient sparse kernel modelling. The three penalties are summarised below.
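For reference, the three penalties on a weight vector $\beta_N = [\beta_1\, \beta_2 \cdots \beta_N]^T$ take their standard forms (textbook definitions, added here for concreteness, not stated on the slide):

\[
\|\beta_N\|_2^2 = \sum_{i=1}^{N} \beta_i^2, \qquad
\|\beta_N\|_1 = \sum_{i=1}^{N} |\beta_i|, \qquad
\|\beta_N\|_0 = \sum_{i=1}^{N} \mathbb{I}\{\beta_i \neq 0\}
\]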
Our Contributions

We incorporate an effective approximate zero-norm regularisation into sparse kernel density estimation
- The approximate zero-norm merges naturally into the underlying constrained nonnegative quadratic programming
- Various SVM algorithms can readily be applied to obtain the SKD estimate efficiently

Proposed sparse kernel density estimator:
- use D-optimality OLS subset selection to select a small number of significant kernels, in terms of kernel eigenvalues
- then solve the final SKD estimate from the associated subset constrained nonnegative quadratic programming
Kernel Density Estimation

Given a finite data set $D_N = \{x_k\}_{k=1}^{N}$, drawn from an unknown density $p(x)$, where $x_k \in \mathbb{R}^m$

Infer $p(x)$ based on $D_N$ using the kernel density estimate
\[
\hat{p}(x; \beta_N, \rho) = \sum_{k=1}^{N} \beta_k K_\rho(x, x_k)
\]
\[
\text{s.t.} \quad \beta_k \geq 0, \; 1 \leq k \leq N, \quad \beta_N^T 1_N = 1
\]
Here $\beta_N = [\beta_1\, \beta_2 \cdots \beta_N]^T$ is the kernel weight vector, $1_N$ the vector of ones of dimension $N$, and $K_\rho(\bullet, \bullet)$ the chosen kernel function with kernel width $\rho$

Unsupervised density estimation ⇒ "supervised" regression using the Parzen window estimate as the "desired response"
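A minimal sketch (not the authors' code) of evaluating such an estimate with Gaussian kernels of width ρ; the Parzen window estimate used as the "desired response" is the special case with uniform weights $\beta_k = 1/N$:

```python
import numpy as np

def gaussian_kernel(x, xk, rho):
    """Gaussian kernel K_rho(x, x_k) with width rho, for x, x_k in R^m."""
    m = x.shape[-1]
    diff = x - xk
    return np.exp(-np.sum(diff**2, axis=-1) / (2 * rho**2)) \
        / ((2 * np.pi * rho**2) ** (m / 2))

def kde(x, X, beta, rho):
    """Weighted kernel density estimate: sum_k beta_k * K_rho(x, x_k)."""
    return sum(b * gaussian_kernel(x, xk, rho) for b, xk in zip(beta, X))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # data set D_N: N=100 points in R^2
beta_parzen = np.full(len(X), 1.0 / len(X))   # Parzen window: beta_k = 1/N
y = np.array([kde(x, X, beta_parzen, rho=0.5) for x in X])  # "desired response"
```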
Regression Formulation

For $x_k \in D_N$, denote $\hat{y}_k = \hat{p}(x_k; \beta_N, \rho)$, let $y_k$ be the Parzen window estimate at $x_k$, and $\varepsilon_k = y_k - \hat{y}_k$ ⇒ regression formulation
\[
y_k = \hat{y}_k + \varepsilon_k = \phi_N^T(k)\, \beta_N + \varepsilon_k
\]
or over $D_N$
\[
y = \Phi_N \beta_N + \varepsilon
\]
Associated constrained nonnegative quadratic programming
\[
\min_{\beta_N} \left\{ \frac{1}{2}\, \beta_N^T B_N \beta_N - v_N^T \beta_N \right\}
\]
\[
\text{s.t.} \quad \beta_N^T 1_N = 1 \text{ and } \beta_i \geq 0, \; 1 \leq i \leq N
\]
where $B_N = \Phi_N^T \Phi_N$ is the design matrix and $v_N = \Phi_N^T y$

This is not simply using the kernel density estimate to fit the Parzen window estimate!
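Continuing the sketch above (reusing the hypothetical names gaussian_kernel, X and y), the regression quantities assemble directly from kernel evaluations over all data pairs:

```python
# Phi_N is the N x N matrix of kernel values: Phi[k, i] = K_rho(x_k, x_i)
Phi = np.array([[gaussian_kernel(xk, xi, rho=0.5) for xi in X] for xk in X])

B = Phi.T @ Phi   # design matrix B_N = Phi_N^T Phi_N
v = Phi.T @ y     # v_N = Phi_N^T y, with y the Parzen window targets
```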
Zero-Norm Constraint

Given $\alpha > 0$, an approximation to the zero norm $\|\beta_N\|_0$ is
\[
\|\beta_N\|_0 \approx \sum_{i=1}^{N} \left( 1 - e^{-\alpha |\beta_i|} \right)
\]
Combining this zero-norm constraint with the constrained NNQP:
\[
\min_{\beta_N} \left\{ \frac{1}{2}\, \beta_N^T B_N \beta_N - v_N^T \beta_N + \lambda \sum_{i=1}^{N} \left( 1 - e^{-\alpha |\beta_i|} \right) \right\}
\]
\[
\text{s.t.} \quad \beta_N^T 1_N = 1 \text{ and } \beta_i \geq 0, \; 1 \leq i \leq N
\]
with $\lambda > 0$ a small "regularisation" parameter

With a 2nd-order Taylor series expansion for $e^{-\alpha |\beta_i|}$,
\[
e^{-\alpha |\beta_i|} \approx 1 - \alpha |\beta_i| + \frac{\alpha^2 \beta_i^2}{2}
\quad \Rightarrow \quad
\sum_{i=1}^{N} \left( 1 - e^{-\alpha |\beta_i|} \right) \approx \alpha \sum_{i=1}^{N} |\beta_i| - \frac{\alpha^2}{2} \sum_{i=1}^{N} \beta_i^2
\]
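One reasoning step is implicit here: under the constraints $\beta_i \geq 0$ and $\beta_N^T 1_N = 1$, the one-norm term in the expansion is constant, so only the quadratic term affects the optimisation. Spelling this out (a worked step consistent with the slide's definitions, not shown on it):

\[
\lambda \sum_{i=1}^{N}\left(1 - e^{-\alpha|\beta_i|}\right)
\approx \lambda\alpha \sum_{i=1}^{N} \beta_i - \frac{\lambda\alpha^2}{2} \sum_{i=1}^{N} \beta_i^2
= \lambda\alpha - \frac{\lambda\alpha^2}{2}\, \beta_N^T \beta_N
\]

Dropping the constant $\lambda\alpha$ and merging the quadratic term into the main cost gives $\frac{1}{2}\beta_N^T (B_N - \lambda\alpha^2 I_N)\beta_N - v_N^T \beta_N$, which is exactly the form on the next slide with $\delta = \lambda\alpha^2$.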
Constrained NNQP

Hence, the "new" constrained NNQP
\[
\min_{\beta_N} \left\{ \frac{1}{2}\, \beta_N^T A_N \beta_N - v_N^T \beta_N \right\}
\]
\[
\text{s.t.} \quad \beta_N^T 1_N = 1 \text{ and } \beta_i \geq 0, \; 1 \leq i \leq N
\]
where $A_N = B_N - \delta I_N$, with $\delta = \lambda\alpha^2$ a predetermined small parameter

Remark: under the convexity constraint on $\beta_N$, minimisation of the approximate zero norm ⇔ maximisation of the two norm $\beta_N^T I_N \beta_N$

The design matrix $B_N$ should be positive definite, and $\delta$ bounded by the smallest eigenvalue of $B_N$, so that $A_N$ is also positive definite (see the check below)

It is common for $B_N$ of a large data set to be ill-conditioned; the approach is therefore most effective when applied after some model subset selection preprocessing
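The eigenvalue bound on δ is easy to verify numerically. A minimal sketch continuing the code above (the 0.1 safety factor is an illustrative choice, not from the paper):

```python
# A_N = B_N - delta*I stays positive definite as long as delta is strictly
# below the smallest eigenvalue of B_N
eigvals = np.linalg.eigvalsh(B)        # eigenvalues of symmetric B, ascending
delta = 0.1 * eigvals[0]               # illustrative safety factor
A = B - delta * np.eye(B.shape[0])
assert np.linalg.eigvalsh(A)[0] > 0    # A is positive definite
```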
D-Optimality Design

The least squares estimate $\hat{\beta}_N = B_N^{-1} \Phi_N^T y$ is unbiased, and the covariance matrix of the estimate satisfies $\mathrm{Cov}\big[\hat{\beta}_N\big] \propto B_N^{-1}$

Estimation accuracy depends on the condition number
\[
C = \frac{\max\{\sigma_i,\; 1 \leq i \leq N\}}{\min\{\sigma_i,\; 1 \leq i \leq N\}}
\]
where $\sigma_i$, $1 \leq i \leq N$, are the eigenvalues of $B_N$

The D-optimality design maximises the determinant of the design matrix: the selected subset model $\Phi_{N_s}$ maximises
\[
\det\big(\Phi_{N_s}^T \Phi_{N_s}\big) = \det\big(B_{N_s}\big)
\]
This prevents an oversized, ill-posed model and high estimate variances

"Unsupervised" D-optimality design is particularly suitable for determining the structure of a kernel density estimate
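Both diagnostics follow directly from the eigenvalues of the design matrix. A small sketch continuing the code above (working with the log-determinant for numerical stability is an implementation choice, not from the paper):

```python
# Condition number C = sigma_max / sigma_min and the D-optimality objective
eigvals = np.linalg.eigvalsh(B)        # ascending eigenvalues of symmetric B
C = eigvals[-1] / eigvals[0]
log_det = np.sum(np.log(eigvals))      # log det(B_N); D-optimality maximises this
print(f"condition number C = {C:.3e}, log det(B) = {log_det:.3f}")
```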
OFR Aided Algorithm

Orthogonal forward regression (OFR) selects $\Phi_{N_s}$ of $N_s$ significant kernels based on the D-optimality criterion
- Complexity of this preprocessing is no more than $O(N^2)$

This preprocessing results in the subset constrained NNQP
\[
\min_{\beta_{N_s}} \left\{ \frac{1}{2}\, \beta_{N_s}^T A_{N_s} \beta_{N_s} - v_{N_s}^T \beta_{N_s} \right\}
\]
\[
\text{s.t.} \quad \beta_{N_s}^T 1_{N_s} = 1 \text{ and } \beta_i \geq 0, \; 1 \leq i \leq N_s
\]
with $v_{N_s} = \Phi_{N_s}^T y$, $A_{N_s} = B_{N_s} - \delta I_{N_s}$, $B_{N_s} = \Phi_{N_s}^T \Phi_{N_s}$, and $\delta < w_{N_s}^T w_{N_s}$

Various SVM algorithms can be used to solve this problem. As $N_s$ is very small and $A_{N_s}$ is well-conditioned, we use the simple multiplicative nonnegative quadratic programming algorithm (sketched below), whose complexity is negligible in comparison with the $O(N^2)$ of the D-optimality based OFR preprocessing
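A minimal sketch of a multiplicative NNQP iteration of the kind used in the authors' related work on kernel density estimation; this is a hedged reconstruction, assuming A is elementwise nonnegative and enforcing the sum-to-one constraint through a multiplier h, not the paper's verbatim pseudocode:

```python
import numpy as np

def mnqp(A, v, n_iter=200):
    """Multiplicative updates for min 0.5*b^T A b - v^T b
    s.t. b >= 0 and sum(b) = 1, assuming A is elementwise nonnegative."""
    n = len(v)
    beta = np.full(n, 1.0 / n)               # feasible uniform start
    for _ in range(n_iter):
        c = beta / (A @ beta)                # c_i = beta_i / (A beta)_i
        h = (1.0 - c @ v) / np.sum(c)        # multiplier enforcing sum(beta) = 1
        beta = np.maximum(c * (v + h), 0.0)  # multiplicative update, clipped at 0
    return beta

# beta_s = mnqp(A_s, v_s)  # A_s, v_s: hypothetical subset quantities from OFR
```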