Probability Density Function Estimation Based Over-Sampling for Imbalanced Two-Class Problems

Ming Gao (a), Xia Hong (a), Sheng Chen (b, c), Chris J. Harris (b)

(a) School of Systems Engineering, University of Reading, Reading RG6 6AY, UK (ming.gao@pgr.reading.ac.uk, x.hong@reading.ac.uk)
(b) Electronics and Computer Science, Faculty of Physical and Applied Sciences, University of Southampton, Southampton SO17 1BJ, UK (sqc@ecs.soton.ac.uk, cjh@ecs.soton.ac.uk)
(c) Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia

IEEE World Congress on Computational Intelligence, Brisbane, Australia, June 10-15, 2012
Outline

1. Introduction: Motivations and Solutions
2. PDF Estimation Based Over-sampling: Kernel Density Estimation; Over-sampling Procedure; Tunable RBF Classifier Construction
3. Experiments: Experimental Setup; Experimental Results
4. Conclusions: Concluding Remarks
Background

Highly imbalanced two-class classification problems widely occur in life-threatening or safety-critical applications.

Techniques for imbalanced problems can be divided into:
1. Imbalanced learning algorithms: internally modify existing algorithms, without artificially altering the original imbalanced data.
2. Resampling methods: externally operate on the original imbalanced data set to re-balance the data for a conventional classifier.

Resampling methods can be categorised into:
1. Under-sampling, which tends to be ideal when the imbalance degree is not very severe.
2. Over-sampling, which becomes necessary if the imbalance degree is high.
Our Approach

What ideal over-sampling would be: draw synthetic data according to the same probability distribution that produced the observed positive-class data samples.

Our probability density function (PDF) estimation based over-sampling:
1. Construct a Parzen window (PW) or kernel density estimate from the observed positive-class data samples.
2. Generate synthetic data samples according to the estimated positive-class probability density function.
3. Apply our tunable radial basis function (RBF) classifier, built on the leave-one-out (LOO) misclassification rate, to the rebalanced data.

The ready-made PW estimator has low complexity in this application, since the minority class is by nature small in size.
Particle swarm optimisation (PSO) aided orthogonal forward regression (OFR) for constructing the RBF classifier based on the LOO error rate is a state-of-the-art method.
Problem Statement

Imbalanced two-class data set $D_N = \{x_k, y_k\}_{k=1}^{N}$, with
$D_N = D_{N_+} \cup D_{N_-}$, $D_{N_+} = \{x_i, y_i = +1\}_{i=1}^{N_+}$, $D_{N_-} = \{x_l, y_l = -1\}_{l=1}^{N_-}$
1. $y_k \in \{\pm 1\}$: class label for feature vector $x_k \in \mathbb{R}^m$
2. $x_k$ are i.i.d. drawn from an unknown underlying PDF $p(x)$
3. $N = N_+ + N_-$, and $N_+ \ll N_-$

Kernel density estimator $\hat{p}(x)$ for $p(x)$ is constructed based on the positive-class samples $D_{N_+} = \{x_i, y_i = +1\}_{i=1}^{N_+}$:
$\hat{p}(x) = \frac{(\det S)^{-1/2}}{N_+} \sum_{i=1}^{N_+} \Phi_\sigma\big(S^{-1/2}(x - x_i)\big)$
1. Kernel: $\Phi_\sigma\big(S^{-1/2}(x - x_i)\big) = \frac{\sigma^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2}\sigma^{-2}(x - x_i)^T S^{-1}(x - x_i)}$
2. $S$: covariance matrix of the positive class
3. $\sigma$: smoothing parameter
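A minimal numerical sketch of this Parzen-window estimator, assuming NumPy; the function and variable names (kde_positive, X_pos) are illustrative and not from the paper:

```python
# Minimal sketch of the positive-class kernel density estimate, assuming NumPy;
# names (kde_positive, X_pos) are illustrative, not from the paper.
import numpy as np

def kde_positive(x, X_pos, S, sigma):
    """Evaluate p_hat(x) from N+ positive-class samples X_pos (N+ x m),
    shared covariance matrix S (m x m) and smoothing parameter sigma."""
    n_pos, m = X_pos.shape
    S_inv = np.linalg.inv(S)
    diffs = x - X_pos                                   # rows are x - x_i
    # quadratic forms (x - x_i)^T S^{-1} (x - x_i), one per sample
    quad = np.einsum('ij,jk,ik->i', diffs, S_inv, diffs)
    norm = sigma ** (-m) / (2.0 * np.pi) ** (m / 2.0)   # kernel normalisation
    kernels = norm * np.exp(-0.5 * quad / sigma ** 2)
    return np.linalg.det(S) ** (-0.5) * kernels.mean()
```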
Kernel Parameter Estimate

Unbiased estimate of the positive-class covariance matrix:
$S = \frac{1}{N_+ - 1} \sum_{i=1}^{N_+} (x_i - \bar{x})(x_i - \bar{x})^T$
with the mean vector of the positive class $\bar{x} = \frac{1}{N_+} \sum_{i=1}^{N_+} x_i$.

Smoothing parameter found by grid search to minimise the score function
$M(\sigma) = N_+^{-2} \sum_{i=1}^{N_+} \sum_{j=1}^{N_+} \Phi_\sigma^*\big(S^{-1/2}(x_j - x_i)\big) + 2 N_+^{-1} \Phi_\sigma(0)$
with
$\Phi_\sigma^*\big(S^{-1/2}(x_j - x_i)\big) \approx \Phi_\sigma^{(2)}\big(S^{-1/2}(x_j - x_i)\big) - 2\,\Phi_\sigma\big(S^{-1/2}(x_j - x_i)\big)$
$\Phi_\sigma^{(2)}\big(S^{-1/2}(x_j - x_i)\big) = \frac{(\sqrt{2}\sigma)^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2}(\sqrt{2}\sigma)^{-2}(x_j - x_i)^T S^{-1}(x_j - x_i)}$

$M(\sigma)$ is based on the mean integrated square error (MISE) measure.
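A hedged sketch of the grid search over the smoothing parameter, assuming NumPy; score_M and grid_search_sigma are illustrative names, not identifiers from the paper:

```python
# Hedged sketch of the grid search for sigma minimising the MISE-based score M(sigma).
import numpy as np

def score_M(sigma, X_pos, S):
    n_pos, m = X_pos.shape
    S_inv = np.linalg.inv(S)
    diffs = X_pos[:, None, :] - X_pos[None, :, :]        # pairwise differences x_j - x_i
    quad = np.einsum('ijk,kl,ijl->ij', diffs, S_inv, diffs)

    def phi(q, width):                                   # Gaussian kernel of given width
        return width ** (-m) / (2.0 * np.pi) ** (m / 2.0) * np.exp(-0.5 * q / width ** 2)

    # Phi_sigma^* approximated by Phi^{(2)} - 2*Phi, with Phi^{(2)} at width sqrt(2)*sigma
    phi_star = phi(quad, np.sqrt(2.0) * sigma) - 2.0 * phi(quad, sigma)
    return phi_star.sum() / n_pos ** 2 + 2.0 / n_pos * phi(0.0, sigma)

def grid_search_sigma(X_pos, S, grid):
    """Return the grid value of sigma with the smallest score M(sigma)."""
    return min(grid, key=lambda s: score_M(s, X_pos, S))
```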
Draw Synthetic Samples

Over-sample the positive class by drawing synthetic data samples according to the PDF estimate $\hat{p}(x)$.

Procedure for generating a synthetic sample:
1) Based on a discrete uniform distribution, randomly draw a data sample $x_o$ from the positive-class data set $D_{N_+}$.
2) Generate a synthetic data sample $x_n$ using a Gaussian distribution with mean $x_o$ and covariance matrix $\sigma^2 S$:
$x_n = x_o + \sigma\, R \cdot \mathrm{randn}()$
- $R$: upper triangular matrix from the Cholesky decomposition of $S$
- $\mathrm{randn}()$: pseudorandom vector drawn from a zero-mean normal distribution with covariance matrix $I_m$

Repeat the procedure $r \cdot N_+$ times, given the over-sampling rate $r$.
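A sketch of this generation step, assuming NumPy and illustrative names; note that NumPy's Cholesky routine returns the lower-triangular factor, so the perturbation below uses it so that the noise covariance is $\sigma^2 S$:

```python
# Sketch of the synthetic-sample generator, assuming NumPy; names are illustrative.
# np.linalg.cholesky returns the lower-triangular factor L with S = L L^T, so each
# noise row z @ L.T has covariance L L^T = S, and sigma * noise has covariance sigma^2 S.
import numpy as np

def oversample_positive(X_pos, S, sigma, rate, rng=None):
    """Draw round(rate * N+) synthetic positive-class samples from p_hat(x)."""
    rng = rng if rng is not None else np.random.default_rng()
    n_pos, m = X_pos.shape
    L = np.linalg.cholesky(S)
    n_new = int(round(rate * n_pos))
    # 1) pick seed points x_o uniformly at random from the positive-class set
    seeds = X_pos[rng.integers(0, n_pos, size=n_new)]
    # 2) perturb each seed with zero-mean Gaussian noise of covariance sigma^2 * S
    noise = rng.standard_normal((n_new, m)) @ L.T
    return seeds + sigma * noise
```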
Example (PDF estimate)

(a) Imbalanced data set: x denoting a positive-class instance and ◦ a negative-class instance.
- $N_+ = 10$ positive-class samples: mean $[2\ 2]^T$ and covariance $I_2$
- $N_- = 100$ negative-class samples: mean $[0\ 0]^T$ and covariance $I_2$
(b) Constructed PDF kernel of each positive-class instance; optimal smoothing parameter $\sigma = 1.25$ and covariance matrix $S \approx I_2$.
(c) Estimated density distribution of the positive class.

[Figure: panels (a), (b) and (c)]
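For concreteness, a hypothetical recreation of this toy data set; the random seed and the sigma grid are assumptions, and grid_search_sigma is the illustrative helper sketched earlier:

```python
# Hypothetical recreation of the toy example; seed and grid are assumptions,
# and grid_search_sigma is the illustrative helper defined in the earlier sketch.
import numpy as np

rng = np.random.default_rng(0)
X_pos = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=10)    # N+ = 10
X_neg = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=100)   # N- = 100

S = np.cov(X_pos, rowvar=False)                  # unbiased positive-class covariance
sigma = grid_search_sigma(X_pos, S, np.linspace(0.1, 3.0, 30))
# The slide reports sigma = 1.25 and S close to I_2 for its particular realisation.
```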
Example (over-sampling)

Over-sampling rate: $r = 100\%$; ideal decision boundary: $x + y - 2 = 0$.
(a) Proposed PDF estimate based over-sampling: the over-sampled positive-class data set expands along the direction of the ideal decision boundary.
(b) Synthetic minority over-sampling technique (SMOTE): the over-sampled data set is confined to the region defined by the original positive-class instances.

[Figure: panels (a) and (b)]
Tunable RBF Classifier

Construct a radial basis function classifier from the over-sampled training data, still denoted as $D_N = \{x_k, y_k\}_{k=1}^{N}$:
$\hat{y}_k^{(M)} = \sum_{i=1}^{M} w_i\, g_i(x_k) = g_M^T(k)\, w_M$ and $\tilde{y}_k^{(M)} = \mathrm{sgn}\big(\hat{y}_k^{(M)}\big)$
1. $M$: number of tunable kernels; $\tilde{y}_k^{(M)}$: estimated class label
2. Gaussian kernel adopted: $g_i(x) = e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}$
3. $\mu_i \in \mathbb{R}^m$: $i$th RBF kernel centre vector
4. $\Sigma_i = \mathrm{diag}\{\sigma_{i,1}^2, \sigma_{i,2}^2, \cdots, \sigma_{i,m}^2\}$: $i$th covariance matrix

Regression model on the training data $D_N$:
$y = G_M w_M + \varepsilon^{(M)}$
1. error vector $\varepsilon^{(M)} = \big[\varepsilon_1^{(M)} \cdots \varepsilon_N^{(M)}\big]^T$ with $\varepsilon_k^{(M)} = y_k - \hat{y}_k^{(M)}$
2. $G_M = [g_1\ g_2 \cdots g_M]$: $N \times M$ regression matrix
3. $w_M = [w_1 \cdots w_M]^T$: classifier's weight vector
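A minimal sketch of the resulting decision rule, assuming NumPy; the centres, diagonal kernel widths and weights would come from the PSO-aided OFR construction, which is not reproduced here:

```python
# Minimal sketch of the tunable-RBF decision rule, assuming NumPy; the parameters
# (centres, widths_sq, w) are placeholders for the OFR/PSO-constructed classifier.
import numpy as np

def rbf_predict(X, centres, widths_sq, w):
    """X: (N, m), centres: (M, m), widths_sq: (M, m) diagonal entries of Sigma_i, w: (M,).
    Returns the class labels sgn(y_hat) and the raw outputs y_hat."""
    diffs = X[:, None, :] - centres[None, :, :]             # (N, M, m)
    quad = np.sum(diffs ** 2 / widths_sq[None, :, :], axis=2)
    G = np.exp(-quad)                                        # regression matrix G_M
    y_hat = G @ w
    return np.sign(y_hat), y_hat
```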
Orthogonal Decomposition

Orthogonal decomposition of the regression matrix:
$G_M = P_M A_M$
$A_M = \begin{bmatrix} 1 & a_{1,2} & \cdots & a_{1,M} \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{M-1,M} \\ 0 & \cdots & 0 & 1 \end{bmatrix}$
$P_M = [p_1 \cdots p_M]$ with orthogonal columns: $p_i^T p_j = 0$ for $i \neq j$

Equivalent regression model:
$y = G_M w_M + \varepsilon^{(M)} \;\Leftrightarrow\; y = P_M \theta_M + \varepsilon^{(M)}$
$\theta_M = [\theta_1 \cdots \theta_M]^T$ satisfies $\theta_M = A_M w_M$

After the $n$th stage of orthogonal forward selection, $G_n = [g_1 \cdots g_n]$ is built with corresponding $P_n = [p_1 \cdots p_n]$ and $A_n$.
The $k$th row of $P_n$ is denoted as $p^T(k) = [p_1(k) \cdots p_n(k)]$.
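A sketch of the factorisation $G_M = P_M A_M$ via modified Gram-Schmidt, assuming NumPy; it illustrates only the decomposition itself, not the LOO-based forward selection of kernels:

```python
# Sketch of G = P A via modified Gram-Schmidt, assuming NumPy; illustration only.
import numpy as np

def orthogonal_decompose(G):
    """Factor G (N x M) into P with mutually orthogonal columns and a
    unit upper-triangular A such that G = P @ A."""
    N, M = G.shape
    P = np.array(G, dtype=float, copy=True)
    A = np.eye(M)
    for j in range(M):
        for i in range(j):
            # projection coefficient a_{i,j}, then remove the p_i component from column j
            A[i, j] = (P[:, i] @ P[:, j]) / (P[:, i] @ P[:, i])
            P[:, j] -= A[i, j] * P[:, i]
    return P, A
```

A quick sanity check on any regression matrix is that np.allclose(G, P @ A) holds and P.T @ P is numerically diagonal.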