A fuzzy clustering method using Genetic Algorithm and Fuzzy Subtractive Clustering - PowerPoint PPT Presentation


  1. A fuzzy clustering method using Genetic Algorithm and Fuzzy Subtractive Clustering Thanh Le, Tom Altman and Katheleen Gardiner University of Colorado Denver July 18, 2012

  2. Overview  Introduction  Fuzzy clustering using Fuzzy C-Means algorithm  Current genetic algorithms for fuzzy clustering  Proposed method: fzGASCE  Genetic algorithm  Fuzzy subtractive clustering  Probability based fitness function  Datasets:  Artificial datasets: Finite mixture model  Real datasets: UCI repository  Experimental results  Discussion

  3. Fuzzy C-Means algorithm - FCM  Objective function: $J_m(X \mid U,V) = \sum_{i=1}^{n}\sum_{k=1}^{c} u_{ki}^{m}\,\lVert x_i - v_k\rVert^2 \to \min$, with fuzzifier $m > 1$, subject to $\sum_{k=1}^{c} u_{ki} = 1$ for $i = 1..n$  Model parameter estimation: $u_{ki} = 1\Big/\sum_{l=1}^{c}\left(\frac{\lVert x_i - v_k\rVert^2}{\lVert x_i - v_l\rVert^2}\right)^{\frac{1}{m-1}}$, $\quad v_k = \frac{\sum_{i=1}^{n} u_{ki}^{m}\, x_i}{\sum_{i=1}^{n} u_{ki}^{m}}$
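The two update equations on this slide can be implemented directly. The following is a minimal NumPy sketch (function name, defaults, and convergence test are ours, not from the paper):

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means sketch. X: (n, d) data; c: number of
    clusters; m: fuzzifier (> 1). Returns memberships U (c, n) and
    cluster centers V (c, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # random fuzzy partition: each column (data point) sums to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # center update: v_k = sum_i u_ki^m x_i / sum_i u_ki^m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # squared distances ||x_i - v_k||^2, shape (c, n)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # guard against division by zero
        # membership update: u_ki = 1 / sum_l (d2_ki / d2_li)^(1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:    # converged
            U = U_new
            break
        U = U_new
    return U, V
```

The membership update is algebraically the same as the slide's formula: dividing each inverse-power distance by the column sum reproduces $1/\sum_l (d_{ki}^2/d_{li}^2)^{1/(m-1)}$.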

  4. FCM algorithm (contd.)  Advantages  Model free  Rapid convergence  Multiple cluster assignment  Shortcomings  Definition of the number of clusters  Fuzzy partition evaluation  Convergence to local optima  Defuzzification

  5. Recent fuzzy clustering Genetic Algorithms (GAs)  Chromosome describes a clustering solution  Fitness functions are based on cluster indices  Random mutations: both the genes to be replaced and the genes that replace them are chosen at random

  6. Recent fuzzy clustering GAs (contd.)  Advantages  Search for the ‘best’ solution in the solution space  Can escape local optima  Cross-over operator  Mutation operator  Can determine the number of clusters using the ‘best’ solution

  7. Recent fuzzy clustering GAs (contd.)  Shortcomings  Problems with cluster indices  Scale between compactness and separation  Random selection of genes to be replaced  Improper defuzzification

  8. Fuzzy clustering using GA and Subtractive Clustering - fzGASCE  Chromosome describes a clustering solution  Data clustering using FCM  Probability based fitness function  Mutation gene selection using fuzzy Subtractive Clustering  Defuzzification of fuzzy partition using probabilistic model
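The scheme above can be sketched as a small GA over candidate center sets. This is a structural toy, not the authors' implementation: the fitness below is a plain nearest-center sum-of-squares stand-in (fzGASCE uses a probability-based fitness), mutation draws a random data point (fzGASCE instead uses the densest points found by fzSC), and all names and parameters are ours:

```python
import numpy as np

def ga_cluster_centers(X, c, pop_size=20, generations=40, seed=0):
    """Toy GA: each chromosome is a set of c candidate cluster centers
    sampled from the data; elitist selection, uniform crossover, and
    point-replacement mutation evolve the population."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    def fitness(V):
        # negated sum of squared distances to the nearest center
        d2 = ((X[None] - V[:, None]) ** 2).sum(-1)   # shape (c, n)
        return -d2.min(axis=0).sum()

    pop = [X[rng.choice(n, c, replace=False)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                 # keep the best half
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.choice(len(elite), 2, replace=False)
            mask = rng.random(c) < 0.5
            child = np.where(mask[:, None], elite[a], elite[b])  # crossover
            if rng.random() < 0.3:                   # mutation: swap one gene
                child = child.copy()
                child[rng.integers(c)] = X[rng.integers(n)]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

In fzGASCE the mutated gene would be replaced by a high-density point from fzSC rather than a uniformly random one, which is what lets the search escape local optima more reliably.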

  9. fzGASCE: the probabilistic model  Bayesian validation method for fuzzy clustering - fzBLE (Le et al., 2011)  Central limit theorem  Bayesian theory  Possibility-to-probability transformation: {u_ki}, i=1..n - possibility distribution of X at v_k; {p_ki}, i=1..n - probability distribution of X at v_k  Create the probabilistic model at v_k using {p_ki}, i=1..n

  10. Use of the fzGASCE probabilistic model  fzGASCE fitness function: fit({U,V}) = Prob(X|{U,V})  Addresses the problems with using cluster indices  Outperforms cluster indices on artificial and real datasets (Le et al., 2011)  Defuzzification of the fuzzy partition: Prob(v*|x_i) = max_k{Prob(v_k|x_i)}  Addresses the problems of maximum-membership and spatial-information methods (Le et al., 2012)
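The argmax defuzzification rule above might be realized as follows. The isotropic-Gaussian posterior used here is purely an assumption of this sketch: the paper builds Prob(v_k|x_i) from its fzBLE probabilistic model, whose details the slides do not give:

```python
import numpy as np

def posterior_labels(X, V, sigma=1.0, priors=None):
    """Illustrative defuzzification: compute Prob(v_k | x_i) under an
    assumed isotropic-Gaussian component model, then assign each point
    to the center maximizing its posterior (argmax_k Prob(v_k | x_i))."""
    c = len(V)
    priors = np.full(c, 1.0 / c) if priors is None else priors
    d2 = ((X[None] - V[:, None]) ** 2).sum(-1)        # (c, n) squared dists
    like = np.exp(-d2 / (2 * sigma ** 2))             # p(x_i | v_k), unnormalized
    post = like * priors[:, None]                     # Bayes numerator
    post /= post.sum(axis=0, keepdims=True)           # Prob(v_k | x_i)
    return post.argmax(axis=0)
```

Unlike maximum-membership defuzzification on u_ki directly, this assigns by posterior probability, which is the property the slide attributes to the fzGASCE approach.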

  11. Application of fuzzy Subtractive Clustering (fzSC) in fzGASCE  fzSC method (Le et al., 2011)  Application of fuzzy mathematics  Histogram-based density estimation  Data density computed using the fuzzy partition  Application of fzSC in fzGASCE  Order data points by density  The densest data points are used to replace mutated genes
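As a rough illustration of ordering points by density, here is a sketch using the classical subtractive-clustering density measure (Chiu's potential function); note this is a stand-in, since fzSC itself estimates density from a histogram/fuzzy partition, which the slides do not specify:

```python
import numpy as np

def density_ranking(X, ra=1.0):
    """Rank data points from densest to sparsest using a
    subtractive-clustering-style potential: each point's density is the
    sum of Gaussian-weighted contributions from all points within an
    effective neighborhood radius ra."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)   # pairwise squared distances
    D = np.exp(-d2 / (ra / 2) ** 2).sum(axis=1)  # density (potential) per point
    return np.argsort(-D)                        # indices, densest first
```

In fzGASCE, the front of this ranking supplies the replacement genes during mutation, so mutated chromosomes jump toward high-density regions instead of arbitrary locations.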

  12. fzSC – an example of how it works  Red circles: cluster centers of the fuzzy partition  Black circles: the densest data points found by fzSC  fzSC demonstration available online: http://demo.tinyray.com/fzsc0

  13. Datasets  Artificial datasets  Generated using a finite mixture model  Non-uniform dataset: clusters differ in size and density  Real datasets  Iris  Wine  Glass  These datasets are from the UC Irvine Machine Learning Repository

  14. Performance measures  Correctness ratio: $\mathrm{COR} = \frac{1}{N}\sum_{t=1}^{N} I(\hat{c}_t = c)$, where N is the number of trials  Error variance: $\mathrm{EVAR} = \frac{1}{N}\sum_{t=1}^{N} (\hat{c}_t - c)^2$  Misclassification: $\mathrm{EMIS} = \frac{1}{N}\frac{1}{n}\sum_{t=1}^{N}\sum_{i=1}^{n} I(x_i^c \neq x_i^l)$, comparing the cluster label of each data object with its actual class label
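These three measures might be computed as follows; the cluster-to-class matching used for EMIS is one reasonable reading of the slide (match each predicted cluster to its majority true class), not necessarily the authors' exact procedure:

```python
import numpy as np

def cor_evar(c_hat, c_true):
    """COR: fraction of trials recovering the true cluster number;
    EVAR: mean squared error of the estimated cluster number.
    c_hat: per-trial estimates; c_true: the true number of clusters."""
    c_hat = np.asarray(c_hat)
    cor = np.mean(c_hat == c_true)
    evar = np.mean((c_hat - c_true) ** 2)
    return cor, evar

def emis(pred_labels, true_labels):
    """EMIS for one trial: misclassification rate after mapping each
    predicted cluster to the majority true class among its members."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    errors = 0
    for k in np.unique(pred_labels):
        members = true_labels[pred_labels == k]
        # points not in the cluster's majority class count as errors
        errors += len(members) - np.bincount(members).max()
    return errors / len(true_labels)
```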

  15. Uniform dataset – ASET1

  Algorithm   COR     EVAR    EMIS
  fzGASCE     1.000   0.000   0.000
  fzGAE       0.640   0.500   0.000
  PBMF        0.510   0.590   0.000
  MPC         0.290   0.970   0.000
  HPK         0.100   5.010   0.021
  AGFCM       0.600   2.800   0.000
  XB          0.490   1.450   0.000
  FS          0.120   1.100   0.070
  PC          0.230   1.040   0.000
  ACVI        0.200   2.490   0.011

  fzGAE is an immature version of fzGASCE, in which the fzSC method is not used in the mutation operator

  16. Uniform dataset – ASET2

  Algorithm   COR     EVAR    EMIS
  fzGASCE     1.000   0.000   0.000
  fzGAE       0.710   0.380   0.000
  PBMF        0.600   0.450   0.000
  MPC         0.610   0.860   0.000
  HPK         0.120   5.240   0.000
  AGFCM       0.650   1.490   0.000
  XB          0.640   0.430   0.000
  FS          0.520   0.840   0.011
  PC          0.620   0.890   0.000
  ACVI        0.100   2.100   0.000

  17. Non-uniform dataset – ASET4

  Algorithm   COR     EVAR    EMIS
  fzGASCE     1.000   0.000   0.000
  fzGAE       0.900   0.100   0.107
  PBMF        0.700   0.300   0.107
  MPC         0.050   0.960   0.107
  HPK         0.000   5.770   -
  AGFCM       0.000   8.470   -
  XB          0.040   0.960   0.107
  FS          0.020   3.480   0.107
  PC          0.050   0.960   0.107
  ACVI        0.080   0.920   0.107

  18. Iris dataset

  Algorithm   COR     EVAR    EMIS
  fzGASCE     1.000   0.000   0.033
  fzGAE       0.880   0.120   0.040
  PBMF        0.860   0.140   0.040
  MPC         0.040   0.970   0.160
  HPK         0.000   5.720   -
  AGFCM       0.000   8.120   -
  XB          0.050   1.010   0.040
  FS          0.390   0.780   0.154
  PC          0.080   0.920   0.115
  ACVI        0.150   0.850   0.040

  19. Wine dataset

  Algorithm   COR     EVAR    EMIS
  fzGASCE     1.000   0.000   0.213
  fzGAE       0.860   0.140   0.303
  PBMF        0.000   2.050   -
  MPC         0.000   2.810   -
  HPK         0.000   6.760   -
  AGFCM       0.000   9.210   -
  XB          0.270   1.010   0.303
  FS          0.000   5.720   -
  PC          0.110   0.920   0.303
  ACVI        0.090   0.910   0.303

  20. Advantages of fzGASCE  Describes the data distribution using a probabilistic model  Applies the probabilistic model in both the fitness function and defuzzification  Uses the fzSC method in the mutation operator to effectively escape local optima  No parameters need to be specified a priori

  21. Future work  Eliminate the oscillation during the convergence process when using fzSC to speed up fzGASCE  Integrate with external distance measures to meet specific requirements of real-world applications.

  22. Thank you! Questions?  We acknowledge support from:  the Vietnamese Ministry of Education and Training, through the 322 scholarship program  the University of Colorado Denver, USA
