  1. Clustering (DWML, 2007)

  2. Density-Based Clustering: DBSCAN. Idea: identify contiguous regions of high density.

  3. Density-Based Clustering, Step 1: classification of points.
     1. Choose parameters ε and k.
     2. Label as core points: points with at least k other points within distance ε.
     3. Label as border points: non-core points within distance ε of a core point.
     4. Label as isolated points: all remaining points.
     A short code sketch of this step is given below.
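Below is a minimal sketch of the point classification step, assuming Euclidean distance; the function name classify_points and the parameter names eps and k are illustrative, not taken from the course material.

    import numpy as np

    def classify_points(X, eps, k):
        """Return one label per point: 'core', 'border', or 'isolated'."""
        n = len(X)
        # Pairwise Euclidean distances (O(n^2), fine for an illustration).
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # Core points: at least k *other* points within distance eps.
        is_core = (dists <= eps).sum(axis=1) - 1 >= k
        labels = []
        for i in range(n):
            if is_core[i]:
                labels.append("core")
            elif np.any(is_core & (dists[i] <= eps)):
                labels.append("border")    # within eps of some core point
            else:
                labels.append("isolated")  # all remaining points
        return labels

    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
    print(classify_points(X, eps=0.2, k=2))  # four core points, one isolated point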

  4. Density-Based Clustering, Step 2: define connectivity.
     1. Two core points are directly connected if they are within distance ε of each other.
     2. Each border point is directly connected to one randomly chosen core point within distance ε.
     3. Each connected component of the directly-connected relation (with at least one core point) is a cluster.
     A code sketch of this step follows.
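A sketch of the connectivity step, assuming Euclidean distance and the same parameters as above: it grows connected components of core points and then attaches border points, which is one straightforward way to realise the three rules listed above. Function and variable names are illustrative.

    import numpy as np

    def dbscan_clusters(X, eps, k):
        n = len(X)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        is_core = (dists <= eps).sum(axis=1) - 1 >= k
        cluster = [-1] * n          # -1 = unassigned / isolated
        next_id = 0
        for i in range(n):
            if not is_core[i] or cluster[i] != -1:
                continue
            # Grow a connected component of core points (rule 1).
            cluster[i], frontier = next_id, [i]
            while frontier:
                p = frontier.pop()
                for q in np.where(is_core & (dists[p] <= eps))[0]:
                    if cluster[q] == -1:
                        cluster[q] = next_id
                        frontier.append(q)
            next_id += 1
        # Attach each border point to one core point within distance eps (rule 2).
        for i in range(n):
            if cluster[i] == -1:
                near_cores = np.where(is_core & (dists[i] <= eps))[0]
                if len(near_cores) > 0:
                    cluster[i] = cluster[near_cores[0]]
        return cluster              # isolated points keep the label -1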

  5. Density-Based Clustering: setting k and ε. For fixed k there are heuristic methods for choosing ε that consider how the distance to the k-th nearest neighbour is distributed over the data.
     Pros and cons:
     + Can detect clusters of highly irregular shape.
     + Robust with respect to outliers.
     - Difficulties with clusters of varying density.
     - The parameters k and ε must be suitably chosen.
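A sketch of the heuristic, assuming Euclidean distance: compute, for every point, the distance to its k-th nearest neighbour and sort these values; ε is then typically chosen near the "knee" of the resulting curve.

    import numpy as np

    def kth_neighbour_distances(X, k):
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        dists.sort(axis=1)            # column 0 is the distance of each point to itself
        return np.sort(dists[:, k])   # distance to the k-th other point, sorted ascending

    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
    print(kth_neighbour_distances(X, k=2))  # a jump in these values suggests a value for eps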

  6. EM Clustering: probabilistic model for clustering. Assumption:
     • The data $a_1, \ldots, a_N$ is generated by a mixture of $k$ probability distributions $P_1, \ldots, P_k$, i.e. $P(a) = \sum_{i=1}^{k} \lambda_i P_i(a)$, where $\sum_{i=1}^{k} \lambda_i = 1$.
     • The cluster label of an instance is the (index of the) distribution from which the instance was drawn.
     • The $P_i$ are not (exactly) known.

  7. EM Clustering: clustering principle. Try to find the most likely explanation of the data, i.e.:
     • determine (the parameters of) $P_1, \ldots, P_k$ and $\lambda_1, \ldots, \lambda_k$ such that
     • the likelihood function $P(a_1, \ldots, a_N \mid P_1, \ldots, P_k, \lambda_1, \ldots, \lambda_k) = \prod_{j=1}^{N} P(a_j)$ is maximized;
     • an instance $a$ is then assigned to the cluster $j = \arg\max_{i=1,\ldots,k} \lambda_i P_i(a)$.
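A small numeric sketch of this principle, assuming an illustrative one-dimensional mixture of k = 2 Gaussians with made-up weights and parameters (not taken from the slides): it evaluates the log-likelihood of a data set and the argmax cluster assignment.

    import numpy as np
    from scipy.stats import norm

    lambdas = np.array([0.3, 0.7])                          # mixture weights, sum to 1
    components = [norm(loc=0.0, scale=1.0), norm(loc=4.0, scale=0.5)]

    def log_likelihood(data):
        # log prod_j P(a_j) = sum_j log sum_i lambda_i * P_i(a_j)
        return sum(np.log(sum(l * p.pdf(a) for l, p in zip(lambdas, components)))
                   for a in data)

    def assign_cluster(a):
        # argmax_i lambda_i * P_i(a): index of the most likely mixture component
        return int(np.argmax([l * p.pdf(a) for l, p in zip(lambdas, components)]))

    data = [0.2, -0.5, 3.9, 4.3]
    print(log_likelihood(data), [assign_cluster(a) for a in data])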

  8. EM Clustering: mixture of Gaussians. Example: a mixture of three Gaussian distributions with weights $\lambda = 0.2, 0.3, 0.5$. Each component has the density $P_i(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$.
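A sketch evaluating this density for a two-dimensional example, assuming made-up values for x, µ, and Σ; the hand-computed value is checked against scipy's multivariate_normal.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_density(x, mu, Sigma):
        d = len(mu)                                   # dimension (N in the formula above)
        diff = x - mu
        norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
        return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

    x = np.array([1.0, 0.5])
    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
    print(gaussian_density(x, mu, Sigma))
    print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value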

  9. EM Clustering: Mixture Model → Data. [Figures: equi-potential lines and centers of the mixture components; a sample drawn from the mixture; the data we see, i.e. the sample without component labels.]

  10. EM Clustering: Data → Clustering. Fit a mixture of three Gaussians to the data, then assign each instance to its most probable mixture component.

  11. EM Clustering: Gaussian mixture models. Each mixture component is a Gaussian distribution, determined by
     • a mean vector (“center”), and
     • a covariance matrix.
     Usually all components are assumed to have the same covariance matrix; fitting the mixture to data then means finding the weights and mean vectors of the mixture components. If the covariance matrix is a diagonal matrix with constant entries on the diagonal, fitting the Gaussian mixture model is equivalent to minimizing the within-cluster point scatter, i.e. the k-means algorithm effectively fits such a Gaussian mixture model.
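A sketch of this connection, assuming scikit-learn is available. Note that scikit-learn's "spherical" covariance option gives each component its own single variance rather than one value shared by all components, so it is close to, but not exactly, the tied model described above; on well-separated data the hard assignments nevertheless behave much like k-means.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Three well-separated blobs of 50 points each.
    X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 0], [0, 3])])

    gmm = GaussianMixture(n_components=3, covariance_type="spherical", random_state=0).fit(X)
    gmm_labels = gmm.predict(X)                                  # most probable component
    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    # The two label vectors agree up to a permutation of cluster indices.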

  12. EM Clustering: naive Bayes mixture model (for discrete attributes). Each mixture component is a distribution in which the attributes (A_1, ..., A_7 in the network figure, with the class variable C as their common parent) are independent given C. The model is determined by the parameters
     • $\lambda_1, \ldots, \lambda_k$ (the prior probabilities of the class variable), and
     • $P(A_j = a \mid C = c)$ for $a \in \mathrm{States}(A_j)$, $c \in \mathrm{States}(C)$.
     Fitting the model means finding parameters that maximize the probability of the observed instances.
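A minimal sketch of how one naive Bayes mixture component scores an instance, assuming two binary attributes, k = 2 clusters, and made-up parameter values.

    import numpy as np

    lambdas = [0.6, 0.4]                 # prior probabilities of the cluster variable C
    # P(A_j = 'y' | C = c): one row per cluster, one column per attribute.
    p_yes = np.array([[0.9, 0.2],
                      [0.1, 0.7]])

    def joint(instance, c):
        # P(C = c) * prod_j P(A_j = a_j | C = c): attributes independent given C.
        probs = [p_yes[c, j] if a == "y" else 1 - p_yes[c, j]
                 for j, a in enumerate(instance)]
        return lambdas[c] * np.prod(probs)

    instance = ("y", "n")
    print([joint(instance, c) for c in range(2)])  # the argmax gives the most likely cluster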

  13. EM Clustering: clustering as fitting incomplete data. Clustering data can be viewed as incompletely labeled data:

     SL    SW    PL    PW    Cluster
     5.1   3.5   1.4   0.2   ?
     4.9   3.0   1.4   0.2   ?
     6.3   2.9   6.0   2.1   ?
     6.3   2.5   4.9   1.5   ?
     ...   ...   ...   ...   ...

     SubAllCap   TrustSend   InvRet   ...   B'zambia'   Cluster
     y           n           n        ...   n           ?
     n           n           n        ...   n           ?
     n           y           n        ...   n           ?
     n           n           n        ...   n           ?
     ...         ...         ...      ...   ...         ...

  14. EM Clustering. Given:
     • incomplete data with an unobserved Cluster variable, and
     • a mixture model for the joint distribution of the mixture component (= cluster variable) and the attributes; the model specifies the number of states of the cluster variable.
     Wanted:
     • the (parameters of the) mixture distribution that best fits the data, and
     • as a side effect, the index of the most likely mixture component for each instance.
     We can use the EM algorithm for parameter estimation from incomplete data. When applied to a Gaussian mixture model, EM proceeds in a similar way to k-means.
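A compact sketch of EM for a one-dimensional Gaussian mixture with a fixed, shared variance, assuming made-up data; it only illustrates the E-step/M-step pattern, not the exact update rules from the course material.

    import numpy as np

    def em_gaussian_mixture(data, k, n_iter=50, var=1.0):
        rng = np.random.default_rng(0)
        means = rng.choice(data, size=k, replace=False)   # initialise centres from the data
        weights = np.full(k, 1.0 / k)
        for _ in range(n_iter):
            # E-step: responsibility of each component for each instance (soft labels).
            # The shared normalisation constant of the Gaussians cancels here.
            dens = np.exp(-0.5 * (data[:, None] - means[None, :]) ** 2 / var)
            resp = weights * dens
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights and means from the responsibilities.
            weights = resp.mean(axis=0)
            means = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)
        return weights, means, resp.argmax(axis=1)        # hard assignment as a side effect

    data = np.concatenate([np.random.default_rng(1).normal(0, 1, 100),
                           np.random.default_rng(2).normal(5, 1, 100)])
    print(em_gaussian_mixture(data, k=2)[:2])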

  15. Clustering Evaluation: scoring a clustering. The goal of a clustering algorithm is to find a clustering that maximizes a given score function; these score functions are often highly domain- and problem-specific. The k-means algorithm tries to minimize the within-cluster point scatter $W(S_1, \ldots, S_k) := \sum_{i=1}^{k} \sum_{s, s' \in S_i} d(s, s')$ (but is not guaranteed to find a global minimum). In clustering there is no gold standard, unlike in classification, where a classifier with 100% accuracy is optimal according to every evaluation criterion.
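A sketch computing the within-cluster point scatter from the formula above, assuming d is Euclidean distance and the inner sum runs over all ordered pairs within each cluster.

    import numpy as np

    def within_cluster_scatter(clusters):
        total = 0.0
        for S in clusters:
            S = np.asarray(S)
            # Sum of d(s, s') over all ordered pairs s, s' in the same cluster.
            total += np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1).sum()
        return total

    clusters = [np.array([[0.0, 0.0], [0.1, 0.0]]), np.array([[5.0, 5.0], [5.0, 5.2]])]
    print(within_cluster_scatter(clusters))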

  16. Clustering Evaluation: axioms for clustering. One can try to specify, on an abstract level, properties that a clustering algorithm should have, e.g.: for any cluster $S_i$, any $s, s' \in S_i$, and any $s'' \notin S_i$: $d(s, s') < d(s, s'')$. This is intuitive in many cases, but it is not fulfilled, for example, by the "correct" clustering of two concentric circles. Kleinberg [2002] shows that no clustering method simultaneously satisfies three seemingly intuitive axioms.
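A small sketch that checks this separation property for a given clustering, assuming Euclidean distance; on the "correct" clustering of two concentric circles it would return False, in line with the remark above.

    import numpy as np

    def satisfies_separation(clusters):
        d = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
        for i, S in enumerate(clusters):
            outside = [p for j, T in enumerate(clusters) if j != i for p in T]
            for s in S:
                for s2 in S:
                    # Every within-cluster distance must be smaller than every
                    # distance from s to a point outside the cluster.
                    if any(d(s, s2) >= d(s, s3) for s3 in outside):
                        return False
        return True

    clusters = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
    print(satisfies_separation(clusters))   # True: the two blobs are well separated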

  17. Association Rules

  18. Association Rules: market basket data. A database:

     Transaction   Items bought
     1             Beer, Soap, Milk, Butter
     2             Beer, Chips, Butter
     3             Milk, Spaghetti, Butter, Tomatoes
     ...           ...

     The database consists of a list of itemsets, i.e. subsets of a set I of possible items. An alternative representation could be a (sparse!) 0/1-matrix, built as in the sketch below:

     Transaction   Aalborg Aquavit   ...   Beer   ...   Chips   ...   Milk   ...   ZZtop CD
     1             0                 ...   1      ...   0       ...   1      ...   0
     2             0                 ...   1      ...   1       ...   0      ...   0
     3             0                 ...   0      ...   0       ...   1      ...   0
     ...           ...               ...   ...    ...   ...     ...   ...    ...   ...
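A sketch of the conversion from transaction itemsets to the 0/1-matrix representation, assuming a small made-up item universe taken from the table above.

    transactions = [
        {"Beer", "Soap", "Milk", "Butter"},
        {"Beer", "Chips", "Butter"},
        {"Milk", "Spaghetti", "Butter", "Tomatoes"},
    ]
    items = sorted(set().union(*transactions))    # the item set I (one column per item)
    matrix = [[int(item in t) for item in items] for t in transactions]
    for row in matrix:
        print(row)   # each row is one transaction; in real data most entries are 0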

  19. Association Rules: rule structure. Example: {Beer, Chips} ⇒ {Pizza}. In general, an association rule is a pattern of the form $\alpha:\; \underbrace{\{I_{\alpha,1}, \ldots, I_{\alpha,j}\}}_{\text{Antecedent}} \;\Rightarrow\; \underbrace{i}_{\text{Consequent}}$, where $I$ is the set of items, $\{I_{\alpha,1}, \ldots, I_{\alpha,j}\} \subseteq I$, and $i \in I$.
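A sketch of representing such a rule as an antecedent itemset plus a single consequent item, assuming the example rule {Beer, Chips} ⇒ {Pizza} and the transactions from the market-basket table above; the helper name matches is illustrative.

    transactions = [
        {"Beer", "Soap", "Milk", "Butter"},
        {"Beer", "Chips", "Butter"},
        {"Milk", "Spaghetti", "Butter", "Tomatoes"},
    ]
    rule = (frozenset({"Beer", "Chips"}), "Pizza")   # antecedent itemset, consequent item

    def matches(rule, transaction):
        antecedent, consequent = rule
        return antecedent <= transaction and consequent in transaction

    print([matches(rule, t) for t in transactions])  # no transaction contains Pizza here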
