
Hierarchical Clustering on Principal Components (HCPC)


  1. Hierarchical Clustering on Principal Components (HCPC)
     [Figure: hierarchical tree (axis: height) drawn above the factor map of European capitals, colored by cluster (cluster 1, cluster 2, cluster 3)]
     LE RAY Guillaume, MOLTO Quentin
     Students of AGROCAMPUS OUEST majoring in applied statistics

  2. Context
     • R: free, open-source software for statistics (1875 packages).
     • FactoMineR: an R package, developed at Agrocampus-Ouest, dedicated to factorial analysis.
     • The aim is to create a complementary tool for this package, dedicated to clustering, especially after a factorial analysis.
     • Wide range of choices and uses, results, and graphical representations.
     Le Ray, Molto - Agrocampus-Ouest Students - Feb. 09 - "HCPC"

  3. Clustering and factorial analysis
     • Factorial analysis and hierarchical clustering are highly complementary tools for exploring data.
     • Removing the last factors of a factorial analysis removes noise and makes the clustering more robust.
     Reference: Escofier, Pagès, Analyses factorielles simples et multiples, 4ème édition, 2008.
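
To make the noise-removal idea concrete, here is a minimal sketch in plain Python. It is not FactoMineR code: the function name `first_axis_2d` and the toy data are assumptions made for illustration, and the closed-form eigen-decomposition only works for two variables. It keeps the first principal axis of a noisy 2-D cloud and reports the share of inertia that axis carries; the discarded second axis holds mostly noise.

```python
import math

def first_axis_2d(points):
    """PCA of 2-D points using the closed-form eigenvalues of the 2x2 covariance matrix.

    Returns the first axis (unit vector), the scores on that axis,
    and the share of total inertia carried by that axis.
    Assumes the points are not all identical (total inertia > 0).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / n   # variance of variable 1
    c = sum(y * y for _, y in centered) / n   # variance of variable 2
    b = sum(x * y for x, y in centered) / n   # covariance
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1 = (a + c) / 2 + half_gap             # largest eigenvalue
    lam2 = (a + c) / 2 - half_gap             # smallest eigenvalue
    if b != 0:
        vx, vy = b, lam1 - a                  # eigenvector associated with lam1
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    axis = (vx / norm, vy / norm)
    scores = [x * axis[0] + y * axis[1] for x, y in centered]
    return axis, scores, lam1 / (lam1 + lam2)

# Hypothetical data: a one-dimensional signal plus a small perturbation.
signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
points = [(t + e, t - e) for t, e in zip(signal, noise)]

axis, scores, share = first_axis_2d(points)
print(f"share of inertia on axis 1: {share:.3f}")
```

Clustering the `scores` alone, rather than the raw coordinates, is the one-axis analogue of clustering on the first principal components.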

  4. Program structure
     • Factorial analysis (PCA, MCA, MFA…)
     • Hierarchical clustering (Ward, Euclidean)
     • Cutting the tree (partition)
     • Consolidation (K-means)
     • Description of clusters and factor maps

  5. Statistical methods (1)
     • Hierarchical clustering:
       – Function agnes
       – Euclidean distance
       – Ward criterion = $d^2(i,j) \times \frac{m_i m_j}{m_i + m_j}$
     • Suggested level to cut the tree:
       – Intra-cluster inertia
       – Partition comparison: $Q = (I_{n+1} - I_n) / I_{n+1}$
       – Max = number of individuals / 2
     [Figure: bar plot of inertia against the number of clusters (1 to 10)]
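
The Ward criterion above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the `agnes` implementation: every individual has unit weight (so $m_i$ is the cluster size), $d(i,j)$ is taken as the Euclidean distance between cluster centers, and the helper names (`ward_cost`, `ward_tree`, `intra_inertia`) and the toy points are invented for the example. The rule used to suggest a cutting level is deliberately left out, because the slide does not fully define the indexing of $I_n$.

```python
def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Ward criterion d^2(i,j) * m_i*m_j / (m_i + m_j) for clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return d2 * len(a) * len(b) / (len(a) + len(b))

def ward_tree(points):
    """Naive agglomeration: repeatedly merge the pair with the smallest Ward cost.

    Returns the merge costs (inertia gains) in merge order.
    """
    clusters = [[p] for p in points]
    gains = []
    while len(clusters) > 1:
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        gains.append(cost)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return gains

def intra_inertia(gains, k):
    """Intra-cluster inertia of the k-cluster partition: sum of the merges already done."""
    return sum(gains[: len(gains) - k + 1])

# Hypothetical data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 10.0)]
gains = ward_tree(points)

# Total inertia: sum of squared distances to the overall center.
center = centroid(points)
total = sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

for k in range(1, len(points) + 1):
    print(k, "clusters -> intra-cluster inertia", intra_inertia(gains, k))
```

With unit weights the merge costs add up to the total inertia, which is why the heights of the tree can be read as inertia gains.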

  6. Statistical methods (2): Consolidation
     • Non-optimal partition
     • K-means with the cluster centers
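
The consolidation step can be sketched as a K-means run that starts from the centers of an existing partition instead of random seeds. The code below is a minimal illustration in plain Python, not the package's implementation; the function name `consolidate`, the `max_iter` default and the toy data are assumptions, and the starting labels are assumed to be 0..k-1 with no empty cluster.

```python
def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sqdist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def within_inertia(points, labels):
    """Sum of squared distances of each point to the center of its own cluster."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        center = centroid(members)
        total += sum(sqdist(p, center) for p in members)
    return total

def consolidate(points, labels, max_iter=10):
    """K-means started from the centers of the partition given by `labels`."""
    k = max(labels) + 1
    labels = list(labels)
    centers = [centroid([p for p, l in zip(points, labels) if l == c]) for c in range(k)]
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Recompute centers; an emptied cluster keeps its previous center.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return labels

# Hypothetical partition, e.g. obtained by cutting a tree: the point 6.0 is misplaced.
points = [(0.0,), (1.0,), (6.0,), (9.0,), (10.0,)]
start = [0, 0, 0, 1, 1]
final = consolidate(points, start)

print(final)
print(within_inertia(points, start), "->", within_inertia(points, final))
```

Starting from the tree's centers keeps the consolidated partition close to the hierarchy while lowering the intra-cluster inertia.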

  8. Statistical methods (3): Clusters description
     • Description by individuals:
       – Use real individuals to characterise clusters.
     • Description by variables:
       – Give the list of typical variables of each cluster.
     • Description by axes:
       – As in factorial analysis.
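
One possible reading of "description by individuals" is to pick, for each cluster, the real individuals closest to the cluster center. The sketch below assumes that reading; the function name `closest_individuals`, the parameter `n` and the coordinates are hypothetical and are not taken from the package or the dataset.

```python
def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sqdist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest_individuals(coords, labels, n=2):
    """For each cluster, return the names of the n individuals nearest to its center.

    coords: dict name -> coordinates on the factor map
    labels: dict name -> cluster id
    """
    result = {}
    for cluster in sorted(set(labels.values())):
        members = [name for name in coords if labels[name] == cluster]
        center = centroid([coords[m] for m in members])
        members.sort(key=lambda m: sqdist(coords[m], center))
        result[cluster] = members[:n]
    return result

# Made-up coordinates and clusters, for illustration only.
coords = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0),
    "d": (10.0, 10.0), "e": (11.0, 10.0), "f": (12.0, 13.0),
}
labels = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

typical = closest_individuals(coords, labels, n=2)
print(typical)
```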

  9. Dataset presentation

  10. Factorial Analysis
      [Figure: variables factor map showing the twelve months, Dimension 1 (82.9%) vs Dimension 2 (15.4%)]
      [Figure: individuals factor map showing the European capitals, Dim 1 (82.9%) vs Dim 2 (15.4%)]

  11. Hierarchical Clustering
      [Figure: dendrogram of the European capitals (height 0 to 70) with an inertia-gain bar plot and the suggested cutting level]
      Click to cut the tree.
      Option:
      • Sort the individuals according to their order on the first component.

  12. Hierarchical Clustering
      [Figure: dendrogram of the European capitals (height 0 to 70) with colored rectangles around the clusters]
      Colored rectangles are drawn around the clusters. We keep the same color for each cluster in the next graphs (function rect).
      Options:
      • Cut the tree automatically at the suggested level,
      • Cut at a level with a chosen number of clusters.

  13. Factor map and clusters
      [Figure: factor map of the European capitals colored by cluster (cluster 1, cluster 2, cluster 3), Dim 1 (82.9%) vs Dim 2 (15.4%)]
      Options:
      • Draw other axes,
      • Remove the names, the centers.

  14. Factor map, clusters, and tree
      [Figure: hierarchical tree (axis: height) drawn above the factor map of the European capitals, colored by cluster (cluster 1, cluster 2, cluster 3), Dim 1 (82.9%)]
      Options:
      • Draw only a part of the tree,
      • Draw other axes,
      • Remove the names,
      • Change the height.

  15. Cluster description (1): By individuals
      Option: the number of individuals for each cluster (here 2).
      [Figure: factor map of the European capitals colored by cluster (cluster 1, cluster 2, cluster 3), Dim 1 (82.9%) vs Dim 2 (15.4%)]

  17. Cluster description (2): By individuals
      Option: the number of individuals for each cluster (here 2).
      [Figure: factor map of the European capitals colored by cluster (cluster 1, cluster 2, cluster 3), Dim 1 (82.9%) vs Dim 2 (15.4%)]

  18. Cluster description (3): By variables
      This is the result of a catdes: it describes the different clusters by the variables (the mean in the category, the v.test…).
      Option: the p.value (here 0.05).
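
As an illustration of what a v.test measures, the sketch below compares the mean of a variable inside one cluster with its overall mean, scaled by the standard error expected if the cluster were a random draw without replacement. This is the usual test-value formula written in plain Python under a normal approximation; it is an assumption that catdes follows it in every detail, and the function names and data are invented for the example.

```python
import math

def v_test(values, in_cluster):
    """Test value of a quantitative variable for one cluster.

    values: the variable for all individuals
    in_cluster: booleans, True for members of the cluster
    Assumes 0 < cluster size < number of individuals and a non-constant variable.
    """
    n = len(values)
    cluster_vals = [v for v, inside in zip(values, in_cluster) if inside]
    n_k = len(cluster_vals)
    mean = sum(values) / n
    mean_k = sum(cluster_vals) / n_k
    var = sum((v - mean) ** 2 for v in values) / n   # population variance
    # Standard error of the mean of n_k values drawn without replacement.
    se = math.sqrt(var / n_k * (n - n_k) / (n - 1))
    return (mean_k - mean) / se

def two_sided_p(v):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(v) / math.sqrt(2))

# Made-up variable: the last three individuals form the cluster being described.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
in_cluster = [False, False, False, True, True, True]

v = v_test(values, in_cluster)
p = two_sided_p(v)
print(f"v.test = {v:.2f}, p-value = {p:.3f}")
```

A variable would be listed for the cluster when its p-value falls below the chosen threshold (0.05 on the slide); a positive test value means the cluster mean is above the overall mean.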

  19. Cluster description (3): By axes
      This is the result of a catdes: it describes the different clusters by the axes (the mean in the category, the v.test…).
      Option: the p.value (here 0.05).

  20. Conclusion
      This function was presented with a PCA, but it also accepts:
      – MCA and MFA results,
      – a quantitative dataset directly (non-scaled PCA),
      – a continuous variable to divide into modalities.
      [Figure: a normal distribution divided into 3 clusters]
