Cutting the dendrogram through permutation tests

Dario Bruzzese (dbruzzes@unina.it)
Department of Preventive Medical Sciences, University of Naples, Italy

Domenico Vistocco (vistocco@unicas.it)
Department of Economics, University of Cassino, Italy

Compstat 2010

La Carte

1 Motivation
2 The stairstep-like permutation procedure
    Notation
    The outline
3 Some results
    Real datasets
    Synthetic dataset
4 ToDo List

Motivation

Automatically determine the optimal cut-off level of a dendrogram
Explore partitions different from those allowed by a horizontal cut

Example: the rep1HighNoise dataset (n = 200, p = 20)
Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology, 2003, 4:R34

[Figure: horizontal cut, k = 3]
[Figure: an alternative cut, k = 3]

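To make the contrast between the two cuts concrete, the sketch below stores a dendrogram as a small binary tree and computes the partition given by a horizontal cut. The `Node` class, the helper names and the toy heights are illustrative assumptions; none of them come from the slides or from the rep1HighNoise data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A dendrogram node: a leaf (one object) or a merge of two subtrees."""
    height: float                  # merge height; 0.0 for a leaf
    members: List[str]             # objects contained in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name: str) -> Node:
    return Node(0.0, [name])


def merge(left: Node, right: Node, height: float) -> Node:
    return Node(height, left.members + right.members, left, right)


def horizontal_cut(node: Node, threshold: float) -> List[List[str]]:
    """Clusters obtained by cutting every branch at the same height."""
    if node.left is None or node.right is None or node.height <= threshold:
        return [node.members]
    return (horizontal_cut(node.left, threshold)
            + horizontal_cut(node.right, threshold))


# Toy dendrogram with hypothetical merge heights.
ab = merge(leaf("a"), leaf("b"), 1.0)
cd = merge(leaf("c"), leaf("d"), 2.0)
ef = merge(leaf("e"), leaf("f"), 4.0)
cdef = merge(cd, ef, 5.0)
root = merge(ab, cdef, 6.0)

# Horizontal cut with k = 3: any threshold in [4, 5) gives the same result.
horizontal = horizontal_cut(root, 4.5)

# Another k = 3 partition of the same tree. No single threshold yields it,
# because splitting {a, b} (height 1.0) would also split every other merge.
alternative = [["a"], ["b"], cdef.members]
```

Both partitions have three clusters, but only the first can be reached by moving a single horizontal line up and down the tree.
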
Notation

Let:
    $n$ the number of objects to classify;
    $C^k_L$ and $C^k_R$ the two classes merged at level $k$ ($k = 1, \dots, n-1$);
    $h(C^k_L \cup C^k_R)$ the height necessary to merge $C^k_L$ and $C^k_R$;
    $h(C^k_j)$ the height at which $C^k_j$ has been obtained ($j \in \{L, R\}$).

[Figure: dendrogram annotated with $C^k_L$, $C^k_R$, $h(C^k_L \cup C^k_R)$, $h(C^k_L)$ and $h(C^k_R)$ for $k = 1, 2, 3$]

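As a rough illustration of this notation, the sketch below lists, for every merge of a toy dendrogram, the three heights $h(C^k_L \cup C^k_R)$, $h(C^k_L)$ and $h(C^k_R)$. It assumes that levels are numbered from the top of the tree downwards (as the initialization of the algorithm suggests) and that a single object has height 0; the `Node` class, the function `merge_levels` and the heights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A dendrogram node: a leaf (one object) or a merge of two subtrees."""
    height: float                  # merge height; 0.0 for a leaf (assumption)
    members: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name: str) -> Node:
    return Node(0.0, [name])


def merge(left: Node, right: Node, height: float) -> Node:
    return Node(height, left.members + right.members, left, right)


def merge_levels(root: Node) -> List[Tuple[float, float, float]]:
    """Return (h(C_L u C_R), h(C_L), h(C_R)) for every merge, highest first."""
    levels = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.left is None or node.right is None:
            continue               # a leaf is not a merge
        levels.append((node.height, node.left.height, node.right.height))
        stack.extend([node.left, node.right])
    return sorted(levels, reverse=True)


# Toy dendrogram on n = 4 objects, hence n - 1 = 3 merge levels.
ab = merge(leaf("a"), leaf("b"), 1.0)
cd = merge(leaf("c"), leaf("d"), 2.0)
root = merge(ab, cd, 5.0)

levels = merge_levels(root)
for k, (h_union, h_left, h_right) in enumerate(levels, start=1):
    print(f"k={k}: h(union)={h_union}, h(L)={h_left}, h(R)={h_right}")
```
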
The algorithm - Pseudo Code

Input: A dataset and its related dendrogram
Output: A partition of the dataset

initialization:
    aggregationLevelsToVisit ← $h(C^1_L \cup C^1_R)$
    permClusters ← [ ]
    i ← 1
repeat
    if $C^i_L \equiv C^i_R$ then
        add $C^i_L \cup C^i_R$ to permClusters
    else
        add $h(C^i_L)$ and $h(C^i_R)$ to aggregationLevelsToVisit
        sort aggregationLevelsToVisit in descending order
    end
    remove the first element from aggregationLevelsToVisit
    i ← i+1
until aggregationLevelsToVisit is empty

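One possible reading of the pseudo code in Python is sketched below. It is not the authors' implementation: the list holds subtrees rather than bare heights, the test of $C^i_L \equiv C^i_R$ is passed in as a callback (`same_cluster`, a hypothetical name), and a subtree reduced to a single object is assumed to form its own cluster, a case the pseudo code does not spell out. The usage part replaces the permutation test with a crude difference-of-means rule purely to make the example run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional


@dataclass
class Node:
    """A dendrogram node: a leaf (one object) or a merge of two subtrees."""
    height: float                  # merge height; 0.0 for a leaf
    members: List[float]           # toy objects are plain numbers
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(value: float) -> Node:
    return Node(0.0, [value])


def merge(left: Node, right: Node, height: float) -> Node:
    return Node(height, left.members + right.members, left, right)


def stairstep_cut(
    root: Node,
    same_cluster: Callable[[List[float], List[float]], bool],
) -> List[List[float]]:
    """Visit merges from the highest downwards; keep a merge as one cluster
    when `same_cluster` accepts it, otherwise descend into both children."""
    to_visit = [root]              # plays the role of aggregationLevelsToVisit
    perm_clusters: List[List[float]] = []
    while to_visit:
        # Keep the list in descending order of height, then take the first.
        to_visit.sort(key=lambda nd: nd.height, reverse=True)
        node = to_visit.pop(0)
        if node.left is None or node.right is None:
            perm_clusters.append(node.members)   # assumption: singleton cluster
        elif same_cluster(node.left.members, node.right.members):
            perm_clusters.append(node.members)
        else:
            to_visit.extend([node.left, node.right])
    return perm_clusters


# Toy dendrogram with hypothetical heights.
ab = merge(leaf(0.0), leaf(0.2), 0.2)
cd = merge(leaf(5.0), leaf(5.3), 0.3)
cde = merge(cd, leaf(9.0), 3.85)
root = merge(ab, cde, 6.3)


def placeholder_test(left: List[float], right: List[float]) -> bool:
    """Stand-in for the permutation test (NOT a permutation test)."""
    return abs(mean(left) - mean(right)) < 1.0


clusters = stairstep_cut(root, placeholder_test)
```

Because each branch is accepted or split on its own, the resulting partition need not correspond to any single horizontal cut.
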
The algorithm - The outline

Initialization (i ← 0)
    aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$
    permClusters: [ ]

Iteration i ← 1
    aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$
    permClusters: [ ]
    clusters to compare: $C^1_L$ and $C^1_R$
    $H_0: C^1_L \equiv C^1_R$ → reject

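The slides shown so far do not say how $H_0: C^1_L \equiv C^1_R$ is tested. As a stand-in, the sketch below runs a generic two-sample permutation test on one-dimensional data, using the absolute difference of group means as the statistic. The statistic, the function name `permutation_p_value`, the number of permutations, the seed and the 0.05 threshold are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    left: Sequence[float],
    right: Sequence[float],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value for H0: both groups come from the same population."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    observed = abs(mean(left) - mean(right))
    pooled = list(left) + list(right)
    n_left = len(left)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)        # random relabelling of the pooled objects
        stat = abs(mean(pooled[:n_left]) - mean(pooled[n_left:]))
        if stat >= observed:
            extreme += 1
    # Count the observed labelling itself among the permutations.
    return (extreme + 1) / (n_perm + 1)


# Two clearly separated toy groups: H0 should be rejected.
p_apart = permutation_p_value([0.1, 0.3, 0.2, 0.0, 0.4],
                              [5.1, 5.0, 5.3, 4.9, 5.2])

# Two interleaved toy groups: no evidence against H0.
p_mixed = permutation_p_value([1.0, 2.0, 3.0, 4.0],
                              [1.5, 2.5, 3.5, 4.5])

reject_apart = p_apart < 0.05      # hypothetical significance level
reject_mixed = p_mixed < 0.05
```

In the outline above, a rejection at the top merge means that $C^1_L$ and $C^1_R$ are not kept together and the procedure moves down to their own merge heights.
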