A Novel LTM-based Method for Multi-partition Clustering

Tengfei Liu (1), Nevin L. Zhang (1), Kin Man Poon (1), Hua Liu (1), Yi Wang (2)

(1) The Hong Kong University of Science and Technology, {liutf, lzhang, lkmpoon, aprillh}@cse.ust.hk
(2) National University of Singapore, wangy@comp.nus.edu.sg

September 13, 2012
Outline

1. What is multi-partition clustering?
2. What are latent tree models?
    Introduction to latent tree models
    Results on real-world data
    Bridged Islands Algorithm
3. Experiment Results
4. Conclusion
What is multi-partition clustering?
What is multi-partition clustering: an example

How to cluster these?
    By object
    By style
    Multi-partition clustering (partition the same data in more than one way)
What is multi-partition clustering: more examples

Other examples:
    Student population: course grades; extracurricular activities
    Movie reviews: sentiment (positive or negative); genre (comedy, action, war, etc.)
    Social survey: demographic information; views on social issues
What are latent tree models?
Introduction to latent tree models

What is a latent tree model?

[Figure: an example latent tree model, with labels marking the latent variables and the observed variables.]

Latent tree models (LTMs):
    Tree-structured Bayesian networks.
    Encode a joint distribution over the n observed variables X_1, ..., X_n and the m latent variables Y_1, ..., Y_m of the model:

\[
P(X_1, \dots, X_n, Y_1, \dots, Y_m) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parent}(X_i)) \, \prod_{j=1}^{m} P(Y_j \mid \mathrm{parent}(Y_j))
\]

    The root has no parent, so its factor is simply its marginal distribution.
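The factorization can be made concrete with a small sketch. The model below is a hypothetical toy LTM, not one from the talk: the variable names, the tree shape, and every probability are assumptions chosen only for illustration. It multiplies one factor P(v | parent(v)) per variable and checks that the result is a proper joint distribution.

```python
from itertools import product

# Hypothetical toy LTM (all names and numbers are illustrative assumptions).
# parent[v] is None for the root; Y1 and Y2 are latent, X1..X3 are observed.
parent = {"Y1": None, "Y2": "Y1", "X1": "Y2", "X2": "Y2", "X3": "Y1"}

# cpt[v][parent_state][state] = P(v = state | parent(v) = parent_state).
# The root uses the placeholder parent state None.
cpt = {
    "Y1": {None: [0.6, 0.4]},
    "Y2": {0: [0.8, 0.2], 1: [0.3, 0.7]},
    "X1": {0: [0.9, 0.1], 1: [0.2, 0.8]},
    "X2": {0: [0.7, 0.3], 1: [0.1, 0.9]},
    "X3": {0: [0.9, 0.1], 1: [0.2, 0.8]},
}


def joint(assignment):
    """Joint probability of a full assignment: one factor per variable."""
    prob = 1.0
    for var, par in parent.items():
        par_state = None if par is None else assignment[par]
        prob *= cpt[var][par_state][assignment[var]]
    return prob


variables = list(parent)

# Summing the joint over every assignment of all (binary) variables.
total = sum(
    joint(dict(zip(variables, states)))
    for states in product((0, 1), repeat=len(variables))
)
print(f"sum over all assignments = {total:.6f}")
```

Because every conditional table sums to one, the product over all variables sums to one as well, which is what makes the tree a valid Bayesian network.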
What is a latent tree model: another perspective

[Figure: a latent class model with a single latent variable Y1, generalized to a latent tree model with latent variables Y1, Y2, Y3 over the observed variables X1 to X7.]

LCM → LTM:
    One latent variable → multiple latent variables
    One clustering → multiple clusterings
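To illustrate why multiple latent variables give multiple clusterings, the sketch below assigns one hard label per latent variable to a single record. It is a minimal illustration under stated assumptions: the toy tree, its probabilities, and the helper names `posterior` and `cluster_labels` are all hypothetical, and inference is done by brute-force enumeration, which is only practical for tiny models.

```python
from itertools import product

# Hypothetical toy LTM (illustrative assumptions, not the model from the talk).
parent = {"Y1": None, "Y2": "Y1", "X1": "Y2", "X2": "Y2", "X3": "Y1"}
cpt = {
    "Y1": {None: [0.6, 0.4]},
    "Y2": {0: [0.8, 0.2], 1: [0.3, 0.7]},
    "X1": {0: [0.9, 0.1], 1: [0.2, 0.8]},
    "X2": {0: [0.7, 0.3], 1: [0.1, 0.9]},
    "X3": {0: [0.9, 0.1], 1: [0.2, 0.8]},
}
latent = ["Y1", "Y2"]


def joint(assignment):
    """Joint probability of a full assignment: one factor per variable."""
    prob = 1.0
    for var, par in parent.items():
        par_state = None if par is None else assignment[par]
        prob *= cpt[var][par_state][assignment[var]]
    return prob


def posterior(latent_var, evidence):
    """P(latent_var | evidence), summing the joint over all latent states."""
    scores = [0.0, 0.0]
    for states in product((0, 1), repeat=len(latent)):
        assignment = {**evidence, **dict(zip(latent, states))}
        scores[assignment[latent_var]] += joint(assignment)
    norm = sum(scores)
    return [s / norm for s in scores]


def cluster_labels(evidence):
    """One hard label per latent variable: each one defines its own partition."""
    labels = {}
    for y in latent:
        post = posterior(y, evidence)
        labels[y] = max(range(len(post)), key=post.__getitem__)
    return labels


record = {"X1": 1, "X2": 1, "X3": 0}
print(cluster_labels(record))
```

With a single latent variable (an LCM) the same procedure would return one label per record; here each record receives one label from Y1 and another from Y2, so the data are partitioned in two ways at once.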
Results on real-world data

LTMs for multi-partition clustering: survey data

ICAC data:
    ICAC is the anticorruption agency of Hong Kong.
    Survey respondents are asked about their:
        attitude towards corruption;
        perception of the ICAC’s performance.
    Sample question: Are you willing to report corruption?
        A. Willing
        B. Unwilling
        C. Depending on circumstances
        D. Don’t know/no opinion
Figure 1: The structure of the LTM obtained for the ICAC data.
Abbreviations: C – Corruption, I – ICAC, Y – Year, Gov – Government, Bus – Business Sector.
Meanings of manifest variables: Tolerance-C-Gov means ‘tolerance towards corruption in the government’; C-City means ‘level of corruption in the city’; C-NextY means ‘change in the level of corruption next year’; I-Effectiveness means ‘effectiveness of ICAC’s work’; I-Powers means ‘ICAC powers’; Confid-I means ‘confidence in ICAC’; etc.
The edge widths show the strength of correlation between variables. They are computed from the probability distributions of the model.
Inspect individual clustering: CCPDs of Y2

Table 1: The class conditional probability distributions (CCPDs) of Y2.

Classes of Y2:
    s1, P(Y2 = s1) = .37: people with good education and good income
    s2, P(Y2 = s2) = .24: a class of women with poor education
    s3, P(Y2 = s3) = .22: people with poor education and average income
    s4, P(Y2 = s4) = .17: a class of young people with low income

P(Income | Y2)      none   –4k    4–7k   7–10k  10–20k  20–40k  40k–
    Y2 = s1         0      .04    .1     .42    .28     .14     .02
    Y2 = s2         .29    .24    .04    0      0       0       .43
    Y2 = s3         .11    .17    .25    .31    .07     0       .1
    Y2 = s4         .78    .08    .09    .03    0       0       .02

P(Age | Y2)         15–24  25–34  35–44  45–54  55–
    Y2 = s1         .05    .35    .39    .17    .03
    Y2 = s2         .03    .08    .41    .35    .13
    Y2 = s3         0      .07    .22    .4     .3
    Y2 = s4         .99    .01    0      0      0

P(Education | Y2)   none   primary  f1-3   f4-5   f6-7   diploma  degree
    Y2 = s1         0      0        .04    .41    .09    .09      .37
    Y2 = s2         .05    .29      .35    .26    .04    0        .01
    Y2 = s3         .02    .29      .43    .19    .05    .01      0
    Y2 = s4         0      0        .08    .47    .21    .1       .16

P(Sex | Y2)         m      f
    Y2 = s1         .57    .43
    Y2 = s2         0      1
    Y2 = s3         .8     .2
    Y2 = s4         .5     .5

The column headings are the states s0, s1, ... of each manifest variable, in order.
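One way to read such a table is to turn it around with Bayes' rule and ask which class of Y2 a respondent most likely belongs to given a single attribute. The sketch below does this for Sex only, using the class priors and the Sex row of the CCPD table for Y2; the function name `posterior_given_sex` is a hypothetical helper, and conditioning on one variable alone is a simplification made for illustration.

```python
# Class priors P(Y2) and the Sex row P(Sex | Y2), copied from the CCPD table.
prior = {"s1": 0.37, "s2": 0.24, "s3": 0.22, "s4": 0.17}
p_sex_given_y2 = {
    "s1": {"m": 0.57, "f": 0.43},
    "s2": {"m": 0.0, "f": 1.0},
    "s3": {"m": 0.8, "f": 0.2},
    "s4": {"m": 0.5, "f": 0.5},
}


def posterior_given_sex(sex):
    """P(Y2 | Sex = sex) by Bayes' rule: prior times likelihood, normalized."""
    unnormalized = {c: prior[c] * p_sex_given_y2[c][sex] for c in prior}
    norm = sum(unnormalized.values())
    return {c: value / norm for c, value in unnormalized.items()}


for sex in ("m", "f"):
    post = posterior_given_sex(sex)
    print(sex, {c: round(p, 3) for c, p in post.items()})
```

Under these table values a male respondent has zero posterior probability of class s2, which is consistent with its description as a class of women.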