

  1. MLA 2017: Latent Tree Analysis. Nevin L. Zhang, The Hong Kong University of Science and Technology, www.cse.ust.hk/~lzhang

  2. What is Latent Tree Analysis (LTA)?  Repeated event co-occurrences might be due to common hidden causes or genuine direct correlations, OR be coincidental, especially in big data.  Challenge: identify the co-occurrences that are due to hidden causes or correlations.  Latent tree analysis solves a related and simpler problem: detect co-occurrences that can be statistically explained by a tree of latent variables.  It can be used to solve interesting tasks: multidimensional clustering, hierarchical topic detection, latent structure discovery, …

  3. Basic Latent Tree Models (LTM)  A tree-structured Bayesian network in which all variables are discrete, variables at leaf nodes are observed, and variables at internal nodes are latent.  Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …, which define the model's semantics.  Also known as hierarchical latent class (HLC) models (Zhang, JMLR 2004).
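The factorization on this slide can be sketched numerically. The probability tables below are made-up numbers for a small binary model with structure Y1 -> Y2 -> {X1, X2}; summing the factorized joint over the latent variables yields the distribution of the observed leaves:

```python
import numpy as np

# Hypothetical CPTs for a binary LTM: Y1 -> Y2 -> {X1, X2}.
p_y1 = np.array([0.6, 0.4])            # P(Y1)
p_y2_y1 = np.array([[0.9, 0.1],        # P(Y2 | Y1), rows indexed by Y1
                    [0.2, 0.8]])
p_x1_y2 = np.array([[0.8, 0.2],        # P(X1 | Y2), rows indexed by Y2
                    [0.3, 0.7]])
p_x2_y2 = np.array([[0.7, 0.3],        # P(X2 | Y2), rows indexed by Y2
                    [0.1, 0.9]])

def p_obs(x1, x2):
    """P(X1=x1, X2=x2): sum the factorized joint over the latent Y1, Y2."""
    return sum(p_y1[y1] * p_y2_y1[y1, y2] * p_x1_y2[y2, x1] * p_x2_y2[y2, x2]
               for y1 in (0, 1) for y2 in (0, 1))
```

The four observed configurations sum to one, confirming that marginalizing out the latent tree leaves a proper distribution over the leaves.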

  4. Pouch Latent Tree Models (PLTM)  An extension of the basic LTM (Poon et al., ICML 2010): a rooted tree in which internal nodes represent discrete latent variables and each leaf node consists of one or more continuous observed variables, called a pouch.

  5. More General Latent Variable Tree Models  Internal nodes can be observed (Choi et al., JMLR 2011); internal nodes can be continuous; the structure can be a forest.  Primary focus of this talk: the basic LTM.

  6. Identifiability Issues  Root changes lead to equivalent models, so edge orientations are unidentifiable (Zhang, JMLR 2004).  Hence we are really talking about undirected models: an undirected LTM represents an equivalence class of directed LTMs.  In implementation, it is represented as a directed model instead of an MRF so that the partition function is always 1.
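The root-change equivalence can be checked numerically on the smallest case: a two-variable model with hypothetical tables. Rooting at Y gives the factorization P(Y)P(X|Y); re-rooting at X gives P(X)P(Y|X); both produce the same joint, so the edge direction carries no information:

```python
import numpy as np

# Hypothetical 2x2 example of re-rooting.
p_y = np.array([0.3, 0.7])              # P(Y)
p_x_y = np.array([[0.9, 0.1],           # P(X | Y), rows indexed by Y
                  [0.4, 0.6]])

joint = p_y[:, None] * p_x_y            # P(Y, X) = P(Y) P(X | Y)

p_x = joint.sum(axis=0)                 # marginal P(X)
p_y_x = joint / p_x                     # P(Y | X), columns indexed by X
joint_rerooted = p_x[None, :] * p_y_x   # P(Y, X) = P(X) P(Y | X)
```

Since both factorizations recover the identical joint, any likelihood-based learner sees the two orientations as the same model.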

  7. Identifiability Issues  |X|: the cardinality of variable X, i.e., its number of states.  Theorem (Zhang, JMLR 2004): The set of all regular models for a given set of observed variables is finite.  Latent variables cannot have too many states.

  8. Latent Tree Analysis (LTA)  Learning latent tree models means determining: the number of latent variables, the number of possible states for each latent variable, the connections among variables, and the probability distributions.

  9. Latent Tree Analysis (LTA)  Learning latent tree models means determining: the number of latent variables, the number of possible states for each latent variable, the connections among nodes, and the probability distributions.  Difficult, but doable.

  10. Three Settings for Algorithm Development  Setting 1: CLRG (Choi et al., 2011; Huang et al., 2015).  Assume the data are generated from an unknown LTM; investigate properties of LTMs and use them for learning, e.g., recover the model structure from the tree additivity of information distances.  Theoretical guarantees to recover the generative model under certain conditions.  Setting 2: EAST, BI (Chen et al., 2012; Liu et al., 2013).  Do not assume the data are generated from an LTM; fit an LTM to the data using the BIC score, via search or heuristics.  It does not make sense to talk about theoretical guarantees, but this setting obtains better models than Setting 1 because the assumption is usually untrue.  Setting 3: HLTA (Liu et al., 2014; Chen et al., 2016).  Consider usefulness in addition to model fit; build a hierarchy of latent variables.
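Setting 2 scores candidate models with BIC. The formula below is the standard BIC score (log-likelihood minus a complexity penalty); the numbers used in the comparison are hypothetical, purely to show how the penalty trades off against fit:

```python
import math

def bic_score(log_likelihood, num_free_params, num_samples):
    """Standard BIC score (higher is better):
    BIC(m | D) = log P(D | m, theta_hat) - (d(m) / 2) * log N."""
    return log_likelihood - 0.5 * num_free_params * math.log(num_samples)

# Hypothetical comparison: the richer model raises the likelihood,
# but here the gain does not cover its larger penalty, so the
# simpler model scores higher.
simple = bic_score(-1200.0, num_free_params=10, num_samples=500)
rich = bic_score(-1150.0, num_free_params=40, num_samples=500)
```

Search-based learners such as those in Setting 2 repeatedly propose structural changes and keep the candidate with the highest such score.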

  11. Current Capabilities  It takes a few hours on a single machine to analyze data sets with thousands of variables and hundreds of thousands of instances.  Significant additional speedup can be achieved via simplification and parallel computing.

  12. What can LTA be used for?  Multidimensional clustering  Hierarchical topic detection  Latent structure discovery  Other applications

  13. What can LTA be used for?  Multidimensional clustering  Hierarchical topic detection  Latent structure discovery  Other applications

  14. How to Cluster?  Cluster analysis: grouping of objects into clusters such that objects in the same cluster are similar and objects from different clusters are dissimilar.

  15. How to Cluster these?

  16. How to Cluster these?

  17. How to Cluster these?

  18. Multidimensional Clustering  Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways.  It is therefore more reasonable to look for multiple ways to partition the data.  How do we get multiple partitions?

  19. How to get one partition?  Finite mixture models: one latent variable z.  Gaussian mixture models for continuous data; latent class models (mixtures of multinomial distributions) for categorical data.  Key point: use a model with one latent variable to obtain one partition.
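As an illustration of "one latent variable, one partition", here is a minimal EM sketch for a latent class model on binary data. This is a generic Bernoulli-mixture sketch, not the authors' implementation, and the data are synthetic stand-ins for categorical survey answers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two latent classes with different response profiles.
X = np.vstack([rng.random((50, 4)) < 0.9,   # class A: mostly 1s
               rng.random((50, 4)) < 0.1])  # class B: mostly 0s
X = X.astype(float)

K, eps = 2, 1e-9
pi = np.full(K, 1.0 / K)                    # mixing weights P(z = k)
theta = rng.uniform(0.3, 0.7, size=(K, 4))  # P(x_j = 1 | z = k)

for _ in range(50):                         # EM for the latent class model
    # E-step: responsibilities P(z | x) for each row.
    log_p = (np.log(pi + eps)
             + X @ np.log(theta + eps).T
             + (1 - X) @ np.log(1 - theta + eps).T)
    r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights and response probabilities.
    pi = r.mean(axis=0)
    theta = (r.T @ X) / r.sum(axis=0)[:, None]

clusters = r.argmax(axis=1)                 # one partition of the rows
```

The single latent variable z induces exactly one partition of the data; latent tree models generalize this by attaching several such latent variables to different subsets of attributes.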

  20. How to get multiple partitions?  Use models with multiple latent variables for multiple partitions.  Latent tree models: probabilistic graphical models with multiple latent variables, a generalization of latent class models.

  21. Multidimensional Clustering of Social Survey Data
// Survey on corruption in Hong Kong and performance of the anti-corruption agency (ICAC) (Chen et al., AIJ 2012)
// 31 questions, 1200 samples
C_City: s0 s1 s2 s3              // very common, quite common, uncommon, very uncommon
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3     // totally intolerable, intolerable, tolerable, totally tolerable
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2        // yes, no, depends
LeaveContactInfo: s0 s1          // yes, no
I_EncourageReport: s0 s1 s2 s3 s4   // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4     // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4        // very sufficient, sufficient, average, ...
…
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
…
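A guessed reader for rows in the format above, assuming each data line holds one answer code per question, with -1 marking a missing value and a final case weight; the column interpretation is an assumption drawn from the slide, not a documented file spec:

```python
def parse_row(line):
    """Parse one survey record: answer codes (-1 = missing) plus a trailing weight."""
    fields = line.split()
    weight = float(fields[-1])                              # final column: case weight
    answers = [None if f == "-1" else int(f) for f in fields[:-1]]
    return answers, weight

# First data row from the excerpt above.
row = ("-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 "
       "2 0 2 2 1 3 1 1 4 1 0 1.0")
answers, weight = parse_row(row)
```

Under this reading, the row carries 31 answers, matching the 31 questions stated in the header.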

  22. Latent Structure Discovery  Y2: Demographic background; Y3: Tolerance toward corruption; Y4: ICAC performance; Y5: Change in level of corruption; Y6: Level of corruption; Y7: ICAC accountability

  23. Multidimensional Clustering  Y2=s0: low-income youngsters; Y2=s1: women with no/little income; Y2=s2: people with good education and good income; Y2=s3: people with poor education and average income.

  24. Multidimensional Clustering  Values of the observed variables: s0 - totally intolerable, …, s3 - totally tolerable.  Interpretation of the values of the latent variable: Y3=s0: people who find corruption totally intolerable (57%); Y3=s1: people who find corruption intolerable (27%); Y3=s2: people who find corruption tolerable (15%).  Interesting finding: people who are tough on corruption are equally tough toward C_Gov and C_Bus, while people who are lenient about corruption are more lenient toward C_Bus than C_Gov.

  25. Multidimensional Clustering  Who is the toughest toward corruption among the 4 groups?  Interesting finding: Y2=s2 (good education and good income) is the toughest on corruption; Y2=s3 (poor education and average income) is the most lenient; the other two classes are in between.

  26. Multidimensional Clustering Summary  Latent tree analysis has found several interesting ways to partition the ICAC data and revealed some interesting relationships between the different partitions (Chen et al., AIJ 2012).

  27. What Can LTA be used for?  Multidimensional clustering  Hierarchical topic detection  Latent structure discovery  Other applications

  28. Hierarchical Latent Tree Analysis (HLTA)  Each word is a binary variable: 0 for absence from the document, 1 for presence.  Each document is a binary vector over the vocabulary.
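The document representation can be sketched as follows, with a small hypothetical corpus standing in for real data:

```python
# Toy corpus (hypothetical): each document becomes a 0/1 vector,
# one entry per vocabulary word, recording presence vs. absence.
docs = ["the video card driver crashed",
        "new driver update for the video card",
        "stock prices fell sharply"]

vocab = sorted({w for d in docs for w in d.split()})

def to_binary_vector(doc):
    """Map a document to its presence/absence vector over the vocabulary."""
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

vectors = [to_binary_vector(d) for d in docs]
```

Note that word counts are deliberately discarded: HLTA as described here models only whether a word occurs in a document.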

  29. Topics  Each latent variable partitions the documents into 2 clusters, and the document clusters are interpreted as topics.  Example: Z14=0 is a background topic; Z14=1 is a "video-card-driver" topic.  Each latent variable gives one topic.

  30. Topic Hierarchy  Latent variables at high levels capture "long-range" word co-occurrences and hence more general topics; latent variables at low levels capture "short-range" word co-occurrences and hence more specific topics.

  31. The New York Times Dataset  From the UCI repository: 300,000 articles from 1987-2007, with 10,000 words selected using TF-IDF.  HLTA took 7 hours on a desktop machine.
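Vocabulary selection by TF-IDF can be sketched as below. This uses a common average-TF-IDF variant; it is not necessarily the exact formula used for the slide, and the tokenized corpus is hypothetical:

```python
import math
from collections import Counter

# Hypothetical tokenized corpus.
docs = [["video", "card", "driver", "driver"],
        ["video", "game", "console"],
        ["stock", "market", "stock"]]

N = len(docs)
df = Counter(w for d in docs for w in set(d))   # document frequency per word

def avg_tfidf(word):
    """Average term frequency across the corpus, weighted by IDF."""
    idf = math.log(N / df[word])
    return sum(d.count(word) for d in docs) / N * idf

scores = {w: avg_tfidf(w) for w in df}
top_words = sorted(scores, key=scores.get, reverse=True)[:4]
```

Words that are frequent overall but concentrated in few documents score highest, which is the behavior wanted when trimming a corpus to a fixed-size vocabulary.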
