Graph Theoretic Latent Class Discovery and Its Robustness to Minimal Dominating Set Choice
J. L. Solka, C. E. Priebe, and D. J. Marchette
jsolka@nswc.navy.mil; dmarche@nswc.navy.mil
NSWCDD
Interface04 – p.1/24

Agenda
- What is latent class discovery?
- What are some approaches to the latent class discovery process?
- The class cover catch digraph classifier.
- Latent class discovery results on a gene expression data set.
- Wrap-up and conclusions.

Acknowledgments
- Michael C. Minnotte, Jurgen Symanzik, and others for organizing the conference
- Office of Naval Research, through their ILIR Program, for funding this effort

What is Latent Class Discovery?
- A latent class is a class of observations that resides undiscovered within a known class of observations.
- Develop a general methodology for the discernment of latent class structure during discriminant analysis.
  - Moderately large hyperdimensional data sets.
  - During training or testing.
- Explore applications of the developed methodologies to the analysis of data sets in the areas of hyperdimensional image analysis, artificial olfactory systems, computer security data, gene expression data, and text data mining.

Flow Chart
[Figure: flow chart with boxes labeled hyperdimensional data, multidimensional scaling, metric space adaptation, nonlinear dimensionality reduction, graph theoretic latent classes, discriminant analysis, and insights.]

Dominating Set
[Figure: dominating set, two-class data and covering discs.]
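A minimal sketch of how covering discs and a greedy dominating set could be computed for two-class data. This is an illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance, a disc radius equal to the distance from each target-class point to its nearest other-class point, and open discs. The function names `cccd_radii`, `cccd_adjacency`, and `greedy_dominating_set` are hypothetical.

```python
import numpy as np

def cccd_radii(target, other):
    """Radius of each target point's covering disc.

    Assumption: the radius is the Euclidean distance to the nearest
    point of the other class.
    """
    # Pairwise distances, shape (n_target, n_other).
    d = np.linalg.norm(target[:, None, :] - other[None, :, :], axis=2)
    return d.min(axis=1)

def cccd_adjacency(target, radii):
    """adj[i, j] is True when target point j lies inside the disc centred at i."""
    d = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=2)
    adj = d < radii[:, None]
    # Every point is taken to cover itself, even if its radius is zero.
    np.fill_diagonal(adj, True)
    return adj

def greedy_dominating_set(adj):
    """Repeatedly pick the vertex whose disc covers the most uncovered points."""
    uncovered = np.ones(adj.shape[0], dtype=bool)
    chosen = []
    while uncovered.any():
        gains = (adj & uncovered[None, :]).sum(axis=1)
        best = int(np.argmax(gains))  # ties go to the lowest index
        chosen.append(best)
        uncovered &= ~adj[best]
    return chosen

# Toy data: three target-class points and one other-class point on a line.
target = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
other = np.array([[1.5, 0.0]])
radii = cccd_radii(target, other)
dominating = greedy_dominating_set(cccd_adjacency(target, radii))
```

The greedy choice depends on how ties are broken, which is one reason more than one dominating set of the same size can exist for the same data.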

CCCD-Based Latent Class Discovery
[Figure: plot; axis labels not recoverable.]

ALL/AML Leukemia Gene Expression Analysis
[Figure: analysis flow chart. Data: 72 patients, 7129 genes (classes: AML, ALL B-cell, ALL T-cell). Steps: apply CCCD to ALL observations; cluster CCCD solution based on radii; examine clusters for latent class structure; ascertain significance of latent class structure.]

Resubstitution Error Rate Estimate
For each [symbol lost in extraction], [symbol lost in extraction] is an empirical risk (resubstitution error rate estimate) calculated as [equation lost in extraction].

Classification Dimension
We proceed by defining the “scale dimension” [symbol lost in extraction] to be the cluster map dimension that minimizes a dimensionality-penalized empirical risk, [equation lost in extraction], for some penalty coefficient [symbol lost in extraction].
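The selection of a dimension by dimensionality-penalized empirical risk can be sketched as follows. The formula itself did not survive on this slide, so the additive form `risk + delta * d`, the tie-breaking rule (smallest dimension wins), and the names `scale_dimension`, `risks`, and `delta` are all assumptions made for illustration.

```python
def scale_dimension(risks, delta):
    """Return the smallest dimension d minimizing risks[d - 1] + delta * d.

    risks[d - 1] is the empirical risk (resubstitution error rate estimate)
    at cluster map dimension d, for d = 1, 2, ...
    Assumption: the penalty is linear in d with coefficient delta.
    """
    penalized = [r + delta * d for d, r in enumerate(risks, start=1)]
    # list.index returns the first match, so ties go to the smaller dimension.
    return penalized.index(min(penalized)) + 1

# Hypothetical risk curve: error drops quickly, then flattens.
risks = [0.30, 0.10, 0.05, 0.05]
small_penalty = scale_dimension(risks, delta=0.01)
large_penalty = scale_dimension(risks, delta=0.10)
```

A larger penalty coefficient favors a lower dimension; with no penalty the smallest dimension attaining the minimum risk is returned.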

ALL/AML Classification Dimension Plot

Gene Latent Class Discovery

ALL/AML MDS Plot

How Robust is the Methodology?
- One other “success” story using artificial nose data.
- What if we had used another dominating set in our analysis?
- Is the discovered latent class structure independent of the dominating set used?

An Exhaustive Enumeration of All Possible Dominating Sets for the Gene Data
- 180 solutions of 21 nodes each
- 16 of the nodes remain fixed across the solutions
- 14 greedy solutions
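For intuition, an exhaustive enumeration of minimum-size dominating sets can be sketched with a brute-force search. This is not necessarily how the 180 solutions above were obtained; the function name `minimum_dominating_sets` and the `cover` dictionary representation are assumptions, and the search is exponential, so it is only practical for small graphs.

```python
from itertools import combinations

def minimum_dominating_sets(cover):
    """Return every dominating set of the smallest possible size.

    cover[v] is the set of vertices dominated by v, including v itself.
    """
    vertices = sorted(cover)
    everything = set(vertices)
    for size in range(1, len(vertices) + 1):
        found = [
            set(candidate)
            for candidate in combinations(vertices, size)
            if set().union(*(cover[v] for v in candidate)) == everything
        ]
        if found:
            # The first size that yields any solution is the minimum size.
            return found
    return []

# Toy digraph: vertices 0 and 1 cover each other, vertex 2 covers only itself.
cover = {0: {0, 1}, 1: {0, 1}, 2: {2}}
solutions = minimum_dominating_sets(cover)
# Vertices that appear in every minimum dominating set.
fixed = set.intersection(*solutions)
```

In this toy case vertex 2 is forced into every solution, which is the same kind of observation as the nodes that remain fixed across solutions on the slide.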

Classification Space Curves for the 180 Solutions
[Figure: curves for the 180 solutions; axis labels not recoverable.]

Classification Dimension for the 180 Solutions
[Figure: classification dimension for each of the 180 solutions. Red o: greedy solutions; green *: previous solution.]

Number of Dominating Sets for Each Vertex
[Figure: number of dominating sets for each vertex. x-axis: Vertex; y-axis: # Dominating Sets; legend: T-Cell, B-Cell, In-degree 0.]

Digraph Analysis
[Slide content not recoverable.]

Latent Class Discovery Figures of Merit
- How can we be assured that all of the greedy dominating set solutions discover the same latent classes?
- The previous greedy solution had 3 clusters that are pure B and 1 cluster that contained 8/9 of the T observations.
- Figures of merit: the percentage of B points that are in pure B clusters, and the highest percentage of T points in any one cluster.
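The two figures of merit can be sketched as below. The function name `purity_figures` and the input layout are hypothetical, and the sketch assumes each observation belongs to exactly one cluster and that both B and T observations are present; values are returned as fractions rather than percentages.

```python
def purity_figures(clusters, labels):
    """Compute the two figures of merit for one clustering.

    clusters: list of lists of observation indices (assumed to be a partition).
    labels:   list where labels[i] is "B" or "T" for observation i.
    Returns (fraction of B points in pure-B clusters,
             highest fraction of T points in any one cluster).
    """
    n_b = sum(1 for label in labels if label == "B")
    n_t = sum(1 for label in labels if label == "T")
    # B points that sit in clusters containing only B points.
    b_in_pure = sum(
        len(c) for c in clusters if all(labels[i] == "B" for i in c)
    )
    # Largest number of T points captured by a single cluster.
    t_max = max(sum(1 for i in c if labels[i] == "T") for c in clusters)
    return b_in_pure / n_b, t_max / n_t

# Toy example: four B observations and two T observations.
labels = ["B", "B", "B", "T", "T", "B"]
mixed = purity_figures([[0, 1], [2, 3], [4, 5]], labels)
clean = purity_figures([[0, 1, 2, 5], [3, 4]], labels)
```

A clustering that isolates the T observations in one cluster and keeps every other cluster pure B scores 1.0 on both measures.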

Purity (Latent Class Discovery) for the Golub Gene Data
[Figure: purity of each solution. x-axis: bpercent; y-axis: tpercent. Red triangles are the greedy solutions.]

Remaining Questions
- Demonstrated similar latent class discovery among all of the greedy dominating set solutions.
- Many of the 7129 variates (genes) are superfluous to the discriminant analysis problem.
- Work is ongoing to examine the discovered latent classes based on subsets of the genes.
- Various figures of merit have been used to choose the subsets of the genes.

Conclusions
- Developed a new concept for latent class discovery during discriminant analysis.
- Illustrated one graph theoretic methodology for the discovery of the latent classes.
- Illustrated this methodology with a gene expression data set.
- Presented some preliminary results examining the robustness of the discovery process to the CCCD process.

Readings
- C. E. Priebe, J. L. Solka, D. J. Marchette, and B. T. Clark, “Class Cover Catch Digraphs for Latent Class Discovery in Gene Expression Monitoring by DNA Microarrays,” to appear in the Special Issue of Computational Statistics and Data Analysis on Statistical Visualization, 2002+.
- J. L. Solka, C. E. Priebe, and B. T. Clark, “A Visualization Framework for the Analysis of Hyperdimensional Data,” International Journal of Image and Graphics, Special Issue on Data Mining, 2002.
- D. J. Marchette and C. E. Priebe, “Characterizing the Scale Dimension of a High-Dimensional Classification Problem,” Pattern Recognition, 2002.
