Reasoning about Sets using Redescription Mining Mohammed J. Zaki



  1. Reasoning about Sets using Redescription Mining. Mohammed J. Zaki (zaki@cs.rpi.edu) and Naren Ramakrishnan (naren@cs.vt.edu)

  2. What are redescriptions? A shift-of-vocabulary, or a different way of communicating a given piece of information.

  3-8. Input to Redescription Mining [Figure, built up over six slides: set diagram of four descriptors B, R, G, and Y over the objects Cuba, Canada, Russia, China, Chile, USA, Brazil, UK, Argentina, and France]

  9. Basic Problem
      Given
      • a set O of objects (e.g., countries)
      • a collection of subsets (descriptors) of O
      Find
      • subsets of O that can be defined in at least two ways

  10. A Redescription: ‘Countries with land area > 3,000,000 sq. miles’ − ‘Tourist Destinations in the Americas’ ⇔ ‘Permanent members of U.N. Security Council’ ∩ ‘Countries with history of communism’ [Figure: the country sets combined as one set EXCEPT another on the left and one set AND another on the right, joined by =]
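The redescription on slide 10 can be checked with ordinary set operations. The sketch below is illustrative only: the deck names the four descriptors and the ten countries, but the exact membership of each descriptor is not recoverable from the slide text, so the four sets are assumptions chosen to be consistent with the stated equivalence.

```python
# ASSUMPTION: these memberships are illustrative guesses, not taken from the deck.
# Only the descriptor names and the ten country names appear on the slides.
large_area = {"Russia", "Canada", "China", "USA", "Brazil"}
americas_tourism = {"Canada", "USA", "Brazil", "Argentina", "Chile", "Cuba"}
un_permanent = {"USA", "Russia", "China", "UK", "France"}
communist_history = {"Russia", "China", "Cuba", "Chile"}

left = large_area - americas_tourism       # set difference ("EXCEPT")
right = un_permanent & communist_history   # set intersection ("AND")

print(sorted(left))    # ['China', 'Russia']
print(sorted(right))   # ['China', 'Russia']
print(left == right)   # True: both expressions define the same set of objects
```

Under these assumed memberships the two expressions pick out the same two countries, which is exactly what it means for the same subset of O to be defined in two ways.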

  11. Redescription is sort of like ...
      association rule mining
      • generalize from implications to equivalences
      conceptual clustering
      • find clusters with dual characterizations
      constructive induction
      • build features that mutually reinforce each other

  12. Applications in Bioinformatics
      (Gene) subsets galore!
      • Genes localized in the mitochondrion
      • Genes up-expressed two-fold or more in heat stress
      • Genes encoding proteins forming the immunoglobulin complex
      • Genes involved in glucose biosynthesis
      • Genes handpicked by Prof. Genie for further study
      • Genes clustered together by your favorite algorithm
      • ...

  13-14. How do redescriptions happen? [Figure, built up over two slides: the set diagram of B, R, G, and Y next to a Karnaugh map whose column labels are the combinations of R and G and whose row labels are the combinations of B and Y, with the ten countries placed in its cells]

  15-19. A game on Karnaugh maps [Figure, built up over five slides: Karnaugh maps with column labels over the combinations of R and G and row labels over the combinations of B and Y]

  20. Reading off a redescription [Figure: Karnaugh maps with column labels over the combinations of R and G and row labels over the combinations of B and Y]

  21. Reading off a redescription: ( B¬Y RG ∨ B¬Y R¬G ∨ B¬Y ¬RG ∨ B¬Y ¬R¬G )

  22. Reading off a redescription: ( B¬Y RG ∨ B¬Y R¬G ∨ B¬Y ¬RG ∨ B¬Y ¬R¬G ) ⇔ ( BY RG ∨ B¬Y RG ∨ ¬BY RG ∨ ¬B¬Y RG )

  23. Reading off a redescription: ( B¬Y ) ⇔ ( RG )

  24. Redescriptions help reason about sets [Figure: set diagram of B, R, G, and Y over the ten countries] Q: How can B be made equal to R? Ans: Subtract Y from B; intersect G with R, yielding B¬Y ⇔ RG.
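One way to picture the Karnaugh-map reading is to place each object in a row by its (B, Y) membership and in a column by its (R, G) membership, then look for a row and a column that hold exactly the same objects. The sketch below does this under assumed descriptor memberships (the deck does not list them in recoverable form); the helper names `cells` and `label` are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: illustrative memberships consistent with the slides, not taken from them.
B = {"Russia", "Canada", "China", "USA", "Brazil"}
Y = {"Canada", "USA", "Brazil", "Argentina", "Chile", "Cuba"}
R = {"USA", "Russia", "China", "UK", "France"}
G = {"Russia", "China", "Cuba", "Chile"}
objects = B | Y | R | G


def cells(first, second):
    """Group objects by their (in first, in second) membership pair."""
    groups = defaultdict(set)
    for obj in objects:
        groups[(obj in first, obj in second)].add(obj)
    return groups


def label(names, bits):
    """Render a membership pair, e.g. ("BY", (True, False)) -> 'B ¬Y'."""
    return " ".join(n if bit else "¬" + n for n, bit in zip(names, bits))


rows = cells(B, Y)   # occupied rows of the map
cols = cells(R, G)   # occupied columns of the map

# A row and a column holding the same objects give a redescription.
found = [(label("BY", r), label("RG", c))
         for r, row_objs in rows.items()
         for c, col_objs in cols.items()
         if row_objs == col_objs]

for lhs, rhs in found:
    print(f"{lhs} ⇔ {rhs}")   # B ¬Y ⇔ R G
```

With these assumed sets, the only matching row and column are B¬Y and RG, in line with the answer on slide 24.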

  25-27. Some Definitions
      Given a collection of objects O and descriptors D:
      • A redescription X ⇔ Y (X, Y ⊆ D) holds when
        – X ∩ Y = ∅ and
        – X and Y induce the same set of objects.
      • A conditional redescription X ⇔ Y | Z (Z ⊆ D) holds when
        – X ∩ Y = X ∩ Z = Y ∩ Z = ∅ and
        – X ∩ Z and Y ∩ Z induce the same set of objects.
      • A redescription X ⇔ Y is a non-redundant redescription iff there does not exist another redescription X′ ⇔ Y′ for the same set of objects, such that X′ ⊆ X and Y′ ⊆ Y.
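The first and third definitions can be tried out on the object/descriptor table given on slide 32. This is a minimal sketch, assuming that a descriptor set "induces" the objects carrying every descriptor in it (the itemset reading suggested by slide 28) and that X and Y must be non-empty; the function names are hypothetical, and the conditional case is left out.

```python
from itertools import combinations

# Object/descriptor table from slide 32.
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}


def induced(descriptors):
    """Objects that carry every descriptor in the set (assumed reading of 'induce')."""
    return {o for o, row in DATA.items() if descriptors <= row}


def is_redescription(x, y):
    """X and Y are non-empty (an assumption), disjoint, and induce the same objects."""
    return bool(x) and bool(y) and not (x & y) and induced(x) == induced(y)


def nonempty_subsets(s):
    items = sorted(s)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def is_non_redundant(x, y):
    """No other redescription X' ⇔ Y' for the same objects with X' ⊆ X and Y' ⊆ Y."""
    if not is_redescription(x, y):
        return False
    target = induced(x)
    for xs in nonempty_subsets(x):
        for ys in nonempty_subsets(y):
            if (xs, ys) != (x, y) and is_redescription(xs, ys) and induced(xs) == target:
                return False
    return True


print(is_redescription({"d1"}, {"d6"}))         # True
print(is_redescription({"d1"}, {"d5"}))         # False: d5 also covers o2
print(is_non_redundant({"d1", "d2"}, {"d6"}))   # False: d1 ⇔ d6 already suffices
```

The exhaustive subset check is exponential in the size of X and Y, which is acceptable for a seven-descriptor example but is not meant as a practical algorithm.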

  28. Connections to Association Rule Mining [Figure: Karnaugh map over the combinations of R, G and B, Y] Objects = Transactions; Descriptors = Items

  29. Connections to Association Rule Mining (contd.) Colored cell = closed itemset (e.g., BY RG)

  30. Connections to Association Rule Mining (contd.) Reducible cluster of colored cells = closed itemset (e.g., BY R)

  31. Connections to Association Rule Mining (contd.) Reducible cluster of mixed cells = non-closed itemset (e.g., BY)

  32. Adapting association mining algorithms
      Mining redescriptions reduces to:
      • mining closed itemsets (descriptor sets)
      • obtaining submatrices reducible to these closed sets (generators)

      Object  Descriptors
      o1      d1 d2 d4 d5 d6
      o2      d2 d3 d5 d7
      o3      d1 d2 d4 d5 d6
      o4      d1 d2 d3 d5 d6 d7
      o5      d1 d2 d3 d4 d5 d6 d7
      o6      d2 d3 d4
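The first step, mining closed descriptor sets, can be illustrated on the table above with a brute-force closure operator. This is a minimal sketch for a seven-descriptor example, not the mining algorithm used by the authors; `extent` and `closure` are hypothetical helper names, and only non-empty descriptor sets are enumerated.

```python
from itertools import combinations

# Object/descriptor table from slide 32.
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
ALL = set().union(*DATA.values())


def extent(descriptors):
    """Objects that carry every descriptor in the set."""
    return frozenset(o for o, row in DATA.items() if descriptors <= row)


def closure(descriptors):
    """Largest descriptor set shared by all objects in the extent."""
    objs = extent(descriptors)
    if not objs:
        return frozenset(ALL)
    return frozenset(set.intersection(*(DATA[o] for o in objs)))


# A descriptor set is closed when it equals its own closure; collecting the
# closure of every non-empty descriptor set yields all closed sets.
closed = {}
for k in range(1, len(ALL) + 1):
    for combo in combinations(sorted(ALL), k):
        c = closure(set(combo))
        closed[c] = extent(c)

for c in sorted(closed, key=lambda s: (len(s), sorted(s))):
    print("dset:", " ".join(sorted(c)), "| objset:", " ".join(sorted(closed[c])))
print(len(closed))   # 10 closed sets for this table
```

Enumerating all 127 non-empty subsets is only feasible because the example is tiny; real closed-itemset miners avoid this enumeration.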

  33. Lattice of Closed Sets (nodes listed level by level, in the order they appear on the slide)
      dset: d1 d2 d3 d4 d5 d6 d7 | objset: o5 | mingen: d1 d3 d4, d3 d4 d5, d3 d4 d6, d4 d7
      dset: d2 d3 d4 | objset: o5 o6 | mingen: d3 d4
      dset: d1 d2 d4 d5 d6 | objset: o1 o3 o5 | mingen: d1 d4, d4 d5, d4 d6
      dset: d1 d2 d3 d5 d6 d7 | objset: o4 o5 | mingen: d1 d3, d1 d7, d3 d6, d6 d7
      dset: d2 d3 d5 d7 | objset: o2 o4 o5 | mingen: d3 d5, d7
      dset: d1 d2 d5 d6 | objset: o1 o3 o4 o5 | mingen: d1, d6
      dset: d2 d3 | objset: o2 o4 o5 o6 | mingen: d3
      dset: d2 d5 | objset: o1 o2 o3 o4 o5 | mingen: d5
      dset: d2 d4 | objset: o1 o3 o5 o6 | mingen: d4
      dset: d2 | objset: o1 o2 o3 o4 o5 o6 | mingen: d2

  34. Lattice of Closed Sets (same lattice as slide 33, with the added annotation d1 ⇒ d5; d6 ⇒ d5)

  35. Lattice of Closed Sets (same lattice as slide 33)

  36. Up Closed and Personal
      d1 d2 d5 d6
      d1 d2 d5, d1 d5 d6, d1 d2 d6, d2 d5 d6
      d1 d2, d1 d5, d1 d6, d2 d6, d5 d6
      d1, d6

  37. Up Closed and Personal (same sets as slide 36, with the redescription read off the two minimal ones) d1 ⇔ d6
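The family of sets on slides 36 and 37 appears to be every subset of the closed set d1 d2 d5 d6 that still selects the same objects. The sketch below checks that reading against the table on slide 32; `extent` is a hypothetical helper, and the interpretation itself is an assumption.

```python
from itertools import combinations

# Object/descriptor table from slide 32.
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}


def extent(descriptors):
    """Objects that carry every descriptor in the set."""
    return frozenset(o for o, row in DATA.items() if descriptors <= row)


closed_set = ["d1", "d2", "d5", "d6"]
target = extent(set(closed_set))   # {o1, o3, o4, o5}

# Keep the subsets whose extent equals that of the closed set, grouped by size.
levels = {}
for k in range(len(closed_set), 0, -1):
    levels[k] = [" ".join(c) for c in combinations(closed_set, k)
                 if extent(set(c)) == target]

for k, sets in levels.items():
    print(k, sets)
# The size-1 level holds the minimal generators d1 and d6, hence d1 ⇔ d6.
```

The one pair missing from the size-2 level is d2 d5, which also covers o2 and therefore has a larger extent.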

  38-39. Finding Minimal Generators (same lattice of closed sets as slide 33, shown on two slides, each node listing its dset, objset, and mingen)
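Minimal generators can be found by brute force on the slide 32 table: a non-empty descriptor set is a minimal generator if no proper non-empty subset has the same object set. The sketch below computes them and then pairs up disjoint minimal generators of the same closed set, following the slides' requirement that X ∩ Y = ∅. It is an illustrative enumeration under those assumptions, not the authors' algorithm, and the variable names are hypothetical.

```python
from itertools import combinations

# Object/descriptor table from slide 32.
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
ITEMS = sorted(set().union(*DATA.values()))


def extent(descriptors):
    """Objects that carry every descriptor in the set."""
    return frozenset(o for o, row in DATA.items() if descriptors <= row)


# Visit descriptor sets by increasing size, so any smaller set with the same
# extent has already been seen when a larger one is examined.
mingens = {}   # extent -> list of minimal generators
for k in range(1, len(ITEMS) + 1):
    for combo in combinations(ITEMS, k):
        ds = frozenset(combo)
        objs = extent(ds)
        if not objs:
            continue
        gens = mingens.setdefault(objs, [])
        if not any(g < ds for g in gens):   # no smaller generator inside ds
            gens.append(ds)

# Disjoint minimal generators of the same closed set form a redescription.
redescriptions = [(x, y)
                  for gens in mingens.values()
                  for x, y in combinations(gens, 2)
                  if not (x & y)]

for x, y in redescriptions:
    print(" ".join(sorted(x)), "⇔", " ".join(sorted(y)))
```

On this table the sketch reports four redescriptions, including d1 ⇔ d6 from slide 37. Closed sets whose minimal generators all share a descriptor (for example d4 in d1 d4, d4 d5, d4 d6) yield none under the disjointness condition.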
