Reasoning about Sets using Redescription Mining
Mohammed J. Zaki (zaki@cs.rpi.edu), Naren Ramakrishnan (naren@cs.vt.edu)
What are redescriptions?
A shift of vocabulary: a different way of communicating a given piece of information.
Input to Redescription Mining
[Figure: Venn diagram of four descriptor sets B, R, G, Y over the objects Cuba, Canada, Russia, China, Chile, USA, Brazil, UK, Argentina, France]
Basic Problem
Given
• a set O of objects (e.g., countries)
• a collection of subsets (descriptors) of O
Find
• subsets of O that can be defined in at least two ways
A Redescription
[Figure: the countries in each of the four sets, combined with EXCEPT (−) on the left and AND (∩) on the right]
‘Countries with land area > 3,000,000 sq. miles’ − ‘Tourist Destinations in the Americas’ ⇔ ‘Permanent members of U.N. Security Council’ ∩ ‘Countries with history of communism’
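A minimal Python sketch that checks this redescription with ordinary set operations. The country memberships below are assumptions made for illustration from the descriptor names. They are not data recovered from the slide's figure.

```python
# Hypothetical memberships, assumed for illustration only.
B = {"Russia", "Canada", "USA", "China", "Brazil"}             # land area > 3,000,000 sq. miles
Y = {"Canada", "USA", "Brazil", "Chile", "Argentina", "Cuba"}  # tourist destinations in the Americas
R = {"USA", "UK", "France", "Russia", "China"}                 # permanent members of the U.N. Security Council
G = {"Russia", "China", "Cuba"}                                # history of communism

lhs = B - Y  # EXCEPT: large countries that are not tourist destinations in the Americas
rhs = R & G  # AND: permanent members with a history of communism

# A redescription holds when both expressions pick out the same objects.
print(sorted(lhs), sorted(rhs), lhs == rhs)  # prints ['China', 'Russia'] ['China', 'Russia'] True
```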
Redescription mining is akin to
• association rule mining: generalize from implications to equivalences
• conceptual clustering: find clusters with dual characterizations
• constructive induction: build features that mutually reinforce each other
Applications in Bioinformatics
(Gene) subsets galore!
• Genes localized in the mitochondrion
• Genes up-expressed two-fold or more in heat stress
• Genes encoding proteins that form the immunoglobulin complex
• Genes involved in glucose biosynthesis
• Genes handpicked by Prof. Genie for further study
• Genes clustered together by your favorite algorithm
• …
How do redescriptions happen?
[Figure: Venn diagram of B, R, G, Y over the example countries, with regions labeled by their combinations of B, Y and R, G]
A game on Karnaugh maps
[Figure: a 4×4 Karnaugh map whose rows and columns are indexed by the truth values of B, Y and of R, G]
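Every object falls into exactly one Karnaugh-map cell: the cell given by its truth values for B, Y, R, G. The sketch below places the countries into a 4×4 map. It assumes hypothetical memberships for the four descriptors and the usual Gray-code row and column order. `MEMBERS`, `cell_of`, and `karnaugh_map` are illustrative names only.

```python
# Hypothetical memberships for the four descriptors, assumed for illustration.
MEMBERS = {
    "B": {"Russia", "Canada", "USA", "China", "Brazil"},
    "Y": {"Canada", "USA", "Brazil", "Chile", "Argentina", "Cuba"},
    "R": {"USA", "UK", "France", "Russia", "China"},
    "G": {"Russia", "China", "Cuba"},
}
OBJECTS = sorted(set().union(*MEMBERS.values()))

# Gray-code order: neighbouring rows/columns differ in exactly one variable.
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]


def cell_of(obj):
    """Truth assignment (B, Y, R, G) of an object, i.e., the Venn region it falls in."""
    return tuple(int(obj in MEMBERS[d]) for d in "BYRG")


def karnaugh_map():
    """Map each cell ((B, Y), (R, G)) to the sorted list of objects it contains."""
    grid = {(by, rg): [] for by in GRAY for rg in GRAY}
    for obj in OBJECTS:
        b, y, r, g = cell_of(obj)
        grid[((b, y), (r, g))].append(obj)
    return grid


grid = karnaugh_map()
for by in GRAY:
    row = [",".join(grid[(by, rg)]) or "-" for rg in GRAY]
    print(f"B={by[0]} Y={by[1]}: " + " | ".join(row))
```

With these assumed memberships, only 6 of the 16 cells are occupied. The empty cells are what give the game room to merge cells into larger groups.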
Reading off a redescription
[Figure: the Karnaugh map with the four cells of each side of the redescription marked]
$(B\bar{Y}RG \lor B\bar{Y}R\bar{G} \lor B\bar{Y}\bar{R}G \lor B\bar{Y}\bar{R}\bar{G}) \Leftrightarrow (BYRG \lor B\bar{Y}RG \lor \bar{B}YRG \lor \bar{B}\bar{Y}RG)$
which reduces to
$(B\bar{Y}) \Leftrightarrow (RG)$
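The reduction can be checked mechanically. This sketch uses hypothetical memberships and illustrative function names. It first confirms by truth table that each group of four cells reduces to $B\bar{Y}$ or to $RG$. It then checks that the two terms differ only on cells containing no object, which is what lets them redescribe each other.

```python
from itertools import product

# Hypothetical memberships, assumed for illustration only.
MEMBERS = {
    "B": {"Russia", "Canada", "USA", "China", "Brazil"},
    "Y": {"Canada", "USA", "Brazil", "Chile", "Argentina", "Cuba"},
    "R": {"USA", "UK", "France", "Russia", "China"},
    "G": {"Russia", "China", "Cuba"},
}
OBJECTS = sorted(set().union(*MEMBERS.values()))


def assignment(obj):
    """Truth values of B, Y, R, G for one object (its Karnaugh-map cell)."""
    return {d: obj in MEMBERS[d] for d in "BYRG"}


def lhs_cells(B, Y, R, G):
    # Left side: B and not-Y, spread over all four values of (R, G).
    return ((B and not Y and R and G) or (B and not Y and R and not G)
            or (B and not Y and not R and G) or (B and not Y and not R and not G))


def rhs_cells(B, Y, R, G):
    # Right side: R and G, spread over all four values of (B, Y).
    return ((B and Y and R and G) or (B and not Y and R and G)
            or (not B and Y and R and G) or (not B and not Y and R and G))


def lhs_reduced(B, Y, R, G):
    return B and not Y


def rhs_reduced(B, Y, R, G):
    return R and G


# Step 1: each group of four cells reduces to a single term (pure Boolean algebra).
for bits in product([False, True], repeat=4):
    assert lhs_cells(*bits) == lhs_reduced(*bits)
    assert rhs_cells(*bits) == rhs_reduced(*bits)

# Step 2: the reduced terms disagree only on cells that contain no object.
occupied = {tuple(assignment(o)[d] for d in "BYRG") for o in OBJECTS}
disagree = {c for c in product([False, True], repeat=4) if lhs_reduced(*c) != rhs_reduced(*c)}
assert not (occupied & disagree)

lhs_objects = {o for o in OBJECTS if lhs_reduced(**assignment(o))}
rhs_objects = {o for o in OBJECTS if rhs_reduced(**assignment(o))}
print(sorted(lhs_objects), sorted(rhs_objects))
```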
Redescriptions help reason about sets
[Figure: Venn diagram of descriptors B, R, G, Y over the example countries]
Q: How can B be made equal to R?
Ans: Subtract Y from B, and intersect G with R, yielding B − Y ⇔ R ∩ G.
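The question above amounts to a search over set expressions. Below is a brute-force sketch under a deliberately tiny, assumed grammar: single descriptors, X ∩ Z, and X − Z. It also uses hypothetical memberships. The sketch groups expressions by the objects they pick out and reports pairs built from disjoint descriptors. The grammar, data, and function names are illustrative, not the method from the talk.

```python
from collections import defaultdict
from itertools import combinations, permutations

# Hypothetical memberships, assumed for illustration only.
MEMBERS = {
    "B": {"Russia", "Canada", "USA", "China", "Brazil"},
    "Y": {"Canada", "USA", "Brazil", "Chile", "Argentina", "Cuba"},
    "R": {"USA", "UK", "France", "Russia", "China"},
    "G": {"Russia", "China", "Cuba"},
}


def expressions():
    """Yield (text, descriptors used, objects) for X, X ∩ Z and X − Z."""
    for x in MEMBERS:
        yield x, frozenset({x}), frozenset(MEMBERS[x])
    for x, z in combinations(MEMBERS, 2):  # ∩ is symmetric: one order suffices
        yield f"{x} ∩ {z}", frozenset({x, z}), frozenset(MEMBERS[x] & MEMBERS[z])
    for x, z in permutations(MEMBERS, 2):  # − is not symmetric: try both orders
        yield f"{x} − {z}", frozenset({x, z}), frozenset(MEMBERS[x] - MEMBERS[z])


def redescriptions():
    """Pairs of expressions over disjoint descriptors that induce the same non-empty set."""
    by_objects = defaultdict(list)
    for text, used, objs in expressions():
        if objs:
            by_objects[objs].append((text, used))
    found = []
    for objs, exprs in by_objects.items():
        for i, (t1, u1) in enumerate(exprs):
            for t2, u2 in exprs[i + 1:]:
                if not (u1 & u2):
                    found.append((t1, t2, sorted(objs)))
    return found


for left, right, objs in redescriptions():
    print(f"{left} ⇔ {right}: {objs}")
```

On these assumed memberships, the search reports exactly one pair: R ∩ G ⇔ B − Y over China and Russia. Other expressions such as B ∩ G pick out the same countries, but they share a descriptor with every candidate partner.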
Some Definitions
Given a collection of objects O and descriptors D (a set of descriptors induces the objects that carry every descriptor in it):
• A redescription X ⇔ Y (X, Y ⊆ D) holds when
  – X ∩ Y = ∅, and
  – X and Y induce the same set of objects.
• A conditional redescription X ⇔ Y | Z (Z ⊆ D) holds when
  – X ∩ Y = X ∩ Z = Y ∩ Z = ∅, and
  – X ∪ Z and Y ∪ Z induce the same set of objects.
• A redescription X ⇔ Y is non-redundant iff there is no other redescription X′ ⇔ Y′ for the same set of objects such that X′ ⊆ X and Y′ ⊆ Y.
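The three definitions translate directly into predicates. This sketch runs over the small object/descriptor table used later in this deck (objects o1–o6, descriptors d1–d7). Two assumptions apply. First, the conditional case reads the condition as "X ∪ Z and Y ∪ Z induce the same objects". Second, the redundancy check considers only nonempty subsets. All function names are hypothetical.

```python
from itertools import combinations

# Example object/descriptor table (hypothetical variable name).
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}


def objects_of(X):
    """Objects induced by descriptor set X: those carrying every descriptor in X."""
    return {o for o, ds in DATA.items() if X <= ds}


def nonempty_subsets(S):
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def is_redescription(X, Y):
    return not (X & Y) and objects_of(X) == objects_of(Y)


def is_conditional_redescription(X, Y, Z):
    # Assumed reading: X ∪ Z and Y ∪ Z must induce the same objects.
    disjoint = not (X & Y) and not (X & Z) and not (Y & Z)
    return disjoint and objects_of(X | Z) == objects_of(Y | Z)


def is_non_redundant(X, Y):
    if not is_redescription(X, Y):
        return False
    target = objects_of(X)
    for Xp in nonempty_subsets(X):
        for Yp in nonempty_subsets(Y):
            # Xp and Yp are disjoint because X and Y are.
            if (Xp, Yp) != (X, Y) and objects_of(Xp) == target == objects_of(Yp):
                return False
    return True


print(is_redescription({"d1"}, {"d6"}))                      # True
print(is_non_redundant({"d1", "d2"}, {"d5", "d6"}))          # False: d1 <=> d6 is smaller
print(is_redescription({"d3"}, {"d7"}))                      # False
print(is_conditional_redescription({"d3"}, {"d7"}, {"d5"}))  # True
```

The last two lines show what conditioning adds. d3 and d7 disagree on o6, but once both are restricted to objects carrying d5, they induce the same objects (o2, o4, o5).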
Connections to Association Rule Mining
[Figure: Karnaugh map over B, Y, R, G with colored and uncolored cells]
Objects = Transactions; Descriptors = Items
• Colored cell = closed itemset (e.g., BY RG)
• Reducible cluster of colored cells = closed itemset (e.g., BY R)
• Reducible cluster of mixed cells = non-closed itemset (e.g., BY)
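Below is a sketch of this correspondence. It assumes hypothetical memberships and encodes each object as a transaction of literals (`B` or `~B`, and so on). The `~` prefix is an illustrative convention. A nonempty cell is a full assignment and is therefore closed. The two-literal term B ∧ ¬Y is not closed, because its closure adds R and G. Since B ∧ ¬Y and R ∧ G share a closure, they induce the same objects.

```python
# Hypothetical memberships, assumed for illustration only.
MEMBERS = {
    "B": {"Russia", "Canada", "USA", "China", "Brazil"},
    "Y": {"Canada", "USA", "Brazil", "Chile", "Argentina", "Cuba"},
    "R": {"USA", "UK", "France", "Russia", "China"},
    "G": {"Russia", "China", "Cuba"},
}
OBJECTS = sorted(set().union(*MEMBERS.values()))

# Objects = transactions; literals = items ("B" if the object is in B, "~B" otherwise).
TRANSACTIONS = {
    o: frozenset(d if o in MEMBERS[d] else "~" + d for d in "BYRG") for o in OBJECTS
}
ALL_LITERALS = frozenset(lit for d in "BYRG" for lit in (d, "~" + d))


def support_set(itemset):
    """Objects whose transaction contains every literal in the itemset."""
    return {o for o, t in TRANSACTIONS.items() if itemset <= t}


def closure(itemset):
    """Literals shared by all supporting transactions (all literals if none support it)."""
    common = set(ALL_LITERALS)
    for o in support_set(itemset):
        common &= TRANSACTIONS[o]
    return frozenset(common)


def is_closed(itemset):
    return closure(itemset) == itemset


cell = frozenset({"B", "~Y", "R", "G"})  # one nonempty (colored) cell
b_not_y = frozenset({"B", "~Y"})
r_and_g = frozenset({"R", "G"})

print(is_closed(cell))                        # a nonempty full cell is closed
print(sorted(closure(b_not_y)), is_closed(b_not_y))
print(closure(b_not_y) == closure(r_and_g))   # shared closure => same objects
```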
Adapting association mining algorithms
Mining redescriptions reduces to:
• mining closed itemsets (descriptor sets)
• obtaining the submatrices reducible to these closed sets (generators)

Object  Descriptors
o1      d1 d2 d4 d5 d6
o2      d2 d3 d5 d7
o3      d1 d2 d4 d5 d6
o4      d1 d2 d3 d5 d6 d7
o5      d1 d2 d3 d4 d5 d6 d7
o6      d2 d3 d4
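Both steps rest on the closure operator. The closure of a descriptor set is the set of descriptors shared by every object that carries it, and a set is closed when it equals its closure. Below is a brute-force sketch on the table above. `DATA`, `closure`, and `closed_sets` are illustrative names. The approach is practical only because there are 7 descriptors (128 subsets); dedicated closed-itemset miners avoid this exhaustive enumeration.

```python
from itertools import combinations

# Example object/descriptor table from the slide (hypothetical variable name).
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
DESCRIPTORS = sorted(set().union(*DATA.values()))


def objects_of(X):
    """Objects that carry every descriptor in X."""
    return frozenset(o for o, ds in DATA.items() if X <= ds)


def closure(X):
    """Descriptors shared by every object that carries all of X."""
    common = set(DESCRIPTORS)
    for o in objects_of(X):
        common &= DATA[o]
    return frozenset(common)


def closed_sets():
    """Close every descriptor subset (including the empty one) and keep the distinct results."""
    found = set()
    for k in range(len(DESCRIPTORS) + 1):
        for combo in combinations(DESCRIPTORS, k):
            found.add(closure(frozenset(combo)))
    return found


for C in sorted(closed_sets(), key=lambda c: (len(c), sorted(c))):
    print("dset:", " ".join(sorted(C)), "| objset:", " ".join(sorted(objects_of(C))))
```

On this table, the sketch yields 10 closed sets. For example, the closure of d1 is d1 d2 d5 d6.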
Lattice of Closed Sets (top to bottom; dset = closed descriptor set, objset = objects it induces, mingen = its minimal generators)
dset: d1 d2 d3 d4 d5 d6 d7 | objset: o5 | mingen: d1 d3 d4, d3 d4 d5, d3 d4 d6, d4 d7
dset: d1 d2 d4 d5 d6 | objset: o1 o3 o5 | mingen: d1 d4, d4 d5, d4 d6
dset: d2 d3 d4 | objset: o5 o6 | mingen: d3 d4
dset: d1 d2 d3 d5 d6 d7 | objset: o4 o5 | mingen: d1 d3, d1 d7, d3 d6, d6 d7
dset: d2 d3 d5 d7 | objset: o2 o4 o5 | mingen: d3 d5, d7
dset: d1 d2 d5 d6 | objset: o1 o3 o4 o5 | mingen: d1, d6
dset: d2 d3 | objset: o2 o4 o5 o6 | mingen: d3
dset: d2 d5 | objset: o1 o2 o3 o4 o5 | mingen: d5
dset: d2 d4 | objset: o1 o3 o5 o6 | mingen: d4
dset: d2 | objset: o1 o2 o3 o4 o5 o6 | mingen: d2
Implications read off the lattice: d1 ⇒ d5; d6 ⇒ d5
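The lattice orders closed sets by inclusion. The sketch below recomputes the closed sets by brute force from the example table. It then lists the Hasse-diagram edges, linking each closed set to the closed sets immediately above it. Function names are illustrative. The sketch also checks the implication d1 ⇒ d5, which holds because d5 belongs to the closure of d1.

```python
from itertools import combinations

# Example object/descriptor table (hypothetical variable name).
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
DESCRIPTORS = sorted(set().union(*DATA.values()))


def objects_of(X):
    return frozenset(o for o, ds in DATA.items() if X <= ds)


def closure(X):
    common = set(DESCRIPTORS)
    for o in objects_of(X):
        common &= DATA[o]
    return frozenset(common)


def closed_sets():
    return {closure(frozenset(c))
            for k in range(len(DESCRIPTORS) + 1)
            for c in combinations(DESCRIPTORS, k)}


def hasse_edges(closed):
    """Pairs (C, D) of closed sets with C ⊂ D and no closed set strictly in between."""
    return [(c, d) for c in closed for d in closed
            if c < d and not any(c < e < d for e in closed)]


edges = hasse_edges(closed_sets())
for c, d in sorted(edges, key=lambda e: (len(e[0]), sorted(e[0]), sorted(e[1]))):
    print(" ".join(sorted(c)), "->", " ".join(sorted(d)))

# The implication d1 => d5 holds because d5 lies in the closure of d1.
print("d1 => d5:", "d5" in closure(frozenset({"d1"})))
```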
Up Closed and Personal
Descriptor sets whose closure is d1 d2 d5 d6 (all of them induce o1 o3 o4 o5):
d1 d2 d5 d6
d1 d2 d5, d1 d5 d6, d1 d2 d6, d2 d5 d6
d1 d2, d1 d5, d1 d6, d2 d6, d5 d6
d1, d6
The two minimal generators give the redescription d1 <=> d6.
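The sets above are exactly the nonempty descriptor sets whose closure is d1 d2 d5 d6. The sketch below enumerates this class from the example table, keeps its minimal members, and pairs disjoint minimal generators. Function names are illustrative.

```python
from itertools import combinations

# Example object/descriptor table (hypothetical variable name).
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
DESCRIPTORS = sorted(set().union(*DATA.values()))


def objects_of(X):
    return frozenset(o for o, ds in DATA.items() if X <= ds)


def closure(X):
    common = set(DESCRIPTORS)
    for o in objects_of(X):
        common &= DATA[o]
    return frozenset(common)


def equivalence_class(C):
    """Nonempty descriptor sets whose closure is C; every such set lies inside C."""
    return [frozenset(combo)
            for k in range(1, len(C) + 1)
            for combo in combinations(sorted(C), k)
            if closure(frozenset(combo)) == C]


def minimal(sets):
    """Members with no proper subset in the collection (the minimal generators)."""
    return [s for s in sets if not any(t < s for t in sets)]


C = frozenset({"d1", "d2", "d5", "d6"})
members = equivalence_class(C)
gens = minimal(members)
print(len(members), [sorted(g) for g in gens])

# Pairs of disjoint minimal generators are the non-redundant redescriptions.
for g, h in combinations(gens, 2):
    if not (g & h):
        print(" ".join(sorted(g)), "<=>", " ".join(sorted(h)))
```

Here the sketch finds 12 sets with minimal generators d1 and d6, giving d1 <=> d6. Pairing only minimal generators suffices for a simple reason. A side that is not minimal contains a minimal generator of the same closed set, so it could be shrunk, whereas two disjoint minimal generators cannot be shrunk further.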
Finding Minimal Generators
[Figure: the lattice of closed sets (dset, objset, mingen) for the example table, used to find the minimal generators of each closed set]
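The level-wise brute-force sketch below recovers the mingen lists of the lattice from the example table. It visits sets in order of size. A set is recorded as a minimal generator of its closure unless a smaller, already-recorded generator of the same closure is contained in it. The empty set is excluded by assumption, which matches the lattice's listing of d2 as the generator of the bottom element. Names are illustrative, and this is not presented as the algorithm from the talk.

```python
from itertools import combinations

# Example object/descriptor table (hypothetical variable name).
DATA = {
    "o1": {"d1", "d2", "d4", "d5", "d6"},
    "o2": {"d2", "d3", "d5", "d7"},
    "o3": {"d1", "d2", "d4", "d5", "d6"},
    "o4": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "o5": {"d1", "d2", "d3", "d4", "d5", "d6", "d7"},
    "o6": {"d2", "d3", "d4"},
}
DESCRIPTORS = sorted(set().union(*DATA.values()))


def objects_of(X):
    return frozenset(o for o, ds in DATA.items() if X <= ds)


def closure(X):
    common = set(DESCRIPTORS)
    for o in objects_of(X):
        common &= DATA[o]
    return frozenset(common)


def minimal_generators():
    """Map each closed set to its minimal nonempty generators, visiting sets by size."""
    gens = {}
    for k in range(1, len(DESCRIPTORS) + 1):
        for combo in combinations(DESCRIPTORS, k):
            X = frozenset(combo)
            C = closure(X)
            # Smaller generators of C were seen earlier; X is minimal if none lies inside it.
            if not any(g <= X for g in gens.get(C, [])):
                gens.setdefault(C, []).append(X)
    return gens


mg = minimal_generators()
for C in sorted(mg, key=lambda c: (-len(c), sorted(c))):
    print("dset:", " ".join(sorted(C)), "| mingen:",
          ", ".join(" ".join(sorted(g)) for g in mg[C]))
```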