2007-11-26

High Confidence Rule Mining for Microarray Analysis
Kang Deng, University of Alberta

Outline
• Introduction
• Row Enumeration
• Confidence-based Prune Strategy
• MAXCONF Algorithm
• Evaluation
• References

The Paper
• "High Confidence Rule Mining for Microarray Analysis", by Tara McIntosh, Sanjay Chawla, 2006
• 27 pages, 14 definitions, 2 lemmas, 4 tables
• Figures, formulas, etc.

What is a Microarray?
• A DNA microarray is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to a chemical matrix.
• In rule-mining terms: genes are the items, samples are the transactions.
Our Task
• One main objective of molecular biology is to develop a deeper understanding of how genes are functionally related.
• We do not mine association rules, but confidence rules.

Traditional Dataset vs. Microarray Dataset
• Traditional dataset: short transactions (about 12 items each), many of them (about 10,000).
• Microarray dataset: few transactions (fewer than 500 samples), very wide ones (more than 6,000 genes).

Explosive Increase of Candidates
• Column (item) enumeration generates candidates by length: 1-length, 2-length, 3-length, ...
• The length of the mined patterns is much less than the average number of items in one transaction.
• Minimum Support = 30%, Minimum Confidence = 80%
• With thousands of items per transaction, column enumeration explodes.

How can we make the wide microarray table look like the traditional one? Row Enumeration: transpose the table.

Transposed Table
Item | Transactions
A    | 1, 2, 5, 6
B    | 1, 4, 8
C    | 1, 2, 3, 4, 5, 8
D    | 1, 2, 3, 4, 6, 7, 8
E    | 1, 2, 3, 4, 5
F    | 3
G    | 1, 2, 3, 4, 8
H    | 3
I    | 3, 5, 6, 7
J    | 7
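The transposed table and the itemset attached to a set of rows can be sketched in Python (a minimal sketch; the helper name `itemset_of` is mine, not from the paper):

```python
# The slides' transposed table: item -> set of supporting transactions.
TT = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

def itemset_of(rows):
    """Items contained in every transaction of `rows` (the itemset of a
    row-enumeration node labelled by that row set)."""
    return {item for item, txns in TT.items() if rows <= txns}

print(sorted(itemset_of({1, 2, 3, 4})))  # ['C', 'D', 'E', 'G']
```

Intersecting this way reproduces the node labels used later in the talk, e.g. node (1234) carries the itemset {CDEG}.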
Row Enumeration Tree
• If the transaction set of the current parent node n is completely contained within that of a sibling node, the child node is not constructed. For example, node 2.
• RER II ("Mining frequent closed patterns in microarray data", G. Cong, K.-L. Tan, A. Tung, F. Pan, 2004) uses a support-based pruning strategy: Minimum Support = 30%, Minimum Confidence = 80%.

Confidence-based Strategy
• In biology, we care about confidence rules, not support.

Prune #1
• σ_max(5) = 1 + 2 = 3
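The row-enumeration walk can be sketched as a depth-first traversal over row sets (a hedged sketch under my own names; the slides' sibling-containment pruning is omitted for brevity):

```python
# Depth-first row enumeration over the slides' transposed table: each node
# is a set of rows (transactions); its itemset is every item whose
# transaction list contains all of those rows. Branches whose itemset is
# empty cannot yield rules and are cut.
TT = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}
ROWS = sorted(set().union(*TT.values()))

def itemset_of(rows):
    return {item for item, txns in TT.items() if rows <= txns}

def enumerate_nodes(rows=frozenset(), candidates=None):
    """Yield (row set, itemset) for every node with a non-empty itemset."""
    if candidates is None:
        candidates = ROWS
    for i, r in enumerate(candidates):
        new_rows = rows | {r}
        items = itemset_of(new_rows)
        if not items:          # dead branch: no item shared by all rows
            continue
        yield new_rows, items
        yield from enumerate_nodes(new_rows, candidates[i + 1:])
```

For instance, the node for rows {1, 2, 3, 4} comes out with itemset {C, D, E, G}, matching the tree in the slides.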
Prune #1: Minimum Features
• The rule (I) → (ACEG) has the highest confidence. What about (AI) → (CEG)?
• In the itemset {A, B, C}: Support(A) ≤ Support(B) and Support(A) ≤ Support(C), so A is the minimum feature of {A, B, C}.
• As an itemset grows larger, its support stays the same or becomes smaller: σ(AI) ≤ σ(I).
• confidence = σ(itemset) / σ(antecedent)
• (A) → (B, C) is an I-spanning rule; (B, C) → (A) is not.

Prune #1: Pruning by Maximum Confidence
• σ_max(5) = 1 + 2 = 3 is the maximum support of node 5.
• The minimum feature in this itemset is I, with σ(I) = 4.
• Maximum confidence of node 5: conf_max(5) = σ_max(5) / σ(I) = 3/4.
• If the minimum confidence is 4/5, the child of node 5 will be pruned.

Prune #2: Maximum Features
• Itemset {CDEG} yields the rules C → DEG, E → CDG, G → CDE.
• The maximum feature set of CDEG is CEG.
• Prune Strategy #2: if the maximum feature set M of the itemset at node n is not empty, we can prune all child nodes of n whose itemsets are subsets of M.
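The confidence formula can be checked directly against the transposed table (a minimal sketch; the function names are mine, not from the paper):

```python
from fractions import Fraction

# confidence(X -> Y) = support(X ∪ Y) / support(X), computed from the
# slides' transposed table (item -> set of supporting transactions).
TT = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return len(set.intersection(*(TT[i] for i in itemset)))

def confidence(antecedent, consequent):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    return Fraction(support(antecedent | consequent), support(antecedent))

# Rule C -> DEG from the slides' itemset {CDEG}:
print(confidence({"C"}, {"D", "E", "G"}))  # 2/3
```

Using `Fraction` keeps the ratios exact, which matters when comparing against a threshold such as 4/5.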
Prune #2: Example
• Itemset at node (1234): {CDEG}, with rules C → DEG, E → CDG, G → CDE.
• The maximum feature set of CDEG is {CEG}, which is the itemset of a child node.
• Node (12345) generates C → EG, E → CG, G → CE: sub-rules of the ones above, so that child can be pruned.

MAXCONF Algorithm
• MAXCONF walks the row enumeration tree applying both prunings:
• Pruning #1: the maximum-confidence bound, e.g. conf_max(5) = σ_max(5) / σ(I) = 3/4.
• Pruning #2: maximum feature sets, e.g. node (1234) with itemset {CDEG} and rules C → DEG, E → CDG, G → CDE.
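The Prune #1 bound reduces to a simple threshold check (hedged sketch: `can_prune` is my own name, and σ_max is taken as given from the slides' example, where σ_max(5) = 1 + 2 = 3):

```python
from fractions import Fraction

def can_prune(sigma_max, sigma_min_feature, min_conf):
    """Prune a node when the upper bound on any rule's confidence below it,
    conf_max(n) = sigma_max(n) / sigma(minimum feature), is under the
    minimum-confidence threshold."""
    return Fraction(sigma_max, sigma_min_feature) < min_conf

# Slide example: conf_max(5) = 3/4 < 4/5, so node 5's child is pruned.
print(can_prune(3, 4, Fraction(4, 5)))  # True
```

The bound is safe because no rule found below the node can exceed conf_max(n), so a branch failing the check cannot contribute any high-confidence rule.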
Evaluation
• Two aspects: 1. Rule generation 2. Scalability

Scalability: MAXCONF vs. RER II
• The performance of RER II is not affected by the minimum confidence.
• In most cases MAXCONF is faster than RER II; RER II only outperforms MAXCONF when the minimum support is higher than 40%.

Rule Generation
• When the minimum support is 0, RER II runs out of memory.
• MAXCONF generates more rules than RER II.

References
• MAXCONF: T. McIntosh, S. Chawla, "High Confidence Rule Mining for Microarray Analysis", 2006.
• RER II: G. Cong, K.-L. Tan, A. Tung, F. Pan, "Mining frequent closed patterns in microarray data", 2004.
Any Questions?
Thanks for your attention.