Mining Non-Derivable Association Rules

Bart Goethals ∗, Juho Muhonen, Hannu Toivonen
Helsinki Institute for Information Technology
Department of Computer Science
University of Helsinki
Finland

Abstract

Association rule mining typically results in large amounts of redundant rules. We introduce efficient methods for deriving tight bounds for confidences of association rules, given their subrules. If the lower and upper bounds of a rule coincide, the confidence is uniquely determined by the subrules and the rule can be pruned as redundant, or derivable, without any loss of information. Experiments on real, dense benchmark data sets show that, depending on the case, up to 99–99.99% of rules are derivable. A lossy pruning strategy, where those rules are removed for which the width of the bounded confidence interval is 1 percentage point, reduced the number of rules by a further order of magnitude. The novelty of our work is twofold. First, it gives absolute bounds for the confidence instead of relying on point estimates or heuristics. Second, no specific inference system is assumed for computing the bounds; instead, the bounds follow from the definition of association rules. Our experimental results demonstrate that the bounds are usually narrow and the approach has great practical significance, also in comparison to recent related approaches.

1 Introduction

Association rule mining often results in a huge amount of rules. Attempts to reduce the size of the result for easier inspection can be roughly divided into two categories. (1) In the subjective approaches, the user is offered some tools to specify which rules are potentially interesting and which are not, such as templates [KMR+94] and constraints [NLHP98, GVdB00]. (2) In the objective approaches, user-independent quality measures are applied on association rules. While interestingness is user-dependent to a large extent, objective measures are needed to reduce the redundancy inherent in a collection of rules.

The objective approaches can be further categorized by whether they measure each rule independently of other rules (e.g., using support, confidence, or lift) or address rule redundancy in the presence of other rules (e.g., being a rule with the most general condition and the most specific consequent among those having certain support and confidence values). Obviously only approaches of the latter type can potentially address redundancy between rules. Our work will be in this category.

We show how the confidence of a rule can be bounded given only its subrules (the condition and consequent of a subrule are subsets of the condition and consequent of the superrule, respectively). It turns out, in practice, that the lower and upper bounds coincide often, and thus the confidence can be derived exactly. We call these rules derivable: they can be considered redundant and pruned without loss of information. We also consider lossy pruning strategies: a rule is pruned if the confidence can be derived with a high accuracy, i.e., if the bounded interval is narrow.

Unlike practically all previous work on pruning association rules by their redundancy, our method for testing the redundancy of a rule is based on deriving absolute bounds on its confidence rather than using an ad hoc estimate. Given an error bound, we can thus guarantee that the confidence of the pruned rules can be estimated (derived) within the bounds. No (arbitrary) selection of a derivation method is involved: the bounds follow directly from the definitions of support and confidence. (A pragmatic choice we will make is that only subrules are used to derive the bounds; see below.)

In a sense, the proposed method is a generalization of the idea of only outputting the free or closed sets [PBTL99, BBR00]. Using free sets and closed sets corresponds, however, to only pruning out rules for which we know the confidence is one. In the method we propose, the confidence can have any value, and the rule is pruned if we can derive that value. Closed sets and related pruning techniques actually work on sets, not on association rules. There are other, more powerful pruning methods for sets. In particular, our work is an extension of the work on non-derivable sets [CG02] to non-derivable association rules. The method is simple, yet it has been overlooked by previous work on the topic.

Optimally, the final collection of rules should be understandable to the user. The minimal collection of rules from which all (pruned) rules can be derived would have a small

∗ Current affiliation: Dept. of Math and Computer Science, University of Antwerp, Belgium
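The idea of bounding a rule's confidence from its subrules can be illustrated with a small Python sketch. This excerpt does not spell out the authors' procedure, so the sketch rests on an assumption: it bounds the support of the itemset A ∪ B with the inclusion-exclusion deduction rules known from non-derivable sets [CG02], then divides by supp(A) to bound the confidence of A → B. All function names (`support`, `subset_supports`, `support_bounds`, `confidence_bounds`) and the toy database are hypothetical, and supports are absolute counts.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)


def subset_supports(itemset, transactions):
    """Supports of all proper subsets of `itemset`, including the empty set."""
    items = sorted(frozenset(itemset))
    return {frozenset(c): support(c, transactions)
            for r in range(len(items)) for c in combinations(items, r)}


def support_bounds(itemset, supp):
    """Assumed technique: inclusion-exclusion bounds on supp(itemset).

    For each proper subset X of I, the sum over all J with X <= J < I of
    (-1)^(|I \\ J| + 1) * supp(J) is an upper bound when |I \\ X| is odd
    and a lower bound when it is even.
    """
    i = frozenset(itemset)
    lower, upper = 0, float("inf")
    for r in range(len(i)):                      # X is a proper subset of I
        for xs in combinations(sorted(i), r):
            x = frozenset(xs)
            rest = sorted(i - x)
            bound = 0
            for k in range(len(rest)):           # J stays a proper subset of I
                for extra in combinations(rest, k):
                    j = x | frozenset(extra)
                    bound += (-1) ** (len(i - j) + 1) * supp[j]
            if len(i - x) % 2 == 1:
                upper = min(upper, bound)
            else:
                lower = max(lower, bound)
    return lower, upper


def confidence_bounds(antecedent, consequent, supp):
    """Bounds on conf(A -> B) = supp(A u B) / supp(A) from subset supports."""
    a = frozenset(antecedent)
    lo, hi = support_bounds(a | frozenset(consequent), supp)
    return lo / supp[a], hi / supp[a]


# Toy database in which item "a" occurs in every transaction.
db = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("a")]
lo, hi = confidence_bounds("a", "b", subset_supports("ab", db))
print(lo, hi)  # bounds coincide at 0.5, so a -> b is derivable
```

In this toy database the bounds coincide at a confidence of 0.5, not 1, which matches the remark that a derivable rule can have any confidence value. A lossy variant would prune a rule whenever `hi - lo` falls below a chosen threshold.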