

  1. Association Rules: Extracting Patterns from Large Data Sets

  2. Content
  • Introduction to Pattern and Rule Analysis
  • A-priori Algorithm
  • Generalized Rule Induction
  • Sequential Patterns
  • Other WEKA algorithms
  • Outlook

  3. Introduction
  • Finding unusual patterns and rules in large data sets
  • Examples:
    • 10% of the customers buy wine and cheese
    • If someone buys wine and cheese today, they will buy sparkling water tomorrow
    • If alarms A and B occur within 30 seconds, then alarm C occurs within 60 seconds with probability 0.5
    • If someone visits derstandard.at, there is a 60% chance that the person will visit faz.net as well
    • If players X and Y were playing together as strikers, the team won 90% of the games
  • Application areas: unlimited
  • Question: How can we find such patterns?

  4. General Considerations
  • Rule representation
    • Left-hand side proposition (antecedent)
    • Right-hand side proposition (consequent)
  • Probabilistic rule
    • The consequent is true with probability p given that the antecedent is true, i.e. a conditional probability
  • Scale level
    • Especially suited for categorical data
    • Continuous data can be handled by setting thresholds (a sketch follows below)
  • Advantages
    • Easy to compute
    • Easy to understand
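
To make the thresholding point concrete, here is a minimal Python sketch; the function name, the 50.0 threshold and the 0/1 encoding are illustrative assumptions, not from the slides:

```python
def to_item(value, threshold):
    """Map a continuous measurement to a binary (categorical) item indicator."""
    return 1 if value >= threshold else 0

# Example: encode "spent at least 50 EUR" as a binary item.
amounts = [12.5, 80.0, 49.9, 50.0]
print([to_item(a, 50.0) for a in amounts])  # [0, 1, 0, 1]
```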

  5. Example of a market basket:

  Basket ID  Milk  Bread  Water  Coffee  Kleenex
      1       1     0      0      0       0
      2       1     1      1      1       0
      3       1     0      1      0       1
      4       0     0      1      0       0
      5       0     1      1      1       0
      6       1     1      1      0       0
      7       1     0      1      1       0
      8       0     1      1      0       1
      9       1     0      0      1       0
     10       0     1      1      0       1

  The aim is to find itemsets in order to predict accurately (i.e. with high confidence) a consequent from one or more antecedents.
  • Algorithms: A-Priori, Tertius and GRI
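
A minimal sketch of how such a table can be represented for rule mining in Python; the set encoding mirrors the reconstructed table above and is an assumption of this sketch, not WEKA's format:

```python
# Each basket is the set of items it contains, mirroring the 0/1 table above.
baskets = {
    1: {"Milk"},
    2: {"Milk", "Bread", "Water", "Coffee"},
    3: {"Milk", "Water", "Kleenex"},
    4: {"Water"},
    5: {"Bread", "Water", "Coffee"},
    6: {"Milk", "Bread", "Water"},
    7: {"Milk", "Water", "Coffee"},
    8: {"Bread", "Water", "Kleenex"},
    9: {"Milk", "Coffee"},
    10: {"Bread", "Water", "Kleenex"},
}
```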

  6. Mathematical Notations
  • General notation: $p$ variables $X_1, X_2, \dots, X_p$; $N$ persons
  • Itemset: $\theta_{(k)} = (X_{(1)} = 1) \wedge \dots \wedge (X_{(k)} = 1)$, with $k < p$
  • Rule: $\theta_{(k)} \Rightarrow \varphi$, i.e. $(X_{(1)} = 1) \wedge \dots \wedge (X_{(k)} = 1) \Rightarrow (X_{(k+1)} = 1)$
  • Identification of frequent itemsets
    • Itemset frequency: $fr(\theta_{(k)})$
    • Support: $s = fr(\theta_{(k)} \wedge \varphi)$
    • Accuracy (confidence): $c(\theta_{(k)} \Rightarrow \varphi) = p(\varphi = 1 \mid \theta_{(k)} = 1) = \dfrac{fr(\theta_{(k)} \wedge \varphi)}{fr(\theta_{(k)})}$
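
These quantities are straightforward to compute; a short sketch, reusing the `baskets` dict from the previous block (the example rule {Coffee} => {Water} is chosen purely for illustration):

```python
def fr(itemset, baskets):
    """Itemset frequency fr(theta): number of baskets containing all items."""
    return sum(1 for b in baskets.values() if itemset <= b)

def confidence(antecedent, consequent, baskets):
    """c(theta => phi) = fr(theta and phi) / fr(theta)."""
    return fr(antecedent | consequent, baskets) / fr(antecedent, baskets)

print(fr({"Coffee"}, baskets))                     # 4
print(fr({"Coffee", "Water"}, baskets))            # 3 (the support s)
print(confidence({"Coffee"}, {"Water"}, baskets))  # 0.75
```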

  7. A-priori Algorithm *
  • Identification of frequent itemsets
    • Start with one variable, i.e. $\theta_{(1)}$, then $\theta_{(2)}, \theta_{(3)}, \dots$
    • Compute the support; keep itemsets with $s > s_{\min}$, giving the list of frequent itemsets
  • Rule generation (a sketch follows below)
    • Split each frequent itemset into antecedent $A$ and consequent $C$
    • Compute an evaluation measure
  • Evaluation measures
    • Prior confidence: $c_{\text{prior}} = s_c / N$
    • Posterior confidence (rule confidence): $c_{\text{post}} = s_{a \wedge c} / s_a$
  * Agrawal & Srikant, 1994
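
A self-contained Python sketch of the level-wise search and the rule-generation step described above; the minimum-support and minimum-confidence values are illustrative, and this is a simplification in the spirit of Agrawal & Srikant (1994), not their implementation:

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Level-wise frequent-itemset search: grow itemsets one item at a time,
    pruning candidates whose support falls below min_support."""
    items = sorted({i for b in baskets for i in b})
    support, frequent = {}, []
    level = [frozenset([i]) for i in items]  # level 1: single items
    k = 1
    while level:
        for itemset in level:
            s = sum(1 for b in baskets if itemset <= b)
            if s >= min_support:
                support[itemset] = s
                frequent.append(itemset)
        # Build level k+1 candidates by joining frequent level-k itemsets.
        freq_k = [f for f in frequent if len(f) == k]
        level = list({a | b for a, b in combinations(freq_k, 2)
                      if len(a | b) == k + 1})
        k += 1
    return frequent, support

def rules(frequent, support, min_conf):
    """Split each frequent itemset into antecedent A and consequent C, keeping
    rules whose posterior confidence s(A and C) / s(A) >= min_conf."""
    out = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                a = frozenset(ante)
                conf = support[itemset] / support[a]
                if conf >= min_conf:
                    out.append((set(a), set(itemset - a), conf))
    return out

baskets = [{"Milk"}, {"Milk", "Bread", "Water", "Coffee"},
           {"Milk", "Water", "Kleenex"}, {"Water"},
           {"Bread", "Water", "Coffee"}, {"Milk", "Bread", "Water"},
           {"Milk", "Water", "Coffee"}, {"Bread", "Water", "Kleenex"},
           {"Milk", "Coffee"}, {"Bread", "Water", "Kleenex"}]
frequent, support = apriori(baskets, min_support=3)
for a, c, conf in rules(frequent, support, min_conf=0.7):
    print(a, "=>", c, round(conf, 2))
```

On the market-basket table from slide 5 this prints, among others, {Bread} => {Water} with confidence 1.0.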

  8. Further Algorithms in WEKA
  • Predictive Apriori
    • Rules sorted by expected predictive accuracy
  • Tertius
    • Confirmation values
    • TP-/FP-rate
    • Rules with and/or concatenations

  9. Generalized Rule Induction (GRI) *
  • Quantitative measure of interestingness
  • Ranks competing rules according to this measure
  • Information-theoretic entropy calculation
  • Rule generation
    • Works essentially like the A-priori algorithm
    • Compute for each rule the J-statistic and its specializations $J_s$ obtained by adding more antecedents
  • The J-measure (a sketch follows below)
    • Entropy (information measure): $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p) \in [0; 1]$
    • J-measure: $J(x \mid y) = p(y) \left[ p(x \mid y) \log_2 \dfrac{p(x \mid y)}{p(x)} + (1 - p(x \mid y)) \log_2 \dfrac{1 - p(x \mid y)}{1 - p(x)} \right]$
    • $J_s$-measure: $J_s = \max\left[ p(y)\, p(x \mid y) \log_2 \dfrac{1}{p(x)},\; p(y)\, (1 - p(x \mid y)) \log_2 \dfrac{1}{1 - p(x)} \right]$
  * Smyth & Goodman, 1992
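
A direct Python transcription of the two formulas above; the input probabilities are made up for illustration, and estimating them from data is a separate step:

```python
from math import log2

def j_measure(p_y, p_x_given_y, p_x):
    """J(x|y): information the rule y => x carries about x (Smyth & Goodman)."""
    def term(p_post, p_prior):
        # 0 * log(0/...) is taken as 0 by convention.
        return 0.0 if p_post == 0 else p_post * log2(p_post / p_prior)
    return p_y * (term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x))

def js_measure(p_y, p_x_given_y, p_x):
    """J_s: bound on the J-measure attainable by specializing the rule."""
    return max(p_y * p_x_given_y * log2(1 / p_x),
               p_y * (1 - p_x_given_y) * log2(1 / (1 - p_x)))

print(round(j_measure(0.4, 0.9, 0.5), 3))   # 0.212
print(round(js_measure(0.4, 0.9, 0.5), 3))  # 0.36
```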

  10. Sequential Patterns *

  Customer  Time 1    Time 2  Time 3  Time 4
     1      Cheese    Wine    Beer    -
     2      Wine      Beer    Cheese  -
     3      Bread     Wine    Cheese  -
     4      Crackers  Wine    Beer    -
     5      Beer      Cheese  Bread   Cheese
     6      Crackers  Bread   -       -

  • Observations over time: itemsets within each time point, one transaction per customer and time point
  • Sequence notation: X > Y (i.e. Y occurs after X)
  • Rule generation (a sketch follows below)
    • Compute $s$ by successively adding time points
    • CARMA algorithm as before
  * Agrawal & Srikant, 1995
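
A minimal sketch of counting the support of a sequence pattern X > Y over the table above; the in-order subsequence test is the illustrative core, and this is not the CARMA implementation:

```python
# Each customer's history is the time-ordered list of purchases from the
# table above ('-' cells are simply omitted; one item per time point).
histories = {
    1: ["Cheese", "Wine", "Beer"],
    2: ["Wine", "Beer", "Cheese"],
    3: ["Bread", "Wine", "Cheese"],
    4: ["Crackers", "Wine", "Beer"],
    5: ["Beer", "Cheese", "Bread", "Cheese"],
    6: ["Crackers", "Bread"],
}

def occurs(pattern, history):
    """True if the items of `pattern` appear in `history` in order (X > Y)."""
    pos = 0
    for item in history:
        if pos < len(pattern) and item == pattern[pos]:
            pos += 1
    return pos == len(pattern)

# Support of the sequence Wine > Beer: customers 1, 2 and 4.
print(sum(occurs(["Wine", "Beer"], h) for h in histories.values()))  # 3
```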

  11. Outlook
  • Decision trees
    • CART (Breiman et al., 1984)
    • C5.0 (Quinlan, 1996)
