

  1. Relevance Feedback for Association Rules by Leveraging Concepts from Information Retrieval
  Georg Ruß, russ@iws.cs.uni-magdeburg.de
  Mirko Böttcher, mail@mirkoboettcher.de
  Prof. Dr. Rudolf Kruse, kruse@iws.cs.uni-magdeburg.de
  December 12th, 2007

  2. Outline
  ◮ Association Rules
  ◮ Concepts from Information Retrieval
  ◮ Rule Similarity
  ◮ Relevance Scoring
  ◮ Conclusion

  3. Motivation
  ◮ Large amounts of transactional data
  ◮ Association rule mining yields rules as a condensed representation
  ◮ Form: IF item_1, item_2, ..., item_n THEN item_m
  ◮ Problem: still too many rules to analyze
  ◮ Topic: find interesting association rules

  4. Association Rules – Formalization
  ◮ Set D of transactions T ∈ D
  ◮ A transaction T is a subset of a set of items L
  ◮ A subset X ⊆ L is called an itemset
  ◮ A transaction T supports an itemset X if X ⊆ T
  ◮ An association rule r is an expression X → Y, where X and Y are itemsets, |Y| > 0 and X ∩ Y = ∅
  ◮ X: body, Y: head
  ◮ Rule reliability: confidence conf(r) = P(Y | X)
  ◮ Statistical significance: support supp(r) = P(XY)
  ◮ Time series: confidence and support of one rule observed over time (see the sketch below)
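A minimal Python sketch of these two measures, assuming transactions are represented as sets of items; the function names and the tiny market-basket data are illustrative only:

```python
def support(itemset, transactions):
    """supp: fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(body, head, transactions):
    """conf(X -> Y) = P(Y | X) = supp(X u Y) / supp(X)."""
    return support(body | head, transactions) / support(body, transactions)

# Illustrative market-basket transactions (not from the slides)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
]
body, head = {"bread"}, {"butter"}
print(support(body | head, transactions))   # supp(r) = 0.5
print(confidence(body, head, transactions)) # conf(r) = 2/3
```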

  5. Linking to Information Retrieval
  ◮ Interestingness of rules is subjective.
  ◮ Finding interesting rules requires user input.
  ◮ Manual specification of the user's knowledge is problematic:
    ◮ key aspects are often forgotten
    ◮ it requires an expert user
    ◮ knowledge changes over time
    ◮ it is hard to specify at the beginning of an analysis

  6. Information Retrieval – Relevance Feedback
  ◮ Automatic acquisition of the user's knowledge through his actions:
    ◮ the user rates what he sees
    ◮ easy (binary) decision: interesting / not interesting
    ◮ the system collects the user's choices and updates its results
  ◮ Relevance feedback is known from Information Retrieval:
    ◮ association rules are presented (possibly pre-ordered)
    ◮ the user can examine and rate them
    ◮ an internal ranking is adapted
    ◮ the best results are presented
    ◮ the cycle starts over (see the loop sketch below)
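The cycle can be sketched as a simple loop. Everything here is a hypothetical stand-in: `score_rules` and `get_feedback` only mimic the scoring and rating steps that the later slides define properly:

```python
import random

def score_rules(rules, feedback):
    """Hypothetical scorer: rated-interesting rules float to the top.
    A real system would use the relevance score of the later slides."""
    return sorted(rules, key=lambda r: feedback.get(r, 0), reverse=True)

def get_feedback(rule):
    """Stand-in for the user's binary rating (interesting / not)."""
    return random.random() < 0.5

def relevance_feedback_loop(rules, n_shown=10, n_rounds=5):
    """The feedback cycle: present the top-ranked rules, collect
    binary ratings, adapt the ranking, and start over."""
    feedback = {}
    ranking = list(rules)
    for _ in range(n_rounds):
        ranking = score_rules(ranking, feedback)  # adapt internal ranking
        for rule in ranking[:n_shown]:            # present best results
            feedback[rule] = get_feedback(rule)   # user rates what he sees
    return ranking
```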

  7. Rule Representation – Informal
  ◮ Use existing algorithms for relevance feedback from IR
  ◮ Represent rules as vectors:

    \vec{r} = ( \underbrace{\underbrace{r_1, \ldots, r_b}_{\text{body}}, \underbrace{r_{b+1}, \ldots, r_{b+h}}_{\text{head}}}_{\text{symbolic}}, \underbrace{r_{b+h+1}, \ldots, r_{b+h+t}}_{\text{timeseries}} )   (1)

  ◮ Item weights: TF-IDF approach
  ◮ high weight: term frequent in rule (TF), but less frequent in rule set (IDF)
  ◮ filters commonly used terms, captures perceived relevance

  8. Rule Representation – Maths
  ◮ term frequency

    tf(x, r) = \begin{cases} 1 & \text{if } x \in r, \\ 0 & \text{otherwise} \end{cases}   (2)

  ◮ inverse document frequency

    idf(x, R) = 1 - \frac{\ln |\{ r : r \in R \wedge x \in r \}|}{\ln |R|}   (3)

  ◮ A rule's feature vector is filled as follows:

    r_i = tf(x_i, r) \cdot idf(x_i, R), \quad i = 1, \ldots, b   (4)

    r_{b+j} = tf(x_j, r) \cdot idf(x_j, R), \quad j = 1, \ldots, h   (5)

  ◮ the timeseries components of \vec{r} hold the respective time-variant properties of the rule (see the sketch below)
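A minimal sketch of Equations (2)-(5) for one rule part (body or head), assuming rules are given as item sets and |R| > 1; the guard for items that occur in no rule is my own addition, not from the slides:

```python
import math

def tfidf_vector(rule_items, vocabulary, ruleset):
    """TF-IDF part of a rule's feature vector (Eqs. 2-5).
    rule_items: item set of the rule part (body or head),
    vocabulary: ordered list of all items,
    ruleset:    list of item sets R (assumed |R| > 1)."""
    n = len(ruleset)
    vec = []
    for x in vocabulary:
        tf = 1.0 if x in rule_items else 0.0        # Eq. (2)
        df = sum(1 for r in ruleset if x in r)      # |{r in R : x in r}|
        # Eq. (3); df == 0 handled with idf = 1 (an assumption)
        idf = 1.0 - math.log(df) / math.log(n) if df > 0 else 1.0
        vec.append(tf * idf)                        # Eqs. (4)/(5)
    return vec
```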

  9. Interestingness of Rules
  ◮ Idea: compare the same features of different rules
  ◮ Interestingness based on (dis-)similarity
  ◮ Six combinations deemed interesting (weights ω1, ..., ω6):

                        dissimilar
    similar        head   body   time series   symbolic
    head            –      ω4      ω5            –
    body            ω1     –       ω6            –
    time series     –      –       –             ω2
    symbolic        –      –       ω3            –

  Table: Interestingness Matrix

  10. Pairwise Similarity
  ◮ Similarity between rules as a measure of interestingness
  ◮ Similarity can easily be computed by similarity measures for vectors
  ◮ Cosine similarity:

    sim(\vec{r}, \vec{s}) = \frac{\sum_{i=1}^{n} r_i s_i}{\sqrt{\sum_{i=1}^{n} r_i^2 \cdot \sum_{i=1}^{n} s_i^2}}   (6)

  ◮ Dissimilarity (see the sketch below):

    dissim(\vec{r}, \vec{s}) = 1 - sim(\vec{r}, \vec{s})   (7)
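A direct transcription of Equations (6) and (7); the zero-vector guard is an added assumption:

```python
import math

def cosine_sim(r, s):
    """Cosine similarity of two equal-length feature vectors (Eq. 6)."""
    dot = sum(ri * si for ri, si in zip(r, s))
    norm = math.sqrt(sum(ri * ri for ri in r) * sum(si * si for si in s))
    return dot / norm if norm > 0 else 0.0  # zero-vector case is an assumption

def dissim(r, s):
    """Dissimilarity as the complement of similarity (Eq. 7)."""
    return 1.0 - cosine_sim(r, s)
```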

  11. Similarity Aggregation – first step
  ◮ Similarity between a rule and a rule set:

    sim_{rs}(\vec{r}, R) = \Omega(\{ sim(\vec{r}, \vec{s}_1), \ldots, sim(\vec{r}, \vec{s}_m) \})   (8)

  ◮ Dissimilarity, analogously to Equation 7:

    dissim_{rs}(\vec{r}, R) = 1 - sim_{rs}(\vec{r}, R)   (9)

  12. Similarity Aggregation – second step
  ◮ Use the OWA operator to aggregate the single similarities (see the sketch below):
  ◮ weighting vector W = (w_1, w_2, \ldots, w_n)^T with w_j ∈ [0, 1] and \sum_{j=1}^{n} w_j = 1

    \Omega(\{ s_1, s_2, \ldots, s_n \}) = \sum_{j=1}^{n} w_j b_j   (10)

  with b_j being the j-th largest of the s_i.
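A minimal sketch of the OWA operator from Equation (10); the linearly decreasing example weights are an illustrative choice, the operator itself only requires the weights to sum to 1:

```python
def owa(similarities, weights):
    """Ordered Weighted Averaging (Eq. 10): sort the inputs in
    descending order, then take the weighted sum."""
    assert abs(sum(weights) - 1.0) < 1e-9   # weights must sum to 1
    ordered = sorted(similarities, reverse=True)  # b_j = j-th largest s_i
    return sum(w * b for w, b in zip(weights, ordered))

# Example with linearly decreasing weights (an illustrative choice)
sims = [0.9, 0.2, 0.6]
weights = [x / 6 for x in (3, 2, 1)]  # (1/2, 1/3, 1/6)
print(owa(sims, weights))  # 0.9/2 + 0.6/3 + 0.2/6 ≈ 0.683
```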

  13. Relevance Scoring
  ◮ A score is calculated for each association rule:
  ◮ the user selects a rule r as interesting
  ◮ determine the interesting combinations:
    ◮ rules with a similar head, but a dissimilar body
    ◮ rules with a similar body, but a dissimilar head
    ◮ ... six combinations in total (see Table 1)
  ◮ calculate the weighted sum of the score parts of those six combinations
  ◮ this yields a relevance score for each association rule
  ◮ sort the rules by score – interesting ones are assumed to have a high score (see the sketch below)
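A sketch of how such a weighted-sum score could be assembled from the pieces above (it reuses `cosine_sim` and `owa` from the earlier sketches). The slides only state that the six combinations of Table 1 enter a weighted sum; pairing each similarity with its dissimilar counterpart as a product, and the uniform OWA weights, are assumptions of this sketch:

```python
def relevance_score(rule, rated_rules, w):
    """Weighted-sum relevance score over the six combinations of Table 1.
    rule / rated_rules: dicts with 'body', 'head', 'symbolic', 'timeseries'
    feature vectors; rated_rules is the non-empty set of rules the user
    marked as interesting; w maps 'w1'..'w6' to the weights."""
    R = rated_rules

    def sim_rs(part):    # Eq. (8); uniform OWA weights assumed
        sims = [cosine_sim(rule[part], s[part]) for s in R]
        return owa(sims, [1.0 / len(R)] * len(R))

    def dissim_rs(part): # Eq. (9)
        return 1.0 - sim_rs(part)

    return (w["w1"] * sim_rs("body") * dissim_rs("head")          # ω1
          + w["w2"] * sim_rs("timeseries") * dissim_rs("symbolic") # ω2
          + w["w3"] * sim_rs("symbolic") * dissim_rs("timeseries") # ω3
          + w["w4"] * sim_rs("head") * dissim_rs("body")           # ω4
          + w["w5"] * sim_rs("head") * dissim_rs("timeseries")     # ω5
          + w["w6"] * sim_rs("body") * dissim_rs("timeseries"))    # ω6
```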

  14. Conclusion
  ◮ Similarity-based interestingness of association rules
  ◮ Incorporation of relevance feedback to find interesting rules
  ◮ User-specific, automatic adaptation
  ◮ Simple relevance scoring to assess interestingness
