Discovering Interesting Patterns Through User's Interactive Feedback
Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han
This paper was presented at KDD '06
Presented by: Jeff Boisvert
April 11, 2007

"Well begun is half done." (Aristotle)

Outline
• Introduction and Background
• The Algorithm
• Examples
• Conclusions/Future Work
• Critique of Paper

Introduction and Background
• Motivation
  – Discover 'interesting' patterns in data
  – 'Interestingness' is subjective → it depends on the user
  – There are often too many patterns to assess manually
• Setting
  – Assume an available set of candidate patterns (frequent item sets, etc.)
  – Have the user rank a subset of the candidate patterns
  – Learn from the user's ranking
  – Have the user rank more patterns
  – Learn
  – ...

Introduction and Background
• SVM
  – I think we have been presented with this enough
• Clustering
  – k clusters: minimize the maximum distance of each pattern to the nearest sample in a cluster
• Distance measure: Jaccard distance between two patterns (see the sketch below)
  – D(P1, P2) = 1 - |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|, where T(P) is the set of transactions containing pattern P
• Ranking
  – Linear, e.g. 2 < 3 (difference in ranking is 3 - 2 = 1)
  – Log-linear, e.g. log(2) < log(3) (difference in ranking is log(3) - log(2) ≈ 0.176)
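Aside (not from the slides or the paper): a minimal Python sketch of the Jaccard distance above, assuming each pattern P comes with its supporting transaction-id set T(P); the function name and representation are my own.

```python
def jaccard_distance(t_p1: set, t_p2: set) -> float:
    """D(P1, P2) = 1 - |T(P1) & T(P2)| / |T(P1) | T(P2)|."""
    union = t_p1 | t_p2
    if not union:  # neither pattern occurs in any transaction
        return 0.0
    return 1.0 - len(t_p1 & t_p2) / len(union)

# Two patterns sharing 2 of 4 supporting transactions are at distance 0.5.
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```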
Outline
"An algorithm must be seen to be believed." (Donald Knuth)
• Introduction and Background
• The Algorithm ***
• Examples
• Conclusions/Future Work
• Critique of Paper

The Algorithm
• Overview
  1. Prune candidate patterns and micro-cluster them
  2. Cluster the N patterns into k clusters
  3. Present k patterns to the user for ranking
  4. Refine the model with the new user rankings
  5. Re-rank all N patterns with the new model
  6. Reduce N = a*N
  7. Go to step 2
• Flowchart: Cluster N patterns in k clusters → User ranks k patterns → Refine model → Re-rank all N patterns → N = aN → repeat
• Areas to discuss
  – (1) Preprocessing: pruning and micro-clustering
  – Clustering: see introduction
  – (2) Selecting the k patterns to present to the user
  – (3) Modeling the user's knowledge/ranking

The Algorithm (Preprocessing)
• Pruning
  – Get representative patterns from the candidates
  – Start with the maximal patterns
  – Merge candidates into maximals (representative pattern = maximal)
  – Discard the merged patterns; keep the micro-clusters (maximals)
• Micro-clustering
  – Two patterns are merged if D(P1, P2) < epsilon
  – D is the Jaccard distance
  – epsilon is provided by the user (e.g. 0.1)

The Algorithm (k patterns)
• Clustering patterns
  – We really have N micro-clusters, but...
• Selecting patterns: which k patterns to present to the user?
  – Criterion 1: the patterns presented should not be redundant
    (redundant patterns often rank close to each other; redundant = same composition/frequency)
  – Criterion 2: the patterns should help refine the model of the user's knowledge of interesting patterns (not uninteresting patterns)
• Method [Gonzalez, 1985, "Clustering to minimize the maximum intercluster distance"] (see the sketch below)
  – Randomly select the first pattern
  – Second pattern: maximum distance from the first pattern
  – Third pattern: maximum distance to the nearest of the first and second patterns
  – ...
  (Zaiane, CMPUT 695 notes)
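Aside: a sketch of the Gonzalez farthest-point selection described above, assuming a pairwise distance function dist (e.g. the Jaccard distance). This is an illustration, not the authors' code.

```python
import random

def pick_k_patterns(patterns, k, dist):
    """Gonzalez-style farthest-point selection of k representative patterns."""
    chosen = [random.randrange(len(patterns))]      # 1) random first pattern
    # Distance from every pattern to its nearest chosen pattern so far.
    nearest = [dist(p, patterns[chosen[0]]) for p in patterns]
    while len(chosen) < k:
        far = max(range(len(patterns)), key=nearest.__getitem__)
        chosen.append(far)                          # 2) farthest from chosen set
        nearest = [min(d, dist(p, patterns[far]))   # 3) refresh nearest distances
                   for d, p in zip(nearest, patterns)]
    return [patterns[i] for i in chosen]
```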
The Algorithm (refine model 1)
*** main contribution of the paper
• How to model the user's knowledge?
  – So far we have only ranked k out of N patterns...
• Interestingness
  – Difference between the observed frequency f_o(P) and the expected frequency f_e(P)
  – Observed: from the input data
  – Expected: calculated from the model of the user's knowledge, f_e(P) = M(P, θ)
  – If f_o(P) and f_e(P) differ, the pattern is interesting
• Ranking
  – If the user ranks P_i as more interesting than P_j:
    R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
  – Log-linear model: R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P)
  – This is a constraint on the model optimization

The Algorithm (refine model 2)
• Log-Linear Model
  – For a pattern P in a data set of s items (x_j = 1 if item j is in P, else 0):
    log f_e(P) = u_0 + Σ_{j=1}^{s} u_j x_j
  – Recall the user's ordering of patterns as a constraint:
    log f_o(P_1) - log f_e(P_1) > log f_o(P_2) - log f_e(P_2)
  – Define a weight vector and a new representation of the constraint above:
    w = [c, u_0, u_1, ..., u_s]
    v(P) = [log f_o(P), -1, -x_1, ..., -x_s]
    w^T v(P_1) > w^T v(P_2)
  – The user's ranking yields k such constraints

The Algorithm (re-rank all N patterns)
• Log-Linear Model (cont.)
  – The constraints w^T v(P_1) > w^T v(P_2) are fed to an SVM black box to learn w (see the sketch below)
  – We can now rank ALL N patterns with the interestingness measure
    R[f_o(P), f_e(P)] = K[v(P), w]
• Biased belief model
  – Not presented here
  – Identical formulation to the log-linear model, but assigns a user belief probability to each transaction:
    w = [p_1, ..., p_m]
    v(P) = (1 / f_o(P)) · [x_1(P), ..., x_m(P)]
    where m = number of transactions and x_k(P) = 1 if transaction k contains P

The Algorithm (Reduce N)
• Reduce the number of patterns
  – Discard some patterns: N = aN
  – a is specified by the user
  – This reduces the number of patterns presented to the user at the end
  – Stop when the maximum number of iterations (also specified by the user) is reached
• END OF ALGORITHM ☺
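Aside: a hedged sketch of the refine/re-rank step. The "SVM black box" is stood in for here by the standard RankSVM reduction (train a linear SVM on difference vectors v(P_i) - v(P_j)); scikit-learn and all names below are my choices, not the authors'.

```python
import numpy as np
from sklearn.svm import LinearSVC

def v(log_fo, x):
    """v(P) = [log f_o(P), -1, -x_1, ..., -x_s]; x is the 0/1 item vector of P."""
    return np.concatenate(([log_fo, -1.0], -np.asarray(x, dtype=float)))

def refine_model(ranked_vs):
    """Learn w from v(P) vectors ordered most- to least-interesting by the user."""
    diffs, labels = [], []
    for better, worse in zip(ranked_vs, ranked_vs[1:]):  # adjacent-pair constraints
        diffs += [better - worse, worse - better]        # encode w^T(better - worse) > 0
        labels += [1, -1]
    svm = LinearSVC(fit_intercept=False).fit(np.array(diffs), labels)
    return svm.coef_.ravel()                             # learned weight vector w

def rerank(all_vs, w):
    """Score every pattern with w^T v(P); indices of most interesting first."""
    return np.argsort(-(np.array(all_vs) @ w))
```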
The Algorithm
• Overview
  1. Pre-process: prune / micro-cluster
  2. Cluster N patterns into k clusters, present to user
  3. Refine the model with the new user rankings, re-rank patterns
  4. Reduce N = a*N
  5. Stop when the maximum number of iterations is reached
• Input parameters
  – a: shrinking ratio
  – k: number of user feedback patterns
  – niter: number of iterations (controls the number of patterns in the output)
  – epsilon: micro-clustering parameter
  – Model type: log-linear vs. biased belief
  – Ranking type: linear vs. log

Outline
"Few things are harder to put up with than the annoyance of a good example." (Mark Twain)
• Introduction and Background
• The Algorithm
• Examples ***
• Conclusions/Future Work
• Critique of Paper

Example 1
• Their results on item sets
  – Use data to simulate a person's prior knowledge
  – Partition the data into 2 subsets: one background, one for observed data
  – Background = user's prior
  – Accuracy measured by top-k overlap (see the sketch below):
    Accuracy = |top(k)_background ∩ top(k)_learned| / k
  – Data set: 49,046 transactions, 2,113 items, average transaction length of 74
  – First 1,000 transactions are the observed set
  – 8,234 closed frequent item sets; micro-clustering reduces this to 769
  – Compare the top k ranked patterns

Example 2
[Figure: worked walkthrough on a toy data set (micro-clustering reduces 35 to 19). Patterns #1, #2, ..., #k are picked at maximum distance from those already chosen; with k = 2, two patterns are presented to the user; the log-linear model is refined; with the new f_e an SVM re-ranks all 19; the patterns are sorted by rank and the top aN kept (the slide keeps the top 17 ≈ 19 × 0.9).]
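Aside: the accuracy measure in Example 1 is just top-k overlap; a one-line sketch with my own naming.

```python
def accuracy(top_k_background, top_k_learned, k):
    """|top(k)_background ∩ top(k)_learned| / k."""
    return len(set(top_k_background) & set(top_k_learned)) / k

# 3 of the 5 top background patterns are recovered by the learned model.
print(accuracy(["a", "b", "c", "d", "e"], ["a", "c", "e", "f", "g"], 5))  # 0.6
```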
Example 3
• Their results on sequences
  – 1,609 sentences
  – 967 closed sequential patterns
  – Full feedback: use k = 967

Example 4
• Their results compared to other algorithms
  – Same data as Example 3 (1,609 sentences)
  – They claim theirs is better...
  – Compared against: Selective Sampling [Yu, KDD '05] and Top-N [Shen and Zhai, SIGIR '05]

Outline
"I would never die for my beliefs because I might be wrong." (Bertrand Russell)
• Introduction and Background
• The Algorithm
• Examples
• Conclusions/Future Work ***
• Critique of Paper

Conclusions
• Conclusions
  – Interactive with the user
  – Tries to learn the user's knowledge
  – Flexible (but flexible = many parameters)
  – Does not work well with sparse data
• Proposed future work
  – Study different models for sparse data
  – Better feedback strategies to maximize learning
  – Apply to other data types/sets