Constrained Frequent Pattern Mining: A Pattern-Growth View
by Jian Pei and Jiawei Han
presentation by Rafal Rak
CMPUT 695 presentation, November 16, 2004

Scope
• Frequent Pattern Mining
• Constraints
• Pattern-Growth Approach

Outline
• Background
• Categories of Constraints
• Pattern-growth Method
• Constrained Frequent Pattern Mining
• Constrained Sequential Pattern Mining
• Conclusion

Pushing Constraints
[Diagram labels: DB, Pattern Mining, Rules, Constraints, Interesting Rules]
Drawbacks
• Pattern Mining: inefficient
• Rules: ineffective
Pushing Constraints (2)
[Diagram labels: DB, Pattern Mining, Constraints, Interesting Rules]
• Efficient and effective
• Feasible?

Apriori Approach
Apriori anti-monotone property: if a pattern is not frequent, its super-pattern can never be frequent.

TID   Items
10    a,b,c,d,f
20    b,c,d,f,g,h
30    a,c,d,e,f
40    c,e,f,g

Item    a    b    c    d    e    f    g    h
Value   40   0   -20   10  -30   30   20  -10

min(S)>15: min(df)<15 => min(adf)<15 (anti-monotone)
max(S)>35: max(df)<35, but max(adf)>35 (not anti-monotone)

Categories of Constraints
• Application point of view
  • Item constraint, e.g. dairy products in a grocery store
  • Length constraint, e.g. at least 5 keywords in documents
  • Model-based constraint, e.g. travel agency: after visiting Washington and NYC, what's next?
  • Aggregate constraint, e.g. avg(price of items) > $100

Categories of Constraints (2)
• Properties point of view
  • Anti-monotone, e.g. min(S)>v
  • Monotone, e.g. max(S)>v
  • Succinct
  • Convertible Constraints
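The two examples on the Apriori Approach slide can be checked by brute force. The sketch below is only an illustration, not part of the original presentation: the helper name `find_counterexample` and the two constraint functions are hypothetical, and the item values are the ones from the slide's Item/Value table. It searches for an itemset that violates a constraint while one of its supersets satisfies it, which is exactly what an anti-monotone constraint rules out.

```python
from itertools import combinations

# Item values copied from the Item/Value table on the Apriori Approach slide.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}


def find_counterexample(constraint):
    """Return (violating itemset, satisfying superset), or None if none exists.

    Checking one-item extensions is enough: any larger superset is reached
    through a chain of such extensions.
    """
    names = sorted(VALUE)
    for size in range(1, len(names)):
        for subset in combinations(names, size):
            if constraint([VALUE[i] for i in subset]):
                continue  # only violating itemsets can break anti-monotonicity
            for extra in names:
                if extra in subset:
                    continue
                superset = tuple(sorted(subset + (extra,)))
                if constraint([VALUE[i] for i in superset]):
                    return "".join(subset), "".join(superset)
    return None


def min_gt_15(values):
    """Hypothetical helper for the constraint min(S) > 15."""
    return min(values) > 15


def max_gt_35(values):
    """Hypothetical helper for the constraint max(S) > 35."""
    return max(values) > 35


print(find_counterexample(min_gt_15))  # None: min(S)>15 is anti-monotone on these values
print(find_counterexample(max_gt_35))  # a violating itemset with a satisfying superset
```

An exhaustive search like this is only practical because the example has eight items; it is meant to make the definition concrete, not to suggest how a miner would test constraints.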
Anti-monotone Constraint
When an itemset violates the constraint, so does every superset of it.

TID   Items
10    a,b,c,d,f
20    b,c,d,f,g,h
30    a,c,d,e,f
40    c,e,f,g

Item    a    b    c    d    e    f    g    h
Value   40   0   -20   10  -30   30   20  -10

Example, min(S)>15: min(df)<15 => min(adf)<15
Counter-example, min(S)<15: min(af)>15, but min(adf)<15

Monotone Constraint
When an itemset satisfies the constraint, so does every superset of it.

Example, max(S)>35: max(af)>35 => max(adf)>35
Counter-example, max(S)<35: max(df)<35, but max(adf)>35

Succinct Constraint
If it is possible to explicitly and precisely generate all the itemsets satisfying the constraint, then the constraint is succinct.

Example, max(S)>15: itemsets containing at least one of a, f, g
Counter-example, avg(S)<10: not succinct

Convertible Anti-monotone Constraint
A constraint is convertible anti-monotone if there is an order on items such that whenever an itemset satisfies the constraint, so does every prefix of it.

Order (value descending):
Item    a    f    g    d    b    h    c    e
Value   40   30   20   10   0   -10  -20  -30

Example, avg(S)>25: avg(afg)>25 => avg(af)>25, avg(a)>25, avg(f)>25
Counter-example, avg(S)<32: avg(afg)<32, but avg(af)>32
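The prefix condition in the convertible anti-monotone definition can be made concrete with a short sketch. The code below is illustrative only: the function names (`avg`, `prefixes`, `violating_prefixes`) are hypothetical, and the item values and the value-descending order are taken from the slide. It lists the proper prefixes of an itemset that fail a constraint, and then checks exhaustively that avg(S)>25 has no failing prefix under that order.

```python
from itertools import combinations

# Item values from the slide's Item/Value table.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# Value-descending item order used on the slide: a, f, g, d, b, h, c, e.
DESCENDING = sorted(VALUE, key=VALUE.get, reverse=True)


def avg(itemset):
    """Average value of the items in `itemset`."""
    return sum(VALUE[i] for i in itemset) / len(itemset)


def prefixes(itemset, order=DESCENDING):
    """Non-empty proper prefixes of `itemset` after sorting it by `order`."""
    ranked = sorted(itemset, key=order.index)
    return ["".join(ranked[:k]) for k in range(1, len(ranked))]


def violating_prefixes(constraint, itemset, order=DESCENDING):
    """Prefixes of a satisfying itemset that nevertheless violate `constraint`."""
    if not constraint(itemset):
        return []  # the definition only speaks about satisfying itemsets
    return [p for p in prefixes(itemset, order) if not constraint(p)]


def avg_gt_25(s):
    return avg(s) > 25


def avg_lt_32(s):
    return avg(s) < 32


print(violating_prefixes(avg_gt_25, "afg"))  # no prefix of afg violates avg(S)>25
print(violating_prefixes(avg_lt_32, "afg"))  # prefixes a and af violate avg(S)<32

# Exhaustive check over all non-empty itemsets: avg(S)>25 never has a
# violating prefix under the value-descending order.
all_itemsets = [
    "".join(c) for n in range(1, len(VALUE) + 1) for c in combinations(VALUE, n)
]
print(all(not violating_prefixes(avg_gt_25, s) for s in all_itemsets))
```

The check works because, in value-descending order, every prefix keeps the largest values, so its average is at least the average of the whole itemset.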
Convertible Monotone Constraint
A constraint is convertible monotone if there is an order on items such that whenever an itemset violates the constraint, so does every prefix of it.

TID   Items
10    a,b,c,d,f
20    b,c,d,f,g,h
30    a,c,d,e,f
40    c,e,f,g

Order (value ascending):
Item    e    c    h    b    d    g    f    a
Value  -30  -20  -10   0    10   20   30   40

Example, avg(S)>0: avg(ech)<0 => avg(ec)<0, avg(e)<0, avg(c)<0
Counter-example, avg(S)<-22: avg(ech)>-22, but avg(ec)<-22

Classification of Constraints
[Diagram labels: anti-monotone, monotone, succinct, convertible anti-monotone, convertible monotone, strongly convertible]

Pattern-growth Method
min_sup=2

Transaction DB
10    a b c d f
20    b c d f g h
30    a c d e f
40    c e f g
freq. items: a,b,c,d,e,f,g

a-projected DB
10    b c d f
30    c d e f
freq. items: c,d,f

[Diagram: the b-projected DB and c-projected DB appear next to the a-projected DB; the a-projected DB branches into the ac-projected, ad-projected and af-projected DBs]

final patterns:
a,b,c,d,e,f,g,
ac,ad,af,bc,bd,bf,cd,ce,cf,cg,df,ef,fg,
acd,acf,adf,bcd,bcf,bdf,cdf,cef,cfg,
acdf,bcdf
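The Pattern-growth Method slide can be followed step by step with a small recursive sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses plain Python lists as projected databases rather than any compressed structure, items are kept in alphabetical order, and the function name `pattern_growth` is hypothetical. For each frequent item it records the extended pattern, builds the projected database of items that follow it, and recurses.

```python
from collections import Counter

# Transaction DB from the slide, items in alphabetical order.
DB = [
    ["a", "b", "c", "d", "f"],       # TID 10
    ["b", "c", "d", "f", "g", "h"],  # TID 20
    ["a", "c", "d", "e", "f"],       # TID 30
    ["c", "e", "f", "g"],            # TID 40
]


def pattern_growth(transactions, min_sup, prefix=()):
    """Return {pattern: support} for every frequent pattern that extends `prefix`.

    Each transaction must list its items in the same global order, so that
    "the items after x" is well defined.
    """
    # Items are unique within a transaction, so these counts are supports.
    counts = Counter(item for t in transactions for item in t)
    patterns = {}
    for item in sorted(counts):
        if counts[item] < min_sup:
            continue  # infrequent in this projected DB, cannot be extended
        pattern = prefix + (item,)
        patterns["".join(pattern)] = counts[item]
        # Projected DB: what follows `item` in each transaction containing it.
        projected = [t[t.index(item) + 1:] for t in transactions if item in t]
        patterns.update(pattern_growth(projected, min_sup, pattern))
    return patterns


frequent = pattern_growth(DB, min_sup=2)
print(len(frequent))
print(sorted(frequent, key=lambda p: (len(p), p)))
```

On this database the sketch finds 31 frequent patterns. The a-projected DB it builds is `[b, c, d, f]` and `[c, d, e, f]`, matching the slide, and the result includes acf, which is supported by transactions 10 and 30.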
Constrained Frequent Pattern Mining
Constraint C: avg(S) ≥ 25, min_sup=2

TID   Items
10    a,b,c,d,f
20    b,c,d,f,g,h
30    a,c,d,e,f
40    c,e,f,g

Item    a    f    g    d    b    h    c    e
Value   40   30   20   10   0   -10  -20  -30

Transaction DB (items in value-descending order)
10    a f d b c
20    f g d b h c
30    a f d c e
40    f g c e
freq. items: a,f,g,d,b,c,e
C(a)=true, C(f)=true, C(g)=false

a-projected DB
10    f d b c
30    f d c e
freq. items: f,d,c
C(af)=true, C(ad)=true, C(ac)=false

f-projected DB
10    d b c
20    g d b c
30    d c e
40    g c e
freq. items: g,d,b,c,e
C(fg)=true, C(fd)=false

af-projected DB
10    d c
30    d c
freq. items: d,c
C(afd)=true, C(afc)=false

ad-projected DB
10    c
30    c
freq. items: c
C(adc)=false

fg-projected DB
20    d b c
40    c e
freq. items: c
C(fgc)=false

final patterns: a, f, af, ad, fg, afd

Sequential Pattern Mining

Customer ID   Transaction Time   Items Bought
10            Nov. 10            a
              Nov. 13            bc
              Nov. 15            e
20            Oct. 30            e
              Nov. 3             ab
              Nov. 12            bc
              Nov. 13            d
              Nov. 14            d
30            Oct. 30            c
              Nov. 4             aef
              Nov. 13            abc
              Nov. 16            dd
40            Oct. 25            a
              Nov. 5             d
              Nov. 11            d
              Nov. 13            c
              Nov. 14            b

SID   Sequence
10    <a(bc)e>
20    <e(ab)(bc)dd>
30    <c(aef)(abc)dd>
40    <addcb>

• <e(ab)(bc)dd> is a 5-sequence: length = number of transactions
• transactions in order
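The constrained mining walkthrough for avg(S) ≥ 25 can be reproduced with a short sketch. It is an illustration under stated assumptions rather than the authors' algorithm: projected databases are plain lists, the function names `constrained_growth` and `mine` are hypothetical, and the item values and transactions are the ones on the slide. Items are sorted by descending value, so the constraint behaves as convertible anti-monotone and can be tested on each prefix as it is grown.

```python
from collections import Counter

# Item values and transactions from the slide.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
DB = [
    ["a", "b", "c", "d", "f"],       # TID 10
    ["b", "c", "d", "f", "g", "h"],  # TID 20
    ["a", "c", "d", "e", "f"],       # TID 30
    ["c", "e", "f", "g"],            # TID 40
]


def constrained_growth(transactions, min_sup, min_avg, prefix=()):
    """Frequent patterns extending `prefix` whose average value is >= min_avg.

    Transactions must already be in value-descending item order.
    """
    counts = Counter(item for t in transactions for item in t)
    found = []
    for item in sorted(counts, key=VALUE.get, reverse=True):
        if counts[item] < min_sup:
            continue  # infrequent item, skip it
        pattern = prefix + (item,)
        if sum(VALUE[i] for i in pattern) / len(pattern) < min_avg:
            # C(pattern)=false. Later items have lower values, so they would
            # fail too, and no pattern with a failing prefix can satisfy C.
            break
        found.append("".join(pattern))
        # Projected DB: the items that follow `item` in each transaction.
        projected = [t[t.index(item) + 1:] for t in transactions if item in t]
        found.extend(constrained_growth(projected, min_sup, min_avg, pattern))
    return found


def mine(db, min_sup, min_avg):
    """Reorder each transaction by descending value, then grow patterns."""
    ordered = [sorted(t, key=VALUE.get, reverse=True) for t in db]
    return constrained_growth(ordered, min_sup, min_avg)


result = mine(DB, min_sup=2, min_avg=25)
print(result)
```

The `break` mirrors the slide: mining stops at C(g)=false on the top level and at C(fd)=false in the f-projected DB, so no g-projected or fd-projected database is ever built.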