Mining Free Itemsets under Constraints
By Jean-Francois Boulicaut and Baptiste Jeudy
International Database Engineering and Application Symposium

Content
• Introduction
• Constrained itemset mining
  • Apriori revisit
  • Anti-monotone constraints
  • Monotone constraints
  • Generic algorithm
• Frequent closed itemset mining
  • CLOSE algorithm
  • Incorporating constraints into Apriori
• Conclusion

Introduction
• Frequent itemset mining
  • A set of items is referred to as an itemset
  • For an item (or itemset) X, $\mathrm{Support}(X) = \frac{\#X}{n}$, where $\#X$ is the number of transactions containing X and n is the number of transactions
  • A minimum support threshold r is given
  • A frequent itemset is an itemset whose support is at least the minimum support
  • Given a database, find all the frequent itemsets

Introduction
• Problems with frequent itemset mining algorithms
  • The computation may be intractable for a user-given frequency threshold: the number of frequent itemsets may explode
  • Lack of focus leads to a huge output of frequent itemsets

Introduction
• Two approaches to tackle these problems
  • Constraint-based extraction of the frequent itemsets: only a subset of the collection of frequent itemsets is interesting
  • Condensed representation of frequent itemsets: extract a subset of the frequent patterns and regenerate the whole collection when necessary

Introduction
• Constraint-based extraction of frequent itemsets
  • Syntactic constraints
    • e.g. an item must not appear in the itemsets
  • Constraints related to objective measures of interestingness
    • e.g. the itemsets must be frequent
  • Push constraint checking into algorithms
    • Anti-monotone constraints
    • Monotone constraints
• Decrease the size of output
• Improve user guidance

Introduction
• Condensed representation of frequent itemsets
  • Extract a particular subset of the frequent itemset collection
  • The condensed subset is much smaller than the original collection
  • It can be extracted efficiently
  • The whole collection of frequent itemsets can be regenerated from it

Introduction
• Main idea of the paper
  • Combine the above two approaches into one algorithm
  • This algorithm is based on the structure of Apriori

Summary of paper
• Definition of constraints
  • $T$: transactional database
  • $2^{Items}$: set of all itemsets
  • $C$: constraint
  • $S \in 2^{Items}$: itemset
  • $I$: subset of $2^{Items}$
  • $S$ satisfies $C$ in $T$ iff $C(S, T) = true$
  • $SAT_C(I) = \{S \in I,\ S \text{ satisfies } C\}$
  • $SAT_C$ denotes $SAT_C(2^{Items})$

Summary of paper
• Constrained itemset mining
  • $T$: transactional database
  • $C$: constraint
  • Computation of the collection of itemsets that satisfy $C$, together with their frequencies: $R = \{(S, F(S)),\ S \in SAT_C\}$
  • $C_{freq}(S) \equiv F(S) \ge r$: the frequency of an itemset must be at least r
  • Use Apriori for constrained itemset mining where $C$ is $C_{freq}$

Summary of paper
• Example database

  TID   Items
  1     ABCD
  2     AC
  3     AC
  4     ABCD
  5     BC
  6     ABC

  Itemset   Support (TIDs)   Frequency
  A         1,2,3,4,6        0.83
  B         1,4,5,6          0.67
  AB        1,4,6            0.5
  AC        1,2,3,4,6        0.83
  CD        1,4              0.33
  ACD       1,4              0.33

• With $r = 0.6$: $SAT_{C_{freq}} = \{A, B, C, AC, BC\}$
• If $C_{size}(S) \equiv |S| \le 2$ and $C_{miss}(S) \equiv B \notin S$, then
  $SAT_{C_{size} \wedge C_{miss}} = \{A, C, D, AC, AD, CD\}$ and
  $SAT_{C_{size} \wedge C_{miss} \wedge C_{freq}} = \{A, C, AC\}$
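
The SAT sets of the example can be checked by brute force. The following is a minimal sketch, not the paper's algorithm: it enumerates every non-empty itemset over the six-transaction example database and filters by a constraint. The names `TRANSACTIONS`, `frequency`, `sat`, `c_freq`, `c_size` and `c_miss` are illustrative choices and do not come from the slides.

```python
from itertools import combinations

# Example database from the slide: TID -> items of the transaction.
TRANSACTIONS = {1: "ABCD", 2: "AC", 3: "AC", 4: "ABCD", 5: "BC", 6: "ABC"}
ITEMS = sorted(set("".join(TRANSACTIONS.values())))

def frequency(itemset):
    """F(S): fraction of transactions containing every item of S."""
    hits = [tid for tid, t in TRANSACTIONS.items() if set(itemset) <= set(t)]
    return len(hits) / len(TRANSACTIONS)

def sat(constraint):
    """SAT_C: all non-empty itemsets over ITEMS satisfying the constraint."""
    return {
        "".join(combo)
        for k in range(1, len(ITEMS) + 1)
        for combo in combinations(ITEMS, k)
        if constraint("".join(combo))
    }

def c_freq(s, r=0.6):
    """C_freq(S): F(S) >= r."""
    return frequency(s) >= r

def c_size(s):
    """C_size(S): |S| <= 2."""
    return len(s) <= 2

def c_miss(s):
    """C_miss(S): B is not in S."""
    return "B" not in s

print(sorted(sat(c_freq)))
print(sorted(sat(lambda s: c_size(s) and c_miss(s))))
print(sorted(sat(lambda s: c_size(s) and c_miss(s) and c_freq(s))))
```

Exhaustive enumeration is exponential in the number of items, so it is only useful for checking a small example like this one; the levelwise algorithms in the following slides avoid it.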

Summary of paper - Apriori
• Apriori Algorithm
  1. $C^g_1 := Items_1$; $L_0 := \emptyset$
  2. $k := 1$
  3. while $C^g_k \neq \emptyset$ do
  4.   $C_k := \text{safe-pruning-on}(C^g_k, L_{k-1})$
  5.   $L_k := SAT_{C_{freq}}(C_k)$
  6.   $C^g_{k+1} := generate_{apriori}(L_k)$
  7.   $k := k + 1$
  8. $\bigcup_{i=0}^{k-1} L_i$
• Phase 1 (step 4) - candidate safe pruning: eliminate candidates for which a subset of length k-1 is not frequent
• Phase 2 (step 5) - frequency constraint (database scan)
• Phase 3 (step 6) - candidate generation for level k+1: fuse two elements that share the same k-1 first items
  • $generate_{apriori}(L_k) = \{A \cup B$, where $A, B \in L_k$, and A and B share the k-1 first items (in lexicographic order)$\}$
• The completeness of Apriori relies on the anti-monotonicity of the constraint $C_{freq}$

Anti-monotone constraints
• Definition: an anti-monotone constraint is a constraint C such that for all itemsets S, S':
  $(S' \subseteq S \wedge S \text{ satisfies } C) \Rightarrow S' \text{ satisfies } C$
• If S does not satisfy $C_{am}$, no superset of S satisfies $C_{am}$
• Examples:
  • $sum(S, price) \le v$
  • $C_{freq}$
• A disjunction or conjunction of anti-monotone constraints is an anti-monotone constraint

Anti-monotone constraints
• Apriori can be changed:
  • Let $C_{am}$ be an anti-monotone constraint. If Step 5 of Apriori is replaced by $L_k := SAT_{C_{am}}(C_k)$, it is still correct and complete
• Apriori can be used to mine constrained itemsets when the given constraint is anti-monotone
• What about monotone constraints?

Monotone Constraints
• Definition: $C_m$ is monotone if
  $\forall S \in 2^{Items},\ C_m(S) \text{ is true} \Rightarrow \forall S' \supset S,\ C_m(S') \text{ is true}$
• Example: $sum(S, price) \ge v$
• Given a monotone constraint $C_m$, simply replacing Step 5 in Apriori with $L_k := SAT_{C_m}(C_k)$ leads to the loss of the completeness of Apriori
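
Steps 1 to 8 with an arbitrary anti-monotone constraint in Step 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it reuses the six-transaction example database from the earlier slide, keeps the database in memory instead of scanning it, and the function names (`frequency`, `safe_pruning`, `generate_apriori`, `apriori`) are chosen here for readability.

```python
from itertools import combinations

# Six-transaction example database (assumed to be the one from the example slide).
TRANSACTIONS = [set("ABCD"), set("AC"), set("AC"), set("ABCD"), set("BC"), set("ABC")]

def frequency(itemset):
    """F(S): fraction of transactions that contain S."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def safe_pruning(candidates, previous_level, k):
    """Phase 1: drop candidates having a (k-1)-subset outside L_{k-1}."""
    if k == 1:
        return set(candidates)
    return {
        c for c in candidates
        if all(frozenset(sub) in previous_level for sub in combinations(c, k - 1))
    }

def generate_apriori(level):
    """Phase 3: fuse two k-itemsets sharing their first k-1 items."""
    ordered = sorted(tuple(sorted(s)) for s in level)
    return {
        frozenset(a) | frozenset(b)
        for a, b in combinations(ordered, 2)
        if a[:-1] == b[:-1]
    }

def apriori(c_am):
    """Levelwise search for all itemsets satisfying an anti-monotone c_am."""
    items = sorted(set().union(*TRANSACTIONS))
    candidates = {frozenset([i]) for i in items}   # step 1
    previous_level, levels, k = set(), [], 1       # steps 1-2
    while candidates:                              # step 3
        pruned = safe_pruning(candidates, previous_level, k)  # step 4
        level = {s for s in pruned if c_am(s)}     # step 5 (Phase 2)
        levels.append(level)
        candidates = generate_apriori(level)       # step 6
        previous_level = level
        k += 1                                     # step 7
    return set().union(*levels)                    # step 8

result = apriori(lambda s: frequency(s) >= 0.6)
print(sorted("".join(sorted(s)) for s in result))
```

Passing a conjunction such as `lambda s: frequency(s) >= 0.6 and len(s) <= 2 and "B" not in s` works unchanged, because a conjunction of anti-monotone constraints is itself anti-monotone.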

Monotone Constraints
• The generation step in Apriori must be complete, i.e., it must not miss any itemset satisfying C
• The pruning step (Phase 1) must be correct, i.e., it must not prune an itemset that satisfies C
• Example
  • Assume $C(S) \equiv C \in S$. Itemset ABC should be generated by $generate_{apriori}$ from AB and AC, but since $C(AB) = false$, ABC is not generated, whereas $C(ABC) = true$
  • Assume $C(S) \equiv A \in S$. Itemset ABC is correctly generated by $generate_{apriori}$ from AB and AC, but since $C(BC) = false$, ABC is incorrectly pruned, whereas $C(ABC) = true$
• The generation step and the pruning step need to be modified in order to include monotone constraints

Monotone Constraints
• Some definitions used in the modified generation procedure
  • Negative border: if $C_{am}$ denotes an anti-monotone constraint, $Bd_{C_{am}}$ is the collection of the minimal itemsets that do not satisfy $C_{am}$
  • $C_m$ denotes a monotone constraint; it is the negation of an anti-monotone constraint $C'_{am}$, so $C_m$ equals $\neg C'_{am}$

Monotone Constraints
• Generation procedure $generate_m$
  • $generate_1(L_k) = \{A \cup B$, where $A \in L_k$ and B is a 1-itemset$\}$
  • $generate_2(L_k) = \{A \cup B$, where $A, B \in L_k\}$
  • Assume $C = C_{am} \wedge \neg C'_{am}$ and $ms = \max_{S \in Bd_{C'_{am}}} |S|$
  • $generate_m(L_0) = Bd_{C'_{am}} \cap Items_1$
  • For $k \ge 1$:
    • If $k < ms$, $generate_m(L_k) = generate_1(L_k) \cup (Bd_{C'_{am}} \cap Items_{k+1})$
    • If $k = ms$, $generate_m(L_k) = generate_1(L_k)$
    • If $k > ms$, $generate_m(L_k) = generate_2(L_k)$
  • This generation procedure is complete and ensures that every candidate itemset satisfies $C_m$ ($\neg C'_{am}$)
  • We do not need to verify the monotone constraint after this generation procedure
• Notation
  • $C_{am}$ denotes an anti-monotone constraint
  • $\neg C'_{am}$ denotes a monotone constraint
  • $Bd_{C'_{am}}$ denotes the collection of the minimal itemsets that do not satisfy $C'_{am}$

Monotone Constraints
• Pruning procedure $prune_m$
  • For all $S \in C^g_{k+1}$ and for all $S' \subset S$ such that $|S'| = k$
    do if $S' \notin L_k$ and $\neg C'_{am}(S') = true$
    then delete S from $C^g_{k+1}$
  • $prune_m$ is correct and complete
  • It is correct because it does not prune any itemset that satisfies $C = C_{am} \wedge \neg C'_{am}$. Its completeness means that if an itemset is not pruned, then every proper subset of that itemset satisfies $C_{am}$
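
The modified generation and pruning procedures can be sketched on the six-transaction example database used earlier. This is a hedged illustration rather than the authors' implementation. It assumes that $Items_k$ means the itemsets of size k, that $generate_1$ and $generate_2$ keep only unions of size k+1, and that the empty itemset satisfies $C'_{am}$. The negative border is computed by brute force, and the loop stops once no candidate remains and $k \ge ms$. All function names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [set("ABCD"), set("AC"), set("AC"), set("ABCD"), set("BC"), set("ABC")]
ITEMS = sorted(set().union(*TRANSACTIONS))

def frequency(s):
    """F(S): fraction of transactions that contain S."""
    return sum(s <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def negative_border(c_am_prime):
    """Minimal itemsets violating an anti-monotone constraint (brute force)."""
    border = set()
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            s = frozenset(combo)
            # Minimal: S violates the constraint, every immediate subset satisfies it.
            if not c_am_prime(s) and all(c_am_prime(s - {x}) for x in s):
                border.add(s)
    return border

def generate_m(level, k, border, ms):
    """Candidates of size k+1 built from L_k, following the three cases."""
    if k == 0:
        return {s for s in border if len(s) == 1}
    if k <= ms:
        # generate_1: extend each itemset of L_k by one item.
        cands = {a | {x} for a in level for x in ITEMS if x not in a}
        if k < ms:
            cands |= {s for s in border if len(s) == k + 1}
        return cands
    # generate_2: union of two itemsets of L_k (size k+1 assumed).
    return {a | b for a in level for b in level if len(a | b) == k + 1}

def prune_m(cands, level, k, c_m):
    """Delete S if some k-subset satisfies c_m but is missing from L_k."""
    return {
        s for s in cands
        if not any(
            frozenset(sub) not in level and c_m(frozenset(sub))
            for sub in combinations(s, k)
        )
    }

def mine(c_am, c_am_prime):
    """Itemsets satisfying c_am AND NOT c_am_prime, searched levelwise."""
    def c_m(s):
        return not c_am_prime(s)
    border = negative_border(c_am_prime)
    ms = max((len(s) for s in border), default=0)
    level, k, result = set(), 0, set()
    while True:
        cands = prune_m(generate_m(level, k, border, ms), level, k, c_m)
        if not cands and k >= ms:
            break
        # Only the anti-monotone part is tested: candidates already satisfy c_m.
        level = {s for s in cands if c_am(s)}
        result |= level
        k += 1
    return result

found = mine(lambda s: frequency(s) >= 0.6, lambda s: "C" not in s)
print(sorted("".join(sorted(s)) for s in found))
```

Here the monotone constraint is $C \in S$, the first constraint of the example slide, written as the negation of the anti-monotone constraint $C \notin S$. Its negative border is the single itemset C, so $ms = 1$ and the search starts directly from C instead of from every 1-itemset.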