H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases

J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang
Int. Conf. on Data Mining (ICDM'01), San Jose, CA
Presented by Leonid Mocofan

Paper's goals
■ Introduce a new data structure: H-struct
■ Introduce a new mining algorithm: H-mine
■ Introduce a new data mining methodology: space-preserving mining

Why a new algorithm?
■ Two current algorithm categories:
– Candidate generation-and-test approach:
  • E.g., Apriori algorithm
– Pattern growth methods:
  • E.g., FP-growth, TreeProjection
■ They have performance bottlenecks:
– Huge space required for mining
– Real databases contain all the cases (both sparse and dense data)
– Large applications need more scalability

H-mine characteristics
■ It has limited and precisely predictable space overhead.
■ It can scale up to very large databases by using database partitioning.
■ When the data sets are dense, it can switch to FP-trees to continue the mining process.

Frequent pattern mining: introduction
■ set of items: I = {x_1, …, x_n}
■ itemset X: subset of items (X ⊆ I)
■ transaction: T = (tid, X)
■ transaction database: TDB
■ support(X): number of transactions in TDB containing X

Frequent pattern mining: definitions
■ Frequent pattern: for a transaction database TDB and a support threshold min_sup, X is a frequent pattern if and only if sup(X) ≥ min_sup.
■ Frequent pattern mining: finding the complete set of frequent patterns in a given transaction database with respect to a given support threshold.

H-mine algorithm
1. H-mine(Mem): a memory-based, efficient pattern-growth algorithm.
2. H-mine: based on H-mine(Mem), handles large databases by first partitioning the database.
3. For dense data sets, H-mine is integrated with FP-growth dynamically.

H-mine(Mem) – Example
Minimum support threshold is 2.

Trans ID   Items          Frequent-item projection
100        c,d,e,f,g,i    c,d,e,g
200        a,c,d,e,m      a,c,d,e
300        a,b,d,e,g,k    a,d,e,g
400        a,c,d,h        a,c,d

F-list: a-c-d-e-g

Header table H
Item       a   c   d   e   g
Count      3   3   4   3   2

H-struct (frequent projections)
100        c d e g
200        a c d e
300        a d e g
400        a c d
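The definitions and the example table above can be checked with a few lines of Python. This is only a minimal sketch: the names `TDB`, `support`, and `frequent_item_projections` are illustrative and not from the paper, and the F-list is assumed to be in alphabetical order, which matches the a-c-d-e-g order on the slide.

```python
from collections import Counter

# Example transaction database from the slide (transaction ID -> items).
TDB = {
    100: ["c", "d", "e", "f", "g", "i"],
    200: ["a", "c", "d", "e", "m"],
    300: ["a", "b", "d", "e", "g", "k"],
    400: ["a", "c", "d", "h"],
}
MIN_SUP = 2


def support(itemset, tdb):
    """Number of transactions in tdb that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for items in tdb.values() if wanted <= set(items))


def frequent_item_projections(tdb, min_sup):
    """Return (header counts, F-list, frequent-item projections)."""
    # Count each item once per transaction.
    counts = Counter(item for items in tdb.values() for item in set(items))
    # Assumption: the F-list is sorted alphabetically, as on the slide.
    f_list = sorted(item for item, n in counts.items() if n >= min_sup)
    header = {item: counts[item] for item in f_list}
    # Keep only frequent items, in F-list order, for every transaction.
    projections = {
        tid: [item for item in f_list if item in items]
        for tid, items in tdb.items()
    }
    return header, f_list, projections


header, f_list, projections = frequent_item_projections(TDB, MIN_SUP)
print(header)            # counts of the header table H
print("-".join(f_list))  # the F-list
print(projections)       # the rows stored in the H-struct
```

The infrequent items (b, f, h, i, k, m) drop out, so only the frequent-item projections need to be kept in memory.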

H-mine(Mem) – Example (continued)

Header table H
Item       a   c   d   e   g
Count      3   3   4   3   2

Header table H_a
Item       c   d   e   g
Count      2   3   2   1

Header table H_ac
Item       d   e
Count      2   1

[Figure: header table H_a and ac-queue]
[Figure: header table H_ac]
[Figure: header table H_a and ad-queue]
[Figure: adjusted hyper-links after mining the a-projected database]
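The header tables and queues in the example can be imitated with a small recursive sketch. It is a simplification under stated assumptions, not the authors' implementation: instead of re-threading hyper-links between queues, it records for every item a queue of (row, position) pointers into the stored projections, so no transaction is ever copied. The names `PROJECTIONS` and `h_mine_mem` are hypothetical.

```python
from collections import defaultdict

# Frequent-item projections of the example (rows 100, 200, 300, 400),
# each already sorted in F-list order a-c-d-e-g.
PROJECTIONS = [
    ["c", "d", "e", "g"],
    ["a", "c", "d", "e"],
    ["a", "d", "e", "g"],
    ["a", "c", "d"],
]


def h_mine_mem(projections, min_sup):
    """Return {pattern tuple: support} for all frequent patterns."""
    results = {}

    def mine(prefix, pointers):
        # Each pointer (row, start) stands for the suffix
        # projections[row][start:], i.e. one row of the
        # prefix-projected database. Rows are referenced, not copied.
        queues = defaultdict(list)  # header table: item -> pointer queue
        for row, start in pointers:
            items = projections[row]
            for pos in range(start, len(items)):
                queues[items[pos]].append((row, pos + 1))
        for item in sorted(queues):
            queue = queues[item]
            # Items are unique per row, so queue length = support.
            if len(queue) >= min_sup:
                pattern = prefix + (item,)
                results[pattern] = len(queue)
                mine(pattern, queue)

    mine((), [(row, 0) for row in range(len(projections))])
    return results


patterns = h_mine_mem(PROJECTIONS, min_sup=2)
for pattern, sup in sorted(patterns.items()):
    print("".join(pattern), sup)
```

Calling `mine` with prefix `("a",)` builds exactly the counts of header table H_a (c:2, d:3, e:2, g:1); g is then skipped because its count is below the threshold.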

H-mine: Mining large databases
■ TDB: transaction database (size n)
■ Minimum support threshold: min_sup
■ Find L, the set of frequent items
■ TDB is partitioned into k parts (TDB_i, 1 ≤ i ≤ k), where n_i is the size of TDB_i
■ Apply H-mine(Mem) to TDB_i with minimum support threshold ⌊min_sup ∗ n_i / n⌋
■ Combine F_i, the set of locally frequent patterns in TDB_i, to get the globally frequent patterns

H-mine – Example
■ TDB split into P_1, P_2, P_3, P_4
■ Minimum support threshold 100

Local freq. pat.   Partitions           Accumulated sup. cnt
ab                 P_1, P_2, P_3, P_4   280
ac                 P_1, P_2, P_3, P_4   320
ad                 P_1, P_2, P_3, P_4   260
abc                P_1, P_3, P_4        120
abcd               P_1, P_4             40
…                  …                    …

■ Frequent patterns: ab, ac, ad, abc

Performance
■ H-mine has better runtime performance on both sparse and dense data than FP-growth and Apriori
■ H-mine has better space usage on both sparse and dense data than FP-growth and Apriori
■ H-mine performs well with very large databases too
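The partition-based procedure for large databases can be sketched as follows. Several parts are assumptions made for illustration: `mine_local` is a brute-force stand-in for H-mine(Mem), the local threshold is floored and clamped to at least 1, and the final step recounts every candidate over the whole database rather than accumulating per-partition counts as in the table above.

```python
from itertools import combinations
from math import floor


def support(itemset, transactions):
    """Number of transactions (frozensets) that contain itemset."""
    return sum(1 for t in transactions if itemset <= t)


def mine_local(transactions, min_sup):
    """Brute-force stand-in for H-mine(Mem) on one partition."""
    items = sorted(set().union(*transactions))
    found = set()
    for size in range(1, len(items) + 1):
        level = {
            frozenset(c)
            for c in combinations(items, size)
            if support(frozenset(c), transactions) >= min_sup
        }
        if not level:  # no frequent k-itemsets, so none of size k+1
            break
        found |= level
    return found


def mine_partitioned(partitions, min_sup):
    """Mine each partition locally, then verify candidates globally."""
    n = sum(len(part) for part in partitions)
    candidates = set()
    for part in partitions:
        # Local threshold proportional to partition size (assumed floor,
        # clamped to 1 so that absent itemsets are never candidates).
        local_min_sup = max(1, floor(min_sup * len(part) / n))
        candidates |= mine_local(part, local_min_sup)
    # Verification scan: count every candidate over the whole database.
    everything = [t for part in partitions for t in part]
    counted = {c: support(c, everything) for c in candidates}
    return {c: s for c, s in counted.items() if s >= min_sup}


# The four-transaction example, split into two partitions of two rows.
partitions = [
    [frozenset("cdeg"), frozenset("acde")],
    [frozenset("adeg"), frozenset("acd")],
]
frequent = mine_partitioned(partitions, min_sup=2)
print(sorted("".join(sorted(p)) for p in frequent))
```

Every globally frequent pattern must reach the proportional threshold in at least one partition, so the union of the local results is a complete candidate set; the verification scan only removes false positives.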

Conclusions
H-mine:
■ has high performance
■ is scalable in all kinds of data
■ has very small space overhead
■ can dynamically adapt to input data
■ introduces structure- and space-preserving mining methodology

Bibliography
■ J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, "H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases", Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
■ J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation", ACM-SIGMOD 2000, Dallas, TX, May 2000.
■ Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2001.
