Reductions for Frequency-Based Data Mining Problems
Stefan Neumann & Pauli Miettinen

Maximal Frequent Patterns
• A pattern is a subset of the data entities
  • e.g., itemsets, subgraphs, subsequences, …
• A pattern is frequent if it appears sufficiently often in the data
• A frequent pattern is maximal if it is not contained in any other frequent pattern
• Studied since the 1990s
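To make the definitions concrete for itemsets, here is a minimal brute-force sketch. It assumes transactions are Python sets and that the threshold `minsup` is an absolute number of transactions; the function names are hypothetical, not from the talk. Because it enumerates every candidate itemset, it is only suitable for toy data.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {itemset: support} for all non-empty itemsets with support >= minsup.

    Brute force over every subset of the item universe (toy data only).
    `minsup` is an absolute number of transactions.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= minsup:
                result[candidate] = support
    return result


def maximal(patterns):
    """Keep the patterns that are not a proper subset of another pattern."""
    return {p for p in patterns if not any(p < q for q in patterns)}


transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
freq = frequent_itemsets(transactions, minsup=2)
print(sorted(sorted(p) for p in maximal(freq)))
# → [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

Here {a}, {b} and {c} are frequent but not maximal, because each is contained in a frequent pair.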

Computational Complexity
• The computational complexity of maximal pattern mining is, surprisingly, largely unknown
• There are potentially exponentially many maximal patterns ⇒ listing them all takes exponential time
• More fine-grained answers:
  • Time w.r.t. input and output size (enumeration complexity; Johnson et al. 1988)
  • Time needed to count the number of maximal patterns (counting complexity; Valiant 1979)
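The exponential blow-up is easy to reproduce. The sketch below uses a hypothetical toy dataset (`complement_dataset`): n transactions over items 0..n-1, where transaction i omits item i. With minsup = n/2 it has C(n, n/2) maximal frequent itemsets, so the output alone grows exponentially in n while the input has only n(n-1) item occurrences.

```python
from itertools import combinations
from math import comb


def count_maximal_frequent(transactions, minsup):
    """Brute-force count of maximal frequent itemsets (toy sizes only)."""
    items = sorted(set().union(*transactions))
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if sum(1 for t in transactions if set(c) <= t) >= minsup
    ]
    return sum(1 for p in frequent if not any(p < q for q in frequent))


def complement_dataset(n):
    """n transactions over items 0..n-1; transaction i omits item i."""
    return [set(range(n)) - {i} for i in range(n)]


for n in range(2, 11, 2):
    # X is contained in transaction i exactly when i is not in X, so
    # supp(X) = n - |X|. With minsup = n/2 the maximal frequent itemsets
    # are exactly the C(n, n/2) subsets of size n/2.
    got = count_maximal_frequent(complement_dataset(n), minsup=n // 2)
    print(n, got, comb(n, n // 2))
```

Each printed line shows n, the brute-force count, and C(n, n/2); the last two columns agree.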

Reductions
• A can be reduced to B if we can solve A effectively using an algorithm that solves B
  • “B is at least as hard as A”
• In this talk: maximality-preserving reductions between frequent pattern mining problems
  • “Maximal X mining is at least as hard as maximal Y mining”
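As an illustration of the definition (not necessarily one of the constructions from the talk), maximal itemset mining can be reduced to mining sequences without repetition: encode each transaction as its items in sorted order, call a sequence miner, and decode each returned sequence back into a set. The sketch assumes items are mutually comparable; the brute-force sequence miner is a hypothetical stand-in for any algorithm that solves B.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(x in it for x in pattern)


def max_frequent_subsequences(sequences, minsup):
    """Solver for B: brute-force maximal frequent subsequences (toy sizes)."""
    candidates = {
        sub
        for s in sequences
        for k in range(1, len(s) + 1)
        for sub in combinations(s, k)  # combinations keep the original order
    }
    frequent = [
        p for p in candidates
        if sum(1 for s in sequences if is_subsequence(p, s)) >= minsup
    ]
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


def max_frequent_itemsets_via_sequences(transactions, minsup):
    """Solve A (maximal itemsets) with the solver for B (sequences)."""
    encoded = [tuple(sorted(t)) for t in transactions]     # encode the input
    patterns = max_frequent_subsequences(encoded, minsup)  # run the solver for B
    return {frozenset(p) for p in patterns}                 # decode each pattern


transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
result = max_frequent_itemsets_via_sequences(transactions, 2)
print(sorted(sorted(p) for p in result))
# → [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

Maximality is preserved because every pattern with non-zero support is a subsequence of a sorted sequence and hence itself sorted, and for sorted sequences without repetition, being a subsequence coincides with being a subset.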

State of the Art
[Diagram: known reductions among the problems below; A → B = A can be reduced to B]
• MaxFIS: itemsets
• MaxSQS: sequences with no repetition
• MaxFS(T): undirected trees
• MaxFS(PLN): planar undirected graphs
• MaxFS(BDG3): undirected graphs with degree ≤ 3
• MaxFS(BTW3): undirected graphs with treewidth ≤ 3
• MaxFS(DAG): directed acyclic graphs
• MaxFS(DirG): directed graphs
• MaxFS(G): uniquely labelled undirected graphs

Maximality-Preserving Reductions
[Diagram: reductions among MaxSQS, MaxFS(DAG), MaxFS(BTW3), MaxFS(BDG3), MaxFS(T), MaxFS(PLN), MaxFS(DirG), MaxFIS and MaxFS(G); A → B = A can be reduced to B]
• These reductions preserve enumeration and counting complexity
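A reduction preserves counting complexity when the maximal patterns of A correspond one-to-one, and efficiently, to the maximal patterns of B. One simple example of this shape (an illustration, not necessarily a reduction from the diagram) maps each transaction to a uniquely labelled star graph: a hub vertex joined to one leaf per item. The sketch assumes the hypothetical hub label `CENTER` never occurs as an item, and its brute-force graph miner is only usable on toy inputs.

```python
from itertools import combinations

CENTER = ("center",)  # hypothetical hub label; assumed never to be an item


def encode(transaction):
    """Itemset -> uniquely labelled star graph given as (vertices, edges)."""
    vertices = frozenset(transaction) | {CENTER}
    edges = frozenset(frozenset({CENTER, item}) for item in transaction)
    return vertices, edges


def is_connected(vertices, edges):
    """Depth-first search: does the edge set connect all given vertices?"""
    adj = {x: set() for x in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == set(vertices)


def max_frequent_subgraphs(graphs, minsup):
    """Brute-force maximal frequent connected subgraphs (toy sizes only).

    With unique labels, a pattern (V, E) occurs in a graph (VG, EG) iff
    V <= VG and E <= EG, so no isomorphism test is needed.
    """
    all_vertices = frozenset().union(*(vg for vg, _ in graphs))
    all_edges = list(frozenset().union(*(eg for _, eg in graphs)))
    # Connected patterns: single vertices, or edge sets spanning their endpoints.
    candidates = [(frozenset({v}), frozenset()) for v in all_vertices]
    for k in range(1, len(all_edges) + 1):
        for es in combinations(all_edges, k):
            vs = frozenset().union(*es)
            if is_connected(vs, es):
                candidates.append((vs, frozenset(es)))
    frequent = [
        (vs, es) for vs, es in candidates
        if sum(1 for vg, eg in graphs if vs <= vg and es <= eg) >= minsup
    ]
    return [
        (vs, es) for vs, es in frequent
        if not any((vs, es) != (ws, fs) and vs <= ws and es <= fs
                   for ws, fs in frequent)
    ]


def max_frequent_itemsets_via_graphs(transactions, minsup):
    """Reduce maximal itemset mining (A) to uniquely labelled graph mining (B)."""
    graphs = [encode(t) for t in transactions]         # encode the input
    patterns = max_frequent_subgraphs(graphs, minsup)  # run the solver for B
    return {vs - {CENTER} for vs, _ in patterns}       # decode each pattern


transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
result = max_frequent_itemsets_via_graphs(transactions, minsup=2)
print(sorted(sorted(s) for s in result))
# → [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

A lone leaf vertex is never maximal, since the edge joining it to the hub has the same support, so every maximal pattern contains the hub and decodes to exactly one maximal frequent itemset. The number of maximal patterns is therefore the same on both sides.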

Impressed?
• Why not more reductions?
• Example: from MaxFS(G) to MaxFIS
  • Each edge {u, v} has a unique label (l(u), l(v))
  • Use the edges as items and the graphs as transactions
  • Mine maximal frequent itemsets
  • This doesn't (quite) work!
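The naive encoding can be sketched as follows, assuming each graph is given as a list of edges between uniquely labelled vertices; sorting the two endpoint labels makes the item for {u, v} independent of how the edge is written. The connectivity check is a hypothetical helper, not part of the slide, and shows what plain itemset mining ignores: a set of edge items need not form a connected graph.

```python
def edge_item(u, v):
    """Canonical item for the undirected edge {u, v}: the sorted label pair."""
    return tuple(sorted((u, v)))


def graph_to_transaction(edges):
    """A graph, given as a list of labelled edges, becomes a set of edge items."""
    return frozenset(edge_item(u, v) for u, v in edges)


def is_connected_edge_set(items):
    """True if the (non-empty) set of edge items forms one connected graph."""
    items = list(items)
    if not items:
        return False
    adj = {}
    for u, v in items:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = items[0][0]
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == set(adj)


g1 = graph_to_transaction([("A", "B"), ("C", "B"), ("C", "D")])
print(sorted(g1))                                        # [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(is_connected_edge_set({("A", "B"), ("C", "D")}))  # False
```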

What’s Wrong?
Three graphs encoded as transactions (1 = edge present):

tid   A–B  A–D  B–C  B–D  C–D
1      1    0    1    0    1
2      0    1    1    0    1
3      1    0    0    1    1

Frequent itemsets (minfreq 2/3), support in parentheses:
• {C–D} (3), {A–B} (2), {B–C} (2)
• {A–B, C–D} (2): not connected!
• {B–C, C–D} (2)
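The supports in the table can be checked with a short brute-force sketch. It writes each edge item as a string such as "A-B" (an assumption for readability) and uses an absolute threshold of 2 out of 3 transactions for minfreq 2/3.

```python
from itertools import combinations

# The three graphs from the table, each as its set of edge items.
db = [
    {"A-B", "B-C", "C-D"},  # tid 1
    {"A-D", "B-C", "C-D"},  # tid 2
    {"A-B", "B-D", "C-D"},  # tid 3
]
minsup = 2  # minfreq 2/3 of 3 transactions

items = sorted(set().union(*db))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        support = sum(1 for t in db if set(combo) <= t)
        if support >= minsup:
            frequent[frozenset(combo)] = support

# Maximal: not a proper subset of another frequent itemset.
maximal = {s for s in frequent if not any(s < o for o in frequent)}
for s in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), frequent[s], "(maximal)" if s in maximal else "")
```

It reports {A–B, C–D} and {B–C, C–D} as the maximal frequent itemsets. The first is not a connected graph, while the connected pattern A–B, which is maximal as a subgraph (every connected extension of it occurs in at most one graph), is not maximal as an itemset.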

Feasible Patterns
• To be able to encode connectedness, we need to constrain the feasible patterns
• We can adjust our reductions to work with these constraints, e.g.:
  • maximal graph patterns must map to maximal feasible itemsets, and
  • it must be easy to compute the graph patterns from the maximal feasible itemsets
• These constraints are transitive, so such reductions can be chained
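One way to read the feasibility constraint, sketched under two assumptions: "feasible" means "the edge items form a connected graph", and a maximal feasible itemset is a frequent feasible itemset not contained in any other frequent feasible itemset. The hypothetical helper filters candidates with a feasibility test before checking maximality; on the table's example this recovers the maximal frequent connected subgraphs.

```python
from itertools import combinations


def is_connected(edge_items):
    """Feasibility test: do the (u, v) edge items form one connected graph?"""
    adj = {}
    for u, v in edge_items:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == set(adj)


def maximal_feasible_frequent(db, minsup, feasible):
    """Frequent feasible itemsets not contained in another frequent feasible one."""
    items = sorted(set().union(*db))
    good = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if feasible(s) and sum(1 for t in db if s <= t) >= minsup:
                good.append(s)
    return {s for s in good if not any(s < o for o in good)}


# The table's graphs, with edge items written as (u, v) label pairs.
db = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "D"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "D"), ("C", "D")},
]
with_constraint = maximal_feasible_frequent(db, 2, is_connected)
unconstrained = maximal_feasible_frequent(db, 2, lambda s: True)
print(sorted(sorted(s) for s in with_constraint))
# → [[('A', 'B')], [('B', 'C'), ('C', 'D')]]
print(sorted(sorted(s) for s in unconstrained))
# → [[('A', 'B'), ('C', 'D')], [('B', 'C'), ('C', 'D')]]
```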

Maximality-Preserving Reductions for Feasible Patterns
[Diagram: reductions among MaxSQS, MaxFS(DAG), MaxFS(BTW3), MaxFS(BDG3), MaxFS(T), MaxFS(PLN), MaxFS(DirG), MaxFIS and MaxFS(G); A → B = A can be reduced to B]
• The complexity collapses under these reductions!


Summary
• For the feasible-pattern versions of all these problems:
  • Counting the maximal feasible patterns is #P-hard
  • Given a set of maximal feasible patterns, deciding whether there are any more is NP-hard
    • even if only two patterns are given
  • For any fixed minfreq threshold τ, the enumeration can be done in polynomial time

Conclusions
• Most maximal pattern mining problems are essentially equally hard
• Methods for one type of problem can be used to solve the other types as well
• The feasibility constraints are usually amenable to standard level-wise algorithms
• Notable exceptions: MaxFS on general graphs, and sequences with repetitions
  • Subgraph isomorphism is NP-hard
Thank You!
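A minimal sketch of a level-wise (Apriori-style) search over feasible patterns, assuming as before that "feasible" means "connected" and that the uniquely labelled graphs are given as sets of edge items, so containment is a subset test and no subgraph isomorphism is needed. The function name is hypothetical. Unlike full Apriori it does no subset-based candidate pruning: each level is grown by adding one edge that touches the current pattern.

```python
def levelwise_connected(db, minsup):
    """Apriori-style search for maximal frequent connected edge sets.

    `db` is a list of graphs, each a set of (u, v) edge items over uniquely
    labelled vertices, so pattern containment is just a subset test.
    """
    def support(pattern):
        return sum(1 for t in db if pattern <= t)

    edges = set().union(*db)
    # Level 1: frequent single edges.
    level = {frozenset({e}) for e in edges if support(frozenset({e})) >= minsup}
    frequent = set(level)
    while level:
        next_level = set()
        for p in level:
            vertices = {v for e in p for v in e}
            # Grow p by one edge that touches it, so the result stays connected.
            for e in edges - p:
                if vertices & set(e):
                    q = p | {e}
                    if support(q) >= minsup:
                        next_level.add(q)
        frequent |= next_level
        level = next_level
    return {p for p in frequent if not any(p < q for q in frequent)}


db = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "D"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "D"), ("C", "D")},
]
print(sorted(sorted(p) for p in levelwise_connected(db, minsup=2)))
# → [[('A', 'B')], [('B', 'C'), ('C', 'D')]]
```

This reaches every frequent connected edge set: any connected edge set with at least two edges has an edge whose removal leaves it connected (a non-tree edge, or a leaf edge of a spanning tree), and removing edges cannot lower support, so each such set is one step away from a frequent connected set on the previous level.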
