LASH: Large-Scale Sequence Mining with Hierarchies

LASH: Large-Scale Sequence Mining with Hierarchies. Kaustubh Beedkar and Rainer Gemulla, Data and Web Science Group, University of Mannheim. SIGMOD 2015, June 2nd, 2015.


  1. LASH: Large-Scale Sequence Mining with Hierarchies
     Kaustubh Beedkar and Rainer Gemulla
     Data and Web Science Group, University of Mannheim
     SIGMOD 2015, June 2nd, 2015

  2. Syntactic Explorer (Verb to Verb Noun)

  6. Sequence Mining
     • Goal: Discover subsequences as patterns in sequence data
     • Input: Collection of sequences of items, e.g.,
       ◮ Text collection (sequence of words)
       ◮ Customer transactions (sequence of products)
     • Output: subsequences that
       ◮ occur in at least σ input sequences (frequency threshold)
       ◮ have length at most λ (length threshold)
       ◮ have gaps of at most γ items between consecutive items (γ = 0: contiguous subsequences; γ > 0: non-contiguous subsequences)
     • Example:
       S_1: Anna lives in Melbourne
       S_2: Bob lives in the city of Berlin
       S_3: Charlie likes London
       ◮ Subsequence: lives in (σ = 2, λ = 2, γ = 0)
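
The gap and frequency thresholds on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function names `occurs` and `support` are made up for this example, and the code simply checks one candidate subsequence against the three example sentences.

```python
from typing import List, Sequence


def occurs(pattern: Sequence[str], seq: Sequence[str], gamma: int) -> bool:
    """True if `pattern` is a subsequence of `seq` in which at most
    `gamma` items are skipped between consecutive matched items."""

    def match_from(p_idx: int, last_pos: int) -> bool:
        if p_idx == len(pattern):
            return True
        # The first item may match anywhere; later items must respect the gap.
        start = 0 if p_idx == 0 else last_pos + 1
        stop = len(seq) if p_idx == 0 else min(len(seq), last_pos + gamma + 2)
        return any(
            seq[pos] == pattern[p_idx] and match_from(p_idx + 1, pos)
            for pos in range(start, stop)
        )

    return match_from(0, -1)


def support(pattern: Sequence[str], database: List[List[str]], gamma: int) -> int:
    """Number of input sequences that contain `pattern` under gap `gamma`."""
    return sum(occurs(pattern, seq, gamma) for seq in database)


database = [
    "Anna lives in Melbourne".split(),
    "Bob lives in the city of Berlin".split(),
    "Charlie likes London".split(),
]

# "lives in" occurs contiguously (gamma = 0) in S_1 and S_2, so it meets sigma = 2.
print(support(["lives", "in"], database, gamma=0))
```

With γ = 0 only contiguous matches count; raising γ admits matches that skip intermediate words.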

  10. Hierarchies
      Items can be naturally arranged in a hierarchy, e.g.,
      • Syntactic hierarchy
        ◮ DET: a, an, the
      • Semantic hierarchy
        ◮ PERSON: Scientist (Albert Einstein, ...), Politician (Barack Obama, ...), ...
        ◮ CITY: Melbourne, ...
      • Product hierarchy
        ◮ Photography: Tripod, DSLR Camera (Canon5D, Nikon5100), ...

  14. Sequence Mining with Hierarchies
      • Item hierarchies are specifically taken into account
      • Discover non-trivial patterns
      • Example
        S_1: Anna lives in Melbourne
        S_2: Bob lives in the city of Berlin
        S_3: Charlie likes London
        Semantic hierarchy: PERSON (Anna, Bob, Charlie), CITY (Melbourne, Berlin, London)
        ◮ Generalized subsequence: PERSON lives in CITY (σ = 2, λ = 4, γ = 3)
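
As a rough sketch of how a generalized subsequence can be matched, the Python code below lets a pattern item match either the input item itself or any of its ancestors. The names `PARENT`, `generalizations`, and `occurs_generalized` are assumptions made for this illustration and are not taken from the talk.

```python
from typing import Dict, List, Sequence, Set

# Child -> parent edges of the semantic hierarchy shown on the slide.
PARENT: Dict[str, str] = {
    "Anna": "PERSON", "Bob": "PERSON", "Charlie": "PERSON",
    "Melbourne": "CITY", "Berlin": "CITY", "London": "CITY",
}


def generalizations(item: str) -> Set[str]:
    """The item itself plus all of its ancestors in the hierarchy."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def occurs_generalized(pattern: Sequence[str], seq: Sequence[str], gamma: int) -> bool:
    """Gap-constrained match where each pattern item may be the input item
    or one of its generalizations."""

    def match_from(p_idx: int, last_pos: int) -> bool:
        if p_idx == len(pattern):
            return True
        start = 0 if p_idx == 0 else last_pos + 1
        stop = len(seq) if p_idx == 0 else min(len(seq), last_pos + gamma + 2)
        return any(
            pattern[p_idx] in generalizations(seq[pos]) and match_from(p_idx + 1, pos)
            for pos in range(start, stop)
        )

    return match_from(0, -1)


database: List[List[str]] = [
    "Anna lives in Melbourne".split(),
    "Bob lives in the city of Berlin".split(),
    "Charlie likes London".split(),
]

pattern = ["PERSON", "lives", "in", "CITY"]
# S_1 matches directly; S_2 needs a gap of 3 ("the city of"); S_3 does not match.
print(sum(occurs_generalized(pattern, seq, gamma=3) for seq in database))
```

Lowering γ to 2 drops S_2, because three words separate "in" and "Berlin", and the pattern then falls below σ = 2.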

  15. Sequence Mining with Hierarchies: Applications
      • Linguistic patterns, e.g.,
        ◮ read DET book
        ◮ NNP lives in NNP
      • Information extraction, e.g.,
        ◮ PERSON lives in CITY
      • Market-basket analysis, e.g.,
        ◮ buy DSLR camera → photography book → flash
      • Web-usage mining
      • ...

  16. LASH
      • Distributed framework for sequence mining with hierarchies
      • Built over MapReduce for large-scale data processing
      • Map (partitioning)
        ◮ Divide data into potentially overlapping partitions
      • Reduce (mining)
        ◮ Partitions are mined independently
      • No global post-processing
      Figure (workflow): D, H → hierarchy-aware item-based partitioning → (D_1, H_1), (D_2, H_2), ..., (D_n, H_n) → local mining on each partition → F_1, F_2, ..., F_n → F

  17. Outline
      1 Introduction
      2 Partitioning
      3 Local Mining
      4 Evaluation
      5 Conclusion

  22. Item-based Partitioning
      • Items are ordered by decreasing frequency, e.g., a < b < c < ··· < k
      • Create a partition for each frequent item, called the pivot item
      • Key idea: partition the output space
        ◮ a < b < c < ··· < k yields F_a, F_b, F_c, ..., F_k
      • Rewrite D for each pivot item
        ◮ Reduces communication
        ◮ Reduces computation
        ◮ Reduces skew
      Figure (workflow): D, H → hierarchy-aware item-based partitioning with pivots a, b, ..., k → (D_1, H_1), (D_2, H_2), ..., (D_n, H_n) → local mining on each partition → F_1, F_2, ..., F_n → F
      Figure labels: F_a: filter a but not b, ..., k; F_b: filter b but not c, ..., k; F_k: filter k
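
The idea of partitioning the output space by pivot item can be sketched as follows. This sketch assumes that the pivot of a sequence is its largest item under the frequency order (that is, its least frequent item), which is how the filter labels on the slide read; the item order, the sample patterns, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical item order: earlier in the list = more frequent = "smaller".
ORDER = ["a", "b", "c", "k"]
RANK = {item: rank for rank, item in enumerate(ORDER)}


def pivot(seq: Sequence[str]) -> str:
    """The largest (least frequent) item of a sequence under ORDER."""
    return max(seq, key=RANK.__getitem__)


def partition_output(patterns: List[Tuple[str, ...]]) -> Dict[str, List[Tuple[str, ...]]]:
    """Assign every pattern to exactly one partition, keyed by its pivot."""
    parts: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for pattern in patterns:
        parts[pivot(pattern)].append(pattern)
    return dict(parts)


# Made-up patterns, used only to show how the output space splits.
patterns = [("a",), ("a", "a"), ("a", "b"), ("b", "a", "b"), ("c", "a"), ("k", "b")]
parts = partition_output(patterns)
for item in ORDER:
    print(item, parts.get(item, []))
```

Because every sequence has exactly one pivot, the partitions are disjoint and their union is the full output, which is consistent with the absence of a global post-processing step.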

  25. Item-based Partitioning
      Example (σ = 2, γ = 3, λ = 4)
      S_1: Anna lives in Melbourne
      S_2: Bob lives in the city of Berlin
      S_3: Charlie likes London
      Semantic hierarchy: PERSON (Anna, Bob, Charlie), CITY (Melbourne, Berlin, London)
      • PERSON < CITY < in < lives
      ◮ Annotation (partially recovered from the slide build): PERSON : 3
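
The item order in this example can be checked with a short Python sketch that counts, for each item and each of its ancestors, the number of input sequences containing it. The helper names are assumptions for this illustration. How ties are broken (PERSON vs. CITY, in vs. lives) is not stated on the slide, so the code only verifies that the slide's order is non-increasing in frequency.

```python
from collections import Counter
from typing import Dict, List, Set

PARENT: Dict[str, str] = {
    "Anna": "PERSON", "Bob": "PERSON", "Charlie": "PERSON",
    "Melbourne": "CITY", "Berlin": "CITY", "London": "CITY",
}

database: List[List[str]] = [
    "Anna lives in Melbourne".split(),
    "Bob lives in the city of Berlin".split(),
    "Charlie likes London".split(),
]


def generalizations(item: str) -> Set[str]:
    """The item itself plus all of its ancestors in the hierarchy."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def generalized_frequencies(db: List[List[str]]) -> Counter:
    """For each item (including ancestors), the number of sequences it occurs in."""
    freq: Counter = Counter()
    for seq in db:
        seen: Set[str] = set()
        for item in seq:
            seen |= generalizations(item)
        freq.update(seen)  # each item counts at most once per sequence
    return freq


sigma = 2
freq = generalized_frequencies(database)
frequent = {item: n for item, n in freq.items() if n >= sigma}
print(frequent)

# The order given on the slide; check that frequencies never increase along it.
slide_order = ["PERSON", "CITY", "in", "lives"]
print(all(freq[x] >= freq[y] for x, y in zip(slide_order, slide_order[1:])))
```

Only PERSON, CITY, in, and lives reach σ = 2, so under the scheme described earlier only these four items would serve as pivot items.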
