Database Systems Laboratory
Advisor: Prof. 張玉盈
Relational Database
Relation SNOOPYFAMILY (primary key: ID; domain of SEX: Male, Female)

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
3   SALLY BROWN       Female
4   LUCY VAN PELT     Female
5   LINUS VAN PELT    Male
6   PEPPERMINT PATTY  Female
7   MARCIE            Female
8   SCHROEDER         Male
9   WOODSTOCK         -

• Attributes: the columns ID, NAME, SEX; the degree is the number of attributes.
• Tuples: the rows; the cardinality is the number of tuples.
❖ Query with SQL:
Select NAME
From SNOOPYFAMILY
Where SEX = 'Male';
❖ Result (the tuples with SEX = 'Male'; the query returns their NAME values):

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
5   LINUS VAN PELT    Male
8   SCHROEDER         Male
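The query above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not part of the original slides: the column types, the in-memory database, and the helper names `build_db` and `male_names` are assumptions, and WOODSTOCK's "-" is stored as NULL for illustration.

```python
import sqlite3

# Rows copied from the SNOOPYFAMILY relation on the slide.
ROWS = [
    (1, "SNOOPY", "Male"),
    (2, "CHARLIE BROWN", "Male"),
    (3, "SALLY BROWN", "Female"),
    (4, "LUCY VAN PELT", "Female"),
    (5, "LINUS VAN PELT", "Male"),
    (6, "PEPPERMINT PATTY", "Female"),
    (7, "MARCIE", "Female"),
    (8, "SCHROEDER", "Male"),
    (9, "WOODSTOCK", None),  # "-" on the slide; NULL here is an assumption
]


def build_db():
    """Create an in-memory database holding the SNOOPYFAMILY relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SNOOPYFAMILY (ID INTEGER PRIMARY KEY, NAME TEXT, SEX TEXT)"
    )
    conn.executemany("INSERT INTO SNOOPYFAMILY VALUES (?, ?, ?)", ROWS)
    return conn


def male_names(conn):
    """Run the slide's query; ORDER BY is added only to make the output stable."""
    cur = conn.execute(
        "SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male' ORDER BY ID"
    )
    return [name for (name,) in cur]


conn = build_db()
names = male_names(conn)
# Cardinality = number of tuples; degree = number of attributes.
cardinality = conn.execute("SELECT COUNT(*) FROM SNOOPYFAMILY").fetchone()[0]
degree = len(conn.execute("SELECT * FROM SNOOPYFAMILY").description)
print(names, cardinality, degree)
```

Because the query projects only NAME, the returned list contains the four names rather than whole tuples.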
Introduction
• Data mining is widely used to extract previously unknown, hidden, and potentially useful information from large databases.
• Frequent pattern mining is a basic research topic in pattern mining.
• It generates all frequent patterns, that is, all patterns whose support (frequency) is no smaller than a given minimum support threshold.
Database D
TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

Scan D -> C1
Itemset  Sup.
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1
Itemset  Sup.
{A}      2
{B}      3
{C}      3
{E}      3

C2 (candidates generated from L1)
{A B}, {A C}, {A E}, {B C}, {B E}, {C E}

Scan D -> C2
Itemset  Sup.
{A B}    1
{A C}    2
{A E}    1
{B C}    2
{B E}    3
{C E}    2

L2
Itemset  Sup.
{A C}    2
{B C}    2
{B E}    3
{C E}    2

C3 (candidates generated from L2)
{B C E}

Scan D -> C3
Itemset  Sup.
{B C E}  2

L3
Itemset  Sup.
{B C E}  2
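The passes in the tables above (count the candidates C_k, keep the frequent ones as L_k, join and prune to get C_{k+1}) can be sketched in Python. This is a minimal illustration rather than the slide author's code: the minimum support count of 2 is an assumption inferred from which itemsets survive into the L tables, and the names `support` and `apriori` are illustrative.

```python
from itertools import combinations

# Database D from the slide: TID -> set of items.
DATABASE = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}


def support(itemset, database):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for items in database.values() if itemset <= items)


def apriori(database, min_sup):
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted({item for t in database.values() for item in t})
    candidates = [frozenset([item]) for item in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Scan D: count each candidate, keep those meeting min_sup (L_k).
        counted = {c: support(c, database) for c in candidates}
        level = {c: s for c, s in counted.items() if s >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(DATABASE, min_sup=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With these inputs the only 3-item candidate that survives pruning is {B C E}, matching C3 on the slide.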
Frequent Pattern Mining
• This technique has two limitations.
• First, it treats each item in binary form: an item is either present in a transaction or absent.
• Second, it treats all items as having the same value and the same importance.
Data Mining
• Data mining is the process of finding hidden and useful knowledge from large databases.
• However, items have different importance in the real world.
• For example, an iPhone (cellphone) is expensive and a telephone is cheap.
• Therefore, we have to consider the importance and the count of the items at the same time.
Mining Weighted Maximal Frequent Patterns
The user wants to know which patterns make money and contain the most items.
Example

TID  Transaction
1    A, C
2    B, C
3    A, B, C
4    A, B
5    B, C

Item  Weight
A     0.6
B     0.8
C     0.4
Frequent Itemsets
• We can use the Apriori algorithm to find frequent patterns (Min_Sup: 2).
• L1 = {A, B, C}
• L1 => C2
• C2 = {AB, AC, BC}
• L2 = {AB, AC, BC}
• L2 => C3
• C3 = {ABC}
• L3 = ∅

Itemset lattice, written as {itemset}:count
{A,B,C}:1
{A,C}:2   {A,B}:2   {B,C}:3
{A}:3   {B}:4   {C}:4
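Weighted Frequent Pattern

Item  Weight
A     0.6
B     0.8
C     0.4

$$\mathrm{WSup}(PS) = \mathrm{sup}(PS) \times \frac{\sum_{i=1}^{\mathrm{length}(PS)} w(P_i)}{\mathrm{length}(PS)}$$

where $w(P_i)$ is the weight of the $i$-th item of pattern $PS$.

Itemset lattice, written as {itemset}:WSup (Min_Sup: 1.8)
{A,B,C}:0.6
{A,C}:1.0   {A,B}:1.4   {B,C}:1.8
{A}:1.8   {B}:3.2   {C}:1.6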
• In this case, the weighted frequent patterns are {A}, {B}, and {B,C}.
• A weighted maximal frequent pattern is a weighted frequent pattern that has no weighted frequent super-pattern.
• So, the weighted maximal frequent patterns in this case are {A} and {B,C}.
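The weighted support values and the maximal patterns above can be checked with a short Python sketch. It is only an illustration under stated assumptions: it enumerates every itemset by brute force (reasonable for three items, not a real mining algorithm), it uses `Fraction` so that the comparison with the 1.8 threshold is exact, and the names `wsup`, `weighted_frequent`, and `maximal` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Transactions and item weights from the example.
TRANSACTIONS = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}]
WEIGHTS = {"A": Fraction(6, 10), "B": Fraction(8, 10), "C": Fraction(4, 10)}
MIN_WSUP = Fraction(18, 10)  # the 1.8 threshold, kept exact


def wsup(pattern):
    """WSup = support count times the average weight of the pattern's items."""
    sup = sum(1 for t in TRANSACTIONS if pattern <= t)
    avg_weight = sum(WEIGHTS[item] for item in pattern) / len(pattern)
    return sup * avg_weight


def weighted_frequent():
    """Brute-force: test every non-empty itemset against the threshold."""
    items = sorted(WEIGHTS)
    patterns = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    return {p: wsup(p) for p in patterns if wsup(p) >= MIN_WSUP}


def maximal(frequent):
    """Keep patterns that have no proper superset among the frequent ones."""
    return {p for p in frequent if not any(p < q for q in frequent)}


wf = weighted_frequent()
wmf = maximal(wf)
print({tuple(sorted(p)): float(v) for p, v in wf.items()})
print([sorted(p) for p in wmf])
```

Plain floats would be risky here: `3 * 0.6` evaluates to slightly less than 1.8 in binary floating point, which would wrongly drop {A}.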
• In the real world, each item has a different profit, and a consumer may purchase more than one unit of an item.
• In utility mining, each item has an internal utility value, which represents the quantity of the item in each transaction, and an external utility value, such as its profit or price.
Mining High Utility Patterns
Which itemsets contribute the most profit across all the transactions?
• In recent years, many applications have generated stream data, such as the transactions of retail markets.
• These data are continuous, unbounded, and usually arrive at high speed.
Traditional vs. Stream Data
• Traditional databases
  • Data are stored in finite, persistent data sets.
• Stream data (big data in the cloud)
  • Data arrive as ordered, continuous, rapid, high-volume, time-varying data streams (in-memory databases).
Sliding Window Model
[Figure 2. Sliding window: windows W1, W2, W3 move along the time axis t0, t1, t2, …, ti, …, tj, tj+1, tj+2.]
• In the sliding window model, only the recent data in a fixed-size window are used to discover meaningful patterns over data streams.
• This model is widely used for stream mining because it emphasizes recent data and requires only bounded memory.
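The bounded-memory behaviour described above can be sketched with a fixed-size window that forgets the oldest transaction whenever a new one arrives. This is a minimal illustration, not the method from the slides: the class name `SlidingWindow`, the window size of 3, and the sample stream are all assumptions, and the sketch only maintains item counts rather than mining patterns.

```python
from collections import Counter, deque


class SlidingWindow:
    """Keep the most recent `size` transactions and their item counts."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, transaction):
        """Append a new transaction and expire the oldest one if needed."""
        self.window.append(transaction)
        self.counts.update(transaction)
        if len(self.window) > self.size:
            expired = self.window.popleft()
            for item in expired:
                self.counts[item] -= 1
                if self.counts[item] == 0:
                    del self.counts[item]  # keep memory bounded by live items


# Hypothetical stream of transactions arriving in time order.
STREAM = [("A", "C"), ("B", "C"), ("A", "B", "C"), ("A", "B"), ("B", "C")]

window = SlidingWindow(size=3)
for transaction in STREAM:
    window.add(transaction)

print(list(window.window))
print(dict(window.counts))
```

Memory use depends on the window size rather than on the length of the stream, which is the property the model relies on.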
Mining High Utility Patterns
Problem Statement
• Given a data stream and a user-specified minimum utility threshold, mining high utility patterns in a window over the data stream is equivalent to discovering the set of patterns in that window whose utilities are no smaller than the minimum utility threshold.
Simple Example

TID  Transaction             TU
T1   (A, 2) (B, 3) (C, 1)    1550
T2   (A, 1) (B, 2)           300
T3   (A, 2)                  400

Item  Profit
A     200
B     50
C     1000

$$u_T(AB, T_1) = u_T(A, T_1) + u_T(B, T_1) = 200 \times 2 + 50 \times 3 = 550$$
$$u(AB) = u_T(AB, T_1) + u_T(AB, T_2) = 550 + 200 \times 1 + 50 \times 2 = 850$$
$$TU(T_1) = 200 \times 2 + 50 \times 3 + 1000 \times 1 = 1550$$
$$TWU(AB) = TU(T_1) + TU(T_2) = 1550 + 300 = 1850$$
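The four quantities in this example can be reproduced with a few small Python functions. This is a sketch for checking the arithmetic only; the function names `u_t`, `u`, `tu`, and `twu` mirror the slide's notation but are otherwise hypothetical, and no mining algorithm is implemented.

```python
# External utility (profit per unit) and transactions with internal
# utility (purchased quantity), copied from the example.
PROFIT = {"A": 200, "B": 50, "C": 1000}
TRANSACTIONS = {
    "T1": {"A": 2, "B": 3, "C": 1},
    "T2": {"A": 1, "B": 2},
    "T3": {"A": 2},
}


def u_t(itemset, tid):
    """Utility of `itemset` in one transaction; 0 if it is not contained."""
    quantities = TRANSACTIONS[tid]
    if not set(itemset) <= set(quantities):
        return 0
    return sum(PROFIT[item] * quantities[item] for item in itemset)


def u(itemset):
    """Utility of `itemset` summed over all transactions that contain it."""
    return sum(u_t(itemset, tid) for tid in TRANSACTIONS)


def tu(tid):
    """Transaction utility: total utility of every item in the transaction."""
    return sum(PROFIT[item] * qty for item, qty in TRANSACTIONS[tid].items())


def twu(itemset):
    """Transaction-weighted utility: sum of TU over containing transactions."""
    return sum(
        tu(tid)
        for tid, quantities in TRANSACTIONS.items()
        if set(itemset) <= set(quantities)
    )


AB = ("A", "B")
print(u_t(AB, "T1"), u(AB), tu("T1"), twu(AB))
```

Since TWU adds whole transaction utilities, it can never be smaller than the itemset's own utility, as the values for AB illustrate.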
Periodicity Mining in Time Series Databases
• Three types of periodic patterns:
  • Symbol periodicity
    • T = abd acb aba abc
    • Symbol a, p = 3, stPos = 0
  • Sequence periodicity (partial periodic patterns)
    • T = bbaa abbd abca abbc abcd
    • Sequence ab, p = 4, stPos = 4
  • Segment periodicity (full-cycle periodicity)
    • T = abcab abcab abcab
    • Segment abcab, p = 5, stPos = 0
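Each of the three examples above states a pattern, a period p, and a start position stPos, and the claim can be verified by checking every position stPos, stPos + p, stPos + 2p, and so on. The sketch below does only that verification, assuming the spaces in T are visual grouping and not part of the series; the function names `periodic_hits` and `is_periodic` are hypothetical, and discovering p and stPos automatically is outside its scope.

```python
def periodic_hits(series, pattern, period, st_pos):
    """Return (position, matched) for every expected occurrence of `pattern`."""
    last_start = len(series) - len(pattern)
    return [
        (pos, series.startswith(pattern, pos))
        for pos in range(st_pos, last_start + 1, period)
    ]


def is_periodic(series, pattern, period, st_pos):
    """True if the pattern occurs at every expected position (at least one)."""
    hits = periodic_hits(series, pattern, period, st_pos)
    return bool(hits) and all(matched for _, matched in hits)


# The three examples, with the grouping spaces removed.
SYMBOL_T = "abd acb aba abc".replace(" ", "")
SEQUENCE_T = "bbaa abbd abca abbc abcd".replace(" ", "")
SEGMENT_T = "abcab abcab abcab".replace(" ", "")

print(is_periodic(SYMBOL_T, "a", 3, 0))
print(is_periodic(SEQUENCE_T, "ab", 4, 4))
print(is_periodic(SEGMENT_T, "abcab", 5, 0))
```

A stricter or more tolerant definition (for example, accepting a pattern that matches at most but not all expected positions) could be built on the list returned by `periodic_hits`.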
Mining Frequent Periodic Patterns
The user wants to know whether a pattern is periodic in the time-series database.
How to earn money?
• Use a computer to analyze the time-series database.
• Find frequent periodic patterns and predict the future trend of the time-series database.
Mining Time-Interval Sequential Patterns
Customers buy items; the purchased items and the time intervals between purchases are stored.
• Use a computer to analyze the database.
• Find time-interval patterns, which reveal not only the order of items but also the time intervals between successive items.
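[Figure. Research areas of database systems]
• Knowledge representation: database models, data structures, overall data maintenance
• Processing, efficiency, analysis
• Query languages, ease of use
• Query processing, simplicity, response time, space requirements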