Database Cracking Languages and Runtimes for Big Data CSE 662 - Database Languages & Runtimes 1
Row Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Traditional DB: Lay out data on disk in rows CSE 662 - Database Languages & Runtimes 2
Column Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 B4 C4 D4 A4 Columnar DB: Lay out data on disk in columns CSE 662 - Database Languages & Runtimes 3
Column Stores A1 Row1 B1 Row1 C1 Row1 D1 Row1 A2 B2 C2 D2 Row2 Row2 Row2 Row2 A3 B3 C3 Row3 Row3 Row3 D3 Row3 B4 C4 D4 A4 Row4 Row4 Row4 Row4 Store with Row ID to recover original table CSE 662 - Database Languages & Runtimes 4
Why use a Column Store? CSE 662 - Database Languages & Runtimes 5
Immediate Data Access Problem: Data is initially unsorted Query: Find all rows where 100 < A ≤ 200 What is the fastest way to answer this query? CSE 662 - Database Languages & Runtimes 6
Immediate Data Access Problem: Data is initially unsorted What if you get 2 queries? … 3 queries? … 100 queries? CSE 662 - Database Languages & Runtimes 7
Immediate Data Access Problem: Data is initially unsorted Strategy 1: Index the data then run queries First few queries are much slower (upfront indexing cost) Strategy 2: Linear scans over the data Last few queries are much slower (no indexing!) CSE 662 - Database Languages & Runtimes 8
Immediate Data Access Problem: Data is initially unsorted Strategy 3: Index while you run queries! Re-use compute effort of scans. CSE 662 - Database Languages & Runtimes 9
Cracking 0 9 3 5 4 8 7 1 2 6 Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 10
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 11
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 12
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 13
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 14
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 15
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 9 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 16
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 9 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 17
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 5 9 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 18
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 5 9 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 19
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 1 9 8 7 5 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 20
Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 21
Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 High Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 22
Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 High Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 23
Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 8 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 24
Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 5 6 9 8 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 25
Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 5 6 9 8 Result Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 26
Cracking Binary ≤ 4 ≤ 7 Tree 0 3 4 1 2 7 5 6 9 8 Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 27
Cracking ≤ 4 ≤ 7 0 3 4 1 2 7 5 6 9 8 Query 2: Find 2 < X ≤ 5 CSE 662 - Database Languages & Runtimes 28
Cracking ≤ 2 ≤ 4 ≤ 5 ≤ 7 0 1 2 3 4 5 7 6 9 8 Result Query 2: Find 2 < X ≤ 5 CSE 662 - Database Languages & Runtimes 29
3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 30
3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 31
3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 32
3-Way Cracking Low High 0 6 3 5 4 8 7 1 2 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 33
3-Way Cracking Low High 0 3 6 5 4 8 7 1 2 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 34
3-Way Cracking Low High 0 3 4 2 1 5 6 7 8 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 35
Discussion Questions… Does cracking work with a row-oriented database? CSE 662 - Database Languages & Runtimes 36
Discussion Questions… How would one crack a multi-attribute index? (e.g., a spatial index?) CSE 662 - Database Languages & Runtimes 37
Discussion Questions… Can updates be performed efficiently on a cracker index? CSE 662 - Database Languages & Runtimes 38
Discussion Questions… Can updates be performed efficiently on a cracker index? What constraints are required? CSE 662 - Database Languages & Runtimes 39
Discussion Questions… What applications would cracking work well on? What applications would cracking work poorly on? CSE 662 - Database Languages & Runtimes 40
Discussion Questions… Upfront Indexing vs Sequential Scan vs Cracking… Where is the cutoff? CSE 662 - Database Languages & Runtimes 41
Recommend
More recommend