CS 591: Data Systems Architectures
Prof. Manos Athanassoulis
mathan@bu.edu
http://manos.athanassoulis.net/classes/CS591
CS591 progress bar
Storage Layouts: Rows vs Cols vs Hybrid
  [Figure: columns A B C D laid out as rows, as columns, and as hybrid groups]
New Hardware: Flash Storage, Multi-core
Indexing: When to use? (scan or index), UpBit
  [Figure: bitmap index with bitvectors for A=10, A=20, A=30, each paired with a UB bitvector]
NoSQL Engines: LSM-Trees, Hash-based
  [Figure: buffer in memory; Bloom filters and fence pointers over storage]
  [Figure 5: Logical Address Space in (labels: Read-Copy-Update, In-Place-Update, Increasing Logical Address from LA = 0 to LA = ∞, Stable, Mutable, Read-Only, Disk, In-Memory)]
Indexing: Data Skipping, Adaptive Indexing
  [Figure: tuples t1 to t4 of a table with columns year, grade, course, shown in several column orders]
  [Figure: an index column refined by queries Q0 = [13,42), Q1 = [6,27), Q2 ... Qn into ranges < 6, >= 6 and < 13, >= 13 and < 27, >= 27 and < 42, >= 42, approaching a sorted column]
Scientific Data Management: In-situ Query Processing
  [Figure: Adaptive Partitioning with BF, BTree, or BF+BTree per partition]
  [Figure: Cache, Positional Map, Raw Data File]
Today: Array Data
Today: Array Data Storage Manager
Up to now: uni-dimensional data (integers, reals, strings)
Array Data: multi-dimensional data
  why is this a challenge?
  No unique order (cannot sort!)
  How to store?
Concepts: multi-dimensional arrays, storage manager, tiles, thread-safe, dense vs. sparse arrays, global cell order, fragments, dense vs. sparse fragments, consolidation
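Because a multi-dimensional array has no unique order, a storage manager has to pick one before it can write cells to a one-dimensional file. The sketch below is one possible illustration of a global cell order built from tiles. It assumes a dense 2-D array, and the function name `global_cell_order` and its parameters are hypothetical, not taken from any particular system.

```python
def global_cell_order(shape, tile_shape, tile_order="row", cell_order="row"):
    """Return the (row, col) coordinates of a dense 2-D array in one linear order.

    Assumption for illustration: cells are grouped into fixed-size tiles,
    tiles are visited in `tile_order`, and the cells inside each tile are
    visited in `cell_order` ("row" or "col" major).
    """
    rows, cols = shape
    t_rows, t_cols = tile_shape

    def ordered(n_rows, n_cols, order):
        # Enumerate the (i, j) pairs of an n_rows x n_cols grid in the given order.
        if order == "row":
            return [(i, j) for i in range(n_rows) for j in range(n_cols)]
        if order == "col":
            return [(i, j) for j in range(n_cols) for i in range(n_rows)]
        raise ValueError(f"unknown order: {order!r}")

    # Number of tiles per dimension (ceiling division covers partial tiles).
    n_tile_rows = -(-rows // t_rows)
    n_tile_cols = -(-cols // t_cols)

    result = []
    for ti, tj in ordered(n_tile_rows, n_tile_cols, tile_order):
        for ci, cj in ordered(t_rows, t_cols, cell_order):
            r, c = ti * t_rows + ci, tj * t_cols + cj
            if r < rows and c < cols:  # skip cells that fall outside the array
                result.append((r, c))
    return result


if __name__ == "__main__":
    # Same 4x4 array and 2x2 tiles, two different cell orders inside each tile.
    print(global_cell_order((4, 4), (2, 2), "row", "row")[:6])
    print(global_cell_order((4, 4), (2, 2), "row", "col")[:6])
```

The two calls visit the same cells but in different sequences, which is the point of the slide: the order is a design choice, and it determines which cells end up next to each other on storage.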
CS591 progress bar: New Paradigms
Distributed DB
  Database Systems at Global Scale
  MapReduce: Computing at Scale
ML for Systems
  Learned Indexes: Learn Data Distributions for Indexing
Automatic Data System Design
  Data Calculator: Synthesize Indexes
Systems for ML
  ML building blocks
Do not forget: reviews
You can skip up to 3 reviews
18 classes: 5 long + 10 short + 3 skipped
new rule: you can do extra long reviews, 1 long counts as 3 short
For full marks:
  normally 5 long + 10 short
  or 6 long + 7 short
  or 7 long + 4 short
  or 8 long + 1 short
Do not forget: project
Do not leave your project work for the last minute!
Until Tuesday, April 16th: every group in OH to discuss progress
April 30 and May 2: project presentations
  problem + approach + results + open questions
Project presentations will also be peer-evaluated