Online Bigtable merge compaction
Neal E. Young [1] (UC Riverside), Claire Mathieu (CNRS Paris), Carl Staelin (Google Haifa), Arman Yousefi (UCLA)
Northeastern University, September 17, 2015
[1] funded by faculty re$earch award
BIGTABLE: data storage at Google
Maps, Search/Crawl, Gmail, ... use BIGTABLE to store data.
- 24,500 Bigtable servers
- 1.2 million requests per second
- 16 GB/s of outgoing RPC traffic
- over a petabyte of data just for Google Crawl and Analytics
- these figures are from 2006
Similar to other "NoSQL" databases: Accumulo, AsterixDB, Cassandra, HBase, Hypertable, Spanner, ...
Used by Adobe, eBay, Facebook, GitHub, Meetup, Netflix, Twitter, ...
"Log-structured merge tree" architecture: for high-volume, highly reliable, distributed, real-time data storage.
BIGTABLE: implements dictionary data type
Operations supported by a Bigtable instance:
- write(key, value)
- read(key): return the most recent value written for key
- ... there's more, but not today ...
BIGTABLE: writes and flushes
write(key, value):
1. Store key/value pair in cache (e.g. hash table in RAM).
Environment periodically forces a flush of the cache to a new immutable disk file.
Example (cache and file sequence after each step):
- start: cache empty; no files
- write(1, a); write(2, b); write(3, c); write(4, d): cache (1, a) (2, b) (3, c) (4, d); no files
- flush(): cache empty; files: [(1, a) (2, b) (3, c) (4, d)]
- write(5, e); write(6, f); write(7, g): cache (5, e) (6, f) (7, g); files unchanged
- flush(): cache empty; files: [(1, a) (2, b) (3, c) (4, d)] [(5, e) (6, f) (7, g)]
- write(8, h); write(9, i); flush(): cache empty; files: [(1, a) (2, b) (3, c) (4, d)] [(5, e) (6, f) (7, g)] [(8, h) (9, i)]
The three files come from the 1st, 2nd, and 3rd flush, respectively.
Environment forces flushes at arbitrary times.
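The write and flush steps above can be sketched in a few lines of Python. This is only a toy model for illustration, not Bigtable's implementation: the class name `ToyStore`, the use of a dict for the cache, and the choice to represent each file as a sorted tuple of pairs are all assumptions made here.

```python
class ToyStore:
    """Toy model of the write path: a RAM cache plus a sequence of immutable files."""

    def __init__(self):
        self.cache = {}   # key -> latest value; stands in for the in-RAM hash table
        self.files = []   # flushed files, oldest first; each is an immutable tuple of pairs

    def write(self, key, value):
        # A write only touches the cache; nothing goes to "disk" yet.
        self.cache[key] = value

    def flush(self):
        # Freeze the cache contents into a new immutable file, then empty the cache.
        # (Skipping empty flushes is a choice made for this sketch.)
        if self.cache:
            self.files.append(tuple(sorted(self.cache.items())))
            self.cache = {}


# Replay the example from the slide: three batches of writes, each followed by a flush.
store = ToyStore()
for batch in (["a", "b", "c", "d"], ["e", "f", "g"], ["h", "i"]):
    for value in batch:
        store.write(ord(value) - ord("a") + 1, value)   # (1, a), (2, b), ...
    store.flush()

print(len(store.files), store.cache)   # prints: 3 {}
```

In the model the caller decides when to call `flush()`, which mirrors the slide's point that flush times are chosen by the environment, not by the algorithm.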
BIGTABLE: reads and compactions
cache: empty
file sequence: [(1, a) (2, b) (3, c) (4, d)] [(5, e) (6, f) (7, g)] [(8, h) (9, i)] (from 1st, 2nd, and 3rd flush)
read(key):
1. Check cache for key.
2. If not found, check files (most recent first). ← cost = O(# files)
compaction(): ← asynchronous background process, to reduce read costs
Periodically select files to merge. ← cost = O(SIZE of merged files) !!
file sequence after merging the 2nd and 3rd files: [(1, a) (2, b) (3, c) (4, d)] [(5, e) (6, f) (7, g) (8, h) (9, i)] (from 1st flush; merge of 2nd and 3rd)
goals: (i) keep read costs low (ii) keep compaction costs low
constraint: each merge must merge a contiguous subsequence of files
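A minimal sketch of the read path and of one compaction step, under stated assumptions: files are modeled as Python dicts kept oldest first, the function names `read` and `compact` are hypothetical, and the cost of a merge is taken to be the total number of entries in the merged files, as a stand-in for the O(size of merged files) cost on the slide.

```python
def read(cache, files, key):
    """Return the most recent value for key, or None if it was never written."""
    if key in cache:
        return cache[key]
    for f in reversed(files):        # most recent file first
        if key in f:
            return f[key]
    return None


def compact(files, i, j):
    """Merge the contiguous run files[i:j] into one file.

    Returns (new_files, cost). Only contiguous runs are accepted, which
    matches the constraint on the slide.
    """
    if not (0 <= i and i + 2 <= j <= len(files)):
        raise ValueError("need a contiguous run of at least two files")
    merged = {}
    for f in files[i:j]:             # oldest first, so newer values overwrite older ones
        merged.update(f)
    cost = sum(len(f) for f in files[i:j])
    return files[:i] + [merged] + files[j:], cost


files = [
    {1: "a", 2: "b", 3: "c", 4: "d"},   # from 1st flush
    {5: "e", 6: "f", 7: "g"},           # from 2nd flush
    {8: "h", 9: "i"},                   # from 3rd flush
]
files, cost = compact(files, 1, 3)      # merge the 2nd and 3rd files
print(len(files), cost, read({}, files, 8))   # prints: 2 5 h
```

One plausible reading of the contiguity constraint, not stated on the slide, is that it keeps the files ordered by age, so "most recent first" stays well defined after a merge.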
Bigtable Merge Compaction (BMC): formal definition
given: Sequence x_1, x_2, ..., x_n. ← x_t is the size of the file resulting from flush t
       Integer k > 0. ← tuned to workload; typically 3–40
choose: Compactions. Ensure the number of files never exceeds k.
objective: Minimize total compaction cost.
If k = ∞, the problem is easy: never merge.
[figure: file sequence after flush 1, after flush 2, after flush 3, after flush 4, ...; with no merges, flush t simply appends a file of size x_t]
Total compaction cost = 0.
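To make the definition concrete, here is a small checker that evaluates a given compaction schedule. It is a sketch under assumptions that the slide does not spell out: the function name `schedule_cost` and the schedule encoding are invented here, the file-count limit is checked after the merges that follow each flush, and a merge costs the sum of the sizes of the files it merges (so the merged file has that same size).

```python
def schedule_cost(x, k, merges):
    """Total compaction cost of a schedule, or ValueError if it is infeasible.

    x      -- flush sizes x_1..x_n as a list
    k      -- maximum number of files allowed (may be float("inf"))
    merges -- dict: 0-based flush index t -> list of (i, j) half-open index
              ranges, applied in order to the file list right after flush t
    """
    files = []    # current file sizes, oldest first
    total = 0
    for t, size in enumerate(x):
        files.append(size)                       # flush t adds a new file
        for i, j in merges.get(t, []):
            if not (0 <= i and i + 2 <= j <= len(files)):
                raise ValueError(f"bad merge range {(i, j)} after flush {t + 1}")
            merged = sum(files[i:j])             # cost = size of the merged files
            total += merged
            files[i:j] = [merged]                # contiguous run becomes one file
        if len(files) > k:
            raise ValueError(f"more than k={k} files after flush {t + 1}")
    return total


x = [4, 3, 2]   # sizes of the three flushed files in the earlier example

# k = infinity: never merging is feasible and costs nothing.
print(schedule_cost(x, float("inf"), {}))           # prints: 0

# k = 2: merging the 2nd and 3rd files after the 3rd flush costs 3 + 2.
print(schedule_cost(x, 2, {2: [(1, 3)]}))           # prints: 5
```

With k = 2 and an empty schedule the same call raises `ValueError`, since three files exist after the third flush.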
Bigtable Merge Compaction (BMC): formal definition
If k = 1, the problem is easy: must merge everything each time.
[figure: file sequence after each flush]
after flush 1: a single file
after flush 2: too many files! Merge them; compaction cost x_1 + x_2.
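The slide shows only the first merge, at cost x_1 + x_2. Extending "must merge everything each time" to later flushes, each flush t ≥ 2 would force a merge of all data so far, at cost x_1 + ... + x_t. The sketch below adds these up; the function name `cost_k1` is hypothetical, and the assumption is that a merged file's size is the sum of the sizes of its inputs.

```python
from itertools import accumulate


def cost_k1(x):
    """Total compaction cost when k = 1: after every flush but the first,
    all existing data is merged into a single file."""
    prefix = list(accumulate(x))    # prefix[t-1] = x_1 + ... + x_t
    return sum(prefix[1:])          # merges happen after flushes 2..n


print(cost_k1([4, 3, 2]))           # prints: 16   (7 after flush 2, 9 after flush 3)
print(cost_k1([1] * 10))            # prints: 54   (2 + 3 + ... + 10)
```

For n flushes of equal size the total is n(n+1)/2 − 1 times that size, so under these assumptions the cost grows quadratically in n, in contrast to the zero cost of the k = ∞ case.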