Distributed Systems Maciej Łopatka
Facebook Inbox Search Authors Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik Facebook code dump Community Transfer to Apache Software Foundation An Apache top level project
BigTable data model An Amazon Dynamo-like infrastructure
Distributed multidimensional map indexed by a key Four or five dimensions Key Value Data Timestamp
Keyspace → Column Family Column Family → Column Family Row Column Family Row → Columns Column → Data value
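The mapping above can be pictured as nested maps. The following is a minimal sketch in Python, not Cassandra's storage code: the names `keyspace`, `insert` and `get` are hypothetical, and keeping the cell with the newest timestamp is a simplification of how conflicting writes are resolved.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
# All names here are illustrative; this is an in-memory toy, not Cassandra.
keyspace = {}

def insert(column_family, row_key, column, value, timestamp=None):
    """Store a column; an older timestamp never overwrites a newer one."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(column)
    if current is None or ts >= current[1]:
        row[column] = (value, ts)

def get(column_family, row_key, column):
    """Return the stored value, or None if the column does not exist."""
    cell = keyspace.get(column_family, {}).get(row_key, {}).get(column)
    return None if cell is None else cell[0]
```

Each level of the slide's chain (Column Family, Row, Column, Data value) corresponds to one level of dictionary nesting, with the timestamp stored next to the value.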
Keyspace → Super Column Family Super Column Family → Super Column Family Row Super Column Family Row → Super Columns Super Column → Columns Column → Data value
Replication Log file Bootstrapping Partitioning Consistent Hashing Periodic Data Compaction Gossip Anti-Entropy data sync (uses Merkle tree) Write and Read Quorum W + R > N
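Two of the items above, consistent hashing and the quorum rule W + R > N, can be sketched together. This is a simplified illustration under assumptions, not Cassandra's implementation: the `Ring` class, the `token` helper and the use of MD5 for node positions are hypothetical choices made for the example.

```python
import bisect
import hashlib

def token(key):
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring: a key is stored on the next N nodes clockwise."""

    def __init__(self, nodes, n_replicas):
        self.n = n_replicas
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        count = min(self.n, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

def quorums_overlap(w, r, n):
    """True when every read quorum shares at least one replica with every write quorum."""
    return w + r > n
```

With N = 3, choosing W = 2 and R = 2 satisfies the rule, so a read always contacts at least one replica that saw the latest acknowledged write; W = 1 and R = 1 does not.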
RandomPartitioner OrderPreservingPartitioner
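The difference between the two partitioners is how a row key becomes a ring position. The sketch below is a simplification, assuming that RandomPartitioner derives the token from an MD5 hash of the key and that OrderPreservingPartitioner uses the key itself; the function names are hypothetical and the exact token arithmetic in Cassandra differs.

```python
import hashlib

def random_partitioner_token(key):
    """Hash-based token: spreads keys evenly but destroys key order."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def order_preserving_token(key):
    """Key-based token: neighbouring keys stay neighbours on the ring."""
    return key

keys = ["alice", "bob", "carol"]
by_order = sorted(keys, key=order_preserving_token)  # same as sorting the keys
by_hash = sorted(keys, key=random_partitioner_token)  # order depends on the hashes
```

Order preservation makes range scans over keys possible, at the cost of uneven load when keys cluster; hashing trades range scans for a more even spread.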
Terabytes of data Replaced MySQL Detecting failures in 15 seconds ZooKeeper used to locate nodes Replaced by HBase
50+ TB of data on a 150 node cluster, east and west coast data centers
Term search: UserId -> Word -> MessageId Columns
Interaction search: UserId -> Recipient UserId -> MessageId Columns

Latency Stat    Search Interactions    Term Search
Min             7.69 ms                7.78 ms
Median          15.69 ms               18.27 ms
Max             26.13 ms               44.41 ms
Tab. Read performance
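The term search layout (UserId -> Word -> MessageId columns) can be pictured as an inverted index per user. This is a minimal in-memory sketch, assuming whitespace tokenization and lowercase matching; `term_index`, `index_message` and `term_search` are hypothetical names, not part of Facebook's or Cassandra's code.

```python
from collections import defaultdict

# user id (row key) -> word (super column) -> message ids (columns)
term_index = defaultdict(lambda: defaultdict(set))

def index_message(user_id, message_id, text):
    """Record every word of a message under the owning user's row."""
    for word in text.lower().split():
        term_index[user_id][word].add(message_id)

def term_search(user_id, word):
    """Return the ids of the user's messages that contain the word."""
    row = term_index.get(user_id)
    if row is None:
        return []
    return sorted(row.get(word.lower(), ()))
```

Interaction search would follow the same shape with the recipient's user id in place of the word.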
Workload A, update heavy (50 percent reads and 50 percent updates): (a) read operations, (b) update operations. Six server-class machines (dual 64-bit quad core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6 disk RAID-10 array and gigabit ethernet)
Workload B, read heavy (95 percent reads and 5 percent updates): (a) read operations, (b) update operations. Six server-class machines (dual 64-bit quad core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6 disk RAID-10 array and gigabit ethernet)
Designed to run on cheap commodity hardware Handle high write throughput while not sacrificing read efficiency Decentralized Elasticity Fault-tolerant Tunable consistency
http://en.wikipedia.org/wiki/Apache_Cassandra
Cassandra - A Decentralized Structured Storage System, Avinash Lakshman, Prashant Malik, Facebook
http://maxgrinev.com/2010/07/09/a-quick-introduction-to-the-cassandra-data-model/
http://www.facebook.com/note.php?note_id=454991608919
http://horicky.blogspot.com/2010/10/bigtable-model-with-cassandra-and-hbase.html
http://www.datastax.com/docs/1.0/ddl/index
Benchmarking Cloud Serving Systems with YCSB, Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears