
Distributed Systems, Maciej Łopatka



  1. Distributed Systems Maciej Łopatka

  2.
     - Facebook Inbox Search
     - Authors: Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik
     - Facebook code dump
     - Community
     - Transfer to Apache Software Foundation
     - An Apache top-level project

  3.
     - BigTable data model
     - An Amazon Dynamo-like infrastructure

  4.
     - Distributed multidimensional map indexed by a key
     - Four or five dimensions (figure labels: Key, Value, Data, Timestamp)

  5.
     - Keyspace → Column Family
     - Column Family → Column Family Row
     - Column Family Row → Columns
     - Column → Data value

  6.
     - Keyspace → Super Column Family
     - Super Column Family → Super Column Family Row
     - Super Column Family Row → Columns Row
     - Column Row → Columns
     - Column → Data value
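The two hierarchies on slides 5 and 6 can be pictured as nested maps. The Python sketch below is only an illustration: the column family, row key, and column names are hypothetical, and plain dictionaries stand in for the distributed storage. Each column carries a timestamp next to its value, matching the Timestamp dimension on slide 4.

```python
def column(value, timestamp):
    """A column stores a data value together with the timestamp of its write."""
    return {"value": value, "timestamp": timestamp}

# All names below are made up for illustration.
keyspace = {
    # Keyspace -> Column Family -> Row -> Column -> data value
    "Users": {
        "user42": {
            "name": column("Alice", 1),
            "email": column("alice@example.org", 1),
        },
    },
    # Keyspace -> Super Column Family -> Row -> Super Column -> Column -> data value
    "Inbox": {
        "user42": {
            "hello": {  # a super column groups related columns
                "msg1001": column("", 5),
            },
        },
    },
}

def get(keyspace, family, row, *path):
    """Walk from a row down to one column and return its data value.

    `path` has one name for a column family (the column) and two names
    for a super column family (super column, then column).
    """
    node = keyspace[family][row]
    for name in path:
        node = node[name]
    return node["value"]
```

A standard column family needs one name after the row key, as in `get(keyspace, "Users", "user42", "name")`, while a super column family needs two, which is where the fifth dimension comes from.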

  7.
     - Replication
     - Log file
     - Bootstrapping
     - Partitioning
     - Consistent Hashing
     - Periodic Data Compaction
     - Gossip
     - Anti-Entropy data sync (uses Merkle tree)
     - Write and Read Quorum
     - W + R > N
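The quorum rule W + R > N from slide 7 guarantees that every read quorum shares at least one replica with every write quorum, so a read always reaches a replica that saw the latest acknowledged write. The sketch below checks the rule and, for small N, confirms it by brute force. The function names are hypothetical and chosen for this illustration only.

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """Return True when W + R > N, i.e. read and write quorums must intersect."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n

def always_intersect(n, w, r):
    """Brute-force check: does every w-subset of replicas meet every r-subset?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With N = 3 replicas, W = 2 and R = 2 satisfies the rule, while W = 1 and R = 1 does not: a read may then contact a replica that the write never reached.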

  8.
     - RandomPartitioner
     - OrderPreservingPartitioner
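The two partitioners differ in how a row key becomes a token on the consistent-hashing ring. RandomPartitioner derives the token from an MD5 hash of the key, which spreads rows evenly but loses key order. OrderPreservingPartitioner uses the key itself, which keeps key ranges contiguous. The sketch below is a simplified model, not Cassandra's code: the `Ring` class, the node names, and the token assignments are assumptions made for illustration.

```python
import bisect
import hashlib

def random_token(key):
    """Token from an MD5 hash of the key: even spread, no ordering."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def order_preserving_token(key):
    """Token is the key itself, so neighbouring keys stay on neighbouring nodes."""
    return key

class Ring:
    """Hypothetical ring: each node owns the tokens up to and including its own."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; sort the nodes around the ring.
        self._entries = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token):
        """First node whose token is >= the given token, wrapping past the end."""
        index = bisect.bisect_left(self._tokens, token)
        return self._entries[index % len(self._entries)][1]

# Order-preserving ring: node tokens are strings in key space.
ordered = Ring({"n1": "g", "n2": "p", "n3": "z"})

# Random ring: four nodes at evenly spaced tokens in the 128-bit MD5 space.
hashed = Ring({name: i * 2**128 // 4 for i, name in enumerate(["a", "b", "c", "d"])})
```

On the ordered ring, `ordered.owner(order_preserving_token("kiwi"))` lands on `n2` because "kiwi" sorts between "g" and "p". On the hashed ring, removing one node only reassigns the keys that node owned, which is the property that makes consistent hashing attractive for elastic clusters.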

  9.
     - Terabytes of data
     - Replaced MySQL
     - Detecting failures in 15 seconds
     - ZooKeeper used to locate nodes
     - Replaced by HBase

  10.
     - 50+ TB of data on a 150-node cluster, east and west coast data centers
     - Term search: UserId -> Word -> MessageId columns
     - Interaction search: UserId -> Recipient UserId -> MessageId columns

     Latency stat   Search interactions   Term search
     Min            7.69 ms               7.78 ms
     Median         15.69 ms              18.27 ms
     Max            26.13 ms              44.41 ms

     Tab. Read performance
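The two lookup paths on slide 10 can be modelled as two nested maps keyed by user. The sketch below is an in-memory illustration only: the `InboxIndex` class and its method names are hypothetical, splitting text on whitespace is an assumed tokenizer, and indexing each message under every participant is an assumption about how the per-user index is filled.

```python
from collections import defaultdict

class InboxIndex:
    """Hypothetical in-memory model of the two inbox search indexes."""

    def __init__(self):
        # Term search: UserId -> Word -> MessageId columns
        self.terms = defaultdict(lambda: defaultdict(set))
        # Interaction search: UserId -> Recipient UserId -> MessageId columns
        self.interactions = defaultdict(lambda: defaultdict(set))

    def index_message(self, message_id, sender_id, recipient_ids, text):
        """Index one message under every participant (assumed behaviour)."""
        participants = [sender_id, *recipient_ids]
        words = set(text.lower().split())  # naive tokenizer, an assumption
        for user_id in participants:
            for word in words:
                self.terms[user_id][word].add(message_id)
            for other_id in participants:
                if other_id != user_id:
                    self.interactions[user_id][other_id].add(message_id)

    def term_search(self, user_id, word):
        """Message ids in this user's inbox that contain the word."""
        return sorted(self.terms.get(user_id, {}).get(word.lower(), set()))

    def interaction_search(self, user_id, other_id):
        """Message ids this user exchanged with the other user."""
        return sorted(self.interactions.get(user_id, {}).get(other_id, set()))
```

Because the user id is the outermost key in both maps, each search touches only one user's row, which fits the per-user access pattern shown on the slide.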

  11. Workload A (update heavy): 50 percent reads and 50 percent updates. Panels: (a) read operations, (b) update operations. Test bed: six server-class machines (dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6-disk RAID-10 array, and Gigabit Ethernet)

  12. Workload B (read heavy): 95 percent reads and 5 percent updates. Panels: (a) read operations, (b) update operations. Test bed: six server-class machines (dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6-disk RAID-10 array, and Gigabit Ethernet)

  13.
     - Designed to run on cheap commodity hardware
     - Handle high write throughput while not sacrificing read efficiency
     - Decentralized
     - Elasticity
     - Fault-tolerant
     - Tunable consistency

  14.
     - http://en.wikipedia.org/wiki/Apache_Cassandra
     - Cassandra - A Decentralized Structured Storage System, Avinash Lakshman, Prashant Malik, Facebook
     - http://maxgrinev.com/2010/07/09/a-quick-introduction-to-the-cassandra-data-model/
     - http://www.facebook.com/note.php?note_id=454991608919
     - http://horicky.blogspot.com/2010/10/bigtable-model-with-cassandra-and-hbase.html
     - http://www.datastax.com/docs/1.0/ddl/index
     - Benchmarking Cloud Serving Systems with YCSB, Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears
