Xiaowei Wang, Jingxin Feng
Mar 7th, 2011

Overview
• Background
• Data Model
• API
• Architecture
• Users
• Linear scalability
• Replication and Consistency
• Tradeoff

Background
• Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store.
• Cassandra was open sourced by Facebook in 2008, and it was designed to fulfill the storage needs of the Inbox Search problem. It is in production use at Facebook but is still under heavy development.

Background
• Cassandra is Dynamo and Bigtable's lovechild.
  [Figure: Dynamo contributes the distributed systems technology, Bigtable the data model; Cassandra combines both]
• Like Dynamo, Cassandra is eventually consistent; like Bigtable, Cassandra provides a ColumnFamily-based data model.

Data Model
• Basic concepts:
  – Cluster: the machines (nodes) in a logical Cassandra instance. A cluster can contain multiple keyspaces.
  – Keyspace: a namespace for ColumnFamilies, typically one per application.
  – ColumnFamilies: contain multiple columns, each of which has a name, a value, and a timestamp, and which are referenced by row keys.
  – SuperColumns: can be thought of as columns that themselves have subcolumns.

Data Model
• Columns
  – The column is the lowest/smallest increment of data. It is a tuple (triplet) that contains a name, a value, and a timestamp.
  – Example in Java:
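The Java example from the original slide did not survive extraction. A minimal sketch of the name/value/timestamp triplet might look like the following; the class name `Column`, the field types, and the sample data are assumptions for illustration, not Cassandra's own classes.

```java
// Hypothetical sketch of a column: a (name, value, timestamp) triplet.
// Class and field names are illustrative only.
final class Column {
    final String name;
    final String value;     // a String keeps the sketch short
    final long timestamp;

    Column(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return name + "=" + value + "@" + timestamp;
    }

    public static void main(String[] args) {
        Column c = new Column("email", "alice@example.com", 1299456000000L);
        System.out.println(c);
    }
}
```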

Data Model
• Super Column
  – A container for one or more columns.

Data Model
• Column Families (CF)
  – A container for columns, analogous to a table in a relational database.
  – The ColumnFamily has a name and a map with a key and a value (which is a map containing columns).
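The "map of maps" description can be sketched directly with Java collections. This is only an illustration under assumptions: `ColumnFamilySketch`, its nested `Column`, and the `put`/`get` helpers are hypothetical names, and the sorted inner map is a choice made for the sketch.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: a column family as row key -> (column name -> column).
final class ColumnFamilySketch {
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    final String name;
    // Outer map: row key -> row. Inner map: column name -> column, sorted by name.
    final Map<String, SortedMap<String, Column>> rows = new TreeMap<>();

    ColumnFamilySketch(String name) {
        this.name = name;
    }

    void put(String rowKey, Column column) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column.name, column);
    }

    // Returns null when the row or the column does not exist.
    Column get(String rowKey, String columnName) {
        SortedMap<String, Column> row = rows.get(rowKey);
        return row == null ? null : row.get(columnName);
    }

    public static void main(String[] args) {
        ColumnFamilySketch users = new ColumnFamilySketch("Users");
        users.put("alice", new Column("email", "alice@example.com", 1L));
        users.put("alice", new Column("city", "Madison", 1L));
        System.out.println(users.rows.get("alice").keySet()); // column names in sorted order
    }
}
```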

Data Model
• SuperColumnFamily
  – The largest container: instead of having Columns in the innermost Map, we have SuperColumns. So it just adds an extra dimension.
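The extra dimension can be seen as one more level of map nesting. The sketch below is an assumption-laden illustration: `SuperColumnFamilySketch` and its methods are hypothetical, and column values are reduced to plain strings to keep it short.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: row key -> super column name -> column name -> value.
final class SuperColumnFamilySketch {
    final Map<String, SortedMap<String, SortedMap<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String superColumn, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns null when any level of the path is missing.
    String get(String rowKey, String superColumn, String column) {
        SortedMap<String, SortedMap<String, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        SortedMap<String, String> columns = row.get(superColumn);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SuperColumnFamilySketch addresses = new SuperColumnFamilySketch();
        addresses.put("alice", "home", "city", "Madison");
        addresses.put("alice", "work", "city", "Chicago");
        System.out.println(addresses.get("alice", "work", "city"));
    }
}
```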

Data Model
• Keyspaces
  – The container for column families. From an RDBMS point of view you can compare this to the schema; normally you have one per application.

API
• The Cassandra API consists of the following three methods:
  – insert(table, key, rowMutation)
  – get(table, key, columnName)
  – delete(table, key, columnName)
columnName can refer to a specific column within a column family, a column family, a super column family, or a column within a super column.
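To make the shape of the three calls concrete, here is a toy in-memory stand-in. It is not the Cassandra client API: `ToyStore` is hypothetical, `rowMutation` is modeled as a plain map of column name to value, and `columnName` only addresses a single column rather than the wider set of targets listed above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the three calls named on the slide.
final class ToyStore {
    // table -> row key -> column name -> value
    private final Map<String, Map<String, Map<String, String>>> tables = new HashMap<>();

    // rowMutation is modeled as column name -> new value (an assumption).
    void insert(String table, String key, Map<String, String> rowMutation) {
        tables.computeIfAbsent(table, t -> new HashMap<>())
              .computeIfAbsent(key, k -> new HashMap<>())
              .putAll(rowMutation);
    }

    // Returns null when the table, row, or column is missing.
    String get(String table, String key, String columnName) {
        Map<String, String> row = row(table, key);
        return row == null ? null : row.get(columnName);
    }

    // This toy removes the entry directly; it does not model tombstones.
    void delete(String table, String key, String columnName) {
        Map<String, String> row = row(table, key);
        if (row != null) {
            row.remove(columnName);
        }
    }

    private Map<String, String> row(String table, String key) {
        Map<String, Map<String, String>> t = tables.get(table);
        return t == null ? null : t.get(key);
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.insert("Users", "alice", Collections.singletonMap("email", "alice@example.com"));
        System.out.println(store.get("Users", "alice", "email"));
        store.delete("Users", "alice", "email");
        System.out.println(store.get("Users", "alice", "email"));
    }
}
```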

API
• Thrift
  – Cassandra's driver-level interface that the clients below build on. NOT recommended…
• High-level clients:
  – Python (Telephus, Pycassa…)
  – Java (Hector, Pelops…)
  – .NET (FluentCassandra, Aquiles…)
  – PHP (phpcassa, SimpleCassie…)
  – Others…

Architecture
• Architecture layers
  – Core Layer: Messaging service, Gossip, Failure detection, Cluster state, Partitioner, Replication
  – Middle Layer: Commit log, Memtable, SSTable, Indexes, Compaction
  – Top Layer: Tombstones, Hinted handoff, Read repair, Bootstrap, Monitoring, Admin tools

Architecture
• Write Path
  – First write to a disk commit log (sequential).
  – After the write to the log, it is sent to the appropriate nodes.
  – Each node receiving the write first records it in a local log, then updates its memtables.
  – Memtables are flushed to disk when:
    • Out of space
    • Too many keys (128 is default)
    • Time duration (client provided)
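The per-node steps above (log first, then memtable, then flush on a key-count threshold) can be sketched as follows. Everything here is a simplification under assumptions: `WritePathSketch` is a hypothetical class, the "disk" is just in-memory lists, and only the too-many-keys flush trigger is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical single-node sketch of the write path.
final class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();                    // stands in for the on-disk log
    SortedMap<String, String> memtable = new TreeMap<>();                // in-memory, sorted by key
    final List<SortedMap<String, String>> sstables = new ArrayList<>();  // flushed data, never modified
    final int maxKeys;

    WritePathSketch(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);   // 1. sequential append to the log
        memtable.put(key, value);           // 2. update the memtable
        if (memtable.size() >= maxKeys) {   // 3. flush when there are too many keys
            flush();
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(memtable);             // treated as immutable from here on
        memtable = new TreeMap<>();
        commitLog.clear();                  // logged writes are now covered by the flushed data
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("a", "1");
        node.write("b", "2");               // reaches the threshold and triggers a flush
        node.write("c", "3");
        System.out.println(node.sstables.size() + " flushed, " + node.memtable.size() + " in memory");
    }
}
```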

Architecture
• When memtables are written out, two files go out:
  – Data file (SSTable)
  – Index file (SSTable index)
• When a commit log has had all its column families pushed to disk, it is deleted.
• Compaction: data files accumulate over time. Periodically, data files are merge-sorted into a new file (and a new index is created).
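Compaction as described here is a merge of already-sorted files plus a rebuilt index. The sketch below assumes files are passed oldest first and that the newest file wins for a duplicated key, which is a simplification; `CompactionSketch`, `compact`, and `buildIndex` are hypothetical names, and the index is modeled as key to position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of compaction over in-memory "data files".
final class CompactionSketch {
    // Merges data files given oldest first; for a duplicated key the newest file wins.
    static SortedMap<String, String> compact(List<SortedMap<String, String>> oldestFirst) {
        SortedMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : oldestFirst) {
            merged.putAll(file);            // later (newer) files overwrite earlier entries
        }
        return merged;
    }

    // Builds a new index for the merged file: key -> position in sorted order.
    static SortedMap<String, Integer> buildIndex(SortedMap<String, String> dataFile) {
        SortedMap<String, Integer> index = new TreeMap<>();
        int position = 0;
        for (String key : dataFile.keySet()) {
            index.put(key, position++);
        }
        return index;
    }

    public static void main(String[] args) {
        SortedMap<String, String> older = new TreeMap<>();
        older.put("a", "1");
        older.put("b", "1");
        SortedMap<String, String> newer = new TreeMap<>();
        newer.put("b", "2");
        newer.put("c", "2");
        SortedMap<String, String> merged = compact(Arrays.asList(older, newer));
        System.out.println(merged + " index=" + buildIndex(merged));
    }
}
```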

Architecture
• Write properties:
  – No reads
  – No seeks
  – Fast
  – Atomic within ColumnFamily
  – Always writable
• Read properties:
  – Read multiple SSTables
  – Slower than writes (but still fast)
  – Seeks can be mitigated with more RAM
  – Scales to billions of rows

Users
• Facebook
  – Uses Cassandra to power Inbox Search, with over 200 nodes deployed. Abandoned in late 2010.
• Twitter
  – But not for tweets.
• IBM
  – Research in building a scalable email system based on Cassandra.
• Cisco's WebEx
  – Uses Cassandra to store user feed and activity in near real time.

Next Topics
1. Linear scalability
2. Replication and Consistency
3. Tradeoff

Linear Scalability
[Figure: nodes N1, N2, N3, Nx and a Key]

Bootstrap
[Figure: nodes N1, N2, N3, N4]

Consistent Hashing
• Cause a problem…
[Figure: nodes N1, N2, N3, Nx and a Key]
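Since the diagram is lost, a small sketch of the general consistent-hashing lookup may help. It assumes each node owns one integer token and that a key's token has already been computed by some hash function; `RingSketch`, the token values, and the node names are illustrative, not taken from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hashing ring with one token per node.
final class RingSketch {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    // A key belongs to the first node whose token is >= the key's token,
    // wrapping around to the lowest token at the end of the ring.
    String nodeFor(int keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode(25, "N1");
        ring.addNode(50, "N2");
        ring.addNode(75, "N3");
        System.out.println(ring.nodeFor(30));   // owned by N2
        ring.addNode(40, "N4");                 // a new node takes over part of N2's range
        System.out.println(ring.nodeFor(30));   // now owned by N4
        System.out.println(ring.nodeFor(80));   // wraps around to N1
    }
}
```

In this sketch, adding N4 only changes ownership for tokens between 26 and 40; keys elsewhere on the ring stay where they were.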

Load Balance
[Figure: nodes N1, N2, N3, N4]

Replication and Consistency
• Replication
• Tunable eventual consistency

Replication (Simple Case)
[Figure: nodes N1, N2, N3, N4 and a Key]
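The figure for the simple case is lost. One common reading, assumed here rather than confirmed by the slide, is that a key is stored on the node that owns its token and on the next nodes clockwise until the replication factor is reached. `ReplicaPlacementSketch` and the token values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: replicas are the first n nodes clockwise from the key's token.
final class ReplicaPlacementSketch {
    // token -> node name; one token per node is assumed
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    // Returns at most n nodes: those at or after the key's token first,
    // then wrapping around to the start of the ring.
    List<String> replicasFor(int keyToken, int n) {
        List<String> replicas = new ArrayList<>();
        for (String node : ring.tailMap(keyToken, true).values()) {
            if (replicas.size() < n) {
                replicas.add(node);
            }
        }
        for (String node : ring.headMap(keyToken, false).values()) {
            if (replicas.size() < n) {
                replicas.add(node);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        ReplicaPlacementSketch ring = new ReplicaPlacementSketch();
        ring.addNode(10, "N1");
        ring.addNode(20, "N2");
        ring.addNode(30, "N3");
        ring.addNode(40, "N4");
        System.out.println(ring.replicasFor(25, 3));
    }
}
```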

Tunable Consistency

Write (W)                                      Read (R)
Level    Description                           Level    Description
ZERO     Cross fingers                         N/A
ANY      1st response (including HH)           N/A
ONE      1st response                          ONE      1st response
QUORUM   N/2 + 1 replicas                      QUORUM   N/2 + 1 replicas
ALL      All replicas                          ALL      All replicas
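The arithmetic behind the levels can be written out in a few lines. This is a sketch, not Cassandra code: `ConsistencyMath` and its methods are hypothetical, N is the replication factor, and the overlap rule W + R > N is the standard quorum argument rather than something stated in the table.

```java
// Hypothetical helper for the arithmetic behind the consistency levels.
final class ConsistencyMath {
    // QUORUM as given in the table: N/2 + 1, using integer division.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Number of replicas that must respond at a given level.
    static int required(String level, int n) {
        switch (level) {
            case "ONE":
                return 1;
            case "QUORUM":
                return quorum(n);
            case "ALL":
                return n;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // When W + R > N, every read set shares at least one replica with every write set.
    static boolean readOverlapsWrite(int n, int w, int r) {
        return w + r > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int w = required("QUORUM", n);
        int r = required("QUORUM", n);
        System.out.println("W=" + w + " R=" + r + " overlap=" + readOverlapsWrite(n, w, r));
    }
}
```

With N = 3, QUORUM writes and QUORUM reads each need 2 replicas, so any read set overlaps any write set; ONE for both (1 + 1 = 2, not greater than 3) gives no such guarantee.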

A Quorum Level Example (1)
[Figure: N=3; a write operation against replicas N1, N2, N3]

A Quorum Level Example (2)
[Figure: N=3; a read operation against replicas N1, N2, N3]
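The two quorum figures did not survive, so the scenario below is an assumed one, not a reconstruction of the slides: with N=3, a QUORUM write reaches N1 and N2, and a later QUORUM read contacts N2 and N3. Because the two sets overlap on N2, picking the response with the highest timestamp returns the new value. `QuorumReadSketch` and `Versioned` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of resolving a quorum read by timestamp.
final class QuorumReadSketch {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Returns the response with the highest timestamp, or null for no responses.
    static Versioned newest(List<Versioned> responses) {
        Versioned best = null;
        for (Versioned v : responses) {
            if (best == null || v.timestamp > best.timestamp) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed state after a QUORUM write (W=2) that reached N1 and N2 but not N3.
        Versioned n1 = new Versioned("new", 2L);
        Versioned n2 = new Versioned("new", 2L);
        Versioned n3 = new Versioned("old", 1L);

        // A QUORUM read (R=2) that happens to contact N2 and N3.
        Versioned result = newest(Arrays.asList(n2, n3));
        System.out.println(result.value);
    }
}
```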

A Quorum Level Example (3)
• But…

Final Question about Cassandra
Why are writes/reads fast?
(1) No read/write locks
(2) Organize all the write operations into a sequential write, which can maximize the disk's throughput
(3) Flexible data model

Similarity with Dynamo and Bigtable
Dynamo-like features
a. Symmetric, P2P architecture: no special nodes, no SPOF (single point of failure)
b. Gossip-based cluster management
c. Distributed hash table (DHT) for data placement
d. Tunable and eventual consistency
Bigtable-like features
a. Data model
b. SSTable disk storage: append-only commit log, MemTable (buffer and sort), immutable SSTable files
c. Hadoop integration (some ideas based on GFS)
