Tao: Facebook's Distributed Data Store For The Social Graph

Tao: Facebook's Distributed Data Store For The Social Graph Bronson et al., ATC 2013 Joy Arulraj CMU 15-799: Paper Presentation

Talk Overview • Graph-aware cache backed by a database – Efficiency vs. consistency

Motivation • Memcached – Distributed in-memory key-value store – Memory object caching system – Data mapping in client code (PHP API)

Limitations • Association lists – Get entire list to update one edge • Control logic – Clients manage lookaside cache – But, only have a local perspective • Expensive read-after-write consistency – Writes forwarded to master – Local state updated asynchronously

Problem Statement • Need a “smart” caching layer – Graph-aware – Distributed cache management – Provides read-my-write consistency • Solution – Fix the API and leverage its constraints!

Example • Alice was at CMU with Bob – Cathy: Wish we were there! – David likes this • Objects – Id: 200, otype: USER, name: Alice – Id: 300, otype: USER, name: Bob – Id: 400, otype: USER, name: Cathy – Id: 500, otype: USER, name: David – Id: 600, otype: LOCATION, name: CMU – Id: 700, otype: CHECKIN – Id: 800, otype: COMMENT, text: Wish we were there! • Association types shown – FRIEND, LOC, CMT, LIKES

Data Model • Object – (id) -> (otype, (key->value)*) – Entities, repeatable actions – Ex: users, comments • Association – (id1, atype, id2) -> (time, (key->value)*) – Relationships, actions that model state transitions – Ex: tagged at, likes

Data Model • Association List – (id1, atype) -> [a_new, …, a_old] – Supports the Association Query API – Ex: (“CMU”, “COMMENT”)
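
The three mappings above can be written down as plain Python types. This is only an illustrative sketch: the class names `TaoObject` and `Association`, the helper `association_list`, the timestamps, and the second comment (id 801) are assumptions for the example and do not come from the paper or the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaoObject:
    """(id) -> (otype, (key -> value)*)"""
    id: int
    otype: str
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    """(id1, atype, id2) -> (time, (key -> value)*)"""
    id1: int
    atype: str
    id2: int
    time: int
    data: Dict[str, str] = field(default_factory=dict)


def association_list(assocs: List[Association], id1: int, atype: str) -> List[Association]:
    """(id1, atype) -> [a_new, ..., a_old]: matching edges, newest first."""
    matches = [a for a in assocs if a.id1 == id1 and a.atype == atype]
    return sorted(matches, key=lambda a: a.time, reverse=True)


# Ids 600, 700 and 800 follow the example slide; timestamps are made up.
checkin = TaoObject(700, "CHECKIN")
comment = TaoObject(800, "COMMENT", {"text": "Wish we were there!"})
edges = [
    Association(700, "COMMENT", 800, time=20),
    Association(700, "COMMENT", 801, time=30),  # hypothetical second comment
    Association(700, "LOCATION", 600, time=10),
]
newest_first = association_list(edges, 700, "COMMENT")
print([a.id2 for a in newest_first])
```

Sorting by time on every call is only for clarity; a real store would keep each list ordered as edges arrive.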

API • Association API – assoc_add(id1, atype, id2, time, (k->v)*) – assoc_delete(id1, atype, id2) • Association Query API – [POINT] assoc_get(id1, atype, id2) – [RANGE] assoc_range(id1, atype, pos, limit) – [COUNT] assoc_count(id1, atype)
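
A minimal in-memory sketch of the five calls listed above may help make their semantics concrete. The class name `AssocStore`, the tuple layout, and the choice to let `assoc_add` overwrite an existing edge are assumptions of this sketch; it keeps everything in one process and says nothing about how TAO implements the calls.

```python
from typing import Dict, List, Optional, Tuple

Assoc = Tuple[int, int, Dict[str, str]]  # (time, id2, data)


class AssocStore:
    """Toy single-process store; not TAO's implementation."""

    def __init__(self) -> None:
        # (id1, atype) -> association list, kept sorted newest first
        self._lists: Dict[Tuple[int, str], List[Assoc]] = {}

    def assoc_add(self, id1: int, atype: str, id2: int, time: int,
                  data: Optional[Dict[str, str]] = None) -> None:
        # Assumption: adding an edge that already exists replaces it.
        self.assoc_delete(id1, atype, id2)
        lst = self._lists.setdefault((id1, atype), [])
        lst.append((time, id2, dict(data or {})))
        lst.sort(key=lambda a: a[0], reverse=True)

    def assoc_delete(self, id1: int, atype: str, id2: int) -> None:
        key = (id1, atype)
        self._lists[key] = [a for a in self._lists.get(key, []) if a[1] != id2]

    def assoc_get(self, id1: int, atype: str, id2: int) -> Optional[Assoc]:
        # [POINT] query: one specific edge, or None if it does not exist
        for a in self._lists.get((id1, atype), []):
            if a[1] == id2:
                return a
        return None

    def assoc_range(self, id1: int, atype: str, pos: int, limit: int) -> List[Assoc]:
        # [RANGE] query: `limit` edges starting at offset `pos`, newest first
        return self._lists.get((id1, atype), [])[pos:pos + limit]

    def assoc_count(self, id1: int, atype: str) -> int:
        # [COUNT] query: size of the association list
        return len(self._lists.get((id1, atype), []))


store = AssocStore()
for i in range(7):  # seven hypothetical comments on checkin 700
    store.assoc_add(700, "COMMENT", 800 + i, time=i)

recent = store.assoc_range(700, "COMMENT", 0, 5)  # five most recent comments
total = store.assoc_count(700, "COMMENT")
print([a[1] for a in recent], total)
```

Every call is keyed by (id1, atype), which is what lets a cache hold one association list per key and answer point, range, and count queries from it.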

Client Queries • All queries start from an <id, atype> • 5 most recent comments on Alice’s checkin – assoc_range(“Alice”, “COMMENT”, 0, 5) • Number of friends of Bob – assoc_count(“Bob”, “FRIEND”)

Tao’s Goals • Low read latency • Write consistency • High read availability

Basic Architecture • Webservers – Stateless • Cache servers – Objects, association lists – Partitioned based on <id> • TAO database – Partitioned based on <id>

Low Read Latency • Webservers – Too many network hops • Cache servers – Hotspots with smaller shards

Datacenter-level Scalability • Tiers – Distributed write logic • Database – Thundering herds

Splitting the Cache Layer • Follower cache • Leader cache

Write Consistency • Followers – Absorb read hits – Forward read misses and writes to leaders – Write-through cache • Leader updates – Synchronously sent in reply to writer – Asynchronously sent to other followers

Write consistency • Leaders – Serialize concurrent writes – Can prevent “thundering herds” • Association list updates – Refills instead of invalidates – Idempotent pull-based incremental updates
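
The follower and leader roles from the two write-consistency slides can be sketched with two small classes. This is a simplified model under stated assumptions: plain key-value pairs stand in for objects and association lists, a dictionary stands in for the database, the leader pushes new values rather than distinguishing refills from invalidations, and the `pending` queue with `deliver_pending()` is a hypothetical stand-in for asynchronous delivery.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Leader:
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}      # stands in for the database
        self.cache: Dict[str, str] = {}
        self.followers: List["Follower"] = []
        self.pending: Deque[Tuple["Follower", str, str]] = deque()

    def read(self, key: str) -> Optional[str]:
        if key not in self.cache and key in self.db:
            self.cache[key] = self.db[key]
        return self.cache.get(key)

    def write(self, key: str, value: str, writer: "Follower") -> str:
        # Every write for a key goes through the leader, which serializes them.
        self.db[key] = value
        self.cache[key] = value
        for follower in self.followers:
            if follower is not writer:
                self.pending.append((follower, key, value))  # sent later
        return value  # synchronous reply to the writing follower

    def deliver_pending(self) -> None:
        # Stand-in for the asynchronous update messages reaching followers.
        while self.pending:
            follower, key, value = self.pending.popleft()
            follower.cache[key] = value


class Follower:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.cache: Dict[str, str] = {}
        leader.followers.append(self)

    def read(self, key: str) -> Optional[str]:
        if key not in self.cache:  # read miss: forward to the leader
            value = self.leader.read(key)
            if value is not None:
                self.cache[key] = value
        return self.cache.get(key)

    def write(self, key: str, value: str) -> None:
        # Write-through: forward to the leader, then cache its reply.
        self.cache[key] = self.leader.write(key, value, writer=self)


leader = Leader()
a, b = Follower(leader), Follower(leader)
a.write("k", "v1")
first_b = b.read("k")      # miss at b, filled from the leader
a.write("k", "v2")
own_read = a.read("k")     # the writer sees its own write immediately
stale_b = b.read("k")      # b still serves the old value
leader.deliver_pending()
fresh_b = b.read("k")      # b catches up once the update arrives
print(own_read, stale_b, fresh_b)
```

The writer's follower is updated in the reply path, which is what gives read-my-write behavior, while other followers are only eventually consistent.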

Multi-datacenter Scalability • Master datacenter – Receives forwarded writes • Replica datacenter – Forwards writes to the master – Updated by async DB replication

High Read Availability • Follower failure – Client contacts backup follower tier – May break read-after-write consistency • Leader failure – Follower tiers reroute read misses directly to DB – Writes sent to another member of leader tier
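
The failover rules above amount to trying an ordered list of targets until one answers. The sketch below illustrates that idea only; `Tier`, `Unavailable`, and `read_with_failover` are hypothetical names, and the values "old" and "new" are made up to show why a backup follower can return stale data.

```python
from typing import Dict, List, Optional, Tuple


class Unavailable(Exception):
    """Raised when a tier cannot be reached."""


class Tier:
    def __init__(self, name: str, data: Dict[str, str], up: bool = True) -> None:
        self.name = name
        self.data = data
        self.up = up

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            raise Unavailable(self.name)
        return self.data.get(key)


def read_with_failover(key: str, tiers: List[Tier]) -> Tuple[str, Optional[str]]:
    """Try each tier in order and return (tier name, value) from the first one up."""
    for tier in tiers:
        try:
            return tier.name, tier.get(key)
        except Unavailable:
            continue
    raise Unavailable("all tiers down")


# Follower failure: the client falls back to a backup follower tier, which
# may not have seen the client's latest write yet.
primary = Tier("primary-follower", {"k": "new"}, up=False)
backup = Tier("backup-follower", {"k": "old"})
follower_result = read_with_failover("k", [primary, backup])

# Leader failure: a read miss is rerouted straight to the database.
failed_leader = Tier("leader", {}, up=False)
database = Tier("db", {"k": "new"})
miss_result = read_with_failover("k", [failed_leader, database])
print(follower_result, miss_result)
```

The stale "old" value in the first case is the read-after-write break the slide mentions: availability is kept at the cost of consistency.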

Handling Hot Spots • Consistent hashing – Simplifies cluster expansion – Request rerouting • Load balancing – Shard cloning – Small client-side cache
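
A small hash ring shows why consistent hashing simplifies cluster expansion: adding a server moves only the shards that the new server takes over. This is a generic textbook sketch, not TAO's code; the server names, the MD5-based hash, and the 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _hash(value: str) -> int:
    # Stable 64-bit hash derived from MD5 (used for placement, not security).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self._ring: List[Tuple[int, str]] = []  # sorted (point, server) pairs
        for server in servers:
            self.add(server, vnodes)

    def add(self, server: str, vnodes: int = 100) -> None:
        # Each server owns many points so load spreads evenly around the ring.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def lookup(self, shard_id: int) -> str:
        # A shard belongs to the first point at or after its hash, wrapping around.
        h = _hash(str(shard_id))
        i = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
before: Dict[int, str] = {s: ring.lookup(s) for s in range(1000)}

ring.add("cache-4")  # cluster expansion
after: Dict[int, str] = {s: ring.lookup(s) for s in range(1000)}

moved = [s for s in before if before[s] != after[s]]
print(f"{len(moved)} of {len(before)} shards moved, all to the new server:",
      all(after[s] == "cache-4" for s in moved))
```

With a plain `hash(shard) % n` scheme, changing `n` would remap most shards; on the ring, shards that do not land on the new server keep their owner.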

Results • Reads dominate writes – 99.8% read requests – 40% of requests are range queries • Most edge queries have empty results – Tao can use cached assoc_count – Key advantage of app-aware caching
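
The point about empty results can be illustrated with a cache that remembers association counts: once a count of zero is cached, later queries on that (id1, atype) are answered without touching the backing store. The class `CountingCache`, its `backend_reads` counter, and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (id1, atype)


class CountingCache:
    def __init__(self, backend: Dict[Key, List[int]]) -> None:
        self.backend = backend           # stands in for the database
        self.counts: Dict[Key, int] = {}
        self.backend_reads = 0           # how often the backend was consulted

    def _fetch(self, key: Key) -> List[int]:
        self.backend_reads += 1
        return self.backend.get(key, [])

    def assoc_count(self, id1: int, atype: str) -> int:
        key = (id1, atype)
        if key not in self.counts:
            self.counts[key] = len(self._fetch(key))
        return self.counts[key]

    def assoc_range(self, id1: int, atype: str, pos: int, limit: int) -> List[int]:
        if self.assoc_count(id1, atype) <= pos:
            return []  # known to be empty from the cached count alone
        return self._fetch((id1, atype))[pos:pos + limit]


cache = CountingCache({(300, "FRIEND"): [200]})

empty_first = cache.assoc_range(500, "LIKES", 0, 5)   # one backend read for the count
empty_again = cache.assoc_range(500, "LIKES", 0, 5)   # served from the cached count
reads_after_empty = cache.backend_reads

friends = cache.assoc_range(300, "FRIEND", 0, 5)
print(empty_first, empty_again, reads_after_empty, friends)
```

A generic key-value cache cannot do this, because it has no notion that a count of zero implies every range and point query on that list is empty.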

Results • Availability – Fraction of failed queries: 4.9 × 10^-6 • Follower Throughput – 8 core Xeon + 144GB RAM + 10Gb Ethernet – 30-60K requests/sec

Tao Summary • Low read latency – Application-aware cache layer • Write consistency – Replication model • High read availability – Fault-tolerance

Talk Summary • Graph-aware cache backed by a database – Efficiency vs. consistency • Why did they not use a graph database? – They trust MySQL – Tao’s cache layer handles their demands Thanks!
