Easy Freshness with Pequod Cache Joins

Easy Freshness with Pequod Cache Joins
Bryan Kate, Eddie Kohler, Mike Kester (Harvard University)
Yandong Mao, Neha Narula, Robert Morris (MIT)

tl;dr
Web application caches should support materialized views natively.
In-cache materialized views are easy to use and have good performance.

application cache
• fast key-value cache
  – examples: memcached, Redis
• offloads reads from database
• managed by application developer
  – who assumes the burden of maintenance
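The maintenance burden can be illustrated with a minimal look-aside sketch. Plain Python dicts stand in for the database and for a memcached- or Redis-style cache; the names `posts`, `subs`, `cache`, `read_timeline`, and `write_post` are hypothetical and do not come from the talk.

```python
# Hypothetical look-aside cache: plain Python containers stand in for
# the database and for the key-value cache.
posts = []                        # "database" table of (time, poster, content)
subs = {"bk": {"ann", "liz"}}     # "database" table: user -> followed users
cache = {}                        # application cache: user -> timeline


def read_timeline(user):
    """Return the user's timeline, filling the cache on a miss."""
    if user not in cache:
        follows = subs.get(user, set())
        cache[user] = sorted(p for p in posts if p[1] in follows)
    return cache[user]


def write_post(time, poster, content):
    """Write to the database, then invalidate affected timelines by hand."""
    posts.append((time, poster, content))
    for user, follows in subs.items():
        if poster in follows:
            # The application developer must remember this step;
            # omitting it leaves stale timelines in the cache.
            cache.pop(user, None)
```

Every write path has to know which cached entries depend on the data it changes, which is the bookkeeping an in-cache materialized view is meant to take over.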

100 timeline checks for every new post!

timeline database query

    SELECT post.time, post.poster, post.content
      FROM post JOIN sub
      WHERE sub.follows = post.poster
        AND sub.user = 'bk'
        AND post.time >= 100
      ORDER BY post.time;

timeline materialized view

    CREATE MATERIALIZED VIEW tline AS
      SELECT sub.user, post.time, post.poster, post.content
        FROM post JOIN sub
        WHERE sub.follows = post.poster;

    SELECT * FROM tline
      WHERE tline.user = 'bk' AND tline.time >= 100
      ORDER BY tline.time;

• arrange data for quick reading
  – computation happens in advance: good!
  – simple query on materialized data: good!

easy, but slow
• the database becomes a bottleneck
  – most important job: durable storage
  – handling reads + writes may be too much
  – better to offload reads
  – implementation issues (locks, transactions, …)

Pequod
• a distributed application cache
• materialized views in a key-value cache
  – operations: get, put, scan, plus join
• good performance and programmability

advanced materialized views
• simple materialized views are a bad fit for caches
  – need advanced features from recent research
• partial: only portions are materialized as needed
• dynamic: portions are selected based on requests
• incremental updates: track dependencies between data
• eager updates
• lazy updates
• distributed
• in an ordered key-value cache!

KV materialized views?

    CREATE MATERIALIZED VIEW tline AS
      SELECT sub.user, post.time, post.poster, post.content
        FROM post JOIN sub
        WHERE sub.follows = post.poster;

• but Pequod only understands get, put, scan!
  – want key-value for performance
  – how to represent the relations needed for views?

scan(tline|bk|100, tline|bk ∞)
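A range scan like this one only works if relations are encoded as ordered keys. The following is a minimal sketch of that idea, not Pequod's implementation: the `OrderedKV` class, the `table|column|column` key layout, the zero-padded timestamps, and the `"\uffff"` upper bound standing in for ∞ are all assumptions made for illustration. The join is written out by hand here, since the slides do not show Pequod's join syntax.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store with get, put, and range scan."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def scan(self, first, last):
        """Return (key, value) pairs with first <= key < last, in key order."""
        lo = bisect.bisect_left(self._keys, first)
        hi = bisect.bisect_left(self._keys, last)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


def t(n):
    """Zero-pad a timestamp so string order matches numeric order."""
    return f"{n:010d}"


def materialize_timeline(store, user):
    """Hand-written join: copy posts of followed users into tline|user|..."""
    for sub_key, _ in store.scan(f"sub|{user}|", f"sub|{user}|\uffff"):
        poster = sub_key.split("|")[2]
        for post_key, content in store.scan(f"post|{poster}|",
                                            f"post|{poster}|\uffff"):
            time = post_key.split("|")[2]
            store.put(f"tline|{user}|{time}|{poster}", content)


store = OrderedKV()
store.put("sub|bk|ann", "1")
store.put("sub|bk|liz", "1")
store.put("post|ann|" + t(90), "old news")
store.put("post|ann|" + t(120), "hello")
store.put("post|liz|" + t(150), "hi all")
store.put("post|zed|" + t(130), "not followed")

materialize_timeline(store, "bk")
# Counterpart of the scan above: bk's timeline entries with time >= 100.
timeline = store.scan("tline|bk|" + t(100), "tline|bk|\uffff")
```

Because the user and time are the leading key components, one contiguous range read answers the timeline query without touching the `sub` or `post` ranges again.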

scale
• distributed Pequod scales to large data sets
  – key design choice: computation is local
• base data is partitioned
  – example: sub, post “tables”
• cache joins can be computed anywhere
  – base data transparently replicated as necessary
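One way to picture partitioned base data with replication on demand is sketched below. It rests on several assumptions that the slides do not confirm: hash partitioning on the `table|id` key prefix is a stand-in for whatever partitioning Pequod uses, and `owner`, `ComputeServer`, and `local_range` are hypothetical names. The sketch copies a range once and does not model how later updates would be forwarded.

```python
import hashlib


def owner(key, servers):
    """Pick the home server for a base key from its 'table|id' prefix."""
    prefix = "|".join(key.split("|")[:2])
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class ComputeServer:
    """Computes joins locally over base data it replicates when needed."""

    def __init__(self, base):
        self.base = base            # server name -> {key: value}
        self.names = sorted(base)   # stable server order for owner()
        self.replica = {}           # locally replicated base data
        self.fetched = set()        # prefixes already copied

    def local_range(self, prefix):
        """Return all pairs under prefix, copying them from home once."""
        if prefix not in self.fetched:
            home = owner(prefix, self.names)
            for key, value in self.base[home].items():
                if key.startswith(prefix + "|"):
                    self.replica[key] = value
            self.fetched.add(prefix)
        return sorted(kv for kv in self.replica.items()
                      if kv[0].startswith(prefix + "|"))


names = ["base0", "base1", "base2"]
base = {name: {} for name in names}
for key, value in [("post|ann|0000000120", "hello"),
                   ("post|ann|0000000150", "again"),
                   ("sub|bk|ann", "1")]:
    base[owner(key, names)][key] = value

compute = ComputeServer(base)
ann_posts = compute.local_range("post|ann")
```

After the first call, later reads of the same range are served from the local replica, so the join computation itself never crosses the network.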

distributed deployment

distributed deployment (read)

distributed deployment (write)

other features
• advanced cache joins
  – interleaved: collocate different kinds of data
  – stacked
  – materialized, non-materialized, or snapshot
  – aggregates
• eviction
• consistency

evaluation
• Twitter-like benchmark
  – based on 2009 Twitter social graph
  – check, subscribe, post (100:10:1)
• evaluate potential bottlenecks in Pequod
  – database omitted in experiments
  – clients write data directly to Pequod
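The 100:10:1 operation mix can be reproduced with a weighted random draw. This is a sketch of the ratio only, not the authors' benchmark driver; `make_workload` and its `seed` parameter are hypothetical.

```python
import random


def make_workload(n_ops, seed=0):
    """Draw n_ops operations with check:subscribe:post weights of 100:10:1."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    ops = ["check", "subscribe", "post"]
    weights = [100, 10, 1]
    return rng.choices(ops, weights=weights, k=n_ops)


workload = make_workload(11100)
counts = {op: workload.count(op) for op in ("check", "subscribe", "post")}
```

With these weights, roughly 90% of operations (100 of every 111) are timeline checks, so the benchmark is dominated by reads of the materialized timelines.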

system comparison
Do cache joins have key-value cache performance?
• goal: perform no worse than existing caches
• compare with:
  – fast KV caches: Redis, memcached
  – DB-as-cache: Postgres (in-memory, tuned)
• Postgres uses “materialized views” (triggers)

system comparison
[Figure: QPS (thousands / s) for Pequod, Redis, memcached, and Postgres]

scaling Pequod
Will adding servers improve performance?
What is the overhead of data movement?
• cluster on Amazon EC2
• two-tier deployment
  – subscriptions, posts on “base” servers
  – timelines executed on “compute” servers
  – replication is required

scaling Pequod
[Figure: QPS (millions / s) vs. number of compute servers (12, 24, 36, 48)]

scaling Pequod (overhead)
• steady-state bandwidth for data movement
  – 10 → 16% (larger fanout)
• total memory consumption
  – 290 → 297 GB at base (subscription metadata)
  – 1.2 → 1.5 TB at compute (duplicate data)
• overhead is noticeable but not crippling
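The relative size of the memory overhead follows from the figures on this slide. The short calculation below reads each pair of numbers as a before and after value, which is an interpretation since the slide does not label the two configurations; `pct_increase` is a hypothetical helper.

```python
def pct_increase(before, after):
    """Relative growth from before to after, in percent."""
    return 100.0 * (after - before) / before


base_growth = pct_increase(290, 297)      # GB at the base servers
compute_growth = pct_increase(1.2, 1.5)   # TB at the compute servers
```

Under that reading, base memory grows by about 2.4% and compute memory by about 25%, which matches the slide's summary that the overhead is noticeable but not crippling.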

selected related work
• DMV [Zhou et al., 2007]
  – partial, dynamic database materialized views
• DBProxy [Amiri et al., 2002-3]
  – distributed cache built from databases
  – incremental updates to cached results
• MV in PNUTS [Agrawal et al., 2009]
  – materialized views in a key-value store
  – incremental updates, not partial

conclusion
• Pequod cache joins
  – programmability of materialized views
  – performance of a key-value cache
  – code release soon! github.com/bryankate
