Noria
Dynamic, partially-stateful data-flow for high-performance Web applications
Jon Gjengset, Malte Schwarzkopf, Jonathan Behrens, Lara Timbó Araújo, Martin Ek, Eddie Kohler, M. Frans Kaashoek, Robert Morris

[Figure: a Web application Frontend sends queries to a Backend that holds the Stories and Votes tables; the query combines them with JOIN, COUNT, and FILTER]
90% reads, 10% writes
Slow reads, repeated work!
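To make the "repeated work" concrete, here is a minimal sketch of the read path without precomputation. The table contents, the operator order (COUNT, then JOIN, then FILTER), and the `min_votes` threshold are all hypothetical; the slide only names the three operators.

```python
from collections import Counter

# Hypothetical base tables: stories (id, title) and votes (user, story_id).
STORIES = [(1, "Noria"), (2, "Data-flow"), (3, "Caching")]
VOTES = [("ann", 1), ("bob", 1), ("cat", 2), ("dan", 1), ("eve", 3)]


def read_popular_stories(min_votes):
    """Answer the query from scratch on every read.

    COUNT votes per story, JOIN the counts with the stories, then FILTER
    out stories below `min_votes`. Nothing is reused between calls, so
    repeated reads redo the full scan even when no vote has changed."""
    counts = Counter(story_id for _, story_id in VOTES)              # COUNT
    joined = [(sid, title, counts[sid]) for sid, title in STORIES]   # JOIN
    return [row for row in joined if row[2] >= min_votes]            # FILTER


print(read_popular_stories(min_votes=2))  # [(1, 'Noria', 3)]
```

With 90% reads, almost all of that scanning recomputes answers that have not changed since the previous read.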

Precomputed results
[Figure: the Frontend READs precomputed results instead of running the JOIN, COUNT, FILTER query over Stories and Votes]
Store in base table? Manual, slow.
memcached? Complex [Facebook NSDI'13].

Streaming data-flow?
[Figure: an INSERT flows through the JOIN, COUNT, and FILTER operators and updates a materialized view (a vote count changes from 2 to 3)]
Fast reads. Efficient writes. Parallelizes well.
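A minimal sketch of the streaming alternative, assuming a hypothetical materialized view that only holds votes per story: each INSERT is applied as a delta to the one affected entry, so reads become lookups. This illustrates incremental view maintenance in general, not Noria's operator code.

```python
class VoteCountView:
    """Materialized view: number of votes per story (hypothetical sketch).

    Each INSERT into the votes table is pushed through as a delta that
    touches only the affected entry; reads are plain lookups."""

    def __init__(self):
        self.counts = {}

    def on_insert_vote(self, story_id):
        # Streaming update: adjust just this story's count.
        self.counts[story_id] = self.counts.get(story_id, 0) + 1

    def read(self, story_id):
        # Fast read: no JOIN or COUNT work at read time.
        return self.counts.get(story_id, 0)


view = VoteCountView()
for story_id in [1, 1, 2, 1, 3]:
    view.on_insert_vote(story_id)

print(view.read(1), view.read(4))  # 3 0
```

The work per write is proportional to the entries it touches, which is why this pays off for read-heavy workloads.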

Challenges
State-of-the-art data-flow systems:
[Figure: the Stories/Votes data-flow (JOIN, COUNT, FILTER) with a new SUM operator added]

Noria
[Figure: the same Stories/Votes data-flow (JOIN, COUNT, FILTER) with a new SUM operator, in Noria]

New model: Partially-stateful data-flow
Data-flow state is partial: entries for some keys are absent.
[Figure: the JOIN, COUNT, FILTER data-flow over Stories and Votes, in which some state entries are absent]
Lower memory footprint. No need to update absent entries. Enables live data-flow changes.
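The key distinction in partial state is between an absent entry and a present entry whose value happens to be zero. The sketch below (class and method names are hypothetical) keeps memory only for present keys and skips updates to absent ones, which is where the "lower memory footprint" and "no need to update absent entries" benefits come from.

```python
class PartialCountState:
    """Per-key counts where some keys are absent (hypothetical sketch).

    An absent key (lookup returns None) is different from a present key
    whose count is 0: absent means "not materialized here", and the value
    can always be recomputed from upstream state."""

    def __init__(self):
        self.entries = {}         # memory is spent only on present keys
        self.skipped_updates = 0  # updates that required no work

    def lookup(self, key):
        return self.entries.get(key)  # None => absent

    def fill(self, key, value):
        self.entries[key] = value

    def apply_update(self, key, delta):
        if key not in self.entries:
            # Absent entry: nothing to maintain, so skip the update.
            self.skipped_updates += 1
            return
        self.entries[key] += delta


state = PartialCountState()
state.fill(1, 2)           # story 1 is present with 2 votes
state.apply_update(1, +1)  # present: becomes 3
state.apply_update(7, +1)  # absent: skipped
print(state.lookup(1), state.lookup(7), state.skipped_updates)  # 3 None 1
```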

Partially-stateful data-flow: upqueries
A READ finds an absent entry in the view: need to fill the absent entry!
Solution: upquery through the data-flow to upstream state.
[Figure: the upquery response fills the missing entry, and the READ returns the result]
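A minimal sketch of a read miss being served by an upquery. As a simplification (an assumption, not Noria's mechanism), "upstream state" here is just a list of story ids standing in for a base table, and the upquery recomputes the count for the one requested key.

```python
class PartialView:
    """View that fills absent entries on demand via upqueries (sketch).

    `upstream` stands in for the nearest upstream state; here it is just
    a hypothetical list of story ids, one per vote."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.entries = {}
        self.upqueries = 0

    def upquery(self, key):
        # Ask upstream for this one key only and compute the entry.
        self.upqueries += 1
        return sum(1 for story_id in self.upstream if story_id == key)

    def read(self, key):
        if key not in self.entries:  # absent: fill before answering
            self.entries[key] = self.upquery(key)
        return self.entries[key]


view = PartialView(upstream=[1, 1, 2, 1, 3])
print(view.read(1), view.read(1), view.upqueries)  # 3 3 1
```

Only the first read of a key pays for the upquery; later reads of the same key hit the filled entry.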

Partial state enables live data-flow changes
Start new views and operator state empty, fill via upqueries.
[Figure: a new SUM operator and view are added to the running data-flow; a READ of the new view is answered via upqueries]
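A toy illustration of why starting empty makes live changes cheap: adding a view requires no up-front backfill, and its entries appear only when read. The votes table, its `weight` column, and both view definitions are hypothetical; the SUM view mirrors the slide's new SUM operator only loosely.

```python
class DataFlow:
    """Toy data-flow: a votes base table plus per-story aggregate views.

    Each view is partial: it starts empty and fills keys via upqueries,
    so adding a view at runtime needs no up-front backfill."""

    def __init__(self):
        self.votes = []   # base table rows: (story_id, weight)
        self.views = {}   # name -> (value_of_row, entries)

    def add_view(self, name, value_of_row):
        # Live change: the new view starts with no entries at all.
        self.views[name] = (value_of_row, {})

    def insert_vote(self, story_id, weight=1):
        self.votes.append((story_id, weight))
        for value_of_row, entries in self.views.values():
            if story_id in entries:  # absent entries need no update
                entries[story_id] += value_of_row((story_id, weight))

    def read(self, name, story_id):
        value_of_row, entries = self.views[name]
        if story_id not in entries:  # miss: upquery the base table
            entries[story_id] = sum(value_of_row(row) for row in self.votes
                                    if row[0] == story_id)
        return entries[story_id]


flow = DataFlow()
flow.add_view("vote_count", lambda row: 1)        # existing COUNT view
for story_id, weight in [(1, 2), (1, 1), (2, 5)]:
    flow.insert_vote(story_id, weight)
flow.add_view("weight_sum", lambda row: row[1])   # new SUM view, added live
flow.insert_vote(1, 4)
print(flow.read("vote_count", 1), flow.read("weight_sum", 1))  # 3 7
```

Writes keep flowing while the new view is added; the old view is untouched, and the new one fills lazily.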

High performance requires concurrency
Process operators concurrently. Read from views concurrently. Process shards concurrently.
Without global coordination!
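One way to process shards concurrently without global coordination is to route every update for a key to the same shard and let each shard own its state exclusively. The sketch below assumes a hypothetical shard count of 4 and modulo routing. Python threads are limited by the GIL, so this shows the structure, not real parallel speedup; the final merge exists only to print the result.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(story_id):
    # Route every update for a key to the same shard.
    return story_id % NUM_SHARDS


def process_shard(updates):
    """Each shard owns its state exclusively, so it can apply its own
    updates in order without locks or coordination with other shards."""
    counts = {}
    for story_id in updates:
        counts[story_id] = counts.get(story_id, 0) + 1
    return counts


def sharded_counts(vote_stream):
    per_shard = [[] for _ in range(NUM_SHARDS)]
    for story_id in vote_stream:
        per_shard[shard_of(story_id)].append(story_id)
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        results = list(pool.map(process_shard, per_shard))
    merged = {}
    for shard_state in results:
        merged.update(shard_state)  # shards hold disjoint keys
    return merged


print(sharded_counts([1, 5, 2, 1, 9, 5, 1]))
```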

Challenges implementing partially-stateful data-flow
Must maintain correctness under concurrency!

Correctness under concurrency
Goal: upquery restores state as if present all along.
Upquery response is a snapshot of state: at a COUNT operator, the response (2) includes updates 1 and 2 but does not include update 3.
Solution: Maintain order of upquery response and surrounding updates, despite lack of global coordination.

Upquery responses in total order with updates
Goal: upquery restores state as if present all along.
[Figure: two orderings of the upquery response (2) and update 3; in one, the resulting state respects the total order; in the other, the resulting state violates it]
More complex cases: merged upquery responses, evictions (see Paper).
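A small simulation of the two orderings, for a single key. It assumes one FIFO channel from an upstream operator to a downstream view whose entry is absent, and that deltas reaching absent state are dropped (the rule the slides introduce for absent state). The upstream has counted updates 1 and 2 when it answers the upquery, so the response (2) is a snapshot that excludes update 3.

```python
from collections import deque


def run(response_first):
    """Deliver an upquery response and update 3 over one FIFO channel.

    Returns (downstream value, upstream value). The two agree only if the
    response is ordered before the update it does not include."""
    upstream_count = 2
    response = ("fill", upstream_count)  # snapshot: includes updates 1, 2
    upstream_count += 1                  # update 3 happens after snapshot
    update = ("delta", +1)

    order = [response, update] if response_first else [update, response]
    channel = deque(order)

    downstream = None                    # None => entry absent
    while channel:
        kind, value = channel.popleft()
        if kind == "fill":
            downstream = value
        elif downstream is not None:     # deltas to absent state are dropped
            downstream += value
    return downstream, upstream_count


print(run(response_first=True))   # (3, 3): respects the total order
print(run(response_first=False))  # (2, 3): violates it, update 3 is lost
```

Sending the response through the same ordered channel as updates, at the point where the snapshot was taken, yields the first ordering without any global coordination.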

Challenges implementing partially-stateful data-flow
Must maintain correctness under concurrency!
Drop updates that touch absent state; a future upquery repeats them.
[Figure: a COUNT operator whose state holds some keys (3, 2) while others are absent]
(see Paper)
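Dropping updates to absent state is safe because the upstream state still contains them, so the next upquery recomputes their effect. The randomized check below (a single-threaded sketch with hypothetical parameters, ignoring the concurrency ordering above) compares a partial view that drops such updates and evicts keys at random against a fully materialized count.

```python
import random


def check_drop_then_upquery(seed, steps=1000, keys=5):
    """Return True if the partial view always agrees with full state."""
    rng = random.Random(seed)
    base = []     # upstream base table: one story_id per vote
    full = {}     # fully materialized counts, for comparison only
    partial = {}  # partial state: absent keys are simply missing
    for _ in range(steps):
        key = rng.randrange(keys)
        if rng.random() < 0.5:                 # write a vote
            base.append(key)
            full[key] = full.get(key, 0) + 1
            if key in partial:
                partial[key] += 1              # present: apply update
            # absent: drop it; a future upquery repeats it
        elif rng.random() < 0.5:               # read
            if key not in partial:             # miss: upquery upstream
                partial[key] = base.count(key)
            if partial[key] != full.get(key, 0):
                return False
        else:                                  # evict the key
            partial.pop(key, None)
    return True


print(all(check_drop_then_upquery(seed) for seed in range(20)))  # True
```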

Noria implementation
[Figure: applications connect through a MySQL adapter; their queries are transformed into Noria's data-flow graph]

Evaluation
Setup: Amazon EC2 c5.4xlarge instance (16 vCPUs); open-loop clients, measuring latency and throughput.
Multi-machine experiments and the comparison with differential dataflow: see Paper.
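Open-loop clients issue requests on a fixed schedule regardless of whether earlier requests have finished, so measured latency includes queueing delay as load rises. The simulation below is a hedged illustration of that methodology only, not the authors' benchmark harness: Poisson arrivals, a single FIFO server, and a fixed 1 ms service time are all assumptions.

```python
import random


def open_loop_latencies(rate, service_time, n, seed=0):
    """Simulate an open-loop client against one FIFO server (sketch).

    Requests are scheduled as a Poisson process at `rate` requests per
    second. Latency is measured from each request's scheduled send time,
    so time spent waiting behind earlier requests is included."""
    rng = random.Random(seed)
    t = 0.0
    server_free_at = 0.0
    latencies = []
    for _ in range(n):
        t += rng.expovariate(rate)        # next scheduled send time
        start = max(t, server_free_at)    # wait if the server is busy
        server_free_at = start + service_time
        latencies.append(server_free_at - t)
    return latencies


def percentile(values, p):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]


for rate in (500, 900, 990):
    lat = open_loop_latencies(rate, service_time=0.001, n=20000)
    print(rate, "req/s: median", round(percentile(lat, 50) * 1000, 2), "ms")
```

A closed-loop client would slow its own sending as the server saturates and hide this latency growth, which is why open-loop clients are the usual choice for latency-versus-throughput curves.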

Case study: Lobsters (http://lobste.rs)
with MySQL backend
developers to pre-compute aggregations
235 operators, 35 views

Can Noria improve Lobsters' performance?
[Figure: Noria vs. MySQL performance on the Lobsters workload; arrows mark the better direction on each axis]
Noria with natural queries supports 5x MySQL's throughput.

How does Noria compare to alternatives?
Workload: 95% reads, 5% writes
[Figure: Noria compared with alternative systems; arrows mark the better direction on each axis]
Noria outperforms an in-memory key-value store and simplifies its interface.

Can Noria change queries without downtime?
[Figure: the running Stories/Votes data-flow (JOIN, COUNT, FILTER) is extended with a new Ratings table and a query using JOIN, COUNT, and AVG, producing average ratings such as 3 ⭐ and 1.5 ⭐]

Can Noria change queries without downtime?
[Figure: performance over time as a new table & query are added, with 2M existing votes at transition; an arrow marks the better direction]
Instantaneous transition, no downtime for writes.
80% of reads from the new view proceed without upquery after 1 second.
Noria achieves downtime-free query change with partial state.

Noria: Summary
https://pdos.csail.mit.edu/noria
(see our demo at poster #37 today!)