MY WEAK CONSISTENCY IS STRONG WHEN BAD THINGS DO NOT COME IN THREES. ZECHAO SHANG, JEFFREY XU YU
DISCLAIMER: NOT AN OLTP TALK. HOW TO GET ALMOST EVERYTHING FOR NOTHING
SHARED-MEMORY SYSTEM IS BACK • Fine-grained mini-jobs • Hard to batch • Low-latency in-place updates • Hard to partition the data space • Applications • Machine learning (SGD and others) • Graph computing (vertex-centric systems) • Streaming (S-Store) [Diagram: over time, mini-jobs read shared data, compute (transaction), and update in place. Serializability theory: atomic and isolation, correct behavior]
SCALABILITY & THROUGHPUT • LATENCY • DATA CONSISTENCY • JOB ISOLATION
DO WE NEED IT? • Approach: remove data consistency controller • Pros: super-fast, yeeeeh! • Cons: could cause data consistency issues • HogWild! & Parameter Server & others • Correctness proofs rely on special properties • Convexity • Lipschitz-continuity • Bounded staleness • PBS: Probabilistic Bounded Staleness • Weak consistency actually provides strong semantics • Single key only • Probabilistic [Diagram: serializability theory: atomic and isolation; run the transactions; correct behavior, like serial]
THE DATABASE WAY • Fewer assumptions, more applications • Non-convex (deep learning) • Discrete & combinatorial (graph problems) [Diagram: serializability theory (atomic and isolation): run the transactions, correct behavior, like serial. Weak consistency: run the transactions, behavior + anomalies = ? (this talk)]
DATA CONFLICT GRAPH • Each vertex represents a txn • An edge if two txns share data • Potential conflicts [Figure: example conflict graph on transactions 1 to 8]
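The construction on this slide can be sketched in a few lines. This is only an illustration: the function name `build_conflict_graph`, the `access_sets` input, and the sample data are hypothetical and do not come from the talk.

```python
from itertools import combinations

def build_conflict_graph(access_sets):
    """Return an adjacency map: txn id -> set of txn ids it shares data with.

    access_sets maps a (hypothetical) transaction id to the set of data
    items that transaction reads or writes.
    """
    graph = {txn: set() for txn in access_sets}
    for a, b in combinations(access_sets, 2):
        # Two txns get an edge when they touch at least one common item.
        if access_sets[a] & access_sets[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Illustrative access sets (not taken from the slides).
access_sets = {
    1: {"x", "y"},
    2: {"y", "z"},
    3: {"z"},
    4: {"w"},
}
conflict_graph = build_conflict_graph(access_sets)
```

An edge here only marks a potential conflict: whether it becomes a real one depends on which transactions actually run at the same time.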
GOOD AND BAD • Good No. 1: serial execution [Figure: conflict graph on transactions 1 to 8 and a timeline running transactions 1, 2, ..., 8 one after another]
GOOD AND BAD • Good No. 2: a nice scheduler • No direct edge in concurrent txns [Figure: conflict graph on transactions 1 to 8, a timeline of concurrently executed transactions, and the resulting dependency graph]
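One way to picture such a "nice" scheduler is a greedy pass that packs transactions into rounds so that no two transactions in the same round are adjacent in the conflict graph. The sketch below is an assumption made for illustration, not the scheduler used in the talk; `schedule_rounds` and the sample graph are hypothetical.

```python
def schedule_rounds(graph):
    """Greedily group txns into rounds with no conflict edge inside a round.

    graph: adjacency map, txn id -> set of adjacent txn ids.
    Txns in the same round could run concurrently; rounds run one after another.
    """
    rounds = []
    for txn in sorted(graph):
        for batch in rounds:
            # Join the first round that holds none of this txn's neighbours.
            if not (graph[txn] & batch):
                batch.add(txn)
                break
        else:
            # Every existing round contains a neighbour: open a new round.
            rounds.append({txn})
    return rounds

# Illustrative graph: a path 1 - 2 - 3 plus an isolated txn 4.
example_graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
rounds = schedule_rounds(example_graph)
```

Such a scheduler needs to know the conflict graph before execution, which is part of the cost the talk tries to avoid.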
GOOD AND BAD • Bad: potential conflict • Bad degree (for a transaction) • Number of potentially conflicting transactions: • Concurrent • Sharing the same data (adjacent in graph) [Figure: timelines over transactions 1 to 8 with example bad degrees BD=2 and BD=1; serial execution always has BD=0]
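The bad degree defined on this slide can be computed from a conflict graph and an execution timeline. The sketch below assumes each transaction occupies a half-open time interval [start, end); the function `bad_degrees` and the sample data are hypothetical.

```python
def bad_degrees(graph, intervals):
    """Return txn id -> number of neighbours that overlap it in time.

    graph: adjacency map of the conflict graph.
    intervals: txn id -> (start, end), treated as half-open [start, end).
    """
    def concurrent(a, b):
        (s1, e1), (s2, e2) = intervals[a], intervals[b]
        return s1 < e2 and s2 < e1

    return {
        txn: sum(1 for other in graph[txn] if concurrent(txn, other))
        for txn in graph
    }

# Illustrative data: path 1 - 2 - 3 plus an isolated txn 4.
example_graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
example_intervals = {1: (0, 2), 2: (1, 3), 3: (2, 4), 4: (0, 4)}
degrees = bad_degrees(example_graph, example_intervals)
max_bad_degree = max(degrees.values())
```

Transaction 4 overlaps everything in time but has bad degree 0 because it shares no data, while transaction 2 overlaps both of its neighbours.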
BAD DEGREE AND CORRECTNESS
MAX BAD DEGREE   CONCURRENCY CONTROL   TXN SEMANTICS     RESULTS ACCURACY
0                NO                    SERIALIZABILITY   CORRECT
>0               NO                    NO                DON’T KNOW
                 YES                   SERIALIZABILITY   CORRECT
BAD THINGS DO NOT COME IN 3 (BN3) • BN3: bad degree ≤ 1 for all transactions
MAX BAD DEGREE   CONCURRENCY CONTROL   TXN SEMANTICS     RESULTS ACCURACY
0                NO                    SERIALIZABILITY   CORRECT
1 (BN3)          NO                    NO                DON’T KNOW
>1               YES                   SERIALIZABILITY   CORRECT
IS BN3 TRUE? • Depends on • Data conflict severity: the density of the data conflict graph (edges relative to the number of possible vertex pairs) • Job type • Access pattern [Figure: example conflict graph on transactions 1 to 8]
GRAPH NAME        |V| (in 10^6)   |E| (in 10^6)   DENSITY (in 10^-4)
Web Graphs:
  uk-2007-05      106             3,739           4.2
  uk-2014         787             47,614          4.7
  eu-2015         1,070           91,792          5.8
  claw-2012       3,563           128,736         1.4
Social Networks:
  wise            59              265             4.0
  friendster      66              1,806           0.7
TPC-C New Order   -               -               >1000
BAD DEGREE DISTRIBUTION [Figure: two panels, "BN3 (bad degree ≤ 1)" and "0BD (bad degree = 0)"; x-axis: number of cores (1 to 1024, doubling); y-axis: probability (0.90 to 1.00); one line per conflict graph density: 10^-6, 10^-5, 10^-4]
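A back-of-the-envelope model gives a feel for why bad degree ≤ 1 is much more likely than bad degree = 0. The sketch below is not the authors' analysis and does not reproduce their curves. It assumes one running transaction per core and that each pair of concurrent transactions is adjacent independently with probability equal to the conflict graph density, so the bad degree of a transaction follows a binomial distribution. The function name is hypothetical.

```python
from math import comb

def prob_bad_degree_at_most(limit, cores, density):
    """P(bad degree <= limit) for one txn under the independence assumption.

    cores: number of concurrently running txns (assumed one per core).
    density: assumed probability that two given txns share data.
    """
    others = cores - 1              # txns running alongside this one
    top = min(limit, others)        # cannot conflict with more txns than exist
    return sum(
        comb(others, j) * density**j * (1 - density) ** (others - j)
        for j in range(top + 1)
    )

# Example point: 1024 cores at density 10^-4.
p_0bd = prob_bad_degree_at_most(0, cores=1024, density=1e-4)
p_bn3 = prob_bad_degree_at_most(1, cores=1024, density=1e-4)
```

Under these assumptions, allowing a single concurrent conflict covers noticeably more transactions than requiring none, which is the gap the two panels contrast.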
WHAT GOOD IS BN3? UNDER BN3, TRANSACTIONS EXECUTED WITHOUT ANY CONSISTENCY MECHANISM BEHAVE AS IF UNDER SNAPSHOT ISOLATION (SI)
PROOF: A TWO-STEP APPROACH 0. BN3 restricts the size of the “mafia” • Two crews (vertices) at most 1. The case of only two bad transactions • Proof by enumerating the types of edges 2. Other good transactions • Do not cause more cycles • Adjacent (non-bad) vertices: before or after • Non-adjacent vertices: none of their business [Figure: conflict graph on transactions 1 to 8 with one bad edge highlighted]
BAD DEGREE AND CORRECTNESSES MAX BAD CONCURRENCY TXN RESULTS DEGREE CONTROL SEMANTICS ACCURACY 0 NO SERIALIZABILITY CORRECT SNAPSHOT 1 (BN3) NO WRITE-SKEW ISOLATION NO NO DON’T KNOW ANY YES SERIALIZABILITY CORRECT
[Figure: two panels (256 cores, 128 cores); x-axis: conflict graph density (2*10^-3, 1*10^-4, 6*10^-5, 5*10^-5); y-axis: residual after 50 iterations of PageRank (10^-4 to 10^-3); lines: isolation levels (No Consistency, Read Uncommitted, Read Committed, Serializability). Third panel, “BN3-ness”: probability (0.90 to 1.00) against number of cores (1 to 1024), one line per density (2*10^-3, 1*10^-4, 6*10^-5, 5*10^-5)]
TAKE HOME MESSAGES • Life is not just all-or-nothing • Flawlessness costs a lot • It is possible to have almost everything for free • BN3: realistic assumption, practical conclusion • Some future work • Runtime: monitor the BN3-ness • BN3 as a new consistency level • Mixed concurrency control
Thank you
EXPERIMENTAL STUDIES (THROUGHPUT) [Figure: two panels; x-axis: conflict graph density (2*10^-3, 1*10^-4, 6*10^-5, 5*10^-5); y-axis: throughput (* 10^6), ticks from 40 to 80; lines: isolation levels (No Consistency, Read Uncommitted, Read Committed, Serializability)]