
Cloud Data Management: Felix Gessert, December 18, 2018, Universität Hamburg (PowerPoint presentation)



  1. Low Latency for Cloud Data Management Felix Gessert December 18, 2018, Universität Hamburg, DBIS Group

  2. Presentation is loading

  3. Web Performance: An Open Challenge With a Huge Impact. 100 ms faster → +1% revenue = $1.7 billion; 500 ms faster → +20% ad sales = $19 billion. Sources: Greg Linden. Make Data Useful. 2006. Tammy Everts. Time Is Money: The Business Value of Web Performance. O'Reilly Media, 2016.

  4. Cloud-Based Web Applications: Three Sources of Page Load Time: 1. Backend Processing, 2. Network Delays, 3. Frontend Rendering

  5. Latency is the Problem: Throughput vs. Latency
     [Figure: two plots of page load time (ms); left: against bandwidth (1 to 10 MBit/s, at 60 ms latency); right: against latency (240 down to 0 ms, at 5 MBit/s bandwidth).]
     Source: Mike Belshe. More Bandwidth Doesn't Matter (much). Technical report, Google Inc., 2010.

  6. Latency is the Problem: Throughput vs. Latency. 2 × throughput = same load time; ½ latency ≈ ½ load time. Source: Mike Belshe. More Bandwidth Doesn't Matter (much). Technical report, Google Inc., 2010.
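Why latency dominates can be seen in a deliberately simple toy model, which is an assumption for illustration and not Belshe's measurement method: each round trip costs one RTT, and the page bytes stream at full bandwidth. The round-trip count (40) and page size (250 KB) are hypothetical defaults.

```python
# Toy model, assumed for illustration (not Belshe's measurement method):
# every round trip costs one RTT, and the page bytes stream at full bandwidth.

def page_load_ms(rtt_ms: float, bandwidth_mbit_s: float,
                 round_trips: int = 40, page_kbytes: float = 250.0) -> float:
    """Estimated load time in ms for a hypothetical page.

    round_trips and page_kbytes are made-up defaults, not measured values.
    """
    latency_ms = round_trips * rtt_ms
    # (page_kbytes * 8 kbit) / (bandwidth_mbit_s * 1000 kbit/s) seconds, expressed in ms
    transfer_ms = page_kbytes * 8 / bandwidth_mbit_s
    return latency_ms + transfer_ms


baseline = page_load_ms(rtt_ms=60, bandwidth_mbit_s=5)          # 2400 + 400 ms
double_bandwidth = page_load_ms(rtt_ms=60, bandwidth_mbit_s=10)  # 2400 + 200 ms
half_latency = page_load_ms(rtt_ms=30, bandwidth_mbit_s=5)       # 1200 + 400 ms
print(baseline, double_bandwidth, half_latency)  # 2800.0 2600.0 1600.0
```

In this model, doubling bandwidth only shrinks the transfer term, while halving the RTT shrinks the term that dominates for pages with many round trips.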

  7. Problem Statement: Four Challenges: 1. Latency of Dynamic Data, 2. Direct Client Access, 3. Transaction Abort Rates, 4. Polyglot Persistence

  8. Problem Statement: Research Question. How can the latency of retrieving dynamic data from cloud services be minimized in an application- and database-independent way while maintaining strict consistency guarantees?

  9. Problem Statement: Four Challenges: 1. Latency, 2. Direct Access, 3. Transactions, 4. Polyglot Persistence

  10. Outline
     1. Background & Motivation (End-to-End Latency in Cloud-based Architectures)
     2. Cloud Data Management Middleware (Providing Modern NoSQL Systems as a Low-Latency DBaaS)
     3. Caching Dynamic Data (Cache Sketches: Solving Staleness of Reads and Queries)
     4. Outlook and Summary (The Future of Polyglot Data Management in the Cloud)

  11. Why is end-to-end latency an open problem?

  12. Background & Motivation: Frontend, Network, and Cloud Data Management

  13. Background & Motivation. State of the art: web interface based on HTML, files, APIs, and application logic; performance defined by the critical rendering path. Problems: no direct access to or integration with data management; storage maintained manually.

  14. Background & Motivation. State of the art: service interfaces use REST & HTTP; web caching can reduce end-to-end latency. Problem: web caching is not compatible with consistent dynamic data.

  15. Background & Motivation. State of the art: SaaS, PaaS, and IaaS models; scalability & multi-tenancy. Problems: combining cloud services entails high latency; common application building blocks are often re-implemented.

  16. Background & Motivation. State of the art: scalability & high availability through NoSQL systems; sharding & replication; Database-as-a-Service (DBaaS) model. Problems: lack of common data management abstractions; DBaaS model not supported; polyglot persistence manual & error-prone.

  17. Background & Motivation: Data Management
     [Figure: data mapped to databases: critical data → RDBMS; nested data → document store; volatile data → key-value store; large data sets → wide-column store; static files → distributed file system.]
     Challenge: How to tackle the mapping problem (data → database)?

  18. Data Management: Functional Requirements, Techniques, Non-Functional Requirements [GWFR16]
     Functional requirements: Scan Queries, ACID Transactions, Conditional or Atomic Writes, Joins, Sorting, Filter Queries, Full-text Search, Aggregation and Analytics
     Techniques: Sharding (Range-Sharding, Hash-Sharding, Entity-Group Sharding, Consistent Hashing, Shared-Disk); Replication (Commit/Consensus Protocol, Synchronous, Asynchronous, Primary Copy, Update Anywhere); Storage Management (Logging, Update-in-Place, Caching, In-Memory Storage, Append-Only Storage); Query Processing (Global Secondary Indexing, Local Secondary Indexing, Query Planning, Analytics Framework, Materialized Views)
     Non-functional requirements: Data Scalability, Write Scalability, Read Scalability, Elasticity, Consistency, Write Latency, Read Latency, Write Throughput, Read Availability, Write Availability, Durability
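Consistent Hashing appears among the sharding techniques. The minimal sketch below assumes MD5-derived 64-bit ring positions and 100 virtual nodes per server (both arbitrary choices); the server names are hypothetical. It shows the property that makes the technique attractive for elastic scaling: removing a server only moves the keys that server owned.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a 64-bit position on the hash ring (MD5 chosen arbitrarily)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Assigns each key to the first server position clockwise from the key's hash."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual nodes per server smooth out the load
        self.positions = []   # sorted ring positions
        self.owner = {}       # ring position -> server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            self.owner[pos] = node
            bisect.insort(self.positions, pos)

    def remove_node(self, node: str) -> None:
        # Only keys owned by this node move; all other assignments stay unchanged.
        self.positions = [p for p in self.positions if self.owner[p] != node]
        self.owner = {p: n for p, n in self.owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self.positions:
            raise LookupError("no nodes on the ring")
        idx = bisect.bisect_left(self.positions, ring_position(key))
        # Wrap around past the largest position back to the start of the ring.
        return self.owner[self.positions[idx % len(self.positions)]]


ring = ConsistentHashRing(["db-1", "db-2", "db-3"])  # hypothetical server names
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("db-3")
after = {k: ring.node_for(k) for k in keys}
```

With plain hash sharding (`hash(key) % servers`), by contrast, changing the server count reassigns most keys.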

  19. NoSQL Decision Tree, shown alongside the NoSQL Toolbox [GWFR16]
     Access: Fast Lookups
       Volume RAM: Redis, Memcache (example application: Cache)
       Volume Unbounded, CAP AP: Cassandra, Riak, Voldemort, Aerospike (Shopping-basket)
       Volume Unbounded, CAP CP: HBase, MongoDB, CouchBase, DynamoDB (Order History)
     Access: Complex Queries
       Volume HDD-Size, Consistency ACID: RDBMS, Neo4j, RavenDB, MarkLogic (OLTP)
       Volume HDD-Size, Availability: CouchDB, MongoDB, SimpleDB (Website)
       Volume Unbounded, Query Pattern Ad-hoc: MongoDB, RethinkDB, HBase, Accumulo, ElasticSearch, Solr (Social Network)
       Volume Unbounded, Query Pattern Analytics: Hadoop, Spark, Parallel DWH, Cassandra, HBase, Riak, MongoDB (Big Data)
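The decision tree can be encoded as nested lookups, one answer per level (access, volume, then CAP, consistency, or query pattern). This is a sketch of the tree as listed on the slide; the lowercase answer strings used as keys are our own choice, not part of the original.

```python
from __future__ import annotations

# Leaves: (example application, example systems), as listed on the slide [GWFR16].
DECISION_TREE = {
    "fast lookups": {                                      # Access
        "RAM": ("Cache", ["Redis", "Memcache"]),           # Volume
        "unbounded": {                                     # Volume -> CAP
            "AP": ("Shopping-basket", ["Cassandra", "Riak", "Voldemort", "Aerospike"]),
            "CP": ("Order History", ["HBase", "MongoDB", "CouchBase", "DynamoDB"]),
        },
    },
    "complex queries": {
        "HDD-size": {                                      # Volume -> Consistency
            "ACID": ("OLTP", ["RDBMS", "Neo4j", "RavenDB", "MarkLogic"]),
            "availability": ("Website", ["CouchDB", "MongoDB", "SimpleDB"]),
        },
        "unbounded": {                                     # Volume -> Query Pattern
            "ad-hoc": ("Social Network",
                       ["MongoDB", "RethinkDB", "HBase", "Accumulo", "ElasticSearch", "Solr"]),
            "analytics": ("Big Data",
                          ["Hadoop", "Spark", "Parallel DWH", "Cassandra", "HBase", "Riak", "MongoDB"]),
        },
    },
}


def recommend(*answers: str) -> tuple[str, list[str]]:
    """Walk the tree with one answer per level; raises KeyError on an unknown answer."""
    node = DECISION_TREE
    for answer in answers:
        if not isinstance(node, dict):
            raise ValueError("too many answers for this branch")
        node = node[answer]
    if isinstance(node, dict):
        raise ValueError(f"one more answer needed, options: {sorted(node)}")
    return node


print(recommend("fast lookups", "unbounded", "AP")[0])  # Shopping-basket
```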

  20. NoSQL Decision Tree and Toolbox; Unified Data Management API for Backend-as-a-Service and Database-as-a-Service: Object Persistence, Transaction Processing, Data Validation, Query Support, Partial Updates, Code Execution, Schema Management, Indexing & Configuration, Access Control [GFW+14]

  21. How can cloud data management be unified & combined with low latency?

  22. Orestes: Goals. A Data Management Middleware for Low Latency: Database Independence; DBaaS & BaaS Functionality; Scalable, Available, Multi-Tenant; Low Latency with Tunable Consistency

  23. Orestes Concept Overview [GBR14, GB13]
     Heterogeneous data stores: unmodified database systems
     Scalable data management platform (multi-tenancy, scaling, caching, failover, …)
     DBaaS/BaaS middleware: data and default modules
     Unified REST API
     Web caching for low latency
     Web and mobile applications

  24. How can dynamic data be accelerated through web caching?

  25. The Web's Caching Model [GSW+15]
     Expiration-based caches (browser caches, forward proxies, ISP caches): an object x is considered fresh for $\mathrm{TTL}_x$ seconds; the server assigns a TTL to each object.
     Invalidation-based caches (content delivery networks, reverse proxies): expose an object eviction operation to the server.
     [Figure: client requests pass along the request path through expiration-based and invalidation-based caches to the server/DB, with cache hits served along the way; the server sends invalidations and objects.]
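The difference between the two cache types can be sketched as two small classes. This is an illustrative model under simplifying assumptions (explicit timestamps instead of a clock, string values, no HTTP); the class and method names are hypothetical. It shows why writes leave expiration-based caches stale: the server can purge a CDN entry, but a browser cache keeps serving its copy until the TTL runs out.

```python
from __future__ import annotations


class ExpirationBasedCache:
    """Browser cache, forward proxy, ISP cache: serves an object until its TTL runs out."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)

    def put(self, key: str, value: str, ttl: float, now: float) -> None:
        # The server chose ttl; the object counts as fresh for ttl seconds.
        self.entries[key] = (value, now + ttl)

    def get(self, key: str, now: float) -> str | None:
        entry = self.entries.get(key)
        if entry is None or now >= entry[1]:
            return None  # miss or expired: forward the request along the path
        return entry[0]


class InvalidationBasedCache(ExpirationBasedCache):
    """CDN or reverse proxy: additionally lets the server evict objects."""

    def invalidate(self, key: str) -> None:
        self.entries.pop(key, None)


browser, cdn = ExpirationBasedCache(), InvalidationBasedCache()
for cache in (browser, cdn):
    cache.put("x", "v1", ttl=60, now=0)

# At t=10 the server writes x = v2; it can only purge the invalidation-based cache.
cdn.invalidate("x")
print(browser.get("x", now=10))  # v1 (stale until t=60)
print(cdn.get("x", now=10))      # None (request goes to the server)
```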

  26. Web Caching for Data Management: Overview of the Cache Sketch Method [GSW+15, GSW+17]
     [Figure: without the Cache Sketch, data is cached for a fixed TTL and cached copies can become stale. With the Cache Sketch, the server invalidates invalidation-based caches on writes and adds the written key to its Cache Sketch; clients receive a compact Cache Sketch to validate the freshness of data served by expiration-based caches.]

  27. The Cache Sketch Approach [GSW+15]
     [Figure: the client holds a Client Cache Sketch, a Bloom filter (e.g. 10101010), and asks "needs revalidation?" before taking a response from expiration-based caches on the request path. It is initialized at connect, refreshed periodically every Δ seconds, and refreshed at transaction begin. The server holds a Server Cache Sketch, a counting Bloom filter (e.g. 10201040) over the keys of non-expired objects; it is updated with reported expirations and writes and answers "needs invalidation?" for invalidation-based caches.]
     Goals: minimize staleness (Δ-atomic consistency), cache-aware transactions, and invalidation minimization.
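The two sketches fit together as a counting Bloom filter on the server and its bit-vector projection on the client. In the slide's example values, the client bits 10101010 mark exactly the non-zero counters of 10201040. The sketch below assumes double hashing over one SHA-256 digest to derive the k positions; that hashing scheme and all names are illustrative choices, not taken from the talk.

```python
from __future__ import annotations

import hashlib


def bloom_positions(key: str, m: int, k: int) -> list[int]:
    """k bit positions for key via double hashing of one SHA-256 digest (an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
    return [(h1 + i * h2) % m for i in range(k)]


class CountingBloomFilter:
    """Server Cache Sketch: one counter per position, so keys can be removed again."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counters = [0] * m

    def add(self, key: str) -> None:
        # Reported write of an object whose cached copies may not have expired yet.
        for p in bloom_positions(key, self.m, self.k):
            self.counters[p] += 1

    def remove(self, key: str) -> None:
        # Reported expiration: must only be called for a key that was added.
        for p in bloom_positions(key, self.m, self.k):
            self.counters[p] -= 1

    def contains(self, key: str) -> bool:
        # May report false positives, never false negatives.
        return all(self.counters[p] > 0 for p in bloom_positions(key, self.m, self.k))

    def to_bits(self) -> list[int]:
        # Compact client copy: bit = 1 wherever the counter is non-zero.
        return [1 if c > 0 else 0 for c in self.counters]


def needs_revalidation(bits: list[int], key: str, k: int) -> bool:
    """Client-side check before accepting a response from an expiration-based cache."""
    return all(bits[p] == 1 for p in bloom_positions(key, len(bits), k))


server_sketch = CountingBloomFilter(m=64, k=3)
server_sketch.add("posts/1")           # written while still cached somewhere
client_bits = server_sketch.to_bits()  # shipped to the client, e.g. at connect
print(needs_revalidation(client_bits, "posts/1", k=3))  # True
server_sketch.remove("posts/1")        # all cached copies have now expired
print(sum(server_sketch.counters))     # 0
```

Counters are only needed on the server, where keys must be removed again once they expire; the client only ever tests membership, so a single bit per position suffices.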

  28. Cache Sketch Main Properties [GSW+15, GSW+17]
     To ensure Δ-atomicity, the Cache Sketch at time t contains key(x) of every object x that was written before it expired in all caches.
     [Figure: timeline with reads r(x), a write w(x), the time t_c at which a client retrieves Cache Sketch c, the TTL of x, the staleness bound Δ, and the timespan for which x ∈ c.]
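The membership rule ("written before it expired in all caches") can be tracked with two timestamps per key. The sketch below is a simplification: it uses exact dictionaries instead of a Bloom filter, and the names `report_read`, `report_write`, and `cache_sketch` are hypothetical. The intuition behind the Δ bound is that a client refreshing its sketch every Δ seconds misses only writes since its last refresh.

```python
from __future__ import annotations


class StalenessTracker:
    """Exact-set stand-in for the Server Cache Sketch (a real one would use a Bloom filter)."""

    def __init__(self) -> None:
        self.expires_at: dict[str, float] = {}   # latest expiry of any cached copy of a key
        self.stale_until: dict[str, float] = {}  # keys that belong in the Cache Sketch, and until when

    def report_read(self, key: str, ttl: float, now: float) -> None:
        # The server handed out key with this TTL, so some cache may hold it until now + ttl.
        self.expires_at[key] = max(self.expires_at.get(key, now), now + ttl)

    def report_write(self, key: str, now: float) -> None:
        # Written before it expired in all caches: stale copies may exist until expires_at.
        expires = self.expires_at.get(key, now)
        if now < expires:
            self.stale_until[key] = max(self.stale_until.get(key, expires), expires)

    def cache_sketch(self, now: float) -> set[str]:
        # Keys a client must revalidate if it fetches the Cache Sketch at time `now`.
        return {key for key, until in self.stale_until.items() if now < until}


tracker = StalenessTracker()
tracker.report_read("x", ttl=60, now=0)  # x may be cached until t = 60
tracker.report_write("x", now=10)        # written before it expired everywhere
print(tracker.cache_sketch(now=30))      # {'x'}
print(tracker.cache_sketch(now=60))      # set()
```

Once every cached copy has expired, the key drops out of the sketch, which keeps the sketch small.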

  29. Cache Sketch Construction [GSW+15, GSW+17]
     To ensure compactness, the Cache Sketch stores n keys in a Bloom filter with m bits and k hash functions, giving a false positive rate of $f \approx \left(1 - e^{-kn/m}\right)^k$.
     Example: n = 20,000 entries and 5% false positives yield a Cache Sketch of 11 KB.
     [Figure: for a GET request, the client evaluates find(key) by applying hash functions h_1, …, h_k to the key and checking the m Bloom filter bits. If not all bits are 1, the request is served via the cache (hit or miss); if all bits are 1, the request triggers revalidation.]
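The false-positive formula can be evaluated directly. The helper `optimal_k` uses the standard textbook choice k = (m/n) ln 2, which minimizes f for fixed n and m. It is added here as an assumption and is not stated on the slide. The example sizes are arbitrary.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """f ≈ (1 - exp(-k*n/m))**k for n keys in m bits with k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(n: int, m: int) -> int:
    """Number of hash functions minimizing f for fixed n and m: k = (m/n) ln 2, rounded."""
    return max(1, round(m / n * math.log(2)))


k = optimal_k(n=1000, m=10_000)                            # 10 bits per key
print(k, round(false_positive_rate(1000, 10_000, k), 4))  # 7 0.0082
```

Halving the bits per key raises f sharply, which is the trade-off between sketch size and unnecessary revalidations.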
