
NetCache: Balancing Key-Value Stores with Fast In-Network Caching



  1. NetCache: Balancing Key-Value Stores with Fast In-Network Caching
     Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica

  2. NetCache is a rack-scale key-value store that leverages in-network data plane caching to achieve ~10 μs latency and billions of QPS throughput, even under highly-skewed and rapidly-changing workloads.
     A new generation of systems enabled by programmable switches.

  3. Goal: fast and cost-efficient rack-scale key-value storage
     - Store, retrieve, and manage key-value objects
       - Critical building block for large-scale cloud services …
       - Need to meet aggressive latency and throughput objectives efficiently
     - Target workloads
       - Small objects
       - Read-intensive
       - Highly skewed and dynamic key popularity

  4. Key challenge: highly-skewed and rapidly-changing workloads
     [Figure: load per server is heavily imbalanced, resulting in low throughput and high tail latency]
     Q: How to provide effective dynamic load balancing?

  5. Opportunity: a fast, small cache can ensure load balancing
     [Figure: the cache absorbs the hottest queries, so the remaining load on the servers is balanced]

  6. Opportunity: a fast, small cache can ensure load balancing [B. Fan et al. SoCC’11, X. Li et al. NSDI’16]
     - Cache the O(N log N) hottest items, where N is the number of servers
       - E.g., 10,000 hot objects for 100 backends with 100 billion items
     - Requirement: cache throughput ≥ backend aggregate throughput
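
To see why a small cache of the hottest items is enough, the following sketch measures per-server load imbalance (max/mean) with and without caching. It is an illustrative model, not the analysis from the cited papers, and rests on assumptions: item popularity follows an ideal Zipf distribution (weight 1/rank^s with s = 0.99), each item is placed on a pseudo-random server (a seeded `random.Random` standing in for hash partitioning), and cached items never reach the servers. The function name `server_imbalance` is hypothetical.

```python
import random


def server_imbalance(n_servers: int, n_items: int, s: float, n_cached: int, seed: int = 1) -> float:
    """Max/mean server load when the n_cached most popular items are served by the cache.

    Assumptions (illustrative only): Zipf popularity weight 1 / rank**s, and a
    seeded pseudo-random placement standing in for hash partitioning.
    """
    rng = random.Random(seed)
    loads = [0.0] * n_servers
    for rank in range(1, n_items + 1):
        server = rng.randrange(n_servers)   # placement is drawn even for cached items
        if rank <= n_cached:
            continue                        # absorbed by the cache, never reaches a server
        loads[server] += 1.0 / rank ** s
    return max(loads) / (sum(loads) / n_servers)


if __name__ == "__main__":
    # 100 backends as in the slide's example; 100,000 items keeps the run short.
    for cached in (0, 100, 10_000):
        print(cached, round(server_imbalance(100, 100_000, 0.99, cached), 2))
```

Without a cache, the server that happens to hold the single hottest key dominates; with 10,000 items cached, the ratio should come out close to 1.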

  7. NetCache: towards a billion-QPS key-value storage rack
     The cache needs to provide the aggregate throughput of the storage layer:
     - Flash/disk storage layer: O(100) KQPS per server, O(10) MQPS total → in-memory cache layer, O(10) MQPS
     - In-memory storage layer: O(10) MQPS per server, O(1) BQPS total → the cache layer must provide O(1) BQPS

  8. NetCache: towards a billion-QPS key-value storage rack
     The cache needs to provide the aggregate throughput of the storage layer:
     - Flash/disk storage layer: O(100) KQPS per server, O(10) MQPS total → in-memory cache layer, O(10) MQPS
     - In-memory storage layer: O(10) MQPS per server, O(1) BQPS total → in-network cache layer, O(1) BQPS
     - Small on-chip memory? It only needs to cache O(N log N) small items.
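
The throughput requirement above is simple arithmetic. This sketch treats the slides' orders of magnitude as exact values and assumes a rack of 100 storage servers (the "100 backends" example from slide 6); the helper name `cache_keeps_up` is hypothetical.

```python
# Orders of magnitude from the slides, treated as exact values for illustration.
KQPS, MQPS, BQPS = 10**3, 10**6, 10**9
N_SERVERS = 100   # assumption: the "100 backends" example from slide 6


def cache_keeps_up(cache_qps: int, per_server_qps: int, n_servers: int = N_SERVERS) -> bool:
    """Requirement: cache throughput >= aggregate throughput of the storage layer."""
    return cache_qps >= n_servers * per_server_qps


flash_ok = cache_keeps_up(10 * MQPS, 100 * KQPS)        # in-memory cache, flash/disk servers
mem_cache_ok = cache_keeps_up(10 * MQPS, 10 * MQPS)     # in-memory cache, in-memory servers
switch_ok = cache_keeps_up(1 * BQPS, 10 * MQPS)         # in-network cache, in-memory servers
print(flash_ok, mem_cache_ok, switch_ok)                # True False True
```

A server-based in-memory cache keeps up with flash/disk backends but falls two orders of magnitude short once the backends are themselves in-memory, which is the gap the switch data plane is meant to fill.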

  9. Key-value caching in a network ASIC at line rate?!
     - How to identify application-level packet fields?
     - How to store and serve variable-length data?
     - How to efficiently keep the cache up-to-date?

  10. PISA: Protocol Independent Switch Architecture
      - Programmable parser
        - Converts packet data into metadata
      - Programmable match-action pipeline
        - Operates on metadata and updates memory states
      [Figure: a programmable parser followed by a programmable match-action pipeline; each stage has match + action logic, ALUs, and memory]

  11. PISA: Protocol Independent Switch Architecture
      - Programmable parser
        - Parses custom key-value fields in the packet
      - Programmable match-action pipeline
        - Reads and updates key-value data
        - Provides query statistics for cache updates
      [Figure: a programmable parser followed by a programmable match-action pipeline; each stage has match + action logic, ALUs, and memory]
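
As a rough mental model of PISA, the Python toy below mimics a parser that turns packet bytes into metadata and a fixed sequence of match-action stages, each with its own memory. Everything here is an assumption for illustration: the 2-byte header (op, key), the `Stage` class, and the `count_hit` action are hypothetical, and real switches are programmed in P4 under much stricter constraints.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metadata = Dict[str, int]
Action = Callable[[Metadata, List[int]], None]


@dataclass
class Stage:
    """One match-action stage with its own register memory."""
    match_field: str
    table: Dict[int, Action]
    memory: List[int] = field(default_factory=lambda: [0] * 4)

    def apply(self, meta: Metadata) -> None:
        action = self.table.get(meta.get(self.match_field))
        if action is not None:  # table miss: the packet passes through unchanged
            action(meta, self.memory)


def parse(pkt: bytes) -> Metadata:
    """Programmable parser: hypothetical 2-byte header (op, key)."""
    return {"op": pkt[0], "key": pkt[1]}


def run_pipeline(pkt: bytes, stages: List[Stage]) -> Metadata:
    meta = parse(pkt)
    for stage in stages:  # stages run in a fixed order, once per packet
        stage.apply(meta)
    return meta


def count_hit(meta: Metadata, memory: List[int]) -> None:
    # Action: bump a per-key counter in this stage's memory.
    memory[meta["key"]] += 1


stats = Stage(match_field="key", table={0: count_hit, 1: count_hit})
for pkt in (b"\x01\x00", b"\x01\x01", b"\x01\x00", b"\x01\x03"):
    run_pipeline(pkt, [stats])
print(stats.memory)  # [2, 1, 0, 0]
```

Key 3 has no table entry, so its packet passes through without touching the stage's memory.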

  12. PISA: Protocol Independent Switch Architecture
      [Figure: the switch control plane (CPU) runs network management and communicates with the data plane (ASIC) over PCIe through a run-time API; the data plane runs network functions on the programmable parser and the programmable match-action pipeline]

  13. NetCache rack-scale architecture
      [Figure: clients send key-value queries through the top-of-rack switch to the storage servers; the switch data plane runs network functions, the cache, and cache statistics, while the control plane runs network and cache management, connected over PCIe through the run-time API]
      - Switch data plane
        - Key-value store to serve queries for cached keys
        - Query statistics to enable efficient cache updates
      - Switch control plane
        - Inserts hot items into the cache and evicts less popular items
        - Manages memory allocation for the on-chip key-value store

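  14. Data plane query handling
      - Read query, cache hit: (1) the client's read query reaches the switch, which updates its cache statistics; (2) the switch replies directly from the cache.
      - Read query, cache miss: (1) the query reaches the switch, which updates its cache statistics; (2) the switch forwards it to the server; (3) the server replies; (4) the switch forwards the reply to the client.
      - Write query: (1) the query reaches the switch, which invalidates the cached entry; (2) the switch forwards it to the server; (3) the server replies; (4) the switch forwards the reply to the client.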
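
The three cases can be modeled in a few lines of Python. This is a behavioral sketch under assumptions, not the data plane program: the cache and statistics are Python dicts, the storage server is a dict the switch "forwards" to, and the class name `ToRSwitch` is hypothetical. Following the slide, reads update statistics while writes only invalidate the cached copy.

```python
from collections import Counter
from typing import Any, Dict, Optional, Tuple


class ToRSwitch:
    """Hypothetical model of the top-of-rack switch's query handling."""

    def __init__(self, server: Dict[str, Any]):
        self.cache: Dict[str, Any] = {}   # values held in the switch data plane
        self.stats: Counter = Counter()   # per-key read statistics
        self.server = server              # stand-in for the storage server

    def read(self, key: str) -> Tuple[Optional[Any], str]:
        self.stats[key] += 1                    # hit or miss: update cache stats
        if key in self.cache:
            return self.cache[key], "switch"    # hit: reply directly from the switch
        return self.server.get(key), "server"   # miss: forward to server, relay its reply

    def write(self, key: str, value: Any) -> None:
        self.cache.pop(key, None)               # invalidate any cached copy first
        self.server[key] = value                # forward the write to the server


switch = ToRSwitch({"A": "v1", "B": "v2"})
switch.cache["A"] = "v1"          # pretend the control plane already cached A
print(switch.read("A"))           # ('v1', 'switch')
print(switch.read("B"))           # ('v2', 'server')
switch.write("A", "v3")
print(switch.read("A"))           # ('v3', 'server')
```

After the write, A stays uncached in this sketch; putting the new value back into the switch is left to the control plane's cache-update path.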

  15. Key-value caching in a network ASIC at line rate
      - How to identify application-level packet fields?
      - How to store and serve variable-length data?
      - How to efficiently keep the cache up-to-date?

  16. NetCache packet format
      ETH | IP | TCP/UDP | OP | SEQ | KEY | VALUE
      - Existing protocols: ETH and IP for L2/L3 routing; TCP/UDP with a reserved port # for NetCache
      - NetCache protocol: OP (read, write, delete, etc.), SEQ, KEY, VALUE
      - Application-layer protocol: compatible with existing L2-L4 layers
      - Only the top-of-rack switch needs to parse NetCache fields
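
To make the layout concrete, this sketch packs and parses only the NetCache fields that follow the TCP/UDP header, using Python's `struct` module. The field widths are partly assumptions: the 1-byte OP and 4-byte SEQ are hypothetical, while the 16-byte key and 128-byte value limit come from the prototype on slide 26. The opcode values are also hypothetical, the reserved port number is not modeled, and VALUE is assumed to run to the end of the payload rather than carrying an explicit length.

```python
import struct
from typing import Tuple

# Hypothetical opcode values: the slide only says OP covers read, write, delete, etc.
OP_READ, OP_WRITE, OP_DELETE = 0, 1, 2
KEY_LEN, MAX_VALUE_LEN = 16, 128       # prototype sizes from slide 26
# Assumed header layout, network byte order: OP (1 B), SEQ (4 B), KEY (16 B).
HEADER = struct.Struct("!BI16s")


def pack(op: int, seq: int, key: bytes, value: bytes = b"") -> bytes:
    """Build the NetCache part of a UDP payload; VALUE runs to the end of the payload."""
    if len(key) > KEY_LEN or len(value) > MAX_VALUE_LEN:
        raise ValueError("key or value too long")
    return HEADER.pack(op, seq, key) + value   # "16s" null-pads short keys


def unpack(payload: bytes) -> Tuple[int, int, bytes, bytes]:
    op, seq, key = HEADER.unpack_from(payload)
    # Strip the null padding (this sketch assumes keys never end in a null byte).
    return op, seq, key.rstrip(b"\x00"), payload[HEADER.size:]


msg = pack(OP_WRITE, 7, b"user:42", b"hello")
print(len(msg))        # 26
print(unpack(msg))     # (1, 7, b'user:42', b'hello')
```

A fixed-offset header like this is what lets a switch parser locate KEY without loops, since every field sits at a known position after the L4 header.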

  17. Key-value caching in a network ASIC at line rate
      - How to identify application-level packet fields?
      - How to store and serve variable-length data?
      - How to efficiently keep the cache up-to-date?

  18. Key-value store using a register array in a network ASIC
      Lookup table (match → action):
        pkt.key == A → process_array(0)
        pkt.key == B → process_array(1)
      The register array (slots 0-3) holds A in slot 0 and B in slot 1; a read copies the slot into pkt.value.
        action process_array(idx):
            if pkt.op == read:
                pkt.value ← array[idx]
            elif pkt.op == cache_update:
                array[idx] ← pkt.value
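
The same design can be modeled in Python; this is a sketch, not P4. The class name `RegisterKV` and the dict-based packet are hypothetical; the `lookup` dict plays the role of the match table and `array` the register array.

```python
from typing import Any, Dict, List, Optional

READ, CACHE_UPDATE = "read", "cache_update"


class RegisterKV:
    """Hypothetical model: one match table pointing into one register array."""

    def __init__(self, size: int):
        self.array: List[Optional[bytes]] = [None] * size   # register array slots
        self.lookup: Dict[str, int] = {}                     # match pkt.key -> slot index

    def process_array(self, pkt: Dict[str, Any], idx: int) -> None:
        if pkt["op"] == READ:
            pkt["value"] = self.array[idx]        # pkt.value <- array[idx]
        elif pkt["op"] == CACHE_UPDATE:
            self.array[idx] = pkt["value"]        # array[idx] <- pkt.value

    def handle(self, pkt: Dict[str, Any]) -> bool:
        """Apply the table; return False on a miss (key not cached)."""
        idx = self.lookup.get(pkt["key"])
        if idx is None:
            return False
        self.process_array(pkt, idx)
        return True


kv = RegisterKV(size=4)
kv.lookup = {"A": 0, "B": 1}
kv.handle({"op": CACHE_UPDATE, "key": "A", "value": b"alpha"})
pkt = {"op": READ, "key": "A"}
kv.handle(pkt)
print(pkt["value"])   # b'alpha'
```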

  19. Variable-length key-value store in a network ASIC?
      [Figure: the single lookup table and register array from the previous slide]
      Key challenges:
      - No loops or strings, due to strict timing requirements
      - Need to minimize hardware resource consumption
        - Number of table entries
        - Size of action data from each entry
        - Size of intermediate metadata across tables

  20. Combine outputs from multiple arrays
      - Lookup table: match pkt.key == A → action: bitmap = 111, index = 0
        - The bitmap indicates which arrays store the key's value
        - The index indicates the slot in those arrays that holds the value
      - Value tables 0, 1, 2: match bitmap[i] == 1 → action process_array_i(index) on register array i
      - Result: pkt.value = A0 A1 A2, one piece from each register array
      - Minimal hardware resource overhead

  21. Combine outputs from multiple arrays
      Lookup table:
        pkt.key == A → bitmap = 111, index = 0
        pkt.key == B → bitmap = 110, index = 1
        pkt.key == C → bitmap = 010, index = 2
        pkt.key == D → bitmap = 101, index = 2
      Register array contents (slots 0, 1, 2):
        Array 0: A0, B0, D0
        Array 1: A1, B1, C0
        Array 2: A2, (empty), D1
      Value table i matches bitmap[i] == 1 and runs process_array_i(index); e.g., pkt.value for A is A0 A1 A2.
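
The lookup can be replayed in Python using the contents shown on slide 21. The bitmap string is read left to right as bitmap[0], bitmap[1], bitmap[2], the reading that matches the array contents on the slide. This is only a model: on the switch, each loop iteration is a separate value table in its own pipeline stage (no loops are allowed), and the pieces are appended to pkt.value rather than to a Python list.

```python
from typing import Dict, List, Optional, Tuple

# Lookup table from slide 21: key -> (bitmap, index). Character i of the bitmap
# string is bitmap[i], i.e. whether register array i holds a piece of the value.
LOOKUP: Dict[str, Tuple[str, int]] = {
    "A": ("111", 0),
    "B": ("110", 1),
    "C": ("010", 2),
    "D": ("101", 2),
}

# Three register arrays with four slots each; None marks an unused slot.
ARRAYS: List[List[Optional[str]]] = [
    ["A0", "B0", "D0", None],
    ["A1", "B1", "C0", None],
    ["A2", None, "D1", None],
]


def read_value(key: str) -> List[str]:
    """Collect a key's value pieces, one value table (array) at a time, in order."""
    bitmap, index = LOOKUP[key]
    pieces = []
    for i, array in enumerate(ARRAYS):
        if bitmap[i] == "1":             # value table i matches bitmap[i] == 1
            pieces.append(array[index])  # process_array_i(index) appends to pkt.value
    return pieces


for k in "ABCD":
    print(k, read_value(k))
```

Because all of a key's pieces share one index, each lookup-table entry carries only a small bitmap and a single index as action data, which matches the slide's point about minimal resource overhead.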

  22. Key-value caching in a network ASIC at line rate
      - How to identify application-level packet fields?
      - How to store and serve variable-length data?
      - How to efficiently keep the cache up-to-date?

  23. Cache insertion and eviction
      - Challenge: cache the hottest O(N log N) items with a limited insertion rate
      - Goal: react quickly and effectively to workload changes with minimal updates
      Steps (between the ToR switch data plane, its control plane over PCIe, and the storage servers):
        1. The data plane reports hot keys.
        2. The control plane compares the loads of new hot keys and sampled cached keys.
        3. The control plane fetches the values for the keys to be inserted into the cache.
        4. The control plane inserts and evicts keys.
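
The four steps suggest a control loop like the one below. It is a sketch under assumptions: the names `update_cache`, `cached_load`, and `hot_reports` are hypothetical, the value fetch is an arbitrary callable, and seeding a newly inserted key's counter with its estimated load is a choice made here so the key is not evicted again within the same round. Comparing against a sample of cached keys, rather than scanning all of them, keeps each decision cheap.

```python
import random
from typing import Callable, Dict


def update_cache(
    cache: Dict[str, bytes],
    cached_load: Dict[str, int],
    hot_reports: Dict[str, int],
    fetch_value: Callable[[str], bytes],
    capacity: int,
    sample_size: int,
    rng: random.Random,
) -> None:
    """One round of a hypothetical control-plane update loop.

    cache        -- key -> value currently stored in the switch
    cached_load  -- per-key counters the data plane keeps for cached keys
    hot_reports  -- uncached keys reported as hot (step 1), with estimated load
    fetch_value  -- reads the current value from the storage servers
    """
    # Consider the hottest reported keys first.
    for key, load in sorted(hot_reports.items(), key=lambda kv: kv[1], reverse=True):
        if key in cache:
            continue
        if len(cache) >= capacity:
            # Step 2: compare against a random sample of cached keys.
            sample = rng.sample(sorted(cache), min(sample_size, len(cache)))
            victim = min(sample, key=lambda k: cached_load.get(k, 0))
            if load <= cached_load.get(victim, 0):
                continue                      # not hot enough to justify an update
            del cache[victim]                 # step 4: evict
            cached_load.pop(victim, None)
        cache[key] = fetch_value(key)         # step 3: fetch value; step 4: insert
        cached_load[key] = load               # assumption: seed counter with its estimate
```

For example, with `cache = {"a": b"1", "b": b"2"}`, loads `{"a": 5, "b": 100}`, capacity 2, and hot reports `{"c": 50, "d": 1}`, key c replaces a, while d is too cold to displace anything.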

  24. Query statistics in the data plane
      [Figure: pkt.key → cache lookup; cached keys → per-key counters for each cached item; uncached keys → Count-Min sketch → if hot, Bloom filter → report]
      - Cached key: per-key counter array
      - Uncached key
        - Count-Min sketch: report new hot keys
        - Bloom filter: remove duplicated hot key reports
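
The three structures can be modeled in software as follows. This is a sketch, not the switch implementation: the class name `QueryStats`, the sizes, the hot threshold, and the use of BLAKE2b from Python's `hashlib` to derive counter and bit positions are assumptions, and clearing the statistics between measurement periods is omitted. A Count-Min sketch never underestimates a count, so a key whose estimate crosses the threshold is a hot candidate; the Bloom filter then suppresses repeat reports, at the cost of occasional false positives that would drop a report.

```python
import hashlib
from typing import Dict, List


def _position(key: bytes, salt: int, size: int) -> int:
    """Hash key with a one-byte salt (0-255) into [0, size)."""
    digest = hashlib.blake2b(bytes([salt]) + key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % size


class QueryStats:
    def __init__(self, rows: int = 3, width: int = 1024,
                 bloom_bits: int = 4096, bloom_hashes: int = 3, threshold: int = 128):
        self.counters: Dict[bytes, int] = {}            # cached keys: per-key counters
        self.cms: List[List[int]] = [[0] * width for _ in range(rows)]  # rows < 128
        self.bloom = [False] * bloom_bits
        self.bloom_hashes = bloom_hashes
        self.threshold = threshold

    def observe(self, key: bytes, cached: bool) -> bool:
        """Record one query; return True if an uncached key should be reported as hot."""
        if cached:
            self.counters[key] = self.counters.get(key, 0) + 1
            return False
        # Count-Min sketch: increment one counter per row, estimate = min over rows.
        estimate = min(self._bump(row, key, r) for r, row in enumerate(self.cms))
        if estimate < self.threshold:
            return False
        # Bloom filter (salts 128+ keep its hashes distinct from the sketch rows).
        bits = [_position(key, 128 + h, len(self.bloom)) for h in range(self.bloom_hashes)]
        if all(self.bloom[b] for b in bits):
            return False                                # already reported
        for b in bits:
            self.bloom[b] = True
        return True

    @staticmethod
    def _bump(row: List[int], key: bytes, r: int) -> int:
        i = _position(key, r, len(row))
        row[i] += 1
        return row[i]


stats = QueryStats(threshold=3)
print([stats.observe(b"k", cached=False) for _ in range(5)])  # [False, False, True, False, False]
```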

  25. Evaluation
      - Can NetCache run on programmable switches at line rate?
      - Can NetCache provide significant overall performance improvements?
      - Can NetCache efficiently handle workload dynamics?

  26. Prototype implementation and experimental setup
      - Switch
        - P4 program (~2K LOC)
        - Routing: basic L2/L3 routing
        - Key-value cache: 64K items with 16-byte keys and values of up to 128 bytes
        - Evaluation platform: one 6.5 Tbps Barefoot Tofino switch
      - Server
        - 16-core Intel Xeon E5-2630, 128 GB memory, 40 Gbps Intel XL710 NIC
        - TommyDS for the in-memory key-value store
        - Throughput: 10 MQPS; latency: 7 μs
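
A quick sizing check shows how much switch memory the prototype cache implies when every slot holds a maximum-size value. The figures assume "64K" means 65,536 items and ignore per-entry overheads such as the bitmap and index.

```python
ITEMS = 64 * 1024            # assumption: "64K" read as 65,536 items
KEY_BYTES = 16
MAX_VALUE_BYTES = 128
MIB = 2 ** 20

key_mib = ITEMS * KEY_BYTES / MIB          # key storage
value_mib = ITEMS * MAX_VALUE_BYTES / MIB  # value storage at the maximum value size
print(key_mib, value_mib, key_mib + value_mib)   # 1.0 8.0 9.0
```

Roughly 9 MiB of keys and values is consistent with the earlier point that a switch's small on-chip memory can hold only O(N log N) small items.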

  27. The “boring life” of a NetCache switch
      [Figure: single-switch benchmark; throughput (BQPS) vs. value size (bytes), and (b) throughput vs. cache size]

  28. And its “not so boring” benefits
      [Figure: 1 switch + 128 storage servers; throughput (BQPS) under uniform, zipf-0.9, zipf-0.95, and zipf-0.99 workload distributions for NoCache, NetCache (servers), and NetCache (cache)]
      3-10x throughput improvements

  29. Impact of workload dynamics
      [Figure: throughput (MQPS) vs. time (s) for a hot-in workload (radical change) and a random workload (moderate change), showing average throughput per second and per 10 seconds]
      Quickly and effectively reacts to a wide range of workload dynamics.
      (2 physical servers emulate 128 storage servers; performance is scaled down by 64x.)

  30. NetCache is a rack-scale key-value store that leverages in-network data plane caching to achieve ~10 μs latency and billions of QPS throughput, even under highly-skewed and rapidly-changing workloads.
