
Optimizing Redis for Locality and Capacity - Kevin C., Yoongu K., Lavanya S. (PowerPoint presentation)



  1. Optimizing Redis for Locality and Capacity
     Kevin C., Yoongu K., Lavanya S.
     15-799 Project Presentation, 12/4/2013

  2. Goals of Our Project
     • Leverage DRAM and dataset characteristics to improve the performance of an in-memory database
     • Locality: exploit DRAM's internal row buffers
     • Capacity: exploit redundancy in the dataset

  3. DRAM System Organization
     [Figure: CPU connected to the DRAM system over a memory bus]

  4. DRAM System Organization
     [Figure: the DRAM system is divided into banks; banks can be accessed in parallel]

  5. DRAM Bank Organization
     [Figure: a bank is a 2D array of rows (8KB each) and columns, fronted by a row buffer]
     • The row buffer serves as a fast cache within a bank
       – A row buffer miss transfers an entire row of data into the row buffer
       – A row buffer hit serves subsequent accesses to the same row directly from the buffer, at significantly lower latency

  6. Row Buffer Locality (RBL) in In-Memory Databases
     • Idea: map hot data to a few DRAM rows
     • Hot data: data with high temporal correlation
     • Examples of temporally correlated data:
       – Records touched around the same time
       – Query terms frequently searched together

  7. Challenge
     • How is data mapped to DRAM? Which bank? Which row?
     • A virtual address (virtual page number + offset) is translated to a physical address (physical page number + offset)
     • The mapping from physical address to DRAM bank and row is unexposed to the system: it is determined by the hardware (memory controller)
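The virtual-to-physical half of this translation can at least be observed from software; the physical-to-DRAM half (bank, row) cannot, which is what Task 1 reverse-engineers. As a minimal sketch of that first half, and assuming Linux with root privileges, the physical page number backing a virtual address can be read from /proc/self/pagemap. This helper is illustrative and is not part of the project's kernel module:

```c
/* Sketch: translate a virtual address to a physical address via
 * /proc/self/pagemap (Linux; recent kernels require root/CAP_SYS_ADMIN).
 * This recovers only the physical page number; the bank/row split inside
 * DRAM is still decided by the memory controller. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

uint64_t virt_to_phys(void *vaddr)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    uint64_t entry, page = (uint64_t)vaddr / pagesize;

    FILE *f = fopen("/proc/self/pagemap", "rb");
    if (!f) { perror("pagemap"); exit(1); }
    fseek(f, page * sizeof(entry), SEEK_SET);
    if (fread(&entry, sizeof(entry), 1, f) != 1) { perror("read"); exit(1); }
    fclose(f);

    if (!(entry & (1ULL << 63)))       /* page not present in memory */
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54 = PFN */
    return pfn * pagesize + ((uint64_t)vaddr % pagesize);
}
```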

  8. Task 1: Find the Mapping to DRAM
     • Approach: a kernel module with assembly code observes the access latency for a pair of addresses (addr1, addr2)
     • Three cases are distinguished by their latency: (1) cache hit, (2) cache miss with a row hit, (3) cache miss with a row miss
     • Measurement procedure (sketched in code below):
       1. Load addr1                  // fill the TLB for addr1
       2. Load addr2                  // fill the TLB for addr2
       3. Flush the cache lines of addr1 and addr2
       4. Load addr1
       5. Read the CPU cycle counter  // Tstart for addr2
       6. Load addr2
       7. Read the CPU cycle counter  // Tend for addr2
     • Courtesy: the backbone kernel module was obtained from Hyoseung Kim (Prof. Rajkumar's group)
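A minimal user-level approximation of that seven-step timing loop, assuming an x86-64 CPU and GCC/Clang intrinsics, could look like the following; the project itself ran this logic inside a kernel module, so the fencing details and function names here are illustrative only:

```c
/* Sketch of the timing loop from slide 8 (user-level approximation;
 * assumes x86-64 with _mm_clflush and __rdtscp available). */
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

static uint64_t time_second_access(volatile char *addr1, volatile char *addr2)
{
    unsigned aux;
    (void)*addr1;                        /* 1. load addr1: warm the TLB      */
    (void)*addr2;                        /* 2. load addr2: warm the TLB      */
    _mm_clflush((const void *)addr1);    /* 3. evict both cache lines        */
    _mm_clflush((const void *)addr2);
    _mm_mfence();
    (void)*addr1;                        /* 4. load addr1: opens its DRAM row */
    _mm_mfence();
    uint64_t t0 = __rdtscp(&aux);        /* 5. Tstart for addr2              */
    (void)*addr2;                        /* 6. load addr2: the timed access  */
    uint64_t t1 = __rdtscp(&aux);        /* 7. Tend for addr2                */
    return t1 - t0;      /* row hit vs. row miss shows up in this delta */
}
```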

  9. Task 1: Find the Mapping to DRAM
     • Experimental setup: 3.4GHz Haswell CPU, 2GB DRAM DIMM (8 banks)
     • With an exhaustive selection of addr1 and addr2, we discover the mapping to be:
       – Bits [12:0] of the physical address: byte offset within an 8KB row
       – Bits [15:13] XORed with bits [18:16]: bank index
       – Bits [16] and above: row index
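Expressed as code, the decoded fields look roughly like this; the bit positions follow the slide for this specific 2GB, 8-bank configuration and are not a general or vendor-documented formula:

```c
/* Decode the mapping discovered in Task 1 (2GB DIMM, 8 banks, 8KB rows):
 * bits [12:0]  = byte offset within an 8KB row
 * bank         = bits [15:13] XOR bits [18:16]
 * row          = bits [16] and above */
#include <stdint.h>

static inline unsigned dram_bank(uint64_t paddr)
{
    return (unsigned)(((paddr >> 13) & 0x7) ^ ((paddr >> 16) & 0x7));
}

static inline uint64_t dram_row(uint64_t paddr)
{
    return paddr >> 16;              /* row index */
}

static inline unsigned dram_row_offset(uint64_t paddr)
{
    return (unsigned)(paddr & 0x1FFF);   /* byte offset inside the 8KB row */
}
```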

  10. Task 1: Find the Mapping to DRAM
     [Figure: the physical address space divided into 8KB pieces (P0 at 0x0000, P1 at 0x2000, P8, P9, ...); consecutive 8KB pieces fill the rows of Banks 0-7 according to the mapping above (bank = bits [15:13] XOR bits [18:16], 8KB byte offset within a row)]

  11. Task 1: Find the Mapping to DRAM
     • Measurement:

         Request Type                 Approximate Latency (CPU cycles)
         Cache hit                     30
         Row hit in the same bank     170
         Row hit in a different bank  220
         Row miss                     270

     • A row miss takes roughly 60% longer than a row hit in the same bank (170 -> 270 cycles)
     • The cache hit latency includes the overhead of extra assembly instructions
     • Under investigation: why does a row hit in a different bank incur extra latency?

  12. Task 2: Microbenchmark
     • Kernel module allocates 128KB of memory (guaranteed to be contiguous physical pages)
     • Test 1: striding within a single row -> results in row hits
     • Test 2: zigzagging between two rows in the same bank (Base in Row X of Bank Y and Base + 9 * 8KB in Row X+1, which the XOR mapping places in the same Bank Y) -> results in row misses
     • Both access patterns are sketched in code below
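Assuming base points at the row-aligned start of the physically contiguous 128KB buffer, and again using x86-64 intrinsics, the two access patterns might be sketched as follows (illustrative, not the project's kernel code):

```c
/* Sketch of the two access patterns on slide 12, assuming base is the
 * row- and bank-aligned start of the contiguous 128KB buffer. */
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

#define ROW_SIZE   (8 * 1024)               /* 8KB DRAM row  */
#define LINE_SIZE  64                        /* cache line    */
#define SAME_BANK_NEXT_ROW (9 * ROW_SIZE)    /* +9 rows: bits [15:13] and
                                                bits [18:16] both advance by 1,
                                                so their XOR (the bank) is unchanged */

/* Test 1: stride through one row -> after the row is opened, every
 * subsequent DRAM access is a row hit. */
static void stride_within_row(volatile char *base)
{
    for (size_t off = 0; off < ROW_SIZE; off += LINE_SIZE) {
        _mm_clflush((const void *)(base + off));   /* force a DRAM access */
        _mm_mfence();
        (void)base[off];
    }
}

/* Test 2: alternate ("zigzag") between Row X and Row X+1 of the same
 * bank -> every access closes one row and opens the other (row misses). */
static void zigzag_same_bank(volatile char *base, int iters)
{
    for (int i = 0; i < iters; i++) {
        size_t off = (i & 1) ? SAME_BANK_NEXT_ROW : 0;
        _mm_clflush((const void *)(base + off));
        _mm_mfence();
        (void)base[off];
    }
}
```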

  13. Why Understand Mapping to DRAM?
     • Enables mapping application data to exploit locality
     • Pages mapped to rows:
       – Data accesses to the same row incur low latency
       – Colocate frequently accessed data in the same row
     • Next cache line prefetched:
       – Accessing the next cache line incurs low latency
       – Map data accessed together to adjacent cache lines

  14. Data Mapping Benefits in Redis
     • Is memory access the bottleneck?
     • Profiling using the Performance API (PAPI), an interface to hardware performance counters
     • Profile the set and get key functions to determine what fraction of total cycles they account for
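The deck does not show the instrumentation itself; as a hedged sketch, a PAPI-based cycle counter wrapped around Redis' set path could look like the following. The hook placement, the profiled_set_command wrapper, and the Redis 2.x-era redisClient type are assumptions:

```c
/* Hypothetical PAPI instrumentation for the set path (the deck does not
 * show the real hooks). Assumes Redis 2.x-era internals: redisClient and
 * setCommand() declared in redis.h. */
#include <papi.h>
#include <stdio.h>
#include "redis.h"                  /* redisClient, setCommand() */

static int evset = PAPI_NULL;
static long long set_cycles;

void profiling_init(void)
{
    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&evset);
    PAPI_add_event(evset, PAPI_TOT_CYC);   /* total CPU cycles */
    PAPI_start(evset);
}

/* Called wherever setCommand(c) would normally be dispatched. */
void profiled_set_command(redisClient *c)
{
    long long before, after;
    PAPI_read(evset, &before);
    setCommand(c);                          /* the real Redis handler */
    PAPI_read(evset, &after);
    set_cycles += after - before;
}

void profiling_report(void)
{
    long long total;
    PAPI_read(evset, &total);               /* cycles since PAPI_start */
    printf("set cycle fraction: %.3f\n", (double)set_cycles / total);
}
```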

  15. Data Mapping Benefits in Redis
     [Chart: Set Cycle Fraction and Get Cycle Fraction (fraction of cycles, 0 to 0.35) vs. number of random queries]
     • Memory is not a significant bottleneck in Redis

  16. Sensitivity to Payload Size
     [Chart: Set Fraction of cycles (0 to 0.4) vs. payload size (2, 4, 64, 128, 8192, 16384, 32768, 65536)]
     • Memory is still not a significant bottleneck in Redis

  17. Next Steps
     • Row-hit vs. row-miss behavior in Redis:
       – Use memory mapping ("memmap") to allocate data contiguously within a page (one possible mechanism is sketched below)
       – Microbenchmarks that access the same and different rows/pages (e.g., Row X and Row X+1 of the same Bank Y)
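The slide only says "memmap", so the exact mechanism is an assumption; one plausible user-space way to obtain a physically contiguous region that spans several 8KB DRAM rows is an anonymous 2MB huge page:

```c
/* Sketch: obtain a physically contiguous 2MB region from user space via an
 * anonymous huge page (assumes Linux with huge pages configured; how the
 * project actually allocated contiguous memory is not shown in the deck). */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

#define HUGE_2MB (2 * 1024 * 1024)

int main(void)
{
    void *buf = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Offsets within this region map predictably onto DRAM rows, so the
     * same-row / different-row microbenchmarks can run without relying on
     * the kernel module's contiguous allocation. */
    munmap(buf, HUGE_2MB);
    return 0;
}
```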

  18. More Potential for Data Mapping?
     • Single-node databases
     • Mainframe transaction processing systems
     • Data analytics systems

  19. Dataset
     • Could not find a suitable in-memory dataset, so we constructed our own based on the English Wikipedia corpus:
     1. XML dump of current revisions of all English articles
        • 43GB (uncompressed)
        • 11/04/2013
        • http://dumps.wikimedia.org/enwiki/20131104/enwiki-20131104-pages-articles.xml.bz2
     2. Article hit-count log (one hour)
        • 307MB (uncompressed)
        • Last hour of 11/04/2013
        • http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-11/pagecounts-20131105-000001.gz

  20. Dataset (cont’d)
     • Sanitization was unexpectedly non-trivial:
       – Spam and/or invalid user queries
       – ASCII vs. UTF-8 vs. ISO/IEC 8859-1 encodings
       – URI escape characters, HTML escape characters
       – Running out of memory
     • Sanitized dataset:
       – 141K key-value pairs: (title, article)
       – 3.6GB (uncompressed)
