Hash Table Design and Optimization for Software Virtual Switches
Presenter: Ren Wang
Yipeng Wang, Sameh Gobriel, Ren Wang, Charlie Tai, Cristian Dumitrescu
Intel Labs
Outline
- Background and motivation
- Survey and understanding
- Analysis
Background
We found that the most common data structure used in virtual switches is the hash table:
- Wildcard match (tuple space search): routing table, ACL
- Exact match: connection-tracking table, flow cache, etc.
Compared to tree-based data structures, hash-table-based data structures have certain advantages:
- More parallelism: no pointer chasing
- Faster rule updates
Background
Hash table lookup is also one of the most time-consuming stages of packet processing. For example, in Open vSwitch with 100k rules across 20 subtables, the execution-time breakdown is roughly:
- IO: ~8%
- Preprocessing: ~5%
- Rule lookup: ~78%
- Others: ~9%
A major source of hash table lookup overhead is memory access latency.
Motivation
A hash table is a simple data structure, but there are many different designs and implementations. Understanding hash table performance and how to design an efficient hash table structure is key to a good software switch. A general guideline for hash table design will benefit future vswitch development.
Basic hash table structure
The evolution of hash table algorithms: single array -> bucket-based -> n-hash.
[Diagram: where key A is placed in a single-array table, a bucket-based table, and an n-hash table.]
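To make the bucket-based layout concrete, here is a minimal single-hash lookup sketch in C. The struct names, the 4-entry bucket, and the power-of-two table size are illustrative assumptions, not details taken from the slides.

    #include <stdint.h>

    #define BUCKET_ENTRIES 4                 /* small set-associative bucket */

    struct entry  { uint64_t key; uint64_t value; uint8_t in_use; };
    struct bucket { struct entry e[BUCKET_ENTRIES]; };

    /* Single-hash, bucket-based lookup: the hash selects one bucket, then all
       entries in that bucket are scanned (no pointer chasing).
       nb_buckets is assumed to be a power of two. */
    static int lookup(const struct bucket *tbl, uint32_t nb_buckets,
                      uint64_t key, uint32_t hash, uint64_t *value)
    {
        const struct bucket *b = &tbl[hash & (nb_buckets - 1)];

        for (int i = 0; i < BUCKET_ENTRIES; i++) {
            if (b->e[i].in_use && b->e[i].key == key) {
                *value = b->e[i].value;
                return 0;
            }
        }
        return -1;   /* miss */
    }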
Cuckoo hashing
Cuckoo hash algorithm: existing keys can be displaced to an alternative bucket to make room for a new key.
[Diagram: hash functions map keys A and B to buckets; key A is displaced to its alternative bucket so key B can be inserted.]
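A minimal sketch of cuckoo insertion with displacement, using a bucket layout similar to the previous sketch. hash1/hash2, the 4-entry bucket, and the displacement bound are illustrative assumptions; a production implementation would also check for duplicates and pick victims more carefully (e.g., randomly) to avoid ping-ponging between two keys.

    #include <stdint.h>

    #define ENTRIES_PER_BUCKET 4
    #define MAX_DISPLACEMENTS  32          /* give up (rehash/resize) after this */

    struct slot    { uint64_t key; uint64_t val; uint8_t used; };
    struct cbucket { struct slot s[ENTRIES_PER_BUCKET]; };

    extern uint32_t hash1(uint64_t key);   /* two independent hash functions,   */
    extern uint32_t hash2(uint64_t key);   /* assumed to be provided elsewhere  */

    /* Place key/val into a free slot of bucket b, or report that b is full. */
    static int put_free(struct cbucket *b, uint64_t key, uint64_t val)
    {
        for (int i = 0; i < ENTRIES_PER_BUCKET; i++) {
            if (!b->s[i].used) {
                b->s[i] = (struct slot){ .key = key, .val = val, .used = 1 };
                return 0;
            }
        }
        return -1;
    }

    /* Cuckoo insertion: if both candidate buckets are full, evict a victim,
       take its slot, and retry the loop with the victim (assumes the key is
       not already present). nb_buckets is assumed to be a power of two. */
    static int cuckoo_insert(struct cbucket *tbl, uint32_t nb_buckets,
                             uint64_t key, uint64_t val)
    {
        for (int n = 0; n < MAX_DISPLACEMENTS; n++) {
            uint32_t h1 = hash1(key) & (nb_buckets - 1);
            uint32_t h2 = hash2(key) & (nb_buckets - 1);

            /* Try a free slot in either candidate bucket first. */
            if (put_free(&tbl[h1], key, val) == 0 ||
                put_free(&tbl[h2], key, val) == 0)
                return 0;

            /* Both full: displace a victim from the primary bucket and
               continue the loop trying to re-place the victim. */
            struct slot victim = tbl[h1].s[0];
            tbl[h1].s[0] = (struct slot){ .key = key, .val = val, .used = 1 };
            key = victim.key;
            val = victim.val;
        }
        return -1;   /* displacement chain too long: caller should rehash/resize */
    }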
Survey
We also studied various open-source virtual switch applications to learn from their implementations. These applications use hash tables for three major purposes:
- Routing table / ACL: tuple space search
- Connection tracking table: exact match
- Flow cache: exact/signature match with a replacement policy
Observations
- Set-associative tables and cuckoo hashing are widely used:
  - Bucket size is usually 4-8 entries
  - Cache alignment
  - Vectorization
- A capacity guarantee is needed in telecom use cases:
  - Linked-list-based hash tables are used as extended tables
- Software techniques to improve performance:
  - Software pipelining
  - Batching
  - Read/write concurrency
  - Optimistic locking
  - Intel TSX
Analysis - Table organization and data structure
Number of keys per bucket: more entries per bucket directly improve table utilization.
[Charts: achievable load factor vs. associativity (1, 2, 4, 8, 16 entries per bucket) for 1-hash, 2-hash, and cuckoo tables; cuckoo hash insertion cycles vs. load factor (90-100%).]
Conclusion: when table utilization is important, cuckoo hashing should be used. Multiple hash functions and multiple ways per bucket also help a lot.
Analysis - Separate key storage and cache alignment
Hash tables can store the key-data pairs in a separate memory location and keep only signatures and indices in the table.
- Pros: the signature and index entries are easy to cache-align, which benefits the cache-miss case.
- Cons: requires another memory access on a hit.
[Diagram: the bucket stores signatures (SIG); key+data are stored outside the table.]
Our tests show that with the optimized DPDK hash tables, storing keys inside or outside the table does not make a major difference for 16- or 32-byte keys. However, cache alignment improves hash table lookup speed by 6.5-16.7% in our DPDK-based performance test.
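As an illustration of the signature-plus-index layout, here is a sketch of a 64-byte, cache-line-aligned bucket with an external key/data store. The field widths, the 8-way bucket, and the 16-byte key are assumptions for illustration, not the actual DPDK layout.

    #include <stdint.h>
    #include <string.h>

    #define SLOTS 8

    /* One cache-line-aligned bucket holding only signatures and indices into a
       separate key/data array; the full keys live outside the table. */
    struct sig_bucket {
        uint16_t sig[SLOTS];       /* 16 B of signatures                       */
        uint32_t key_idx[SLOTS];   /* 32 B of indices into the key/data store  */
        uint8_t  used[SLOTS];      /*  8 B; struct is padded up to 64 B        */
    } __attribute__((aligned(64)));

    struct kv { uint8_t key[16]; uint64_t data; };   /* external key/data store */

    /* Signature filter: only touch the external key store on a signature match,
       so the common miss case stays within this single cache line. */
    static int lookup_sig(const struct sig_bucket *b, const struct kv *store,
                          uint16_t sig, const uint8_t key[16], uint64_t *data)
    {
        for (int i = 0; i < SLOTS; i++) {
            if (b->used[i] && b->sig[i] == sig &&
                memcmp(store[b->key_idx[i]].key, key, 16) == 0) {
                *data = store[b->key_idx[i]].data;
                return 0;
            }
        }
        return -1;
    }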
Analysis - Hash table based cache
When using a hash table as a flow cache, we need to consider the cache miss ratio. 4-8 ways per bucket can already keep the miss ratio reasonably low.
[Chart: miss ratio vs. ways per bucket (1, 2, 4, 8, 16).]
We propose a new AVX-based LRU implementation that uses Intel AVX instructions to permute the bucket:

    #include <immintrin.h>
    #include <stdint.h>

    extern int32_t permute_index[8][8];   /* precomputed permutation patterns */

    /* Reorder an 8-entry bucket (eight 32-bit slots) with the precomputed
       pattern for 'location', updating its LRU order in a few instructions. */
    void adjust_location(int location, uint32_t bucket[8])
    {
        __m256i array    = _mm256_loadu_si256((const __m256i *)bucket);
        __m256i pattern  = _mm256_loadu_si256((const __m256i *)permute_index[location]);
        __m256i permuted = _mm256_permutevar8x32_epi32(array, pattern);
        _mm256_storeu_si256((__m256i *)bucket, permuted);
    }

[Chart: insert and lookup speed comparison (lower is better) for avx, age, ll, bplru, and tplru implementations.]
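The slide does not show how permute_index is built. One plausible construction (an assumption, not taken from the slides) keeps each bucket ordered from most- to least-recently used, so the LRU victim is always the last slot:

    #include <stdint.h>

    int32_t permute_index[8][8];

    /* Hypothetical patterns: pattern 'loc' moves slot loc to position 0 (MRU),
       shifts slots 0..loc-1 down by one, and leaves loc+1..7 unchanged. */
    static void build_permute_index(void)
    {
        for (int loc = 0; loc < 8; loc++) {
            permute_index[loc][0] = loc;          /* accessed slot becomes MRU  */
            for (int j = 0; j < loc; j++)
                permute_index[loc][j + 1] = j;    /* warmer entries shift down  */
            for (int j = loc + 1; j < 8; j++)
                permute_index[loc][j] = j;        /* colder entries stay put    */
        }
    }

With this layout, eviction simply overwrites the last slot, and adjust_location(i, bucket) after a hit in slot i restores the MRU-first order.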
Analysis - Software pipelining and batching
Batching lets us prefetch the hash table buckets for several lookup keys ahead of time. Together with batching, software pipelining can further improve performance: software pipelining + batching easily improves performance by 2x in our test case.
[Chart: cycles per packet with no optimization, with prefetch, and with pipelining.]
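A sketch of what batching with a prefetch pass can look like, reusing the bucket/lookup names from the earlier single-hash sketch. The batch size, hash1, and the two-pass structure are illustrative assumptions, not the exact code measured above.

    #include <stdint.h>

    #define BATCH 16

    extern uint32_t hash1(uint64_t key);   /* same assumed hash as earlier */

    /* Two-pass batched lookup: pass 1 computes the hashes and prefetches every
       target bucket; pass 2 does the actual probes. By the time a bucket is
       probed, its cache line is likely already in cache, hiding memory latency
       behind the other keys' hash computations. */
    static void lookup_burst(const struct bucket *tbl, uint32_t nb_buckets,
                             const uint64_t keys[BATCH], uint64_t values[BATCH],
                             int hit[BATCH])
    {
        uint32_t h[BATCH];

        for (int i = 0; i < BATCH; i++) {                 /* pass 1: hash + prefetch */
            h[i] = hash1(keys[i]);
            __builtin_prefetch(&tbl[h[i] & (nb_buckets - 1)], 0, 3);
        }
        for (int i = 0; i < BATCH; i++)                   /* pass 2: probe buckets */
            hit[i] = lookup(tbl, nb_buckets, keys[i], h[i], &values[i]) == 0;
    }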
Analysis - Vectorization
Besides using Intel AVX instructions for the LRU operation, we can also use AVX instructions for signature comparison. We compare three mechanisms:
- No vectorization (scalar).
- Horizontal vectorization: compare one key's signature against all signatures in a bucket.
- Vertical vectorization: compare the signatures of all keys in a batch against entries across different buckets.
[Chart: lookup cycles vs. average entry location (4.2, 3.8, 2.4, 1.05, 1) for scalar, vertical, and horizontal comparison.]
Observations:
- Vertical or scalar comparison is better at low table utilization.
- Horizontal comparison is better at high table utilization.
- An adaptive method that switches between them could be beneficial.
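A sketch of the horizontal variant using AVX2, assuming 16-bit signatures packed 16 to a 32-byte bucket line; the function name and layout are illustrative, not the measured implementation.

    #include <immintrin.h>
    #include <stdint.h>

    /* Compare one lookup key's 16-bit signature against all 16 signatures of a
       bucket in a single AVX2 compare; returns a mask with bit 2*i set when
       slot i matches (0 means no slot matched). */
    static inline uint32_t match_sigs(const uint16_t sigs[16], uint16_t sig)
    {
        __m256i bucket_sigs = _mm256_loadu_si256((const __m256i *)sigs);
        __m256i needle      = _mm256_set1_epi16((short)sig);
        __m256i eq          = _mm256_cmpeq_epi16(bucket_sigs, needle);

        /* movemask yields 2 bits per matching 16-bit lane; keep one per slot. */
        return (uint32_t)_mm256_movemask_epi8(eq) & 0x55555555u;
    }

A caller would then probe only the flagged slots, e.g. slot = __builtin_ctz(mask) / 2 for the first candidate, before doing the full key comparison.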
Future directions of hash table design
Cuckoo hash + extended linked list design (see the sketch after this list):
- A linked-list-based hash table provides a capacity guarantee.
- A cuckoo hash table provides high table utilization and constant lookup time.
- Combining the two achieves both a capacity guarantee and better utilization.
Adaptive vrouter:
- From the study, we found that no single data structure fits all use cases.
- Runtime decisions based on traffic patterns could help: during runtime, a "learning" phase (e.g., trial and rank) tries various hash table data structures.
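A minimal sketch of what such a combined bucket could look like; the field sizes and the overflow pointer placement are assumptions about one possible design, not an existing implementation.

    #include <stdint.h>

    /* Cuckoo bucket extended with an overflow list: displacement keeps lookups
       at a constant two-bucket cost in the common case, while the overflow
       chain is only used once displacement fails, so inserts are never rejected
       before the table is truly full (capacity guarantee). */
    struct ext_bucket {
        uint16_t sig[8];               /* signatures for the 8 in-bucket entries */
        uint32_t key_idx[8];           /* indices into the external key store    */
        struct ext_bucket *overflow;   /* linked-list extension, normally NULL   */
    };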
Conclusion
- We investigated multiple hash table algorithms and implementations in popular virtual switches.
- We analyzed various hash table designs and provide guidelines for different use cases.
- We proposed an Intel AVX-based LRU cache implementation and adaptive signature comparison.
- We proposed future directions for hash table design in virtual switches.