Hash Tables

Introduction

Dictionary
◮ Stores key-value pairs.

Dictionary implementations we know:

                Find(k)     Insert(k, v)   Delete(k)
List            O(n)        O(1)           O(n)
Sorted Array    O(log n)    O(n)           O(n)
Balanced BST    O(log n)    O(log n)       O(log n)

Goal
◮ All operations in O(1) time.

Hash Tables

Naive Approach

Direct Access Table
◮ One large array A.
◮ For each key-value pair (k, v): A[k] = v.

[Figure: array A with indices 0, 1, 2, ...; the values v_1 and v_2 are stored at indices k_1 and k_2.]

Problems
1. Keys must be non-negative integers.
2. The key range is very large, so a huge amount of memory is needed.
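A minimal sketch of the direct access table, to make the two problems concrete. The class name `DirectAccessTable` and the parameter `key_range` are illustrative assumptions, not part of the slides; `None` is used to mark an empty slot, so this sketch cannot store `None` as a value.

```python
class DirectAccessTable:
    """Direct access table: the key itself is the array index."""

    def __init__(self, key_range):
        # One slot per possible key: memory grows with the key range,
        # not with the number of stored items (Problem 2).
        self.slots = [None] * key_range

    def _check(self, k):
        # Keys must be non-negative integers inside the range (Problem 1).
        if not 0 <= k < len(self.slots):
            raise KeyError(k)

    def insert(self, k, v):
        self._check(k)
        self.slots[k] = v

    def find(self, k):
        self._check(k)
        return self.slots[k]

    def delete(self, k):
        self._check(k)
        self.slots[k] = None


table = DirectAccessTable(key_range=10)
table.insert(3, "three")
```

Every operation is a single array access, but a table for 64-bit keys would need far more slots than any machine has memory for.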

Problem 1: Getting Integer Keys

Prehashing
◮ Take a key $k$ and map it to a non-negative integer $k'$.
◮ Easy in theory, because all finite data can be represented as an integer.
◮ $k'$ should not change when the object changes.
◮ In the ideal case, $k'$ is unique for the object.
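One possible prehash for strings, shown only as a sketch: the UTF-8 bytes of the string are read as one large integer. The function name `prehash` and the leading marker byte are assumptions made for this illustration; the marker byte keeps strings that start with zero bytes from mapping to the same integer as shorter strings.

```python
def prehash(s):
    """Map a string to a non-negative integer k' (illustrative sketch)."""
    data = s.encode("utf-8")
    # The leading 0x01 byte preserves leading zero bytes of the data,
    # so different strings always produce different integers.
    return int.from_bytes(b"\x01" + data, "big")


k_prime = prehash("ab")  # bytes 0x01 0x61 0x62 read as one integer
```

The result is unique per string but grows with the string length, which is exactly why the next step (hashing) is needed.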

Problem 2: Getting Small Keys

Hashing
◮ $U$ is the huge universe of all possible (non-negative integer) keys.
◮ A hash function $h_m : U \to \{0, 1, \dots, m-1\}$ reduces keys to a small range of integers. Ideally, $m \in \Theta(n)$, i.e., $m \approx c \cdot n$ for a small constant $c \ge 1$.
◮ Computing $h_m$ should require $O(1)$ time with a small constant.

[Figure: $h_m$ maps the keyspace to the hash table slots $0, 1, 2, \dots, m-3, m-2, m-1$.]

Collisions

Problem with Hashing
◮ Because $|U| \gg m$, in some cases $h_m(k_1) = h_m(k_2)$. This is called a collision.

Questions
1. How to design $h$ such that the number of collisions is low?
2. How do we handle collisions?

For 1.
◮ For this class, assume $h_m$ is given and has a uniform distribution of hash values.
◮ Thus, the expected size of a set of keys with the same hash value is $\frac{n}{m}$.
◮ $\alpha = \frac{n}{m}$ is called the load factor of the table.
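A small sketch of a collision and of the load factor. The slides assume $h_m$ is given; the division method `k % m` used here is only an assumed stand-in, and the key list is made up for the illustration.

```python
def h(k, m):
    """Assumed hash function (division method): reduce k to 0..m-1."""
    return k % m


m = 8
keys = [49, 58, 13, 20, 48, 57]

# Group the keys by the slot they hash to.
buckets = {}
for k in keys:
    buckets.setdefault(h(k, m), []).append(k)

# Slots that received more than one key are collisions.
collisions = {slot: ks for slot, ks in buckets.items() if len(ks) > 1}

# Load factor alpha = n / m.
load_factor = len(keys) / m
```

Here 49 and 57 both hash to slot 1, and six keys in eight slots give a load factor of 0.75.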

Chaining

Chaining

Idea
◮ Use a list (or other data structure) of colliding items in each slot of the table.

[Figure: table slots, each pointing to a list of the colliding keys $k_0, \dots, k_4$.]

Find Operation
1. Use the hash to determine the slot in the table.
2. Search the list in that slot for the item.
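The idea above can be sketched as follows. The class `ChainedHashTable` and its method names are illustrative assumptions; Python's built-in `hash` reduced modulo `m` stands in for the given $h_m$, and each slot holds a plain Python list as its chain.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value)."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _slot(self, k):
        # Assumed hash function: built-in hash reduced to 0..m-1.
        return hash(k) % self.m

    def insert(self, k, v):
        chain = self.slots[self._slot(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, v)  # key exists: overwrite its value
                return
        chain.append((k, v))

    def find(self, k):
        # Step 1: the hash determines the slot. Step 2: search its list.
        for key, value in self.slots[self._slot(k)]:
            if key == k:
                return value
        return None

    def delete(self, k):
        idx = self._slot(k)
        self.slots[idx] = [(key, v) for key, v in self.slots[idx] if key != k]


t = ChainedHashTable(m=4)
t.insert(1, "a")
t.insert(5, "b")  # 1 and 5 collide in slot 1 and share one chain
```

With a uniform hash function the expected chain length is the load factor, so Find stays cheap as long as $m$ grows with $n$.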

Open Addressing

Open Addressing

Idea
◮ Store all items in the array itself (i.e., at most one item per slot).

Problem
◮ How do we handle collisions?

[Figure: key $k_2$ hashes to slot $h_m(k_2)$, which already holds $k_1$.]

Solution: Probing
◮ If the slot is already used, compute a new hash value. Repeat until a free slot is found.

Probing

The hash function $h_m$ specifies the order of slots for a key $k$:

$h_m : U \times \{0, 1, \dots, m-1\} \to \{0, 1, \dots, m-1\}$

Resulting order: $\sigma(k) = \langle h_m(k, 0), h_m(k, 1), \dots, h_m(k, m-1) \rangle$

In the ideal case, $\sigma(k)$ is a permutation of $\{0, 1, \dots, m-1\}$.

[Figure: the probes $h_m(k_2, 0)$, $h_m(k_2, 1)$, $h_m(k_2, 2)$ for key $k_2$ in a table that already holds $k_0$ and $k_1$.]

Example

Let $h_m(49, 0) = 4$, $h_m(49, 1) = 6$, $h_m(49, 2) = 1$, and $h_m(49, 3) = 5$. The table has slots 0 to 7 and initially contains 58, 13, 20, and 48.

Perform
1. Insert(49)
2. Delete(58)
3. Find(49)

Result
◮ After Insert(49), the table contains 58, 13, 20, 49, and 48.
◮ After Delete(58), which simply empties the slot, the table contains 13, 20, 49, and 48.
◮ Find(49) then reports "Not found", although 49 is still in the table.

Open Addressing – Delete

Delete
◮ Simple deletion can lead to failure of Insert/Find.
◮ Flag the slot of a deleted item as 'Deleted'.
◮ Use the flag for Insert/Find.

Question
◮ What if $k$ is already in the table, but Insert encounters a field flagged as 'Deleted'?
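A sketch of open addressing with a 'Deleted' flag, under stated assumptions: the names `EMPTY`, `DELETED`, and `OpenAddressingTable` are invented for this illustration, the table stores keys only, and linear probing from `k % m` stands in for the given probe sequence. Insert here remembers the first flagged slot but keeps probing until it is sure the key is not already stored, which is one possible way to handle the question above.

```python
EMPTY = object()    # slot was never used
DELETED = object()  # flag: slot held an item that was deleted


class OpenAddressingTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, k):
        # Assumed probe sequence: linear probing starting at k mod m.
        for i in range(self.m):
            yield (k + i) % self.m

    def find(self, k):
        for idx in self._probe(k):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # a never-used slot ends the search
            if slot is not DELETED and slot == k:
                return idx
        return None

    def insert(self, k):
        first_free = None
        for idx in self._probe(k):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but k may be further on
            elif slot == k:
                return idx  # k is already in the table
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = k
        return first_free

    def delete(self, k):
        idx = self.find(k)
        if idx is not None:
            self.slots[idx] = DELETED  # flag instead of emptying


t = OpenAddressingTable(m=8)
t.insert(9)   # 9 mod 8 = 1, stored in slot 1
t.insert(17)  # 17 mod 8 = 1 is taken, stored in slot 2
t.delete(9)   # slot 1 is flagged, not emptied
```

Because slot 1 is flagged rather than emptied, a later search for 17 continues past it and still reaches slot 2.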

Probing Strategies – Linear Probing

Idea
◮ Slightly increase the index: $h_m(k, i) = (h_m(k) + i) \bmod m$

Good
◮ Gives a permutation (i.e., no index is checked twice).

Problem
◮ Clustering: consecutive groups of occupied slots.
◮ For $0.01 < \alpha < 0.99$, there are clusters of size $\Theta(\log n)$ even if $h_m$ is perfect.
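A short sketch of the linear probe sequence and of how a cluster forms. The helper names `linear_probe_sequence` and `insert_linear` are hypothetical, and $h_m(k) = k \bmod m$ is an assumed base hash function.

```python
def linear_probe_sequence(k, m):
    """Slots visited for key k: (h_m(k) + i) mod m for i = 0..m-1."""
    start = k % m  # assumed base hash h_m(k) = k mod m
    return [(start + i) % m for i in range(m)]


def insert_linear(table, k):
    """Store k in the first free slot of its probe sequence."""
    for idx in linear_probe_sequence(k, len(table)):
        if table[idx] is None:
            table[idx] = k
            return idx
    raise RuntimeError("table is full")


seq = linear_probe_sequence(49, 8)

table = [None] * 8
placed = [insert_linear(table, k) for k in (1, 9, 17, 2)]
```

The keys 1, 9, and 17 all hash to slot 1 and fill slots 1 to 3; key 2 hashes to a different slot but still lands at the end of the same run, so the cluster grows to four consecutive slots.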

Probing Strategies – Double Hashing

$h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$

$h_1$ and $h_2$ should be independent, i.e., the probability that $h_1(x) = h_1(y)$ and $h_2(x) = h_2(y)$ is $\frac{1}{m^2}$.

Hits all slots if $h_2(k)$ and $m$ have no common divisor greater than 1 (e.g., $m = 2^r$ and $h_2(k)$ is always odd).

Assuming an ideal hash function $h$, the expected cost for an operation is $\le \frac{1}{1 - \alpha}$.
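A sketch of the double hashing probe sequence for $m = 2^r$. The two hash functions are assumptions chosen only to keep the example small: $h_1(k) = k \bmod m$, and $h_2(k)$ is the quotient $\lfloor k / m \rfloor$ with its lowest bit forced to 1 so that it is always odd.

```python
def double_hash_sequence(k, m):
    """Slots visited for key k: (h1(k) + i * h2(k)) mod m, i = 0..m-1."""
    h1 = k % m          # assumed first hash function
    h2 = (k // m) | 1   # assumed second hash function, always odd
    return [(h1 + i * h2) % m for i in range(m)]


m = 8  # a power of two, so every odd step size shares no divisor with m
seq = double_hash_sequence(49, m)  # h1 = 1, h2 = 7

# An even step size would share the divisor 2 with m and miss slots.
slots_with_even_step = {(1 + i * 2) % m for i in range(m)}
```

With an odd step the sequence visits every slot exactly once; the even step 2 only ever reaches the four odd-numbered slots.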

Using Hash Tables

Dictionaries

Dictionary
◮ Stores key-value pairs.
◮ The key is an identifier.
◮ The value is information associated with the key.

Operations
◮ Insert. Inserts a given key-value pair into the dictionary, or overwrites the value if the key is already present.
◮ Find. Returns the value associated with the given key (often implemented as two functions: Contains and GetValue).
◮ Delete. Deletes the key-value pair with the given key.
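As one concrete instance, Python's built-in `dict` is a hash-table-based dictionary. The sketch below maps the three operations onto it; the variable names and the sample keys are made up for the illustration.

```python
d = {}

d["apple"] = 3                # Insert
d["apple"] = 5                # Insert with an existing key overwrites the value

has_apple = "apple" in d      # Find, part 1: Contains
apple_value = d["apple"]      # Find, part 2: GetValue
missing = d.get("pear")       # combined Find: None if the key is absent

del d["apple"]                # Delete
```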

Example: Counting

You are given an array A of integers. Determine the most frequent number.

Idea
◮ Numbers in A are keys.
◮ The value in the dictionary is a counter for the associated key.
◮ The key with the largest associated value is the answer.

Example: Counting

Input: An array A of integers.
Output: The most frequent number in A.

Create empty dictionary D.
For i = 0 To |A| − 1
    If D does not contain key A[i] Then
        Insert key-value pair (A[i], 0).
    Increase the value of key A[i] by 1.
Let k be the key in D with the largest associated value.
Return k
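The counting algorithm translates almost line by line into Python, using the built-in `dict` as the dictionary D. The function name `most_frequent` is an assumption; the sketch expects a non-empty array, and on a tie it returns the tied number that was seen first.

```python
def most_frequent(a):
    """Return the most frequent number in the non-empty list a."""
    counts = {}                # empty dictionary D
    for x in a:
        if x not in counts:    # first time we see x
            counts[x] = 0
        counts[x] += 1         # increase the counter for x
    # Key with the largest associated value.
    return max(counts, key=counts.get)


answer = most_frequent([3, 1, 3, 2, 1, 3])
```

Each array element costs one Contains and one update, so with O(1) expected dictionary operations the whole run takes expected linear time.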

Exercises
1. You are given an array A of integers. Find the last (i.e., with the highest index) non-repeating integer in A in linear time.
2. You are given an array A of integers and an integer k. Determine whether there are two distinct indices i and j such that A[i] = A[j] and |i − j| ≤ k.
3. You are given two strings S and T (only lowercase letters). T is generated by shuffling S and then adding one more letter at a random position. Determine the letter that was added to T.
4. You are given an array A of integers. Determine the longest contiguous subsequence without repeating elements.
