data structures and algorithms, 2020-09-28, lecture 9
overview: hash tables, trees
dynamic sets
• elements with a key and (possibly) satellite data
• we wish to add, remove, search (and maybe more)
• we have seen: heaps, stacks, queues, linked lists
• now: hashing
hashing
• a hash table is an effective data structure for implementing dictionaries
• keys are for example strings of characters
• worst case for the operations usually in Θ(n), with n the number of items
• in practice often much better, search even in O(1)
• a hash table generalizes an array, where we access address i in O(1)
• applications of hashing: compilers, cryptography
direct-address table
• universe of keys: U = {0, ..., m − 1} with m small
• use an array of length m: T[0 .. m − 1]
• what is stored in T[k]? either nil, if there is no item with key k, or a pointer x to the item (or element) with x.key = k and possibly satellite data
operations for direct-address table
• insert(T, x): add element x, i.e. T[x.key] := x
• delete(T, x): remove element x, i.e. T[x.key] := nil
• search(T, k): search for key k, return T[k]
• here x is a pointer to an element with key x.key and satellite data x.element
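as an illustration, a minimal Python sketch of a direct-address table (the class and method names are mine, not from the slides):

```python
class Item:
    """Element with an integer key and optional satellite data."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Direct addressing: slot k stores the element with key k, or None."""
    def __init__(self, m):
        self.T = [None] * m          # universe of keys is {0, ..., m-1}

    def insert(self, x):             # O(1)
        self.T[x.key] = x

    def delete(self, x):             # O(1)
        self.T[x.key] = None

    def search(self, k):             # O(1): the element with key k, or None
        return self.T[k]
```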
analysis of direct-address table
• worst case of inserting, deleting and searching all in O(1)
• instead of a pointer to the element we can also store the element itself in the array
• drawbacks: if the universe of keys U is large we need a lot of storage, also if we actually use only a small subset of U
• and: keys must be integers
• so: hashing
hash tables
• a hash function maps keys to indices (slots) 0, ..., m − 1 of a hash table, so h : U → {0, ..., m − 1}
• an element with key k ∈ U hashes to slot h(k)
• usually more keys than slots: |U| ≫ m
• space: reduce the storage requirement to the size of the set of actually used keys
• time: ideally computing a hash value is easy, on average in O(1)
example of a simplistic hash function
• keys are first names, with phone numbers as satellite data
• hash function: length of the name modulo 5
• slot 0: (Alice, 0205981555)
• slot 1: ∅
• slot 2: ∅
• slot 3: (Sue, 0620011223)
• slot 4: (John, 0201234567)
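the example hash function as a Python one-liner (a sketch; the table size 5 is taken from the slide):

```python
def h(name):
    # hash a first name to a slot by the length of the name modulo 5
    return len(name) % 5

# h("Alice") == 0, h("Sue") == 3, h("John") == 4
```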
collisions
• problem: different keys may be hashed to the same slot, namely if h(k) = h(k′) with k ≠ k′; this is called a collision
• if the number of keys |U| is larger than the number of slots m, then the hash function h cannot be injective (a function f : A → B is injective if a ≠ a′ implies f(a) ≠ f(a′))
• even though we cannot totally avoid collisions, we try to avoid them as much as possible by taking a ‘good’ hash function
do we often have collisions?
• for p items and a hash table of size m: m^p possibilities for a hash function
• if p = 8 and m = 10, already 10^8 possibilities
• there are m!/(m − p)! possibilities for hashing without collision
• if p = 8 and m = 10, then 3 · 4 · ... · 10 such possibilities
• illustration: birthday paradox; for 23 people the probability that everyone has a unique birthday is < 1/2
• that is: for p = 23 and m = 366 the probability of a collision is ≥ 1/2
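a short Python check of these numbers (a sketch, assuming keys are hashed uniformly and independently):

```python
def p_no_collision(p, m):
    """Probability that p keys, hashed uniformly and independently into
    m slots, all land in distinct slots: m!/(m-p)! divided by m^p."""
    prob = 1.0
    for i in range(p):
        prob *= (m - i) / m
    return prob

print(p_no_collision(23, 366))  # ~0.49 < 1/2: collision probability >= 1/2
print(p_no_collision(8, 10))    # ~0.018: a collision is almost certain
```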
how to deal with collisions?
• either using chaining: put items that hash to the same value in a linked list
• or using open addressing: use a probe sequence to find alternative slots if necessary
chaining: example
• hash function is month of birth modulo 5
• slot 0: ∅
• slot 1: (01.01., Sue)
• slot 2: ∅
• slot 3: (12.03., John) → (16.08., Madonna)
• slot 4: ∅
• drawback: pointer structures are expensive
solving collisions using chaining
• create a list for each slot
• link records in the same slot into a list
• a slot in the hash table points to the head of a linked list, and is nil if the list is empty
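a minimal Python sketch of chaining (Python lists stand in for the linked lists of the slides, so delete is omitted here; with doubly linked lists it would be O(1), as analysed next):

```python
class ChainedHashTable:
    """Collision resolution by chaining: one chain per slot."""
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]   # empty chain instead of nil

    def _h(self, k):
        return hash(k) % self.m           # any hash function works here

    def insert(self, k, data):
        self.T[self._h(k)].insert(0, (k, data))   # at the front: O(1)

    def search(self, k):                  # linear scan of one chain
        for key, data in self.T[self._h(k)]:
            if key == k:
                return data
        return None
```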
chaining with doubly linked lists: worst-case analysis
• insert element x in hash table T: in O(1), insert at the front of a doubly linked list
• delete element x from hash table T: in O(1) if the lists are doubly linked; if we have the element available, no search is needed, we use the doubly linked structure
• search key k in hash table T: in O(n), with n the size of the dictionary; in the worst case every key hashes to the same slot, and then search is linear in the total number of elements
• for exam: know and be able to explain this
load factor
• assumption: a key is hashed to any of the slots with equal probability, independently of the other keys
• we have n keys and m slots
• the probability of h(k) = h(k′) is 1/m
• the expected length of the list at T[h(k)] is n/m
• this is called the load factor α = n/m
chaining: average case
• for an unsuccessful search: compute h(k) and search through the list, in Θ(1 + α)
• for a successful search: also in Θ(1 + α)
• so if α ∈ O(1) (constant!) then the average search time is in Θ(1)
• this holds for example if n ∈ O(m) (number of slots proportional to the number of keys)
• if the hash table is too small it does not work properly!
intermezzo: choosing a hash function
• in view of the assumption in the analysis: what is a good hash function?
• it distributes keys uniformly and seemingly randomly
• regularity of the key distribution should not affect uniformity
• hash values are easy to compute: in O(1)
• (these properties can be difficult to check)
possible hash functions with keys natural numbers
• division method: a key k is hashed to k mod m
• pro: easy to compute; contra: not good for all values of m, take for m a prime not too close to a power of 2
• multiplication method: a key k is hashed to ⌊m · (k · c − ⌊k · c⌋)⌋ with c a constant, 0 < c < 1
• which c is good?
• remark: we do not consider universal hashing (book 11.3.3)
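both methods as Python sketches; the default c = (√5 − 1)/2 is one commonly suggested choice (an assumption here, not from the slide):

```python
import math

def h_division(k, m):
    # division method: take m a prime not too close to a power of 2
    return k % m

def h_multiplication(k, m, c=(math.sqrt(5) - 1) / 2):
    # multiplication method: use the fractional part of k * c
    return math.floor(m * (k * c - math.floor(k * c)))
```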
open addressing
• an alternative to chaining for resolving collisions
• every slot of the hash table contains either nil or an element
• for hashing, we use a probe sequence h : U × {0, ..., m − 1} → {0, ..., m − 1} that for every key k ∈ U is a permutation of the available slots 0, ..., m − 1
• we only use the table, no pointers; the load factor is at most 1
• for insertion: we try the slots of the probe sequence and take the first available one
• deletion is difficult, so we omit deletion
remark: removal is difficult
• suppose the hash function gives probe sequence 2, 4, 0, 3, 1 for key a, and probe sequence 2, 3, 4, 0, 1 for key b
• we insert a, then insert b, then delete a, then search for b
• if the deletion of a puts nil in slot 2, then our search for b fails
• if the deletion of a is marked by a special marker, which is skipped in a search, then the search time is also influenced by the number of markers (not only by the load factor)
open addressing: linear probing
• next probe: try the next address modulo m
• h(k, i) = (h′(k) + i) mod m
• the probe sequence for a key k is (h′(k) + 0) mod m, (h′(k) + 1) mod m, (h′(k) + 2) mod m, ..., (h′(k) + m − 1) mod m
• we get clustering! (and removal is difficult, as in general for open addressing)
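a minimal sketch of insertion with linear probing (None marks an empty slot; h_prime is the auxiliary hash function h′):

```python
def linear_probe_insert(T, k, h_prime):
    """Insert key k into open-addressing table T using linear probing."""
    m = len(T)
    for i in range(m):
        slot = (h_prime(k) + i) % m   # probe sequence h'(k), h'(k)+1, ...
        if T[slot] is None:
            T[slot] = k
            return slot
    raise RuntimeError("hash table overflow")
```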
open addressing: double hashing
• next probe: use a second hash function
• h(k, i) = (h1(k) + i · h2(k)) mod m, with h2(k) relatively prime to the size of the hash table
• the probe sequence for a key k is: (h1(k) + 0 · h2(k)) mod m, (h1(k) + 1 · h2(k)) mod m, (h1(k) + 2 · h2(k)) mod m, ..., (h1(k) + (m − 1) · h2(k)) mod m
double hashing: example
m = 13, h1(k) = k mod 13, h2(k) = 7 − (k mod 7)

k    h1(k)   h2(k)   try
18   5       3       5
41   2       1       2
22   9       6       9
44   5       5       5, 10
59   7       4       7
32   6       3       6
31   5       4       5, 9, 0
73   8       4       8

(for 31, the third probe is (5 + 2 · 4) mod 13 = 0)
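the table can be reproduced with a short Python sketch of double hashing:

```python
m = 13
h1 = lambda k: k % 13
h2 = lambda k: 7 - (k % 7)

def probe_sequence(k, T):
    """Slots tried for key k, up to and including the first free one."""
    seq = []
    for i in range(m):
        slot = (h1(k) + i * h2(k)) % m
        seq.append(slot)
        if T[slot] is None:
            return seq

T = [None] * m
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    seq = probe_sequence(k, T)
    T[seq[-1]] = k
    print(k, seq)   # e.g. 44 [5, 10] and 31 [5, 9, 0]
```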
open addressing: analysis
• probe sequence: h(k, 0), h(k, 1), ..., h(k, m − 1)
• assumption: uniform hashing, that is: each key is equally likely to have any one of the m! permutations as its probe sequence, regardless of what happens to the other keys
• assumption: load factor α = n/m < 1
expected number of probes for unsuccessful search
• probe 1: with probability n/m a collision, so go to probe 2
• probe 2: with probability (n − 1)/(m − 1) a collision, so go to probe 3
• probe 3: with probability (n − 2)/(m − 2) a collision, so go to probe 4
• note: (n − i)/(m − i) < n/m = α
• expected number of probes:
  1 + n/m (1 + (n − 1)/(m − 1) (1 + (n − 2)/(m − 2) (...)))
  ≤ 1 + α (1 + α (1 + α (...)))
  ≤ 1 + α + α² + α³ + ... = Σ_{i=0}^{∞} α^i = 1/(1 − α)
open addressing: remarks
• we assume α < 1 and uniform hashing
• then the expected number of probes is in O(1), so inserting and successful or unsuccessful search take expected constant time
• if the table is 50% full then we expect 1/(1 − 0.5) = 2 probes
• if the table is 90% full then we expect 1/(1 − 0.9) = 10 probes
overview: hash tables, trees
recap definitions
• binary tree: every node has at most 2 successors (the empty tree is also a binary tree)
• depth of a node x: length (number of edges) of the path from the root to x
• height of a node x: length of a longest path from x to a leaf
• height of a tree: height of its root
• the number of levels is the height plus one
binary tree: linked implementation
linked data structure with nodes containing
• x.key from a totally ordered set
• x.left points to the left child of node x
• x.right points to the right child of node x
• x.p points to the parent of node x; if x.p = nil then x is the root
T.root points to the root of the tree (nil if the tree is empty)
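a direct Python transcription of this linked structure (a sketch; the field names follow the slide):

```python
class Node:
    def __init__(self, key):
        self.key = key      # from a totally ordered set
        self.left = None    # left child, or None (nil)
        self.right = None   # right child, or None (nil)
        self.p = None       # parent; None means this node is the root

class Tree:
    def __init__(self):
        self.root = None    # None (nil) for the empty tree
```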
binary tree: alternative implementation
• remember the heap: binary trees can be represented as arrays using the level numbering
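with 0-based level numbering (the root at index 0; a common convention, though the book's heaps use 1-based indices), navigating the array is pure index arithmetic:

```python
# array representation of a binary tree via level numbering (0-based)
def parent(i): return (i - 1) // 2
def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
```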
tree traversals
• how can we visit all nodes in a tree exactly once?
• we will mainly focus on binary trees
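as a preview, a minimal recursive sketch that visits every node exactly once (this is the inorder variant, one of the traversal orders of a binary tree):

```python
def inorder(x, visit):
    """Visit every node of the subtree rooted at x exactly once:
    left subtree, then x itself, then the right subtree."""
    if x is not None:
        inorder(x.left, visit)
        visit(x)
        inorder(x.right, visit)
```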