
Hash Table: Struktur Data & Algoritme (Data Structures & Algorithms)



  1. Struktur Data & Algoritme (Data Structures & Algorithms): Hash Table
     Denny (denny@cs.ui.ac.id), Suryana Setiawan (setiawan@cs.ui.ac.id)
     Fakultas Ilmu Komputer, Universitas Indonesia
     Even Semester 2004/2005. Version 2.0, Internal Use Only. SDA/TOPIC/V2.0

     Review
     • Linked List: insert, find, delete operations take O(n)
     • Stack & Queue: insert, find, delete operations take O(1), but the access is restricted
     • Binary Search Tree: insert, find, delete operations take O(log n) in the average case, but O(n) in the worst case
     • AVL Tree, Red-Black Tree: insert, find, delete operations take O(log n)

     Review
     • Array
       • all index-based operations take O(1) time
       • data is accessed using an index (integer)
       • the size must be determined first
       • not growable

     Objectives
     • Understand the hash table and its operations
     • Understand the advantages and disadvantages of using a hash table

  2. Outline
     • Hashing
       • Definition
       • Hash function
       • Collision resolution
         • Open hashing: separate chaining
         • Closed hashing (open addressing): linear probing, quadratic probing, double hashing
         • Primary clustering, secondary clustering
     • Access: insert, find, delete

     Hash Tables
     • Hashing is used for storing a relatively large amount of data in a table called the hash table ADT.
     • The hash table usually has a fixed size, H-size, which is larger than the amount of data that we want to store.
     • We define the load factor (λ) as the ratio of the number of stored data items to the size of the hash table.
     • The hash function maps an item into an index in the range of the table.
     Figure: item -> key -> hash function -> hash table with cells 0, 1, 2, 3, ..., H-1.

     Hash Tables (2)
     • Hashing is a technique used to perform insertions, deletions, and finds in constant average time.
     • To insert or find a data item, we assign a key to the element and use a function, called the hash function, to determine the location of the element within the table.
     • Hash tables are arrays of cells with a fixed size, containing the data or the keys corresponding to the data.
     • For each key, the hash function maps the key into some number in the range 0 to H-size-1.

     Hash Function
     • A hash function should have the following features:
       • Easy to compute.
       • Two distinct keys map to two different cells in the array (not true in general - why?).
         • This can be achieved by using a direct-address table when the universe of keys is reasonably small.
       • Distributes the keys evenly among the cells.
     • One simple hash function is the mod function with a prime number.
     • Any manipulation of digits with low complexity and good distribution can be used.
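The load factor and the mod-based hash function described above can be sketched in a few lines of Java. This is only an illustration, not the course's implementation: the class name `HashBasics`, the method names, and the use of integer keys are assumptions made for the example.

```java
class HashBasics {
    // Maps an integer key to a cell index in the range 0 .. tableSize-1.
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int index(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Load factor (lambda): number of stored items divided by the table size.
    static double loadFactor(int itemCount, int tableSize) {
        return (double) itemCount / tableSize;
    }
}
```

With a table size of 11 (a prime), key 25 maps to cell 3, and 5 stored items give a load factor of 5/11.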

  3. Hash Function: Truncation
     • Part of the key is simply ignored, with the remainder truncated or concatenated to form the index.

       Phone no.   Index
       731-3018    338
       539-2309    329
       428-1397    217

     Hash Function: Folding
     • The data can be split up into smaller chunks which are then folded together in some form.

       Phone no.   3-group     Index
       7313018     73+13+018   104
       5392309     53+92+309   454
       4281397     42+81+397   520

     Hash Function: Modular arithmetic
     • Convert the data into an integer, divide by the size of the hash table, and take the remainder as the index.

       Groups      Sum    Index
       731+3018    3749   3749 % 100 = 49
       539+2309    2848   2848 % 100 = 48
       428+1397    1825   1825 % 100 = 25

     Choosing a hash function
     • A good hash function should satisfy two criteria:
       1. It should be quick to compute.
       2. It should minimize the number of collisions.
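The three techniques above can be tried on the phone numbers from the tables. The sketch below is hedged: the slides do not say which digits truncation keeps, so the positions used here (2nd, 4th and 7th digit) are inferred from the table values, and the class and method names are hypothetical.

```java
class SimpleHashes {
    // Keeps only the digits of a phone number such as "731-3018".
    static String digits(String phone) {
        return phone.replaceAll("[^0-9]", "");
    }

    // Truncation: keep the 2nd, 4th and 7th digits, ignore the rest.
    // (Assumption: these positions are inferred from the table values.)
    static int truncate(String phone) {
        String d = digits(phone);
        return Integer.parseInt("" + d.charAt(1) + d.charAt(3) + d.charAt(6));
    }

    // Folding: split into groups of 2, 2 and 3 digits and add them.
    static int fold(String phone) {
        String d = digits(phone);
        return Integer.parseInt(d.substring(0, 2))
             + Integer.parseInt(d.substring(2, 4))
             + Integer.parseInt(d.substring(4, 7));
    }

    // Modular arithmetic: add the 3-digit and 4-digit groups,
    // then take the remainder after dividing by the table size.
    static int modular(String phone, int tableSize) {
        String d = digits(phone);
        int value = Integer.parseInt(d.substring(0, 3))
                  + Integer.parseInt(d.substring(3));
        return value % tableSize;
    }
}
```

For "731-3018" these methods give 338, 104 and (with a table size of 100) 49, matching the first row of each table.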

  4. Example of hash function
     • Hash function for a string
       • X = 128
       • A_3 X^3 + A_2 X^2 + A_1 X^1 + A_0 X^0
       • (((A_3 X) + A_2) X + A_1) X + A_0
     • The result of the hash function is much larger than the size of the table, so we take the result modulo the size of the hash table.
     • Modulo
       • (A + B) % C = (A % C + B % C) % C
       • (A * B) % C = (A % C * B % C) % C

     Example of hash function

         // Base-128 polynomial, reduced modulo tableSize at every step
         int hash(String key, int tableSize) {
             int hashVal = 0;
             for (int i = 0; i < key.length(); i++) {
                 hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
             }
             // hashVal is already in range; the final % is harmless
             return hashVal % tableSize;
         }

     Example of hash function

         // Base-37 polynomial; int overflow may make hashVal negative
         int hash(String key, int tableSize) {
             int hashVal = 0;
             for (int i = 0; i < key.length(); i++) {
                 hashVal = (hashVal * 37 + key.charAt(i));
             }
             hashVal %= tableSize;
             if (hashVal < 0) {
                 hashVal += tableSize;   // bring a negative remainder back into range
             }
             return hashVal;
         }

     Example of hash function

         // Sum of the character codes
         int hash(String key, int tableSize) {
             int hashVal = 0;
             for (int i = 0; i < key.length(); i++) {
                 hashVal += key.charAt(i);
             }
             return hashVal % tableSize;
         }
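The modulo identities above are what allow the base-128 version to reduce at every step instead of once at the end. The following sketch checks that claim against an exact evaluation of the polynomial with `BigInteger`, and also shows a weakness of the plain character sum. The class and method names are hypothetical, and `hornerHash` assumes a table size small enough that `hashVal * 128` does not overflow an `int`.

```java
import java.math.BigInteger;

class StringHashCheck {
    // Horner's rule with a modulo at every step, as in the base-128 example.
    // Assumes tableSize is small enough that hashVal * 128 fits in an int.
    static int hornerHash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
        }
        return hashVal;
    }

    // Reference value: evaluate the full polynomial exactly, then reduce once.
    static int exactHash(String key, int tableSize) {
        BigInteger value = BigInteger.ZERO;
        BigInteger x = BigInteger.valueOf(128);
        for (int i = 0; i < key.length(); i++) {
            value = value.multiply(x).add(BigInteger.valueOf(key.charAt(i)));
        }
        return value.mod(BigInteger.valueOf(tableSize)).intValue();
    }

    // Plain character sum: cheap, but anagrams always collide.
    static int additiveHash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }
}
```

`hornerHash` and `exactHash` agree for any key under the stated assumption, while `additiveHash` returns the same value for "stop" and "pots" because addition ignores character order.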

  5. Collision resolution
     • When two keys map into the same cell, we get a collision.
     • A collision may occur during insertion, so we need a procedure (collision resolution) to resolve it.

     Closed Hashing
     • On a collision, try to find an alternative cell within the table.
     • Closed hashing is also known as open addressing.
     • For insertion, we try cells in sequence using an incremented function such as:
       • h_i(x) = (hash(x) + f(i)) mod H-size, with f(0) = 0
       • The function f is the collision resolution strategy.
     • The table is bigger than the number of data items.
     • Different methods to choose the function f:
       • Linear probing
       • Quadratic probing
       • Double hashing

     Linear probing
     • Use a linear function f(i) = i.
     • Find the first free position in the table that is closest to the key's actual (home) position.
     • Least complex function.
     • May result in primary clustering.
       • Elements that hash to different locations probe the same alternative cells.
     • The complexity of this probing depends on the value of λ (load factor).
     • We do not use this probing if λ > 0.5.

     Hashing - insert
     Figure: a hash table with cells 0 to 15 (and beyond), showing the keys alpha, crystal, dawn, emerald, flamingo, hallmark, marigold and moon.
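Insertion and lookup with linear probing, f(i) = i, can be sketched as below. The hash function here is a toy assumption (the position of the key's first letter in the alphabet), chosen only so that keys with the same first letter collide; the class and method names are hypothetical as well.

```java
class LinearProbingSketch {
    private final String[] cells;

    LinearProbingSketch(int tableSize) {
        cells = new String[tableSize];
    }

    // Toy hash (assumption): position of the first letter in the alphabet.
    // Expects a non-empty key that starts with a lowercase letter a-z.
    int hash(String key) {
        return (key.charAt(0) - 'a') % cells.length;
    }

    // Tries h_i(x) = (hash(x) + i) mod H-size for i = 0, 1, 2, ...
    // Returns the cell used, or -1 if the table is full.
    int insert(String key) {
        int home = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (home + i) % cells.length;
            if (cells[pos] == null || cells[pos].equals(key)) {
                cells[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell ends the search.
    boolean contains(String key) {
        int home = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (home + i) % cells.length;
            if (cells[pos] == null) return false;
            if (cells[pos].equals(key)) return true;
        }
        return false;
    }
}
```

With 16 cells, `moon` lands in cell 12 and a later `marigold`, which has the same home cell under this toy hash, is pushed to cell 13.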

  6. Hashing - lookup / Hashing - delete
     • Lazy deletion - why?

       Index   Lookup      After deleting emerald and moon
       0       alpha       alpha
       1
       2       crystal     crystal
       3       dawn        dawn
       4       emerald     (deleted)
       5       flamingo    flamingo
       6
       7       hallmark    hallmark
       8-11
       12      moon        (deleted)
       13      marigold    marigold
       14
       15      private     private
       ...

     Lookup queries marked on the slide: cobalt? (at cell 2), marigold? (at cell 12), private? (at cell 15).

     Hashing - operation after delete
     Figure: the table after the two deletions, with custom being inserted (marked at cell 2) and a lookup marigold? (marked at cell 12).

     Primary Clustering
     • Elements that hash to different locations probe the same alternative cells.
     Figure: two snapshots of a table holding alpha, canary, crystal, dawn, custom, flamingo, hallmark, marigold and private, with cobalt and dark as the keys being inserted.
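One way to see why deletion is lazy: if the cell holding moon were simply emptied, a later lookup for marigold would start at the same home cell, hit the empty cell, and stop before reaching marigold. Marking the cell as deleted keeps the probe sequence intact. The sketch below illustrates this under the same toy first-letter hash as an assumption; the marker object and all names are hypothetical.

```java
class LazyDeleteSketch {
    // Marker left behind by delete(); compared by reference, never by equals.
    private static final String DELETED = new String("<deleted>");
    private final String[] cells;

    LazyDeleteSketch(int tableSize) {
        cells = new String[tableSize];
    }

    // Toy hash (assumption): position of the first lowercase letter a-z.
    private int hash(String key) {
        return (key.charAt(0) - 'a') % cells.length;
    }

    // Returns the cell holding key, or -1. Deleted markers are skipped,
    // only a truly empty cell ends the search.
    private int find(String key) {
        int home = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (home + i) % cells.length;
            if (cells[pos] == null) return -1;
            if (cells[pos] != DELETED && cells[pos].equals(key)) return pos;
        }
        return -1;
    }

    boolean contains(String key) {
        return find(key) >= 0;
    }

    // Lazy deletion: overwrite the key with the marker instead of null.
    boolean delete(String key) {
        int pos = find(key);
        if (pos < 0) return false;
        cells[pos] = DELETED;
        return true;
    }

    // Reuses the first deleted or empty cell on the probe sequence,
    // unless the key is already present further along.
    int insert(String key) {
        int home = hash(key);
        int firstFree = -1;
        for (int i = 0; i < cells.length; i++) {
            int pos = (home + i) % cells.length;
            if (cells[pos] == null) {
                if (firstFree < 0) firstFree = pos;
                break;
            }
            if (cells[pos] == DELETED) {
                if (firstFree < 0) firstFree = pos;
            } else if (cells[pos].equals(key)) {
                return pos;
            }
        }
        if (firstFree >= 0) cells[firstFree] = key;
        return firstFree;
    }
}
```

After inserting `moon` and `marigold` into 16 cells and deleting `moon`, `contains("marigold")` still succeeds, and a new key with the same home cell can reuse cell 12.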

  7. Quadratic probing
     • Eliminates primary clustering by selecting f(i) = i^2.
     • There are more problems when the hash table is more than half full.
     • You have to select an appropriate table size that is not the square of a number.
     • We can prove that quadratic probing with a prime table size and a table that is at least half empty will always find a location for an element.
     • The next probe can be computed incrementally by noting that f(i) = i^2 = f(i-1) + 2i - 1.
     • Elements that hash to the same location will probe the same alternative cells (secondary clustering).

     Double hashing
     • The collision resolution function uses another hash function, such as f(i) = i * hash2(x).
     • Each time, another multiple of hash2(x) is added to the probe position.
     • The second hash function must be chosen carefully so that it never evaluates to zero and so that all cells are probed.
     • It is essential to have a hash table of prime size.

     Double Hashing
     Figure: two snapshots of a table holding alpha, crystal, dawn, custom, flamingo, hallmark, marigold and private, with canary, cobalt, dark and done also shown.

     Open Hashing
     • Collisions are resolved by putting all elements that hash to the same bucket into a single collection of values.
     • Open hashing:
       • Keep a linked list of all the elements that hash to the same cell (separate chaining).
       • Each cell in the hash table contains a pointer to a linked list containing the data.
     • Functions and analysis of open hashing:
       • Inserting a new element into the table: we add the element at the beginning or the end of the appropriate linked list.
         • Which one depends on whether you want to check for duplicates.
         • It also depends on how frequently you expect to access the most recently added elements.
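The probe sequences for quadratic probing and double hashing, and a separate-chaining table, can be sketched as follows. Several choices here are assumptions rather than part of the slides: the second hash function `hash2(x) = r - (x mod r)` with a prime `r` smaller than the table size is one common textbook choice that never returns zero, the chaining table uses Java's built-in `String.hashCode()`, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ProbeSequences {
    // Quadratic probing, computed incrementally: f(i) = f(i-1) + 2i - 1.
    static int[] quadraticProbes(int home, int tableSize, int count) {
        int[] probes = new int[count];
        int f = 0;                              // f(0) = 0
        for (int i = 0; i < count; i++) {
            if (i > 0) f += 2 * i - 1;          // now f == i * i
            probes[i] = (home + f) % tableSize;
        }
        return probes;
    }

    // Assumed second hash: always in 1 .. r, so it is never zero.
    static int hash2(int x, int r) {
        return r - Math.floorMod(x, r);
    }

    // Double hashing: f(i) = i * hash2(x).
    static int[] doubleHashProbes(int x, int tableSize, int r, int count) {
        int[] probes = new int[count];
        int home = Math.floorMod(x, tableSize);
        int step = hash2(x, r);
        for (int i = 0; i < count; i++) {
            probes[i] = (home + i * step) % tableSize;
        }
        return probes;
    }
}

class ChainedTableSketch {
    // One linked list per cell (separate chaining).
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedTableSketch(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<String> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Checks the chain for a duplicate, then adds at the front so that
    // recently added elements are found first.
    boolean insert(String key) {
        LinkedList<String> chain = bucketFor(key);
        if (chain.contains(key)) return false;
        chain.addFirst(key);
        return true;
    }

    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }
}
```

For a home cell of 3 in a table of size 11, the first five quadratic probes are 3, 4, 7, 1, 8. For key 25 with table size 11 and r = 7, the step is 3 and the first four double-hashing probes are 3, 6, 9, 1.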
