
Biostatistics 615/815 Lecture 8: Hash Tables and Dynamic Programming



  1. Biostatistics 615/815 Lecture 8: Hash Tables and Dynamic Programming
     Hyun Min Kang, February 1st, 2011
     Outline: Introduction; Hash Tables; ChainedHash; OpenHash; Fibonacci; Summary

  2. Announcements
     • Homework #2
       - For problem 3, assume that all the input values are unique
       - Include the class definitions in myTree.h and myTreeNode.h (do not make a .cpp file)
       - The homework .tex file containing the source code is uploaded on the class web page
     • 815 projects
       - The instructor sent out e-mails individually this morning


  4. Recap: Elementary data structures
                    Search      Insert      Remove
       Array        Θ(n)        Θ(1)        Θ(n)
       SortedArray  Θ(log n)    Θ(n)        Θ(n)
       List         Θ(n)        Θ(1)        Θ(n)
       Tree         Θ(log n)    Θ(log n)    Θ(log n)
       Hash         Θ(1)        Θ(1)        Θ(1)
       (Tree costs assume a balanced tree; Hash costs are expected-case.)
     • Array or list is simple and fast enough for small-sized data
     • Tree is easier to scale up to moderate to large-sized data
     • Hash is the most robust for very large datasets

  5. Recap: Example of a linked list
     • Example of a doubly-linked list (figure not reproduced)
     • It becomes a singly-linked list if the prev field does not exist

  6. Recap: An example binary search tree
     • Pointers to left and right children (Nil if absent)
     • Pointers to the parent can be omitted

  7. Correction: Building your program (lecture 6)
     • Compiling and linking the files individually does NOT work with templates
     • Include the content of your .cpp files in the .h files
     • For example, if Main.cpp includes myArray.h:
       user@host:~/> g++ -o myArrayTest Main.cpp
     • Or create a Makefile and just type 'make':
all: myArrayTest  # binary name is myArrayTest
myArrayTest: Main.cpp myArray.h  # rebuild when either file changes
	g++ -o myArrayTest Main.cpp  # must start with a tab
clean:
	rm -f *.o myArrayTest

  8. Today
     • Data structure
       - Hash table
     • Dynamic programming
       - Divide and conquer vs dynamic programming

  9. Two types of containers
     Containers for single-valued objects - last lectures
     • Insert(T, x) - Insert x into the container
     • Search(T, x) - Return the location/index/existence of x
     • Remove(T, x) - Delete x from the container if it exists
     • STL examples include std::vector, std::list, std::deque, std::set, and std::multiset
     Containers for (key,value) pairs - this lecture
     • Insert(T, x) - Insert (x.key, x.value) into the container
     • Search(T, k) - Return the value associated with key k
     • Remove(T, x) - Delete element x from the container if it exists
     • Examples include std::map, std::multimap, and __gnu_cxx::hash_map

  10. Direct address tables
     An example (key,value) container
     • U = {0, 1, …, N − 1} is the set of possible key values (N is not huge)
     • No two elements have the same key
     Direct address table: a constant-time container
     Let T[0, …, N − 1] be an array that can hold N objects
     • Insert(T, x): T[x.key] = x
     • Search(T, k): return T[k]
     • Remove(T, x): T[x.key] = Nil

  11. Analysis of direct address tables
     Time complexity
     • Requires a single memory access for each operation
     • O(1): constant time complexity
     Memory requirement
     • Requires pre-allocating memory space for every possible key value
     • 2^32 ≈ 4 × 10^9 slots, i.e. about 4 GB × (size of data in bytes), for a 4-byte (32-bit) key
     • 2^64 ≈ 1.8 × 10^19 slots, i.e. about 18 EB (1.8 × 10^7 TB) × (size of data in bytes), for an 8-byte (64-bit) key
     • Storing a set of arbitrary-length strings would need an infinite amount of memory (or memory exponential in the maximum string length)
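As a worked instance of the memory bullets, assume (for illustration only) that each slot stores a 4-byte value:

```latex
\begin{align*}
\text{32-bit keys, 4-byte values:}\quad & 2^{32} \times 4\,\text{B} = 2^{34}\,\text{B} = 16\,\text{GiB} \\
\text{64-bit keys, 4-byte values:}\quad & 2^{64} \times 4\,\text{B} = 2^{66}\,\text{B} \approx 7.4 \times 10^{19}\,\text{B} \approx 74\,\text{EB}
\end{align*}
```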

  12. Hash Tables
     Key features
     • O(1) complexity for Insert, Search, and Remove (expected, given a good hash function)
     • Requires more memory than the actual content to maintain good performance
     • But uses much less memory than direct-address tables
     Key components
     • Hash function
       - h(x.key) maps the key onto a smaller 'addressable' space H
       - Total required memory is proportional to the number of possible hash values
       - A good hash function minimizes the possibility of key collisions
     • Collision-resolution strategy, for when h(k1) = h(k2)


  14. Chained hash: A simple example
     A good hash function
     • Assume that we have a good hash function h(x.key) that 'fairly uniformly' distributes key values over H
     • What makes a good hash function will be discussed later today
     A ChainedHash
     • Each possible hash value has its own linked list
     • Each linked list is initially empty
     • An input (key,value) pair is appended to the linked list when inserted
     • O(1) time complexity is guaranteed when no collision occurs
     • When collisions occur, the time complexity is proportional to the size of the linked list associated with h(x.key)

  15. Illustration of ChainedHash (figure not reproduced)
