Hashing

Today’s announcements
◮ HW3 out, due Nov 15, 23:59
◮ MT2 Nov 7, 19:00-21:00, WOOD 2
◮ PA2 due Nov 1, 23:59

Today’s Plan
◮ B-Tree intro

Warm up: What collision resolution strategy is best? What’s the best dictionary? Why consider balanced BSTs?

More info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf

Memory Hierarchy

Why worry about the number of disk I/Os?

Level                       Size                             Access time
CPU registers               hundreds of bytes                < 1 cycle
Cache memory (L1, L2, L3)   tens of kilobytes to megabytes   a few cycles to tens of cycles
Main memory                 gigabytes                        hundreds of cycles
Disk                        terabytes                        millions of cycles

Time Cost: Processor to Disk

Processor
◮ Operates at a few GHz (gigahertz = billion cycles per second).
◮ Several instructions per cycle.
◮ Average time per instruction < 1 ns (nanosecond = 10^-9 seconds).

Disk (7200 RPM)
◮ Seek time ≈ 10 ms (ms = millisecond = 10^-3 seconds).
◮ (Solid State Drives have “seek time” ≈ 0.1 ms.)

Result: 10 million instructions for each disk read!

Hold on... How long does it take to read a 1 TB (terabyte = 10^12 bytes) disk?
If every byte cost one seek: 10^12 bytes × 10 ms = 10 billion seconds > 300 years?

What’s wrong?
◮ Each disk read/write moves more than a byte.
◮ Sequential disk access is faster than a seek.

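The arithmetic on this slide can be checked with a short script. This is a rough sketch using the slide's round numbers; the 4096-byte block size is an assumption added for illustration and does not come from the slide.

```python
# Back-of-the-envelope check of the slide's numbers (integer nanoseconds
# are used to avoid floating-point rounding).
INSTRUCTION_TIME_NS = 1          # slide: < 1 ns per instruction (upper bound)
SEEK_TIME_NS = 10_000_000        # slide: about 10 ms per seek
DISK_BYTES = 10**12              # slide: 1 TB
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

# Instructions the processor could run while waiting for one seek.
instructions_per_seek = SEEK_TIME_NS // INSTRUCTION_TIME_NS

# Naive model: one seek for every single byte on the disk.
naive_seconds = DISK_BYTES * SEEK_TIME_NS // 10**9
naive_years = naive_seconds / SECONDS_PER_YEAR

# Assumed model: one seek per 4096-byte block (block size is hypothetical).
BLOCK_BYTES = 4096
blocked_seconds = (DISK_BYTES // BLOCK_BYTES) * SEEK_TIME_NS / 10**9
blocked_days = blocked_seconds / SECONDS_PER_DAY

print(instructions_per_seek)     # 10000000
print(naive_seconds)             # 10000000000
print(round(naive_years))        # 317
print(round(blocked_days))       # 28
```

Even one seek per block is far too slow for a full scan, which is why sequential access (no seek between consecutive blocks) matters as well.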
Memory Blocks

Each memory access to a slower level of the hierarchy fetches a block of data.

Level         Block size        Block name
CPU           a few bytes       word
Cache         10s of bytes      cache line
Main memory   a few kilobytes   page
Disk

A block is the contents of consecutive memory locations.
So random access between levels of the hierarchy is very slow.

Chopping Trees into Blocks

Idea: Store data for many adjacent nodes in consecutive memory locations.

Result: One memory block access provides keys to determine many (more than two) search directions.

m-ary Search Tree

[Figure: a node holding keys 3, 7, 12, 21 with five subtrees: k < 3, 3 < k < 7, 7 < k < 12, 12 < k < 21, 21 < k]

m-ary tree property
◮ Each node has ≤ m children.
Result: Complete m-ary tree with n nodes has height Θ(log_m n).

Search tree property
◮ Each node has ≤ m − 1 search keys: k_1 < k_2 < k_3 < ...
◮ All keys k in the i-th subtree obey k_i < k < k_{i+1} for i = 0, 1, ...

Disk I/Os (runtime) for find:

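The search tree property can be illustrated with a small find routine. This is a minimal sketch, not the course's implementation: the class name `MaryNode`, the function `find`, and the leaf contents in the example are all made up for illustration. Each loop iteration stands for one node visit, which is what a disk I/O count would measure if every node lived in its own block.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class MaryNode:
    """Hypothetical m-ary search tree node (names are illustrative)."""
    keys: list                                    # sorted, at most m - 1 keys
    children: list = field(default_factory=list)  # empty for a leaf, else len(keys) + 1 subtrees


def find(root, k):
    """Return (found, nodes_visited) for key k."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        # Index of the first key >= k; also the index of the subtree to descend into.
        i = bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return True, visited
        node = node.children[i] if node.children else None
    return False, visited


# Root keys match the slide's figure; the leaf contents are invented.
root = MaryNode(
    keys=[3, 7, 12, 21],
    children=[
        MaryNode([1, 2]),      # k < 3
        MaryNode([4, 5]),      # 3 < k < 7
        MaryNode([8, 10]),     # 7 < k < 12
        MaryNode([15]),        # 12 < k < 21
        MaryNode([30, 40]),    # 21 < k
    ],
)

print(find(root, 7))    # (True, 1)
print(find(root, 10))   # (True, 2)
print(find(root, 13))   # (False, 2)
```

The number of nodes visited is at most the number of levels in the tree, independent of how many keys each node holds.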
B-Trees

B-Trees of order m are specialized m-ary search trees:
◮ ALL leaves are at the same depth!
◮ Internal nodes have between ⌈m/2⌉ and m children, except the root, which has between 2 and m children.
◮ Leaves hold at most m − 1 keys.

Result
◮ Height is Θ(log_m n).
◮ Insert, delete, find visit Θ(log_m n) nodes.
◮ m is chosen so that each (full) node fills one page of memory. Each node visit (disk I/O) retrieves between m/2 and m keys.

[Figure: example B-Tree. Root: (17). Internal nodes: (3 8) and (28 48). Leaves: (1 2) (6 7) (12 14 16) (25 26) (29 45) (52 53 55 68)]

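To get a feel for the Θ(log_m n) height, the sketch below counts how many levels a completely full m-ary search tree needs to hold n keys. The function name `min_levels` and the values n = 10^9 and m = 256 are assumptions chosen for illustration. This is the best case; a B-Tree only guarantees that internal nodes are at least half full, so its height is larger by at most a constant factor.

```python
def min_levels(n_keys, m):
    """Fewest levels of an m-ary search tree that can hold n_keys keys,
    assuming every node is full (m - 1 keys, m children)."""
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # the root level has one node
    while capacity < n_keys:
        capacity += nodes_on_level * (m - 1)
        nodes_on_level *= m
        levels += 1
    return levels


# Hypothetical sizes: one billion keys, binary tree vs. order-256 tree.
print(min_levels(10**9, 2))     # 30
print(min_levels(10**9, 256))   # 4
```

If each level costs one disk I/O, the wider tree needs roughly 4 reads per find instead of 30 under these assumptions.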
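B-Tree Nodes

Internal node with i search keys

[Figure: a node with m − 1 key slots numbered 1, 2, ..., i, ..., m − 1. Slots 1 through i hold keys k_1, k_2, ..., k_i; the remaining slots are empty (∅). Arrows point to the left sibling and the right sibling.]

◮ i + 1 subtree pointers
◮ parent and left & right sibling pointers

Each node may hold a different number of items.
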
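One possible in-memory layout for such a node is sketched below. The class name `BTreeNode`, its field names, and the helper methods are assumptions for illustration; the slide only specifies what a node holds (up to m − 1 keys, i + 1 subtree pointers, and parent and sibling pointers).

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq/repr are disabled because parent and sibling pointers create cycles.
@dataclass(eq=False, repr=False)
class BTreeNode:
    """Hypothetical B-Tree node of order m (field names are illustrative)."""
    m: int
    keys: List[int] = field(default_factory=list)              # i sorted keys, i <= m - 1
    children: List["BTreeNode"] = field(default_factory=list)  # i + 1 subtrees, empty for a leaf
    parent: Optional["BTreeNode"] = None
    left: Optional["BTreeNode"] = None                         # left sibling
    right: Optional["BTreeNode"] = None                        # right sibling

    def is_leaf(self) -> bool:
        return not self.children

    def is_valid(self) -> bool:
        """Check the per-node constraints stated on the slide."""
        if self.keys != sorted(self.keys):
            return False
        if len(self.keys) > self.m - 1:
            return False
        if self.children and len(self.children) != len(self.keys) + 1:
            return False
        return True


# An internal node with i = 2 keys and i + 1 = 3 leaf children (order 5 assumed).
leaves = [BTreeNode(5, [1, 2]), BTreeNode(5, [6, 7]), BTreeNode(5, [12, 14, 16])]
node = BTreeNode(5, keys=[3, 8], children=leaves)
for idx, leaf in enumerate(leaves):
    leaf.parent = node
    leaf.left = leaves[idx - 1] if idx > 0 else None
    leaf.right = leaves[idx + 1] if idx + 1 < len(leaves) else None

print(node.is_valid(), node.is_leaf())   # True False
```

A disk-based implementation would more likely use a fixed-size array of m − 1 key slots so that a full node fills exactly one page; the Python lists here only model the logical contents.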
Making a B-Tree

M = 3

◮ Start: the empty B-Tree
◮ insert(3): root is (3)
◮ insert(14): root is (3 14)

The root is a leaf.

What happens when we now insert(1)?

Splitting the Root

insert(1)

Too many keys for one leaf! So, make a new leaf and create a parent (the new root) for both.

Which key goes to the parent?

◮ Before: the root leaf is (3 14)
◮ Split the leaf
◮ Make a new root
◮ Move key 3 up
◮ After: root (3) with leaves (1) and (14)

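The root split can be sketched for the special case where the root is still a leaf. The function name `insert_into_root_leaf` and the dictionary representation are assumptions for illustration. Moving the middle key up is also an assumption, chosen because it matches the slide's "Move key 3 up"; a full insert would additionally have to handle splits of non-root nodes.

```python
from bisect import insort


def insert_into_root_leaf(keys, key, m):
    """Insert key into a B-Tree of order m whose root is a leaf holding `keys`.

    Returns the resulting tree as nested dicts with "keys" and "children".
    """
    keys = list(keys)          # copy so the caller's list is not modified
    insort(keys, key)          # keep the keys sorted
    if len(keys) <= m - 1:
        # Still fits in one leaf: no split needed.
        return {"keys": keys, "children": []}
    # Too many keys: split around the middle key and move it up to a new root.
    mid = len(keys) // 2
    left = {"keys": keys[:mid], "children": []}
    right = {"keys": keys[mid + 1:], "children": []}
    return {"keys": [keys[mid]], "children": [left, right]}


# The slide's sequence with M = 3: insert 3, then 14, then 1.
tree = insert_into_root_leaf([], 3, 3)
tree = insert_into_root_leaf(tree["keys"], 14, 3)
print(tree["keys"])                               # [3, 14]
tree = insert_into_root_leaf(tree["keys"], 1, 3)
print(tree["keys"])                               # [3]
print([c["keys"] for c in tree["children"]])      # [[1], [14]]
```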