Hashing

Today’s announcements
◮ HW3 out, due Nov 15, 23:59
◮ MT2 Nov 7, 19:00-21:00, WOOD 2
◮ PA2 due Nov 1, 23:59

Today’s Plan
◮ B-Tree intro

Warm up:
What collision resolution strategy is best?
What’s the best dictionary?
Why consider balanced BSTs?

More info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf

Memory Hierarchy
Why worry about the number of disk I/Os?

Level            Size                 Access time
CPU registers    hundreds of bytes    < 1 cycle
L1 cache         tens of kilobytes    a few cycles
L2 / L3 cache    megabytes            tens of cycles
Main memory      gigabytes            hundreds of cycles
Disk             terabytes            millions of cycles

Time Cost: Processor to Disk
Processor
◮ Operates at a few GHz (gigahertz = billion cycles per second).
◮ Several instructions per cycle.
◮ Average time per instruction < 1 ns (nanosecond = 10^−9 seconds).
Disk (7200 RPM)
◮ Seek time ≈ 10 ms (ms = millisecond = 10^−3 seconds).
◮ (Solid-state drives have a "seek time" ≈ 0.1 ms.)
Result: 10 million instructions for each disk read!

Hold on... How long does it take to read a 1 TB (terabyte = 10^12 bytes) disk if every byte needs its own seek?
1 TB × 10 ms = 10^12 × 10^−2 s = 10^10 s = 10 billion seconds > 300 years?
What’s wrong? Each disk read/write moves more than a byte, and sequential disk access is much faster than a seek.
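A quick check of the slide's arithmetic, as a sketch that uses only its round numbers (1 ns per instruction, 10 ms per seek, 1 TB = 10^12 bytes) plus the deliberately naive assumption that every byte costs one seek. The function names are illustrative, not from the slides.

```python
# Back-of-the-envelope check of the processor-vs-disk numbers on this slide.
# Assumptions (round numbers from the slide): 1 ns per instruction,
# 10 ms per disk seek, 1 TB = 10**12 bytes, and one seek per byte read.

NANOSECOND = 1e-9
MILLISECOND = 1e-3
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def instructions_per_seek(seek_s=10 * MILLISECOND, instr_s=1 * NANOSECOND):
    """How many instructions the CPU could run while waiting for one seek."""
    return seek_s / instr_s


def years_to_read_bytewise(num_bytes=10**12, seek_s=10 * MILLISECOND):
    """Years needed to read num_bytes if every single byte required its own seek."""
    return num_bytes * seek_s / SECONDS_PER_YEAR


print(f"{instructions_per_seek():.0f} instructions per seek")
print(f"{years_to_read_bytewise():.0f} years to read 1 TB one byte per seek")
```

With these numbers the sketch reports about 10 million instructions per seek and about 317 years for the byte-at-a-time read, matching the slide's "> 300 years". Passing `seek_s=0.1 * MILLISECOND` models the SSD case.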

Memory Blocks
Each memory access to a slower level of the hierarchy fetches a block of data.

Between                  Block size         Block name
CPU and cache            a few bytes        word
Cache and main memory    10s of bytes       cache line
Main memory and disk     a few kilobytes    page

A block is the contents of consecutive memory locations. So random access between levels of the hierarchy is very slow: each access to a new location may pay for fetching a whole block, while consecutive accesses share one fetch.
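To make the block effect concrete, here is a small sketch that counts block transfers for the same bytes read sequentially versus in random order. The model is a simplifying assumption (the faster level keeps only the most recently fetched block), and the sizes `N` and `BLOCK_SIZE` are hypothetical.

```python
import random

# Count block transfers when reading a sequence of byte addresses.
# Simplifying assumption (hypothetical model): the faster level keeps only
# the single most recently fetched block, so a transfer happens whenever the
# next address lies in a different block than the previous one.


def block_transfers(addresses, block_size):
    transfers = 0
    current_block = None
    for addr in addresses:
        block = addr // block_size  # consecutive addresses share a block
        if block != current_block:
            transfers += 1
            current_block = block
    return transfers


N = 4096          # number of bytes to read (hypothetical)
BLOCK_SIZE = 64   # hypothetical block size in bytes, e.g. one cache line

sequential = list(range(N))
scattered = sequential[:]
random.Random(0).shuffle(scattered)  # same bytes, random order

print("sequential:", block_transfers(sequential, BLOCK_SIZE))
print("random:    ", block_transfers(scattered, BLOCK_SIZE))
```

Sequential reading needs exactly N / BLOCK_SIZE = 64 transfers, while the shuffled order needs close to one transfer per byte.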

Chopping Trees into Blocks
Idea: Store data for many adjacent nodes in consecutive memory locations.
Result: One memory block access provides keys to determine many (more than two) search directions.
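A sketch of the "many search directions per block" idea, assuming one block holds a node's sorted keys (here the keys 3, 7, 12, 21 from the m-ary search tree example). A single binary search over that block picks one of len(keys) + 1 directions instead of the two a binary tree node offers. The function name is hypothetical.

```python
from bisect import bisect_left

# One "block" holds a node's sorted keys. A single read of that block is enough
# to pick among len(keys) + 1 search directions (instead of 2 in a binary tree).
# Sketch assumption: keys are distinct and stored in a Python list.


def search_direction(keys, k):
    """Return ('found', i) if k == keys[i], else ('child', i) for the i-th subtree.

    Child i holds keys strictly between keys[i-1] and keys[i]
    (child 0 holds keys below keys[0]; the last child holds keys above keys[-1]).
    """
    i = bisect_left(keys, k)  # first position whose key is >= k
    if i < len(keys) and keys[i] == k:
        return ("found", i)
    return ("child", i)


node_keys = [3, 7, 12, 21]  # keys of one node, read in one block access
for k in [1, 5, 9, 15, 30, 12]:
    print(k, search_direction(node_keys, k))
```

The six lookups land in children 0 through 4 or find key 12 at position 2, covering all five directions of the example node.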

m-ary Search Tree
Example node with keys 3, 7, 12, 21 and five subtrees: k < 3, 3 < k < 7, 7 < k < 12, 12 < k < 21, 21 < k.
m-ary tree property
◮ Each node has ≤ m children.
Result: A complete m-ary tree with n nodes has height Θ(log_m n).
Search tree property
◮ Each node has ≤ m − 1 search keys: k_1 < k_2 < k_3 < · · ·
◮ All keys k in the i-th subtree obey k_i < k < k_{i+1} for i = 0, 1, . . . (taking k_0 = −∞ and, for a node with j keys, k_{j+1} = +∞).
Disk I/Os (runtime) for find:
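A minimal sketch of `find` on an m-ary search tree that counts node visits, under the assumption that each node occupies one block, so each visit stands for one disk I/O. The root uses the example keys 3, 7, 12, 21; the leaf contents are hypothetical, chosen only to respect the search tree property.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Minimal m-ary search tree with a find() that counts node visits.
# Assumption: each node visit stands for one disk I/O (one node per block).


@dataclass
class Node:
    keys: list                                    # sorted, at most m - 1 keys
    children: list = field(default_factory=list)  # empty for a leaf, else len(keys) + 1


def find(node, k):
    """Return (found?, number of nodes visited)."""
    visits = 0
    while node is not None:
        visits += 1                    # reading this node = one block/disk I/O
        i = bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return True, visits
        node = node.children[i] if node.children else None
    return False, visits


# The example node: keys 3, 7, 12, 21 with five subtrees (m = 5).
# Leaf contents are hypothetical, chosen to respect the search tree property.
root = Node([3, 7, 12, 21], [
    Node([1, 2]), Node([4, 5, 6]), Node([8, 10]), Node([15, 18]), Node([25, 30]),
])

print(find(root, 12))   # (True, 1): key in the root
print(find(root, 18))   # (True, 2): key in a leaf
print(find(root, 9))    # (False, 2): missing key, root + one leaf
```

Each loop iteration reads one node, so `find` costs at most one I/O per level, that is, the height plus one.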

B-Trees
B-Trees of order m are specialized m-ary search trees:
◮ ALL leaves are at the same depth!
◮ Internal nodes have between ⌈m/2⌉ and m children, except the root, which has between 2 and m children.
◮ Leaves hold at most m − 1 keys.
Result
◮ Height is Θ(log_m n).
◮ Insert, delete, and find visit Θ(log_m n) nodes.
◮ m is chosen so that each (full) node fills one page of memory. Each node visit (disk I/O) retrieves between about m/2 and m − 1 keys.
Example B-tree: root [17]; internal nodes [3 8] and [28 48]; leaves [1 2], [6 7], [12 14 16] under [3 8] and [25 26], [29 45], [52 53 55 68] under [28 48].
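The B-tree rules above can be checked mechanically. This is a sketch, not a library API: a node is assumed to be a `(keys, children)` tuple, `check_btree` is a hypothetical helper, and m = 5 is our assumption for the example tree (it is consistent with that order, since the largest leaf holds 4 keys and the internal nodes have 3 children each).

```python
import math

# Check the B-tree-of-order-m properties from this slide on a small tree.
# Representation (an assumption for this sketch): a node is a tuple
# (keys, children), with children == [] for a leaf.


def _require(condition, message):
    if not condition:
        raise ValueError(message)


def check_btree(node, m, is_root=True, lo=-math.inf, hi=math.inf):
    """Return the height below node (leaves = 0); raise ValueError on a violation."""
    keys, children = node
    _require(len(keys) <= m - 1, f"too many keys in {keys}")
    _require(all(a < b for a, b in zip(keys, keys[1:])), f"keys not increasing: {keys}")
    _require(all(lo < k < hi for k in keys), f"keys {keys} outside ({lo}, {hi})")
    if not children:
        return 0                                   # a leaf
    _require(len(children) == len(keys) + 1, "need one more child than keys")
    min_children = 2 if is_root else math.ceil(m / 2)
    _require(min_children <= len(children) <= m, f"bad child count at {keys}")
    bounds = [lo] + keys + [hi]                    # subtree i lies in (bounds[i], bounds[i+1])
    depths = {
        check_btree(child, m, False, bounds[i], bounds[i + 1])
        for i, child in enumerate(children)
    }
    _require(len(depths) == 1, "leaves at different depths")
    return depths.pop() + 1


def leaf(*keys):
    return (list(keys), [])


# The example tree from the slide; m = 5 is our assumption (the example fits it).
example = ([17], [
    ([3, 8], [leaf(1, 2), leaf(6, 7), leaf(12, 14, 16)]),
    ([28, 48], [leaf(25, 26), leaf(29, 45), leaf(52, 53, 55, 68)]),
])

print("height:", check_btree(example, m=5))
```

The example passes with height 2. With m = 7 it would fail, because ⌈7/2⌉ = 4 and the node [3 8] has only 3 children.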

B-Tree Nodes
Internal node with i search keys (key slots 1 … m − 1):
left sibling ← [ k_1 | k_2 | · · · | k_i | ∅ | · · · | ∅ ] → right sibling
◮ i + 1 subtree pointers
◮ parent and left & right sibling pointers
Each node may hold a different number of items.
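A sketch of this node layout as a Python dataclass, plus a rough way to choose m so that a full node fills one page. The byte sizes (4096-byte page, 8-byte keys and pointers, an 8-byte key count) are hypothetical; real on-disk layouts vary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of the node layout on this slide, plus a rough way to pick m.
# Assumptions (hypothetical numbers): 4096-byte pages, 8-byte keys,
# 8-byte pointers, and an 8-byte field recording how many keys are in use.


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # k_1 < ... < k_i, i <= m - 1
    children: List["BTreeNode"] = field(default_factory=list)  # i + 1 subtrees (internal)
    parent: Optional["BTreeNode"] = None
    left_sibling: Optional["BTreeNode"] = None
    right_sibling: Optional["BTreeNode"] = None


def max_order(page_bytes=4096, key_bytes=8, ptr_bytes=8, count_bytes=8):
    """Largest m such that a full node (m - 1 keys, m child pointers,
    parent + two sibling pointers, and a key count) fits in one page."""
    # Size of a full node as a function of m:
    #   (m - 1) * key_bytes + m * ptr_bytes + 3 * ptr_bytes + count_bytes
    fixed = 3 * ptr_bytes + count_bytes - key_bytes
    return (page_bytes - fixed) // (key_bytes + ptr_bytes)


print(max_order())   # 254 with the assumed 4 KB page and 8-byte fields
```

Under these assumptions a full node with m = 254 takes 4088 bytes, and m = 255 would need 4104, one page too many.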

Making a B-Tree (m = 3)
The empty B-Tree → insert(3) → [3] → insert(14) → [3 14]
The root is a leaf. What happens when we now insert(1)?

Splitting the Root
[3 14] → insert(1) → [1 3 14]: too many keys for one leaf!
So, split the leaf: make a new leaf and create a parent (the new root) for both. Which key goes to the parent? Here the middle key, 3, moves up:
new root [3] with leaf children [1] and [14]
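A minimal sketch of insertion with splitting that replays the example above (m = 3: insert(3), insert(14), insert(1)). It assumes distinct keys, treats a node as overflowing once it holds m keys (one more than the m − 1 allowed), and moves the middle key up on a split. The class names and the nested-list `show` helper are hypothetical, and the parent and sibling pointers from the node layout are omitted for brevity.

```python
from bisect import bisect_left

# Sketch of B-tree insertion with node splitting (keys only, no values).
# Assumptions: distinct keys; a node overflows when it holds m keys;
# on overflow the middle key moves up and the node splits into two.
# Parent/sibling pointers are omitted.


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf

    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, m):
        self.m = m
        self.root = Node()               # the empty B-tree: root is an empty leaf

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # root overflowed: make a new root
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)
        if node.is_leaf():
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split        # child split: adopt its middle key
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        if len(node.keys) < self.m:      # at most m - 1 keys: no overflow
            return None
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right


def show(node):
    """Nested-list view: [keys, child, child, ...]."""
    return [node.keys] + [show(c) for c in node.children]


tree = BTree(m=3)
for k in (3, 14):
    tree.insert(k)
print(show(tree.root))     # [[3, 14]]
tree.insert(1)
print(show(tree.root))     # [[3], [[1]], [[14]]]
```

Moving the middle key up leaves a split node of m keys with m//2 keys on the left and m − m//2 − 1 on the right, so each half keeps at least ⌈m/2⌉ children when it is an internal node.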
