Hash-Based Indexes Database Management Systems, R. Ramakrishnan and - PDF document

Hash-Based Indexes Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction � As for any index, 3 alternatives for data entries k* : � Data record with key value k � < k , rid of data record with search key value k > � < k , list of rids of data records with search key k > � Hash-based indexes are best for equality selections . – Provide constant-time searches – But cannot support range searches � Static and dynamic hashing techniques exist – Trade-offs similar to ISAM vs. B+ trees Database Management Systems, R. Ramakrishnan and J. Gehrke 2 Static Hashing � # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed. � h ( k ) mod N = bucket to which data entry with key k belongs . (N = # of buckets) 0 h(key) mod N 2 key h N-1 Primary bucket pages Overflow pages Database Management Systems, R. Ramakrishnan and J. Gehrke 3

Static Hashing (Contd.) � Buckets contain data entries . � Hash fn works on search key field of record r. Must distribute values over range 0 ... N-1. – h ( key ) = (a * key + b) usually works well. – a and b are constants; lots known about how to tune h . � Long overflow chains can develop and degrade performance – Extendible and Linear Hashing : Dynamic techniques to fix this problem. Database Management Systems, R. Ramakrishnan and J. Gehrke 4 Extendible Hashing � Main idea: If bucket (primary page) becomes full, why not re-organize file by doubling # of buckets? – Essentially “splitting” buckets � But reading and writing all buckets is expensive! – Idea : Use directory of pointers to buckets , – Double # of buckets by doubling the directory, splitting just the bucket that overflowed! – Directory much smaller than file, so doubling it is much cheaper. – No overflow pages ! Database Management Systems, R. Ramakrishnan and J. Gehrke 5 Insert h (r)=14 2 2 LOCAL DEPTH LOCAL DEPTH Bucket A Bucket A 32*16* 4* 12* 32*16* 4* 12* GLOBAL DEPTH GLOBAL DEPTH 2 2 2 2 Bucket B Bucket B 00 1* 5* 21*13* 00 1* 5* 21*13* 01 01 2 2 10 10 Bucket C Bucket C 10* 11 10* 11 14* 2 2 DIRECTORY DIRECTORY Bucket D Bucket D 15* 7* 19* 15* 7* 19* Database Management Systems, R. Ramakrishnan and J. Gehrke 6

Insert h (r)=20 2 2 LOCAL DEPTH LOCAL DEPTH Bucket A Bucket A 32* 16* 32* 16* GLOBAL DEPTH 4* 12* GLOBAL DEPTH 2 2 2 2 Bucket B Bucket B 00 1* 5* 21*13* 00 1* 5* 21*13* 01 01 2 10 10 2 Bucket C Bucket C 11 10* 11 10* 2 2 DIRECTORY Bucket D DIRECTORY Bucket D 15* 7* 19* 15* 7* 19* 2 Bucket A2 4* 12* 20* (`split image' of Bucket A) Database Management Systems, R. Ramakrishnan and J. Gehrke 7 Insert h (r)=20 2 LOCAL DEPTH 3 LOCAL DEPTH Bucket A 32* 16* GLOBAL DEPTH 32* 16* Bucket A GLOBAL DEPTH 2 2 3 2 Bucket B 00 1* 5* 21*13* 000 1* 5* 21*13* Bucket B 01 001 2 10 2 010 Bucket C 10* 11 10* 011 Bucket C 100 2 2 DIRECTORY 101 Bucket D 15* 7* 19* 15* 7* 19* Bucket D 110 111 2 3 Bucket A2 4* 12* 20* DIRECTORY (`split image' 4* 12* 20* Bucket A2 of Bucket A) (`split image' of Bucket A) Database Management Systems, R. Ramakrishnan and J. Gehrke 8 Insert h (r)=32 LOCAL DEPTH LOCAL DEPTH GLOBAL DEPTH GLOBAL DEPTH 1 Bucket A 1 4* 12* 10* 32* 0 0 Bucket A 0 1* 10* 4* 12* 1 1 DIRECTORY Bucket B 1* DIRECTORY Database Management Systems, R. Ramakrishnan and J. Gehrke 9

Insert h (r)=16 LOCAL DEPTH 2 LOCAL DEPTH Bucket A 32* 16* 4* 12* GLOBAL DEPTH GLOBAL DEPTH 1 1 2 Bucket A 1 4* 12* 10* 32* Bucket B 1* 00 0 01 1 1 2 10 Bucket C Bucket B 10* 11 1* DIRECTORY DIRECTORY Database Management Systems, R. Ramakrishnan and J. Gehrke 10 Insert h (r)=20 3 LOCAL DEPTH 32* 16* Bucket A GLOBAL DEPTH 2 LOCAL DEPTH 3 1 Bucket A 32* 16* 4* 12* GLOBAL DEPTH 000 1* Bucket B 001 1 2 2 010 Bucket B 1* 00 10* 011 Bucket C 01 100 2 10 101 Bucket C 10* 11 110 111 DIRECTORY 3 DIRECTORY 4* 12* 20* Bucket A2 (`split image' of Bucket A) Database Management Systems, R. Ramakrishnan and J. Gehrke 11 Insert h (r)=5, 15, 7, 19 3 3 LOCAL DEPTH LOCAL DEPTH 32* 16* Bucket A 32* 16* Bucket A GLOBAL DEPTH GLOBAL DEPTH 3 1 3 2 5* 15* 7* 000 1* Bucket B 000 1* 5* Bucket B 001 001 2 2 010 010 10* Bucket C 10* Bucket C 011 011 100 100 2 101 101 15* 7* 19* Bucket B2 110 110 (`split image' 111 111 of Bucket B) 3 3 DIRECTORY DIRECTORY 4* 12* 20* Bucket A2 4* 12* 20* Bucket A2 (`split image' (`split image' of Bucket A) of Bucket A) Database Management Systems, R. Ramakrishnan and J. Gehrke 12

Deletions � Inverse of insertion � If removal of data entry makes bucket empty, merge with ‘split image’ � If each directory element points to same bucket as its split image, can halve directory Database Management Systems, R. Ramakrishnan and J. Gehrke 13 Comments on Extendible Hashing � If directory fits in memory, equality search answered with one disk access; else two – 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory. � Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large – Multiple entries with same hash value cause problems! – When would this happen? Database Management Systems, R. Ramakrishnan and J. Gehrke 14 Linear Hashing � This is another dynamic hashing scheme, an alternative to Extendible Hashing � LH handles the problem of long overflow chains without using a directory, and handles duplicates � Main idea: split one bucket at a time in rounds Database Management Systems, R. Ramakrishnan and J. Gehrke 15

Inserting h(r) = 43 Level=2, N=4 Level=2 h PRIMARY h OVERFLOW h h PRIMARY 3 2 PAGES 3 2 Next=0 Next=0 PAGES PAGES 32* 44* 36* 32* 000 00 000 00 Data entry r 9* 25* 5* 9* 25* 5* with h(r)=5 001 01 001 01 14* 18*10*30* 14* 18*10*30* 010 10 Primary 010 10 bucket page 31*35* 7* 11* 31*35* 7* 11* 43* 011 11 011 11 ( This info (The actual contents is for illustration of the linear hashed only!) file) Database Management Systems, R. Ramakrishnan and J. Gehrke 16 Example (Inserting h(r) = 43) Level=2 Level=2 h h PRIMARY OVERFLOW h h OVERFLOW PRIMARY 3 2 PAGES PAGES 3 2 Next=0 PAGES PAGES 32* 32* 000 00 000 00 Next=1 9* 25* 5* 9* 25* 5* 001 01 001 01 14* 18*10*30* 14* 18*10*30* 010 10 010 10 31*35* 7* 11* 43* 31*35* 7* 11* 43* 011 11 011 11 100 00 44* 36* Database Management Systems, R. Ramakrishnan and J. Gehrke 17 Inserting h(r) = 50 (End of a Round) Level=3 PRIMARY OVERFLOW h3 h PAGES 2 PAGES Next=0 Level=2 000 00 32* PRIMARY OVERFLOW PAGES h3 h PAGES 2 001 01 9* 25* 32* 000 00 010 10 66* 18* 10* 34* 50* 9* 25* 001 01 011 11 43* 35* 11* 010 10 66* 18* 10* 34* Next=3 100 00 44* 36* 31* 35* 7* 11* 43* 011 11 101 11 5* 37* 29* 100 44* 36* 00 14* 30* 22* 5* 110 10 101 37*29* 01 14* 30* 22* 31*7* 111 11 110 10 Database Management Systems, R. Ramakrishnan and J. Gehrke 18

Overview of LH File � In the middle of a round. Buckets split in this round: Bucket to be split If h ( search key value ) Level Next is in this range, must use h Level+1 ( search key value ) Buckets that existed at the to decide if entry is in beginning of this round: `split image' bucket. this is the range of h Level `split image' buckets: created (through splitting of other buckets) in this round Database Management Systems, R. Ramakrishnan and J. Gehrke 19 Summary � Hash-based indexes: best for equality searches, cannot support range searches. � Static Hashing can lead to long overflow chains. � Extendible Hashing uses directory doubling to avoid overflow pages – Duplicates may require overflow pages � Linear hashing avoids directory by splitting in rounds – Naturally handles skew and duplicates – Uses overflow buckets (but not very long in practice) Database Management Systems, R. Ramakrishnan and J. Gehrke 20

Hash-Based Indexes Database Management Systems, R. Ramakrishnan and - PDF document

Hash-Based Indexes Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k* : Data record with key value k < k , rid of data record with search key value k >

Hash Functions in Action Hash Functions in Action Lecture 12 Hash Functions Hash Functions

Hash Functions in Action Hash Functions in Action Lecture 11 Hash Functions Hash Functions

Hash Functions Hash Functions 1 Cryptographic Hash Function Crypto hash function h(x) must

Hash Functions and Hash Tables (2.5.2) A hash function h maps keys of a given type to

Module 7: Creating and Maintaining Indexes Overview Creating Indexes Creating Index

Modern OLTP Indexes (Part 2) 1 / 43 Modern OLTP Indexes (Part 2) Recap Recap 2 / 43 Modern OLTP

Indexing Shan-Hung Wu CS, NTHU Outline Overview API in VanillaCore Hash-Based

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Generics Asumu Takikawa RacketCon 2012 1 What are generics? 2 What are generics? hash-ref

Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions R. Joshua Tobin and David

An Example of Index An Example of Index pattern of structure in indicators pattern of structure

Module 6: Planning Indexes Overview Introduction to Indexes Index Architecture How

Security Proofs for the MD6 Hash Algorithm Ahmed Ezzat Outline Introduction to hash

LUX Hash Function Ivica Nikoli c, Alex Biryukov, Dmitry Khovratovich University of Luxembourg

HASH FUNCTIONS Mihir Bellare UCSD 1 Mihir Bellare UCSD 2 Hash functions Hash functions

Topic 22 Hash Tables " hash collision n. [from the techspeak] (var. `hash clash') When used

Implementing Data Layout Optimizations Implementing Data Layout Optimizations in the LLVM

B Master SE National Centre Mathematics & Computer Science Programming languages and

Storytelling, How to Give a Talk CS 7250 S PRING 2020 Prof. Cody Dunne N ORTHEASTERN U NIVERSITY

Nonconformist Resilience: Database-backed Job Queues John Mileham | @jmileham User Signup

Particle Methods for Rare Event Monte Carlo Paul Dupuis Division of Applied Mathematics Brown

A Higher-Order Polymorphic Lambda-Calculus With Sized Types This is where the subtitle would have

Software Engineering for Consistent Modification Management Megan Monroe, Sana Malik, Chris

BCNF revisited: 40 Years Normal Forms J.A. Makowsky Faculty of Computer Science Technion - IIT,

Sambuz

Useful Links

Newsletter

Mail Us

Hash-Based Indexes Database Management Systems, R. Ramakrishnan and - PDF document

Hash-Based Indexes Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k* : Data record with key value k < k , rid of data record with search key value k >

Hash Functions in Action Hash Functions in Action Lecture 12 Hash Functions Hash Functions

Hash Functions in Action Hash Functions in Action Lecture 11 Hash Functions Hash Functions

Hash Functions Hash Functions 1 Cryptographic Hash Function Crypto hash function h(x) must

Hash Functions and Hash Tables (2.5.2) A hash function h maps keys of a given type to

Module 7: Creating and Maintaining Indexes Overview Creating Indexes Creating Index

Modern OLTP Indexes (Part 2) 1 / 43 Modern OLTP Indexes (Part 2) Recap Recap 2 / 43 Modern OLTP

Indexing Shan-Hung Wu CS, NTHU Outline Overview API in VanillaCore Hash-Based

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Generics Asumu Takikawa RacketCon 2012 1 What are generics? 2 What are generics? hash-ref

Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions R. Joshua Tobin and David

An Example of Index An Example of Index pattern of structure in indicators pattern of structure

Module 6: Planning Indexes Overview Introduction to Indexes Index Architecture How

Security Proofs for the MD6 Hash Algorithm Ahmed Ezzat Outline Introduction to hash

LUX Hash Function Ivica Nikoli c, Alex Biryukov, Dmitry Khovratovich University of Luxembourg

HASH FUNCTIONS Mihir Bellare UCSD 1 Mihir Bellare UCSD 2 Hash functions Hash functions

Topic 22 Hash Tables &quot; hash collision n. [from the techspeak] (var. `hash clash') When used

Implementing Data Layout Optimizations Implementing Data Layout Optimizations in the LLVM

B Master SE National Centre Mathematics &amp; Computer Science Programming languages and

Storytelling, How to Give a Talk CS 7250 S PRING 2020 Prof. Cody Dunne N ORTHEASTERN U NIVERSITY

Nonconformist Resilience: Database-backed Job Queues John Mileham | @jmileham User Signup

Particle Methods for Rare Event Monte Carlo Paul Dupuis Division of Applied Mathematics Brown

A Higher-Order Polymorphic Lambda-Calculus With Sized Types This is where the subtitle would have

Software Engineering for Consistent Modification Management Megan Monroe, Sana Malik, Chris

BCNF revisited: 40 Years Normal Forms J.A. Makowsky Faculty of Computer Science Technion - IIT,

Sambuz

Useful Links

Newsletter

Mail Us

Topic 22 Hash Tables " hash collision n. [from the techspeak] (var. `hash clash') When used

B Master SE National Centre Mathematics & Computer Science Programming languages and