Outline/summary Conventional Indexes Sparse vs. dense Primary - PowerPoint PPT Presentation

Outline/summary
• Conventional Indexes
  • Sparse vs. dense
  • Primary vs. secondary
• B trees
  • B+trees vs. indexed sequential
• Hashing schemes --> Next

Hashing
key → h(key) → bucket (typically 1 disk block)

Two alternatives
(1) key → h(key) → bucket holding the records themselves

Two alternatives
(2) key → h(key) → index bucket holding keys, each pointing to its record
• Alt (2) for “secondary” search key

Example hash function
• Key = ‘x1 x2 … xn’, an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn, then compute the sum modulo b
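A minimal sketch of the byte-sum hash described above, assuming the key is given as a Python `bytes` value. The function name `h` and the sample keys are illustrative only.

```python
def h(key: bytes, b: int) -> int:
    """Add the byte values of the key and reduce the sum modulo b buckets."""
    return sum(key) % b


# Two keys with the same bytes in a different order land in the same bucket,
# because addition ignores byte order.
bucket_abc = h(b"abc", 4)
bucket_cab = h(b"cab", 4)
```

Keys that are permutations of each other always collide under this function, which is one way the key distribution across buckets can become uneven.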

• This may not be the best function …
• Read Knuth Vol. 3 if you really need to select a good function.
Good hash function:
• Expected number of keys/bucket is the same for all buckets

Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical & Inserts/Deletes not too frequent

Next: example to illustrate inserts, overflows, deletes

EXAMPLE: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1
• Bucket 0: d
• Bucket 1: a, c, with e in an overflow block
• Bucket 2: b
• Bucket 3: (empty)
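The insert example can be replayed with a small sketch. The hash values are copied from the slide. The names `CAPACITY`, `H`, `buckets`, and `overflow` are assumptions made for illustration, and the overflow chain is modeled as a plain list per bucket.

```python
CAPACITY = 2  # records per bucket, as in the example

# Hash values as given on the slide.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

buckets = [[] for _ in range(4)]   # primary blocks 0..3
overflow = [[] for _ in range(4)]  # one overflow chain per bucket


def insert(key: str) -> None:
    """Place key in its primary block, or in the overflow chain if the block is full."""
    slot = H[key]
    if len(buckets[slot]) < CAPACITY:
        buckets[slot].append(key)
    else:
        overflow[slot].append(key)


for key in "abcde":
    insert(key)
```

After the five inserts, bucket 1 holds a and c in its primary block and e in its overflow chain.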

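EXAMPLE: deletion
• Bucket 0: a
• Bucket 1: b, c, with d in an overflow block
• Bucket 2: e
• Bucket 3: f, g
Delete: e, f, c
• After deleting f, maybe move “g” up
• After deleting c, d moves from the overflow block into bucket 1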

Rule of thumb:
• Try to keep space utilization between 50% and 80%
  Utilization = (# keys used) / (total # keys that fit)
• If < 50%, wasting space
• If > 80%, overflows significant
  (depends on how good the hash function is and on # keys/bucket)

How do we cope with growth?
• Overflows and reorganizations
• Dynamic hashing
  • Extensible
  • Linear

Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function
    e.g. h(K) = 00110101 (b bits); use the leading i bits, where i grows over time

(b) Use a directory
    h(K)[i] → directory entry → pointer to bucket

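Example: h(k) is 4 bits; 2 keys/bucket
• Start: i = 1
  • 0 → bucket [0001] (bits used: 1)
  • 1 → bucket [1001, 1100] (bits used: 1)
• Insert 1010: bucket 1 is full, so it splits and the directory doubles
• New directory: i = 2
  • 00, 01 → bucket [0001] (bits used: 1)
  • 10 → bucket [1001, 1010] (bits used: 2)
  • 11 → bucket [1100] (bits used: 2)

Example continued
• Insert: 0111, 0000
• 0111 goes into the bucket shared by 00 and 01; 0000 then overflows it, so that bucket splits (the directory stays at i = 2)
  • 00 → bucket [0000, 0001] (bits used: 2)
  • 01 → bucket [0111] (bits used: 2)
  • 10 → bucket [1001, 1010] (bits used: 2)
  • 11 → bucket [1100] (bits used: 2)

Example continued
• Insert: 1001
• Bucket 10 is full and already uses i = 2 bits, so it splits and the directory doubles to i = 3
  • 000, 001 → bucket [0000, 0001] (bits used: 2)
  • 010, 011 → bucket [0111] (bits used: 2)
  • 100 → bucket [1001, 1001] (bits used: 3)
  • 101 → bucket [1010] (bits used: 3)
  • 110, 111 → bucket [1100] (bits used: 2)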
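The three example slides can be replayed with the following sketch of extensible hashing. It is an illustration under stated assumptions, not a reference implementation: the class names `Bucket` and `ExtensibleHash`, the attribute `depth` (number of leading bits a bucket uses), and the choice to let a bucket overflow once all b bits are used are all assumptions introduced here.

```python
class Bucket:
    def __init__(self, depth: int):
        self.depth = depth  # number of leading hash bits this bucket distinguishes
        self.keys = []


class ExtensibleHash:
    def __init__(self, bits: int = 4, capacity: int = 2):
        self.bits = bits          # b: width of the hash value
        self.capacity = capacity  # keys per bucket
        self.i = 1                # leading bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h: int) -> int:
        """Directory index = the leading i bits of the b-bit hash value."""
        return h >> (self.bits - self.i)

    def insert(self, h: int) -> None:
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity or bucket.depth == self.bits:
            # Room available, or no bits left to split on (assumed overflow).
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:
            # Double the directory: old entry j becomes entries 2j and 2j+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for j, b in enumerate(self.directory):
            if b is bucket and (j >> (self.i - bucket.depth)) & 1:
                self.directory[j] = sibling
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)  # retry; may trigger a further split


table = ExtensibleHash(bits=4, capacity=2)
for key in (0b0001, 0b1001, 0b1100,   # starting contents
            0b1010,                   # first insert: directory doubles to i = 2
            0b0111, 0b0000,           # second slide: split without doubling
            0b1001):                  # third slide: directory doubles to i = 3
    table.insert(key)
```

Directory entries that share a bucket (for example 000 and 001 at the end) point to the same `Bucket` object, which is how a bucket using fewer than i bits is represented here.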

Extensible hashing: deletion
Two options:
• No merging of blocks
• Merge blocks and cut directory if possible (reverse of the insert procedure)

Deletion example:
• Run through the insert example in reverse!

Extensible hashing: summary
Pros (+):
• Can handle growing files
  • with less wasted space
  • with no full reorganizations
Cons (-):
• Indirection (not bad if directory in memory)
• Directory doubles in size (now it fits, now it does not)

Linear hashing
• Another dynamic hashing scheme
Two ideas:
(a) Use the i low-order bits of the hash, e.g. 01110101 (b bits); i grows
(b) Number of buckets in use grows linearly
Constraint: 2^(i-1) ≤ n+1 < 2^i
(We take n to be the id of the largest bucket in use, starting at 0.)

Example: b = 4 bits, i = 2, 2 keys/bucket
• Bucket 00: 0000, 1010
• Bucket 01: 0101, 1111
• Buckets 10, 11: future growth
• n = 01 (number of last bucket in use)
• Insert 0101: bucket 01 is full, so 0101 goes into an overflow block
• Can have overflow chains!
Rule: if h(k)[i] ≤ n, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1)

Example: b = 4 bits, i = 2, 2 keys/bucket
• Same buckets as above, n = 01
• Insert 1110: h(k)[i] = 10 > n, so look at bucket 10 - 2^(i-1) = 00
• Bucket h(k)[i] - 2^(i-1) is the bucket whose ith bit is flipped in binary
Rule: if h(k)[i] ≤ n, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1)
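The lookup rule can be written as a short function. This is a sketch only; the name `bucket_for` and its parameter order are assumptions, and the hash value is passed in directly as an integer.

```python
def bucket_for(h: int, i: int, n: int) -> int:
    """Return the bucket to look at, given hash value h, bit count i, and last bucket id n."""
    m = h & ((1 << i) - 1)      # h(k)[i]: the i low-order bits
    if m <= n:
        return m                # that bucket is already in use
    return m - (1 << (i - 1))   # otherwise clear the ith bit


# With i = 2 and n = 01, as in the example:
first = bucket_for(0b0101, 2, 0b01)   # low bits 01 <= n, so bucket 01
second = bucket_for(0b1110, 2, 0b01)  # low bits 10 > n, so bucket 10 - 10 = 00
```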

Example: b = 4 bits, i = 2, 2 keys/bucket
• Start: bucket 00: 0000, 1010; bucket 01: 0101, 1111, with 0101 in an overflow block; n = 01
• n = 10: bucket 10 comes into use; 1010 moves from bucket 00 to bucket 10
• n = 11: bucket 11 comes into use; 1111 moves from bucket 01 to bucket 11
• Result: bucket 00: 0000; bucket 01: 0101, 0101; bucket 10: 1010; bucket 11: 1111

Example continued: how to grow beyond this?
Constraint: 2^(i-1) ≤ n+1 < 2^i
• i goes from 2 to 3; buckets 00, 01, 10, 11 become 000, 001, 010, 011
• n = 100: bucket 100 comes into use
• n = 101: bucket 101 comes into use; both copies of 0101 move from bucket 001 to bucket 101
• Buckets 110, 111: future growth

When do we expand the file?
• Keep track of: U = (# records) / (# buckets)
• If U > threshold, then increase n (and maybe i)
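A minimal sketch of linear hashing that grows whenever U exceeds a threshold. Several things here are assumptions for illustration: the class name `LinearHash`, the threshold value 1.5 records per bucket, deriving i from n as the unique value with 2^(i-1) ≤ n+1 < 2^i, and modeling an overflow chain as a bucket list that is simply allowed to grow.

```python
class LinearHash:
    def __init__(self, threshold: float = 1.5):
        self.threshold = threshold  # assumed limit on U = records / buckets
        self.n = 0                  # id of the last bucket in use
        self.buckets = [[]]
        self.records = 0

    @property
    def i(self) -> int:
        # The unique i satisfying 2^(i-1) <= n+1 < 2^i.
        return (self.n + 1).bit_length()

    def address(self, h: int) -> int:
        m = h & ((1 << self.i) - 1)        # i low-order bits
        return m if m <= self.n else m - (1 << (self.i - 1))

    def insert(self, h: int) -> None:
        self.buckets[self.address(h)].append(h)
        self.records += 1
        while self.records / len(self.buckets) > self.threshold:
            self._grow()

    def _grow(self) -> None:
        """Bring bucket n+1 into use and split the bucket it pairs with."""
        self.n += 1
        self.buckets.append([])
        source = self.n - (1 << (self.n.bit_length() - 1))  # n with its top bit cleared
        old_keys, self.buckets[source] = self.buckets[source], []
        for k in old_keys:
            self.buckets[self.address(k)].append(k)


table = LinearHash()
for key in (0b0000, 0b1010, 0b0101, 0b1111, 0b0101):
    table.insert(key)
```

With this threshold the five inserts end with n = 11 and the same bucket contents as the growth example. Buckets are split in order of their id, not because they are the ones that overflowed, which is why overflow chains can persist for a while.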

Linear hashing: summary
Pros (+):
• Can handle growing files
  • with less wasted space
  • with no full reorganizations
• No indirection like extensible hashing
Cons (-):
• Can still have overflow chains

Example: BAD CASE
• Some buckets very full, others very empty
• Need to move n to the very full buckets … but that would waste space

Summary
Hashing
• How it works
• Dynamic hashing
  • Extensible
  • Linear

B+trees vs. Hashing
• Hashing good for probes given key
  e.g., SELECT … FROM R WHERE R.A = 5

B+trees vs. Hashing
• INDEXING (including B trees) good for range searches
  e.g., SELECT … FROM R WHERE R.A > 5
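The contrast between the two query shapes can be sketched in a few lines. The sample rows are made up, a Python `dict` stands in for a hash index, and a sorted list searched with `bisect` stands in for an ordered index; neither is a real B+tree or disk-based structure.

```python
import bisect

# Hypothetical rows of R, keyed by attribute A.
rows = [(3, "r3"), (5, "r5"), (7, "r7"), (9, "r9")]

hash_index = {a: rec for a, rec in rows}   # good for A = value
ordered_keys = sorted(a for a, _ in rows)  # good for A > value


def probe(a: int):
    """WHERE R.A = a: a single hash lookup."""
    return hash_index.get(a)


def range_greater(a: int):
    """WHERE R.A > a: find the start position, then scan keys in order."""
    start = bisect.bisect_right(ordered_keys, a)
    return [hash_index[k] for k in ordered_keys[start:]]


equal_5 = probe(5)
greater_5 = range_greater(5)
```

A hash index alone gives no help with the range query, since keys that are adjacent in value are scattered across buckets.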
