Indexing Ramakrishnan/Gehrke Ch. 8 How index -learning turns no - PowerPoint PPT Presentation

Indexing Ramakrishnan/Gehrke Ch. 8 “How index -learning turns no student pale Yet holds the eel of science by the tail.” -- Alexander Pope (1688-1744) 340151 Big Databases & Cloud Services (P. Baumann) 1

Range Searches  `` Find all students with gpa > 3.0 ’’ • sorted file (by gpa!), fixed-length records: binary search to find first student, then scan to find rest • Cost of binary search can be quite high  Simple idea: Create an `index’ file containing only key values + search values • Can do binary search on (smaller) index file! Index File kN k1 k2 Data File tuple 1 tuple 2 tuple 3 tuple N 340151 Big Databases & Cloud Services (P. Baumann) 2

Indexes  speeds up selections on predefined search key field(s) • one relation (~file) • Any attribute (except BLOB) can be search key for an index on the relation  collection of data entries • For efficient retrieval of all data entries k* for given key value k  Index vs sorted files • Both: search faster than just heap • Updates: index much faster 340151 Big Databases & Cloud Services (P. Baumann) 3

B+ Tree Indexes Index pages Leaf pages  Ordered, balanced tree of degree m  Non-leaf pages: index entries = keys & pointers fill factor P 0 K 1 P 1 K 2 P 2 K m P m  Leaf pages: keys + data pointers; prev/next page chain 340151 Big Databases & Cloud Services (P. Baumann) 4

B+-Tree Definition  B+-Tree of Order m has the following properties... • Property #1 - All leaf nodes must be at same level. • Property #2 - All nodes except root must have at least [m/2]-1 keys and maximum of m-1 keys. • Property #3 - All non leaf nodes except root (i.e. all internal nodes) must have at least m/2 children. • Property #5 - A non leaf node with n-1 keys must have n number of children. • key values in a node sorted in ascending order [http://btechsmartclass.com/data_structures/b-trees.html] 340151 Big Databases & Cloud Services (P. Baumann) 5

B+ Tree: Operations Root Note how data entries in leaf level are sorted 17 Entries < 17 Entries => 17 27 5 13 30 2* 3* 5* 7* 8* 22* 24* 27* 29* 33* 34* 38* 39* 14* 16*  Find 28*? 29*? All > 15* and < 30*?  Insert/delete: Find data entry in leaf, change it; adjust parent if needed • change sometimes bubbles up the tree O( log F N ) where F = fan-out, N = # leaf pages • https://www.cs.usfca.edu/~galles/visualization/BTree.html 340151 Big Databases & Cloud Services (P. Baumann) 6

More Exercises  Consider • # node reads from disk – determines speed • # comparisons – not performance relevant, but for understanding mechanics  Find 15, 20, 0  Find all 11 – 15; 20 – 32 [https://condor.depaul.edu/ntomuro/courses/417/notes/lecture3.html] 340151 Big Databases & Cloud Services (P. Baumann) 7

B+ Trees in Practice  Typical fill-factor: 67% (outdated; today ~90%)  Average fan-out: 133  Typical capacities: Height 3: 133 3 = 2,352,637 records • Height 4: 133 4 = 312,900,700 records •  Can often hold top levels in buffer pool: Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 MBytes • 340151 Big Databases & Cloud Services (P. Baumann) 8

Hash-Based Indexes  Goal: compute address without disk access • get data in O(1)  Idea: distribute data evenly into fixed number of “ buckets ” • Compute location from key via Hashing function • Ex: h(int r) = r* a mod b , b prime relative to a • overflow pages  Hash index = bucket set + hashing function • Bucket = primary page + 0..n overflow pages  only equality, no range queries [Shankai Yan] 340151 Big Databases & Cloud Services (P. Baumann) 9

Index Classification  Primary index: index contains primary key, otherwise: secondary index • Unique index  Clustered vs. unclustered: order of data records same as / `close to’ order of data entries  clustered index • Cost varies greatly 340151 Big Databases & Cloud Services (P. Baumann) 10

Clustered vs. Unclustered Index  To build clustered index, first sort Heap file  Overflow pages may be needed for inserts • order of data recs `close to’, but not identical to, sort order Index entries UNCLUSTERED CLUSTERED direct search for data entries Index File Data entries Data file Data Records Data Records 340151 Big Databases & Cloud Services (P. Baumann) 11

Composite Search Keys  Composite Search Keys = Examples of composite key indexes using lexicographic order: Search on combination of fields Equality query: • 11,80 11 Every field value equal to a constant value 12 12,10 Ex: for <sal,age> index: age=20 and sal=75 name age sal 12,20 12 Range query: • 20,75 bob 12 10 20 Some field value within interval <age, sal> <age> cal 11 80 Ex: age>20; or age=20 and sal>10 joe 12 20  Data entries in index sorted by search 10,12 sue 20 75 10 key to support range queries 20 20,12 Data records sorted by name 75,20 75 80 80,11 <sal, age> <sal> Data entries in index Data entries sorted by <sal,age> sorted by <sal> 340151 Big Databases & Cloud Services (P. Baumann) 12

Composite Search Keys: How To Use  To retrieve Emp records with age =30 AND sal =4000: • index on < age,sal > better than index on age, or index on sal • Choice of index key orthogonal to clustering etc.  If condition is: 20< age <30 AND 3000< sal <5000: Clustered tree index on < age,sal > or < sal,age > is best •  If condition is: age =30 AND 3000< sal <5000: Clustered < age,sal > index much better • than < sal,age > index!  Composite indexes are larger, updated more often 340151 Big Databases & Cloud Services (P. Baumann) 13

Index-Only Plans  A number of queries can be answered without retrieving any tuples from one or more of the relations involved …if a suitable index is available SELECT E.dno, COUNT (*) < E.dno > FROM Emp E GROUP BY E.dno SELECT E.dno, MIN (E.sal) < E.dno,E.sal > FROM Emp E Tree index! GROUP BY E.dno < E. age,E.sal > SELECT AVG (E.sal) or FROM Emp E < E.sal, E.age > WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000 Tree index! 340151 Big Databases & Cloud Services (P. Baumann) 14

Index-Only Plans (contd.)  Index-only plans possible SELECT E.dno, COUNT (*) if key is <dno,age> or <age,dno> FROM Emp E • Which is better? WHERE E.age=30 GROUP BY E.dno 340151 Big Databases & Cloud Services (P. Baumann) 15

Index-Only Plans (contd.)  Index-only plans possible SELECT E.dno, COUNT (*) if key is <dno,age> or <age,dno> FROM Emp E • Which is better? WHERE E.age=30 GROUP BY E.dno SELECT E.dno, COUNT (*) • What if we consider the second query? FROM Emp E WHERE E.age>30 GROUP BY E.dno 340151 Big Databases & Cloud Services (P. Baumann) 16

Index-Only Plans (Contd.)  Index-only plans also for queries involving >1 table < E.dno > < E.dno,E.eid > SELECT D.dno FROM Dept D, Emp E SELECT D.dno, E.eid WHERE D.dno=E.dno FROM Dept D, Emp E WHERE D.dno=E.dno 340151 Big Databases & Cloud Services (P. Baumann) 17

Understanding the Workload  For each query in workload: • Which relations accessed? • Which attributes retrieved? • Which attributes involved in selection/join conditions? How selective likely?  For each update in workload: • Which attributes involved in selection/join conditions? How selective likely? • Type of update (INSERT/DELETE/UPDATE) + attributes affected  Trade-off: Indexes can make queries faster, updates slower • …and require disk space 340151 Big Databases & Cloud Services (P. Baumann) 18

Index Selection Guidelines  Attributes in WHERE clause candidates for index keys • Exact match condition hash index • Range query tree index • Clustering especially useful for range queries • can also help on equality queries if many duplicates  Multi-attribute search keys should be considered when a WHERE clause contains several conditions • Order of attributes is important for range queries • Such indexes can sometimes enable index-only strategies for important queries • For index-only strategies, clustering is not important! 340151 Big Databases & Cloud Services (P. Baumann) 19

Index Selection Guidelines (contd.)  Choose indexes that benefit as many queries as possible • impact on updates: Indexes make queries faster, updates slower • require disk space!  only one index can be clustered per relation choose based on important queries that benefit most from clustering  understand how DBMS evaluates queries & creates query evaluation plans  …a practitioner's approach: • Consider most important queries in turn: • Consider best plan using current indexes • See if a better plan is possible with an additional index • If so, create it 340151 Big Databases & Cloud Services (P. Baumann) 20

Indexing Ramakrishnan/Gehrke Ch. 8 How index -learning turns no - PowerPoint PPT Presentation

Indexing Ramakrishnan/Gehrke Ch. 8 How index -learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) 340151 Big Databases & Cloud Services (P. Baumann) 1 Range Searches `` Find

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

Audio Indexing and Retrieval IT6902; Semester B, 2004/2005; Leung Audio Indexing and Retrieval

Indexing CS6320 1/29/2018 Shachi Deshpande, Yunhe Liu Content Motivation for Indexing

Exact Indexing of Dynamic Exact Indexing of Dynamic Time Warping Time Warping Eamonn Keogh

Graph Indexing: Tree + Delta Delta >= Graph >= Graph Graph Indexing: Tree + Peixian Zhao,

Multi-Probe LSH: Efficient Indexing for Efficient Indexing for Multi-Probe LSH:

Biometric Indexing Yi Wang alice.yi.wang@ieee.org 13/Jan/2017 Outlines Introduction to

Chapter V: Indexing & Searching Information Retrieval & Data Mining Universitt des

NPFL103: Information Retrieval (3) Index construction, Distributed and dynamic indexing, Index

Media Indexing & Retrieval Media Indexing & Retrieval Prepared by Ling Guan Jose Lay

Class 12: Tree-Structured Indexing Instructor: Manos Athanassoulis

Briefing on Radioactive Source Security Experience w ith NSTS 23 April 2009 Kate Roughan

2/23/2017 US Preventive Services Task Force and guidelines addressing the needs of Disclosure:

GGIE::Leveraging IPv6 Glass to Glass Glenn Deen NBCUniversal

ATEM MODULE 3: CONTROLS A c c r e d i t e d T e x a s E n e r g y M a n a g e r System Types

LArPix Design Details Dan Dwyer (LBNL) May 16, 2019 Introduction Important that LArPix-v2

1. Z-transform the step re- sponse to obtain Y s ( z ) . Discrete ZOH Signals 2. Divide the

Directed Graphs Data Structures and Algorithms for CL III, WS 2019-2020 Corina Dima

Indexing Ramakrishnan/Gehrke Ch. 8 How index -learning turns no - PowerPoint PPT Presentation

Indexing Ramakrishnan/Gehrke Ch. 8 How index -learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) 340151 Big Databases & Cloud Services (P. Baumann) 1 Range Searches `` Find

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing &amp; Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

Audio Indexing and Retrieval IT6902; Semester B, 2004/2005; Leung Audio Indexing and Retrieval

Indexing CS6320 1/29/2018 Shachi Deshpande, Yunhe Liu Content Motivation for Indexing

Exact Indexing of Dynamic Exact Indexing of Dynamic Time Warping Time Warping Eamonn Keogh

Graph Indexing: Tree + Delta Delta &gt;= Graph &gt;= Graph Graph Indexing: Tree + Peixian Zhao,

Multi-Probe LSH: Efficient Indexing for Efficient Indexing for Multi-Probe LSH:

Biometric Indexing Yi Wang alice.yi.wang@ieee.org 13/Jan/2017 Outlines Introduction to

Chapter V: Indexing &amp; Searching Information Retrieval &amp; Data Mining Universitt des

NPFL103: Information Retrieval (3) Index construction, Distributed and dynamic indexing, Index

Media Indexing &amp; Retrieval Media Indexing &amp; Retrieval Prepared by Ling Guan Jose Lay

Class 12: Tree-Structured Indexing Instructor: Manos Athanassoulis

Briefing on Radioactive Source Security Experience w ith NSTS 23 April 2009 Kate Roughan

2/23/2017 US Preventive Services Task Force and guidelines addressing the needs of Disclosure:

GGIE::Leveraging IPv6 Glass to Glass Glenn Deen NBCUniversal

ATEM MODULE 3: CONTROLS A c c r e d i t e d T e x a s E n e r g y M a n a g e r System Types

LArPix Design Details Dan Dwyer (LBNL) May 16, 2019 Introduction Important that LArPix-v2

1. Z-transform the step re- sponse to obtain Y s ( z ) . Discrete ZOH Signals 2. Divide the

Directed Graphs Data Structures and Algorithms for CL III, WS 2019-2020 Corina Dima

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

Graph Indexing: Tree + Delta Delta >= Graph >= Graph Graph Indexing: Tree + Peixian Zhao,

Chapter V: Indexing & Searching Information Retrieval & Data Mining Universitt des

Media Indexing & Retrieval Media Indexing & Retrieval Prepared by Ling Guan Jose Lay