Data Management Systems • Query Processing • Execution models • Optimization – heuristics & Access to base tables rewriting Sorting and Aggregation • Optimization – cost models Joins Operators Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich Operators 1
Starting point • Data and indexes are stored in blocks as part of extents • Data from the base tables must be read from the buffer cache and into the working space of the query (will be processed while reading it) • Basics: • Minimize I/O if blocks not in memory • Minimize accesses to tuples if data in memory • Prefer sequential access Operators 2
Access to base tables Operators 3
The fastest access for single tuple: row_id SELECT * FROM T • The fastest way to access one WHERE T.row_id = 123456 tuple is by using its row or tuple Buffer cache id: • Row_id = Block_id, offset • Essentially, a pointer to the tuple SELECT * FROM T • Only need to access the block WHERE T.key = AB34TF where the tuple is • The row_id is found: • In the query itself index • Through an index Operators 4
When to use row_id access • Row_id access can also be used when there is a predicate over an SELECT * FROM T WHERE T.A = 42 indexed attribute: • use the index to find the matching tuples and retrieve index them using the row_id on A • If there are many matches, it might induce many random accesses to different blocks Operators 5
When to use row_id access SELECT * FROM T • Row_id access through an index WHERE T.A = 42 AND T.B > 50 works even with more complex predicates • use the index to find the index on A matching tuples using the indexed attribute • Retrieve the tuples that match the other predicate YES Emit tuple T.B > 50 NO Ignore tuple Operators 6
Which index to use? • Assume there are two indexes, one on A and one on B SELECT * FROM T WHERE T.A = 42 AND T.B > 50 • We can use any of them to retrieve the data • Find all tuple T.A = 42 and then check T.B > 50 • Find all the tuples T.B > 50 and then check T.A = 42 • Find all tuples T.A = 42, find all tuples T.B > 50, match the two lists of row_ids • Which one to use depends on the relative selectivities of each predicate Operators 7
The “slowest” access: full table scan • A full table scan reads all the blocks and all the tuples in each block => it is expensive, especially if data not in memory • But not is not the slowest, very stupid plans might be worse SELECT * FROM T WHERE T.id =1 OR T.id=2 OR T.id=3 … SELECT * FROM T WHERE T.age <= 20 OR T.age >20 • Full table scan is the upper bound in cost for retrieving data from a base table • Worst case scenario for query planning, if nothing else works, use a full table scan • Nevertheless, it reads the data sequentially and it can take advantage of prefetching Operators 8
When to use a full table scan? • When there is no other option: • There are no indexes or indexes are not selective enough • Predicates involving several columns of the same table (self join) SELECT * FROM T where T.A > T.B • The amount of data retrieved is large enough that sequential access is better than many random accesses • Several ways to minimize the overhead of a full table scan: • Shared scans = use the cursor from the scan of another query • Sample scans = do not read everything but just a sample • Column store = scanning a compressed column using SIMD can be fast once data is in memory Operators 9
Clustered indexes • A clustered index enforces that the data in the extent is organized according to the index: • B+ tree = data is sorted • Hash index = tuples with same key are in the same bucket • In those cases, we might not traverse the index for each tuple: • Find the relevant blocks • Scan those blocks sequentially Operators 10
Clustered index example SELECT * FROM T SELECT * FROM T WHERE T.A = 42 AND T.B > 50 WHERE T.A = 42 AND T.B > 50 Clustered index index on B on A YES YES Emit tuple T.A = 42 Emit tuple T.B > 50 NO NO Ignore tuple Ignore tuple Operators 11
Lowering the costs of table scans: Zone Maps • A zone map is a combination of coarse index and statistics • For every block in of a table • Keep the max and min values for some or all columns • Before reading a block, check the zone map: • If range does not match the predicate, do not read the block • It can significantly reduce I/O cost • In some cases it can replace an index • Other statistics can be kept in a zone map • Example of use of the Zone Maps concept is Snowflake (see chapter on Storage) Operators 12
Other considerations • Other factors affecting how to access a base table: • A table scan using an index can be expensive but it will return sorted data: • Start at the beginning of the leaves of the index and retrieve the tuples one by one (sequential access to the row_ids) • Expensive if index not clustered but might be cheaper than sorting the data • I/O is significantly more expensive that accessing data in memory • Random accesses are far worse for I/O than for data in memory • Scans in memory can be reasonably fast • Changes the decision points between random access and scans • Scans on columns not the same as scans on rows • Changes the decision point on when to do a scan Operators 13
Sorting and Aggregation Operators 14
Why sorting data? • Recall that data is not necessarily stored in a sorted manner • The query requires it SELECT * FROM T ORDER_BY (T.age) • Some operations are easier over sorted data SELECT DISTINCT(T.name) FROM T sort the data by T.name and return the first for each group SELECT AVG(T.age) FROM T GROUP_BY(T.level) sort the data by T.level and then find the average age for each group Joins, selections, intra- table predicates … • Sorting is expensive • Requires extra space (no sorting in place for base tables) • Requires CPU (comparisons) Operators 15
External sort (data does not fit in memory) • Why external sort? • Obvious: data does not fit in main memory (data and results!!) • Less obvious: many queries running at the same time sharing memory • Two key parameters • N: number of pages of input • M: size of in memory buffer • Behavior of algorithm determined by many parameters: I/O, CPU, I/O costs, caches, data types involved, etc. Operators 16
Two-phase External Sort • N size of the input in pages, M size of the buffer in pages • Phase I: Create Runs 1. Load allocated buffer space with tuples 2. Sort tuples in buffer pool 3. Write sorted tuples (run) to disk 4. Goto Step 1 (create next run) until all tuples processed • Phase II: Merge Runs • Use priority heap to merge tuples from runs • Special cases • M >= N: no merge needed • M < sqrt(N): multiple merge phases necessary Operators 17
For simplicity we will hide this • The size of the buffer needed: • Minimal configuration: • Number of blocks -1 are used to read in data blocks • 1 block is used to write data out • Better: • Number of blocks -1 to read data in • 2 or more blocks to write data out so that we do not have to wait to write a block out sort read write Operators 18
External Sort 97 17 3 5 27 16 2 99 13 Operators 19
External Sort load 97 97 17 3 17 5 27 3 16 2 99 13 Operators 20
External Sort sort 97 3 17 3 17 5 27 97 16 2 99 13 Operators 21
External Sort run write 3 97 17 3 17 97 3 17 5 27 97 16 2 99 13 Operators 22
External Sort load 3 97 17 5 17 97 3 27 5 27 16 16 2 99 13 Operators 23
External Sort sort & write 3 97 17 5 17 97 3 16 5 5 16 27 27 27 16 2 99 13 Operators 24
External Sort load 3 97 1 7 2 17 97 3 99 5 5 16 27 13 27 16 2 99 13 Operators 25
External Sort End of Phase 1 3 97 17 2 17 97 3 13 5 5 16 27 99 27 16 2 2 99 13 13 99 Operators 26
External Sort merge 3 17 3 97 5 5 16 2 27 2 13 99 Operators 27
External Sort merge 3 2 17 3 97 5 5 16 2 27 2 13 99 Operators 28
External Sort merge 3 2 17 3 3 97 5 5 16 13 27 2 13 99 Operators 29
External Sort merge 3 2 17 3 17 5 97 5 5 16 13 27 2 13 99 Operators 30
External Sort merge 3 2 17 3 17 5 97 16 5 16 13 27 2 13 99 Operators 31
External Sort 3 2 17 3 17 5 97 16 13 5 16 13 27 2 13 99 Operators 32
One pass vs. multi-pass sort • Previous algorithm is a one-pass algorithm (every data item is read once and written once) • However: • If there are many runs, I/O overhead is too high (we need to bring too many runs to memory) • Merge step cannot be parallelized Operators 33
Multi-way Merge (N = 7; M = 2) Operators 34
Ways to speed up sorting • Prefetching and double buffering: • Use more than one block for writing data out • Prefetch blocks as needed while processing • Take advantage of indexes • Clustered: read the data in order as it is already sorted (no CPU cost, sequential access) • Non-clustered: use the index to read the data in order (no CPU cost but very expensive in terms of random access), may work for small ranges Operators 35
Recommend
More recommend