Indexes, Query Optimization PDBM 7.5, 12.3.5 Dr. Chris Mayfield - - PowerPoint PPT Presentation

indexes query optimization
SMART_READER_LITE
LIVE PREVIEW

Indexes, Query Optimization PDBM 7.5, 12.3.5 Dr. Chris Mayfield - - PowerPoint PPT Presentation

Indexes, Query Optimization PDBM 7.5, 12.3.5 Dr. Chris Mayfield Department of Computer Science James Madison University Feb 18, 2020 Part 1: SQL Indexes How data is stored Heap File Index File (table: movie ) (column: movie.year ) RID id


slide-1
SLIDE 1

Indexes, Query Optimization

PDBM 7.5, 12.3.5

  • Dr. Chris Mayfield

Department of Computer Science James Madison University

Feb 18, 2020

slide-2
SLIDE 2

Part 1: SQL Indexes

slide-3
SLIDE 3

How data is stored

Heap File (table: movie) RID id title year 1 282375 Inception 2010 2 293178 Jane Eyre 2011 3 246720 Harry Potter 2010 4 57126 Avatar 2009 5 662387 TRON: Legacy 2010 . . . . . . . . . . . . Index File (column: movie.year) year RID 2009 4 2010 1 2010 3 2010 5 2011 2 . . . . . . Does this query have to look at every movie?

SELECT * FROM movie WHERE year >= 2011;

Feb 18, 2020 Indexes, Query Optimization 3 of 18

slide-4
SLIDE 4

Creating indexes

Single column

CREATE INDEX ON movie (title); CREATE INDEX ON movie (year);

Multiple columns

CREATE INDEX ON movie (title, year); CREATE INDEX ON movie (year, title);

These are just examples; don’t do all of them! A LOT happens behind the scenes:

◮ http://en.wikipedia.org/wiki/B-tree

Feb 18, 2020 Indexes, Query Optimization 4 of 18

slide-5
SLIDE 5

Selection of indexes

Why not just index every column?

◮ They take up disk space ◮ They take time to build ◮ Expensive to maintain

Deciding which columns to index:

◮ PRIMARY KEYs (automatic) ◮ UNIQUE attributes (automatic) ◮ Attributes in WHERE clauses

Feb 18, 2020 Indexes, Query Optimization 5 of 18

slide-6
SLIDE 6

Example

StarsIn(movieTitle, movieYear, starName) What should be indexed?

  • - Query #1

SELECT movieTitle, movieYear FROM StarsIn WHERE starName = s;

  • - Query #2

SELECT starName FROM StarsIn WHERE movieTitle = t AND movieYear = y;

  • - Query #3

INSERT INTO StarsIn VALUES (t, y, s);

Feb 18, 2020 Indexes, Query Optimization 6 of 18

slide-7
SLIDE 7

Part 2: Query Optimization

slide-8
SLIDE 8

Evaluating query plans

In psql, type EXPLAIN in front of any query In pgAdmin, press F7 instead of F5

◮ Shows graphical version of EXPLAIN output

Demo: analyzing HW4 query performance

◮ If possible, avoid sequential scans ◮ Where should you create indexes?

Feb 18, 2020 Indexes, Query Optimization 8 of 18

slide-9
SLIDE 9

Selectivity estimation

DBs maintain statistics for each relation / attribute:

◮ Histograms (if numeric) ◮ Most common values ◮ % NULL attributes ◮ Average size (in bytes) ◮ Physical correlation ◮ . . .

16 18 20 22 24 26 28 30 32 34 0.00 0.04 0.08 0.12 0.16

temp

SELECT * FROM sensor WHERE temp > 25; -- Index Scan SELECT * FROM sensor WHERE temp < 25; -- Seq Scan

Feb 18, 2020 Indexes, Query Optimization 9 of 18

slide-10
SLIDE 10

Summary of indexes

Data distribution affects plan choice

◮ For example, query 10,000 rows ◮ SELECT * FROM t WHERE a = 1;

◮ [Plan A] when 90% have a = 1 ◮ [Plan B] when a = 1..10000, 1 time each ◮ [Plan C] when a = 1..10, 1000 times each

Performance tips

◮ Rebuild optimizer statistics after updates ◮ Use EXPLAIN ANALYZE to profile your queries ◮ Be aware of which indexes are being used

Feb 18, 2020 Indexes, Query Optimization 10 of 18

slide-11
SLIDE 11

pgAdmin demo

◮ EXPLAIN = show estimated cost/rows ◮ EXPLAIN ANALYZE = show actual time/rows

Feb 18, 2020 Indexes, Query Optimization 11 of 18

slide-12
SLIDE 12

Basic scans

Sequential Scan

Most basic physical operation that reads an entire table from beginning to end.

Index Scan

Scans a table using an (unclustered) index, most often with search criteria or other stop- ping conditions.

Feb 18, 2020 Indexes, Query Optimization 12 of 18

slide-13
SLIDE 13

Bitmap scans

Bitmap Heap Scan

Combines the output of multiple index scans by constructing a bitmap, and then reads the resulting table rows in physical order.

Bitmap Index Scan

Similar to a bitmap heap scan, but uses an additional index scan at the end.

Feb 18, 2020 Indexes, Query Optimization 13 of 18

slide-14
SLIDE 14

Hash joins

Hash

Constructs a temporary hash table over the given rows.

Hash Join

Constructs a hash of the inner table, scans the outer table sequentially, and joins rows via hash lookups.

Feb 18, 2020 Indexes, Query Optimization 14 of 18

slide-15
SLIDE 15

Merge joins

Sort

Orders the given rows by one or more values (using external sorting).

Merge Join

Used when both tables are sorted; scans each table simultaneously and merges any matching rows.

Feb 18, 2020 Indexes, Query Optimization 15 of 18

slide-16
SLIDE 16

Other joins

Nested Loops

Most basic join technique: for each row in the outer table, compare with each row in the inner table.

Materialize

Saves the current query results to memory as a new (but temporary) table.

Feb 18, 2020 Indexes, Query Optimization 16 of 18

slide-17
SLIDE 17

Aggregation

Group

Organizes rows into groups by sorting or hashing their values.

Aggregate

Combines each group of rows into a single row by applying a function.

Feb 18, 2020 Indexes, Query Optimization 17 of 18

slide-18
SLIDE 18

Miscellaneous

Append

Adds additional rows to the current result (e.g., UNION ALL queries).

Unique

Removes any duplicate rows (using sorting

  • r hashing).

Limit

Returns only the top k rows. Often changes the entire plan.

Feb 18, 2020 Indexes, Query Optimization 18 of 18