CSE 344 SECTION 4 – RELATIONAL ALGEBRA
! Formalism)for)describing)queries) Why RA? ! Basis)of)rela4onal)databases) ! Will)make)you)a)SQL)wizard!)
Notes on RA ! Mul4ple)possible)query)plans) ! Logical)vs.)Physical)query)plans) )
Example: RA-to-SQL Person(id,)name,)countryid)) Country(id,)name,)con4nent)) ) SELECT C.id ) ) FROM Person P, Country C ) ) WHERE P.countryid = C.id ) ) AND C.continent=‘Africa’ ) ) GROUP BY C.id ) ) HAVING COUNT(*) > 10000000 ) Can)we)make)a)more)efficient)plan?)
Demo in Azure!
RA Reference Sheet
From Logical Plans to Physical Plans CSE 344 - Winter 2015 1
Query Evaluation Steps SQL query Parse & Check Query Check syntax, Translate query access control, string into internal table names, etc. representation Decide how best to answer query: query optimization Logical plan ! physical plan Query Query Execution Evaluation Return Results CSE 344 - Winter 2015 2
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Example SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � Give a relational algebra expression for this query CSE 344 - Winter 2015 3
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Relational Algebra SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � π sname ( σ scity= � Seattle � ∧ sstate= � WA � ∧ pno=2 ( Supplier sid = sid Supply )) CSE 344 - Winter 2015 4
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Relational Algebra SELECT sname π sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 σ scity= � Seattle �� ∧ sstate= � WA �� ∧ pno=2 and x.scity = � Seattle � and x.sstate = � WA � sid = sid Relational algebra expression is also called the “logical query Supplier Supply plan” CSE 344 - Winter 2015 5
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 1 (On the fly) π sname A physical query plan is a logical query plan annotated with (On the fly) physical implementation details σ scity= � Seattle �� ∧ sstate= � WA �� ∧ pno=2 SELECT sname FROM Supplier x, Supply y (Block-nested loop) WHERE x.sid = y.sid sid = sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � Supplier Supply (File scan) (File scan) CSE 344 - Winter 2015 6
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 2 Different but equivalent logical query plan; different physical plan ( d ) (On the fly) π sname SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid ( c ) (Sort-merge join) and y.pno = 2 sid = sid and x.scity = � Seattle � (Scan and x.sstate = � WA � (Scan write to T1) write to T2) ( a ) σ scity= � Seattle �� ∧ sstate= � WA � ( b ) σ pno=2 Supplier Supply (File scan) (File scan) CSE 344 - Winter 2015 7
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 3 (On the fly) ( d ) π sname Another logical plan that (On the fly) produces the same result and ( c ) σ scity= � Seattle �� ∧ sstate= � WA � is implemented with a different physical plan ( b ) (Index nested loop) sid = sid SELECT sname (Use index) FROM Supplier x, Supply y ( a ) σ pno=2 WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � Supplier and x.sstate = � WA � Supply (Index lookup on pno ) (Index lookup on sid) 8 Assume: clustered Doesn � t matter if clustered or not
Physical Data Independence • Means that applications are insulated from changes in physical storage details – E.g., can add/remove indexes without changing apps – Can do other physical tunings for performance • SQL and relational algebra facilitate physical data independence because both languages are “set-at-a-time”: Relations as input and output CSE 344 - Winter 2015 9
Index • An additional file, that allows fast access to records in the data file given a search key CSE 344 - Winter 2015 1
Index • An additional file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: – The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record CSE 344 - Winter 2015 2
Index • An additional file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: – The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record • Could have many indexes for one table Key = means here search key CSE 344 - Winter 2015 3
This Is Not A Key Different keys: • Primary key – uniquely identifies a tuple • Key of the sequential file – how the datafile is sorted, if at all • Index key – how the index is organized CSE 344 - Winter 2015 4
Student Example 1: ID# fName# lName# 10# Tom# Hanks# Index on ID 20# Amy# Hanks# Data File Student …# Index Student_ID on Student.ID 10 Tom Hanks 10 20 Amy Hanks 20 50 50 … … 200 200 … 220 220 240 420 240 800 420 950 800 … CSE 344 - Winter 2015 5
Student Example 2: ID# fName# lName# 10# Tom# Hanks# Index on fName 20# Amy# Hanks# Index Student_fName Data File Student …# on Student.fName 10 Tom Hanks Amy 20 Amy Hanks Ann Bob 50 … … Cho 200 … … 220 … … 240 … 420 … 800 … Tom CSE 344 - Winter 2015 6
Index Organization Several index organizations: • Hash table • B+ trees – most popular – They are search trees, but they are not binary instead have higher fanout – will discuss them briefly next • Specialized indexes: bit maps, R-trees, inverted index CSE 344 - Winter 2015 7
B+ Tree Index by Example d = 2 Find the key 40 80 40 ≤ 80 20 60 100 120 140 20 < 40 ≤ 60 10 15 18 20 30 40 50 60 65 80 85 90 30 < 40 ≤ 40 10 15 18 20 30 40 50 60 65 80 85 90 CSE 344 - Winter 2015 8
Clustered vs Unclustered B+ Tree B+ Tree Data entries Data entries ( Index File ) ( Data file ) Data Records Data Records CLUSTERED UNCLUSTERED Every table can have only one clustered and many unclustered indexes CSE 344 - Winter 2015 9
Getting Practical: Creating Indexes in SQL CREATE##TABLE####V(M#int,###N#varchar(20),####P#int);# CREATE##INDEX#V1#ON#V(N)# CREATE##INDEX#V2#ON#V(P,#M)# CREATE##INDEX#V3#ON#V(M,#N)# CREATE#UNIQUE#INDEX#V4#ON#V(N)# Not#supported#in# SQLite# CREATE#CLUSTERED#INDEX#V5#ON#V(N)# CSE 344 - Winter 2015 10
Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • How many indexes could we create? • Which indexes should we create? CSE 344 - Winter 2015 11
Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • How many indexes could we create? • Which indexes should we create? In general this is a very hard problem 12
Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • The index selection problem – Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!) • Who does index selection: – The database administrator DBA – Semi-automatically, using a database administration tool CSE 344 - Winter 2015 13
Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • The index selection problem – Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!) • Who does index selection: – The database administrator DBA – Semi-automatically, using a database administration tool CSE 344 - Winter 2015 14
Index Selection: Which Search Key • Make some attribute K a search key if the WHERE clause contains: – An exact match on K – A range predicate on K – A join on K CSE 344 - Winter 2015 15
The Index Selection Problem 1 V(M, N, P); Your workload is this 100000 queries: 100 queries: SELECT * SELECT * FROM V FROM V WHERE N=? WHERE P=? What indexes ? CSE 344 - Winter 2015 16
The Index Selection Problem 1 V(M, N, P); Your workload is this 100000 queries: 100 queries: SELECT * SELECT * FROM V FROM V WHERE N=? WHERE P=? A: V(N) and V(P) (hash tables or B-trees) CSE 344 - Winter 2015 17
The Index Selection Problem 2 V(M, N, P); Your workload is this 100000 queries: 100 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N>? and N<? WHERE P=? What indexes ? CSE 344 - Winter 2015 18
The Index Selection Problem 2 V(M, N, P); Your workload is this 100000 queries: 100 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N>? and N<? WHERE P=? A: definitely V(N) (must B-tree); unsure about V(P) CSE 344 - Winter 2015 19
The Index Selection Problem 3 V(M, N, P); Your workload is this 100000 queries: 1000000 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N=? WHERE N=? and P>? What indexes ? CSE 344 - Winter 2015 20
Recommend
More recommend