Database Theory: Conjunctive Queries & Static Analysis CS 645 Feb 20, 2006 1
Life of a database theoretician • Expressiveness of query languages – Any query in L1 can be expressed in L2 – Query q cannot be expressed in L • Complexity of languages – Bounds on resources required to evaluate any query in language L • Static analysis of queries (for optimization) – Given q in L: is it minimal? – Given q1 and q2 in L: are they equivalent? • Views 2
Coming lectures • TODAY: – Overview of languages – Conjunctive queries (CQs) – Properties of CQs – Containment/equivalence for CQs • Next Week – Adding recursion – Reasoning about views 3
Query languages • So far we ʼ ve seen: – Relational algebra – Relational calculus – SQL 4
Review: relational algebra • Five operators: – Union: ∪ – Difference: - – Selection: σ – Projection: Π – Cartesian Product: × • Derived or auxiliary operators: – Intersection, complement – Joins (natural,equi-join, theta join) – Renaming: ρ
Review: relational calculus Name and sid of students who are taking the English: course “DB” Π name , sid ( Students Takes σ name = ”DB” ( Course) RA: {x name, x sid | ∃ x cid ∃ x term Students(x sid ,x name ) ∧ Takes(x sid ,x cid ) ∧ RC: Course(x cid ,”DB”, x term ) }
Review: SQL Basic form: SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)
Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT Conjunctive single datalog Queries rule
Conjunctive Queries abbreviated : CQ • A subset of FO queries (i.e. less expressive) • Many queries in practice are conjunctive • Some optimizers handle only conjunctive queries - break larger queries into many CQs • CQ ʼ s have “better” theoretical properties than arbitrary queries 9
Conjunctive Queries in rule-based (datalog) notation Variables Subgoals P(x,z) :- R(x,y) & R(y,z) ← Implicit ∃ Head Body Conjunction “IF” • R: Extensional database (EDB) - stored • P: Intentional database (IDB) - computed 10
Conjunctive Queries Intuitively: when facts in the body are true of stored relations, then we infer the fact in the head P(x,z) :- R(x,y) & R(y,z) • More formally: • Consider all possible substitutions : assignments of the variables in the body 11
Examples EDB Relation: ManagedBy(emp,mgr) A(x) :- ManagedBy(“Smith”,y) & ManagedBy(x,y) All employees having the same manager as “Smith” ManagedBy(x,y) A(x) substitution substitution Sam Frank Sally substitution y = Bob y = Joan Smith Joan y = Joan Smith x = ? x = Smith x = Sally Sally Joan Smith Bob 12
Defining answers to CQ general form ans(u) :- R1(u1) & ... & Rn(un) e.g. ui = (x,y,z) of a CQ q v(ui) = (a,a,b) • A substitution v is a function from variables into the domain. e.g. x → a, y → a, z → b, u → c • Let I be an instance, i.e. relations I(R1) ... I(Rn) • A tuple t is in the answer q(I) if there is a substitution v s.t: – v(u1) ∈ I(R1) for each i, and – t = v(u) 13
Examples EDB Relation: ManagedBy(emp,mgr) • Find all employees having the same director as Smith: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z) (Your director is your manager ʼ s manager) 14
Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive datalog σ , π , × Queries rule
CQ and RA Relational Algebra: • CQ correspond precisely to σ C , Π A , × (missing: ∪ , –) A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) Π $2.name $1.manager=$2.manager σ name=“Smith” 16 ManagedBy ManagedBy
Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive S d FW datalog σ , π , × Queries rule
CQ and SQL Rule-based: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) Notice SQL: “distinct” select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager 18
Boolean queries A() :- ManagedBy(“Smith”, x), ManagedBy(“Sally”, x) Is there someone who manages both Smith and Sally? non-empty relation • Returns: – relation { 〈〉 } if the answer is yes – relation { } if the answer is no empty relation 19
Properties of Conjunctive Queries • Satisfiability – A query q is satisfiable if there exists some input relation I such that q(I) is non-empty. – FACT: Every CQ is satisfiable. • Monotonicity – A query q is monotonic if for each instance I,J over schema, I ⊆ J implies q(I) ⊆ q(J). – FACT: Every CQ is monotonic. 20
Satisfiability of CQs We can always generate satisfying EDB relations from the body of the rule. S(x,y,z) :- P(x,w) & R(w,y,v) & P(v,z) d e a b b c d P a b R b c d S a c e d e 21
Monotonicity of CQs general form ans(u) :- R1(u1) & ... & Rn(un) e.g. ui = (x,y,z) of a CQ q • Consider two databases I, J s.t. I ⊆ J. • let t ∈ q(I). – Then for some substitution v: • v(ui) ∈ I(Ri) for each i. • t = v(u) – Since I ⊆ J, v(ui) ∈ J(Ri) for each i – So t ∈ q(J)
Consequence of monotonicity Product ( pname, price, category, maker) Find products that are more expensive than all those produced By “Gizmo-Works” SELECT name FROM Product WHERE price > ALL (SELECT price FROM Purchase WHERE maker= ʻ Gizmo-Works ʼ ) • This query is NOT monotone. • Therefore, it is not in the class of conjunctive queries. • It cannot be expressed as a simple SFW query.
Extensions of CQs 24
Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive S d FW datalog σ , π , × Queries rule
Extensions of CQ: disequality CQ ≠ Find managers that manage at least 2 employees A(y) :- ManagedBy(x,y), ManagedBy(z,y), x ≠ z 26
Extensions of CQ: inequality CQ < Find employees earning more than their manager A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u > v Additional EDB Relation: Salary(emp,money)
Extensions of CQ: negation CQ ¬ Find people sharing the same office with Alice, but with a different manager A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ¬ManagedBy(y,x) Additional EDB Relation: Office(emp,officenum) 28
Extensions of CQ: union UCQ Unions of conjunctive queries Rule-based: A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address) Datalog notation is very convenient for expressing unions (no need for ∨ ) 29
Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT UCQ CQ < CQ ≠ CQ ¬ single RA: Conjunctive S d FW datalog σ , π , × Queries rule
Extensions of CQ • If we extend too much, we capture FO – Namely: CQs + Union, Negation • Theoreticians need to be careful: small extensions may make a huge difference on certain theoretical properties of CQ 31
Query language classes Algebra Logic SQL Recursive Queries Expressiveness (safe) RC SFW + FO queries RA UNION UCQ ¬ EXCEPT UCQ CQ < CQ ≠ CQ ¬ single RA: Conjunctive S d FW datalog σ , π , × Queries rule
Query Equivalence and Containment • One kind of static analysis • Useful for query optimization • Intensively studied since 1977 33
Query Equivalence SELECT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ʻ Sales ʼ and x.office = y.office and x.floor = 5 and y.dept = ʻ Sales ʼ Hmmmm…. Is there a simpler way to write that ? 34
Query Equivalence • Queries q 1 and q 2 are equivalent if for every database D , q 1 ( D ) = q 2 ( D ). • Notation: q 1 ≡ q 2 relations equal 35
Query Containment • Query q 1 is contained in q 2 if for every database D , q 1 ( D ) ⊆ q 2 ( D ). • Notation: q 1 ⊆ q 2 • Obviously: q 1 ⊆ q 2 and q 2 ⊆ q 1 iff q 1 ≡ q 2 • Conversely: q 1 ∧ q 2 ≡ q 2 iff q 1 ⊆ q 2 We will study the containment problem only.
Sidenote: containment for Boolean queries • Recall: q 1 is contained in q 2 if for every database D , q 1 ( D ) ⊆ q 2 ( D ). – if q1, q2 are boolean they return { 〈〉 } or { } – containment says: – whenever q1(D) = { 〈〉 } then q2(D) = { 〈〉 }. • Containment is implication: q1 → q2 37
Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,y), R(y,z), R(z,w) q 2 (x) :- R(x,y), R(y,z) 38
Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,y), R(y,z), R(z,x) q 2 (x) :- R(x,y), R(y,x) Counter-example B A C 39
Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) Example A B 40
Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) 41
Query Containment • Theorem Query containment for CQ is decidable and NP-complete. (query complexity) 42
Recommend
More recommend