database theory conjunctive queries static analysis
play

Database Theory: Conjunctive Queries & Static Analysis CS 645 - PowerPoint PPT Presentation

Database Theory: Conjunctive Queries & Static Analysis CS 645 Feb 20, 2006 1 Life of a database theoretician Expressiveness of query languages Any query in L1 can be expressed in L2 Query q cannot be expressed in L


  1. Database Theory: Conjunctive Queries & Static Analysis CS 645 Feb 20, 2006 1

  2. Life of a database theoretician • Expressiveness of query languages – Any query in L1 can be expressed in L2 – Query q cannot be expressed in L • Complexity of languages – Bounds on resources required to evaluate any query in language L • Static analysis of queries (for optimization) – Given q in L: is it minimal? – Given q1 and q2 in L: are they equivalent? • Views 2

  3. Coming lectures • TODAY: – Overview of languages – Conjunctive queries (CQs) – Properties of CQs – Containment/equivalence for CQs • Next Week – Adding recursion – Reasoning about views 3

  4. Query languages • So far we ʼ ve seen: – Relational algebra – Relational calculus – SQL 4

  5. Review: relational algebra • Five operators: – Union: ∪ – Difference: - – Selection: σ – Projection: Π – Cartesian Product: × • Derived or auxiliary operators: – Intersection, complement – Joins (natural,equi-join, theta join) – Renaming: ρ

  6. Review: relational calculus Name and sid of students who are taking the English: course “DB” Π name , sid ( Students Takes σ name = ”DB” ( Course) RA: {x name, x sid | ∃ x cid ∃ x term Students(x sid ,x name ) ∧ Takes(x sid ,x cid ) ∧ RC: Course(x cid ,”DB”, x term ) }

  7. Review: SQL Basic form: SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)

  8. Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT Conjunctive single datalog Queries rule

  9. Conjunctive Queries abbreviated : CQ • A subset of FO queries (i.e. less expressive) • Many queries in practice are conjunctive • Some optimizers handle only conjunctive queries - break larger queries into many CQs • CQ ʼ s have “better” theoretical properties than arbitrary queries 9

  10. Conjunctive Queries in rule-based (datalog) notation Variables Subgoals P(x,z) :- R(x,y) & R(y,z) ← Implicit ∃ Head Body Conjunction “IF” • R: Extensional database (EDB) - stored • P: Intentional database (IDB) - computed 10

  11. Conjunctive Queries Intuitively: when facts in the body are true of stored relations, then we infer the fact in the head P(x,z) :- R(x,y) & R(y,z) • More formally: • Consider all possible substitutions : assignments of the variables in the body 11

  12. Examples EDB Relation: ManagedBy(emp,mgr) A(x) :- ManagedBy(“Smith”,y) & ManagedBy(x,y) All employees having the same manager as “Smith” ManagedBy(x,y) A(x) substitution substitution Sam Frank Sally substitution y = Bob y = Joan Smith Joan y = Joan Smith x = ? x = Smith x = Sally Sally Joan Smith Bob 12

  13. Defining answers to CQ general form ans(u) :- R1(u1) & ... & Rn(un) e.g. ui = (x,y,z) of a CQ q v(ui) = (a,a,b) • A substitution v is a function from variables into the domain. e.g. x → a, y → a, z → b, u → c • Let I be an instance, i.e. relations I(R1) ... I(Rn) • A tuple t is in the answer q(I) if there is a substitution v s.t: – v(u1) ∈ I(R1) for each i, and – t = v(u) 13

  14. Examples EDB Relation: ManagedBy(emp,mgr) • Find all employees having the same director as Smith: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z) (Your director is your manager ʼ s manager) 14

  15. Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive datalog σ , π , × Queries rule

  16. CQ and RA Relational Algebra: • CQ correspond precisely to σ C , Π A , × (missing: ∪ , –) A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) Π $2.name $1.manager=$2.manager σ name=“Smith” 16 ManagedBy ManagedBy

  17. Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive S d FW datalog σ , π , × Queries rule

  18. CQ and SQL Rule-based: A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) Notice SQL: “distinct” select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager 18

  19. Boolean queries A() :- ManagedBy(“Smith”, x), ManagedBy(“Sally”, x) Is there someone who manages both Smith and Sally? non-empty relation • Returns: – relation { 〈〉 } if the answer is yes – relation { } if the answer is no empty relation 19

  20. Properties of Conjunctive Queries • Satisfiability – A query q is satisfiable if there exists some input relation I such that q(I) is non-empty. – FACT: Every CQ is satisfiable. • Monotonicity – A query q is monotonic if for each instance I,J over schema, I ⊆ J implies q(I) ⊆ q(J). – FACT: Every CQ is monotonic. 20

  21. Satisfiability of CQs We can always generate satisfying EDB relations from the body of the rule. S(x,y,z) :- P(x,w) & R(w,y,v) & P(v,z) d e a b b c d P a b R b c d S a c e d e 21

  22. Monotonicity of CQs general form ans(u) :- R1(u1) & ... & Rn(un) e.g. ui = (x,y,z) of a CQ q • Consider two databases I, J s.t. I ⊆ J. • let t ∈ q(I). – Then for some substitution v: • v(ui) ∈ I(Ri) for each i. • t = v(u) – Since I ⊆ J, v(ui) ∈ J(Ri) for each i – So t ∈ q(J)

  23. Consequence of monotonicity Product ( pname, price, category, maker) Find products that are more expensive than all those produced By “Gizmo-Works” SELECT name FROM Product WHERE price > ALL (SELECT price FROM Purchase WHERE maker= ʻ Gizmo-Works ʼ ) • This query is NOT monotone. • Therefore, it is not in the class of conjunctive queries. • It cannot be expressed as a simple SFW query.

  24. Extensions of CQs 24

  25. Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT single RA: Conjunctive S d FW datalog σ , π , × Queries rule

  26. Extensions of CQ: disequality CQ ≠ Find managers that manage at least 2 employees A(y) :- ManagedBy(x,y), ManagedBy(z,y), x ≠ z 26

  27. Extensions of CQ: inequality CQ < Find employees earning more than their manager A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u > v Additional EDB Relation: Salary(emp,money)

  28. Extensions of CQ: negation CQ ¬ Find people sharing the same office with Alice, but with a different manager A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ¬ManagedBy(y,x) Additional EDB Relation: Office(emp,officenum) 28

  29. Extensions of CQ: union UCQ Unions of conjunctive queries Rule-based: A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address) Datalog notation is very convenient for expressing unions (no need for ∨ ) 29

  30. Query language classes Algebra Logic SQL Recursive Queries Expressiveness SFW + FO queries RA (safe) RC UNION EXCEPT UCQ CQ < CQ ≠ CQ ¬ single RA: Conjunctive S d FW datalog σ , π , × Queries rule

  31. Extensions of CQ • If we extend too much, we capture FO – Namely: CQs + Union, Negation • Theoreticians need to be careful: small extensions may make a huge difference on certain theoretical properties of CQ 31

  32. Query language classes Algebra Logic SQL Recursive Queries Expressiveness (safe) RC SFW + FO queries RA UNION UCQ ¬ EXCEPT UCQ CQ < CQ ≠ CQ ¬ single RA: Conjunctive S d FW datalog σ , π , × Queries rule

  33. Query Equivalence and Containment • One kind of static analysis • Useful for query optimization • Intensively studied since 1977 33

  34. Query Equivalence SELECT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ʻ Sales ʼ and x.office = y.office and x.floor = 5 and y.dept = ʻ Sales ʼ Hmmmm…. Is there a simpler way to write that ? 34

  35. Query Equivalence • Queries q 1 and q 2 are equivalent if for every database D , q 1 ( D ) = q 2 ( D ). • Notation: q 1 ≡ q 2 relations equal 35

  36. Query Containment • Query q 1 is contained in q 2 if for every database D , q 1 ( D ) ⊆ q 2 ( D ). • Notation: q 1 ⊆ q 2 • Obviously: q 1 ⊆ q 2 and q 2 ⊆ q 1 iff q 1 ≡ q 2 • Conversely: q 1 ∧ q 2 ≡ q 2 iff q 1 ⊆ q 2 We will study the containment problem only.

  37. Sidenote: containment for Boolean queries • Recall: q 1 is contained in q 2 if for every database D , q 1 ( D ) ⊆ q 2 ( D ). – if q1, q2 are boolean they return { 〈〉 } or { } – containment says: – whenever q1(D) = { 〈〉 } then q2(D) = { 〈〉 }. • Containment is implication: q1 → q2 37

  38. Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,y), R(y,z), R(z,w) q 2 (x) :- R(x,y), R(y,z) 38

  39. Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,y), R(y,z), R(z,x) q 2 (x) :- R(x,y), R(y,x) Counter-example B A C 39

  40. Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,u), R(u,u) q 2 (x) :- R(x,u), R(u,v), R(v,w) Example A B 40

  41. Examples of Query Containments Is q 1 ⊆ q 2 ? q 1 (x) :- R(x,u), R(u,”Smith”) q 2 (x) :- R(x,u), R(u,v) 41

  42. Query Containment • Theorem Query containment for CQ is decidable and NP-complete. (query complexity) 42

Recommend


More recommend