consistent query answering
play

Consistent Query Answering Jan Chomicki University at Buffalo - PowerPoint PPT Presentation

Consistent Query Answering Jan Chomicki University at Buffalo October 2017 Jan Chomicki University at Buffalo CQA October 2017 1 / 34 Table of contents Motivation 1 Basics 2 Computing CQA 3 Computational Complexity 4 Dichotomy 5


  1. Research Goals Formal definition What constitutes reliable (consistent) information in an inconsistent database. Algorithms How to compute consistent information. Computational complexity analysis tractable vs. intractable classes of queries and integrity constraints tradeoffs: complexity vs. expressiveness. Jan Chomicki University at Buffalo CQA October 2017 10 / 34

  2. Research Goals Formal definition What constitutes reliable (consistent) information in an inconsistent database. Algorithms How to compute consistent information. Computational complexity analysis tractable vs. intractable classes of queries and integrity constraints tradeoffs: complexity vs. expressiveness. Implementation preferably using DBMS technology. Jan Chomicki University at Buffalo CQA October 2017 10 / 34

  3. Research Goals Formal definition What constitutes reliable (consistent) information in an inconsistent database. Algorithms How to compute consistent information. Computational complexity analysis tractable vs. intractable classes of queries and integrity constraints tradeoffs: complexity vs. expressiveness. Implementation preferably using DBMS technology. Applications data cleaning Jan Chomicki University at Buffalo CQA October 2017 10 / 34

  4. Basic Notions Repair D ′ of a database D w.r.t. the integrity constraints IC : D ′ : over the same schema as D D ′ | = IC symmetric difference between D and D ′ is minimal. Jan Chomicki University at Buffalo CQA October 2017 11 / 34

  5. Basic Notions Repair D ′ of a database D w.r.t. the integrity constraints IC : D ′ : over the same schema as D D ′ | = IC symmetric difference between D and D ′ is minimal. Consistent query answer to a query Q in D w.r.t. IC : an element of the result of Q in every repair of D w.r.t. IC . Jan Chomicki University at Buffalo CQA October 2017 11 / 34

  6. Basic Notions Repair D ′ of a database D w.r.t. the integrity constraints IC : D ′ : over the same schema as D D ′ | = IC symmetric difference between D and D ′ is minimal. Consistent query answer to a query Q in D w.r.t. IC : an element of the result of Q in every repair of D w.r.t. IC . Another incarnation of the idea of sure/certain query answers (Lipski [Jr.79]). Jan Chomicki University at Buffalo CQA October 2017 11 / 34

  7. A Logical Aside Logical inconsistency inconsistent database: database facts together with integrity constraints form an inconsistent set of formulas trivialization of reasoning does not occur because constraints are not used in relational query evaluation. Jan Chomicki University at Buffalo CQA October 2017 12 / 34

  8. Exponentially many repairs Example relation R ( A , B ) A B violates the dependency A → B has 2 n repairs. a 1 b 1 a 1 c 1 a 2 b 2 a 2 c 2 · · · a n b n a n c n A → B Jan Chomicki University at Buffalo CQA October 2017 13 / 34

  9. Exponentially many repairs Example relation R ( A , B ) A B violates the dependency A → B has 2 n repairs. a 1 b 1 a 1 c 1 a 2 b 2 a 2 c 2 · · · a n b n a n c n A → B It is impractical to apply the definition of CQA directly. Jan Chomicki University at Buffalo CQA October 2017 13 / 34

  10. Computing Consistent Query Answers Query Rewriting Given a query Q and a set of integrity constraints IC , build a query Q IC such that for every database instance D the set of answers to Q IC in D = the set of consistent answers to Q in D w.r.t. IC. Jan Chomicki University at Buffalo CQA October 2017 14 / 34

  11. Computing Consistent Query Answers Query Rewriting Given a query Q and a set of integrity constraints IC , build a query Q IC such that for every database instance D the set of answers to Q IC in D = the set of consistent answers to Q in D w.r.t. IC. Representing all repairs Given IC and D : build a space-efficient representation of all repairs of D w.r.t. IC 1 use this representation to answer (many) queries. 2 Jan Chomicki University at Buffalo CQA October 2017 14 / 34

  12. Computing Consistent Query Answers Query Rewriting Given a query Q and a set of integrity constraints IC , build a query Q IC such that for every database instance D the set of answers to Q IC in D = the set of consistent answers to Q in D w.r.t. IC. Representing all repairs Given IC and D : build a space-efficient representation of all repairs of D w.r.t. IC 1 use this representation to answer (many) queries. 2 Logic programs Given IC , D and Q : build a logic program P IC , D whose models are the repairs of D w.r.t. IC 1 build a logic program P Q expressing Q 2 use a logic programming system that computes the query atoms present in all 3 models of P IC , D ∪ P Q . Jan Chomicki University at Buffalo CQA October 2017 14 / 34

  13. Constraint classes Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  14. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  15. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  16. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  17. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  18. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  19. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Functional dependencies X → Y : key dependency: Y = U Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  20. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Functional dependencies Example primary-key dependency X → Y : Name → Address Salary key dependency: Y = U Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  21. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Functional dependencies Example primary-key dependency X → Y : Name → Address Salary key dependency: Y = U Inclusion dependencies R [ X ] ⊆ S [ Y ]: a foreign key constraint: key Y Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  22. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Functional dependencies Example primary-key dependency X → Y : Name → Address Salary key dependency: Y = U Example foreign key constraint Inclusion dependencies M [ Manager ] ⊆ M [ Name ] R [ X ] ⊆ S [ Y ]: a foreign key constraint: key Y Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  23. Constraint classes Example Universal constraints ∀ . A 1 ∧ · · · ∧ A n ⇒ B 1 ∨ · · · ∨ B m ∀ . Par ( x , y ) ⇒ Ma ( x , y ) ∨ Fa ( x , y ) Example Tuple-generating dependencies ∀ . A 1 ∧ · · · ∧ A n ⇒ B ∀ . Ma ( x , y ) ∧ Ma ( x , z ) ⇒ Sib ( y , z ) Denial constraints Example ∀ . ¬ ( A 1 ∧ · · · ∧ A n ) ∀ . ¬ ( M ( n , s , m ) ∧ M ( m , t , w ) ∧ s > t ) Functional dependencies Example primary-key dependency X → Y : Name → Address Salary key dependency: Y = U Example foreign key constraint Inclusion dependencies M [ Manager ] ⊆ M [ Name ] R [ X ] ⊆ S [ Y ]: a foreign key constraint: key Y Hyper Jan Chomicki University at Buffalo CQA October 2017 15 / 34

  24. Query Rewriting Building queries that compute CQAs relational calculus (algebra) ❀ relational calculus (algebra) SQL ❀ SQL leads to PTIME data complexity Jan Chomicki University at Buffalo CQA October 2017 16 / 34

  25. Query Rewriting Building queries that compute CQAs relational calculus (algebra) ❀ relational calculus (algebra) SQL ❀ SQL leads to PTIME data complexity Query Emp ( x , y , z ) Jan Chomicki University at Buffalo CQA October 2017 16 / 34

  26. Query Rewriting Building queries that compute CQAs relational calculus (algebra) ❀ relational calculus (algebra) SQL ❀ SQL leads to PTIME data complexity Query Emp ( x , y , z ) Integrity constraint ∀ x , y , z , y ′ , z ′ . ¬ Emp ( x , y , z ) ∨ ¬ Emp ( x , y ′ , z ′ ) ∨ z = z ′ Jan Chomicki University at Buffalo CQA October 2017 16 / 34

  27. Query Rewriting Building queries that compute CQAs relational calculus (algebra) ❀ relational calculus (algebra) SQL ❀ SQL leads to PTIME data complexity Query Emp ( x , y , z ) Integrity constraint ∀ x , y , z , y ′ , z ′ . ¬ Emp ( x , y , z ) ∨ ¬ Emp ( x , y ′ , z ′ ) ∨ z = z ′ Jan Chomicki University at Buffalo CQA October 2017 16 / 34

  28. Query Rewriting Building queries that compute CQAs relational calculus (algebra) ❀ relational calculus (algebra) SQL ❀ SQL leads to PTIME data complexity Query Emp ( x , y , z ) Integrity constraint ∀ x , y , z , y ′ , z ′ . ¬ Emp ( x , y , z ) ∨ ¬ Emp ( x , y ′ , z ′ ) ∨ z = z ′ Rewritten query Emp ( x , y , z ) ∧ ∀ y ′ , z ′ . ¬ Emp ( x , y ′ , z ′ ) ∨ z = z ′ Jan Chomicki University at Buffalo CQA October 2017 16 / 34

  29. The Scope of Query Rewriting (Arenas, Bertossi, Ch. [ABC99]) Integrity constraints: binary universal Queries: conjunctions of literals (relational algebra: σ, × , − ) Jan Chomicki University at Buffalo CQA October 2017 17 / 34

  30. The Scope of Query Rewriting (Arenas, Bertossi, Ch. [ABC99]) Integrity constraints: binary universal Queries: conjunctions of literals (relational algebra: σ, × , − ) (Fuxman, Miller [FM07]) Integrity constraints: primary key functional dependencies Queries: C forest a class of conjunctive queries ( π, σ, × ) no cycles no non-key or non-full joins no repeated relation symbols no built-ins Jan Chomicki University at Buffalo CQA October 2017 17 / 34

  31. SQL Rewriting SQL query SELECT Name FROM Emp WHERE Salary ≥ 10K Jan Chomicki University at Buffalo CQA October 2017 18 / 34

  32. SQL Rewriting SQL query SELECT Name FROM Emp WHERE Salary ≥ 10K SQL rewritten query SELECT e1.Name FROM Emp e1 WHERE e1.Salary ≥ 10K AND NOT EXISTS (SELECT * FROM EMPLOYEE e2 WHERE e2.Name = e1.Name AND e2.Salary < 10K) Jan Chomicki University at Buffalo CQA October 2017 18 / 34

  33. SQL Rewriting SQL query SELECT Name FROM Emp WHERE Salary ≥ 10K SQL rewritten query SELECT e1.Name FROM Emp e1 WHERE e1.Salary ≥ 10K AND NOT EXISTS (SELECT * FROM EMPLOYEE e2 WHERE e2.Name = e1.Name AND e2.Salary < 10K) (Fuxman, Fazli, Miller [FM05]) ConQuer: a system for computing CQAs conjunctive ( C forest ) and aggregation SQL queries databases can be annotated with consistency indicators tested on TPC-H queries and medium-size databases Jan Chomicki University at Buffalo CQA October 2017 18 / 34

  34. Conflict Hypergraph Vertices Tuples in the database. (Gates, Redmond, 20M) (Musk, Palo Alto, 10M) (Gates, Redmond, 30M) Jan Chomicki University at Buffalo CQA October 2017 19 / 34

  35. Conflict Hypergraph Vertices Tuples in the database. (Gates, Redmond, 20M) Edges Minimal sets of tuples (Musk, Palo Alto, 10M) violating a constraint. (Gates, Redmond, 30M) Jan Chomicki University at Buffalo CQA October 2017 19 / 34

  36. Conflict Hypergraph Vertices Tuples in the database. (Gates, Redmond, 20M) Edges Minimal sets of tuples (Musk, Palo Alto, 10M) violating a constraint. Repairs (Gates, Redmond, 30M) Maximal independent sets in the conflict graph. Jan Chomicki University at Buffalo CQA October 2017 19 / 34

  37. Conflict Hypergraph Vertices Tuples in the database. (Gates, Redmond, 20M) Edges Minimal sets of tuples (Musk, Palo Alto, 10M) violating a constraint. Repairs (Gates, Redmond, 30M) Maximal independent sets in the conflict graph. Jan Chomicki University at Buffalo CQA October 2017 19 / 34

  38. Conflict Hypergraph Vertices Tuples in the database. (Gates, Redmond, 20M) Edges Minimal sets of tuples (Musk, Palo Alto, 10M) violating a constraint. Repairs (Gates, Redmond, 30M) Maximal independent sets in the conflict graph. Representation applicable only to denial constraints. Jan Chomicki University at Buffalo CQA October 2017 19 / 34

  39. Computing CQAs Using Conflict Hypergraphs Algorithm HProver INPUT: query Φ a disjunction of ground literals, conflict hypergraph G OUTPUT: is Φ false in some repair of D w.r.t. IC ? ALGORITHM: ¬ Φ = P 1 ( t 1 ) ∧ · · · ∧ P m ( t m ) ∧ ¬ P m +1 ( t m +1 ) ∧ · · · ∧ ¬ P n ( t n ) 1 find a consistent set of facts S such that 2 S ⊇ { P 1 ( t 1 ) , . . . , P m ( t m ) } for every fact A ∈ { P m +1 ( t m +1 ) , . . . , P n ( t n ) } : A �∈ D or there is an edge E = { A , B 1 , . . . , B m } in G and S ⊇ { B 1 , . . . , B m } . Jan Chomicki University at Buffalo CQA October 2017 20 / 34

  40. Computing CQAs Using Conflict Hypergraphs Algorithm HProver INPUT: query Φ a disjunction of ground literals, conflict hypergraph G OUTPUT: is Φ false in some repair of D w.r.t. IC ? ALGORITHM: ¬ Φ = P 1 ( t 1 ) ∧ · · · ∧ P m ( t m ) ∧ ¬ P m +1 ( t m +1 ) ∧ · · · ∧ ¬ P n ( t n ) 1 find a consistent set of facts S such that 2 S ⊇ { P 1 ( t 1 ) , . . . , P m ( t m ) } for every fact A ∈ { P m +1 ( t m +1 ) , . . . , P n ( t n ) } : A �∈ D or there is an edge E = { A , B 1 , . . . , B m } in G and S ⊇ { B 1 , . . . , B m } . (Ch., Marcinkowski, Staworko [CMS04]) Hippo: a system for computing CQAs in PTIME quantifier-free queries and denial constraints only edges of the conflict hypergraph are kept in main memory optimization can eliminate many (sometimes all) database accesses in HProver tested for medium-size synthetic databases Jan Chomicki University at Buffalo CQA October 2017 20 / 34

  41. Logic programs Specifying repairs as answer sets of logic programs (Arenas, Bertossi, Ch. [ABC03]) (Greco, Greco, Zumpano [GGZ03]) (Cal` ı, Lembo, Rosati [CLR03b]) Jan Chomicki University at Buffalo CQA October 2017 21 / 34

  42. Logic programs Specifying repairs as answer sets of logic programs (Arenas, Bertossi, Ch. [ABC03]) (Greco, Greco, Zumpano [GGZ03]) (Cal` ı, Lembo, Rosati [CLR03b]) Example emp ( x , y , z ) ← emp D ( x , y , z ) , not dubious emp ( x , y , z ) . dubious emp ( x , y , z ) ← emp D ( x , y , z ) , emp ( x , y ′ , z ′ ) , y � = y ′ . dubious emp ( x , y , z ) ← emp D ( x , y , z ) , emp ( x , y ′ , z ′ ) , z � = z ′ . Jan Chomicki University at Buffalo CQA October 2017 21 / 34

  43. Logic programs Specifying repairs as answer sets of logic programs (Arenas, Bertossi, Ch. [ABC03]) (Greco, Greco, Zumpano [GGZ03]) (Cal` ı, Lembo, Rosati [CLR03b]) Example emp ( x , y , z ) ← emp D ( x , y , z ) , not dubious emp ( x , y , z ) . dubious emp ( x , y , z ) ← emp D ( x , y , z ) , emp ( x , y ′ , z ′ ) , y � = y ′ . dubious emp ( x , y , z ) ← emp D ( x , y , z ) , emp ( x , y ′ , z ′ ) , z � = z ′ . Answer sets { emp ( Gates , Redmond , 20 M ) , emp ( Musk , PaloAlto , 10 M ) , . . . } { emp ( Gates , Redmond , 30 M ) , emp ( Musk , PaloAlto , 10 M ) , . . . } Jan Chomicki University at Buffalo CQA October 2017 21 / 34

  44. Logic Programs for computing CQAs Logic Programs disjunction and classical negation checking whether an atom is in all answer sets is Π p 2 -complete dlv , smodels , . . . Jan Chomicki University at Buffalo CQA October 2017 22 / 34

  45. Logic Programs for computing CQAs Logic Programs disjunction and classical negation checking whether an atom is in all answer sets is Π p 2 -complete dlv , smodels , . . . Scope arbitrary first-order queries and universal constraints approach unlikely to yield tractable cases Jan Chomicki University at Buffalo CQA October 2017 22 / 34

  46. Logic Programs for computing CQAs Logic Programs disjunction and classical negation checking whether an atom is in all answer sets is Π p 2 -complete dlv , smodels , . . . Scope arbitrary first-order queries and universal constraints approach unlikely to yield tractable cases INFOMIX (Eiter et al. [EFGL03]) combines CQA with data integration (GAV) uses dlv for repair computations optimization techniques: localization, factorization tested on small-to-medium-size legacy databases Jan Chomicki University at Buffalo CQA October 2017 22 / 34

  47. Logic Programs for computing CQAs Logic Programs disjunction and classical negation checking whether an atom is in all answer sets is Π p 2 -complete dlv , smodels , . . . Scope arbitrary first-order queries and universal constraints approach unlikely to yield tractable cases INFOMIX (Eiter et al. [EFGL03]) combines CQA with data integration (GAV) uses dlv for repair computations optimization techniques: localization, factorization tested on small-to-medium-size legacy databases complexity Jan Chomicki University at Buffalo CQA October 2017 22 / 34

  48. Co-NP-completeness of CQA Theorem (Ch., Marcinkowski [CM05a]) For primary-key functional dependencies and conjunctive queries, consistent query answering is data-complete for co-NP. Jan Chomicki University at Buffalo CQA October 2017 23 / 34

  49. Co-NP-completeness of CQA Theorem (Ch., Marcinkowski [CM05a]) For primary-key functional dependencies and conjunctive queries, consistent query answering is data-complete for co-NP. Proof. Membership: V is a repair iff V | = IC and W �| = IC if W = V ∪ M . Co-NP-hardness: reduction from MONOTONE 3-SAT. Positive clauses β 1 = φ 1 ∧ · · · ∧ φ m , negative clauses β 2 = ψ m +1 ∧ · · · ∧ ψ l . 1 Database D contains two binary relations R ( A , B ) and S ( A , B ): 2 R ( i , p ) if variable p occurs in φ i , i = 1 , . . . , m . S ( i , p ) if variable p occurs in ψ i , i = m + 1 , . . . , l . A is the primary key of both R and S . 3 � � Query Q ≡ ∃ x , y , z . R ( x , y ) ∧ S ( z , y ) . 4 There is an assignment which satisfies β 1 ∧ β 2 iff there exists a repair in which Q is 5 false. Jan Chomicki University at Buffalo CQA October 2017 23 / 34

  50. Co-NP-completeness of CQA Theorem (Ch., Marcinkowski [CM05a]) For primary-key functional dependencies and conjunctive queries, consistent query answering is data-complete for co-NP. Proof. Membership: V is a repair iff V | = IC and W �| = IC if W = V ∪ M . Co-NP-hardness: reduction from MONOTONE 3-SAT. Positive clauses β 1 = φ 1 ∧ · · · ∧ φ m , negative clauses β 2 = ψ m +1 ∧ · · · ∧ ψ l . 1 Database D contains two binary relations R ( A , B ) and S ( A , B ): 2 R ( i , p ) if variable p occurs in φ i , i = 1 , . . . , m . S ( i , p ) if variable p occurs in ψ i , i = m + 1 , . . . , l . A is the primary key of both R and S . 3 � � Query Q ≡ ∃ x , y , z . R ( x , y ) ∧ S ( z , y ) . 4 There is an assignment which satisfies β 1 ∧ β 2 iff there exists a repair in which Q is 5 false. Q does not belong to C forest . Jan Chomicki University at Buffalo CQA October 2017 23 / 34

  51. Data complexity of CQA Primary keys Arbitrary keys Denial Universal σ, × , − σ, × , − , ∪ σ, π σ, π, × σ, π, × , − , ∪ Jan Chomicki University at Buffalo CQA October 2017 24 / 34

  52. Data complexity of CQA Primary keys Arbitrary keys Denial Universal σ, × , − PTIME PTIME PTIME: binary σ, × , − , ∪ σ, π σ, π, × σ, π, × , − , ∪ (Arenas, Bertossi, Ch. [ABC99]) Jan Chomicki University at Buffalo CQA October 2017 24 / 34

  53. Data complexity of CQA Primary keys Arbitrary keys Denial Universal σ, × , − PTIME PTIME PTIME PTIME: binary σ, × , − , ∪ PTIME PTIME PTIME PTIME co-NPC co-NPC σ, π σ, π, × co-NPC co-NPC co-NPC σ, π, × , − , ∪ co-NPC co-NPC co-NPC (Arenas, Bertossi, Ch. [ABC99]) (Ch., Marcinkowski [CM05a]) Jan Chomicki University at Buffalo CQA October 2017 24 / 34

  54. Data complexity of CQA Primary keys Arbitrary keys Denial Universal σ, × , − PTIME PTIME PTIME PTIME: binary σ, × , − , ∪ PTIME PTIME PTIME PTIME co-NPC co-NPC σ, π σ, π, × co-NPC co-NPC co-NPC PTIME: C forest σ, π, × , − , ∪ co-NPC co-NPC co-NPC (Arenas, Bertossi, Ch. [ABC99]) (Ch., Marcinkowski [CM05a]) (Fuxman, Miller [FM07]) Jan Chomicki University at Buffalo CQA October 2017 24 / 34

  55. Data complexity of CQA Primary keys Arbitrary keys Denial Universal σ, × , − PTIME PTIME PTIME PTIME: binary Π p 2 -complete Π p σ, × , − , ∪ PTIME PTIME PTIME 2 -complete Π p PTIME co-NPC co-NPC 2 -complete σ, π Π p σ, π, × co-NPC co-NPC co-NPC 2 -complete PTIME: C forest Π p σ, π, × , − , ∪ co-NPC co-NPC co-NPC 2 -complete (Arenas, Bertossi, Ch. [ABC99]) (Ch., Marcinkowski [CM05a]) (Fuxman, Miller [FM07]) (Staworko, Ph.D., 2007), (Staworko, Ch., 2008): quantifier-free queries co-NPC for full TGDs and denial constraints PTIME for acyclic full TGDs, join dependencies and denial constraints Last Jan Chomicki University at Buffalo CQA October 2017 24 / 34

  56. Dichotomy Complexity of self-join-free conjunctive queries (Koutris, Wijsen [KW17]) it can be decided whether or not CQA can be computed by a first-order query (and if so the corresponding SQL query is easily computable) computing CQA is either in PTIME or co-NP complete (and it can be decided which case applies) Jan Chomicki University at Buffalo CQA October 2017 25 / 34

  57. The Explosion of Semantics Tuple-based repairs asymmetric treatment of insertion and deletion: repairs by minimal deletions only (Ch., Marcinkowski [CM05a]): data possibly incorrect but complete repairs by minimal deletions and arbitrary insertions (Cal` ı, Lembo, Rosati [CLR03a]): data possibly incorrect and incomplete minimal cardinality changes (Lopatenko, Bertossi [LB07]), more... Jan Chomicki University at Buffalo CQA October 2017 26 / 34

  58. The Explosion of Semantics Tuple-based repairs asymmetric treatment of insertion and deletion: repairs by minimal deletions only (Ch., Marcinkowski [CM05a]): data possibly incorrect but complete repairs by minimal deletions and arbitrary insertions (Cal` ı, Lembo, Rosati [CLR03a]): data possibly incorrect and incomplete minimal cardinality changes (Lopatenko, Bertossi [LB07]), more... Attribute-based repairs repairs of minimum cost (Bohannon et al. [BFFR05]) checking existence of a repair of cost < K NP-complete. Clean Last Jan Chomicki University at Buffalo CQA October 2017 26 / 34

  59. The Need for Attribute-based Repairing Tuple-based repairing leads to information loss. Jan Chomicki University at Buffalo CQA October 2017 27 / 34

  60. The Need for Attribute-based Repairing Tuple-based repairing leads to information loss. EmpDept Name Dept Location John Sales Buffalo Mary Sales Toronto Name → Dept Dept → City Jan Chomicki University at Buffalo CQA October 2017 27 / 34

  61. The Need for Attribute-based Repairing Tuple-based repairing leads to information loss. Name Dept Location John Sales Buffalo EmpDept Name → Dept Name Dept Location Dept → City John Sales Buffalo Mary Sales Toronto Name Dept Location Name → Dept Mary Sales Toronto Dept → City Name → Dept Dept → City Jan Chomicki University at Buffalo CQA October 2017 27 / 34

  62. Attribute-based Repairs through Tuple-based Repairs (Wijsen [Wij06]) Repair the lossless join decomposition: π Name , Dept ( EmpDept ) ✶ π Dept , Location ( EmpDept ) Jan Chomicki University at Buffalo CQA October 2017 28 / 34

  63. Attribute-based Repairs through Tuple-based Repairs (Wijsen [Wij06]) Repair the lossless join decomposition: π Name , Dept ( EmpDept ) ✶ π Dept , Location ( EmpDept ) Name Dept Location John Sales Buffalo John Sales Toronto Mary Sales Buffalo Mary Sales Toronto Name → Dept Dept → City Jan Chomicki University at Buffalo CQA October 2017 28 / 34

  64. Attribute-based Repairs through Tuple-based Repairs (Wijsen [Wij06]) Repair the lossless join decomposition: π Name , Dept ( EmpDept ) ✶ π Dept , Location ( EmpDept ) Name Dept Location John Sales Buffalo Mary Sales Buffalo Name Dept Location Name → Dept John Sales Buffalo Dept → City John Sales Toronto Mary Sales Buffalo Name Dept Location Mary Sales Toronto John Sales Toronto Name → Dept Mary Sales Toronto Dept → City Name → Dept Dept → City Jan Chomicki University at Buffalo CQA October 2017 28 / 34

  65. Probabilistic framework for “dirty” databases (Andritsos, Fuxman, Miller [AFM06]) potential duplicates identified and grouped into clusters worlds ≈ repairs: one tuple from each cluster world probability: product of tuple probabilities clean answers: in the query result in some (supporting) world clean answer probability: sum of the probabilities of supporting worlds consistent answer: clean answer with probability 1 Jan Chomicki University at Buffalo CQA October 2017 29 / 34

  66. Probabilistic framework for “dirty” databases (Andritsos, Fuxman, Miller [AFM06]) potential duplicates identified and grouped into clusters worlds ≈ repairs: one tuple from each cluster world probability: product of tuple probabilities clean answers: in the query result in some (supporting) world clean answer probability: sum of the probabilities of supporting worlds consistent answer: clean answer with probability 1 Salaries with probabilities EmpProb Name Salary Prob Gates 20M 0.7 Gates 30M 0.3 Musk 10M 0.5 Musk 20M 0.5 Jan Chomicki University at Buffalo CQA October 2017 29 / 34 Name → Salary

  67. Computing Clean Answers SQL query SELECT Name FROM EmpProb e WHERE e.Salary > 15M Jan Chomicki University at Buffalo CQA October 2017 30 / 34

  68. Computing Clean Answers SQL query SQL rewritten query SELECT Name SELECT e.Name,SUM(e.Prob) FROM EmpProb e FROM EmpProb e WHERE e.Salary > 15M WHERE e.Salary > 15M GROUP BY e.Name Jan Chomicki University at Buffalo CQA October 2017 30 / 34

  69. Computing Clean Answers SQL query SQL rewritten query SELECT Name SELECT e.Name,SUM(e.Prob) FROM EmpProb e FROM EmpProb e WHERE e.Salary > 15M WHERE e.Salary > 15M GROUP BY e.Name EmpProb Name Salary Prob Gates 20M 0.7 Gates 30M 0.3 Musk 10M 0.5 Musk 20M 0.5 Name → Salary Jan Chomicki University at Buffalo CQA October 2017 30 / 34

Recommend


More recommend