Logic As a Query Language



  1. Logic As a Query Language
     Datalog, Logical Rules, Recursion, SQL-99 Recursion

     Logic As a Query Language
     • If-then logical rules have been used in many systems.
       - Most important today: EII (Enterprise Information Integration).
     • Nonrecursive rules are equivalent to the core relational algebra.
     • Recursive rules extend relational algebra and have been used to add recursion to SQL-99.

     A Logical Rule
     • Our first example of a rule uses the relations:
       - Frequents(customer,rest),
       - Likes(customer,soda), and
       - Sells(rest,soda,price).
     • The rule is a query asking for "happy" customers: those that frequent a rest that serves a soda that they like.

     Anatomy of a Rule
         Happy(d) <- Frequents(d,rest) AND
                     Likes(d,soda) AND Sells(rest,soda,p)
     • Head = "consequent," a single sub-goal: Happy(d).
     • Body = "antecedent" = AND of sub-goals: everything to the right of <-.
     • Read the symbol <- as "if".

     Sub-goals Are Atoms
     • An atom is a predicate, or relation name, with variables or constants as arguments.
     • The head is an atom; the body is the AND of one or more atoms.
     • Convention: predicates begin with a capital, variables begin with lower-case.

  2. Example: Atom
         Sells(rest, soda, p)
     • The predicate Sells = name of a relation.
     • The arguments rest, soda, and p are variables.

     Interpreting Rules
     • A variable appearing in the head is called distinguished;
       otherwise it is nondistinguished.
     • Rule meaning: the head is true of the distinguished variables
       if there exist values of the nondistinguished variables
       that make all sub-goals of the body true.

     Example: Interpretation
         Happy(d) <- Frequents(d,rest) AND
                     Likes(d,soda) AND Sells(rest,soda,p)
     • Distinguished variable: d.
     • Nondistinguished variables: rest, soda, p.
     • Interpretation: customer d is happy if there exist a rest, a soda, and a price p such that d frequents the rest, likes the soda, and the rest sells the soda at price p.
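The interpretation of the Happy rule can be sketched in Python. This is only an illustration, not part of the slides: the relations are modelled as sets of tuples, and every fact below (the customers, rests, sodas, and prices) is invented. The nested loops play the role of "there exist values of the nondistinguished variables".

```python
# Hypothetical EDB facts, invented purely for illustration.
frequents = {("ann", "joes"), ("bob", "sues")}             # Frequents(customer, rest)
likes = {("ann", "cola"), ("bob", "lemonade")}             # Likes(customer, soda)
sells = {("joes", "cola", 1.50), ("sues", "cola", 1.25)}   # Sells(rest, soda, price)


def happy(frequents, likes, sells):
    """Happy(d) <- Frequents(d,rest) AND Likes(d,soda) AND Sells(rest,soda,p)."""
    result = set()
    for d, rest in frequents:
        for d2, soda in likes:
            for rest2, soda2, _p in sells:
                # A shared variable must take the same value in every sub-goal.
                if d == d2 and rest == rest2 and soda == soda2:
                    result.add(d)  # only the distinguished variable d is kept
    return result


print(happy(frequents, likes, sells))
```

With these made-up facts only ann qualifies: bob frequents a rest that does not sell the soda he likes.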

  3. Arithmetic Sub-goals
     • In addition to relations as predicates, a predicate for a sub-goal of the body can be an arithmetic comparison.
     • We write such sub-goals in the usual way, e.g.: x < y.

     Example: Arithmetic
     • A soda is "cheap" if there are at least two rests that sell it for under $1.
     • Figure out a rule that would determine whether a soda is cheap or not.

         Cheap(soda) <-
             Sells(rest1,soda,p1) AND
             Sells(rest2,soda,p2) AND
             p1 < 1.00 AND p2 < 1.00 AND rest1 <> rest2

     Negated Sub-goals
     • We may put "NOT" in front of a sub-goal, to negate its meaning.
     • Example: Think of Arc(a,b) as arcs in a graph.
       - S(x,y) says the graph is not transitive from x to y; i.e., there is a path of length 2 from x to y, but no arc from x to y.

         S(x,y) <- Arc(x,z) AND Arc(z,y)
                   AND NOT Arc(x,y)

     Algorithms for Applying Rules
     • Two approaches:
       1. Variable-based: Consider all possible assignments to the variables of the body. If the assignment makes the body true, add that tuple for the head to the result.
       2. Tuple-based: Consider all assignments of tuples from the non-negated, relational sub-goals. If the body becomes true, add the head's tuple to the result.
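The Cheap rule mixes relational sub-goals with arithmetic comparisons. A minimal Python sketch of what it computes follows; the Sells facts are hypothetical and the function name cheap is chosen here for illustration only.

```python
# Hypothetical Sells(rest, soda, price) facts, invented for illustration.
sells = {
    ("joes", "cola", 0.90),
    ("sues", "cola", 0.95),
    ("joes", "lemonade", 0.80),
    ("sues", "lemonade", 1.20),
}


def cheap(sells):
    """Cheap(soda) <- Sells(rest1,soda,p1) AND Sells(rest2,soda,p2)
                      AND p1 < 1.00 AND p2 < 1.00 AND rest1 <> rest2."""
    result = set()
    for rest1, soda1, p1 in sells:
        for rest2, soda2, p2 in sells:
            same_soda = soda1 == soda2            # shared variable "soda"
            both_under_a_dollar = p1 < 1.00 and p2 < 1.00
            if same_soda and both_under_a_dollar and rest1 != rest2:
                result.add(soda1)
    return result


print(cheap(sells))
```

In this made-up data cola is cheap (two different rests sell it under $1), while lemonade is not, because only one rest sells it under $1.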

  4. Example: Variable-Based (1)
         S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
     • Arc(1,2) and Arc(2,3) are the only tuples in the Arc relation.
     • The only assignments that make the first sub-goal Arc(x,z) true are:
       1. x = 1; z = 2
       2. x = 2; z = 3

     Example: Variable-Based; x = 1, z = 2
         S(1,y) <- Arc(1,2) AND Arc(2,y) AND NOT Arc(1,y)
     • 3 is the only value of y that makes all three sub-goals true:
         S(1,3) <- Arc(1,2) AND Arc(2,3) AND NOT Arc(1,3)
     • This makes S(1,3) a tuple of the answer.

     Example: Variable-Based; x = 2, z = 3
         S(2,y) <- Arc(2,3) AND Arc(3,y) AND NOT Arc(2,y)
     • No value of y makes Arc(3,y) true.
     • Thus, no contribution to the head tuples; S = {(1,3)}.

     Tuple-Based Assignment
     • Start with the non-negated, relational sub-goals only.
     • Consider all assignments of tuples to these sub-goals.
       - Choose tuples only from the corresponding relations.
     • If the assigned tuples give a consistent value to all variables and make the other sub-goals true, add the head tuple to the result.
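Both approaches can be sketched side by side in Python for the rule S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y). This is an illustrative sketch, not the slides' own code. One assumption is made for the variable-based version: variables range only over values that occur in Arc, which is safe here because every variable appears in a non-negated sub-goal.

```python
from itertools import product

arc = {(1, 2), (2, 3)}  # the Arc relation used in the example


def variable_based(arc):
    """Try every assignment of values to the variables x, y, z."""
    domain = {value for pair in arc for value in pair}  # assumed value range
    result = set()
    for x, y, z in product(domain, repeat=3):
        if (x, z) in arc and (z, y) in arc and (x, y) not in arc:
            result.add((x, y))
    return result


def tuple_based(arc):
    """Try every assignment of Arc tuples to the two non-negated sub-goals."""
    result = set()
    for (x, z1), (z2, y) in product(arc, repeat=2):
        if z1 != z2:
            continue  # the two tuples disagree on z: inconsistent assignment
        if (x, y) not in arc:  # check the remaining sub-goal NOT Arc(x,y)
            result.add((x, y))
    return result


print(variable_based(arc), tuple_based(arc))
```

The variable-based loop examines 27 assignments (3 values for each of 3 variables), while the tuple-based loop examines only 4 pairs of tuples; both produce the same relation S.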

  5. Example: Tuple-Based
         S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)
     • Only possible values for the first two sub-goals: Arc(1,2), Arc(2,3).
     • Four possible assignments to the first two sub-goals:

         Arc(x,z)   Arc(z,y)   z values   Consistent?
         (1,2)      (1,2)      2 and 1    no
         (1,2)      (2,3)      2 and 2    yes
         (2,3)      (1,2)      3 and 1    no
         (2,3)      (2,3)      3 and 2    no

     • The second row is the only assignment with a consistent z-value. Since it also makes NOT Arc(x,y) true, add S(1,3) to the result.
     • The last two rows are invalid since z can't be (3 and 1) or (3 and 2) simultaneously; the first row fails the same way (2 and 1).

     Datalog Programs
     • A Datalog program is a collection of rules.
     • In a program, predicates can be either
       1. EDB = Extensional Database = stored table.
       2. IDB = Intensional Database = relation defined by rules.
     • Never both! No EDB in heads.

     Evaluating Datalog Programs
     • As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of a predicate's rules have already been evaluated.
     • If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

     Example: Datalog Program
     • Using the following EDB, find all the manufacturers of sodas Joe doesn't sell:
       - Sells(rest, soda, price) and
       - Sodas(name, manf).

         JoeSells(b) <- Sells('Joe''s rest', b, p)
         Answer(m) <- Sodas(b,m) AND NOT JoeSells(b)

     Expressive Power of Datalog
     • Without recursion, Datalog can express all and only the queries of core relational algebra.
       - The same as SQL select-from-where, without aggregation and grouping.
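The JoeSells/Answer program shows the evaluation order for nonrecursive programs: JoeSells has only EDB predicates in its body, so it is evaluated first, and Answer is evaluated once JoeSells is known. A minimal Python sketch, with invented EDB facts and hypothetical function names:

```python
# Hypothetical EDB facts, invented for illustration.
sells = {                          # Sells(rest, soda, price)
    ("Joe's rest", "cola", 1.50),
    ("Sue's rest", "cola", 1.40),
    ("Sue's rest", "lemonade", 1.25),
}
sodas = {                          # Sodas(name, manf)
    ("cola", "AcmeCo"),
    ("root beer", "AcmeCo"),
    ("lemonade", "Zesty"),
}


def joe_sells(sells):
    """JoeSells(b) <- Sells('Joe''s rest', b, p)."""
    return {b for rest, b, _p in sells if rest == "Joe's rest"}


def answer(sodas, joe_sells_rel):
    """Answer(m) <- Sodas(b,m) AND NOT JoeSells(b)."""
    return {m for b, m in sodas if b not in joe_sells_rel}


joe_sells_rel = joe_sells(sells)            # evaluated first: body is EDB only
answer_rel = answer(sodas, joe_sells_rel)   # evaluated second: uses JoeSells
print(joe_sells_rel, answer_rel)
```

In this made-up data AcmeCo appears in the answer even though Joe sells its cola, because the rule only requires one soda by that manufacturer that Joe doesn't sell (root beer here).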

  6. Expressive Power of Datalog
     • But with recursion, Datalog can express more than these languages.
     • Yet still not Turing-complete.

     Recursive Example: Generalized Cousins
     • EDB: Parent(c,p) = p is a parent of c.
     • Generalized cousins: people with common ancestors one or more generations back.
     • Note: We are all cousins according to this definition.

     Recursive Example
         Sibling(x,y) <- Parent(x,p)
                         AND Parent(y,p)
                         AND x <> y
         Cousin(x,y) <- Sibling(x,y)
         Cousin(x,y) <- Parent(x,xParent)
                        AND Parent(y,yParent)
                        AND Cousin(xParent,yParent)

     Definition of Recursion
     • Form a dependency graph whose nodes = IDB predicates.
     • Arc X -> Y if and only if there is a rule with X in the head and Y in the body.
     • Cycle = recursion; no cycle = no recursion.

     Example: Dependency Graphs
     • Recursive: Cousin -> Sibling and Cousin -> Cousin (a cycle).
     • Non-recursive: Answer -> JoeSells (no cycle).

     Evaluating Recursive Rules
     • The following works when there is no negation:
       1. Start by assuming all IDB relations are empty.
       2. Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB.
       3. End when no change to IDB.
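The dependency-graph test for recursion can be sketched in Python. In this illustration each rule is reduced to a pair (head predicate, list of body predicates); that encoding and the function names are assumptions made here, not something the slides prescribe.

```python
def dependency_graph(rules, idb):
    """Arc X -> Y iff some rule has X in the head and IDB predicate Y in the body."""
    graph = {pred: set() for pred in idb}
    for head, body in rules:
        graph[head].update(pred for pred in body if pred in idb)
    return graph


def is_recursive(graph):
    """Return True if the dependency graph contains a cycle (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:       # back edge: cycle found
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


cousin_rules = [
    ("Sibling", ["Parent", "Parent"]),
    ("Cousin", ["Sibling"]),
    ("Cousin", ["Parent", "Parent", "Cousin"]),
]
answer_rules = [
    ("JoeSells", ["Sells"]),
    ("Answer", ["Sodas", "JoeSells"]),
]

print(is_recursive(dependency_graph(cousin_rules, {"Sibling", "Cousin"})))
print(is_recursive(dependency_graph(answer_rules, {"JoeSells", "Answer"})))
```

The Cousin program is reported as recursive because of the self-loop Cousin -> Cousin; the JoeSells/Answer program has no cycle.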

  7. The "Naïve" Evaluation Algorithm
     1. Start: IDB = ∅.
     2. Apply rules to IDB, EDB.
     3. Change to IDB? If yes, repeat step 2; if no, done.

     Example: Evaluation of Cousin
     • Remember the rules:
         Sibling(x,y) <- Parent(x,p) AND Parent(y,p) AND x <> y
         Cousin(x,y) <- Sibling(x,y)
         Cousin(x,y) <- Parent(x,xParent) AND Parent(y,yParent)
                        AND Cousin(xParent,yParent)

     Semi-naive Evaluation
     • Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round.
     • Saves work; lets us avoid rediscovering most known facts.
       - A fact could still be derived in a second way.

     Example: Evaluation of Cousin
     • We'll proceed in rounds to infer Sibling facts (red) and Cousin facts (green).

     Parent Data: Parent Above Child
     • The parent data: each edge goes downward from a parent to a child.
     [Figure: parent graph with nodes a, d / b, c, e / f, g, h / j, k, i, listed row by row from top to bottom]
     • Exercises:
       1. List some of the parent-child relationships.
       2. What is contained in the Sibling and Cousin data?
       3. What do you expect after the first round?
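Naive and semi-naive evaluation of the Sibling/Cousin rules can be compared in a short Python sketch. The Parent facts below are hypothetical (they are not the data from the figure), and the semi-naive version is simplified on the assumption that Sibling, being nonrecursive, can be computed once up front.

```python
# Hypothetical Parent(c, p) facts: p is a parent of c.
parent = {("bea", "al"), ("cy", "al"), ("fay", "bea"), ("gus", "cy")}


def apply_rules(parent, sib, cousin):
    """One round: evaluate every rule using the EDB and the previous IDB."""
    new_sib = {(x, y) for x, p in parent for y, q in parent if p == q and x != y}
    new_cousin = set(sib)  # Cousin(x,y) <- Sibling(x,y)
    new_cousin |= {(x, y) for x, xp in parent for y, yp in parent
                   if (xp, yp) in cousin}  # recursive Cousin rule
    return new_sib, new_cousin


def naive(parent):
    sib, cousin = set(), set()  # start with all IDB relations empty
    while True:
        new_sib, new_cousin = apply_rules(parent, sib, cousin)
        if (new_sib, new_cousin) == (sib, cousin):  # no change: done
            return sib, cousin
        sib, cousin = new_sib, new_cousin


def semi_naive(parent):
    sib = {(x, y) for x, p in parent for y, q in parent if p == q and x != y}
    cousin = set(sib)
    delta = set(sib)  # Cousin facts first obtained on the previous round
    while delta:
        # Only use the recursive rule with a Cousin fact from delta.
        derived = {(x, y) for x, xp in parent for y, yp in parent
                   if (xp, yp) in delta}
        delta = derived - cousin  # a fact derived a second way is not new
        cousin |= delta
    return sib, cousin


print(naive(parent) == semi_naive(parent))
```

Both functions reach the same fixed point; the semi-naive loop joins Parent only against the facts that were new in the previous round instead of the whole Cousin relation.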
