Natural Language Processing CSCI 4152/6509 Lecture 27: Parsing with Prolog

  1. Natural Language Processing, CSCI 4152/6509, Lecture 27: Parsing with Prolog
     Instructor: Vlado Keselj
     Time and date: 09:35–10:25, 13-Mar-2020
     Location: Dunn 135
     CSCI 4152/6509, Vlado Keselj, Lecture 27

  2. Previous Lecture
     Context-Free Grammars review continued:
       ◮ formal definition
       ◮ inducing a grammar from parse trees
       ◮ derivations
       ◮ some notions and terminology
     Bracket representation of a parse tree
     CYK Chart Parsing Algorithm
     Chomsky Normal Form (CNF)
     CYK algorithm (started)
     CYK Algorithm example

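  3. Explanation of Index Use in CYK
     [Figure: a span of length j starting at position i, marked by β[i,j,k], is split into a left part of length l (positions i to i+l−1), marked by β[i,l,k1], and a right part of length j−l (positions i+l to i+j−1), marked by β[i+l,j−l,k2].]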

  4. CYK Algorithm
     Require: sentence = w_1 ... w_n, and a CFG in CNF with nonterminals N_1 ... N_m, where N_1 is the start symbol
     Ensure: parsed sentence
       allocate matrix β ∈ {0,1}^(n×n×m) and initialize all entries to 0
       for i ← 1 to n do
         for all rules N_k → w_i do
           β[i,1,k] ← 1
       for j ← 2 to n do
         for i ← 1 to n−j+1 do
           for l ← 1 to j−1 do
             for all rules N_k → N_k1 N_k2 do
               β[i,j,k] ← β[i,j,k] OR (β[i,l,k1] AND β[i+l,j−l,k2])
       return β[1,n,1]
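
The recurrence behind the CYK pseudocode can be sketched in Prolog, the language used in the rest of this lecture. This is only an illustration under stated assumptions: the toy grammar, the predicate names binary/3, lexical/2, beta/4 and cyk_recognize/1 are hypothetical, and nth1/3 and between/3 are assumed to be available as in SWI-Prolog. The sketch evaluates the same conditions as β[i,j,k] recursively; it does not fill a table, so it is not an efficient CYK implementation.

```prolog
% Toy grammar in CNF (hypothetical, for illustration only).
% binary(K, K1, K2) encodes N_k -> N_k1 N_k2.
% lexical(K, W) encodes N_k -> w.
binary(s, np, vp).
binary(np, d, n).
lexical(d, the).
lexical(n, dog).
lexical(vp, runs).

% beta(Words, I, J, K): nonterminal K derives the J words that start
% at 1-based position I. This mirrors beta[i,j,k] = 1 in the pseudocode.
beta(Words, I, 1, K) :-
    nth1(I, Words, W),
    lexical(K, W).
beta(Words, I, J, K) :-
    J > 1,
    J1 is J - 1,
    between(1, J1, L),        % the left part has length L
    binary(K, K1, K2),
    beta(Words, I, L, K1),    % left part:  beta[i, l, k1]
    I2 is I + L,
    JL is J - L,
    beta(Words, I2, JL, K2).  % right part: beta[i+l, j-l, k2]

% The sentence is accepted if the start symbol covers all N words.
cyk_recognize(Words) :-
    length(Words, N),
    N > 0,
    beta(Words, 1, N, s), !.
```

With this toy grammar, the query ?- cyk_recognize([the,dog,runs]). succeeds and ?- cyk_recognize([runs,the,dog]). fails.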

  5. Parsing Natural Languages
     Must deal with possible ambiguities
     Decide whether to make a phrase structure or dependency parser
     When parsing natural language, there are generally two approaches:
       1. Backtracking to find all parse trees
       2. Chart parsing

  6. Parsing with Prolog
     We will go over a brief Prolog review
       ◮ more details are provided in the lab
     Implicative normal form:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ q_1 ∨ q_2 ∨ ... ∨ q_m
     If m ≤ 1, then the clause is called a Horn clause.
     If resolution is applied to two Horn clauses, the result is again a Horn clause.
     Inference with Horn clauses is relatively efficient.

  7. Rules
     A Horn clause with m = 1 is called a rule:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ q_1
     It is expressed in Prolog as:
       q1 :- p1, p2, ..., p_n.

  8. Facts
     A Horn clause with m = 1 and no conditions (n = 0) is called a fact:
       ⊤ ⇒ q_1
     It is expressed in Prolog as:
       q1.
     A clause with m = 0 has no head:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ ⊥
     It is written in Prolog as a goal:
       :- p1, p2, ..., p_n.

  9. Rabbit and Franklin Example
     The ‘rabbit and franklin’ example in Prolog:
       hare(rabbit).
       turtle(franklin).
       faster(X,Y) :- hare(X), turtle(Y).
     Save the program in a file and load the file. After loading it, at the Prolog prompt, type:
       faster(rabbit,franklin).
     Try:
       faster(X,franklin).
     and
       faster(X,Y).

  10. Unification and Backtracking
      Two important features of Prolog: unification and backtracking
      What happens after we type:
        ?- faster(rabbit,franklin).
      Prolog will search for a ‘matching’ fact or head of a rule:
        faster(rabbit,franklin) and faster(X,Y) :- ...
      ‘Matching’ here means unification.
      Unification is the operation of making two terms equal by substituting terms for variables.
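
Unification can also be tried directly with the standard =/2 operator, without defining any facts. The following is a minimal sketch; the predicate names unify_demo/2 and no_unifier/0 and the atom bugs are hypothetical and are not part of the lecture's program.

```prolog
% unify_demo(X, Y): unify two faster/2 terms, each containing one variable.
% The substitution X = rabbit, Y = franklin makes both terms equal.
unify_demo(X, Y) :-
    faster(X, franklin) = faster(rabbit, Y).

% Terms with different atoms in the same argument position do not unify.
no_unifier :-
    \+ faster(rabbit, franklin) = faster(bugs, franklin).
```

Here faster(...) is used only as a term, not called as a predicate, so the sketch shows unification in isolation from rule search.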

  11. Unification and Backtracking (2)
      After unifying faster(rabbit,franklin) and faster(X,Y) with the substitution X ← rabbit and Y ← franklin, the rule becomes:
        faster(rabbit,franklin) :- hare(rabbit), turtle(franklin).
      The Prolog interpreter will now try to satisfy the goals on the right-hand side, hare(rabbit) and turtle(franklin), and it will easily succeed because identical facts are in the program.
      If it does not succeed, it can generally try other options through backtracking.
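
Backtracking becomes visible once a goal has more than one solution. The sketch below adds one extra fact, hare(bugs), which is a hypothetical addition and not part of the lecture's program, and uses the standard findall/3 to collect every solution that backtracking would produce. The helper name all_faster/1 is also hypothetical.

```prolog
hare(rabbit).
hare(bugs).            % hypothetical extra fact, added for illustration
turtle(franklin).
faster(X,Y) :- hare(X), turtle(Y).

% Collect all X-Y pairs for which faster(X,Y) can be proved.
% findall/3 forces backtracking over every alternative for hare(X).
all_faster(Pairs) :-
    findall(X-Y, faster(X,Y), Pairs).
```

At the prompt, typing ; after the first answer to ?- faster(X,Y). asks Prolog to backtrack and look for the next one; all_faster/1 gathers the same answers into a list in the order they are found.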

  12. Variables.
      Variable names start with an uppercase letter or an underscore (‘_’).
      ‘_’ is a special, anonymous variable; two occurrences of this variable can represent different values, with no connection between them.
      Examples:
        ?- faster(rabbit,franklin).
        Yes ;
        ...
        ?- faster(rabbit,X).
        X = franklin ;
        ...
        ?- hare(X).
        X = rabbit ;

  13. Lists (Arrays), Structures.
      Lists are implemented as linked lists. Structures (records) are expressed as terms.
      Examples:
        In program: person(john,public,'123-456').
        Interactively: ?- person(john,X,Y).
      [] is an empty list.
      A list is created as a nested term, usually built with a special function ‘.’ (dot):
        ?- is_list(.(a, .(b, .(c, [])))).

  14. List Notation
      .(a, .(b, .(c, []))) is the same as [a,b,c]
      This is also equivalent to:
        [ a | [ b | [ c | [] ]]] or [ a, b | [ c ] ]
      A frequent Prolog expression is [H|T], where H is the head of the list and T is the tail, which is another list.
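
The equivalence of the bracket notations can be checked with a short sketch. The predicate names list_forms_agree/0 and head_tail/2 are hypothetical; the dot form is left out here because some Prolog systems no longer accept it as list syntax.

```prolog
% The three bracket notations denote the same term.
list_forms_agree :-
    [a,b,c] == [a|[b|[c|[]]]],
    [a,b,c] == [a,b|[c]].

% Split the list [a,b,c] into its head H and tail T using [H|T].
head_tail(H, T) :-
    [H|T] = [a,b,c].
```

The query ?- head_tail(H, T). binds H to a and T to [b,c], which shows that the tail is itself a list.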

  15. Example: Calculating Factorial
        factorial(0,1).
        factorial(N,F) :- N>0, M is N-1, factorial(M,FM), F is FM*N.
      After saving in factorial.prolog and loading to Prolog:
        ?- ['factorial.prolog'].
        % factorial.prolog compiled 0.00 sec, 1,000 bytes
        Yes
        ?- factorial(6,X).
        X = 720 ;

  16. Using Prolog to Parse NL
      Example: Let us consider a simple CFG to parse the following two sentences: “the dog runs” and “the dogs run”
      The grammar is:
        S -> NP VP
        NP -> D N
        D -> the
        N -> dog
        N -> dogs
        VP -> run
        VP -> runs

  17. Control structures.
      Example (testing membership of a list):
        member(X, [X|_]).
        member(X, [_|L]) :- member(X,L).
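
The two membership clauses can both test and enumerate elements, since backtracking into the second clause walks down the list. In the sketch below the predicate is renamed my_member/2 (a hypothetical name) so that it does not clash with a library member/2 that many Prolog systems already provide; members/2 is also a hypothetical helper.

```prolog
% First clause: X is the head of the list.
% Second clause: otherwise, look for X in the tail.
my_member(X, [X|_]).
my_member(X, [_|L]) :- my_member(X, L).

% Enumerate every element that my_member/2 can produce by backtracking.
members(List, Xs) :-
    findall(X, my_member(X, List), Xs).
```

For example, ?- my_member(b, [a,b,c]). succeeds, ?- my_member(d, [a,b,c]). fails, and ?- my_member(X, [a,b,c]). yields a, b and c in turn.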

  18. Using Difference Lists
      The problem of parsing with the grammar for “the dog runs” and “the dogs run” can be expressed in Prolog in the following way:
        s(S,R) :- np(S,I), vp(I, R).
        np(S,R) :- d(S,I), n(I,R).
        d([the|R], R).
        n([dog|R], R).
        n([dogs|R], R).
        vp([run|R], R).
        vp([runs|R], R).

  19. Parsing using Difference Lists
      Save this in file parse.prolog. At the Prolog prompt we type:
        ?- ['parse.prolog'].
        % parse.prolog compiled 0.00 sec, 1,888 bytes
        Yes
        ?- s([the,dog,runs],[]).
        Yes
        ?- s([runs,the,dog],[]).
        No
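
In the difference-list encoding, each predicate takes the list of words it starts with and returns the list that remains after it has consumed its phrase. The sketch below repeats the program so that it is self-contained and adds one hypothetical helper, rest_after_np/2, to make the leftover list visible.

```prolog
% Each predicate relates an input list to the remainder left after
% the corresponding phrase has been consumed from its front.
s(S,R)  :- np(S,I), vp(I,R).
np(S,R) :- d(S,I), n(I,R).
d([the|R], R).
n([dog|R], R).
n([dogs|R], R).
vp([run|R], R).
vp([runs|R], R).

% rest_after_np(Words, Rest): Rest is what follows a leading noun phrase.
rest_after_np(Words, Rest) :-
    np(Words, Rest).
```

For instance, ?- rest_after_np([the,dog,runs], Rest). gives Rest = [runs]: the noun phrase consumed "the dog". Requiring the remainder of s/2 to be [] means the whole sentence must be consumed. This version has no number agreement, so ?- s([the,dog,run],[]). also succeeds.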

  20. Basic Definite Clause Grammar (DCG)
      DCG: Prolog built-in mechanism for parsing
      Example:
        s --> np, vp.
        np --> d, n.
        d --> [the].
        n --> [dog].
        n --> [dogs].
        vp --> [run].
        vp --> [runs].
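
A DCG is usually queried through the standard phrase/2 predicate, which supplies the two hidden list arguments. The sketch below wraps that call in a hypothetical helper, accepts/1; the comment about the translation describes the usual scheme only roughly, since the exact generated clauses depend on the Prolog system.

```prolog
% A rule such as  s --> np, vp.  is translated into roughly
%   s(S0, S) :- np(S0, S1), vp(S1, S).
% that is, the same difference-list program written by hand earlier.
s  --> np, vp.
np --> d, n.
d  --> [the].
n  --> [dog].
n  --> [dogs].
vp --> [run].
vp --> [runs].

% accepts(Words): the whole word list is a sentence of the grammar.
accepts(Words) :-
    phrase(s, Words).
```

The query ?- accepts([the,dog,runs]). succeeds and ?- accepts([runs,the,dog]). fails, matching the answers of the hand-written difference-list version.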

  21. Building a Parse Tree
      A parse tree can be built in the following way:
        s(s(Tn,Tv)) --> np(Tn), vp(Tv).
        np(np(Td,Tn)) --> d(Td), n(Tn).
        d(d(the)) --> [the].
        n(n(dog)) --> [dog].
        n(n(dogs)) --> [dogs].
        vp(vp(run)) --> [run].
        vp(vp(runs)) --> [runs].
      At the Prolog prompt we type and obtain:
        ?- s(X, [the, dog, runs], []).
        X = s(np(d(the),n(dog)),vp(runs)) ;

  22. Handling Agreement
        s(s(Tn,Tv)) --> np(Tn,A), vp(Tv,A).
        np(np(Td,Tn),A) --> d(Td), n(Tn,A).
        d(d(the)) --> [the].
        n(n(dog),sg) --> [dog].
        n(n(dogs),pl) --> [dogs].
        vp(vp(run),pl) --> [run].
        vp(vp(runs),sg) --> [runs].
      This grammar will accept the sentences “the dog runs” and “the dogs run” but not “the dog run” or “the dogs runs”.
      Other phenomena can be modeled in a similar fashion.
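
The effect of the shared agreement variable A can be checked with a small wrapper around the standard phrase/2. The helper name parse/2 is hypothetical; the grammar is repeated so that the sketch is self-contained.

```prolog
% A is shared between the noun phrase and the verb phrase, so both
% must carry the same number feature (sg or pl).
s(s(Tn,Tv))     --> np(Tn,A), vp(Tv,A).
np(np(Td,Tn),A) --> d(Td), n(Tn,A).
d(d(the))       --> [the].
n(n(dog),sg)    --> [dog].
n(n(dogs),pl)   --> [dogs].
vp(vp(run),pl)  --> [run].
vp(vp(runs),sg) --> [runs].

% parse(Words, Tree): Tree is a parse tree for the whole word list.
parse(Words, Tree) :-
    phrase(s(Tree), Words).
```

With this grammar, ?- parse([the,dog,runs], T). binds T to s(np(d(the),n(dog)),vp(runs)), while ?- parse([the,dog,run], T). fails because sg and pl do not unify.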

  23. Embedded Code
      We can embed additional Prolog code using braces, e.g.:
        s(T) --> np(Tn), vp(Tv), {T = s(Tn,Tv)}.
      Writing the remaining rules in the same style is another way of building the parse tree.
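
One possible completion of the brace-based grammar is sketched below. Only the first rule comes from the slide; the remaining rules are an assumed continuation written in the same style, and tree/2 is a hypothetical helper around the standard phrase/2.

```prolog
% Goals inside {...} are ordinary Prolog code: they run during parsing
% and do not consume any words.
s(T)  --> np(Tn), vp(Tv), {T = s(Tn,Tv)}.
np(T) --> d(Td), n(Tn),   {T = np(Td,Tn)}.   % assumed continuation
d(T)  --> [the],  {T = d(the)}.
n(T)  --> [dog],  {T = n(dog)}.
n(T)  --> [dogs], {T = n(dogs)}.
vp(T) --> [run],  {T = vp(run)}.
vp(T) --> [runs], {T = vp(runs)}.

% tree(Words, T): T is the parse tree built by the embedded goals.
tree(Words, T) :-
    phrase(s(T), Words).
```

This builds the same trees as putting the tree term directly in the rule head; the braces become more useful when the embedded goal does real computation, such as arithmetic or a lookup.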
