University of Oslo : Department of Informatics INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Parsing and Parser Evaluation Stephan Oepen & Murhaf Fares Language Technology Group (LTG) November 10, 2016


  3. Overview
     Last Time
     ◮ Grammatical Structure
     ◮ Context-Free Grammar
     ◮ Treebanks
     ◮ Probabilistic CFGs
     Today
     ◮ Parser Evaluation
     ◮ Syntactic Parsing
     ◮ Naïve: Recursive-Descent
     ◮ Dynamic Programming: CKY
     ◮ Generalized Chart Parsing

  9. Recall: CFGs (Formally, this Time)
     Formally, a CFG is a quadruple: G = ⟨C, Σ, P, S⟩
     ◮ C is the set of categories (aka non-terminals), e.g. {S, NP, VP, V};
     ◮ Σ is the vocabulary (aka terminals), e.g. {Kim, snow, adores, in};
     ◮ P is a set of category rewrite rules (aka productions):
         S → NP VP
         VP → V NP
         NP → Kim
         NP → snow
         V → adores
     ◮ S ∈ C is the start symbol, a filter on complete results;
     ◮ for each rule α → β₁, β₂, ..., βₙ ∈ P: α ∈ C and βᵢ ∈ C ∪ Σ.
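The quadruple can be written down directly as a small data structure. The following is a minimal sketch, not part of the original slides; the names `CFG`, `is_well_formed` and `toy` are made up for illustration, and productions are assumed to be stored as (left-hand side, right-hand side) pairs.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    """A context-free grammar G = <C, Sigma, P, S> (hypothetical helper type)."""
    categories: frozenset   # C, the non-terminals
    vocabulary: frozenset   # Sigma, the terminals
    productions: tuple      # P, pairs (lhs, rhs) where rhs is a tuple of symbols
    start: str              # S, the start symbol


def is_well_formed(g: CFG) -> bool:
    """Check S in C and, for every rule, lhs in C and each rhs symbol in C | Sigma."""
    symbols = g.categories | g.vocabulary
    return g.start in g.categories and all(
        lhs in g.categories and all(sym in symbols for sym in rhs)
        for lhs, rhs in g.productions
    )


# The toy grammar from the slide.
toy = CFG(
    categories=frozenset({"S", "NP", "VP", "V"}),
    vocabulary=frozenset({"Kim", "snow", "adores", "in"}),
    productions=(
        ("S", ("NP", "VP")),
        ("VP", ("V", "NP")),
        ("NP", ("Kim",)),
        ("NP", ("snow",)),
        ("V", ("adores",)),
    ),
    start="S",
)

print(is_well_formed(toy))  # True
```

Adding a rule whose left-hand side is not in C (for example a PP rule) would make the check fail, which mirrors the last condition on the slide.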

  10. ParsEval
      ◮ The ParsEval metric (Black et al., 1991) measures constituent overlap.
      ◮ The original formulation only considered the shape of the (unlabeled) bracketing.
      ◮ The modern ‘standard’ uses a tool called evalb, which reports precision, recall, and F1 score for labeled brackets, as well as the number of crossing brackets.
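As a rough illustration of what "constituent overlap" means computationally, the sketch below scores two collections of labeled brackets. It is not the evalb implementation; the function name `parseval` and the example brackets are hypothetical, and brackets are assumed to be (start, end, label) tuples.

```python
def parseval(gold, system):
    """Return (precision, recall, f1) for two collections of labeled brackets.

    Each bracket is a (start, end, label) tuple. Sets are used, so repeated
    identical brackets count once; that is a simplification of this sketch.
    """
    gold, system = set(gold), set(system)
    correct = len(gold & system)                 # brackets found in both analyses
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical three-word example: the system misplaces the NP boundary.
gold_example = {(0, 3, "S"), (0, 2, "NP"), (2, 3, "VP")}
system_example = {(0, 3, "S"), (0, 1, "NP"), (2, 3, "VP")}
precision, recall, f1 = parseval(gold_example, system_example)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Dropping the label from each tuple before comparing would give the unlabeled variant mentioned for the original formulation.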

  24. ParsEval
      Gold Standard:
        (NP (DT a) (ADVP (RB pretty) (JJ big)) (NOM (NN dog) (POS ’s) (NN house)))
      System Output:
        (NP (DT a) (JJ pretty) (NOM (JJ big) (NOM (NN dog) (POS ’s) (NN house))))
      Gold brackets:   0,6 np; 0,1 dt; 1,3 advp; 1,2 rb; 2,3 jj; 3,6 nom; 3,4 nn; 4,5 pos; 5,6 nn
      System brackets: 0,6 np; 0,1 dt; 1,2 jj; 2,6 nom; 2,3 jj; 3,6 nom; 3,4 nn; 4,5 pos; 5,6 nn
      Correct: 7
      Recall: Correct / Gold = 7/9
      Precision: Correct / System = 7/9
      F1 score: 7/9

  26. ParsEval (same trees; these counts are consistent with scoring only the phrasal brackets np, advp, and nom, leaving out the part-of-speech brackets)
      Recall: Correct / Gold = 2/3
      Precision: Correct / System = 2/3
      F1 score: 2/3
      Crossing Brackets: 1 (the system bracket 2,6 nom overlaps the gold bracket 1,3 advp without either containing the other)
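The counts in this worked example can be reproduced with a few lines of code. The sketch below is an illustration under assumptions, not evalb itself: the helper names `brackets`, `scores` and `crossing` are invented here, trees are assumed to be well-formed bracketed strings, and crossing brackets are counted as system spans that overlap some gold span without nesting, which is one common way to count them.

```python
import re
from fractions import Fraction

GOLD = "(NP (DT a) (ADVP (RB pretty) (JJ big)) (NOM (NN dog) (POS 's) (NN house)))"
SYSTEM = "(NP (DT a) (JJ pretty) (NOM (JJ big) (NOM (NN dog) (POS 's) (NN house))))"


def brackets(tree, preterminals=True):
    """Return the set of (start, end, label) brackets in a bracketed tree string."""
    tokens = re.findall(r"[()]|[^\s()]+", tree)
    found, stack, position, i = set(), [], 0, 0
    while i < len(tokens):
        if tokens[i] == "(":
            if stack:
                stack[-1][2] = False              # the parent has a constituent child
            stack.append([tokens[i + 1], position, True])
            i += 2                                # skip the label as well
        elif tokens[i] == ")":
            label, start, is_preterminal = stack.pop()
            if preterminals or not is_preterminal:
                found.add((start, position, label))
            i += 1
        else:
            position += 1                         # a word: advance the word index
            i += 1
    return found


def scores(gold, system):
    """Return (correct, precision, recall, f1) as exact fractions."""
    correct = len(gold & system)
    precision, recall = Fraction(correct, len(system)), Fraction(correct, len(gold))
    f1 = 2 * precision * recall / (precision + recall) if correct else Fraction(0)
    return correct, precision, recall, f1


def crossing(gold, system):
    """Count system spans that overlap a gold span without either nesting."""
    gold_spans = {(s, e) for s, e, _ in gold}
    system_spans = {(s, e) for s, e, _ in system}
    return sum(
        any(gs < s < ge < e or s < gs < e < ge for gs, ge in gold_spans)
        for s, e in system_spans
    )


print(*scores(brackets(GOLD), brackets(SYSTEM)))                  # 7 7/9 7/9 7/9
print(*scores(brackets(GOLD, False), brackets(SYSTEM, False)))    # 2 2/3 2/3 2/3
print(crossing(brackets(GOLD), brackets(SYSTEM)))                 # 1
```

Passing `preterminals=False` drops the part-of-speech brackets, which is the assumption under which the 2/3 figures come out.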

  27. Parsing with CFGs: Moving to a Procedural View
      Grammar:
        S → NP VP
        VP → V | V NP | VP PP
        NP → NP PP
        PP → P NP
        NP → Kim | snow | Oslo
        V → adores
        P → in
      Two derivations of "Kim adores snow in Oslo":
        (S (NP Kim) (VP (VP (V adores) (NP snow)) (PP (P in) (NP Oslo))))
        (S (NP Kim) (VP (V adores) (NP (NP snow) (PP (P in) (NP Oslo)))))
      All Complete Derivations
      • are rooted in the start symbol S;
      • label internal nodes with categories ∈ C, leaves with words ∈ Σ;
      • instantiate a grammar rule ∈ P at each local subtree of depth one.
      inf4820 (oe@ifi.uio.no): Chart Parsing for Context-Free Grammars
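The three properties of a complete derivation can be checked mechanically. The following sketch assumes trees are nested tuples of the form (label, child, ...) with words as plain strings; the names `RULES`, `licensed` and `is_complete_derivation` are hypothetical.

```python
# The grammar from the slide, as a mapping from category to right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V",), ("V", "NP"), ("VP", "PP")],
    "NP": [("NP", "PP"), ("Kim",), ("snow",), ("Oslo",)],
    "PP": [("P", "NP")],
    "V": [("adores",)],
    "P": [("in",)],
}


def licensed(tree, rules=RULES):
    """True if every local subtree of depth one instantiates a grammar rule."""
    if isinstance(tree, str):
        return tree not in rules          # leaves must be words, not categories
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rhs in rules.get(label, []) and all(licensed(c, rules) for c in children)


def is_complete_derivation(tree, start="S", rules=RULES):
    """True if the tree is rooted in the start symbol and licensed throughout."""
    return not isinstance(tree, str) and tree[0] == start and licensed(tree, rules)


# The two analyses of "Kim adores snow in Oslo": PP attached to the VP or to the NP.
vp_attachment = ("S", ("NP", "Kim"),
                 ("VP", ("VP", ("V", "adores"), ("NP", "snow")),
                        ("PP", ("P", "in"), ("NP", "Oslo"))))
np_attachment = ("S", ("NP", "Kim"),
                 ("VP", ("V", "adores"),
                        ("NP", ("NP", "snow"), ("PP", ("P", "in"), ("NP", "Oslo")))))

print(is_complete_derivation(vp_attachment), is_complete_derivation(np_attachment))
```

Both trees pass, which is the structural ambiguity the slide illustrates: the grammar licenses two distinct derivations for the same string.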

  29. Recursive Descent: A Naïve Parsing Algorithm
      Control Structure
      • top-down: given a parsing goal α, use all grammar rules that rewrite α;
      • successively instantiate (extend) the right-hand sides of each rule;
      • for each βᵢ in the RHS of each rule, recursively attempt to parse βᵢ;
      • termination: when α is a prefix of the input string, recursion succeeds.
      (Intermediate) Results
      • each result records a (partial) tree and the remaining input to be parsed;
      • complete results consume the full input string and are rooted in S;
      • whenever a RHS is fully instantiated, a new tree is built and returned;
      • all results at each level are combined and successively accumulated.
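The control structure above can be sketched as a pair of mutually recursive generators. This is an illustrative variant, not the course's reference implementation: the names `parse`, `expand` and `parse_sentence` are invented, results are (tree, end position) pairs rather than (tree, remaining input), and a length bound is added because a purely naive top-down parser would recurse forever on the left-recursive rules VP → VP PP and NP → NP PP. The bound assumes the grammar has no empty right-hand sides and no unary cycles.

```python
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V",), ("V", "NP"), ("VP", "PP")],
    "NP": [("NP", "PP"), ("Kim",), ("snow",), ("Oslo",)],
    "PP": [("P", "NP")],
    "V": [("adores",)],
    "P": [("in",)],
}


def parse(goal, words, start, limit):
    """Yield (tree, end) for each way goal derives words[start:end], end <= limit."""
    if goal not in RULES:                        # a terminal: match one word
        if start < limit and words[start] == goal:
            yield goal, start + 1
        return
    for rhs in RULES[goal]:                      # top-down: try every rule for goal
        if limit - start >= len(rhs):            # each RHS symbol needs >= 1 word
            for children, end in expand(rhs, words, start, limit):
                yield (goal, *children), end     # RHS fully instantiated: build tree


def expand(rhs, words, start, limit):
    """Instantiate the symbols of rhs left to right, starting at position start."""
    if not rhs:
        yield (), start
        return
    first, rest = rhs[0], rhs[1:]
    # Reserve one word per remaining symbol; this shrinking bound is what
    # keeps the left-recursive rules from looping forever.
    for tree, middle in parse(first, words, start, limit - len(rest)):
        for children, end in expand(rest, words, middle, limit):
            yield (tree, *children), end


def parse_sentence(sentence, root="S"):
    """Return all complete results: trees rooted in root that consume all input."""
    words = sentence.split()
    return [tree for tree, end in parse(root, words, 0, len(words)) if end == len(words)]


for tree in parse_sentence("Kim adores snow in Oslo"):
    print(tree)
```

For the example sentence this yields both attachment analyses. Note that the same sub-goals (for instance NP starting at "snow") are re-parsed several times along the way, which is the inefficiency that motivates the dynamic programming approaches listed in the overview.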
