Parser Evaluation and the BNC

Jennifer Foster and Josef van Genabith
National Centre for Language Technology, School of Computing, Dublin City University
29th May 2008
What is this work about?

1. Creating a set of gold standard parse trees for 1,000 sentences from the BNC
2. Using these trees as a test set to evaluate various parsers
Outline

◮ BNC Gold Standard
◮ Parser Evaluation
  ◮ The Parsers
  ◮ The Metrics
  ◮ Evaluation Results
The British National Corpus

The BNC is a one-hundred-million-word balanced corpus of British English (Burnard, 2000)

◮ 90% of the BNC is written text
  ◮ 75% factual
  ◮ 25% fiction
◮ The 10% spoken component consists of
  ◮ informal dialogue
  ◮ business meetings
  ◮ speeches
BNC Test Set: Choosing the Sentences

1,000 sentences in the test set

◮ Not chosen completely at random
◮ They differ from the WSJ training data: each contains a verb that occurs in the BNC but not in WSJ Sections 2-21
  ◮ 25,874 verb lemmas occur in the BNC but not in WSJ2-21
  ◮ 14,787 occur only once in the BNC (e.g. jitter, unfade, transpersonalize, kerplonk)
  ◮ 537 occur more than 100 times (e.g. murmur, frown, damn)
◮ Likely to represent a difficult test for WSJ-trained parsers
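The selection criterion above amounts to a set-difference filter over verb lemmas. A minimal sketch follows; the lemma sets and the (token, lemma, POS-tag) sentence representation are illustrative assumptions, not the authors' actual pipeline, and corpus loading and lemmatisation are assumed to happen elsewhere.

```python
# Keep BNC sentences containing a verb whose lemma never occurs
# in the WSJ training sections (2-21).

def unseen_verb_lemmas(bnc_lemmas, wsj_lemmas):
    """Verb lemmas attested in the BNC but absent from WSJ2-21."""
    return set(bnc_lemmas) - set(wsj_lemmas)

def select_sentences(sentences, unseen):
    """Keep sentences with at least one verb whose lemma is unseen."""
    return [s for s in sentences
            if any(pos.startswith("V") and lemma in unseen
                   for _token, lemma, pos in s)]

bnc_verbs = {"say", "murmur", "frown", "jitter"}   # toy stand-ins
wsj_verbs = {"say", "rise", "fall"}
unseen = unseen_verb_lemmas(bnc_verbs, wsj_verbs)

sent = [("She", "she", "PRP"), ("murmured", "murmur", "VBD")]
assert select_sentences([sent], unseen) == [sent]  # 'murmur' is unseen
```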
BNC Test Set: Some Examples

Text Type    #    Example
Spoken      10    The seconder of formally seconded
Poem         9    Groggily somersaulting to get airborne
Caption      4    Community Personified
Headline     2    Drunk priest is nicked driving to a funeral

Average sentence length: 28 words
BNC Test Set: Annotation Process

◮ One annotator
◮ Two passes through the data
◮ Approximately 100 hours
◮ As references, the annotator used
  1. the Penn Treebank bracketing guidelines (Bies et al., 1995)
  2. the Penn Treebank itself
◮ Functional tags and traces were not annotated
BNC Test Set: Annotation Difficulties

What happens when the references clash?

◮ The noun phrase almost certain death occurs in a BNC gold standard sentence
◮ According to the guidelines, it should be annotated as (NP (ADJP almost certain) death)
◮ A search for almost in the Penn Treebank yields the flat example (NP almost unimaginable speed)
◮ In such cases, the annotator chose the analysis set out in the guidelines
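The choice between the two analyses matters for evaluation, since Parseval-style metrics compare labelled constituent spans: the guideline analysis contributes an ADJP bracket that the flat analysis lacks. A toy bracketed-tree reader illustrates this; it is a sketch for exposition, not the evalb implementation.

```python
import re

def brackets(tree_str):
    """Return the set of (label, start, end) constituent spans
    in a Penn-style bracketed tree string."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree_str)
    spans, stack, pos, k = set(), [], 0, 0
    while k < len(tokens):
        if tokens[k] == "(":
            stack.append((tokens[k + 1], pos))  # (label, span start)
            k += 2
        elif tokens[k] == ")":
            label, start = stack.pop()
            spans.add((label, start, pos))
            k += 1
        else:                                   # a word: advance position
            pos += 1
            k += 1
    return spans

guideline = "(NP (ADJP almost certain) death)"
flat      = "(NP almost certain death)"
assert brackets(guideline) - brackets(flat) == {("ADJP", 0, 2)}
```

Under this view the two analyses agree on the NP span but differ on the internal ADJP, so a parser matching the treebank precedent would lose one bracket against the guideline-based gold standard.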