Is this NE tagger getting old?
Cristina Mota and Ralph Grishman
IST & L2F INESC-ID (Portugal) & NYU (USA) and New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
Language Resources and Evaluation Conference, Marrakech, Morocco - May 28th - 30th 2008
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Outline
1. Introduction
2. Corpus Analysis
3. NER Performance Analysis
4. Experiments
5. Final Remarks
1. Introduction
- Motivation
- Approach
What is NER?
Input: Mary is studying in Rabat at Mohammed V University
Processing: NE Tagger
Output: [Mary]PER is studying in [Rabat]LOC at [Mohammed V University]ORG
The Problem
[Figure: name occurrences per 100K words by time frame (semester, 91a to 98a) for the names UE, CEE, União Europeia and Comunidade Europeia]
- Do texts vary over time in a way that affects NE recognition?
- Should NE taggers also be designed to be time-aware?
Approach
Corpus Analysis
- Measure corpus similarity based on words
- Compute name list overlaps
  - By type
  - By token
NER Performance Analysis
- Assess performance by training and testing with different configurations (train, test)
- Increase the time gap between training and test data
2. Corpus Analysis
- Corpus Similarity Algorithm (Kilgarriff, 2001)
- Name List Overlaps
Corpus Similarity Algorithm (Kilgarriff, 2001)
Similarity(A, B):
1. Split corpus A and corpus B into k slices each
2. Repeat m times:
   - Randomly allocate k/2 slices to A_i and k/2 slices to B_i
   - Construct word frequency lists for A_i and B_i
   - Compute CBDF between A_i and B_i for the n most frequent words of the joint corpus (A_i + B_i) [CBDF = χ² by degrees of freedom]
3. Output the mean and standard deviation of CBDF over all experiments
Repeat using corpus A only: Similarity(A, A) → Homogeneity(A)
Repeat using corpus B only: Similarity(B, B) → Homogeneity(B)
Corpus Similarity Algorithm (Kilgarriff, 2001)
[Diagram: Corpus A and Corpus B are each split into two halves. Each of the n iterations yields three distances, $D_{AA'}$, $D_{BB'}$ and $D_{AB}$, which are then averaged.]
- $\bar{D}_{AA'}$ → Homogeneity(A)
- $\bar{D}_{BB'}$ → Homogeneity(B)
- $\bar{D}_{AB}$ → Similarity(A, B)
Lower values of $\bar{D}$ ⇒ higher homogeneity/similarity
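The similarity and homogeneity procedure described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names (`word_freq`, `cbdf`, `similarity`, `homogeneity`), the representation of a slice as a list of tokens, and the choice of n - 1 degrees of freedom are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter


def word_freq(slices):
    """Build a word frequency list from a list of tokenised slices."""
    return Counter(word for s in slices for word in s)


def cbdf(freq_a, freq_b, n_words):
    """Chi-square by degrees of freedom over the n most frequent joint words.

    Assumption: degrees of freedom are taken as (number of words - 1).
    """
    joint = freq_a + freq_b
    top = [word for word, _ in joint.most_common(n_words)]
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    chi2 = 0.0
    for word in top:
        for observed, total in ((freq_a[word], total_a), (freq_b[word], total_b)):
            # Expected count if the word were spread evenly over both parts.
            expected = joint[word] * total / (total_a + total_b)
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2 / max(len(top) - 1, 1)


def similarity(slices_a, slices_b, n_words=2000, m=10, seed=0):
    """Mean and standard deviation of CBDF between half of A and half of B."""
    rng = random.Random(seed)
    half = min(len(slices_a), len(slices_b)) // 2
    scores = []
    for _ in range(m):
        part_a = word_freq(rng.sample(slices_a, half))
        part_b = word_freq(rng.sample(slices_b, half))
        scores.append(cbdf(part_a, part_b, n_words))
    return statistics.mean(scores), statistics.pstdev(scores)


def homogeneity(slices, n_words=2000, m=10, seed=0):
    """Similarity of a corpus with itself, using two disjoint random halves."""
    rng = random.Random(seed)
    half = len(slices) // 2
    scores = []
    for _ in range(m):
        shuffled = rng.sample(slices, len(slices))
        first = word_freq(shuffled[:half])
        second = word_freq(shuffled[half:2 * half])
        scores.append(cbdf(first, second, n_words))
    return statistics.mean(scores), statistics.pstdev(scores)
```

Both functions return a (mean, standard deviation) pair, so lower means indicate higher similarity or homogeneity, as on the slide. The homogeneity case uses disjoint halves so that a corpus is never compared with an overlapping copy of itself.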
Name List Overlaps
$$\text{type overlap} = \frac{|T_A \cap T_B|}{|T_A| + |T_B| - |T_A \cap T_B|} \qquad (1)$$
$$\text{token overlap} = \frac{\sum_{i=1}^{N} \min(f_A(i), f_B(i))}{\sum_{i=1}^{N} \max(f_A(i), f_B(i))} \qquad (2)$$
$T_A$ = list of different names (name types) of text A
$f_A(i)$ = frequency of name $i$ in text A
Name List Overlaps
A name list: Mary (3), Rabat (5), Mohammed V University (4)
B name list: John (1), Rabat (2), Mohammed V University (6)
Type Overlap
$$\frac{|\{\text{Rabat}, \text{Mohammed V University}\}|}{|\{\text{Mary}, \text{Rabat}, \text{Mohammed V University}, \text{John}\}|} = 2/4$$
Token Overlap
$$\frac{\min(3,0) + \min(5,2) + \min(4,6) + \min(0,1)}{\max(3,0) + \max(5,2) + \max(4,6) + \max(0,1)} = 6/15$$
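The two overlap measures and the worked example above can be checked with a short Python sketch. The function names `type_overlap` and `token_overlap` and the use of `collections.Counter` for a name list are assumptions made for illustration.

```python
from collections import Counter


def type_overlap(names_a, names_b):
    """Shared name types divided by all name types (equation 1)."""
    types_a, types_b = set(names_a), set(names_b)
    union = types_a | types_b
    return len(types_a & types_b) / len(union) if union else 0.0


def token_overlap(names_a, names_b):
    """Sum of minimum frequencies over sum of maximum frequencies (equation 2)."""
    all_names = set(names_a) | set(names_b)
    # A Counter returns 0 for a name that is missing from one of the lists.
    shared = sum(min(names_a[name], names_b[name]) for name in all_names)
    total = sum(max(names_a[name], names_b[name]) for name in all_names)
    return shared / total if total else 0.0


list_a = Counter({"Mary": 3, "Rabat": 5, "Mohammed V University": 4})
list_b = Counter({"John": 1, "Rabat": 2, "Mohammed V University": 6})

print(type_overlap(list_a, list_b))   # 2 / 4
print(token_overlap(list_a, list_b))  # 6 / 15
```

Type overlap ignores frequencies, so a name seen once counts as much as a name seen many times; token overlap weights each name by how often it occurs in both texts.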
3. NER Performance Analysis
- NE Tagger Description (Collins & Singer, 1999)
NE Tagger Description (Collins & Singer, 1999)
Pipeline:
1. Raw text
2. POS Tagging + Parsing → shallow parsed text
3. NE Identification → text with unclassified NE, list of examples (NE, context)
4. NE Classification (using name seeds) → list of labeled examples (NE, context, label)
5. Text Update + NE Propagation → text with classified NE
Classification in detail:
1. Name rules := name seeds
2. Label with name rules
3. Infer contextual rules
4. Label with contextual rules
5. Infer name rules (then repeat from step 2)
6. Label with name + contextual rules → list of labeled examples (NE, context, label)
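The alternation between name rules and contextual rules in the classification step can be illustrated with a toy Python sketch. It is not the decision-list learner of Collins & Singer (1999): here a rule is simply a mapping from a name or a context string to a label, a rule is accepted when its majority label reaches a precision threshold, and the names `infer_rules`, `apply_rules`, `cotrain`, `threshold` and `rounds` are all hypothetical.

```python
from collections import Counter, defaultdict

NAME, CONTEXT = 0, 1  # positions of the two views in an (NE, context) example


def infer_rules(examples, labels, view, threshold=0.95):
    """Infer feature -> label rules for one view from the labeled examples."""
    counts = defaultdict(Counter)
    for example, label in zip(examples, labels):
        if label is not None:
            counts[example[view]][label] += 1
    rules = {}
    for feature, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        # Keep the rule only if the majority label is precise enough.
        if hits / sum(label_counts.values()) >= threshold:
            rules[feature] = label
    return rules


def apply_rules(examples, rules, view):
    """Label each example with the rule matching its feature, else None."""
    return [rules.get(example[view]) for example in examples]


def cotrain(examples, seeds, rounds=5):
    """Alternate between name rules and contextual rules, starting from seeds."""
    name_rules = dict(seeds)
    context_rules = {}
    for _ in range(rounds):
        labels = apply_rules(examples, name_rules, NAME)
        context_rules = infer_rules(examples, labels, CONTEXT)
        labels = apply_rules(examples, context_rules, CONTEXT)
        # Seed rules always override inferred name rules.
        name_rules = {**infer_rules(examples, labels, NAME), **seeds}
    # Final labeling: name rules first, contextual rules as a fallback.
    return [name_rules.get(name) or context_rules.get(context)
            for name, context in examples]


examples = [("Rabat", "in"), ("Lisbon", "in"), ("Mary", "said"), ("John", "said")]
seeds = {"Rabat": "LOC", "Mary": "PER"}
print(cotrain(examples, seeds))
```

In this toy run the seeds label "Rabat" and "Mary", the contexts "in" and "said" become contextual rules, and those rules in turn label "Lisbon" and "John".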
4. Experiments
- Experimental Setting
- F-Measure over Time
- Politics Dissimilarity over Time
- Politics Name List Overlap over Time
- F-Measure compared to Dissimilarity
Experimental Setting
CETEMPublico (Santos & Rocha, 2001) is a public Portuguese journalistic corpus
- Size: 180 million words
- Time span: 8 years
- Organization: randomly shuffled extracts [1 extract ≅ 2 paragraphs]
- Classification: 10 topics and 16 time frames (year + semester)
- Mark up: paragraphs, sentences, enumeration lists and authors
[Figure: number of words by time frame (semester, 91a to 98a) for the topics Culture, Sports, Economy, Politics and Society]
Experimental Setting
- Topic: politics
- Time unit: year
- Text unit: sentence
- Size: 10 slices x 60000 words per time frame
- N most frequent words: 2000 words
- Names compared: 82400 per time frame
- Seeds (S): different names in the first 2500 name instances [first 198 extracts per semester]
- Test (T): next 208 extracts per semester, grouped by year
- Unlabeled examples (U): first 82456 names with context per year [following 7856 extracts]
NER Performance: F-Measure over Time
[Figure: F-measure (axis from 0.79 to 0.85) against time gap in years (0 to 7)]
- When the texts are from the same year (time gap = 0), the F-measure ranges approximately from 82% to 85%
- When the texts are 5 years apart, the F-measure ranges from about 79% to 82%
- As the time gap between ($S_i$, $U_i$) and $T_j$ increases, the F-measure shows a tendency to decay
Training-test configuration: ($S_i$, $U_i$, $T_j$), i = 91..98, j = 91..98 [64 tests]
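The 64 training-test configurations and their grouping by time gap can be enumerated with a few lines of Python. This sketch assumes the time gap is the absolute difference between the training year i and the test year j, which matches the 0 to 7 range of the plot; `configs_by_gap` and `mean_f_by_gap` are hypothetical names, and no F-measure values from the experiments are reproduced here.

```python
from collections import defaultdict

YEARS = range(91, 99)  # i and j both range over 91..98

# Group every (training year, test year) pair by its time gap.
configs_by_gap = defaultdict(list)
for train_year in YEARS:
    for test_year in YEARS:
        gap = abs(train_year - test_year)  # assumption: gap is symmetric
        configs_by_gap[gap].append((train_year, test_year))


def mean_f_by_gap(f_scores):
    """Average F-measure per time gap.

    f_scores maps (train_year, test_year) to the F-measure of that test.
    """
    means = {}
    for gap, configs in sorted(configs_by_gap.items()):
        values = [f_scores[c] for c in configs if c in f_scores]
        if values:
            means[gap] = sum(values) / len(values)
    return means


for gap, configs in sorted(configs_by_gap.items()):
    print(gap, len(configs))
```

Under this assumption the number of configurations shrinks as the gap grows: eight pairs share a gap of 0, while only two pairs (91 with 98 and 98 with 91) are 7 years apart, so the ranges observed at large gaps rest on fewer tests.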