Is this NE tagger getting old? (Language Resources and Evaluation Conference, 2008)

Is this NE tagger getting old?
Language Resources and Evaluation Conference, Marrakech, Morocco, May 28th - 30th 2008
Cristina Mota, IST & L2F INESC-ID (Portugal) & NYU (USA), and Ralph Grishman, New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)

Outline
1. Introduction
2. Corpus Analysis
3. NER Performance Analysis
4. Experiments
5. Final Remarks

Section 1: Introduction
- Motivation
- Approach

What is NER?
Mary is studying in Rabat at Mohammed V University
→ NE Tagger →
Mary [PER] is studying in Rabat [LOC] at Mohammed V University [ORG]

The Problem
[Figure: name occurrences per 100K words by time frame (semester, 91a to 98a) for the names UE, CEE, União Europeia and Comunidade Europeia]
Do texts vary over time in a way that affects NE recognition?
Should NE taggers also be designed to be time-aware?

Approach
Corpus Analysis
- Measure corpus similarity based on words
- Compute name list overlaps, by type and by token
NER Performance Analysis
- Assess performance by training and testing with different configurations (train, test)
- Increase the time gap between training and test data

Section 2: Corpus Analysis
- Corpus Similarity Algorithm (Kilgarriff, 2001)
- Name List Overlaps

Corpus Similarity Algorithm (Kilgarriff, 2001)
Similarity(A,B):
- Split corpus A and corpus B into k slices each
- Repeat m times:
  - Randomly allocate k/2 slices to A_i and k/2 slices to B_i
  - Construct word frequency lists for A_i and B_i
  - Compute CBDF between A_i and B_i for the n most frequent words of the joint corpus (A_i + B_i) [CBDF = χ² by degrees of freedom]
- Output the mean and standard deviation of CBDF over all experiments
Repeat using corpus A only: Similarity(A,A) → Homogeneity(A)
Repeat using corpus B only: Similarity(B,B) → Homogeneity(B)
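The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`cbdf`, `similarity`, `homogeneity`), the use of n - 1 as the degrees of freedom, the expected counts proportional to subcorpus size, and the disjoint random split used for homogeneity are all choices made here for the example. Each slice is assumed to be a non-empty list of word tokens.

```python
import random
import statistics
from collections import Counter

def cbdf(words_a, words_b, n=2000):
    """Chi-square by degrees of freedom over the n most frequent joint words."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    joint = freq_a + freq_b
    size_a, size_b = len(words_a), len(words_b)
    total = size_a + size_b
    top = [word for word, _ in joint.most_common(n)]
    chi2 = 0.0
    for word in top:
        # Expected counts if the word were spread in proportion to corpus size.
        exp_a = joint[word] * size_a / total
        exp_b = joint[word] * size_b / total
        chi2 += (freq_a[word] - exp_a) ** 2 / exp_a
        chi2 += (freq_b[word] - exp_b) ** 2 / exp_b
    # Assumption: degrees of freedom = (number of words - 1) for two corpora.
    return chi2 / max(len(top) - 1, 1)

def _flatten(slices):
    return [word for s in slices for word in s]

def similarity(slices_a, slices_b, m=10, n=2000, seed=0):
    """Mean and standard deviation of CBDF between random halves of A and B."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        half_a = rng.sample(slices_a, len(slices_a) // 2)
        half_b = rng.sample(slices_b, len(slices_b) // 2)
        scores.append(cbdf(_flatten(half_a), _flatten(half_b), n))
    return statistics.mean(scores), statistics.pstdev(scores)

def homogeneity(slices, m=10, n=2000, seed=0):
    """Same measure within one corpus, comparing two disjoint random halves."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        shuffled = rng.sample(slices, len(slices))
        half = len(shuffled) // 2
        scores.append(cbdf(_flatten(shuffled[:half]), _flatten(shuffled[half:]), n))
    return statistics.mean(scores), statistics.pstdev(scores)

# Toy data: four identical slices per corpus.
corpus_a = [["a", "b", "a", "c"] for _ in range(4)]
corpus_b = [["x", "y", "x", "a"] for _ in range(4)]
mean_ab, std_ab = similarity(corpus_a, corpus_b)
mean_aa, std_aa = homogeneity(corpus_a)
```

With the toy data, corpus A is perfectly homogeneous, so its score is zero, while the A-to-B score is positive; lower values indicate more similar text.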

Corpus Similarity Algorithm (Kilgarriff, 2001)
[Diagram: half of corpus A is compared with the other half of corpus A, half of corpus B with the other half of corpus B, and half of corpus A with half of corpus B, once per iteration]
Iteration 1: D_AA'(1), D_BB'(1), D_AB(1)
Iteration 2: D_AA'(2), D_BB'(2), D_AB(2)
...
Iteration n: D_AA'(n), D_BB'(n), D_AB(n)
Means: D̄_AA' = Homogeneity(A), D̄_BB' = Homogeneity(B), D̄_AB = Similarity(A, B)
Lower values of D̄ ⇒ higher homogeneity/similarity

Name List Overlaps
$$\text{type overlap} = \frac{|T_A \cap T_B|}{|T_A| + |T_B| - |T_A \cap T_B|} \qquad (1)$$
$$\text{token overlap} = \frac{\sum_{i=1}^{N} \min(f_A(i), f_B(i))}{\sum_{i=1}^{N} \max(f_A(i), f_B(i))} \qquad (2)$$
T_A = list of different names (name types) of text A
f_A(i) = frequency of name i in text A

Name List Overlaps
A name list: Mary (3), Rabat (5), Mohammed V University (4)
B name list: John (1), Rabat (2), Mohammed V University (6)
Type Overlap
$$\frac{|\{\text{Rabat}, \text{Mohammed V University}\}|}{|\{\text{Mary}, \text{Rabat}, \text{Mohammed V University}, \text{John}\}|} = 2/4$$
Token Overlap
$$\frac{\min(3,0) + \min(5,2) + \min(4,6) + \min(0,1)}{\max(3,0) + \max(5,2) + \max(4,6) + \max(0,1)} = 6/15$$
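The two overlap measures can be checked against the worked example with a short Python sketch. The function names `type_overlap` and `token_overlap` are chosen here for illustration; the name lists are represented as frequency counters.

```python
from collections import Counter

def type_overlap(freq_a, freq_b):
    """Shared name types divided by all distinct name types (equation 1)."""
    types_a, types_b = set(freq_a), set(freq_b)
    shared = len(types_a & types_b)
    return shared / (len(types_a) + len(types_b) - shared)

def token_overlap(freq_a, freq_b):
    """Sum of minimum frequencies over sum of maximum frequencies (equation 2)."""
    names = set(freq_a) | set(freq_b)
    # A Counter returns 0 for a name that is missing from one of the lists.
    shared = sum(min(freq_a[name], freq_b[name]) for name in names)
    total = sum(max(freq_a[name], freq_b[name]) for name in names)
    return shared / total

list_a = Counter({"Mary": 3, "Rabat": 5, "Mohammed V University": 4})
list_b = Counter({"John": 1, "Rabat": 2, "Mohammed V University": 6})

print(type_overlap(list_a, list_b))   # 0.5, i.e. 2/4
print(token_overlap(list_a, list_b))  # 0.4, i.e. 6/15
```

Type overlap treats every distinct name equally, while token overlap weights each name by how often it occurs, so a frequent shared name contributes more to the second measure.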

Section 3: NER Performance Analysis
- NE Tagger Description (Collins & Singer, 1999)

NE Tagger Description (Collins & Singer, 1999)
Pipeline:
Raw text
→ POS tagging + parsing → shallow parsed text
→ NE identification → text with unclassified NE, plus list of examples (NE, context)
→ NE classification (using name seeds) → list of labeled examples (NE, context, label)
→ Text update + NE propagation → text with classified NE
Classification in detail:
1. Initialize the name rules from the name seeds
2. Label with name rules
3. Infer contextual rules
4. Label with contextual rules
5. Infer name rules, then return to step 2
6. Label with name + contextual rules → list of labeled examples (NE, context, label)
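The alternation between name rules and contextual rules can be illustrated with a deliberately simplified Python sketch. It is not the Collins & Singer (1999) algorithm as published, which ranks decision-list rules by strength and applies thresholds; here each rule is just a majority vote, and the names `infer_rules`, `apply_rules`, `cotrain` and the `rounds` parameter are hypothetical. Each example is assumed to be a (name, context) pair.

```python
from collections import Counter, defaultdict

NAME, CONTEXT = 0, 1  # positions of the two views inside an example

def infer_rules(examples, labels, view):
    """Map each feature of one view to the majority label it co-occurs with."""
    votes = defaultdict(Counter)
    for example, label in zip(examples, labels):
        if label is not None:
            votes[example[view]][label] += 1
    return {feature: counts.most_common(1)[0][0] for feature, counts in votes.items()}

def apply_rules(examples, rules, view):
    """Label each example with the rule for its feature, or None if no rule fires."""
    return [rules.get(example[view]) for example in examples]

def cotrain(examples, seeds, rounds=3):
    name_rules = dict(seeds)  # start from the name seeds
    context_rules = {}
    for _ in range(rounds):
        labels = apply_rules(examples, name_rules, NAME)
        context_rules = infer_rules(examples, labels, CONTEXT)
        labels = apply_rules(examples, context_rules, CONTEXT)
        # Seeds always override inferred name rules in this sketch.
        name_rules = {**infer_rules(examples, labels, NAME), **seeds}
    # Final labeling: name rule first, contextual rule as fallback.
    return [name_rules.get(name) or context_rules.get(context)
            for name, context in examples]

examples = [("Rabat", "in"), ("Paris", "in"), ("Mary", "said"), ("John", "said")]
seeds = {"Rabat": "LOC", "Mary": "PER"}
predicted = cotrain(examples, seeds)
```

In this toy run the seed "Rabat" teaches the contextual rule "in" → LOC, which in turn labels the unseen name "Paris"; the same happens for "Mary", "said" and "John".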

Section 4: Experiments
- Experimental Setting
- F-Measure over Time
- Politics Dissimilarity over Time
- Politics Name List Overlap over Time
- F-Measure compared to Dissimilarity

Experimental Setting
CETEMPublico (Santos & Rocha, 2001) is a Portuguese public journalistic corpus
- Size: 180 million words
- Time span: 8 years
- Organization: randomly shuffled extracts [1 extract ≅ 2 paragraphs]
- Classification: 10 topics and 16 time frames (year + semester)
- Mark up: paragraphs, sentences, enumeration lists and authors
[Figure: number of words per time frame (semester, 91a to 98a) for the topics Culture, Sports, Economy, Politics and Society]

Experimental Setting
- Topic: politics
- Time unit: year
- Text unit: sentence
- Size: 10 slices x 60000 words per time frame
- N most frequent words: 2000 words
- Names compared: 82400 per time frame
- Seeds (S): different names in the first 2500 name instances [first 198 extracts per semester]
- Test (T): next 208 extracts per semester, grouped by year
- Unlabeled examples (U): first 82456 names with context per year [following 7856 extracts]

NER Performance: F-Measure over Time
[Figure: F-measure (0.79 to 0.85) against time gap in years (0 to 7)]
- When the texts are from the same year (time gap = 0), the F-measure ranges approximately from 82% to 85%
- When the texts are 5 years apart, the F-measure ranges from about 79% to 82%
- As the time gap between (S_i, U_i) and T_j increases, the F-measure shows a tendency to decay
Training-test configuration: (S_i, U_i, T_j), i = 91..98, j = 91..98 [64 tests]
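The 64 training-test configurations and their time gaps can be enumerated with a few lines of Python. The sketch assumes that the time gap is the absolute difference |i - j| between the training year and the test year, which matches the 0 to 7 range of the figure's axis but is not stated explicitly on the slide; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

years = range(91, 99)  # 91..98, one time frame per year

# One configuration per (training year i, test year j) pair.
configs = list(product(years, years))

# Assumption: time gap = absolute distance between training and test year.
gap_counts = Counter(abs(i - j) for i, j in configs)

for gap in sorted(gap_counts):
    print(gap, gap_counts[gap])
```

Under this assumption there are 8 configurations with no gap and only 2 with the largest gap of 7 years, so the range of F-measures at large gaps rests on fewer tests than at small gaps.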
