NER FOR NELL EXPLOITING MORPHOLOGICAL PATTERNS IN CATEGORIES Reza Bosagh Zadeh October 29, 2009

OVERVIEW  Task Description  How to solve outside a NELL system  Simple approach evaluated  How to tackle in a NELL system: initial experiments

WHAT IS “NAMED ENTITY RECOGNITION”? Extract named entities from text and label them as “Person”, “Organization”, etc.
Example text: October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Extracted entities: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
Slide: William Cohen, Information Extraction

WHAT PATTERNS?  Yarow-sky  Min-ski  Bosagh-Zadeh  Milose-vitch
The current RTW system helps us find popular names using context frames. We should be able to find patterns in popular names and use them to discover rarely used names.

MODELS FOR NER (example sentence for every model: "Abraham Lincoln was born in Kentucky.")
 Lexicons (Gazetteers): is the candidate a member of a list? (Alabama, Alaska, …, Wisconsin, Wyoming)
 Classify Pre-segmented Candidates: a classifier decides which class each candidate belongs to
 Sliding Window: a classifier decides which class each window belongs to; try alternate window sizes
 Boundary Models: a classifier labels BEGIN and END positions
 Token Tagging: find the most likely state sequence (HMMs, CRFs, …). This is often treated as a structured prediction problem: classifying tokens sequentially
Slide: William Cohen, Information Extraction

PAPER: MIKHEEV ET AL.  How well can we perform with only a lexicon (list/gazetteer)?  With lists:

NER FOR NELL  Don’t have easy access to supervised data: doesn’t fit the never-ending-learner model  Context isn’t important anymore!  Want to use Morphological patterns abundant in human names and surnames  Need to be fast each iteration  Initial experiment: focus on suffixes

COMMON SUFFIXES - TRIGRAMS  Most common trigram endings of NPs in the list of person names currently obtained from RTW:  Not very useful: would have us believe “Rowing” is a person name.

COMMON SUFFIXES - NGRAMS  Most common fourgram endings of NPs in the list of person names currently obtained from RTW:  Not very useful: would have us believe “Protein” is a person name.  Same problem for ngrams of length 3 to 6
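The raw frequency count described on the two slides above can be sketched as follows. This is a minimal illustration, not the RTW code: the function name `suffix_counts`, the choice to look only at the last token of each NP, and the toy name list are all assumptions made for the example.

```python
from collections import Counter


def suffix_counts(phrases, n):
    """Count the final n characters of the last token of each phrase.

    Assumption: the "ending of an NP" is taken to be the ending of its
    last whitespace-separated token, lowercased.
    """
    counts = Counter()
    for phrase in phrases:
        tokens = phrase.lower().split()
        if tokens and len(tokens[-1]) >= n:
            counts[tokens[-1][-n:]] += 1
    return counts


# Made-up toy list standing in for the person names obtained from RTW.
names = ["Roger Federer", "Tom King", "Stephen Hawking", "Martin Luther King"]

trigram_counts = suffix_counts(names, 3)
print(trigram_counts.most_common(2))
```

On this toy list the most frequent trigram ending is "ing", which is exactly the kind of suffix that would also match a non-name such as "Rowing". Frequency alone is not discriminative.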

PROBLEM: HOW TO FIND DISCRIMINATIVE NGRAMS?  Identify not only the most common suffixes in the list of names, but those name suffixes that also appear rarely among all NPs.  Two competing requirements  Borrow ideas from TF-IDF and define a score for ngram i, where a_i is the frequency of ngram i in the names list and b_i is the frequency of ngram i in the entire NP list

MUCH NICER  Take all ngrams and sort by score function  Use top 100-scoring ngrams  Length freely varying from 3 to 5  Picks up…
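A sketch of the ranking step follows. The exact score function from the slide did not survive in this transcript, so the code substitutes an assumed TF-IDF-style score, a_i * log((N + 1) / (b_i + 1)), where N is the number of NPs. The function names, the smoothing, and the toy data are hypothetical and only illustrate the idea of rewarding suffixes that are frequent in names but rare among all NPs.

```python
import math
from collections import Counter


def suffix_ngrams(phrase, min_n=3, max_n=5):
    """Return the suffixes of length min_n..max_n of the phrase's last token."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    last = tokens[-1]
    return [last[-n:] for n in range(min_n, max_n + 1) if len(last) >= n]


def top_suffixes(names, all_nps, k=100):
    """Rank suffix ngrams by an ASSUMED TF-IDF-like score (not the slide's formula)."""
    a = Counter(g for p in names for g in suffix_ngrams(p))    # a_i: count in names
    b = Counter(g for p in all_nps for g in suffix_ngrams(p))  # b_i: count in all NPs
    total = len(all_nps)
    scores = {
        g: a_i * math.log((total + 1) / (b[g] + 1))
        for g, a_i in a.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up toy data: the names are assumed to be a subset of the NP list.
names = ["Slobodan Milosevitch", "Ivan Petrovitch", "Tom King"]
all_nps = names + ["rowing", "morning", "swimming", "protein", "the building"]

ranked = top_suffixes(names, all_nps, k=5)
for ngram, score in ranked:
    print(f"{ngram}\t{score:.3f}")
```

With this toy data the "-vitch" family of suffixes ranks above "-ing", because "-ing" is common among non-name NPs. Any score with the same two pressures (high a_i, low b_i) would behave similarly.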

MUCH NICER  New names, not picked up before  List not filtered or altered in any way: all seem to be names  Some very familiar-but-rare suffixes, such as -vitch

NEXT STEPS  Use prefixes as well as suffixes: McDowell, McCartney, O'Connor, O'Dowel  Try other categories. Can potentially work for locations: Afghani-stan, Paki-stan, Fin-land, Green-land, Eng-land
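Extending the counting to prefixes and to other categories needs only a small change. The sketch below is hypothetical (the function `affix_counts` and its `side` parameter are not from the slides) and uses the slide's own examples as toy input.

```python
from collections import Counter


def affix_counts(phrases, n, side="suffix"):
    """Count n-character prefixes or suffixes of each phrase's last token."""
    counts = Counter()
    for phrase in phrases:
        tokens = phrase.split()
        if not tokens:
            continue
        word = tokens[-1].lower()
        if len(word) < n:
            continue
        counts[word[:n] if side == "prefix" else word[-n:]] += 1
    return counts


surnames = ["McDowell", "McCartney", "O'Connor", "O'Dowel"]
locations = ["Afghanistan", "Pakistan", "Finland", "Greenland", "England"]

prefix_counts = affix_counts(surnames, 2, side="prefix")
location_counts = affix_counts(locations, 4)
print(prefix_counts)
print(location_counts)
```

The same discriminative scoring used for suffixes would still be needed on top of these counts, since a prefix that is frequent in names may also be frequent elsewhere.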

NEXT STEPS  Put this into main pipeline for RTW  Insert new names during bootstrapping process  Should be interesting to see the interaction between morphologically identified names and names found using contexts  Use confidence scores

Thanks!
