Mohamed Thahir – Traditional and Open Relation Extraction


  1. Mohamed Thahir

  2. Outline
     • Traditional and Open Relation Extraction
     • Read the Web Relation Extraction
     • Experimental Results
     • Coupled Learning of Predicates
     • Challenges and Ongoing Work

  3. • A relation is instantiated with a set of manually provided positive and negative examples
     • Example: City "capital of" Country
       ◦ Positive seeds: {("Washington D.C.", "USA"); ("New Delhi", "India"); …}
       ◦ Negative seeds: {("USA", "Canada"); ("London", "India"); …}
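Seed-based instantiation can be sketched as a labeling function over candidate entity pairs. This is a minimal illustration, not the actual RTW implementation; the seed pairs below are the ones from the slide.

```python
# Minimal sketch of seed-based (traditional) relation instantiation
# for the City "capital of" Country relation. Seed pairs are illustrative.

POSITIVE_SEEDS = {("Washington D.C.", "USA"), ("New Delhi", "India")}
NEGATIVE_SEEDS = {("USA", "Canada"), ("London", "India")}

def label_candidate(pair):
    """Return +1 / -1 for seeds, 0 for unlabeled candidates."""
    if pair in POSITIVE_SEEDS:
        return 1
    if pair in NEGATIVE_SEEDS:
        return -1
    return 0  # unlabeled: left for the bootstrapped learner to decide

print(label_candidate(("New Delhi", "India")))  # 1
print(label_candidate(("London", "India")))     # -1
print(label_candidate(("Paris", "France")))     # 0 (not a seed)
```

The learner's job is then to generalize from the labeled seeds to the unlabeled candidates.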

  4. • Proposed by Banko et al., 2007
     • A classifier is built which, given two entities and their context (E1 Context E2), identifies whether a valid relation holds
     • Performs "unlexicalized" extraction
     • Some features:
       ◦ Part-of-speech (POS) tags in Context
       ◦ Number of tokens and stop words in Context
       ◦ POS tag to the left of E1 and to the right of E2
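The unlexicalized features above can be sketched as a small feature function. This is an illustration, not the Banko et al. code; POS tags are supplied pre-computed here, whereas a real system would run a tagger, and the stop-word list is an assumption.

```python
# Sketch of "unlexicalized" Open RE features over the span between E1 and E2.
# STOP_WORDS is a tiny illustrative list, not a standard resource.

STOP_WORDS = {"the", "a", "of", "in", "to", "is"}

def context_features(context_tokens, context_pos, left_pos, right_pos):
    """Features describing the context between entities E1 and E2."""
    return {
        "num_tokens": len(context_tokens),
        "num_stop_words": sum(t.lower() in STOP_WORDS for t in context_tokens),
        "pos_sequence": "-".join(context_pos),   # POS tags inside the context
        "pos_left_of_e1": left_pos,              # POS tag just left of E1
        "pos_right_of_e2": right_pos,            # POS tag just right of E2
    }

# E1="Einstein"  context="was born in"  E2="Ulm"
feats = context_features(["was", "born", "in"], ["VBD", "VBN", "IN"], "NNP", "NNP")
print(feats["num_tokens"], feats["num_stop_words"])  # 3 1
```

Because no feature mentions the actual words of the relation, the same classifier can score any candidate relation, which is what lets Open RE scale beyond a fixed predicate inventory.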

  5. • Banko et al., 2008 – "Tradeoff between Open and Traditional RE"
     • Comparison between traditional (R1-CRF) and Open RE (O-CRF), averaged over 4 common relations:

       System | Precision | Recall | Training examples
       O-CRF  |      75.0 |   18.4 |
       R1-CRF |      73.9 |   58.4 | 5930

  6. Pros:
     • Open RE can scale to the size of the web (hundreds of thousands of relation predicates)
     • Does not require human input, unlike traditional RE
     • Pretty reasonable level of precision

  7. Cons:
     • Open RE has much lower recall
     • 30% of extracted tuples are not well-formed (do not imply a relation):
       ◦ (demands, securing of, border)
       ◦ (29, dropped, instruments)
     • 87% of well-formed tuples are abstract/underspecified:
       ◦ (Einstein, derived, theory) – abstract tuple
       ◦ (Washington D.C., capital of, USA) – concrete tuple

  8. • Combine the beneficial aspects of traditional and Open Relation Extraction with RTW
     • Find new relation predicates automatically
     • Also extract positive and negative seed examples automatically
     • Leverage the constrained and coupled learning offered by RTW
     • Improve learning of the existing category and relation predicates as well

  9. Actor       | Movie
     De Caprio   | Titanic
     Johnny Depp | Pirates of Carr..
     Arnold      | Terminator
     …           | …

     Candidate patterns:
     • Actor "stars in" Movie
     • Actor "starring in" Movie
     • Movie "movie" Actor
     • Actor "praised" Movie
     • Actor "sang in" Movie

  10. • Patterns which are rare are removed
      • Patterns which have either a very small domain or a very small range are removed
        ◦ Removes many irrelevant patterns caused by ambiguity, e.g. NP "was engulfed in" flames (matches both Vehicle and Sportsteam NPs)
        ◦ Removes very specific patterns
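The pruning step above can be sketched as a filter over pattern statistics. The thresholds and counts below are illustrative assumptions, not values from the slides.

```python
# Sketch of the pattern-pruning step: drop patterns that are rare or whose
# domain/range is too small. Thresholds and example data are illustrative.

def prune_patterns(pattern_stats, min_count=5, min_domain=2, min_range=2):
    """pattern_stats maps pattern -> (count, domain_set, range_set)."""
    kept = {}
    for pat, (count, domain, rng) in pattern_stats.items():
        if count < min_count:
            continue  # too rare to be reliable
        if len(domain) < min_domain or len(rng) < min_range:
            continue  # overly specific, or an ambiguity-driven pattern
        kept[pat] = (count, domain, rng)
    return kept

stats = {
    "stars in":        (40, {"DeCaprio", "Depp", "Arnold"},
                            {"Titanic", "Pirates", "Terminator"}),
    "was engulfed in": (12, {"car"}, {"flames"}),   # tiny domain/range: pruned
    "sang in":         (3,  {"Arnold"}, {"Titanic"}),  # rare: pruned
}
print(list(prune_patterns(stats)))  # ['stars in']
```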

  11.                   starring | stars in | movie | sang in | praised
      DeCaprio:Titanic        10 |       22 |    15 |       0 |       2
      Depp:Pirates of..       22 |       10 |    19 |       0 |       0
      Arnold:Terminat.        12 |       15 |    20 |       0 |       1
      Arnold:Titanic           0 |        0 |     0 |       0 |       6
      X:Y                      0 |        0 |     0 |       7 |       3
      XX:YY                    3 |        5 |     2 |       0 |       0

  12.                   starring | stars in | movie | sang in | praised
      DeCaprio:Titanic        10 |       22 |    15 |       0 |       2
      Depp:Pirates of..       22 |       10 |    19 |       0 |       0
      Arnold:Terminat.        12 |       15 |    20 |       0 |       1
      Arnold:Titanic           0 |        0 |     0 |       0 |       6
      X:Y                      0 |        0 |     0 |       7 |       3
      XX:YY                    3 |        5 |     2 |       0 |       0

      • TF-IDF normalization
      • K-means clustering
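The TF-IDF normalization and k-means clustering of the instance-by-pattern matrix can be sketched in a few lines. This is a toy, deterministic two-cluster version using the counts from the slide; a real system would use far more instances and a library implementation.

```python
import math

# Sketch: TF-IDF-normalize the instance-by-pattern count matrix, then run a
# tiny two-cluster k-means. Counts are the illustrative ones from the slide.

PATTERNS = ["starring", "stars in", "movie", "sang in", "praised"]
COUNTS = {
    "DeCaprio:Titanic":  [10, 22, 15, 0, 2],
    "Depp:Pirates":      [22, 10, 19, 0, 0],
    "Arnold:Terminator": [12, 15, 20, 0, 1],
    "Arnold:Titanic":    [0,  0,  0,  0, 6],
    "X:Y":               [0,  0,  0,  7, 3],
    "XX:YY":             [3,  5,  2,  0, 0],
}

def tfidf(rows):
    n = len(rows)
    df = [sum(v[j] > 0 for v in rows.values()) for j in range(len(PATTERNS))]
    return {k: [v[j] * math.log(n / df[j]) if df[j] else 0.0
                for j in range(len(PATTERNS))] for k, v in rows.items()}

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vecs, iters=10):
    """Two-cluster k-means with deterministic initialization."""
    keys = sorted(vecs)
    cents = [list(vecs[keys[0]]), list(vecs[keys[-1]])]
    assign = {}
    for _ in range(iters):
        assign = {key: min((0, 1), key=lambda c: sqdist(vecs[key], cents[c]))
                  for key in keys}
        for c in (0, 1):
            members = [vecs[key] for key in keys if assign[key] == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

clusters = kmeans(tfidf(COUNTS))
# "starring"-type instances separate from the "praised"/"sang in" ones
print(clusters["DeCaprio:Titanic"] == clusters["Depp:Pirates"])    # True
print(clusters["Arnold:Titanic"] == clusters["DeCaprio:Titanic"])  # False
```

TF-IDF downweights patterns that co-occur with nearly every instance, so clustering is driven by the patterns that actually discriminate between relations.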

  13. • Each cluster with sufficiently many instances is taken as a new relation predicate (NR)
      • Instances near the centroid of the cluster are taken as seed instances
      • Relations whose domain and range are mutually exclusive with the domain and range of NR are considered mutually exclusive with NR
      • NR is introduced to the RTW system as a new predicate
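Picking seed instances near a cluster's centroid can be sketched as follows. The vectors and names are illustrative, not actual cluster data.

```python
# Sketch: take the instances nearest a cluster's centroid as positive seeds
# for the new relation predicate NR. Vectors here are illustrative.

def nearest_to_centroid(members, n_seeds=2):
    """members: dict instance -> vector. Return the n_seeds closest instances."""
    dim = len(next(iter(members.values())))
    centroid = [sum(v[j] for v in members.values()) / len(members)
                for j in range(dim)]

    def sqdist(v):
        return sum((x - c) ** 2 for x, c in zip(v, centroid))

    return sorted(members, key=lambda k: sqdist(members[k]))[:n_seeds]

cluster = {
    "DeCaprio:Titanic":  [1.0, 2.0],
    "Depp:Pirates":      [1.1, 2.1],
    "Arnold:Terminator": [0.9, 1.9],
    "outlier:pair":      [5.0, 0.0],  # far from the centroid: never a seed
}
print(nearest_to_centroid(cluster))
```

Instances far from the centroid are the ones most likely to be noise, so excluding them keeps the automatically extracted seed set clean.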

  14. Movie category predicate classifier: decides whether instances such as Titanic and Terminator are Promoted or Not Promoted, based on co-occurrence with positive patterns versus co-occurrence with negative patterns

  15. Actor-Movie relation predicate classifier:
      • The relation instance Arnold : Terminator is Promoted, so the category instance Terminator is Promoted as well
      • The new relation helps learning new category instances

  16. • Improved learning for existing category predicates
      • Validation without running RTW:
        ◦ Take the Actor : Movie predicate and its high-confidence relation pattern set R
        ◦ Obtain all instances of "NP1 Context NP2" where:
          - Context is in R
          - Either NP1 or NP2 is a promoted Actor instance
        ◦ List the other NP (the one that is not the Actor)
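The validation procedure above can be sketched as a scan over raw text. The sentences, actor list, and pattern set below are illustrative assumptions, not RTW data.

```python
import re

# Sketch of the offline validation: scan "NP1 Context NP2" occurrences whose
# Context is a high-confidence Actor:Movie pattern and whose NP1 is an
# already-promoted Actor; list the other NP as a Movie candidate.
# Actors, patterns, and sentences are illustrative.

PROMOTED_ACTORS = {"Johnny Depp", "Tom Hanks"}
HIGH_CONF_PATTERNS = {"stars in", "starred in"}

def movie_candidates(sentences):
    out = []
    for s in sentences:
        for pat in HIGH_CONF_PATTERNS:
            m = re.match(r"(.+?) %s (.+)" % re.escape(pat), s)
            if m and m.group(1) in PROMOTED_ACTORS:
                out.append(m.group(2))  # the NP that is not the Actor
    return out

sents = [
    "Johnny Depp stars in Pirates of the Caribbean",
    "Tom Hanks starred in Forrest Gump",
    "Some Person stars in Unknown Film",  # NP1 not a promoted Actor: skipped
]
print(movie_candidates(sents))  # ['Pirates of the Caribbean', 'Forrest Gump']
```

This only handles the NP1 = Actor direction for brevity; the slide's procedure also accepts matches where NP2 is the promoted Actor.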

  17. • 200+ new Movie instances
      • Constrained by the number of promoted Actor instances (~800 in CBL)
      • Future iterations should cause a further increase in Actor and Movie instances
      • > 80% precision
        ◦ Negatives: "comedy film"
        ◦ The RTW system's category predicate classifiers would ideally not promote these negatives

  18. Actor-Movie relation predicate classifier:
      • Jim Carrey : Comedy Film is Not Promoted
      • Promoted only when the category classifier is reasonably confident about the instance

  19. Repeated the same experiment for Food-Food relation predicates; two relations were extracted:

      Relation | Patterns                                | Instances | Precision
      contains | "contain", "is rich in", "are rich in"  | >700      | ~60%
      typeOf   | "such as", "and other", "including"     | >3000     | ~70%

      Negatives: apple "contains" few calories

  20. • Learning of Horn clause rules
      • foodTreatsDisease(food, disease) – existing predicate
      • isTypeOf(food1, food2) – learnt predicate
      • isTypeOf(food1, food2) ∧ foodTreatsDisease(food2, disease) → foodTreatsDisease(food1, disease)
      • Relation instances can be learnt even without direct contextual patterns connecting them (not possible in Open RE)
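The Horn clause above is a simple forward-chaining rule, which can be sketched as a fixed-point computation. The food/disease facts are illustrative.

```python
# Sketch of the Horn-clause inference: propagate foodTreatsDisease
# through isTypeOf links. Facts below are illustrative, not learnt data.

def infer_treats(is_type_of, treats):
    """isTypeOf(f1, f2) & foodTreatsDisease(f2, d) -> foodTreatsDisease(f1, d)."""
    inferred = set(treats)
    changed = True
    while changed:  # iterate to a fixed point (handles chains of isTypeOf links)
        changed = False
        for f1, f2 in is_type_of:
            for food, disease in list(inferred):
                if food == f2 and (f1, disease) not in inferred:
                    inferred.add((f1, disease))
                    changed = True
    return inferred

is_type_of = {("green tea", "tea")}
treats = {("tea", "headache")}
print(("green tea", "headache") in infer_treats(is_type_of, treats))  # True
```

Note that ("green tea", "headache") is derived even if no sentence in the corpus ever mentions green tea and headache together, which is exactly the capability Open RE lacks.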

  21. • We saw that new relation predicates lead to learning more category and relation instances
      • Learning more category and relation instances would also lead to learning new predicates

      Actor     | Award
      Tom Hanks | Oscar
      Arnold    | Golden Globe
      Depp      | …
      …         | …

  22. Actor     | Award
      Tom Hanks | Oscar
      Arnold    | Golden Globe
      Depp      | …
      …         | …

  23. • Many invalid relations are retrieved
      • Un-lexicalized approaches to tackle them
      • Banko & Etzioni (2008) suggest that 95% of relation patterns fall into 8 categories:

        Rel. frequency | Category         | Pattern
        37.8           | E1 Verb E2       | X established Y
        22.8           | E1 Noun+Prep E2  | X settlement with Y
        16.0           | E1 Verb+Prep E2  | X moved to Y
        9.4            | E1 Infinitive E2 | X plans to acquire Y
        5.2            | E1 Modifier E2   | X is Y winner
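Bucketing a relation pattern into these categories can be sketched from its POS-tag sequence. The tag sequences are supplied pre-computed here (a real system would run a POS tagger), and the matching rules are illustrative simplifications of the categories in the table.

```python
# Sketch: map a relation pattern's POS-tag sequence to one of the
# Banko & Etzioni (2008) pattern categories. Rules are illustrative.

def pattern_category(pos_tags):
    tags = " ".join(pos_tags)
    if tags in ("VB", "VBD", "VBZ"):
        return "E1 Verb E2"            # e.g. "established"
    if tags == "NN IN":
        return "E1 Noun+Prep E2"       # e.g. "settlement with"
    if tags == "VBD IN":
        return "E1 Verb+Prep E2"       # e.g. "moved to"
    if tags == "VBZ TO VB":
        return "E1 Infinitive E2"      # e.g. "plans to acquire"
    return "other"

print(pattern_category(["VBD"]))              # 'E1 Verb E2'
print(pattern_category(["VBD", "IN"]))        # 'E1 Verb+Prep E2'
print(pattern_category(["VBZ", "TO", "VB"]))  # 'E1 Infinitive E2'
```

Because the categories depend only on POS tags, this check is itself un-lexicalized and can be applied to patterns never seen before.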

  24. • Build a model which estimates the validity of an extracted relation predicate
      • Possible features:
        ◦ Un-lexicalized features
        ◦ One-to-one relations are mostly valid
        ◦ Relations with Hearst patterns (isA / part-of relations – "such as") have a high chance of being valid (Hearst 1992)
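Two of the proposed validity features can be sketched directly: how close the relation's instances are to one-to-one, and whether its pattern is a Hearst pattern. The scoring formula and pattern list below are illustrative assumptions.

```python
# Sketch of validity features for an extracted relation predicate.
# HEARST_PATTERNS and the one-to-one score are illustrative assumptions.

HEARST_PATTERNS = {"such as", "and other", "including"}

def validity_features(instances, pattern):
    """instances: set of (arg1, arg2) pairs for the candidate relation."""
    arg1s = {a for a, _ in instances}
    arg2s = {b for _, b in instances}
    return {
        # 1.0 when no arg1 or arg2 repeats, i.e. the relation looks one-to-one
        "one_one_score": min(len(arg1s), len(arg2s)) / len(instances),
        "is_hearst": pattern in HEARST_PATTERNS,
    }

feats = validity_features({("apple", "fruit"), ("spinach", "vegetable")},
                          "such as")
print(feats)  # {'one_one_score': 1.0, 'is_hearst': True}
```

Such features would feed a single classifier that scores each candidate predicate before it is admitted into the knowledge base.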

  25. Invalid relations and causes: errors in the promoted instances
      ◦ CBL promotes months of the year as countries, so Organization "meeting in" Country yields instances like US Senate "meeting in" November
      ◦ Cluster all Country instances using the category patterns; the months might form a distinct sub-cluster
      ◦ If the Organization instances link only to a particular sub-cluster, this indicates a weak relation
      ◦ The above metric could be used as another feature

  26. Invalid relations and causes: ambiguity
      ◦ Animal names match sports team names, producing Animal "won" Trophy
      ◦ Compare with mutually exclusive predicates (Sportsteam "won" Trophy) and check whether they have exactly matching patterns
      ◦ If the Animal instances associated with Animal "won" Trophy also have evidence of being Sportsteam instances, that is a feature indicating the weakness of the Animal "won" Trophy relation
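The mutual-exclusion evidence check above can be sketched as an overlap score. The instance sets below are illustrative, not actual promoted instances.

```python
# Sketch of the mutex check: flag an extracted relation as weak when its
# subject instances also have evidence for a mutually exclusive category.
# The instance sets here are illustrative.

def mutex_overlap(relation_subjects, mutex_category_instances):
    """Fraction of the relation's subjects also present in a mutex category."""
    if not relation_subjects:
        return 0.0
    overlap = relation_subjects & mutex_category_instances
    return len(overlap) / len(relation_subjects)

# Subjects of Animal "won" Trophy vs. known Sportsteam instances
subjects = {"Tigers", "Bears", "Giraffe"}
sportsteams = {"Tigers", "Bears", "Lions"}
score = mutex_overlap(subjects, sportsteams)
print(round(score, 2))  # 0.67 -> most "animals" look like team names
```

A high overlap score suggests the extractions are really about the mutually exclusive category, so it can serve as a weakness feature for the relation.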

  27. Invalid relations and causes: underspecified relations
      ◦ These relations require more entities to be useful, e.g. SportsTeam "defeated" SportsTeam
      ◦ X defeated Y, Y defeated X, etc.
      ◦ Temporal and location information is needed for this relation to make sense
