CORE: Context-Aware Open Relation Extraction with Factorization Machines


  1. CORE: Context-Aware Open Relation Extraction with Factorization Machines. Fabio Petroni, Luciano Del Corro, Rainer Gemulla

  2. Open relation extraction. Open relation extraction is the task of extracting new facts for a potentially unbounded set of relations from various sources: natural language text and knowledge bases. (EMNLP 2015. September 17-21, 2015. Lisbon, Portugal.)

  3. Input data: facts from natural language text. An open information extractor extracts all surface facts in the text. For example, from the sentence "Enrico Fermi was a professor in theoretical physics at Sapienza University of Rome." it extracts the surface fact "professor at"(Fermi, Sapienza): "professor at" is the surface relation, Fermi the subject, Sapienza the object, and (Fermi, Sapienza) the tuple.

  4. Input data: facts from knowledge bases. A KB fact such as employee(Fermi, Sapienza) asserts a KB relation (employee) between KB entities. Entities in surface facts are linked to KB entities, e.g., via a string-match heuristic, so the surface fact "professor at"(Fermi, Sapienza) from text is connected to the KB fact employee(Fermi, Sapienza).

  5. Relation extraction techniques: taxonomy. Closed relation extraction (distant supervision, clustering) targets a set of predefined "black and white" relations and thus has a restricted prediction space. Open relation extraction covers both in-KB and out-of-KB relations with latent factor models and has a large prediction space: tensor completion approaches such as RESCAL (Nickel et al., 2011) and PITF (Drumond et al., 2012) have limited scalability with the number of relations; matrix completion approaches include NFE (Riedel et al., 2013) and CORE (Petroni et al., 2015).

  6. Matrix completion for open relation extraction. The input is arranged as a tuples x relations matrix, whose columns are both surface relations (born in, professor at, mayor of) and KB relations (employee); a 1 marks an observed fact:

                        born in   professor at   mayor of   employee
     (Caesar,Rome)         1
     (Fermi,Rome)          1
     (Fermi,Sapienza)                   1                       1
     (de Blasio,NY)                                   1

  7. Matrix completion for open relation extraction. All remaining cells are unknown; the task is to predict which of them hold:

                        born in   professor at   mayor of   employee
     (Caesar,Rome)         1           ?             ?          ?
     (Fermi,Rome)          1           ?             ?          ?
     (Fermi,Sapienza)      ?           1             ?          1
     (de Blasio,NY)        ?           ?             1          ?
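The tuples x relations matrix from slides 6-7 can be sketched in code as follows; the tuple and relation names come from the slides, and NaN is used here (an assumption of this sketch) to stand for the unknown "?" cells.

```python
import numpy as np

# Tuples and relations from the slides' running example.
tuples = ["(Caesar,Rome)", "(Fermi,Rome)", "(Fermi,Sapienza)", "(de Blasio,NY)"]
relations = ["born in", "professor at", "mayor of", "employee"]

# Observed surface/KB facts: (tuple, relation) pairs marked 1 in the matrix.
observed = {("(Caesar,Rome)", "born in"),
            ("(Fermi,Rome)", "born in"),
            ("(Fermi,Sapienza)", "professor at"),
            ("(Fermi,Sapienza)", "employee"),
            ("(de Blasio,NY)", "mayor of")}

# Build the tuples x relations matrix: 1 = observed fact, NaN = unknown ("?").
M = np.full((len(tuples), len(relations)), np.nan)
for i, t in enumerate(tuples):
    for j, r in enumerate(relations):
        if (t, r) in observed:
            M[i, j] = 1.0

print(M)
```

Matrix completion methods then fill in the NaN cells with scores, rather than leaving them unknown.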

  8. Matrix factorization. Learn latent semantic representations of tuples and relations: each tuple and each relation is assigned a latent factor vector, and the score of a fact is the dot product of the two. These latent representations are then leveraged to predict new facts. Toy example: (Fermi,Sapienza) = (0.8, -0.5) and professor at = (0.9, -0.3), where the first latent dimension could be read as "related with science" and the second as "related with sport". In real applications, however, latent factors are uninterpretable.
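The toy scoring step on this slide works out as a single dot product; the vectors below are the illustrative numbers from the slide, and the dimension labels are only for intuition.

```python
import numpy as np

# Toy latent factor vectors from the slide: dimension 1 loosely "related with
# science", dimension 2 loosely "related with sport" (illustrative only; in
# real applications the factors are uninterpretable).
f_tuple = np.array([0.8, -0.5])      # (Fermi,Sapienza)
f_relation = np.array([0.9, -0.3])   # "professor at"

# The score of a candidate fact is the dot product of the two vectors.
score = float(f_tuple @ f_relation)
print(score)  # 0.8*0.9 + (-0.5)*(-0.3) = 0.87
```

A high score for an unobserved (tuple, relation) pair suggests the corresponding fact is likely true.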

  9. Matrix factorization. CORE integrates contextual information into such models to improve prediction performance.

  10. Contextual information. The surface relation in "join"(Peloso, Modest Mouse) is unspecific. Contextual information from the source sentence "Tom Peloso joined Modest Mouse to record their fifth studio album." helps: entity types (Peloso: person, Modest Mouse: organization), labeled by a named entity recognizer with coarse-grained types; the article topic; and words from the sentence (record, album).

  11. Contextual information. How to incorporate contextual information within the model?

  12. CORE - latent representations of variables. CORE associates a latent factor vector f_v with each variable v ∈ V. In the running example, the variables are the tuple (Peloso, Modest Mouse), the relation join, the entities Peloso and Modest Mouse, and the context items person, organization, record, album.

  13. CORE - modeling facts. CORE models the input data in terms of a matrix in which each row corresponds to a fact x and each column to a variable v. Columns are grouped according to the type of the variables: relations ("born in"(x,y), "professor at"(x,y), employee(x,y)), tuples ((Caesar,Rome), (Fermi,Rome), (Fermi,Sapienza)), entities (Caesar, Rome, Fermi, Sapienza), tuple types (e.g., person/location, person/organization), and tuple topics (e.g., physics, history). In each row, the values of each column group sum up to unity; for instance, the two entities of a tuple receive weight 0.5 each.
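A minimal sketch of this row encoding, assuming a simple dictionary representation (the group names and the equal-weight split within a group are taken from the slide; the helper `make_row` is hypothetical):

```python
# Build one CORE input row: indicator weights per variable, grouped by
# variable type, with each group normalized so its values sum to unity.
def make_row(groups):
    """groups: dict mapping group name -> list of active variable names.
    Returns dict mapping (group, variable) -> normalized weight."""
    row = {}
    for group, variables in groups.items():
        if not variables:
            continue
        w = 1.0 / len(variables)  # values of each column group sum to one
        for v in variables:
            row[(group, v)] = w
    return row

# Row for the fact employee(Fermi, Sapienza), as on slide 13.
row = make_row({
    "relations": ["employee(x,y)"],
    "tuples": ["(Fermi,Sapienza)"],
    "entities": ["Fermi", "Sapienza"],   # two entities -> 0.5 each
    "types": ["person,organization"],
    "topics": ["physics"],
})
print(row[("entities", "Fermi")])  # 0.5
```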

  14. CORE - modeling context. CORE aggregates and normalizes contextual information by tuple: a fact can be observed multiple times with different context, and there is no context at all for new facts (never observed in the input). This approach makes it possible to provide comprehensive contextual information for both observed and unobserved facts.

  15. CORE - factorization model. CORE uses factorization machines as the underlying framework and associates a score s(x) with each fact x:

     s(x) = Σ_{v1 ∈ V} Σ_{v2 ∈ V \ {v1}} x_{v1} x_{v2} f_{v1}^T f_{v2}

     i.e., a weighted sum of pairwise interactions of the latent factor vectors, where the weights x_v are the (normalized) column values of the row for x.
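The double sum above can be evaluated without looping over all pairs, using the standard factorization-machine identity: the sum over ordered pairs of distinct variables equals the squared norm of the weighted factor sum minus the diagonal terms. A sketch, with invented toy weights and random factors:

```python
import numpy as np

# Factorization-machine score from the slide:
# s(x) = sum over ordered pairs of distinct variables v1, v2 of
#        x_{v1} * x_{v2} * <f_{v1}, f_{v2}>.
def fm_score(x, F):
    """x: (n,) variable weights of one row; F: (n, k) latent factor vectors."""
    z = F.T @ x  # sum_v x_v f_v
    # ||z||^2 includes the diagonal terms x_v^2 ||f_v||^2; subtract them.
    return float(z @ z - np.sum((x ** 2) * np.sum(F ** 2, axis=1)))

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0, 0.5, 0.5])   # e.g. relation, tuple, two entities
F = rng.normal(size=(4, 3))          # k = 3 latent dimensions

# Check against the naive double sum written on the slide.
naive = sum(x[i] * x[j] * F[i] @ F[j]
            for i in range(4) for j in range(4) if i != j)
print(abs(fm_score(x, F) - naive) < 1e-9)  # True
```

This trick makes scoring linear in the number of active variables, which matters when context adds many columns.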

  16. CORE - prediction. Goal: produce a ranked list of tuples for each relation, where the rank reflects the likelihood that the corresponding fact is true. To generate this ranked list: fix a relation r; retrieve all tuples t such that the fact r(t) is not observed; add the tuple context; rank the unobserved facts by their scores.
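The prediction loop on this slide can be sketched as follows; the scores here are hypothetical stand-ins for the model's s(x), and `rank_unobserved` is an invented helper name.

```python
# Fix a relation, collect every tuple whose fact is not observed,
# and rank those candidate facts by their model score.
def rank_unobserved(relation, tuples, observed, score):
    candidates = [t for t in tuples if (relation, t) not in observed]
    return sorted(candidates, key=lambda t: score(relation, t), reverse=True)

tuples = ["(Caesar,Rome)", "(Fermi,Rome)", "(Fermi,Sapienza)"]
observed = {("employee", "(Fermi,Sapienza)")}          # already known
toy_scores = {("employee", "(Caesar,Rome)"): 0.1,       # made-up s(x) values
              ("employee", "(Fermi,Rome)"): 0.7}

ranking = rank_unobserved("employee", tuples, observed,
                          lambda r, t: toy_scores[(r, t)])
print(ranking)  # ['(Fermi,Rome)', '(Caesar,Rome)']
```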

  17. CORE - parameter estimation. Parameters: Θ = { b_v, f_v | v ∈ V }. All observations are positive; there is no negative training data. CORE therefore uses Bayesian personalized ranking under an open-world assumption: each observed fact x (e.g., "professor at"(Fermi,Sapienza) with its tuple, entities, and tuple context) is paired with sampled negative evidence x⁻ (e.g., "professor at"(Caesar,Rome) with its context). The pairwise approach assumes x is more likely to be true than x⁻ and maximizes Σ_x σ(s(x) − s(x⁻)) by stochastic gradient ascent: Θ ← Θ + η ∇_Θ(·).
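One BPR-style ascent step can be sketched for a stripped-down scorer with only tuple and relation factors (CORE's full model also has context variables and bias terms b_v, which this toy version omits; `bpr_step` and all sizes are assumptions of the sketch):

```python
import numpy as np

def bpr_step(T, R, t, r, t_neg, eta=0.05):
    """One stochastic gradient ascent step on log sigmoid(s(x) - s(x-)),
    where s(x) = <T[t], R[r]> and x- replaces tuple t with sampled t_neg."""
    diff = T[t] @ R[r] - T[t_neg] @ R[r]
    g = 1.0 / (1.0 + np.exp(diff))        # gradient factor 1 - sigmoid(diff)
    T[t] += eta * g * R[r]                # push observed fact's score up
    T[t_neg] -= eta * g * R[r]            # push sampled negative's score down
    R[r] += eta * g * (T[t] - T[t_neg])
    return diff

rng = np.random.default_rng(1)
T = rng.normal(scale=0.1, size=(3, 4))    # tuple latent factors
R = rng.normal(scale=0.1, size=(2, 4))    # relation latent factors

before = T[0] @ R[0] - T[1] @ R[0]
for _ in range(50):                       # repeatedly rank fact 0 above fact 1
    bpr_step(T, R, t=0, r=0, t_neg=1)
after = T[0] @ R[0] - T[1] @ R[0]
print(after > before)  # True: the score margin grows
```

Each step strictly increases the margin s(x) − s(x⁻), which is exactly the pairwise objective on the slide.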

  18. Experiments - dataset. 440k surface facts extracted from the corpus and 15k KB facts, with entity mentions linked using string matching. Contextual information: entity types (person, organization, location, miscellaneous) [t]; article metadata [m] such as news desk (e.g., foreign desk), descriptors (e.g., finances), online section (e.g., sports), section (e.g., a, d), and publication year; bag-of-words [w] from the sentences where the fact has been extracted. The letters m, t, w indicate which contextual information is considered.

  19. Experiments - methodology. To keep experiments feasible, a subsample of 10k tuples, 19 Freebase relations, and 10 surface relations is considered. For each relation and method, the tuples are ranked and the top-100 predictions are labeled manually. Evaluation metrics: number of true facts and MAP (quality of the ranking). Methods: PITF, a tensor factorization method; NFE, a context-agnostic matrix completion method; CORE, which uses relations, tuples, and entities as variables; and CORE+m, +t, +w, +mt, +mtw with added context.
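The MAP metric used here is the mean over relations of average precision on each ranked list; a minimal sketch of the per-relation quantity, with an invented toy label sequence:

```python
# Average precision of one ranked prediction list: mean of the precision
# values at the ranks where a true fact appears. MAP is the mean of this
# quantity across relations.
def average_precision(labels):
    """labels: list of 0/1 in ranked order (1 = prediction labeled true)."""
    hits, total = 0, 0.0
    for rank, y in enumerate(labels, start=1):
        if y:
            hits += 1
            total += hits / rank   # precision at this rank
    return total / hits if hits else 0.0

# Toy ranking: true facts at ranks 1, 3, and 4.
ap = average_precision([1, 0, 1, 1, 0])
print(ap)  # (1/1 + 2/3 + 3/4) / 3
```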
