Extracting Descriptions of Location Relations from Implicit Textual Networks
Andreas Spitz, Gloria Feher, Michael Gertz
Heidelberg University, Institute of Computer Science, Database Systems Research Group
{spitz,gertz}@informatik.uni-heidelberg.de, {feher}@stud.uni-heidelberg.de
11th GIR Workshop, Heidelberg, November 30, 2017

Motivation Implicit Networks Location Relations Sentence Extraction Exploration and Discussion Summary

What are the relations between Berlin and Vienna?
(Image sources: cdn.getyourguide.com, www.wien.info)

Extracting Descriptions of Location Relations Andreas Spitz 1 of 26

Relations between Berlin and Vienna
• both are capitals
• spoken language is German
• located in Europe
• population > 1,000,000

(Image source: www.wikidata.org)

How can we extract other non-trivial connections from texts?

Outline
(1) The what and why of implicit textual networks
(2) Identifying related locations and geo-entities
(3) Extracting descriptive sentences
(4) Exploratory results and discussion

What is an Implicit Network?
Spitz and Gertz, Terms over LOAD (2016)

Implicit Network Edge Weights

For edges (x, y) in which y is a page or sentence, count only (co-)occurrences:

$$\omega(x, y) = \begin{cases} 1 & \text{if } y \text{ contains } x \\ 0 & \text{otherwise} \end{cases}$$

For edges (x, y) between entity types and terms, aggregate the co-occurrence instances I by summing over similarities derived from the sentence distances s:

$$\omega(x, y) := \sum_{i \in I} \exp(-s(x, y, i))$$
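The two edge weight definitions on this slide can be sketched in a few lines of Python. This is only an illustration and not the authors' implementation: the input format (a list of `(x, y, s)` tuples, one per co-occurrence instance), the function names, and the choice to treat edges as unordered pairs are assumptions.

```python
import math
from collections import defaultdict

def containment_weight(item, container):
    """Binary weight: 1 if the page or sentence contains the item, else 0."""
    return 1 if item in container else 0

def cooccurrence_weights(instances):
    """Sum exp(-s) over all co-occurrence instances of each pair.

    `instances` is an iterable of (x, y, s) tuples, where s is the sentence
    distance of one co-occurrence (0 = same sentence). Both this input
    format and the unordered-pair keys are assumptions for illustration.
    """
    weights = defaultdict(float)
    for x, y, s in instances:
        weights[frozenset((x, y))] += math.exp(-s)
    return dict(weights)

# Hypothetical co-occurrence instances (not data from the talk).
instances = [
    ("Berlin", "Vienna", 0),   # same sentence
    ("Berlin", "Vienna", 1),   # adjacent sentences
    ("Berlin", "Vienna", 2),
    ("Berlin", "Hamburg", 0),
]
weights = cooccurrence_weights(instances)
```

Because each instance contributes exp(-s), co-occurrences in the same sentence count fully, while more distant ones decay quickly towards zero.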

Why Use Implicit Networks?

Existing approaches
• Knowledge extraction ⇒ Limited by identifiable patterns or predicates
• Summarization ⇒ Severe scaling limitations for large input collections
• Vector embeddings ⇒ Encode similarity of contexts, not relatedness of entities

Implicit networks
• Scale well to large document collections
• Collocation-based weights encode relatedness of entities
• Work well with dynamic text data

Implicit Network Exploration Pipeline
Spitz, Almasian, Gertz, EVELIN (2017)

Overview: Location Relation Extraction

Extracting descriptive sentences for pairs of locations
(1) Find closely related pairs of locations
(2) Filter relations that exist in knowledge bases
(3) Identify descriptive sentences for the remaining pairs

Identifying Closely Related Locations

Obtain a location ranking from the network by
(1) Creating weights for directed edges between nodes x ∈ X and y ∈ Y in entity sets X and Y in the implicit network:

$$\omega(x \mid y) = \omega(x, y) \, \log\left( \frac{|Y|}{|N(x) \cap Y|} \right)$$

(2) For a given query location q ∈ L, ranking all l ∈ L by $\omega(l \mid q)$

Rousseau and Vazirgiannis, Graph-of-word (2013)
Spitz and Gertz, Terms over LOAD (2016)
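As a rough sketch of the two steps on this slide, the following Python computes the directed weight and ranks locations for a query. The toy network, the function names, and the restriction to locations adjacent to the query (all others would have weight 0) are assumptions made for illustration; the weights are invented and not taken from the talk.

```python
import math

def directed_weight(x, y, weight, neighbours, target_set):
    """omega(x | y) = omega(x, y) * log(|Y| / |N(x) ∩ Y|)."""
    degree_in_y = len(neighbours[x] & target_set)
    return weight[frozenset((x, y))] * math.log(len(target_set) / degree_in_y)

def rank_locations(query, locations, weight, neighbours):
    """Rank locations adjacent to `query` by omega(l | query), descending."""
    scores = {
        l: directed_weight(l, query, weight, neighbours, locations)
        for l in neighbours[query] & locations
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical undirected edge weights omega(x, y).
weight = {
    frozenset(("Berlin", "Vienna")): 2.0,
    frozenset(("Berlin", "Hamburg")): 2.0,
    frozenset(("Berlin", "Prague")): 1.0,
    frozenset(("Vienna", "Prague")): 1.0,
}

# Derive the neighbourhood N(x) of every node from the edge list.
neighbours = {}
for pair in weight:
    a, b = tuple(pair)
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

locations = {"Berlin", "Vienna", "Hamburg", "Prague"}
ranking = rank_locations("Berlin", locations, weight, neighbours)
```

In this toy network Hamburg and Vienna have the same edge weight to Berlin, but Hamburg has fewer neighbours among the locations, so the logarithmic factor ranks it higher. The factor acts like an inverse document frequency on the node degree.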

Location Ranking Example

Berlin (Q64)
location         wikiID   score
Germany          Q183     1.00
West Berlin      Q56036   0.42
East Germany     Q16957   0.32
Hamburg          Q1055    0.31
Munich           Q1726    0.29
Brandenburg      Q1208    0.29
Paris            Q90      0.27

Vienna (Q1741)
location         wikiID   score
Austria          Q40      1.00
Berlin           Q64      0.25
Prague           Q1085    0.23
Paris            Q90      0.19
Munich           Q1726    0.16
Austria-Hungary  Q28513   0.15
Graz             Q13298   0.14

Coverage Estimation Data

Input location data (Wikipedia):
• List of largest German cities (79 locations)
• List of international capitals (250 locations)

Knowledge base:
• Wikidata

⇒ Inverse evaluation: How “poorly” does the ranking reflect Wikidata properties?

Coverage of Location Relations

• Precision: Fraction of location pairs in the ranking that are connected by a property in Wikidata
• Recall: Fraction of Wikidata properties that are in the ranked list of location relations

(Figure: precision@k and recall@k plotted against the position in the ranking, k = 1 to 100, in two panels: German cities and World capitals.)
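The two metrics can be illustrated for a single query location as follows. This is a simplified sketch, not the evaluation code behind the plot: it treats one ranked list at a time, and the ranking and the set of knowledge-base relations below are placeholder values, not Wikidata content.

```python
def precision_recall_at_k(ranking, kb_related, k):
    """Precision@k and recall@k of one ranked list against a knowledge base.

    `ranking` is the ranked list of locations for one query location and
    `kb_related` is the set of locations connected to the query by some
    knowledge-base property (both are hypothetical inputs here).
    """
    top_k = ranking[:k]
    hits = sum(1 for location in top_k if location in kb_related)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(kb_related) if kb_related else 0.0
    return precision, recall

# Placeholder data for illustration only.
ranking = ["a", "b", "c", "d", "e"]
kb_related = {"a", "d", "z"}

p2, r2 = precision_recall_at_k(ranking, kb_related, k=2)
p5, r5 = precision_recall_at_k(ranking, kb_related, k=5)
```

Under the inverse evaluation described on the previous slide, low precision is not a failure: ranked pairs without a Wikidata property are exactly the candidates for relations that the knowledge base does not cover.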

Sentence Extraction: Intuition

Basic Sentence Ranking Methods

Rank a sentence s by a set of query entities Q (here: locations), based on its neighbourhood N(s) and a number n of relevant terms $T_n(Q)$.

M1 Entity count (baseline)

$$r_1(s, Q) := |N(s) \cap Q|$$

• Rank by adjacent query entities
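A minimal sketch of the entity count baseline M1 in Python, assuming that each sentence is given as an identifier mapped to its neighbourhood N(s) as a set of node labels. The data structure, function name, and example sentences are hypothetical.

```python
def entity_count_rank(sentence_neighbours, query):
    """Rank sentences by r_1(s, Q) = |N(s) ∩ Q|, highest score first."""
    scores = {
        sentence_id: len(neighbourhood & query)
        for sentence_id, neighbourhood in sentence_neighbours.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sentence neighbourhoods N(s).
sentence_neighbours = {
    "s1": {"Berlin", "Vienna", "treaty"},
    "s2": {"Berlin", "Hamburg"},
    "s3": {"Paris"},
}
query = {"Berlin", "Vienna"}

ranked = entity_count_rank(sentence_neighbours, query)
```

Since the score only counts adjacent query entities, every sentence that mentions both locations receives the same score, so the baseline cannot distinguish between them.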
