So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks
Andreas Spitz, Johanna Geiß and Michael Gertz
Heidelberg University, Institute of Computer Science, Database Systems Research Group, Heidelberg
{spitz, geiss, gertz}@informatik.uni-heidelberg.de
3rd GeoRich Workshop, San Francisco, June 26, 2016
Motivation | Network Construction | Network Properties | Toponym Disambiguation | Summary
Implicit Networks
Augmenting Toponym Disambiguation with Text-Based Networks, Andreas Spitz, 1 of 18
Implicit Text-Based Networks
“Most of the circuits currently in use are specially constructed for competition. The current street circuits are Monaco, Melbourne, Montreal, Singapore and Sochi, although races in other urban locations come and go (Las Vegas and Detroit, for example) and proposals for such races are often discussed – most recently New Jersey.”
Source: en.wikipedia.org/wiki/Formula_One
Graph Extraction from Text
s(v, w) := distance in sentences between toponyms v and w
$$d(v, w) := \exp\left(-\frac{s(v, w)}{2}\right)$$
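A minimal Python sketch of the distance weight, assuming each toponym mention is given as a (toponym, sentence index) pair and that s(v, w) is the absolute difference of the two sentence indices. The input format and the helper names `distance_weight` and `cooccurrence_weights` are hypothetical; the sketch only collects one weight per co-occurring mention pair and leaves the aggregation into edge weights open.

```python
import math
from itertools import combinations


def distance_weight(s: int) -> float:
    """d(v, w) = exp(-s(v, w) / 2) for a sentence distance s >= 0."""
    if s < 0:
        raise ValueError("sentence distance must be non-negative")
    return math.exp(-s / 2)


def cooccurrence_weights(mentions):
    """Collect one weight d(v, w) per pair of mentions of distinct toponyms.

    `mentions` is a hypothetical input format: a list of
    (toponym, sentence_index) tuples for one document.
    Pairs are stored unordered, as sorted tuple keys.
    """
    weights = {}
    for (v, i), (w, j) in combinations(mentions, 2):
        if v == w:
            continue  # two mentions of the same toponym do not form an edge
        key = tuple(sorted((v, w)))
        weights.setdefault(key, []).append(distance_weight(abs(i - j)))
    return weights


# Hypothetical sentence indices, for illustration only
mentions = [("Monaco", 0), ("Melbourne", 0), ("Detroit", 2)]
for pair, ws in sorted(cooccurrence_weights(mentions).items()):
    print(pair, [round(x, 3) for x in ws])
```

Toponyms in the same sentence get weight 1, and the weight halves in the exponent with every sentence of distance, so nearby mentions dominate.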
Edge Aggregation
Distance-based cosine for nodes v and w:
$$\mathrm{dicos}(v, w) := \frac{\sum_i d_i(v)\, d_i(w)}{\sqrt{\sum_i d_i(v)^2}\,\sqrt{\sum_i d_i(w)^2}}$$
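The distance-based cosine can be sketched in Python over sparse vectors. The slide does not state what the index i ranges over, so this sketch simply takes the two vectors d(v) and d(w) as dictionaries from index to weight, treats missing entries as 0, and (as an assumption) returns 0 when either vector is all zero. The function name `dicos` mirrors the slide's notation and is otherwise hypothetical.

```python
import math
from typing import Hashable, Mapping


def dicos(dv: Mapping[Hashable, float], dw: Mapping[Hashable, float]) -> float:
    """Cosine of the sparse weight vectors d(v) and d(w)."""
    # Numerator: sum over shared indices i of d_i(v) * d_i(w); missing entries are 0
    dot = sum(x * dw[i] for i, x in dv.items() if i in dw)
    # Denominator: product of the two Euclidean norms
    norm_v = math.sqrt(sum(x * x for x in dv.values()))
    norm_w = math.sqrt(sum(x * x for x in dw.values()))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_v * norm_w)


print(dicos({"a": 3.0, "b": 4.0}, {"a": 3.0}))
```

Like any cosine, the value lies in [0, 1] for non-negative weights and ignores the overall magnitude of the vectors.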
Nonreciprocal Relationships
(Image: Dirk Beyer, Wikimedia Commons)
Inducing Edge Directions
Normalize weights of outgoing edges:
$$\omega(v \to w) := \frac{\mathrm{dicos}(v, w)}{\sum_{x \in V} \mathrm{dicos}(v, x)}$$
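A sketch of this normalization, assuming dicos values are stored only for edges present in the network, so that pairs without an edge contribute 0 to the sum over x ∈ V and self-pairs are not included. The function name and input format are hypothetical. Because each endpoint divides by its own total, ω(v → w) and ω(w → v) generally differ, which is what turns the undirected edges into directed, nonreciprocal ones.

```python
from collections import defaultdict


def directed_weights(dicos_edges):
    """Turn undirected dicos edge weights into directed weights omega(v -> w).

    `dicos_edges` maps each undirected edge (v, w), listed once, to dicos(v, w).
    """
    # Denominator: sum of dicos(v, x) over all edges incident to v
    out_sum = defaultdict(float)
    for (v, w), s in dicos_edges.items():
        out_sum[v] += s
        out_sum[w] += s
    omega = {}
    for (v, w), s in dicos_edges.items():
        # Each endpoint normalizes by its own total, so the two directions can differ
        omega[(v, w)] = s / out_sum[v] if out_sum[v] > 0 else 0.0
        omega[(w, v)] = s / out_sum[w] if out_sum[w] > 0 else 0.0
    return omega


omega = directed_weights({("a", "b"): 0.6, ("a", "c"): 0.2, ("b", "c"): 0.4})
print(omega[("a", "b")], omega[("b", "a")])
```

With dicos values 0.6 (a, b), 0.2 (a, c) and 0.4 (b, c), the sketch gives ω(a → b) = 0.6 / 0.8 = 0.75 but ω(b → a) = 0.6 / 1.0 = 0.6, while the outgoing weights of every node sum to 1.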
Adding Knowledge Base Support: Wikidata
Toponym Extraction in Wikipedia & Wikidata
Network Overview
Network statistics:
|V| = 723,779
|E| = 178,890,238
density = $6.8 \cdot 10^{-4}$
clustering coefficient = 0.56
[Figure: node types]
[Figure: Wikidata location hierarchy]
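As a quick consistency check on these statistics: if the network is treated as undirected and without self-loops, its density is 2|E| / (|V|(|V| - 1)), which gives about 6.83 · 10^-4 and so matches the reported 6.8 · 10^-4. The undirected reading is an assumption; counting |E| as directed edges would give roughly half that value.

```python
num_nodes = 723_779      # |V|
num_edges = 178_890_238  # |E|

# Density of an undirected simple graph: 2|E| / (|V| (|V| - 1))
density = 2 * num_edges / (num_nodes * (num_nodes - 1))
print(f"density = {density:.2e}")  # prints density = 6.83e-04
```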
Network Properties
[Figure: four panels plotting network metrics against the dicos threshold (0.0000 to 0.0025): % of remaining edges, clustering coefficient, number of components, assortativity]
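A threshold sweep of this kind can be approximated as follows: keep only the edges whose dicos value reaches the threshold, then report the percentage of remaining edges and the number of connected components. Several details are assumptions rather than taken from the slides: the comparison uses >=, nodes that lose all their edges still count as singleton components, and the helper names `count_components` and `threshold_sweep` are hypothetical. Clustering coefficient and assortativity are left out here; a graph library such as NetworkX provides `average_clustering` and `degree_assortativity_coefficient` for those.

```python
def count_components(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v, w in edges:
        rv, rw = find(v), find(w)
        if rv != rw:
            parent[rv] = rw
    return len({find(n) for n in nodes})


def threshold_sweep(dicos_edges, thresholds):
    """For each threshold, return (threshold, % of remaining edges, number of components)."""
    nodes = {n for edge in dicos_edges for n in edge}  # node set stays fixed across thresholds
    total = len(dicos_edges)
    rows = []
    for t in thresholds:
        kept = [edge for edge, s in dicos_edges.items() if s >= t]
        pct = 100.0 * len(kept) / total if total else 0.0
        rows.append((t, pct, count_components(nodes, kept)))
    return rows


edges = {("a", "b"): 0.002, ("b", "c"): 0.0005, ("d", "e"): 0.001}
for t, pct, comps in threshold_sweep(edges, [0.0, 0.001, 0.0025]):
    print(f"threshold={t:.4f}  remaining={pct:.1f}%  components={comps}")
```

Raising the threshold removes weak edges first, so the share of remaining edges can only fall and, because the node set is kept fixed, the component count can only rise.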