The Wikipedia Location Network: Overcoming Borders and Oceans


  1. The Wikipedia Location Network: Overcoming Borders and Oceans. Johanna Geiß (1), Andreas Spitz (1), Jannik Strötgen (1, 2), and Michael Gertz (1). (1) Heidelberg University, Institute of Computer Science, Database Systems Research Group, Heidelberg. (2) Max-Planck-Institute for Informatics, Databases and Information Systems, Saarbrücken. {geiss, spitz, stroetgen, gertz}@informatik.uni-heidelberg.de. 9th GIR Workshop, Paris, November 26, 2015.

  2. Motivation Network Construction Properties and Applications Summary What’s the difference between France and Illinois? The Wikipedia Location Network Andreas Spitz 1 of 16

  3. Implicit Networks [Figure: road map of northern and central France in which cities (Calais, Abbeville, Amiens, Le Havre, Rouen, Reims, Caen, Chartres, Troyes, Le Mans, Orléans, Auxerre, Angers, Tours, Bourges, Nevers) are connected by motorways and national roads.]

  4. Overview: (1) Motivation, (2) Network Construction, (3) Properties and Applications, (4) Summary

  5. Foundations of Implicit Networks: “Most of the circuits currently in use are specially constructed for competition. The current street circuits are Monaco, Melbourne, Montreal, Singapore and Sochi, although races in other urban locations come and go (Las Vegas and Detroit, for example) and proposals for such races are often discussed – most recently New Jersey.” (en.wikipedia.org/wiki/Formula_One)

  6. Multi-Graph Extraction: $s(v, w)$ := distance in sentences between toponyms $v$ and $w$; $d(v, w) := \exp\left(-\frac{s(v, w)}{2}\right)$
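
The two definitions above can be illustrated with a minimal Python sketch. It assumes that sentence splitting and toponym recognition have already been done, so the input is simply the list of toponyms found in each sentence. The function name `cooccurrence_weights` and the toy document are hypothetical and are not taken from the slides.

```python
import math
from itertools import combinations


def cooccurrence_weights(sentences):
    """Return (v, w, s, d) for every pair of distinct toponym mentions.

    sentences: list of lists, the toponyms found in each sentence (in order).
    s is the distance in sentences, d = exp(-s / 2) the resulting weight.
    """
    # Flatten to (sentence index, toponym) pairs.
    mentions = [(idx, name) for idx, names in enumerate(sentences) for name in names]
    result = []
    for (i, v), (j, w) in combinations(mentions, 2):
        if v == w:
            continue  # skip pairs of mentions of the same toponym
        s = abs(i - j)
        result.append((v, w, s, math.exp(-s / 2)))
    return result


# Toy document (hypothetical): two toponyms in sentence 0, one in sentence 2.
toy_doc = [["Monaco", "Melbourne"], [], ["Detroit"]]
for v, w, s, d in cooccurrence_weights(toy_doc):
    print(f"{v} - {w}: s = {s}, d = {d:.3f}")
```

Toponyms in the same sentence get weight 1, and the weight decays exponentially as mentions move further apart. Each co-occurrence yields its own weighted edge, which is why the result is a multi-graph.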

  8. Edge Aggregation: distance-based cosine for nodes $v$ and $w$: $\mathrm{dicos}(v, w) := \frac{\sum_i d_i(v)\, d_i(w)}{\sqrt{\sum_i d_i(v)^2}\,\sqrt{\sum_i d_i(w)^2}}$
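
As a rough illustration, the distance-based cosine can be read as a standard cosine similarity over per-node weight vectors. The slide does not say what the index $i$ ranges over, so the sketch below simply assumes that each node comes with an equal-length vector of distance-based weights. The function name and the zero-norm convention are assumptions as well.

```python
import math


def dicos(dv, dw):
    """Cosine similarity of two equal-length weight vectors.

    dv[i] and dw[i] stand in for d_i(v) and d_i(w); what i indexes is an
    assumption here, since the slide does not define it.
    """
    if len(dv) != len(dw):
        raise ValueError("weight vectors must have the same length")
    numerator = sum(a * b for a, b in zip(dv, dw))
    norm_v = math.sqrt(sum(a * a for a in dv))
    norm_w = math.sqrt(sum(b * b for b in dw))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # assumed convention for nodes without any weight
    return numerator / (norm_v * norm_w)


print(dicos([3.0, 4.0], [3.0, 4.0]))  # identical vectors
print(dicos([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Because all weights are non-negative, the value lies between 0 and 1, which makes it usable as a single aggregated edge weight.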

  9. Toponym Extraction in Wikipedia

  10. Network Overview: Node types: [figure]. Network statistics: |V| = 723,779; |E| = 178,890,238; density = 6.8 · 10^-4; clustering coefficient = 0.56
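
The reported density can be cross-checked from the node and edge counts. The sketch below assumes the aggregated network is an undirected simple graph, so that density is 2|E| / (|V| (|V| - 1)). The slide does not state this formula explicitly.

```python
# Counts as reported on the slide.
num_nodes = 723_779
num_edges = 178_890_238

# Density of an undirected simple graph (assumed definition).
density = 2 * num_edges / (num_nodes * (num_nodes - 1))
print(f"density = {density:.1e}")
```

Under this assumption the computed value rounds to the reported 6.8 · 10^-4.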

  11. Network Properties [Figure: four panels showing network metrics (% of remaining edges, clustering coefficient, number of components, assortativity) as a function of the dicos threshold, which ranges from 0.0000 to 0.0025.]

  13. Hierarchical Evaluation: Does the network contain classic geographical relations?
      1. Extract hierarchical relations from Wikidata.
      2. Correspondence of the highest weighted incident edge in the network with the link to the parent in the hierarchy:
         • cities: 81.6% precision for link to parent country
         • countries: 80.3% precision for link to parent continent
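
One possible reading of this evaluation is sketched below: for each location with a known parent, take the neighbour on its highest weighted incident edge and count how often that neighbour is the parent. The function name, the edge-list format, and the toy data are hypothetical. The slides do not say whether candidate neighbours were restricted by type, so no such restriction is applied here.

```python
def parent_precision(edges, parent):
    """Fraction of nodes whose strongest neighbour equals their parent.

    edges:  iterable of (v, w, weight) for an undirected weighted graph.
    parent: dict mapping a node to its parent in the reference hierarchy.
    """
    best = {}  # node -> (weight, neighbour) of its strongest incident edge
    for v, w, weight in edges:
        for node, other in ((v, w), (w, v)):
            if node not in best or weight > best[node][0]:
                best[node] = (weight, other)
    evaluated = [node for node in parent if node in best]
    if not evaluated:
        return 0.0
    hits = sum(1 for node in evaluated if best[node][1] == parent[node])
    return hits / len(evaluated)


# Toy data with made-up weights, for illustration only.
toy_edges = [
    ("Paris", "France", 0.9),
    ("Paris", "London", 0.4),
    ("Lyon", "France", 0.7),
    ("Lyon", "Geneva", 0.8),
]
toy_parent = {"Paris": "France", "Lyon": "France"}
print(parent_precision(toy_edges, toy_parent))
```

In the toy data, Paris is matched to France but Lyon is matched to Geneva, so one of the two evaluated cities counts as correct.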

  14. The Network at a Glance: What’s the difference between France and Illinois?
