Building up a Large Scale of Ontology from Japanese Wikipedia
Takahira Yamaguchi
Keio University, Japan
Today's Talk
• Background
• Proposed Methods: IS-A Hierarchy, Class-Instance, RDF Triple, Property Domains, Synonyms
• Application 1 (Knowledge Creation Support), Demo
• Application 2 (Human Robot Interaction), Demo
• Evaluation (Results, Related Work)
• Conclusions and Future Work
Background: ODP&T (Ontology Development Process and Tool)
Process (Noy 2003): determine scope → consider reuse → enumerate terms → define classes → define properties → define constraints → create instances
Tool:
• Ontology Learning from Text and Semi-structured Resources
• Ontology Search (by SWOOGLE, WATSON)
• Ontology Matching & Alignment
• Linked Open Data (LOD): Search Monkey (Enhanced Results)
• Wikipedia2 Ontology
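Proposed Methods
[Figure: Japanese Wikipedia is converted into a Wikipedia Ontology. Example: the is-a hierarchy Institute > Educational Institute > Univ > Japanese Univ, with the instance Keio Univ; the properties foundation (1858) and location, whose domain is Institute; and the synonym pair Keio Univ / Keio University.]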
Extracting Is-a Relationships
• String Matching Methods on the category names
• Matching Infobox Templates and Categories
Wikipedia Category Tree
[Figure: a category tree in which Categories A to F are nested and Articles a to e belong to them.]
Category and Category Tree
Example: Category 「Programming Language」 (http://ja.wikipedia.org/wiki/Category:プログラミング言語) and its sub categories.
• The category tree mixes is-a, has-a, class-instance, and other relationships.
• Is-a relationships are extracted from the category tree.
• The number of categories: 91,316
Source: http://ja.wikipedia.org/wiki/Wikipedia:カテゴリ
String Matching Methods on the category names
A "sub category of" link in the category tree is turned into an is-a relationship when the two category names match as strings. Total: 12,558 is-a relationships.

• Backward String Matching: 7,971 is-a relationships
The sub category name ends with the name of its parent category (e.g. Airport / Japanese Airport).
  Super Class       Sub Class
  Motorway          Japanese Motorway
  Airport           Japanese Airport
  High-speed rail   Taiwan High-speed rail
  Seafood           Japanese Seafood
  Sumo              Amateur Sumo
  Junior college    Japanese Junior college
  Short film        Disney Short film

• Forward Matched String Eliminating: 4,587 is-a relationships
The leading string shared by a category and its sub category is removed, and the remainders form the is-a relationship (e.g. Japanese Athlete / Japanese Golfer gives Athlete / Golfer).
  Super Class       Sub Class
  Noodle            Yakisoba
  Bird              Domestic duck
  Bird              Penguin
  Musician          Lyricist
  Musician          Composer
  Media             Music
  Media             Newspaper
  Author            Poetry
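The two string matching methods can be sketched as follows. This is only an illustration under stated assumptions: it works on whitespace-separated words of the English category names shown on the slide, whereas the original Japanese category names have no spaces and would need character-level matching. The function names and the third sample link are hypothetical.

```python
def backward_match(parent, child):
    """True if the child category name ends with the parent category name."""
    return child != parent and child.endswith(parent)


def forward_match_eliminate(parent, child):
    """Strip the leading words shared by both names.

    Returns (super_class, sub_class) built from the remainders,
    or None when there is no shared prefix or nothing is left over.
    """
    p_words, c_words = parent.split(), child.split()
    n = 0
    while n < min(len(p_words), len(c_words)) and p_words[n] == c_words[n]:
        n += 1
    if n == 0 or n == len(p_words) or n == len(c_words):
        return None
    return " ".join(p_words[n:]), " ".join(c_words[n:])


def extract_is_a(category_links):
    """category_links: iterable of (parent category, sub category) pairs."""
    is_a = []
    for parent, child in category_links:
        if backward_match(parent, child):
            is_a.append((parent, child))
            continue
        pair = forward_match_eliminate(parent, child)
        if pair is not None:
            is_a.append(pair)
    return is_a


links = [
    ("Motorway", "Japanese Motorway"),          # backward match
    ("Japanese Athlete", "Japanese Golfer"),    # forward match, prefix removed
    ("Programming Language", "Compiler"),       # hypothetical link, no match
]
print(extract_is_a(links))
```

Links that match neither rule are simply left out, which reflects the point that the category tree mixes is-a links with other kinds of relationships.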
Matching Infobox Template Name to Category Name
Is-a relationships: 3,782

[Figure: the 「Piano」 article uses the 「Instrument」 template. Categories that the Piano article belongs to: Keyboard instrument | Piano. Resulting is-a chain: Instrument (楽器) > Keyboard instrument (鍵盤楽器) > Piano (ピアノ).]

  Super Class        Sub Class
  Organic compound   Amine
  Organic compound   Ester
  Organic compound   Carboxylic acid
  Organic compound   Terpenoid
  Organic compound   Organonitrogen compounds
  Organic compound   Aromaticity
  Organic compound   Heterocyclic compound
  Organic compound   Amide
  Organic compound   Amino acid
  Organic compound   Alkaloid
  Organic compound   Alcohol
  Organic compound   Aldehyde

  Super Class        Sub Class
  Software           Free Software
  Software           Image Processing Software
  Software           Text Editor
  Software           Web Browser
  Software           E-mail Software
  Software           Security Software
  Software           Word processor
  Software           CAD Software
  Software           System Software
  Software           Windows Game Software
  Software           Application Software
  Software           TeX
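One plausible reading of the template matching method is sketched below. It assumes that when an article's infobox template name equals an existing category name, that category becomes the super class and the categories the article belongs to become its sub classes. The slide does not spell out the exact rule, so the function name, the input shape, and the third sample article are all assumptions.

```python
def template_is_a(articles, category_names):
    """Derive (super_class, sub_class) pairs from infobox templates.

    articles: iterable of (infobox template name, list of article categories).
    category_names: set of all category names in the category tree.
    """
    pairs = set()
    for template, categories in articles:
        if template not in category_names:
            continue  # the template name does not match any category name
        for category in categories:
            if category != template:
                pairs.add((template, category))
    return sorted(pairs)


category_names = {
    "Instrument", "Keyboard instrument", "Piano",
    "Organic compound", "Amine", "Novelist",
}
articles = [
    ("Instrument", ["Keyboard instrument", "Piano"]),  # the Piano article
    ("Organic compound", ["Amine"]),
    ("Person", ["Novelist"]),  # hypothetical: "Person" is not a category name
]
print(template_is_a(articles, category_names))
```

Note that this sketch links Piano directly to Instrument, while the slide's figure places Keyboard instrument in between; recovering that intermediate level would additionally need the category tree.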
Extracting Class-Instance Relationships
Title → Class, Item → Instance
Listing pages: about 8,300
Class-Instance relationships: 421,989

Example listing page: List of People from Tokyo
  = Mathematician =
  * Kunihiko Kodaira
  * Shokichi Iyanaga
  = Physicist =
  * Sin-Itiro Tomonaga

  Class                         Instance
  Keio University People        Yukichi Fukuzawa
  Keio University People        Atsushi Seike
  Keio University People        Yuichiro Anzai
  Japanese Tourist attraction   Kyoto
  Japanese Tourist attraction   Hiroshima
  Japanese Tourist attraction   Akihabara
  Japanese Tourist attraction   Yokohama
  Nobel laureates               Hideki Yukawa
  Nobel laureates               Masatoshi Koshiba
  Nobel laureates               Yoichiro Nambu

Procedure
(1) Scrape the lines that contain instance strings ('*' lines).
(2) Eliminate '*' lines that are used in the explanatory text of the list.
(3) Eliminate '*' lines that link to other listing pages.
(4) Eliminate '*' lines that belong to an unrelated content section such as "recital".
(5) Eliminate '*' lines that are not valid instances, such as "*REDIRECT".
(6) Eliminate '*' lines that describe a year or period, such as "*19th".
(7) Scrape the instance string out of each remaining '*' line by using the link markup "[[ ]]" to identify it.
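The procedure above can be sketched roughly as follows. This is a simplified illustration, not the original implementation: it covers steps (1) and (3) to (7) and leaves out step (2), which needs page-specific judgment. The "List of " prefix, the section names in SKIP_SECTIONS, and the year pattern are assumptions made for the English rendering of the example.

```python
import re

LINK = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]")       # target of the first [[link]]
HEADING = re.compile(r"^=+\s*(.*?)\s*=+\s*$")        # e.g. "= Physicist ="
YEAR_LIKE = re.compile(r"^\d+(st|nd|rd|th)?\b")      # e.g. "19th", "1858"
SKIP_SECTIONS = {"See also", "References", "External links"}  # assumed names
LIST_PREFIX = "List of "                              # assumed title prefix


def extract_class_instances(page_title, wikitext):
    """Return (class, instance) pairs scraped from one listing page."""
    if page_title.startswith(LIST_PREFIX):
        cls = page_title[len(LIST_PREFIX):]
    else:
        cls = page_title
    section = None
    pairs = []
    for raw in wikitext.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            section = heading.group(1)
            continue
        if not line.startswith("*"):              # step 1: keep item lines only
            continue
        if section in SKIP_SECTIONS:              # step 4: unrelated sections
            continue
        body = line.lstrip("*").strip()
        if body.upper().startswith("REDIRECT"):   # step 5: not an instance
            continue
        if YEAR_LIKE.match(body):                 # step 6: year-like items
            continue
        link = LINK.search(body)                  # step 7: use [[ ]] markup
        if link is None:
            continue
        target = link.group(1).strip()
        if target.startswith(LIST_PREFIX):        # step 3: other listing pages
            continue
        pairs.append((cls, target))
    return pairs


sample = """\
= Mathematician =
* [[Kunihiko Kodaira]]
* [[Shokichi Iyanaga]]
= Physicist =
* [[Sin-Itiro Tomonaga]]
*19th century
* [[List of People from Kyoto]]
= See also =
* [[Tokyo]]
"""
for pair in extract_class_instances("List of People from Tokyo", sample):
    print(pair)
```

Items without any link are dropped here, which is a stricter choice than the slide states.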
Extracting RDF Triples
Each infobox row gives one triple: the article is the Subject, the infobox field is the Predicate, and the field value is the Object.
1,485,751 triples

  Subject   Predicate    Object
  Tokyo     region       Kanto
  Tokyo     area         2,187.65
  Tokyo     population   12,988,797
  Tokyo     density      5,940
  Tokyo     tree         Ginkgo tree
  Tokyo     flower       Somei-Yoshino
  Tokyo     bird         Black-headed Gull
  Tokyo     governor     Shintaro Ishihara
  ...
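A minimal sketch of scraping an infobox into triples is shown below. It assumes the common "| field = value" wikitext layout with one field per line; real infoboxes can nest templates and span several lines, which this sketch does not handle. The function name and the sample markup are hypothetical.

```python
import re

FIELD = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")      # "| field = value"
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")     # [[target|label]]


def infobox_to_triples(subject, infobox_text):
    """Return (subject, predicate, object) triples, one per filled field."""
    triples = []
    for line in infobox_text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        predicate, obj = match.group(1), match.group(2)
        obj = WIKILINK.sub(r"\1", obj)   # keep only the visible link text
        if obj:                          # skip fields that are left empty
            triples.append((subject, predicate, obj))
    return triples


sample_infobox = """\
{{Infobox
| region = [[Kanto]]
| area = 2,187.65
| population = 12,988,797
| flower = [[Somei-Yoshino]]
| governor =
}}
"""
for triple in infobox_to_triples("Tokyo", sample_infobox):
    print(triple)
```

Values are kept as plain strings here; turning them into typed literals or resource URIs would be a separate step.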
Implementation of Wikipedia Ontology Search Application
Wikipedia → Wikipedia Ontology:
• Is-a Relationships: String Matching Methods; Matching Templates and Categories
• Class-Instance Relationships: Scraping Listing Pages
• RDF Triples: Scraping Infoboxes
• Domain of Property: Scraping Infoboxes
• Synonym: Extracting Redirect Links
Search Application: Search Results
Support: Idea generation, Analysis
[Figure: the example ontology around Keio Univ (Institute > Educational Institute > Univ > Japanese Univ; properties foundation 1858 and location; synonym Keio University).]
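As an illustration of the synonym step, redirect pages can be grouped by their target. The sketch below assumes pages are available as (title, wikitext) pairs and only recognizes the "#REDIRECT [[Target]]" form; the function name and the "Keio" sample page are hypothetical.

```python
import re
from collections import defaultdict

REDIRECT = re.compile(r"^#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def collect_synonyms(pages):
    """Map each redirect target to the titles that redirect to it."""
    synonyms = defaultdict(list)
    for title, text in pages:
        match = REDIRECT.match(text.lstrip())
        if match:
            synonyms[match.group(1).strip()].append(title)
    return dict(synonyms)


pages = [
    ("Keio Univ", "#REDIRECT [[Keio University]]"),
    ("Keio University", "Keio University is a university in Japan."),
    ("Keio", "#redirect [[Keio University]]"),   # hypothetical redirect page
]
print(collect_synonyms(pages))
```

The reported precision for synonyms is lower than for the other relations, so in practice the redirect pairs would likely need further filtering.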
Demo1: WiLD (Wikipedia Linked Data Application)
(Think about a famous Japanese novelist.) 1 min.
Linked Data: Book, Restaurant
Demo2: HRI (Human Robot Interaction) (1)
NAO: Microphone, Speaker, Sonar, Inertial sensor, Pressure sensor; height 58 cm; Python programming
NAO comes from Aldebaran in France. http://www.aldebaran-robotics.com/en
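Demo2: HRI (Human Robot Interaction) (2)
1. A user asks Nao about methods for health care.
2. Using the WikipediaJapan ontology, Nao enumerates them.
3. The user selects tai-chi from them.
4. Using the Action ontology, Nao shows the user the tai-chi actions that it can perform.
5. The user selects tai-chi_1 from them.
6. Nao performs the action tai-chi_1.
Demo: in Japanese, 1 min 30 sec

[Figure: two ontologies used in the demo.
WikipediaJapan ontology, under Health-care methods (健康法): Tai-chi (Chen-style, Yang-style, Sun-style, Wu (Hao)-style, Wu-style), Walking, Pilates, Early to bed and early to rise, Quitting smoking, Sunbathing, Dance, Radio calisthenics (No. 1, No. 2), Swimming (Backstroke, Butterfly, Freestyle, Breaststroke), Yoga (Karma yoga, Japa yoga, Jnana yoga, Bhakti yoga, Raja yoga, Mantra yoga).
Action ontology, under Executable actions (実行可能動作): Basic actions (基本動作) and Composite actions (複合動作). Concepts shown include Center of gravity, Movement, Rotation, Posture, Bending and stretching, Turn, Slow turn, Left leg bend, Right leg bend, Sit, Stand, Lie down, Step forward, Step backward, Sideways walk, Walk, Walk slowly, Side step, Stretch thighs, Forward bend, Hold both hands forward, Put hands on hips, Shift weight onto right foot, Shift weight onto left foot, Return weight to center, Self-introduction, Tai-chi (tai-chi_1, tai-chi_2), Yoga, and Dance (Star Wars dance, Thriller dance).]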
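The six steps of the dialogue can be sketched as two ontology lookups with a user choice after each. This is a toy illustration only: the dictionaries hold a small subset of the concepts from the slide, and the `choose` and `perform` callbacks are hypothetical stand-ins for speech interaction and robot motion, not part of any NAO API.

```python
# Tiny stand-ins for the two ontologies (subset of the slide's concepts).
WIKIPEDIA_ONTOLOGY = {
    "health-care method": ["tai-chi", "walking", "pilates", "yoga"],
}
ACTION_ONTOLOGY = {
    "tai-chi": ["tai-chi_1", "tai-chi_2"],
}


def run_dialogue(choose, perform):
    """Run the demo flow; return the performed action or None.

    choose(options) returns the user's pick from a list of options.
    perform(action) makes the robot carry out the chosen action.
    """
    methods = WIKIPEDIA_ONTOLOGY.get("health-care method", [])   # steps 1-2
    method = choose(methods)                                      # step 3
    if method not in methods:
        return None
    actions = ACTION_ONTOLOGY.get(method, [])                     # step 4
    if not actions:
        return None   # the robot has no executable action for this method
    action = choose(actions)                                      # step 5
    if action not in actions:
        return None
    perform(action)                                               # step 6
    return action


# Scripted user who picks tai-chi, then tai-chi_1.
answers = iter(["tai-chi", "tai-chi_1"])
performed = []
print(run_dialogue(lambda options: next(answers), performed.append))
```

Keeping the general knowledge and the robot's capabilities in separate lookups mirrors the slide's split between the WikipediaJapan ontology and the Action ontology.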
Learning Results from Wikipedia Japan (1)

  Relations                       #            Precision
  All IS-A                        93,322       76.30%
    IS-A by string matching       12,558       93.1±1.51%
    IS-A by template matching     3,782        95.6±1.09%
    IS-A by contents headlines    83,288       72.6±2.74%
  Class-Instance                  421,989      97.2±1.02%
  RDF triple                      1,485,751    95.8±1.79%
  Property Domains                6,485        95.4±1.22%
  Synonyms                        106,671      67.0±2.90%
  Total                           1,834,753    -
How is WJO going?
• Few upper concepts
• Some concepts with many sub concepts
• Some classes with no instances
• Some classes with popular instances
Related Work
Aspects compared include IS-A, Class-Instance, RDF triple, and Property domain.

  System                                    Marks on the slide
  DBpedia (Bizer)                           ○ ○
  YAGO (Suchanek)                           ○ △ ○ △
  Ponzetto                                  - ○
  Wikipedia Japan2 Ontology (Keio Univ.)    ○ ○ ○ ○ ○

References
• Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, Sebastian Hellmann. DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, Volume 7, Issue 3, Pages 154–165, 2009. http://wiki.dbpedia.org/
• Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. YAGO - A Large Ontology from Wikipedia and WordNet. Elsevier Journal of Web Semantics. http://www.mpi-inf.mpg.de/yago-naga/yago/
Conclusions and Future Work
Conclusions
• Wikipedia Japan works for light-weight (not heavy-weight) ontology development.
• Wikipedia Japan Ontology works for knowledge creation support and HRI.
Future Work
• Manage the open issues of Wikipedia Japan Ontology (few upper concepts, some concepts with many sub concepts, some classes with no instances, some classes with popular instances) with upper ontologies.