
From LEXiTRON to Asian WordNet on Collaborative Development Platform



  1. From LEXiTRON to Asian WordNet on Collaborative Development Platform
     Virach Sornlertlamvanich, National Electronics and Computer Technology Center (NECTEC), NSTDA, Thailand
     virach@tcllab.org
     10 Years LEXiTRON, NECTEC-ACE, Bangkok Convention Center, September 24-25, 2008

  2. LEXiTRON version 1.1
     - Corpus-based dictionary
     - Dictionary for writing
     - Released in B.E. 2538 (1995)
     - CD-ROM for Windows 3.1 Thai Edition
     - Thai: 11,000 words; English: 9,000 words
     - 6 dictionaries in one:
       1) General Thai dictionary
       2) Thai usage dictionary
       3) Dictionary of synonyms and antonyms
       4) Thai-English dictionary
       5) Dictionary of Thai word groups

  3. Corpus-based Dictionary and Dictionary for Writing
     - Word access
     - Synonym
     - Antonym
     - Example sentence (usage)
     - Word group
     - Translation (equivalent)

  4. Design of LEXiTRON
     - Built from the dictionary of a machine translation system, 30,000 words
     - Word structure
       - Simple word
       - Compound word
       - Prefix
       - Suffix
     - Word information
       - Word
       - Pronunciation
       - Part of speech (14 main categories, 45 subcategories)
       - Classifier
       - Verb pattern (12 -> 9 VPs)
       - Synonym
       - Antonym
       - Example sentence
       - English translation
       - Semantic group

  5. Synset Assignment via English Surface
     - Use English equivalents to link the existing dictionary to WordNet
     - POS (n, v, adv, adj), the English equivalent, and the English equivalent of a synonym in the target language are used to pinpoint the appropriate link
     - The number of matched English equivalents in the Synset confirms the appropriate link
     - Experiments on Thai-English, Indonesian-English and Mongolian-English dictionaries

  6. Asian WordNet Development
     [Diagram. Labels: X-English (Thai-English, Indonesian-English, ...), WN, merged-WN, AWN, GWN, KUI; operations: Lookup, Addition, Correction, Discussion, Voting; applications: X-English Dictionary, Ontology, X-English Translation, CL-Search, MT, Summarization, IE/IR]

  7. Synset Assignment (CS=4)
     - Accept the Synset that includes more than one English equivalent, with a confidence score of 4.
     Example:
       L0: เป้าหมาย
       E0: aim
       E1: target
       S0: purpose, intent, intention, aim, design
       S1: aim, object, objective, target
       S2: aim

  8. Synset Assignment (CS=3)
     - Accept the Synset that includes more than one English equivalent, where the additional equivalent comes from a synonym in the target language, with a confidence score of 3.
     Example:
       L0: จ้อง
       Synonym L1: เพ่งมอง
       E0: stare
       E1: gaze
       S0: stare
       S1: gaze, stare

  9. Synset Assignment (CS=2)
     - Accept the only Synset that includes the English equivalent, with a confidence score of 2.
     Example:
       L0: สูติแพทย์
       E0: obstetrician
       S0: obstetrician, accoucheur

  10. Synset Assignment (CS=1)
     - Accept more than one Synset, each of which includes one of the English equivalents, with a confidence score of 1.
     Example:
       L0: ช่อง
       E0: hole
       E1: canal
       S0: hole, hollow
       S1: hole, trap, cakehole, maw, yap, gap
       S2: canal, duct, epithelial duct, channel
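The four confidence rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `assign_synsets`, the dictionary-of-sets representation, and the rule ordering (a CS=4 or CS=3 match pre-empts the CS=2 and CS=1 fallbacks) are all hypothetical, and the candidate synsets are assumed to be already filtered by part of speech.

```python
from typing import Dict, Iterable, Set


def assign_synsets(
    equivalents: Iterable[str],
    synonym_equivalents: Iterable[str],
    synsets: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Return {synset_id: confidence_score} for one dictionary entry.

    equivalents: English equivalents of the entry itself (E0, E1, ...).
    synonym_equivalents: English equivalents of the entry's synonyms.
    synsets: candidate WordNet synsets, assumed to share the entry's POS.
    """
    own = set(equivalents)
    pooled = own | set(synonym_equivalents)

    scores: Dict[str, int] = {}
    for synset_id, members in synsets.items():
        if len(own & members) > 1:
            # CS=4: the synset contains several of the entry's own equivalents.
            scores[synset_id] = 4
        elif len(pooled & members) > 1:
            # CS=3: several matches, but only with help from a synonym.
            scores[synset_id] = 3
    if scores:
        # Assumption: a multi-equivalent match pre-empts the weaker rules.
        return scores

    matching = [sid for sid, members in synsets.items() if own & members]
    if len(matching) == 1:
        # CS=2: exactly one synset contains an equivalent.
        return {matching[0]: 2}
    # CS=1: several synsets each contain a single equivalent.
    return {synset_id: 1 for synset_id in matching}


# The CS=4 example from the slides: "aim" and "target" both occur in S1.
print(assign_synsets(
    ["aim", "target"],
    [],
    {
        "S0": {"purpose", "intent", "intention", "aim", "design"},
        "S1": {"aim", "object", "objective", "target"},
        "S2": {"aim"},
    },
))
```

Whether a real system should keep the lower-confidence candidates alongside a CS=4 or CS=3 match is a design choice the slides do not settle; the sketch drops them for simplicity.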

  11. Quantitative Evaluation for T-E

                   WordNet (synset)            T-E Dict (entry)
                   total      assigned         total     assigned
       Noun        145,103    18,353 (13%)     43,072    11,867 (28%)
       Verb         24,884     1,333 (5%)      17,669     2,298 (13%)
       Adjective    31,302     4,034 (13%)     18,448     3,722 (20%)
       Adverb        5,721       737 (13%)      3,008     1,519 (51%)
       total       207,010    24,457 (12%)     82,197    19,406 (24%)
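The percentages in the table above are the assigned count divided by the corresponding total. A small sketch of that arithmetic follows, with the counts copied from the slide; the names `COUNTS` and `coverage` are illustrative only. Recomputed values agree with the slide's rounded figures to within about one percentage point.

```python
# Counts copied from the slide:
# (WordNet total, WordNet assigned, dictionary total, dictionary assigned)
COUNTS = {
    "Noun": (145_103, 18_353, 43_072, 11_867),
    "Verb": (24_884, 1_333, 17_669, 2_298),
    "Adjective": (31_302, 4_034, 18_448, 3_722),
    "Adverb": (5_721, 737, 3_008, 1_519),
}


def coverage(row):
    """Return (WordNet %, dictionary %) of items that received an assignment."""
    wn_total, wn_assigned, dict_total, dict_assigned = row
    return (100 * wn_assigned / wn_total, 100 * dict_assigned / dict_total)


# Column-wise sums reproduce the "total" row of the table.
totals = tuple(sum(column) for column in zip(*COUNTS.values()))

for pos, row in {**COUNTS, "total": totals}.items():
    wn_pct, dict_pct = coverage(row)
    print(f"{pos:<10} WordNet {wn_pct:5.1f}%   T-E dictionary {dict_pct:5.1f}%")
```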

  12. Qualitative Evaluation for T-E

                   CS=4         CS=3          CS=2         CS=1         total
       Noun         5 (71.4%)   306 (63.9%)   34 (53.1%)   55 (20.2%)   400 (48.7%)
       Verb         -            23 (52.3%)    6 (8.0%)     4 (13.8%)    33 (22.3%)
       Adjective    -             2 (8.0%)     -            -             2 (3.4%)
       Adverb       7 (100%)      4 (100%)     4 (100%)     1 (100%)     16 (100%)
       total       12 (80.0%)   335 (60.7%)   44 (30.8%)   60 (18%)     451 (43.2%)

  13. Improvement by Consulting Dictionaries from Multiple Sources

       MMT T-E Dictionary
                   CS=4         CS=3          CS=2         CS=1         total
       Total       12 (80.0%)   335 (60.7%)   44 (30.8%)   60 (18%)     451 (43.2%)

       MMT and LEXiTRON T-E Dictionary
                   CS=4         CS=3          CS=2         CS=1         total
       Total       14 (93.3%)   337 (61.1%)   72 (50.3%)   93 (27.8%)   516 (49.4%)

  14. Participation

  15. Lookup

  16. English-English

  17. Thai-English

  18. Thai-Indonesian

  19. Future Work
     - Asian WordNet Community
     - Language resource conversion and alignment
     - Language technology sharing
     - Collaborative development platform: AsianWordnet (www.tcllab.org/kui -> www.asianwordnet.org)
