
EVALUATION via Negativa INFORMATION RETRIEVAL

Mike Tian-Jian Jiang, Chen-Wei Shih, Chan-Hung Kuo, Richard Tzong-Han Tsai, and Wen-Lian Hsu. National Tsing Hua University; Academia Sinica; Taiwan.


  1. 中文詞分 (Chinese word segmentation): EVALUATION via Negativa INFORMATION RETRIEVAL

  2. Fundamental Unit? A meta-communication

  3. What is a Word? To linguistics

  4. “... the smallest free form that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning) ...” http://en.wikipedia.org/wiki/Word

  5. “... the task of defining what constitutes a ‘word’ involves determining where one word ends and another word begins...” http://en.wikipedia.org/wiki/Word#Word_boundaries

  6. Word Boundary?
     • Phonology
     • Morphology
     • Orthography
     • Compound? Multi-word expression?
       • Multi-word vs. multiword vs. multi word
     • CJKV?
       • Multi-character expression?

  7. What is a Word? To computational linguistics

  8. Standard de jure?
     • Academia Sinica Balanced Corpus
     • Chinese Treebank of University of Pennsylvania
     • City University of Hong Kong
     • Microsoft Research Asia
     • Peking University

  9. ... then match standards: the more accuracy, the better communication?

  10. What is a Word? To computational linguistics applications

  11. E.g., Information Retrieval

  12. Standard de facto?
      • Word n-gram
      • Character n-gram
      • Hybrid
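To make the three indexing units on slide 12 concrete, here is a minimal sketch. The function names and the particular "hybrid" scheme (segmented words plus character bigrams) are assumptions chosen for illustration; the slides do not say which hybrid is meant.

```python
def char_ngrams(text, n=2):
    """Overlapping character n-grams; no word segmenter is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(words, n=1):
    """N-grams over an already segmented word list, returned as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def hybrid_terms(text, words, n=2):
    """Assumed hybrid: segmented words plus character n-grams, deduplicated in order."""
    seen, terms = set(), []
    for term in list(words) + char_ngrams(text, n):
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


# Example term taken from slide 22; the segmentation is one possible output.
text = "西土耳其"
words = ["西", "土耳其"]
print(char_ngrams(text))          # character bigrams
print(word_ngrams(words))         # word unigrams
print(hybrid_terms(text, words))  # words followed by bigrams
```

Character n-grams never depend on segmentation quality, which is why they serve as a baseline when asking whether better WS helps IR.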

  13. Monotonic or not? Do better word segmentation (WS) results yield better IR outcomes?

  14. Is it finite? How to evaluate WS-to-application influence?

  15. Via Negativa
      “It describes God by saying what he is not, rather than what he is, because as finite beings we can not recognize God's attributes in any real and full sense and because God is beyond what our language can positively describe.” http://www.blackwellreference.com/public/tocnode?id=g9781405106795_chunk_g978140510679515_ss1-58
      Image: http://www.blackmetal.com/scans0710/teratism-via-negativa.jpg

  16. Binary Classification? Clinical trial?

  17. Something about Evaluation

  18. IR Evaluation
      • Data
        • TREC, NTCIR, etc.
      • Metrics
        • P@k, MRR, MAP, etc.
      • Doubts
        • Pooling bias
        • Score standardization
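The metrics named on slide 18 can be sketched as follows, using their standard textbook definitions with binary relevance. The document IDs and the two toy queries are hypothetical and exist only to exercise the functions.

```python
def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranking, relevant-set) pairs: MRR, MAP."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical rankings and judgments for two queries.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d6"], {"d5"}),
]
print("MRR:", mean_over_queries(reciprocal_rank, runs))
print("MAP:", mean_over_queries(average_precision, runs))
```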

  19. CWS (Chinese word segmentation) Evaluation
      • Recall and precision counted by
        • Boundary
        • Token
        • Constituent
      • Similarity?
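A minimal sketch of how the boundary-based and token-based counts on slide 19 can differ for the same output. The test string is an illustrative concatenation of two words from slide 22, not a sentence from any corpus, and the helper names are assumptions.

```python
def token_spans(words):
    """Each word as a (start, end) character span; a token is correct only if both ends match."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def internal_boundaries(words):
    """Character offsets where one word ends and the next begins."""
    bounds, pos = set(), 0
    for word in words[:-1]:
        pos += len(word)
        bounds.add(pos)
    return bounds


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


gold = ["火山", "旱区"]      # assumed reference segmentation
pred = ["火", "山", "旱区"]  # over-segmented system output

print("token:   ", precision_recall(token_spans(pred), token_spans(gold)))
print("boundary:", precision_recall(internal_boundaries(pred), internal_boundaries(gold)))
```

In this example the boundary count is more lenient than the token count: one spurious boundary costs two tokens but only one boundary.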

  20. WS-to-IR
      • Peng et al. (2002)
        • WS: 44-70%, IR: ↗
        • WS: 70-77%, IR: ⤴
        • WS: 85-95%, IR: ⤵
      • He et al. (2002)
        • WS: ↗ (91-94%), IR: ⤴

  21. Why Inconclusive?
      • WS accuracy ranges?
      • WS/IR evaluation metrics?
      • Query length?
      • Term types?

  22. Term Type
      • Kwok (2002)
        • Insensitive: stop-words; frequent non-content-bearing terms
        • Monotonic: content-bearing terms
        • Non-monotonic:
          • 西土耳其 (Western Turkey)
      • Semantic, syntax, or surface?
        • 农 (agricultural) / 作物 (crops)
        • 旱 (drought) / 灾 (disaster) vs. 春旱 (spring drought) vs. 旱区 (drought-stricken area)
      • Recall or precision?
        • 火 (fire) / 山 (mountain) vs. 火山 (volcano)

  23. Surface Pattern http://www.definicionabc.com/general/gestalt-psicologia.php
      • Ambiguity
        • Combinatorial
          • 西土耳其、农作物、旱灾、春旱、旱区、火山 ... etc.
        • Overlapping
          • 施政 (administering policy) / 伟 (great) vs. 施 (Shih) / 政伟 (Zheng-Wei)
      • Which is more harmful?
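The two ambiguity types on slide 23 can be illustrated by enumerating every segmentation that a lexicon allows. This is a sketch under the assumption of a small hand-made lexicon; it is not the segmenter used in the talk.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that all appear in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical lexicon built only from words shown on slides 22 and 23.
lexicon = {"施政", "伟", "施", "政伟", "火", "山", "火山"}

# Overlapping ambiguity: the middle character can attach to either neighbour.
print(segmentations("施政伟", lexicon))

# Combinatorial ambiguity: a word is also a valid sequence of shorter words.
print(segmentations("火山", lexicon))
```

The recursion is exponential in the worst case, which is acceptable for short terms like these but not for full sentences.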

  24. [Figure slide; no text]

  25. Is it finite? How to evaluate WS-to-IR influence?

  26. IR Is Rallying
      • Indexing models
      • Retrieval models
      • Data collections
      • Evaluation metrics

  27. Tractable Simulation? http://imgs.xkcd.com/store/glen_shirts/g_try_science_shirt_2.jpg

  28. Balanced NTCIR (long) and Sogou (short) query collections

  29. Pragmatic, accuracy-controlled WS systems on different standards: CRF trained on 1, 1/2, 1/4, ..., 1/16384 of the Bakeoff 2005 data http://scifun.files.wordpress.com/2010/07/1278929569066.jpg
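The halving schedule on slide 29 runs from 1 down to 1/16384 = 2^-14, which gives 15 training-set sizes. The sketch below shows one way such subsets could be drawn. The slides do not say how the data were sampled, so the shuffling, the fixed seed, and the nesting of smaller subsets inside larger ones are all assumptions.

```python
import random
from fractions import Fraction


def halving_fractions(largest_denominator=16384):
    """Fractions 1, 1/2, 1/4, ... down to 1/largest_denominator."""
    fractions, denominator = [], 1
    while denominator <= largest_denominator:
        fractions.append(Fraction(1, denominator))
        denominator *= 2
    return fractions


def nested_subsets(sentences, fractions, seed=0):
    """Shuffle once, then take prefixes so each smaller subset lies inside the larger ones."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return {
        fraction: shuffled[:max(1, int(len(shuffled) * fraction))]
        for fraction in fractions
    }


# Hypothetical stand-in for a training corpus of 32 segmented sentences.
corpus = [f"sentence-{i}" for i in range(32)]
subsets = nested_subsets(corpus, halving_fractions(32))
for fraction, subset in subsets.items():
    print(fraction, len(subset))
```

Each subset would then train one segmenter, yielding a family of systems whose accuracy falls as the training data shrink.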

  30. [Figure slide; no text]

  31. Popularity similarity (MAP) to a black box’s preference (top-100)

  32. [Figure slide; no text]

  33. Correlation ≠ Causation. TNR (true negative rate) and NPV (negative predictive value) may imply something http://imgs.xkcd.com/store/imgs/correlation_shirt_300.png
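For reference, TNR and NPV are the negative-class counterparts of recall and precision in a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical, and what counts as a "positive" or "negative" case in the talk's experiments is not stated on the slides.

```python
def confusion_rates(tp, fp, tn, fn):
    """Standard rates derived from binary confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Undefined rates (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),  # positive predictive value
        "recall": ratio(tp, tp + fn),     # true positive rate
        "TNR": ratio(tn, tn + fp),        # true negative rate (specificity)
        "NPV": ratio(tn, tn + fn),        # negative predictive value
    }


# Hypothetical counts for illustration only.
rates = confusion_rates(tp=40, fp=10, tn=30, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```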

  34. Discussion
      • 上海滩 (the Bund of Shanghai)
        • MSR: 上海滩, 上海 / 滩, 上 / 海 / 滩
        • PKU: 上海滩, 上海 / 滩, 上 / 海滩
      • May be caused by...
        • Standard differences?
        • Lexicon disappearances?

  35. Concerns
      • Accuracy-controlled WS systems other than CRF?
      • The same training data, different standards?
      • Conventional/comparative IR experiments?
        • Lucene? Lemur/Indri?
        • TREC and NTCIR?
      • Silver standards?
      • Relaxation of negative patterns?
      • Graphical or n-best list output of WS?
      • Oracle precision, recall, TNR, NPV, etc.?
      • Applications other than IR?
      • Out-of-vocabulary?

  36. <(_ _)>
