中文詞分 (Chinese word segmentation) EVALUATION via Negativa INFORMATION RETRIEVAL
Mike Tian-Jian Jiang, Chen-Wei Shih, Chan-Hung Kuo, Richard Tzong-Han Tsai, and Wen-Lian Hsu
National Tsing Hua University, Academia Sinica, Taiwan
1 / 36

Fundamental Unit? a meta-communication 2 / 36

What is a Word? to linguistics 3 / 36

“... the smallest free form that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning) ...” http://en.wikipedia.org/wiki/Word 4 / 36

“... the task of defining what constitutes a ‘word’ involves determining where one word ends and another word begins...” http://en.wikipedia.org/wiki/Word#Word_boundaries 5 / 36

Word Boundary?
• Phonology
• Morphology
• Orthography
• Compound? Multi-word expression?
  • Multi-word vs. multiword vs. multi word
• CJKV?
  • Multi-character expression?
6 / 36

What is a Word? to computational linguistics 7 / 36

Standard de jure?
• Academia Sinica Balanced Corpus
• Chinese Treebank of University of Pennsylvania
• City University of Hong Kong
• Microsoft Research Asia
• Peking University
8 / 36

... then match standards: the higher the accuracy, the better the communication? 9 / 36

What is a Word? to computational linguistics applications 10 / 36

e.g. Information Retrieval 11 / 36

Standard de facto?
• Word n-gram
• Character n-gram
• Hybrid
12 / 36
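A minimal sketch of the three indexing-unit choices named on this slide. It assumes the text has already been segmented by some WS system; the function names and the sample segmentation are illustrative and do not come from the talk.

```python
def char_ngrams(text, n=2):
    """Overlapping character n-grams; no word segmenter is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(words, n=1):
    """n-grams over an already segmented word list, joined back into strings."""
    return ["".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def hybrid_terms(words, n=2):
    """Hybrid index terms: the segmented words plus character n-grams of the raw text."""
    return list(words) + char_ngrams("".join(words), n)


segmented = ["农", "作物"]  # hypothetical WS output for 农作物
print(char_ngrams("农作物"))
print(word_ngrams(segmented))
print(hybrid_terms(segmented))
```

Character bigrams sidestep segmentation errors at the cost of more index terms, while the hybrid variant keeps both views of the same text.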

Monotonic or not? Do better word segmentation (WS) results yield better IR outcomes? 13 / 36

Is it finite? How to evaluate WS-to-application influence? 14 / 36

Via Negativa
“It describes God by saying what he is not, rather than what he is, because as finite beings we can not recognize God's attributes in any real and full sense and because God is beyond what our language can positively describe.”
http://www.blackwellreference.com/public/tocnode?id=g9781405106795_chunk_g978140510679515_ss1-58
Image: http://www.blackmetal.com/scans0710/teratism-via-negativa.jpg
15 / 36

Binary Classification? clinical trial? 16 / 36

Something about Evaluation 17 / 36

IR Evaluation
• Data
  • TREC, NTCIR, etc.
• Metrics
  • P@k, MRR, MAP, etc.
• Doubts
  • Pooling bias
  • Score standardization
18 / 36
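For reference, the three metrics listed on this slide can be sketched as follows. This uses the standard textbook definitions with binary relevance; the query IDs, document IDs, and the `runs` structure are made up for illustration and are not data from the talk.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical runs: query -> (ranked list, set of relevant documents)
runs = {
    "q1": (["d3", "d1", "d7"], {"d1", "d7"}),
    "q2": (["d2", "d5"], {"d2"}),
}
mrr = mean(reciprocal_rank(r, rel) for r, rel in runs.values())
map_score = mean(average_precision(r, rel) for r, rel in runs.values())
print(mrr, map_score)
```

MRR and MAP are simply the per-query reciprocal rank and average precision averaged over all queries.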

CWS (Chinese Word Segmentation) Evaluation
• Recall and precision counted by
  • Boundary
  • Token
  • Constituent
• Similarity?
19 / 36
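The difference between counting by boundary and counting by token can be sketched as below. The gold and predicted segmentations of 火山爆发 are hypothetical, and excluding the trivial end-of-text boundary is an assumption; constituent-based and similarity-based scoring are not covered.

```python
def token_spans(words):
    """Each word as a (start, end) character span; a token is correct only if both ends match."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def boundaries(words):
    """Internal boundary offsets; the end of the text is excluded as trivial (an assumption)."""
    offsets, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        offsets.add(pos)
    return offsets


def prf(gold, pred):
    """Precision, recall, and F1 over two sets of units."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = ["火山", "爆发"]      # hypothetical gold standard
pred = ["火", "山", "爆发"]  # hypothetical system output
print("token   :", prf(token_spans(gold), token_spans(pred)))
print("boundary:", prf(boundaries(gold), boundaries(pred)))
```

The same over-segmentation error costs more under token counting than under boundary counting, which is one reason WS accuracy figures are hard to compare across studies.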

WS-to-IR
• Peng et al. (2002)
  • WS: 44-70%, IR: ↗
  • WS: 70-77%, IR: ⤴
  • WS: 85-95%, IR: ⤵
• He et al. (2002)
  • WS: ↗ (91-94%), IR: ⤴
20 / 36

Why Inconclusive?
• WS accuracy ranges?
• WS/IR evaluation metrics?
• Query length?
• Term types?
21 / 36

Term Type
• Kwok (2002)
  • Insensitive: stop-words; frequent non-content-bearing terms
  • Monotonic: content-bearing
  • Non-monotonic:
    • 西土耳其 (Western Turkey)
• Semantic, syntax, or surface?
  • 农 (agricultural) / 作物 (plants)
  • 旱 (drought) / 灾 (disaster) vs. 春旱 (Spring drought) vs. 旱区 (area of drought disaster)
• Recall or precision?
  • 火 (fire) / 山 (mountain) vs. 火山 (volcano)
22 / 36

Surface Pattern
• Ambiguity
  • Combinatorial
    • 西土耳其、农作物、旱灾、春旱、旱区、火山 ... etc.
  • Overlapping
    • 施政 (practice policy) / 伟 (great) vs. 施 (Shih) / 政伟 (Zheng-Wei)
• Which is more harmful?
Image: http://www.definicionabc.com/general/gestalt-psicologia.php
23 / 36

24 / 36

Is it finite? How to evaluate WS-to-IR influence? 25 / 36

IR Is Rallying
• Indexing models
• Retrieval models
• Data collections
• Evaluation metrics
26 / 36

Tractable Simulation? http://imgs.xkcd.com/store/glen_shirts/g_try_science_shirt_2.jpg 27 / 36

Balanced NTCIR (long) and Sogou (short) query collections 28 / 36

Pragmatic WS accuracy-controlled systems on different standards: CRF trained on 1, 1/2, 1/4, ..., 1/16384 of the Bakeoff 2005 data
Image: http://scifun.files.wordpress.com/2010/07/1278929569066.jpg
29 / 36
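One way to obtain the 1, 1/2, ..., 1/16384 series (15 training sizes, since 16384 = 2^14) is repeated halving of a shuffled corpus. The sketch below assumes nested subsets and a fixed random seed; the slide does not say how the subsets were actually drawn, and `halving_subsets` is a hypothetical helper.

```python
import random


def halving_subsets(sentences, steps=14, seed=0):
    """Return steps + 1 nested training subsets of size N, N/2, ..., N/2**steps."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    subsets = []
    for k in range(steps + 1):
        size = max(1, len(shuffled) >> k)  # integer halving, never below one sentence
        subsets.append(shuffled[:size])
    return subsets


# Placeholder corpus; in practice this would be the segmented training sentences.
corpus = [f"sentence-{i}" for i in range(32768)]
for k, subset in enumerate(halving_subsets(corpus)):
    print(f"1/{2 ** k}: {len(subset)} sentences")
```

Training one segmenter per subset then yields a family of systems whose accuracy degrades gradually, which is what makes the WS accuracy "controlled".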

30 / 36

Popularity similarity (MAP) to a black box’s preference (top-100) 31 / 36

32 / 36

Correlation ≠ Causation
TNR (true negative rate) and NPV (negative predictive value) may imply something
Image: http://imgs.xkcd.com/store/imgs/correlation_shirt_300.png
33 / 36
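TNR and NPV are the "negative" counterparts of recall and precision. The sketch below treats every position between two characters as one binary decision (boundary or no boundary); this framing and the 火山爆发 example are assumptions for illustration, since the slides do not spell out how negatives are counted.

```python
def confusion(gold_b, pred_b, n_chars):
    """Confusion counts over the n_chars - 1 positions between adjacent characters."""
    positions = set(range(1, n_chars))
    tp = len(gold_b & pred_b)               # boundary in both
    fp = len(pred_b - gold_b)               # spurious boundary
    fn = len(gold_b - pred_b)               # missed boundary
    tn = len(positions - gold_b - pred_b)   # correctly left unsplit
    return tp, fp, fn, tn


def tnr(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if tn + fp else 0.0


def npv(tn, fn):
    """Negative predictive value: TN / (TN + FN)."""
    return tn / (tn + fn) if tn + fn else 0.0


# Hypothetical case: gold 火山 / 爆发, prediction 火 / 山 / 爆发
tp, fp, fn, tn = confusion({2}, {1, 2}, n_chars=4)
print("TNR:", tnr(tn, fp), "NPV:", npv(tn, fn))
```

A low TNR signals over-segmentation (splitting where the standard does not), while a low NPV signals that unsplit positions often hide a missed boundary.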

Discussion
• 上海滩 (the Bund of Shanghai)
  • MSR: 上海滩，上海 / 滩，上 / 海 / 滩
  • PKU: 上海滩，上海 / 滩，上 / 海滩
• May be caused by...
  • Standard differences?
  • Lexicon disappearances?
34 / 36
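To make the "lexicon disappearances" idea concrete, the sketch below enumerates every segmentation of 上海滩 that a given lexicon licenses. Both lexicons are hypothetical toy examples, not the actual MSR or PKU vocabularies, and a CRF segmenter does not work by lexicon lookup; the point is only that dropping one entry (海滩, "beach") removes a candidate segmentation.

```python
def segmentations(text, lexicon):
    """All ways to split text into consecutive words that are in the lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical lexicons that differ only in the entry 海滩
without_beach = {"上", "海", "滩", "上海", "上海滩"}
with_beach = without_beach | {"海滩"}

for name, lexicon in [("without 海滩", without_beach), ("with 海滩", with_beach)]:
    options = [" / ".join(seg) for seg in segmentations("上海滩", lexicon)]
    print(name, "->", options)
```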

Concerns
• Other accuracy-controlled WS systems than CRF?
• The same training data, different standards?
• Conventional/comparative IR experiments?
  • Lucene? Lemur/Indri?
  • TREC and NTCIR?
• Silver standards?
• Relaxation of negative patterns?
• Graphical or n-best list output of WS?
• Oracle precision, recall, TNR, NPV, etc.?
• Other applications than IR?
• Out-of-vocabulary?
35 / 36

<(_ _)> 36 / 36
