Welcome! NTCIR-9 Kick-Off Event, 2010.10.05
Twitter: #ntcir9 / Ustream: ntcir-9-kick
Japanese Session: 13:30- / English Session: 15:30-
Program
• About NTCIR
• About NTCIR-9
• Accepted Tasks
• Why participate?
• How to participate
• Important Dates
• Q & A
About NTCIR
NTCIR: NII Testbeds and Community for Information access Research
Research Infrastructure for Evaluating IA
A series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations.
■ Data sets, evaluation methodologies, and forum
• Project started in late 1997; held once every 18 months
• Data sets (test collections, TCs): scientific documents, news, patents, and web; Chinese, Korean, Japanese, and English
• Tasks (research areas):
  – IR: cross-lingual tasks, patents, web, Geo
  – QA: monolingual tasks, cross-lingual tasks
  – Summarization, trend information, patent maps
  – Opinion analysis, text mining
• Community-based research activities
Information retrieval (IR)
• Retrieve RELEVANT information from a vast collection to meet users' information needs
• Using computers since the 1950s
• The first CS field to use human assessments as success criteria
  – Judgments vary
  – Comparative evaluations on the same infrastructure
Information access (IA)
• The whole process of making information usable by users
  e.g., IR, text summarization, QA, text mining, and clustering
Tasks at Past NTCIRs
[Table: tasks run at NTCIR-1 through NTCIR-8; the years shown ('99, '01, '02, '04, '05, '07, '08, '09-) are when the meetings were held, and each task started 18 months before its meeting. Task areas:]
• User-Generated Contents: Community QA, Opinion Analysis
• Cross-Lingual QA + IR (module-based), Geo-Temporal
• IR for Focused Domain: Patent
• Question Answering: Complex/Any Types, Dialog, Cross-Lingual, Factoid/List
• Text Mining / Classification
• Summarization / Visualization: Trend Info Consolidation, Text Summarization
• Web
• Statistical MT
• Cross-Lingual Retrieval: Cross-Lingual IR, Non-English Search
• Text Retrieval: Ad Hoc IR, IR for QA
Procedures in NTCIR Workshops
• Call for Task Proposals
• Selection of Task Proposals by the Committee
• Discussion about Experimental Designs and Evaluation Methods (can continue up to the Formal Runs)
• Registration to Task(s)
  – Deliver Training Data (Documents, Topics, Answers)
• Experiments and Tuning by Each Participant
  – Deliver Test Data (Documents and Topics)
• Experiments by Each Participant
• Submission of Experimental Results
• Pooling the Answer Candidates from the Submissions, and Conducting Manual Judgments (see the sketch after this list)
• Return Answers (Relevance Judgments) and Evaluation Results
• Workshop Meeting; Discussion for the Next Round
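As an illustration of the pooling step above, here is a minimal Python sketch: it merges the top-k candidates from each submitted run per topic into the set of documents that assessors would judge manually. The run representation and the depth value are assumptions for illustration, not NTCIR's actual formats.

from collections import defaultdict

def pool_candidates(runs, depth=100):
    """runs: list of dicts mapping topic_id -> ranked list of doc_ids."""
    pool = defaultdict(set)
    for run in runs:
        for topic_id, ranked_docs in run.items():
            pool[topic_id].update(ranked_docs[:depth])  # top-k from each run
    return {topic: sorted(docs) for topic, docs in pool.items()}

# Example with two hypothetical participant runs for topic "101"
run_a = {"101": ["doc3", "doc7", "doc1"]}
run_b = {"101": ["doc7", "doc9"]}
print(pool_candidates([run_a, run_b], depth=2))  # {'101': ['doc3', 'doc7', 'doc9']}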
NTCIR Workshop Meeting
http://research.nii.ac.jp/ntcir/
NTCIR-7 & -8 Program Committee
Mark Sanderson, Doug Oard, Atsushi Fujii, Tatsunori Mori, Fred Gey, Noriko Kando (and Ellen Voorhees, Sung Hyun Myaeng, Hsin-Hsi Chen, Tetsuya Sakai)
NTCIR Test Collections
Test collections = Docs + Topics/Questions + Answers
Available to non-participants for research purposes
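For illustration, a minimal sketch of the three components a test collection bundles; the field names and the graded relevance values are assumptions for exposition, not the actual NTCIR file formats.

from dataclasses import dataclass, field

@dataclass
class TestCollection:
    docs: dict[str, str] = field(default_factory=dict)      # doc_id -> document text
    topics: dict[str, str] = field(default_factory=dict)    # topic_id -> query/question
    qrels: dict[tuple[str, str], int] = field(default_factory=dict)  # (topic_id, doc_id) -> relevance grade

tc = TestCollection(
    docs={"d1": "NTCIR provides infrastructure for large-scale evaluations."},
    topics={"t1": "evaluation workshops for information access"},
    qrels={("t1", "d1"): 2},   # graded relevance judgment (hypothetical)
)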
Focus of NTCIR
Lab-type IR tests
• Intersection of IR + NLP
• Asian languages / cross-language
• Variety of genres
• Parallel/comparable corpora
New challenges
• Realistic evaluation / user tasks
• Interactive/exploratory search
• QA types at topic creation
To make information in the documents more usable for users!
Forum for researchers and other experts/users
• Idea exchange
• Discussion/investigation of evaluation methods/metrics
IR Systems Evaluation
• Engineering level: efficiency
• Input level: e.g., exhaustivity, quality, novelty of the DB
• Process level: effectiveness, e.g., recall, precision
• Output level: display of output
• User level: e.g., effort that users need
• Social level: e.g., importance
(Cleverdon & Keen 1966)
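Process-level effectiveness is usually quantified with recall and precision; a minimal sketch, assuming set-based retrieval results and a known set of relevant documents from the judgments:

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                     # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d9"]))  # (0.5, 0.666...)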
Difficulty of retrieval varies with topics
[Figure: 11-point recall-precision curves for J-J Level1 D auto runs (Japanese caption 検索システム別の11pt再現率精度 = "11-point recall-precision by retrieval system"); one panel shows effectiveness across topics (#101 onward), the other effectiveness across systems (A-P) averaged over 50 topics; x-axis: recall, y-axis: precision.]
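The curves on this slide are 11-point interpolated recall-precision plots; a rough sketch of how such a curve can be computed from one ranked result list and its relevance judgments (illustrative only, not the evaluation code used at NTCIR):

def eleven_point_precision(ranked, relevant):
    relevant = set(relevant)
    hits, points = 0, []                       # (recall, precision) after each relevant hit
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / i))
    # interpolated precision at level r = max precision at any recall >= r
    return [max((p for rec, p in points if rec >= r / 10), default=0.0)
            for r in range(11)]

print(eleven_point_precision(["d1", "d5", "d2", "d7"], ["d1", "d2", "d9"]))
# [1.0, 1.0, 1.0, 1.0, 0.666..., 0.666..., 0.666..., 0.0, 0.0, 0.0, 0.0]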
Difficulty of retrieval varies with topics; "difficult topics" also vary with systems
[Figure: the 11-point recall-precision curves from the previous slide (J-J Level1 D auto), plus mean average precision per topic (requests #101-150) for systems A-P.]
For reliable and stable evaluation, using a substantial number of topics is necessary.
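To make the averaging point concrete, a sketch of mean average precision over a topic set, the kind of per-topic score that is averaged across many topics; the data are made up for illustration:

def average_precision(ranked, relevant):
    relevant = set(relevant)
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)        # precision at each relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(results, qrels):
    """results/qrels: topic_id -> ranked docs / relevant docs."""
    ap = [average_precision(results[t], qrels[t]) for t in qrels]
    return sum(ap) / len(ap) if ap else 0.0

results = {"101": ["d1", "d4", "d2"], "102": ["d9", "d3"]}
qrels = {"101": ["d1", "d2"], "102": ["d3"]}
print(mean_average_precision(results, qrels))  # ~0.667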
What are TCs usable for evaluating?
Pharmaceutical R&D:
• Phase I: In vitro experiments
• Phase II: Animal experiments
• Phase III: Tests with healthy human subjects
• Phase IV: Clinical tests
What are TCs usable for evaluating?
NTCIR test collections address users' information-seeking tasks:
• Phase I: Laboratory-type testing with test collections
• Phase II: Sharing modules; prototype testing
• Phase III: Controlled interactive testing using human subjects
• Phase IV: Uncontrolled pre-operational testing; operational testing
Pharmaceutical R&D analogy:
• Phase I: In vitro experiments
• Phase II: Animal experiments
• Phase III: Tests with healthy human subjects
• Phase IV: Clinical tests
Levels of evaluation: 1. Engineering level (efficiency); 2. Input level; 3. Process level (effectiveness); 4. User level; 5. Output level; 6. Social level
• Information-seeking task
  – Document types + user community
  – User's situation, purpose of search; realistic
Experiments are abstractions of real-world tasks: there is a trade-off between "reality" and "controllability".
• Testing & benchmarking
  – To learn how and why a system works better (or worse) than others
  – To learn how it can be improved
Scientific understanding of effectiveness
Improvement of Effectiveness by Evaluation Workshops
Effectiveness improved 1.5-2 times in 3 years.
[Figure: mean average precision of the Cornell University systems from TREC-1 through TREC-7, evaluated on the '92-'98 test sets.]
Research Trends
[Figure: number of papers presented at ACM SIGIR by topic and publication period ('77-79 through '05-09); topics include Web, user evaluation, non-text, QA & summarization, NLP, cross-lingual, ML, clustering, efficiency, filtering, query processing, IR models, and general.]