Overview of the 7th NTCIR Workshop
Noriko Kando
National Institute of Informatics, Japan
http://research.nii.ac.jp/ntcir/ — kando (at) nii.ac.jp
With thanks to Tetsuya Sakai for the slides
2008-12-17
NTCIR: NII Test Collection for Information Retrieval — Research Infrastructure for Evaluating IA
A series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations: data sets, evaluation methodologies, and a forum.
• Project started in late 1997; one workshop every 18 months (NTCIR-1 through NTCIR-7).
• Data sets (test collections, or TCs): scientific documents, news, patents, and web, in Chinese, Korean, Japanese, and English.
• Tasks — IR: cross-lingual tasks, patents, web, QA, monolingual tasks; plus summarization, trend information, patent maps, opinion analysis, and text mining.
• Community-based research activities.
[Chart: number of participating groups and countries per workshop, NTCIR-1 to NTCIR-7]
NTCIR-7 participants: 82 groups from 15 countries
Information Access (IA)
• The whole process of making information from a vast collection of documents usable by users.
• For example: IR, text summarization, QA, text mining, and clustering.
• Uses human assessments as success criteria.
Focus of NTCIR
• New challenges
• Lab-type IR tests
• Intersection of IR + NLP
• Asian languages / cross-language
• Variety of genres
• Parallel/comparable corpora
• Realistic evaluation / user tasks
To make the information in documents more usable for users!
Forum for researchers: idea exchange; discussion and investigation of evaluation methods and metrics.
Tasks (Research Areas) of Past NTCIR Workshops (NTCIR-1 through NTCIR-6)
[Timeline table: which tasks ran at which workshop]
• Japanese IR (news, scientific documents)
• Cross-lingual IR
• Patent retrieval (incl. patent map / classification)
• Web retrieval (incl. navigational, geographic, result classification)
• Term extraction
• Question answering
• Information access dialog
• Summarization (incl. evaluation metrics)
• Cross-lingual text summarization
• Trend information
• Opinion analysis
NTCIR-7 Clusters
Cluster 1. Advanced CLIA:
– Complex CLQA (Chinese, Japanese, English)
– IR for QA (Chinese, Japanese, English)
Cluster 2. User-Generated:
– Multilingual Opinion Analysis
Cluster 3. Focused Domain: Patent
– Patent Translation (English -> Japanese)
– Patent Mining (paper -> IPC)
Cluster 4. MuST:
– Multi-modal Summarization of Trends
(Sidebar: MuST Visualization Challenge)
NTCIR-7 is made up of…
• Cluster 1: Advanced Cross-lingual Information Access (ACLIA) = CCLQA + IR4QA
• Cluster 2: Multilingual Opinion Analysis Task (MOAT) + CLIRB
• Cluster 3: Focused Domains = PATMT + PATMN
• Multimodal Summarization of Trend Information (MuST)
• The 2nd International Workshop on Evaluating Information Access (EVIA)
Evaluation Workshops
• "Evaluation" — it is not a competition! Not an exam!
• Constructs a common data set usable for experiments.
• Provides participants with the data sets and unified procedures for evaluation.
– Each participating research group conducts experiments with various approaches and can participate with its own purpose.
• Successful examples: TREC, CLEF, DUC, INEX, TAC, and FIRE (new!)
• Community-based activities; implications are various.
IA Systems Evaluation
• Engineering level: efficiency
• Input level: e.g., exhaustivity, quality, and novelty of the DB
• Process level: effectiveness, e.g., recall and precision
• Output level: display of output
• User level: e.g., effort that users need
• Social level: e.g., importance
(Cleverdon & Keen, 1966)
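To make the process-level metrics concrete, here is a minimal sketch (not from the original slides; document IDs and judgments are illustrative) of how set-based recall and precision are computed for a single topic from a result list and a set of relevance judgments:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one topic.

    retrieved -- list of document IDs returned by the system
    relevant  -- set of document IDs judged relevant by assessors
    """
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run and judgments for one topic
retrieved = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d2", "d3"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```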
Retrieval Difficulty Varies with Topics
[Figure: two 11-point recall-precision plots for J-J, Level 1, D, automatic runs. Left: effectiveness across topics (#101–#127) on a single system. Right: effectiveness across systems (A–P), each averaged over 50 topics. Japanese caption: 11-pt recall-precision per retrieval system.]
Retrieval Difficulty Varies with Topics
[Figure: the same per-topic and per-system 11-point recall-precision plots (J-J, Level 1, D, automatic; requests #101–#150), plus mean average precision per topic for systems A–P.]
• "Difficult topics" vary with systems.
• For reliable and stable evaluation, using a substantial number of topics is inevitable.
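The figures' point — per-topic scores vary widely, and only averages over many topics are stable — can be illustrated with a short sketch (again illustrative, not from the slides) that computes the 11-point interpolated recall-precision curve named in the figure for each topic and then averages the curves over the topic set:

```python
def eleven_point_precision(ranking, relevant):
    """11-point interpolated recall-precision curve for one topic."""
    if not relevant:
        return [0.0] * 11
    points_seen, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points_seen.append((hits / len(relevant), hits / rank))  # (recall, precision)
    curve = []
    for level in (i / 10 for i in range(11)):  # recall levels 0.0, 0.1, ..., 1.0
        # interpolated precision = max precision at any recall >= this level
        ps = [p for r, p in points_seen if r >= level]
        curve.append(max(ps) if ps else 0.0)
    return curve

def mean_curve(per_topic_curves):
    """Average the per-topic curves; stable only with enough topics."""
    n = len(per_topic_curves)
    return [sum(c[i] for c in per_topic_curves) / n for i in range(11)]
```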
TC usable to evaluate?
Pharmaceutical R&D:
• Phase I: in vitro experiments
• Phase II: animal tests
• Phase III: tests with healthy human subjects
• Phase IV: clinical tests
TC usable to evaluate what?
NTCIR test collections support users' information-seeking tasks in stages:
• Phase I: sharing modules
• Phase II: controlled laboratory-type testing
• Phase III: interactive prototype testing using human subjects
• Phase IV: uncontrolled pre-operational testing
(cf. pharmaceutical R&D: Phase I in vitro experiments, Phase II animal tests, Phase III tests with healthy human subjects, Phase IV clinical tests)
Levels of evaluation: 1. engineering level (efficiency); 2. input level; 3. process level (effectiveness); 4. user level; 5. output level; 6. social level.
Summary of "What is NTCIR"
• Provides a scientific basis for understanding the effectiveness of automated information-access technologies.
• Leverages R&D and technology transfer.
• A reusable test collection is a key component.
• Evaluating search effectiveness is not easy: a small-scale or carelessly designed TC may skew the test results.
NTCIR-7: Advanced CLIA — Organizers
Teruko Mitamura (CMU), Eric Nyberg (CMU), Ruihua Chen (MSRA), Fred Gey (UCB), Donghong Ji (Wuhan Univ), Noriko Kando (NII), Chin-Yew Lin (MSRA), Chuan-Jie Lin (National Taiwan Ocean Univ), Tsuneaki Kato (Univ of Tokyo), Tatsunori Mori (Yokohama National Univ), Tetsuya Sakai (NewsWatch)
Advisor: K. L. Kwok (Queens College)
NTCIR-7: Advanced CLIA
[Pipeline diagram] Questions → Question Analyzers (questions with q-types) → Translation → Retrieval (IR for QA / CLIR, retrieved documents) → Answer Extraction & Formatting (CCLQA, answers). Modules exchange data via XML and APIs.
Evaluation:
• IR effectiveness and QA effectiveness
• Tests the effectiveness of OOV handling, PRF, and QE in QA
• Focused retrieval
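The exact XML schemas and APIs used by ACLIA are not given on this slide; the sketch below only illustrates the modular idea — independent question-analysis and retrieval stages chained through a simple XML interchange format. All element names, functions, and the question type are hypothetical:

```python
import xml.etree.ElementTree as ET

def analyze_question(question_text):
    """Hypothetical question analyzer: emits the question plus a guessed q-type as XML."""
    root = ET.Element("QUESTION")
    ET.SubElement(root, "TEXT").text = question_text
    ET.SubElement(root, "QTYPE").text = "DEFINITION"  # e.g., output of a q-type classifier
    return ET.tostring(root, encoding="unicode")

def retrieve(question_xml, search):
    """Hypothetical IR4QA stage: reads question XML, returns ranked documents as XML."""
    q = ET.fromstring(question_xml)
    root = ET.Element("RESULTS")
    for doc_id in search(q.findtext("TEXT")):  # `search` stands in for any CLIR backend
        ET.SubElement(root, "DOC", id=doc_id)
    return ET.tostring(root, encoding="unicode")

# Because every stage consumes and emits a shared XML format, a team can swap in
# its own analyzer, retrieval, or answer-extraction module without owning the rest.
```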
ACLIA: Complex Cross-lingual Question Answering (CCLQA) Task
• Different teams can exchange data and create a "dream-team" system.
• Small teams that do not possess an entire QA system can contribute.
• The IR and QA communities can collaborate.