Robustness?
Thomas Mandl


Robust Task - Result Overview and Lessons Learned from Robustness Evaluation
Thomas Mandl
Information Science, Universität Hildesheim
mandl@uni-hildesheim.de
Thomas Mandl: Robust CLEF 2007 - Overview

Robustness?
• Robust … means … capable of functioning correctly (or, at the very minimum, not failing catastrophically) under a great many conditions. (http://www.reference.com/)
• Robust IR means the capability of an IR system to work well (and reach at least a minimal performance) under a variety of conditions (topics, difficulty, collections, users, languages …)

Variety of conditions …
[Figure: chart for Mono FR, Mono EN, Mono PT and Bi -> FR on a 0 to 1 scale, titled "Variance between topics"]

System Variance
[Figure: chart for Mono FR, Mono EN, Mono PT and Bi -> FR on a 0 to 1 scale]

History of Robust IR Evaluation
• TREC
  – Monolingual retrieval
  – 2003 - 2005
• CLEF
  – Mono-, bi- and multilingual retrieval
  – 2006: six languages
  – 2007: three languages

Robust Task 2007
• Again …
  – Use topics and relevance assessments from previous CLEF campaigns
  – Take a different perspective and use a robust evaluation measure (GMAP)
  – Emphasize the difficult (= low-performing) topics
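The Robust Task 2007 slide adopts GMAP because it emphasizes low-performing topics. A minimal Python sketch of that effect, assuming GMAP is the geometric mean of per-topic average precision (AP) with a small floor so that a zero AP does not collapse the product; the floor of 0.00001 mirrors the convention of trec_eval's gm_map measure, and whether CLEF 2007 used the same value is an assumption. The AP values are invented for illustration, not CLEF data.

```python
import math

# Assumed floor for zero AP values (trec_eval-style); not stated on the slides.
EPSILON = 1e-5


def mean_ap(ap_values):
    """Arithmetic mean of per-topic average precision (MAP)."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, eps=EPSILON):
    """Geometric mean of per-topic AP (GMAP), computed in log space for stability."""
    logs = [math.log(max(ap, eps)) for ap in ap_values]
    return math.exp(sum(logs) / len(logs))


# Hypothetical runs over three topics: one difficult, one medium, one easy.
baseline = [0.05, 0.40, 0.80]
improve_hard = [0.10, 0.40, 0.80]  # +0.05 on the difficult topic
improve_easy = [0.05, 0.40, 0.85]  # +0.05 on the easy topic

for name, run in [("baseline", baseline),
                  ("improve_hard", improve_hard),
                  ("improve_easy", improve_easy)]:
    print(f"{name:13s} MAP={mean_ap(run):.4f} GMAP={geometric_map(run):.4f}")
```

Both improvements raise MAP by the same amount, but only the gain on the difficult topic moves GMAP substantially, which is the behaviour the task wants to reward.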

Training and Test
• CLEF 2001, 2002 and 2003 for training
• CLEF 2004, 2005 and 2006 for testing

Which system is better?
Geometric average over n topics, where x_i is the result of a system on topic i:

geoAve = \sqrt[n]{\prod_{i=1}^{n} x_i}

[Figure: bar chart comparing Result A and Result B on topics I, II and III]

Topic  System  Result      Topic  System  Result
1      A       0.1         1      B       0.2
2      A       0.1         2      B       0.2
3      A       0.9         3      B       0.6
GeoAve A       0.21        GeoAve B       0.29
MAP A          0.37        MAP B          0.33

MAP prefers A, while the geometric average prefers B, because B performs better on the two difficult topics (1 and 2).

Collections
[Slide text garbled in the source; not recoverable]

Robust Task 2007
Language    Target collection                        Training topics  Test topics
English     Los Angeles Times 1994                   41-200           251-350
French      Le Monde 1994, Swiss News Agency 1994    41-140           251-350
Portuguese  Público 1995                             -                201-350

Participation
• 63 runs submitted by 7 groups
• 2006: 133 runs by 8 groups

Results
Mono English
Rank  Participant  Experiment                                                  MAP     GMAP
1st   reina        10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.REINA.REINAENTDNT        38.97%  18.50%
2nd   daedalus     10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.DAEDALUS.ENFSEN22S       37.78%  17.72%
3rd   hildesheim   10.2415/AH-ROBUST-MONO-EN-TEST-CLEF2007.HILDESHEIM.HIMOENBRFNE   5.88%   0.32%

Mono Portuguese
Rank  Participant  Experiment                                                  MAP     GMAP
1st   reina        10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.REINA.REINAPTTDNT        41.40%  12.87%
2nd   jaen         10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.JAEN.UJARTPT1            24.74%  0.58%
3rd   daedalus     10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.DAEDALUS.PTFSPT2S        23.75%  0.50%
4th   xldb         10.2415/AH-ROBUST-MONO-PT-TEST-CLEF2007.XLDB.XLDBROB16           1.21%   0.071%
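The worked example on the "Which system is better?" slide can be checked in a few lines of Python. The per-topic results are taken from the slide's table; the function and variable names are illustrative only.

```python
import math

# Per-topic results from the "Which system is better?" table (topics 1-3).
results = {
    "A": [0.1, 0.1, 0.9],
    "B": [0.2, 0.2, 0.6],
}


def arithmetic_mean(xs):
    """Mean of the topic results (the slide's MAP row)."""
    return sum(xs) / len(xs)


def geo_ave(xs):
    """n-th root of the product of the n topic results (the slide's GeoAve row)."""
    return math.prod(xs) ** (1.0 / len(xs))


for system, xs in results.items():
    print(f"System {system}: MAP={arithmetic_mean(xs):.2f} GeoAve={geo_ave(xs):.2f}")

best_by_map = max(results, key=lambda s: arithmetic_mean(results[s]))
best_by_geo = max(results, key=lambda s: geo_ave(results[s]))
print(best_by_map, best_by_geo)
```

Running it prints MAP=0.37, GeoAve=0.21 for A and MAP=0.33, GeoAve=0.29 for B, followed by `A B`: the two averages pick different winners, matching the slide. `math.prod` requires Python 3.8 or later.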

Results Mono English
[Figure: "Ad-Hoc Robust Monolingual English Test Task, Top 5 Participants: Standard Recall Levels vs Mean Interpolated Precision"; x-axis Recall, y-axis Precision (0% to 100%). Runs (all not pooled): reina REINAENTDNT (MAP 38.97%), daedalus ENFSEN22S (MAP 37.78%), hildesheim HIMOENBRFNE (MAP 5.88%)]

Results Mono Portuguese
[Figure: "Ad-Hoc Robust Monolingual Portuguese Test Task, Top 5 Participants: Standard Recall Levels vs Mean Interpolated Precision"; x-axis Recall, y-axis Precision (0% to 100%). Runs (all not pooled): reina REINAPTTDNT (MAP 41.40%), jaen UJARTPT1 (MAP 24.74%), daedalus PTFSPT2S (MAP 23.75%), xldb XLDBROB16_10 (MAP 1.21%)]

Results Mono French
[Figure: "Ad-Hoc Robust Monolingual French Test Task, Top 5 Participants: Standard Recall Levels vs Mean Interpolated Precision"; x-axis Recall, y-axis Precision (0% to 100%). Runs (all not pooled): unine UNINEFR1 (MAP 42.13%), reina REINAFRTDET (MAP 38.04%), jaen UJARTFR1 (MAP 34.76%), daedalus FRFSFR22S (MAP 29.91%), hildesheim HIMOFRBRF2 (MAP 27.31%)]

Results
Mono French
Rank  Participant  Experiment                                                  MAP     GMAP
1st   unine        10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.UNINE.UNINEFR1           42.13%  14.24%
2nd   reina        10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.REINA.REINAFRTDET        38.04%  12.17%
3rd   jaen         10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.JAEN.UJARTFR1            34.76%  10.69%
4th   daedalus     10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.DAEDALUS.FRFSFR22S       29.91%  7.43%
5th   hildesheim   10.2415/AH-ROBUST-MONO-FR-TEST-CLEF2007.HILDESHEIM.HIMOFRBRF2    27.31%  5.47%

Bi -> French
Rank  Participant  Experiment                                                          MAP     GMAP
1st   reina        10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.REINA.REINAE2FTDNT             35.83%  12.28%
2nd   unine        10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.UNINE.UNINEBILFR1              33.50%  5.01%
3rd   colesun      10.2415/AH-ROBUST-BILI-X2FR-TEST-CLEF2007.COLESUN.EN2FRTST4GRINTLOGLU001 22.87%  3.57%

Results Bilingual X -> French
[Figure: "Ad-Hoc Robust Bilingual Test Task, French target collection(s), Top 5 Participants: Standard Recall Levels vs Mean Interpolated Precision"; x-axis Recall, y-axis Precision (0% to 100%). Runs (all not pooled): reina REINAE2FTDNT (MAP 35.83%), unine UNINEBILFR1 (MAP 33.50%), colesun EN2FRTST4GRINTLOGLU001 (MAP 22.87%)]

Approaches
• Adoption of traditional and "advanced" CLIR methods
  – BM25 (Miracle)
  – N-gram translation (CoLesIR)
  – Weighting, stemming (UniNE)
• Adoption of "robust" heuristics
  – Expansion with an external resource (SINAI)
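The result plots show mean interpolated precision at the 11 standard recall levels. A sketch of the conventional computation (the approach used by evaluation tools such as trec_eval), assuming each topic supplies a ranked list of document ids and a set of relevant ids; the slides do not say which tool produced the CLEF plots, and the function names here are hypothetical.

```python
def interpolated_precision_11pt(ranking, relevant):
    """11-point interpolated precision for one topic.

    ranking:  document ids in retrieved order
    relevant: set of relevant document ids from the relevance assessments
    """
    if not relevant:
        return [0.0] * 11
    # (recall, precision) after each relevant document; precision only rises
    # at relevant ranks, so these points suffice for the interpolation below.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in points if r >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


def mean_interpolated_precision(topics):
    """Average the 11-point curves over topics; topics is a list of (ranking, relevant)."""
    curves = [interpolated_precision_11pt(rk, rel) for rk, rel in topics]
    return [sum(level) / len(curves) for level in zip(*curves)]


curve = interpolated_precision_11pt(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"})
print([round(p, 3) for p in curve])
```

Interpolation makes each curve non-increasing, which is why the plotted lines in the figures only fall as recall grows.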

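The Approaches slide lists BM25 (Miracle) among the traditional methods adopted. A minimal sketch of Okapi BM25 scoring, assuming pre-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and an IDF variant with +1 inside the logarithm (as in Lucene) so term weights stay non-negative; none of these settings are reported on the slides, and the toy documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common defaults (an assumption, not values from CLEF 2007).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # IDF with +1 inside the log (Lucene-style), always non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["robust", "retrieval", "evaluation"],
    ["geometric", "mean", "average", "precision"],
    ["robust", "robust", "topics"],
]
scores = bm25_scores(["robust"], docs)
print([round(s, 3) for s in scores])
```

The k1 term caps how much repeated occurrences can add, and b controls how strongly long documents are penalized.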