Grid@CLEF Track Overview
Donna Harman, NIST, USA, donna.harman@nist.gov
Nicola Ferro, University of Padua, Italy, ferro@dei.unipd.it
CLEF 2009 Workshop, September 30th - October 2nd 2009, Κέρκυρα, Greece
Issues
The CLEF research community has been outstanding and very active in designing, developing, and testing multilingual information access (MLIA) methods and techniques, constantly improving the performance of such components.
BUT
- Do we really know how MLIA components behave with respect to languages?
- Do we have a deep comprehension of how these components interact when the language changes?
Objectives
- Look at differences across a wide set of languages;
- Identify best practices for each language;
- Help other countries to develop their expertise in the IR field and create IR groups;
- Provide a repository in which all the information and knowledge derived from the experiments undertaken can be managed and made available.
Where Are We?
How Can We Get There?
Approach
- It's not competition
- It's not ranking
- It's participation and cooperation
The CIRCO Framework
The framework allows for a distributed, loosely-coupled, and asynchronous experimental evaluation of Information Retrieval (IR) systems, where:
- distributed implies that different stakeholders can take part in the experimentation, each one providing one or more components of the whole IR system to be evaluated;
- loosely-coupled points out that minimal integration among the different components is required to carry out the experimentation;
- asynchronous underlines that no synchronization among the different components is required to carry out the experimentation.
[Figure: example components of an IR system: Tokenizer, Stop Word Remover, Stemmer, Indexer]
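The loosely-coupled, asynchronous idea can be illustrated with a minimal sketch in which each component only reads the previous component's output file and writes its own, so the stages could be run by different groups at different times. This is not the CIRCO implementation: the JSON file exchange, the function names, the toy stop list, and the toy suffix stripper are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to"}


def tokenize(text):
    """Component 1: split raw text into lowercase tokens."""
    return text.lower().split()


def remove_stop_words(tokens):
    """Component 2: drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    """Component 3: toy suffix stripper standing in for a real stemmer."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def build_index(tokens):
    """Component 4: toy index mapping each term to its frequency."""
    index = {}
    for t in tokens:
        index[t] = index.get(t, 0) + 1
    return index


def run_component(component, in_path, out_path):
    """Apply one component to a stored intermediate result.

    The only coupling between components is the file format, and a
    component can run whenever its input file happens to exist.
    """
    data = json.loads(Path(in_path).read_text(encoding="utf-8"))
    Path(out_path).write_text(json.dumps(component(data)), encoding="utf-8")


def run_pipeline(text, work_dir):
    """Chain the four components through files in work_dir."""
    work = Path(work_dir)
    (work / "raw.json").write_text(json.dumps(text), encoding="utf-8")
    stages = [
        (tokenize, "raw.json", "tokens.json"),
        (remove_stop_words, "tokens.json", "filtered.json"),
        (stem, "filtered.json", "stems.json"),
        (build_index, "stems.json", "index.json"),
    ]
    for component, src, dst in stages:
        run_component(component, work / src, work / dst)
    return json.loads((work / "index.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline("The systems of the labs", tmp))
```

Because every intermediate file is kept, one group's stemmer output could in principle be fed to another group's indexer, which is the kind of component mixing the track is after.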
Participation
Groups: 9 subscribed, 2 succeeded, 18 runs

Participant   Institution                        Country
chemnitz      Chemnitz University of Technology  Germany
cheshire      U.C. Berkeley                      United States

Task                  # Participants   # Runs
Monolingual Dutch     0                0
Monolingual English   2                6
Monolingual French    2                6
Monolingual German    2                6
Monolingual Italian   0                0
Total                                  18
Grid@CLEF Collections

Language   Collection                    Documents   Size (approx.)
Dutch      NRC Handelsblad 1994/95       84,121      291 Mbyte
           Algemeen Dagblad 1994/95      106,484     235 Mbyte
           Total                         190,605     526 Mbyte
English    Los Angeles Times 1994        113,005     420 Mbyte
French     Le Monde 1994                 44,013      154 Mbyte
           French SDA 1994               43,178      82 Mbyte
           Total                         87,191      236 Mbyte
German     Frankfurter Rundschau 1994    139,715     319 Mbyte
           Der Spiegel 1994/95           13,979      61 Mbyte
           German SDA 1994               71,677      140 Mbyte
           Total                         225,371     520 Mbyte
Italian    La Stampa 1994                58,051      189 Mbyte
           Italian SDA 1994              50,527      81 Mbyte
           Total                         108,578     270 Mbyte
Grid@CLEF Topics
- 84 topics in Dutch, English, French, German, and Italian from CLEF 2001 and 2002
- All the topics have relevant documents in all the collections
Grid@CLEF Results
[Figures: results for English, French, and German]
Grid@CLEF: Approaches
Chemnitz (based on Lucene + Terrier)
- Models: vector space + divergence from randomness
- Blind query expansion (top terms from top docs)
- Stop words
- Data fusion with Z-score
- Snowball, N-grams for German, Krovetz and Savoy's stemmers
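"Data fusion with Z-score" is not spelled out on the slide. One common reading is to standardise each run's retrieval scores to z-scores and then sum them per document, which makes scores from engines with different scales comparable. The sketch below follows that reading. The function names, the example scores, and the choice to let a document missing from a run contribute nothing are assumptions, not the Chemnitz configuration.

```python
from statistics import mean, pstdev


def z_normalize(run):
    """Map each document's score to its z-score within one run."""
    scores = list(run.values())
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores equal: no document stands out in this run.
        return {doc: 0.0 for doc in run}
    return {doc: (score - mu) / sigma for doc, score in run.items()}


def fuse(runs):
    """Sum z-scores per document across runs, best fused score first.

    A document absent from a run simply gets no contribution from it
    (an assumption; other variants penalise missing documents).
    """
    fused = {}
    for run in runs:
        for doc, z in z_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + z
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores from two engines with very different scales.
    engine_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
    engine_b = {"d1": 0.2, "d2": 0.8, "d3": 0.5}
    for doc, score in fuse([engine_a, engine_b]):
        print(doc, round(score, 3))
```

In this toy input, d2 ends up first: it is only second in the first run but clearly best in the second, and standardising lets the second run's small raw scores count as much as the first run's large ones.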
Chemnitz: "We look for strong rules which let us predict the retrieval quality . . . [and] enable us to automatically configure a retrieval engine in accordance to the corpus"
Cheshire (on Cheshire II)
- Models: logistic regression based
- Blind query expansion (probabilistic relevance feedback, top 10 terms from top 10 docs)
- Stop words
- Stemmer (Snowball)
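The "top 10 terms from top 10 docs" step of blind query expansion can be sketched as follows. This is a deliberately simplified stand-in: it ranks candidate terms by raw frequency in the top-ranked documents, whereas the slide mentions probabilistic relevance feedback, whose term weighting is not shown here. The function name, parameters, and sample documents are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, n_docs=10, n_terms=10):
    """Add the most frequent new terms from the top-ranked documents.

    query_terms: list of terms in the original query
    ranked_docs: list of token lists, best-ranked document first
    """
    counts = Counter()
    for doc_tokens in ranked_docs[:n_docs]:
        # Only terms not already in the query are expansion candidates.
        counts.update(t for t in doc_tokens if t not in query_terms)
    # Sort by descending count, then alphabetically for a stable order.
    candidates = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in candidates[:n_terms]]


if __name__ == "__main__":
    docs = [
        ["grid", "retrieval", "evaluation"],
        ["retrieval", "stemmer"],
        ["noise"],
    ]
    print(expand_query(["grid"], docs, n_docs=2, n_terms=2))
```

The expansion is "blind" because the top-ranked documents are assumed relevant without any user judgement, so a poor initial ranking can pull unrelated terms into the query.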
Cheshire: "We aim at understanding what happens when you try to separate the processing elements of IR systems, taking this as an opportunity to re-analyse and improve our system by finding a way to incorporate components of other IR systems"

Track     Rank         Participant   MAP      Experiment DOI
English   1st          chemnitz      54.45%   10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHEMNITZ.CUT GRID MONO EN MERGED LUCENE TERRIER
          2nd          cheshire      53.13%   10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHESHIRE.CHESHIRE GRID ENG T2FB
          Difference                 2.48%
French    1st          cheshire      51.88%   10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHESHIRE.CHESHIRE GRID FRE T2FB
          2nd          chemnitz      49.42%   10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHEMNITZ.CUT GRID MONO FR MERGED LUCENE TERRIER
          Difference                 4.97%
German    1st          chemnitz      48.64%   10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHEMNITZ.CUT GRID MONO DE MERGED LUCENE TERRIER
          2nd          cheshire      40.02%   10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHESHIRE.CHESHIRE GRID GER T2FB
          Difference                 21.53%
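The "Difference" figures in the results appear to be relative differences, that is, the gap between the 1st and 2nd MAP divided by the 2nd MAP, not a subtraction of percentage points. The short check below recomputes them from the rounded MAP values in the table. It lands within 0.01 of the reported figures (about 2.48, 4.98, and 21.54), and the small gaps are presumably due to the slide using unrounded MAPs. The function name and the dictionary layout are assumptions for illustration.

```python
# MAP values (in percent) for the 1st and 2nd ranked runs, as reported.
MAP_SCORES = {
    "English": (54.45, 53.13),
    "French": (51.88, 49.42),
    "German": (48.64, 40.02),
}

# "Difference" values as reported alongside the MAP values.
REPORTED_DIFFERENCE = {"English": 2.48, "French": 4.97, "German": 21.53}


def relative_difference(first, second):
    """Gap between two scores as a percentage of the lower-ranked score."""
    return (first - second) / second * 100.0


if __name__ == "__main__":
    for track, (first, second) in MAP_SCORES.items():
        recomputed = relative_difference(first, second)
        print(f"{track}: recomputed {recomputed:.2f}%, "
              f"reported {REPORTED_DIFFERENCE[track]:.2f}%")
```

Read this way, the English and French runs are close (under 5% apart), while the German gap of more than 20% is the one that stands out.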