
Russian Information Retrieval Evaluation Seminar (ROMIP)



  1. Russian Information Retrieval Evaluation Seminar (ROMIP). http://romip.ru/en/. Igor Nekrestyanov, Pavel Braslavski. CLEF 2010.

  2. ROMIP at a glance
     • TREC-like Russian initiative
     • Started in 2002
     • Several text and image collections
     • 10-15 participants per year (50+ in total)
     • Academia and industry; student support
     • ~3,000 man-hours of evaluation (2009)
     • Remote participation plus a live meeting
     • Collections are freely available
     • Popular testbed for IR research in Russia
     • Related activities: summer school in IR

  3. Why?
     • Russia specifics:
       - Strong IR industry, but limited research in academia
       - Participation in global events was considered complicated for Russian groups (language barrier, costs, etc.)
       - The Russian language was not covered by international campaigns
     • Objectives:
       - Consolidate the IR community
       - Stimulate research in the area
       - Provide independent evaluation

  4. Evaluation methodology
     • Similar to TREC approaches
     • What's special?
       - Russian-language collections
       - Some tasks are unique, e.g. news clustering and snippet generation
       - Mix of widely used and custom metrics, e.g. snippet informativeness/readability
       - Typically 2+ assessors (agreement 80-85%)
       - Domain experts for the legal-related tracks
       - Rules and methodology are adjusted yearly
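
     The 80-85% agreement quoted above is pairwise overlap between assessors. As a
     minimal illustration of how such a figure can be computed, here is a Python sketch
     over binary relevance judgments; the (topic, document)-keyed dict layout and the
     function name are illustrative assumptions, not ROMIP's actual judgment format.

         from itertools import combinations

         def pairwise_agreement(judgments):
             """Mean fraction of co-judged (topic, doc) pairs on which two
             assessors gave the same binary relevance label.

             judgments: dict mapping assessor id -> {(topic, doc): 0 or 1}
             (hypothetical layout, for illustration only)
             """
             scores = []
             for a, b in combinations(judgments, 2):
                 shared = judgments[a].keys() & judgments[b].keys()
                 if shared:
                     same = sum(judgments[a][k] == judgments[b][k] for k in shared)
                     scores.append(same / len(shared))
             return sum(scores) / len(scores) if scores else 0.0

         # Toy data: two assessors agree on 3 of 4 co-judged pairs -> 0.75.
         j = {
             "a1": {("t1", "d1"): 1, ("t1", "d2"): 0, ("t2", "d1"): 1, ("t2", "d3"): 0},
             "a2": {("t1", "d1"): 1, ("t1", "d2"): 1, ("t2", "d1"): 1, ("t2", "d3"): 0},
         }
         print(pairwise_agreement(j))  # 0.75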

  5. Largest text collections

     Collection   Documents    Size (compressed)   Topics    Evaluated in ad-hoc search track
     Legal        ~300 000     2 Gb                14 794    220
     By.Web       1 524 676    8 Gb                ~60 000   1 500+
     KM.RU        3 010 455    13 Gb               ~60 000   ~250

  6. Text document tracks
     • Classic tracks, run for years:
       - Ad-hoc text retrieval (see the metric sketch after this list)
       - Text categorization (Web pages & sites, legal)
     • Experimental tracks, changing every year:
       - Snippet generation
       - QA and fact extraction
       - News clustering
       - Search by sample document
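
     Since the methodology is described as TREC-like, the "widely used metrics" for the
     ad-hoc track presumably include standard measures such as average precision. Below
     is a minimal Python sketch of non-interpolated average precision for one topic (MAP
     is its mean over all topics); this is the textbook definition, not ROMIP's actual
     evaluation code.

         def average_precision(ranked_docs, relevant):
             """Non-interpolated average precision for a single topic.

             ranked_docs: doc ids in the order the system returned them
             relevant:    set of doc ids judged relevant for the topic
             """
             if not relevant:
                 return 0.0
             hits, precision_sum = 0, 0.0
             for rank, doc in enumerate(ranked_docs, start=1):
                 if doc in relevant:
                     hits += 1
                     precision_sum += hits / rank  # precision at this recall point
             return precision_sum / len(relevant)

         # Toy run: relevant docs retrieved at ranks 1 and 3, two relevant in total:
         # (1/1 + 2/3) / 2 = 0.833...
         print(average_precision(["d1", "d5", "d2", "d4"], {"d1", "d2"}))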

  7. Snippets evaluation

  8. Image collections
     • Photo collection: 20 000 images from Flickr
     • Dups collection: 15 hours of video (37 800 frames)

  9. Image tracks
     • Content-based image retrieval (started 2008): 750 labeled tasks
     • Near-duplicate detection (started 2008): ~1 500 clusters
     • Image annotation (started 2010): ~1 000 labeled images

  10. ROMIP timeline [chart: systems applied/participated and number of tracks by year, 2003-2010; milestones: KM.RU search and classification, By.Web, legal (2007), snippets, news, image tracks, image tagging, QA; up to 25 systems applied and ~3,000 man-hours of evaluation]

  11. Thank you! Questions?
      Pavel Braslavski, pb@yandex-team.ru
      Igor Nekrestyanov, romip@romip.ru

  12. RuSSIR
      • Annual event
      • 100+ participants
      • 4th RuSSIR: Voronezh, 13-18 September 2010
      • http://romip.ru/russir2010/
