ROMIP: one step forward, one step aside




  1. ROMIP: one step forward, one step aside
     http://romip.ru/en/
     Pavel Braslavski, Ilia Chetviorkin, Maxim Gubin, Natalia Lukashevich, Igor Nekrestyanov, Marina Nekrestyanova, Natalia Vassileva
     CLEF 2012

  2. ROMIP at a glance
     • TREC-like Russian initiative
     • Started 2002
     • Several freely available text and image collections
     • 10-15 participating teams each year
     • Remote participation + live meeting
     • Popular testbed for IR research in Russia
     • Related activities: RuSSIR
     20.09.2012 ROMIP 2

  3. ROMIP 2004

  4. Largest text collections

     Collection   Documents   Size (compressed)   Topics     Evaluated within ad-hoc search track
     Legal        ~300,000    2 Gb                14,794     220
     By.Web       1,524,676   8 Gb                ~60,000    1,500+
     KM.RU        3,010,455   13 Gb               ~60,000    ~250

  5. (Retired) text document tracks
     • Ad-hoc text retrieval
     • Text categorization
     • Snippet generation
     • QA and fact extraction
     • News clustering
     • Search by sample document

  6. Image collections
     • Photo collection: 20,000 images from Flickr
     • Dups collection: 15 hrs of video, 37,800 frames
     • Panoramic series: 55,000 images (data recycled from Internet Math 2011)

  7. Image tracks
     • Content-based image retrieval
     • Near-duplicate detection
     • Image annotation
     • Finding panoramic series

  8. ROMIP by 2011
     • Low participation from academia
     • Fatigue of classical IR tasks
        • available relevance tables – no need to participate;
        • overfitting on available datasets;
        • hard to model realistic settings and data;
        • well-studied tasks – new results are hard to expect.
     • Limited resources
     • ML challenges (e.g. www.kaggle.com)

  9. ROMIP light 2011
     • Sentiment analysis
     • Search by query image (low participation)
     • Schedule shifted to fall

  10. ROMIP timeline
      [Chart: systems applied, systems participated, and # of tracks per year, 2003 to 2011; vertical axis from 0 to 30]

  11. Sentiment analysis (SA)
      • Three domains: movies, books, and digital cameras
      • ‘Transfer learning’ (data from different sources)
      • Classification into 2, 3, and 5 classes
      • 23 teams registered
      • 12 submitted results
      • 6 reports published
      • 2-class: 105 runs, 3-class: 81 runs, 5-class: 30 runs

  12. SA: data
      Training set
      • 15,000+ movie reviews (10-point scale)
      • 24,000+ book reviews (10-point scale)
      • 10,000+ camera reviews (5-point scale)
      Test set: blog posts collected via blog search with subsequent filtering
      • 275 posts on movies
      • 329 posts on books
      • 270 posts on digital cameras
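The training reviews use 10-point and 5-point scales, while the track asks for 2, 3, or 5 classes, so ratings have to be mapped to class labels somehow. The slides do not say how this was done. The sketch below is only an illustration that assumes equal-width bins. The function name `rating_to_class` and the binning rule are hypothetical, not ROMIP's actual procedure.

```python
def rating_to_class(rating, scale_max, n_classes):
    """Map a rating in 1..scale_max to a class index in 0..n_classes-1.

    Assumption (not from the slides): equal-width bins over the scale.
    """
    if not 1 <= rating <= scale_max:
        raise ValueError("rating outside the scale")
    # Integer arithmetic keeps the bin boundaries exact.
    return (rating - 1) * n_classes // scale_max


# A 10-point movie review and a 5-point camera review, each mapped to 2 classes.
movie_label = rating_to_class(8, scale_max=10, n_classes=2)
camera_label = rating_to_class(2, scale_max=5, n_classes=2)
```

With this rule, ratings 1-5 on the 10-point scale fall into the lower of two classes and 6-10 into the upper one. Any real setup would need to choose its thresholds per domain.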

  13. Plans
      • New edition of SA track
      • Finer granularity (sentence)
      • Opinions for a given entity
      • Re-launch of image tracks (in cooperation with Graphicon conference)
      • Machine translation track

  14. MT evaluation track (2012)
      • Strong industrial players
      • 1M parallel sentences (Ru-En) to release
      • Collaboration with TAUS Labs
      • Metrics
         • BLEU
         • human assessment
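For readers unfamiliar with BLEU, here is a minimal sketch of the idea: clipped n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified sentence-level variant with a single reference and no smoothing. BLEU is normally computed over a whole corpus, and the slides do not say which implementation or settings the track used, so the function names here are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero BLEU
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
score_same = bleu(reference, reference)
score_short = bleu("the cat sat on".split(), reference)
```

An identical translation scores 1.0. The truncated candidate keeps perfect n-gram precision but is scaled down by the brevity penalty alone.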

  15. Thank you! Questions?
      Pavel Braslavski
      pb@kontur.ru
