Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)



  1. Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019) Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

  2. Vision Source: saveur.com

  3. Vision The ultimate candy store for information retrieval researchers! Source: Wikipedia (Candy)

  4. Vision The ultimate candy store for information retrieval researchers! See a result you like? Click a button to recreate those results! Really, any result? (not quite… let’s start with batch ad hoc retrieval experiments on standard test collections) What is this, really?

  5. Repeatability: you can recreate your own results again (we get this “for free”). Replicability: others can recreate your results, with your code (our focus). Reproducibility: others can recreate your results, with code they rewrite (stepping stone…). ACM Artifact Review and Badging Guidelines

  6. Why is this important? Good science. Sustained cumulative progress. Armstrong et al. (CIKM 2009): little empirical progress made from 1998 to 2009. Why? Researchers compare against weak baselines. Yang et al. (SIGIR 2019): researchers still compare against weak baselines.

  7. How do we get there? Open-Source Code! A good start, but far from enough… TREC 2015 “Open Runs”: 79 submitted runs… Voorhees et al. Promoting Repeatability Through Open Runs. EVIA 2016.

  8. Number of runs successfully replicated: 0. Voorhees et al. Promoting Repeatability Through Open Runs. EVIA 2016.

  9. How do we get there? Open-Source Code! A good start, but far from enough… Ask developers to show us how! Open-Source IR Reproducibility Challenge (OSIRRC) at the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). Participants contributed end-to-end scripts for replicating ad hoc retrieval experiments. Lin et al. Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge. ECIR 2016.

  10. System Effectiveness [Figure: MAP by system/model] 7 participating systems, GOV2 collection

  11. System Efficiency [Figure: search time (ms, log scale) by system/model] 7 participating systems, GOV2 collection

  12. Effectiveness/Efficiency Tradeoff [Figure: time (ms) vs. MAP for Indri (SDM, QL), Galago (SDM, QL), Terrier (DPH+Bo1 QE, DPH+Prox SD, DPH, BM25), MG4J (BM25, B+, B), ATIRE (BM25, Quant. BM25), Lucene (BM25 Pos., BM25 Count), JASS (1B P, 2.5M P)] 7 participating systems, GOV2 collection

  13. How do we get there? Open-Source Code! A good start, but far from enough… Ask developers to show us how! It worked, but…

  14. What worked well? We actually pulled it off! What didn’t work well? Technical infrastructure was brittle; replication scripts were too under-constrained.

  15. Infrastructure Source: Wikipedia (Burj Khalifa)

  16. VMs [Diagram: apps run on guest OSes inside VMs, managed by a hypervisor on the physical machine]

  17. Containers [Diagram: apps run in containers, managed by a container engine on a shared OS on the physical machine]
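Since the rest of the workshop hinges on this container model, here is a minimal sketch in Python of what “running a retrieval system inside a container” amounts to, driving the Docker CLI through the standard subprocess module. The image name and command are illustrative placeholders, not anything prescribed by OSIRRC.

```python
import subprocess

def run_in_container(image: str, command: list[str]) -> str:
    """Run a one-off command inside a fresh container of `image`.

    Unlike a VM, the container shares the host kernel, so startup is
    near-instant and only the application plus its dependencies ship
    inside the image.
    """
    result = subprocess.run(
        ["docker", "run", "--rm", image, *command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical example: inspect the OS a containerized system sees.
print(run_in_container("ubuntu:18.04", ["cat", "/etc/os-release"]))
```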

  18. Infrastructure Source: Wikipedia (Burj Khalifa)

  19. Workshop Goals 1. Develop common Docker specification for capturing ad hoc retrieval experiments – the “jig”. 2. Build a library of curated images that work with the jig. 3. Take over the world! (encourage adoption, broaden to other tasks, etc.)

  20. The jig: the user specifies a Docker <image>:<tag>; in the prepare phase, the jig starts the image, triggers the init hook, triggers the index hook, and creates a snapshot of the indexed image; in the search phase, the jig triggers the search hook on that snapshot, the image produces run files, and the jig scores them with trec_eval.
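For concreteness, here is a minimal Python sketch of that two-phase lifecycle, driving the Docker CLI directly. This is not the actual jig implementation: the hook names (init, index, search) and the snapshot step come from the slide, while the container name, mount points, hook paths such as /init, image tags, and the qrels path are placeholder assumptions.

```python
import subprocess

def sh(*args: str) -> None:
    """Echo and run a command, failing loudly on error."""
    print("+", " ".join(args))
    subprocess.run(args, check=True)

def prepare(image: str, snapshot: str, collection_dir: str) -> None:
    # Prepare phase: start the image, trigger the init and index hooks,
    # then freeze the indexed state as a reusable snapshot image.
    container = "osirrc_prepare"
    sh("docker", "run", "-d", "--name", container,
       "-v", f"{collection_dir}:/input/collection:ro",
       image, "sleep", "infinity")
    sh("docker", "exec", container, "/init")    # hook names from the slide;
    sh("docker", "exec", container, "/index")   # the paths are assumptions
    sh("docker", "commit", container, snapshot)
    sh("docker", "rm", "-f", container)

def search(snapshot: str, topics_dir: str, output_dir: str, qrels: str) -> None:
    # Search phase: trigger the search hook on the snapshot, collect the
    # run file(s) written to the mounted output directory, and score
    # them with trec_eval.
    sh("docker", "run", "--rm",
       "-v", f"{topics_dir}:/input/topics:ro",
       "-v", f"{output_dir}:/output",
       snapshot, "/search")
    sh("trec_eval", qrels, f"{output_dir}/run.txt")

if __name__ == "__main__":
    # All names below are placeholders, not the jig's real interface.
    prepare("someteam/system:latest", "someteam/system:indexed", "/path/to/robust04")
    search("someteam/system:indexed", "/path/to/topics", "/tmp/runs",
           "/path/to/qrels.robust04.txt")
```

The snapshot between phases is what makes repeated search runs cheap: indexing happens once, and every subsequent search starts from the committed image.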

  21. Source: Flickr (https://www.flickr.com/photos/m00k/15789986125/)

  22. 17 images from 13 different teams. Focus on newswire collections: Robust04, Core17, Core18. Official runs on Microsoft Azure. Thanks Microsoft for free credits!

  23. Anserini (University of Waterloo), Anserini-bm25prf (Waseda University), ATIRE (University of Otago), Birch (University of Waterloo), Elastirini (University of Waterloo), EntityRetrieval (Ryerson University), Galago (University of Massachusetts), ielab (University of Queensland), Indri (TU Delft), IRC-CENTRE2019 (Technische Hochschule Köln), JASS (University of Otago), JASSv2 (University of Otago), NVSM (University of Padua), OldDog (Radboud University), PISA (New York University and RMIT University), Solrini (University of Waterloo), Terrier (TU Delft and University of Glasgow)

  24. Robust04: 49 runs from 13 images. Images captured diverse models: query expansion and relevance feedback, conjunctive and efficiency-oriented query processing, neural ranking models.

  25. Core17 12 runs from 6 images

  26. Core18 19 runs from 4 images

  27. Robust04 49 runs from 13 images

  28. Who won? Source: Time Magazine

  29. But it’s not a competition! Source: Washington Post

  30. TREC best – 0.333; TREC median (title) – 0.258

  31. Workshop Goals ✓ 1. Develop common Docker specification for capturing ad hoc retrieval experiments – the “jig”. ✓ 2. Build a library of curated images that work with the jig. ? 3. Take over the world! (encourage adoption, broaden to other tasks, etc.)

  32. What’s next? Source: flickr (https://www.flickr.com/photos/39414578@N03/16042029002)
