MPII at the NTCIR-14 WWW-2 Task Andrew Yates Max Planck Institute for Informatics

Motivation Opportunity to evaluate an NIR model (participating in the pool) • Previously evaluated on TREC Web Track 09-14 (WSDM '18, EMNLP '17) • With long queries (TREC description) • Re-ranking results from an unsupervised model Significant improvement with a strong signal from WSDM '18? How does it compare to BM25 with short queries (& pool)? 2

Outline • Model summary (PACRR & Co-PACRR) • Parameters varied • Experimental setup • Results 3

Input Representation [Figure: similarity matrix with the query terms "bayern", "beats", "dortmund" on one axis and the document terms on the other] Query-document similarity matrix • word2vec similarity • One matrix for each document 4
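
A minimal sketch of the input representation described on this slide, assuming cosine similarity between term embeddings. The 3-dimensional vectors below are made-up stand-ins for word2vec embeddings, and the function name `similarity_matrix` is hypothetical.

```python
import numpy as np

def similarity_matrix(query_vecs, doc_vecs):
    """Cosine similarity between every query term and every document term."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return q @ d.T  # shape: (num_query_terms, num_doc_terms)

# Toy vectors standing in for word2vec embeddings (illustrative values only).
emb = {
    "bayern":   np.array([1.0, 0.0, 0.0]),
    "beats":    np.array([0.0, 1.0, 0.0]),
    "dortmund": np.array([0.0, 0.0, 1.0]),
    "wins":     np.array([0.0, 0.8, 0.6]),
}
query = ["bayern", "beats", "dortmund"]
doc = ["dortmund", "wins", "bayern", "beats"]

# One matrix per document: rows are query terms, columns are document terms.
sim = similarity_matrix(np.stack([emb[t] for t in query]),
                        np.stack([emb[t] for t in doc]))
```

Exact term matches give a similarity of 1.0, while related terms (here "beats" and "wins") give an intermediate value.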

Using Positional Information [Figure: three copies of the similarity matrix for the query terms "bayern", "beats", "dortmund" over a document window] Match patterns (convolutional kernels) PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17. 5

Using Positional Information [Figure: three copies of the similarity matrix for the query terms "bayern", "beats", "dortmund" over a document window, one per match pattern] Partial match, ordered match, reversed ordered match PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17. 6

Using Positional Information [Figure: similarity matrices for the query terms "bayern", "beats", "dortmund"] Matches are local: consider N x N regions of the matrix PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17. 7

Using Positional Information [Figure: candidate patterns for a region matching "bayern beats dortmund"; one pattern is marked ✓] Patterns are exclusive: each region is best matched by a single pattern PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17. 8

PACRR: Position-Aware Convolutional Recurrent Relevance Matching w: kernel (1) CNN kernels capture patterns PACRR: A Position-Aware Neural IR Model for Relevance Matching. K Hui, A Yates, K Berberich, G de Melo. In: EMNLP '17. 9

PACRR: Position-Aware Convolutional Recurrent Relevance Matching w: kernel [Figure: a 3 x 3 kernel applied to rows 1-3 and columns 6-8 of the similarity matrix x] (1) CNN kernels capture patterns Signal for this region: $w_{1,1} x_{1,6} + w_{1,2} x_{1,7} + w_{1,3} x_{1,8} + w_{2,1} x_{2,6} + \dots + w_{3,3} x_{3,8}$ 10
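
The signal for one region is an element-wise product of the kernel and the region, summed. The sketch below illustrates this for the region covering rows 1-3 and columns 6-8 (1-based, as on the slide). The kernels are hand-set for illustration, whereas in the model they are learned, and `region_signal` is a hypothetical helper name.

```python
import numpy as np

def region_signal(sim, kernel, row, col):
    """Sum of kernel weights times similarities for the N x N region
    whose top-left corner is at (row, col), using 0-based indices."""
    n = kernel.shape[0]
    region = sim[row:row + n, col:col + n]
    return float(np.sum(kernel * region))

# Toy similarity matrix: 3 query terms x 10 document terms.
sim = np.zeros((3, 10))
# The query terms appear in order at document columns 6, 7, 8 (1-based),
# which are columns 5, 6, 7 with 0-based indexing.
sim[0, 5] = sim[1, 6] = sim[2, 7] = 1.0

ordered_kernel = np.eye(3)              # responds to an in-order match
reversed_kernel = np.fliplr(np.eye(3))  # responds to a reversed-order match

signal_ordered = region_signal(sim, ordered_kernel, row=0, col=5)
signal_reversed = region_signal(sim, reversed_kernel, row=0, col=5)
```

With these toy values the ordered kernel yields 3.0 and the reversed kernel yields 1.0, since only the centre cell of the two patterns overlaps.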

PACRR: Position-Aware Convolutional Recurrent Relevance Matching (1) CNN kernels capture patterns (2) Max pool kernels [Figure: three kernels produce signals 1.0, 0, and 0.3 for a region; the best-matching pattern (signal 1.0) is marked ✓] 11

PACRR: Position-Aware Convolutional Recurrent Relevance Matching (1) CNN kernels capture patterns (2) Max pool kernels (3) K-max pool query signals from doc regions (K=2) 12

PACRR: Position-Aware Convolutional Recurrent Relevance Matching (1) CNN kernels capture patterns (2) Max pool kernels (3) K-max pool query signals from doc regions For each query term, we now have: • K-max match signals for unigrams • K-max match signals for bigrams • … • K-max match signals for n-grams 13
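
Steps (2) and (3) can be sketched as two pooling operations over a tensor of kernel outputs for one kernel size. The array layout (query terms x document positions x kernels) and the function names are assumptions made for this illustration, and the numbers are toy values.

```python
import numpy as np

def max_pool_kernels(conv_out):
    """Keep the best-matching kernel's signal at each position.
    conv_out has shape (num_query_terms, num_doc_positions, num_kernels)."""
    return conv_out.max(axis=2)

def k_max_pool(signals, k):
    """Keep the k strongest signals per query term, in descending order.
    signals has shape (num_query_terms, num_doc_positions)."""
    return -np.sort(-signals, axis=1)[:, :k]

# Two query terms, three document positions, three kernels (toy values).
conv_out = np.array([
    [[1.0, 0.0, 0.3], [0.2, 0.1, 0.0], [0.0, 0.6, 0.4]],
    [[0.1, 0.2, 0.0], [0.9, 0.3, 0.5], [0.0, 0.0, 0.7]],
])

best = max_pool_kernels(conv_out)  # shape (2, 3)
top2 = k_max_pool(best, k=2)       # shape (2, 2), i.e. K=2 signals per query term
```

Repeating this for each kernel size (unigrams, bigrams, and so on) gives one set of K signals per query term and n-gram size.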

PACRR: Position-Aware Convolutional Recurrent Relevance Matching (1) CNN kernels capture patterns (2) Max pool kernels (3) K-max pool query signals from doc regions (4) Combination function (FC layers) produces a score for each query term (5) Document score is the summation [Steps 4 & 5 differ from original papers] 14

PACRR: Position-Aware Convolutional Recurrent Relevance Matching Related to MatchPyramid, but with, e.g., different pooling strategies (A Study of MatchPyramid Models on Ad-hoc Retrieval. L. Pang, Y. Lan, J. Guo, J. Xu, Z. Cheng. Neu-IR '16 SIGIR Workshop.) (1) CNN kernels capture patterns (2) Max pool kernels (3) K-max pool query signals from doc regions (4) Combination function (FC layers) produces a score for each query term (5) Document score is the summation [Steps 4 & 5 differ from original papers] 15
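
Steps (4) and (5) can be sketched as a small feed-forward network applied to each query term's pooled signals, followed by a sum over query terms. This is only an illustration: the single hidden layer of size 8, the ReLU activation, the weights shared across query terms, and the random weights are all assumptions, not details taken from the slides.

```python
import numpy as np

def score_terms(term_features, w1, b1, w2, b2):
    """Step 4: score each query term from its pooled match signals.
    term_features has shape (num_query_terms, num_features)."""
    hidden = np.maximum(0.0, term_features @ w1 + b1)  # ReLU (assumed)
    return hidden @ w2 + b2                            # one score per query term

def score_document(term_features, w1, b1, w2, b2):
    """Step 5: the document score is the sum of the query term scores."""
    return float(score_terms(term_features, w1, b1, w2, b2).sum())

rng = np.random.default_rng(0)
num_terms, num_features, hidden_size = 3, 6, 8  # illustrative sizes

# Random stand-ins for pooled signals and for learned weights.
term_features = rng.random((num_terms, num_features))
w1 = rng.normal(size=(num_features, hidden_size))
b1 = np.zeros(hidden_size)
w2 = rng.normal(size=hidden_size)
b2 = 0.0

term_scores = score_terms(term_features, w1, b1, w2, b2)
doc_score = score_document(term_features, w1, b1, w2, b2)
```

Because the per-term scores are summed, the document score in this sketch does not depend on the order of the query terms at this stage.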

Variant: Cascade Pooling • Inspired by the cascade model (An experimental comparison of click position-bias models. Craswell et al. WSDM '08.) • Prefer document with earlier relevant information [Figure: Document A > Document B] • One of several improvements in Co-PACRR (WSDM '18) 16

Variant: Cascade Pooling For each query term, PACRR retains top k match signals • Cascade Pooling: repeat for different document cutoffs • Top k signals from the first 50% of the document • Top k signals from the entire document Query term FC receives match signals from different cutoffs Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval. K Hui, A Yates, K Berberich, G de Melo. In: WSDM '18. 17
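
A minimal sketch of cascade pooling as described on this slide: k-max pooling is repeated on the first 50% of the document and on the entire document, and the results are concatenated for each query term. The function names and the toy signals are assumptions made for illustration.

```python
import numpy as np

def k_max_pool(signals, k):
    """Keep the k strongest signals per query term, in descending order."""
    return -np.sort(-signals, axis=1)[:, :k]

def cascade_k_max_pool(signals, k, cutoffs=(0.5, 1.0)):
    """Apply k-max pooling to growing prefixes of the document.
    signals has shape (num_query_terms, num_doc_positions).
    Assumes every prefix contains at least k positions."""
    doc_len = signals.shape[1]
    pooled = []
    for cutoff in cutoffs:
        end = int(doc_len * cutoff)  # number of positions in this prefix
        pooled.append(k_max_pool(signals[:, :end], k))
    # Each query term ends up with k signals per cutoff.
    return np.concatenate(pooled, axis=1)

# One query term, eight document positions (toy values).
signals = np.array([[0.1, 0.9, 0.2, 0.3, 1.0, 0.0, 0.8, 0.4]])
cascaded = cascade_k_max_pool(signals, k=2)
```

In this example the first half contributes 0.9 and 0.3 and the whole document contributes 1.0 and 0.9, so a strong match that appears early is counted at both cutoffs.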

Parameters Varied 1. Cascade pooling used? (3 with, 2 without) 2. Size of k-max pooling (top 5 vs. 15) 3. Size of fully connected layers that score each query term (2x8 or 1) 18

Experimental Setup • Train on TREC WT09-13 judgments • WT14 and WWW-1 used for validation • Using best weights on WWW-1 (after sanity checking on WT14), re-rank BM25 run provided by organizers 19

Results & Conclusion • No significant improvement between any pair of runs • No significant improvement over BM25 • Given past results, minD >= 0.1 seems large Recent work building on PACRR (and other NIR models): CEDR: Contextualized Embeddings for Document Ranking. S. MacAvaney, A. Yates, A. Cohan, N. Goharian. SIGIR '19. Thanks! 21
