Analysis of the Paragraph Vector Model for Information Retrieval


  1. Analysis of the Paragraph Vector Model for Information Retrieval Qingyao Ai 1 , Liu Yang 1 , Jiafeng Guo 2 , W. Bruce Croft 1 1 College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA {aiqy, lyang, croft}@cs.umass.edu 2 CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, China guojiafeng@ict.ac.cn

  2. Motivation
• Most tasks in IR benefit from representations that reflect the semantic relationships between words and documents.
• Word-document matching is essential for language modeling approaches.
• Topic models: PLSA, LDA, …
• Neural models: Word2vec, paragraph vector
[Slide figure: a query ("president car") and a document ("government vehicle") share no terms, so their bag-of-words vectors (e.g. [0,0,1,0,0,1,0,0] vs. [2,0,0,0,0,0,0,3]) give a matching score of 0, while topic/neural embeddings give a score > 0]
• Advantages of the paragraph vector model:
  – No a priori topic number
  – Highly efficient in training
  – Automatically learns document representations
  – Acts as a language model
  – Optimizes a weighting scheme widely used in IR

  3. Outline
• Paragraph Vector Based Retrieval Model
  – What is paragraph vector model
  – How to use it for retrieval
• Issues of Paragraph Vector Model in Retrieval Scenario
  – Over-fitting on short documents
  – Improper noise distribution
  – Insufficient modeling for word substitution
• Experiments
  – Experiment setup
  – Results
  – Parameter sensitivity

  4. Paragraph Vector Model
• The paragraph vector model [13] jointly learns embeddings for words and documents by optimizing the probabilities of observed word-document pairs, defined as:

    P(w|d) = \frac{\exp(\vec{w} \cdot \vec{d})}{\sum_{w' \in V_w} \exp(\vec{w}' \cdot \vec{d})}    (1)

• The structure shown on this slide is the paragraph vector model with the distributed bag-of-words assumption (PV-DBOW).
[Slide figure: document d mapped into a semantic space together with words such as "food", "research", "vaccine", "drug"]
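The following is a minimal NumPy sketch of how the softmax in Equation (1) could be evaluated over a toy vocabulary; the function name, array names, and dimensions are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pv_dbow_word_prob(word_vecs, doc_vec):
    """Softmax P(w|d) over the whole vocabulary, as in Eq. (1).

    word_vecs: (|V_w|, dim) matrix of word embeddings
    doc_vec:   (dim,) paragraph (document) embedding
    Returns a (|V_w|,) vector of probabilities.
    """
    scores = word_vecs @ doc_vec        # w . d for every word in the vocabulary
    scores -= scores.max()              # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Toy usage: 5-word vocabulary, 8-dimensional embeddings (illustrative only)
rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(5, 8))
doc_vec = rng.normal(size=8)
print(pv_dbow_word_prob(word_vecs, doc_vec))  # probabilities sum to 1.0
```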

  5. Language Estimation with Paragraph Vector Model
• Inspired by the LDA-based retrieval model [24], we apply the paragraph vector model by smoothing the probability estimation in language modeling approaches with PV-DBOW, and propose a paragraph vector based retrieval model (PV-LM):

    P(q_1|d) = \lambda P_{PV}(q_1|d) + (1 - \lambda) P_{LM}(q_1|d)    (2)

[Slide figure: query "food drug law" with terms q1 (food), q2 (drug), q3 (law) matched against document d in the semantic space]
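A small sketch of the interpolation in Equation (2), assuming the PV-DBOW estimate and the baseline language-model estimate are available as callables; all function and parameter names here are hypothetical.

```python
import math

def pv_lm_query_score(query_terms, doc, p_pv, p_lm, lam=0.5):
    """Log-likelihood of a query under the PV-LM mixture of Eq. (2).

    p_pv(w, d): probability of w given d from PV-DBOW (Eq. 1)
    p_lm(w, d): probability of w given d from the baseline language model
    lam:        interpolation weight lambda in [0, 1]
    """
    score = 0.0
    for w in query_terms:
        p = lam * p_pv(w, doc) + (1.0 - lam) * p_lm(w, doc)
        score += math.log(max(p, 1e-12))  # guard against zero probabilities
    return score
```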

  6. Language Estimation with Paragraph Vector Model
• However, PV-LM did not produce promising results:
  – The performance of PV-LM is highly sensitive to the training iteration of PV-DBOW.
  – The mean average precision (MAP) of PV-LM does not outperform LDA-LM [24] on Robust04 (0.259).

Figure 1: The MAP of QL and the PV-based retrieval model with the original PV-DBOW on Robust04 with title queries, with respect to the number of training iterations.

  7. Outline
• Paragraph Vector Based Retrieval Model
  – What is paragraph vector model
  – How to use it for retrieval
• Issues of Paragraph Vector Model in Retrieval Scenario
  – Over-fitting on short documents
  – Improper noise distribution
  – Insufficient modeling for word substitution
• Experiments
  – Experiment setup
  – Results
  – Parameter sensitivity

  8. Overfitting on Short Documents
[Slide figures; legend: Iter 5, Iter 20, Iter 80]
Figure 2: The distribution of documents with respect to document length for the top 50 documents retrieved by the PV-based retrieval model on Robust04 (title queries).
Figure 3: The distribution of vector norms with respect to document length for 10,000 documents randomly sampled from Robust04.
• The PV-based retrieval model tends to retrieve more short documents as the number of training iterations increases.
• In a subset of 10,000 randomly sampled documents, we observed a significant norm increase for short documents' vectors.

  9. Overfitting on Short Documents
[Figures 2 and 3 repeated from the previous slide]
• Large document vector norms change the probability distribution of document language models and make them focus on observed words.
• One direct solution to this problem is L2 regularization:

    \ell'(w,d) = \ell(w,d) - \frac{\gamma}{\#d} \|\vec{d}\|^2    (3)
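As a rough illustration of Equation (3), the sketch below applies the length-normalized L2 penalty inside a single gradient-ascent step on a document vector; grad_l, gamma, and the learning rate are assumed names and values, not the authors' implementation.

```python
import numpy as np

def regularized_doc_update(doc_vec, grad_l, doc_len, gamma, lr=0.025):
    """One gradient-ascent step on Eq. (3): l'(w,d) = l(w,d) - (gamma / #d) * ||d||^2.

    doc_vec: current document embedding
    grad_l:  gradient of the unregularized objective l(w,d) w.r.t. doc_vec
    doc_len: document length #d; because the penalty is applied once per
             word-document pair, dividing by #d keeps the total shrinkage
             per document roughly independent of its length
    """
    grad = grad_l - 2.0 * (gamma / doc_len) * doc_vec
    return doc_vec + lr * grad
```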

  10. Negative Sampling
• Proposed by Mikolov et al. [17], negative sampling is a technique that approximates the global objective of PV-DBOW by sampling k "negative" terms from the corpus:

    \ell = \sum_{w \in V_w} \sum_{d \in V_d} \#(w,d) \log \sigma(\vec{w} \cdot \vec{d}) + \sum_{w \in V_w} \sum_{d \in V_d} \#(w,d) \left( k \cdot E_{w_N \sim P_V}[\log \sigma(-\vec{w}_N \cdot \vec{d})] \right)    (4)

• If we derive the local objective of a specific word-document pair and set its partial derivative to zero, we have:

    \vec{w} \cdot \vec{d} = \log\left( \#(w,d) \frac{1}{\#(d)\, P_V(w)} \right) - \log k    (5)

[Slide figure: for document d, the observed word "food" is a positive example (+), while k words sampled from the corpus (e.g. "computer", "morning", "America") serve as negative examples (-)]
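The sketch below approximates the local negative-sampling objective of Equation (4) for a single word-document pair by drawing k negatives from the noise distribution; the array shapes and identifiers are assumptions for illustration, not the original training code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(word_vec, doc_vec, word_vecs, noise_probs, k, rng):
    """Local negative-sampling objective for one (w, d) pair, as in Eq. (4).

    word_vecs:   (|V_w|, dim) matrix of all word embeddings
    noise_probs: (|V_w|,) noise distribution P_V (or P_D) over the vocabulary
    k:           number of negative samples; summing k samples approximates
                 the k * E[...] expectation term
    """
    pos = np.log(sigmoid(word_vec @ doc_vec))
    neg_ids = rng.choice(len(word_vecs), size=k, p=noise_probs)
    neg = np.log(sigmoid(-word_vecs[neg_ids] @ doc_vec)).sum()
    return pos + neg
```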

  11. Improper Noise Distribution
• The original negative sampling technique adopts an empirical word distribution as the noise distribution:

    P_V(w_N) = \frac{\# w_N}{|C|}    (6)

  which makes the original PV-DBOW optimize a variation of the TF-ICF weighting scheme (cf. Eq. (5)).
• Empirically:
  – CF-based negative sampling suppresses frequent words too much.
  – TF-ICF weighting loses the document structure information.
• We propose a document-frequency based noise distribution:

    P_D(w_N) = \frac{\#D(w_N)}{\sum_{w' \in V_w} \#D(w')}    (7)

  which makes PV-DBOW optimize a variation of the TF-IDF weighting scheme.

Figure 4: The distribution of the original negative sampling (PV) and the document-frequency based negative sampling (PD). The horizontal axis represents the log value of word frequency (base 10); the vertical axis represents the negative sampling probability.
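A tiny sketch contrasting the two noise distributions of Equations (6) and (7), given corpus-frequency and document-frequency counts; the input names are assumptions.

```python
import numpy as np

def cf_noise_distribution(corpus_freq):
    """Original noise distribution P_V (Eq. 6): proportional to corpus frequency #w_N."""
    cf = np.asarray(corpus_freq, dtype=float)
    return cf / cf.sum()

def df_noise_distribution(doc_freq):
    """Document-frequency based noise distribution P_D (Eq. 7): proportional to #D(w_N)."""
    df = np.asarray(doc_freq, dtype=float)
    return df / df.sum()
```

Since both functions return a probability vector over the vocabulary, switching the sampler from the CF-based to the DF-based distribution only requires passing a different `noise_probs` array to the negative-sampling step sketched above.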

  12. Insufficient Modeling for Word Substitution

Table 1: The cosine similarities between "clothing", "garment" and four relevant documents in Robust04 query 361 ("clothing sweatshops").

                                                         PV-DBOW
                                                    clothing   garment
  clothing                                            1.000      0.632
  LA112689-0194 (TF_clothing = 2, TF_garment = 26)    0.044      0.134
  LA112889-0108 (TF_clothing = 0, TF_garment = 10)   -0.003      0.100
  LA021090-0137 (TF_clothing = 7, TF_garment = 9)     0.052      0.092
  LA022890-0105 (TF_clothing = 6, TF_garment = 6)     0.066      0.079

• Existing topic models and embedding models mainly focus on two types of word relations: co-occurrence (e.g. topic-related words) and substitution (e.g. synonyms).
• PV-DBOW focuses on capturing word co-occurrence but ignores word-context information, which makes it difficult to capture word substitution relations (e.g. "clothing" and "garment").

  13. Insufficient Modeling for Word Substitution

Table 1 (extended): The cosine similarities between "clothing", "garment" and four relevant documents in Robust04 query 361 ("clothing sweatshops"), for PV-DBOW and for PV with the joint objective.

                                                         PV-DBOW              PV joint objective
                                                    clothing   garment      clothing   garment
  clothing                                            1.000      0.632        1.000      0.638
  LA112689-0194 (TF_clothing = 2, TF_garment = 26)    0.044      0.134        0.107      0.169
  LA112889-0108 (TF_clothing = 0, TF_garment = 10)   -0.003      0.100        0.126      0.155
  LA021090-0137 (TF_clothing = 7, TF_garment = 9)     0.052      0.092        0.147      0.119
  LA022890-0105 (TF_clothing = 6, TF_garment = 6)     0.066      0.079        0.107      0.107

• As suggested by Dai et al. [5] and Sun et al. [22], one approach to alleviate the problem is regularizing PV-DBOW by requiring word vectors to predict their context. Specifically, we apply a joint objective:

    \ell = \log \sigma(\vec{w}_i \cdot \vec{d}) + k \cdot E_{w_N \sim P_V}[\log \sigma(-\vec{w}_N \cdot \vec{d})] + \sum_{j = i - L,\, j \neq i}^{i + L} \left( \log \sigma(\vec{w}_i \cdot \vec{c}_j) + k \cdot E_{c_N \sim P_V}[\log \sigma(-\vec{w}_i \cdot \vec{c}_N)] \right)    (8)
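To make the joint objective of Equation (8) concrete, here is an illustrative sketch that combines the document term with a context window of half-width L; the expectation terms are approximated by summing over k sampled negatives, and all identifiers (ctx_vecs, noise_probs, etc.) are assumed rather than taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_objective(i, words, word_vecs, ctx_vecs, doc_vec, noise_probs, k, L, rng):
    """Joint PV-DBOW + word-context objective for position i, as in Eq. (8).

    words:     list of word ids in the document
    word_vecs: word (input) embeddings; ctx_vecs: context (output) embeddings
    L:         half window size; k: number of negative samples per term
    """
    w_i = word_vecs[words[i]]

    # Document part: predict w_i from the document vector
    obj = np.log(sigmoid(w_i @ doc_vec))
    neg_ids = rng.choice(len(word_vecs), size=k, p=noise_probs)
    obj += np.log(sigmoid(-word_vecs[neg_ids] @ doc_vec)).sum()

    # Context part: require w_i to predict the surrounding words in the window
    for j in range(max(0, i - L), min(len(words), i + L + 1)):
        if j == i:
            continue
        obj += np.log(sigmoid(w_i @ ctx_vecs[words[j]]))
        neg_ids = rng.choice(len(ctx_vecs), size=k, p=noise_probs)
        obj += np.log(sigmoid(-ctx_vecs[neg_ids] @ w_i)).sum()
    return obj
```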

  14. Outline
• Paragraph Vector Based Retrieval Model
  – What is paragraph vector model
  – How to use it for retrieval
• Issues of Paragraph Vector Model in Retrieval Scenario
  – Over-fitting on short documents
  – Improper noise distribution
  – Insufficient modeling for word substitution
• Experiments
  – Experiment setup
  – Results
  – Parameter sensitivity

  15. Experiment Setup
• Datasets:
  – TREC collections: Robust04 and GOV2* with title and description queries
  – Five-fold cross validation
  – Evaluation: mean average precision (MAP), normalized discounted cumulative gain (NDCG@20) and precision (P@20)
• Reported models:
  – QL: query likelihood model [19] with Dirichlet smoothing.
  – LDA-LM: the LDA-based retrieval model proposed by Wei and Croft [15].
  – PV-LM: the PV-based retrieval model with the PV-DBOW proposed by Le et al. [13].
  – EPV-R-LM: the PV-LM model with L2 regularization.
  – EPV-DR-LM: the EPV-R-LM model with document-frequency based negative sampling.
  – EPV-DRJ-LM: the EPV-DR-LM model with the joint objective.

* Due to efficiency issues, we used a random subset of 500k documents to train LDA and PV on GOV2.
