  1. Going Beyond the Document-Query Lexical Match. Oren Kurland, Faculty of Industrial Engineering and Management, Technion

  2. Search engines

  3. The ad hoc retrieval task
     Relevance ranking: rank documents in a corpus by their relevance to the information need expressed by a given query

  4. The vector space model (Salton '68)
     q = "Technion", $\vec{q} = \langle 0, \ldots, 0, 1, 0, \ldots, 0 \rangle$
     d = "Technion faculty student Technion", $\vec{d} = \langle 0, \ldots, 0, 1, 0, \ldots, 0, 1, 0, \ldots, 0, 2, 0, \ldots, 0 \rangle$
     $score_{VS}(d; q) \stackrel{def}{=} \cos(\vec{d}, \vec{q})$
     Term weighting scheme: TF.IDF
     TF: the number of occurrences of the term in the document
     IDF: the inverse of the document frequency of the term
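
     A minimal sketch of TF.IDF-weighted vector-space scoring, assuming a toy two-document corpus, raw-count TF, and a log-scaled IDF; the tokenizer and weighting variant are illustrative choices, not the slide's exact formulation.

     import math
     from collections import Counter

     corpus = ["Technion faculty student Technion", "faculty hiring news"]
     docs = [Counter(d.split()) for d in corpus]

     def idf(w):
         df = sum(1 for d in docs if w in d)            # document frequency
         return math.log(len(docs) / df) if df else 0.0

     def tfidf_vec(counts):                             # TF.IDF term weighting
         return {w: counts[w] * idf(w) for w in counts}

     def cosine(u, v):
         dot = sum(u[w] * v[w] for w in u if w in v)
         norm = lambda x: math.sqrt(sum(t * t for t in x.values()))
         return dot / ((norm(u) * norm(v)) or 1.0)

     q = Counter("Technion".split())
     scores = [cosine(tfidf_vec(q), tfidf_vec(d)) for d in docs]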

  5. The language modeling approach (Ponte&Croft '98)
     Ranking: $score_{LM}(d; q) \stackrel{def}{=} \prod_{w \in q} p(w|d)$
     $p(w|d)$ is the probability that $w$ is generated from a language model induced from $d$
     A language model: $p(\text{"Hello"} \mid \text{"Hello Hello World"}) \stackrel{def}{=} (1 - \lambda)\frac{2}{3} + \lambda\, p(\text{"Hello"} \mid corpus)$
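
     A minimal sketch of query-likelihood ranking with Jelinek-Mercer smoothing, mirroring the slide's "Hello Hello World" example; the toy corpus and lambda = 0.5 are assumptions.

     from collections import Counter

     LAMBDA = 0.5
     corpus_docs = ["Hello Hello World", "World news today"]
     corpus_counts = Counter(w for d in corpus_docs for w in d.split())
     corpus_len = sum(corpus_counts.values())           # here: 6 tokens

     def p_w_d(w, doc_counts, doc_len):
         p_mle = doc_counts[w] / doc_len                # document MLE
         p_bg = corpus_counts[w] / corpus_len           # corpus back-off
         return (1 - LAMBDA) * p_mle + LAMBDA * p_bg

     def score_lm(doc, query):
         counts, n = Counter(doc.split()), len(doc.split())
         prod = 1.0
         for w in query.split():                        # prod_{w in q} p(w|d)
             prod *= p_w_d(w, counts, n)
         return prod

     print(score_lm("Hello Hello World", "Hello"))      # 0.5*(2/3) + 0.5*(2/6) = 0.5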

  6. The document-query similarity estimate
     Retrieval frameworks:
     - Probabilistic retrieval (Maron&Kuhns '60); Okapi BM25 (Robertson et al. '93)
     - Vector space model (Salton '68)
     - The inference network model (Turtle&Croft '90)
     - Pivoted document length normalization (Singhal et al. '96)
     - Language modeling (Ponte&Croft '98)
     - Divergence from randomness (Amati&van Rijsbergen '00)
     Are all these the same? It's all about TF, IDF and length normalization (Fang et al. '04, '09)
     Axiomatization of document-query similarity functions used for ranking (Fang et al. '05)

  7. Web search: a variety of relevance signals
     - The similarity between the page and the query (query dependent)
     - The similarity between the anchor text and the query (query dependent)
     - The PageRank score of the page (query independent)
     - Additional document quality measures, e.g., spam score, entropy (query independent)
     - The clickthrough rate for the page (query independent)
     - ...

  8. Learning to rank
     A training set: $\{(f(q_i, d_j),\ l(q_i, d_j))\}_{i,j}$
     - $q_i$: query
     - $d_j$: document
     - $f(q_i, d_j)$: a representation for the pair $(q_i, d_j)$
     - $l(q_i, d_j)$: a relevance judgment for the pair $(q_i, d_j)$
     Minimize a loss function using pointwise/pairwise/listwise approaches (Liu '09)
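
     A minimal pointwise sketch: fit a linear model on (feature vector, relevance label) pairs by least squares and rank by the predicted score. The feature values and labels are made up for illustration; real systems use the pairwise or listwise losses noted above.

     import numpy as np

     # f(q_i, d_j): e.g., [doc-query similarity, PageRank, spam score]
     X = np.array([[0.9, 0.5, 0.1],
                   [0.2, 0.8, 0.7],
                   [0.7, 0.1, 0.2]])
     y = np.array([1.0, 0.0, 1.0])       # l(q_i, d_j): relevance judgments

     w, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimize squared loss
     ranked = np.argsort(-(X @ w))               # rank docs by predicted score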

  9. Observations
     - Relevance is determined based on whether the document content satisfies the information need expressed by the query
     - The document-query similarity is among the most important features for ranking pages in Web search (Liu '09)
     Can document-query similarity estimates be further improved?

  10. The surface-level document-query similarity
     The vocabulary mismatch problem: relevant documents might not contain some, or even all, query terms
     - Short queries
     - Short documents (e.g., tweets)
     Example: query "shipment vehicles" vs. document "cargo freight truck"

  11. The risk minimization framework (Lafferty&Zhai '01)

  12. Semantic matching (Li&Xu '13)
     - Query reformulation
     - Term dependence models
     - Translation models
     - Topic models
     - Latent space models

  13. Short queries: automatic query expansion
     - Global methods analyze the corpus or external resources in a query-independent fashion
     - Local methods rely on some initial search
     Global methods: using WordNet (Voorhees '94), a large external corpus (Diaz&Metzler '06), Wikipedia (Xu et al. '09)
     Translation model (Berger&Lafferty '99):
     $p(q|d) \stackrel{def}{=} \prod_{w \in q} \sum_{w' \in Lexicon} p(w'|d)\, T(w|w')$
     Estimating $T$ using mutual information (Karimzadehgan&Zhai '09); effective for microblog search (Karimzadehgan et al. '13)
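
     A minimal sketch of translation-based query likelihood in the spirit of Berger&Lafferty '99; the tiny translation table T is a made-up assumption. Note how the example pair from slide 10 scores above zero despite sharing no terms.

     from collections import Counter

     # T[(w, w2)] ~ probability of document term w2 "translating" to query term w
     T = {("vehicles", "truck"): 0.6, ("vehicles", "vehicles"): 0.4,
          ("shipment", "cargo"): 0.5, ("shipment", "shipment"): 0.5}

     def p_translated(query, doc):
         counts, n = Counter(doc.split()), len(doc.split())
         prod = 1.0
         for w in query.split():         # prod_w sum_{w'} p(w'|d) T(w|w')
             prod *= sum((counts[w2] / n) * T.get((w, w2), 0.0)
                         for w2 in counts)
         return prod

     print(p_translated("shipment vehicles", "cargo freight truck"))  # > 0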

  14. Pseudo-feedback-based query expansion
     Utilize information from documents that are highly ranked by an initial search performed in response to the query
     Relevance modeling (Lavrenko&Croft '01): a generative theory of relevance. The query and the relevant documents are sampled from the same language model (the relevance model, $R$):
     $p(w|R) \stackrel{def}{=} \lambda\, p(w|q) + (1 - \lambda) \sum_{d \in \mathcal{D}_{init}} p(w|d)\, p(d|q)$
     $score(d; q) \stackrel{def}{=} -KL\big(p(\cdot|R) \,\big\|\, p(\cdot|d)\big)$
     A state-of-the-art (unigram) pseudo-feedback-based query expansion approach (Lv&Zhai '09)
     How do we set $\lambda$? Adaptive/selective query expansion ...
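
     A minimal sketch of estimating a relevance model from the top-ranked documents; the documents, their p(d|q) weights, and lambda are assumptions. Documents would then be ranked by the KL criterion above.

     from collections import Counter

     LAMBDA = 0.5
     d_init = [("airline cargo shipment", 0.6),    # (document, p(d|q))
               ("truck freight shipment", 0.4)]

     def p_w_docs(w):                              # sum_d p(w|d) p(d|q)
         total = 0.0
         for doc, p_d_q in d_init:
             counts, n = Counter(doc.split()), len(doc.split())
             total += (counts[w] / n) * p_d_q
         return total

     def relevance_model(query):                   # p(w|R)
         q_counts, q_len = Counter(query.split()), len(query.split())
         vocab = set(q_counts) | {w for doc, _ in d_init for w in doc.split()}
         return {w: LAMBDA * q_counts[w] / q_len + (1 - LAMBDA) * p_w_docs(w)
                 for w in vocab}

     print(relevance_model("shipment vehicles"))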

  15. Beyond bag-of-terms (unigram) representations
     Markov Random Fields (Metzler&Croft '05)
     $Q$: query composed of the terms $q_1, q_2, \ldots$; $D$: document
     $p(Q, D) = ?$

  16. Markov Random Fields
     $P(D|Q) \stackrel{rank}{=} \sum_{c \in Cliques(G)} \lambda_c f(c)$
     $G$: graph; $f(c)$: feature function
     - Unigram features: $f_T(c) \stackrel{def}{=} \log p(q_i|D)$
     - Ordered phrase features: $f_O(c) \stackrel{def}{=} \log p(ow(q_i, \ldots, q_{i+k})|D)$
     - Unordered phrase features: $f_U(c) \stackrel{def}{=} \log p(uw(q_i, \ldots, q_{i+k})|D)$
     Additional models:
     - Linear discriminant model (Gao et al. '05)
     - Differential concept weighting (Shi&Nie '10)
     - Modeling higher-order term (concept) dependencies using query hypergraphs (Bendersky&Croft '12)
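
     A minimal sketch of sequential-dependence-style scoring: a weighted sum of unigram, ordered-window, and unordered-window log-probabilities for adjacent query term pairs. The weights, the window of size 8, and the epsilon smoothing are common defaults used here as assumptions, not Metzler&Croft's exact estimates.

     import math
     from collections import Counter

     W_T, W_O, W_U = 0.85, 0.10, 0.05    # unigram/ordered/unordered weights
     EPS = 1e-9                          # crude smoothing to avoid log(0)

     def score_mrf(query_terms, doc_terms):
         counts, n = Counter(doc_terms), len(doc_terms)
         s = sum(W_T * math.log(counts[q] / n + EPS) for q in query_terms)
         for q1, q2 in zip(query_terms, query_terms[1:]):
             ordered = sum(1 for i in range(n - 1)
                           if doc_terms[i] == q1 and doc_terms[i + 1] == q2)
             window = sum(1 for i in range(n)
                          for j in range(i + 1, min(i + 8, n))
                          if {doc_terms[i], doc_terms[j]} == {q1, q2})
             s += W_O * math.log(ordered / n + EPS)   # ow(q_i, q_{i+1})
             s += W_U * math.log(window / n + EPS)    # uw(q_i, q_{i+1})
         return s

     print(score_mrf("cluster ranking".split(),
                     "ranking clusters by cluster ranking models".split()))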

  17. Latent concept expansion (Metzler&Croft '07)
     $p(E|Q) \stackrel{def}{=} \sum_{D \in \mathcal{D}_{init}} \big(f_{QD}(Q, D) + f_D(D) + f_{QD}(E, D) + f_Q(E)\big)$
     Additional models:
     - Using hierarchical Markov Random Fields for query expansion (Lang et al. '10)
     - Learning concept importance (Bendersky et al. '11)

  18. Parametrized concept weighting (Bendersky et al. '11)
     $score(D; Q) \stackrel{def}{=} \sum_{T \in \mathcal{T}} \sum_{c \in T} \lambda_c f(c, D)$
     $c$: concept; $\mathcal{T}$: types of concepts (query terms, phrases (bigrams), biterms, expansion terms)
     $\lambda_c \stackrel{def}{=} \sum_{\varphi \in \Phi_T} w_\varphi \varphi(c)$
     $\Phi_T$: a set of feature (importance) functions for a concept of type $T$ (e.g., using the corpus, Google n-grams, Wikipedia, a search log)
     $score(D; Q) = \sum_{T \in \mathcal{T}} \sum_{\varphi \in \Phi_T} \sum_{c \in T} w_\varphi \varphi(c) f(c, D)$
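
     A minimal sketch of computing a concept weight lambda_c as a learned linear combination of importance features phi(c); the two features and their weights w_phi are invented for illustration.

     import math

     w_phi = {"log_idf": 0.7, "in_wikipedia_title": 0.3}   # learned weights

     def phi(concept, df, n_docs, wiki_titles):            # importance features
         return {"log_idf": math.log(n_docs / max(df, 1)),
                 "in_wikipedia_title": 1.0 if concept in wiki_titles else 0.0}

     def lambda_c(concept, df, n_docs, wiki_titles):       # sum_phi w_phi phi(c)
         feats = phi(concept, df, n_docs, wiki_titles)
         return sum(w_phi[name] * val for name, val in feats.items())

     # score(D;Q) would then sum lambda_c * f(c, D) over all concepts c
     print(lambda_c("oren kurland", df=12, n_docs=10000,
                    wiki_titles={"oren kurland"}))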

  19. Positional language models (Lv&Zhai '09)
     $c(w, i)$: count of term $w$ at position $i$ in document $D$
     $k(i, j)$: the term count propagated to position $i$ from position $j$
     $c'(w, i) \stackrel{def}{=} \sum_{j=1}^{N} c(w, j)\, k(i, j)$
     $p(w|D, i) \stackrel{def}{=} \frac{c'(w, i)}{\sum_{w' \in Vocabulary} c'(w', i)}$
     $score(Q, D, i) \stackrel{def}{=} -KL\big(p(\cdot|Q) \,\big\|\, p(\cdot|D, i)\big)$
     Query expansion: a positional relevance language model (Lv&Zhai '10)
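
     A minimal sketch of a positional language model with a Gaussian propagation kernel k(i, j); the kernel choice and sigma = 25 are assumptions (Lv&Zhai study several kernels).

     import math
     from collections import Counter

     SIGMA = 25.0

     def kernel(i, j):                     # Gaussian propagation kernel k(i, j)
         return math.exp(-((i - j) ** 2) / (2 * SIGMA ** 2))

     def positional_lm(doc_terms, i):
         c_prime = Counter()               # c'(w, i) = sum_j c(w, j) k(i, j)
         for j, w in enumerate(doc_terms):
             c_prime[w] += kernel(i, j)
         z = sum(c_prime.values())
         return {w: v / z for w, v in c_prime.items()}    # p(w|D, i)

     print(positional_lm("cargo freight truck shipment".split(), i=0))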

  20. Matching in a latent space
     The term-document matrix (rows: documents $d_1, d_2$; columns: terms $w_1, w_2$):
     $A = \begin{pmatrix} f(w_1; d_1) & f(w_2; d_1) \\ f(w_1; d_2) & f(w_2; d_2) \end{pmatrix}$
     Latent Semantic Analysis (LSA; Deerwester et al. '90): low-rank approximation using SVD:
     $A_k = \arg\min_{X:\, rank(X) = k} \|A - X\|_F$
     Probabilistic Latent Semantic Analysis (pLSA; Hofmann '99)
     Supervised methods for document-query matching in a latent space (Bai et al. '09, Huang et al. '13, Wu et al. '13)
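
     A minimal sketch of LSA: the best rank-k approximation of the term-document matrix under the Frobenius norm via truncated SVD; the 3x4 toy matrix and k = 2 are assumptions.

     import numpy as np

     A = np.array([[2.0, 1.0, 0.0, 0.0],   # rows: documents, cols: terms
                   [0.0, 1.0, 2.0, 1.0],
                   [1.0, 0.0, 1.0, 2.0]])
     k = 2

     U, s, Vt = np.linalg.svd(A, full_matrices=False)
     A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approx in ||.||_F
     doc_vecs = U[:, :k] @ np.diag(s[:k])          # documents in the latent space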

  21. The cluster hypothesis (Jardine&van Rijsbergen '71, van Rijsbergen '79)
     "Closely associated documents tend to be relevant to the same requests"
     Leveraging the hypothesis: enrich a document representation using information induced from its corpus context

  22. Smoothing document representations
     $p(w|D) \stackrel{def}{=} \lambda_1 \frac{c(w, D)}{\sum_{w'} c(w', D)} + \lambda_2 \frac{c(w, corpus)}{\sum_{w'} c(w', corpus)} + \lambda_3 \sum_{t \in Topics} p(w|t)\, p(t|D)$
     Topics: clusters with which $D$ is associated (Kurland&Lee '04, Liu&Croft '04), LDA (Blei et al. '03; Wei&Croft '06), pLSA (Hofmann '99; Lu et al. '11), or PAM (Li&McCallum '06; Yi&Allan '09)
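
     A minimal sketch of the three-way mixture above; the lambda values and the single topic/cluster with its distribution are illustrative assumptions. Note that the smoothed model assigns nonzero probability to a term the document never contains.

     from collections import Counter

     L1, L2, L3 = 0.5, 0.3, 0.2
     corpus = Counter({"cargo": 4, "truck": 3, "student": 2, "faculty": 1})
     topics = {"transport": Counter({"cargo": 5, "truck": 5})}
     p_t_D = {"transport": 1.0}            # p(t|D) for the document's clusters

     def p_w_D(w, doc):
         counts, n = Counter(doc.split()), len(doc.split())
         p_doc = counts[w] / n
         p_corpus = corpus[w] / sum(corpus.values())
         p_topic = sum((topics[t][w] / sum(topics[t].values())) * p_t_D[t]
                       for t in topics)
         return L1 * p_doc + L2 * p_corpus + L3 * p_topic

     print(p_w_D("truck", "cargo freight shipment"))   # > 0 despite no occurrence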

  23. Smoothing document representations: empirical observations (Yi&Allan '09)
     - Using more sophisticated topic models doesn't yield improved retrieval effectiveness
     - Using nearest-neighbor clusters as "topics" results in retrieval performance as good as that of using topic models
     - Pseudo-feedback-based query expansion (specifically, relevance modeling) outperforms using topic models
     Cluster-based smoothing is highly effective for microblog retrieval (Efron '11)

  24. A different approach to utilizing corpus context: cluster ranking
     [Pipeline figure: query → document ranking method → initial list of documents → clustering method → set of clusters → cluster ranking method → ranking of clusters → each cluster is replaced with its documents → ranking of documents]

  25. The optimal cluster
     [Figure: oracle experiment comparing p@5 of the optimal cluster against doc-query similarity ranking and query expansion]

  26. Ranking clusters using Markov Random Fields (Raiber&Kurland '13)
     Winner of the Web track in TREC 2013
     $p(c|q) = \frac{p(c, q)}{p(q)} \stackrel{rank}{=} p(c, q)$
     $p(c, q) \stackrel{rank}{=} \sum_{l \in Cliques(G)} \lambda_l f_l(l)$
     $f_l(l)$: feature function defined over the clique $l$

  27. Challenges
     Query = "oren kurland dblp"
     [Figure: results of two searches, Search #1 and Search #2]
