Introduction to Information Retrieval - PowerPoint PPT Presentation

Introduction to Information Retrieval, http://informationretrieval.org. IIR 6&7: Vector Space Model (tf-idf weighting, vector space model, pivot length normalization). Hinrich Schütze, Institute for Natural Language Processing, University of


  1. idf weight. $\mathrm{df}_t$ is the document frequency, the number of documents that $t$ occurs in. $\mathrm{df}_t$ is an inverse measure of the informativeness of term $t$. Inverse document frequency, $\mathrm{idf}_t$, is a direct measure of the informativeness of the term. The idf weight of term $t$ is defined as $\mathrm{idf}_t = \log_{10} \frac{N}{\mathrm{df}_t}$, where $N$ is the number of documents in the collection. We use $\log (N / \mathrm{df}_t)$ instead of $N / \mathrm{df}_t$ to “dampen” the effect of idf. (Schütze: Vector space model, slide 11 / 37)

  4. Examples for idf (slide 12 / 37). With $N = 1{,}000{,}000$: $\mathrm{idf}_t = \log_{10} \frac{1{,}000{,}000}{\mathrm{df}_t}$

     term        df_t        idf_t
     calpurnia   1           6
     animal      100         4
     sunday      1,000       3
     fly         10,000      2
     under       100,000     1
     the         1,000,000   0
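The idf values in the table above can be recomputed directly from the definition. The following is a minimal sketch, assuming base-10 logarithms and N = 1,000,000 as on the slide; the function name `idf` and the dictionary `doc_freqs` are illustrative names, not part of the lecture.

```python
import math


def idf(df_t, n_docs):
    """Inverse document frequency: log10(N / df_t)."""
    if df_t <= 0 or df_t > n_docs:
        raise ValueError("df_t must be between 1 and n_docs")
    return math.log10(n_docs / df_t)


N = 1_000_000  # collection size used on the slide

# Document frequencies copied from the slide's table.
doc_freqs = {
    "calpurnia": 1,
    "animal": 100,
    "sunday": 1_000,
    "fly": 10_000,
    "under": 100_000,
    "the": 1_000_000,
}

idf_table = {term: idf(df, N) for term, df in doc_freqs.items()}

for term, value in idf_table.items():
    print(f"{term:<10} {doc_freqs[term]:>9,} {value:.0f}")
```

A term that occurs in every document gets idf 0, so it contributes nothing to a tf-idf score.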

  6. Effect of idf on ranking (slide 13 / 37). idf gives high weights to rare terms like arachnocentric. idf gives low weights to frequent words like good, increase, and line. idf affects the ranking of documents for queries with at least two terms. For example, in the query “arachnocentric line”, idf weighting increases the relative weight of arachnocentric and decreases the relative weight of line. idf has little effect on ranking for one-term queries.

  12. Summary: tf-idf weighting (slide 14 / 37). Assign a tf-idf weight for each term $t$ in each document $d$: $w_{t,d} = (1 + \log \mathrm{tf}_{t,d}) \cdot \log \frac{N}{\mathrm{df}_t}$. The tf-idf weight increases with the number of occurrences within a document (term frequency component) and increases with the rarity of the term in the collection (inverse document frequency component).
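The summary formula translates almost directly into code. This is a sketch under two assumptions that the slide does not state: logarithms are base 10 (as in the idf definition), and a term that does not occur in the document (tf = 0) gets weight 0. The name `tf_idf_weight` is illustrative.

```python
import math


def tf_idf_weight(tf, df, n_docs):
    """w_{t,d} = (1 + log10 tf_{t,d}) * log10(N / df_t).

    Assumption: the weight is 0 when the term is absent (tf == 0),
    because log(0) is undefined.
    """
    if tf <= 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


# A term occurring 10 times in a document and in 100 of 1,000,000 documents:
# tf component = 1 + log10(10) = 2, idf component = log10(10,000) = 4.
example_weight = tf_idf_weight(tf=10, df=100, n_docs=1_000_000)
print(example_weight)
```

Both factors grow logarithmically, so a tenfold increase in term count adds only 1 to the tf component.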

  17. Outline (slide 15 / 37): 1. tf-idf weighting, 2. Vector space model, 3. Pivot length normalization.

  18. Binary incidence matrix (slide 16 / 37).

                 Anthony and  Julius  The
                 Cleopatra    Caesar  Tempest  Hamlet  Othello  Macbeth  ...
     Anthony     1            1       0        0       0        1
     Brutus      1            1       0        1       0        0
     Caesar      1            1       0        1       1        1
     Calpurnia   0            1       0        0       0        0
     Cleopatra   1            0       0        0       0        0
     mercy       1            0       1        1       1        1
     worser      1            0       1        1       1        0
     ...

     Each document is represented as a binary vector $\in \{0, 1\}^{|V|}$.

  19. Count matrix (slide 17 / 37).

                 Anthony and  Julius  The
                 Cleopatra    Caesar  Tempest  Hamlet  Othello  Macbeth  ...
     Anthony     157          73      0        0       0        1
     Brutus      4            157     0        2       0        0
     Caesar      232          227     0        2       1        0
     Calpurnia   0            10      0        0       0        0
     Cleopatra   57           0       0        0       0        0
     mercy       2            0       3        8       5        8
     worser      2            0       1        1       1        5
     ...

     Each document is now represented as a count vector $\in \mathbb{N}^{|V|}$.

  20. Binary → count → weight matrix (slide 18 / 37).

                 Anthony and  Julius  The
                 Cleopatra    Caesar  Tempest  Hamlet  Othello  Macbeth  ...
     Anthony     5.25         3.18    0.0      0.0     0.0      0.35
     Brutus      1.21         6.10    0.0      1.0     0.0      0.0
     Caesar      8.59         2.54    0.0      1.51    0.25     0.0
     Calpurnia   0.0          1.54    0.0      0.0     0.0      0.0
     Cleopatra   2.85         0.0     0.0      0.0     0.0      0.0
     mercy       1.51         0.0     1.90     0.12    5.25     0.88
     worser      1.37         0.0     0.11     4.15    0.25     1.95
     ...

     Each document is now represented as a real-valued vector of tf-idf weights $\in \mathbb{R}^{|V|}$.

  22. Documents as vectors (slide 19 / 37). Each document is now represented as a real-valued vector of tf-idf weights $\in \mathbb{R}^{|V|}$. So we have a $|V|$-dimensional real-valued vector space. Terms are axes of the space. Documents are points or vectors in this space. The space is very high-dimensional: tens of millions of dimensions when you apply this to web search engines. Each vector is very sparse: most entries are zero.
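Because most entries are zero, a document vector is usually stored as a mapping from the terms that do occur to their weights. The sketch below illustrates the idea with raw counts rather than tf-idf weights to keep it short; the helper names and the four-term vocabulary are hypothetical and chosen only for illustration.

```python
from collections import Counter


def to_sparse_vector(text):
    """Map each term that occurs in the text to its count.

    Terms that are absent are simply not stored; they are implicit zeros.
    """
    return dict(Counter(text.lower().split()))


def to_dense_vector(sparse, vocabulary):
    """Expand a sparse vector to one entry per vocabulary term (one axis per term)."""
    return [sparse.get(term, 0) for term in vocabulary]


# Hypothetical toy vocabulary; a real collection has a far larger one.
vocabulary = ["auto", "best", "car", "insurance"]

doc = to_sparse_vector("car insurance auto insurance")
dense = to_dense_vector(doc, vocabulary)

print(doc)
print(dense)
```

With a vocabulary of millions of terms, the dense form would be almost entirely zeros, while the sparse form stays as small as the document's set of distinct terms.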

  28. Queries as vectors (slide 20 / 37). Key idea 1: do the same for queries, that is, represent them as vectors in the high-dimensional space. Key idea 2: rank documents according to their proximity to the query. Here proximity = similarity, and proximity ≈ negative distance. Recall: we’re doing this because we want to get away from the you’re-either-in-or-out, feast-or-famine Boolean model. Instead: rank relevant documents higher than nonrelevant documents.

  34. How do we formalize vector space similarity? (slide 21 / 37) First cut: (negative) distance between two points (= distance between the end points of the two vectors). Euclidean distance? Euclidean distance is a bad idea, because Euclidean distance is large for vectors of different lengths.

  40. Why distance is a bad idea (slide 22 / 37). [Figure: query and documents plotted on the axes rich (horizontal) and poor (vertical), each running from 0 to 1. q: [rich poor]; d_1: Ranks of starving poets swell; d_2: Rich poor gap grows; d_3: Record baseball salaries in 2010.] The Euclidean distance of $\vec{q}$ and $\vec{d}_2$ is large although the distribution of terms in the query $q$ and the distribution of terms in the document $d_2$ are very similar.
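The effect can be reproduced numerically. The coordinates below are hypothetical: the transcript does not preserve the figure's actual values, so they were picked only to mimic its layout (d2 points in nearly the same direction as q but is much longer). This is a sketch, not the lecture's data.

```python
import math

# Hypothetical (rich, poor) coordinates, chosen for illustration only.
q = (0.3, 0.3)
docs = {
    "d1": (0.1, 0.9),  # mostly "poor"
    "d2": (0.9, 0.8),  # both terms, similar mix to q, but a longer vector
    "d3": (0.9, 0.1),  # mostly "rich"
}


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


for name, d in docs.items():
    print(f"{name}: euclidean={math.dist(q, d):.3f} cosine={cosine(q, d):.3f}")

farthest_by_distance = max(docs, key=lambda name: math.dist(q, docs[name]))
most_similar_by_cosine = max(docs, key=lambda name: cosine(q, docs[name]))
```

With these values, d2 is the farthest document from q by Euclidean distance and at the same time the most similar one by cosine, which is the mismatch the slide describes.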

  42. Use angle instead of distance (slide 23 / 37). Rank documents according to their angle with the query. The following two notions are equivalent: rank documents in increasing order of the angle between query and document (smallest angle first), or rank documents in decreasing order of cosine(query, document) (largest cosine first). They are equivalent because cosine is a monotonically decreasing function of the angle on the interval $[0^\circ, 180^\circ]$. → Do ranking according to cosine.

  49. Cosine similarity between query and document (slide 24 / 37).

     $\cos(\vec{q}, \vec{d}) = \mathrm{sim}(\vec{q}, \vec{d}) = \frac{\vec{q}}{|\vec{q}|} \cdot \frac{\vec{d}}{|\vec{d}|} = \sum_{i=1}^{|V|} \frac{q_i}{|\vec{q}|} \cdot \frac{d_i}{|\vec{d}|}$

     $q_i$ is the tf-idf weight of term $i$ in the query. $d_i$ is the tf-idf weight of term $i$ in the document. $|\vec{q}|$ and $|\vec{d}|$ are the lengths of $\vec{q}$ and $\vec{d}$. This is the cosine similarity of $\vec{q}$ and $\vec{d}$ or, equivalently, the cosine of the angle between $\vec{q}$ and $\vec{d}$. Cosine similarity = dot product of length-normalized vectors.
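The identity "cosine similarity = dot product of length-normalized vectors" can be checked with a few lines of code. This is a minimal sketch using plain lists as dense vectors; the function names and the example weights are illustrative, not taken from the lecture.

```python
import math


def length(v):
    """Euclidean length |v| of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit length; the zero vector cannot be normalized."""
    n = length(v)
    if n == 0:
        raise ValueError("cannot length-normalize the zero vector")
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(q, d):
    """cos(q, d) as the dot product divided by the product of lengths."""
    return dot(q, d) / (length(q) * length(d))


# Illustrative weight vectors over a three-term vocabulary.
q = [1.0, 2.0, 0.0]
d = [2.0, 1.0, 3.0]

direct = cosine_similarity(q, d)
via_normalization = dot(normalize(q), normalize(d))
print(direct, via_normalization)
```

If document vectors are normalized once at indexing time, scoring a query needs only a dot product.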

  56. Cosine similarity illustrated (slide 25 / 37). [Figure: the vectors $\vec{v}(d_1)$, $\vec{v}(q)$, $\vec{v}(d_2)$, and $\vec{v}(d_3)$ plotted on the axes rich (horizontal) and poor (vertical), each running from 0 to 1, with an angle $\theta$ marked.]

  58. Components of tf-idf weighting (slide 26 / 37).

     Term frequency
     n (natural)     $\mathrm{tf}_{t,d}$
     l (logarithm)   $1 + \log(\mathrm{tf}_{t,d})$
     a (augmented)   $0.5 + \frac{0.5 \times \mathrm{tf}_{t,d}}{\max_t(\mathrm{tf}_{t,d})}$
     b (boolean)     $1$ if $\mathrm{tf}_{t,d} > 0$, $0$ otherwise
     L (log ave)     $\frac{1 + \log(\mathrm{tf}_{t,d})}{1 + \log(\mathrm{ave}_{t \in d}(\mathrm{tf}_{t,d}))}$

     Document frequency
     n (no)          $1$
     t (idf)         $\log \frac{N}{\mathrm{df}_t}$
     p (prob idf)    $\max\{0, \log \frac{N - \mathrm{df}_t}{\mathrm{df}_t}\}$

     Normalization
     n (none)            $1$
     c (cosine)          $\frac{1}{\sqrt{w_1^2 + w_2^2 + \ldots + w_M^2}}$
     u (pivoted unique)  $1/u$
     b (byte size)       $1/\mathrm{CharLength}^{\alpha}$, $\alpha < 1$

     Best known combination of weighting options.

  60. tf-idf example (slide 27 / 37). We often use different weightings for queries and documents. Notation: ddd.qqq. Example: lnc.ltn. Document: logarithmic tf, no df weighting, cosine normalization. Query: logarithmic tf, idf, no normalization. Isn’t it bad to not idf-weight the document? Example query: “best car insurance”. Example document: “car insurance auto insurance”.

  69. tf-idf example: lnc.ltn (slide 28 / 37). Query: “best car insurance”. Document: “car insurance auto insurance”.

                 query                                document                           product
     word        tf-raw  tf-wght  df  idf  weight     tf-raw  tf-wght  weight  n’lized
     auto        0       0                            1       1
     best        1       1                            0       0
     car         1       1                            1       1
     insurance   1       1                            2       1.3

     Key to columns: tf-raw: raw (unweighted) term frequency; tf-wght: logarithmically weighted term frequency; df: document frequency; idf: inverse document frequency; weight: the final weight of the term in the query or document; n’lized: document weights after cosine normalization; product: the product of final query weight and final document weight.
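The remaining columns of the lnc.ltn table follow mechanically from the column definitions. The sketch below computes them under loudly stated assumptions: the raw term frequencies come from the slide, but the collection size `N_DOCS` and the document frequencies in `ASSUMED_DF` are hypothetical values chosen for illustration, because the transcript does not include the df column. All function names are illustrative as well, and logarithms are taken to base 10.

```python
import math

# ASSUMPTION: hypothetical collection size and document frequencies.
# The transcript leaves the df column empty, so these are illustrative only.
N_DOCS = 1_000_000
ASSUMED_DF = {"auto": 5_000, "best": 50_000, "car": 10_000, "insurance": 1_000}

# Raw term frequencies as given on the slide.
query_tf = {"auto": 0, "best": 1, "car": 1, "insurance": 1}
doc_tf = {"auto": 1, "best": 0, "car": 1, "insurance": 2}


def log_tf(tf):
    """'l' component: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def lnc_document(tf_raw):
    """Document side: logarithmic tf, no idf, cosine normalization."""
    weights = {term: log_tf(tf) for term, tf in tf_raw.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: (w / norm if norm else 0.0) for term, w in weights.items()}


def ltn_query(tf_raw, df, n_docs):
    """Query side: logarithmic tf, idf, no normalization."""
    return {
        term: log_tf(tf) * math.log10(n_docs / df[term])
        for term, tf in tf_raw.items()
    }


doc_weights = lnc_document(doc_tf)
query_weights = ltn_query(query_tf, ASSUMED_DF, N_DOCS)

# The score is the sum of the per-term products (the "product" column).
score = sum(query_weights[term] * doc_weights[term] for term in query_tf)

for term in sorted(query_tf):
    product = query_weights[term] * doc_weights[term]
    print(f"{term:<10} q={query_weights[term]:.2f} d={doc_weights[term]:.2f} product={product:.2f}")
print(f"score = {score:.2f}")
```

The document-side values do not depend on the assumed df numbers: the logarithmic weights 1, 0, 1, 1.3 have length of about 1.92, giving normalized weights of about 0.52, 0, 0.52, and 0.68. Only the query weights and the final score change if different document frequencies are used.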
