  1. IR 10: SVD and Latent Semantic Indexing. INFO 4300 / CS4300 Information Retrieval, slides adapted from Hinrich Schütze's, linked from http://informationretrieval.org/ . Paul Ginsparg, Cornell University, Ithaca, NY, 30 Sep 2010

  2. Administrativa. Assignment 2 is due Sat 9 Oct at 1pm (late submission permitted until Sun 10 Oct at 11pm). No class Tue 12 Oct (midterm break). The midterm examination is on Thu 14 Oct, 11:40-12:55, in Olin 165; it will be open book. Topics examined include assignments, lectures, and discussion-class readings before the midterm break (review of topics next Thurs, 7 Oct). According to the registrar (http://registrar.sas.cornell.edu/Sched/EXFA.html), the final exam is Fri 17 Dec, 2:00-4:30pm (location TBD); an early opportunity to take the exam will be Mon 13 Dec at 2:00pm.

  3. Discussion 4, Tue/Thu 5 and 7 Oct 2010. Read and be prepared to discuss the following paper: Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman, "Indexing by latent semantic analysis", Journal of the American Society for Information Science, Volume 41, Issue 6, 1990. http://www3.interscience.wiley.com/cgi-bin/issuetoc?ID=10049584 (to access this paper from Wiley InterScience, you need to use a computer with a Cornell IP address; also at /readings/jasis90f.pdf). The paper's notation corresponds to ours as
$$C = U \Sigma V^T \Longleftrightarrow X = T_0 S_0 D_0' \qquad \text{and} \qquad C_k = U \Sigma_k V^T \Longleftrightarrow \hat{X} = T S D'$$

  4. Overview: 1. Recap; 2. Singular value decomposition; 3. Latent semantic indexing; 4. Dimensionality reduction; 5. LSI in information retrieval; 6. Redux of Comparisons

  5. Outline: 1. Recap; 2. Singular value decomposition; 3. Latent semantic indexing; 4. Dimensionality reduction; 5. LSI in information retrieval; 6. Redux of Comparisons

  6. Symmetric diagonalization theorem. Let $S$ be a square, symmetric, real-valued $M \times M$ matrix with $M$ linearly independent eigenvectors. Then there exists a symmetric diagonal decomposition $S = Q \Lambda Q^{-1}$, where the columns of $Q$ are the orthogonal and normalized (unit length, real) eigenvectors of $S$, and $\Lambda$ is the diagonal matrix whose entries are the eigenvalues of $S$. All entries of $Q$ are real, and $Q^{-1} = Q^T$. We will use this to build low-rank approximations to term-document matrices, using $CC^T$.
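The theorem is easy to check numerically. A minimal MATLAB sketch (the symmetric matrix here is an arbitrary random example, not one from the slides):

    % Build an arbitrary symmetric real matrix and diagonalize it.
    A = randn(4);
    S = A + A';                    % S is square, symmetric, real-valued
    [Q, Lambda] = eig(S);          % columns of Q: orthonormal eigenvectors
    norm(S - Q*Lambda*Q', 'fro')   % ~ 1e-15, i.e. S = Q Lambda Q^T
    norm(Q'*Q - eye(4), 'fro')     % ~ 1e-15, i.e. Q^{-1} = Q^T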

  7. Outline: 1. Recap; 2. Singular value decomposition; 3. Latent semantic indexing; 4. Dimensionality reduction; 5. LSI in information retrieval; 6. Redux of Comparisons

  8. SVD. Let $C$ be an $M \times N$ matrix of rank $r$, and $C^T$ its $N \times M$ transpose. $CC^T$ and $C^TC$ have the same $r$ nonzero eigenvalues $\lambda_1, \ldots, \lambda_r$. Let $U$ be the $M \times M$ matrix whose columns are the orthogonal eigenvectors of $CC^T$, and $V$ the $N \times N$ matrix whose columns are the orthogonal eigenvectors of $C^TC$. Then there is a singular value decomposition (SVD) $C = U \Sigma V^T$, where the $M \times N$ matrix $\Sigma$ has $\Sigma_{ii} = \sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le r$, and is zero otherwise. The $\sigma_i$ are called the singular values of $C$.
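As a sanity check on these definitions, a short MATLAB sketch (arbitrary random matrix, not from the slides) confirming that the singular values of $C$ are the square roots of the eigenvalues shared by $CC^T$ and $C^TC$:

    C = randn(5, 3);                 % any M x N matrix
    [U, Sigma, V] = svd(C);          % C = U*Sigma*V'
    norm(C - U*Sigma*V', 'fro')      % ~ 1e-15
    sort(svd(C))                     % the sigma_i ...
    sort(sqrt(eig(C'*C)))            % ... equal sqrt(lambda_i) of C'C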

  9. Compare $S = Q \Lambda Q^T$ to $C = U \Sigma V^T$: from the SVD, $CC^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T$ (and likewise $C^TC = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T$). The left-hand side is square, symmetric, and real-valued, and the right-hand side is exactly a symmetric diagonal decomposition. $CC^T$ ($C^TC$) is a square matrix with rows and columns corresponding to each of the $M$ terms ($N$ documents); its $i,j$ entry measures the overlap between the $i$th and $j$th terms (documents), based on document (term) co-occurrence. This depends on the term weighting: in the simplest (1,0) case, the $i,j$ entry counts the number of documents in which both terms $i$ and $j$ occur (the number of terms which occur in both documents $i$ and $j$).
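To make the co-occurrence interpretation concrete, a small MATLAB sketch using the (1,0)-weighted term-document matrix from Example 3 later in the deck (terms tea, coffee, cocoa, drink, beverage; three documents):

    C = [0 1 1; 1 0 1; 1 1 0; 1 1 1; 1 1 1];
    C*C'    % (i,j): number of documents containing both term i and term j
    C'*C    % (i,j): number of terms occurring in both documents i and j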

  10. Illustration of SVD: $C = U \Sigma V^T$. [Figure: block shapes of the factor matrices. Upper: $C$ has $M > N$. Lower: $C$ has $M < N$.]

  11. 4 × 2 Example. Singular value decomposition of a $4 \times 2$ matrix of rank 2:
$$C = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \\ -1 & 1 \end{pmatrix} = U \Sigma V^T = \begin{pmatrix} -0.633 & 0.00 & -0.489 & 0.601 \\ 0.316 & -0.707 & -0.611 & -0.164 \\ -0.316 & -0.707 & 0.611 & 0.164 \\ 0.633 & -0.00 & 0.122 & 0.765 \end{pmatrix} \begin{pmatrix} 2.24 & 0 \\ 0 & 1.00 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} -0.707 & 0.707 \\ -0.707 & -0.707 \end{pmatrix}$$
or, keeping only the nonzero part of $\Sigma$ (the reduced SVD),
$$C = \begin{pmatrix} -0.632 & 0.000 \\ 0.316 & -0.707 \\ -0.316 & -0.707 \\ 0.632 & 0.000 \end{pmatrix} \begin{pmatrix} 2.236 & 0 \\ 0 & 1.000 \end{pmatrix} \begin{pmatrix} -0.707 & 0.707 \\ -0.707 & -0.707 \end{pmatrix}$$
with $\Sigma_{11} = 2.236$ and $\Sigma_{22} = 1$. Setting $\Sigma_{22} = 0$ yields the rank-1 approximation
$$C_1 = \begin{pmatrix} 1 & -1 \\ -.5 & .5 \\ .5 & -.5 \\ -1 & 1 \end{pmatrix}$$
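The example can be reproduced directly in MATLAB (a sketch; any SVD implementation may flip the signs of corresponding columns of U and V together):

    C = [1 -1; 0 1; 1 0; -1 1];
    [U, S, V] = svd(C)      % S = diag(2.2361, 1.0000), padded to 4 x 2
    S1 = S;
    S1(2,2) = 0;            % zero out the smaller singular value
    C1 = U*S1*V'            % = [1 -1; -0.5 0.5; 0.5 -0.5; -1 1]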

  12. Low-rank approximations. Given an $M \times N$ matrix $C$ and a positive integer $k$, find the $M \times N$ matrix $C_k$ of rank $\le k$ which minimizes the Frobenius norm of the difference $X = C - C_k$:
$$\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} X_{ij}^2}$$
(minimize the discrepancy between $C$ and $C_k$ for fixed $k$ smaller than the rank $r$ of $C$). Use the SVD: given $C$, construct the SVD $C = U \Sigma V^T$; form $\Sigma_k$ by setting the $r - k$ smallest singular values to 0; then $C_k = U \Sigma_k V^T$ is the rank-$k$ approximation to $C$. Theorem (Eckart-Young): this yields the rank-$k$ matrix with the lowest possible Frobenius error, $\|C - C_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$.
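The truncation recipe and the error formula can be checked in a few lines of MATLAB (a sketch; the dimensions and the value of k are arbitrary):

    C = randn(8, 5);
    k = 2;
    [U, S, V] = svd(C);
    Sk = S;
    Sk(k+1:end, k+1:end) = 0;       % zero the r - k smallest singular values
    Ck = U*Sk*V';                   % rank-k approximation
    s = svd(C);
    norm(C - Ck, 'fro')             % equals ...
    sqrt(sum(s(k+1:end).^2))        % ... sqrt(sigma_{k+1}^2 + ... + sigma_r^2)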

  13. Illustration of low-rank approximation: $C_k = U \Sigma_k V^T$. [Figure: same block diagram as before; the matrix entries affected by "zeroing out" the smallest singular value are indicated by dashed boxes.]

  14. Example 1: find the SVD of
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

  15. Example 1, cont'd.
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & 0 & \cdots \\ 0 & \frac{1}{\sqrt{2}} & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \frac{1}{\sqrt{2}} & 0 & 0 & \cdots \\ 0 & \frac{1}{\sqrt{2}} & 0 & \cdots \end{pmatrix} \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
(the $\cdots$ columns of $U$, left unspecified on the slide, span the null space of $C^T$ and do not affect the product). http://www.wolframalpha.com/input/?i=svd{{1,0,0},{0,1,0},{0,0,1},{1,0,0},{0,1,0}} matlab: [U,S,V]=svd([1 0 0; 0 1 0; 0 0 1; 1 0 0; 0 1 0])

  16. Example 2: find the SVD of
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

  17. Example 2, cont'd.
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{5}} & \cdots \\ \frac{1}{\sqrt{5}} & \cdots \\ \frac{1}{\sqrt{5}} & \cdots \\ \frac{1}{\sqrt{5}} & \cdots \\ \frac{1}{\sqrt{5}} & \cdots \end{pmatrix} \begin{pmatrix} \sqrt{15} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}$$
The matrix has rank 1, so there is a single nonzero singular value $\sqrt{15}$; the remaining columns of $U$ and rows of $V^T$ (shown as $\cdots$) can be any orthonormal completion. http://www.wolframalpha.com/input/?i=svd{{1,1,1},{1,1,1},{1,1,1},{1,1,1},{1,1,1}} matlab: [U,S,V]=svd([1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1])

  18. Example 3. Term-document matrix for the terms tea, coffee, cocoa, drink, beverage (rows) in three documents (columns):
$$C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2/15} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \cdots \\ \sqrt{2/15} & 0 & -\sqrt{2/3} & \cdots \\ \sqrt{2/15} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \cdots \\ \sqrt{3/10} & 0 & 0 & \cdots \\ \sqrt{3/10} & 0 & 0 & \cdots \end{pmatrix} \begin{pmatrix} \sqrt{10} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{6}} & \sqrt{2/3} & -\frac{1}{\sqrt{6}} \end{pmatrix}$$
http://www.wolframalpha.com/input/?i=svd{{0,1,1},{1,0,1},{1,1,0},{1,1,1},{1,1,1}} matlab: [U,S,V]=svd([0 1 1; 1 0 1; 1 1 0; 1 1 1; 1 1 1])

  19. Example 3, cont'd. Zeroing out the two unit singular values leaves only the $\sqrt{10}$ direction, giving the rank-1 approximation of the tea/coffee/cocoa/drink/beverage matrix:
$$C \Rightarrow \begin{pmatrix} \sqrt{2/15} & \cdots \\ \sqrt{2/15} & \cdots \\ \sqrt{2/15} & \cdots \\ \sqrt{3/10} & \cdots \\ \sqrt{3/10} & \cdots \end{pmatrix} \begin{pmatrix} \sqrt{10} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} = \begin{pmatrix} 2/3 & 2/3 & 2/3 \\ 2/3 & 2/3 & 2/3 \\ 2/3 & 2/3 & 2/3 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
The three content-bearing terms (tea, coffee, cocoa) are smeared uniformly across the three documents, while the ubiquitous terms (drink, beverage) are reproduced exactly.
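The same rank-1 reconstruction in MATLAB (a sketch; U(:,1) and V(:,1) may both come out negated, which leaves the product unchanged):

    C = [0 1 1; 1 0 1; 1 1 0; 1 1 1; 1 1 1];
    [U, S, V] = svd(C);
    C1 = S(1,1) * U(:,1) * V(:,1)'   % rows 1-3 become 2/3, rows 4-5 stay 1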
