Lecture 6: (Probabilistic) Latent Semantic Analysis

  1. CS598JHM: Advanced NLP (Spring 2013)
     http://courses.engr.illinois.edu/cs598jhm/
     Lecture 6: (Probabilistic) Latent Semantic Analysis
     Julia Hockenmaier, juliahmr@illinois.edu, 3324 Siebel Center
     Office hours: by appointment

  2. Indexing by Latent Semantic Analysis (Deerwester et al., 1990)

  3. Latent Semantic Analysis
     The task: return relevant documents for text queries.
     The problem: relevance is conceptual/semantic
     - The index of relevant documents may not contain all query terms (synonymy and missing information)
     - The query terms may be ambiguous (polysemy)
     Indexing by Latent Semantic Analysis:
     - Map queries and documents into a new vector space whose k dimensions correspond to independent concepts
     - In this space, queries will be near semantically close documents

  4. [Figure: documents, terms, and a query plotted along Dimension 1 and Dimension 2 of the concept space; the highlighted region is the set of documents closest to the query (e.g. cosine > .9).]

  5. Latent Semantic Analysis
     Low-rank approximation via Singular Value Decomposition (SVD):
     [Diagram: X (terms × documents) ≈ X̂ = T0 (terms × concepts) × S0 × D0′ (concepts × documents)]
     X: term-document matrix (= data): X_ij = freq of w_i in d_j
     X̂ = T0 S0 D0′ (k-rank approximation of X)
     T0: columns are orthogonal and unit-length, T0′T0 = I
     S0: diagonal matrix of the k largest singular values
     D0: columns are orthogonal and unit-length, D0′D0 = I
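
As a concrete sketch of this decomposition (not from the slides): a truncated SVD of a small made-up term-document matrix in numpy, with k = 2 concepts. The matrix and its counts are invented purely for illustration.

    import numpy as np

    # Toy term-document matrix X: rows = terms, columns = documents
    # (made-up counts, for illustration only).
    X = np.array([[2, 0, 1, 0],
                  [1, 3, 0, 0],
                  [0, 1, 0, 2],
                  [0, 0, 2, 1]], dtype=float)

    k = 2  # number of latent concepts

    # Full SVD: X = U diag(s) Vt, with orthonormal U and Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Keep the k largest singular values: T0, S0, D0 in the slides' notation.
    T0 = U[:, :k]         # terms x concepts
    S0 = np.diag(s[:k])   # concepts x concepts, diagonal
    D0 = Vt[:k, :].T      # documents x concepts

    # k-rank approximation X_hat = T0 S0 D0'
    X_hat = T0 @ S0 @ D0.T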

  6. LSA: term similarity
     X̂ X̂′ = T0 S0 S0 T0′
     (D0 cancels out because S0 is diagonal and D0 is orthonormal)
     Similarity of terms w_i, w_j in the new space: (X̂ X̂′)_ij, the dot product of the rows for w_i and w_j.
     Terms are therefore represented by the rows of T0 S0.

  7. LSA: document similarity
     X̂′ X̂ = D0 S0 S0 D0′
     (T0 cancels out because S0 is diagonal and T0 is orthonormal)
     Similarity of documents d_i, d_j in the new space: (X̂′ X̂)_ij, the dot product of the rows for d_i and d_j.
     Documents are therefore represented by the rows of D0 S0.
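
Continuing the sketch above (same made-up matrix and variables), the term and document similarities of slides 6 and 7 can be computed either from X̂ directly or from the reduced representations T0 S0 and D0 S0:

    # Terms live in the rows of T0 S0; their dot products equal (X_hat X_hat')_ij.
    term_vecs = T0 @ S0                  # terms x concepts
    term_sim = term_vecs @ term_vecs.T   # = X_hat @ X_hat.T

    # Documents live in the rows of D0 S0; their dot products equal (X_hat' X_hat)_ij.
    doc_vecs = D0 @ S0                   # documents x concepts
    doc_sim = doc_vecs @ doc_vecs.T      # = X_hat.T @ X_hat

    # Sanity check against the explicit low-rank matrix.
    assert np.allclose(term_sim, X_hat @ X_hat.T)
    assert np.allclose(doc_sim, X_hat.T @ X_hat)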

  8. LSA: term-document similarity
     The elements of X̂ give the similarity of terms and documents.
     Now, terms are projected to T S^(1/2) and documents to D S^(1/2).
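
And the term-document comparison via the S^(1/2) projections, again reusing the variables of the SVD sketch:

    # Project terms to T0 S0^(1/2) and documents to D0 S0^(1/2); the dot product of
    # a term row with a document row is the corresponding entry of X_hat.
    S0_half = np.diag(np.sqrt(s[:k]))
    term_proj = T0 @ S0_half             # terms x concepts
    doc_proj = D0 @ S0_half              # documents x concepts

    assert np.allclose(term_proj @ doc_proj.T, X_hat)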

  9. LSA: query-document similarity
     Queries q are 'pseudo-documents': they do not appear in X.
     Construct their term vector X_q.
     Define their document vector D_q = X_q′ T S^(-1).
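
A sketch of folding a query into the concept space, still reusing the SVD sketch's variables. The query counts are invented, and ranking documents by cosine against the rows of D0 is one reasonable choice; the slide only defines the projection D_q itself:

    # Query as a 'pseudo-document': a term-count vector X_q that was not a column of X.
    X_q = np.array([1, 0, 0, 1], dtype=float)   # made-up counts over the 4 toy terms

    # D_q = X_q' T0 S0^(-1): the query's coordinates in the k-dimensional document space.
    D_q = X_q @ T0 @ np.linalg.inv(S0)

    # Rank documents by cosine similarity between D_q and the document coordinates in D0.
    cos = (D0 @ D_q) / (np.linalg.norm(D0, axis=1) * np.linalg.norm(D_q))
    ranking = np.argsort(-cos)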

  10. Probabilistic Latent Semantic Indexing (Hofmann, 1999)

  11. The aspect model
      Observations are document-word pairs (d, w).
      Assume there are k aspects z_1 ... z_k.
      Each observation is associated with a hidden aspect z.
      P(d, w) = P(d) P(w | d)   with   P(w | d) = ∑_{z ∈ Z} P(w | z) P(z | d)
      Or, equivalently:
      P(d, w) = ∑_{z ∈ Z} P(z) P(d | z) P(w | z)
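
To make the two factorizations concrete, here is a tiny made-up parameterization (2 aspects, 2 documents, 3 words) checked in numpy; all probabilities are invented for illustration:

    import numpy as np

    P_z = np.array([0.6, 0.4])          # P(z), 2 aspects
    P_d_z = np.array([[0.7, 0.2],       # P(d | z): rows = documents, columns = aspects
                      [0.3, 0.8]])
    P_w_z = np.array([[0.5, 0.1],       # P(w | z): rows = words, columns = aspects
                      [0.3, 0.3],
                      [0.2, 0.6]])

    # P(d, w) = sum_z P(z) P(d | z) P(w | z)
    P_dw = np.einsum('z,dz,wz->dw', P_z, P_d_z, P_w_z)

    # Equivalently: P(d, w) = P(d) P(w | d), with P(w | d) = sum_z P(w | z) P(z | d)
    P_d = P_dw.sum(axis=1)                       # marginal P(d)
    P_z_d = (P_z * P_d_z) / P_d[:, None]         # P(z | d) by Bayes' rule, rows = documents
    P_w_d = P_z_d @ P_w_z.T                      # P(w | d), rows = documents, columns = words
    assert np.allclose(P_dw, P_d[:, None] * P_w_d)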

  12. A geometric interpretation
      [Figure: the word simplex over w1, w2, w3; documents P(w | d) and topics P(w | z) are points inside it, and the topics span a (sub)simplex.]
      Any point in the word simplex defines a multinomial over words.
      Each document corresponds to one multinomial over words, P(w | d).
      Each topic is a multinomial over words, P(w | z).
      The topics define the corners of a (sub)simplex; all training documents lie inside this topic simplex:
      P(w | d) = λ1 P(w | z1) + λ2 P(w | z2) + λ3 P(w | z3)
               = P(z1 | d) P(w | z1) + P(z2 | d) P(w | z2) + P(z3 | d) P(w | z3)

  13. PLSA is a mixture model
      Mixture models:
      - K mixture components and N observations x_1 ... x_N
      - Mixing weights (θ_1 ... θ_K): P(k) = θ_k
      - Each observation x_n is generated by mixture component z_n: P(x_n) = P(z_n) P(x_n | z_n)
      PLSI:
      - Mixture components = topics
      - Mixing weights are specific to each document: θ_d = (θ_d1 ... θ_dK)
      - Each observation (word) w_d,n is a sample from the document-specific mixture model. It is drawn from one of the components z_d,n: P(w_d,n) = P(z_d,n | θ_d) P(w_d,n | z_d,n)

  14. Estimation: EM algorithm
      E-step: recompute P(z | d, w) = P(z, d, w) / ∑_{z′} P(z′, d, w), with P(z, d, w) = P(z) P(d | z) P(w | z)
      M-step: recompute
      P(w | z) ∝ ∑_d freq(d, w) P(z | d, w)
      P(d | z) ∝ ∑_w freq(d, w) P(z | d, w)
      P(z) ∝ ∑_d ∑_w freq(d, w) P(z | d, w)
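
A minimal numpy sketch of these updates on a made-up document-word count matrix, with random initialization and a fixed number of iterations (no tempering, smoothing, or held-out stopping, so purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # freq(d, w): toy count matrix, rows = documents, columns = words (invented numbers).
    freq = np.array([[3, 0, 1, 0],
                     [0, 2, 0, 1],
                     [1, 1, 2, 0]], dtype=float)
    D, W = freq.shape
    K = 2  # number of aspects z

    # Random initial parameters, normalized to proper distributions.
    P_z = np.full(K, 1.0 / K)                                 # P(z)
    P_d_z = rng.random((D, K)); P_d_z /= P_d_z.sum(axis=0)    # P(d | z), columns sum to 1
    P_w_z = rng.random((W, K)); P_w_z /= P_w_z.sum(axis=0)    # P(w | z), columns sum to 1

    for _ in range(50):
        # E-step: P(z | d, w) ∝ P(z) P(d | z) P(w | z)
        joint = P_z[None, None, :] * P_d_z[:, None, :] * P_w_z[None, :, :]   # (D, W, K)
        P_z_dw = joint / joint.sum(axis=2, keepdims=True)

        # M-step: accumulate expected counts freq(d, w) P(z | d, w), then renormalize.
        weighted = freq[:, :, None] * P_z_dw          # (D, W, K)
        P_w_z = weighted.sum(axis=0)                  # (W, K)
        P_w_z /= P_w_z.sum(axis=0)
        P_d_z = weighted.sum(axis=1)                  # (D, K)
        P_d_z /= P_d_z.sum(axis=0)
        P_z = weighted.sum(axis=(0, 1))
        P_z /= P_z.sum()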
