Cross-Language Explicit Semantic Analysis

  1. Cross-Language Explicit Semantic Analysis
     Nedim Lipka, Maik Anderka, Benno Stein
     Bauhaus University Weimar, www.webis.de
     Lipka@CLEF, 01.10.09

  2. Outline
     ❑ Retrieval Models
     ❑ The CL-ESA Retrieval Model
     ❑ CL-ESA at TEL@CLEF 2009
     ❑ Formalization of CL-ESA

  3. Retrieval Models
     [Figure: a retrieval model R. A human formulates an information need as a query representation $q \in Q$; a real-world document is mapped to a document representation $d \in D$ by computer-based document representation generation ($\alpha_R$); the relevance function $\rho_R(q, d)$ yields a computer-based relevance judgment. Underlying theories: conceptual document models, linguistics, computational linguistics.]

  4. The CL-ESA Retrieval Model
     Explicit Semantic Analysis (ESA) [Gabrilovich/Markovitch 2007]

  5. The CL-ESA Retrieval Model: Explicit Semantic Analysis
     [Figure: the documents of a document collection $D$, each represented as a weighted vector.]

  6. The CL-ESA Retrieval Model: Explicit Semantic Analysis
     [Figure: adds an index collection $D_I$ (e.g., Wikipedia) whose documents are likewise represented as weighted vectors.]

  7. The CL-ESA Retrieval Model: Explicit Semantic Analysis
     [Figure: the index collection $D_I$ (e.g., Wikipedia) spans a concept space in which the documents of $D$ are compared with $\varphi$.]

  8. The CL-ESA Retrieval Model: Explicit Semantic Analysis
     [Figure: similarity analysis with $\varphi_{\text{ESA}}$ in a collection-relative concept space defined by the index collection $D_I$ (e.g., Wikipedia).]
     Ranking: $d^* = \operatorname{argmax}_{d \in D} \varphi_{\text{ESA}}(q, d)$, where $\varphi_{\text{ESA}}(q, d) := \varphi(q|_{D_I}, d|_{D_I})$ and $q|_{D_I}$, $d|_{D_I}$ denote the representations of $q$ and $d$ in the concept space of $D_I$.
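The ranking rule on slide 8 can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the slides: the index collection is a hypothetical two-concept toy (`INDEX`), the term weights are made up (real ESA uses weighted, e.g. tf-idf, vectors of Wikipedia articles), and $\varphi$ is taken to be the cosine similarity. The names `to_concept_space`, `phi_esa`, and `rank` are illustrative.

```python
import math

# Hypothetical toy index collection D_I (think: two Wikipedia articles).
# Each row is one index document = one concept; columns follow VOCAB.
VOCAB = ["jaguar", "car", "cat", "engine"]
INDEX = [
    [0.0, 0.9, 0.0, 0.8],  # concept "Automobile"
    [0.2, 0.0, 0.9, 0.0],  # concept "Felidae"
]


def to_concept_space(term_vec, index):
    """Return v|D_I: one coordinate per index document, namely the dot
    product of that document's term vector with v (A_{D_I}^T . v)."""
    return [sum(w * t for w, t in zip(row, term_vec)) for row in index]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phi_esa(q, d, index):
    """phi_ESA(q, d) := phi(q|D_I, d|D_I), with phi assumed to be the cosine."""
    return cosine(to_concept_space(q, index), to_concept_space(d, index))


def rank(q, docs, index):
    """Return the name of d* = argmax_{d in D} phi_ESA(q, d)."""
    return max(docs, key=lambda name: phi_esa(q, docs[name], index))


query = [1.0, 1.0, 0.0, 0.0]  # "jaguar car"
docs = {
    "d1": [0.0, 0.0, 0.0, 1.0],  # "engine"
    "d2": [0.0, 0.0, 1.0, 0.0],  # "cat"
}
print(rank(query, docs, INDEX))  # d1
```

Note that `d1` shares no term with the query and is still ranked first, because both vectors load on the same concept; this is the effect the concept space is meant to produce.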

  9. The CL-ESA Retrieval Model: Cross-Language Explicit Semantic Analysis
     [Figure: an English collection $D_1$ is represented relative to an English index collection $D_{I_1}$, and a German collection $D_2$ relative to a German index collection $D_{I_2}$; since $D_{I_1}$ is aligned with $D_{I_2}$, $\varphi_{\text{CL-ESA}}$ performs the similarity analysis in a shared concept space.]
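A sketch of the cross-language case from slide 9, under the same kind of assumptions: two hypothetical, already aligned toy index collections (`INDEX_EN`, `INDEX_DE`) stand in for the English and German Wikipedia subsets, the weights are made up, and $\varphi$ is assumed to be the cosine similarity. The English query and the German documents never share a vocabulary; they meet only in the aligned concept space.

```python
import math

# Column order of the term vectors (hypothetical toy vocabularies).
VOCAB_EN = ["car", "engine", "cat"]
VOCAB_DE = ["auto", "motor", "katze"]

# Hypothetical aligned index collections: row i of INDEX_EN and row i of
# INDEX_DE describe the same concept, which is what "D_I1 aligned with D_I2"
# requires.
INDEX_EN = [[0.9, 0.7, 0.0],   # concept 0 (automobile), English article
            [0.0, 0.0, 0.9]]   # concept 1 (cat), English article
INDEX_DE = [[0.8, 0.8, 0.0],   # concept 0, German article
            [0.0, 0.0, 1.0]]   # concept 1, German article


def to_concept_space(term_vec, index):
    """Coordinates of a term vector w.r.t. an index collection (A^T . v)."""
    return [sum(w * t for w, t in zip(row, term_vec)) for row in index]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phi_cl_esa(q, d):
    """phi_CL-ESA(q, d) = phi(q|D_I1, d|D_I2): the English query is projected
    with the English index, the German document with the German index."""
    return cosine(to_concept_space(q, INDEX_EN), to_concept_space(d, INDEX_DE))


query_en = [1.0, 0.0, 0.0]           # "car"
docs_de = {"d1": [0.0, 1.0, 0.0],    # "motor"
           "d2": [0.0, 0.0, 1.0]}    # "katze"
scores = {name: phi_cl_esa(query_en, d) for name, d in docs_de.items()}
print(max(scores, key=scores.get))   # d1
```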

  10. CL-ESA at TEL@CLEF 2009: Setting
     ❑ Index collection: Wikipedia snapshot, March 2009
     ❑ 169,000 articles per language
     ❑ 3 index collections
     ❑ Query representation: title + description
     ❑ Document representation: title + subject + alternative

  11. CL-ESA at TEL@CLEF 2009: Difficulties at TEL@CLEF
     ❑ Selecting the correct index collection (requires language detection).
     ❑ The correct index collection is not always available.
     ❑ The fields title, subject, and alternative do not always share the same language.

  12. CL-ESA at TEL@CLEF 2009

  13. Formalization of CL-ESA

  14. Formalization of CL-ESA: ESA
     [Figure: a document collection $D$, an index collection $D_I$, and the representation $D|_{D_I}$ of $D$ in the concept space of $D_I$, compared with $\varphi$.]

  15. Formalization of CL-ESA: ESA
     $$A_{D|D_I} = A_{D_I}^T \cdot A_D$$
     where $A_D$ ($|V| \times |D|$, terms × documents) is the term-document matrix of $D$, $A_{D_I}^T$ ($|D_I| \times |V|$, index documents × terms) is the transposed term-document matrix of the index collection, and $A_{D|D_I}$ ($|D_I| \times |D|$) holds the concept coordinates of the documents.
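The matrix identity on slide 15 can be checked with plain list-based matrices. The toy sizes ($|V| = 3$, $|D_I| = 2$, $|D| = 4$), the weights, and the helper names `transpose` and `matmul` are illustrative assumptions; a real implementation would more likely use a sparse-matrix library.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product a . b (list-of-rows representation)."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Hypothetical toy data: |V| = 3 terms, |D_I| = 2 index documents, |D| = 4 documents.
A_DI = [[0.9, 0.0],             # |V| x |D_I|: term weights of the index documents
        [0.7, 0.1],
        [0.0, 0.9]]
A_D = [[1.0, 0.0, 0.5, 0.0],    # |V| x |D|: term weights of the documents
       [0.0, 1.0, 0.5, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

# A_{D|D_I} = A_{D_I}^T . A_D: column j holds the concept coordinates of document j.
A_D_given_DI = matmul(transpose(A_DI), A_D)
print(len(A_D_given_DI), len(A_D_given_DI[0]))  # 2 4
```

Each column of `A_D_given_DI` is one document's ESA vector, so a single product maps the whole collection into the concept space.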

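  16. Formalization of CL-ESA: CL-ESA
     $$\begin{aligned} \varphi_{\text{CL-ESA}}(q, d) &= \varphi(q|_{D_{I_1}}, d|_{D_{I_2}}), \quad \text{with } D_{I_1}, D_{I_2} \text{ aligned} \\ &= \varphi(A_{D_{I_1}}^T \cdot q,\; A_{D_{I_2}}^T \cdot d) \\ &= nf \cdot (A_{D_{I_1}}^T \cdot q)^T \cdot A_{D_{I_2}}^T \cdot d \\ &= nf \cdot q^T \cdot A_{D_{I_1}} \cdot A_{D_{I_2}}^T \cdot d \end{aligned}$$
     Here nf is the normalization factor of $\varphi$ (for the cosine similarity, $1 / (\|A_{D_{I_1}}^T q\| \, \|A_{D_{I_2}}^T d\|)$), and "aligned" means that the i-th document of $D_{I_1}$ and the i-th document of $D_{I_2}$ describe the same concept.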
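The derivation on slide 16 can be checked numerically. This sketch assumes that $\varphi$ is the cosine similarity, so that $nf = 1/(\|A_{D_{I_1}}^T q\|\,\|A_{D_{I_2}}^T d\|)$; the matrices and vectors are hypothetical toy values, and the helpers are illustrative.

```python
import math


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Matrix-vector product m . v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


# Hypothetical aligned toy index collections: column i of both matrices is
# the same concept. Rows are terms of L1 (A_DI1) or of L2 (A_DI2).
A_DI1 = [[0.9, 0.0], [0.7, 0.1], [0.0, 0.9]]  # |V_1| x |D_I|
A_DI2 = [[0.8, 0.0], [0.6, 0.2], [0.1, 1.0]]  # |V_2| x |D_I|
q = [1.0, 0.0, 1.0]  # query term vector over V_1
d = [0.0, 1.0, 1.0]  # document term vector over V_2

# Second line of the derivation: cosine of the two concept vectors.
cq = matvec(transpose(A_DI1), q)  # A_DI1^T . q
cd = matvec(transpose(A_DI2), d)  # A_DI2^T . d
lhs = dot(cq, cd) / (norm(cq) * norm(cd))

# Last line: nf * q^T . A_DI1 . A_DI2^T . d, evaluated right to left.
nf = 1.0 / (norm(cq) * norm(cd))
rhs = nf * dot(q, matvec(A_DI1, matvec(transpose(A_DI2), d)))

print(abs(lhs - rhs) < 1e-12)  # True
```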

  17. Formalization of CL-ESA: CL-ESA
     The product $A_{D_{I_1}} \cdot A_{D_{I_2}}^T$ corresponds to cross-language term co-occurrence; abbreviating it as $G_{L_1,L_2}$ ($|V_1| \times |V_2|$) gives
     $$\varphi_{\text{CL-ESA}}(q, d) = nf \cdot q^T \cdot G_{L_1,L_2} \cdot d$$

  18. Formalization of CL-ESA: CL-ESA
     $$\varphi_{\text{CL-ESA}}(q, d) = nf \cdot \underbrace{q^T \cdot G_{L_1,L_2}}_{\text{query translation}} \cdot d$$
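A sketch of slides 17 and 18: precomputing $G_{L_1,L_2} = A_{D_{I_1}} \cdot A_{D_{I_2}}^T$ and reading $q^T \cdot G_{L_1,L_2}$ as a translated query, i.e., a weighted vector over the German vocabulary. The vocabularies, weights, and helper names are hypothetical toy values chosen for illustration.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product a . b (list-of-rows representation)."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Hypothetical aligned toy index collections (columns = aligned concepts).
VOCAB_EN = ["car", "engine", "cat"]
VOCAB_DE = ["auto", "motor", "katze"]
A_DI1 = [[0.9, 0.0], [0.7, 0.0], [0.0, 0.9]]  # |V_EN| x |D_I|
A_DI2 = [[0.8, 0.0], [0.6, 0.0], [0.0, 1.0]]  # |V_DE| x |D_I|

# G_{L1,L2} = A_DI1 . A_DI2^T (|V_EN| x |V_DE|): cross-language term co-occurrence.
G = matmul(A_DI1, transpose(A_DI2))

# q^T . G turns an English query term vector into a weighted German term vector.
q = [1.0, 0.0, 0.0]  # "car"
translated = [sum(q[i] * G[i][j] for i in range(len(q)))
              for j in range(len(VOCAB_DE))]
print(dict(zip(VOCAB_DE, translated)))
```

Under the cosine assumption, a document's score is still nf times the dot product of the translated query with $d$, and nf depends on $d$, so it cannot simply be dropped when ranking.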

  19. Outlook
     1. Consideration of more index collections
     2. Better language detection
     3. Detailed analysis of document fields
