
Latent Topic Feedback for Information Retrieval, David Andrzejewski

Latent Topic Feedback for Information Retrieval. David Andrzejewski and David Buttler, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory (USA). August 22, 2011.


  1. Latent Topic Feedback for Information Retrieval
     David Andrzejewski, David Buttler
     Center for Applied Scientific Computing, Lawrence Livermore National Laboratory (USA)
     August 22, 2011
     Andrzejewski and Buttler (LLNL) Latent Topic Feedback for IR KDD 2011 1 / 18

  4. BigCo Internal Document Navigation Portal (slide 2 / 18)
     Search query: euro opposition
     Returned documents:
       Hurd in passionate Maastricht defense (Financial Times - 14 May 91)
       Small companies may lose in EC deals (Financial Times - 14 May 91)
       Russian President Yeltsin invited to G7 (Financial Times - 24 Mar 92)
     Related topics:
       debate: Tory Euro sceptics, social chapter, Liberal Democrat, mps, Labour, bill, Commons
       Emu: economic monetary union, Maastricht treaty, member states, European, Europe, Community, Emu

  5. Corpus navigation challenges (slide 3 / 18)
     Condition            Impaired IR technique
     Non-expert user      keyword queries
     Lack of metadata     faceted search
     Specialized domain   WordNet
     Small user base      query log mining, relevance feedback
     Proprietary data     crowdsourcing
     Who has these problems? Private organizations, government agencies.

  15. Topic modeling with Latent Dirichlet Allocation (LDA) (slide 4 / 18)
      Blei et al, JMLR 2003
      Example documents:
        "Human embryonic stem cell research may benefit patients with genetic risk factors..."
        "Patients at risk for drug-resistant infection..."
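
As a rough illustration of what fitting an LDA topic model involves, here is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under stated assumptions, not the implementation used in the talk: the function name `lda_gibbs`, the hyperparameter values, the iteration count, and the toy corpus (two sentences adapted from the slide plus two made-up petroleum sentences) are all illustrative choices.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: per-document topic counts, per-topic word counts, topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # P(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Toy corpus (hypothetical; far too small for meaningful topics).
docs = [
    "human embryonic stem cell research may benefit patients".split(),
    "patients at risk for drug resistant infection".split(),
    "state oil company expands north sea exploration".split(),
    "natural gas production and oil exploration in the north sea".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, n in counts.most_common(5) if n > 0])
```

Each row of `doc_topic` holds the number of tokens in that document assigned to each topic, and each `topic_word` counter holds the words assigned to that topic; the "Top N" term lists discussed later are read off the latter.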

  16. How can we exploit latent topics? (slide 5 / 18)
      Implicitly: language model smoothing (Wei & Croft, SIGIR 2006)
      This approach: explicit user feedback on topics
        1. How to show topics?
        2. Which topics to show?
        3. How to use feedback?

  21. Question 1 - How to show topics to user? (slide 6 / 18)
      “Top N” lists are hard to interpret.
      We combine several techniques:
        topic label (Lau et al, COLING 2010)
        topic n-grams (Blei & Lafferty, arXiv 2009)
        capitalization recovery
      Label       Terms
      Topic 11    oil, gas, production, exploration, sea, north, company, field, energy, petroleum, companies
      Petroleum   state oil company, North Sea, natural gas, production, exploration, field, energy
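
The combination of label, n-grams, and capitalization recovery on the slide above can be sketched roughly as follows. This is not the authors' procedure: the helper names `recover_case` and `describe_topic`, the rule of dropping unigrams already covered by a displayed n-gram, the cutoff of four remaining terms, and the surface-form counts are all assumptions made for illustration. The label and n-grams are taken as given inputs here, since the slide only cites the methods that produce them.

```python
from collections import Counter

def recover_case(term, surface_counts):
    """Return the most frequent original-case form of a lowercased term."""
    forms = surface_counts.get(term)
    return forms.most_common(1)[0][0] if forms else term

def describe_topic(label, ngrams, top_terms, surface_counts, max_terms=4):
    """Build a one-line topic description: label, then n-grams, then unigrams."""
    # Assumption: skip unigrams that already appear inside a displayed n-gram.
    covered = {w for ng in ngrams for w in ng.split()}
    remaining = [t for t in top_terms if t not in covered][:max_terms]
    shown = [recover_case(t, surface_counts) for t in list(ngrams) + remaining]
    return f"{label}: " + ", ".join(shown)

# "Top N" terms for Topic 11, as listed on the slide.
top_terms = ["oil", "gas", "production", "exploration", "sea", "north",
             "company", "field", "energy", "petroleum", "companies"]
# Label and n-grams as shown on the slide (taken as given inputs).
ngrams = ["state oil company", "north sea", "natural gas"]
# Hypothetical counts of how each term was capitalized in the raw corpus.
surface_counts = {"north sea": Counter({"North Sea": 12, "north sea": 1})}

line = describe_topic("Petroleum", ngrams, top_terms, surface_counts)
print(line)
```

With these inputs the sketch produces the same term list as the "Petroleum" row on the slide, which suggests the simple covering rule is at least consistent with that one example.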
