Question Answering & the Semantic Web
Günter Neumann
Language Technology Lab, DFKI, Saarbrücken
Overview
• Hybrid Question Answering
• Language Technology and the Semantic Web
Motivation: From Search Engines to Answer Engines
Question Answering
• Input: a question in NL; a set of text and database resources
• Output: a set of possible answers drawn from the resources
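A minimal sketch of this input/output contract, assuming a toy word-overlap ranker; the names (Answer, answer_question) are illustrative, not any system's actual API:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str      # answer candidate drawn from a resource
    source: str    # which resource proposed it
    score: float   # used for ranking the set of possible answers

def answer_question(question: str, resources: dict[str, list[str]]) -> list[Answer]:
    """Toy QA: an NL question plus named text resources in, a ranked set of
    possible answers out (here: sentences sharing the most words with the question)."""
    terms = set(question.lower().rstrip("?").split()) - {"who", "is", "the", "to", "a"}
    candidates = []
    for name, sentences in resources.items():
        for s in sentences:
            overlap = len(terms & set(s.lower().rstrip(".").split()))
            if overlap:
                candidates.append(Answer(s, name, overlap))
    return sorted(candidates, key=lambda a: a.score, reverse=True)

corpus = {"news": ["David Beckham is married to Victoria Adams.",
                   "England lost the World Cup match."]}
print(answer_question("Who is David Beckham married to?", corpus)[0].text)
```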
Hybrid QA Architecture
[Architecture diagram with components: NL questions/answers, question analysis, query generation, the Web via the fact DB of an external search engine, response analysis, answer web mining, hypothesis generation; backed by on-line information extraction, off-line data harvesting, off-line information extraction over enriched texts, and external fact DBs]
Real-life QA systems will perform best if they can
• combine the virtues of domain-specialized QA with open-domain QA
• utilize general knowledge about frequent question types
• access semi-structured knowledge bases
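A minimal sketch of the control strategy the diagram suggests, with a dictionary standing in for the harvested fact DB and a stub for the external search engine; every name here is hypothetical:

```python
def hybrid_answer(question: str, fact_db: dict[str, str], web_search) -> str:
    """Hybrid control strategy: prefer the off-line harvested fact DB
    (domain-specialized, precise), fall back to open-domain Web mining."""
    # 1. Domain-specialized path: direct lookup in harvested facts.
    if question in fact_db:
        return fact_db[question]
    # 2. Open-domain path: query the Web via an external engine, then pick
    #    the best-matching snippet on-line as an answer passage.
    snippets = web_search(question)
    q = set(question.lower().split())
    return max(snippets, default="no answer",
               key=lambda s: len(q & set(s.lower().split())))

facts = {"Who is David Beckham married to?": "Victoria Adams"}
stub_search = lambda q: ["David Beckham married Posh Spice in 1999."]
print(hybrid_answer("Who is David Beckham married to?", facts, stub_search))  # fact DB hit
print(hybrid_answer("Who did Beckham marry in 1999?", facts, stub_search))    # Web fallback
```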
Design Issues
• Foster bottom-up system development
  • Data-driven; robustness; scalability
  • From shallow to deep NLP
• Large-scale answer processing
  • Coarse-grained uniform representation of queries/documents
  • Text zooming: from paragraphs to sentences to phrases (toy sketch below)
  • Ranking scheme for answer selection
• Common basis for
  • on-line Web pages
  • large textual sources
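The text-zooming idea as a toy sketch: narrow the retrieval unit level by level, keeping the best-matching unit each time. This is an illustration under the assumption of simple word-overlap scoring, not the original implementation:

```python
import re

def zoom(paragraphs: list[str], query_terms: set[str]) -> str:
    """Text zooming: paragraphs -> sentences -> phrases, keeping at each
    level the unit that best matches the query terms."""
    def score(text: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", text.lower())))

    best_para = max(paragraphs, key=score)              # paragraph level
    sentences = re.split(r"(?<=[.!?])\s+", best_para)
    best_sent = max(sentences, key=score)               # sentence level
    phrases = [p.strip() for p in best_sent.split(",")]
    return max(phrases, key=score)                      # phrase level

paras = ["David Beckham, the soccer star engaged to marry Posh Spice, "
         "is being blamed for England's World Cup defeat."]
print(zoom(paras, {"marry", "spice"}))
# -> "the soccer star engaged to marry Posh Spice"
```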
BiQue: A Cross-Language Question Answering System (cf. Neumann & Sacaleanu, 2003)
• Goal:
  • Given a question in German, find answers in English text corpora
• Sub-tasks:
  • Integration of existing components
    • IR engines, our IE core engine, EuroWordNet
  • Development of methods/components for
    • question translation & expansion
    • unsupervised NE recognition
  • Participation in the QA track at CLEF 2003/2004
Major control flow of BiQue
Example: "Mit wem ist David Beckham verheiratet?" ("Who is David Beckham married to?") → {person:David Beckham, married, person:?}
• German question analysis (translation, WSD, expansion; consults the Web) → English query
• Lucene IR over the XML-indexed English text corpus → documents
• Paragraph selection over the annotated corpus → passages, e.g. "David Beckham, the soccer star engaged to marry Posh Spice, is being blamed for England's World Cup defeat."
• Answer extraction, guided by the expected answer type → answer candidates
• Answer validation → Posh Spice, i.e. {person:David Beckham, person:Posh Spice}
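The control flow above, compressed into a runnable miniature; each step is a toy stand-in for the real component (MT services, EuroWordNet, Lucene, NE extraction), and the lexicon and person list are hard-coded for this one example:

```python
def bique(question_de: str, corpus: list[str]) -> str:
    """BiQue control flow in miniature: German question in, English answer out."""
    # 1. German question analysis (stubbed for this one question):
    #    expected answer type plus German query terms.
    a_type, terms_de = "person", ["David Beckham", "verheiratet"]
    # 2. Translation & expansion; named entities pass through untranslated.
    lexicon = {"verheiratet": ["married", "marry", "wed"]}
    terms_en = [t for d in terms_de for t in lexicon.get(d, [d])]
    # 3. Passage retrieval (real system: Lucene over an XML-indexed corpus).
    passages = [p for p in corpus if any(t.lower() in p.lower() for t in terms_en)]
    # 4. Answer extraction + validation: a person NE other than the question's NE.
    known_persons = ["Posh Spice", "David Beckham"]
    for p in passages:
        for name in known_persons:
            if name in p and name not in terms_en:
                return name
    return "no answer"

corpus = ["David Beckham, the soccer star engaged to marry Posh Spice, "
          "is being blamed for England's World Cup defeat."]
print(bique("Mit wem ist David Beckham verheiratet?", corpus))  # -> Posh Spice
```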
Query Translation & Expansion
• First idea:
  • Use only EuroWordNet
  • Defines a word-based translation via synset offsets
  • Experience:
    • EuroWordNet too sparse on the German side
    • NE translation is crucial
    • So far, not of very much help
• Second idea:
  • Use EuroWordNet and external MT services
  • Overlap mechanism for query expansion
  • Crosslingual because Q-type & A-type come from the German question analysis
  • Synsets from EuroWordNet direct the query expansion (online alignment)
  • Experience:
    • Nevertheless introduced too much ambiguity
    • External MT services also used for word-sense disambiguation (WSD)
    • Reduced the degree of ambiguity
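How the synset-offset translation and the overlap mechanism might fit together, with hand-made ILI offsets instead of real EuroWordNet data; the offsets and entries are invented for illustration:

```python
# Toy EuroWordNet fragment: word -> set of Inter-Lingual-Index synset offsets.
DE_SYNSETS = {"verheiratet": {102, 347}}
EN_SYNSETS = {"married": {102}, "wed": {102, 347}, "espoused": {347}}

def translate_via_offsets(word_de: str) -> set[str]:
    """Word-based translation: English words whose synsets share an
    ILI offset with some synset of the German word."""
    offsets = DE_SYNSETS.get(word_de, set())
    return {w for w, offs in EN_SYNSETS.items() if offs & offsets}

def overlap_expansion(mt_outputs: list[str], word_de: str) -> set[str]:
    """Overlap mechanism: keep only EuroWordNet translations confirmed by
    the MT output, which expands the query while pruning ambiguity."""
    wn = translate_via_offsets(word_de)
    return wn & {w.lower() for out in mt_outputs for w in out.split()}

print(translate_via_offsets("verheiratet"))                         # married, wed, espoused
print(overlap_expansion(["is married to", "weds"], "verheiratet"))  # {'married'}
```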
Example (cf. Neumann & Sacaleanu, 2003)
What we learned ...
• Different MT services can help each other
  • Logos suitable for EN-query parsing
    • Necessary to determine A-type and Q-focus on the EN side
  • Systran/FreeTranslation better at NE translation
• Problem: MT services often produce
  • ill-formed strings: bad for query parsing
  • "partial" translations (mixed-language strings): a problem for IR/paragraph selection
• Our envisaged approach (sketched below)
  • Use the DE-query analysis as a control object for determining the EN query object
  • Prefer the DE-determined EAT, NEs, and Q-focus
  • Further decrease the role of external MT services; use them only for WSD
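A sketch of the envisaged merging, assuming a simple query-object record: the German analysis serves as the control object, and the (noisier) MT-derived English analysis only fills gaps. Field and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class QueryObject:
    eat: str | None = None        # expected answer type
    named_entities: tuple = ()    # NEs found in the question
    q_focus: str | None = None    # question focus

def merge(de: QueryObject, en_from_mt: QueryObject) -> QueryObject:
    """Prefer the reliable DE-side analysis; fall back to the MT-derived
    EN-side value only where the DE side left a gap."""
    return QueryObject(
        eat=de.eat or en_from_mt.eat,
        named_entities=de.named_entities or en_from_mt.named_entities,
        q_focus=de.q_focus or en_from_mt.q_focus,
    )

de = QueryObject(eat="PERSON", named_entities=("David Beckham",))
en = QueryObject(eat="LOCATION", q_focus="married")   # noisy MT-side parse
print(merge(de, en))   # EAT and NEs from DE win; q_focus filled from EN
```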
Even more to learn ...
• Off-line annotation of the corpus would help define a more controlled IR
• Query/answer processing
  • Question analysis as "deep" as possible
  • Question classification as the basis for answer strategy selection (see the dispatch sketch below)
  • Answer strategies for definition/list questions
• These changes led to substantial improvements of our CLEF-2003 system for CLEF-2004
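Question classification driving answer strategy selection, as a minimal dispatch; the patterns and strategy descriptions are illustrative, not the CLEF system's actual classes:

```python
import re

def classify(question: str) -> str:
    """Very rough question classification for strategy selection."""
    q = question.lower()
    if re.match(r"(what|who) is\b", q):
        return "definition"
    if q.startswith(("name ", "list ")):
        return "list"
    return "factoid"

# Each question class selects its own answer strategy.
STRATEGIES = {
    "definition": "search gloss-like patterns ('X is a ...', appositions)",
    "list":       "collect and deduplicate multiple candidates",
    "factoid":    "standard passage retrieval + NE extraction",
}

for q in ["What is EuroWordNet?", "Name three soccer players.",
          "Who married David Beckham?"]:
    print(classify(q), "->", STRATEGIES[classify(q)])
```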