
Darmstadt Knowledge Processing Repository Based on UIMA


  1. Darmstadt Knowledge Processing Repository Based on UIMA
     Iryna Gurevych, Max Mühlhäuser, Christof Müller, Jürgen Steimle, Markus Weimer, Torsten Zesch
     Ubiquitous Knowledge Processing Group, Telecooperation, Computer Science Department, Darmstadt University of Technology

  2. Telecooperation

  3. Telecooperation

  4. Ubiquitous Knowledge Processing [Diagram labels: THESEUS; AQUA; SIR; Darmstadt Knowledge Processing Software Repository]

  5. Automatic Quality Assessment and Feedback in eLearning 2.0 (AQUA)

  6. User Generated Discourse in Web 2.0

  7. AQUA – Anoto pen

  8. AQUA - System Architecture: Natural Language Processing, Machine Learning

  9. AQUA – System Architecture

  10. SIR (in cooperation with Prof. Hinrichs)
      • Semantic Information Retrieval
      • Bridge the human-computer gap
      • Semantic search (SIR) based on semantic relatedness
      [Diagram labels: natural language communication; low-level expression of information need; interface]

  11. Information Retrieval (IR): Boolean, Vector Space, ...
      [Diagram: keywords matched against Document 1, Document 2, Document 3, ...]
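
As a rough illustration of the two keyword-based models named on this slide, the sketch below implements Boolean AND matching and vector space ranking with cosine similarity over raw term counts. The toy corpus, the function names, and the whitespace tokenizer are assumptions made for this example; none of them come from the slides.

```python
import math
from collections import Counter

# Toy corpus: the texts are illustrative assumptions, not taken from the slides.
DOCS = {
    "Document 1": "the baker bakes bread and cake",
    "Document 2": "the programmer writes computer programs",
    "Document 3": "the editor reads essays about cake",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean model: names of documents that contain every query keyword."""
    terms = set(tokenize(query))
    return sorted(name for name, text in docs.items()
                  if terms <= set(tokenize(text)))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vector_space_rank(query, docs):
    """Vector space model: documents with a nonzero score, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name)
              for name, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


if __name__ == "__main__":
    print(boolean_and("baker cake", DOCS))
    print(vector_space_rank("baker cake", DOCS))
    print(vector_space_rank("pastry", DOCS))
```

Both models only match literal keywords: a query such as "pastry" retrieves nothing here, even though two of the toy documents mention cake.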

  12. SIR-Project
      • Semantic search (SIR) based on semantic relatedness
      [Diagram labels: Essay; terms "baker, cake, to program, computer, quality assurance, to read, ..."; Semantic Relatedness; Profession 1, Profession 2, Profession 3, ...]

  13. SIR Example: find good index terms; Compound Splitting; Negation Detection; WSD; compute semantic relatedness
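
The steps listed on this slide can be sketched in a few lines. Everything below is an assumption made for illustration: the lexicon, the relatedness scores, the rule of dropping the single token after a negation word, and all function names. Word sense disambiguation (WSD) is left out, and this is not the SIR implementation.

```python
# Hypothetical resources; a real system would use a dictionary and a
# semantic relatedness measure instead of these hand-written toy tables.
LEXICON = {"cake", "shop", "quality", "assurance"}
NEGATIONS = {"not", "no", "never"}
RELATEDNESS = {
    frozenset(("cake", "baker")): 0.9,
    frozenset(("computer", "programmer")): 0.8,
}


def split_compound(word, lexicon):
    """Split a word into two lexicon entries; return [word] if none fit."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]


def index_terms(text):
    """Split compounds and drop the token that directly follows a negation."""
    terms, negated = [], False
    for token in text.lower().split():
        if token in NEGATIONS:
            negated = True
            continue
        if not negated:
            terms.extend(split_compound(token, LEXICON))
        negated = False
    return terms


def relatedness(a, b):
    """Look up a symmetric relatedness score; identical terms score 1.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def score(query_terms, doc_terms):
    """Average, over query terms, of the best relatedness to any doc term."""
    if not query_terms or not doc_terms:
        return 0.0
    best = [max(relatedness(q, d) for d in doc_terms) for q in query_terms]
    return sum(best) / len(best)


if __name__ == "__main__":
    query = index_terms("cakeshop not computer")
    print(query)
    print(score(query, ["baker", "bread"]))
```

With these toy tables, a query term such as "cake" can score against a description that mentions "baker" even though the two share no keyword.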

  14. THESEUS - TEXO
      • Large-scale BMBF project with industry partners (SAP, Siemens, etc.)
      • Service marketplaces in Web 2.0
        - Finding services, for both users and machines
      • Problem:
        - Only keyword-based search
        - Lack of ontologies for semantic search
      • Solution:
        - Use natural language descriptions of web services
        - Apply Semantic Information Retrieval
        - Community Mining for optimized service selection
        - Darmstadt Knowledge Processing Repository

  15. UIMA components (projects: SIR, AQUA, THESEUS)
      • Data import: Wikipedia reader, Forum reader, Plain text reader
      • Linguistic preprocessing: Tokenizer, Sentence splitter, Stopword tagger
      • Morphological analysis: Stemmer, Lemmatizer, Compound Splitter
      • Syntactic analysis: PoS-Tagger, Parser
      • Semantic analysis: NE tagger, Sentiment detector, WSD component
      • Project specific analysis: Swear word tagger (AQUA), Negation detection (SIR)
      • Data export: Indexer (Lucene, Terrier), ARFF export

  16. Advantages of UIMA
      • Components can be shared between projects
      • Shared model of thinking
        - “Reader + Annotators + Consumer”
        - Configuration of components
      • Descriptive component orchestration
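
The “Reader + Annotators + Consumer” model can be illustrated with a minimal, library-free sketch. It does not use the UIMA API: the `Document` and `Annotation` classes, the component functions, and the stopword list are all hypothetical stand-ins chosen for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str   # e.g. "Token" or "Stopword" (hypothetical type names)
    begin: int  # character offset where the annotated span starts
    end: int    # character offset just past the annotated span


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def covered(self, ann):
        """Return the text span an annotation points at."""
        return self.text[ann.begin:ann.end]


def plain_text_reader(texts):
    """Reader: turn raw strings into documents, one at a time."""
    for text in texts:
        yield Document(text)


def tokenizer(doc):
    """Annotator: add a Token annotation for every run of word characters."""
    for match in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", match.start(), match.end()))


STOPWORDS = {"the", "a", "of"}  # toy list, an assumption for this sketch


def stopword_tagger(doc):
    """Annotator: mark tokens that appear in the stopword list."""
    for ann in list(doc.annotations):
        if ann.type == "Token" and doc.covered(ann).lower() in STOPWORDS:
            doc.annotations.append(Annotation("Stopword", ann.begin, ann.end))


def index_consumer(doc, index):
    """Consumer: append all non-stopword tokens to a shared index list."""
    stop_spans = {(a.begin, a.end) for a in doc.annotations
                  if a.type == "Stopword"}
    for ann in doc.annotations:
        if ann.type == "Token" and (ann.begin, ann.end) not in stop_spans:
            index.append(doc.covered(ann).lower())


def run_pipeline(reader, annotators, consumer):
    """Pass each document through the annotators in order, then consume it."""
    for doc in reader:
        for annotate in annotators:
            annotate(doc)
        consumer(doc)


if __name__ == "__main__":
    index = []
    run_pipeline(plain_text_reader(["The quality of the essay"]),
                 [tokenizer, stopword_tagger],
                 lambda doc: index_consumer(doc, index))
    print(index)
```

Because every component only reads and writes annotations on the document, a component such as the tokenizer can be reused in another pipeline by listing it there, which is the sharing advantage the slide points at.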

  17. Challenges
      • Agree on a type system
        - No automatic type mapping
      • Some rough edges in UIMA
        - No real plug’n’work with PEAR packages
        - Using constraints to align annotations seems to be slow
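
Without automatic type mapping, components built against different type systems need a hand-written translation step. The sketch below shows one way such a step might look. The type names (`sir.Word`, `aqua.Tok`, `shared.Token`, and so on) and the tuple representation of annotations are invented for this example and are not UIMA types.

```python
# Hypothetical mapping from project-specific type names to a shared type system.
TYPE_MAP = {
    "sir.Word": "shared.Token",
    "aqua.Tok": "shared.Token",
    "aqua.Sent": "shared.Sentence",
}


def to_shared(annotations, type_map):
    """Rename annotation types to the shared names.

    Each annotation is a (type_name, begin, end) tuple. Unmapped types raise
    KeyError so that a missing agreement is noticed rather than dropped.
    """
    mapped = []
    for type_name, begin, end in annotations:
        if type_name not in type_map:
            raise KeyError(f"no mapping for type {type_name!r}")
        mapped.append((type_map[type_name], begin, end))
    return mapped


if __name__ == "__main__":
    print(to_shared([("sir.Word", 0, 5), ("aqua.Sent", 0, 20)], TYPE_MAP))
```

Every new component type adds a row to such a table, which is why agreeing on one type system up front is cheaper than maintaining the mapping.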

  18. Wish list
      • Automatic type matching
      • Better tool support
        - Improving Eclipse plug-ins (robustness, features)
        - Refactoring of UIMA components
        - CPE runner ++ (automatic logging, performance monitor, etc.)
      • Plug’n’work approach
      • “Import by name” in CPEs
        - Or make ${CPM_HOME}/path also work for readers/consumers
      • Construct XML descriptors from Java annotations
      • More intuitive API

  19. Thank you very much!
      • Acknowledgements:
        - DFG for funding “Semantic Information Retrieval”
        - DFG for funding “Automatic Quality Assessment and Feedback in eLearning 2.0”
      http://www.ukp.tu-darmstadt.de/
