MIMIR: Multi-paradigm Information Management Index and Repository
Valentin Tablan, Niraj Aswani, Ian Roberts
University of Sheffield

University of Sheffield, NLP

MIMIR
□ … is an IR engine that can search over:
  ○ Text
  ○ Semantic Annotations
  ○ Ontologies and Knowledge Bases
  ...represented as GATE documents
□ … is built on top of:
  ○ Ontotext ORDI
  ○ MG4J text indexing engine

2009 GATE Summer School, Sheffield

Semantic Annotation
□ … is an annotation process where [parts of] the schema (annotation types, annotation features) are ontological objects.
□ … is different from:
  ○ Ontology learning
  ○ Ontology population (though it sometimes includes it)

Semantic Annotation
[figure not preserved]

Under the Hood
[Architecture diagram. Labels: Document Collection; Ontology + Knowledge Base; Mentions Index; Token Index, Token Index, Token Index, ...]

A Mimir Configuration
□ Text fields
  ○ string (the document text, downcased)
  ○ root (morphological root of each word)
  ○ category (part-of-speech of each word)
□ Annotations
  ○ Measurement (indexed features: type, dimension)
  ○ Reference (indexed feature: type)
  ○ Section (indexed feature: type)

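The configuration above can be pictured as a small data structure. The following is a minimal sketch only: the class names `AnnotationIndex` and `MimirConfig` and the method `is_searchable` are hypothetical and are not part of Mimir, whose actual configuration format is not shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotationIndex:
    """One indexed annotation type and its searchable features (hypothetical)."""
    annotation_type: str
    indexed_features: tuple


@dataclass
class MimirConfig:
    """Illustrative container for the slide's configuration (hypothetical)."""
    text_fields: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def is_searchable(self, annotation_type, feature):
        """True if `feature` was configured as indexed for `annotation_type`."""
        return any(
            a.annotation_type == annotation_type and feature in a.indexed_features
            for a in self.annotations
        )


# The configuration listed on the slide.
config = MimirConfig(
    text_fields={
        "string": "the document text, downcased",
        "root": "morphological root of each word",
        "category": "part-of-speech of each word",
    },
    annotations=[
        AnnotationIndex("Measurement", ("type", "dimension")),
        AnnotationIndex("Reference", ("type",)),
        AnnotationIndex("Section", ("type",)),
    ],
)
```

Under this sketch, `config.is_searchable("Measurement", "dimension")` is true while `config.is_searchable("Reference", "dimension")` is false, which mirrors the point that only configured features can be queried.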
Query Types (basic)
□ Text. Matches plain text.
  Syntax: sequence of words
  Example: device for measurement of light intensity
□ Annotation. Matches annotations.
  Syntax: {Type feature1=value1 feature2=value2...}
  Example: {Measurement type=scalarValue}
□ Sequence Query. Sequence of other queries.
  Syntax: Query1 [n..m] Query2...
  Example: up to {Measurement} [1..5] {Measurement}

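To make the sequence query concrete, here is a toy sketch that joins two lists of hits with a bounded gap. It is not Mimir's implementation: the function `sequence_hits` is hypothetical, hits are assumed to be `(start, end)` token offsets with an exclusive end, and `[n..m]` is assumed to mean a gap of n to m tokens between consecutive sub-queries.

```python
def sequence_hits(left, right, min_gap=0, max_gap=0):
    """Join two hit lists into sequence hits (illustrative only).

    A right hit qualifies when it starts between min_gap and max_gap
    tokens after a left hit ends. The result spans both hits.
    """
    joined = []
    for l_start, l_end in left:
        for r_start, r_end in right:
            gap = r_start - l_end
            if min_gap <= gap <= max_gap:
                joined.append((l_start, r_end))
    return sorted(joined)


# Tokens: up(0) to(1) 5(2) cm(3) or(4) even(5) 10(6) cm(7)
up_to = [(0, 2)]                  # hits for the text query "up to"
measurements = [(2, 4), (6, 8)]   # hits for {Measurement}: "5 cm", "10 cm"

# {Measurement} [1..5] {Measurement}
pairs = sequence_hits(measurements, measurements, min_gap=1, max_gap=5)

# up to {Measurement} [1..5] {Measurement}
full = sequence_hits(up_to, pairs)
```

Here the two measurements are two tokens apart, so they satisfy the `[1..5]` gap, and the text hit for "up to" is directly adjacent to the resulting pair.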
Query Types (inclusion)
□ IN Query. Hits of one query, only if in hits of another.
  Syntax: Query1 IN Query2
  Example: London IN {Reference}
□ OVER Query. Hits of a query, only if overlapping hits of another.
  Syntax: Query1 OVER Query2
  Example: {Reference} OVER London

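The two inclusion operators can be sketched as filters over hit spans. This is an illustration under assumptions, not Mimir's code: `hits_in` and `hits_over` are hypothetical names, hits are `(start, end)` token offsets with an exclusive end, IN is read as full containment, and OVER is read as plain interval overlap, following the wording on the slide (the real engine may apply a stricter test).

```python
def hits_in(inner, outer):
    """Keep hits of `inner` that lie entirely within some hit of `outer`."""
    return [
        (s, e) for s, e in inner
        if any(o_s <= s and e <= o_e for o_s, o_e in outer)
    ]


def hits_over(hits, others):
    """Keep hits that overlap at least one hit of `others`."""
    return [
        (s, e) for s, e in hits
        if any(s < o_e and o_s < e for o_s, o_e in others)
    ]


# Toy hit lists (token offsets are made up for illustration).
london = [(3, 4), (10, 11)]       # hits for the text query "London"
reference = [(2, 6), (20, 25)]    # hits for {Reference}

london_in_reference = hits_in(london, reference)      # London IN {Reference}
reference_over_london = hits_over(reference, london)  # {Reference} OVER London
```

Note the asymmetry: IN returns hits of the first query (the word "London"), while OVER returns hits of the first query (the whole `{Reference}` annotation).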
Query Types (advanced)
□ Named Index. Search different text indexes.
  Syntax: indexName:term
  Example: root:be [matches is, am, was, were, ...]
□ Kleene. Specified number of repeats.
  Syntax: Query +n, Query +n..m
  Example: {Measurement}[2], category:JJ[1..3]

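A toy sketch of both ideas follows. It assumes that each named text index stores one value per token position, and that a repeat query returns every run of directly adjacent hits whose length falls in the requested range; both are assumptions for illustration, and `term_hits` and `repeat_hits` are hypothetical names rather than Mimir functions.

```python
def term_hits(index, field, term):
    """One-token hits (start, end) where the named index `field` equals `term`."""
    return [(i, i + 1) for i, value in enumerate(index[field]) if value == term]


def repeat_hits(hits, min_n, max_n):
    """Spans made of min_n..max_n directly adjacent repeats of `hits` (min_n >= 1)."""
    by_start = {}
    for start, end in hits:
        by_start.setdefault(start, []).append(end)

    results = set()

    def extend(origin, position, count):
        # Record the span once enough repeats have been chained.
        if count >= min_n:
            results.add((origin, position))
        if count == max_n:
            return
        # Chain another hit that starts exactly where this one ended.
        for end in by_start.get(position, []):
            extend(origin, end, count + 1)

    for start, ends in by_start.items():
        for end in ends:
            extend(start, end, 1)
    return sorted(results)


# One value per token in each named index (toy data).
index = {
    "string":   ["she", "was", "a", "tall", "dark", "stranger"],
    "root":     ["she", "be", "a", "tall", "dark", "stranger"],
    "category": ["PRP", "VBD", "DT", "JJ", "JJ", "NN"],
}

be_hits = term_hits(index, "root", "be")          # root:be matches "was"
jj_hits = term_hits(index, "category", "JJ")      # category:JJ
jj_runs = repeat_hits(jj_hits, 1, 3)              # one to three adjectives in a row
```

Searching the `root` index finds "was" even though the surface text never contains "be", which is the point of indexing several text fields side by side.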
MIMIR ancestry: ANNIC

MIMIR v. ANNIC: Index size
[Bar chart. Index Size (times raw input)]
  ANNIC    103.04
  v. 0.1    87.77
  v. 0.2    21.51
  v. 0.3     8.33
  v. 0.4     6.9
  v. 1.0     0.82

MIMIR v. ANNIC: Features
                      ANNIC       Mimir
Annotation features   All (+)     Only configured (-)
Hit details           Full (+)    Text only (-)
JAPE Compatible       Yes (+)     Partial (-)
Scalability           Poor (-)    Very Good (+)
Index Size            Large (-)   ~ Input (+)
Search Speed          Fair (-)    Fast (+)

DEMO!