CONCERTO: Conceptual indexing, querying and retrieval of digital documents. J. McNaught, W.J. Black, F. Rinaldi, E. Bertino, A. Brasher, D. Deavin, B. Catania, D. Silvestri, B. Armani, P. Leo, A. Persidis, G. Semeraro, F. Esposito, V. Candela, G-P. Zarri & L. Gilardoni
Overview • Concerto: Textual annotation • Functionalities • Textual annotation and the knowledge worker • Relation to principles and assumptions of Knowledge Management
Concerto: Esprit P29159 • Project funded by the European Commission • Esprit: Strategic Research in IT • Conceptual indexing, querying and retrieval of digital documents – WWW, digital libraries, corporate document bases, … • Ended September 2000
CONCERTO: main features • Full knowledge engineering software environment • Computer-aided conceptual annotation • Intelligent IR via annotations • No full semantic or conceptual analysis
Textual annotation • Use KB & document management, IR & language engineering technologies • Knowledge represented as annotations to textual sources of knowledge (so unlike traditional KB) • Only annotate what is relevant to user needs • User decides what knowledge is finally stored
Concerto functionality • Document capture → XML documents • Basic semantic element discovery (terms, names, relationships, actions) • Interactive conceptual annotation – map textual elements to NKRL objects and templates • KM facility: store NKRL annotations in XML documents; index annotations • Query facility, report generation
[Figure: Concerto architecture, with numbered data flows (1) to (20). Acquisition & Preprocessing: XML Translator, BSE Extraction & Mapping to Ontology, Conceptual Annotation. Knowledge Repositories Management: Concept Manager, Concept Repository, Template Manager, Template Repository, Knowledge Manager, Document Repository, Inference Engine. Interfaces: Concept Ontology, Template Ontology, Document Acquisition, Conceptual Annotation Builder, Query Environment. Inputs: Abstracts, Web Pages. Users: Knowledge Administrators, Conceptual Annotation Editors & General Users.]
Document capture → XML • Define suitable Document Type Definitions (DTDs) • Automatic analysis of logical structure of input documents • Generation of corresponding XML documents • Simple DTDs: later processing enriches DTDs with other meta-information
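The capture step on this slide (analyse the logical structure, then emit XML) can be illustrated with a minimal Python sketch. It is not the Concerto implementation: the element names `document`, `title`, `body` and `para`, and the rule that the first text block is the title, are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def capture_to_xml(raw_text: str) -> ET.Element:
    """Turn a plain-text document into a simple XML tree.

    Assumed convention (not from the slides): the first non-empty block is
    the title, and each later blank-line-separated block is a paragraph.
    """
    blocks = [b.strip() for b in raw_text.split("\n\n") if b.strip()]
    doc = ET.Element("document")  # hypothetical root element
    if blocks:
        ET.SubElement(doc, "title").text = blocks[0]
    body = ET.SubElement(doc, "body")
    for block in blocks[1:]:
        # Collapse internal line breaks so each paragraph is one text node.
        ET.SubElement(body, "para").text = " ".join(block.split())
    return doc


sample = (
    "Merger announced\n\n"
    "Acme Ltd will merge\nwith Widget Corp.\n\n"
    "The deal closes in May."
)
xml_doc = capture_to_xml(sample)
print(ET.tostring(xml_doc, encoding="unicode"))
```

A deliberately simple structure like this mirrors the "simple DTDs" point: later stages can add further elements or attributes for meta-information without changing the capture step.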
[Figure: Document capture and normalisation to XML. Labels: DAI, Web Browser, Web Server + Servlet Engine, Servlet, DT Bean, HTML, HTTP, XML; steps 1 to 3.]
Basic Semantic Element Extraction • Extract & tag basic semantic elements • Named entities: companies, products, locations, people, trade names, offices, amounts, … • Context-sensitive rules – Also handle co-reference
BSE Extraction continued… • Partial filling of templates (“business rules”) • Database of known (part) names and cues to aid analysis • Success > 90% with proven FACILE technology • Results checked manually → faster than purely manual approach
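The combination of a name database with context-sensitive rules can be sketched as follows. This is a toy illustration, not the FACILE rule formalism: the two regular-expression rules, the rule names and the entries in `KNOWN_NAMES` are all hypothetical. Each hit records which rule or lookup produced it, so a reviewer checking results manually can see why an entity was tagged.

```python
import re

# Hypothetical database of known names (entries chosen for illustration).
KNOWN_NAMES = {"Pira International": "company", "Biovista": "company"}

# Hypothetical context-sensitive rules: a capitalised phrase is typed by a
# cue that follows it (company suffix) or precedes it (personal title).
RULES = [
    ("company-suffix", "company",
     re.compile(r"\b((?:[A-Z][a-z]+\s)+(?:Ltd|Inc|Corp))\b")),
    ("personal-title", "person",
     re.compile(r"\b(?:Mr|Ms|Dr)\.?\s([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)")),
]


def extract_entities(text):
    """Return (start, end, surface, type, source) tuples sorted by position."""
    found = []
    # Step 1: database lookup of known names.
    for name, etype in KNOWN_NAMES.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.end(), name, etype, "lookup"))
    # Step 2: rules that use surrounding cue words to type unknown names.
    for rule_name, etype, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(1), m.end(1), m.group(1), etype, rule_name))
    return sorted(found)


entities = extract_entities("Dr Jane Smith left Biovista to join Acme Widgets Ltd.")
for start, end, surface, etype, source in entities:
    print(f"{surface!r}: {etype} (via {source})")
```

The sketch does not resolve overlapping matches or co-reference; a real extractor would need both.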
[Figure: Basic Semantic Element Extraction pipeline. Labels: Input text, Preprocessing, Tagger & Morph. Analyser, Database Lookup, External Database, NE Rules, Basic NE Analyser, Named Entities, Filled Templates.]
[Screenshot: text in the process of undergoing Basic Semantic Element Extraction. Colours mark different types of entity; a tool-tip shows the rule that applied.]
Ontology Mapper • Associates names, terms & partially filled templates with classes & templates of main Concerto repositories • Ontology covers domains of current user partners (publishing/printing & biotechnology) • OM can be used to classify documents (as in FACILE)
Conceptual annotation builder interface • User checks output of BSEE module • Completes/verifies partially filled templates • Interaction ensures high-quality results • Higher precision in search
Knowledge representation • Most proposals for conceptual annotation limited in scope – They cannot handle complex narrative actions, facts, events, states relating real or intended behaviour of actors (typical of industrial/economic context) • Use of Narrative Knowledge Representation Language (NKRL)
Knowledge management facilities • Set of repositories, to store – Documents – Conceptual annotations – Plus information required for construction of annotations
Management facilities… • Set of manager modules providing – Access to repositories – Advanced manipulation operations
Knowledge Repositories • Concept repository – Concepts (ontology) and NKRL instances
Repositories… • Template repository – NKRL templates represented in RDF – Information required to construct instances (predicative occurrences) from predicative templates: • “move a generic object” → “Tomorrow, I will move the table”
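The step from a predicative template to a predicative occurrence can be made concrete with a small Python sketch of the "move a generic object" example. It is a heavy simplification: the classes, the toy `IS_A` hierarchy, the individuals `speaker_1` and `table_1`, and the role and predicate labels (`SUBJ`, `OBJ`, `MOVE`) are illustrative assumptions, not the Concerto data model.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Simplified stand-in for a predicative template."""
    name: str
    predicate: str
    roles: dict  # role name -> concept that the filler must specialise


@dataclass
class Occurrence:
    """A template instance: every role is bound to a concrete filler."""
    template: Template
    fillers: dict
    date: str


# Toy ontology (hypothetical): each item points to its parent concept.
IS_A = {"table_1": "object", "speaker_1": "person", "person": "agent"}


def specialises(item, concept):
    """True if `item` equals `concept` or lies below it in the hierarchy."""
    while item is not None:
        if item == concept:
            return True
        item = IS_A.get(item)
    return False


def instantiate(template, fillers, date):
    """Build an occurrence, checking each filler against its role constraint."""
    if set(fillers) != set(template.roles):
        raise ValueError("fillers must cover exactly the template roles")
    for role, filler in fillers.items():
        if not specialises(filler, template.roles[role]):
            raise ValueError(f"{filler} cannot fill {role}")
    return Occurrence(template, dict(fillers), date)


MOVE_OBJECT = Template("move a generic object", "MOVE",
                       {"SUBJ": "agent", "OBJ": "object"})
# "Tomorrow, I will move the table"
occ = instantiate(MOVE_OBJECT, {"SUBJ": "speaker_1", "OBJ": "table_1"},
                  date="tomorrow")
print(occ.template.predicate, occ.fillers, occ.date)
```

The template carries only constraints on each role; the occurrence carries the specific fillers and temporal information taken from the text.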
Repositories… • Document repository – XML documents that are (to be) annotated • Conceptual annotation repository – Conceptual annotations in terms of NKRL predicative occurrences and bindings between them – Represented in RDF
Resource Description Framework (RDF) • W3C proposal for metadata • Handles complex NKRL structures including second-order structures
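To show how RDF statements can carry both an occurrence and a second-order structure (a statement whose arguments are themselves occurrences), here is a sketch that builds subject-predicate-object triples and prints them in N-Triples syntax using only the Python standard library. The namespace, the property names (`predicate`, `relation`, `argument`) and the `COORD` relation label are hypothetical; the slides do not show the project's actual RDF vocabulary.

```python
NS = "http://example.org/concerto#"  # hypothetical namespace


def uri(local):
    """Wrap a local name as an N-Triples URI reference."""
    return f"<{NS}{local}>"


def literal(value):
    """Wrap a string as an N-Triples literal, escaping backslash and quote."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def occurrence_triples(occ_id, predicate, fillers):
    """First-order statements: one occurrence with its role fillers."""
    subject = uri(occ_id)
    triples = [(subject, uri("predicate"), literal(predicate))]
    for role, filler in fillers.items():
        triples.append((subject, uri(role.lower()), uri(filler)))
    return triples


def binding_triples(binding_id, relation, occ_ids):
    """Second-order statements: a binding whose arguments are occurrences."""
    subject = uri(binding_id)
    triples = [(subject, uri("relation"), literal(relation))]
    triples += [(subject, uri("argument"), uri(o)) for o in occ_ids]
    return triples


def to_ntriples(triples):
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


graph = occurrence_triples("occ1", "MOVE",
                           {"SUBJ": "speaker_1", "OBJ": "table_1"})
graph += binding_triples("bind1", "COORD", ["occ1", "occ2"])
print(to_ntriples(graph))
```

Because the binding refers to occurrences by URI, the same mechanism that describes entities also describes relations between whole events.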
[Figure: Knowledge Repository management. Labels: Concept manager, Concepts repository, Template manager, Template repository, Knowledge manager, Conceptual annotations repository, Document repository, Inference engine.]
Annotation and the Knowledge Worker: Pira International • Pira Int operates database of abstracts – Published on-line – Printed abstracts journals – 600 titles (many types) about publishing – 3 current types of knowledge worker: abstracters, editors, system workers – Metadata: currently simple lists of index terms, company and trade names
Evolving Knowledge Work Roles at Pira International • Need for new type of worker: Knowledge Administrator – Develops user templates – Maintains templates and domain specific ontologies • User templates: application specific – Written using user-friendly means, not raw NKRL (internal system use)
Example Pira Int Template: Contract
Slot | Expected values | Examples of use
between | Organisation or Person | Author
and | Organisation or Person | Publisher
for use of | Resource | Any content (resource) created by Author
in | Resource | Book
status | In progress, negotiation, completed |
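A user template of this kind amounts to a set of slots with expected value types, so a filled annotation can be checked mechanically before it is stored. The sketch below encodes the Contract template above as plain Python data and validates a filled instance. The data layout, the function name and the sample fillers are assumptions for illustration; the slides do not describe how Concerto stores user templates.

```python
# Slots whose fillers are typed entities, with the entity types they accept.
ENTITY_SLOTS = {
    "between": {"Organisation", "Person"},
    "and": {"Organisation", "Person"},
    "for use of": {"Resource"},
    "in": {"Resource"},
}
# The status slot takes one of a fixed set of values.
STATUS_VALUES = {"in progress", "negotiation", "completed"}


def validate_contract(filled):
    """Return a list of problems; an empty list means the annotation is valid."""
    problems = []
    for slot, allowed in ENTITY_SLOTS.items():
        if slot not in filled:
            problems.append(f"missing slot: {slot}")
            continue
        value, entity_type = filled[slot]
        if entity_type not in allowed:
            problems.append(f"{slot}: {value!r} has type {entity_type}, "
                            f"expected one of {sorted(allowed)}")
    status = filled.get("status")
    if status not in STATUS_VALUES:
        problems.append(f"status: {status!r} is not an expected value")
    return problems


# Hypothetical filled annotation: each entity slot holds (surface text, type).
good = {
    "between": ("J. Author", "Person"),
    "and": ("Example Press", "Organisation"),
    "for use of": ("chapter draft", "Resource"),
    "in": ("handbook", "Resource"),
    "status": "negotiation",
}
bad = {**good, "status": "signed", "in": ("Example Press", "Organisation")}

print(validate_contract(good))
print(validate_contract(bad))
```

Checks like these are one way an annotation interface could flag partially filled or mistyped templates for the human editor to complete.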
Annotation and the Knowledge Worker: Biovista • Production of corporate intelligence industry reports • Editors scan large amount of information – expert analysis of events, synthesis of trends, much added value – need to identify links between business entities and events – wish to be able to ask questions of stored knowledge
Biovista: specific needs • Business entity identification, e.g. – Companies, people, products, processes (mergers, collaborations, drug development) • Business relationship identification, e.g. – ‘Employee’ relationship – Company activity in industrial sector due to co-development agreement
Biovista Ontologies • Generic business entities • Biotechnology concepts • Fine-grained – Leads to richly-connected concepts that help to answer complex editor queries • Like Pira Int, need for knowledge administrator for ontologies and templates
Principles and Assumptions of KM: Davenport’s Principles (subset) • KM is expensive: Concerto reduces costs by – Easing document capture – Enabling rapid integration of knowledge – Adding value via conceptual annotations that are reliable (human-validated)
KM is expensive: reduce costs by – Offering automatic categorisation – Allowing flexible access – Offering basis for training and updating
Davenport’s Principles • Effective management of K requires hybrid solutions of people & technology – Concerto provides an ‘intelligent assistant’ • KM never ends, descriptions must be quick and dirty, highly relevant to user – We offer quick and accurate description, guided by user who maximises relevance
Davenport’s Principles • KM benefits more from maps than models – Concerto offers partial maps, not full models • KM involves sharing & reuse of K (‘unnatural acts’) – We offer a shared knowledge base, typically accessed by those who did not contribute the knowledge initially
Davenport’s Principles • KM means improving work processes – Pira Int and Biovista are examining how business processes will be affected and improved by use of Concerto • Knowledge access is only the beginning – Annotations can be used to provide summaries, enabling wider involvement
Applehans et al.: KM Assumptions • KM does not have to be profound – We offer partial, highly accurate and relevant knowledge, easily targetable • Document management concepts, technologies and procedures are basis for success in KM – Concerto strongly supports document-based KM
Applehans et al.: KM Assumptions • Think big, but start small – Terminologies and ontologies are backbone – Useful in own right in many ways – An additional principle: KM relies on accurate, well-defined, structured terminologies and concept systems