Multilingual Verbalisation of Modular Ontologies using GF and lemon
Brian Davis, Ramona Enache, Jeroen van Grondelle and Laurette Pretorius
CNL 2012, August 29, 2012
Structure
• WHY? The Be Informed use case as context; the meta-model/model separation; meta-model semantics
• WHAT? Verbalisation and …
  – Modularisation
  – Label variants and their manipulation
  – Multilingualism
  – lemon-GF mapping
• HOW? Achieving these four aspects in GF (and lemon)
• WHAT NEXT? Ideas about future work
Context: Be Informed Business Processes Platform
Challenges:
• Adoption of ontologies -> new audiences (knowledge engineers and ontologists, business users, end users, etc.) -> access via verbalisation in multiple languages
• Dealing with complexity -> many constraints; changing rules; contextual rules (e.g. depending on customer, time, …); rules from many sources that may conflict and overlap
Be Informed Business Process Ontology:
• Captures all relevant activities, artifacts, involved roles etc. and the relations between them in a modularised way:
  – a meta-model, using pre- and post-condition semantics, and
  – models of specific business process applications
• Verbalisation: based on pattern sentences
Context: Be Informed Use Case
• Be Informed offers ontology-driven support throughout the policy lifecycle
  • Business processes, products and decisions, registrations, interaction
  • Drafting, choosing, communicating, executing, evaluating
• Multilingual, because
  • Customers offer multilingual services (for example: immigration, the Dutch government in the Caribbean)
  • Customers share international models (for example: Europe, emission trading)
  • Be Informed is developing international business
• Natural language, because
  • Ontologies reach new audiences (domain experts, policy makers, citizens)
  • Models lead to specifications, documentation, case documents and letters
MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914.
Remaining Challenges
CNL 2010 (Van Grondelle, Heller, Grijzen, Spreeuwenberg): “How to prevent large numbers of patterns
– Language variations:
  • Inflectional morphology: plurals, …
  • Other natural languages
– Mathematical expressions
– Named things vs. anonymous things
– Extending/relating CNLs like we extend/relate meta-models”
CNL 2012: GF, lemon, multilingualism, label variants, modularisation
Meta-model
Model: Grant application process
Pattern Sentences
From Van Grondelle, J.C., Gülpers, M.: Specifying flexible business processes using pre and post conditions. In: PoEM 2011, Lecture Notes in Business Information Processing, vol. 92, Springer (2011) 38–51:
1. The activity PUBLISHING THE RESULT may be performed if
   (a) a document of type DOCUMENT WITH DETAILS is available.
2. The activity PUBLISHING THE RESULT is completed if
   (a) a document of type SUBMISSION FORM has been created.
Drawbacks:
• Clumsy grammar and lack of fluency
• Poor scalability in the number of supported languages
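To make the scalability problem concrete, here is a minimal Python sketch of the fill-in-the-blank approach behind such pattern sentences. The template table, its keys and the `verbalise` helper are hypothetical illustrations, not Be Informed's implementation.

```python
# Hypothetical sketch of template-based pattern sentences (not Be Informed's code).
# Each pattern is a fixed English string with slots for ontology labels.
PATTERNS = {
    "may_be_performed_if_available": (
        "The activity {activity} may be performed if "
        "a document of type {document} is available."
    ),
    "is_completed_if_created": (
        "The activity {activity} is completed if "
        "a document of type {document} has been created."
    ),
}


def verbalise(pattern: str, activity: str, document: str) -> str:
    """Fill a pattern's slots with upper-cased labels, as in the examples above."""
    return PATTERNS[pattern].format(activity=activity.upper(), document=document.upper())


print(verbalise("may_be_performed_if_available", "Publishing the result", "Document with details"))
```

Every additional language or label style needs its own copy of every template, and labels are pasted in verbatim without inflection; this is where both the clumsiness and the scaling problem come from.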
Modularisation
Meta-model | Models
Concepts and relations chosen based on consensus | Individual parties (no consensus)
Determined once, fixed | Introduced over time, frequent changes
Ontology formalism | Various information sources/formalisms/styles
Created by knowledge engineers | Created by a wide range of people
BI default meta-models (stable) | Resource-intensive development (changes/updates)
Lexicalisation and verbalisation:
Labels follow the ontology according to guidelines (e.g. case, activity, etc.) | Labels exhibit large variation
Complexity at lexical and syntactic/grammatical levels (pattern sentences) | Complexity at lexical level
Label variants
Sources of variation:
Non-linguistic
• Different styles of choosing labels
• Different backgrounds of the people involved in modelling
• Trade-off: guidelines and standards for systematically choosing good labels vs. industry adoptability and robustness in practice
• Multilingual contexts
Linguistic
• Concepts referred to by a proper name, noun or compound noun (term), e.g. “Intake” or “Equality principle”
• Concepts referred to by a description, in the form of a proposition or in a verb-oriented style, e.g. “Publishing the result”, “Publish the result” or “The result is published”
Label manipulation
• While some freedom of choice in label selection is allowed, label variants that refer to the same concept in the ontology should receive a single, unique verbalisation
• Triples with complex labels are verbalised according to the required sentence patterns, with increased fluency (L, C and N)
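One way to read the uniqueness requirement: map every surface variant of a label to one underlying analysis and verbalise only that analysis. The Python sketch below assumes the analyses are stored in a hand-built table; the `LabelAnalysis` type and the table are hypothetical, and in the approach presented here this role is played by lemon entries and GF lexical functions rather than a lookup dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LabelAnalysis:
    """Hypothetical analysis of a concept label: an optional verb plus an object."""
    verb_participle: Optional[str]  # e.g. "published"; None for noun labels such as "Intake"
    obj: str                        # e.g. "the result"


# Hypothetical hand-built table: several surface variants share one analysis.
VARIANTS: Dict[str, LabelAnalysis] = {
    "Publishing the result": LabelAnalysis("published", "the result"),
    "Publish the result": LabelAnalysis("published", "the result"),
    "The result is published": LabelAnalysis("published", "the result"),
    "Intake": LabelAnalysis(None, "intake"),
}


def verbalise_label(label: str) -> str:
    """Render a label as one canonical proposition (verb labels) or noun phrase."""
    analysis = VARIANTS[label]
    if analysis.verb_participle is None:
        return analysis.obj
    return f"{analysis.obj} is {analysis.verb_participle}"
```

All three verb-oriented variants of “publishing the result” come out as the same proposition, while the noun label “Intake” stays a noun phrase.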
lemon
Requirements for an ‘ontology-lexicon’ model:
– Represent linguistic information relative to the ontology
  • Avoid unnecessary ambiguities by representing only the lexical features relevant to the semantics of the underlying application
– Keep semantics separate from linguistic information
  • Clearly separate ‘world’ knowledge (properties of the objects referred to by words) from ‘word’ knowledge (properties of the words)
– Modular, minimal design
  • Provide a simple core model that can easily be extended as needed
The lexicon model for ontologies: ‘lemon’
– A general model for formalising lexical features relative to independently defined ontological semantics
– http://www.monnet-project.eu/lemon
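The ‘word’ vs. ‘world’ separation can be illustrated with a small Python data model: lexicons hold per-language written forms and merely point at ontology entities by URI. This is a simplified sketch, not lemon's RDF vocabulary; the class names only loosely echo lemon's lexicon, entry, canonical form and sense reference, and the `example.org` URIs and the `labels_for` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LexicalEntry:
    """'Word' knowledge: the written form of an entry in one language."""
    canonical_form: str  # loosely corresponds to the written representation of the canonical form
    reference: str       # URI of the ontology entity the entry's sense refers to (flattened here)


@dataclass
class Lexicon:
    """One lexicon per language, kept separate from the ontology ('world' knowledge)."""
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)


def labels_for(reference: str, lexicons: List[Lexicon]) -> Dict[str, str]:
    """Collect, per language, the canonical form of an entry referring to `reference`.

    If a language has several entries for the same entity, the last one wins in this sketch.
    """
    return {
        lex.language: entry.canonical_form
        for lex in lexicons
        for entry in lex.entries
        if entry.reference == reference
    }


# Hypothetical ontology URIs; the ontology itself is never modified by adding a language.
RESULT = "http://example.org/bi#Result"
APPLICATION = "http://example.org/bi#Application"

LEXICONS = [
    Lexicon("en", [LexicalEntry("result", RESULT), LexicalEntry("application", APPLICATION)]),
    Lexicon("nl", [LexicalEntry("resultaat", RESULT), LexicalEntry("aanvraag", APPLICATION)]),
]
```

Adding a language means adding a `Lexicon`, which keeps the ontology minimal and language-neutral.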
lemon: Overview
Verbalisation
Ontology verbalisation exploits the complementary strengths of GF and lemon (modularisation, mapping …):
• GF captures ontological information as well as the required sentence structure for multiple languages
• lemon provides concrete label information in multiple languages
Specifying BI business processes in terms of pre- and post-conditions requires verbalising these conditions in accordance with the sentence patterns:
• Concept labels are verbalised as propositional statements.
• Triples (activities with pre-conditions and/or post-conditions) are verbalised as conditional statements (“A if B”), where A and B are simple propositional statements, with modalities as appropriate.
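The “A if B” scheme composes two propositions. A minimal Python sketch, assuming each proposition is reduced to a subject, a participle or adjective, and an optional modal; the `Proposition` type and both helpers are hypothetical, and only English singular subjects are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    """A simple propositional statement with a third-person singular subject."""
    subject: str
    complement: str                 # participle or adjective, e.g. "performed", "available"
    modality: Optional[str] = None  # e.g. "may"; None gives the plain present tense


def render(p: Proposition) -> str:
    """'X may be performed' with a modal, 'X is available' without one."""
    if p.modality is not None:
        return f"{p.subject} {p.modality} be {p.complement}"
    return f"{p.subject} is {p.complement}"


def conditional(a: Proposition, b: Proposition) -> str:
    """Verbalise a pre-condition as the conditional statement 'A if B.'"""
    sentence = f"{render(a)} if {render(b)}."
    return sentence[0].upper() + sentence[1:]


print(conditional(
    Proposition("the activity publishing the result", "performed", "may"),
    Proposition("a document with details", "available"),
))
```

The modality sits on A only, mirroring the pattern sentences, where the activity “may be performed” and the condition simply “is available”.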
Verbalising the triple (Activity, Requires_Available, Artifact subtyped as Document)
Grammatical Framework (GF) in a nutshell
● Grammar formalism
● GF grammar = abstract syntax + concrete syntaxes
● Mainly used for multilingual natural-language applications in limited domains
● Resource library: basic syntactic constructions for 30 languages, usable as a software library
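The abstract/concrete split can be mimicked in a few lines of Python. This is only an analogy under simplifying assumptions: the `Available` function, the two-word lexicons and the brute-force `parse` are hypothetical, whereas real GF grammars are compiled and parsed by the GF system and rely on the resource library for morphology and word order.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Available:
    """Abstract syntax tree for a toy function  Available : Item -> Statement."""
    item: str  # abstract constant, e.g. "Form"


# Concrete syntaxes: a lexicon and a linearisation function per language.
LEXICON: Dict[str, Dict[str, str]] = {
    "Eng": {"Form": "the form", "Result": "the result"},
    "Dut": {"Form": "het formulier", "Result": "het resultaat"},
}
LINEARISE: Dict[str, Callable[[Available], str]] = {
    "Eng": lambda t: f"{LEXICON['Eng'][t.item]} is available",
    "Dut": lambda t: f"{LEXICON['Dut'][t.item]} is beschikbaar",
}


def parse(lang: str, text: str) -> Available:
    """Toy parser: enumerate all trees and keep the one that linearises to `text`."""
    for item in LEXICON[lang]:
        tree = Available(item)
        if LINEARISE[lang](tree) == text:
            return tree
    raise ValueError(f"no parse for {text!r} in {lang}")


def translate(source: str, target: str, text: str) -> str:
    """Translation = parse with one concrete syntax, linearise with another."""
    return LINEARISE[target](parse(source, text))


print(translate("Eng", "Dut", "the form is available"))
```

Because all languages share one abstract tree, adding a language means adding one more lexicon and linearisation function, and every pair of languages gets translation for free.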
Prior uses of GF
● Multilingual mathematical exercises (WebAlt)
● Dialogue systems (TALK)
● Verbalising ontologies (SUMO, MOLTO)
● Modelling controlled natural languages (Attempto)
Advantages of domain-specific GF grammars
● Code reuse: new grammars are easily developed using the resource library
● Accessible without linguistic training
● Accessible without extensive GF training (example-based)
● Can model sophisticated aspects of natural language (long-distance dependencies, discontinuous constituents, clitics)
● Define a translation system for any pair of languages
The problem
● Verbalise a business model in English and Dutch
The problem
Separate concepts into:
● T-Box: the basic architecture of the meta-model => general patterns for verbalisation
● A-Box: instances of the concepts from the T-Box, i.e. particular cases of the meta-model
The problem: T-Box
● Main modelling task
● Involves specifying the abstract syntax and the concrete syntaxes that define the verbalisation patterns
The solution: T-Box
Abstract syntax:
● Translate the concepts in the model to GF categories
● Translate the signatures of the functions in the model to GF functions
The solution: T-Box
Abstract syntax:
    cat Artifact ; Activity ; Fragment ;
    fun requires_available : Activity -> Artifact -> Fragment ;
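In Python terms, each GF category becomes a type and each `fun` becomes a constructor whose arguments are checked against its signature. A minimal sketch with hypothetical class names, plus a hypothetical `from_triple` helper that turns a triple of the form (Activity, requires_available, Artifact) into an abstract-syntax tree:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str  # an A-Box constant, e.g. "ApublishingOfResult"


@dataclass(frozen=True)
class Artifact:
    name: str  # an A-Box constant, e.g. "AapplicationForm" (hypothetical)


@dataclass(frozen=True)
class RequiresAvailable:
    """Tree for  requires_available : Activity -> Artifact -> Fragment."""
    activity: Activity
    artifact: Artifact

    def __post_init__(self) -> None:
        # Enforce the GF type signature at construction time.
        if not isinstance(self.activity, Activity):
            raise TypeError("first argument must be an Activity")
        if not isinstance(self.artifact, Artifact):
            raise TypeError("second argument must be an Artifact")


def from_triple(subject: str, predicate: str, obj: str) -> RequiresAvailable:
    """Map an ontology triple to an abstract-syntax tree (only one predicate handled)."""
    if predicate != "requires_available":
        raise ValueError(f"unsupported predicate {predicate!r}")
    return RequiresAvailable(Activity(subject), Artifact(obj))
```

GF performs this check statically when the grammar is compiled; the runtime check here is only a stand-in.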
The solution: T-Box
Concrete syntax:
● Language-specific
● Maps concepts to basic syntactic categories (NP, S) and to complex record types for higher-quality language generation
The solution: T-Box
Concrete syntax:
    lincat
      Artifact = NP ;
      Activity = {noun : NP ; subj : NP ; vp : VP ; hasVerb : Bool} ;
      Fragment = {subj : NP ; pred : VP ; ext : {s : S ; hasExt : Bool}} ;
The solution: T-Box
Concrete syntax: define overloaded functions to build the categories
    oper mkFragm = overload {
      -- fragment without a subordinate clause
      -- (dontCareS: a dummy S defined elsewhere in the grammar, not shown)
      mkFragm : NP -> VP -> Fragment =
        \np, vp -> {subj = np ; pred = vp ;
                    ext = {s = dontCareS ; hasExt = False}} ;
      -- fragment with a subordinate clause
      mkFragm : NP -> VP -> S -> Fragment =
        \np, vp, sub -> {subj = np ; pred = vp ;
                         ext = {s = sub ; hasExt = True}}
    } ;
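The two overloads differ only in whether a subordinate clause is supplied; `hasExt` records which one was used, presumably so that later rules can decide whether to print `ext.s`. A Python sketch of the same idea, in which plain strings stand in for NP, VP and S, an optional argument replaces GF overloading, and `render` is a hypothetical consumer of the flag:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ext:
    s: str         # the subordinate sentence (a plain string stands in for GF's S)
    has_ext: bool  # whether `s` carries real content


@dataclass(frozen=True)
class Fragment:
    subj: str  # stands in for NP
    pred: str  # stands in for VP
    ext: Ext


DONT_CARE_S = ""  # plays the role of the slide's dontCareS dummy sentence


def mk_fragm(np: str, vp: str, sub: Optional[str] = None) -> Fragment:
    """Python has no overloading, so an optional argument selects the variant."""
    if sub is None:
        return Fragment(np, vp, Ext(DONT_CARE_S, False))
    return Fragment(np, vp, Ext(sub, True))


def render(f: Fragment, conj: str = "if") -> str:
    """Hypothetical consumer of the flag: attach the clause only when has_ext is True."""
    main = f"{f.subj} {f.pred}"
    return f"{main} {conj} {f.ext.s}" if f.ext.has_ext else main
```

Carrying an explicit Boolean instead of testing for an empty string mirrors the GF record, where `S` values cannot be compared for emptiness at runtime.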
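The solution: T-Box
Concrete syntax:
    -- noVP, nominalize and passiveVP are helper opers not shown on the slide
    oper mkActivity = overload {
      -- activity labelled by a noun phrase, e.g. "intake"
      mkActivity : NP -> Activity =
        \o -> {noun = o ; subj = o ; vp = noVP ; hasVerb = False} ;
      -- activity labelled by a verb and its object, e.g. "publishing the result"
      mkActivity : V2 -> NP -> Activity =
        \v, o -> {noun = nominalize (mkVPSlash v) o ; subj = o ;
                  vp = passiveVP v ; hasVerb = True}
    } ;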
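The V2 variant is what lets one concept surface both as a noun phrase (“publishing the result”) and as a passive clause (“the result is published”), so the verb-oriented label variants discussed earlier can share one structure. A Python sketch, English only, in which the behaviour of `nominalize` and `passiveVP` is approximated by string formatting; the `V2` record and the function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V2:
    """Minimal stand-in for a GF two-place verb: only the forms this sketch needs."""
    gerund: str      # "publishing"
    participle: str  # "published"


@dataclass(frozen=True)
class Activity:
    noun: str       # nominal form of the label
    subj: str       # subject used in passive clauses
    vp: str         # passive verb phrase ("" when there is no verb)
    has_verb: bool


NO_VP = ""  # plays the role of noVP


def mk_activity_np(o: str) -> Activity:
    """Variant 1: a noun-phrase label such as "intake"."""
    return Activity(noun=o, subj=o, vp=NO_VP, has_verb=False)


def mk_activity_v2(v: V2, o: str) -> Activity:
    """Variant 2: a verb plus object, e.g. publish + "the result"."""
    return Activity(
        noun=f"{v.gerund} {o}",   # hypothetical nominalisation: "publishing the result"
        subj=o,                   # the object becomes the passive subject
        vp=f"be {v.participle}",  # hypothetical passive VP: "be published"
        has_verb=True,
    )


PUBLISH = V2(gerund="publishing", participle="published")
```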
The solution: T-Box
Concrete syntax: implement the functions as verbalisation patterns
    lin requires_available ac ar =
      mkFragm ac.subj ac.vp (mkS (mkCl ar (mkVP available_A))) ;
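To see what this single `lin` rule buys across languages, here is a Python approximation for English and Dutch. It is a sketch under several assumptions: the modal (“may”/“mag”) is added by a hypothetical top-level rendering not shown on the slide, the Dutch vocabulary is hand-picked, and the `verb_final` flag imitates the subordinate-clause word order that GF's Dutch resource grammar provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    """Hypothetical per-language function words and a word-order flag."""
    modal: str        # modal used for pre-conditions (an assumption of this sketch)
    passive_aux: str  # passive auxiliary
    copula: str
    available: str
    conj: str
    verb_final: bool  # Dutch subordinate clauses put the finite verb last


ENG = Lang("may", "be", "is", "available", "if", verb_final=False)
DUT = Lang("mag", "worden", "is", "beschikbaar", "als", verb_final=True)


def requires_available(subj: str, participle: str, artifact: str, lang: Lang) -> str:
    """Main clause from the activity (subj + passive VP), subordinate clause from the artifact."""
    main = f"{subj} {lang.modal} {lang.passive_aux} {participle}"
    if lang.verb_final:
        sub = f"{artifact} {lang.available} {lang.copula}"
    else:
        sub = f"{artifact} {lang.copula} {lang.available}"
    sentence = f"{main} {lang.conj} {sub}."
    return sentence[0].upper() + sentence[1:]


print(requires_available("the result", "published", "the application form", ENG))
print(requires_available("het resultaat", "gepubliceerd", "het aanvraagformulier", DUT))
```

The pattern logic is written once; only the `Lang` record changes per language, which is the point of keeping verbalisation patterns in the concrete syntax.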
The solution: A-Box
Concrete syntax:
    lin Aintake = mkActivity (mkNP intake_N) ;
    lin ApublishingOfResult = mkActivity publish_V2 (mkNP the_Det result_N) ;
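The A-Box rules themselves are language-neutral: only the lexical constants (`intake_N`, `publish_V2`, `result_N`, `the_Det`) differ per language. A Python sketch of that division of labour, where the dictionaries play the role of per-language lexicon modules; the Dutch words and the article table are illustrative choices, not taken from the grammar presented here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-language lexicons for the lexical constants on the slide.
NOUNS: Dict[str, Dict[str, str]] = {
    "en": {"intake_N": "intake", "result_N": "result"},
    "nl": {"intake_N": "intake", "result_N": "resultaat"},
}
DUTCH_ARTICLE = {"intake_N": "de", "result_N": "het"}  # Dutch definite article follows gender
PARTICIPLES = {"en": {"publish_V2": "published"}, "nl": {"publish_V2": "gepubliceerd"}}
PASSIVE_AUX = {"en": "be", "nl": "worden"}

# A-Box: each instance names a noun, whether it takes the_Det, and an optional verb.
A_BOX: Dict[str, Tuple[str, bool, Optional[str]]] = {
    "Aintake": ("intake_N", False, None),
    "ApublishingOfResult": ("result_N", True, "publish_V2"),
}


def the_det(lang: str, noun: str) -> str:
    """English always uses "the"; Dutch chooses "de" or "het" by the noun's gender."""
    return "the" if lang == "en" else DUTCH_ARTICLE[noun]


def linearise(instance: str, lang: str) -> Tuple[str, str]:
    """Return (subject NP, passive VP) for an A-Box instance; VP is "" for noun-only labels."""
    noun, with_det, verb = A_BOX[instance]
    np = NOUNS[lang][noun]
    if with_det:
        np = f"{the_det(lang, noun)} {np}"
    if verb is None:
        return np, ""
    return np, f"{PASSIVE_AUX[lang]} {PARTICIPLES[lang][verb]}"
```

In GF, the article choice and the passive are handled by the resource library, so the A-Box `lin` rules above need no per-language changes at all.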