Online Generic Editing of Heterogeneous Dictionary Entries in Papillon Project Mathieu Mangeot & David Thevenin Work done at NII, Tokyo, Japan Now looking for a position...
My Motivation • Dictionaries are a Key Element of almost every NLP System • But Construction Costs are High • => Lowering the Costs by Facilitating Construction & Maintenance: • Building Dedicated Environments • Pooling Resources by Reusing Existing Ones • Development by Volunteer Contributors • Resulting Data Publicly Available
Outline • The Situation: Manipulation of XML Dictionaries with Heterogeneous Entry Structures • The Problem: How to Edit them Online? • Our Solution: Using an HMI Tool • 2 Examples: Papillon & GDEF Dicts • Conclusion and Future Work
Papillon Platform • http://www.papillon-dictionary.org • Online Dict Server • User: Browse • Import: Papillon, DiCo, GDEF Dict, WaDoKu, FeM, JMDict, Ding, Cedict, SAIKAM, VietDict
Papillon Platform • Online Dict Server • Contributor: Edits • Specialist: Checks • Import: Papillon, DiCo, GDEF Dict, WaDoKu, FeM, JMDict, Ding, Cedict, SAIKAM, VietDict
Requirements for Editing • Editor Available Online • Heterogeneous Entry Structures • Adaptive Interfaces • To the User (Novice, Specialist) • To the Platform (PDA, Workstation)
The Best: Ad Hoc Editor
Drawbacks • Ad Hoc for a Particular Structure • Must be Reimplemented if the Entry Structure Changes • Local and Platform Dependent • Users Cannot Contribute Online
Distributed & Democratic • Distribution to the Lexicographers • RTF Files • Conversion with a LISP Program • Database
With Word™!
Drawbacks • Not Usable for Complex Structures • One Type of Information Per Line • No Complete Syntax Checking • Real-Time Editing Not Possible • Delay Necessary for Conversion & Transport
Online: with HTML
Drawbacks • Not Dynamically Adaptable • Need to Write One Interface for Each Entry Structure • Lack of Interactors • Only Buttons, Text Boxes, Check Boxes & Pop-up Menus
Our Solution • None of the Previous Solutions Satisfies our Requirements • An Idea • Using HMI Techniques & Tools for Automatically Generating Interfaces • Generation Based on the Data Structure and the User Profile
ArtStudio: a Multitarget Generation Framework [Diagram: an Initial description (Task, Concept, Instance), a Transit description (Abstract UI, then one Concrete UI per target) and a Final description (one Final UI per target); each target is described by its Platform, User and Environment] • Author: David Thevenin
Our Implementation • Necessary files: • Concept Model: XML Schema • Instances Model • CUI Model • Automatic Generator • Generated UIs: • Web/HTML • Mobile/WML
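A Simple Entry [Tree diagram: entry with the child elements head, pos and two example elements; textual contents: scientifique (head word), adj (pos), journées scientifiques and journal scientifique (examples)] • Legend: XML Element, Element textual content, Link to a child element, Link to the element value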
Concepts Model: an XML Schema [Diagram: the concepts C_entry, C_head, C_pos, C_list and C_example form a tree; each concept is linked to its instance (I_entry, I_head, I_pos, I_list) and to the interactor it uses (TextBox for the word, PopUp Menu, List for the examples, showing example1, example2, example3)] • Legend: Concept (C_), Instance (I_), Link to a child concept, Link to the interactor used by the concept, Link to the instance
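To make the concept-to-interactor idea concrete, here is a minimal Python sketch, not the ArtStudio implementation. The `Concept` class, the helper names and the "Group" interactor for the entry itself are assumptions made for illustration; the concept names and the TextBox, PopUp Menu and List interactors come from the slide, but their exact pairing is read off the diagram and may differ from the original.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a concepts model: a name, an interactor, child concepts."""
    name: str
    interactor: str  # label of the widget used to edit this concept
    children: list[Concept] = field(default_factory=list)


def build_entry_concepts() -> Concept:
    # Pairing of concepts and interactors is an assumption based on the slide.
    head = Concept("C_head", "TextBox")
    pos = Concept("C_pos", "PopUp Menu")
    example = Concept("C_example", "TextBox")
    examples = Concept("C_list", "List", [example])
    # "Group" is a hypothetical container interactor, not named in the slides.
    return Concept("C_entry", "Group", [head, pos, examples])


def describe(concept: Concept, depth: int = 0) -> list[str]:
    """Return one indented 'concept -> interactor' line per concept."""
    lines = ["  " * depth + f"{concept.name} -> {concept.interactor}"]
    for child in concept.children:
        lines.extend(describe(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(describe(build_entry_concepts())))
```

Because the interactor is attached to the concept rather than to a hand-written page, a change in the entry structure only changes this tree, which is the property the generic editor relies on.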
Instances Model • XML: <entry><hv>scientifique</hv><pos>adj</pos><ex>journées scientifiques</ex><ex>journal scientifique</ex></entry> [Diagram: I_entry has the child instances I_head (value: scientifique), I_pos (value: adj) and I_examples (a list of two I_example instances: journées scientifiques, journal scientifique)] • Legend: Instance, Link to a child instance, Link to the instance value
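As an illustration of how the XML entry on this slide could be turned into instance values, the following sketch uses Python's standard `xml.etree.ElementTree`. The function name `load_instances` and the flat dictionary layout are assumptions for this example; only the XML and the I_ names come from the slide.

```python
import xml.etree.ElementTree as ET

# The entry shown on the slide, as a single XML string.
ENTRY_XML = (
    "<entry><hv>scientifique</hv><pos>adj</pos>"
    "<ex>journées scientifiques</ex><ex>journal scientifique</ex></entry>"
)


def load_instances(xml_text):
    """Map the entry's child elements to instance values (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    return {
        "I_head": root.findtext("hv"),
        "I_pos": root.findtext("pos"),
        # Repeated <ex> elements become one list-valued instance.
        "I_examples": [ex.text for ex in root.findall("ex")],
    }


if __name__ == "__main__":
    print(load_instances(ENTRY_XML))
```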
CUI Model • XML Document • Describes the Graphical User Interface • Interactors and their Position • Target-Dependent • One Model for each Target: • Editing, Visualisation, Mobile Phone
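The "one model per target" idea can be sketched as follows. In the slides the CUI model is an XML document; this sketch replaces it with plain Python functions to stay short, so the renderers, the `CUI_MODELS` table, the sample data and the part-of-speech list are all assumptions, not the actual generator.

```python
from html import escape

# Hypothetical entry data and part-of-speech values, for illustration only.
ENTRY = {
    "head": "scientifique",
    "pos": "adj",
    "examples": ["journées scientifiques", "journal scientifique"],
}
POS_VALUES = ["adj", "noun", "verb"]


def render_edit(entry):
    """Editing target: every field is rendered as a form interactor."""
    options = "".join(
        "<option{}>{}</option>".format(
            " selected" if value == entry["pos"] else "", escape(value)
        )
        for value in POS_VALUES
    )
    examples = "".join(
        '<input name="ex" value="{}">'.format(escape(ex))
        for ex in entry["examples"]
    )
    return (
        '<form><input name="head" value="{}">'.format(escape(entry["head"]))
        + '<select name="pos">{}</select>'.format(options)
        + examples
        + "</form>"
    )


def render_view(entry):
    """Consultation target: the same data rendered read-only."""
    items = "".join("<li>{}</li>".format(escape(ex)) for ex in entry["examples"])
    return "<b>{}</b> <i>{}</i><ul>{}</ul>".format(
        escape(entry["head"]), escape(entry["pos"]), items
    )


# One rendering model per target, selected by name.
CUI_MODELS = {"edit": render_edit, "view": render_view}


def generate(entry, target):
    return CUI_MODELS[target](entry)


if __name__ == "__main__":
    print(generate(ENTRY, "edit"))
    print(generate(ENTRY, "view"))
```

The entry data is shared; only the entry in `CUI_MODELS` changes from one target to the next, which mirrors the slide's separation between data and target-dependent interface description.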
Papillon Dictionary • Multilingual Dictionary with a Pivot Structure • Monolingual Entries Linked to a Pivot Volume • Microstructure Based on the Meaning-Text Theory • Very Complex: semantic formula, government pattern, lexical functions, etc.
Editing Interface
Other Views • Consultation • Mobile Phone
GDEF Dictionary • Bilingual Estonian-French Dictionary • Project Leader: Antoine Chalvin (INALCO, Paris) • Microstructure Based on the Lemma
Editing Interface
Conclusion • Innovative Solution • Generic: Multi-Dictionary • Efficient: already 152 entries for GDEF Dict (2 people, 2 months) • Multitarget: Editing, Consultation, Mobile Phone • Multipurpose: can be adapted to other types of data
Future Work • To Find a Position! • Implementing more Features of XML Schema: • Basic Types: boolean, date, etc. • Complex Structures: choice, etc. • Automating the Process: • Generation of the Interface Model from the XML Schema
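The last item, generating the interface model from the XML Schema, could look roughly like the sketch below. It is a guess at the idea, not the authors' design: the tiny schema, the selection rules (repeated element gives List, enumeration gives PopUp Menu, anything else gives TextBox) and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for the simple entry used earlier in the talk.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="entry">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="hv" type="xs:string"/>
        <xs:element name="pos">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="adj"/>
              <xs:enumeration value="noun"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="ex" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()


def choose_interactor(element):
    """Pick an interactor from the element declaration (assumed rules)."""
    if element.get("maxOccurs") == "unbounded":
        return "List"
    if element.find(f".//{XS}enumeration") is not None:
        return "PopUp Menu"
    return "TextBox"


def interface_model(schema_text):
    """Return {element name: interactor} for the children of the first element."""
    root = ET.fromstring(schema_text)
    entry = root.find(f"{XS}element")
    fields = entry.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    return {f.get("name"): choose_interactor(f) for f in fields}


if __name__ == "__main__":
    print(interface_model(SCHEMA))
```

Handling `xs:choice` or typed values such as boolean and date, the other items on this slide, would need additional rules that this sketch does not attempt.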