Interoperability of an 18th-century Italian-Latin-Croatian dictionary
Petra Bago, Damir Boras
Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb
{pbago, dboras}@ffzg.hr
INFuture2015, Zagreb, 11-13 November 2015
Introduction
● A growing interest in digitizing historical texts
● A lack of communication among members of the community
  – Digitization projects usually remain confined to individual project teams, universities, institutions and individuals
● A demand for standardization of technologies and processes
  – A key concept emerges: interoperability
Reaction to the lack of communication
● International cooperation
● Sustainable Interoperability for Language Technology (SILT) (USA) and Fostering Language Resources Network (FLaReNet) (EU)
● Main goal: to build consensus on sharing data and technologies for language resources and applications
  – working towards the interoperability of existing data and promoting standards for markup and resource creation
Interoperability
● For computer systems:
  – Syntactic interoperability
    ● enables communication and data exchange, relying on specific data formats, communication protocols and the like
    ● what matters is that information is exchanged
  – Semantic interoperability*
    ● enables systems to interpret exchanged information automatically, meaningfully and accurately, producing useful results through compliance with a common information exchange reference model
    ● what matters is that information is interpreted the same way on both sides
TEI encoding scheme
● TEI (Text Encoding Initiative) encoding scheme for dictionaries
● enables semantic interoperability
  – exchange with little or no information loss, and correct interpretation of information
● the Guidelines recommend how to encode implicit features of textual resources, thereby making them explicit (a de facto standard)
● based on XML
● manual process
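The contrast between syntactic and semantic interoperability can be sketched in Python. This is a minimal illustration, not the authors' tooling: any XML parser can read the fragment (syntactic level), but only a consumer that relies on the agreed TEI element names and attributes can recover the headword, its language and the grammatical features (semantic level). The fragment is modelled on the della Bella encoding; the wrapping `<entry>` element and the helper name `read_entry` are assumptions, and the fragment has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"                    # TEI namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Hypothetical fragment modelled on the della Bella encoding; the <entry>
# wrapper is added so the snippet parses as one tree (not schema-validated).
FRAGMENT = """<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form type="lemma" xml:lang="it"><orth>Abate</orth><pc>.</pc></form>
  <form type="inflected">
    <gramGrp><case value="genitive"/><number value="singular"/></gramGrp>tis<pc>.</pc>
  </form>
</entry>"""

def read_entry(xml_text):
    """Interpret a TEI entry fragment (hypothetical helper)."""
    # Syntactic level: any XML parser can turn the text into a tree.
    entry = ET.fromstring(xml_text)
    # Semantic level: meaning comes from agreed TEI element names and attributes.
    lemma = entry.find(f"{TEI}form[@type='lemma']")
    inflected = entry.find(f"{TEI}form[@type='inflected']")
    gram = inflected.find(f"{TEI}gramGrp")
    return {
        "headword": lemma.find(f"{TEI}orth").text,
        "language": lemma.get(XML_LANG),
        "ending": gram.tail.strip(),  # printed text right after the grammar info
        # Features that were only implicit in print ("tis") are now explicit.
        "features": {el.tag[len(TEI):]: el.get("value") for el in gram},
    }

print(read_entry(FRAGMENT))
```

Two systems that share only XML would agree on the tree but not on what `gramGrp` or `type="lemma"` mean; sharing the TEI vocabulary is what makes the second half of `read_entry` portable.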
Encoding dictionaries
● The structure of entries
  – varies among and within dictionaries
  – a scheme should be suitable for various entry structures
  – complex but consistent structure
● The information found within entries
  – most information is implicit or compressed (lexicographical metadata)
  – one may encode either the precise typographic form of the source text or the underlying structure of the information it presents
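To show what "implicit or compressed" information means in practice, here is a minimal Python sketch that expands segments of the della Bella entry for Abate (e.g. `Opat , ta . m.`) into explicit fields. The slides do not describe how genitive endings are resolved, so the function names and the rule in `expand_genitive` are hypothetical: the ending is assumed to restart at the last occurrence of its first letter in the lemma, and otherwise to replace the lemma's final letter. The rule was checked only against the five segments used below, and the gender table covers only the labels that appear in that entry.

```python
# Gender labels seen in the sample entry; other labels are not covered.
GENDER = {"m.": "masculine", "n.": "neuter"}

def expand_genitive(lemma, ending):
    """Rebuild a full genitive form from a lemma and its printed ending.

    Hypothetical heuristic, checked only against the sample entry: the ending
    restarts at the last occurrence of its first letter in the lemma; if that
    letter is absent, it replaces the lemma's final letter.
    """
    cut = lemma.rfind(ending[0])
    if cut == -1:
        cut = len(lemma) - 1
    return lemma[:cut] + ending

def parse_compressed(segment):
    """Make the implicit fields of a segment such as 'Opat , ta . m.' explicit."""
    lemma, rest = (part.strip() for part in segment.split(",", 1))
    ending, gender = (part.strip() for part in rest.split(" . ", 1))
    lemma = lemma.replace("-|", "")  # join words hyphenated across printed lines
    return {"lemma": lemma,
            "genitive": expand_genitive(lemma, ending),
            "gender": GENDER[gender]}

for seg in ["Abbas , tis . m.", "Opat , ta . m.", "Igu-|men , ena . m.",
            "Opat-|ſtvo , va . n.", "Igumenſtvo , tva . m."]:
    print(parse_compressed(seg))
```

A rule like this would at best propose values for an encoder to confirm; it is the kind of step an automated encoding process would need to get right.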
About della Bella's dictionary
● Volume 1 of the second edition of “Dizionario italiano-latino-illirico” (Italian-Latin-Croatian dictionary), compiled by Ardelio della Bella and printed in Dubrovnik in 1785
● very complex entry structure (see examples)
● intended for Italian Jesuit missionaries
● Croatian grammar in the preamble
● 899 pages, 2 parts (preamble + dictionary), 2 volumes (preamble, A-H + I-Z), ~19,000 headwords
Encoding of della Bella's dictionary
● keep the sequence of information found in the original text
● encode all additional information through attributes of elements
Abate . Abbas , tis . m. Opat , ta . m. Igu-|men , ena . m. Dignità d’Abate . Opat-|ſtvo , va . n. Igumenſtvo , tva . m.
Encoding of della Bella's dictionary
Abate . Abbas , tis . m. Opat , ta . m. Igu-|men , ena . m. Dignità d’Abate . Opat-|ſtvo , va . n. Igumenſtvo , tva . m.
<form type="lemma" xml:lang="it">
  <orth>Abate</orth>
  <pc>.</pc>
</form>
<quote>Abbas <pc>,</pc></quote>
<form type="inflected">
  <gramGrp>
    <case value="genitive"/>
    <number value="singular"/>
  </gramGrp>
  tis
  <pc>.</pc>
</form>
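The two encoding principles (keep the printed sequence, put everything added by the encoder into attributes) can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustrative assumption, not the authors' workflow, which the slides describe as manual: the helper names `pc` and `encode_head` are hypothetical, and the TEI namespace is omitted so the serialized output stays readable.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

def pc(parent, mark):
    """Append a <pc> element holding a printed punctuation mark."""
    el = ET.SubElement(parent, "pc")
    el.text = mark
    return el

def encode_head(lemma, translation, ending):
    """Encode 'Abate . Abbas , tis .' as a list of elements, in printed order.

    Hypothetical helper: the genitive/singular values are supplied by the
    encoder (they are not printed in the source), so they go into attributes.
    """
    form = ET.Element("form", {"type": "lemma", XML_LANG: "it"})
    ET.SubElement(form, "orth").text = lemma
    pc(form, ".")

    quote = ET.Element("quote")
    quote.text = translation + " "
    pc(quote, ",")

    inflected = ET.Element("form", {"type": "inflected"})
    gram = ET.SubElement(inflected, "gramGrp")
    ET.SubElement(gram, "case", {"value": "genitive"})
    ET.SubElement(gram, "number", {"value": "singular"})
    gram.tail = ending  # printed text follows the (unprinted) grammar info
    pc(inflected, ".")
    return [form, quote, inflected]

elements = encode_head("Abate", "Abbas", "tis")
for el in elements:
    print(ET.tostring(el, encoding="unicode"))

# Sequence check: apart from spacing, the text content still follows the print.
print(" ".join("".join(el.itertext()) for el in elements))
```

Because the added information lives only in attributes and empty elements, stripping the markup leaves the printed words and punctuation in their original order.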
Encoding of della Bella's dictionary
<cit type="example" xml:lang="hr">
  <quote>Evo gre pet godin’, dàsam gne sluga ja <pc>;</pc></quote><lb/>
  <bibl>Sciſc<pc>.</pc></bibl>
</cit>
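Since the quotation keeps the printed wording and marks the printed line break with `<lb/>`, its reading text and layout can be regenerated mechanically. A minimal Python sketch, assuming the `<cit>` element above is available as a standalone string; the helper name `printed_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CIT = """<cit type="example" xml:lang="hr">
  <quote>Evo gre pet godin’, dàsam gne sluga ja <pc>;</pc></quote><lb/>
  <bibl>Sciſc<pc>.</pc></bibl>
</cit>"""

def printed_lines(cit):
    """Rebuild the printed lines of a <cit>, starting a new line at each <lb/>.

    Only the children's own text is used; their tails are whitespace here.
    """
    lines, current = [], ""
    for child in cit:
        if child.tag == "lb":
            lines.append(current.strip())
            current = ""
        else:
            current += "".join(child.itertext())
    lines.append(current.strip())
    return lines

cit = ET.fromstring(CIT)
print(cit.get("type"), cit.get(XML_LANG))  # the example's type and language
for line in printed_lines(cit):
    print(line)
```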
Conclusion
● To enable semantic interoperability of digitized historical dictionaries, the dictionaries have to be encoded using a standard
● Successful encoding of della Bella's 18th-century dictionary entries using the TEI (Text Encoding Initiative) encoding scheme (a de facto standard)
● Can the encoding process be automated?
● Linking to external resources (e.g. online encyclopedias)?
Thank you!