Dictionaries Christian Chiarcos Applied Computational Linguistics - PowerPoint PPT Presentation

Digital Humanities Workshop, Sep 9–11, 2014, Batumi, Georgia
Linking Machine-Readable Dictionaries
Christian Chiarcos
Applied Computational Linguistics Lab
chiarcos@informatik.uni-frankfurt.de

Linking Machine-Readable Dictionaries
• Motivation: Aggregating information
  – from different dictionaries
  – from dictionaries and automatically analyzed text
• State of the art on machine-readable dictionaries
  – XML (TEI, LMF)
  – RDF (lemon)
• Example
  – Converting, linking and querying multilingual Wiktionaries


The future of the dictionary …
"The three things no young person owns or uses and often don't realise exist: an alarm clock, an address book and a dictionary … At university I didn't meet a single person who owned any of them"
http://guardian.co.uk/books/booksblog/2012/sep/13/dictionaries-democratic-crowdsourcing/
"[D]ictionaries are not dead, they just smell funny"
Ilan Kernerman, CEO KDictionaries, Kernerman Dictionary News 21 (July 2013): 1, paraphrasing Frank Zappa's quote on jazz (1974)

The future of the dictionary …
"[D]ictionaries … lose their autonomous identity and disappear in language technology. Machine translation, word processors, … and the like incorporate dictionary content and apply it in new forms"
Ilan Kernerman, CEO KDictionaries, Kernerman Dictionary News 21 (July 2013): 1
"[T]he message is clear and unambiguous: the future of the dictionary is digital."
Stephen Bullon, Macmillan Education, upon announcing that Macmillan will no longer publish print dictionaries, Nov 2012


The future of the dictionary …
… is digital
– no space limitations
  • adding context information, e.g., from corpora
– dynamic ordering & search
  • no index optimization for manual lookup
– information aggregation
  • integrating information from different sources
Two use cases:
• cross-lingual dictionary lookup
• text mining for archaeologists


Information Aggregation I Cross-lingual search
• Assume you're a speaker of language X, say, German, and are interested in working with text in language Y, say, Georgian
  – Statistical machine translation may give you an idea, but you certainly want to counter-check with a dictionary ...
  ... unfortunately, you don't have one

Information Aggregation I Cross-lingual search
• Assume you're a speaker of language X, say, German, and are interested in working with text in language Y, say, Georgian
• We do have a Georgian-English dictionary, though, and (luckily) an English-German one
• Given a proper representation, storage and query formalism, it is possible to perform a transitive query using English as a pivot language
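The transitive query through a pivot language can be sketched in a few lines of Python. This is only an illustration, not the representation used in the talk: the function name `pivot_lookup` and the tiny lexicons are hypothetical, and the assignment of German words to "foot" versus "leg" is an assumption made for the example.

```python
from typing import Dict, Set

# A lexicon maps a headword to the set of its translations.
Lexicon = Dict[str, Set[str]]


def pivot_lookup(word: str, source_to_pivot: Lexicon, pivot_to_target: Lexicon) -> Set[str]:
    """Translate `word` into the target language via every pivot translation."""
    targets: Set[str] = set()
    for pivot_word in source_to_pivot.get(word, set()):
        targets |= pivot_to_target.get(pivot_word, set())
    return targets


# Toy Georgian-English and English-German lexicons (illustrative entries only).
ka_en: Lexicon = {"ფეხი": {"foot", "leg"}}
en_de: Lexicon = {
    "foot": {"Fuß", "Sockel"},      # assumed split of the German candidates
    "leg": {"Bein", "Schenkel"},
}

print(sorted(pivot_lookup("ფეხი", ka_en, en_de)))
```

Because every sense of every pivot word is followed, the result set grows quickly, which is the source of the noise discussed on the following slides.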

Information Aggregation I Cross-lingual search
[Figure: Georgian ფეხი → English foot, leg (http://www.georgianweb.com/pdf/lexicon.pdf) → German translations (dict.leo.org): Abschnitt, Ader, Basis, Bein, Etappe, Fuß, Fußbreit, Fußende, Fußlinie, Fußmauer, Fußpunkt, Hachse, Kathete, Mastfuß, Programmzweig, Strecke, Schaft, Strang, Stollen, Standfuß, Sockel, Schenkel, Tritt, Standvorrichtung, Schlägel, Sohle, Segelunterliek]


Information Aggregation I Cross-lingual search
[Figure: as before, Georgian ფეხი → English foot, leg (http://www.georgianweb.com/pdf/lexicon.pdf) → 27 German translations (dict.leo.org); in addition, a second path ფეხი → Russian нога (http://meskhi.net/lexicon) → German (dict.leo.org), which contributes Spielbein]

Information Aggregation I Cross-lingual search
• Unfortunately, using English introduces a lot of noise
  – 2 English translations, 27 (!) German translations
• But we can combine multiple paths, e.g., one using English as a pivot, one using Russian
  – elements in the intersection should be more reliable
27 English-based translations ∩ 3 Russian-based translations = 2 shared translations

Information Aggregation I Cross-lingual search
• In a similar way, words missing from the Russian (or the English) path may be taken from the other one
  – more noise, but better coverage
27 English-based translations ∪ 3 Russian-based translations = 28 possible translations
  – e.g., German Spielbein "free leg"
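The two ways of combining pivot paths correspond to set intersection (more reliable) and set union (better coverage). The sketch below uses the 27 German candidates of the English path shown in the figure. The slides give only the size of the Russian path (3) and name Spielbein as its extra word, so the assumption here that the two shared words are Bein and Fuß is mine, chosen to reproduce the counts 2 and 28.

```python
from collections import Counter

# German candidates reached via English "foot"/"leg" (27 words, from the figure).
english_path = {
    "Abschnitt", "Ader", "Basis", "Bein", "Etappe", "Fuß", "Fußbreit",
    "Fußende", "Fußlinie", "Fußmauer", "Fußpunkt", "Hachse", "Kathete",
    "Mastfuß", "Programmzweig", "Strecke", "Schaft", "Strang", "Stollen",
    "Standfuß", "Sockel", "Schenkel", "Tritt", "Standvorrichtung",
    "Schlägel", "Sohle", "Segelunterliek",
}

# German candidates reached via Russian "нога" (ASSUMED membership, 3 words).
russian_path = {"Bein", "Fuß", "Spielbein"}

reliable = english_path & russian_path   # supported by both pivots
coverage = english_path | russian_path   # supported by at least one pivot

# Rank all candidates by the number of pivot paths that support them.
support = Counter()
for path in (english_path, russian_path):
    support.update(path)
ranked = sorted(coverage, key=lambda w: (-support[w], w))

print(len(reliable), len(coverage))
print(ranked[:3])
```

Ranking by the number of supporting paths gives a middle ground between the two extremes: all 28 candidates are kept, but those confirmed by both pivots come first.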

Information Aggregation I Jargon: A Prototype
• student project @ GU Frankfurt
• enter a word (in any language) and a target language
• consult different machine-readable dictionaries to find a path into the target language
• visualize results together with their "path"
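Finding a path into the target language can be viewed as a graph search over (language, word) pairs, with one edge per dictionary entry. The following is a minimal sketch of that idea using breadth-first search. It is not the Jargon implementation: the function `find_paths`, the language codes, and the dictionary contents are all hypothetical.

```python
from collections import deque


def find_paths(word, source_lang, target_lang, dictionaries, max_hops=3):
    """Breadth-first search from (source_lang, word) to words in target_lang.

    `dictionaries` maps a (source language, target language) pair to a
    lexicon {headword: set of translations}. Returns one shortest path per
    reachable target word, each path a list of (language, word) nodes.
    """
    start = (source_lang, word)
    queue = deque([[start]])
    seen = {start}
    results = []
    while queue:
        path = queue.popleft()
        lang, current = path[-1]
        if lang == target_lang:
            results.append(path)
            continue
        if len(path) > max_hops:
            continue
        for (src, tgt), lexicon in dictionaries.items():
            if src != lang:
                continue
            for translation in sorted(lexicon.get(current, ())):
                node = (tgt, translation)
                if node not in seen:
                    seen.add(node)
                    queue.append(path + [node])
    return results


# Toy data (assumed entries, for illustration only).
dictionaries = {
    ("ka", "en"): {"ფეხი": {"foot", "leg"}},
    ("en", "de"): {"foot": {"Fuß"}, "leg": {"Bein"}},
    ("ka", "ru"): {"ფეხი": {"нога"}},
    ("ru", "de"): {"нога": {"Bein", "Fuß", "Spielbein"}},
}

paths = find_paths("ფეხი", "ka", "de", dictionaries)
for p in paths:
    print(" -> ".join(f"{lang}:{w}" for lang, w in p))
```

Returning the whole path rather than only the final word is what makes it possible to display each result together with the dictionaries it passed through.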


Information Aggregation I Jargon: A Prototype
• Jargon uses lexical resources provided by different groups
  – using a shared vocabulary
    • lemon, more in 10 minutes
  => joint queries
• still under development
  – prototype on restricted data set

Information Aggregation II Multilingual Semantic Web
• a system for text mining (open information extraction) from archaeological reports
• extract machine-readable information from plain text
  – currently, English only
    • in the longer perspective, German and Dutch
  – http://corpora.acoli.informatik.uni-frankfurt.de/text-mining-webservice

Information Aggregation II Multilingual Semantic Web
• Given a PDF document
• Upload to server
• Perform NLP analysis
• Visualize data, e.g., archaeological periods
• or query in the results

Information Aggregation II Multilingual Semantic Web
or query in the results
TEXT: Dr Irakli Iashvili spent a month at the Heberden Coin Room at the Ashmolean Museum, also with the support of the British Academy, working on the coinage of the Black Sea in general, and the coins found at Pichvnari in particular.
[Screenshot panels: QUERY, TRIPLES, Result]

Information Aggregation II Multilingual Semantic Web
or query in the results
In this query, the only information-bearing element is ":work".
If we define that ":work" entails ":bearbeitet" (the German translation), we can formulate the same query in German, i.e., ?a :bearbeitet ?c
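How such an entailment lets a German query retrieve triples extracted from English text can be sketched without any RDF library. Everything concrete here is an assumption for illustration: the triples are made up to resemble what might be extracted from the sentence above, and `query` is a hypothetical helper, not the interface of the text-mining service.

```python
def query(triples, predicate, entails):
    """Return (subject, object) pairs for the pattern `?a predicate ?c`.

    A stored triple matches if its predicate equals the queried one, or if
    its predicate is declared to entail the queried one.
    """
    results = []
    for subj, pred, obj in triples:
        if pred == predicate or predicate in entails.get(pred, set()):
            results.append((subj, obj))
    return results


# Hypothetical triples in the spirit of the example sentence.
triples = [
    (":Iashvili", ":work", ":coinage_of_the_Black_Sea"),
    (":Iashvili", ":spend", ":month"),
]

# ":work" entails ":bearbeitet" (its German translation).
entails = {":work": {":bearbeitet"}}

english = query(triples, ":work", entails)        # ?a :work ?c
german = query(triples, ":bearbeitet", entails)   # ?a :bearbeitet ?c
print(english == german, german)
```

The extracted data stays in English; only the mapping between predicates has to come from a dictionary, which is where linked lexical resources enter.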

Linking Machine-Readable Dictionaries
• Motivation: Aggregating information
  – from different dictionaries
  – from dictionaries and automatically analyzed text
• State of the art on machine-readable dictionaries
  – XML
  – RDF
• Example
  – Converting, linking and querying multilingual Wiktionaries

Machine Readable Dictionaries XML
• Text Encoding Initiative (TEI)
  – specifications for markup of digital-born documents
  – originally closely oriented towards digital editions of printed books
  – rich metadata (TEI header)
  – semantic markup (div, seg, verse, …)
  – limited interoperability
    • many different ways to represent the same information
    => information aggregation ???


Machine Readable Dictionaries XML
• Lexical Markup Framework (LMF)
  – ISO standard for representing machine-readable dictionaries
  – an abstract model with XML specifications (DTD)
  – concrete application requires an instantiation
    => extending the DTD
    => violating the original DTD
    => in order to use this standard, you need to break it
    => suggestions for alternative representations of LMF, e.g., RDF (Francopoulo 2006)

Resource Description Framework (RDF)
• W3C standard (1999)
  – generic data model: directed labeled graph
    • nodes, edges, labels
  – originally developed to provide metadata about resources
    • e.g., journals in a bookstore and eBooks in an online shop
  – resources are unambiguously identified in the web of data by Uniform Resource Identifiers (URIs)
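The directed-labeled-graph model can be illustrated with plain Python tuples: each triple is an edge from a subject node to an object node, labeled with a predicate, and resources are named by URIs. This is a minimal sketch only; the `http://example.org/` URIs and the property names are placeholders, not a real vocabulary.

```python
from collections import defaultdict

EX = "http://example.org/"  # placeholder namespace (assumption)

# Each triple is (subject, predicate, object): one labeled edge in the graph.
triples = [
    (EX + "book1", EX + "title", "A Dictionary"),          # object is a literal
    (EX + "book1", EX + "publisher", EX + "publisher1"),   # object is a resource
    (EX + "publisher1", EX + "name", "Example Press"),
]

# Index outgoing edges by subject node.
graph = defaultdict(list)
for subj, pred, obj in triples:
    graph[subj].append((pred, obj))


def describe(resource):
    """Return the outgoing edges of a resource as {predicate: object}."""
    return dict(graph.get(resource, []))


book = describe(EX + "book1")
print(book[EX + "title"])
# Follow an edge to another resource: metadata about the publisher.
print(describe(book[EX + "publisher"])[EX + "name"])
```

Because the publisher is itself identified by a URI, descriptions from different sources that use the same URI can be merged simply by pooling their triples.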
