
Curation Technologies for Multilingual Europe, Georg Rehm, DFKI

Georg Rehm, DFKI, Germany. META-FORUM 2016, Lisbon, Portugal, 04/05 July 2016.


  1. Curation Technologies for Multilingual Europe
     Georg Rehm, DFKI, Germany
     META-FORUM 2016 – Lisbon, Portugal – 04/05 July 2016

  2. • Author • Scholar • TV editor • Researcher • Knowledge worker • Investigative journalist • Designer of an exhibition • Curator of digital information
     [Figure: information flowing into the work of each role, structured as Input, Processes, Software, Output]

  3. Sectors: Input, Processes, Software, Output
     • Input: tweet, newspaper article, wire copy, Facebook status update, search result, email, text message, concept, text file, video, map, stock photo, in-house database, etc.
     • Processes: analyse, select, focus, revise, read up on, write, create, research, assess, evaluate, arrange, sort, structure, summarise, shorten, translate, catch up on, combine, abstract, integrate, visualise, generate, annotate, reference, etc.
     • Software: text processor, presentation, spreadsheet, email, browser, groupware, sector-specific application, CMS, ECMS, CRM, enterprise software, graphics/layouting software, IP telephony, etc.
     • Output: newspaper article, multimedia website, TV report, exhibition catalogue, mobile application, mashup (e.g., map), text piece, concept, timeline, study, presentation, fact collection, description of an exhibit, calendar entry, analysis, spreadsheet, archive, etc.

  6. Digital Curation Technologies
     • Make curation processes in four SMEs (and sectors) more efficient through language and knowledge technologies.
     • Technology transfer project to arrive at proofs of concept.
     • Curation services for real companies and real use cases.
     • The human expert/curator is always at the centre and in the loop.
     • Platform for digital curation technologies: innovation boost.
     [Figure: layered architecture, top to bottom: sector-specific solutions, sector-specific technologies, platform technologies, curation technologies, language and knowledge technologies]

  7. Curation Processes
     Processing, exploration and re-aggregation of domain- and task-specific document collections.
     • Structure visualisation
     • Multilingual multimedia sources
     • Crossmedia recommendations
     • Curation Dashboard
     • Multilingual summarisation
     • Event timelining
     • Semantification of content
     • Multilingual sentiment analysis
     • Semantic storytelling
     • Ontology-based knowledge structures
     • Automatic hyperlinking of document collections

  8. Key Characteristics
     • Technology transfer and integration project
     • Broad set of tools and technologies
     • Focus on building proofs of concept
     • Our technologies don’t have to be perfect
     • Human expert, i.e., the curator, always in the loop
     • Important for all SME partners: domain adaptability
     • Work packages (WPs): Semantic Analysis, Semantic Generation, Multilingual Technologies, Integration into Curation Tech

  9. [Figure: the platform for digital curation technologies. Clients use the REST API; a broker connects the API to curation services 1 and 2, which draw on language or knowledge technologies and external services; services can be chained into a pipelined curation workflow.]
     • Each curation process is an e-service available through a REST API.
     • Services can be combined to form pipelines or workflows.
     • Domain adaptability: every curation process has a training API to create and use domain-specific models.
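The broker-and-pipeline arrangement on this slide can be illustrated with a minimal, self-contained sketch. It assumes that each curation service can be modelled as a function from a document to an enriched document; `Broker`, `toy_ner` and `toy_counter` are hypothetical names, and the real platform exchanges NIF documents over REST rather than Python dicts.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a NIF document: text plus accumulated annotations.
Document = Dict[str, Any]
Service = Callable[[Document], Document]


class Broker:
    """Hypothetical broker: registers curation services by name and
    runs them in sequence as a pipelined curation workflow."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self._services[name] = service

    def run_pipeline(self, names: List[str], doc: Document) -> Document:
        for name in names:
            if name not in self._services:
                raise KeyError(f"unknown curation service: {name}")
            # Each service receives the enriched output of the previous one.
            doc = self._services[name](doc)
        return doc


def toy_ner(doc: Document) -> Document:
    # Toy stand-in for an NER e-service: looks up tokens in a fixed name set.
    known = {"Roskilde", "Copenhagen"}
    tokens = [w.strip(".,") for w in str(doc["text"]).split()]
    return {**doc, "entities": [t for t in tokens if t in known]}


def toy_counter(doc: Document) -> Document:
    # A second service that consumes the annotations added by the first.
    return {**doc, "entity_count": len(doc.get("entities", []))}


broker = Broker()
broker.register("ner", toy_ner)
broker.register("count", toy_counter)
result = broker.run_pipeline(
    ["ner", "count"],
    {"text": "to protect Roskilde, then Copenhagen from seaborne assault"},
)
print(result["entities"], result["entity_count"])  # ['Roskilde', 'Copenhagen'] 2
```

Keeping each service a pure document-to-document function is what makes arbitrary chaining possible; in a REST setting the same role would be played by each e-service accepting and returning the same document format.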

  10. Current Results
     • Implemented the following baseline services:
       – NER: e-entityrecognition e-service
       – Geolocation: e-entityrecognition and visualisation
       – Temporal Analyser: e-entityrecognition and visualisation
       – Classification: e-classification e-service
       – Clustering: e-clustering e-service
       – Machine Translation: e-translation e-service
     • Curation Dashboard (first prototype)
     • Semantic Storytelling (work in progress)

  11. NER, Entity Linking, Geolocation
     • Currently based on OpenNLP (with NIF integration):
       – Mode 1: model-based (for domains where annotated data is available)
       – Mode 2: dictionary-based (for domains where only a list of names is available)
     • Entity linking through SPARQL queries to DBpedia.
     • For locations, GPS coordinates are retrieved; the document-level average and standard deviation (over all locations) are calculated to visualise the positioning of documents on a map.
     Example input (plain text → NIF enrichment → visualisation):
       ... In the Viking colony of Iceland, an extraordinary vernacular literature blossomed in the 12th through 14th centuries ...
       ... The ships were scuttled there in the 11th century, to block a navigation channel and thus protect Roskilde, then Copenhagen from seaborne assault ...
       ... Viking Age inscriptions have also been discovered on the Manx runestones on the Isle of Man. ...
     Service: http://api.digitale-kuratierung.de/api/e-nlp/namedEntityRecognition?analysis=ner
     Visualisation: http://dev.digitale-kuratierung.de/admini/pages/geolocalization.php
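The entity-linking and geolocation steps could look roughly like the following sketch. It assumes that linked locations are DBpedia resources carrying WGS84 `geo:lat`/`geo:long` properties and that the public endpoint `https://dbpedia.org/sparql` is queried; the project's actual queries are not shown on the slide. `coordinates_query`, `build_request` and `document_position` are hypothetical helpers, the request is only prepared (not sent), the coordinates in the example are toy values, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public DBpedia endpoint, not necessarily the one the project used.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def coordinates_query(resource_uri: str) -> str:
    """SPARQL query for the WGS84 coordinates of one linked DBpedia resource."""
    return (
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        f"SELECT ?lat ?long WHERE {{ <{resource_uri}> geo:lat ?lat ; geo:long ?long . }}"
    )


def build_request(resource_uri: str) -> Request:
    """Prepare (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = DBPEDIA_SPARQL + "?" + urlencode({"query": coordinates_query(resource_uri)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def document_position(
    coords: List[Tuple[float, float]],
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Average and population standard deviation of latitude and longitude
    over all locations found in one document."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (mean(lats), mean(lons)), (pstdev(lats), pstdev(lons))


request = build_request("http://dbpedia.org/resource/Roskilde")
center, spread = document_position([(55.0, 12.0), (57.0, 14.0)])  # toy coordinates
print(center, spread)  # (56.0, 13.0) (1.0, 1.0)
```

Averaging latitude and longitude arithmetically is a planar approximation: it is adequate for places that lie close together, but misleading for locations on either side of the antimeridian.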

  12. NER Training
     • http://api.digitale-kuratierung.de/api/e-nlp/trainModel?analysis=dict (in the suboptimal case that only a list of terms and their URIs in an ontology is available)
     • http://api.digitale-kuratierung.de/api/e-nlp/trainModel?analysis=ner (if annotated training data is available)
     • The resulting NER model is directly usable on new input.
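In the dictionary-based mode, a list of terms and their ontology URIs is all there is to work with. The sketch below shows one simple way such a list can be applied to new input: a longest-match regular-expression gazetteer tagger. It is not the OpenNLP-based implementation; `build_matcher` and `tag` are hypothetical names, and the `example.org` URI is a placeholder.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface form -> ontology URI.
Gazetteer = Dict[str, str]


def build_matcher(gazetteer: Gazetteer) -> "re.Pattern[str]":
    # Longer names first, so "Isle of Man" wins over the shorter entry "Man".
    names = sorted(gazetteer, key=len, reverse=True)
    # \b on both sides stops "Man" from matching inside "Manx".
    return re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")


def tag(text: str, gazetteer: Gazetteer) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, URI) for every dictionary match."""
    if not gazetteer:
        return []
    matcher = build_matcher(gazetteer)
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in matcher.finditer(text)
    ]


gazetteer = {
    "Isle of Man": "http://dbpedia.org/resource/Isle_of_Man",
    "Man": "http://example.org/hypothetical/Man",
    "Roskilde": "http://dbpedia.org/resource/Roskilde",
}
text = "Viking Age inscriptions have also been discovered on the Manx runestones on the Isle of Man."
for start, end, surface, uri in tag(text, gazetteer):
    print(start, end, surface, uri)
```

The character offsets are kept because a NIF enrichment anchors each annotation to a begin and end index in the source text.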

  13. Temporal Analysis
     • Sort and rank documents from a collection on a chronological scale.
     • Developed a rule-based system because of our focus in terms of languages (EN, DE), domain adaptability and normalisation requirements.
     • Analysis of temporal expressions in a document (or, later, in paragraphs or even sentences).
     • Compute the mean value for date and time, allowing positioning on a timeline.
     • Future plans: adaptability through user-specific rules.
     • Related work: SUTime, HeidelTime, Tango, TARSQI; many papers at LREC 2016.
     Example input (plain text → NIF enrichment → visualisation):
       ... The ships were scuttled there in the 11th century, to block a navigation channel and thus protect Roskilde, then Copenhagen from seaborne assault ...
       ... Viking Age inscriptions have also been discovered on the Manx runestones on the Isle of Man. ...
       ... In the Viking colony of Iceland, an extraordinary vernacular literature blossomed in the 12th through 14th centuries ...
     [Figure: documents positioned on a timeline with axis labels 900 and 1600]
     Service: http://api.digitale-kuratierung.de/api/e-nlp/namedEntityRecognition?analysis=temp
     Visualisation: http://dev.digitale-kuratierung.de/admini/pages/timelining.php
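A rule-based temporal analyser in the spirit of this slide can be sketched with two regular-expression rules for English century expressions, a normalisation of each century to a year interval, and the mean of the interval midpoints as the document's position on the timeline. Everything here is an assumption for illustration: the rules cover only English centuries, the convention that the 11th century spans 1000 to 1099 is a choice, and `RANGE_RULE`, `SINGLE_RULE`, `document_date` and `rank_chronologically` are hypothetical names, not the project's system.

```python
import re
from statistics import mean
from typing import List, Optional, Tuple

ORDINAL = r"(\d{1,2})(?:st|nd|rd|th)"
# Rule 1: century ranges such as "12th through 14th centuries" or "12th to 14th century".
RANGE_RULE = re.compile(
    ORDINAL + r"\s*(?:through|to|-)\s*" + ORDINAL + r"\s+centur(?:y|ies)\b", re.IGNORECASE
)
# Rule 2: single centuries such as "11th century".
SINGLE_RULE = re.compile(ORDINAL + r"\s+century\b", re.IGNORECASE)


def century_span(n: int) -> Tuple[int, int]:
    """Normalise the n-th century to a year interval (assumed convention: 11th = 1000-1099)."""
    start = (n - 1) * 100
    return start, start + 99


def temporal_midpoints(text: str) -> List[float]:
    """Midpoint year of every century expression found in the text."""
    points: List[float] = []

    def take_range(m: "re.Match[str]") -> str:
        start = century_span(int(m.group(1)))[0]
        end = century_span(int(m.group(2)))[1]
        points.append((start + end) / 2)
        return " " * len(m.group(0))  # blank out so rule 2 cannot match it again

    remaining = RANGE_RULE.sub(take_range, text)
    for m in SINGLE_RULE.finditer(remaining):
        start, end = century_span(int(m.group(1)))
        points.append((start + end) / 2)
    return points


def document_date(text: str) -> Optional[float]:
    """Mean of all normalised dates in a document, or None if there are none."""
    points = temporal_midpoints(text)
    return mean(points) if points else None


def rank_chronologically(docs: List[str]) -> List[Tuple[Optional[float], str]]:
    """Sort documents by mean date; documents without any date go last."""
    dated = [(document_date(d), d) for d in docs]
    return sorted(dated, key=lambda p: (p[0] is None, p[0] or 0.0))


docs = [
    "In the Viking colony of Iceland, an extraordinary vernacular literature "
    "blossomed in the 12th through 14th centuries",
    "The ships were scuttled there in the 11th century, to block a navigation channel",
    "Viking Age inscriptions have also been discovered on the Manx runestones on the Isle of Man.",
]
for date, doc in rank_chronologically(docs):
    print(date, doc[:40])
```

The Isle of Man excerpt contains no century expression, so it receives no date and is placed last; a fuller rule set (dates, years, German expressions) would be needed before such documents could be positioned.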
