TIB|AV-Portal: Challenges managing audiovisual metadata encoded in RDF

Jörg Waitelonis, yovisto GmbH
Margret Plank, German National Library of Science and Technology (TIB), Hannover
Prof. Dr. Harald Sack, HPI-Potsdam / FIZ Karlsruhe & KIT

SWIB16, Semantic Web in Libraries Conference 2016, 28-30 November 2016, Bonn, Germany | http://swib.org/

WELCOME

yovisto: Jörg Waitelonis, Christian Hentschel, Prof. Dr. Harald Sack

What we do:
■ Intelligent Linked Data, Ontology and Metadata Management
■ Knowledge Discovery & Knowledge Mining
■ Video & Image Analysis, User Interfaces, Visualization

Based in: August Bebel Str. 26-53, 14482 Potsdam, Germany

TIB|AV-Portal (http://av.tib.eu/)

Developed in cooperation by:
■ German National Library of Science and Technology (TIB), Hannover
■ Hasso-Plattner-Institute for IT-Systems Engineering (HPI), Potsdam

Hosted and maintained by:
■ yovisto GmbH, Potsdam
■ flowworks GmbH, München (Asset Management, Playout)

> 8000
■ Lectures
■ Conference talks
■ Interviews
■ Simulations
■ Visualizations
■ Research Data

for
■ Scientists
■ Lecturers
■ Teachers
■ Learners

TIB|AV-Portal (architecture diagram, http://av.tib.eu/)

Users, Customers, Uploader: Video Upload, Search, View
TIB Curators: Manage Metadata (☑ approved): DOI, QA, Rights clearance
Components: Ingest, Workflow Management, Media Asset Management, Streaming, Search Index, RDF Triplestore

AV-Analysis:
■ Video Segmentation
■ Optical Character Recognition (OCR)
■ Speech-to-text (ASR)
■ Visual Concept Detection (VCD)

Semantic Analysis:
■ Context Modelling
■ Named Entity Linking

TIB|AV-Portal
Semantic metadata analysis with Named Entity Linking

Textual Metadata

Authoritative:
■ Formal, descriptive, technical
■ E.g. title, description, keywords, etc.
■ Manually authored
■ Refers to entire video (coarse grained)

Non-authoritative:
■ E.g. ASR/OCR transcripts
■ Automatically extracted
■ Refers to fragments of the video (fine grained)

Both kinds of metadata are mapped to the knowledge base:
■ 63,356 GND subject headings (GND = Gemeinsame Normdatei, Integrated Authority File)
■ Incl. English translations from mappings to DBpedia, LCSH, MACS and WTI Thesaurus

[1] Sven Strobel, Paloma Marín-Arraiza: Metadata for Scientific Audiovisual Media: Current Practices and Perspectives of the TIB|AV-Portal, in Proc. of Metadata and Semantics Research: 9th Research Conference, MTSR 2015, Manchester, UK, 2015, Springer

http://dx.doi.org/10.5446/357#t=49:03,53:58

TIB|AV-Portal
The Data Model & RDF-Export

Why RDF?
■ Extensible
■ Different serialization forms
■ Interoperable
■ Queryable (SPARQL)
■ W3C standard

How?
■ Vocabulary selection
■ Problem: heterogeneous metadata
  ○ authoritative, spatio-temporal, nested annotations
■ Vocab discovery -> http://lov.okfn.org/
■ Selection criteria

[2] Jörg Waitelonis, Margret Plank, Harald Sack: TIB|AV-Portal: Integrating Automatically Generated Video Annotations into the Web of Data, in Proc. of 20th International Conference on Theory and Practice of Digital Libraries (TPDL 2016)

TIB|AV-Portal
The Data Model & RDF-Export
Vocabulary Selection Issues:

■ Availability on the Web
■ Openness
■ Level of complexity/richness
■ Maintained
■ Trustworthy authorship
■ Usage by others / popularity
■ Documentation
■ Adequate meaning
■ Specificity
■ Datatypes
■ Avoid contradictions, e.g.
  ○ Domain & range / sub- & super-class
  ○ Datatype vs. object properties
■ Does it fit currently used models

cf. http://wiki.dublincore.org/index.php/Vocabulary_evaluation,_selection_and_re-use

TIB|AV-Portal
The Data Model & RDF-Export
Standard Metadata and Basic Structure

■ DCMI Metadata Terms -> http://purl.org/dc/terms/
■ DCMI Type Vocabulary -> http://purl.org/dc/dcmitype/
■ schema.org Vocabulary -> http://schema.org/
■ Friend of a Friend Vocabulary 0.1 -> http://xmlns.com/foaf/
■ Bibframe Vocabulary -> http://bibframe.org/vocab/

    tib:video/16453
        schema:name "Wall-crossing and geometry at infinity of Betti moduli spaces"@en ;
        schema:description "Linear algebraic differential equation (in one variable) ..."@en ;
        schema:keywords "Betti moduli"@en , "chaos theory"@en , "singularity"@en ;
        schema:dateCreated "1973-01-01T00:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
        schema:duration 1:16:48 ;
        rdf:type schema:Movie ;
        schema:url <https://av.tib.eu/media/16453> ;
        schema:producer gnd:4028361-6 ;
        schema:publisher tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 ;
        schema:license <http://creativecommons.org/licenses/by/3.0/deed.en> ;
        schema:availability schema:OnlineOnly ;
        bibframe:doi <http://dx.doi.org/10.5446/16453> ;
        schema:thumbnailUrl <https://av.tib.eu/images/avpimg1fdaede78b338bba137140fd805cd382> .
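
The `schema:duration` value in the record above is written as a clock string (`1:16:48`), while schema.org describes `duration` values in ISO 8601 format. The following is only a minimal sketch of how such a value could be converted before export; the function name `to_iso8601_duration` is hypothetical and not part of the portal's code.

```python
def to_iso8601_duration(clock: str) -> str:
    """Convert an H:MM:SS clock string into an ISO 8601 duration string.

    Hypothetical helper for illustration; it assumes exactly three
    colon-separated integer fields (hours, minutes, seconds).
    """
    parts = clock.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected H:MM:SS, got {clock!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    return f"PT{hours}H{minutes}M{seconds}S"


# Duration taken from the example record on the slide.
iso_duration = to_iso8601_duration("1:16:48")
print(iso_duration)
```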

TIB|AV-Portal
The Data Model & RDF-Export
Spatio-temporal Metadata

■ Open Annotation Data Model (OA) -> http://w3.org/ns/oa#
■ NLP Interchange Format (NIF) -> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#

    tib:video/16453#t=smpte-25:0:05:00:22,0:05:03:00
        dcterms:isPartOf tib:video/16453 .

    tib:asr/16453_13753838_7522
        oa:hasTarget tib:video/16453#t=smpte-25:0:05:00:22,0:05:03:00 ;
        oa:annotatedBy tib:annotator/ASR-1.0.0 ;
        rdf:type oa:Annotation ;
        oa:hasBody tib:asr/16453_13753838_7522#char=0,5617 .

    tib:asr/16453_13753838_7522#char=0,5617
        rdf:type nif:Context ;
        rdf:type nif:RFC5147String ;
        nif:isString "... five sets ..." .

    tib:asr/16453_13753838_7522#char=4743,4747
        nif:referenceContext tib:asr/16453_13753838_7522#char=0,5617 ;
        itsrdf:taIdentRef gnd:4038613-2 ;
        itsrdf:taAnnotatorsRef tib:annotator/NEL-1.0.0 ;
        rdf:type nif:Phrase ;
        rdf:type nif:String ;
        nif:beginIndex "4743" ;
        nif:endIndex "4747" ;
        nif:anchorOf "sets" .
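
The annotation targets above address a video segment with a temporal fragment of the form `#t=smpte-25:h:mm:ss:ff,h:mm:ss:ff`, that is, SMPTE timecodes at 25 frames per second. As a rough sketch of how a consumer of the export might turn such a fragment into seconds, the snippet below uses only the Python standard library; the function names are illustrative assumptions, not part of any portal API.

```python
import re

# Matches fragments such as "#t=smpte-25:0:05:00:22,0:05:03:00".
SMPTE_FRAGMENT = re.compile(
    r"#t=smpte-(?P<fps>\d+):(?P<start>[\d:]+),(?P<end>[\d:]+)$"
)


def timecode_to_seconds(timecode: str, fps: int) -> float:
    """Convert an h:mm:ss:ff timecode into seconds at the given frame rate."""
    hours, minutes, seconds, frames = (int(p) for p in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def parse_fragment(uri: str) -> tuple[float, float]:
    """Return (start, end) in seconds for a URI ending in an smpte-N fragment."""
    match = SMPTE_FRAGMENT.search(uri)
    if match is None:
        raise ValueError(f"no smpte temporal fragment in {uri!r}")
    fps = int(match["fps"])
    return (
        timecode_to_seconds(match["start"], fps),
        timecode_to_seconds(match["end"], fps),
    )


# Fragment taken from the slide's example annotation target.
start, end = parse_fragment("tib:video/16453#t=smpte-25:0:05:00:22,0:05:03:00")
print(start, end)
```

Frame counts are divided by the frame rate named in the fragment, so the same helper would also cover other integer frame rates.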

TIB|AV-Portal
The Data Model & RDF-Export
Open Annotation

(Graph diagram) The annotation tib:asr/16453_13753838 has:
■ rdf:type oa:Annotation
■ oa:hasTarget tib:video/16453#t=smpte-25:0:23:12:12,0:23:14:4
■ oa:annotatedBy tib:annotator/ASR-1.0.0
■ oa:hasBody “... the astronaut …”, gnd:11896416X

TIB|AV-Portal
The Data Model & RDF-Export
NLP Interchange Format (NIF)

(Graph diagram, data available at http://av.tib.eu/opendata)

Annotation tib:asr/16453_13753838:
■ rdf:type oa:Annotation
■ oa:hasTarget tib:video/16453#t=smpte-25:0:23:12:12,0:23:14:4
■ oa:annotatedBy tib:annotator/ASR-1.0.0
■ oa:hasBody tib:asr/16453_13753838#char=0,62

Context tib:asr/16453_13753838#char=0,62:
■ rdf:type nif:Context, nif:RFC5147String
■ nif:isString “... the astronaut …”

Phrase tib:asr/16453_13753838#char=23,32:
■ rdf:type nif:Phrase, nif:String
■ nif:referenceContext tib:asr/16453_13753838#char=0,62
■ nif:anchorOf “astronaut”
■ nif:beginIndex 23
■ nif:endIndex 32
■ itsrdf:taIdentRef gnd:11896416X
■ itsrdf:taAnnotatorsRef tib:annotator/NEL-1.0.0
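
In the NIF structure above, a phrase is identified by its character offsets within the context string, and `nif:anchorOf` should equal the substring between `nif:beginIndex` and `nif:endIndex`. The sketch below shows one way to derive those values for a single entity mention; the helper `nif_phrase`, the base URI `tib:asr/example`, and the short sample text are assumptions made for illustration, since the real transcript is elided on the slide.

```python
def nif_phrase(base_uri: str, text: str, surface: str, entity: str) -> dict:
    """Describe the first occurrence of `surface` in `text` as NIF-style properties.

    Hypothetical helper: it returns a plain dict keyed by the property
    names used on the slide instead of building real RDF triples.
    """
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} does not occur in the context string")
    end = begin + len(surface)
    return {
        "uri": f"{base_uri}#char={begin},{end}",
        "nif:referenceContext": f"{base_uri}#char=0,{len(text)}",
        "nif:beginIndex": begin,
        "nif:endIndex": end,
        "nif:anchorOf": text[begin:end],
        "itsrdf:taIdentRef": entity,
    }


# Sample text is invented; the GND identifier is the one shown on the slide.
context = "the astronaut"
phrase = nif_phrase("tib:asr/example", context, "astronaut", "gnd:11896416X")
for key, value in phrase.items():
    print(key, value)
```

Deriving `nif:anchorOf` from the same offsets that go into the URI keeps the three values consistent by construction.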

TIB|AV-Portal
Data Quality

Standard metadata
■ Concise manual verification and clearing by TIB subject specialists

-> use the verified information to improve subsequent analysis

Automatically created metadata
■ AV analysis
  ○ typical detection errors (ASR, OCR, etc.)
■ Semantic analysis
  ○ missing annotations
  ○ wrong annotations
  ○ knowledge base errors and insufficiencies

TIB|AV-Portal
Data Quality: Video Text Recognition
Improving OCR

Example video (http://dx.doi.org/10.5446/15907):
■ Title: "Lecture on Science and Creativity"
■ Author: “Kroto, Harold”

Extend OCR vocabulary (per video) with
■ subject specific terminology
■ terminology from manually verified metadata

OCR detects “Kyoto”

Before OCR:
■ extend the OCR language model & subsequent spell-correction with terms from authoritative metadata (e.g.: Creativity, Harold, Kroto, Lecture, Science)
➥ OCR now detects “Kroto”.
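
The spell-correction step described above can be illustrated with a small sketch. It is not the portal's OCR pipeline: it stands in Python's standard `difflib` string similarity for the real language model, and the function name `correct_token` and the similarity cutoff of 0.75 are assumptions chosen for the example.

```python
import difflib


def correct_token(token: str, vocabulary: list[str], cutoff: float = 0.75) -> str:
    """Replace an OCR token with the closest per-video vocabulary term, if any.

    Hypothetical helper: matching is case-insensitive and the term is
    returned in the spelling used in the authoritative metadata.
    """
    by_lower = {term.lower(): term for term in vocabulary}
    lowered = token.lower()
    if lowered in by_lower:
        return by_lower[lowered]
    matches = difflib.get_close_matches(lowered, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else token


# Terms from the manually verified title and author, as listed on the slide.
metadata_terms = ["Creativity", "Harold", "Kroto", "Lecture", "Science"]

corrected = correct_token("Kyoto", metadata_terms)
print(corrected)
```

A purely similarity-based correction like this would also rewrite a correctly recognized "Kyoto", which is one reason to restrict the extended vocabulary to a single video, as the slide suggests.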