The role of RDF in records management and archiving
Graham Moore, Head of Product Development, SESAM
@gra_moore, graham.moore@sesam.io
Why are we here…?
I had been working on records management and archive-related projects and, separately, on Semantic Web technologies. I started looking at MoReq2010 and RDF, and told Gunnar about some of this work. Gunnar was interested, so he initiated a small project that let me focus more deeply on the issues, and widened the scope to include Noark5 and continuous archiving. This presentation is a summary of the report produced by that project.
Research Goals
Look at the role of RDF and related standards as the basis for:
– Standardised descriptions of RM and Archive systems data structures
– The definition of semantics
– A tool for interchange
– The provision of continuous archiving
– Unifying Noark with MoReq2010
– Demonstrating data-driven standards definitions
Agenda
Introduction to me
What is an I.T. standard?
Research Goals
Introduction to RDF, RDFCL, SOF
Modelling Noark5 and MoReq2010 in RDF
Noark5 as MoReq2010
Using RDF and SDShare for continuous archiving
Conclusion and Future work
Introduction to me
[Figure: timeline, 1996 to 2012, of work on standards and on implementing software based on standards. Items include: using SGML for interchange in an OO system (Southampton University); work on SGML, HyTime and Topic Maps; more than 10 years of ISO meetings producing XTM 1.0, 1.1, TMCL, TMDM and TMSyntax; work with LMG; creating SDShare with Makx Dekkers and Marc Kuster for CEN; meeting the authors of SPARQL and getting "brainwashed" to RDF; updating SDShare for better RDF; work on OData and an OData working group in OASIS.]
Work Themes
Standards around generalised data
– Structures, Semantics, Interchange, APIs
The mentality of standardisation
– What makes a good standard?
– What should be standardised?
– What is in and what is out?
– When to standardise?
What do I think I have learnt?
Less is more
Ambiguity is very bad
Extension through data is better than making the spec bigger
Conformance is tricky
Building on top of other people's standards really helps
How do I see Noark and MoReq2010?
They are I.T. standards that carry a great weight of expectation and responsibility. We want these standards to capture structures and operations in ways that support the data and processes of records management and archiving. These structures and operations should be constrained enough to ensure conformity and reliability, but should not place unnecessary restrictions on implementations or on domains of application.
What makes an I.T. Standard?
Conformance Touch Points
Data Models
– The central, critical piece of any standard.
– But you cannot test conformance to a data model.
– Provides guidance to implementers and conveys intent; if it's wrong, the next two things are wrong too.
Interchange Syntax
– The way in which different systems can communicate. Standardising this is about unambiguously defining the way syntactic elements map to data model constructs.
– Conformance is about detecting invalid syntactic and semantic structures.
API
– If it walks like a duck, quacks like a duck, and looks like a duck, it's a duck.
– If you declare an API and make clear the semantics of each operation, any system that adheres to those expectations is compliant.
– Used either just for conformance testing of a solution or to allow interchange of implementations.
UI, Search, Reporting
– These should never be in a standard.
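The "duck" view of API conformance can be sketched with Python's structural typing. This is only an illustration under assumptions: `RecordsApi`, `create_record`, `get_title` and `InMemoryRms` are hypothetical names, not operations defined by Noark5 or MoReq2010. The point is that conformance is judged by the operations and their declared semantics, not by what the implementation claims to be.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RecordsApi(Protocol):
    """Hypothetical API; the operation names are illustrative only."""

    def create_record(self, title: str) -> str: ...

    def get_title(self, record_id: str) -> str: ...


class InMemoryRms:
    """Toy implementation that never declares that it implements RecordsApi."""

    def __init__(self) -> None:
        self._titles: dict[str, str] = {}

    def create_record(self, title: str) -> str:
        record_id = f"record-{len(self._titles) + 1}"
        self._titles[record_id] = title
        return record_id

    def get_title(self, record_id: str) -> str:
        return self._titles[record_id]


def is_conformant(api: object) -> bool:
    # Structural check: the declared operations exist (isinstance on a
    # runtime_checkable Protocol checks method names, not signatures).
    if not isinstance(api, RecordsApi):
        return False
    # Behavioural check: one declared semantic holds, i.e. a record's
    # title can be read back after the record is created.
    record_id = api.create_record("Board minutes")
    return api.get_title(record_id) == "Board minutes"


print(is_conformant(InMemoryRms()))  # True
print(is_conformant(object()))       # False
```

A real conformance suite would test many more semantic expectations per operation; the structural check alone only proves that the names exist.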
Evolution of software
[Figure: three stages, from a standalone Windows desktop application, to a Windows desktop application backed by a Windows database, to desktop application, phone, PC and tablet clients sharing a database through an API.]
Records Management and Archive Systems Meta-Architecture
[Figure: phone, tablet and PC clients, an API, a records management system (RMS), an aggregator, an X-format protocol of submission and an archive system; the RMS and the archive each have a model with validation, and a conformance suite spans the architecture.]
Research Goals
Look at the role of RDF and related standards as the basis for:
– The descriptions of RM and Archive systems data structures
– The definition of semantics
– A tool for interchange
– The provision of continuous archiving
– Unifying Noark and MoReq2010
• Seriously, why have two standards?
– Demonstrating data-driven standards definitions
Scope
Investigative work
Explore a wide and extreme scope
Help define what work items are interesting in the future
BIG SCOPE
80 hours for Research, Report & Presentation
Introduction to RDF
Semantic technologies
A family of standards from the W3C
– the same organisation that does HTML, CSS, ...
Goal: to enable the semantic web
– interchange of structured data for machines
– not just documents for humans
Grounded on open world assumptions
– Anyone can 'express' anything, and anyone should be able to process and understand it.
RDF
The core standard
– defines the data model
– all the other parts build on this
Implemented in database products
– known as triple stores
– these often replace RDBMS databases when working with semantic technologies
An unusual data model
– schemaless
– graph database
– everything based on triples
– all objects identified by URIs
RDF
A very powerful data model with extensive use of URIs
<subject> <predicate> <object>
or
<thing> <property-type> <value>
URIs
http://sesam.io/standards/moreq2010/aggregation
Persistent, globally unique identifiers for things and concepts
Particularly useful for types, property types and controlled vocabularies.
Helps ensure that everyone uses the same identifier for the same thing.
Minted by authorities
Address as well as identity
How RDF works
'PERSON' table
ID   NAME                  EMAIL
1    Graham Moore          graham.moore@
2    Lars Marius Garshol   larsga@bouvet.no
3    Axel Borge            axel.borge@bouvet
RDF-ized data
SUBJECT                       PROPERTY   OBJECT
http://example.com/person/1   rdf:type   ex:Person
http://example.com/person/1   ex:name    Graham Moore
http://example.com/person/1   ex:email   graham.moore@
http://example.com/person/2   rdf:type   ex:Person
http://example.com/person/2   ex:name    Lars Marius Garshol
...                           ...        ...
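The table-to-triples step can be sketched in a few lines. This is a minimal illustration, not a mapping standard: it assumes the primary key is appended to a subject base URI, that each non-key column becomes a property type, and that `EX` is a hypothetical namespace standing in for `ex:`. Truncated e-mail addresses from the table are left out rather than guessed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns/"  # hypothetical namespace standing in for ex:

# Rows from the 'PERSON' table; truncated e-mail addresses are omitted.
person_rows = [
    {"ID": 1, "NAME": "Graham Moore"},
    {"ID": 2, "NAME": "Lars Marius Garshol", "EMAIL": "larsga@bouvet.no"},
    {"ID": 3, "NAME": "Axel Borge"},
]


def rows_to_triples(rows, subject_base, type_uri, key="ID"):
    """Turn each row into one rdf:type triple plus one triple per non-key column."""
    triples = []
    for row in rows:
        subject = f"{subject_base}{row[key]}"  # the primary key becomes part of a URI
        triples.append((subject, RDF_TYPE, type_uri))
        for column, value in row.items():
            if column != key:
                # The column name becomes a property type; the cell becomes the value.
                triples.append((subject, EX + column.lower(), value))
    return triples


triples = rows_to_triples(person_rows, "http://example.com/person/", EX + "Person")
for triple in triples:
    print(triple)
```

Note that a missing cell simply produces no triple, which is how a schemaless model handles optional columns.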
RDF is a graph
[Figure: the node ../1 has the edges rdf:type → foaf:Person, foaf:name → "Graham Moore", foaf:nick → "gra" and ex:works-for → Bouvet; foaf:Person has the edge rdfs:subClassOf → foaf:Agent.]
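Because every triple is an edge, the graph on this slide can be held as an adjacency index and walked. The sketch below assumes `ex:1` and `ex:Bouvet` as hypothetical identifiers for the nodes `../1` and Bouvet, and hand-rolls the RDFS entailment that propagates `rdf:type` along `rdfs:subClassOf`, which is why `ex:1` also counts as a `foaf:Agent`.

```python
from collections import defaultdict

triples = [
    ("ex:1", "rdf:type", "foaf:Person"),
    ("ex:1", "foaf:name", "Graham Moore"),
    ("ex:1", "foaf:nick", "gra"),
    ("ex:1", "ex:works-for", "ex:Bouvet"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
]

# Index outgoing edges: node -> list of (edge label, target node).
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))


def types_of(node):
    """Direct rdf:type values plus every class reachable via rdfs:subClassOf."""
    found = []
    stack = [o for p, o in edges[node] if p == "rdf:type"]
    while stack:
        cls = stack.pop()
        if cls in found:
            continue  # guards against cycles in the class hierarchy
        found.append(cls)
        stack.extend(o for p, o in edges[cls] if p == "rdfs:subClassOf")
    return found


print(edges["ex:1"])
print(types_of("ex:1"))  # ['foaf:Person', 'foaf:Agent']
```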
Two more things
Datatypes
– values can be typed with XML Schema datatypes
– means values can be stored more efficiently
– numbers sort as numbers
– user-defined datatypes also possible
Graphs
– the database is divided into graphs
– each graph is a set of triples identified by a URI
– very, very useful for subdividing the database
– we use one graph per data source
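"Numbers sort as numbers" is easy to see with a small example. Here a literal is modelled, as an assumption of the sketch, as a pair of lexical form and datatype URI; only a few XML Schema datatypes are mapped to Python types, and anything unknown falls back to string comparison.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# A literal is modelled here as (lexical form, datatype URI).
literals = [("9", XSD + "integer"), ("10", XSD + "integer"), ("100", XSD + "integer")]

# Illustrative subset of XML Schema datatypes, not a complete mapping.
CONVERTERS = {XSD + "integer": int, XSD + "decimal": Decimal, XSD + "string": str}


def value_of(literal):
    """Convert a literal to a comparable value using its datatype."""
    lexical, datatype = literal
    return CONVERTERS.get(datatype, str)(lexical)


as_strings = sorted(lex for lex, _ in literals)
as_typed = sorted(literals, key=value_of)
print(as_strings)                    # ['10', '100', '9']
print([lex for lex, _ in as_typed])  # ['9', '10', '100']
```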
RDF Merging
[Figure: two graphs that share node 1 are merged into a single graph in which node 1 is connected to nodes 5, 3 and 6.]
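When nodes are identified by URIs, merging two RDF graphs is just the set union of their triples. In this sketch the URIs under `http://example.com/` are placeholders for the numbered nodes, and `graph_b` deliberately repeats one triple from `graph_a` to show that duplicates collapse. This only works this simply for URI-identified nodes; blank nodes from different graphs have to be kept apart when merging.

```python
def node(n):
    """Placeholder URI for the numbered nodes in the figure."""
    return f"http://example.com/thing/{n}"


REL = "http://example.com/ns/related"  # hypothetical property type

graph_a = {(node(1), REL, node(5)), (node(1), REL, node(3))}
# graph_b repeats one triple from graph_a to show that duplicates collapse.
graph_b = {(node(1), REL, node(6)), (node(1), REL, node(5))}

merged = graph_a | graph_b  # merging is set union over triples

print(len(merged))  # 3
print(sorted(o for s, p, o in merged if s == node(1)))
```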
RDF Tracking Data Origins
[Figure: an RDF store holds Graph 1, containing nodes 1, 5 and 3, and Graph 2, containing nodes 1 and 6; "Show me thing 1" returns 5, 3 and 6.]
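Keeping one graph per data source amounts to storing quads: a triple plus the URI of the graph it came from. The sketch below uses hypothetical graph URIs and placeholder node names. It shows that asking about thing 1 still returns everything, each value labelled with its origin, and that reloading one source only touches that source's graph.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    value: str
    graph: str  # URI of the named graph, here one per data source


G1 = "http://example.com/graph/1"  # hypothetical graph URIs
G2 = "http://example.com/graph/2"
REL = "ex:related"  # hypothetical property type

store = [
    Quad("ex:1", REL, "ex:5", G1),
    Quad("ex:1", REL, "ex:3", G1),
    Quad("ex:1", REL, "ex:6", G2),
]


def show(subject, quads):
    """Everything known about subject, each value paired with its source graph."""
    return [(q.value, q.graph) for q in quads if q.subject == subject]


def replace_graph(quads, graph, new_quads):
    """Reload one data source: drop its graph and add the fresh quads."""
    return [q for q in quads if q.graph != graph] + list(new_quads)


before = show("ex:1", store)
print(before)
store = replace_graph(store, G2, [Quad("ex:1", REL, "ex:7", G2)])
print(show("ex:1", store))
```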
Many Serialisation formats
N-Triples
– simple line-based format
RDF/XML
– RDF in XML
– the original format
– truly horrific
• slowed RDF adoption
Turtle
– nice, human-readable text format
– a bit more work to parse
JSON-LD
– RDF in JSON
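The simplicity of N-Triples shows in how little code a writer and reader need. This is a deliberately restricted sketch: it supports only IRIs and plain string literals (no language tags, datatypes or blank nodes), and the `IRI` marker class and the example comment text are assumptions of the sketch. Serialising and parsing back yields the same set of triples.

```python
import re


class IRI(str):
    """Marks a string as an IRI rather than a plain literal."""


def term(value):
    """Serialise one term: IRIs in angle brackets, other strings as quoted literals."""
    if isinstance(value, IRI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple, each terminated by ' .'"""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in sorted(triples))


# Matches only the subset written above: IRI subject, IRI predicate,
# and either an IRI or a plain string literal as the object.
LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.$')
UNESCAPE = {"n": "\n", '"': '"', "\\": "\\"}


def from_ntriples(text):
    triples = set()
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"outside the supported subset: {line!r}")
        s, p, iri, literal = match.groups()
        if iri is not None:
            o = IRI(iri)
        else:
            o = re.sub(r"\\(.)", lambda m: UNESCAPE[m.group(1)], literal)
        triples.add((IRI(s), IRI(p), o))
    return triples


FOAF = "http://xmlns.com/foaf/0.1/"
person = IRI("http://example.com/person/1")
triples = {
    (person, IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"), IRI(FOAF + "Person")),
    (person, IRI(FOAF + "name"), "Graham Moore"),
    (person, IRI("http://www.w3.org/2000/01/rdf-schema#comment"), 'A "quoted" word\nand a second line'),
}
text = to_ntriples(triples)
print(text, end="")
```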
Turtle example
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

_:lmg a foaf:Person ;
    foaf:name "Graham Moore" ;
    foaf:nick "gra" ;
    ex:works-for _:bouvet .

foaf:Person a rdfs:Class ;
    rdfs:label "Person" ;
    rdfs:subClassOf foaf:Agent .
Benefit of RDF Serialisation
In XML serialisation systems, the XML is almost never the internal operational model, so a lot of effort is spent describing how the model maps to the syntactic representation.
With RDF, the serialisation is a direct serialisation of the data being used operationally.
SPARQL
The RDF query language
– lots of implementations
Standardised protocol
– HTTP-based and very simple
– XML, JSON, RDF formats for results
Very like SQL in some ways
– totally unlike it in others
Also has an update language
– update, insert, delete, clear, ...
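How simple the protocol is can be seen by building a query request with the standard library. In the SPARQL 1.1 Protocol a query can be sent as an HTTP GET with the query text in the `query` parameter, and `application/sparql-results+json` asks for JSON results. The endpoint URL below is a hypothetical placeholder; the request is only constructed here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://localhost:3030/sparql"  # hypothetical endpoint URL

QUERY = """prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?person ?name
where { ?person a foaf:Person . ?person foaf:name ?name . }
order by ?name"""


def sparql_get_request(query: str) -> Request:
    """Build (but do not send) a SPARQL query request using HTTP GET."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


req = sparql_get_request(QUERY)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a live endpoint.
```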
All persons sorted by name
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?person ?name
where {
  ?person a foaf:Person .
  ?person foaf:name ?name .
}
order by ?name
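What a SPARQL engine does with this query can be approximated by matching each triple pattern against the data and joining the variable bindings. This is a toy evaluator under stated assumptions, not how any particular triple store works: terms are plain strings, names starting with `?` are variables, `ex:1` to `ex:3` are placeholder person identifiers, and `order by ?name` becomes a Python sort.

```python
data = [
    ("ex:1", "rdf:type", "foaf:Person"),
    ("ex:1", "foaf:name", "Graham Moore"),
    ("ex:2", "rdf:type", "foaf:Person"),
    ("ex:2", "foaf:name", "Lars Marius Garshol"),
    ("ex:3", "rdf:type", "foaf:Person"),
    ("ex:3", "foaf:name", "Axel Borge"),
]


def match(pattern, triple, binding):
    """Try to match one triple pattern; return the extended binding or None."""
    extended = dict(binding)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):  # a variable
            if extended.setdefault(part, value) != value:
                return None  # already bound to something else
        elif part != value:  # a constant that does not match
            return None
    return extended


def evaluate(patterns, triples):
    """Join the patterns left to right, as in a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# 'a' in the SPARQL query is shorthand for rdf:type.
query = [("?person", "rdf:type", "foaf:Person"), ("?person", "foaf:name", "?name")]
results = sorted(evaluate(query, data), key=lambda row: row["?name"])  # order by ?name
for row in results:
    print(row["?person"], row["?name"])
```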
Why RDF for RMS / Archive standards and systems?
The data model is powerful and flexible.
We need to be able to combine core models with data from any domain.
URIs provide a strong basis for commonly agreed identifiers in core and domain areas, e.g. what is the identifier for a vehicle registration date?
All of this is in data, not in prose.
Interchange comes for free.
Merging comes for free.
Types and property types are in DATA – extensibility is built in.
Research Activities
– Enabler: RDF Constraint Language
– MoReq2010 Model and Constraints as RDF
– Noark5 Model and Constraints as RDF
– Semantic Operations Framework: demonstrate how operations can be defined in terms of the RDF models
– Noark5 as MoReq2010 RDF Constraint Language
– Noark5 Domain Extension as RDF Constraint Language
RDF Constraint Language
The RDF family of standards includes RDFS and OWL.
These are designed to support an open world model that is primarily about inference.
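The difference matters for records management. Under the open world assumption a record with no title is simply a record whose title is not yet known, whereas a conformance check wants to report it. The sketch below shows the kind of closed-world cardinality check a constraint language could express; the `ex:Record` class, the `ex:title` property and the constraint tuple format are all hypothetical, not taken from Noark5, MoReq2010 or any published constraint language.

```python
from collections import Counter

data = [
    ("ex:r1", "rdf:type", "ex:Record"),
    ("ex:r1", "ex:title", "Board minutes"),
    ("ex:r2", "rdf:type", "ex:Record"),  # no title at all
    ("ex:r3", "rdf:type", "ex:Record"),
    ("ex:r3", "ex:title", "Budget"),
    ("ex:r3", "ex:title", "Budget (draft)"),  # two titles
]

# Hypothetical constraint format: (class, property, min count, max count).
constraints = [("ex:Record", "ex:title", 1, 1)]


def violations(triples, constraints):
    """Closed-world check: count values actually present, treat absence as zero."""
    triples = set(triples)  # RDF graphs are sets, so ignore duplicate triples
    counts = Counter((s, p) for s, p, _ in triples)
    found = []
    for cls, prop, low, high in constraints:
        instances = sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)
        for instance in instances:
            n = counts[(instance, prop)]
            if not low <= n <= high:
                found.append((instance, prop, n))
    return found


print(violations(data, constraints))  # [('ex:r2', 'ex:title', 0), ('ex:r3', 'ex:title', 2)]
```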
Recommend
More recommend