Analysis of library metadata with Metafacture
Christoph Böhme
SWIB 2013 | 25 November 2013
Agenda
13:00 - a short introduction to Metafacture
13:30 - warm-up exercises
14:30 - triples and counting
15:00 - exercises on counting data (incl. 30 min coffee break at 15:30)
17:00 - joining data sets and analysing them
17:30 - exercises on joining data
18:50 - wrapping up
Part 1: A short introduction to Metafacture
Overview of Metafacture
• Flux: a DSL* for constructing processing flows
• Metamorph: a stream module with a DSL* for metadata transformation
• Stream modules: building blocks for processing flows
* DSL: domain-specific language
The basic building block of Metafacture: the stream module
Receives typed input:
• strings
• triples
• objects
• metadata events
Sends typed output:
• strings
• triples
• objects
• metadata events
A stream module processes input to create some output. Modules usually perform rather small tasks to foster reusability.
A simple processing flow
Read and print a file containing Pica records. The flow starts with a string (the file name) and passes through five modules:
• open-file: receives a string (the file name), sends a file handle
• as-lines: receives a file handle, sends a string for each line
• decode-pica: receives a string, sends metadata events
• encode-formeta: receives metadata events, sends a string
• write("stdout"): receives a string, sends nothing
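The chaining idea above can be illustrated outside Metafacture. The following is a minimal Python sketch, not Metafacture code: the functions `as_lines`, `decode`, `encode` and `run_flow` are hypothetical stand-ins, the `key=value` input format is invented for the example, and file handling is left out so that the block stays self-contained.

```python
from typing import Iterable, Iterator, List, Tuple


def as_lines(text: str) -> Iterator[str]:
    """Stand-in for as-lines: sends one string per line."""
    yield from text.splitlines()


def decode(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical decoder: turns 'key=value' lines into literal events."""
    for line in lines:
        key, _, value = line.partition("=")
        yield ("literal", key, value)


def encode(events: Iterable[Tuple[str, str, str]]) -> Iterator[str]:
    """Hypothetical encoder: renders each event as a string again."""
    for _kind, name, value in events:
        yield f"{name}: {value}"


def run_flow(text: str) -> List[str]:
    # Each stage consumes the typed output of the previous stage.
    return list(encode(decode(as_lines(text))))
```

Each function does one small task and only agrees with its neighbours on the type it receives and sends, which is what makes the stages reusable in other flows.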
Module configuration
A stream module is configured with
• either a single mandatory value
• or optional key-value pairs
Describing flows with Flux

    "file.name"
    |open-file
    |as-lines
    |decode-pica
    |encode-formeta(style="multiline")
    |write("stdout");

• A string ("file.name") is the initial input
• Modules are connected with a pipe character
• style="multiline" is a key-value based configuration
• "stdout" is a mandatory parameter
• The flow ends with a semicolon
Variables and comments in Flux

    default in = "file.name";
    default out = "stdout";

    in
    |open-file
    // ...
    |write(out);

• The default statements define default values for the variables in and out
• Comments start with two slashes
• Use a variable instead of directly entering a string
Running Flux scripts
- The Flux script must be selected in the IDE
- Choose “Run with Flux” to execute the selected Flux script
- “Flux Help” outputs a list of all supported modules
Representation of metadata in Metafacture: a stream of events
Pica record:

    003@ $0 2809
    033A $n Publisher $p Location

decode-pica turns this record into a sequence of metadata events:

    Start record 2809
    Start entity 003@
    Literal 0 : 2809
    End entity
    Start entity 033A
    Literal n : Publisher
    Literal p : Location
    End entity
    End record
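To make the event sequence concrete, here is a small Python sketch that flattens a record into such events. It is an illustration only: the tuple-based event representation, the function `record_to_events` and the simplified record structure are assumptions of this sketch, and the record id is passed in explicitly rather than being read from field 003@ as decode-pica does in the slide.

```python
def record_to_events(record_id, fields):
    """Flatten a record into a list of metadata events.

    fields is a list of (field_name, [(subfield_code, value), ...]).
    """
    events = [("start-record", record_id)]
    for name, subfields in fields:
        events.append(("start-entity", name))
        for code, value in subfields:
            events.append(("literal", code, value))
        events.append(("end-entity",))
    events.append(("end-record",))
    return events


# The record from the slide, in the simplified structure assumed here.
record = [
    ("003@", [("0", "2809")]),
    ("033A", [("n", "Publisher"), ("p", "Location")]),
]
events = record_to_events("2809", record)
```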
Processing metadata events with Metamorph
Input events:

    Start record id
    Start entity 021A
    Literal a : The Trial
    End entity
    End record

Morph: listen for 021A.a, output as Title.
Output events:

    Start record id
    Literal Title : The Trial
    End record
Metamorph: data statements

    <?xml version="1.0" encoding="UTF-8"?>
    <metamorph xmlns="http://www.culturegraph.org/metamorph"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        version="1" entityMarker=".">
      <rules>
        <data source="021A.a" name="Title" />
      </rules>
    </metamorph>

• entityMarker: separator for entities and literal names
• source: name of the literal to listen for
• name: name of the literal that is output
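The effect of a single data statement on an event stream can be mimicked in a few lines of Python. This is a sketch of the behaviour shown on the slides, not of Metamorph's implementation; the function `apply_data_rule` and the tuple-based events are hypothetical.

```python
def apply_data_rule(events, source, name, entity_marker="."):
    """Keep record boundaries and emit matching literals under a new name."""
    out = []
    path = []  # names of the entities that are currently open
    for event in events:
        kind = event[0]
        if kind in ("start-record", "end-record"):
            out.append(event)
        elif kind == "start-entity":
            path.append(event[1])
        elif kind == "end-entity":
            path.pop()
        elif kind == "literal":
            # Entity names and the literal name are joined with the marker.
            full_name = entity_marker.join(path + [event[1]])
            if full_name == source:
                out.append(("literal", name, event[2]))
    return out


events = [
    ("start-record", "id"),
    ("start-entity", "021A"),
    ("literal", "a", "The Trial"),
    ("end-entity",),
    ("end-record",),
]
result = apply_data_rule(events, source="021A.a", name="Title")
```

The entity events disappear from the output because only the literal named by the rule is passed on.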
Metamorph: modifying data

    ...
    <rules>
      <data source="021A.a" name="Title">
        <regexp match="^(The) (.*)$" format="${2}, ${1}" />
      </data>
    </rules>
    ...

Functions inside a data statement process the data value before it is output. You can specify multiple functions here.
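What the regexp function above does to a title can be reproduced with Python's `re` module. This is a sketch under assumptions: `move_article` is a hypothetical helper, and returning non-matching titles unchanged is a choice made for this example, not a statement about how Metamorph treats values that do not match.

```python
import re


def move_article(title: str) -> str:
    """Rewrite 'The X' as 'X, The' using the pattern from the slide."""
    match = re.match(r"^(The) (.*)$", title)
    if match is None:
        # Choice of this sketch: leave non-matching titles untouched.
        return title
    # Equivalent of format="${2}, ${1}": second group, comma, first group.
    return f"{match.group(2)}, {match.group(1)}"
```

For the title from the earlier slide, `move_article("The Trial")` gives `"Trial, The"`.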
Metamorph: combining data

    ...
    <rules>
      <combine name="Publisher" value="${Pub}: ${Loc}">
        <data source="033A.n" name="Pub" />
        <data source="033A.p" name="Loc" />
      </combine>
    </rules>
    ...

• name: name of the generated literal. It can include variables, too
• value: literal value constructed from the variables from the data statements below
• The data statements do not generate output but create variables instead
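The variable substitution in the value attribute resembles template filling. As a rough Python analogy (not Metamorph's implementation), `string.Template` happens to use the same `${...}` placeholder syntax; the `combine` function below is a hypothetical helper, and raising an error on a missing variable is simply how `Template.substitute` behaves.

```python
from string import Template


def combine(value_pattern, variables):
    """Fill the ${...} placeholders of the pattern from collected variables."""
    return Template(value_pattern).substitute(variables)


# Variables as they would be created by the two data statements.
publisher = combine("${Pub}: ${Loc}", {"Pub": "Publisher", "Loc": "Location"})
```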
Exercises part 1: Warm-up
Part 2: Triples and counting
The triple
Triple: Subject, Predicate, Object
Inspired by RDF triples, but subject and predicate do not need to be URIs.
Generating triples
Metadata events:

    Start record id
    Literal name : Klaus
    Start entity died
    Literal when : 1401
    Literal where : HH
    End entity
    End record

stream-to-triples turns these events into triples:

    record-id name Klaus
    record-id died …

• Literals on top level become one triple each
• Entities on top level become one triple each, with the entity serialised with Formeta
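The conversion described above can be sketched in Python as follows. This is an illustration, not the stream-to-triples module: `stream_to_triples` is a hypothetical function, it handles only one level of entities, and the brace-delimited string it builds for an entity is a placeholder that does not claim to match real Formeta output.

```python
def stream_to_triples(events):
    """Turn one level of metadata events into (subject, predicate, object)."""
    triples = []
    record_id = None
    entity = None   # name of the open top-level entity, if any
    members = []    # literals collected inside that entity
    for event in events:
        kind = event[0]
        if kind == "start-record":
            record_id = event[1]
        elif kind == "start-entity":
            entity, members = event[1], []
        elif kind == "literal" and entity is None:
            # Top-level literal: one triple per literal.
            triples.append((record_id, event[1], event[2]))
        elif kind == "literal":
            members.append(f"{event[1]}: {event[2]}")
        elif kind == "end-entity":
            # Placeholder serialisation, NOT real Formeta syntax.
            serialised = "{" + ", ".join(members) + "}"
            triples.append((record_id, entity, serialised))
            entity = None
    return triples


events = [
    ("start-record", "record-id"),
    ("literal", "name", "Klaus"),
    ("start-entity", "died"),
    ("literal", "when", "1401"),
    ("literal", "where", "HH"),
    ("end-entity",),
    ("end-record",),
]
triples = stream_to_triples(events)
```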
Counting triples
count-triples(countBy="object") counts the incoming triples by their object and emits the results as triples with the predicate count. In the slide's figure, the three resulting counts are 4, 2 and 3.
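Counting by one position of the triple can be sketched with a `Counter` in Python. This is an analogy under assumptions: `count_triples` is a hypothetical function, the sample triples are invented, and the layout of the result (counted value as subject, "count" as predicate, the number as object) is inferred from the slide's figure rather than taken from the module's documentation.

```python
from collections import Counter


def count_triples(triples, count_by="object"):
    """Count triples by subject, predicate or object."""
    index = {"subject": 0, "predicate": 1, "object": 2}[count_by]
    counts = Counter(triple[index] for triple in triples)
    # Assumed result layout: (counted value, "count", number as string).
    return [(value, "count", str(n)) for value, n in sorted(counts.items())]


# Invented sample data for illustration.
sample = [
    ("r1", "lang", "ger"),
    ("r2", "lang", "eng"),
    ("r3", "lang", "ger"),
]
result = count_triples(sample, count_by="object")
```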