Flint – a format and file validation tool
Alecs Geuder
SCAPE Information Day, British Library, UK, 14th July 2014
Introducing Flint: Presentation Structure
• Introduction
• What does Flint do?
• Flint-the-API
• Policy-focused Validation
• Flint-the-toolbox
• Format-specific Implementations
• How we are using it
• Mini-demo
Introduction
• Flint facilitates file/format validation against a policy
• The code centres on individual file format modules (PDF, EPUB, ..)
• Comes with a command line interface, GUIs and a Hadoop MapReduce program
Flint – core features
[Diagram: an input file of a specific format is passed to a format-specific implementation, which draws on a set of internal and third-party tools and code, is configured by a Schematron policy, and produces a check result.]

Format-specific implementation
• PolicyAware (uses schematron-utils)
• canCheck
• validationResult
• ..

Configuration: Schematron policy
• categoryA – three tests
• categoryB – two tests
• categoryC – two tests

Output
    <checkresult file="input file" result="passed">
      <categoryA result="passed"/>
      <categoryB result="failed"/>
        <testB.1 result="failed"/>
        <testB.2 result="failed"/>
      <categoryC result="passed"/>
    </checkresult>
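The roll-up from individual policy tests to per-category results can be sketched as follows. This is a minimal illustration in Python, not Flint's own code (Flint itself is not written in Python): the `POLICY` dictionary, the test names for categoryA and categoryC, the `build_check_result` function, and the nesting of failed tests under their category are all assumptions made for the example. It also assumes a file fails as soon as any category fails; the sample output on the slide shows an overall "passed" next to a failed category, so Flint's actual rule may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy: category name -> names of the tests it contains.
# The counts (3, 2, 2) follow the slide; the test names are invented.
POLICY = {
    "categoryA": ["testA.1", "testA.2", "testA.3"],
    "categoryB": ["testB.1", "testB.2"],
    "categoryC": ["testC.1", "testC.2"],
}


def build_check_result(file_name, failed_tests):
    """Roll individual test failures up into category and file-level results."""
    root = ET.Element("checkresult", {"file": file_name})
    overall = "passed"
    for category, tests in POLICY.items():
        failures = [t for t in tests if t in failed_tests]
        category_el = ET.SubElement(
            root, category, {"result": "failed" if failures else "passed"}
        )
        # Assumption: only failed tests are listed, nested under their category.
        for test in failures:
            ET.SubElement(category_el, test, {"result": "failed"})
        if failures:
            # Assumption: any failed category fails the whole file.
            overall = "failed"
    root.set("result", overall)
    return root


if __name__ == "__main__":
    result = build_check_result("input.pdf", {"testB.1", "testB.2"})
    print(ET.tostring(result, encoding="unicode"))
```

Keeping the policy as plain data separate from the roll-up logic mirrors the slide's split between the Schematron policy (configuration) and the format-specific implementation that evaluates it.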
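The Flint ecosystem
[Diagram: entry points hand an input file, code and config to the core, which delegates to format/feature-specific implementations and returns a <checkResult>.]
• Entry points: CLI, GUIs, Hadoop
• Core
• Format/feature-specific implementations: PDF, EPUB, DRM detection (PDF/EPUB), geospatial data, …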
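One way to picture how a core might hand a file to the right format-specific implementation is a simple registry that asks each module whether it can check the file. The sketch below is an assumption-laden illustration in Python, not Flint's actual API: the class names, the `can_check` method (loosely modelled on the `canCheck` bullet), the extension-based detection, and the `select_module` helper are all hypothetical.

```python
from pathlib import Path


class FormatModule:
    """Hypothetical base class for a format-specific implementation."""

    name = "base"
    extensions = ()

    def can_check(self, path):
        # Stand-in for real format detection: match on file extension only.
        return Path(path).suffix.lower() in self.extensions


class PdfModule(FormatModule):
    name = "pdf"
    extensions = (".pdf",)


class EpubModule(FormatModule):
    name = "epub"
    extensions = (".epub",)


def select_module(path, modules):
    """Return the first registered module that claims the file, or None."""
    for module in modules:
        if module.can_check(path):
            return module
    return None


# Hypothetical registry; a new format/feature module would be appended here.
MODULES = [PdfModule(), EpubModule()]

if __name__ == "__main__":
    for candidate in ["report.pdf", "book.EPUB", "map.shp"]:
        chosen = select_module(candidate, MODULES)
        print(candidate, "->", chosen.name if chosen else "no module")
```

With this shape, every entry point (CLI, GUI or Hadoop job) could share the same registry, and adding a format would not require changes to the entry points.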
How we are using it
• To deal with non-print legal deposit

What's next
• Add additional format/feature modules (geospatial, etc.)
Mini-demo