
A PDF Storage Backend for Protégé. Henrik Eriksson, Linköping University



  1. A PDF Storage Backend for Protégé Henrik Eriksson Linköping University

  2. Storage of the Pizza example: project and ontology files (2006-07-25)

     Project file (pizza.owl.pprj):

         ; Mon Feb 13 11:09:16 GMT 2006
         ;
         ;+ (version "3.2")
         ;+ (build "Build 243")
         ([BROWSER_SLOT_NAMES] of Property_List
           (properties [pizza_ProjectKB_Instance_25] [pizza_ProjectKB_Instance_26]
                       [pizza_ProjectKB_Instance_27] [pizza_ProjectKB_Instance_28]
                       [pizza_ProjectKB_Instance_29]))
         ([CLSES_TAB] of Widget
           (is_hidden TRUE)
           (label "Classes")
           (property_list [Instance_47])
           (widget_class_name "edu.stanford.smi.protege.widget.ClsesTab"))
         ([FORMS_TAB] of Widget
           (is_hidden TRUE)
           (label "Forms")
           (property_list [Instance_85])
           (widget_class_name "edu.stanford.smi.protege.widget.FormsTab"))
         ([Instance_1005] of Widget
           (is_hidden FALSE)
           (name "owl:Class")
           (property_list [XY_Instance_540])
           (widget_class_name "edu.stanford.smi.protegex.owl.ui.widget.OWLFormWidget"))
         ([Instance_2201] of Integer
           (integer_value 250)
           (name "ClsesTab.left_right"))
         ([Instance_2202] of Integer
           (integer_value 400)
           (name "ClsesTab.left.top_bottom"))
         ([Instance_2469] of String
           (name "owl_file_language")
           (string_value "RDF/XML-ABBREV"))
         ([Instance_2470] of String
           (name "owl_namespace")
           (string_value "http://owl.protege.stanford.edu"))
         ([Instance_2531] of Property_List
         )
         ([Instance_2534] of Widget
           (is_hidden FALSE)
           (label "Metadata")
           (property_list [Instance_2539])
           (widget_class_name "edu.stanford.smi.protegex.owl.ui.metadatatab.OWLMetadataTab"))

     Ontology file (pizza.owl, RDF/XML; the listing is cut off at the edge of the slide):

         <?xml version="1.0"?>
         <rdf:RDF
             xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:owl="http://www.w3.org/2002/07/owl#"
             xmlns="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl#"
             xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
           xml:base="http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl">
           <owl:Ontology rdf:about="">
             <protege:defaultLanguage rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
             >en</protege:defaultLanguage>
             <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
             >version 1.3</owl:versionInfo>
             <rdfs:comment xml:lang="en">An example ontology that contains all constructs required for the various versions of the Pizza Tutorial run by Manchester University (see http://www.co-ode.org/resources/tutorials/)</rdfs:comment>
             <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
           </owl:Ontology>
           <owl:Class rdf:ID="VegetarianPizzaEquivalent2">
             <rdfs:comment xml:lang="en">An alternative to VegetarianPizzaEquiv1 that does not require a definition of VegetarianTopping. Perhaps more difficult to maintain. Not equivalent to VegetarianPizza </rdfs:comment>
             <owl:equivalentClass>
               <owl:Class>
                 <owl:intersectionOf rdf:parseType="Collection">
                   <owl:Class rdf:ID="Pizza"/>
                   <owl:Restriction>
                     <owl:onProperty>
                       <owl:ObjectProperty rdf:ID="hasTopping"/>
                     </owl:onProperty>
                     <owl:allValuesFrom>
                       <owl:Class>
                         <owl:unionOf rdf:parseType="Collection">
                           <owl:Class rdf:ID="FruitTopping"/>
                           <owl:Class rdf:ID="HerbSpiceTopping"/>
                           <owl:Class rdf:ID="NutTopping"/>
                           <owl:Class rdf:ID="SauceTopping"/>
                           <owl:Class rdf:ID="VegetableTopping"/>
                           <owl:Class rdf:ID="CheeseTopping"/>
                         </owl:unionOf>
                       </owl:Class>
                     </owl:allValuesFrom>
                   </owl:Restriction>
                 </owl:intersectionOf>
               </owl:Class>
             </owl:equivalentClass>
             <rdfs:label xml:lang="pt">PizzaVegetarianaEquivalente2</rdfs:label>
           </owl:Class>
           <owl:Class rdf:ID="PepperTopping">
             <owl:disjointWith>
               <owl:Class rdf:ID="MushroomTopping"/>
             </owl:disjointWith>
             <owl:disjointWith>
               <owl:Class rdf:ID="LeekTopping"/>
             </owl:disjointWith>
             <owl:disjointWith>
               <owl:Class rdf:ID="TomatoTopping"/>
             </owl:disjointWith>
             <owl:disjointWith>
               <owl:Class rdf:ID="GarlicTopping"/>
             </owl:disjointWith>

  3. How do you package an ontology?
     • Gift wrapping?
     • Document packaging
     Figure labels: .owl, .pprj, .pont, .pins

  4. Persistent storage in Protégé
     • Files
       - Serialization
       - Protégé Frames: CLIPS-like/XML
       - Protégé OWL: XML-based
     • Databases
     Callouts: voluminous; verbose; slow parsing & writing; multiple files (e.g., .pprj, .owl). There is a storage problem here.

  5. Background: Semantic Documents
     • Combining documents with knowledge representation
       - Like the semantic web, but for “real” documents
     • Problem: Large amounts of information are available electronically, but it is
       - difficult to find the right information when the search query is complex, and
       - difficult to navigate content-rich information.
     • Goal
       - Semantic description of document content (i.e., a meta-model for documents)
       - Support for systematic authoring of complex electronic documents
       - Adding support for PDF to Protégé: a PDF tab for Protégé

  6. One Document: Many Applications
     One format for all applications

  7. Semantic Documents
     • Knowledge representation
       - Semantic web: OWL
       - Ontologies
     • Document models
       - Adobe’s Portable Document Format (PDF)
       - Extensible Metadata Platform (XMP)
       - MS Word, RTF (?)
     • Functions
       - Semantic search based on metadata
       - Reasoning, inference
     Diagram labels: documents (PDF) with XMP markup, document retrieval, statistics, semantic search, reasoning engine, report publication, database, functions
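
The "semantic search based on metadata" function on this slide can be illustrated with a minimal sketch. It assumes each document carries an XMP packet whose `dc:subject` bag lists keywords; the file names, keywords, and the helpers `make_xmp`, `subjects_of`, and `search` are hypothetical and are not taken from the presentation or from PDFTab.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Standard RDF and Dublin Core namespaces used inside XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def make_xmp(subjects):
    """Build a tiny XMP packet with a dc:subject bag (illustrative data only)."""
    items = "".join(f"<rdf:li>{escape(s)}</rdf:li>" for s in subjects)
    return (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        f'<rdf:RDF xmlns:rdf="{NS["rdf"]}">'
        f'<rdf:Description rdf:about="" xmlns:dc="{NS["dc"]}">'
        f"<dc:subject><rdf:Bag>{items}</rdf:Bag></dc:subject>"
        "</rdf:Description></rdf:RDF></x:xmpmeta>"
    )


def subjects_of(xmp_packet):
    """Return the set of dc:subject keywords found in one XMP packet."""
    root = ET.fromstring(xmp_packet)
    return {li.text for li in root.iterfind(".//dc:subject/rdf:Bag/rdf:li", NS)}


def search(library, required):
    """Return the names of documents whose keywords include all required ones."""
    wanted = set(required)
    return sorted(name for name, xmp in library.items() if wanted <= subjects_of(xmp))


# Hypothetical document collection: file name -> XMP packet.
library = {
    "menu.pdf": make_xmp(["pizza", "vegetarian"]),
    "report.pdf": make_xmp(["pizza"]),
}
print(search(library, ["pizza", "vegetarian"]))
```

A reasoning engine, as named on the slide, would go beyond this keyword match by using the ontology to infer subjects that are not listed explicitly.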

  8. PDFTab: Annotation tool for Protégé
     Figure labels: annotation tool, Protégé, Adobe Acrobat (PDF)

  9. Lightweight semantic documents
     • Semantic documents are nice, but
       - sometimes too heavy
       - advanced tools required (heavy)
     • The PDF backend provides
       - a new save method
       - a compact storage format
       - storage using standard PDF attachments
       - file access through standard PDF tools (e.g., Acrobat)

  10. PDF Attachments
      • Little-known feature of PDF
      • Just like e-mail attachments

  11. The “Secrets” of the Portable Document Format (PDF)
      • Open and documented format
      • PDF files contain something like a file system
        - Indexing for fast random access
        - Like the .doc format of MS Word
      • Extensible file layout
        - Custom additions
      • Different objects and streams with support for text, binary data, compression, and encryption
      Diagram: a Document (PDF) containing Objects, Streams, Metadata, Pages, and an Index (xref)
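
To make the point about streams and compression concrete, here is a minimal sketch of how an ontology file could be wrapped as a Flate-compressed PDF stream object. It produces a single object only, not a complete PDF (no cross-reference table, trailer, or file specification), and the object number and sample data are hypothetical. The presentation's prototype used pdfbox; this sketch uses only the Python standard library and is not that implementation.

```python
import zlib


def embedded_file_stream(obj_num, data):
    """Serialize `data` as one PDF stream object using the FlateDecode filter."""
    compressed = zlib.compress(data)
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /EmbeddedFile /Filter /FlateDecode /Length {len(compressed)} >>\n"
        "stream\n"
    ).encode("ascii")
    return header + compressed + b"\nendstream\nendobj\n"


def extract_stream(obj_bytes):
    """Recover the original bytes from an object built by embedded_file_stream.

    Simplification: the payload is located by its keywords; a real PDF reader
    would use the /Length entry and the xref index instead.
    """
    start = obj_bytes.index(b"stream\n") + len(b"stream\n")
    end = obj_bytes.rindex(b"\nendstream")
    return zlib.decompress(obj_bytes[start:end])


# Hypothetical, deliberately repetitive OWL fragment standing in for a verbose file.
owl = b'<owl:Class rdf:ID="Pizza"/>\n' * 200
obj = embedded_file_stream(7, owl)
print(len(owl), "bytes of XML stored in a", len(obj), "byte PDF object")
```

Verbose XML such as the Pizza ontology compresses well, which is one way a PDF attachment can end up more compact than the plain file.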

  12. Internal PDF Structure
      Document
      - Root/Catalog
        - Pages: Contents
        - Outlines
        - Metadata: XMP
        - Names: Embedded files
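
The path from the document root to the embedded files can be sketched with plain dictionaries standing in for PDF objects. The key names (`/Root`, `/Names`, `/EmbeddedFiles`, `/EF`, `/F`) follow the PDF specification; the dictionary model, the sample contents, and the `list_attachments` helper are assumptions made for illustration, and the sketch handles only a flat name tree without `/Kids` nodes.

```python
def list_attachments(trailer):
    """Walk Root/Catalog -> Names -> EmbeddedFiles and return {file name: bytes}."""
    catalog = trailer["/Root"]
    name_tree = catalog.get("/Names", {}).get("/EmbeddedFiles", {})
    # A leaf node of a name tree holds a flat array: [name, filespec, name, filespec, ...]
    entries = name_tree.get("/Names", [])
    return {
        entries[i]: entries[i + 1]["/EF"]["/F"]
        for i in range(0, len(entries), 2)
    }


# Hypothetical in-memory model of a PDF that carries a project and an ontology file.
document = {
    "/Root": {
        "/Pages": {"/Contents": b"(page drawing operators)"},
        "/Metadata": b"<x:xmpmeta/>",
        "/Names": {
            "/EmbeddedFiles": {
                "/Names": [
                    "pizza.owl", {"/EF": {"/F": b"<rdf:RDF/>"}},
                    "pizza.owl.pprj", {"/EF": {"/F": b'; (version "3.2")'}},
                ]
            }
        },
    }
}
print(sorted(list_attachments(document)))
```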

  13. Storage backend
      Inserting ontologies in documents

  14. Experimental implementation
      • New knowledge base format/project type

  15. Resulting PDF document

  16. Scenarios
      • Generated documents
        Diagram labels: ontology development, testing & validation, revising, Protégé, save, PDF generation, document publication
      • Authored documents
        Diagram labels: authoring, editing, PDF conversion, Protégé, save, document publication, ontology development, testing & validation, revising

  17. Discussion
      • Architecture for storage (packaging) formats
        - Other formats possible
        - Examples: zip, tar, tgz, …
      • Implementation issues
        - Currently “research prototype”
        - API changes/additions/debugging required
          • pdfbox, OWL plug-in, Protégé core
        - One PDF kb format required for each major storage type
          • Example: PDF-Protégé-Frames, PDF-Protégé-OWL, PDF-Protégé-RDFS
          • Should really be separated into a general PDF filter (more API changes required)
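
The remark that other packaging formats are possible can be illustrated with zip, one of the examples named on the slide. The sketch below bundles the project and ontology files into one archive and unpacks them again. The `package` and `unpackage` helpers and the sample file contents are hypothetical; this shows the general packaging idea, not the Protégé storage API.

```python
import io
import zipfile


def package(files):
    """Bundle {file name: bytes} into a single zip archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unpackage(blob):
    """Inverse of package: read every member of the archive back into a dict."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Hypothetical stand-ins for the two files of the Pizza example.
project = {
    "pizza.owl.pprj": b'; (version "3.2")\n',
    "pizza.owl": b'<owl:Class rdf:ID="Pizza"/>\n' * 100,
}
blob = package(project)
print(sorted(unpackage(blob)), len(blob), "bytes")
```

A general filter of this shape, with one `package`/`unpackage` pair per container format, is one way to read the slide's suggestion that packaging should be separated from the individual knowledge-base formats.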

  18. Summary
      • Semantic documents
        - Combine printable documents with ontologies and knowledge bases
        - Combined documentation (human-readable) and reasoning (machine-readable)
        - One document with several applications
      • PDF storage backend
        - Lightweight semantic documents
        - Attaching ontology files to PDF documents
        - Straightforward access from Acrobat
