Dimitris Plexousakis May 2001 ICS-FORTH & Univ. of Crete

The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases
(http://www.ics.forth.gr/proj/isst/RDF)

Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis
Computer Science Department, University of Crete, and Institute for Computer Science - FORTH
Heraklion, Crete, Greece

Karsten Tolle
Johann Wolfgang Goethe University
Frankfurt, Germany
Motivation
• The WWW is rapidly evolving into a conceptual structure assimilating:
  - vast information resources of diverse nature (sites, documents, data, images, etc.)
  - user communities (corporate, e-marketplaces, etc.)
  - description and brokering services
• Large volumes of various types of metadata need to be managed to ensure:
  - fast deployment and
  - easy maintenance of large-scale applications for the Semantic Web
Motivation
• The Semantic Web and RDF:
  - a standard representation language for resource descriptions, with a human-readable and machine-understandable syntax
  - enabling content syndication via superimposed resource descriptions
  - interpreted within or across communities using extensible schemata
• Fact: several content providers and Web portals already adopt RDF, giving rise to voluminous RDF description bases
• Our thesis: take advantage of three decades of research in DB technology to support
  - declarative access and logical/physical independence for RDF description bases
Outline
• The Open Directory Portal: a case study
• A Formal Data Model for RDF/S
• The RDF Query Language (RQL)
• Architecture
• Core Middleware:
  - RDF Store
  - Parser/Loader
  - Query Interpreter
• Testbed: the ODP RDF dump
• Representative queries
• Performance
• Summary and Outlook
ODP Knowledge Catalog
[Figure: an excerpt of the ODP catalog drawn as an RDF graph. Namespaces: rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#, rdfs = http://www.w3.org/2000/01/rdf-schema#, ns1 = http://www.dmoz.org/topic.rdf, ns2 = www.oclc.org/dublincore.rdfs. Legend: typeOf (instance), subClassOf (isA), related property, Class. Classes include Recreation, Regional, Travel, Ile-de-France, Paris, Lodging, Vacation-Rentals, Hotel, Hotel Directories and Ext.Resource; properties include title, description, file_size, last_modified and related, with ranges string, integer and date. Resources &r1 to &r4 carry titles such as "Disneyland Paris", "Pulitzer Opera", "SunScale" and "Bedford", and descriptions such as "Official site of Disneyland Paris".]
ODP Statistics
• ODP version: 16-01-2001
• 170 MB of class hierarchies
• 700 MB of resource descriptions
• 337,085 topics
• 16 hierarchies with:
  - max depth: 13 (6.86 on average)
  - max # subclasses: 314 (4.02 on average)
• 2,342,978 URIs
Resource Description Framework (RDF/S)
• RDF: Resource Descriptions
  - Data model: directed labeled graphs
    - Nodes: resources (URIs) or literals
    - Edges: properties (attributes or relationships)
    - Labels: nodes (class names) and edges (property names)
  - Statement: an assertion of the form (resource, property, value)
  - Description: a collection of statements concerning a resource
  - XML syntax
• RDF Schema (RDFS): Schema Vocabularies
  - Specialization of both classes and properties (simple and multiple)
  - Multiple classification under several classes
  - Unordered, optional, and multi-valued properties
  - Domain and range polymorphism of properties
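To make the statement/description vocabulary above concrete, here is a minimal sketch that stores statements as (resource, property, value) triples and reads a description as all statements about one resource. The resource names, class names and the `description`/`classes_of`/`out_edges` helpers are hypothetical, loosely modeled on the ODP example; this is not RDFSuite code.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """An RDF statement: (resource, property, value)."""
    subject: str    # resource URI
    predicate: str  # property name (edge label)
    obj: str        # resource URI or literal value


# Hypothetical statements, loosely modeled on the ODP resources &r1 and &r2.
STATEMENTS = [
    Statement("&r1", "rdf:type", "Hotel Directories"),
    Statement("&r1", "title", "SunScale"),
    Statement("&r2", "rdf:type", "Hotel"),
    Statement("&r2", "rdf:type", "Ext.Resource"),  # multiple classification
    Statement("&r2", "title", "Pulitzer Opera"),
]


def description(resource, statements):
    """The description of a resource: all statements whose subject it is."""
    return [s for s in statements if s.subject == resource]


def classes_of(resource, statements):
    """Classes a resource is an instance of (it may have several)."""
    return {s.obj for s in description(resource, statements) if s.predicate == "rdf:type"}


def out_edges(resource, statements):
    """Outgoing labeled edges of a graph node: (property label, target node)."""
    return [(s.predicate, s.obj) for s in description(resource, statements)]
```

Because a description is just a filter over statements, multi-valued and optional properties need no special handling: a resource simply has zero, one or several statements for a given property.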
A Formal Data Model for RDF/S
[Diagram not recoverable from the extracted text. Surviving labels: H, Class, Property, <, N, L, C, T, P, [[ . ]], val, V, URI, S; value domains: literals, resources, containers.]
The RDF Query Language RQL
• Declarative query language for RDF description bases
  - relies on a typed data model (literal and container types, plus union types)
  - follows a functional approach (basic queries and filters)
  - adapts the functionality of semistructured or XML query languages to RDF, but also:
    - treats properties as self-existent individuals
    - exploits taxonomies of node and edge labels
    - allows querying of schemas as semistructured data
• Relational interpretation of schemas and resource descriptions:
  - Classes (unary relations)
  - Properties (binary relations)
  - Containers (n-ary relations)
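The relational interpretation listed above can be sketched in a few lines: a class is a set of resource URIs (a unary relation), a property is a set of (source, target) pairs (a binary relation), and the class taxonomy makes a class's extent include the instances of its subclasses. The schema and data below are hypothetical, and single inheritance is a simplification (RDFS also allows multiple specialization).

```python
# Hypothetical schema: each class maps to its direct superclass (single
# inheritance only, a simplification: RDFS also allows multiple).
SUPERCLASS = {"Hotel Directories": "Hotel", "Hotel": "Lodging"}

# Classes as unary relations: the proper instances of each class.
CLASS_EXT = {"Lodging": set(), "Hotel": {"&r2"}, "Hotel Directories": {"&r1"}}

# Properties as binary relations: sets of (source, target) pairs.
PROP_EXT = {"title": {("&r1", "SunScale"), ("&r2", "Pulitzer Opera")}}


def is_subclass(cls, other):
    """True if cls equals other or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == other:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def extent(cls):
    """A class's full extent: its own instances plus those of all subclasses."""
    return {uri for c, uris in CLASS_EXT.items() if is_subclass(c, cls) for uri in uris}


def values(prop, cls):
    """Join a property relation with a class extent: targets for instances of cls."""
    members = extent(cls)
    return {target for source, target in PROP_EXT[prop] if source in members}
```

For example, `values("title", "Lodging")` reaches both titles because both resources sit somewhere below Lodging in the taxonomy.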
Portal Navigation with RQL
• Browsing large description bases is cumbersome!
• RQL provides powerful path expressions that permit filtering and navigation over both portal schemas and resource descriptions
• E.g., to find (under the Regional ODP hierarchy) the URIs of hotels in Paris whose title matches "Opera":

    select Z
    from (select $X
          from Regional {:$X}
          where $X like "*Hotel*" and $X < Paris){Y}.{Z}title{T}
    where T like "*Opera*"
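A rough in-memory emulation of what the query above asks for may help in reading it. It assumes that `$X` ranges over the subclasses of Regional, that `$X < Paris` means "strict subclass of Paris", that the `.` composition identifies the instance Y with the title's source Z, and that `like` behaves like shell-style wildcard matching (approximated with Python's `fnmatch.fnmatchcase`). The hierarchy, instances and titles are hypothetical; this is not how the RQL interpreter evaluates queries.

```python
from fnmatch import fnmatchcase

# Hypothetical fragment of the Regional hierarchy (class -> direct superclass).
PARENT = {
    "Ile-de-France": "Regional",
    "Paris": "Ile-de-France",
    "Lodging": "Paris",
    "Hotel": "Lodging",
    "Hotel Directories": "Hotel",
    "Vacation-Rentals": "Lodging",
}
INSTANCES = {"Hotel": {"&r2"}, "Hotel Directories": {"&r1"}, "Vacation-Rentals": {"&r3"}}
TITLE = {"&r1": "SunScale", "&r2": "Pulitzer Opera", "&r3": "Bedford"}


def ancestors(cls):
    """All strict superclasses of cls, nearest first."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result


def hotels_in_paris(title_pattern):
    # Inner query: classes $X under Regional, named like "*Hotel*", with $X < Paris.
    classes = [x for x in PARENT
               if "Regional" in ancestors(x)
               and fnmatchcase(x, "*Hotel*")
               and "Paris" in ancestors(x)]
    # Outer query: instances Z of those classes whose title T matches the pattern.
    return sorted({z for x in classes for z in INSTANCES.get(x, ())
                   if fnmatchcase(TITLE.get(z, ""), title_pattern)})
```

Calling `hotels_in_paris("*Opera*")` keeps only the resource titled "Pulitzer Opera"; the inner step is a query over the schema (class names and the subclass order), which is the kind of schema querying the previous slide describes.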
The ICS-FORTH RDFSuite
• The Validating RDF Parser (VRP): Karsten Tolle, Diploma Thesis
  - The first RDF parser supporting semantic validation of both resource descriptions and schemas
• The RDF Schema Specific DataBase (RSSDB): Sofia Alexaki, M.Sc. Thesis
  - The first RDF store using schema knowledge to automatically generate an object-relational (SQL3) representation of RDF metadata and load resource descriptions
• The RDF Query Language (RQL): Greg Karvounarakis, M.Sc. Thesis
  - The first declarative language for uniformly querying RDF schemas and resource descriptions
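As an illustration of what semantic validation of resource descriptions can involve, the sketch below checks each statement against declared property domains and ranges, taking the class hierarchy into account. The schema, the error messages and the `validate` function are hypothetical; they are not VRP's actual checks or API.

```python
# Hypothetical schema: class -> direct superclass, property -> (domain, range).
SUPERCLASS = {"Hotel Directories": "Hotel", "Hotel": "Ext.Resource"}
PROPERTIES = {
    "title": ("Ext.Resource", "Literal"),        # "Literal" marks a literal range
    "related": ("Ext.Resource", "Ext.Resource"),
}


def subsumed(cls, target):
    """True if cls is target or one of its (transitive) subclasses."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def in_class(resource, cls, types):
    """True if any declared type of resource is subsumed by cls."""
    return any(subsumed(c, cls) for c in types.get(resource, ()))


def validate(statements, types):
    """Return an error message for each statement violating the schema."""
    errors = []
    for subj, prop, obj in statements:
        if prop not in PROPERTIES:
            errors.append(f"{prop}: undeclared property")
            continue
        domain, rng = PROPERTIES[prop]
        if not in_class(subj, domain, types):
            errors.append(f"{subj} {prop}: subject not in domain {domain}")
        if rng != "Literal" and not in_class(obj, rng, types):
            errors.append(f"{subj} {prop}: object not in range {rng}")
    return errors
```

Here `types` maps each resource to its declared classes, so a resource classified only as a Hotel Directories entry still satisfies an Ext.Resource domain through the subclass chain.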
The RDFSuite Architecture
[Figure: the RDFSuite architecture. Recoverable labels: ICS-VRP (Parser, Validator, VRP Internal RDF Model), the ICS-RQL Interpreter (Parser, Typing, Graph Constructor, Evaluation), C++ and JDBC interfaces, SQL3, and an ORDBMS. Sample tables: Class (c_name, e.g. Hotel, Hotel Dir), Property (domain, p_name, range, e.g. title with range Literal), SubClass (subcl, supcl, e.g. Hotel Dir under Hotel), SubProperty (subpr, suppr), and a URI/source/target table.]
Generic Representation

  Resources
  id: int   uri: text
  1         http://www.dmoz.org/topics.rdfs#Hotel
  2         http://www.dmoz.org/topics.rdfs#Hotel Directories
  3         http://www.oclc.org/dublincore.rdfs#title
  4         http://www.dmoz.org/schema.rdf#Ext.Resource
  5         http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  6         http://www.w3.org/2000/01/rdf-schema#subClassOf
  7         http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
  8         http://www.w3.org/2000/01/rdf-schema#Class
  9         r1

  Triples
  predid: int   subid: int   objid: int   objvalue: text
  6             2            1
  5             3            7
  5             1            8
  5             9            2
  3             9                         SunScale
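The generic representation can be tried out with the rows shown above. This sketch uses SQLite (not the ORDBMS the suite targets) to load the Resources and Triples tables and decode the triples back into URIs and literals; the `instances_of` helper is hypothetical. Note that even a simple class lookup needs three joins on the single Triples table.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
CREATE TABLE Triples (predid INTEGER, subid INTEGER, objid INTEGER, objvalue TEXT);
""")
conn.executemany("INSERT INTO Resources VALUES (?, ?)", [
    (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
    (2, "http://www.dmoz.org/topics.rdfs#Hotel Directories"),
    (3, "http://www.oclc.org/dublincore.rdfs#title"),
    (4, "http://www.dmoz.org/schema.rdf#Ext.Resource"),
    (5, RDF_TYPE),
    (6, "http://www.w3.org/2000/01/rdf-schema#subClassOf"),
    (7, "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"),
    (8, "http://www.w3.org/2000/01/rdf-schema#Class"),
    (9, "r1"),
])
# A literal object is stored in objvalue with objid left NULL.
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
    (6, 2, 1, None), (5, 3, 7, None), (5, 1, 8, None),
    (5, 9, 2, None), (3, 9, None, "SunScale"),
])

# Decode every triple into (subject, predicate, object URI or literal).
rows = conn.execute("""
    SELECT s.uri, p.uri, COALESCE(o.uri, t.objvalue)
    FROM Triples t
    JOIN Resources p ON p.id = t.predid
    JOIN Resources s ON s.id = t.subid
    LEFT JOIN Resources o ON o.id = t.objid""").fetchall()


def instances_of(class_uri):
    """Subjects of rdf:type triples pointing at class_uri (no subclass closure)."""
    return [r[0] for r in conn.execute("""
        SELECT s.uri FROM Triples t
        JOIN Resources p ON p.id = t.predid
        JOIN Resources s ON s.id = t.subid
        JOIN Resources o ON o.id = t.objid
        WHERE p.uri = ? AND o.uri = ?""", (RDF_TYPE, class_uri))]
```

The layout is schema-independent (any RDF graph fits in two tables), at the cost of self-joins for every schema-aware lookup.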
Specific Representation

  Namespace
  id: int   uri: text
  1         http://www.w3.org/2000/01/rdf-schema#
  2         http://www.w3.org/1999/02/22-rdf-syntax-ns#
  3         http://www.oclc.org/dublincore.rdfs#
  4         http://www.dmoz.org/topics.rdfs#

  Type
  id: int   nsid: int   lpart: text
  1         1           Resource
  2         2           Bag
  3         2           Seq
  4                     String

  Class
  id: int   nsid: int   lpart: text
  11        5           Ext.Resource
  12        4           Hotel
  13        4           Hotel Directories

  Property
  id: int   nsid: int   lpart: text    domainid: int   rangeid: int
  14        3           title          1               4
  15        3           description    1               4
  16        5           title          11              4

  SubClass                       SubProperty
  subid: int   superid: int      subid: int   superid: int
  11           1                 16           14
  12           1
  13           12

  Instances
  uri: text   classid: int
  r1          11
  r2          11
  r1          13
  r2          12

  Per-class tables t11, t12, t13 (URI: text) and per-property tables t14, t15, t16 (source: text, target: text), linked as subtables, hold the resource descriptions, e.g. the tuples (r1, SunScale) and (r2, Pulitzer Opera).
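The specific representation derives one table per class (and per property) from the schema. The sketch below generates the per-class tables t11 to t13 from the Class rows above and answers "all instances of a class" by following SubClass transitively with a recursive query. It uses SQLite, which has no SQL3 subtables, so the per-class tables are unioned by hand; placing r2 in t12 and r1 in t13 is an assumption based on the Instances rows, and `instances_of` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema tables, with rows as read from the slide.
conn.executescript("""
CREATE TABLE Class (id INTEGER PRIMARY KEY, nsid INTEGER, lpart TEXT);
CREATE TABLE SubClass (subid INTEGER, superid INTEGER);
INSERT INTO Class VALUES (11, 5, 'Ext.Resource'), (12, 4, 'Hotel'),
                         (13, 4, 'Hotel Directories');
INSERT INTO SubClass VALUES (11, 1), (12, 1), (13, 12);
""")

# Generate one instance table per class from the schema, named t<class id>.
for (class_id,) in conn.execute("SELECT id FROM Class").fetchall():
    conn.execute(f"CREATE TABLE t{class_id} (uri TEXT)")  # id is an int from the DB

# Hypothetical population: r2 as a Hotel, r1 as a Hotel Directories entry.
conn.execute("INSERT INTO t12 VALUES ('r2')")
conn.execute("INSERT INTO t13 VALUES ('r1')")


def instances_of(class_id):
    """All instances of a class, including those stored in subclass tables."""
    # Transitive subclasses of class_id (itself included), kept only if they
    # are classes that own an instance table.
    ids = [row[0] for row in conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION
            SELECT s.subid FROM SubClass s JOIN sub ON s.superid = sub.id
        )
        SELECT sub.id FROM sub JOIN Class c ON c.id = sub.id""", (class_id,))]
    uris = set()
    for cid in ids:  # no SQL3 subtables in SQLite, so union the tables by hand
        uris.update(row[0] for row in conn.execute(f"SELECT uri FROM t{cid}"))
    return sorted(uris)
```

In an ORDBMS with table inheritance, querying the supertable would return subtable rows automatically; the loop here only imitates that behavior.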
DBMS Size vs. Schema Triples
• DBMS size scales linearly with the number of schema triples

                                            SpecRepr     GenRepr
  Avg. triple size                          0.086 KB     0.1582 KB
  Avg. triple size (with indexes)           0.1734 KB    0.3062 KB
  Avg. triple storage time                  0.0021 sec   0.0025 sec
  Avg. triple storage time (with indexes)   0.0025 sec   0.0032 sec
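Since size grows linearly with the number of schema triples, the per-triple averages in the table give a back-of-the-envelope estimate for a larger schema. The sketch below simply multiplies them out; it assumes the linear trend holds beyond the measured range and takes 1 MB = 1024 KB. The `estimate` function is hypothetical.

```python
# Average per-triple figures for schema triples, copied from the table:
# (size in KB, storage time in seconds), without and with indexes.
SCHEMA_TRIPLE_AVG = {
    "SpecRepr": {"plain": (0.086, 0.0021), "indexed": (0.1734, 0.0025)},
    "GenRepr": {"plain": (0.1582, 0.0025), "indexed": (0.3062, 0.0032)},
}


def estimate(representation, n_triples, indexed=True):
    """Linear extrapolation: totals = n_triples * per-triple averages.

    Returns (size in MB, taking 1 MB = 1024 KB; storage time in seconds).
    """
    size_kb, secs = SCHEMA_TRIPLE_AVG[representation]["indexed" if indexed else "plain"]
    return n_triples * size_kb / 1024, n_triples * secs


for rep in ("SpecRepr", "GenRepr"):
    mb, secs = estimate(rep, 100_000)
    print(f"{rep}: {mb:.1f} MB, {secs:.0f} s for 100,000 schema triples")
```

For 100,000 indexed schema triples this gives about 16.9 MB and 250 s for SpecRepr versus about 29.9 MB and 320 s for GenRepr.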
Graph 2: DBMS Size vs. Data Triples
• DBMS size scales linearly with the number of data triples

                                            SpecRepr     GenRepr
  Avg. triple size                          0.123 KB     0.123 KB
  Avg. triple size (with indexes)           0.2566 KB    0.2706 KB
  Avg. triple storage time                  0.0033 sec   0.0039 sec
  Avg. triple storage time (with indexes)   0.0043 sec   0.00457 sec
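For data triples, both representations have the same unindexed size, so the interesting figures are the index overhead and the storage-time gap. The sketch below computes those ratios directly from the table; the function names are hypothetical.

```python
# Average size per data triple in KB: (without indexes, with indexes).
DATA_TRIPLE_KB = {"SpecRepr": (0.123, 0.2566), "GenRepr": (0.123, 0.2706)}
# Average storage time per data triple in seconds: (without, with indexes).
DATA_TRIPLE_SEC = {"SpecRepr": (0.0033, 0.0043), "GenRepr": (0.0039, 0.00457)}


def index_size_factor(representation):
    """How many times larger a data triple becomes once indexes are added."""
    plain, indexed = DATA_TRIPLE_KB[representation]
    return indexed / plain


def time_ratio(indexed=True):
    """GenRepr storage time per data triple relative to SpecRepr."""
    i = 1 if indexed else 0
    return DATA_TRIPLE_SEC["GenRepr"][i] / DATA_TRIPLE_SEC["SpecRepr"][i]


for rep in DATA_TRIPLE_KB:
    print(f"{rep}: indexes multiply triple size by {index_size_factor(rep):.2f}")
print(f"GenRepr/SpecRepr storage time: {time_ratio():.2f} (indexed), "
      f"{time_ratio(False):.2f} (plain)")
```

Indexes roughly double the per-triple size (about 2.09x for SpecRepr and 2.2x for GenRepr), while GenRepr takes about 6% longer per triple with indexes and about 18% longer without.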