
8/24/2010 Advanced databases and data models: Internet, Theme 1: Semi-structured data



  1. Advanced databases and data models: Internet
     Theme 1: Semi-structured data
     Lena Strömbäck, June 17, 2009

     What is the problem?
     • The user's effort is not enough for the task
     • The data describes complex real-world objects
     • The data is not easily human-interpretable
     • There is a need for integration and comparison of data

     In this course:
     • What are the particular requirements for storing data on the web?
     • Why are traditional databases not enough?
     • Explore technologies for data management on the web.
     • Six themes
       – Semi-structured data
       – Querying semi-structured data
       – Efficient storage for XML
       – Object-oriented data management
       – Semantic web: Ontologies and OWL
       – Data integration for the web

  2. Personnel and Course Information
     Available at: www.ida.liu.se/~TDDD43

     Today's lecture
     • Introduction to semi-structured data
     • Technologies: XML/RDF
     • Defining the data model
     • Data model vs. data guides
     • Technologies: DTD/XML Schema/RDF Schema
     • Data modeling in XML
     • Other DB models

     Semi-structured data
     • Data is not just text, but is not as well-structured as data in databases
     • Occurs often in web databanks
     • Occurs often in integration of databanks

     Semi-structured data - properties
     • Irregular structure
     • Implicit structure
     • Partial structure

  3. Semi-structured data - model
     • Graph: a network of nodes
     • Nodes: objects (oid)
     • Edges have labels

     OEM (Object Exchange Model)
     • Object model (oid)
     • Objects are atomic or complex
       – atoms: integer, string, gif, html, …
       – the value of a complex object is a set of object references (label, oid)
     • OEM is used by a number of systems (e.g. Lorel)

     OEM example
     [Figure: OEM graph of a restaurant guide. A root "Guide" object is linked to restaurant and cafe objects through edges labelled restaurant, cafe, name, category, address, street, city, zipcode, price and nearby.]

     Exercise 1
     Represent the relations below using the OEM data model.

     Cities
     c_id | name
     c1   | Linkoping
     c2   | Norrkoping

     Restaurants
     r_id | name
     r1   | Hamlet
     r2   | Normandie
     r3   | McDonald's

     Restaurants&Cities
     r_id | c_id | street
     r1   | c1   | Storgatan
     r2   | c1   | St.Larsgatan
     r3   | c2   | Kungsgatan
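The OEM description above (objects with an oid, atomic values or sets of (label, oid) references, labelled edges) can be sketched in Python as follows. This is a minimal illustration, not the API of Lorel or any other OEM system: the `OEMStore` class and its methods are hypothetical, and so is the encoding of Exercise 1 (a root object with `city` and `restaurant` edges, the street stored on the restaurant, and the `c_id` foreign key turned into a shared `city` reference). It is one of several valid answers, not a model solution.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

Ref = Tuple[str, int]  # one (label, oid) reference held by a complex object


@dataclass
class OEMObject:
    oid: int
    # Atomic objects hold an int or str; complex objects hold a set of Refs.
    value: Union[int, str, Set[Ref]]


class OEMStore:
    """Hypothetical in-memory OEM graph: objects are nodes, labels are edges."""

    def __init__(self) -> None:
        self.objects: Dict[int, OEMObject] = {}

    def _new(self, value) -> int:
        oid = len(self.objects) + 1  # oids are never reused (no deletes)
        self.objects[oid] = OEMObject(oid, value)
        return oid

    def atom(self, value: Union[int, str]) -> int:
        return self._new(value)

    def complex_obj(self) -> int:
        return self._new(set())

    def link(self, parent: int, label: str, child: int) -> None:
        # An OEM edge is stored as a (label, oid) pair in the parent's value.
        self.objects[parent].value.add((label, child))

    def follow(self, oid: int, label: str) -> List[int]:
        # All children reachable from oid over edges carrying this label.
        value = self.objects[oid].value
        if not isinstance(value, set):
            return []  # atomic objects have no outgoing edges
        return sorted(child for (lab, child) in value if lab == label)


# One possible encoding of the Exercise 1 relations (not a model answer).
cities = {"c1": "Linkoping", "c2": "Norrkoping"}
restaurants = {"r1": "Hamlet", "r2": "Normandie", "r3": "McDonald's"}
located = [("r1", "c1", "Storgatan"), ("r2", "c1", "St.Larsgatan"),
           ("r3", "c2", "Kungsgatan")]

db = OEMStore()
root = db.complex_obj()
city_oid: Dict[str, int] = {}
for cid, name in cities.items():
    city_oid[cid] = db.complex_obj()
    db.link(root, "city", city_oid[cid])
    db.link(city_oid[cid], "name", db.atom(name))

rest_oid: Dict[str, int] = {}
for rid, name in restaurants.items():
    rest_oid[rid] = db.complex_obj()
    db.link(root, "restaurant", rest_oid[rid])
    db.link(rest_oid[rid], "name", db.atom(name))

for rid, cid, street in located:
    db.link(rest_oid[rid], "street", db.atom(street))
    # The c_id foreign key becomes a shared reference to the city object.
    db.link(rest_oid[rid], "city", city_oid[cid])


def names_in_city(city_name: str) -> List[str]:
    """Names of restaurants whose city object has the given name."""
    result = []
    for r in db.follow(root, "restaurant"):
        for c in db.follow(r, "city"):
            if any(db.objects[n].value == city_name for n in db.follow(c, "name")):
                result.extend(db.objects[n].value for n in db.follow(r, "name"))
    return sorted(result)


print(names_in_city("Linkoping"))  # ['Hamlet', 'Normandie']
```

Because r1 and r2 point at the same city object, the result is a graph rather than a tree, which is how OEM can absorb the Restaurants&Cities join table without a separate relation.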

  4. Technologies: XML and RDF
     • Technologies: XML, RDF
     • Definition of data model: DTD, XML Schema, RDF Schema
     • Semantic models: ontologies and OWL (later in the course)

     Example: why not relational databases?

     Relational representation

     Compartment
     Id    | Name
     Blood | Inblood
     Cell  | Musclecell

     Reaction
     Id     | Name
     ToCell | Sugartocell
     Move   | Makemovement

     Reactant
     Reaction | Species
     ToCell   | Sug1
     ToCell   | Ins
     Move     | Sug2

     Species
     Id   | Name    | Compartment
     Sug1 | Sugar   | Blood
     Ins  | Insulin | Blood
     Sug2 | Sugar   | Cell
     En   | Energy  | Cell

     Product
     Reaction | Species
     ToCell   | Sug2
     Move     | En

     Relational model - drawbacks
     • Far from the semi-structured proposal
     • Not suitable for describing tree structure
     • Too general, or too many tables
     • Static: all attributes typed
     • All data entries atomic (in principle)
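To make the "too many tables" drawback concrete, the sketch below loads the slide's five tables into an in-memory SQLite database with Python's standard `sqlite3` module and rebuilds a single reaction, which already needs a join across four tables. The column types, the `REFERENCES` constraints, the helper `reaction_participants`, and the use of `ToCell` as the reaction key throughout (the slide's Reaction table writes it `Tocell`) are assumptions made for this illustration.

```python
import sqlite3

# The relational representation from the slide, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Compartment (Id TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Species (Id TEXT PRIMARY KEY, Name TEXT,
                      Compartment TEXT REFERENCES Compartment(Id));
CREATE TABLE Reaction (Id TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Reactant (Reaction TEXT REFERENCES Reaction(Id),
                       Species TEXT REFERENCES Species(Id));
CREATE TABLE Product (Reaction TEXT REFERENCES Reaction(Id),
                      Species TEXT REFERENCES Species(Id));
INSERT INTO Compartment VALUES ('Blood', 'Inblood'), ('Cell', 'Musclecell');
INSERT INTO Species VALUES ('Sug1', 'Sugar', 'Blood'), ('Ins', 'Insulin', 'Blood'),
                           ('Sug2', 'Sugar', 'Cell'), ('En', 'Energy', 'Cell');
INSERT INTO Reaction VALUES ('ToCell', 'Sugartocell'), ('Move', 'Makemovement');
INSERT INTO Reactant VALUES ('ToCell', 'Sug1'), ('ToCell', 'Ins'), ('Move', 'Sug2');
INSERT INTO Product VALUES ('ToCell', 'Sug2'), ('Move', 'En');
""")


def reaction_participants(reaction_id):
    """Rebuild one reaction: (reaction name, role, species name, compartment name).
    Reactants come first, then products, each ordered by species name."""
    query = """
    SELECT r.Name, 'reactant', s.Name, c.Name
      FROM Reaction r
      JOIN Reactant x ON x.Reaction = r.Id
      JOIN Species s ON s.Id = x.Species
      JOIN Compartment c ON c.Id = s.Compartment
     WHERE r.Id = ?
    UNION ALL
    SELECT r.Name, 'product', s.Name, c.Name
      FROM Reaction r
      JOIN Product x ON x.Reaction = r.Id
      JOIN Species s ON s.Id = x.Species
      JOIN Compartment c ON c.Id = s.Compartment
     WHERE r.Id = ?
    ORDER BY 2 DESC, 3
    """
    return conn.execute(query, (reaction_id, reaction_id)).fetchall()


for row in reaction_participants("ToCell"):
    print(row)
```

In the XML representation of the same model, this information sits nested under one `<reaction>` element, so reading it back needs no join at all.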

  5. XML representation
         <?xml version="1.0" encoding="UTF-8"?>
         <minimodel name="sugartransport"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:noNamespaceSchemaLocation="minimodel.xsd">
           <listOfCompartments>
             <compartment id="blood" name="inblood" />
             <compartment id="cell" name="musclecell" />
           </listOfCompartments>
           <listOfSpecies>
             <species id="sug1" name="sugar" compartment="blood" />
             <species id="ins" name="insulin" compartment="blood" />
             <species id="sug2" name="sugar" compartment="cell" />
             <species id="en" name="energy" compartment="cell" />
           </listOfSpecies>
           <listOfReactions>
             <reaction id="tocell" name="sugartocell">
               <listOfReactants>
                 <speciesReference species="sug1" />
                 <speciesReference species="ins" />
               </listOfReactants>
               <listOfProducts>
                 <speciesReference species="sug2" />
               </listOfProducts>
             </reaction>
             <reaction id="move" name="makemovement">
               <listOfReactants>
                 <speciesReference species="sug2" />
               </listOfReactants>
               <listOfProducts>
                 <speciesReference species="en" />
               </listOfProducts>
             </reaction>
           </listOfReactions>
         </minimodel>
     • Ordered tree: similar to the semi-structured proposal
     • Element vs. attribute
     • Extensible: new kinds of data can be integrated
     • Flexible: easy to mix different kinds of data

     RDF: Resource Description Framework
     • Framework for describing resources on the web
     • Designed to be read and understood by computers
     • Not designed for being displayed to people
     • Written in XML
     • RDF is a W3C Recommendation

     RDF: Resource Description Framework (annotation example)
         <?xml version="1.0" encoding="UTF-8"?>
         <species metaid="_506372" id="E1" name="MAPKKK activator"
                  compartment="compartment" initialConcentration="3e-05">
           <annotation>
             <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                      xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
                      xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
               <rdf:Description rdf:about="#_506372">
                 <bqbiol:isVersionOf>
                   <rdf:Bag>
                     <rdf:li rdf:resource="http://www.ebi.ac.uk/interpro/#IPR003577"/>
                   </rdf:Bag>
                 </bqbiol:isVersionOf>
               </rdf:Description>
             </rdf:RDF>
           </annotation>
         </species>

     RDF data model: triples
     • A Resource is anything that can have a URI, such as our molecule "_506372"
     • A Property is a Resource that has a name, such as "isVersionOf"
     • A Property value is the value of a Property, such as "IPR003577" (note that a property value can be another resource)
     • Suitable for semi-structured data.
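The triple view of the annotation can be sketched with Python's standard `xml.etree.ElementTree`. The function `description_triples` is a hypothetical and deliberately partial RDF/XML reader: it only handles `rdf:Description` elements whose properties either carry an `rdf:resource` attribute or contain an `rdf:Bag`. Following the RDF/XML syntax rules, the Bag becomes a blank node typed `rdf:Bag` and each `rdf:li` becomes a numbered membership property (`rdf:_1`, `rdf:_2`, and so on); the blank-node names `_:b1`, `_:b2` are arbitrary. A real application would use a complete RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ANNOTATED_SPECIES = """<?xml version="1.0" encoding="UTF-8"?>
<species metaid="_506372" id="E1" name="MAPKKK activator"
         compartment="compartment" initialConcentration="3e-05">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
             xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#_506372">
        <bqbiol:isVersionOf>
          <rdf:Bag>
            <rdf:li rdf:resource="http://www.ebi.ac.uk/interpro/#IPR003577"/>
          </rdf:Bag>
        </bqbiol:isVersionOf>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""


def description_triples(root):
    """Return (subject, property, object) triples for resource-valued properties.
    Property names are ElementTree '{namespace}local' strings."""
    triples = []
    blank = 0
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            bag = prop.find(f"{{{RDF}}}Bag")
            if bag is None:
                # Property whose value is another resource, given by rdf:resource.
                obj = prop.get(f"{{{RDF}}}resource")
                if obj is not None:
                    triples.append((subject, prop.tag, obj))
                continue
            # A Bag is a blank node with numbered members rdf:_1, rdf:_2, ...
            blank += 1
            node = f"_:b{blank}"
            triples.append((subject, prop.tag, node))
            triples.append((node, RDF + "type", RDF + "Bag"))
            for i, li in enumerate(bag.findall(f"{{{RDF}}}li"), start=1):
                triples.append((node, f"{RDF}_{i}", li.get(f"{{{RDF}}}resource")))
    return triples


root = ET.fromstring(ANNOTATED_SPECIES.encode("utf-8"))
for triple in description_triples(root):
    print(triple)
```

The generated Bag node plays the same role as the `genid:` nodes in the example model's RDF triples, where a Bag of reactants is a generated node with numbered members.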

  6. Part of our example model as RDF triples
     1   blood          #name         "in blood"
     3   sug1           #name         "sugar in blood"
     4   sug1           #compartment  blood
     11  st             #name         "sugartransport"
     12  genid:A71987   #type         Bag
     13  st             #reactants    genid:A71987
     14  genid:A71987   1             sug1
     15  genid:A71987   2             ins
     16  genid:A71988   #type         #Bag
     17  st             #products     genid:A71988
     18  genid:A71988   1             sug2

     Semi-structured data - properties
     • Data model/guide changes commonly
     • Object can change type/class
     • The distinction between data and schema is blurred

     Data Guides
     • A structural summary over a databank that is used as a dynamic schema
     • Used in query formulation and optimization
     • Often created a posteriori
     • Properties: concise, accurate, convenient

     Semi-structured data - data models vs. data guides
     • A posteriori 'data guide' versus a priori schema
     • Data model/data guide can be supportive or a hindrance while querying
     • Definition of data model for XML: DTD or XML Schema
     • Data model for RDF: RDF Schema
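A data guide can be computed a posteriori from the data itself. The sketch below assumes a small hypothetical labelled graph loosely modelled on the restaurant-guide figure (the oids and edges are invented) and uses a subset construction, similar to turning a nondeterministic automaton into a deterministic one: each data-guide node stands for the set of source objects reached by one label path from the root. `strong_dataguide` and `target_set` are illustrative names, not a library API.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical labelled graph (oid -> outgoing (label, child) edges), loosely
# modelled on the restaurant-guide figure; atomic objects have no entry.
guide: Dict[int, List[Tuple[str, int]]] = {
    1: [("restaurant", 2), ("restaurant", 3), ("cafe", 4)],
    2: [("name", 5), ("category", 6), ("nearby", 3)],
    3: [("name", 7), ("address", 8), ("nearby", 2)],
    4: [("name", 9), ("nearby", 2)],
    8: [("street", 10), ("city", 11)],
}


def strong_dataguide(graph, root):
    """Subset construction: each data-guide node is the set of source objects
    reached from the root by one label path. Returns (ids, edges): ids maps
    target sets to node numbers, edges maps a node to {label: child node}."""
    start = frozenset([root])
    ids: Dict[FrozenSet[int], int] = {start: 0}
    edges: Dict[int, Dict[str, int]] = {0: {}}
    queue = deque([start])
    while queue:
        targets = queue.popleft()
        by_label: Dict[str, Set[int]] = {}
        for oid in targets:
            for label, child in graph.get(oid, []):
                by_label.setdefault(label, set()).add(child)
        for label, children in sorted(by_label.items()):
            key = frozenset(children)
            if key not in ids:  # an unseen target set becomes a new node
                ids[key] = len(ids)
                edges[ids[key]] = {}
                queue.append(key)
            edges[ids[targets]][label] = ids[key]
    return ids, edges


def target_set(ids, edges, path):
    """Follow a label path in the data guide; return the source oids it reaches."""
    node = 0
    for label in path:
        if label not in edges[node]:
            return set()  # the path does not occur in the source data
        node = edges[node][label]
    reverse = {number: key for key, number in ids.items()}
    return set(reverse[node])


ids, edges = strong_dataguide(guide, 1)
print(target_set(ids, edges, ["restaurant", "address", "city"]))  # {11}
```

Each label path from the root appears exactly once in the result (concise), every path in it exists in the source (accurate), and a path query such as `restaurant.name` can be answered from the stored target set without scanning the source graph. The `nearby` cycle does not loop forever because a target set that has already been seen is reused rather than expanded again.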

  7. Defining the XML model: DTD
     A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.
     From the DTD point of view, all XML documents are made up of the following building blocks:
     • Elements
     • Attributes
     • Entities
     • PCDATA
     • CDATA

     Defining the XML model: XML Schema
     An XML Schema defines the legal building blocks of an XML document. An XML Schema:
     • defines elements
     • defines attributes
     • defines which elements are child elements
     • defines the order of child elements
     • defines the number of child elements
     • defines data types for elements and attributes
     • defines default and fixed values for elements and attributes

     XML Schema vs. DTD
     • XML Schemas are extensible to future additions (element definitions can be extended)
     • XML Schemas are richer and more powerful than DTDs
     • XML Schemas are written in XML
     • XML Schemas support data types
     • XML Schemas support namespaces

     RDF Schema: define relations between objects
         <?xml version="1.0" encoding="UTF-8"?>
         <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
           <rdf:Description rdf:ID="species">
             <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
           </rdf:Description>
           <rdf:Description rdf:ID="protein">
             <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
             <rdfs:subClassOf rdf:resource="#species"/>
           </rdf:Description>
         </rdf:RDF>
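What the RDF Schema example adds is inference: once `protein` is declared an `rdfs:subClassOf` `species`, anything typed as a protein is also a species. The sketch below applies two RDFS entailment rules (subclass transitivity and type propagation, named rdfs11 and rdfs9 in the RDF Semantics specification) to a set of triples until nothing new is derived. The prefixed names, the shorthand `#species`/`#protein` for the URIs created by `rdf:ID`, and the instance triples for `#ins`, `#kinase` and `#k1` are hypothetical additions; a real system would use an RDF store with a reasoner.

```python
# Hypothetical prefixed names; "#species" and "#protein" stand for the
# URIs that rdf:ID="species" and rdf:ID="protein" create.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

schema = {
    ("#species", RDF_TYPE, "rdfs:Class"),
    ("#protein", RDF_TYPE, "rdfs:Class"),
    ("#protein", SUBCLASS, "#species"),
}

# Hypothetical instance data and an extra subclass, not part of the slide.
data = {
    ("#ins", RDF_TYPE, "#protein"),
    ("#kinase", SUBCLASS, "#protein"),
    ("#k1", RDF_TYPE, "#kinase"),
}


def rdfs_closure(triples):
    """Naive fixpoint over two RDFS entailment rules:
    rdfs11: subClassOf is transitive;
    rdfs9:  an instance of a subclass is an instance of the superclass."""
    closure = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closure if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, RDF_TYPE, sup)
                for (x, p, cls) in closure if p == RDF_TYPE
                for (c, sup) in sub if c == cls}
        if new <= closure:  # nothing new was derived
            return closure
        closure |= new


inferred = rdfs_closure(schema | data)
for triple in sorted(inferred - schema - data):
    print(triple)
```

The loop recomputes every rule on each round, which is fine for a handful of triples but would be replaced by incremental (semi-naive) evaluation in a real reasoner.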

  8. Data modelling with XML
     • Elements vs. attributes
     • Keys
     • Many-to-many relations

     Lab exercises
     • Construct a data model in the relational model and in XML.
     • Answer questions, compare and write a report.
     • Tools: Oxygen XML and MS Server

     NoSQL: non-relational databases
     Examples:
     • Document store: CouchDB, ApacheDB
     • XML database: MarkLogic Server, eXist
     • Graph: AllegroGraph, Neo4j
     • Object database: GemStone/S
     • Key/value store on disk: BigTable
     • Eventually consistent key-value store: Cassandra
     • Ordered key-value store: Berkeley DB
     • Tabular: BigTable, HyperTable, HBase
     • Tuple store: Apache River

     Neo
     Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.
     • Linköping-related company.
     • Interesting for semi-structured data.
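The modelling questions on this slide (elements vs. attributes, keys, many-to-many relations) show up directly in the sugar-transport XML document: species and compartments are identified by `id` attributes, and the many-to-many relation between reactions and species is expressed through `speciesReference` elements. The sketch below parses that document with Python's standard `xml.etree.ElementTree` (the XML declaration and schema-location attributes are dropped for brevity), checks the keys by hand and builds the many-to-many mapping. `check_keys` and `reactions_per_species` are hypothetical helpers; declaratively, XML Schema identity constraints (`xs:key`/`xs:keyref`) or DTD `ID`/`IDREF` attributes can express similar constraints.

```python
import xml.etree.ElementTree as ET

MINIMODEL = """<minimodel name="sugartransport">
  <listOfCompartments>
    <compartment id="blood" name="inblood"/>
    <compartment id="cell" name="musclecell"/>
  </listOfCompartments>
  <listOfSpecies>
    <species id="sug1" name="sugar" compartment="blood"/>
    <species id="ins" name="insulin" compartment="blood"/>
    <species id="sug2" name="sugar" compartment="cell"/>
    <species id="en" name="energy" compartment="cell"/>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="tocell" name="sugartocell">
      <listOfReactants>
        <speciesReference species="sug1"/>
        <speciesReference species="ins"/>
      </listOfReactants>
      <listOfProducts><speciesReference species="sug2"/></listOfProducts>
    </reaction>
    <reaction id="move" name="makemovement">
      <listOfReactants><speciesReference species="sug2"/></listOfReactants>
      <listOfProducts><speciesReference species="en"/></listOfProducts>
    </reaction>
  </listOfReactions>
</minimodel>"""


def check_keys(root):
    """Return key violations: duplicate species ids and dangling references."""
    species_ids = [s.get("id") for s in root.iter("species")]
    dupes = sorted({i for i in species_ids if species_ids.count(i) > 1})
    errors = [f"duplicate species id {i}" for i in dupes]
    compartments = {c.get("id") for c in root.iter("compartment")}
    errors += [f"species {s.get('id')}: unknown compartment {s.get('compartment')}"
               for s in root.iter("species")
               if s.get("compartment") not in compartments]
    known = set(species_ids)
    errors += [f"reaction {r.get('id')}: unknown species {ref.get('species')}"
               for r in root.iter("reaction")
               for ref in r.iter("speciesReference")
               if ref.get("species") not in known]
    return errors


def reactions_per_species(root):
    """Many-to-many: map each species id to the reactions it takes part in."""
    usage = {}
    for r in root.iter("reaction"):
        for ref in r.iter("speciesReference"):
            usage.setdefault(ref.get("species"), set()).add(r.get("id"))
    return usage


root = ET.fromstring(MINIMODEL)
print(check_keys(root))                     # []
print(reactions_per_species(root)["sug2"])  # both reactions use sug2
```

Here `sug2` is a product of one reaction and a reactant of another; in the relational representation the same fact is spread over the Reactant and Product tables.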
