XML and Web Services
Lecture 8

Outline
• XML (Section 17)
  – XML syntax, semistructured data
  – Document Type Definitions (DTDs)
  – XML Schema
• Introduction to XML-based Web Services

Additional Readings on XML
• XML
  – http://www.w3.org/XML/1999/XML-in-10-points
  – www.zvon.org/xxl/XMLTutorial/General/book_en.html
  – http://www.w3.org/TR/REC-xml-names (1/99)
• Main source: www.w3.org (but hard to read)
• Several XML tutorials on the Web

XML
• eXtensible Markup Language
• XML 1.0 – a recommendation from W3C, 1998
• Roots: SGML (used in publishing)
  – Standard Generalized Markup Language
• After the roots: a format for sharing data

XML Data
• Relational data does not have a syntax
  – I can't "give" you my relational database
  – Need to import it from other syntax, like CSV (comma-separated values)
• XML = rich syntax for data
  – But XML is not relational: semi-structured
• Usage:
  – Map any data to XML
  – Store it in files, exchange on the Web, Web services etc.
  – Even query it directly, using XPath, XQuery

XML Data Sharing and Exchange
[Figure: XML data exchanged over the Web (HTTP). Labels: application, object-relational data, relational data, legacy data; operations: Integrate, Transform, Warehouse; specific data management tasks.]

From HTML to XML
HTML describes the layout

HTML
  <h1> Bibliography </h1>
  <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
  <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

XML
  <bibliography>
    <book>
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>
XML describes the structure

XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
A well-formed XML document:
• has matching tags
• has properly nested tags
• has a single root element
• satisfies more constraints, e.g. on names
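
The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the three sample strings are made up for illustration. The parser only checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs (not taken from the slides verbatim).
good = ("<bibliography><book><title>Foundations</title>"
        "<author>Hull</author></book></bibliography>")
bad_nesting = "<book><title>Foundations</book></title>"  # tags overlap
two_roots = "<book/><book/>"                              # more than one root element

results = [is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots)]
print(results)
```

Only the first string satisfies all three rules (matching tags, proper nesting, single root).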

More XML: Attributes
  <book price="55" currency="EUR">
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    …
    <year> 1995 </year>
  </book>
Attributes are an alternative way to represent data

XML example
  <?xml version='1.0' encoding='utf-8'?>
  <!-- A full XML Example -->
  <book price="55" currency="EUR">
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    …
    <year> 1995 </year>
  </book>

More XML: Comments
• Syntax: <!-- .... Comment text... -->
  – Same syntax as in HTML
• Yes, they are part of the data model!
  – Good documentation should be provided using comments
• In particular for Web services (see later)

XML Data: a Tree!
  <data>
    <person id="o555">
      <name> Mary </name>
      <address>
        <street> Maple </street>
        <no> 345 </no>
        <city> Seattle </city>
      </address>
    </person>
    <person>
      <name> John </name>
      <address> Thailand </address>
      <phone> 23456 </phone>
    </person>
  </data>
[Figure: the same document drawn as a tree. Element nodes: data, person, name, address, street, no, city, phone. Attribute node: id (o555). Text nodes: Mary, Maple, 345, Seattle, John, Thailand, 23456.]
Order matters!
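
The tree view can be made concrete by walking a parsed document. This is a small sketch with Python's xml.etree.ElementTree; the function name outline and the output format are invented here. Note that ElementTree stores attributes and text as properties of an element rather than as separate nodes, so the sketch prints them on the element's line.

```python
import xml.etree.ElementTree as ET

doc = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>"""

def outline(elem, depth=0):
    """Yield one indented line per element, in document order."""
    attrs = "".join(f" @{k}={v}" for k, v in elem.attrib.items())
    text = (elem.text or "").strip()
    yield "  " * depth + elem.tag + attrs + (f": {text}" if text else "")
    for child in elem:  # children are visited in the order they appear
        yield from outline(child, depth + 1)

lines = list(outline(ET.fromstring(doc)))
print("\n".join(lines))
```

Because children are visited in document order, swapping two elements in the input changes the output, which is the sense in which order matters.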

More XML: IDs and References
  <person id="o555"> <name> Jane </name> </person>
  <person id="o456"> <name> Mary </name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"> <name> John </name> </person>
Scope of IDs and references is the document

From Relational Data to XML Data
Relation persons:
  Name  Phone
  John  3634
  Sue   6343
  Dick  6363
XML:
  <persons>
    <row> <name>John</name> <phone>3634</phone> </row>
    <row> <name>Sue</name> <phone>6343</phone> </row>
    <row> <name>Dick</name> <phone>6363</phone> </row>
  </persons>
[Figure: the XML drawn as a tree: persons at the root, three row children, each with a name child and a phone child.]
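
The relational-to-XML mapping above is mechanical: one element for the table, one row element per tuple, one child element per column. A minimal sketch in Python, assuming the table is available as a list of tuples (the function name table_to_xml is hypothetical):

```python
import xml.etree.ElementTree as ET

# The persons relation from the slide, as (name, phone) tuples.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def table_to_xml(table_name, columns, rows):
    """Build <table_name> with one <row> per tuple and one child per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = value
    return root

persons = table_to_xml("persons", ["name", "phone"], rows)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

The column names end up repeated in every row, which is the redundancy that comes with self-describing data.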

XML Data
• XML is self-describing
• Schema elements become part of the data
  – Relational schema: persons(name, phone)
  – In XML <persons>, <name>, <phone> are part of the data, and are repeated many times
• Consequence: XML is much more flexible
  – However, XML data is redundant!
• XML = semi-structured data

Structured / Unstructured / Semi-Structured Data
• Structured data
  – Organised in semantic chunks (entities)
  – Similar entities are grouped together and have the same format
• Unstructured data
  – Data can be of any type
• Semi-structured data
  – Entities may not have the same attributes
  – Not all attributes may be required
  – Order of attributes not necessarily important
  – Nested and heterogeneous

Semi-structured Data Explained
• Missing attributes:
  <person> <name> John </name> <phone> 1234 </phone> </person>
  <person> <name> Joe </name> </person>    ← no phone!
• Could represent in a table with nulls:
  name  phone
  John  1234
  Joe   -

Semi-structured Data Explained
• Repeated attributes:
  <person> <name> Mary </name>
    <phone> 2345 </phone>
    <phone> 3456 </phone>    ← two phones!
  </person>
• Impossible in tables:
  name  phone
  Mary  2345  3456  ???
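
The two cases above can be seen side by side when trying to load such persons into a flat (name, phone) table. This is only a sketch: the wrapping <persons> element and the variable names are assumptions made for the example, and Python's None stands in for a null.

```python
import xml.etree.ElementTree as ET

doc = """<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>"""

flat_rows = []     # (name, phone) rows; None stands in for a null
does_not_fit = []  # names of persons with more than one phone

for person in ET.fromstring(doc).findall("person"):
    name = person.findtext("name")
    phones = [p.text for p in person.findall("phone")]
    if len(phones) <= 1:
        flat_rows.append((name, phones[0] if phones else None))
    else:
        does_not_fit.append(name)

print(flat_rows)
print(does_not_fit)
```

A missing phone fits the table as a null, while a repeated phone has no place in a single (name, phone) row.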

Semi-structured Data Explained
• Attributes with different types in different objects:
  <person>
    <name> <first> John </first> <last> Smith </last> </name>    ← structured name!
    <phone> 1234 </phone>
  </person>
• Nested collections (no 1NF)
• Heterogeneous collections:
  – <db> contains both <book>s and <publisher>s

How to describe data types/schema?
• In XML we basically have two options:
  – Document Type Definition (DTD)
    • Since early days of XML
    • Allows for document validation
    • Limited support for data types
  – XML Schema
    • Allows for simple and complex data types
    • Has mainly replaced DTD

Document Type Definitions (DTD)
• Part of the original XML specification
• An XML document may have a DTD
• XML document:
  – well-formed = if tags are correctly closed
  – valid = if it has a DTD and conforms to it
• Validation is useful in data exchange

Very Simple DTD
  <!DOCTYPE company [
    <!ELEMENT company ((person|product)*)>
    <!ELEMENT person (ssn, name, office, phone?)>
    <!ELEMENT ssn (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT office (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
    <!ELEMENT product (pid, name, description?)>
    <!ELEMENT pid (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
  ]>

Very Simple DTD
Example of valid XML document:
  <company>
    <person>
      <ssn> 123456789 </ssn>
      <name> John </name>
      <office> B432 </office>
      <phone> 1234 </phone>
    </person>
    <person>
      <ssn> 987654321 </ssn>
      <name> Jim </name>
      <office> B123 </office>
    </person>
    <product> ... </product>
    ...
  </company>

DTD: The Content Model
  <!ELEMENT tag ( CONTENT )>
• Content model:
  – Complex = a regular expression over other elements
  – Text-only = #PCDATA
  – Empty = EMPTY
  – Any = ANY
  – Mixed content = (#PCDATA | A | B | C)*
• Occurrence indicators:
  – * … 0 or many
  – + … at least one
  – ? … 0 or 1
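
Since a complex content model is a regular expression over child elements, it can be checked with an ordinary regex engine. The sketch below is not a DTD validator: it handles element-only models (no #PCDATA, EMPTY or ANY), and the helper names content_model_to_regex and children_match are made up for illustration. Each element name becomes a token followed by a space, commas (sequence) disappear, and | ? * + keep their usual meaning.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Wrap every element name in a group that also consumes a trailing space.
    pattern = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: "(?:" + re.escape(m.group()) + " )",
                     compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))

def children_match(elem, model):
    """Check the sequence of child tags of `elem` against the content model."""
    sequence = "".join(child.tag + " " for child in elem)
    return content_model_to_regex(model).fullmatch(sequence) is not None

PERSON_MODEL = "(ssn, name, office, phone?)"
with_phone = ET.fromstring(
    "<person><ssn>123456789</ssn><name>John</name>"
    "<office>B432</office><phone>1234</phone></person>")
without_phone = ET.fromstring(
    "<person><ssn>987654321</ssn><name>Jim</name><office>B123</office></person>")
wrong_order = ET.fromstring(
    "<person><name>Jim</name><ssn>987654321</ssn><office>B123</office></person>")

checks = [children_match(e, PERSON_MODEL)
          for e in (with_phone, without_phone, wrong_order)]
print(checks)
```

The first two persons conform because phone is optional; the third fails because a sequence fixes the order of the children.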

DTD: Regular Expressions
• sequence
  DTD: <!ELEMENT name (firstName, lastName)>
  XML: <name> <firstName> . . . . . </firstName> <lastName> . . . . . </lastName> </name>
• optional
  DTD: <!ELEMENT name (firstName?, lastName)>
  XML: <name> <lastName> . . . . . </lastName> </name>
• star (repeated occurrence)
  DTD: <!ELEMENT person (name, phone*)>
  XML: <person> <name> . . . . . </name> <phone> . . . . . </phone> <phone> . . . . . </phone> <phone> . . . . . </phone> . . . . . . </person>
• alternation
  DTD: <!ELEMENT person (name, (phone|email))>
  XML: <person> <name> . . . . . </name> <email> . . . . . </email> </person>

DTD: Attributes
• Document Type Definition
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ATTLIST person
    age         CDATA          #REQUIRED
    birthdate   CDATA          #IMPLIED
    nationality CDATA          #FIXED "CH"
    gender      (male|female)  "female">
  – age: mandatory (#REQUIRED takes no default value)
  – birthdate: optional (#IMPLIED)
  – nationality: fixed value "CH"
  – gender: enumeration with default "female"
• Document
  <person age="24" nationality="CH" gender="male">
    <ssn> … </ssn> … <phone> … </phone>
  </person>
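
What the attribute declarations mean for a document can be sketched by hand. The code below does not read a DTD: the table PERSON_ATTRS is a hand-written mirror of the ATTLIST for person, with age treated as required without a default, and the function name effective_attributes is hypothetical. It rejects a missing required attribute, enforces the fixed value and the enumeration, and fills in defaults.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the ATTLIST for person: name -> (kind, value).
PERSON_ATTRS = {
    "age": ("REQUIRED", None),
    "birthdate": ("IMPLIED", None),
    "nationality": ("FIXED", "CH"),
    "gender": ("DEFAULT", "female"),
}
GENDER_VALUES = {"male", "female"}  # the enumeration (male|female)

def effective_attributes(elem):
    """Return the attributes of `elem` after applying the declarations."""
    attrs = dict(elem.attrib)
    for name, (kind, value) in PERSON_ATTRS.items():
        if kind == "REQUIRED" and name not in attrs:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "FIXED" and attrs.get(name, value) != value:
            raise ValueError(f"attribute {name!r} must be {value!r}")
        if kind in ("FIXED", "DEFAULT"):
            attrs.setdefault(name, value)  # fill in the declared value
    if attrs["gender"] not in GENDER_VALUES:
        raise ValueError("gender must be one of male, female")
    return attrs

person = ET.fromstring('<person age="24" nationality="CH"/>')
attrs = effective_attributes(person)
print(attrs)
```

Here gender is filled in from its default, while birthdate stays absent because an #IMPLIED attribute has no default.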