1
Module 2: XML Basics (XML, Namespaces, Usage scenarios, DTDs)
2
History: SGML vs. HTML vs. XML
SGML (ISO standard in 1986, building on IBM's GML from the 1960s)
HTML (1990), XML (1996), XHTML (2000)
http://www.w3.org/TR/2006/REC-xml-20060816/
3
Why XML?
HTML is to be interpreted by browsers
Shown on the screen to a human
Desire to separate the “content” from the “presentation”
Presentation has to please the human eye
Content can be interpreted by machines; for machines, presentation is a handicap
Semantic markup of the data
4
Information about a book in HTML
<td><h1 class="Books">Politics of experience by Ronald Laing, published in 1967</h1></td>
<td align="right" nowrap> Item number:320070381076</td>
<td align="right" valign="top"><img src="http://pics.booksstatic.com/aw/pics/globalAssets/rtCurve.gif" width="8" height="8"></td></tr>
<tr><td colspan="6" valign="middle" bgcolor="#5F66EE"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="4"></td></tr></table>
<table width="100%" border="0" cellpadding="0" cellspacing="0"><tr><td bgcolor="#CCCCFF"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="1"></td>
<td bgcolor="#EEEEFF"><div id="FastVIPBIBO"><table border="0" cellpadding="0" cellspacing="0" width="100%">
5
The same information in XML
<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
Elements
- Information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked with semantic meaning
- It can be processed by machines
- Like HTML, XML defines only a syntax, not a logical abstract data model
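A minimal sketch of what "processed by machines" can look like for the book record above. It assumes Python's standard xml.etree.ElementTree module, which the slides do not mention; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The book record from the slide, written with plain ASCII quotes.
BOOK_XML = """
<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
""".strip()

book = ET.fromstring(BOOK_XML)

# Each piece of information is addressed by its semantic name,
# not by its position in a presentation layout.
title = book.findtext("title")
first = book.findtext("author/firstname")
last = book.findtext("author/lastname")
year = book.get("year")  # attribute values are plain strings

print(f"{title} by {first} {last} ({year})")
```

Note that year comes back as the string "1967", not as an integer: XML itself carries only characters.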
6
XML key concepts
Documents
Elements
Attributes
Namespace declarations
Text
Comments
Processing Instructions
All inherited from SGML, then HTML
7
The key concepts of XML
<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
- Documents
- Elements
- Attributes
- Text
- Nested structure
- Conceptual tree
- Order is important
- Only “characters”, not integers, etc.
8
Elements
Enclosed in tags
Begin tag: e.g., <bibliography>
End tag: e.g., </bibliography>
Element without content: e.g., <bibliography/> is a shorthand for <bibliography></bibliography>
Elements can be nested
<bib> <book> Wilde Wutz </book> </bib>
Subelements can implement multisets
<bib> <book> ... </book> <book> ... </book> </bib>
Order is important!
Documents must be well-formed
<a> <b> </a> </b> is forbidden!
<a> <b> </b> is forbidden!
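The well-formedness rules above can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree; is_well_formed is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<bib><book>Wilde Wutz</book></bib>")
overlapping = is_well_formed("<a><b></a></b>")  # tags overlap
unclosed = is_well_formed("<a><b></b>")         # <a> is never closed

print(nested_ok, overlapping, unclosed)
```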
9
Attributes
Attributes are associated with elements
<book price="55" year="1967"> <title> ... </title> <author> ... </author> </book>
Elements can have only attributes
<person name="Wutz" age="33"/>
Attribute names must be unique! (No multisets)
<person name="Wilde" name="Wutz"/> is illegal!
What is the difference between a nested element and an attribute? Are attributes useful?
Modeling decision: should “name” be an attribute or a subelement of a person? What about “age”?
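A small sketch of the attribute rules, again assuming Python's xml.etree.ElementTree as the parser. A parser exposes attributes as a name-to-value mapping, which is why a repeated attribute name cannot be represented and is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# An element that carries only attributes.
person = ET.fromstring('<person name="Wutz" age="33"/>')
attrs = dict(person.attrib)  # a mapping from attribute name to string value

# Repeating an attribute name makes the document not well-formed.
try:
    ET.fromstring('<person name="Wilde" name="Wutz"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(attrs, duplicate_rejected)
```

If a person may have several names, the mapping view no longer fits, and "name" has to be modeled as a repeated subelement instead.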
10
Text and Mixed Content
Text appears in element content
<title>The politics of experience</title>
Can be mixed with other subelements
<title>The politics of <em>experience</em></title>
Mixed Content
Very useful for “document” data
The need does not arise in “data” processing, which deals only with entities and relationships
People speak in sentences, not entities and relationships. XML makes it possible to preserve the structure of natural language while adding semantic markup that can be interpreted by machines.
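How mixed content looks to a program depends on the API. As an illustration, assuming Python's xml.etree.ElementTree, text before the first child is stored on the parent (text) and text following a child is stored on that child (tail). The trailing " (1967)" in the sample is an addition made here to show the tail; it is not on the slide.

```python
import xml.etree.ElementTree as ET

# Title example from the slide, extended with trailing text (assumption).
title = ET.fromstring(
    "<title>The politics of <em>experience</em> (1967)</title>"
)
em = title.find("em")

before = title.text   # text before the <em> child
inside = em.text      # text inside <em>
after = em.tail       # text after </em>, still inside <title>

# Dropping the markup recovers the natural-language sentence.
flat = "".join(title.itertext())

print(repr(before), repr(inside), repr(after))
print(flat)
```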
11
Continuous spectrum between natural language, semi-structured data, and structured data
1. Dana said that the book entitled “The politics of experience” is really excellent!
2. <citation author="Dana">The book entitled “The politics of experience” is really excellent!</citation>
3. <citation author="Dana">The book entitled <title>The politics of experience</title> is really excellent!</citation>
4. <citation>
<author>Dana</author> <aboutTitle>The politics of experience</aboutTitle> <rating>excellent</rating>
</citation>
12
CDATA sections
Sometimes we would like to preserve the original characters, and not interpret them as markup
CDATA sections
Not parsed as XML
<message> <greeting>Hello, world!</greeting> </message>
<message> <![CDATA[<greeting>Hello, world!</greeting>]]> </message>
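The difference between the two messages becomes visible once they are parsed. A sketch assuming Python's xml.etree.ElementTree (whitespace around the inner content is omitted here to keep the comparison exact):

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring(
    "<message><greeting>Hello, world!</greeting></message>"
)
cdata = ET.fromstring(
    "<message><![CDATA[<greeting>Hello, world!</greeting>]]></message>"
)

# Without CDATA, <greeting> is markup: the message has one child element.
plain_children = len(plain)

# With CDATA, the same characters are plain text: no child elements at all.
cdata_children = len(cdata)
cdata_text = cdata.text

print(plain_children, cdata_children, cdata_text)
```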
13
Comments, PIs, Prolog
Comment: syntax as in HTML
<!-- this is a comment -->
Processing Instructions
Contain no data; they are interpreted by the processor
Syntax: <?pause 10 secs ?>
“pause” is the target; “10 secs” is the content
“xml” is a reserved target, used by the XML declaration in the prolog
Prolog
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
standalone states whether the document depends on external markup declarations (such as an external DTD)
The encoding is usually a Unicode encoding such as UTF-8.
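A sketch of how a processing instruction reaches an application, assuming Python's standard xml.dom.minidom (ElementTree drops PIs by default, so a DOM parser is used here). The surrounding talk element is made up for the example.

```python
from xml.dom import minidom

# Bytes input, so the encoding named in the XML declaration is honoured.
SOURCE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<talk><?pause 10 secs?>Hello</talk>"
)

doc = minidom.parseString(SOURCE)
pi = doc.documentElement.firstChild

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target   # the name right after "<?"
content = pi.data    # everything after the target, up to "?>"

print(is_pi, target, content)
```

What to do with the target and content is entirely up to the application; the parser only hands them over.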
14
Whitespace declaration
Whitespace = continuous sequence of space, tab and return characters
Special attribute xml:space to control its use
Human-readable XML (with whitespace)
<book xml:space="preserve">
  <title>The politics of experience</title>
  <author>Ronald Laing</author>
</book>
(Efficient) machine-readable XML (no whitespace)
<book xml:space="default"><title>The politics of experience</title><author>Ronald Laing</author></book>
Performance improvement: roughly a factor of 2.
15
Language declaration
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
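An application can use xml:lang to pick the right variant. The sketch below assumes Python's xml.etree.ElementTree, which reports the reserved xml prefix under its fixed namespace URI; the doc wrapper element is an assumption added so that the paragraphs form one document.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = ET.fromstring(
    "<doc>"
    '<p xml:lang="en-GB">What colour is it?</p>'
    '<p xml:lang="en-US">What color is it?</p>'
    "</doc>"
)

# Index the paragraphs by their declared language.
by_lang = {p.get(f"{{{XML_NS}}}lang"): p.text for p in doc}

print(by_lang["en-GB"])
print(by_lang["en-US"])
```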
16
Uniform Resource Identifiers on the Web
URLs, URIs, IRIs
URL (Uniform Resource Locator): dereferenceable identifier on the Web
The target of a URL is typically an HTML file (virtual or materialized)
URI (Uniform Resource Identifier): general-purpose key to resources on the Web
Uniquely identifies a resource
The target need not be an HTML file; it can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
Lifetime and scope of this “key” are user dependent
IRI (Internationalized Resource Identifier)
Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
URLs, URIs, IRIs
All strings
Very LONG strings
17
Namespaces
Integration of data from diverse data sources
Integration of different XML vocabularies (aka namespaces)
Each “vocabulary” has a unique key, identified by a URI/IRI
The same local name from different vocabularies can have
Different meaning
Different structure associated with it
Qualified names (QNames) attach a “name” to its “vocabulary”
For all nodes in an XML document that have names (attributes, elements, PIs)
QName ::= triple ( URI [ prefix: ] localname )
The binding (prefix, URI) is introduced in an element's start tag
Later only the prefix is used, not the long URI
The prefix is optional (default namespaces)
Prefix and localname are separated by “:”
http://w3.org/TR/1999/REC-xml-names
18
Namespaces (cont)
Namespace definitions look like attributes
Identified by “xmlns:prefix” or “xmlns” (default)
Bind the prefix to the URI
Scope is the entire element where the namespace is declared
Includes the element itself, its attributes and its subtrees
Example
<ns:a xmlns:ns="someURI" ns:b="foo">
  <ns:b>content</ns:b>
</ns:a>
19
Default namespaces
Default namespaces, no prefix
<a xmlns="someURI"> <b/> <!-- a and b are in the someURI namespace! --> </a>
Applies only to elements (the declaring element and its subelements), not to attributes
<a xmlns="someURI" c="not in someURI namespace"> <b/> <!-- a and b are in the someURI namespace! --> </a>
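The scoping rules can be checked by looking at the expanded names a parser reports. This sketch assumes Python's xml.etree.ElementTree, which writes an expanded name as {URI}localname; that notation belongs to ElementTree, not to XML itself.

```python
import xml.etree.ElementTree as ET

# Prefixed namespace: element and attribute names carry the prefix.
prefixed = ET.fromstring(
    '<ns:a xmlns:ns="someURI" ns:b="foo"><ns:b>content</ns:b></ns:a>'
)

# Default namespace: applies to element names only.
default = ET.fromstring(
    '<a xmlns="someURI" c="not in someURI namespace"><b/></a>'
)

prefixed_names = (prefixed.tag, prefixed[0].tag, list(prefixed.attrib))
default_names = (default.tag, default[0].tag, list(default.attrib))

print(prefixed_names)
print(default_names)
```

In both documents the elements a and b end up in the someURI namespace, but the unprefixed attribute c stays in no namespace, while the prefixed attribute ns:b is in someURI.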
20
Example: Namespaces
DQ1 defines dish for china
Diameter, Volume, Decor, ...
DQ2 defines dish for satellites
Diameter, Frequency
How many “dishes” are there?
Better ask for:
“How many DQ1 dishes are there?”
or
“How many DQ2 dishes are there?”
21
Example: Namespaces
<gs:dish xmlns:gs="http://china.com">
  <gs:dm gs:unit="cm">20</gs:dm>
  <gs:vol gs:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
</gs:dish>
<sat:dish xmlns:sat="http://satelite.com">
  <sat:dm>200</sat:dm>
  <sat:freq>20-2000MHz</sat:freq>
</sat:dish>
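The "how many dishes" question can be asked per vocabulary once the names are qualified. A sketch assuming Python's xml.etree.ElementTree; the inventory wrapper element is hypothetical, added only so that both dishes sit in one document.

```python
import xml.etree.ElementTree as ET

DOC = """
<inventory>
  <gs:dish xmlns:gs="http://china.com">
    <gs:dm gs:unit="cm">20</gs:dm>
    <gs:vol gs:unit="l">5</gs:vol>
    <gs:decor>Meissner</gs:decor>
  </gs:dish>
  <sat:dish xmlns:sat="http://satelite.com">
    <sat:dm>200</sat:dm>
    <sat:freq>20-2000MHz</sat:freq>
  </sat:dish>
</inventory>
""".strip()

root = ET.fromstring(DOC)

# Qualified questions: one answer per vocabulary.
china_dishes = root.findall("{http://china.com}dish")
satellite_dishes = root.findall("{http://satelite.com}dish")

# Unqualified question: compare local names only, ignoring the namespace.
any_dishes = [e for e in root if e.tag.rsplit("}", 1)[-1] == "dish"]

print(len(china_dishes), len(satellite_dishes), len(any_dishes))
```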
22
Mixing Several Namespaces
<gs:dish xmlns:gs="http://china.com" xmlns:uom="http://units.com">
  <gs:dm uom:unit="cm">20</gs:dm>
  <gs:vol uom:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
  <comment>This is an unqualified element name</comment>
</gs:dish>
23
Example XML data
XHTML (browser/presentation)
RSS (blogs)
UBL (Universal Business Language)
Health Level 7, HL7 (medical data)
XBRL (financial data)
Digital photography metadata (XMP)
XMI (metadata)
XQueryX (programs)
XForms (forms)
SOAP (message envelopes)
Microsoft Office -- PowerPoint in XML (documents)
24
XHTML
25
RSS, blogs
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" />
    <items>
      <rdf:Seq>
        <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
        <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
      </rdf:Seq>
    </items>
    <textinput rdf:resource="http://search.xml.com" />
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
    <title>XML.com</title>
    <link>http://www.xml.com</link>
    <url>http://xml.com/universal/images/xml_tiny.gif</url>
  </image>
26
UBL (Universal Business Language)
Vocabulary definitions for:
ApplicationResponse • AttachedDocument • BillOfLading • Catalogue • CatalogueDeletion • CatalogueItemSpecificationUpdate • CataloguePricingUpdate • CatalogueRequest • CertificateOfOrigin • CreditNote • DebitNote • DespatchAdvice • ForwardingInstructions • FreightInvoice • Invoice • Order • OrderCancellation • OrderChange • OrderResponse • OrderResponseSimple • PackingList • Quotation • ReceiptAdvice • Reminder • RemittanceAdvice • RequestForQuotation • SelfBilledCreditNote • SelfBilledInvoice • Statement • TransportationStatus • Waybill
27
Health Level 7 (HL7)
Medical information that is being exchanged between hospitals, patients, doctors, pharmacies and insurance companies
http://en.wikipedia.org/wiki/HL7
28
XBRL (Financial information)
Goal: facilitate the exchange of business and financial performance information between companies, governments, insurance companies, banks, etc.
Mandated by law in many countries
http://en.wikipedia.org/wiki/XBRL
29
Extensible Metadata Platform (XMP)
Used in PDF, photography and photo editing applications.
Particular schemas for basic properties useful for recording the history of a resource as it passes through multiple processing steps, from being photographed, scanned, or authored as text, through photo editing steps (such as cropping or color adjustment), to assembly into a final image.
XMP allows each software program or device along the way to add its own information to a digital resource, which can then be retained in the final digital file.
http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
30
Microsoft Office in XML
Office 2003 could import and export all documents as XML
Office 2007 models the documents NATIVELY in XML
Examples of vocabularies and schemas:
WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), FormTemplate XML schemas (InfoPath 2003) and DataDiagramingML (Visio 2003)
31
Forms on the Web in XML
XML Forms (XForms)
http://www.w3.org/TR/xforms/
<xforms:model>
  <xforms:instance>
    <ecommerce xmlns="">
      <method/>
      <number/>
      <expiry/>
    </ecommerce>
  </xforms:instance>
  <xforms:submission action="http://example.com/submit" method="post" id="submit"/>
</xforms:model>
32
Programs and queries in XML
XQuery, the XML query language, has an XML representation
Programs and queries are also DATA
Blurring the distinction between data, metadata, code
<xqx:functionName>distinct</xqx:functionName>
<xqx:parameters>
  <xqx:expr xsi:type="xqx:pathExpr">
    <xqx:expr xsi:type="xqx:functionCallExpr">
      <xqx:functionName>document</xqx:functionName>
      <xqx:parameters>
        <xqx:expr xsi:type="xqx:stringConstantExpr">
          <xqx:value>http://www.bn.com</xqx:value>
        </xqx:expr>
      </xqx:parameters>
    </xqx:expr>
    <xqx:stepExpr>
      <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
      <xqx:elementTest>
        <xqx:nodeName>
          <xqx:QName>author</xqx:QName>
        </xqx:nodeName>
      </xqx:elementTest>
    </xqx:stepExpr>
  </xqx:expr>
33
SOAP and Web Services
Web services are a favored way of exchanging information between applications
XML exchange over HTTP, with a specific protocol (SOAP)
<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
        env:role="http://www.w3.org/2003/05/soap-envelope/role/next" env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
    <n:passenger xmlns:n="http://mycompany.example.com/employees"
        env:role="http://www.w3.org/2003/05/soap-envelope/role/next" env:mustUnderstand="true">
      <n:name>Åke Jógvan Øyvind</n:name>
    </n:passenger>
  </env:Header>
  <env:Body/>
</env:Envelope>
34
The need for XML “schemas”
Unlike many other data formats, XML is totally flexible: elements can be nested in arbitrary ways
We can start by writing the XML data, with no need for an a priori design of a schema
Compare relational databases or Java classes, where the schema or class has to be defined first
However, schemas are necessary:
They facilitate the writing of applications that process data
They constrain which data is correct for a certain application
They provide a priori agreements between parties with respect to the data being exchanged
Schema: a model of the data
Structural definitions
Type definitions
Defaults
35
History and role of XML Schema Languages
Several standard schema languages
DTDs, XML Schema, RelaxNG
Schema languages were designed after XML itself, and orthogonally to it
Schemas and data are completely decoupled in XML
Data can exist with or without schemas
Or with multiple schemas
Schema evolution rarely forces the data to evolve
Schemas can be designed before the data, or extracted from the data (DataGuide -- Stanford)
This makes XML the right choice for manipulating semi-structured data, rapidly evolving data, or highly customizable data
36
DTDs
Inherited from SGML
Part of the original XML 1.0 specification
Describe the “grammar” of the XML file
Element declarations: how elements are allowed to nest within each other, by rules and constraints
Attribute lists: describe which attributes are allowed on which element
Some constraints on the values of elements and attributes
Which element is the root element of the XML file
Checking the structural constraints: DTD validation (valid vs. invalid documents)
DTDs were very useful for a while but are now rarely chosen for new vocabularies because of several major limitations
37
Declaring the structure of elements
Grammar that describes the structure of the element
Subelements, identified by name, or #PCDATA
Combinators:
“+” for at least 1
“*” for 0 or more
“?” for 0 or 1
“,” for concatenation
“|” for choice
<!ELEMENT a ( (b | c)* , d? , e ) >
#PCDATA: only textual content allowed
<!ELEMENT a (#PCDATA)>
EMPTY: the element must be empty
<!ELEMENT a EMPTY>
ANY: allows any content
<!ELEMENT a ANY>
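A content model is a regular expression over child element names, which suggests one way to check it by hand. The sketch below is not a DTD validator: the regular expression is translated manually from the declaration of a with content model ((b | c)*, d?, e), and matches_model is a hypothetical helper name. It assumes Python's re and xml.etree.ElementTree modules.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of <!ELEMENT a ((b | c)*, d?, e)>.
# Every child name is followed by one space so names cannot run together.
A_MODEL = re.compile(r"(?:(?:b|c) )*(?:d )?e ")


def matches_model(element, model=A_MODEL):
    """Check the sequence of child element names against the model."""
    child_names = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_names) is not None


full = matches_model(ET.fromstring("<a><b/><c/><b/><d/><e/></a>"))
minimal = matches_model(ET.fromstring("<a><e/></a>"))
wrong_order = matches_model(ET.fromstring("<a><d/><b/><e/></a>"))
missing_e = matches_model(ET.fromstring("<a><b/><d/></a>"))

print(full, minimal, wrong_order, missing_e)
```

The check looks only at child elements and their order; text content and attributes are outside this sketch.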
38
Example DTD for recipes
<!ELEMENT collection (description,recipe*)>
<!ELEMENT description ANY>
<!ELEMENT recipe (title,ingredient*,preparation,comment?,nutrition)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>
39
Defining the attribute lists
Structure: <!ATTLIST ElementName definition>
<!ATTLIST ingredient
  name CDATA #REQUIRED
  amount CDATA #IMPLIED
  unit CDATA #FIXED "cup">
CDATA means character data, i.e. an ordinary string value
#REQUIRED and #IMPLIED state whether the attribute is mandatory or optional
A default value is possible
40
Attributes (cont.)
#REQUIRED
The document must specify a value for the attribute
#IMPLIED
The attribute is optional; there is no default
value
Default value, used if no other value is specified
#FIXED value
Default value, used if no other value is specified
If a value is specified, it must be the fixed value
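The four kinds of attribute default can be summarized as a small function. This is an illustrative sketch in plain Python, not a DTD processor; apply_attribute_defaults and the declaration table are hypothetical names, and the table mirrors an ingredient declaration with a required name, an implied amount and a unit fixed to "cup".

```python
def apply_attribute_defaults(attrs, declarations):
    """Return the attributes an application would see after defaulting.

    `declarations` maps an attribute name to (kind, value), where kind is
    "REQUIRED", "IMPLIED", "FIXED" or "DEFAULT".
    """
    result = dict(attrs)
    for name, (kind, value) in declarations.items():
        if kind == "REQUIRED" and name not in result:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "FIXED" and result.setdefault(name, value) != value:
            raise ValueError(f"attribute {name!r} must be {value!r}")
        if kind == "DEFAULT":
            result.setdefault(name, value)
        # IMPLIED: optional and without default, so nothing to do.
    return result


INGREDIENT = {
    "name": ("REQUIRED", None),
    "amount": ("IMPLIED", None),
    "unit": ("FIXED", "cup"),
}

filled = apply_attribute_defaults({"name": "flour"}, INGREDIENT)
print(filled)
```

Here the fixed unit is filled in automatically, amount stays absent, and leaving out name or writing a different unit raises an error.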
41
Major attribute types
CDATA: normal text content
ID
Value is unique within the document
An element has at most one attribute of this type
No default values allowed
IDREF, IDREFS
References to other elements within the document
IDREFS: list of references, separated by whitespace
42
ID and IDREF attributes
<!ATTLIST book
  isbn ID #REQUIRED
  price CDATA #IMPLIED
  index IDREFS #IMPLIED>
<book isbn="b1" index="b2 b3"/>
<book isbn="b2" index="b3"/>
<book isbn="b3"/>
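The two constraints behind ID and IDREFS (unique identifiers, and references that resolve) can be sketched without a validating parser. The code assumes that isbn is the ID attribute and index the IDREFS attribute, as in the ATTLIST above; the bib wrapper, the sample values and the check_ids helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<bib>
  <book isbn="b1" index="b2 b3"/>
  <book isbn="b2" index="b3"/>
  <book isbn="b3"/>
</bib>
""".strip()


def check_ids(root, id_attr="isbn", idrefs_attr="index"):
    """Return (duplicate IDs, references that point to no element)."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    duplicates = {i for i in ids if ids.count(i) > 1}
    known = set(ids)
    dangling = {
        ref
        for e in root.iter()
        for ref in e.get(idrefs_attr, "").split()  # whitespace-separated list
        if ref not in known
    }
    return duplicates, dangling


ok_result = check_ids(ET.fromstring(DOC))
bad_result = check_ids(
    ET.fromstring('<bib><book isbn="b1" index="b9"/><book isbn="b1"/></bib>')
)

print(ok_result, bad_result)
```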
43
Attributes list example
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ATTLIST ingredient
  name CDATA #REQUIRED
  amount CDATA #IMPLIED
  unit CDATA #IMPLIED>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition
  protein CDATA #REQUIRED
  carbohydrates CDATA #REQUIRED
  fat CDATA #REQUIRED
44
Mixed content in DTDs
Mixing #PCDATA declarations with other subelements means that the content can be “mixed”
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<p>some text <em>some emphasized text</em> blah <b>some bold text</b> </p>
45
Declarations of DTDs
No DTD (well-formed documents)
DTD inside the document:
<!DOCTYPE name [definition]>
DTD external, specified by URI:
<!DOCTYPE name SYSTEM "demo.dtd">
DTD external, public name and URI (the URI is optional in SGML but required in XML):
<!DOCTYPE name PUBLIC "Demo"> (SGML only)
<!DOCTYPE name PUBLIC "Demo" "demo.dtd">
DTD inside the document + external:
<!DOCTYPE name1 SYSTEM "demo.dtd" [definition]>
46
Correctness of XML documents
Well-formed documents
Satisfy the basic XML constraints (e.g. <a></b> is not well-formed)
Valid documents
Additionally satisfy the structural constraints of the DTD
Documents that are not well-formed cannot be processed
Non-valid documents can still be processed (queried, transformed, etc.)
47
Limitations of DTDs
DTDs describe only the “grammar” of the XML file, not the detailed structure and/or types
This grammatical description has some obvious shortcomings:
We cannot express that a “length” element must contain a non-negative number (constraints on the type of the value of an element or attribute)
The “unit” attribute should only be allowed when “amount” is present (co-occurrence constraints)
The “comment” element should be allowed to appear anywhere (schema flexibility)
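Until a richer schema language is available, constraints like these end up in application code. A sketch of two such checks in Python, assuming that unit and amount are attributes of an ingredient element and applying the non-negative-number rule to amount; check_ingredient is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET


def check_ingredient(elem):
    """Return a list of violations that a DTD cannot express."""
    problems = []

    # Co-occurrence constraint: unit only makes sense together with amount.
    if "unit" in elem.attrib and "amount" not in elem.attrib:
        problems.append("unit given without amount")

    # Type constraint: amount must be a non-negative number.
    amount = elem.get("amount")
    if amount is not None:
        try:
            valid = float(amount) >= 0
        except ValueError:
            valid = False
        if not valid:
            problems.append("amount is not a non-negative number")

    return problems


good = check_ingredient(
    ET.fromstring('<ingredient name="flour" amount="2" unit="cup"/>')
)
no_amount = check_ingredient(
    ET.fromstring('<ingredient name="salt" unit="pinch"/>')
)
negative = check_ingredient(
    ET.fromstring('<ingredient name="milk" amount="-1"/>')
)

print(good, no_amount, negative)
```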
48
Good Schema design principles
The XML schema language shall be
1. more expressive than XML DTDs
2. expressed in XML
3. self-describing
4. usable by a wide variety of applications that employ XML
5. straightforwardly usable on the Internet
6. optimized for interoperability
7. simple enough to be implemented with modest design and runtime resources
8. coordinated with relevant W3C specs
49
Recapitulation
XML as inheriting from the Web history
SGML, HTML, XHTML, XML
XML key concepts
Documents, elements, attributes, text
Order, nested structure, textual information
Namespaces
XML usage scenarios
Financial, medical, metadata, blogs, etc.
DTDs and the need for describing the “structure” of an XML file
Next: XML Schemas