Module 2: XML Basics (XML, Namespaces, Usage scenarios, DTDs) - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Module 2: XML Basics

(XML, Namespaces, Usage scenarios, DTDs)

slide-2
SLIDE 2

2

History: SGML vs. HTML vs. XML

  • SGML (origins in the 1960s as GML; ISO standard in 1986)
  • HTML (1990)
  • XML (first draft 1996; W3C Recommendation 1998)
  • XHTML (2000)

http://www.w3.org/TR/2006/REC-xml-20060816/

slide-3
SLIDE 3

3

Why XML?

  • HTML is meant to be interpreted by browsers
      • Shown on the screen to a human
  • Desire to separate the "content" from the "presentation"
      • Presentation has to please the human eye
      • Content can be interpreted by machines; for machines, presentation is a handicap
  • Semantic markup of the data

slide-4
SLIDE 4

4

Information about a book in HTML

<td><h1 class="Books">Politics of experience by Ronald Laing, published in 1967</h1></td>
<td align="right" nowrap> Item number:320070381076</td>
<td align="right" valign="top"><img src="http://pics.booksstatic.com/aw/pics/globalAssets/rtCurve.gif" width="8" height="8"></td></tr>
<tr><td colspan="6" valign="middle" bgcolor="#5F66EE"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="4"></td></tr></table>
<table width="100%" border="0" cellpadding="0" cellspacing="0"><tr>
<td bgcolor="#CCCCFF"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="1"></td>
<td bgcolor="#EEEEFF"><div id="FastVIPBIBO"><table border="0" cellpadding="0" cellspacing="0" width="100%">

slide-5
SLIDE 5

5

The same information in XML

<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>

  • Information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked with semantic meaning
  • It can be processed by machines
  • Like HTML, XML is only a syntax, not a logical abstract data model
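
To make "it can be processed by machines" concrete, here is a minimal sketch that reads the book record with Python's standard xml.etree.ElementTree module. The function name describe_book is an illustrative choice, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The book record from the slide, with straight quotes as XML requires.
BOOK_XML = """
<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
"""

def describe_book(xml_text):
    """Return (title, author name, year), looked up by element and attribute name."""
    book = ET.fromstring(xml_text)
    title = book.findtext("title")
    first = book.findtext("author/firstname")
    last = book.findtext("author/lastname")
    return title, f"{first} {last}", book.get("year")

print(describe_book(BOOK_XML))
```

Unlike the HTML version, no layout markup has to be skipped: each piece of information is addressed by its semantic name.
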
slide-6
SLIDE 6

6

XML key concepts

  • Documents
  • Elements
  • Attributes
  • Namespace declarations
  • Text
  • Comments
  • Processing Instructions
  • All inherited from SGML, then HTML

slide-7
SLIDE 7

7

The key concepts of XML

<book year="1967">
  <title>Politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>

  • Documents
  • Elements
  • Attributes
  • Text
  • Nested structure
  • Conceptual tree
  • Order is important
  • Only "characters", not integers, etc.
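
The points about the conceptual tree, document order, and "only characters" can be sketched in a few lines of Python. The helper name outline is made up for this illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book year="1967"><title>Politics of experience</title>'
    '<author><firstname>Ronald</firstname><lastname>Laing</lastname></author></book>'
)

def outline(element, depth=0):
    """Yield (depth, tag) pairs for the element tree, in document order."""
    yield depth, element.tag
    for child in element:
        yield from outline(child, depth + 1)

tree_outline = list(outline(book))

year_text = book.get("year")    # XML itself only carries characters
year_number = int(year_text)    # typing the value is left to the application
```

The parser hands back the attribute value as a string; interpreting it as an integer is a decision made outside XML.
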
slide-8
SLIDE 8

8

Elements

  • Enclosed in tags
      • Begin tag, e.g., <bibliography>
      • End tag, e.g., </bibliography>
      • Element without content, e.g., <bibliography/> is shorthand for <bibliography></bibliography>
  • Elements can be nested
        <bib>
          <book> Wilde Wutz </book>
        </bib>
  • Subelements can implement multisets
        <bib>
          <book> ... </book>
          <book> ... </book>
        </bib>
  • Order is important!
  • Documents must be well-formed
        <a> <b> </a> </b>   is forbidden (tags are not properly nested)
        <a> <b> </b>        is forbidden (the end tag for <a> is missing)
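
A quick way to see the well-formedness rule in action is to hand the two forbidden fragments to a parser. This is a minimal sketch using Python's standard library; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

properly_nested = is_well_formed("<a><b></b></a>")
crossed_tags = is_well_formed("<a><b></a></b>")
missing_end_tag = is_well_formed("<a><b></b>")
```

Only the first fragment is accepted. An XML parser does not try to repair the other two the way an HTML browser would.
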

slide-9
SLIDE 9

9

Attributes

  • Attributes are associated with elements
        <book price="55" year="1967">
          <title> ... </title>
          <author> ... </author>
        </book>
  • Elements can have only attributes (and no content)
        <person name="Wutz" age="33"/>
  • Attribute names must be unique! (No multisets)
        <person name="Wilde" name="Wutz"/>   is illegal!
  • What is the difference between a nested element and an attribute? Are attributes useful?
  • Modeling decision: should "name" be an attribute or a subelement of a person? What about "age"?
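
The difference between attributes (no multisets) and subelements (multisets allowed) can be checked directly. This is a small sketch with the standard library; the person data is the slide's own, and the helper name parses is an assumption.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Repeating a subelement is fine: both names are kept, in order.
person = ET.fromstring("<person><name>Wilde</name><name>Wutz</name></person>")
names = [name.text for name in person.findall("name")]

# Repeating an attribute name on one element is a well-formedness error.
duplicate_attribute_ok = parses('<person name="Wilde" name="Wutz"/>')
```

This is one practical input to the modeling decision: a property that may occur several times has to be a subelement.
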
slide-10
SLIDE 10

10

Text and Mixed Content

  • Text appears in element content
        <title>The politics of experience</title>
  • Text can be mixed with other subelements
        <title>The politics of <em>experience</em></title>
  • Mixed content
      • Very useful for "document" data
      • The need does not arise in "data" processing, which deals only with entities and relationships
  • People speak in sentences, not entities and relationships. XML makes it possible to preserve the structure of natural language while adding semantic markup that can be interpreted by machines.
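
As a rough illustration of mixed content, the sketch below parses the title example with Python's ElementTree. ElementTree exposes the text before a subelement as .text and the text after it as .tail, which is specific to that library rather than to XML.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>The politics of <em>experience</em></title>")

leading_text = title.text                 # text before the <em> subelement
emphasized = title.find("em").text        # text inside <em>
full_text = "".join(title.itertext())     # the sentence with the markup removed
```

The natural-language string survives intact, and the markup adds a machine-readable layer on top of it.
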

slide-11
SLIDE 11

11

Continuous spectrum between natural language, semi-structured data, and structured data

1. Dana said that the book entitled "The politics of experience" is really excellent!

2. <citation author="Dana">The book entitled "The politics of experience" is really excellent!</citation>

3. <citation author="Dana">The book entitled <title>The politics of experience</title> is really excellent!</citation>

4. <citation>
     <author>Dana</author>
     <aboutTitle>The politics of experience</aboutTitle>
     <rating>excellent</rating>
   </citation>

slide-12
SLIDE 12

12

CDATA sections

  • Sometimes we would like to preserve the original characters, and not interpret them as markup
  • CDATA sections
      • Not parsed as XML
  • Parsed as markup:
        <message>
          <greeting>Hello, world!</greeting>
        </message>
  • Kept as literal characters:
        <message>
          <![CDATA[<greeting>Hello, world!</greeting>]]>
        </message>
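
The effect of a CDATA section can be observed by parsing both variants of the message. A minimal sketch with the standard library follows; whitespace around the CDATA section is left out so the comparison stays exact.

```python
import xml.etree.ElementTree as ET

# Without CDATA, <greeting> is parsed as a child element.
parsed = ET.fromstring("<message><greeting>Hello, world!</greeting></message>")

# With CDATA, the same characters are plain text content of <message>.
literal = ET.fromstring(
    "<message><![CDATA[<greeting>Hello, world!</greeting>]]></message>"
)

parsed_children = [child.tag for child in parsed]
literal_children = [child.tag for child in literal]
literal_text = literal.text
```
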

slide-13
SLIDE 13

13

Comments, PIs, Prolog

  • Comment: syntax as in HTML
        <!-- this is a comment -->
  • Processing Instructions
      • Contain no data; they are interpreted by the processor
      • Syntax: <?pause 10 secs ?>
      • "pause" is the target; "10 secs" is the content
      • "xml" is a reserved target, used for the prolog
  • Prolog
        <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
      • standalone="yes" declares that the document does not depend on external markup declarations (such as an external DTD)
      • The encoding is usually a Unicode encoding, such as UTF-8 or UTF-16
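
The split of a processing instruction into target and content can be inspected with Python's xml.dom.minidom. This is a sketch; the surrounding <doc/> element is an assumption added only so that the text forms a complete document.

```python
from xml.dom import minidom

document = minidom.parseString("<?pause 10 secs ?><doc/>")

# Pick out the processing-instruction node among the document's children.
pi = next(
    node for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
)

pi_target = pi.target           # the name right after "<?"
pi_data = pi.data.strip()       # everything else, handed to the application as-is
```

The parser does not interpret "10 secs"; giving it meaning is up to whichever application recognizes the "pause" target.
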

slide-14
SLIDE 14

14

Whitespace declaration

  • Whitespace = contiguous sequence of space, tab, and return characters
  • Special attribute xml:space to control its use
  • Human-readable XML (with whitespace)
        <book xml:space="preserve">
          <title>The politics of experience</title>
          <author>Ronald Laing</author>
        </book>
  • (Efficient) machine-readable XML (no whitespace)
        <book xml:space="default"><title>The politics of experience</title><author>Ronald Laing</author></book>
  • Performance improvement: roughly a factor of 2

slide-15
SLIDE 15

15

Language declaration

  • <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
  • <p xml:lang="en-GB">What colour is it?</p>
  • <p xml:lang="en-US">What color is it?</p>

slide-16
SLIDE 16

16

Uniform Resource Identifiers on the Web

  • URLs, URIs, IRIs
  • URL (Uniform Resource Locator): dereferenceable identifier on the Web
      • The target of a URL pointer is typically an HTML file (virtual or materialized)
  • URI (Uniform Resource Identifier): general-purpose key to resources on the Web
      • Uniquely identifies a resource
      • The target need not be an HTML file; it can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
      • Lifetime and scope of this "key" are user dependent
  • IRI (Internationalized Resource Identifier)
      • Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
  • URLs, URIs, IRIs
      • All strings
      • Very LONG strings

slide-17
SLIDE 17

17

Namespaces

  • Integration of data from diverse data sources
  • Integration of different XML vocabularies (aka namespaces)
  • Each "vocabulary" has a unique key, identified by a URI/IRI
  • The same local name, from different vocabularies, can have
      • A different meaning
      • A different structure associated with it
  • Qualified names (QNames) attach a "name" to its "vocabulary"
      • For all nodes in an XML document that have names (attributes, elements, PIs)
  • QName ::= triple (URI, [prefix:], localname)
      • The binding (prefix, URI) is introduced in an element's start tag
      • Later only the prefix is used, not the long URI
      • The prefix is optional (default namespaces)
      • Prefix and localname are separated by ":"
  • http://w3.org/TR/1999/REC-xml-names

slide-18
SLIDE 18

18

Namespaces (cont.)

  • Namespace definitions look like attributes
      • Identified by "xmlns:prefix" or "xmlns" (default)
      • Bind the prefix to the URI
  • Scope is the entire element where the namespace is declared
      • Includes the element itself, its attributes, and its subtrees
  • Example
        <ns:a xmlns:ns="someURI" ns:b="foo">
          <ns:b>content</ns:b>
        </ns:a>
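
To illustrate that a qualified name is really a (URI, localname) pair and that the prefix is only a local shorthand, here is a small sketch with Python's ElementTree. ElementTree writes expanded names as "{URI}localname"; that notation is the library's convention, not part of XML. The second document with prefix "x" is an added assumption for comparison.

```python
import xml.etree.ElementTree as ET

# The example from the slide, with straight quotes.
with_ns_prefix = ET.fromstring(
    '<ns:a xmlns:ns="someURI" ns:b="foo"><ns:b>content</ns:b></ns:a>'
)

# Same document, but a different prefix bound to the same URI.
with_x_prefix = ET.fromstring(
    '<x:a xmlns:x="someURI" x:b="foo"><x:b>content</x:b></x:a>'
)

root_name = with_ns_prefix.tag            # expanded name of the element
attribute_names = list(with_ns_prefix.attrib)
child_name = with_ns_prefix[0].tag
same_names = with_ns_prefix.tag == with_x_prefix.tag
```
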

slide-19
SLIDE 19

19

Default namespaces

  • Default namespaces, no prefix
        <a xmlns="someURI">
          <b/> <!-- a and b are in the someURI namespace! -->
        </a>
  • Applies to the element and its subelements, not to attributes
        <a xmlns="someURI" c="not in someURI namespace">
          <b/> <!-- a and b are in the someURI namespace! -->
        </a>
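
The rule that a default namespace covers elements but not attributes can be checked with a short sketch (Python standard library; "{URI}localname" is ElementTree's notation for expanded names).

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a xmlns="someURI" c="not in someURI namespace"><b/></a>')

element_names = [a.tag, a[0].tag]     # both elements pick up the default namespace
attribute_names = list(a.attrib)      # the unprefixed attribute does not
```
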

slide-20
SLIDE 20

20

Example: Namespaces

  • DQ1 defines dish for china
      • Diameter, Volume, Decor, ...
  • DQ2 defines dish for satellites
      • Diameter, Frequency
  • How many "dishes" are there?
  • Better ask for:
      • "How many dishes (in the DQ1 vocabulary) are there?"
      • or
      • "How many dishes (in the DQ2 vocabulary) are there?"

slide-21
SLIDE 21

21

Example: Namespaces

<gs:dish xmlns:gs="http://china.com">
  <gs:dm gs:unit="cm">20</gs:dm>
  <gs:vol gs:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
</gs:dish>

<sat:dish xmlns:sat="http://satelite.com">
  <sat:dm>200</sat:dm>
  <sat:freq>20-2000MHz</sat:freq>
</sat:dish>

slide-22
SLIDE 22

22

Mixing Several Namespaces

<gs:dish xmlns:gs="http://china.com"
         xmlns:uom="http://units.com">
  <gs:dm uom:unit="cm">20</gs:dm>
  <gs:vol uom:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
  <comment>This is an unqualified element name</comment>
</gs:dish>
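
A consumer of this mixed-vocabulary document would look names up by namespace URI. The sketch below does this with ElementTree; the prefix map NS is local to the Python code and merely reuses the slide's prefixes for readability.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used only for lookups in this script.
NS = {"gs": "http://china.com", "uom": "http://units.com"}

DISH_XML = """
<gs:dish xmlns:gs="http://china.com" xmlns:uom="http://units.com">
  <gs:dm uom:unit="cm">20</gs:dm>
  <gs:vol uom:unit="l">5</gs:vol>
  <gs:decor>Meissner</gs:decor>
  <comment>This is an unqualified element name</comment>
</gs:dish>
"""

dish = ET.fromstring(DISH_XML)
diameter = dish.find("gs:dm", NS)
diameter_unit = diameter.get("{http://units.com}unit")   # attribute from the units vocabulary
comment = dish.find("comment")                            # element in no namespace
qualified_comment = dish.find("gs:comment", NS)           # no such element exists
```
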

slide-23
SLIDE 23

23

Example XML data

  • XHTML (browser/presentation)
  • RSS (blogs)
  • UBL (Universal Business Language)
  • Health Level 7, HL7 (medical data)
  • XBRL (financial data)
  • XMP (digital photography metadata)
  • XMI (metadata)
  • XQueryX (programs)
  • XForms (forms)
  • SOAP (message envelopes)
  • Microsoft Office, e.g. PowerPoint in XML (documents)

slide-24
SLIDE 24

24

XHTML

[Image: XHTML example; the picture is not preserved in this transcript]

slide-25
SLIDE 25

25

RSS, blogs

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
    <description>
      XML.com features a rich mix of information and services for the XML community.
    </description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" />
    <items>
      <rdf:Seq>
        <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
        <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
      </rdf:Seq>
    </items>
    <textinput rdf:resource="http://search.xml.com" />
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
    <title>XML.com</title>
    <link>http://www.xml.com</link>
    <url>http://xml.com/universal/images/xml_tiny.gif</url>
  </image>

slide-26
SLIDE 26

26

UBL (Universal Business Language)

  • Vocabulary definitions for:
      • ApplicationResponse
      • AttachedDocument
      • BillOfLading
      • Catalogue
      • CatalogueDeletion
      • CatalogueItemSpecificationUpdate
      • CataloguePricingUpdate
      • CatalogueRequest
      • CertificateOfOrigin
      • CreditNote
      • DebitNote
      • DespatchAdvice
      • ForwardingInstructions
      • FreightInvoice
      • Invoice
      • Order
      • OrderCancellation
      • OrderChange
      • OrderResponse
      • OrderResponseSimple
      • PackingList
      • Quotation
      • ReceiptAdvice
      • Reminder
      • RemittanceAdvice
      • RequestForQuotation
      • SelfBilledCreditNote
      • SelfBilledInvoice
      • Statement
      • TransportationStatus
      • Waybill

slide-27
SLIDE 27

27

Health Level 7 (HL7)

  • Medical information that is exchanged between hospitals, patients, doctors, pharmacies, and insurance companies
  • http://en.wikipedia.org/wiki/HL7

slide-28
SLIDE 28

28

XBRL (Financial information)

  • Goal: facilitate the exchange of business and financial performance information between companies, governments, insurance companies, banks, etc.
  • Mandated by law in many countries
  • http://en.wikipedia.org/wiki/XBRL

slide-29
SLIDE 29

29

Extensible Metadata Platform (XMP)

  • Used in PDF, photography, and photo editing applications.
  • Defines particular schemas for basic properties that are useful for recording the history of a resource as it passes through multiple processing steps, from being photographed, scanned, or authored as text, through photo editing steps (such as cropping or color adjustment), to assembly into a final image.
  • XMP allows each software program or device along the way to add its own information to a digital resource, which can then be retained in the final digital file.
  • http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

slide-30
SLIDE 30

30

Microsoft Office in XML

  • Office 2003 could import and export all documents as XML
  • Office 2007 models the documents NATIVELY in XML
  • Examples of vocabularies and schemas:
      • WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), FormTemplate XML schemas (InfoPath 2003), and DataDiagramingML (Visio 2003)

slide-31
SLIDE 31

31

Forms on the Web in XML

  • XML Forms (XForms)
  • http://www.w3.org/TR/xforms/

<xforms:model>
  <xforms:instance>
    <ecommerce xmlns="">
      <method/>
      <number/>
      <expiry/>
    </ecommerce>
  </xforms:instance>
  <xforms:submission action="http://example.com/submit" method="post" id="submit"/>
</xforms:model>

slide-32
SLIDE 32

32

Programs and queries in XML

  • XQuery, the XML query language, has an XML representation
  • Programs and queries are also DATA
  • Blurring the distinction between data, metadata, and code

<xqx:functionName>distinct</xqx:functionName>
<xqx:parameters>
  <xqx:expr xsi:type="xqx:pathExpr">
    <xqx:expr xsi:type="xqx:functionCallExpr">
      <xqx:functionName>document</xqx:functionName>
      <xqx:parameters>
        <xqx:expr xsi:type="xqx:stringConstantExpr">
          <xqx:value>http://www.bn.com</xqx:value>
        </xqx:expr>
      </xqx:parameters>
    </xqx:expr>
    <xqx:stepExpr>
      <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
      <xqx:elementTest>
        <xqx:nodeName>
          <xqx:QName>author</xqx:QName>
        </xqx:nodeName>
      </xqx:elementTest>
    </xqx:stepExpr>
  </xqx:expr>

slide-33
SLIDE 33

33

SOAP and Web Services

  • Web Services are a popular way of exchanging information between applications
  • XML exchange over HTTP, with a specific protocol (SOAP)

<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
                   env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
                   env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
    <n:passenger xmlns:n="http://mycompany.example.com/employees"
                 env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
                 env:mustUnderstand="true">
      <n:name>Åke Jógvan Øyvind</n:name>
    </n:passenger>
  </env:Header>
  <env:Body/>
</env:Envelope>

slide-34
SLIDE 34

34

The need for XML "schemas"

  • Unlike most other data formats, XML is totally flexible: elements can be nested in arbitrary ways
  • We can start by writing the XML data, with no need for an a priori design of a schema
      • Contrast with relational databases or Java classes, where the schema or class comes first
  • However, schemas are necessary:
      • They facilitate the writing of applications that process data
      • They constrain the data that is correct for a certain application
      • They provide a priori agreements between parties with respect to the data being exchanged
  • Schema: a model of the data
      • Structural definitions
      • Type definitions
      • Defaults

slide-35
SLIDE 35

35

History and role of XML Schema Languages

  • Several standard schema languages
      • DTDs, XML Schema, RelaxNG
  • Schema languages have been designed after, and in an orthogonal fashion to, XML itself
  • Schemas and data are completely decoupled in XML
      • Data can exist with or without schemas
      • Or with multiple schemas
      • Schema evolutions rarely impose evolving the data
      • Schemas can be designed before the data, or extracted from the data (DataGuide -- Stanford)
  • This makes XML the right choice for manipulating semi-structured data, rapidly evolving data, or highly customizable data

slide-36
SLIDE 36

36

DTDs

  • Inherited from SGML
  • Part of the original XML 1.0 specification
  • Describe the "grammar" of the XML file
      • Element declarations: how elements are allowed to nest within each other, by rules and constraints
      • Attribute lists: describe which attributes are allowed on which element
      • Some constraints on the values of elements and attributes
      • Which element is the root of the XML file
  • Checking the structural constraints: DTD validation (valid vs. invalid documents)
  • DTDs were very useful for a while but are now largely superseded, because of several major limitations

slide-37
SLIDE 37

37

Declaring the structure of elements

  • Grammar that describes the structure of the element
      • Subelements, identified by name, or #PCDATA
  • Combinators:
      • "+" for at least 1
      • "*" for 0 or more
      • "?" for 0 or 1
      • "," for concatenation
      • "|" for choice
        <!ELEMENT a ( (b | c)* , d? , e ) >
  • #PCDATA: only textual content allowed
        <!ELEMENT a (#PCDATA)>
  • EMPTY: the element must be empty
        <!ELEMENT a EMPTY>
  • ANY: allows any content
        <!ELEMENT a ANY>
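
A DTD content model is essentially a regular expression over child element names. As a rough sketch of that idea only (this is not how a validating parser is implemented, and Python's standard library does not validate against DTDs), the model ((b | c)*, d?, e) can be hand-translated into a regex and matched against the sequence of child names. The names A_MODEL and children_match are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written translation of <!ELEMENT a ((b | c)*, d?, e)>.
# Each child name is followed by one space so that names cannot run together.
A_MODEL = re.compile(r"(?:(?:b|c) )*(?:d )?e ")

def children_match(xml_text):
    """Return True if the root's child elements follow the content model of a."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return A_MODEL.fullmatch(names) is not None

full_sequence = children_match("<a><b/><c/><b/><d/><e/></a>")
only_required = children_match("<a><e/></a>")
repeated_optional = children_match("<a><d/><d/><e/></a>")
missing_required = children_match("<a><b/></a>")
```

The first two documents satisfy the model; the third repeats the optional d, and the fourth lacks the mandatory e.
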

slide-38
SLIDE 38

38

Example DTD for recipes

<!ELEMENT collection (description,recipe*)>
<!ELEMENT description ANY>
<!ELEMENT recipe (title,ingredient*,preparation,comment?,nutrition)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>

slide-39
SLIDE 39

39

Defining the attribute lists

  • Structure: <!ATTLIST ElementName definition>
        <!ATTLIST ingredient
            name   CDATA #REQUIRED
            amount CDATA #IMPLIED
            unit   CDATA #FIXED "cup">
  • CDATA means character data (an ordinary string value)
  • #REQUIRED and #IMPLIED state whether the attribute is mandatory or optional
  • A default value is possible

SLIDE 40

Attributes (cont.)

• #REQUIRED
  • Document must specify a value for the attribute
• #IMPLIED
  • Attribute is optional, there is no default
• value
  • Default value, if no other value is specified
• #FIXED value
  • Default value, if no other value is specified
  • If a value is specified, it must be the fixed value
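The four default modes can be summarized as a small function. This is a sketch in Python of the rules listed above, not the code of any real parser; the function name `effective_attributes`, the dictionary layout, and the mode label "DEFAULT" (for a plain default value) are assumptions made for the illustration.

```python
def effective_attributes(declarations, specified):
    """Return the attributes an application would see after DTD defaulting.

    declarations maps attribute name -> (mode, value), where mode is one of
    "#REQUIRED", "#IMPLIED", "#FIXED" or "DEFAULT" (a plain default value).
    specified holds the attributes written in the document itself.
    """
    result = dict(specified)
    for name, (mode, value) in declarations.items():
        if mode == "#REQUIRED":
            if name not in specified:
                raise ValueError(f"missing required attribute: {name}")
        elif mode == "#IMPLIED":
            continue  # optional, and no default is supplied
        elif mode == "DEFAULT":
            result.setdefault(name, value)
        elif mode == "#FIXED":
            if specified.get(name, value) != value:
                raise ValueError(f"attribute {name} must equal {value!r}")
            result[name] = value
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result

# Hypothetical encoding of the ATTLIST for "ingredient" with unit fixed to "cup".
INGREDIENT = {
    "name": ("#REQUIRED", None),
    "amount": ("#IMPLIED", None),
    "unit": ("#FIXED", "cup"),
}

print(effective_attributes(INGREDIENT, {"name": "flour"}))
# {'name': 'flour', 'unit': 'cup'}
```

Calling the function without `name`, or with `unit` set to anything other than "cup", raises a `ValueError`, which mirrors the validity errors a validating parser would report.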

SLIDE 41

Major attribute types

• CDATA: normal text content
• ID
  • Value is unique within the document
  • Element has at most one attribute of this type
  • No default values allowed
• IDREF, IDREFS
  • References to other elements within the document
  • IDREFS: list of references, separated by whitespace

SLIDE 42

ID and IDREF attributes

<!ATTLIST book
    isbn  ID     #REQUIRED
    price CDATA  #IMPLIED
    index IDREFS #IMPLIED>

<book isbn="b1" index="b2 b3"/>
<book isbn="b2" index="b3"/>
<book isbn="b3"/>
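A validating parser checks that every ID value is unique and that every IDREF or IDREFS value points at an existing ID. The sketch below reproduces those two checks by hand in Python. It assumes `book` elements whose ID attribute is called `isbn` and whose IDREFS attribute is called `index`; the wrapping `library` element, the values `b1` to `b3`, and the function name `check_references` are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book isbn="b1" index="b2 b3"/>
  <book isbn="b2" index="b3"/>
  <book isbn="b3"/>
</library>"""

def check_references(root, id_attr="isbn", refs_attr="index"):
    """Return a list of problems: duplicate IDs and dangling references."""
    problems = []
    seen = set()
    # First pass: collect all ID values and report duplicates.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID: {value}")
        seen.add(value)
    # Second pass: every whitespace-separated reference must name a known ID.
    for elem in root.iter():
        for ref in elem.get(refs_attr, "").split():
            if ref not in seen:
                problems.append(f"dangling reference: {ref}")
    return problems

print(check_references(ET.fromstring(DOC)))  # []
```

Two passes are used because a reference may point forward to an element that appears later in the document.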

SLIDE 43

Attributes list example

<!ELEMENT ingredient (ingredient*, preparation)?>
<!ATTLIST ingredient
    name   CDATA #REQUIRED
    amount CDATA #IMPLIED
    unit   CDATA #IMPLIED>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition
    protein       CDATA #REQUIRED
    carbohydrates CDATA #REQUIRED
    fat           CDATA #REQUIRED

SLIDE 44

Mixed content in DTDs

• Mixing #PCDATA with other subelements in a declaration means that the content can be "mixed"

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

<p>some text <em>some emphasized text</em> blah <b>some bold text</b> </p>
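To see what mixed content looks like to a program, the paragraph from the slide can be parsed with Python's standard `xml.etree.ElementTree` module, which stores the text before the first child in `text` and the text after each child in that child's `tail`. This is only an illustrative sketch; the helper name `mixed_pieces` is made up.

```python
import xml.etree.ElementTree as ET

DOC = "<p>some text <em>some emphasized text</em> blah <b>some bold text</b> </p>"
p = ET.fromstring(DOC)

def mixed_pieces(elem):
    """List the text runs and child elements of elem in document order."""
    pieces = []
    if elem.text:
        pieces.append(("text", elem.text))
    for child in elem:
        pieces.append((child.tag, child.text))
        if child.tail:  # text that follows the child's end tag
            pieces.append(("text", child.tail))
    return pieces

for piece in mixed_pieces(p):
    print(piece)
# ('text', 'some text ')
# ('em', 'some emphasized text')
# ('text', ' blah ')
# ('b', 'some bold text')
# ('text', ' ')
```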

SLIDE 45

Declarations of DTDs

• No DTD (well-formed documents)
• DTD inside the document:
  <!DOCTYPE name [definition]>
• DTD external, specified by URI:
  <!DOCTYPE name SYSTEM "demo.dtd">
• DTD external, public name and URI (XML requires the URI; only SGML allows omitting it):
  <!DOCTYPE name PUBLIC "Demo" "demo.dtd">
• DTD inside the document + external:
  <!DOCTYPE name1 SYSTEM "demo.dtd" [definition]>
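As a rough illustration of how these declarations surface in an API, the sketch below reads the document type name, public identifier and system identifier with Python's standard `xml.dom.minidom`. The helper name `doctype_info` is hypothetical, and the parser used here does not validate: it only records what the DOCTYPE declaration says.

```python
from xml.dom import minidom

def doctype_info(text):
    """Return (name, publicId, systemId) taken from the DOCTYPE declaration."""
    doctype = minidom.parseString(text).doctype
    return doctype.name, doctype.publicId, doctype.systemId

SYSTEM_DOC = '<!DOCTYPE name SYSTEM "demo.dtd"><name/>'
PUBLIC_DOC = '<!DOCTYPE name PUBLIC "Demo" "demo.dtd"><name/>'

print(doctype_info(SYSTEM_DOC))
print(doctype_info(PUBLIC_DOC))
```

For a document without a DOCTYPE declaration the `doctype` property is `None`, so the helper should only be called on documents that declare one.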

SLIDE 46

Correctness of XML documents

• Well-formed documents
  • Satisfy the basic XML constraints (for example, <a></b> is not well-formed)
• Valid documents
  • Also satisfy the additional DTD structural constraints
• Documents that are not well-formed cannot be processed
• Non-valid documents can still be processed (queried, transformed, etc.)
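The difference can be observed with a non-validating parser such as Python's standard `xml.etree.ElementTree`: it rejects input that is not well-formed but happily parses a document that would be invalid against a DTD. A minimal sketch, where the helper name `is_well_formed` and the imagined declaration `<!ELEMENT a (b)>` are assumptions for the example:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a></b>"))      # False: mismatched tags
print(is_well_formed("<a><b/></a>"))  # True

# Suppose a DTD declared <!ELEMENT a (b)>. The document below would then be
# invalid, but it is well-formed, so it can still be parsed and queried.
root = ET.fromstring("<a><c/></a>")
print([child.tag for child in root])  # ['c']
```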

SLIDE 47

Limitations of DTDs

• DTDs describe only the "grammar" of the XML file, not the detailed structure and/or types
• This grammatical description has some obvious shortcomings:
  • we cannot express that a "length" element must contain a non-negative number (constraints on the type of the value of an element or attribute)
  • the "unit" attribute should only be allowed when "amount" is present (co-occurrence constraints)
  • the "comment" element should be allowed to appear anywhere (schema flexibility)
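Because a DTD cannot state the first two constraints, applications that rely on DTDs typically enforce them in their own code. The following Python sketch shows what such hand-written checks could look like; the function name `extra_checks`, the sample document, and the assumption that `unit` and `amount` are attributes of `ingredient` are illustrative only.

```python
import xml.etree.ElementTree as ET

def extra_checks(root):
    """Check two constraints that a DTD cannot express."""
    problems = []
    # Co-occurrence constraint: "unit" only makes sense together with "amount".
    for elem in root.iter("ingredient"):
        if "unit" in elem.attrib and "amount" not in elem.attrib:
            problems.append(f"unit without amount on ingredient {elem.get('name')!r}")
    # Type constraint: "length" must contain a non-negative number.
    for elem in root.iter("length"):
        try:
            ok = float(elem.text) >= 0
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append(f"length is not a non-negative number: {elem.text!r}")
    return problems

SAMPLE = """<recipe>
  <ingredient name="salt" unit="cup"/>
  <ingredient name="flour" amount="2" unit="cup"/>
  <length>-1</length>
  <length>2.5</length>
</recipe>"""

for problem in extra_checks(ET.fromstring(SAMPLE)):
    print(problem)
# unit without amount on ingredient 'salt'
# length is not a non-negative number: '-1'
```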

SLIDE 48

Good schema design principles

• The XML schema language shall be
  1. more expressive than XML DTDs
  2. expressed in XML
  3. self-describing
  4. usable by a wide variety of applications that employ XML
  5. straightforwardly usable on the Internet
  6. optimized for interoperability
  7. simple enough to be implemented with modest design and runtime resources
  8. coordinated with relevant W3C specs
SLIDE 49

Recapitulation

• XML as inheriting from the Web history
  • SGML, HTML, XHTML, XML
• XML key concepts
  • Documents, elements, attributes, text
  • Order, nested structure, textual information
  • Namespaces
• XML usage scenarios
  • Financial, medical, metadata, blogs, etc.
• DTDs and the need for describing the "structure" of an XML file
• Next: XML Schemas