Ali Kamandi (kamandi@ce.sharif.edu), Spring 2007, Sharif University of Technology

Part 1: XML and DTD

• SGML (Standard Generalized Markup Language)
  - ISO standard (1986) for data storage and exchange
  - Meta-language for defining languages
  - A famous SGML language: HTML
  - Separation of content and display
  - The SGML reference is 600 pages long
• XML (eXtensible Markup Language)
  - W3C (World Wide Web Consortium, http://www.w3.org/XML/) recommendation in 1998
  - Simple subset (80/20 rule) of SGML
  - The XML specification is 26 pages long

XML
• eXtensible Markup Language
• Metalanguage: used to create other languages
• Has become a universal data-exchange format

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

• Human-readable
• Machine-readable (easy to parse)
• Standard format for data interchange
• Possible to validate
• Extensible
  - can represent any data
  - can add new tags for new data formats
• Hierarchical structure (nesting)

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

• Element: a start tag, its content, and the matching end tag, e.g. <authors> … </authors>
• Element name: bibliography, paper, authors, author, …
• Empty element: <fullPaper source="fusion"/>
• Character content: the text inside an element, e.g. VLDB 96

Element Attributes

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>

• Attribute name: ID
• Attribute value: "object-fusion"

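The element and attribute terminology above maps directly onto what a parser exposes. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which the slides do not mention); the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slides.
doc = """
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
"""

root = ET.fromstring(doc)            # the single root element
paper = root.find("paper")           # first child element named "paper"

paper_id = paper.get("ID")           # attribute value looked up by attribute name
authors = [a.text for a in paper.iter("author")]   # character content of each author

full_paper = paper.find("fullPaper")
# An empty element has no character content and no child elements.
is_empty = full_paper.text is None and len(full_paper) == 0

print(root.tag, paper_id, authors, is_empty, full_paper.get("source"))
```

Note that the empty element still carries an attribute: being empty only means it has no content between its tags.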
Tags and Elements
• A tag is a name, enclosed by angle brackets, with optional attributes
    <foo id="123">
• An element is a tree, containing an open tag, contents, and a close tag
    <foo id="123">This is an element</foo>

• A basic XML document is an XML element
• Example:
    <books>
      <book isbn="123">
        <title> Second Chance </title>
        <author> Matthew Dunn </author>
      </book>
    </books>

<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>

The same document viewed as a tree:

BOOKS
  book (id=123, loc="library")
    author: Hull
    title: California
    year: 1995
  article (id=555, ref=123)
    author: Su
    title: Purdue

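The tree view of the BOOKS document can be reproduced by walking the parsed document recursively. This is a sketch only, again assuming Python's standard `xml.etree.ElementTree`; the helper name `outline` and the `@name` notation for attributes are made up for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<BOOKS>
  <book id="123" loc="library">
    <author>Hull</author>
    <title>California</title>
    <year> 1995 </year>
  </book>
  <article id="555" ref="123">
    <author>Su</author>
    <title> Purdue</title>
  </article>
</BOOKS>"""

def outline(elem, depth=0):
    """Return the tree as indented lines: element, its attributes, its text, its children."""
    pad = "  " * depth
    lines = [pad + elem.tag]
    for name, value in elem.attrib.items():
        lines.append(f'{pad}  @{name} = "{value}"')
    text = (elem.text or "").strip()      # ignore whitespace-only content
    if text:
        lines.append(f"{pad}  {text}")
    for child in elem:                    # child elements in document order
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Attributes are listed under their element here, much as the slide's figure draws them as nodes hanging off book and article.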
XML Syntax
• Tags properly nested
• Tag names case-sensitive
• All tags must be closed or self-closing
  - <foo/> is the same as <foo></foo>
• Attributes enclosed in quotes
• Document consists of a single (root) element

Well-Formed and Valid
• Well-formed: structure follows XML syntax rules
• Valid: structure conforms to a DTD

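The syntax rules can be exercised with any XML parser, since a conforming parser must reject input that is not well-formed. Below is a small sketch assuming Python's standard `xml.etree.ElementTree`; the sample strings are invented for illustration. This parser checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One hypothetical sample per syntax rule.
samples = {
    "nested and closed": "<a><b>text</b></a>",
    "self-closing tag": "<a><b/></a>",
    "improper nesting": "<a><b></a></b>",
    "case mismatch": "<a></A>",
    "unclosed tag": "<a><b></a>",
    "unquoted attribute": "<a id=1/>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first two samples are accepted; each of the others breaks exactly one of the rules listed above.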
• HTML confuses presentation with content
• No explicit structure or semantics

    <DT>
      <IMG SRC="greenball.gif">
      <A NAME="object-fusion"></A>
      Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
      <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"ObjectFusion in Mediator Systems".</A>
      In <I>VLDB 96.</I>
    </DT>

  Figure callouts: Author (the list of names), Title (the link text), Conference (VLDB 96).

XML                             HTML
Extensible set of tags          Fixed set of tags
Content oriented                Presentation oriented
Standard data infrastructure    No data validation capabilities
Allows multiple output forms    Single presentation

• XML Document Type Definitions (DTDs)
• XML Schema
  - defines structure and data types
  - allows developers to build their own libraries of interchanged data types

• An XML document may have an optional DTD
• A DTD is a grammar for XML documents
• It defines
  - which elements can contain which other elements
  - which attributes are allowed or required on which elements

DTD for Data Exchange
• Both sides must agree on the DTD
• The DTD can be part of the document or stored separately

• Consider an XML document:
    <db>
      <person>
        <name>Alan</name>
        <age>42</age>
        <email>agb@usa.net </email>
      </person>
      <person>………</person>
      ……….
    </db>

• DTD for it might be:
    <!DOCTYPE db [
      <!ELEMENT db (person*)>
      <!ELEMENT person (name, age, email)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT email (#PCDATA)>
    ]>

Occurrence indicators:

Indicator        Occurrence
(no indicator)   Required               One and only one
?                Optional               None or one
*                Optional, repeatable   None, one, or more
+                Required, repeatable   One or more

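The occurrence indicators behave like their regular-expression counterparts, which suggests a simple way to experiment with them. The sketch below is not a DTD validator: it assumes Python's standard `re` and `xml.etree.ElementTree` modules, handles only element-content models built from names, `,`, `|`, `?`, `*`, and `+` (not `#PCDATA`, `EMPTY`, or `ANY`), and the helper names `model_to_regex` and `children_match` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a DTD element-content model into a regex over child element names."""
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[()|?*+,]", model)
    parts = []
    for tok in tokens:
        if tok == ",":
            continue                      # sequence is plain concatenation
        if tok in "()|?*+":
            parts.append(tok)             # same meaning in a regex
        else:
            parts.append("(?:" + re.escape(tok) + " )")   # one child named tok
    return re.compile("".join(parts))

def children_match(elem, model):
    """Check the names of elem's children, in order, against the content model."""
    names = "".join(child.tag + " " for child in elem)
    return model_to_regex(model).fullmatch(names) is not None

person = ET.fromstring(
    "<person><name>Alan</name><age>42</age><email>agb@usa.net</email></person>"
)
empty_db = ET.fromstring("<db/>")

print(children_match(person, "(name, age, email)"))    # every child exactly once
print(children_match(person, "(name, age?, email+)"))  # age optional, email repeatable
print(children_match(person, "(name, email)"))         # age is not allowed here
print(children_match(empty_db, "(person*)"))           # zero persons is fine
print(children_match(empty_db, "(person+)"))           # at least one person required
```

A real validating parser also checks attributes, character content, and entity declarations, none of which this sketch attempts.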
Element Declarations

    <!ELEMENT bibliography (paper*)>
    <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
    <!ELEMENT authors (author+)>
    <!ELEMENT author (#PCDATA)>

• bibliography: sequence of 0 or more paper
• paper: authors followed by optional fullPaper, followed by title, followed by booktitle
• authors: sequence of 1 or more author
• author: character content

XML Schema

    <type name="Order">
      <element name="name" type="string" />
      <element name="street" type="string" />
      <element name="zip" type="integer" />
      <...>
      <attribute name="orderDate" type="date" />
    </type>

XML Schema

    <type name="personName">
      <element name="title" minOccurs="0"/>
      <element name="forename" minOccurs="0" maxOccurs="*"/>
      <element name="surname"/>
    </type>

    <type name="extendedName" source="personName" derivedBy="extension">
      <element name="generation" minOccurs="0"/>
    </type>

    <type name="simpleName" source="personName" derivedBy="restriction">
      <restrictions>
        <element name="title" maxOccurs="0"/>
        <element name="forename" minOccurs="1" maxOccurs="1"/>
      </restrictions>
    </type>

Part 2: XSL: XML Transformation

XSL
• The eXtensible Stylesheet Language
• Transforms XML into HTML
• More precisely, it parses XML into a tree, transforms that tree into another tree, then outputs the result tree as XML

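The tree-to-tree idea can be illustrated without an XSL processor. The sketch below is not XSLT: it assumes Python's standard `xml.etree.ElementTree` and hand-codes a single transformation rule, where a stylesheet would declare such rules as templates. The function name `paper_to_li` and the chosen HTML layout are invented for this example.

```python
import xml.etree.ElementTree as ET

# Step 1: parse the XML source into a tree.
source = ET.fromstring("""
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
""")

def paper_to_li(paper):
    """Hypothetical rule: turn one <paper> subtree into one <li> subtree."""
    li = ET.Element("li", {"id": paper.get("ID")})
    li.text = ", ".join(a.text for a in paper.iter("author")) + ". "
    em = ET.SubElement(li, "em")
    em.text = paper.findtext("title")
    em.tail = ". In " + paper.findtext("booktitle") + "."
    return li

# Step 2: build the result tree from the source tree.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ul = ET.SubElement(body, "ul")
for paper in source.iter("paper"):
    ul.append(paper_to_li(paper))

# Step 3: serialize the result tree.
output = ET.tostring(html, encoding="unicode", method="html")
print(output)
```

The three steps mirror the slide: parse into a tree, derive a second tree, serialize the second tree.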
XSL Architecture
[Figure: the XML source and the XSL stylesheet are fed to an XSL processor, which produces the HTML output.]