219451: Web Services: Concepts, Design and Implementation
Lecture 2: XML Basics

• XML, HTML and SGML: a big picture
• XML Overview
• DTD
• Parser
  • DOM

Original slides by Dr. Benchaporn Limthammaporn (blt@cs.kmitnb.ac.th)
Edited by Dr. Monchai Sopitkamon (fengmcs@ku.ac.th)

SGML, HTML and XML
• SGML (Standard Generalized Markup Language) is a technology for specifying structured document types.
• HTML (Hypertext Markup Language) is an application of SGML. It is a data format designed specifically for the Web, and it combines the features of a typical structured markup language (paragraphs, titles, lists) with hypertext linking features.
• XML (eXtensible Markup Language) is a markup language much like HTML, and was designed to describe data.
[Figure: SGML, HTML and XML]
XML vs. HTML
• XML was designed to carry data.
• XML is not a replacement for HTML.
• XML and HTML were designed with different goals:
  • HTML was designed to display data and focuses on how data looks.
  • XML was designed to describe data and focuses on what data is.
• HTML is about displaying information, while XML is about describing information.

The best way to first understand XML is to contrast it with HTML.
XML is extensible:
• HTML: restricted set of tags, e.g. <TABLE>, <H1>, <B>, etc.
• XML: you can create your own tags.
• Example: put a library catalog on the web.
  • HTML: you are stuck with regular HTML tags, e.g. H1, H3, etc.
  • XML: you can create your own set of tags: TITLE, AUTHOR, DATE, PUBLISHER, etc.

Book Catalog in HTML
    <HTML>
    <BODY>
    <H1>Harry Potter</H1>
    <H2>J. K. Rowling</H2>
    <H3>1999</H3>
    <H3>Scholastic</H3>
    </BODY>
    </HTML>
HTML conveys the "look and feel" of your page. As a human, it is easy to pick out the publisher. But how would a computer pick out the publisher? Answer: XML
Book Catalog in XML
    <BOOK>
      <TITLE>Harry Potter</TITLE>
      <AUTHOR>J. K. Rowling</AUTHOR>
      <DATE>1999</DATE>
      <PUBLISHER>Scholastic</PUBLISHER>
    </BOOK>
Look at the new tags! A human and a computer can now easily extract the publisher data.

XML vs. HTML
General Structure:
• Both have start tags and end tags.
Tag Sets:
• HTML has a pre-defined set of tags.
• XML lets you create your own tags.
General Purposes:
• HTML focuses on "look and feel".
• XML focuses on the structure of the data.
• XML is not meant to be a replacement for HTML. In fact, they are usually used together.
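The claim on the Book Catalog in XML slide, that a computer can easily extract the publisher, can be sketched in a few lines of Python. This is only an illustration using the standard library's ElementTree parser. The function name `get_publisher` is a hypothetical helper and not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The book record from the slide (tag names written without the extraction gaps).
BOOK_XML = """
<BOOK>
  <TITLE>Harry Potter</TITLE>
  <AUTHOR>J. K. Rowling</AUTHOR>
  <DATE>1999</DATE>
  <PUBLISHER>Scholastic</PUBLISHER>
</BOOK>
"""


def get_publisher(xml_text: str) -> str:
    """Hypothetical helper: return the text of the PUBLISHER child element."""
    root = ET.fromstring(xml_text)      # root is the <BOOK> element
    return root.findtext("PUBLISHER")   # look up the child by its tag name


publisher = get_publisher(BOOK_XML)
print(publisher)  # prints: Scholastic
```

The lookup works because the tag name itself says what the data is. With the HTML version, a program would have to guess which of the two H3 elements holds the publisher.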
Creating XML Documents
Basic Definitions
• Tag: a piece of markup.
  • Example: <P>, <H1>, <TABLE>, etc.
  • Tags can also contain attributes.
  • Attributes contain additional information included as part of the tag, within the tag's angle brackets.
  • An attribute name is followed by an equals sign and the attribute value.
• Element: a start tag and an end tag, together with the content between them.
  • Example: <H1>Hello</H1>
  • Data between the start tag and its matching end tag defines an element of the data.
• Empty tag: used when it makes sense to have a tag that stands by itself and doesn't enclose any content.
  • Create an empty tag by ending it with />, e.g. <flag/>
• Comment:
  • <!-- This is a comment -->

Rule 1: Well-Formedness
• XML is much stricter than HTML.
• XML requires that documents be well-formed:
  • every start tag must have an end tag
  • all tags must be properly nested.
• XML Code:
  • <P>This is a <B>sample</B> paragraph.</P>
• HTML Example (not well-formed):
  • <b><i>This text is bold and italic</b></i>
  • This will render in a browser, but contains a nesting error.
• XML Code (with proper nesting):
  • <b><i>This text is bold and italic</i></b>
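Rule 1 can be checked mechanically, because an XML parser must reject any document that is not well-formed. The sketch below assumes Python's standard ElementTree parser. The helper name `is_well_formed` is hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the parser accepts the text as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for missing end tags, bad nesting, and similar errors.
        return False


# Properly nested: <i> is closed before <b>.
nested_ok = is_well_formed("<b><i>This text is bold and italic</i></b>")

# Nesting error from the slide: <b> is closed while <i> is still open.
nested_bad = is_well_formed("<b><i>This text is bold and italic</b></i>")

# A start tag without an end tag.
unclosed = is_well_formed("<P>This is a sample paragraph.")

print(nested_ok, nested_bad, unclosed)
```

A browser would still display the badly nested HTML, whereas the XML parser stops at the first error.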
Rule 2: XML is Case Sensitive
• XML is case sensitive. HTML is not.
• The following is valid in HTML:
  • <H1>Hello World</h1>
• This will not work in XML. It results in a well-formedness error:
  • the <H1> start tag does not have a matching </H1> end tag.

Rule 3: Attributes must be quoted
• In HTML you can get away with doing the following:
  • <FONT FACE=ARIAL SIZE=2>
• In XML, you must put quotes around all your attribute values:
  • <BOOK ID="894329">Harry Potter</BOOK>
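Rules 2 and 3 can be observed the same way. The sketch below feeds the slide examples to Python's standard ElementTree parser and records whether each one is accepted. The helper `parse_error` is a hypothetical name used only here, and the exact error wording depends on the underlying parser, so it is printed rather than assumed.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text: str):
    """Hypothetical helper: return the parser's message, or None if accepted."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Rule 2: the end tag must match the start tag exactly, including case.
mixed_case = parse_error("<H1>Hello World</h1>")
same_case = parse_error("<H1>Hello World</H1>")

# Rule 3: attribute values must be quoted.
unquoted = parse_error("<BOOK ID=894329>Harry Potter</BOOK>")
quoted = parse_error('<BOOK ID="894329">Harry Potter</BOOK>')

for label, result in [("mixed case", mixed_case), ("same case", same_case),
                      ("unquoted", unquoted), ("quoted", quoted)]:
    print(label, "->", "accepted" if result is None else result)
```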
Example 1: A Memo (memo.xml)
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>219451 Class</to>
      <from>Monchai</from>
      <heading>Introduction</heading>
      <body>This is an XML document!</body>
    </note>

Example 2: Address Book (addressbook.xml)
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <addressbook>
      <person>
        <name>Monchai Sopitkamon</name>
        <department>Computer Engineering</department>
        <telephone>1432</telephone>
        <e-mail>fengmcs@ku.ac.th</e-mail>
      </person>
      <person>
        <name>Yuen Poovorawan</name>
        <department>Computer Engineering</department>
        <telephone>1405</telephone>
        <e-mail>yuen@ku.ac.th</e-mail>
      </person>
    </addressbook>
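To show how a program might consume the address book in Example 2, the sketch below walks the `person` elements and turns each one into a dictionary. It assumes Python's standard ElementTree parser. The document is embedded as a byte string so that the encoding declaration is honoured, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# addressbook.xml from Example 2, embedded as bytes for a self-contained demo.
ADDRESSBOOK_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<addressbook>
  <person>
    <name>Monchai Sopitkamon</name>
    <department>Computer Engineering</department>
    <telephone>1432</telephone>
    <e-mail>fengmcs@ku.ac.th</e-mail>
  </person>
  <person>
    <name>Yuen Poovorawan</name>
    <department>Computer Engineering</department>
    <telephone>1405</telephone>
    <e-mail>yuen@ku.ac.th</e-mail>
  </person>
</addressbook>
"""

root = ET.fromstring(ADDRESSBOOK_XML)  # root is <addressbook>

# One dictionary per <person>: child tag name -> child text.
people = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]

for entry in people:
    print(entry["name"], entry["telephone"], entry["e-mail"])
```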
Element vs. Attribute Example: Element (file note1.xml)
    <?xml version="1.0" encoding="windows-874"?>
    <!-- data is declared as elements -->
    <note>
      <date>11/07/05</date>
      <to>teacher</to>
      <from>department</from>
      <heading>please call back some student</heading>
      <body>Please return a student's call at 665-4521</body>
    </note>

Element vs. Attribute Example: Attribute (file note2.xml)
    <?xml version="1.0" encoding="windows-874"?>
    <!-- data is declared as attributes -->
    <note date="11/07/05" to="teacher" from="student"
          heading="please call back some student"
          body="Please return a student's call at 665-4521">
    </note>
There are no rules about when to use attributes, and when to use child elements.
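The difference between note1.xml and note2.xml also shows up in how a program reads them. The sketch below uses shortened copies of the two notes (only the `date` and `to` fields, to keep it brief) and Python's standard ElementTree parser. Child elements are read with `findtext`, attributes with `get`.

```python
import xml.etree.ElementTree as ET

# Shortened version of note1.xml: data stored in child elements.
NOTE_ELEMENTS = """
<note>
  <date>11/07/05</date>
  <to>teacher</to>
</note>
"""

# Shortened version of note2.xml: the same data stored in attributes.
NOTE_ATTRIBUTES = '<note date="11/07/05" to="teacher"></note>'

elem_root = ET.fromstring(NOTE_ELEMENTS)
attr_root = ET.fromstring(NOTE_ATTRIBUTES)

# Child elements are looked up by tag name.
date_from_element = elem_root.findtext("date")

# Attributes live in a name -> value mapping on the element itself.
date_from_attribute = attr_root.get("date")

print(date_from_element, date_from_attribute)
print(len(list(elem_root)), "child elements vs", len(attr_root.attrib), "attributes")
```

Both forms carry the same values. The element form can later grow nested structure, for example splitting the date into parts, which a plain attribute string cannot.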
Avoid using attributes?
Some of the problems with using attributes are:
• attributes cannot contain multiple values (child elements can)
• attributes are not easily expandable (for future changes)
• attributes cannot describe structures (child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a Document Type Definition (DTD), which is used to define the legal elements of an XML document
Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data.
Don't end up like this (this is not how XML should be used):
    <note day="12" month="11" year="2002" to="Tove" from="Jani"
          heading="Reminder" body="Don't forget me this weekend!">
    </note>

Prolog in XML Files
An XML file should start with a prolog. The minimal prolog contains a declaration that identifies the document as an XML document:
    <?xml version="1.0"?>
The declaration may also contain additional information:
• version: the version of XML used in the data
• encoding: identifies the character set used
• standalone: whether the document references an external entity or data type specification
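The three fields of the XML declaration can be inspected programmatically. The sketch below uses the `XmlDeclHandler` callback of Python's standard `xml.parsers.expat` module, which reports `standalone` as 1 (yes), 0 (no), or -1 (not given). The function name `read_declaration` and the sample documents are assumptions made for this illustration.

```python
import xml.parsers.expat


def read_declaration(xml_bytes: bytes) -> dict:
    """Hypothetical helper: return the fields of the XML declaration."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Called by expat once, when it parses the <?xml ...?> declaration.
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found


# A declaration that uses all three fields.
full = read_declaration(
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><note/>'
)

# The minimal prolog from the slide.
minimal = read_declaration(b'<?xml version="1.0"?><note/>')

print(full)
print(minimal)
```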
XML - Elements
• Elements, or tags, are the most common form of markup.
• The first element must be a root element, which can contain other (child) elements.
• An XML document must have exactly one root element (<STAFFLIST>).
• An XML document may contain one or more child elements, each of which begins with a start-tag (<STAFF>) and ends with an end-tag (</STAFF>).
• XML elements are case sensitive.
• An element can be empty, in which case it can be abbreviated to <EMPTYELEMENT/>.
• Elements must be properly nested.
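The element rules above can be illustrated with a small staff list. In the sketch below, the `STAFFLIST`, `STAFF` and `EMPTYELEMENT` names come from the slide, while the `NAME` child and its values are invented placeholders. Python's standard ElementTree parser is assumed.

```python
import xml.etree.ElementTree as ET

# NAME and its values are hypothetical; the other tag names are from the slide.
STAFF_XML = """
<STAFFLIST>
  <STAFF><NAME>Example Person</NAME></STAFF>
  <STAFF><NAME>Another Person</NAME></STAFF>
  <EMPTYELEMENT/>
</STAFFLIST>
"""

root = ET.fromstring(STAFF_XML)

# The single root element and its direct children, in document order.
root_tag = root.tag
child_tags = [child.tag for child in root]

# An empty element has no text and no children.
empty = root.find("EMPTYELEMENT")
empty_has_content = bool(empty.text) or len(empty) > 0

# Two top-level elements violate the one-root rule, so parsing fails.
try:
    ET.fromstring("<STAFF/><STAFF/>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False

print(root_tag, child_tags, empty_has_content, two_roots_accepted)
```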
XML - Attributes
• Attributes are name-value pairs (name="value") that contain descriptive information about an element.
• An attribute is placed inside the start-tag, after the corresponding element name, with the attribute value enclosed in quotes:
    <STAFF branchNo="B005">
• The branch could also have been represented as a child element of STAFF.
• A given attribute may occur only once within a tag, while elements with the same tag may be repeated.

XML and more..
[Figure: XML and related technologies. Labels: HTML, CSS, XSL, XML, ADO, Schema, DTD, Database, Parser, DOM, SAX]
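Returning to the rules on the XML - Attributes slide, the sketch below reads the `branchNo` attribute and checks the rule that an attribute may appear only once per tag, while same-named elements may repeat. It assumes Python's standard ElementTree parser. The value `B003` and the helper name `parses` are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Hypothetical helper: True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Reading the attribute from the slide's start-tag (written as an empty element).
staff = ET.fromstring('<STAFF branchNo="B005"/>')
branch = staff.get("branchNo")

# The same attribute twice in one tag is not well-formed ("B003" is made up).
duplicate_attribute_ok = parses('<STAFF branchNo="B005" branchNo="B003"/>')

# Repeating the same element, each with its own attribute, is fine.
repeated_elements_ok = parses(
    '<STAFFLIST><STAFF branchNo="B005"/><STAFF branchNo="B005"/></STAFFLIST>'
)

print(branch, duplicate_attribute_ok, repeated_elements_ok)
```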
XML and more..
• CSS (Cascading Style Sheets), XSL (eXtensible Stylesheet Language)
  • Display an XML document in a browser
• DTD (Document Type Definition)
  • Specifies the types of tags that can be included in the XML document
• XSD (XML Schema Definition)
  • More flexible than DTD
• DOM (Document Object Model)
  • Manipulate XML document elements as an in-memory tree
• SAX (Simple API for XML)
  • Process XML document elements as a stream of events while parsing
• ADO (ActiveX Data Objects)
  • Manipulate data from a database or other data sources

XML Style Sheet Processing
[Figure: XML Style Sheet Processing. Labels: XML, XSL Style Sheet, Style Sheet Processor, HTML Page]
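The DOM and SAX entries in the list above describe two different ways of working with the same document. The sketch below contrasts them using Python's standard `xml.dom.minidom` and `xml.sax` modules. The class name `TagCollector` is a hypothetical helper, and the book record is reused from the earlier catalog slide.

```python
import xml.dom.minidom
import xml.sax

XML_TEXT = (
    "<BOOK><TITLE>Harry Potter</TITLE>"
    "<PUBLISHER>Scholastic</PUBLISHER></BOOK>"
)

# DOM: the whole document is loaded into a tree that can be navigated.
doc = xml.dom.minidom.parseString(XML_TEXT)
title_node = doc.getElementsByTagName("TITLE")[0]
dom_title = title_node.firstChild.data  # text node inside <TITLE>


# SAX: the parser calls back into a handler as it meets each start tag.
class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names in the order seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(XML_TEXT.encode("utf-8"), collector)
sax_tags = collector.tags

print(dom_title)
print(sax_tags)
```

DOM keeps the full tree in memory, which makes random access and modification convenient. SAX keeps nothing unless the handler stores it, which suits large documents read once from start to end.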