SSC1: XML
Software Systems Components 1: XML
Volker Sorge
http://www.cs.bham.ac.uk/~vxs/teaching/ssc1

Outline: Overview; XML Format (Document Structure, XML Components); Tree Parsing XML (Parsing Documents, Walking the Tree, Modifying the Tree); Generating XML Documents (Creating Documents, Verifying Documents); Stream Parsing; XML as a Data Exchange Format
Topic Overview

◮ The XML format: document structure and its interpretation.
◮ Tree parsing XML: JDOM and walking trees.
◮ Validating and generating XML.
◮ Stream parsing XML: SAX/StAX.
◮ Using XML to format data.
What is XML

◮ XML = eXtensible Markup Language
◮ Markup languages are structured representations of text (or data)
    ◮ they contain text, plus information about the structure of that text
◮ XML is a descendant of SGML (Standard Generalized Markup Language), which was developed in the 70s to describe document structure
    ◮ Therefore XML files are called documents, regardless of their content.
◮ HTML is an example of a related language.
Example

    <?xml version="1.0"?>
    <!DOCTYPE Configuration SYSTEM "../../conf.dtd">
    <configuration>
      <title>
        <font>
          <name>Helvetica</name>
          <size unit="pt">36</size>
        </font>
      </title>
      <body>
        <font>
          <name>Times Roman</name>
          <size unit="pt">12</size>
        </font>
      </body>
      <window>
        <width unit="px">400</width>
        <height unit="px">200</height>
      </window>
      <menu>
        <item>Times Roman</item>
        <item>Helvetica</item>
        <item>Goudy Old Style</item>
      </menu>
    </configuration>
    <!-- The end -->

Parts labelled on the slide:
◮ Header: the lines before the root element, starting with <?xml version="1.0"?>
◮ Root Element: <configuration>
◮ Tags: e.g. <body>
◮ End Tags: e.g. </body>
◮ Elements: e.g. <item>Times Roman</item>
◮ Comment: <!-- The end -->
Structure of XML Documents

◮ XML documents are structured as trees.
◮ The structure is given using tags that contain child elements.
◮ The single root of the tree is given by the Root Element.
◮ Leaves consist of plain text enclosed by tags.
Example (revisited)

    <?xml version="1.0"?>
    <!DOCTYPE Configuration SYSTEM "../../conf.dtd">
    <configuration>
      <title>
        <font>
          <name>Helvetica</name>
          <size unit="pt">36</size>
        </font>
      </title>
      <body>
        <font>
          <name>Times Roman</name>
          <size unit="pt">12</size>
        </font>
      </body>
      <window>
        <width unit="px">400</width>
        <height unit="px">200</height>
      </window>
      <menu>
        <item>Times Roman</item>
        <item>Helvetica</item>
        <item>Goudy Old Style</item>
      </menu>
    </configuration>
    <!-- The end -->
Structure of Example Documents

    Configuration
        Title
            Font
                Name
                    Helvetica
                Size
                    36
        Body
            Font
                Name
                    Times Roman
                Size
                    12
        Window
            Width
                400
            Height
                200
        Menu
            Item
                Times Roman
            Item
                Helvetica
            Item
                Goudy...
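The tree view of the example can be reproduced with a short program. The following is a minimal sketch, assuming the DOM parser that ships with the JDK (`javax.xml.parsers`) rather than JDOM; the class name `TreeWalkSketch`, its methods, and the shortened `SAMPLE` document are made up for illustration. The DOCTYPE line is left out of the sample so that the parser does not try to load the external DTD file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeWalkSketch {
    // Shortened copy of the example document (hypothetical test input, no DOCTYPE).
    static final String SAMPLE =
        "<configuration>"
        + "<window><width unit=\"px\">400</width><height unit=\"px\">200</height></window>"
        + "<menu><item>Times Roman</item><item>Helvetica</item></menu>"
        + "</configuration>";

    // Appends one line per element or non-blank text node, indented by tree depth.
    static void walk(Node node, int depth, StringBuilder out) {
        String label;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            label = node.getNodeName();
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            label = "\"" + node.getNodeValue().trim() + "\"";
        } else {
            return; // ignore whitespace-only text, comments, and other node types
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML string and returns the indented outline of its tree.
    static String outline(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(SAMPLE));
    }
}
```

The recursion mirrors the slide: elements are inner nodes and the text between tags forms the leaves.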
XML Main Components

The main components of an XML document are elements.

◮ They are enclosed by an opening and a closing tag. (Tags can be viewed as "named brackets".)
◮ They can contain ordinary text.
◮ Elements can in turn have child elements.
◮ They can have additional attribute assignments.

    <font>
      <name>Helvetica</name>
      <size unit="pt">36</size>
    </font>

is one element font containing two child elements, name and size. The latter has one attribute, unit="pt".
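To see how a program distinguishes child elements, text, and attributes, here is a minimal sketch that reads the font element. It assumes the JDK's built-in DOM API (not JDOM); the class `ElementAttributeSketch` and its methods are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementAttributeSketch {
    // The font element from the slide.
    static final String FONT =
        "<font><name>Helvetica</name><size unit=\"pt\">36</size></font>";

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Combines the text of the name and size children with the unit attribute.
    static String describe(String xml) {
        Element font = parse(xml);
        Element name = (Element) font.getElementsByTagName("name").item(0);
        Element size = (Element) font.getElementsByTagName("size").item(0);
        return name.getTextContent() + " "
            + size.getTextContent() + size.getAttribute("unit");
    }

    public static void main(String[] args) {
        System.out.println(describe(FONT)); // prints "Helvetica 36pt"
    }
}
```

Child elements are reached by name, their text through `getTextContent()`, and the attribute through `getAttribute("unit")` on the element that carries it.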
Mixed Content

◮ It is legal for elements to contain both text and child elements.
◮ This is called mixed content.
◮ However, mixed content should be avoided as it
    ◮ obscures the structure of the document.
    ◮ makes parsing the document harder.
◮ Thus try not to have, for example:

    <font>
      Helvetica
      <size>36</size>
    </font>
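Why mixed content makes parsing harder can be seen in a small experiment. This is a sketch only, assuming the JDK's DOM parser; `MixedContentSketch` and `ownText` are hypothetical names. With mixed content, asking the font element for its text also returns the text of the nested size element, so the program has to pick out the direct text nodes by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentSketch {
    static final String MIXED = "<font>Helvetica<size>36</size></font>";

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collects only the text nodes that are direct children of the element.
    static String ownText(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        Element font = parse(MIXED);
        System.out.println(font.getTextContent()); // text of font AND of size, run together
        System.out.println(ownText(font));         // only the text directly inside font
    }
}
```

With the element-only layout (a separate name child), each value sits in its own leaf and no such filtering is needed.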
Elements vs. Attributes

◮ When designing an XML document, you often have to decide between using elements or attributes to represent information.
◮ General rule: use elements!
◮ Attributes should be used sparingly.
◮ Try to only use attributes for names used as identifiers.
Other XML Components

Processing Instructions are delimited by <? and ?>
Example: The header information <?xml version="1.0"?>

◮ They contain information for whatever program processes the document.
◮ The only one you need to know is the header above.
    ◮ You may also see xml-stylesheet, or php.
    ◮ Strictly speaking, the header is optional, but you should always include it.
Other XML Components

Comments are delimited by <!-- and -->
Example: <!-- The end -->

◮ Comments cannot contain '--'.
◮ Don't hide commands in comments!
Other XML Components

Character References: Denote Unicode characters by decimal or hex code, e.g., &#x40;
Entity References: Denote special characters by name, e.g., &gt;
DTD: Document type definition, which offers a mechanism for validation. Example:

    <!DOCTYPE Configuration SYSTEM "../../conf.dtd">

We will see some details on that later.
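A parser resolves character and entity references before the program sees the text. The following minimal sketch illustrates this, assuming the JDK's DOM parser; the class `ReferenceSketch` and the element name `t` are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ReferenceSketch {
    // Hex reference, decimal reference, and two predefined entity references.
    static final String SAMPLE = "<t>&#x40; &#64; &gt; &amp;</t>";

    // Parses the XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(SAMPLE)); // prints "@ @ > &"
    }
}
```

Both &#x40; (hexadecimal) and &#64; (decimal) denote the same character, and the named references stand for characters that would otherwise be read as markup.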
XML vs HTML

Although XML and HTML are closely related, there are some significant differences:

◮ XML is case sensitive, HTML is not.
◮ HTML can have attributes without values. In XML each attribute has to have a value.
◮ HTML is forgiving, XML is not!
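The strictness of XML is easy to observe with a parser. This is a minimal sketch, assuming the JDK's DOM parser; the class `WellFormedSketch` and the method `isWellFormed` are hypothetical names. Each of the rejected inputs below is something a typical HTML browser would tolerate.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    // Returns true if the parser accepts the string as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<item>Helvetica</item>"));          // accepted
        System.out.println(isWellFormed("<item>Helvetica</Item>"));          // tag case differs
        System.out.println(isWellFormed("<item selected>Helvetica</item>")); // attribute has no value
        System.out.println(isWellFormed("<item>Helvetica"));                 // end tag missing
    }
}
```

Only the first document is accepted; the other three are rejected outright rather than repaired.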