An Introduction to XML and Web Technologies
Schema Languages
Anders Møller & Michael I. Schwartzbach
2006 Addison-Wesley

Objectives
• The purpose of using schemas
• The schema languages DTD and XML Schema (and DSD2 and RELAX NG)
• Regular expressions, a commonly used formalism in schema languages

Motivation
• We have designed our Recipe Markup Language
• ...but so far only informally described its syntax
• How can we make tools that check that an XML document is a syntactically correct Recipe Markup Language document (and thus meaningful)?
• Implementing a specialized validation tool for Recipe Markup Language is not the solution...

XML Languages
• XML language: a set of XML documents with some semantics
• schema: a formal definition of the syntax of an XML language
• schema language: a notation for writing schemas

Validation
[Diagram: an instance document and a schema are input to a schema processor; the result is either "valid", with a normalized instance document, or "invalid", with an error message.]

Why use Schemas?
• Formal but human-readable descriptions
• Data validation can be performed with existing schema processors

General Requirements
• Expressiveness
• Efficiency
• Comprehensibility

Regular Expressions
• Commonly used in schema languages to describe sequences of characters or elements
• Σ: an alphabet (typically Unicode characters or element names)
• σ ∈ Σ matches the string σ
• α? matches zero or one α
• α* matches zero or more α's
• α+ matches one or more α's
• α β matches any concatenation of an α and a β
• α | β matches the union of α and β
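
The operators listed above map directly onto most regex libraries. As a small illustration (not part of the original slides), the following sketch tries each operator over the alphabet Σ = {a, b} using Python's re module; the helper name matches is a hypothetical choice made for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# One pattern per operator, with strings it accepts and strings it rejects.
examples = {
    "a?":  (["", "a"],        ["aa", "b"]),   # zero or one a
    "a*":  (["", "a", "aaa"], ["b", "ab"]),   # zero or more a's
    "a+":  (["a", "aaa"],     ["", "b"]),     # one or more a's
    "ab":  (["ab"],           ["a", "ba"]),   # concatenation
    "a|b": (["a", "b"],       ["", "ab"]),    # union
}

for pattern, (accepted, rejected) in examples.items():
    for text in accepted:
        assert matches(pattern, text), (pattern, text)
    for text in rejected:
        assert not matches(pattern, text), (pattern, text)
```

Whole-string matching (fullmatch) is used because a schema describes the complete value or content, not a substring of it.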

Examples
• A regular expression describing integers:
    0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
• A regular expression describing the valid contents of table elements in XHTML:
    caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )

DTD: Document Type Definition
• Defined as a subset of the DTD formalism from SGML
• Specified as an integral part of XML 1.0
• A starting point for development of more expressive schema languages
• Considers elements, attributes, and character data; processing instructions and comments are mostly ignored

Document Type Declarations
• Associates a DTD schema with the instance document
• <?xml version="1.1"?>
  <!DOCTYPE collection SYSTEM "http://www.brics.dk/ixwt/recipes.dtd">
  <collection>
    ...
  </collection>
• <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
• <!DOCTYPE collection [ ... ]>

Element Declarations
    <!ELEMENT element-name content-model >
Content models:
• EMPTY
• ANY
• mixed content: (#PCDATA|e1|e2|...|en)*
• element content: regular expression over element names (concatenation is written with ",")
Example:
    <!ELEMENT table (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) >
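
Because an element content model is a regular expression over element names, the table example above can be checked mechanically. The sketch below is an illustration only, not how a real DTD processor is implemented: it rewrites a content model into a Python regex and matches it against the sequence of child element names. The function names are hypothetical, and the sketch assumes pure element content with no namespace prefixes (no #PCDATA, EMPTY, or ANY).

```python
import re

def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a DTD element content model into a compiled regex.

    Each element name becomes the group "(?:name )" so that names cannot
    run together, and the DTD concatenation operator "," is dropped
    because regex concatenation is implicit.
    """
    model = re.sub(r"\s+", "", model)
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        model,
    )
    return re.compile(pattern.replace(",", ""))

def children_are_valid(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None

TABLE = "(caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))"

print(children_are_valid(TABLE, ["caption", "col", "col", "thead", "tbody"]))
print(children_are_valid(TABLE, ["col", "colgroup", "tr"]))
```

The first call describes a table with a caption, two col elements, a thead, and one tbody; the second mixes col and colgroup, which the alternation (col*|colgroup*) does not allow.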

Attribute-List Declarations
    <!ATTLIST element-name attribute-definitions >
Each attribute definition consists of
• an attribute name
• an attribute type
• a default declaration
Example:
    <!ATTLIST input maxlength CDATA #IMPLIED
                    tabindex CDATA #IMPLIED>

Attribute Types
• CDATA: any value
• enumeration: (s1|s2|...|sn)
• ID: must have unique value
• IDREF (/IDREFS): must match some ID attribute(s)
• ...
Examples:
    <!ATTLIST p align (left|center|right|justify) #IMPLIED>
    <!ATTLIST recipe id ID #IMPLIED>
    <!ATTLIST related ref IDREF #IMPLIED>

Attribute Default Declarations
• #REQUIRED
• #IMPLIED (= optional)
• "value" (= optional, but default provided)
• #FIXED "value" (= always has this value; if specified, it must be exactly this value)
Examples:
    <!ATTLIST form
              action CDATA #REQUIRED
              onsubmit CDATA #IMPLIED
              method (get|post) "get"
              enctype CDATA "application/x-www-form-urlencoded" >
    <!ATTLIST html
              xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml">

Entity Declarations (1/3)
• Internal entity declarations: a simple macro mechanism
Example:
• Schema:
    <!ENTITY copyrightnotice "Copyright &#169; 2005 Widgets'R'Us.">
• Input:
    A gadget has a medium size head and a big gizmo subwidget.
    &copyrightnotice;
• Output:
    A gadget has a medium size head and a big gizmo subwidget.
    Copyright &#169; 2005 Widgets'R'Us.
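
The macro behaviour of internal entities can be observed with an ordinary XML parser. The following sketch uses Python's xml.etree.ElementTree, which is not a validating parser but does expand internal entities declared in the internal DTD subset. The root element name note is an assumption made for this example; the slides do not say where the input text appears.

```python
import xml.etree.ElementTree as ET

# The entity declaration from the slide, placed in an internal DTD subset.
# The <note> wrapper element is hypothetical.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY copyrightnotice "Copyright &#169; 2005 Widgets'R'Us.">
]>
<note>A gadget has a medium size head and a big gizmo subwidget.
&copyrightnotice;</note>"""

root = ET.fromstring(DOCUMENT)

# The entity reference has been replaced by its declared text, and the
# character reference &#169; has become the copyright sign.
print(root.text)
```

After parsing, the application sees only the expanded text; the entity reference itself is no longer visible in the tree.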

Entity Declarations (2/3)
• Internal parameter entity declarations: apply to the DTD, not the instance document
Example:
• Schema:
    <!ENTITY % Shape "(rect|circle|poly|default)">
    <!ATTLIST area shape %Shape; "rect">
  corresponds to
    <!ATTLIST area shape (rect|circle|poly|default) "rect">

Entity Declarations (3/3)
• External parsed entity declarations: references to XML data in other files
Example:
• <!ENTITY widgets
    SYSTEM "http://www.brics.dk/ixwt/widgets.xml">
not widely used!
• External unparsed entity declarations: references to non-XML data
Example:
• <!ENTITY widget-image
    SYSTEM "http://www.brics.dk/ixwt/widget.gif"
    NDATA gif >
• <!NOTATION gif
    SYSTEM "http://www.iana.org/assignments/media-types/image/gif">
• <!ATTLIST thing img ENTITY #REQUIRED>

Conditional Sections
• Allow parts of schemas to be enabled/disabled by a switch
Example:
• <![%person.simple; [
    <!ELEMENT person (firstname,lastname)>
  ]]>
  <![%person.full; [
    <!ELEMENT person (firstname,lastname,email+,phone?)>
    <!ELEMENT email (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
  ]]>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT lastname (#PCDATA)>
• <!ENTITY % person.simple "INCLUDE" >
  <!ENTITY % person.full "IGNORE" >

Checking Validity with DTD
A DTD processor (also called a validating XML parser)
• parses the input document (includes checking well-formedness)
• checks the root element name
• for each element, checks its contents and attributes
• checks uniqueness and referential constraints (ID/IDREF(S) attributes)
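
The last check in the list above, uniqueness and referential constraints, can be sketched in a few lines. This is an illustration under stated assumptions, not a real DTD processor: Python's xml.etree.ElementTree does not read attribute types from a DTD, so the sets of ID and IDREF attributes are passed in by hand. The function name and the sample document are hypothetical; the attribute declarations assumed are recipe id ID and related ref IDREF.

```python
import xml.etree.ElementTree as ET

def check_id_constraints(root, id_attrs, idref_attrs):
    """Return error messages for ID/IDREF violations.

    id_attrs and idref_attrs are sets of (element name, attribute name)
    pairs that a DTD would declare with type ID and IDREF respectively.
    """
    errors = []
    seen = set()
    # First pass: every ID value must be unique in the document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in id_attrs:
                if value in seen:
                    errors.append(f"duplicate ID {value!r}")
                seen.add(value)
    # Second pass: every IDREF value must match some ID value.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in idref_attrs and value not in seen:
                errors.append(f"IDREF {value!r} matches no ID")
    return errors

SAMPLE = """<collection>
  <recipe id="r1"><related ref="r2"/></recipe>
  <recipe id="r2"/>
  <recipe id="r1"/>
  <recipe id="r3"><related ref="r9"/></recipe>
</collection>"""

errors = check_id_constraints(
    ET.fromstring(SAMPLE),
    id_attrs={("recipe", "id")},
    idref_attrs={("related", "ref")},
)
for message in errors:
    print(message)
```

Two passes are used because an IDREF may point forward to an ID that appears later in the document, as ref="r2" does here.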