3. Defining the document structure (DTD)
• A DTD declares application-specific names and structural constraints.
• A document is valid if it specifies a DTD and its contents conform to that DTD.
• A validating parser does the checking, but validation is not mandatory.
• Items not specified in the DTD are forbidden.
• A DTD does not specify:
  – the root element
  – the precise number of element instances
  – data formats (everything is a string; there are some restrictions on names)
  – semantics (meaning)
• Alternative to DTD: XML Schema (see later)
XML-3 J. Teuhola 2013
Example DTD: Course document
    <!ELEMENT course (cname, teacher, semester, audience)>
    <!ELEMENT cname (#PCDATA)>
    <!ELEMENT teacher (#PCDATA)>
    <!ELEMENT semester (#PCDATA)>
    <!ELEMENT audience (student*)>
    <!ELEMENT student (#PCDATA)>
• #PCDATA (parsed character data) may contain entity references like &amp; but not tags.
• Note: the DTD syntax does not conform to the general XML syntax.
Example: test documents
Valid:
    <course>
      <cname>XML</cname>
      <teacher>JT</teacher>
      <semester>
        Spring 2013
      </semester>
      <audience>
        <student>NN</student>
      </audience>
    </course>
Invalid:
    <course>
      <cname>XML</cname>
      <teacher>JT</teacher>
      <student>NN</student>
      <extent>5 sp</extent>
    </course>
Errors: 'semester' and 'audience' are missing, so 'student' appears directly under 'course'; 'extent' is not defined in the DTD.
Declaring the DTD
• Position: in the document prolog (after the XML declaration, before the root element)
• The name after DOCTYPE (coursetype below) must be the name of the document's root element.
• Alternatives:
  – External DTD file, identified by a URI:
      <!DOCTYPE coursetype SYSTEM "http://...">
  – External public DTD, for a unique and well-known application:
      <!DOCTYPE coursetype PUBLIC "ref" "backup">
    where backup is used if ref is not found.
  – Internal, useful in the development phase:
      <!DOCTYPE coursetype [<!ELEMENT ... >]>
  – Both (compatible internal and external subsets):
      <!DOCTYPE coursetype SYSTEM "http://..." [ ... ]>
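As a minimal sketch of the internal alternative, the course DTD from the earlier example slide can be embedded directly in the document prolog. The DOCTYPE name is written as course here (an assumption of this sketch, differing from the coursetype placeholder above) so that it matches the root element.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- Internal subset: the declarations sit between [ and ] in the prolog -->
<!DOCTYPE course [
  <!ELEMENT course (cname, teacher, semester, audience)>
  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT teacher (#PCDATA)>
  <!ELEMENT semester (#PCDATA)>
  <!ELEMENT audience (student*)>
  <!ELEMENT student (#PCDATA)>
]>
<course>
  <cname>XML</cname>
  <teacher>JT</teacher>
  <semester>Spring 2013</semester>
  <audience>
    <student>NN</student>
  </audience>
</course>
```

Keeping the declarations in the same file is convenient while the structure is still changing; once it settles, the same declarations could be moved to an external file and referenced with SYSTEM.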
Declaring elements
• <!ELEMENT name (content)> where the content can be:
  – #PCDATA (parsed character data)
  – a child element
  – a sequence (comma-separated ordered list)
  – alternatives ('|'-separated list)
• Repetition indicators (suffix symbols), applicable to elements and parenthesized expressions:
  – ? = zero or one
  – * = zero or more
  – + = one or more
Declaring elements (cont.)
• Examples:
    <!ELEMENT audience (student*)>
    <!ELEMENT day (sunday|monday|...)>
    <!ELEMENT semester (year,(spring|fall))>
    <!ELEMENT audience (#PCDATA|student)*>
• Special cases:
  – Empty element: <!ELEMENT name EMPTY> allows the elements <name /> and <name></name>
  – Arbitrary contents: <!ELEMENT name ANY>
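The content models above can be tried out together in one small document. The following is only a sketch: the element names timetable, note and em are hypothetical and chosen for illustration.

```xml
<!DOCTYPE timetable [
  <!-- Repetition: one or more semesters, then an optional note -->
  <!ELEMENT timetable (semester+, note?)>
  <!-- Sequence with a nested choice: a year, then exactly one of spring or fall -->
  <!ELEMENT semester (year, (spring | fall))>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT spring EMPTY>
  <!ELEMENT fall EMPTY>
  <!-- Mixed content: #PCDATA comes first and the group ends with * -->
  <!ELEMENT note (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<timetable>
  <semester><year>2013</year><spring/></semester>
  <semester><year>2013</year><fall></fall></semester>
  <note>Lectures start <em>early</em> this year.</note>
</timetable>
```

Both spellings of an empty element appear in the instance. A semester containing both spring and fall, or a timetable without any semester, would be rejected by a validating parser.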
Declaring attributes
• All possible attributes must be declared for each element type.
• Syntax:
    <!ATTLIST element attname1 type1 default1
                      attname2 type2 default2
                      ... >
• Example:
    <!ATTLIST course name CDATA #REQUIRED
                     dept CDATA "CS-IT">
• Attributes of one element may also be declared one by one in separate ATTLIST statements.
Attribute types
• CDATA: Character string in which < and & must be escaped as &lt; and &amp; (possibly also " and ' as &quot; and &apos;). Numeric data is also CDATA.
• NMTOKEN: Name token; like an XML name, but may start with a digit or punctuation.
• NMTOKENS: Whitespace-separated list of name tokens.
• Enumeration: '|'-separated list of alternative values in parentheses; each value is a name token, i.e. restricted to XML name characters.
Attribute types (cont.)
• ID: XML name that is unique among the ID attributes in the document. Only one ID attribute per element type is allowed. The value must be a valid XML name (a plain number is not!).
• IDREF: XML name referring to an ID attribute. This enables relationships between elements (cf. foreign keys of relations). A validating parser checks that the referenced ID exists somewhere in the document, but the type of the referenced element cannot be constrained. Needed for M:M relationships.
• IDREFS: Whitespace-separated list of ID references.
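An M:M relationship between students and courses could be sketched with ID and IDREFS as follows. The element name school and the attribute names sid, cid and audience are assumptions made for this illustration.

```xml
<!DOCTYPE school [
  <!ELEMENT school (student+, course+)>
  <!ELEMENT student (#PCDATA)>
  <!ATTLIST student sid ID #REQUIRED>
  <!ELEMENT course (#PCDATA)>
  <!ATTLIST course cid      ID     #REQUIRED
                   audience IDREFS #IMPLIED>
]>
<school>
  <student sid="s1">NN</student>
  <student sid="s2">MM</student>
  <!-- Each course lists its students; s1 takes both courses (M:M) -->
  <course cid="c1" audience="s1 s2">XML</course>
  <course cid="c2" audience="s1">Databases</course>
</school>
```

The ID values start with a letter because a plain number such as 1 is not an XML name. Nothing in the DTD ties audience to student elements: audience="c2" would be accepted as well, since c2 is also an ID in the document.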
Attribute types (cont.)
• ENTITY: Name of an (unparsed) entity, defined elsewhere in the DTD.
• ENTITIES: Whitespace-separated list of entity names.
• NOTATION: '|'-separated list (in parentheses) of alternative notation names, each declared by a NOTATION declaration in the DTD.
A NOTATION is more flexible than an enumeration: the notation names themselves are XML names, but each is bound to an external identifier (e.g. a MIME type) that is not restricted by XML naming rules.
Declaring a notation, e.g.
    <!NOTATION gif SYSTEM "image/gif">
    <!NOTATION tiff SYSTEM "image/tiff">
    …
    <!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>
Attribute defaults
Alternatives:
• #REQUIRED: Compulsory, no default value
• #IMPLIED: Attribute value may be omitted; no default
• #FIXED: Always the same value; may be omitted
• Literal: Quoted default value
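The four alternatives can be seen side by side in a single ATTLIST. This is a sketch only; the attribute names room, language and credit and their values are hypothetical.

```xml
<!DOCTYPE course [
  <!ELEMENT course (#PCDATA)>
  <!ATTLIST course
    name     CDATA     #REQUIRED
    room     CDATA     #IMPLIED
    language (fi | en) "en"
    credit   CDATA     #FIXED "5">
]>
<!-- name is given; room is omitted and has no value;
     language is omitted and defaults to "en";
     credit is omitted and takes the fixed value "5" -->
<course name="XML">Structured documents</course>
```

Omitting name, or writing credit="6", would make the document invalid. Writing credit="5" explicitly is allowed, because it repeats the fixed value.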
Declaring entities
• An entity is a name with a related replacement text.
• Predefined: &lt; &amp; &gt; &quot; &apos;
• Example: <!ENTITY domain "it.utu.fi">
• Reference: &domain;
• The replacement may contain well-formed markup:
    <!ENTITY address "<addr>
      <street>Joukahaisenkatu 3-5</street>
      <zip>20014</zip>
      <city>Turku</city>
    </addr>">
• The replacement may contain entity references (but not loops).
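The last two points can be combined in a small self-contained document. In this sketch the entity mailbox, the prefix info@ and the element names contact and email are assumptions added for illustration; domain and the address markup follow the slide (with zip left out for brevity).

```xml
<!DOCTYPE contact [
  <!ELEMENT contact (email, addr)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT addr (street, city)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ENTITY domain "it.utu.fi">
  <!-- The replacement text refers to another entity -->
  <!ENTITY mailbox "info@&domain;">
  <!-- The replacement text contains well-formed markup -->
  <!ENTITY address "<addr><street>Joukahaisenkatu 3-5</street><city>Turku</city></addr>">
]>
<contact>
  <email>&mailbox;</email>
  &address;
</contact>
```

After expansion the email element contains info@it.utu.fi, and &address; contributes a complete addr element. A pair such as <!ENTITY a "&b;"> and <!ENTITY b "&a;"> would form a loop, and any reference to a or b would be an error.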
External entities
• Parsed external entity:
  – Replacement text is in a file, e.g.
      <!ENTITY addr SYSTEM "/folder/addr.xml">
  – Not allowed in attribute values
  – After replacement the result must be well-formed
  – An external entity must not have a prolog (e.g. a DTD)
• Unparsed external entity:
  – Any data, e.g. a digital image:
      <!ENTITY people SYSTEM "pic.jpg" NDATA jpeg>
  – NDATA refers to an (application-specific) notation:
      <!NOTATION jpeg SYSTEM "image/jpeg">
  – Usage as an attribute value:
      <!ATTLIST course photo ENTITY #REQUIRED>
  – Instance: <course photo="people">
Parameter entities
• Used to name a repeating segment in the DTD
• Syntax: <!ENTITY % name "replacement">
• Reference (to be replaced): %name;
• Example:
    <!ENTITY % employee "name, dept, bdate">
    <!ELEMENT professor (%employee;)>
    <!ELEMENT lecturer (%employee;)>
    <!ELEMENT assistant (%employee;)>
• Usually appears in external DTDs, but can be redefined in an internal DTD (if both exist); the replacement can itself be external:
    <!ENTITY % name SYSTEM "http://...">
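Redefinition in the internal subset could look like the following sketch. The file names staff.dtd and prof.xml, the element office and all element contents are hypothetical; the two files are shown in one listing, separated by comments.

```xml
<!-- File staff.dtd (hypothetical external subset) -->
<!ENTITY % employee "name, dept, bdate">
<!ELEMENT professor (%employee;)>
<!ELEMENT lecturer (%employee;)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT dept (#PCDATA)>
<!ELEMENT bdate (#PCDATA)>

<!-- File prof.xml (hypothetical document using staff.dtd) -->
<!DOCTYPE professor SYSTEM "staff.dtd" [
  <!-- Read before staff.dtd, so this declaration of employee takes effect -->
  <!ENTITY % employee "name, dept, bdate, office">
  <!ELEMENT office (#PCDATA)>
]>
<professor>
  <name>NN</name>
  <dept>IT</dept>
  <bdate>1970-01-01</bdate>
  <office>B123</office>
</professor>
```

The internal subset is processed first and the first declaration of an entity is the binding one, so professor and lecturer both gain the office child in this document without any change to staff.dtd. Note that the reference %employee; inside a content model is written in the external file: within the internal subset, parameter entity references may only appear between declarations, not inside them.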
Example DTD (in file 'letters.dtd')
    <!ELEMENT letters (letter+)>
    <!ELEMENT letter (topic*, text)>
    <!ATTLIST letter num    ID         #REQUIRED
                     from   CDATA      #FIXED "John Smith, IBM"
                     to     CDATA      #REQUIRED
                     date   CDATA      #REQUIRED
                     secret (yes | no) "no">
    <!ELEMENT topic EMPTY>
    <!ATTLIST topic title CDATA #IMPLIED>
    <!ELEMENT text ANY>
    <!ENTITY signature "Cheers, John">
Example: valid document
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE letters SYSTEM "letters.dtd">
    <letters>
      <letter num="A123" to="Bill" date="20.09" secret="yes">
        <topic title="Howdy" />
        <topic title="What's cooking?" />
        <text>Thanks for the party. &signature;</text>
      </letter>
      <letter num="A124" to="Jim" date="21.09">
        <topic title="Hi" />
        <text>See you again. &signature;</text>
      </letter>
    </letters>
Problems with DTD
• Does not itself use XML syntax; needs a different parser/editor/processor
• No constraints on character data (e.g. no format, no regular expressions)
• No strict data types (e.g. integer, float, boolean)
• Restricting the number of repetitions is difficult
• Namespaces are not interpreted; prefixes are just part of the names
• Definitions cannot depend on the context: an element type has a single declaration wherever it occurs (DTD allows "too much")
Problems with DTD (cont.)
• Uniqueness scope of IDs cannot be restricted.
• The target of an IDREF/IDREFS cannot be restricted to a particular element type.
• Limited modularity (using ENTITY definitions); another way to build from pieces: XInclude.
• No defaults for elements (only for attributes)
• No wildcards for elements/attributes (only ANY content possible for elements)
Some of these problems were solved in the XML Schema language (see later).