DTD and XML Schema
XML
• Extensible Markup Language
  – A standard adopted in 1998 by the W3C (World Wide Web Consortium)
• Optional mechanisms for specifying document structure
  – DTD: the Document Type Definition language, part of the XML standard
  – XML Schema: a more recent specification built on top of XML
• Query languages for XML
  – XPath: lightweight
  – XSLT: document transformation language
  – XQuery: a full-blown query language
CMPT 354: Database I -- DTD and XML Schema
Example
[Figure: an XML document annotated with the labels "mandatory statement", "root element", "XML element", "element name", and "element content"]
Hierarchical Structure
PersonList
  Student
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
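The tree above can be written out as an XML document. The following is a sketch, not the original slide content: it assumes that "Student" in the tree is the value of a Type attribute on PersonList, and it borrows the attribute names Type, Date, and Value from the DTD example later in these slides. The Date and Title values are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Type="Student" is an assumed reading of the tree; Date and Value are hypothetical. -->
<PersonList Type="Student" Date="2002-02-02">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
      <Address>
        <Number>123</Number>
        <Street>Main St</Street>
      </Address>
    </Person>
    <Person>
      <Name>Joe Public</Name>
      <Id>666666666</Id>
      <Address>
        <Number>666</Number>
        <Street>Hollow Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
```

Each level of nesting in the document corresponds to one level of the tree, and the leaf values become element content.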
Document Type Definitions
• A set of rules for structuring an XML document
  – Specified as part of the document itself, or
  – Referenced by a URL where the DTD can be found
  – A document that conforms to its DTD is said to be valid
• XML does not require a document to have a DTD, but every document must be well formed
• A DTD is a grammar that specifies a legal XML document, based on the tags used in the document and their attributes
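As a sketch of the second option, a document can point to an external DTD with a SYSTEM identifier in its DOCTYPE declaration. The URL below is hypothetical, and the element names are assumed for illustration.

```xml
<?xml version="1.0"?>
<!-- The DTD lives at a separate (hypothetical) URL instead of inside the document. -->
<!DOCTYPE PersonList SYSTEM "http://example.org/dtd/personlist.dtd">
<PersonList>
  <Title/>
  <Contents/>
</PersonList>
```

This document is well formed regardless of what the DTD says. Whether it is also valid depends on the rules found at that URL.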
Example – DTD
<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>
DTD Components
• Name (e.g., PersonList)
  – Must coincide with the tag name of the root element of the document
• One ELEMENT statement for each allowed tag, including the root tag
• For each tag that can have attributes, the ATTLIST statement specifies the allowed attributes and their types
Specification
• *: a subelement can appear zero or more times
• +: a subelement can appear one or more times
• ?: a subelement is optional
  – <!ELEMENT Person (Name, Id, Address?)>
• |: alternatives of subelements
  – <!ELEMENT Name ((First, Last)|(Last, First))>
• #PCDATA (parsed character data), CDATA: character strings
• #IMPLIED: an attribute is optional
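The operators above can be combined in a single DTD. The sketch below reuses the Person and Name declarations from the slide and adds a hypothetical Phone element to show "+"; the sample values are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE Person [
  <!-- Address is optional (?); at least one Phone is required (+). Phone is hypothetical. -->
  <!ELEMENT Person  (Name, Id, Address?, Phone+)>
  <!-- Either order of First and Last is accepted (|). -->
  <!ELEMENT Name    ((First, Last) | (Last, First))>
  <!ELEMENT First   (#PCDATA)>
  <!ELEMENT Last    (#PCDATA)>
  <!ELEMENT Id      (#PCDATA)>
  <!ELEMENT Address (#PCDATA)>
  <!ELEMENT Phone   (#PCDATA)>
]>
<Person>
  <Name><Last>Doe</Last><First>John</First></Name>
  <Id>111111111</Id>
  <Phone>555-1234</Phone>
</Person>
```

The document omits Address and lists Last before First, and it still conforms to the DTD. Removing the Phone element would make it invalid.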
Types for Attributes
• CDATA: character strings
• ID: values that are unique within the document
• IDREF: a reference to an ID value
• IDREFS: a list of IDREF values
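A small sketch of how these attribute types fit together. The element and attribute names (Report, Student, Course, StudId, Takes, CrsCode) and the values are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE Report [
  <!ELEMENT Report  (Student*, Course*)>
  <!ELEMENT Student EMPTY>
  <!ELEMENT Course  EMPTY>
  <!-- StudId must be unique in the document; Takes lists ID values found elsewhere in it. -->
  <!ATTLIST Student StudId  ID     #REQUIRED
                    Takes   IDREFS #IMPLIED>
  <!ATTLIST Course  CrsCode ID     #REQUIRED>
]>
<Report>
  <Student StudId="s111111111" Takes="CS308 MAT123"/>
  <Course CrsCode="CS308"/>
  <Course CrsCode="MAT123"/>
</Report>
```

ID values must be XML names, so they cannot start with a digit, which is why the student identifier carries an "s" prefix. A validating parser checks only that each referenced value exists as some ID in the document, not that it is the ID of a Course.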
DTD as Data Definition Language
• DTDs have several limitations as a data definition language
• Namespaces are not part of the original design
• DTD syntax is quite different from XML syntax
• Very limited set of basic types
• Limited ways to specify data consistency constraints
  – No keys, weak referential integrity, no type references
• No referential integrity for elements
• Elements are ordered
• Element definitions are global
Why XML Schema?
• Uses the same syntax as ordinary XML documents
  – An alternative to DTD
• Integrated with the namespace mechanism
  – Different schemas can be imported from different namespaces and integrated into one schema
• Provides a number of built-in types similar to those of SQL, e.g., string, integer, and time
• Defines complex types from simpler ones
• The same element name can be defined with different types depending on where the element is nested
• Supports keys and referential integrity constraints
• Makes it easy to specify documents where elements are unordered
Schema and Instance
• Goal: describe an XML schema using XML
• An XML document D that conforms to a given schema (which is itself an XML document) is said to be schema valid
  – D is called an instance of the schema
XML Schema and Namespaces
• An XML schema document begins with a declaration of the namespaces to be used
• http://www.w3.org/2001/XMLSchema
  – The namespace identifying the names of tags and attributes used in a schema (not in the instances)
  – These names describe the structural properties of documents in general, e.g., schema, attribute, element, …
• http://www.w3.org/2001/XMLSchema-instance
  – Another namespace, used in conjunction with the one above
  – Identifies a small number of special names that are defined in the XML Schema specification and used in instance documents, e.g., schemaLocation
• The target namespace
  – Identifies the set of names defined by a particular schema document for use in the instances
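The three namespaces can be seen together in a minimal schema and a matching instance. This is a sketch under assumptions: the target namespace http://example.org/Admin, the adm prefix binding, the Report element, and the schema file location are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- Schema document: the default namespace supplies schema, element, and so on. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:adm="http://example.org/Admin"
        targetNamespace="http://example.org/Admin">
  <!-- Report becomes a name in the target namespace. -->
  <element name="Report" type="string"/>
</schema>
```

```xml
<?xml version="1.0"?>
<!-- Instance document: xsi:schemaLocation pairs the target namespace with a schema URL. -->
<adm:Report xmlns:adm="http://example.org/Admin"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://example.org/Admin http://example.org/Admin/report.xsd">
  Fall enrolment summary
</adm:Report>
```

The schema uses names from the XMLSchema namespace, the instance uses schemaLocation from the XMLSchema-instance namespace, and both refer to Report through the target namespace.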
Schema and an Instance Document
Report Document
Primitive Types
• DTD has very limited primitive types
  – CDATA, ID, IDREF, IDREFS
• XML Schema has many useful primitive types
  – decimal, integer, float, boolean, date, …
• New simple types can be derived from the basic ones
  – The mechanism is similar to the CREATE DOMAIN statement in SQL
Deriving Simple Types
• IDREFS is not one of the primitive types; it can be derived as a list of IDREF
    <simpleType name="myIdrefs">
      <list itemType="IDREF"/>
    </simpleType>
• Union of multiple types
  – Suppose local phone numbers are 7 digits long and long distance numbers are 10 digits long
    <simpleType name="phoneNumber">
      <union memberTypes="phone7digits phone10digits"/>
    </simpleType>
Deriving Simple Types by Restriction
• Constrain a basic type using one or more constraints from a fixed repertoire defined by the XML Schema specification
    <simpleType name="phone7digits">
      <restriction base="integer">
        <minInclusive value="1000000"/>
        <maxInclusive value="9999999"/>
      </restriction>
    </simpleType>
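Putting restriction and union together gives a complete schema fragment. In this sketch, phone10digits is not defined in the slides, so its bounds are an assumption made by analogy with phone7digits. The target namespace, the adm prefix, and the Phone element are also hypothetical.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:adm="http://example.org/Admin"
        targetNamespace="http://example.org/Admin">

  <!-- Seven-digit integers, as on the slide. -->
  <simpleType name="phone7digits">
    <restriction base="integer">
      <minInclusive value="1000000"/>
      <maxInclusive value="9999999"/>
    </restriction>
  </simpleType>

  <!-- Assumed definition: ten-digit integers. -->
  <simpleType name="phone10digits">
    <restriction base="integer">
      <minInclusive value="1000000000"/>
      <maxInclusive value="9999999999"/>
    </restriction>
  </simpleType>

  <!-- A phone number is either of the two types above. -->
  <simpleType name="phoneNumber">
    <union memberTypes="adm:phone7digits adm:phone10digits"/>
  </simpleType>

  <element name="Phone" type="adm:phoneNumber"/>
</schema>
```

Because the default namespace here is the XML Schema namespace, an unprefixed name such as integer refers to a built-in type, while user-defined types in the target namespace are referenced with the adm prefix.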
More Examples
• Phone numbers in XXX-YYYY format
    <simpleType name="phone7digitsAndDash">
      <restriction base="string">
        <pattern value="[0-9]{3}-[0-9]{4}"/>
      </restriction>
    </simpleType>
• More restrictions on the basic string type
  – <length value="7"/>: strings of length 7
  – <minLength value="7"/>: strings of length >= 7
  – <maxLength value="14"/>: strings of length <= 14
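Several facets can appear in the same restriction, in which case a value must satisfy all of them. The sketch below repeats the pattern type from the slide and adds a hypothetical courseCode type; the namespace, prefix, and LocalPhone element are assumptions as well.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:adm="http://example.org/Admin"
        targetNamespace="http://example.org/Admin">

  <simpleType name="phone7digitsAndDash">
    <restriction base="string">
      <pattern value="[0-9]{3}-[0-9]{4}"/>
    </restriction>
  </simpleType>

  <!-- Hypothetical type: length facets combined with a pattern. -->
  <simpleType name="courseCode">
    <restriction base="string">
      <minLength value="5"/>
      <maxLength value="8"/>
      <pattern value="[A-Z]+[0-9]+"/>
    </restriction>
  </simpleType>

  <!-- 555-1234 conforms to this element's type; 5551234 does not. -->
  <element name="LocalPhone" type="adm:phone7digitsAndDash"/>
</schema>
```

A pattern in XML Schema must match the entire value, so no explicit start or end anchors are needed.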
Enumeration
• Restrict the domain to a finite set
• Can be applied to any base type
    <simpleType name="emergencyNumbers">
      <restriction base="integer">
        <enumeration value="911"/>
        <enumeration value="333"/>
        <enumeration value="5431234"/>
      </restriction>
    </simpleType>
More Examples on Simple Types
Complex Types
Basics of Complex Types
• Tag complexType
• Tag sequence: a list of elements that must occur in the given order
• Using minOccurs and maxOccurs
• Associating attributes with a type
• A complex type can be associated with an element
    <element name="Student" type="adm:studentType"/>
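The pieces listed above can be combined into one type definition. This is a sketch only: the slides name studentType but do not show its content here, so the child elements (Name, Status, CrsTaken), the StudId attribute, and the target namespace are assumptions.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:adm="http://example.org/Admin"
        targetNamespace="http://example.org/Admin">

  <complexType name="studentType">
    <!-- Children must appear in exactly this order. -->
    <sequence>
      <element name="Name" type="string"/>
      <element name="Status" type="string"/>
      <!-- Zero or more courses; the default for both minOccurs and maxOccurs is 1. -->
      <element name="CrsTaken" type="string"
               minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <!-- Attribute declarations follow the content model. -->
    <attribute name="StudId" type="string"/>
  </complexType>

  <element name="Student" type="adm:studentType"/>
</schema>
```

Declaring the type separately from the element lets several elements share the same structure.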
Element without Content
• Just associate attributes with the type
• Example
    <complexType name="courseTakenType">
      <attribute name="CrsCode" type="adm:courseRef"/>
      <attribute name="Semester" type="string"/>
    </complexType>
Compositors
• Tags describing how elements can be combined into groups, e.g., sequence
  – Required when a tag has complex content
  – Required even if the type has only one child element!
• Compositor all: allows elements to appear in any order
    <complexType name="addressType">
      <all>
        <element name="StreetName" type="string"/>
        <element name="StreetNumber" type="string"/>
        <element name="city" type="string"/>
      </all>
    </complexType>
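To illustrate the effect of all, here is a sketch of an instance whose children appear in a different order from the declaration. It assumes addressType is declared in a schema with the hypothetical target namespace http://example.org/Admin, that the schema also declares a global element Address of that type, and that the schema keeps the default setting under which locally declared elements are unqualified. The values are invented.

```xml
<?xml version="1.0"?>
<!-- Children are listed in a different order than in addressType; all permits this. -->
<adm:Address xmlns:adm="http://example.org/Admin">
  <city>Stony Brook</city>
  <StreetNumber>123</StreetNumber>
  <StreetName>Main St</StreetName>
</adm:Address>
```

With sequence in place of all, this instance would be rejected because StreetName would have to come first. In XML Schema 1.0, each element inside all may occur at most once.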