COMP6037 Semi-structured Data and the Web
XPath and XQuery, week 2
Uli Sattler
University of Manchester
Manipulation of XML documents
• there are various standards, tools, APIs, and data models for XML, used to:
  – validate
  – parse
  – query
  – transform
    • into other XML documents
    • into other formats, e.g., HTML, Excel, relational tables
• we continue with XPath:
  – navigating and querying through XML documents
  – used in XQuery and in XSLT
Manipulation of XML documents
• XPath: for navigating and querying through XML documents
• XQuery
  – more expressive than XPath, uses XPath
  – for querying and data manipulation
  – Turing complete
  – designed to access large amounts of data and to interface with relational systems
• XSLT
  – similar to XQuery in that it uses XPath, ....
  – designed for “styling”, together with XSL-FO or CSS
• DOM and SAX
  – a collection of APIs for programmatic manipulation
  – includes data model and parser
  – to build your own applications
XPath
• designed to navigate to and select parts of a well-formed XML document
• no transformational capabilities (unlike XQuery and XSLT)
• is a W3C standard:
  – XPath 1.0 is a 1999 W3C standard
  – XPath 2.0 is a 2007 W3C standard that extends (is a superset of) XPath 1.0
    • richer set of WXS datatypes and support
      ➡ type information from WXS validation
  – see http://www.w3.org/TR/xpath20
• allows us to select/define parts of an XML document: lists of nodes
  – question: what is the difference between a list and a set?
• uses path expressions
  – to navigate in XML documents
  – to select node-lists in an XML document
• you have worked with path expressions in your 1st assignment: they look like the paths in a traditional computer file system
• provides numerous built-in functions
  – e.g., for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
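To make the file-system analogy concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. Note the assumptions: ElementTree implements only a small subset of XPath 1.0 (not XPath 2.0), and the `library` document with its element names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for this illustration.
DOC = """
<library>
  <shelf id="s1">
    <book><title>XML Basics</title></book>
    <book><title>Query Languages</title></book>
  </shelf>
  <shelf id="s2">
    <book><title>Trees and Paths</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)  # context node: the <library> element

# A path expression reads like a file-system path: step/step/step.
all_titles = [t.text for t in root.findall("shelf/book/title")]

# './/' selects matching descendants of the context node at any depth.
deep_titles = [t.text for t in root.findall(".//title")]

# A predicate in square brackets filters the nodes selected by a step.
s2_titles = [t.text for t in root.findall("shelf[@id='s2']/book/title")]

print(all_titles)
print(deep_titles)
print(s2_titles)
```

Each call returns a list of nodes in document order, which matches the "lists of nodes" view on this slide.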
XPath: Datamodel
• remember how an XML document can be seen as a node-labelled tree
  – with element names as labels
• XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax
  – but not on the DOM tree!
• XPath uses the XQuery/XPath Datamodel
• there is a translation at http://www.w3.org/TR/xpath20/#datamodel
  – see XPath process model...
Content models and types in DTD and WXS
• in DTDs, we don’t really have types, only element names
• in WXS, we have a type hierarchy
  – an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
  – we call this ‘named’ typing: sub-types are declared (restriction or extension), and not inferred (by comparing structure), e.g.,
    • Age and YoungAge are subtypes of integer,
    • but YoungAge is not a subtype of Age
    • however, ProperYoungAge is a subtype of Age

<xs:simpleType name="Age">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="130"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="YoungAge">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="19"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="ProperYoungAge">
  <xs:restriction base="Age">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="19"/>
  </xs:restriction>
</xs:simpleType>
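The difference between declared ("named") and inferred subtyping can be sketched in a few lines of Python. This is an illustration only, not how a schema processor is implemented: the table `DECLARED_BASE`, the table `RANGES`, and both function names are assumptions of this sketch, and `xs:integer` is treated as the top of the hierarchy for simplicity.

```python
# Declared derivations: type name -> declared base type.
# Assumption of this sketch: xs:integer is treated as the top type.
DECLARED_BASE = {
    "xs:integer": None,
    "Age": "xs:integer",
    "YoungAge": "xs:integer",
    "ProperYoungAge": "Age",
}

# Value ranges, kept only to show that structure does NOT decide subtyping.
RANGES = {"Age": (0, 130), "YoungAge": (0, 19), "ProperYoungAge": (0, 19)}


def is_subtype(sub, sup):
    """Named typing: follow the declared derivation chain upwards."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = DECLARED_BASE[current]
    return False


def range_contained(sub, sup):
    """Structural comparison of value ranges (ignored by named typing)."""
    (lo1, hi1), (lo2, hi2) = RANGES[sub], RANGES[sup]
    return lo2 <= lo1 and hi1 <= hi2


print(is_subtype("YoungAge", "xs:integer"))   # declared
print(range_contained("YoungAge", "Age"))     # values fit into Age ...
print(is_subtype("YoungAge", "Age"))          # ... but no declaration
print(is_subtype("ProperYoungAge", "Age"))    # declared via base="Age"
```

YoungAge and ProperYoungAge accept exactly the same values, yet only ProperYoungAge counts as a subtype of Age, because only it declares Age as its base.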
Types in WXS
• how do we determine the type of an element w.r.t. a WXS schema?
  1. determine the type hierarchy, i.e., all types and where they are derived from
     • recall: an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
     • if Y1, ..., Yk are all subtypes of X, then
         e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk)
       for e(T) the extension of type T, i.e., its instances
  2. for each element in the document, find its type (and supertypes)
     • difficult, e.g., if

<xs:complexType>
  <xs:sequence>
    <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
    <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>
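Step 1 above, collecting the extension e(X) of a type from its subtypes, can be sketched as a small recursion. The hierarchy reuses the Age example from the previous slide; the node identifiers `n1` to `n4` and the tables `SUBTYPES` and `DIRECT` are hypothetical and exist only for this illustration.

```python
# Type hierarchy: type -> its directly declared subtypes (Age example).
SUBTYPES = {
    "xs:integer": ["Age", "YoungAge"],
    "Age": ["ProperYoungAge"],
    "YoungAge": [],
    "ProperYoungAge": [],
}

# Hypothetical element nodes that are typed directly with each type.
DIRECT = {
    "xs:integer": {"n1"},
    "Age": {"n2"},
    "YoungAge": {"n3"},
    "ProperYoungAge": {"n4"},
}


def extension(type_name):
    """e(X): direct instances of X plus the extensions of all its subtypes."""
    result = set(DIRECT[type_name])
    for sub in SUBTYPES[type_name]:
        result |= extension(sub)
    return result


print(sorted(extension("Age")))
print(sorted(extension("xs:integer")))
```

Because the recursion descends through all declared subtypes, an instance of ProperYoungAge ends up in e(Age) and in e(xs:integer), but never in e(YoungAge).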
Content models and types in DTD and WXS
• In order to prevent difficulties in WXS as caused by

<xs:complexType>
  <xs:sequence>
    <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
    <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>

  WXS’s Element Declarations Consistent constraint is imposed (and also on the schema at top level):

  If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true:
  1 all their {type definition}s must have a non-absent {name}.
  2 all their {type definition}s must have the same {name}.
  3 all their {type definition}s must have the same {target namespace}.
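A rough sketch of what checking this constraint amounts to, under simplifying assumptions: the particles of a content model have already been collected into a flat list (direct, indirect, and implicit ones alike), and each is given as a tuple `(name, target_namespace, type_name, type_namespace)` with `type_name` set to `None` for an anonymous type. The function name `edc_violations` is hypothetical.

```python
from collections import defaultdict


def edc_violations(particles):
    """Return the (name, target_namespace) pairs that break the constraint.

    particles: iterable of (name, target_namespace, type_name, type_namespace);
    type_name is None when the type definition is anonymous.
    """
    groups = defaultdict(list)
    for name, ns, type_name, type_ns in particles:
        groups[(name, ns)].append((type_name, type_ns))

    bad = []
    for key, types in groups.items():
        if len(types) < 2:
            continue  # the constraint only concerns repeated declarations
        has_anonymous = any(t[0] is None for t in types)   # condition 1
        differ = len(set(types)) > 1                       # conditions 2 and 3
        if has_anonymous or differ:
            bad.append(key)
    return sorted(bad)


# The content model from the slide: two 'person' particles, different types.
slide_model = [
    ("person", "", "NewPersonType", ""),
    ("person", "", "OldPersonType", ""),
]
# A consistent variant: both 'person' particles share one named type.
ok_model = [
    ("person", "", "PersonType", ""),
    ("person", "", "PersonType", ""),
    ("address", "", "AddressType", ""),
]

print(edc_violations(slide_model))
print(edc_violations(ok_model))
```

The content model from the slide is reported as a violation, while the variant with a single shared type passes.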
Determining types in DTD and WXS
• [DTD] element name = type of that element
• [WXS] as a consequence of the Element Declarations Consistent constraint, we can determine every element’s type in a top-down manner (this is done during validation and recorded in the PSVI):
  – start with n = root element node
  – from the element name e of n, determine the type t of n (if n is the root node, this is possible since a schema cannot contain two global components with the same name; otherwise the EDC constraint ensures this)
  1. in the schema, find the model group G for t and
     – for each element child node n’ of n with name e’, determine in G the type t’ of e’ and recurse into (1.)
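The top-down procedure can be sketched as a recursive walk. This is a toy under stated assumptions, not a validator: the schema is reduced to two hypothetical lookup tables (`GLOBAL_ELEMENTS` and `MODEL_GROUPS`), the type names are invented, and order and occurrence constraints are ignored. The point is only that, thanks to EDC, an element name determines a single type within a model group.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema. Global element declarations: name -> type.
GLOBAL_ELEMENTS = {"bookstore": "BookstoreType"}
# Model group of each complex type: child element name -> child's type.
MODEL_GROUPS = {
    "BookstoreType": {"book": "BookType"},
    "BookType": {
        "title": "xs:string",
        "year": "xs:gYear",
        "price": "xs:decimal",
    },
}


def assign_types(node, type_name, path, out):
    """Record the type of `node`, then recurse into its element children."""
    out.append((path, type_name))
    group = MODEL_GROUPS.get(type_name, {})  # simple types: no model group
    for child in node:
        child_type = group[child.tag]  # KeyError: element not allowed here
        assign_types(child, child_type, path + "/" + child.tag, out)
    return out


DOC = (
    "<bookstore><book>"
    "<title>Harry Potter</title><year>2005</year><price>29.99</price>"
    "</book></bookstore>"
)
root = ET.fromstring(DOC)
typed = assign_types(root, GLOBAL_ELEMENTS[root.tag], "/" + root.tag, [])
for path, type_name in typed:
    print(path, type_name)
```

The lookup `group[child.tag]` is unambiguous only because the schema cannot assign two different types to the same element name in one model group.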
XPath: Datamodel
• the XPath DM uses the following concepts
• nodes:
  – element
  – attribute
  – text
  – namespace
  – processing-instruction
  – comment
  – document (root)
• atomic value:
  – behaves like a node without children or parents
  – is a value in the value space of a WXS atomic type, e.g., xsd:string
• item: atomic values or nodes

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

In this document, e.g., <title> is an element node, lang="en" is an attribute node, and Harry Potter is a text node.
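The node kinds in the bookstore document can be inspected with Python's standard `xml.etree.ElementTree`. Treat this as an approximation: ElementTree is not an implementation of the XPath data model, it exposes attributes and text as plain strings rather than as node objects, and the conversion of the price to a Python `float` merely stands in for an atomic value. The XML declaration is left out of the string for simplicity.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

title = root.find("book/title")   # corresponds to an element node
lang = title.get("lang")          # value of its attribute node
text = title.text                 # content of its text node child

# An atomic value has no parent or children; here a number stands in for it.
price = float(root.find("book/price").text)

print(title.tag, lang, text, price)
```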
XPath Data Model

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<?xml-stylesheet href="screen.css" type="text/css" media="screen"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.example.org/">vlib.example.org</a>.</p>
  </body>
</html>

From: http://xformsinstitute.com/essentials/browse/ch03s02.php
Comparison XPath DM and DOM datamodel
(DOM figure: a Document node with nodeType = DOCUMENT_NODE, nodeName = #document, nodeValue = (null); an Element node with nodeType = ELEMENT_NODE, nodeName = mytext, nodeValue = (null), and firstchild, lastchild, attributes links)
• XPath DM and DOM DM are similar, but different
  – most importantly regarding names and values of nodes, but also structurally (see ★)
  – in XPath, only attributes, elements, processing instructions, and namespace nodes have names, of the form (local part, namespace URI)
  – whereas DOM uses pseudo-names like #document, #comment, #text
  – in XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM:
    • e.g., the XPath value of <a>A<b>B</b></a> is “AB”
  ★ XPath does not have separate nodes for CDATA sections (they are merged with their surrounding text), e.g.,
      <N>here is some text and <![CDATA[some CDATA < >]]></N>
  – XPath has no representation
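The contrast in node names and values can be observed with Python's standard library. The sketch uses `xml.dom.minidom` for the DOM side; for the XPath side it only imitates the string value by concatenating text with ElementTree's `itertext()`, since the standard library has no XPath 2.0 data model. The helper name `string_value` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

XML = "<a>A<b>B</b></a>"

# DOM view: pseudo-name for the document, null value for the element.
dom = minidom.parseString(XML)
dom_doc_name = dom.nodeName
dom_elem_value = dom.documentElement.nodeValue


def string_value(element):
    """Imitate the XPath value: concatenate all text-node descendants."""
    return "".join(element.itertext())


xpath_like_value = string_value(ET.fromstring(XML))

# CDATA content is merged with the surrounding text, with no separate node.
cdata_doc = ET.fromstring(
    "<N>here is some text and <![CDATA[some CDATA < >]]></N>"
)
merged_text = cdata_doc.text

print(dom_doc_name, dom_elem_value)
print(xpath_like_value)
print(merged_text)
```

The DOM element reports no value at all, whereas the concatenated text yields the “AB” from the slide.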
XPath: core terms -- relation between nodes
• (since we view XML documents as trees) each node has at most one parent
  – each node but the root node has exactly one parent
  – the root node has no parent
• each node has zero or more children
• ancestor is the transitive closure of parent, i.e., a node’s parent, its parent, its parent, ...
• descendant is the transitive closure of child, i.e., a node’s children, their children, their children, ...
• when evaluating an XPath expression p, we assume that we know
  – which document and
  – which context we are evaluating p over
  – … we see later how they are chosen/given
• an XPath expression evaluates to an item sequence,
  – an item is either a node (doc., element, attribute,...) or an atomic value
  – document order is preserved among items
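Ancestor and descendant as transitive closures of parent and child can be sketched directly. Assumptions of this sketch: the tiny document is made up, only element nodes are considered, and the root here is the root element, since ElementTree does not expose a separate document node in this usage.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a has children b and e; b has children c and d.
DOC = "<a><b><c/><d/></b><e/></a>"
root = ET.fromstring(DOC)

# ElementTree keeps no parent pointers, so build child -> parent once.
PARENT = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Transitive closure of parent: parent, its parent, ... (nearest first)."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def descendants(node):
    """Transitive closure of child, returned in document order."""
    result = []
    for child in node:
        result.append(child)
        result.extend(descendants(child))
    return result


c = root.find("b/c")
ancestor_tags = [n.tag for n in ancestors(c)]
descendant_tags = [n.tag for n in descendants(root)]

print(ancestor_tags)
print(descendant_tags)
```

The root is the only node missing from `PARENT`, which mirrors the statement that every node but the root has exactly one parent.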