XML and XQuery
5DV120: Database System Principles
Umeå University, Department of Computing Science
Stephen J. Hegner
hegner@cs.umu.se
http://www.cs.umu.se/~hegner
XML and XQuery 20130422 Slide 1 of 46

The XML Context
• XML was designed for data exchange, not for large, homogeneous databases.
• Its primary usage is thus quite different from that of the relational model.
• In terms of potential application, XML and the relational model are, for the most part, complementary rather than competitive frameworks.
• Nevertheless, there are similarities.
• In particular, the XML query language XQuery has many similarities with SQL.
• In these slides, some basic ideas surrounding XML and XQuery are presented.
• The goal is to provide a brief introduction, not a comprehensive presentation.

The Components Surrounding XML
Data: XML is the language for representing data.
DML: There are many query languages for XML, among them:
    XPath: A language for expressing queries as operations on paths.
    XQuery: A more comprehensive query language, in many ways similar to SQL, containing XPath as a subset.
    XSLT: A declarative, template-based document transformation language which may also be used to express queries.
    SQL/XML: Used for data exchange and storage between relational and XML databases.
DDL: There is no true DDL for XML; any well-formed XML expression qualifies as data.
Constraints: However, there are at least two languages for expressing constraints on XML data.
    DTD: An old language, inherited from SGML, with limited expressive power.
    XML Schema: A newer and very comprehensive language, but relatively complex.

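To make "queries as operations on paths" concrete, here is a minimal sketch in Python. It relies on the standard library's xml.etree.ElementTree, which supports only a small subset of XPath, so it illustrates the idea rather than the full language. The sample document and the Physics values are hypothetical, modeled on the university schema used in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, modeled on the university schema of these slides.
DOC = """
<university>
  <department dept_name="Biology">
    <building>Watson</building>
    <budget>90000</budget>
  </department>
  <department dept_name="Physics">
    <building>Watson</building>
    <budget>70000</budget>
  </department>
</university>
"""

root = ET.fromstring(DOC)

# A path expression: every <budget> child of every <department> child of the root.
budgets = [b.text for b in root.findall("./department/budget")]

# A predicate on an attribute value narrows the path to one department.
biology = root.find("./department[@dept_name='Biology']")
biology_building = biology.find("building").text

print(budgets)
print(biology_building)
```

A full XPath or XQuery processor, such as the one in an XML DBMS, accepts far richer expressions than the subset used here.
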
The Family Tree of XML
• XML stands for eXtensible Markup Language.
• XML is a descendant of SGML (Standard Generalized Markup Language).
• In this sense, it is a cousin of HTML.
• All of these languages are characterized by nested blocks with tag pairs of the form <foo> and </foo>.
• In HTML, the tags and their semantics are fixed in the definition of the language.
• In XML, on the other hand, even the tags themselves may be freely chosen, and are limited only by constraints expressed in an appropriate language such as DTD or XML Schema.

A Simple Example
• Here is a simple way (but not the only way) to represent the tuple ('Biology', 'Watson', '90000') of the department relation of the university schema.

    <department dept_name="Biology">
      <building>Watson</building>
      <budget>90000</budget>
    </department>

• It may be regarded as a simple tree representation.
[Figure: tree with root department and children dept_name (value Biology), building (value Watson), and budget (value 90000).]

Types of Vertices in the Tree Representation

    <department dept_name="Biology">
      <building>Watson</building>
      <budget>90000</budget>
    </department>

[Figure: tree representation of the element above, with the vertices department, dept_name, building, budget, Biology, Watson, and 90000 drawn as colored boxes.]
• Each yellow box is a tag vertex.
• Each green box is an attribute vertex, and dept_name is called an attribute of the tag department with value Biology.
• Each blue box is a text vertex.
Warning: XPath uses a somewhat different and more complex classification (not covered in detail here).
• For now, this one will suffice.

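The three-way classification can be sketched by walking a parsed document. This is only an illustration using Python's xml.etree.ElementTree; the function name vertices and the (kind, label) output format are assumptions made for this example, and, as the warning on the slide says, XPath's own node model is more detailed.

```python
import xml.etree.ElementTree as ET

XML = """<department dept_name="Biology">
  <building>Watson</building>
  <budget>90000</budget>
</department>"""

def vertices(elem):
    """Yield (kind, label) pairs using the slide's tag/attribute/text classes."""
    yield ("tag", elem.tag)
    for name, value in elem.attrib.items():
        yield ("attribute", f"{name}={value}")
    # Whitespace-only text (indentation, line breaks) is not counted as a text vertex.
    text = (elem.text or "").strip()
    if text:
        yield ("text", text)
    for child in elem:
        yield from vertices(child)

result = list(vertices(ET.fromstring(XML)))
for kind, label in result:
    print(kind, label)
```
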
Attributes versus Elements with Text Values
Question: Why not represent dept_name as a value enclosed in tags, as in the first version below, as opposed to an attribute of department, as in the second?

    <department>
      <dept_name>Biology</dept_name>
      <building>Watson</building>
      <budget>90000</budget>
    </department>

    <department dept_name="Biology">
      <building>Watson</building>
      <budget>90000</budget>
    </department>

Answer: It is a design decision, and both work.
• There is an advantage to attributes for representing key constraints in DTD.
• More later on this.
• It is also possible to have several attributes:

    <department dept_name="Biology" budget="90000">
      <building>Watson</building>
    </department>

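From the point of view of a program reading the data, the two designs differ mainly in how the value is looked up. The following sketch, which assumes Python's xml.etree.ElementTree as the parser, retrieves the department name from each version.

```python
import xml.etree.ElementTree as ET

# dept_name as an attribute of <department>.
as_attribute = ET.fromstring(
    '<department dept_name="Biology">'
    "<building>Watson</building><budget>90000</budget>"
    "</department>"
)

# dept_name as a child element with a text value.
as_element = ET.fromstring(
    "<department>"
    "<dept_name>Biology</dept_name>"
    "<building>Watson</building><budget>90000</budget>"
    "</department>"
)

name_from_attribute = as_attribute.get("dept_name")      # attribute lookup
name_from_element = as_element.findtext("dept_name")     # child element lookup

print(name_from_attribute, name_from_element)
```
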
Order of Children
• The order of tag children is significant.
• Thus, the following two expressions are different.

    <department dept_name="Biology">
      <building>Watson</building>
      <budget>90000</budget>
    </department>

    <department dept_name="Biology">
      <budget>90000</budget>
      <building>Watson</building>
    </department>

• However, the order of attributes is not significant.
• Thus, the following two expressions are equivalent.

    <department dept_name="Biology"
                budget="90000">
      <building>Watson</building>
    </department>

    <department budget="90000"
                dept_name="Biology">
      <building>Watson</building>
    </department>

• Hence, in the tree representation, the order of tag vertices matters, but the order of attribute vertices does not.

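The difference between the two kinds of ordering can be checked with a short sketch. It assumes Python's xml.etree.ElementTree, which exposes children as an ordered sequence and attributes as a dictionary; the helper child_tags is a name introduced only for this example.

```python
import xml.etree.ElementTree as ET

def child_tags(xml_text):
    """Return the tags of the top-level children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

# Same children in a different order: the documents differ.
first = '<department dept_name="Biology"><building>Watson</building><budget>90000</budget></department>'
second = '<department dept_name="Biology"><budget>90000</budget><building>Watson</building></department>'
children_equal = child_tags(first) == child_tags(second)

# Same attributes in a different order: the documents are equivalent.
third = '<department dept_name="Biology" budget="90000"><building>Watson</building></department>'
fourth = '<department budget="90000" dept_name="Biology"><building>Watson</building></department>'
attributes_equal = ET.fromstring(third).attrib == ET.fromstring(fourth).attrib

print(children_equal)
print(attributes_equal)
```
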
Deeper Nesting and Multiple Occurrences
• There is no limit to the depth of nesting, and tags may be repeated.
• Here is part of an example from a nested representation of the university database.

    <department dept_name="Biology">
      <building>Watson</building>
      <budget>90000</budget>
      <instructor iid="76766">
        <name>Crick</name>
        <salary>72000</salary>
        <teaches>
          ...
        </teaches>
        <teaches>
          ...
        </teaches>
      </instructor>
      <course cid="BIO-101">
        <title>Intro. to Biology</title>
        <credits>4</credits>
        <section>
          ...
        </section>
        <section>
          ...
        </section>
      </course>
      <course cid="BIO-301">
        ...
      </course>
      <student>
        <sid>98988</sid>
        <name>Tanaka</name>
        <tot_cred>120</tot_cred>
        <takes>
          ...
        </takes>
        <takes>
          ...
        </takes>
        <advisor>
          <iid>76766</iid>
        </advisor>
      </student>
    </department>

• The only requirement is that the tags be nested properly.

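The requirement that tags be nested properly is exactly what a parser checks first. As a minimal sketch, assuming Python's xml.etree.ElementTree, the hypothetical helper is_well_formed below reports whether a string parses at all; it says nothing about DTD or XML Schema constraints.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <building> is closed before <department>.
good = "<department><building>Watson</building></department>"

# Improperly nested: the closing tags are crossed.
crossed = "<department><building>Watson</department></building>"

# A forgotten slash turns the closing tag into a second opening tag.
unclosed = "<department><building>Watson<building></department>"

print(is_well_formed(good), is_well_formed(crossed), is_well_formed(unclosed))
```
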
Document Structure
• An entire document must be a well-formed XML expression.
• Thus, there must be an encompassing pair of begin-end block markers.

    <university_flat>
      <!-- The university database of the textbook in XML -->
      <department dept_name="Biology"> ... </department>
      ...
      <department dept_name="Basketweaving"> ... </department>
      <instructor> ... </instructor>
      ...
      <instructor> ... </instructor>
      ...
    </university_flat>

• Note that linebreaks are just whitespace.
• Layout is as freeform as HTML.
• Comments are represented just as in HTML.

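The need for an encompassing pair of markers can be seen by parsing a sequence of elements with and without a wrapper. This sketch assumes Python's xml.etree.ElementTree; the element contents are abbreviated placeholders, and parses is a helper name introduced for the example.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is accepted as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Two sibling elements with no enclosing element: not a document.
fragments = '<department dept_name="Biology"/><instructor/>'

# The same elements inside one root element, with a comment and free layout.
wrapped = """<university_flat>
  <!-- The university database of the textbook in XML -->
  <department dept_name="Biology"/>
  <department dept_name="Basketweaving"/>

  <instructor/>
</university_flat>"""

root = ET.fromstring(wrapped)
# By default the parser drops comments, so only element children remain.
top_level = [child.tag for child in root]

print(parses(fragments), parses(wrapped))
print(top_level)
```
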
Accessing and Querying XML Databases
• Since an XML document is just a text file, it may be created and accessed with any text editor.
• However, to query it properly, to check it against a DTD or XML Schema specification, and even to access it at all if it is very large, an XML DBMS or a relational DBMS with XML support is necessary.
• In this course, the XML DBMS eXist-db will be used for that purpose.
• It uses a Web interface.
• However, it is also very easy to install on your own computer.
• It is written in Java, and requires only a JDK.

The Departmental Installation of eXist-db
URL: http://exist-db.cs.umu.se
• You should have received a user-ID and password.
• The system will be demonstrated during a lecture.
• This slide provides just some basic points.
• Login is via the Admin tab under Administration.
• Under Browse->Collections, the main directory /db may be found.
• Some system-wide databases may be found there, as may /db/home.
• Under /db/home is the private directory of each user.
• Within that directory, you may download and delete files, as well as create subdirectories.
• There is also the Webstart client, which provides browsing, change of access rights, etc.
• Access rights are in Unix/Linux style.

The Departmental Installation of eXist-db (2)
• There are two interfaces for a query client, accessible from the home page.
XQuery Sandbox: A simple interface which allows the user to paste in a query to a database and have it evaluated.
XQuery IDE (eXide): A newer and fancier interface, but it does not communicate with some clipboards.
    ❉ Does not talk to Emacs.
• Only results which are themselves well-formed XML expressions are displayed.
• Results which are lists of values, for example, are not displayed.