XML and Databases
Data Management for Big Data, 2018-2019 (spring semester)
Dario Della Monica
These slides have been translated into English from slides in Italian by Angelo Montanari.

What XML is
XML stands for eXtensible Markup Language. Main features:
(1) XML is a formal language. XML is defined through a set of formal rules (a grammar) that defines how an XML document is generated.
(2) XML allows for data description (markup). Data are included in XML documents as strings of characters enclosed between markup tags, which describe structure and content type.
(3) XML is extensible. The language provides an extensible set of tags that can be adapted (unlike HTML, whose set of tags is fixed).

What XML is NOT - 1
An XML document is a text document that can be read and modified with any text editor. Therefore, XML is not:
• a language for data presentation, like HTML. XML markup defines data semantics (content), rather than presentation style. Style information can be included in a separate stylesheet. An XSL (eXtensible Stylesheet Language) stylesheet can be used to translate XML data into HTML. The resulting HTML pages can be displayed inside a browser.

What XML is NOT - 2
• a programming language (such as Java). An XML document does not perform computations.
• a network data transfer protocol (such as HTTP). XML does not transfer data over the network.
• a DBMS (such as Oracle). XML does not store or return data.

XML elements
An XML element is text enclosed between tags. Tags are enclosed between angle brackets. Example (XML document):

    <person>Alan Turing</person>

The document contains only 1 element, called person. This element is delimited by the beginning tag <person> and the ending tag </person>. Tags are used for text markup. Whatever is enclosed between a beginning and an ending tag is called element content (in the example, the string Alan Turing). An XML element can contain free text (called character data) or other XML elements.

Example: an XML document

    <person>
      <name>
        <first>Alan</first>
        <last>Turing</last>
      </name>
      <profession>computer scientist</profession>
      <profession>mathematician</profession>
      <profession>cryptographer</profession>
    </person>

Element person contains a sub-element name and 3 sub-elements profession. Element name, in turn, contains 2 sub-elements, first and last.

Features of an XML document
• Elements cannot overlap: if the beginning tag of element B occurs between the beginning and ending tags of element A, then the ending tag of B must precede the ending tag of A.
• There must exist a single element that encloses all of the other ones (the document element).
• Each element, except for the document element, is directly enclosed in exactly one other element.
Consequently, an XML document can be represented as a tree structure: elements are the nodes, the document element is the root, and sub-elements are the children.
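
The tree view described above can be made concrete with a short sketch. It assumes Python's standard xml.etree.ElementTree module, which the slides do not mention, and reuses the person document from the earlier example. The helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The person document from the earlier slide.
DOC = """
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
"""

def outline(element, depth=0):
    """Return (depth, tag) pairs in document order (pre-order traversal)."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an Element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(DOC)  # the document element is the root of the tree
tree_rows = outline(root)
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

Each indentation level in the printed outline corresponds to one level of nesting in the document.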

Another example
XML elements can interleave free text and sub-elements, as in the following example:

    <person>
      <first>Alan</first> <last>Turing</last> is mainly known as a
      <profession>computer scientist</profession>. However, he was also an
      accomplished <profession>mathematician</profession> and a
      <profession>cryptographer</profession>.
    </person>

There might also be elements with no content at all (empty elements). An empty address element can be compactly represented as <address/>.
Notice: XML is case sensitive (address and Address are different elements).
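
As a minimal sketch of how a parser exposes mixed content, the snippet below again assumes Python's xml.etree.ElementTree (an assumption, not part of the slides). The trailing <address/> element is added to the example purely for illustration.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<person><first>Alan</first> <last>Turing</last> is mainly known as a "
    "<profession>computer scientist</profession>. However, he was also an "
    "accomplished <profession>mathematician</profession> and a "
    "<profession>cryptographer</profession>.<address/></person>"
)

root = ET.fromstring(DOC)

# ElementTree stores free text that follows a child element in that child's
# .tail attribute; text inside the child is in .text.
last = root.find("last")
professions = [p.text for p in root.findall("profession")]

# itertext() walks the character data in document order, skipping the markup.
plain = "".join(root.itertext())

# An empty element has no text and no children.
address = root.find("address")
is_empty = address.text is None and len(address) == 0

# Tag names are case sensitive: there is no element called Address.
missing = root.find("Address")
```

The .text/.tail split is specific to ElementTree; other parsers may represent the interleaved text as separate text nodes instead.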

XML attributes
XML elements can have attributes describing their properties. An attribute has the following syntax: name="value", where name is the attribute's name and value is a string. The value can be enclosed between single or double quotes. An element can have an arbitrary number of attributes, which must have different names. Attribute order is irrelevant (unlike element order, which DOES matter).

    <person born="23/06/1912" died="07/06/1954">
      <name>
        <first>Alan</first>
        <last>Turing</last>
      </name>
      <profession>computer scientist</profession>
      <profession>mathematician</profession>
      <profession>cryptographer</profession>
    </person>
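
The attribute rules above can be checked with a small sketch, assuming Python's xml.etree.ElementTree (not part of the slides). The two person elements differ only in attribute order and quote style.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<person born="23/06/1912" died="07/06/1954"/>')
b = ET.fromstring("<person died='07/06/1954' born='23/06/1912'/>")

# Attributes are exposed as a dictionary, so order and quote style do not matter.
same_attributes = a.attrib == b.attrib
born = a.get("born")

# Repeating an attribute name within one element is not well-formed XML.
try:
    ET.fromstring('<person born="23/06/1912" born="1912"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
```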

XML elements vs. XML attributes
Some pieces of information can be encoded both as an attribute value and as element content.

    <person born="23/06/1912" died="07/06/1954">
      <name first="Alan" last="Turing"/>
      <profession value="computer scientist"/>
      <profession value="mathematician"/>
      <profession value="cryptographer"/>
    </person>

How to choose? Attributes for element metadata, elements for storing information.

XML references - 1
So far, XML documents have a tree structure. XML allows one to define and use references that make it possible to produce documents with a graph structure.
It is possible to associate a unique identifier with an element as the value of a suitable attribute (e.g., the id attribute).

    <state id="s1">
      <scode>CA</scode>
      <sname>California</sname>
    </state>

XML references - 2
To refer to the previously defined state element, it is possible to use the idref attribute.

    <city id="c1">
      <ccode>LA</ccode>
      <cname>Los Angeles</cname>
      <state-of idref="s1"/>
    </city>

Notice that state-of is an empty element, whose only purpose is referring to another element through the value of the idref attribute.
Through references, it is possible to represent looping/recursive data structures. In the previous example, we can add a sub-element capital within the state element, featuring an idref attribute referring to the city element.
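
A reference is followed by looking up the element whose id matches the idref value. The sketch below assumes Python's xml.etree.ElementTree and wraps the state and city elements in a hypothetical geo document element, since a document needs a single root. Without a schema, the parser treats id and idref as ordinary attributes, so the lookup table is built by hand; the helper name resolve is hypothetical.

```python
import xml.etree.ElementTree as ET

# geo is a hypothetical wrapper element, added so that there is one root.
DOC = """
<geo>
  <state id="s1">
    <scode>CA</scode>
    <sname>California</sname>
  </state>
  <city id="c1">
    <ccode>LA</ccode>
    <cname>Los Angeles</cname>
    <state-of idref="s1"/>
  </city>
</geo>
"""

root = ET.fromstring(DOC)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve(ref_element):
    """Follow an idref attribute to the element it points at."""
    return by_id[ref_element.get("idref")]

city = by_id["c1"]
state = resolve(city.find("state-of"))
state_name = state.findtext("sname")
print(city.findtext("cname"), "is in", state_name)
```

Adding a capital sub-element with idref="c1" to the state element would create a cycle between the two elements, which the same lookup resolves in either direction.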

XML parser
An XML parser is a program that reads an XML document and establishes whether or not it is well-formed. A well-formed document fulfills the XML grammar rules:
1. every beginning tag has a corresponding ending tag;
2. elements do not overlap;
3. there must be exactly one document element;
4. attribute values must be enclosed between quotes;
5. attributes within the same element must have different names.
The simplest way to parse an XML document is to load it inside a web browser that recognizes XML (i.e., the browser includes an implementation of an XML parser). There are also stand-alone XML parsers, such as the xmllint command-line tool.
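
The five rules above can be exercised with a minimal sketch that uses the parser in Python's xml.etree.ElementTree as the well-formedness checker (an assumption; the slides mention browsers and xmllint). The helper name is_well_formed and the sample documents are hypothetical, one per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": "<person><name>Alan</name></person>",
    "missing ending tag": "<person><name>Alan</person>",
    "overlapping elements": "<a><b></a></b>",
    "two document elements": "<a/><b/>",
    "unquoted attribute value": "<person born=1912/>",
    "duplicate attribute": '<person born="1912" born="1954"/>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```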

XML main applications
The main applications of XML are the following:
• Data exchange. When information from different data sources (relational DB, object DB, text documents) needs to be exchanged or integrated, XML is a suitable common language to facilitate the interchange/integration.
• Semi-structured databases. Semi-structured data are data devoid of a regular schema; therefore, relational DBs can be inadequate to deal with them. XML has been proposed as a data model for semi-structured data. Such a data model is close to the hierarchical data model.
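
As a rough sketch of the data exchange scenario, the snippet below serializes a few rows to XML and rebuilds them on the receiving side. It assumes Python's xml.etree.ElementTree; the rows and the states/state element names are hypothetical and stand in for the result of a query on a relational table.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come out of a relational table
# state(scode, sname).
rows = [("CA", "California"), ("NY", "New York")]

# Sender: one state element per row, one sub-element per column.
states = ET.Element("states")
for scode, sname in rows:
    state = ET.SubElement(states, "state")
    ET.SubElement(state, "scode").text = scode
    ET.SubElement(state, "sname").text = sname

xml_text = ET.tostring(states, encoding="unicode")

# Receiver: rebuild the rows from the XML text alone.
received = [
    (s.findtext("scode"), s.findtext("sname"))
    for s in ET.fromstring(xml_text).findall("state")
]
```

Because the tags name each value, the receiver does not need to know the column order used by the sender.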

DTD (Document Type Definition)
It is possible to specify the markup that is allowed inside an XML application through the definition of a schema. The most common schema definition language (the only one that is present in the XML 1.0 standard) is DTD (Document Type Definition). An alternative is XML Schema (XSchema).
A DTD allows one to enforce structural constraints over an XML document. It lists the elements, attributes and entities used inside the document and specifies the context within which they are used. (As we will see later on, an XML entity is a name for a portion of text; some entities come by default, others can be defined by the user.)
DTDs do not carry any semantic information about element contents or attribute values.
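
To give an idea of what a DTD looks like, the sketch below embeds a hypothetical DTD for the person document as an internal subset and parses it with Python's xml.etree.ElementTree (an assumption, not part of the slides). The content models and the cs entity are illustrative choices. This parser is non-validating: it reads the declarations and expands the entity, but does not enforce the structural constraints, for which a validating parser would be needed.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD for the person document, written as an internal subset.
DTD = """<!DOCTYPE person [
  <!ELEMENT person (name, profession*)>
  <!ATTLIST person born CDATA #IMPLIED
                   died CDATA #IMPLIED>
  <!ELEMENT name (first, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
  <!ENTITY cs "computer scientist">
]>
"""

DOC = DTD + """<person born="23/06/1912">
  <name><first>Alan</first><last>Turing</last></name>
  <profession>&cs;</profession>
</person>"""

root = ET.fromstring(DOC)
profession = root.findtext("profession")  # the entity reference is expanded

# This document breaks the content model (person has no name child), yet it
# is well-formed, so the non-validating parser accepts it.
INVALID = DTD + "<person><profession>mathematician</profession></person>"
parsed_anyway = ET.fromstring(INVALID).tag == "person"
```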