Database Management Systems
Winter 2004
CMPUT 391: XML and Querying XML
Lecture 12
Dr. Osmar R. Zaïane
University of Alberta
Chapter 17 of Textbook

Dr. Osmar Zaïane, 2001-2004 CMPUT 391 – Database Management Systems University of Alberta

Overview
• Semi-Structured Data
• Introduction to XML
• Querying XML Documents

The Structure of Data
• In the real world, data can be of any type and does not necessarily follow any organized format or sequence.
• Such data is said to be unstructured. Unstructured data is chaotic because it doesn't follow any rule and is not predictable.
• Text data is usually unstructured. Much of the data on the Internet is unstructured (video streams, sound streams, images, etc.).

Structured Data
• For applications manipulating data, the structure of data is very important to ensure efficiency and effectiveness.
• The data is structured when:
– Data is organized in semantic chunks (entities).
– Similar entities are grouped together (relations or classes).
– Entities in the same group have the same descriptions (attributes).
– Entity descriptions for all entities in a group have the same defined format, a predefined length, are all present, and follow the same order (schema).
• This structure is sometimes too rigid for some applications.
• For many applications, data is neither completely unstructured nor completely structured.
Semi-Structured Data
• Data is organized in semantic entities
• Similar entities are grouped together
• But
– Entities in the same group may not have the same attributes
– The presence of some attributes may not always be required
– The size of the same attribute may differ between entities in the same group
– The type of the same attribute may differ between entities in the same group

Semi-Structured Data (Cont.)
• To make it suitable for machine processing, it should have these characteristics:
– Be object-like
– Be schemaless (it is not guaranteed to conform exactly to any schema, but different objects have some commonality among themselves)
– Be self-describing (some schema-like information, like attribute names, is part of the data itself)

Non-Self-Describing Data
Relational or Object-Oriented:
Data part:
    (#123, ["Students",
            { ["John", 111111111, [123, "Main St"]],
              ["Joe", 222222222, [321, "Pine St"]] }
           ] )
Schema part:
    PersonList[ ListName : String,
                Contents : [ Name : String,
                             Id : String,
                             Address : [ Number : Integer, Street : String ] ]
              ]

Self-Describing Data
• Attribute names are embedded in the data itself
• Doesn't need a schema to figure out what is what
• (but a schema might be useful nonetheless)
    (#12345,
      [ ListName : "Students",
        Contents : { [ Name : "John Doe",
                       Id : "111111111",
                       Address : [ Number : 123, Street : "Main St." ] ],
                     [ Name : "Joe Public",
                       Id : "222222222",
                       Address : [ Number : 321, Street : "Pine St." ] ] }
      ] )
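The contrast between the non-self-describing and self-describing representations above can be sketched in Python. This is an illustrative sketch, not part of the original slides: the names `SCHEMA`, `positional_record`, `self_describing`, `other_person`, and `lookup_positional` are hypothetical, and the second person's record is deliberately altered (no address, integer Id) to show the semi-structured case.

```python
# Non-self-describing: the record holds values only; the meaning of each
# position comes from a schema that is stored separately from the data.
SCHEMA = ("Name", "Id", "Address")  # assumed field order, kept apart from the data
positional_record = ("John", "111111111", (123, "Main St"))


def lookup_positional(record, schema, field):
    """Find a field by its position in the external schema."""
    return record[schema.index(field)]


# Self-describing: attribute names travel with the data itself,
# so no separate schema is needed to tell what is what.
self_describing = {
    "Name": "John Doe",
    "Id": "111111111",
    "Address": {"Number": 123, "Street": "Main St."},
}

# Semi-structured: another entity in the same group may omit an attribute
# or use a different type for the same attribute (hypothetical variation).
other_person = {"Name": "Joe Public", "Id": 222222222}  # no Address, integer Id

name_from_tuple = lookup_positional(positional_record, SCHEMA, "Name")
name_from_dict = self_describing["Name"]
street = self_describing["Address"]["Street"]
has_address = "Address" in other_person
```

Without `SCHEMA`, the tuple is just a sequence of values; the dictionary can be read on its own, which is the property the slides call self-describing.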
Data Model for Semi-Structured Data
• Semi-structured data doesn't have a schema.
• There are many data models to represent semi-structured data. Most of them use the notion of labeled graphs.
– Nodes in the graph correspond to compound objects or atomic values.
– Edges in the graph correspond to attributes.
– The graph is self-describing (no need for a schema).
– Object Exchange Model (OEM): each object is described by a triplet <label, type, value>.
– Complex objects are decomposed hierarchically into smaller objects.

Example: Booklist Data in OEM
[Figure: OEM labeled graph rooted at BOOK. Edge labels: AUTHOR, TITLE, PUBLISHED, AUTHOR, FORMAT, TITLE. Leaf values: "Identity", 1998, "Milan Kundera", "The character of physical law", "Hardcover", "Richard Feynman".]

Introduction to XML
• XML: eXtensible Markup Language
• Suitable for semi-structured data
– Easy to describe object-like data
– Self-describing
– Doesn't require a schema (but one can be provided optionally)
• Standard of the World-Wide Web Consortium for data exchange
• All major database products have been extended to store and construct XML documents
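To connect the OEM triplets with XML's object-like, self-describing form, here is a minimal sketch that stores objects as <label, type, value> triplets and serializes them as nested XML elements. The `OEMObject` class and `to_xml` function are hypothetical names chosen for illustration, the type names ("complex", "string", "integer") are assumptions, and grouping these three leaf values under one BOOK is a guess, since the layout of the figure is not recoverable.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    """One OEM object: a <label, type, value> triplet."""
    label: str
    type: str  # "complex" or an atomic type name (assumed vocabulary)
    value: Union[str, int, List["OEMObject"]]  # atomic value or sub-objects


def to_xml(obj: OEMObject) -> ET.Element:
    """Turn an OEM object into an XML element, recursing into sub-objects."""
    elem = ET.Element(obj.label)
    if obj.type == "complex":
        # A complex object is decomposed hierarchically into smaller objects.
        for child in obj.value:
            elem.append(to_xml(child))
    else:
        # An atomic object becomes the text content of its element.
        elem.text = str(obj.value)
    return elem


# Assumed grouping of leaf values from the booklist figure.
book = OEMObject("BOOK", "complex", [
    OEMObject("AUTHOR", "string", "Milan Kundera"),
    OEMObject("TITLE", "string", "Identity"),
    OEMObject("PUBLISHED", "integer", 1998),
])

xml_text = ET.tostring(to_xml(book), encoding="unicode")
```

Each edge label in the graph becomes an element name, so the resulting XML carries its own attribute names, just as the OEM graph does.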
What is Special with XML
• It is a language to mark up data
• There are no predefined tags like in HTML
• Extensible: tags can be defined and extended based on applications and needs
– Elements / Tags
– Attributes
– (E.g.: <BOOK page="453"> … </BOOK>)
• Elements are nested
• Root element contains all others

Example
    <?xml version="1.0" ?>
    <PersonList Type="Student" Date="2002-02-02">
        <Title Value="Student List" />
        <Person>
            … … …
        </Person>
        <Person>
            … … …
        </Person>
    </PersonList>
• Root element: PersonList
• Attributes: Type, Date
• Empty element: Title
• Elements: Person
• Element (or tag) names: PersonList, Title, Person

More Terminology
    <Person Name="John" Id="111111111">
        John is a nice fellow
        <Address>
            <Number>21</Number>
            <Street>Main St.</Street>
        </Address>
        … … …
    </Person>
• <Person …> is the opening tag; </Person> is the closing tag
• Everything between the two tags is the content of Person
• "John is a nice fellow" is "standalone" text, not useful as data
• Address is a nested element, child of Person
• Person is the parent of Address and an ancestor of Number
• Number and Street are children of Address and descendants of Person

Rules for Creating XML Documents
• Rule 1: All terminating tags shall be closed
– Omitting a closing XML tag is an error.
  Example: <FirstName> Joerg </FirstName>
• Rule 2: All non-terminating tags shall be closed
– Omitting a forward slash for non-terminating (empty) tags is an error.
  Example: <Available answer="yes" />
• Rule 3: XML shall be case sensitive
– Using the wrong case is an error.
  Example (error): <FirstName> Osmar </firstname>
– It is OK in HTML: <H1>my header</h1>
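The three rules above can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which rejects text that is not well-formed by raising `ParseError`; the helper name `is_well_formed` is hypothetical. The same parser is then used on the Person example (with the elided "… … …" part left out) to show attributes, children, and descendants.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Rule 1: a tag with content must have a matching closing tag.
rule1_ok = is_well_formed("<FirstName> Joerg </FirstName>")
rule1_bad = is_well_formed("<FirstName> Joerg ")  # closing tag omitted

# Rule 2: an empty element must end with a forward slash.
rule2_ok = is_well_formed('<Available answer="yes" />')
rule2_bad = is_well_formed('<Available answer="yes" >')  # slash omitted

# Rule 3: tag names are case sensitive.
rule3_bad = is_well_formed("<FirstName> Osmar </firstname>")  # case mismatch

# Terminology: attributes, children, and descendants of Person.
person = ET.fromstring(
    '<Person Name="John" Id="111111111">John is a nice fellow'
    "<Address><Number>21</Number><Street>Main St.</Street></Address>"
    "</Person>"
)
attributes = person.attrib                         # attributes on the opening tag
standalone_text = person.text                      # text before the first child
child_tags = [child.tag for child in person]       # direct children only
descendant_tags = [e.tag for e in person.iter() if e is not person]
```

This parser only checks well-formedness; it does not validate the document against a schema, which matches the point that XML does not require one.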