Plan
1. Information integration: important new application that motivates what follows.
2. Semistructured data: a new data model designed to cope with problems of information integration.
3. XML: a new Web standard that is essentially semistructured data.
4. XQUERY: an emerging standard query language for XML data.

Information Integration
Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).
Example
In the real world, every bar has its own database.
• Some may have relations like beer-price; others have an MS-Word file from which the menu is printed.
• Some keep phones of manufacturers but not addresses.
• Some distinguish beers and ales; others do not.

Two approaches
1. Warehousing: Make copies of the information at each data source, centrally.
   ✦ Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.
2. Mediation: Create a view of all information, but do not make copies.
   ✦ Answer queries by sending appropriate queries to sources.

Warehousing
[Figure: components, top to bottom: user (sends a query, receives a result), Warehouse, Combiner, two Wrappers, DB1 and DB2.]

Mediation
[Figure: components, top to bottom: user query and result, Mediator, two Wrappers, DB1 and DB2. Query/result pairs flow between the mediator and each wrapper, and between each wrapper and its database.]

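The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contents of DB1 and DB2, the two wrapper functions, and the query "beers costing at most a given price" are all hypothetical and do not come from the slides.

```python
# Hypothetical sources: each keeps beer prices under its own conventions.
DB1 = [{"beer": "Bud", "price": 2.50}]   # source 1: list of records
DB2 = {"Miller": "3.00"}                 # source 2: name -> price as a string


def wrapper1(max_price):
    """Wrapper for DB1: answer the common query in DB1's own terms."""
    return [(r["beer"], r["price"]) for r in DB1 if r["price"] <= max_price]


def wrapper2(max_price):
    """Wrapper for DB2: same query, different schema and value conventions."""
    return [(b, float(p)) for b, p in DB2.items() if float(p) <= max_price]


WRAPPERS = [wrapper1, wrapper2]


def build_warehouse():
    """Warehousing: copy everything from every source into one central store."""
    rows = []
    for wrapper in WRAPPERS:
        rows.extend(wrapper(float("inf")))
    return rows


def query_warehouse(warehouse, max_price):
    """Queries run against the copy, which may be out of date."""
    return sorted(row for row in warehouse if row[1] <= max_price)


def query_mediator(max_price):
    """Mediation: keep no copy; forward the query to each source and combine."""
    results = []
    for wrapper in WRAPPERS:
        results.extend(wrapper(max_price))
    return sorted(results)


warehouse = build_warehouse()
DB2["Coors"] = "2.75"   # a source changes after the warehouse was built

stale = query_warehouse(warehouse, 3.00)   # does not see Coors
fresh = query_mediator(3.00)               # sees Coors
print(stale)
print(fresh)
```

The warehouse answers from its copy and misses the later change at DB2 until it is rebuilt; the mediator pays for a round trip to every source on every query but always sees current data.
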
Semistructured Data
• A different kind of data model, more suited to information-integration applications than either relational or OO.
  ✦ Think of "objects," but with the type of an object its own business rather than the business of the class to which it belongs.
  ✦ Allows information from several sources, with related but different properties, to be fit together in one whole.
• Major application: XML documents.

Graph Representation of Semistructured Data
• Nodes = objects.
• Nodes connected in a general rooted graph structure.
• Labels on arcs.
• Atomic values on leaf nodes.
• Big deal: no restriction on labels (roughly = attributes).
  ✦ Zero, one, or many children of a given label type are all OK.

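One way to picture this model is a small Python sketch of a rooted, arc-labeled graph. The Node class and the particular arrangement of arcs below are assumptions made for illustration; only the label and value names (beer, bar, name, manf, servedAt, Bud, A.B., M'lob, Joe's) are borrowed from the running bar example.

```python
class Node:
    """A node is an object; arcs carry labels; leaves hold atomic values."""

    def __init__(self, value=None):
        self.value = value   # atomic value for a leaf, None otherwise
        self.arcs = []       # list of (label, child); labels may repeat

    def add(self, label, child):
        self.arcs.append((label, child))
        return child

    def children(self, label):
        return [child for (lab, child) in self.arcs if lab == label]


root = Node()

bar = root.add("bar", Node())
bar.add("name", Node("Joe's"))

bud = root.add("beer", Node())          # first "beer" child of the root
bud.add("name", Node("Bud"))
bud.add("manf", Node("A.B."))
bud.add("servedAt", bar)                # arc to an existing node: a graph, not a tree

mlob = root.add("beer", Node())         # second child with the same label
mlob.add("name", Node("M'lob"))         # no "manf" arc: zero children is fine

beer_names = [b.children("name")[0].value for b in root.children("beer")]
print(beer_names)
print(len(mlob.children("manf")))
```

Nothing declares in advance which labels a node may carry, which is the point of the "no restriction on labels" bullet: the two beer objects simply have different sets of outgoing arcs.
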
Example
[Figure: a rooted graph of semistructured data. Arc labels: bar, beer, manf, name, prize, year, award, servedAt, addr. Leaf values: Bud, A.B., M'lob, 1995, Gold, Joe's, Maple.]

XML (Extensible Markup Language)
HTML uses tags for formatting (e.g., "italic"). XML uses tags for semantics (e.g., "this is an address").
• Two modes:
1. Well-formed XML allows you to invent your own tags, much like labels in semistructured data.
2. Valid XML involves a DTD (Document Type Definition) that tells the labels and gives a grammar for how they may be nested.

Well-Formed XML
1. Declaration = <? ... ?>.
   ✦ Normal declaration is <?xml version="1.0" standalone="yes"?>
   ✦ "Standalone" means that there is no DTD specified.
2. Root tag surrounds the entire balance of the document.
   ✦ <FOO> is balanced by </FOO>, as in HTML.
3. Any balanced structure of tags OK.
   ✦ Unlike HTML, where tags like <P> need not balance, every XML tag must be closed; an element with no content may be written as the single tag <FOO/>.

Example
<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

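Well-formedness can be checked mechanically by any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the deliberately broken second document are assumptions made for illustration, and the first document is the example above with the elided second bar left out.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

# Tags closed in the wrong order: not a balanced structure.
bad = "<BARS><BAR><NAME>Joe's Bar</BAR></NAME></BARS>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this parser checks only that tags balance under a single root; it does not consult any DTD.
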
Document Type Definitions (DTD)
Essentially a grammar describing the legal nesting of tags.
• Intention is that DTD's will be standards for a domain, used by everyone preparing or using data in that domain.
  ✦ Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.
Gross Structure of a DTD
<!DOCTYPE root tag [
    <!ELEMENT name (components)>
    more elements
]>

Elements of a DTD
An element description is a name (its tag) and a parenthesized description of the tags within an element.
✦ Special case: (#PCDATA) after an element name means it is text.
Example
<!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
    <!ELEMENT PRICE (#PCDATA)>
]>

Components
• Each element name is a tag.
• Its components are the tags that appear nested within, in the order specified.
• Multiplicity of a tag is controlled by:
  a) * = zero or more of.
  b) + = one or more of.
  c) ? = zero or one of.
• In addition, | = "or."

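Because a component list is a regular expression over tag names, the nesting rules can be sketched with Python's re module. This is an illustrative simplification, not how a real validating parser works: the helper names model_to_regex and conforms are hypothetical, (#PCDATA) is treated as "no child elements", and mixed content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Turn a component list such as '(NAME, BEER+)' into a compiled regex."""
    if "#PCDATA" in model:
        return re.compile("")            # text only: no child elements allowed
    # Each tag name becomes a group matching 'NAME;'. The operators
    # * + ? | ( ) mean the same thing in a regex, and ',' is concatenation.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + ";)",
                     model)
    pattern = pattern.replace(",", "").replace(" ", "")
    return re.compile(pattern)


# The bars DTD from the example, as element name -> component list.
MODELS = {name: model_to_regex(model) for name, model in {
    "BARS": "(BAR*)",
    "BAR": "(NAME, BEER+)",
    "NAME": "(#PCDATA)",
    "BEER": "(NAME, PRICE)",
    "PRICE": "(#PCDATA)",
}.items()}


def conforms(element, models):
    """Check an element's children, in order, against its component list."""
    regex = models.get(element.tag)
    if regex is None:                    # tag not declared at all
        return False
    child_tags = "".join(child.tag + ";" for child in element)
    if not regex.fullmatch(child_tags):
        return False
    return all(conforms(child, models) for child in element)


ok = ET.fromstring(
    "<BARS><BAR><NAME>Joe's Bar</NAME>"
    "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")
no_beer = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")

print(conforms(ok, MODELS))
print(conforms(no_beer, MODELS))
```

The second document is well-formed but fails the check, because BEER+ demands at least one BEER inside every BAR.
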
Using a DTD
1. Set standalone="no".
2. Either
   a) Include the DTD as a preamble, or
   b) Follow the XML declaration by a DOCTYPE declaration with the root tag, the keyword SYSTEM, and a file where the DTD can be found.

Example of (a)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
    <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

Example of (b)
Suppose our bars DTD is in file bar.dtd.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>

Attribute Lists
Opening tags can have "arguments" that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
• Keyword !ATTLIST introduces a list of attributes and their data types.
Example
<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR type (sushi|sports|other) #IMPLIED>
• Bar objects can have a (bar) type, and the value of that type is limited to the three strings shown.
• Example of use:
<BAR type = "sushi"> . . . </BAR>

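The effect of an enumerated attribute type can be sketched in Python. The ATTLISTS table and the function bad_attributes are hypothetical stand-ins for what a validating parser would derive from the !ATTLIST declaration, and the value "disco" is an invented counterexample.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the attribute list declared for BAR:
# element name -> attribute name -> set of permitted values.
ATTLISTS = {"BAR": {"type": {"sushi", "sports", "other"}}}


def bad_attributes(root):
    """Return (tag, attribute, value) triples that violate the declared lists."""
    errors = []
    for elem in root.iter():             # the root and all its descendants
        declared = ATTLISTS.get(elem.tag, {})
        for name, value in elem.attrib.items():
            if name not in declared or value not in declared[name]:
                errors.append((elem.tag, name, value))
    return errors


doc = ET.fromstring('<BARS><BAR type="sushi"/><BAR type="disco"/><BAR/></BARS>')
errors = bad_attributes(doc)
print(errors)
```

The third BAR carries no type attribute at all and is not reported, matching the reading that a bar can have a type but need not.
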
ID's and IDREF's
These are pointers from one object to another, analogous to NAME = "foo" and HREF = "#foo" in HTML.
• Allows the structure of an XML document to be a general graph, rather than just a tree.
• An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
• An attribute of type IDREF refers to some object by its identifier.
  ✦ Also IDREFS to allow multiple object references within one tag.

Example
Let us include in our BARS document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.
<!DOCTYPE BARS [
    <!ELEMENT BARS (BAR*, MANF*)>
    <!ELEMENT BAR (NAME, BEER+)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT MANF (ADDR)>
    <!ATTLIST MANF name ID #REQUIRED>
    <!ELEMENT ADDR (#PCDATA)>
    <!ELEMENT BEER (NAME, PRICE)>
    <!ATTLIST BEER manf IDREF #REQUIRED>
    <!ELEMENT PRICE (#PCDATA)>
]>

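Following an IDREF amounts to a lookup in a table of IDs. The Python sketch below assumes a small document that fits the DTD above; the identifier "ab" and the address text are invented for illustration, and the helper manufacturer_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical document for the DTD above; "ab" and the address are made up.
doc = ET.fromstring("""
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER manf="ab"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
  </BAR>
  <MANF name="ab"><ADDR>1 Hypothetical Way</ADDR></MANF>
</BARS>""")

# The ID attribute gives each MANF object a unique identifier.
by_id = {manf.get("name"): manf for manf in doc.iter("MANF")}


def manufacturer_of(beer):
    """Follow the beer's IDREF to its MANF element (None if it dangles)."""
    return by_id.get(beer.get("manf"))


links = [(beer.findtext("NAME"), manufacturer_of(beer).findtext("ADDR"))
         for beer in doc.iter("BEER")]
print(links)
```

The BEER element sits inside a BAR while its manufacturer sits elsewhere in the document, so the reference adds an arc that the tree nesting alone could not express.
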
XQUERY
Emerging standard for querying XML documents.
Basic form:
FOR <variables ranging over sets of elements>
WHERE <condition>
RETURN <set of elements>;
• Sets of elements described by paths, consisting of:
1. URL, if necessary.
2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = "start at any BAR node and go to a NAME child."
3. Ending condition of the form [<condition about subelements, attributes (preceded by @), and values>].

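The path and FOR-WHERE-RETURN ideas can be approximated in Python with the limited path syntax of xml.etree.ElementTree. This is an analogy rather than XQUERY itself: the module supports only a small subset of path expressions, and the sample document and the price threshold are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>""")

# Path //BAR/NAME: start at any BAR node and go to a NAME child.
# (ElementTree spells "anywhere below this element" as ".//".)
bar_names = [name.text for name in doc.findall(".//BAR/NAME")]

# Ending condition in brackets: BEER elements whose PRICE child is "3.00".
priced_3 = [beer.findtext("NAME") for beer in doc.findall(".//BEER[PRICE='3.00']")]

# FOR beer ranging over //BAR/BEER, WHERE its price is below 3.00,
# RETURN its name: written here as a list comprehension.
cheap = [beer.findtext("NAME")
         for beer in doc.findall(".//BAR/BEER")
         if float(beer.findtext("PRICE")) < 3.00]

print(bar_names)
print(priced_3)
print(cheap)
```

The path .//BAR/NAME returns only the bar's own name, not the NAME elements nested inside BEER, because each step in a path moves to a direct child.
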