Framework: Information Integration, Semi-structured Data, and XML

  1. XML
     Semi-structured Data, Extensible Markup Language, Document Type Definitions

     Framework
       1. Information Integration: making databases from various places work as one.
       2. Semi-structured Data: a new data model designed to cope with problems of information integration.
       3. XML: a standard language for describing semi-structured data schemas and representing data.

     1. Information Integration
       Generally, databases in an enterprise have:
       • Several underlying database management systems
           Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
       • Several underlying database schemas
           Information in an employee table can contain any of:
           Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
           Employee Name, SSN, DOB, title, degree, createTime, createBy
           Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

     2. Semi-structured Data
       • A new data model designed to cope with problems of information integration
       • Accommodates different DBMSs
       • Integrates different schemas

     The Information-Integration Problem
       • Major bottleneck in enterprise application integration
       • For example:
           Hewlett Packard split into HP and Agilent
           HP bought Compaq
       • Need to integrate data from different sources

     3. XML
       • XML: a standard language for describing semi-structured data schemas and representing data.
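The three employee-table schemas listed above can be compared mechanically. The following is a minimal sketch in Python, assuming hypothetical source names (`source_a`, `source_b`, `source_c`) and the shortened attribute name `name` for "Employee Name"; neither appears in the slides. It shows which attributes every source shares and which are specific to one source, which is the mismatch an integrated view has to absorb.

```python
# Attribute lists copied from the three employee schemas on the slide.
# The source names are hypothetical labels added for illustration.
SCHEMAS = {
    "source_a": ["name", "SSN", "DOB", "title", "hrsPerWeek",
                 "modifiedTime", "modifiedBy"],
    "source_b": ["name", "SSN", "DOB", "title", "degree",
                 "createTime", "createBy"],
    "source_c": ["name", "SSN", "DOB", "title", "salary",
                 "modifiedTime", "modifiedBy", "createTime", "createBy"],
}


def shared_attributes(schemas):
    """Return the attributes present in every schema, in first-schema order."""
    names = list(schemas)
    common = set(schemas[names[0]])
    for name in names[1:]:
        common &= set(schemas[name])
    return [attr for attr in schemas[names[0]] if attr in common]


def source_specific(schemas):
    """Return, per source, the attributes that not every source provides."""
    common = set(shared_attributes(schemas))
    return {name: [attr for attr in attrs if attr not in common]
            for name, attrs in schemas.items()}


if __name__ == "__main__":
    print(shared_attributes(SCHEMAS))
    print(source_specific(SCHEMAS))
```

Only four attributes are common to all three sources; everything else would be either dropped or left optional in an integrated schema, which is the flexibility the semi-structured model is meant to provide.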

  2. The Information-Integration Problem
       • Related data exists in many places and could, in principle, work together.
       • But different databases differ in:
           1. Model (relational, object-oriented?).
           2. Schema (normalized/unnormalized?).
           3. Terminology: are consultants employees? Retirees? Subcontractors?
           4. Conventions (meters versus feet?).

     Example
       • Consider the merger of three stores in a mall
       • There is some overlap in the products sold, but the databases are different

     Example
       • Every store has a database.
       • One may use a relational DBMS; another keeps the menu in an MS-Word document.
       • One stores the phones of distributors, another does not.
       • One distinguishes products in one department and another doesn't.
       • One counts inventory by number of items, another by cases.

     Two Approaches to Integration
       1. Warehousing
           • Makes a copy of the data
           • More developed of the two
       2. Mediation
           • Creates a view of the data
           • Newer and less developed

     Warehousing
       • Make copies of the data sources at a central site and transform them to a common schema.
       • Reconstruct data daily/weekly
           Do not try to keep it more up-to-date than that.
       • Pro:
           Very well-developed, and several commercial tools are available
       • Con:
           Data can be old since updates are expensive

     Mediation
       • Create a view of all sources, as if they were integrated.
       • Answer a view query by translating it to the terminology of the sources and querying them.
       • Pro:
           Current data
       • Con:
           Can be slow
           Limited availability of tools
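The difference between the two approaches can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than part of the slides: the store contents, the case size of 12, and the names `items_wrapper`, `cases_wrapper`, `Warehouse`, and `Mediator`. Each wrapper translates one source into a common convention (individual items); the warehouse answers from a stored copy, while the mediator consults the sources at query time.

```python
CASE_SIZE = 12  # assumption: one case holds 12 items


def items_wrapper(source):
    """Wrapper for a source that already counts individual items."""
    return lambda: dict(source)


def cases_wrapper(source, case_size=CASE_SIZE):
    """Wrapper for a source that counts cases; converts cases to items."""
    return lambda: {product: cases * case_size
                    for product, cases in source.items()}


def integrate(wrappers):
    """Query every wrapper and total the item counts per product."""
    totals = {}
    for wrapper in wrappers:
        for product, items in wrapper().items():
            totals[product] = totals.get(product, 0) + items
    return totals


class Warehouse:
    """Answers queries from a copy that is rebuilt only on refresh()."""

    def __init__(self, wrappers):
        self.wrappers = wrappers
        self.copy = {}

    def refresh(self):
        self.copy = integrate(self.wrappers)

    def items_in_stock(self, product):
        return self.copy.get(product, 0)


class Mediator:
    """Answers each query by going back to the sources."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def items_in_stock(self, product):
        return integrate(self.wrappers).get(product, 0)


def demo():
    store1 = {"Pepsi": 24, "Sobe": 6}   # hypothetical store counting items
    store2 = {"Pepsi": 2, "Water": 1}   # hypothetical store counting cases
    wrappers = [items_wrapper(store1), cases_wrapper(store2)]
    warehouse = Warehouse(wrappers)
    warehouse.refresh()
    mediator = Mediator(wrappers)
    store1["Pepsi"] += 10               # a source changes after the refresh
    return warehouse.items_in_stock("Pepsi"), mediator.items_in_stock("Pepsi")


if __name__ == "__main__":
    print(demo())
```

After the source changes, the warehouse still reports the count from its last refresh while the mediator reports the current one, at the cost of contacting every source per query. A real mediator would translate and route each query rather than re-reading all sources; that simplification is deliberate here.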

  3. Warehouse Diagram
       [Figure: a Warehouse fed through one Wrapper per source (Source 1, Source 2).]

     A Mediator
       [Figure: a user query goes to the Mediator and a result comes back. The Mediator exchanges queries and results with one Wrapper per source, and each Wrapper exchanges queries and results with its source (Source 1, Source 2).]

     Semi-structured: Motivation
       • Most effective approach to Information Integration:
           Semi-structured Data Model, or Semi-structured Objects

     Semi-structured: Motivation
       • Main limitation of Object-Oriented Models:
           Object models are strongly typed
           Objects of a class have one structure only
       • The semi-structured approach solves this problem

     Semi-structured Data
       • Purpose:
           Represent data from independent sources more flexibly than either relational or object-oriented models.

     Semi-structured Data
       • Each object has a class of its own, and its properties are defined by whatever labels are attached to that object
       • Properties mean attributes, relationships, methods, etc.

  4. Semi-structured Data
       • Think of objects, but with the type of each object its own business, not that of its "class."
       • Labels to indicate meaning of substructures.

     Semi-structured Graphs
       • Easy to think of semi-structured data as graphs
       • Nodes = objects.
       • Labels on arcs:
           attributes leading to a leaf node
           relationships leading to another node.

     Semi-structured Graphs
       • Root object represents the entire DB.
       • Often look like trees, but are not.
       • Atomic values at leaf nodes
           nodes with no arcs out.
       • Flexibility: no restriction on:
           Labels out of a node.
           Number of successors with a given label.

     Example: Data Graph
       [Figure: data graph starting at a node labeled root. Arc labels: soda (twice), rest, manf (twice), name (three times), prize, year, award, sellsAt, addr. Leaf values: PepsiCo, Pepsi, Sobe, 2003, BestSeller, KFC, Main St.]
       Annotations on the figure:
           Notice a new kind of data.
           The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi)
           The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC)

     XML
       • XML = Extensible Markup Language.
       • While HTML uses tags for formatting (e.g., "italic"), XML uses tags for semantics (e.g., "this is an address").
       • Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

     Well-Formed and Valid XML
       • Well-Formed XML allows you to invent your own tags.
           Similar to labels in a semi-structured data graph.
       • Valid XML involves a DTD (Document Type Definition), which
           gives a grammar for the use of labels
           limits the set of labels out of a node
           limits the order and number of times a label occurs
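A data graph like the one in the example can be sketched as nodes with labeled arcs. The following Python is a minimal illustration, not a reconstruction of the figure: the `Node` class and its methods are hypothetical, and the placement of the prize arc (on the Pepsi object) and the sellsAt arc (from the Sobe object to the KFC object) is an assumption, since the figure's exact wiring is not recoverable from the text.

```python
class Node:
    """An object in a semi-structured data graph."""

    def __init__(self, value=None):
        self.value = value   # atomic value for a leaf, None for an inner node
        self.arcs = []       # list of (label, target Node) pairs

    def add(self, label, target):
        """Attach an arc; plain values are wrapped into leaf nodes."""
        if not isinstance(target, Node):
            target = Node(target)
        self.arcs.append((label, target))
        return target

    def successors(self, label):
        """All nodes reached by arcs carrying the given label."""
        return [target for arc_label, target in self.arcs if arc_label == label]

    def is_leaf(self):
        return not self.arcs


def build_example():
    root = Node()                          # root object represents the entire DB
    pepsi = root.add("soda", Node())
    sobe = root.add("soda", Node())        # same label may repeat
    kfc = root.add("rest", Node())
    pepsi.add("name", "Pepsi")
    pepsi.add("manf", "PepsiCo")
    prize = pepsi.add("prize", Node())     # assumption: the prize belongs to Pepsi
    prize.add("year", 2003)
    prize.add("award", "BestSeller")
    sobe.add("name", "Sobe")
    sobe.add("manf", "PepsiCo")
    sobe.add("sellsAt", kfc)               # assumption: relationship arc to KFC
    kfc.add("name", "KFC")
    kfc.add("addr", "Main St")
    return root


def in_degrees(root):
    """Count incoming arcs per node, keyed by id(node)."""
    counts, seen, stack = {}, set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        for _, target in node.arcs:
            counts[id(target)] = counts.get(id(target), 0) + 1
            stack.append(target)
    return counts


if __name__ == "__main__":
    root = build_example()
    kfc = root.successors("rest")[0]
    print([s.successors("name")[0].value for s in root.successors("soda")])
    print(in_degrees(root)[id(kfc)])
```

The two soda objects carry different label sets (only one has a prize arc), which is the flexibility the slides describe. The KFC node has two incoming arcs, so the structure looks like a tree but is a graph.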

  5. Well-Formed XML: Header
       • Start the document with a declaration, surrounded by <? … ?>.
       • Normal declaration for Well-Formed XML is:
           <?xml version="1.0" standalone="yes"?>
       • version indicates the version number
       • standalone="yes" means no DTD is provided.

     Well-Formed XML: Body
       • Body of the document is a root tag surrounding nested tags.
       • Body can include:
           several properly matching tags (as in HTML structure)
       • Root tag can
           have a special meaning, such as a document type
           or can be generic

     Tags
       • Tags, as in HTML, are normally matched pairs, as <BLAH> … </BLAH>.
       • Tags may be nested arbitrarily.
       • Single tags with no matching ender are also possible, but unlike <P> in HTML they must be written in the self-closing form <BLAH/>
           however, we will not use these in examples

     Example: Well-Formed XML
           <?xml version="1.0" standalone="yes"?>
           <RESTS>
             <REST>
               <NAME>Taco Bell</NAME>
               <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
               <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
             </REST>
             <REST> …
             </REST>
             …
           </RESTS>
       Annotations on the example:
           Root tag RESTS surrounds the entire document
           Each of the several nested REST tags represents information about a single REST
           <NAME> tag specifies the REST name
           <SODA> tags hold the name and price of each soda, nested in <NAME> and <PRICE> tags.

     XML and Semi-structured Data
       • Well-Formed XML documents with nested tags are exactly the same idea as trees of semi-structured data.
       • Tags are the labels on edges
       • Nodes represent data between matching tags
       • Parent-child relationship is immediate nesting in XML

     XML and Semi-structured Data
       • The semi-structured approach allows for non-tree structures
       • We shall see that XML also enables non-tree structures, as does the semi-structured data model.
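The correspondence between nested tags and a labeled tree can be checked with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the document below is the slide's example with the elided REST entries left out, and the helper names `to_tree` and `is_well_formed` are hypothetical. Tags become edge labels, text between matching tags becomes a leaf value, and immediate nesting becomes the parent-child relationship.

```python
import xml.etree.ElementTree as ET

# The slide's example, restricted to the one REST entry that is fully shown.
DOC = """<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>
"""


def to_tree(element):
    """Convert an element into a list of (label, subtree) pairs.

    An element without child elements becomes its stripped text (a leaf).
    Text mixed in between child elements is ignored in this sketch.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return [(child.tag, to_tree(child)) for child in children]


def is_well_formed(text):
    """True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(root.tag, to_tree(root))
    print(is_well_formed("<REST><NAME>Taco Bell</REST></NAME>"))
```

Improperly nested or unclosed tags are rejected by the parser, which matches the requirement that tags come in properly matched pairs. Checking validity against a DTD is a separate step that `ElementTree` does not perform.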
