11/3/2009 Manage XML Data using Relational DBMS � Trend: XML represents most data on WWW � Need: Relational Databases for Answer a lot of XML Queries Easy/Auto Effective Efficient Querying XML Documents: � How: Use Relational model to solve XML Limitations and Opportunities � Why: Relational is Powerful, Let’s Reuse it Background: XML Review Background: XML Review XML DTD � Schema for XML e X tensible M arkup L anguage � � Extended from SGML � Helps applications program � Hierarchical / Semi- interpret meaning of XML structured Data (sets) Data � Self-describing � Pattern matching � Program can interpret � * means zero or more data � + means 1 or more � Emerging as Standard � ? means zero or 1 in Web Applications Background: XML Review Background: XML Review DTD XML DTD � Schema for XML � Schema for XML e X tensible M arkup L anguage � � Helps applications program � Helps applications program � Hierarchical / Semi- interpret meaning of XML interpret meaning of XML structured Data (sets) Data Data � Extended from SGML � Pattern matching � Pattern matching � Self-describing � * means zero or more � * means zero or more � Emerging as Standard in � + means 1 or more � + means 1 or more � ? means zero or 1 Web Applications � ? means zero or 1 � Root is not always the same � >= 1 element can be root element 1
11/3/2009 xml databases How: Use Relational for XML : 4 things � Convert from XML to Relational � form pairs from XML to Relational Method Section 1. Schema 1 DTD >= 1 Columns / Tables In-lining 3 2. Instance each XML doc >= 1 Rows / Tuples Parsing 3 � talk / discuss about 3. Query 1 XML-QL/Lorel 1 SQL Translation 4 an interesting idea connected to topic � Convert Query results from Relational back to XML � share your ideas from Relational to XML Method Section with the class 4. Result > 1 tuples 1 XML Doc various 5 Focus of Presentation How : 1. Schema Will talk about only 1 and 4 From XML To Relational From XML To Relational Method Section 1. Schema DTD Tables/Columns 1. Schema DTD Tables/Columns In-lining 3 2. Instance each XML doc >= 1 Rows / Tuples Parsing 3 3. Query 1 XML-QL/Lorel 1 SQL Translation 4 From Relational To XML Method Section Node Elements 4. Query Projected Columns Attributes various 5 Result Selected Rows Sets Parent/Child Schema : Start with a DTD Schema: Turn DTD to tables XML Schema, e.g. DTD XML Schema, e.g. DTD Relational Schema: Table declarations in SQL Figure 12 2
11/3/2009 Schema : Keep everything intact Schema: DTD to ER XML Schema, e.g. DTD Given a XML Schema, intuitive to map � Each node element -> an ER Entity / Tables Relational Schema, e.g. � XML attributes -> ER attributes / Columns Table Declarations in SQL But, � Child node elements can be elements themselves � Can’t map directly them into ER Attributes Figure 12 Schema: Simplify DTDs Schema: DTD to Relational : In-lining 1. Simplify DTD Simplify DTDs 2. Make DTD Graph from simplified DTD Fig 5. Flattening 3. Use DTD graph and create Tables: 3 ways Fig 6. Simplification 1. Basic Fig 7. Grouping 2. Shared 3. Hybrid � Nothing gets deleted/modified/added � Easier to make DTD graph Schema: DTD to Relational DTD Schema to DTD Graph 1. Simplify DTD 2. Make DTD graph from simplified DTD � First step to map XML node elements/attributes to relational tables/columns � Elements appear exactly once � Attributes and operators appear as many times as they appear in the DTD 3. Create Relations from DTD graph Node: elements/attributes/operators Edge: arrow from Parent to Child 3
11/3/2009 DTD Graph: more than 1 XML Doc Schema: DTD to Relational 1. Simplify DTD 2. Make DTD graph from simplified DTD 3. Create Relations from DTD graph, 3 ways 1. Basic, 2. Shared = Basic + … 3. Hybrid = Shared + … � Like buying a car DTD describes > 1 type of XML Doc � Book can be root in 1 XML Doc Shared XOR Hybrid option packages Article can be root in another Schema: DTD to Relational Basic In-lining : 4 steps 1. Simplify DTD � Input: an Element graph, element is root 2. Make DTD graph from simplified DTD � Output: a set of Relations 3. Create Relations from DTD graph, 3 ways 1. Basic, 4 steps algorithm, apply for each element graph 2. Shared = Basic + … � e.g. Editor 3. Hybrid = Shared + … � Like buying a car � Car option packages DTD graph | Editor Element graph DTD graph | Editor Element graph Recursion <-> graph cycle Cross reference Recursion Element graph is sub-graph of DTD graph: e.g. Editor element graph, Editor is root 4
11/3/2009 Basic 1: Create relation for each root Basic 2: relations for sets/recursions Basic3: inline rest of attributes Basic 2: relations for sets/recursions Basic4 : connect using foreign keys Basic In-lining: Output : 14 tables 5
11/3/2009 Basic In-lining: Output : 14 tables Basic in-lining: pros & cons Pros Cons � � Easy to do certain queries, Large number of relations � Consume more resources such as “list all authors of a � Inefficient for queries such as book” “list all authors having first � > in-lined as attributes monograph.author.name.firstname Attribute name equals to path name name Jack” � Query 2 tables, get book author and article author individually � Complicated to handle DTD recursion � 1 table per recursion � Like doing it as separated schema for each root element Shared in-lining : 3 more steps Shared 1: Extract shared columns e.g. Author 5 to 1 � Output: every node in Input: exactly one relation, either � Relations from Basic � Extract book, editor.monograph, � separate relation or article.author, author into single Author � in-lined into parent 1st additional step 2nd and 3rd additional steps � � Identify nodes spread across Merge mutually recursive � multiple relations in Basic e.g. editor & monograph Basic � In-line a child node if it has no � Separate these shared nodes children into new relations � � e.g. firstname and lastname e.g. book author and article into author author into author � � fewer join operations Reduce replication Shared Shared 2: Merge mutually recursive Shared 3: in-line when no child node e.g . monograph 2 to 1, merge editor & e.g. merge firstname, lastname, name, address editor.monograph into monograph into Author.name.firstname, Author.name. etc. Basic Basic Shared Shared 6
11/3/2009 Shared 3: in-line when no child node Shared 3: in-line when no child node e.g. merge firstname, lastname, name, address e.g. merge firstname, lastname, name, address into Author.name.firstname, Author.name. etc. into Author.name.firstname, Author.name. etc. Basic Basic Shared Shared Shared 3: in-line when no child node Shared 3: in-line when no child node e.g. merge firstname, lastname, name, address e.g. Author 5 to 1 (name and address) into Author.name.firstname, Author.name. etc. Basic Basic Shared Shared Shared In-lining: Output : 5 tables Hybrid In-lining � Reverse Shared a little -> � In-line sub-element as long as it’s not a SET (*) or Recursive � e.g. Title -> article.title & monogrpah title Pros Cons � Even when in-degree > 1 � � Fewer tables -> Shared More joins than Basic if we � e.g. Author -> book.author + article.author start at a particular element � 1 table vs. 5 tables node. � for Author � List all authors having first name Jack 7
11/3/2009 Hybrid In-lining : Output: 4 tables Hybrid In-lining : In-line Title Shared Shared Hybrid Hybrid Hybrid In-lining : In-line Author Hybrid In-lining : Output: 4 tables Shared Shared Hybrid Hybrid Hybrid in-lining Summary: 3 types of In-lining 1. Schema: XML -> Relational Pros Cons � In-lining further reduces � Higher degree of in-lining 1. Basic In-lining DTD element graph joins could cause more SQL 2. Shared In-lining extract/merge shared tables queries to be generated � As good as Shared in most 3. Hybrid In-lining in-line for simplicity � Have to Balance between cases Basic and Shared � Better than Shared in some cases � Maintain the order of sets as in XML -> add column 8
11/3/2009 Focus of Presentation How : 4. Query Results Finished on 1, now move onto to 4 From XML To Relational From Relational To XML Method Section Node Elements 1. Schema DTD Tables/Columns In-lining 3 4. Query Projected Columns Attributes 2. Instance each XML doc >= 1 Rows / Tuples Parsing 3 Results Selected Rows Sets 3. Query 1 XML-QL/Lorel 1 SQL Translation 4 Parent/Child From Relational To XML Method Section Node Elements 4. Query Projected Columns Attributes various 5 Result Selected Rows Sets Parent/Child Relational Results to XML 4. Results : 1. Simple Structuring � Fill in Firstname 3 categories tag value 1. Simple Structuring -> fill in tag values Query Tuple XML 2. Tag Variables -> tag names 3. Grouping -> group sets / hierarchies 4. Results : 1. Simple Structuring 4. Results : 2. Tag Variables � Fill in Lastname � Book tag vs. Monograph tag tag value tag name Query Tuple XML Query Tuple XML 9
Recommend
More recommend