
Module 5: Introduction to XQuery - PowerPoint PPT Presentation



  1. Module 5: Introduction to XQuery

  2. XML is now everywhere
     - Google search (warning: unreliable numbers)
       - 285.000.000 for XML
       - 1.000.000 for XQuery
       - 11.000.000 for XSLT
       - 12.000.000 for XML Schema
       - 60.000.000 for .NET
       - 200.000.000 for Java
       - 64.000.000 for SQL
     - XML has the highest Google number among all the technology buzzwords that I searched (except RSS)
     01/31/07

  3. Sources of XML data
     1. Inter-application communication data (WS, REST, etc.)
     2. Mobile devices communication data
     3. Logs
     4. Blogs (RSS)
     5. Metadata (e.g. Schema, WSDL, XMP)
     6. Presentation data (e.g. XHTML)
     7. Documents (e.g. Word)
     8. Views of other sources of data
        - Relational, LDAP, CSV, Excel, etc.
     9. Sensor data

  4. Some vertical application domains for XML
     - Health Level Seven (HL7) http://www.hl7.org/
     - Geography Markup Language (GML)
     - Systems Biology Markup Language (SBML) http://sbml.org/
     - XBRL, the XML-based Business Reporting standard http://www.xbrl.org/
     - Global Justice XML Data Model (GJXDM) http://it.ojp.gov/jxdm
     - ebXML http://www.ebxml.org/
     - Encoded Archival Description Application http://lcweb.loc.gov/ead/
     - Digital photography metadata XMP
     - An XML grammar for sensor data (SensorML)
     - Really Simple Syndication (RSS 2.0)
     Basically everywhere.

  5. Processing the XML data
     - Huge amount of XML information, and growing
     - We need to "manage" it, and then "process" it
       - Store it efficiently
       - Verify the correctness
       - Filter, search, select, join, aggregate
       - Create new pieces of information
       - Clean, normalize the data
       - Update it
       - Take actions based on the existing data
       - Write complex execution flows
     - No conceptual organization like for relational databases (applications are too heterogeneous)

  6. Frequent solutions to XML data management
     1. Map it to generic programming APIs (e.g. DOM, SAX, StAX)
     2. Manually map it to non-generic APIs
     3. Automatically map it to non-generic structures
     4. Use XML extensions of existing languages
     5. Shredding for relational stores
     6. Native XML processing through XSLT and XQuery

  7. 1. Mapping to generic structures
     - Represent the data:
       - Original Unicode form, or
       - Some binary representation (e.g. Fast Infoset)
     - Store it:
       - Directly on a file system, or
       - On a "transacted" file system (e.g. Sleepycat, or a relational database)
     - Map the XML data to generic XML programmatic APIs
       - E.g. DOM, SAX, StAX (JSR 173), XMLReader
     - Use the native programming languages (e.g. Java, C#) to manipulate the data
     - Re-serialize it at the end
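
The steps on this slide (parse into a generic API, manipulate in the host language, re-serialize) can be sketched with the DOM and transformer classes that ship with the JDK. This is a minimal illustration, not the presenter's code: the class name `DomRoundTrip`, the method `addLineItem`, and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: parse XML text into a DOM tree, change it, write it back out.
class DomRoundTrip {

    static String addLineItem(String xml, String text) {
        try {
            // 1. Map the XML text to the generic DOM API.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // 2. Manipulate the data with ordinary Java code.
            Element item = doc.createElement("lineItem");
            item.setTextContent(text);
            doc.getDocumentElement().appendChild(item);

            // 3. Re-serialize with an identity transformation.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String order = "<purchaseOrder><lineItem>pen</lineItem></purchaseOrder>";
        System.out.println(addLineItem(order, "ink"));
    }
}
```

Note that the element name `lineItem` appears as a plain string: the generic API knows nothing about purchase orders, which is the trade-off this approach makes.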

  8. 1. Manual mapping to generic structures (example)
     XML data:
       <purchaseOrder>
         <lineItem> … </lineItem>
         <lineItem> … </lineItem>
       </purchaseOrder>
       <book>
         <author>…</author>
         <title>…</title>
         …
       </book>
     Hard coded mappings to one generic class:
       class DomNode {
         public String getNodeName();
         public String getNodeValue();
         public void setNodeValue(String nodeValue);
         public short getNodeType();
       }
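
To make the "hard coded mappings" concrete, here is a small sketch that walks a parsed tree using only generic accessors like the `getNodeName()` and `getNodeType()` shown on the slide. It uses the JDK's `org.w3c.dom.Node` rather than the slide's `DomNode`; the class name `GenericNodeWalk` and the method `countElements` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: every element is just a Node, so the application
// has to recognize purchase orders and books by comparing name strings.
class GenericNodeWalk {

    static int countElements(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return count(doc.getDocumentElement(), elementName);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Depth-first walk over the generic tree.
    private static int count(Node node, String elementName) {
        int matches = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && elementName.equals(node.getNodeName())) {
            matches++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            matches += count(child, elementName);
        }
        return matches;
    }

    public static void main(String[] args) {
        String order = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>";
        System.out.println(countElements(order, "lineItem"));
    }
}
```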

  9. 2. Manual mapping to non-generic structures
     XML data:
       <purchaseOrder>
         <lineItem> … </lineItem>
         <lineItem> … </lineItem>
       </purchaseOrder>
       <book>
         <author>…</author>
         <title>…</title>
         …
       </book>
     Hard coded mappings to application classes:
       Class PurchaseOrder{
         public List getLineItems();
         …
       }
       Class Book{
         public List getAuthor();
         public String getTitle();
         …
       }
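
A hand-written mapping of this kind might look like the sketch below. It assumes each `<lineItem>` can be represented by its text content, which the slide does not specify; the factory method `fromXml` is also an assumption added for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical application class: callers see line items, not DOM nodes.
final class PurchaseOrder {
    private final List<String> lineItems;

    private PurchaseOrder(List<String> lineItems) {
        this.lineItems = Collections.unmodifiableList(lineItems);
    }

    public List<String> getLineItems() {
        return lineItems;
    }

    // The hard-coded mapping: this method must change whenever the XML layout changes.
    public static PurchaseOrder fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("lineItem");
            List<String> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                items.add(nodes.item(i).getTextContent());
            }
            return new PurchaseOrder(items);
        } catch (Exception e) {
            throw new IllegalStateException("could not map XML to PurchaseOrder", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>";
        System.out.println(PurchaseOrder.fromXml(xml).getLineItems());
    }
}
```

The rest of the application now works with a typed `PurchaseOrder`, but the data is effectively modeled twice: once in the XML and once in the class.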

  10. 3. Automatic mapping to non-generic structures
      Schema:
        <type name="book-type">
          <sequence>
            <attribute name="year" type="xs:integer"/>
            <element name="title" type="xs:string"/>
            <sequence minoccurs="0">
              <element name="author" type="xs:string"/>
            </sequence>
          </sequence>
        </type>
        <element name="book" type="book-type"/>
      Automatic mapping (e.g. XMLBeans):
        Class Book-type{
          public integer getYear();
          public string getTitle();
          public List getAuthors();
          …
        }

  11. 4. XML extensions of existing procedural languages
      - Examples:
        - C-omega, ECMAScript, PHP extensions, Python extensions, etc.
      - Most of them define:
        - A way of importing XML data into their native type system
        - A rich API for XML data manipulation
        - A way of navigating/searching/querying the XML data via their extensions (XPath based or XPath inspired)
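
Java is not one of the languages listed on this slide, but its standard `javax.xml.xpath` package gives a feel for what XPath-based navigation from a host language looks like. The sketch below is only an analogy under that assumption; the class name `XPathQuery` and the sample document are made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example: select data with a path expression instead of a manual tree walk.
class XPathQuery {

    // Evaluates an XPath 1.0 expression and returns its string value.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String book = "<book year='2005'><author>A. Writer</author><title>XML Basics</title></book>";
        System.out.println(evaluate(book, "/book/title"));
        System.out.println(evaluate(book, "/book/@year"));
    }
}
```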

  12. 5. Native XML processing: XSLT and XQuery
      - Most promising alternative for the future.
      - The only alternative such that:
        - the data is modeled only once
        - it is well integrated with the XML Schema type system
        - it preserves the logical/physical data independence
        - the code deals with non-generic structures
        - code can be optimized automatically
      - Data is stored:
        - in plain file systems, or
        - in sophisticated data stores (e.g. XML extensions of relational stores)
      - Missing pieces, under development
        - E.g. no procedural logic
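
As a small illustration of native processing, the sketch below runs an XSLT 1.0 stylesheet through the processor bundled with the JDK. XSLT is used because the JDK ships no XQuery engine. The stylesheet, the `books` document layout, and the class name `XsltTitles` are all assumptions made for the example; the selection logic lives entirely in the stylesheet, and Java only launches it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: filter and project XML declaratively with XSLT.
class XsltTitles {

    // Emits the title of every book published after 2000, each followed by ';'.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='books/book[@year &gt; 2000]'>"
          + "<xsl:value-of select='title'/><xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String recentTitles(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String books = "<books>"
                + "<book year='1999'><title>Old</title></book>"
                + "<book year='2005'><title>New</title></book>"
                + "</books>";
        System.out.println(recentTitles(books));
    }
}
```

No `Book` class or DOM traversal appears anywhere: the data is modeled once, in XML, and the query refers to it by path.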
