
XML Parsers Asst. Prof. Dr. Kanda Runapongsa Saikaew - PowerPoint PPT Presentation



  1. XML Parsers Asst. Prof. Dr. Kanda Runapongsa Saikaew (krunapon@kku.ac.th) Dept. of Computer Engineering Khon Kaen University 1

  2. Overview  What are XML Parsers?  Programming Interfaces of XML Parsers  DOM: Document Object Model  SAX: Simple API for XML  StAX: Streaming API for XML 2

  3. What are XML Parsers? (1/2)  The most common XML processing task is parsing an XML document  Parsing involves reading an XML document to determine its structure and contents  It is essential for the automatic processing of XML documents 3

  4. What are XML Parsers? (2/2)  Parsers also check whether documents conform to the XML standard and have a correct structure  There are two types of XML parsers  Validating: check documents against a DTD or an XML schema  Non-validating: do not check documents against a DTD or an XML schema 4
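The difference between the two parser types can be sketched with the JDK's javax.xml.parsers package. This is a minimal illustration, not a definitive implementation: the class name ValidationSketch and the method parses are hypothetical, and only DTD validation is shown. A well-formed document that breaks its DTD is accepted when validation is off and rejected when it is on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
class ValidationSketch {
    // Returns true if the document parses without any reported error.
    static boolean parses(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating); // switch DTD validation on or off
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are reported to the ErrorHandler; rethrow them
            // so that an invalid document makes parse() fail.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // not well-formed, or invalid while validating
        }
    }
}
```

Documents that are not well-formed are rejected in both modes, because well-formedness is checked by every XML parser.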

  5. Available Java XML Parsers APIs  SUN  Integrated in JDK 1.4 version and later  Package javax.xml.parsers  Apache Xerces: XML Parsers in Java, C++, and Perl  http://xerces.apache.org/  SAX  http://www.saxproject.org/  XP – an XML Parser in Java  http://www.jclark.com/xml/xp/index.html 5

  6. Programming Interfaces (1/2)  PHP and Java  Document Object Model (DOM)  Model a document as a tree  Java  Simple API for XML (SAX)  The user needs to create the model  Streaming API for XML (StAX)  Use a pull model for event processing  Provide user-friendly APIs for read-in and write-out 6

  7. Programming Interfaces (2/2)  PHP  SimpleXML extension  Provides a very simple and easily usable toolset to convert XML to an object  XMLReader extension  The reader acts as a cursor going forward on the document stream and stopping at each node  XMLWriter extension  The writer provides a non-cached, forward-only means of generating streams or files containing XML data 7

  8. How to Use a Parser  In general, here’s how you use a parser:  Create a parser object  Point the parser object at your XML document  Process the results  The common XML parsing tools can make the task much simpler 8
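The three steps above can be sketched with the JDK's DOM parser. The class name ParserStepsSketch and the method rootName are hypothetical, and reading the XML from a string is an assumption made to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class ParserStepsSketch {
    static String rootName(String xml) {
        try {
            // 1. Create a parser object.
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // 2. Point the parser object at the XML document.
            Document doc = parser.parse(new InputSource(new StringReader(xml)));
            // 3. Process the results: here, just read the root element's name.
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```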

  9. What is DOM? (1/2)  DOM is an official recommendation of the W3C  It defines an interface that enables programs to access and update the structure of XML documents  When an XML parser claims to support the DOM, that means it implements the interfaces defined in the standard 9

  10. What is DOM? (2/2)  When you parse an XML document with a DOM parser, you get back a tree of nodes that represent the structure and contents of the XML document  You can access your information by interacting with this tree of nodes 10

  11. DOM Data Modeling  Each element node contains a list of other nodes as its children  These children might contain text values or other nodes  DOM preserves the sequence of the elements that it reads from XML documents 11
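A small sketch of this data model, assuming the JDK's DOM implementation: it lists the children of the root element in the order the parser stored them. The class name DomChildrenSketch and the method childNodeNames are hypothetical. Text children report the node name "#text", while element children report their tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomChildrenSketch {
    // Returns the node names of the root element's children, in document order.
    static List<String> childNodeNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For the input `<a><x/>hi<y/></a>` the children are an element, a text node, and another element, in that sequence.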

  12. DOM Processing Model (1/2)  The DOM Processing Model consists of reading the entire XML document into memory and building a tree representation of the structured data  This process can require a substantial amount of memory when the XML document is large 12

  13. DOM Processing Model (2/2)  By having the data in memory, DOM introduces the capability of manipulating the XML data by  Inserting, editing, or deleting tree elements  It supports random access to any node in the tree 13
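Editing, deleting, and inserting tree elements might look like the following sketch. The class name DomEditSketch, the method editItems, and the item element name are hypothetical, and the input is assumed to contain at least two item elements directly under the root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomEditSketch {
    // Edits the first <item>, deletes the second, appends a new one,
    // and returns the text of the items that remain.
    static List<String> editItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            NodeList items = root.getElementsByTagName("item"); // live list

            items.item(0).setTextContent("A");          // edit
            root.removeChild(items.item(1));            // delete
            Element added = doc.createElement("item");  // insert
            added.setTextContent("c");
            root.appendChild(added);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                texts.add(items.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not edit document", e);
        }
    }
}
```

Because the whole tree is in memory, any node can be reached by index at any time, which is the random access the slide refers to.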

  14. What is SAX? (1/2)  SAX is an alternative way of working with the information in your XML document  It was designed to have a smaller memory footprint, but it puts more of the work on the programmer  SAX does not create a default object model on top of your XML document  SAX was originally developed by David Megginson 14

  15. What is SAX? (2/2)  When you parse an XML document with a SAX parser, the parser generates a series of events as it reads the document  These events are pushed to event handlers  You need to decide what to do with the events when you parse an XML document 15

  16. Sample SAX Events  The startDocument event  For each element, a startElement event at the start of the element, and an endElement event at the end of the element  If an element contains content, there will be events such as characters for additional text  The endDocument event 16
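To see these events in order, one option is a handler that simply records each callback. This is a sketch using the JDK's SAX support; the class name SaxEventsSketch and the method events are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxEventsSketch {
    // Returns a log of the SAX events fired while parsing the document.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("startElement " + qName);
            }
            // A parser may deliver one run of text in several characters() calls.
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters " + new String(ch, start, length));
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void endDocument() { log.add("endDocument"); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }
}
```

The parser pushes each event to the handler as it reads; the application never asks for the next one.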

  17. What is StAX?  StAX is an exciting new parsing technique  Like SAX, it uses an event-driven model  However, instead of using SAX’s push model, StAX uses a pull model for event processing  Instead of using a callback mechanism, a StAX parser returns events as requested by the application 17
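The pull model can be sketched with the JDK's javax.xml.stream package: the application loops and asks the reader for the next event, rather than registering callbacks. The class name StaxPullSketch and the method elementNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StaxPullSketch {
    // Returns the names of all elements, in the order their start tags appear.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // the application requests the next event
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read document", e);
        }
        return names;
    }
}
```

Since the loop belongs to the application, it can stop early or skip events it does not care about.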

  18. SAX vs. StAX  SAX returns different types of event to the ContentHandler  StAX returns its events to the application and can even provide the events as objects  StAX includes factories for creating the StAX reader and writer  Applications can use the StAX interfaces without reference to the details of a particular implementation 18
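The writer side mentioned above is obtained from a factory in the same way as the reader. The following is a minimal sketch using javax.xml.stream; the class name StaxWriterSketch and the method write are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, for illustration only.
class StaxWriterSketch {
    // Writes a single element with text content and returns the XML as a string.
    static String write(String elementName, String text) {
        StringWriter out = new StringWriter();
        try {
            // The factory hides which StAX implementation is actually in use.
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(elementName);
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write document", e);
        }
        return out.toString();
    }
}
```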

  19. StAX vs. DOM and SAX  StAX specifies two parsing models  The cursor model  The iterator model  Like SAX, the cursor model simply returns events  The iterator model returns events as objects  Provides a more natural interface but has the additional overhead of object creation 19
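The iterator model corresponds to XMLEventReader in javax.xml.stream, which hands back each event as an XMLEvent object. The sketch below assumes that API; the class name StaxIteratorSketch and the method describe are hypothetical, and coalescing is switched on so that adjacent text arrives as one event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class StaxIteratorSketch {
    // Describes element and text events; document start and end events are skipped.
    static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // each event is an object
                if (event.isStartElement()) {
                    out.add("start:" + event.asStartElement().getName().getLocalPart());
                } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    out.add("text:" + event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    out.add("end:" + event.asEndElement().getName().getLocalPart());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read document", e);
        }
        return out;
    }
}
```

Event objects can be stored or passed to other code, which is the convenience paid for by the extra object creation.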

  20. DOM vs. SAX (1/3)  In the case of DOM, the parser does almost everything  Read the XML document in  Create an object model on top of it  Give you a reference to this object model (a document object) so that you can manipulate it  SAX does not expect the parser to do much 20

  21. DOM vs. SAX (2/3)  For SAX, the parser should  Read in the XML document  Fire a bunch of events depending on what tags it encounters in the XML document  Then, the programmer needs to make sense of all the tag events and create objects in their own object model 21
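As a sketch of what making sense of the tag events involves, the handler below builds a small application model from SAX events. The document shape (book elements with an id attribute and a title child), the class name SaxModelSketch, and the method titlesById are all hypothetical. Note the two fields that track where the parser currently is.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxModelSketch {
    // Builds a map from book id to title; this map is the application's own model.
    static Map<String, String> titlesById(String xml) {
        Map<String, String> books = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;   // state: id of the <book> being read
            private StringBuilder text; // state: non-null only while inside <title>

            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                if (qName.equals("book")) {
                    currentId = atts.getValue("id");
                } else if (qName.equals("title")) {
                    text = new StringBuilder();
                }
            }
            @Override public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // text may arrive in pieces
                }
            }
            @Override public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    books.put(currentId, text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return books;
    }
}
```

Nothing else from the document is kept in memory, so the footprint depends on the model the programmer chooses to build rather than on the size of the input.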

  22. DOM vs. SAX (3/3)  SAX can be really fast at runtime if your object model is simple  SAX is faster than DOM because  it bypasses the creation of a tree-based object model of your information  On the other hand, you have to write a SAX document handler to interpret all the SAX events 22

  23. Drawbacks of DOM  Partial parsing is not possible  Loading the whole document and building the entire tree structure in memory can be expensive  The in-memory DOM tree can be an order of magnitude larger than the document itself  The generic DOM node type is an interoperability advantage but may not be the best when you do object type binding 23

  24. When to Use DOM  When the development needs to be done quickly  DOM is quite easy to implement  When you need to have random access to the XML document  Example: An XSL Processor  When you need to modify an XML document  Example: An XML Editor 24

  25. Drawbacks of SAX  You have to implement the event handlers to handle all incoming events  Must maintain event states in your code  Must keep track of where the parser is in the document  It does not have built-in document navigation support  No random access support 25

  26. When to Use SAX  When you have a small amount of memory  SAX requires little memory because it does not construct an internal representation of the XML data  When you need to only read the content in a single pass  Example: Many B2B and EAI applications use XML just as an encapsulation format in which the receiving end simply retrieves all the data 26

  27. Drawbacks of StAX  It does not have built-in document navigation support  No random access support  Document modification is still quite difficult if you want to do anything beyond simple one-pass transformations 27

  28. When to Use StAX  When applications need to take advantage of the streaming model for performance while maintaining full support of namespaces  For an application that can easily request events from multiple StAX parsers and put them into a single context  Example: Web services 28

  29. Summary of Java Parser APIs  XML parsers are programs to read, manipulate, and create XML documents  To automate XML processing, developers need to use XML parsers  XML parser APIs  DOM  + Easy for developers to develop  + Random access  - Requires lots of memory  SAX, StAX  + Fast processing  - Developers need to create their own data model 29

  30. Streaming APIs in PHP  ext/xmlreader and ext/xmlwriter  Allow for XML to be read or written to/from PHP streams  Resulting in very low memory usage  But providing very focused and uni-directional XML support (can write or read only)  To manipulate XML data tree  Using DOM or SimpleXML 30

  31. PHP DOM vs. SimpleXML (1/2)  DOM allows a developer to access and manipulate XML in any way needed, but it comes at a price  DOM is a large and complex API, requiring a developer to really understand all details  SimpleXML aims to break through all the XML complexities and provide an intuitive and simple 31

  32. PHP DOM vs. SimpleXML (2/2)  The vast majority of people working with XML are really only concerned with elements having simple content  DOM models an XML document as a tree  SimpleXML takes an easier approach and views a document as an object  Elements are represented as properties and attributes as accessors 32
