12. Application program interfaces (APIs)


  1. 12. Application program interfaces (APIs)
     • XML documents are text files, so in principle no special APIs are required.
     • However, tasks such as parsing and validation are needed in almost any application.
     • Predefined class libraries and standardized interfaces reduce the programmer's work and errors.
     • Main alternatives:
       – Document Object Model (DOM)
       – Simple API for XML (SAX)
       – Streaming API for XML (StAX)
     • Example implementation by Sun: JAXP (containing DOM, SAX, and XSLT)
     XML-12 J. Teuhola 2013 209

  2. 12.1. Document Object Model (DOM)
     • W3C recommendation. A tree-based interface: the parser reads the whole document and places the tree in memory for processing.
     • Not tied to any programming language; Java suits it well (platform-independent, like XML).
     • DOM Levels 1, 2, 3: successively wider support for various features of XML.
     • Interfaces are divided into modules, enabling varying degrees of support for the API.
     • Here: Level 2 Core (2000; Level 3: 2004)

  3. About DOM specifications
     • Extensions have been defined for applications such as MathML, SVG, SMIL.
     • Alternatives for processing:
       – Using only generic interfaces, e.g. manipulating Nodes.
       – Using application-specific interfaces, e.g. HTML: paragraphs, images, etc.
     • Specification language: Interface Definition Language (IDL by OMG), independent of programming language and operating system.
     • Here: Java mapping (rather straightforward).
     • JDOM: Simplified DOM for Java

  4. Tentative DOM example (Xerces & Java)
       import java.io.*;
       import org.w3c.dom.*;
       import org.apache.xerces.parsers.DOMParser;
       import org.xml.sax.*;
       …
       // Print the root tag name of document "example.xml"
       DOMParser parser = new DOMParser();
       try {
           parser.parse("example.xml");   // builds the whole tree in memory
       }
       catch (SAXException saxe) { … }
       catch (IOException ioe) { … }
       Document d = parser.getDocument();
       Element root = d.getDocumentElement();
       System.out.println("Root: " + root.getTagName());
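The same task can also be sketched with JAXP, which ships with the JDK, instead of the Xerces-specific DOMParser class. This is only an illustration: the class name RootNameSketch and the sample XML string are assumptions, not part of the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RootNameSketch {
    // Parses an XML string into a DOM tree and returns the root element's tag name.
    static String rootName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document d = builder.parse(new InputSource(new StringReader(xml)));
            return d.getDocumentElement().getTagName();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Root: " + rootName("<library><book/></library>"));
    }
}
```

To read a file instead of a string, `builder.parse(new java.io.File("example.xml"))` could be used in the same place.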

  5. Important interfaces in DOM
     • Node is the root of all component interfaces.
       – The whole document can be processed by the methods and properties defined for Node.
       – The in-memory document structure consists of nodes connected by parent, child and sibling links.
     • NodeList and NamedNodeMap for processing node sets
     • DocumentTraversal, NodeIterator, TreeWalker for tree traversal and iteration
     • DOMImplementation for testing supported features and creating documents
     • … and many others

  6. Node interface hierarchy
     Node
       – Document
       – DocumentFragment
       – DocumentType
       – Element
       – Attr
       – CharacterData
           – Text
               – CDATASection
           – Comment
       – ProcessingInstruction
       – Entity
       – EntityReference
       – Notation

  7. Node methods
     1. Node characteristics: getNodeType(), getNodeName(), getNodeValue(), setNodeValue(value), hasChildNodes(), getAttributes(), getOwnerDocument()
     2. Accessing relatives: getFirstChild(), getLastChild(), getChildNodes(), getNextSibling(), getPreviousSibling(), getParentNode()
     3. Node manipulation: removeChild(oldChild), insertBefore(newChild, refChild), appendChild(newChild), replaceChild(newChild, oldChild), cloneNode(deep), normalize()
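A minimal sketch of how the Node methods combine: a depth-first walk that uses only getFirstChild() and getNextSibling(), plus one manipulation with replaceChild(). The class name NodeWalkSketch, its helper methods and the sample documents are hypothetical; parsing goes through JAXP from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeWalkSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Depth-first walk over the tree using only child and sibling links.
    static void collect(Node n, StringBuilder out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            out.append(n.getNodeName()).append(' ');
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    // Returns the element names of the document in document order.
    static String elementNames(String xml) {
        StringBuilder out = new StringBuilder();
        collect(parse(xml), out);
        return out.toString().trim();
    }

    // Replaces the root's first child; the new node is the first argument.
    static String replaceFirstChild(String xml, String newName) {
        Document d = parse(xml);
        Element root = d.getDocumentElement();
        root.replaceChild(d.createElement(newName), root.getFirstChild());
        return root.getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b><c/></b><d/></a>"));
        System.out.println(replaceFirstChild("<a><b/></a>", "x"));
    }
}
```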

  8. Access directions in the document tree
     [Diagram: from a node, links lead to its parent, its first child and its last child; the child nodes of one parent are chained to each other by next sibling and previous sibling links.]

  9. Document interface
     • Represents the whole document
       – Technically implemented as the root node of the document
       – Extends the Node interface.
       – Note: the root of the DOM tree is the parent of the actual document root element.
     • Accessing the document information:
       – getDoctype()
       – getImplementation()
       – getDocumentElement()
       – getElementsByTagName(tagName)
     • DOM Level 2:
       – getElementsByTagNameNS(URI, localName)
       – getElementById(elementId)
       – importNode(importedNode, deep)
       … and many others …
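The following sketch illustrates two points from this slide: searching by tag name from the Document, and the fact that the Document node sits above the root element. DocumentSketch and its method names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocumentSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Counts all elements with the given tag name anywhere in the document.
    static int countTags(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    // True if the parent of the root element is the Document node itself.
    static boolean rootParentIsDocument(String xml) {
        Document d = parse(xml);
        Node parent = d.getDocumentElement().getParentNode();
        return parent.getNodeType() == Node.DOCUMENT_NODE;
    }

    public static void main(String[] args) {
        String xml = "<lib><book/><book/><cd/></lib>";
        System.out.println(countTags(xml, "book"));
        System.out.println(rootParentIsDocument(xml));
    }
}
```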

  10. Document interface (cont.)
      • Factory methods for creating objects for a document:
        – createElement(tagName)
        – createTextNode(data)
        – createComment(data)
        – createCDATASection(data)
        – createProcessingInstruction(target, data)
        – createAttribute(name)
        – createEntityReference(name)
      • DOM Level 2:
        – createElementNS(URI, qualifiedName)
        – createAttributeNS(URI, qualifiedName)
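As a rough illustration of the factory methods, the sketch below builds a small tree from scratch. Nodes created by a factory method belong to the document but are not part of the tree until they are attached. The class FactorySketch and the element and attribute names are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FactorySketch {
    // Builds <note lang="en"><!--...--><to>Reader</to></note> and summarizes it.
    static String summary() {
        Document d;
        try {
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        Element root = d.createElement("note");
        d.appendChild(root);                          // becomes the document element

        root.appendChild(d.createComment("built with factory methods"));

        Element to = d.createElement("to");
        to.appendChild(d.createTextNode("Reader"));
        root.appendChild(to);

        Attr lang = d.createAttribute("lang");
        lang.setValue("en");
        root.setAttributeNode(lang);

        return root.getTagName() + "|" + root.getChildNodes().getLength()
                + "|" + root.getAttribute("lang")
                + "|" + to.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```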

  11. DocumentType interface
      • General data about the document (DTD):
        – getName() → DOCTYPE name = root name
        – getEntities() → internal and external entities as a NamedNodeMap
        – getNotations() → notations as a NamedNodeMap
      • DOM Level 2:
        – getInternalSubset()
        – getPublicId()
        – getSystemId()
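A small sketch of reading DTD information through DocumentType, assuming a document with an internal DTD subset. The sample document, the entity name company and the notation name gif are invented for the example, and the parser is the JDK's default JAXP implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeSketch {
    // Hypothetical sample: one general entity and one notation declared internally.
    static final String SAMPLE =
            "<!DOCTYPE memo [\n"
          + "  <!ELEMENT memo (#PCDATA)>\n"
          + "  <!ENTITY company \"Example Corp\">\n"
          + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
          + "]>\n"
          + "<memo>&company;</memo>";

    static DocumentType doctype(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return d.getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static String name(String xml) {
        return doctype(xml).getName();
    }

    // Entities and notations are looked up by name in a NamedNodeMap.
    static boolean hasEntity(String xml, String entity) {
        return doctype(xml).getEntities().getNamedItem(entity) != null;
    }

    static boolean hasNotation(String xml, String notation) {
        return doctype(xml).getNotations().getNamedItem(notation) != null;
    }

    public static void main(String[] args) {
        System.out.println(name(SAMPLE));
        System.out.println(hasEntity(SAMPLE, "company"));
        System.out.println(hasNotation(SAMPLE, "gif"));
    }
}
```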

  12. Element interface
      • Extends the Node interface with element-specific features:
        – getTagName()
        – getElementsByTagName(name)
        – normalize() → merge adjacent text nodes
      • Attribute-related methods:
        – getAttribute(name)
        – setAttribute(name, value)
        – removeAttribute(name)
        – getAttributeNode(name)
        – setAttributeNode(newAttr)
        – removeAttributeNode(oldAttr)
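The string-based attribute methods can be tried out as in the sketch below. One detail worth noticing is that getAttribute() returns an empty string, not null, for a missing attribute. ElementAttrSketch and the sample element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementAttrSketch {
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Reads, adds and removes attributes on a single element.
    static String demo() {
        Element book = root("<book id=\"b1\"/>");
        String before = book.getAttribute("id");
        book.setAttribute("lang", "en");        // creates the attribute
        book.removeAttribute("id");
        String after = book.getAttribute("id"); // empty string once removed
        return before + "|" + book.getAttribute("lang") + "|" + after.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```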

  13. Element interface (cont.)
      • DOM Level 2 element-specific extension:
        – getElementsByTagNameNS(URI, localName)
      • Attribute-specific extensions:
        – hasAttribute(name)
        – hasAttributeNS(URI, localName)
        – getAttributeNS(URI, localName)
        – setAttributeNS(URI, qualName, value)
        – getAttributeNodeNS(URI, localName)
        – setAttributeNodeNS(newAttr)
        – removeAttributeNS(URI, localName)
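The namespace-aware methods only behave as expected when the parser itself is namespace-aware, which JAXP requires to be switched on explicitly. The sketch below assumes a made-up namespace URI urn:example:x; NsSketch is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NsSketch {
    static final String NS = "urn:example:x";   // hypothetical namespace URI
    static final String SAMPLE =
            "<r xmlns:x=\"urn:example:x\"><x:item x:code=\"42\"/><item/></r>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);           // off by default in JAXP
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Finds items in the namespace and reads a namespaced attribute of the first one.
    static String demo() {
        NodeList items = parse(SAMPLE).getElementsByTagNameNS(NS, "item");
        Element first = (Element) items.item(0);
        return items.getLength() + "|" + first.getAttributeNS(NS, "code")
                + "|" + first.hasAttributeNS(NS, "code")
                + "|" + first.hasAttribute("x:code");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```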

  14. Attr interface
      • Information about attributes:
        – getName()
        – getValue()
        – setValue(value)
        – getSpecified() → false if the value is a default taken from the DTD
        – getOwnerElement() → DOM Level 2
      • Note that most attribute-accessing operations are part of the Element interface.
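To make getSpecified() concrete, the sketch below parses an element whose DTD declares a default value for one attribute. The sample DTD, the attribute names unit and weight, and the class AttrSketch are assumptions for illustration; the behavior shown relies on the parser reading the internal DTD subset, as the JDK's default parser does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrSketch {
    // "unit" has a DTD default; "weight" is given explicitly in the instance.
    static final String SAMPLE =
            "<!DOCTYPE item [<!ELEMENT item EMPTY>"
          + "<!ATTLIST item unit CDATA \"kg\" weight CDATA #IMPLIED>]>"
          + "<item weight=\"5\"/>";

    static Attr attr(String name) {
        try {
            Element item = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE))).getDocumentElement();
            return item.getAttributeNode(name);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Describes an attribute as name=value|specified|ownerElement.
    static String describe(String name) {
        Attr a = attr(name);
        return a.getName() + "=" + a.getValue() + "|" + a.getSpecified()
                + "|" + a.getOwnerElement().getTagName();
    }

    public static void main(String[] args) {
        System.out.println(describe("unit"));
        System.out.println(describe("weight"));
    }
}
```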

  15. CharacterData interface
      • Adds text processing methods to the Node interface:
        – getData()
        – setData(data)
        – getLength()
        – appendData(arg)
        – substringData(offset, count)
        – insertData(offset, arg)
        – deleteData(offset, count)
        – replaceData(offset, count, arg)
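The editing methods can be followed step by step on a Text node, as in this sketch. Offsets count characters from 0. CharDataSketch and the sample text are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

class CharDataSketch {
    // Applies a series of CharacterData edits and reports the final state.
    static String edit() {
        Document d;
        try {
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        Text t = d.createTextNode("Hello world");
        t.appendData("!");             // Hello world!
        t.insertData(5, ",");          // Hello, world!
        t.replaceData(7, 5, "DOM");    // Hello, DOM!
        t.deleteData(5, 1);            // Hello DOM!
        return t.getData() + "|" + t.substringData(6, 3) + "|" + t.getLength();
    }

    public static void main(String[] args) {
        System.out.println(edit());
    }
}
```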

  16. Extensions (subtypes) of CharacterData
      • Text interface
        – One additional method: splitText(offset)
        – Creation by a factory method in Document: createTextNode(data)
      • CDATASection interface (extends Text)
        – No additional methods; just identifies CDATA nodes (reminder: <![CDATA[ ... ]]>)
        – Creation by a factory method in Document: createCDATASection(data)
      • Comment interface
        – No additional methods; identifies comments.
        – Creation by a factory method in Document: createComment(data)
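splitText() and normalize() are inverse operations in a sense: the first cuts one Text node into two siblings, the second merges adjacent Text nodes again. A minimal sketch, with SplitSketch and the element name p chosen only for illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class SplitSketch {
    // Splits a text node in two, then merges the halves again with normalize().
    static String demo() {
        Document d;
        try {
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        Element p = d.createElement("p");
        d.appendChild(p);
        Text head = d.createTextNode("HelloWorld");
        p.appendChild(head);

        Text tail = head.splitText(5);   // tail is inserted as the next sibling
        String split = head.getData() + "+" + tail.getData()
                + ":" + p.getChildNodes().getLength();

        p.normalize();                   // adjacent text nodes are merged
        String merged = p.getFirstChild().getNodeValue()
                + ":" + p.getChildNodes().getLength();
        return split + "|" + merged;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```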

  17. ProcessingInstruction interface
      • Name of node = name of target application
      • Methods:
        – getTarget()
        – getData()
        – setData(data)
      • Creation (by a factory method in Document):
        – createProcessingInstruction(target, data)
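A processing instruction in the prolog appears as a child of the Document node, before the root element. The sketch below reads one back; the stylesheet instruction is only sample input and PiSketch is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class PiSketch {
    static final String SAMPLE =
            "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?><doc/>";

    // Returns the first child of the document, which here is the processing instruction.
    static ProcessingInstruction firstPi() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return (ProcessingInstruction) d.getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static String target() {
        return firstPi().getTarget();
    }

    static String data() {
        return firstPi().getData();
    }

    // The node name of a processing instruction equals its target.
    static boolean nameIsTarget() {
        ProcessingInstruction pi = firstPi();
        return pi.getNodeName().equals(pi.getTarget());
    }

    public static void main(String[] args) {
        System.out.println(target() + " -> " + data());
    }
}
```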

  18. Entities and notations
      • Whether entities are replaced by their values is parser-dependent. External binary data cannot be replaced; entity references must be created for it.
      • Entity interface:
        – getPublicId()
        – getSystemId()
        – getNotationName()
      • Notation interface:
        – getPublicId()
        – getSystemId()

  19. Node lists and named node maps
      • Some DOM operations return a list of nodes; NodeList interface:
        – item(index), getLength()
      • Attribute and entity declarations have no specific order; access is based on their names; NamedNodeMap interface:
        – item(index), getLength(), getNamedItem(nodeName), setNamedItem(node), removeNamedItem(nodeName)
        – DOM Level 2: getNamedItemNS(URI, localName), setNamedItemNS(node), removeNamedItemNS(URI, localName)
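The difference between the two collection interfaces can be sketched as follows: children are reached by position through a NodeList, attributes by name through a NamedNodeMap. ListMapSketch and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ListMapSketch {
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Child nodes are ordered, so they are addressed by index.
    static String childNames(String xml) {
        NodeList children = root(xml).getChildNodes();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            out.append(children.item(i).getNodeName()).append(' ');
        }
        return out.toString().trim();
    }

    // Attributes have no defined order, so they are addressed by name.
    static String attrValue(String xml, String name) {
        NamedNodeMap attrs = root(xml).getAttributes();
        return attrs.getNamedItem(name).getNodeValue();
    }

    public static void main(String[] args) {
        String xml = "<r b=\"2\" a=\"1\"><x/><y/></r>";
        System.out.println(childNames(xml));
        System.out.println(attrValue(xml, "a"));
    }
}
```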

  20. Testing the DOM implementation
      • DOMImplementation interface: hasFeature(feature, version), where
        – feature = module name:
          Core, XML, HTML (DOM Level 1)
          Views, Events, Style, Traversal, Range (Level 2)
          More modules appear in Level 3.
        – version = "1.0", "2.0", ...
      • Other methods:
        – createDocument(URI, qualifiedName, docType)
        – createDocumentType(qualifiedName, publicId, systemId)
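A sketch of feature testing and document creation through DOMImplementation, here obtained from a JAXP DocumentBuilder. The namespace URI, the qualified name inv:stock and the system identifier stock.dtd are invented for the example, as is the class name ImplSketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class ImplSketch {
    static DOMImplementation impl() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .getDOMImplementation();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
    }

    // Asks the implementation whether it supports a module at a given level.
    static boolean supports(String feature, String version) {
        return impl().hasFeature(feature, version);
    }

    // Creates an empty document with a doctype and a namespaced root element.
    static String create() {
        DOMImplementation impl = impl();
        DocumentType dt = impl.createDocumentType("inv:stock", null, "stock.dtd");
        Document d = impl.createDocument("urn:example:inventory", "inv:stock", dt);
        return d.getDocumentElement().getTagName()
                + "|" + d.getDocumentElement().getNamespaceURI()
                + "|" + d.getDoctype().getSystemId();
    }

    public static void main(String[] args) {
        System.out.println("Core 2.0: " + supports("Core", "2.0"));
        System.out.println(create());
    }
}
```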

  21. Tree traversal interfaces
      • DOM Level 2: Optional package for sophisticated traversal of document trees.
      • DocumentTraversal interface: An iterator can be created to choose node types and filter the nodes further.
      • NodeIterator interface: Iteration steps: to the next/previous node
      • TreeWalker interface: Like NodeIterator, but more versatile: first/last child, next/previous sibling, parent
      • NodeFilter interface: accept/reject/skip nodes.
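The sketch below contrasts NodeIterator and TreeWalker under the same filter. It assumes that the Document returned by the JDK's default parser also implements DocumentTraversal, which the optional Traversal module does not guarantee for every implementation. The filter rejects elements named skip: a NodeIterator still visits their children, whereas a TreeWalker prunes the whole subtree. TraversalSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class TraversalSketch {
    static final String SAMPLE = "<a><b/><skip><c/></skip><d/></a>";

    // Rejects elements named "skip" and accepts everything else.
    static final NodeFilter NO_SKIP = n ->
            "skip".equals(n.getNodeName()) ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT;

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Flat iteration: the first nextNode() call returns the root itself.
    static String viaIterator(String xml) {
        Document d = parse(xml);
        NodeIterator it = ((DocumentTraversal) d).createNodeIterator(
                d.getDocumentElement(), NodeFilter.SHOW_ELEMENT, NO_SKIP, false);
        StringBuilder out = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            out.append(n.getNodeName()).append(' ');
        }
        it.detach();
        return out.toString().trim();
    }

    // Tree walking: starts positioned on the root, so nextNode() moves past it.
    static String viaWalker(String xml) {
        Document d = parse(xml);
        TreeWalker w = ((DocumentTraversal) d).createTreeWalker(
                d.getDocumentElement(), NodeFilter.SHOW_ELEMENT, NO_SKIP, false);
        StringBuilder out = new StringBuilder();
        for (Node n = w.nextNode(); n != null; n = w.nextNode()) {
            out.append(n.getNodeName()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("iterator: " + viaIterator(SAMPLE));
        System.out.println("walker:   " + viaWalker(SAMPLE));
    }
}
```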

  22. Processing of ranges
      • Ranges is an optional module in DOM Level 2.
      • A range is a segment between a start point and an end point; each point is given as an offset within its containing node.
      • Range interface: Methods e.g. for
        – setting the start and end point,
        – comparing two ranges,
        – copying the contents of the range,
        – inserting new items to the range,
        – collapsing the range,
        – etc.
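A minimal Range sketch, assuming the Document produced by the JDK's default builder also implements the optional DocumentRange interface; the code checks this with instanceof and gives up otherwise. The class RangeSketch and the sample text are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;

class RangeSketch {
    // Selects characters 7..10 of a text node and reports the range contents.
    static String demo() {
        Document d;
        try {
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        if (!(d instanceof DocumentRange)) {
            throw new UnsupportedOperationException("Range module not supported");
        }
        Element p = d.createElement("p");
        d.appendChild(p);
        Text t = d.createTextNode("Hello, DOM world");
        p.appendChild(t);

        Range r = ((DocumentRange) d).createRange();
        r.setStart(t, 7);                 // boundary point: (container node, offset)
        r.setEnd(t, 10);
        String selected = r.toString();   // text covered by the range

        r.collapse(true);                 // move the end point onto the start point
        boolean collapsed = r.getCollapsed();
        r.detach();
        return selected + "|" + collapsed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```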
