Standardized Data Formats for Quantum Chemistry Based on XML/CML?


  1. Standardized Data Formats for Quantum Chemistry Based on XML/CML?
     A Literature and Web Resources Research
     Sarah Gerster, 17.01.2007

  2. Overview
     ● Motivation
     ● A flavour of XML
     ● Efforts to develop an XML standard for computations in quantum chemistry
     ● Standard?

  3. Motivation
     ● Facilitate:
       – Data exchange
       – Automated workflows
       – Metadata
       – Data storage and retrieval
     [Figure: workflow used by Andreas Elsener in his Diploma Thesis]

  4. What is XML?
     ● Markup Language
       – combines character data and markup
     ● Extensible
       – can create any needed tag
     ● Structured documents
       – querying the XML files is reasonably easy

  5. An XML Example

         <?xml version="1.0" encoding="ISO-8859-1"?>
         <book>
           <chapter>Introduction to XML
             <para>What is HTML</para>
             <para>What is XML</para>
           </chapter>
           <chapter>XML Syntax
             <para>Elements must have a closing tag</para>
             <para>Elements must be properly nested</para>
           </chapter>
         </book>

     XML example given under http://www.w3schools.com/xml/
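As an illustration of how easily such a structured document can be queried, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module on the book example above. The function name `chapter_outline` is made up for this illustration and is not part of the presentation.

```python
import xml.etree.ElementTree as ET

# The book example from the slide, passed as bytes so that the parser
# honours the encoding named in the XML declaration.
BOOK_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
"""


def chapter_outline(xml_bytes):
    """Return {chapter title: [paragraph texts]} for a <book> document."""
    root = ET.fromstring(xml_bytes)
    outline = {}
    for chapter in root.findall("chapter"):
        # The chapter title is the text that precedes the first <para>.
        title = (chapter.text or "").strip()
        outline[title] = [para.text for para in chapter.findall("para")]
    return outline


outline = chapter_outline(BOOK_XML)
for title, paragraphs in outline.items():
    print(title, "->", paragraphs)
```

The path expressions passed to `findall` are a small subset of XPath, which is one reason querying XML needs so little code.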

  6. Some Issues...
     ● Database to hold the XML files
     ● XML Schema or Document Type Definition
       – outline the structure of an XML file
       – provide a set of rules
       – well-formed / valid documents
       – new standard = new XML Schema
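The difference between a well-formed and a valid document can be sketched in a few lines of Python. Well-formedness is checked by the parser itself. Validity is checked here against a small hand-written rule table, `ALLOWED_CHILDREN`, which is a hypothetical stand-in for a real XML Schema or DTD and is much weaker than either.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set standing in for a schema: for every element name,
# the set of element names that may appear directly inside it.
ALLOWED_CHILDREN = {
    "book": {"chapter"},
    "chapter": {"para"},
    "para": set(),
}


def is_well_formed(text):
    """True if the text parses as XML (all tags closed and properly nested)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid(text, rules=ALLOWED_CHILDREN):
    """True if the text is well-formed and every element obeys the rules."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    for element in root.iter():
        allowed = rules.get(element.tag)
        if allowed is None:
            return False  # element name unknown to the rule set
        if any(child.tag not in allowed for child in element):
            return False  # child not permitted at this position
    return True


good = "<book><chapter>Intro<para>text</para></chapter></book>"
not_nested = "<book><chapter><para>text</chapter></para></book>"
wrong_tags = "<book><para>text</para></book>"

for name, doc in [("good", good), ("not_nested", not_nested), ("wrong_tags", wrong_tags)]:
    print(name, is_well_formed(doc), is_valid(doc))
```

A document can be well-formed without being valid, as `wrong_tags` shows: it parses, but breaks the rule that a `book` may only contain `chapter` elements.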

  7. An XML Schema Example
     ● Structure of an energy entry
       [Diagram: an Energy element with a Value child (format?) and a Unit child (au, eV, ?); legend: mandatory / optional]
     ● Which energies have to be specified?
       [Diagram: Total, Electronic and Nuclear energies, each with a Value and a Unit]
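The questions raised on this slide can be made concrete with a small Python sketch. Everything specific in it is an assumption made for illustration: the element names `energies`, `energy`, `value` and `unit`, the `type` attribute, the choice that `value` is mandatory and `unit` optional, and the default that only the total energy is required. The slide leaves these points open, and the numbers in the sample are placeholders, not computed results.

```python
import xml.etree.ElementTree as ET

# Assumptions for illustration only (not taken from any existing schema):
# element and attribute names, <value> mandatory, <unit> optional,
# and the list of accepted units.
KNOWN_UNITS = {"au", "eV"}


def read_energies(xml_text, required_types=("total",)):
    """Return {type: (value, unit or None)}; raise ValueError on violations."""
    root = ET.fromstring(xml_text)
    energies = {}
    for entry in root.findall("energy"):
        kind = entry.get("type")
        value = entry.findtext("value")
        if value is None:
            raise ValueError(f"energy {kind!r} has no <value>")
        unit = entry.findtext("unit")
        if unit is not None and unit not in KNOWN_UNITS:
            raise ValueError(f"energy {kind!r} has unknown unit {unit!r}")
        energies[kind] = (float(value), unit)
    missing = [t for t in required_types if t not in energies]
    if missing:
        raise ValueError(f"missing required energies: {missing}")
    return energies


# Placeholder values, chosen only to exercise the checks.
SAMPLE = """
<energies>
  <energy type="total"><value>-1.5</value><unit>au</unit></energy>
  <energy type="nuclear"><value>0.5</value></energy>
</energies>
"""

print(read_energies(SAMPLE))
```

In a real standard these rules would live in the XML Schema itself rather than in application code, so that any validating parser could enforce them.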

  8. Strengths of XML
     ● Platform-independent
     ● Based on international standards
     ● Web standard
     ● Files are human-readable as well as machine-readable
     ● A lot of software is available to handle XML
     ● Based on SGML, which has existed since 1986

  9. Weaknesses of XML
     ● Redundant and verbose syntax
     ● Hierarchical model for representation
     ● Parsers have to check for improperly formatted data
     ● Parsers should be able to recurse through arbitrarily nested data
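The last point, handling arbitrarily nested data, usually comes down to a recursive walk over the element tree. A minimal Python sketch follows; the helper names `max_depth` and `tag_outline` are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def max_depth(element):
    """Nesting depth of an element tree; an element without children has depth 1."""
    return 1 + max((max_depth(child) for child in element), default=0)


def tag_outline(element, level=0):
    """Return the tag names as indented lines, one per element, in document order."""
    lines = ["  " * level + element.tag]
    for child in element:
        lines.extend(tag_outline(child, level + 1))
    return lines


doc = ET.fromstring("<a><b><c/></b><d/></a>")
print(max_depth(doc))
print("\n".join(tag_outline(doc)))
```

Plain recursion like this is bounded by the interpreter's recursion limit, so very deeply nested input would call for an explicit stack or an event-based parser instead.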

  10. Chemical Markup Language (CML)
      ● Implementation of an XML for chemistry
      ● DTD/Schema covering chemistry in general
        – substances
        – quantities
        – structure
        – metadata
        – properties
        – ...
      ● Extensions for specific domains, for example for computational chemistry
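To give an impression of what the structure part looks like, here is a small Python sketch that reads a CML-style molecule. The fragment is written from memory of the CML conventions (`molecule`, `atomArray` and `atom` elements with `id` and `elementType` attributes, in the `http://www.xml-cml.org/schema` namespace) and should be checked against the CML schema before being relied on. The function name `element_counts` is an assumption of this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace assumed for CML documents; verify against the schema in use.
NS = {"cml": "http://www.xml-cml.org/schema"}

# A hand-written CML-style fragment for water, without coordinates.
WATER = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>
"""


def element_counts(cml_text):
    """Count atoms per chemical element in a single CML-style molecule."""
    root = ET.fromstring(cml_text)
    atoms = root.findall("cml:atomArray/cml:atom", NS)
    return dict(Counter(atom.get("elementType") for atom in atoms))


print(element_counts(WATER))
```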

  11. Drawbacks of CML
      ● Primarily designed for molecular structures and chemical reactions, not for QC
        – Validation: Don't use the extensibility of CML to define the additional tags
        – Efficiency: Not all mandatory fields of CML are required for QC computations
      ● The array format is not consistent with the standard XML schema => accessibility problems from different platforms

  12. Other projects to create an XML standard for QC
      ● Quantum Monte Carlo
        – ALPS
        – Zori
      ● QMWISE
      ● GAMESS
        – structured data output
      ● QCML
        – wrappers
      [Figure: example from the QCML working group]

  13. Standard?
      ● XML is becoming the de facto standard for transferring data in QC
      ● CML is a “patchwork”
      ● Much experience and knowledge in CML
      ● No other widespread XML schema for QC
      ● Put efforts together for a new standard?

  14. Thank you for your attention!
      For further information and references, please refer to: http://www.echinops.ch/downloads/
