Analysis and Optimization for Processing Grid-Scale XML Datasets

Michael R. Head
Ph.D. Candidate
Grid Computing Research Laboratory
Department of Computer Science
Binghamton University
mike@cs.binghamton.edu
Tuesday, May 12, 2009

Outline

1 Introduction and Motivation
    XML and SOAP
    Ubiquity of Multi-processing Capabilities
    Contributions
2 SOAP and XML Benchmarks
    SOAPBench
    XMLBench
3 Parallel XML
    Investigating System Cache Effects
    Piximal: Parallel Approach for Processing XML
4 Related Work
5 Conclusions and Future Work

XML Defined

- Text based (usually UTF-8 encoded)
- Tree structured
- Language independent
- Generalized data format

    <?xml version="1.0" encoding="UTF-8"?>
    <ns1:MoleculeType xsd:type="ns1:MoleculeType"
        xmlns:ns1="http://nbcr.sdsc.edu/chemistry/types"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <moleculeName xsi:type="xsd:string"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        1kzk
      </moleculeName>
      <moleculeRadius xsi:type="xsd:double" xsi:nil="true"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
      <atom xsi:type="ns1:AtomType"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <fieldName xsi:type="ns1:FieldNameType">ATOM</fieldName>
        ...
      </atom>
      <atom xsi:type="ns1:AtomType" ...
      </atom>
      ...
    </ns1:MoleculeType>
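To make the "tree structured" property concrete, here is a minimal Python sketch that parses a small document and prints its element tree. The document in `DOC` is a hypothetical, complete stand-in for the elided molecule example (only the namespace URI and element names are taken from the slide), and `outline` is an illustrative helper, not part of any toolkit discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the molecule document; the real one is elided.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<ns1:MoleculeType xmlns:ns1="http://nbcr.sdsc.edu/chemistry/types">
  <moleculeName>1kzk</moleculeName>
  <atom><fieldName>ATOM</fieldName></atom>
  <atom><fieldName>ATOM</fieldName></atom>
</ns1:MoleculeType>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and its descendants,
    in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(DOC.encode("utf-8"))

# Print one tag per line, indented by its depth in the tree.
for depth, tag in outline(root):
    print("  " * depth + tag)
```

The namespace prefix `ns1` is expanded by the parser, so the root tag is reported in `{namespace}name` form.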
Motivation from SOAP

- Generalized RPC mechanism (supports other models, too)
- Broad industrial support
- Web Services on the Grid
    - OGSA: Open Grid Services Architecture
    - WSRF: Web Services Resource Framework
- At bottom, SOAP depends on XML

Importance of High Performance XML Processors

- Becoming standard for many scientific datasets
    - HapMap - mapping genes
    - Protein Sequencing
    - NASA astronomical data
    - Many more instances

Explosion of Data

- Enormous increase in data from sensors, satellites, experiments, and simulations
- Use of XML to store these data is also on the rise
- XML is in use in ways it was never really intended (GB and large size files)

Benchmark Motivation

- Scientific applications place a wide range of requirements on the communication substrate and data formats.
- Simple and straightforward implementations can have a severe performance impact.
Prevalence of Parallel Machines

- All new high end and mid range CPUs for desktop- and laptop-class computers have at least two cores
- The future of AMD and Intel performance lies in increases in the number of cores
- Despite extant SMP machines, many classes of software applications remain single threaded
    - Multi-threaded programming considered "hard"

XML and Multi-Core

- Most string parsing techniques rely on a serial scanning process
- Challenge: Existing (singly-threaded) XML parsers are already very efficient [Zhang et al 2006]

Contributions

- We propose techniques to modify the lexical analysis phase for processing large-scale XML datasets to leverage opportunities for parallelism. (Piximal)
- We present an analysis of the scalability that can be achieved with our proposed parallelization approach as the number of processing threads and size of XML-data is increased.
- We present an analysis on the usage of various states in the processing automaton to provide insights on why the performance varies for differently shaped input data files.

Contributions Continued

- We present the design and implementation of a comprehensive benchmark suite for XML and SOAP implementations with standard mechanisms to quantify, compare, and evaluate the performance of each toolkit and study the strengths and weaknesses for a wide range of use case scenarios.
- We present an analysis of pre-fetching and piped implementation techniques that aim to offset disk I/O costs while processing large-scale XML datasets on multi-core CPU architectures.
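The general idea behind pre-fetching and piped reading can be sketched as follows: a background thread reads the file ahead of the consumer and hands chunks over through a bounded queue, so that disk I/O overlaps with processing. This is a minimal Python illustration under stated assumptions, not the implementation analyzed in the thesis. The names `prefetch_chunks` and `count_tag_opens`, the chunk size, and the queue depth are all hypothetical, and counting `<` bytes merely stands in for real lexical analysis.

```python
import queue
import threading

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration
_DONE = object()        # sentinel that marks the end of the stream


def prefetch_chunks(path, depth=4, chunk_size=CHUNK_SIZE):
    """Yield the file's contents chunk by chunk while a background
    thread reads up to `depth` chunks ahead of the consumer."""
    pipe = queue.Queue(maxsize=depth)
    f = open(path, "rb")  # opened here so a missing file fails in the caller

    def reader():
        try:
            with f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    pipe.put(chunk)  # blocks when the pipe is full
        finally:
            pipe.put(_DONE)          # always release the consumer

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = pipe.get()
        if chunk is _DONE:
            return
        yield chunk


def count_tag_opens(path, **kwargs):
    """Stand-in for processing: count '<' bytes across all chunks."""
    return sum(chunk.count(b"<") for chunk in prefetch_chunks(path, **kwargs))
```

The bounded queue keeps memory use fixed at roughly `depth * chunk_size` bytes; whether the overlap pays off depends on how the read time compares with the per-chunk processing time.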
Publications

- "A Benchmark Suite for SOAP-based Communication in Grid Web Services," in The Proceedings of Supercomputing 2005
- "Benchmarking XML Processors for Applications in Grid Web Services," in The Proceedings of Supercomputing 2006
- "Approaching a Parallelized XML Parser Optimized for Multi-Core Processors," in The Proceedings of SOCP 2007, workshop held in conjunction with HPDC 2007
- "Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL," in The Proceedings of e-Science 2008
- "Performance Enhancement with Speculative Execution Based Parallelism for Processing Large-scale XML-based Application Data," to appear in The Proceedings of HPDC 2009

Thesis Statement

In this thesis we present a comprehensive benchmark suite that facilitates the study of the strengths and weaknesses of XML and SOAP toolkits for a wide range of use case scenarios.

We propose a parallel processing model for some application-based large-scale XML datasets that can effectively leverage opportunities for parallelism in emerging multi-core CPU architectures.

SOAP Benchmark Suite

- Defines a set of operations to implement within a SOAP toolkit
- Tests both serialization and deserialization of a variety of data structures over a range of input sizes
    - Simple types: integers, strings, and floats
    - Base64 encoded data
    - Complex types: event streams, mesh interface objects

XML Benchmark Suite

1 A chosen set of XML documents
    - Low level probes
    - Application-based benchmarks
2 A driver application for each XML processor
    - Runs the parser on the input, but does not act on the data
    - Eliminates application-level performance differences
    - One for each interface style (SAX/DOM)
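The driver idea (run the parser over the input without acting on the data, with one driver per interface style) can be sketched with Python's standard-library SAX and DOM parsers. This is an illustration only, not the benchmark suite's own driver code. The names `drive_sax`, `drive_dom`, `run_benchmark`, and `make_document`, and the generated input document, are hypothetical.

```python
import time
import xml.dom.minidom
import xml.sax


def drive_sax(data):
    """SAX style: parse with the default handler, whose callbacks do nothing."""
    xml.sax.parseString(data, xml.sax.ContentHandler())


def drive_dom(data):
    """DOM style: build the tree, then release it without inspecting it."""
    xml.dom.minidom.parseString(data).unlink()


DRIVERS = {"SAX": drive_sax, "DOM": drive_dom}


def run_benchmark(data, repeats=5):
    """Return the best wall-clock time in seconds for each driver."""
    results = {}
    for name, driver in DRIVERS.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            driver(data)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def make_document(n_atoms):
    """Build a hypothetical input document containing n_atoms <atom> elements."""
    body = "<atom><fieldName>ATOM</fieldName></atom>" * n_atoms
    return ("<molecule>" + body + "</molecule>").encode("utf-8")


if __name__ == "__main__":
    for name, seconds in run_benchmark(make_document(10_000)).items():
        print(f"{name}: {seconds:.4f} s")
```

Because neither driver does any application work, differences in the reported times reflect the parser and its interface style rather than what an application does with the data.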