


  1. An Incremental Correction Algorithm for XML Documents and Single Type Tree Grammars
     Martin Svoboda, Irena Mlýnková
     XML and Web Engineering Research Group, Charles University in Prague, The Czech Republic
     24 April 2012, NDT 2012, Dubai, United Arab Emirates

  2. Outline
     • Introduction
       ◦ Motivation
       ◦ Objectives
     • Approach
       ◦ Corrections
       ◦ Algorithms
       ◦ Experiments
     • Conclusion
     An Incremental Correction Algorithm for XML Documents, 24 April 2012, NDT 2012, Dubai, UAE

  3. Introduction
     • Motivation
       ◦ Incorrect XML documents
         ‒ Well-formedness
         ‒ Schema validity
         ‒ Data consistency
         ‒ …
       ◦ Strategies
         ‒ Adjusting algorithms
         ‒ Correcting data

  4. Introduction
     • Problem
       ◦ Input
         ‒ One XML document
           • Well-formed but (potentially) invalid
         ‒ DTD or XML Schema
       ◦ Output
         ‒ All minimal repairs
           • Structural corrections of elements

  5. Definitions
     • Document
       ◦ Trees
         ‒ Nodes for elements and texts
         ‒ Prefix numbering of nodes
       ◦ Example
           <a>
             <x><d/></x>
             <d><d/><d/></d>
           </a>
           Tree with prefix numbers: ε: a; 0: x, 1: d; 0.0: d, 1.0: d, 1.1: d
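A minimal Python sketch of this document model, offered as an illustration rather than the authors' implementation: it parses the example with the standard xml.etree.ElementTree module, keeps only element nodes (text nodes are omitted for brevity) and assigns prefix numbers. The helper names to_tree and prefix_numbering are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_tree(element):
    """Convert an ElementTree element into a (label, children) pair.

    Only element nodes are kept in this sketch; text nodes are ignored.
    """
    return (element.tag, [to_tree(child) for child in element])

def prefix_numbering(node, number=""):
    """Yield (prefix number, label) pairs in document order.

    The root is numbered ε, its children 0, 1, ..., and the i-th child of
    node u is numbered u.i (for example, 1.0 is the first child of node 1).
    """
    label, children = node
    yield (number or "ε", label)
    for i, child in enumerate(children):
        yield from prefix_numbering(child, f"{number}.{i}" if number else str(i))

document = to_tree(ET.fromstring("<a><x><d/></x><d><d/><d/></d></a>"))
for number, label in prefix_numbering(document):
    print(number, label)
```

The printed pairs follow the same numbering as the example tree: ε for a, 0 and 1 for its children, and 0.0, 1.0, 1.1 for the grandchildren.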

  6. Definitions
     • Schema
       ◦ Grammars
         ‒ Terminal symbols for element names
         ‒ Nonterminal symbols for types
         ‒ Production rules based on regular expressions
       ◦ Classes
         ‒ Regular tree grammars
         ‒ Single type tree grammars (XML Schema)
         ‒ Local tree grammars (DTD)
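A small sketch of how the three grammar classes differ, assuming the usual definition based on competing nonterminals (two distinct nonterminals whose production rules use the same terminal): a local tree grammar has no competing nonterminals at all, while a single type tree grammar only forbids them within one content model and among the start symbols. Each grammar is reduced to what this test needs, namely a nonterminal mapped to its terminal and the set of nonterminals its content model mentions. The function classify and the grammars SINGLE and REGULAR are hypothetical.

```python
from itertools import combinations

def competing(grammar, x, y):
    """Distinct nonterminals compete when their rules use the same terminal."""
    return x != y and grammar[x][0] == grammar[y][0]

def free_of_competition(grammar, group):
    return not any(competing(grammar, x, y) for x, y in combinations(group, 2))

def classify(grammar, start):
    """Return the most restrictive class the grammar belongs to.

    grammar maps a nonterminal to (terminal, nonterminals used in its content model).
    """
    if free_of_competition(grammar, grammar):  # iterating a dict yields its keys
        return "local tree grammar"
    groups = [start] + [used for _, used in grammar.values()]
    if all(free_of_competition(grammar, group) for group in groups):
        return "single type tree grammar"
    return "regular tree grammar"

# The grammar of the Model slide (start symbols A and B are an assumption).
SLIDE = {"A": ("a", {"C", "D"}), "B": ("b", {"D"}), "C": ("c", set()), "D": ("d", {"D"})}
# Hypothetical: two types for x, but they never meet in one content model.
SINGLE = {"R": ("r", {"X1", "Y"}), "Y": ("y", {"X2"}), "X1": ("x", set()), "X2": ("x", {"X2"})}
# Hypothetical: both types for x appear in the same content model.
REGULAR = {"R": ("r", {"X1", "X2"}), "X1": ("x", set()), "X2": ("x", {"X2"})}

print(classify(SLIDE, {"A", "B"}))  # local tree grammar
print(classify(SINGLE, {"R"}))      # single type tree grammar
print(classify(REGULAR, {"R"}))     # regular tree grammar
```

Since every local tree grammar is also single type, and every single type grammar is also regular, the function reports only the narrowest matching class.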

  7. Model
     • Edit operations
       ◦ ADD leaf, REMOVE leaf, RENAME label
     • Update operations
       ◦ Sequences of edit operations
       ◦ INSERT, DELETE, REPAIR, RENAME
     • Cost function
       ◦ Unit costs of edit operations
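One way to read this cost model, sketched under assumptions the slide does not state: INSERT and DELETE are taken to add or remove a whole subtree, which under unit costs means one ADD leaf or REMOVE leaf operation per node, and RENAME is a single edit operation. REPAIR is not modelled here. The operation tuples and helper names are hypothetical.

```python
# Unit costs of the three edit operations (assumed from "unit costs" on the slide).
UNIT_COST = {"ADD": 1, "REMOVE": 1, "RENAME": 1}

def child_number(parent, i):
    """Prefix number of the i-th child; the root (ε) is passed as an empty string."""
    return f"{parent}.{i}" if parent else str(i)

def insert_ops(node, number):
    """Assumed INSERT of a whole subtree as ADD-leaf operations, parents first."""
    label, children = node
    ops = [("ADD", number, label)]
    for i, child in enumerate(children):
        ops += insert_ops(child, child_number(number, i))
    return ops

def delete_ops(node, number):
    """Assumed DELETE of a whole subtree as REMOVE-leaf operations.

    Children are removed before their parent and the last child first, so each
    removed node is a leaf and the remaining prefix numbers stay valid.
    """
    label, children = node
    ops = []
    for i in reversed(range(len(children))):
        ops += delete_ops(children[i], child_number(number, i))
    ops.append(("REMOVE", number, label))
    return ops

def cost(ops):
    return sum(UNIT_COST[op[0]] for op in ops)

x_subtree = ("x", [("d", [])])
print(delete_ops(x_subtree, "0"))        # [('REMOVE', '0.0', 'd'), ('REMOVE', '0', 'x')]
print(cost(delete_ops(x_subtree, "0")))  # 2
print(cost([("RENAME", "0", "c")]))      # 1
```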

  8. Model
     • Schema
         Type   Name   Model
         A      a      C.D*
         B      b      D*
         C      c      empty
         D      d      D*
     • Document
         <a>
           <x><d/></x>
           <d><d/><d/></d>
         </a>
     [Figure: the document tree (ε: a; 0: x, 1: d; 0.0: d, 1.0: d, 1.1: d) and three repaired trees:
       ε: a; 0: c, 1: d; 1.0: d, 1.1: d
       ε: a; 0: c, 1: d, 2: d; 1.0: d, 2.0: d, 2.1: d
       ε: b; 0: d, 1: d; 0.0: d, 1.0: d, 1.1: d]
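The table and trees above can be checked with a minimal validator sketch. It assumes that both A and B may type the root (one repaired tree is rooted in b), and it relies on the single type property: the element name alone determines each child's type, so a tree can be typed top-down and every sibling sequence matched against its content model with Python's re module. The names GRAMMAR, START, valid and valid_document are hypothetical.

```python
import re

# Grammar from the table: type -> (element name, content model over types).
# In the slide notation "." is concatenation and "empty" is the empty model.
GRAMMAR = {
    "A": ("a", "C.D*"),
    "B": ("b", "D*"),
    "C": ("c", "empty"),
    "D": ("d", "D*"),
}
# Assumption: both A and B may type the root.
START = {"A", "B"}

def content_regex(model):
    """Translate the slide notation into a Python regex over one-letter type names."""
    return re.compile("" if model == "empty" else model.replace(".", ""))

def type_of(label, candidates):
    """Single type property: at most one candidate type carries a given element name."""
    matches = [t for t in candidates if GRAMMAR[t][0] == label]
    return matches[0] if len(matches) == 1 else None

def valid(node, type_name):
    """Check a (label, children) tree against type_name, top-down."""
    label, children = node
    name, model = GRAMMAR[type_name]
    if label != name:
        return False
    # Types mentioned in the model (one-letter names, so a substring test suffices).
    allowed = {t for t in GRAMMAR if model != "empty" and t in model}
    child_types = [type_of(child[0], allowed) for child in children]
    if None in child_types:
        return False
    if content_regex(model).fullmatch("".join(child_types)) is None:
        return False
    return all(valid(child, t) for child, t in zip(children, child_types))

def valid_document(root):
    root_type = type_of(root[0], START)
    return root_type is not None and valid(root, root_type)

DOC = ("a", [("x", [("d", [])]), ("d", [("d", []), ("d", [])])])
REPAIRS = [
    ("a", [("c", []), ("d", [("d", []), ("d", [])])]),
    ("a", [("c", []), ("d", [("d", [])]), ("d", [("d", []), ("d", [])])]),
    ("b", [("d", [("d", [])]), ("d", [("d", []), ("d", [])])]),
]
print(valid_document(DOC))                   # False: x is not allowed under a
print([valid_document(r) for r in REPAIRS])  # [True, True, True]
```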

  9. Algorithm
     • Naive algorithm
       ◦ Task
         ‒ At each level of top-down tree processing, find repairs for a sequence of sibling nodes
       ◦ Steps
         ‒ Construct a repairing multigraph
         ‒ Recursively repair subtrees
         ‒ Compose a repairing structure
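The naive steps can be sketched for the grammar of slide 8 under several assumptions that the slides do not spell out. Each content model is encoded as a hand-written automaton with start state 0. A multigraph vertex pairs a position in the sibling sequence with an automaton state. DELETE costs the size of the removed subtree, INSERT costs the size of the smallest valid subtree of the inserted type (precomputed by hand below), and REPAIR or RENAME (relabelling costs one) recursively repairs the child's own subtree. The sketch computes only the cost of a minimal repair, not the repairing structure, and all names are hypothetical.

```python
import heapq

# Assumed automata: type -> (element name, {(state, child type): state}, final states)
GRAMMAR = {
    "A": ("a", {(0, "C"): 1, (1, "D"): 2, (2, "D"): 2}, {1, 2}),  # C.D*
    "B": ("b", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
    "C": ("c", {}, {0}),                                          # empty
    "D": ("d", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
}
START = ("A", "B")                              # assumption: A and B may type the root
INSERT_COST = {"A": 2, "B": 1, "C": 1, "D": 1}  # smallest valid subtree, by hand

def size(node):
    return 1 + sum(size(child) for child in node[1])

def repair_cost(node, t):
    """REPAIR (same name) or RENAME (+1) the node to type t, then repair its children."""
    relabel = 0 if node[0] == GRAMMAR[t][0] else 1
    return relabel + sequence_cost(node[1], t)

def build_multigraph(children, t):
    """Steps 1 and 2: build every edge, computing nested costs eagerly by recursion."""
    _, delta, _ = GRAMMAR[t]
    states = {0} | set(delta.values())
    edges = []  # (from vertex, to vertex, cost); a vertex is (position, state)
    for i in range(len(children) + 1):
        for q in states:
            if i < len(children):
                edges.append(((i, q), (i + 1, q), size(children[i])))  # DELETE
            for (p, child_type), r in delta.items():
                if p == q:
                    edges.append(((i, q), (i, r), INSERT_COST[child_type]))  # INSERT
                    if i < len(children):  # REPAIR or RENAME
                        edges.append(((i, q), (i + 1, r), repair_cost(children[i], child_type)))
    return edges

def sequence_cost(children, t):
    """Shortest path from (0, 0) to (n, final state) with Dijkstra's algorithm."""
    adjacency = {}
    for u, v, c in build_multigraph(children, t):
        adjacency.setdefault(u, []).append((v, c))
    goals = {(len(children), q) for q in GRAMMAR[t][2]}
    dist, heap = {(0, 0): 0}, [(0, (0, 0))]
    while heap:
        d, u = heapq.heappop(heap)
        if u in goals:
            return d
        if d > dist[u]:
            continue
        for v, c in adjacency.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float("inf")

def document_cost(root):
    return min(repair_cost(root, t) for t in START)

DOC = ("a", [("x", [("d", [])]), ("d", [("d", []), ("d", [])])])
print(document_cost(DOC))  # 2
```

Every REPAIR or RENAME edge triggers a full recursive computation, even for vertices the shortest path never reaches and for identical subtrees met repeatedly; this is the waste the later variants address.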

  10. Algorithm
     [Figure: repairing multigraph for the sibling sequence x, d under the root a, with vertices 00 to 22 laid out in a 3 × 3 grid and edges labelled RENAME, DELETE, REPAIR and INSERT; the grammar table from slide 8 is shown alongside.]

  11. Algorithm
     [Figure: the repairing multigraph from slide 10 with INSERT, RENAME and REPAIR edges marked, together with the corresponding repaired tree ε: a; 0: c, 1: d, 2: d; 1.0: d, 2.0: d, 2.1: d.]
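Finding all minimal repairs means keeping every shortest path, not just one. The sketch below enumerates them for a multigraph whose edge list is an assumption derived by hand from the slide 8 grammar, covering the children x and d of the root a under type A. A vertex (i, q) pairs a sibling position with a state of an assumed automaton for C.D* (0, then 1 after c, then 2 after each d). DELETE costs the subtree size, INSERT of c or d costs 1, and REPAIR or RENAME costs the relabelling plus the nested repair (renaming x to c also deletes its child). The names EDGES and all_minimal_paths are hypothetical.

```python
import heapq
from collections import defaultdict

# Assumed root multigraph: (from vertex, to vertex, operation, cost).
EDGES = [
    # Column 0: before x
    ((0, 0), (0, 1), "INSERT c", 1), ((0, 1), (0, 2), "INSERT d", 1), ((0, 2), (0, 2), "INSERT d", 1),
    ((0, 0), (1, 0), "DELETE x", 2), ((0, 1), (1, 1), "DELETE x", 2), ((0, 2), (1, 2), "DELETE x", 2),
    ((0, 0), (1, 1), "RENAME x->c", 2),  # rename, then delete the child d
    ((0, 1), (1, 2), "RENAME x->d", 1), ((0, 2), (1, 2), "RENAME x->d", 1),
    # Column 1: before d (which has two d children)
    ((1, 0), (1, 1), "INSERT c", 1), ((1, 1), (1, 2), "INSERT d", 1), ((1, 2), (1, 2), "INSERT d", 1),
    ((1, 0), (2, 0), "DELETE d", 3), ((1, 1), (2, 1), "DELETE d", 3), ((1, 2), (2, 2), "DELETE d", 3),
    ((1, 0), (2, 1), "RENAME d->c", 3),  # rename, then delete both children
    ((1, 1), (2, 2), "REPAIR d", 0), ((1, 2), (2, 2), "REPAIR d", 0),
    # Column 2: after the last sibling
    ((2, 0), (2, 1), "INSERT c", 1), ((2, 1), (2, 2), "INSERT d", 1), ((2, 2), (2, 2), "INSERT d", 1),
]

def dijkstra(edges, source):
    adjacency = defaultdict(list)
    for u, v, _, c in edges:
        adjacency[u].append((v, c))
    dist, heap = {source: 0}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, c in adjacency[u]:
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist

def all_minimal_paths(edges, source, goals):
    """Return the minimal cost and every operation sequence reaching it.

    Assumes no zero-cost cycles, which holds here because INSERT costs are positive.
    """
    dist = dijkstra(edges, source)
    best = min(dist.get(g, float("inf")) for g in goals)
    # An edge is tight if it lies on some shortest path from the source.
    tight_into = defaultdict(list)
    for u, v, op, c in edges:
        if u != v and u in dist and v in dist and dist[u] + c == dist[v]:
            tight_into[v].append((u, op))

    def paths_to(v):
        if v == source:
            return [[]]
        return [path + [op] for u, op in tight_into[v] for path in paths_to(u)]

    return best, [p for g in sorted(goals) if dist.get(g) == best for p in paths_to(g)]

cost, paths = all_minimal_paths(EDGES, (0, 0), {(2, 1), (2, 2)})
print(cost)  # 2
for path in paths:
    print(path)
```

The two paths of cost 2 appear to match the first two repaired trees of slide 8; the third, rooted in b, would come from the multigraph for type B instead.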

  12. Algorithms
     • Naive
     • Dynamic
       ◦ Directly follows Dijkstra’s algorithm, so only the required parts of the multigraph are explored
     • Caching
       ◦ Avoids repeated recursive computations by detecting and caching identical repairs
     • Incremental
       ◦ Evaluates repairing multigraphs step by step
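The Dynamic and Caching ideas can be combined in one hedged sketch for the slide 8 grammar, using hand-written automata and hand-computed insert costs as assumptions. Edges leaving a vertex are generated only when Dijkstra's algorithm settles it, so the search stops as soon as a final vertex is reached. In addition, functools.lru_cache stores the repair cost of each (subtree, type) pair, so identical subtrees are repaired only once. Trees are nested tuples so they can serve as cache keys, and all names are hypothetical.

```python
import heapq
from functools import lru_cache

# Assumed automata: type -> (element name, {(state, child type): state}, final states)
GRAMMAR = {
    "A": ("a", {(0, "C"): 1, (1, "D"): 2, (2, "D"): 2}, {1, 2}),  # C.D*
    "B": ("b", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
    "C": ("c", {}, {0}),                                          # empty
    "D": ("d", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
}
START = ("A", "B")                              # assumption: A and B may type the root
INSERT_COST = {"A": 2, "B": 1, "C": 1, "D": 1}  # smallest valid subtree, by hand

def tree(label, *children):
    """Build a hashable (label, children) tuple so subtrees can be cache keys."""
    return (label, children)

@lru_cache(maxsize=None)
def size(node):
    return 1 + sum(size(child) for child in node[1])

@lru_cache(maxsize=None)
def repair_cost(node, t):
    """Cached: each distinct (subtree, type) problem is solved only once."""
    relabel = 0 if node[0] == GRAMMAR[t][0] else 1
    return relabel + sequence_cost(node[1], t)

def sequence_cost(children, t):
    _, delta, finals = GRAMMAR[t]
    n = len(children)
    dist, heap = {(0, 0): 0}, [(0, 0, 0)]  # (distance, position, state)
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist[(i, q)]:
            continue
        if i == n and q in finals:
            return d  # vertices farther away are never expanded
        # Edges are generated only now, when vertex (i, q) is settled.
        successors = []
        if i < n:
            successors.append((i + 1, q, size(children[i])))  # DELETE
        for (p, child_type), r in delta.items():
            if p == q:
                successors.append((i, r, INSERT_COST[child_type]))  # INSERT
                if i < n:  # REPAIR or RENAME
                    successors.append((i + 1, r, repair_cost(children[i], child_type)))
        for j, r, c in successors:
            if d + c < dist.get((j, r), float("inf")):
                dist[(j, r)] = d + c
                heapq.heappush(heap, (d + c, j, r))
    return float("inf")

def document_cost(root):
    return min(repair_cost(root, t) for t in START)

DOC = tree("a", tree("x", tree("d")), tree("d", tree("d"), tree("d")))
print(document_cost(DOC))  # 2
print(repair_cost.cache_info())
```

The hits reported by cache_info() are nested repair problems answered from the cache instead of being recomputed.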

  13. Algorithms
     • Incremental
       ◦ Task
         ‒ Structure encapsulating multigraph evaluation
           • Multigraph structure
           • Dijkstra’s variables
       ◦ Scheduler
         ‒ Processing of an activated task:
           • Request further refinement of perspective edges
           • Activate corresponding tasks for nested problems
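A much simplified sketch of the incremental idea, not the authors' scheduler. Each Task wraps Dijkstra's variables for one (subtree, type) problem and advances one heap entry per step. A REPAIR or RENAME edge enters the heap with only a lower bound taken from its nested task. When such an edge reaches the top, the task asks the nested task for one more step and re-queues the edge with the refined bound, so nested problems are evaluated only as far as the outer search needs. The grammar is encoded as hand-written automata for the slide 8 models and insert costs are precomputed by hand; these, like all names, are assumptions.

```python
import heapq
from itertools import count

# Assumed automata: type -> (element name, {(state, child type): state}, final states)
GRAMMAR = {
    "A": ("a", {(0, "C"): 1, (1, "D"): 2, (2, "D"): 2}, {1, 2}),  # C.D*
    "B": ("b", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
    "C": ("c", {}, {0}),                                          # empty
    "D": ("d", {(0, "D"): 1, (1, "D"): 1}, {0, 1}),               # D*
}
START = ("A", "B")                              # assumption: A and B may type the root
INSERT_COST = {"A": 2, "B": 1, "C": 1, "D": 1}  # smallest valid subtree, by hand
INF = float("inf")
tie = count()  # tie-breaker so heap entries never compare Task objects

def tree(label, *children):
    return (label, children)

def size(node):
    return 1 + sum(size(child) for child in node[1])

class Task:
    """Dijkstra's variables for one (subtree, type) problem, advanced one entry per step."""

    def __init__(self, node, t, tasks):
        self.children, self.t, self.tasks = node[1], t, tasks
        self.relabel = 0 if node[0] == GRAMMAR[t][0] else 1
        self.settled = set()
        # Entries: (bound, tie, distance of source vertex, target vertex, nested task or None)
        self.heap = [(0, next(tie), 0, (0, 0), None)]
        self.result = None  # exact repair cost once the task has finished

    def lower_bound(self):
        if self.result is not None:
            return self.result
        return self.relabel + (self.heap[0][0] if self.heap else INF)

    def push(self, bound, base, vertex, nested):
        heapq.heappush(self.heap, (bound, next(tie), base, vertex, nested))

    def step(self):
        if not self.heap or self.heap[0][0] == INF:
            self.result = INF  # no repair exists under this type
            return
        bound, _, base, vertex, nested = heapq.heappop(self.heap)
        if vertex in self.settled:
            return  # stale entry
        if nested is not None:
            # Edge cost is still only a lower bound: refine the nested problem by one step.
            if nested.result is None:
                nested.step()
            exact = nested.result is not None
            self.push(base + nested.lower_bound(), base, vertex, None if exact else nested)
            return
        self.settled.add(vertex)
        i, q = vertex
        _, delta, finals = GRAMMAR[self.t]
        if i == len(self.children) and q in finals:
            self.result = self.relabel + bound
            return
        if i < len(self.children):
            self.push(bound + size(self.children[i]), bound, (i + 1, q), None)  # DELETE
        for (p, child_type), r in delta.items():
            if p != q:
                continue
            self.push(bound + INSERT_COST[child_type], bound, (i, r), None)  # INSERT
            if i < len(self.children):
                # REPAIR / RENAME: activate (or reuse) the task for the nested problem.
                key = (self.children[i], child_type)
                if key not in self.tasks:
                    self.tasks[key] = Task(self.children[i], child_type, self.tasks)
                nested = self.tasks[key]
                self.push(bound + nested.lower_bound(), bound, (i + 1, r), nested)

def document_cost(root):
    tasks = {}
    roots = [Task(root, t, tasks) for t in START]
    while True:
        # Scheduler: refine the root task whose lower bound is currently smallest.
        best = min(roots, key=Task.lower_bound)
        if best.result is not None:
            return best.result
        best.step()

DOC = tree("a", tree("x", tree("d")), tree("d", tree("d"), tree("d")))
print(document_cost(DOC))  # 2
```

Identical (subtree, type) problems share one task through the tasks dictionary, which plays the role of the cache, and the scheduler explores the alternative root types incrementally as well.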

  14. Experiments
     • Data
       ◦ Single type tree grammar
         ‒ 7 nonterminal symbols
         ‒ 6 terminal symbols
         ‒ Recursion, iteration
       ◦ XML data trees
         ‒ Maximal depth 5, fan-out 8
         ‒ From 100 to 1,000 elements
         ‒ 20 files for each size
         ‒ Values averaged over 20 repeated runs

  15. Experiments
     • Execution time in milliseconds
     [Figure: execution time (ms, 0 to 40) versus number of elements (0 to 1,000) for the Incremental and Caching algorithms.]

  16. Experiments
     • Number of correction intents
       ◦ Equal to the number of distinct multigraphs
     [Figure: number of correction intents (0 to 4,000) versus number of elements (0 to 1,000) for the Caching and Incremental algorithms.]

  17. Conclusion
     • Contributions
       ◦ Single type tree grammars
       ◦ Always all minimal repairs
       ◦ New incremental algorithm
     • Advantages
       ◦ Compact repair structure
       ◦ Prototype implementation

  18. Thank you for your attention…
     XML and Web Engineering Research Group
     Charles University in Prague
