Refinement Correction Strategy for Invalid XML Documents and Regular Tree Grammars


  1. Refinement Correction Strategy for Invalid XML Documents and Regular Tree Grammars
     Martin Svoboda and Irena Holubova (Mlynkova)
     svoboda@ksi.mff.cuni.cz
     DEXA 2014, Munich, Germany, September 2, 2014
     XML and Web Engineering Research Group, Charles University in Prague

  2. Outline
     • Introduction
     • Corrections
     • Algorithms
     • Experiments
     • Conclusion
     Refinement Correction Strategy for XML Documents, September 2, 2014, DEXA 2014, Munich

  3. Introduction
     • Motivation
       ◦ Incorrect XML data
         ‒ Well-formedness, schema validity, data consistency
     • Input
       ◦ One XML document
         ‒ Well-formed but (potentially) invalid
       ◦ DTD or XSD schema
     • Goal
       ◦ Structural corrections of elements

  4. Sample Correction
     • Document
       <a> <x><c/></x> <d><c/></d> <d><c/><a/></d> </a>
     • Grammar
       [a, C.D_A* → A]
       [b, D_B* → B]
       [c, ε → C]
       [d, C* → D_A]
       [d, A|B|C → D_B]
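To make the notion of validity against such a grammar concrete, here is a minimal bottom-up check in Python. It is a sketch under stated assumptions, not the authors' implementation: each production is read as [element label, content model → nonterminal], A is assumed to be the start nonterminal (the slide does not say), D_A and D_B are written as DA and DB, and the names RULES, nonterminals and is_valid are made up for this example.

```python
import re
from itertools import product
import xml.etree.ElementTree as ET

# Productions [label, content model -> nonterminal] as read from the slide.
# Content models are regexes over space-terminated nonterminal names;
# "DA" and "DB" stand for D_A and D_B (an encoding chosen for this sketch).
RULES = [
    ("a", r"C (DA )*", "A"),
    ("b", r"(DB )*", "B"),
    ("c", r"", "C"),
    ("d", r"(C )*", "DA"),
    ("d", r"(A |B |C )", "DB"),
]

def nonterminals(node):
    """Return the set of nonterminals that can derive the subtree at node."""
    child_sets = [nonterminals(child) for child in node]
    found = set()
    for label, model, nonterminal in RULES:
        if label != node.tag:
            continue
        # Try every choice of one nonterminal per child. This is exponential
        # and only acceptable for tiny examples like this one.
        for combo in product(*child_sets):
            word = "".join(name + " " for name in combo)
            if re.fullmatch(model, word):
                found.add(nonterminal)
                break
    return found

def is_valid(xml_text, start="A"):
    """A document is valid if its root can be derived from the start symbol."""
    return start in nonterminals(ET.fromstring(xml_text))

INVALID = "<a><x><c/></x><d><c/></d><d><c/><a/></d></a>"
CORRECTED = "<a><c/><d><c/></d><d><c/><c/></d></a>"
print(is_valid(INVALID), is_valid(CORRECTED))
```

Under these assumptions the sample document is rejected, because no production carries the label x and the nested a has no C child, while the variant with x and the nested a renamed to c and the leaf under x removed is accepted.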

  5. Edit Operations
     • Edit operations
       ◦ Add leaf node
       ◦ Remove leaf node
       ◦ Rename node
     • Edit sequences
       ◦ Insert new subtree
       ◦ Delete existing subtree
       ◦ Repair existing subtree
         ‒ With an option of node renaming

  6. Edit Operations
     • Example
       renameNode(0,c), removeLeaf(0.0), renameNode(2.1,c)
     • Cost
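The three edit operations and the example sequence can be tried out with a short Python sketch. It assumes that positions such as 2.1 are dot-separated child indices counted from the root, and that each operation is applied to the tree as left by the previous one; the slides do not spell out either point. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

def locate(root, path):
    """Follow a dot-separated child-index path such as "2.1" from the root."""
    node = root
    for index in path.split("."):
        node = node[int(index)]
    return node

def split_parent(root, path):
    """Return (parent element, child index) for the node addressed by path."""
    parent_path, _, index = path.rpartition(".")
    parent = locate(root, parent_path) if parent_path else root
    return parent, int(index)

def rename_node(root, path, label):
    locate(root, path).tag = label

def remove_leaf(root, path):
    parent, index = split_parent(root, path)
    leaf = parent[index]
    if len(leaf) != 0:
        raise ValueError("removeLeaf applies to leaf nodes only")
    parent.remove(leaf)

def add_leaf(root, path, label):
    parent, index = split_parent(root, path)
    parent.insert(index, ET.Element(label))

# The sample document and the edit sequence from the slide.
doc = ET.fromstring("<a><x><c/></x><d><c/></d><d><c/><a/></d></a>")
rename_node(doc, "0", "c")
remove_leaf(doc, "0.0")
rename_node(doc, "2.1", "c")
print(ET.tostring(doc, encoding="unicode"))
```

After the three operations the root has the children c, d, d, the first child is a leaf, and the last d contains two c elements.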

  7. Algorithm Idea
     • Recursive processing
       ◦ From the root node towards leaf nodes…
       ◦ … and at each particular data tree node…
       ◦ … correct a sequence of its child nodes
     • Example
       C.D_A*

  8. Horizontal Correction
     • Automaton traversal
       ◦ Start
         ‒ Before the entire node sequence
         ‒ At the initial automaton state
       ◦ Step
         ‒ Before some particular node (if any)
         ‒ At some particular automaton state
       ◦ End
         ‒ After the entire node sequence
         ‒ At one of the accepting states

  9. Correction Multigraphs
     • Structure
       ◦ Vertices
       ◦ Edges

  10. Shortest Paths
      • Paths
        ◦ Source
        ◦ Targets
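The horizontal correction, its graph and the shortest-path search can be sketched together in Python. This is a simplified illustration: vertices are (position in the child sequence, automaton state) pairs as in the automaton traversal, edges stand for insert, delete and keep-or-rename steps, and every edit costs 1. In the approach described in the talk, edge costs would come from corrections of nested subtrees, and parallel edges are kept in a multigraph; neither is modelled here. The function name and the unit costs are assumptions.

```python
import heapq

def correct_sequence(symbols, transitions, initial, accepting):
    """Cheapest cost of editing `symbols` into a word the automaton accepts.

    transitions maps a state to a list of (symbol, next_state) pairs.
    Returns None if no accepting state can be reached.
    """
    n = len(symbols)
    source = (0, initial)              # before the whole sequence, initial state
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, (pos, state) = heapq.heappop(queue)
        if cost > dist[(pos, state)]:
            continue                   # stale queue entry
        if pos == n and state in accepting:
            return cost                # after the whole sequence, accepting state
        edges = []
        for symbol, nxt in transitions.get(state, []):
            edges.append(((pos, nxt), 1))              # insert a new node
            if pos < n:
                step = 0 if symbols[pos] == symbol else 1
                edges.append(((pos + 1, nxt), step))   # keep or rename a node
        if pos < n:
            edges.append(((pos + 1, state), 1))        # delete a node
        for target, weight in edges:
            new_cost = cost + weight
            if new_cost < dist.get(target, float("inf")):
                dist[target] = new_cost
                heapq.heappush(queue, (new_cost, target))
    return None

# Automaton for the content model C.D_A* (D_A written as "DA").
C_DA_STAR = {0: [("C", 1)], 1: [("DA", 1)]}
print(correct_sequence(["X", "DA", "DA"], C_DA_STAR, 0, {1}))
```

Because all edge weights are non-negative, the first target vertex taken from the queue already carries a minimal cost, which is why the search can stop there.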

  11. Intent Repair
      • Structure

  12. Intent Signatures
      • Observation
        ◦ Different intents may lead to identical repairs
          ‒ We do not need to evaluate them repeatedly
      • Solution
        ◦ Intent signatures
        ◦ Repairs caching
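The caching idea can be illustrated with a small memoization sketch in Python. The slides do not define what a signature contains, so the fields below (an action, the positions of the affected nodes, and a content model) are purely hypothetical, as are the class names. The point is only that intents with equal signatures share one evaluated repair.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    """Hypothetical key under which equivalent intents share one repair."""
    action: str      # e.g. "insert" or "repair" (assumed values)
    nodes: tuple     # positions of the nodes to be corrected
    model: str       # content model the result must satisfy

class RepairCache:
    """Evaluates each distinct signature once and reuses the result."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # function computing a repair for a signature
        self._repairs = {}
        self.evaluations = 0        # how many real evaluations took place

    def repair(self, signature):
        if signature not in self._repairs:
            self.evaluations += 1
            self._repairs[signature] = self._evaluate(signature)
        return self._repairs[signature]

# Stand-in evaluation function; a real one would compute the nested repair.
cache = RepairCache(lambda signature: len(signature.nodes))
first = cache.repair(Signature("repair", ("0", "1"), "C.D_A*"))
second = cache.repair(Signature("repair", ("0", "1"), "C.D_A*"))
print(first, second, cache.evaluations)
```

A frozen dataclass is used so that signatures are hashable and compare by value, which lets two separately constructed but equal signatures hit the same cache entry.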

  13. Correction Strategies
      • Strategies
        ◦ Default
        ◦ Exploring
        ◦ Refinement

  14. Refinement Strategy
      • Observation
        ◦ Until now we always worked with…
          ‒ … fully evaluated nested intents
          ‒ … and therefore their final costs
      • Idea
        ◦ Refinement exploration based on estimations

  15-27. Refinement Strategy [figure-only slides; no text content]

  28. Refinement Strategy
      • Exploration loop
        ◦ Complete vertex
          ‒ Explore outgoing edges
          ‒ Obtain first cost estimations
          ‒ Update current distances
        ◦ Incomplete vertex
          ‒ Request refinement of open perspective ingoing edges
            • Assign a quota to limit the allowed refinement progress
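One way to picture such a loop is a shortest-path search in which edge costs start as estimates and are refined only on demand. The Python sketch below is an interpretation rather than the authors' algorithm. It assumes that every estimate is a lower bound on the final cost, that estimates never decrease, and that the quota is the number of refinement steps granted per request. The names Edge, refine and shortest_distance are invented for the example.

```python
import heapq
from itertools import count

class Edge:
    """Edge whose cost is revealed as a series of non-decreasing lower bounds."""

    def __init__(self, target, estimates):
        self.target = target
        self._pending = list(estimates)   # the last value is the final cost
        self.cost = self._pending.pop(0)  # first estimation

    @property
    def final(self):
        return not self._pending

    def refine(self, quota=1):
        """Advance by at most `quota` refinement steps."""
        for _ in range(quota):
            if self._pending:
                self.cost = self._pending.pop(0)

def shortest_distance(graph, source, targets, quota=1):
    """Return (distance to the nearest target, number of refinement requests)."""
    tie = count()                 # tie-breaker so heap entries never compare edges
    settled = {}
    refinements = 0
    queue = [(0, next(tie), source, None)]
    while queue:
        dist, _, vertex, edge = heapq.heappop(queue)
        if vertex in settled:
            continue
        if edge is not None and not edge.final:
            # Incomplete: the distance rests on an estimate, so refine the
            # ingoing edge within the quota and requeue with the new bound.
            base = dist - edge.cost
            edge.refine(quota)
            refinements += 1
            heapq.heappush(queue, (base + edge.cost, next(tie), vertex, edge))
            continue
        # Complete: the distance is final, so explore the outgoing edges
        # using their first cost estimations.
        settled[vertex] = dist
        if vertex in targets:
            return dist, refinements
        for out in graph.get(vertex, []):
            heapq.heappush(queue, (dist + out.cost, next(tie), out.target, out))
    return None, refinements

slow = Edge("t", [1, 9, 20])      # looks cheap at first, turns out expensive
graph = {"s": [slow, Edge("t", [2])]}
distance, refinements = shortest_distance(graph, "s", {"t"})
print(distance, refinements, slow.final)
```

In this toy graph the expensive edge is refined only once before the cheaper alternative settles the target, so its final cost is never computed. That saving is the intuition behind working with estimations instead of fully evaluated costs.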

  29. Execution times
      • Refinement strategy
        [Plot: time in seconds (0 to 4) against number of nodes (10k to 100k)]

  30. Conclusion
      • Features
        ◦ Regular tree grammars
        ◦ Compact repair structure
        ◦ All minimal corrections
        ◦ No parameters required
        ◦ Nearly linear algorithms

  31. Thank you for your attention…
      Faculty of Mathematics and Physics, Charles University in Prague