What do you do if your data fail your specification? Repair your data.
The data satisfy the restriction specification but not the target specification (data ∈ Restriction, data ∉ Target).
Different ways of repairing data: streaming and off-line.
Can we streaming-repair each XML document with a uniform number of edits?
Streaming bounded repair problem
Definition (informal): Given XML specifications R (restriction) and T (target), determine whether there exist a streaming repair process S ∶ L(R) → L(T) and a uniform bound N ∈ ℕ such that cost(t, S) ≤ N for all XML documents t ⊧ R.
Streaming bounded repair problem: example
R: r → d ⋅ c∗, d → a∗ ⋅ b∗
T: r → a∗ ⋅ e, e → b∗ ⋅ c∗
input: <r> <d> <a/> <a/> <b/> <b/> </d> <c/> <c/> </r>
output: <r> <a/> <a/> <a/> <e> <b/> <b/> <c/> <c/> <c/> </e> </r>
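The membership claims in this example (the input is a document of R, the output a document of T) can be checked mechanically. The following is a minimal sketch, assuming the rules read R: r → d ⋅ c∗, d → a∗ ⋅ b∗ and T: r → a∗ ⋅ e, e → b∗ ⋅ c∗; the dictionaries `R` and `T` and the helper `conforms` are hypothetical names introduced for illustration, and the regular expressions work only because every label is a single letter.

```python
import re
import xml.etree.ElementTree as ET

# Child-word rules transcribed from the example (an assumption of this sketch).
# Labels are single letters, so a regex over the concatenated child labels
# can stand in for each content model. Leaves have the empty content model.
R = {"r": "dc*", "d": "a*b*", "a": "", "b": "", "c": ""}
T = {"r": "a*e", "e": "b*c*", "a": "", "b": "", "c": ""}

def conforms(elem, dtd, root="r"):
    """Return True if elem is rooted at `root` and every node obeys its rule."""
    def check(node):
        if node.tag not in dtd:
            return False
        word = "".join(child.tag for child in node)
        if re.fullmatch(dtd[node.tag], word) is None:
            return False
        return all(check(child) for child in node)
    return elem.tag == root and check(elem)

doc_in = ET.fromstring("<r> <d> <a/> <a/> <b/> <b/> </d> <c/> <c/> </r>")
doc_out = ET.fromstring(
    "<r> <a/> <a/> <a/> <e> <b/> <b/> <c/> <c/> <c/> </e> </r>"
)

print(conforms(doc_in, R), conforms(doc_out, T), conforms(doc_in, T))
```

The input document is accepted by R and rejected by T, which is why a repair is needed at all; the check says nothing about how many edits the repair costs.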
Streaming bounded repair problem: example
R2: r → (a + b) ⋅ x∗ ⋅ (a∗ + b∗)
T2: r → a ⋅ x∗ ⋅ a∗ + b∗ ⋅ x∗ ⋅ b∗
input: <r> <a/> <x/> <x/> <x/> <a/> <a/> <a/> <a/> </r>
output: <r> <b/> <x/> <x/> <x/> <b/> <b/> <b/> <b/> </r>
Summary of main results in the paper
Effective characterization for the streaming bounded repair problem.
▸ For DTDs and XML Schemas (deterministic top-down tree automata).
▸ Based on a stack game between two players.
Precise complexity of the streaming bounded repair problem.
▸ EXPTIME-complete.
▸ An exponential gap between the word and tree case.
Which DTDs are streaming bounded repairable?
Cristian Riveros (University of Oxford), Pierre Bourhis (University of Oxford), Gabriele Puppis (CNRS/LaBRI Bordeaux)
ICDT 2013
Outline
▸ Setting
▸ Streaming problem
▸ Main characterization
▸ Complexity
Trees and their XML encoding
Unranked trees over Σ, for example t = person(name(Chris), address(str(Road), num(369))).
XML encoding t̂ of t: <person> <name> Chris </name> <address> <str> Road </str> <num> 369 </num> </address> </person>
XML specification A (e.g. XML Schema or unranked tree automata):
L(A) = { t ∈ Trees ∣ t ⊧ A }
Docs(A) = { t̂ ∈ XML ∣ t ⊧ A }
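The passage from a tree t to its XML encoding t̂ is a preorder traversal that emits an opening tag on the way down and a closing tag on the way up. A minimal sketch, assuming trees carry only labels (the text content Chris, Road, 369 is dropped); `Node` and `xml_encode` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """An unranked tree node: a label and any number of ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)

def xml_encode(t: Node) -> List[str]:
    """Return the XML encoding of t as a sequence of opening/closing tags."""
    out = [f"<{t.label}>"]
    for child in t.children:
        out.extend(xml_encode(child))  # children are visited left to right
    out.append(f"</{t.label}>")
    return out

# Structure of the person example, without text content.
person = Node("person", [
    Node("name"),
    Node("address", [Node("str"), Node("num")]),
])
encoding = xml_encode(person)
print(" ".join(encoding))
```

A streaming repair strategy reads exactly this tag sequence, one tag at a time, which is why the rest of the talk works with Docs(A) rather than L(A).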
Streaming transducers for repairing XML documents
A repair strategy is a function f ∶ L(R) → L(T).
A streaming repair strategy is a function S ∶ Docs(R) → Docs(T):
▸ S is specified by a sequential transducer.
▸ S could have infinite memory.
Cost of a streaming repair strategy S over t̂ = a_1 … a_n:
\mathrm{cost}(\hat{t}, S) = \sum_{i=1}^{n} \mathrm{dist}(a_i, u_i)
where u_i is the output of S after reading a_i.
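The cost of a run can be computed directly from the input symbols and the output chunks. The sketch below is illustrative only: it assumes dist is the Levenshtein distance with unit-cost insertion, deletion and substitution (the slides do not fix the edit operations), treats a self-closing tag such as <a/> as a single symbol for brevity, and uses a hypothetical run on a small document of R: r → d ⋅ c∗, d → a∗ ⋅ b∗ repaired into T: r → a∗ ⋅ e, e → b∗ ⋅ c∗.

```python
from typing import List, Sequence

def edit_distance(u: Sequence[str], v: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (assumed dist)."""
    prev = list(range(len(v) + 1))
    for i, x in enumerate(u, start=1):
        curr = [i]
        for j, y in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # keep or substitute
            ))
        prev = curr
    return prev[-1]

def cost(inputs: List[str], outputs: List[List[str]]) -> int:
    """Sum over i of dist(a_i, u_i), with u_i the chunk emitted on a_i."""
    if len(inputs) != len(outputs):
        raise ValueError("one output chunk is required per input symbol")
    return sum(edit_distance([a], u) for a, u in zip(inputs, outputs))

# Hypothetical run: drop <d> and </d>, open <e> before the first <b/>,
# and close </e> just before </r>.
inputs = ["<r>", "<d>", "<a/>", "<b/>", "</d>", "<c/>", "</r>"]
outputs = [["<r>"], [], ["<a/>"], ["<e>", "<b/>"], [], ["<c/>"], ["</e>", "</r>"]]
print(cost(inputs, outputs))
```

In this run the four edits (two deletions, two insertions) do not depend on how many a, b or c elements the document contains, which is the kind of uniform bound the problem asks for.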
Streaming bounded repair problem
Definition: Given XML specifications R and T, determine whether there exist a streaming repair strategy S ∶ Docs(R) → Docs(T) and a uniform bound N ∈ ℕ such that cost(t̂, S) ≤ N for all t̂ ∈ Docs(R).
We have studied this problem over words and (non-streaming) trees
1. “Regular repair of specifications”, in LICS 2011.
2. “Bounded repairability for regular tree languages”, in ICDT 2012.
Main idea of the previous papers: [Figure: a tree t of the restriction and its repair t′ in the target.]
A similar approach does NOT work for the streaming case in general!
Deterministic top-down tree automata
Definition: A deterministic top-down tree automaton (DTT-automaton) is a tuple A = (Σ, Q, δ, q0, F), where δ ∶ Q × Σ → Q × Q is the transition function, q0 is the initial state, and F ⊆ Q is the set of final states.
DTT-automata run over the first-child-next-sibling encoding.
Example, for R: r → c b∗, c → a∗:
δ(q0, r) = (qc, qf)
δ(qc, c) = (qa, qb)
δ(qa, a) = (qf, qa)
δ(qb, b) = (qf, qb)
[Figure: a tree of R and its first-child-next-sibling encoding, annotated with the states of the run.]
DTT-automata are more expressive than DTDs or XML Schema.
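A run of such an automaton can be sketched as a recursion over forests: on a node labelled a in state q with δ(q, a) = (q1, q2), the first child continues in q1 and the next sibling in q2. The sketch below uses the four transitions of the example for R: r → c b∗, c → a∗. Two things are assumptions not stated on the slide: that a missing child or sibling must be reached in a final state, and that F = {qf, qa, qb}, chosen so that a∗ and b∗ may stop at any point. The function name `accepts` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tree = Tuple[str, list]  # (label, list of child trees)

# Transitions of the example automaton.
delta: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("q0", "r"): ("qc", "qf"),
    ("qc", "c"): ("qa", "qb"),
    ("qa", "a"): ("qf", "qa"),
    ("qb", "b"): ("qf", "qb"),
}
final: Set[str] = {"qf", "qa", "qb"}  # assumed set of final states

def accepts(delta, q0: str, final: Set[str], tree: Tree) -> bool:
    """Run the automaton top-down over the first-child-next-sibling view."""
    def run(state: str, forest: List[Tree]) -> bool:
        if not forest:
            # No first child / no next sibling: the state must be final.
            return state in final
        (label, children), siblings = forest[0], forest[1:]
        if (state, label) not in delta:
            return False
        q_child, q_sibling = delta[(state, label)]
        return run(q_child, children) and run(q_sibling, siblings)
    return run(q0, [tree])

good = ("r", [("c", [("a", []), ("a", [])]), ("b", []), ("b", [])])
bad = ("r", [("b", [])])  # r must start with a c child
print(accepts(delta, "q0", final, good), accepts(delta, "q0", final, bad))
```

Passing the remaining siblings as a list is the same as following next-sibling pointers, so no explicit binary encoding of the tree is needed.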
Main ideas of the characterization
[Figure: trees t of the restriction R matched against the target T.]
1. Transition graph of R and T.
2. Cyclic behavior: strongly connected components.
3. Stack game between Generator and Repairer.
▸ Following the preorder traversal of the graph (stacks are needed).
Cyclic behavior of DTT-automata (components)
Definition: Given A = (Σ, Q, δ, q0, F), the transition graph of A is the graph G_A = (Q, E_h ∪ E_v) such that for every δ(q, a) = (q1, q2): (q, q1) ∈ E_v and (q, q2) ∈ E_h.
SCC(A) is the set of strongly connected components X of G_A.
L(A∣X) = { C ∈ Context_Σ ∣ ∃ p, q ∈ X ∶ δ(p, C) = q }, where δ(p, C) = q iff the run of A on the context C that starts in state p at the root reaches the hole ● in state q.
[Figure: a context C with state p at the root and state q at the hole ●.]
If L(A∣X1) ⊆ L(A∣X2), then the cyclic behavior of X1 is contained in the cyclic behavior of X2.
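The transition graph and its strongly connected components can be computed directly from δ. The sketch below is a simple quadratic version based on mutual reachability, not an optimized algorithm. It assumes that the first component of δ(q, a) contributes the vertical edge (first child) and the second the horizontal edge (next sibling), which is how the figure appears to read; `transition_graph` and `sccs` are hypothetical names.

```python
from typing import Dict, FrozenSet, Set, Tuple

Delta = Dict[Tuple[str, str], Tuple[str, str]]

def transition_graph(delta: Delta):
    """Return (states, E_v, E_h) for the transition function delta."""
    states: Set[str] = set()
    e_v: Set[Tuple[str, str]] = set()
    e_h: Set[Tuple[str, str]] = set()
    for (q, _label), (q1, q2) in delta.items():
        states.update((q, q1, q2))
        e_v.add((q, q1))  # assumed: first component = first child
        e_h.add((q, q2))  # assumed: second component = next sibling
    return states, e_v, e_h

def sccs(delta: Delta) -> Set[FrozenSet[str]]:
    """Strongly connected components of G_A = (Q, E_h ∪ E_v)."""
    states, e_v, e_h = transition_graph(delta)
    succ = {q: set() for q in states}
    for p, q in e_v | e_h:
        succ[p].add(q)

    def reach(start: str) -> Set[str]:
        seen, todo = {start}, [start]
        while todo:
            for nxt in succ[todo.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    r = {q: reach(q) for q in states}
    # p and q share a component iff each reaches the other.
    return {frozenset(p for p in states if p in r[q] and q in r[p])
            for q in states}

example: Delta = {
    ("q0", "r"): ("qc", "qf"),
    ("qc", "c"): ("qa", "qb"),
    ("qa", "a"): ("qf", "qa"),
    ("qb", "b"): ("qf", "qb"),
}
loop: Delta = {("p", "a"): ("q", "f"), ("q", "a"): ("p", "f")}
print(sorted(sorted(x) for x in sccs(example)))
print(sorted(sorted(x) for x in sccs(loop)))
```

In the example automaton every component is a single state; qa and qb carry self-loops (the iterations a∗ and b∗), while the hypothetical `loop` automaton has a two-state component {p, q}.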
Stacks over strongly connected components (prefix rewriting systems)
Stack alphabets: SCC(R) and SCC(T).
Rules of the form:
▸ push: X ↦ X1 X2, rewriting X ⋅ w ⇒ X1 ⋅ X2 ⋅ w
▸ pop: X ↦ ε, rewriting X ⋅ w ⇒ w
Two prefix-rewriting systems, Stack(R) and Stack∗(T):
▸ X ↦ X1 X2 ∈ Stack(R) iff ∃ p ∈ X, p1 ∈ X1, p2 ∈ X2 with δ(p, a) = (p1, p2), and X1 ≠ X ∧ X2 ≠ X
▸ X ↦ ε ∈ Stack(R) always
▸ Y ↦ Y1 Y2 ∈ Stack∗(T) iff ∃ q ∈ Y, q1 ∈ Y1, q2 ∈ Y2 with δ′(q, a) = (q1, q2)
▸ Y ↦ ε ∈ Stack∗(T) always
where X, X1, X2 ∈ SCC(R) and Y, Y1, Y2 ∈ SCC(T).
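The push rules of both systems can be read off the transitions once each state is mapped to its component. The sketch below derives them for the example automaton with transitions δ(q0, r) = (qc, qf), δ(qc, c) = (qa, qb), δ(qa, a) = (qf, qa), δ(qb, b) = (qf, qb). The flag `non_recursive` is a hypothetical parameter that switches between the Stack(R) condition (X1 ≠ X ∧ X2 ≠ X) and the unrestricted Stack∗(T) condition; pop rules exist for every component and are therefore not materialized.

```python
from typing import Dict, FrozenSet, Set, Tuple

Delta = Dict[Tuple[str, str], Tuple[str, str]]
Rule = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # X -> X1 X2

def components(delta: Delta) -> Dict[str, FrozenSet[str]]:
    """Map each state to its strongly connected component (mutual reachability)."""
    states = {q for (q, _a) in delta} | {s for pair in delta.values() for s in pair}
    succ = {q: set() for q in states}
    for (q, _a), (q1, q2) in delta.items():
        succ[q].update((q1, q2))

    def reach(start: str) -> Set[str]:
        seen, todo = {start}, [start]
        while todo:
            for nxt in succ[todo.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    r = {q: reach(q) for q in states}
    return {q: frozenset(p for p in states if p in r[q] and q in r[p])
            for q in states}

def push_rules(delta: Delta, non_recursive: bool) -> Set[Rule]:
    """Push rules X -> X1 X2 induced by the transitions of delta."""
    comp = components(delta)
    rules: Set[Rule] = set()
    for (p, _a), (p1, p2) in delta.items():
        x, x1, x2 = comp[p], comp[p1], comp[p2]
        if non_recursive and (x1 == x or x2 == x):
            continue  # Stack(R) forbids pushing the component itself
        rules.add((x, x1, x2))
    return rules

delta: Delta = {
    ("q0", "r"): ("qc", "qf"),
    ("qc", "c"): ("qa", "qb"),
    ("qa", "a"): ("qf", "qa"),
    ("qb", "b"): ("qf", "qb"),
}
restricted = push_rules(delta, non_recursive=True)     # Stack(R)-style
unrestricted = push_rules(delta, non_recursive=False)  # Stack*(T)-style
print(len(restricted), len(unrestricted))
```

The rules coming from the self-loops on qa and qb survive only in the unrestricted system, which is the reason stacks in Stack(R) cannot grow along a cycle while stacks in Stack∗(T) can.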
Stack-game between Generator and Repairer
Given R and T we define a turn-based game M(R, T) with two players, Generator and Repairer.
▸ Generator plays over Stack(R).
▸ Repairer plays over Stack∗(T).
[Figure: an example play. On the Stack(R) side Generator applies rules such as X0 ↦ X1 X2, X0 ↦ X1 X3 and X1 ↦ X2 X2; on the Stack∗(T) side Repairer answers with rules such as Y0 ↦ Y0 Y2 and Y0 ↦ Y1 Y3. Matching stack positions are related by containments such as L(R∣X1) ⊆ L(T∣Y1) and L(R∣X2) ⊆ L(T∣Y2).]
Main characterization
Theorem: L(R) is streaming bounded repairable into L(T) iff Repairer has a winning strategy in M(R, T).
Details of the proof: see the paper.
Complexity of the streaming bounded repair problem
▸ Stack(R): non-recursive. Stacks are of polynomial size.
▸ Stack∗(T): stacks are of unbounded size (can be bounded by a polynomial).
Theorem: The streaming bounded repair problem for DTT-automata is EXPTIME-complete.
For deterministic word and tree automata:
          non-streaming    streaming
words     coNP             PTIME
trees     coNEXPTIME       EXPTIME
Concluding remarks
Effective characterization for the streaming bounded repair problem.
▸ Only for DTT-automata (e.g. DTDs and XML Schemas).
▸ EXPTIME-complete for DTT-automata.
Open problems:
▸ Characterization in the general case (regular tree languages).
▸ Amount of memory needed for the streaming strategy.