A Transducer-Based XML Query Processor
Bertram Ludäscher, SDSC/CSE UCSD
Pratik Mukhopadhyay, CSE UCSD
Yannis Papakonstantinou, CSE UCSD
Overview
• Motivation
• Architecture
• Framework: Streams + XQuery
• XSM (XML Stream Machine)
• XSM Networks
• Network Composition
• Conclusions
Efficient Processing of Sequentially Accessed XML Data: Web Service Implementations & RMI
[Figure: Web Service; XML message → Message Transformer → transformed XML message]
Efficient Processing of Sequentially Accessed XML Data: Web Development
[Figure: XML file → XML-to-XHTML Transformer → XHTML page → Web Front-End]
Efficient Processing of Sequentially Accessed XML Data: Archive Transformation & ETL (Extraction, Transformation & Loading) Applications
[Figure: XML archive file → XML Processor → XML target file]
Efficient Processing of Sequentially Accessed XML Data: Sensor Data Analysis
[Figure: Sensor Data Stream → XML Stream Processor → Acting/Mining Software]
Bandwidth & Connectivity will Increase the Amount of Data …
[Figure: multiple XML streams and sensor data feeding an XML Processor]
… Hardware Advances do not Favor Conventional Architectures
[Figure: Magnitude vs. Year, with curves for CPU Speed, Bandwidth, and CPU2Memory Speed]
Transducer-Based Processing: On-the-Fly & Minimal Memory
[Figure: an XML Stream Machine whose transitions are labeled Condition | Action; it reads from an input buffer, writes to an output buffer, and uses additional buffers]
XML Stream Machine (XSM) High-Level Architecture
XQuery + optional input DTD → XQuery Compiler → XSM → XSM-to-C Compiler → C program
Components of the XQuery Compiler
XQuery + optional input DTD → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM
• Schema Optimization is a further component of the compiler
XQuery Subset
• Path Expressions
• for-where-return Expressions
• Element Construction
• Concatenation
Example query:
for $X in $R/a return
  for $Y in $X/b return
    <res> $Y , $X </res>
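To make the meaning of the example query concrete, the following is a minimal, non-streaming reference sketch in Python. It is not the XSM approach of these slides; it only shows which results the query should produce. The function name `run_query`, the sample document, and the reading of $R as bound to the `<r>` element are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

def run_query(root):
    """Evaluate: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>.

    `root` plays the role of $R (assumed to be bound to the <r> element).
    """
    results = []
    for x in root.findall("a"):          # path expression $R/a binds $X
        for y in x.findall("b"):         # path expression $X/b binds $Y
            res = ET.Element("res")      # element construction <res> ... </res>
            res.append(copy.deepcopy(y)) # concatenation: first $Y ...
            res.append(copy.deepcopy(x)) # ... then $X
            results.append(res)
    return results

root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
out = [ET.tostring(r, encoding="unicode") for r in run_query(root)]
for line in out:
    print(line)
```

This version materializes the whole document as a tree first, which is the memory cost that on-the-fly processing is meant to avoid.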
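XML Stream: Tags, Data & Control Tokens
An XML stream is a sequence of
• Open Tag & Close Tag Tokens
• Data
• Control Tokens
Example stream for $R: … <r> <a> <b> 5 </b> <b> 1 </b></a> …, where the control tokens S_$R and E_$R mark the start and end of a binding of $R.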
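The token view of a stream can be sketched in a few lines of Python. This is an illustrative tokenizer only, assuming attribute-free tags and ignoring whitespace-only text; the helper names `tokenize` and `as_stream` and the `S_x`/`E_x` spelling of the control tokens are hypothetical.

```python
import re
from typing import Iterator, List

def tokenize(xml: str) -> Iterator[str]:
    """Yield open-tag, close-tag, and data tokens of a small XML fragment."""
    for tag, text in re.findall(r"(<[^>]+>)|([^<]+)", xml):
        if tag:
            yield tag                # open or close tag token
        elif text.strip():
            yield text.strip()       # data token (whitespace-only text is dropped)

def as_stream(var: str, xml: str) -> List[str]:
    """Wrap one binding of variable `var` in start/end control tokens."""
    return [f"S_{var}", *tokenize(xml), f"E_{var}"]

stream = as_stream("x", "<a> <b> 5 </b> <b> 1 </b></a>")
print(stream)
```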
XML Stream Machine (XSM)
Example: concatenation of bindings of Y, X into bindings of Z
• Input Buffer X (read pointer x): bindings delimited by S_x … E_x, here <a> <b> 5 </b> <b> 1 </b> </a>
• Input Buffer Y (read pointer y): bindings delimited by S_y … E_y, here <b> 5 </b> and <b> 1 </b>
• Output Buffer Z (written with w(z, ·)): bindings delimited by S_z … E_z
States: 0, 1, 2, 3. Transitions (Condition | Action):
• *x=S_x | w(z,S_z), x++
• *y=S_y | y++
• *y!=E_y | w(z,*y), y++
• *y=E_y | y++
• *x!=E_x | w(z,*x), x++
• *x=E_x | w(z,E_z), x++
Output Buffer Z as the machine runs (animation steps):
(empty)
S_z
S_z <b>
S_z <b> 5 </b>
S_z <b> 5 </b> <a>
S_z <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a>
S_z <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> E_z
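The concatenation machine can be simulated with a short Python sketch. The six Condition | Action pairs are taken from the slide, but the extracted text does not preserve which state each transition leaves and enters, so the wiring below (0 → 1 → 2 → 3 → 0) is an assumption, as are the function name `concat_xsm` and the use of Python lists as buffers.

```python
from typing import List, Optional

def concat_xsm(xbuf: List[str], ybuf: List[str]) -> List[str]:
    """Simulate a concatenation XSM: one Y binding, then one X binding, into Z.

    State wiring is assumed (not recoverable from the slide text):
      0: *x=S_x  | w(z,S_z), x++  -> 1
      1: *y=S_y  | y++            -> 2
      2: *y!=E_y | w(z,*y), y++   -> 2
      2: *y=E_y  | y++            -> 3
      3: *x!=E_x | w(z,*x), x++   -> 3
      3: *x=E_x  | w(z,E_z), x++  -> 0
    """
    def at(buf: List[str], i: int) -> Optional[str]:
        return buf[i] if i < len(buf) else None  # None: pointer is at end of buffer

    x = y = 0
    z: List[str] = []
    state = 0
    while True:
        cx, cy = at(xbuf, x), at(ybuf, y)
        if state == 0 and cx == "S_x":
            z.append("S_z"); x += 1; state = 1
        elif state == 1 and cy == "S_y":
            y += 1; state = 2
        elif state == 2 and cy is not None and cy != "E_y":
            z.append(cy); y += 1
        elif state == 2 and cy == "E_y":
            y += 1; state = 3
        elif state == 3 and cx is not None and cx != "E_x":
            z.append(cx); x += 1
        elif state == 3 and cx == "E_x":
            z.append("E_z"); x += 1; state = 0
        else:
            break  # no transition enabled: the machine waits for more input
    return z

xbuf = ["S_x", "<a>", "<b>", "5", "</b>", "<b>", "1", "</b>", "</a>", "E_x"]
ybuf = ["S_y", "<b>", "5", "</b>", "E_y"]
zbuf = concat_xsm(xbuf, ybuf)
print(" ".join(zbuf))
```

Under this wiring the output buffer grows in the same order as the animation steps: S_z first, then the Y binding, then the X binding, then E_z.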
Comparison of XSM against State Automata & Transducers
XSM
• Unbounded alphabet
• Buffers
• Pointer reset
State Automata
• Do not construct
• Do not store intermediate results
• No reset of input pointers
Transducers
• Finite alphabets
• State is the only memory
• Sufficient for XPath only
XSM Networks: Intermediate Step in Translating Queries to XSMs
XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM
XSM Network
for $X in $R/a return for $Y in $X/b return <res> $Y , $X </res>
• $R/a: reads $R, produces $X
• $X/b: reads $X, produces $Y
• For $Y: [$Y,$X] → [$Y’,$X’]
• $Y’,$X’ (concatenation): produces $Z
• <res> $Z </res>: produces $O
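The dataflow through this network can be sketched at the token level in Python. This is a simplification: the nodes are written as nested loops over complete bindings rather than as separate stream machines connected by buffers, and the helper `children` (a hypothetical stand-in for a path-expression node) assumes well-formed, attribute-free input.

```python
from typing import Iterator, List

def children(binding: List[str], tag: str) -> Iterator[List[str]]:
    """Yield the token lists of the children named `tag` of the binding's top element."""
    depth = 0
    current = None
    for tok in binding:
        is_close = tok.startswith("</")
        is_open = tok.startswith("<") and not is_close
        if is_open:
            depth += 1
            if depth == 2 and tok == f"<{tag}>":
                current = []             # start capturing a matching child
        if current is not None:
            current.append(tok)
        if is_close:
            if depth == 2 and current is not None:
                yield current            # matching child is complete
                current = None
            depth -= 1

def network(r_binding: List[str]) -> Iterator[List[str]]:
    """Token-level sketch of the network for the example query."""
    for x in children(r_binding, "a"):         # node $R/a produces $X
        for y in children(x, "b"):             # node $X/b produces $Y
            y1, x1 = y, x                      # For node: [$Y,$X] -> [$Y',$X']
            z = y1 + x1                        # concatenation node produces $Z
            yield ["<res>", *z, "</res>"]      # element node produces $O

r = ["<r>", "<a>", "<b>", "5", "</b>", "<b>", "1", "</b>", "</a>", "</r>"]
outputs = list(network(r))
for o in outputs:
    print(" ".join(o))
```

Note how the For node repeats the same binding of $X once for every binding of $Y, which is what lets the downstream concatenation pair them up.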
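From XQueries to XSM Networks: Non-FLWR Expressions
The expression <res> $Y, $X </res> (inputs $X and $Y, output $O) is translated into two nodes:
• $Y,$X: reads $Y and $X, produces $Z
• <res> $Z </res>: reads $Z, produces $O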
From XQueries to XSM Networks: FLWRs without Free Variables
for $X in G return expr($X)
$R → G → $X → expr($X) → $O
From XQueries to XSM Networks: FLWRs with Free Variables
for $Y in $X/b return <res> $Y , $X </res> ($X is a free variable)
• $X/b: reads $X, produces $Y
• For $Y: [$Y,$X] → [$Y’,$X’]
• <res> $Y’, $X’ </res>: produces $O
Composition Merges Two XSMs into One
Before composition:
• $R/a: reads $R, produces $X
• $X/b: reads $X, produces $Y
• For $Y: [$Y,$X] → [$Y’,$X’]
• $Y’,$X’: produces $Z
• <res> $Z </res>: produces $O
After composition, the last two nodes are merged into one:
• <res> $Y’, $X’ </res>: produces $O
XSM Composition: “State Product” Emulates Producer-Consumer
• Producer M1 (states q1), Consumer M2 (states q2)
• “State Product” M3 = (M2 ∘ M1), whose states are pairs (q1, q2)
Naive Composition
• M1 and M2 are connected through shared buffers with read pointers r1 ... rn
• M1 transition: q1 → q1’ labeled ϕ1 | A1
• M2 transition: q2 → q2’ labeled ϕ2 | A2
• ψ(q2) = ¬AE(r1) ∧ ... ∧ ¬AE(rn) = “no shared read-pointer ri of q2 is At End”
• M3 = (M2 ∘ M1) has two kinds of transitions:
• M2 step if ψ(q2): (q1, q2) → (q1, q2’) labeled ψ∧ϕ2 | A2
• M1 step if ¬ψ(q2): (q1, q2) → (q1’, q2) labeled ¬ψ∧ϕ1 | A1
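The scheduling rule of the naive composition can be sketched as follows in Python. This is an interpreter for the product behavior rather than a construction of the product machine itself; the machine encoding (a dict from state to condition/action/target triples), the environment layout, and the toy producer and consumer are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Env = Dict[str, Any]
Transition = Tuple[Callable[[Env], bool], Callable[[Env], None], str]
Machine = Dict[str, List[Transition]]

def step(machine: Machine, state: str, env: Env) -> Optional[str]:
    """Fire the first enabled transition of `state`; return the target state or None."""
    for cond, action, target in machine.get(state, []):
        if cond(env):
            action(env)
            return target
    return None

def run_composed(m1: Machine, m2: Machine, q1: str, q2: str, env: Env,
                 psi: Callable[[Env], bool]) -> Tuple[str, str]:
    """Run the pair (q1, q2): an M2 step if psi holds, otherwise an M1 step."""
    while True:
        if psi(env):
            nxt = step(m2, q2, env)      # corresponds to psi AND phi2 | A2
            if nxt is None:
                break                    # consumer is stuck
            q2 = nxt
        else:
            nxt = step(m1, q1, env)      # corresponds to NOT psi AND phi1 | A1
            if nxt is None:
                break                    # producer has nothing left to do
            q1 = nxt
    return q1, q2

def produce(env: Env) -> None:
    env["mid"].append(env["inp"][env["i"]])       # M1 writes to the shared buffer
    env["i"] += 1

def consume(env: Env) -> None:
    env["out"].append(env["mid"][env["r"]].upper())  # M2 reads via shared pointer r
    env["r"] += 1

m1: Machine = {"p": [(lambda e: e["i"] < len(e["inp"]), produce, "p")]}
m2: Machine = {"c": [(lambda e: True, consume, "c")]}
psi = lambda e: e["r"] < len(e["mid"])   # shared read pointer r is not At End

env: Env = {"inp": ["a", "b", "c"], "i": 0, "mid": [], "r": 0, "out": []}
final_states = run_composed(m1, m2, "p", "c", env, psi)
print(env["out"])
```

Because the consumer runs whenever unread data is available, the shared buffer in this toy run never holds more than one unread token at a time.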