Scalable XQuery Type Matching
Jens Teubner · IBM T. J. Watson Research Center
teubner@us.ibm.com
Scalable XQuery Type Matching
Type matching: inspection of dynamic type information at runtime.

  typeswitch (x1, x2, ..., xk)
    case t1 return e1
    case t2 return e2
    ...
    case tn return en
    default return e_def

1. Compare the runtime types of (x1, ..., xk) against each ti in turn.
2. The first matching branch determines the expression result.

Likewise:
  e instance of t
  e/ax::element(n, t)

This talk describes a scalable and efficient implementation for step 1.
→ Leverage existing DBMS capabilities (aggregation).
→ Faithful to XQuery semantics.
The XQuery Data Model
XQuery: item = value + type annotation

  x = v of type t                      (atomic values)
  x = element n of type t { ··· }      (element nodes)
  x = attribute n of type t { ··· }    (attribute nodes)
  x = text { ··· }                     (text nodes) ¹
  ...

A type annotation t references a (named) XML Schema type.
Type information may come, e.g., from a validated XML instance.
Type matching is XQuery's means to access type annotations.

¹ Text, comment, and processing instruction nodes do not carry type information.
The XDM Type Hierarchy
Types arrange into a hierarchy.
Derived types are added according to their base type.

  xs:anyType
    xs:untyped
    xs:anySimpleType
      xs:anyAtomicType
        xs:boolean
        xs:decimal
          my:shoesize
          xs:integer
            my:hatsize
        xs:string
        xs:untypedAtomic
      my:hatsizelist      (user-defined list type)
    my:stockitem          (user-defined complex type)

Example:
  let $x := my:hatsize(56)
  return $x instance of xs:decimal

Existing implementations take the semantics of type matching quite literally.
→ Expensive recursion.
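To make the cost of the literal approach concrete, here is a minimal sketch of a recursive check over the example hierarchy. The `BASE_TYPE` table and the function name `derives_from_recursive` are hypothetical illustrations, not taken from any existing implementation; the point is only that the number of steps grows with the depth of the type hierarchy.

```python
# Hypothetical base-type table for the example hierarchy on the slide:
# each type maps to its base type (None marks the root).
BASE_TYPE = {
    "xs:anyType": None,
    "xs:untyped": "xs:anyType",
    "xs:anySimpleType": "xs:anyType",
    "xs:anyAtomicType": "xs:anySimpleType",
    "xs:boolean": "xs:anyAtomicType",
    "xs:decimal": "xs:anyAtomicType",
    "my:shoesize": "xs:decimal",
    "xs:integer": "xs:decimal",
    "my:hatsize": "xs:integer",
    "xs:string": "xs:anyAtomicType",
    "xs:untypedAtomic": "xs:anyAtomicType",
    "my:hatsizelist": "xs:anySimpleType",
    "my:stockitem": "xs:anyType",
}


def derives_from_recursive(actual, expected):
    """Literal check: climb the base-type chain until a match or the root."""
    if actual is None:          # walked past the root without a match
        return False
    if actual == expected:
        return True
    return derives_from_recursive(BASE_TYPE[actual], expected)


# my:hatsize -> xs:integer -> xs:decimal: two recursive steps before the match.
result = derives_from_recursive("my:hatsize", "xs:decimal")
```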
Type Ranks
Use tree encoding to encode the type hierarchy.
→ pre: preorder rank (of types!)
→ size: number of derived types
→ cf. XPath Accelerator

  type                pre  size
  xs:anyType            0    12
  xs:untyped            1     0
  xs:anySimpleType      2     9
  xs:anyAtomicType      3     7
  xs:boolean            4     0
  xs:decimal            5     3
  my:shoesize           6     0
  xs:integer            7     1
  my:hatsize            8     0
  xs:string             9     0
  xs:untypedAtomic     10     0
  my:hatsizelist       11     0
  my:stockitem         12     0

Use pre values to implement type annotations.
→ "type ranks"

t1 derives from t2
  ⇔  pre(t2) ≤ pre(t1) ≤ pre(t2) + size(t2)

The bounds pre(t2) and pre(t2) + size(t2) are known at compile time!
Type Ranks
Example:
  let $x := my:hatsize(56)
  return $x instance of xs:decimal

Type ranks involved: pre(my:hatsize) = 8, pre(xs:decimal) = 5, size(xs:decimal) = 3.

  $x = 56 of type 8                 (8 = my:hatsize)

  $x instance of xs:decimal
    ⇔  5 ≤ 8 ≤ 5 + 3                (5 = pre(xs:decimal), 3 = size(xs:decimal))

Decidable in constant time.
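The pre/size encoding and the resulting range check can be sketched as follows. This is an illustrative sketch, not the talk's implementation: the `CHILDREN` table and the helper names `assign_ranks` and `derives_from` are assumptions, and the hierarchy is the small example from the slides.

```python
# Hypothetical child lists for the example hierarchy, in preorder-friendly order.
CHILDREN = {
    "xs:anyType": ["xs:untyped", "xs:anySimpleType", "my:stockitem"],
    "xs:anySimpleType": ["xs:anyAtomicType", "my:hatsizelist"],
    "xs:anyAtomicType": ["xs:boolean", "xs:decimal", "xs:string", "xs:untypedAtomic"],
    "xs:decimal": ["my:shoesize", "xs:integer"],
    "xs:integer": ["my:hatsize"],
}


def assign_ranks(root):
    """Return {type: (pre, size)} computed by one preorder traversal."""
    ranks = {}
    counter = 0

    def visit(name):
        nonlocal counter
        pre = counter
        counter += 1
        for child in CHILDREN.get(name, []):
            visit(child)
        # size = number of types visited below this one
        ranks[name] = (pre, counter - pre - 1)

    visit(root)
    return ranks


RANKS = assign_ranks("xs:anyType")


def derives_from(actual, expected):
    """Range check: pre(expected) <= pre(actual) <= pre(expected) + size(expected)."""
    pre_actual, _ = RANKS[actual]
    pre_expected, size_expected = RANKS[expected]
    return pre_expected <= pre_actual <= pre_expected + size_expected


# The slide's example: 5 <= 8 <= 5 + 3
hatsize_is_decimal = derives_from("my:hatsize", "xs:decimal")
```

Once the ranks are assigned, each check is two integer comparisons, independent of how deep the hierarchy is.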
Sequences and Occurrence Indicators
The argument to type matching typically is a sequence.

  (x1, ..., xk) instance of t∘

where ∘ is the occurrence indicator: none, ?, +, or *.

The match succeeds iff
1. xi matches t for all xi in x = (x1, ..., xk), and
2. the sequence length k is compatible with the occurrence indicator ∘.
Sequences and Occurrence Indicators
Expressed in terms of type ranks:

1. xi matches t for all xi in x = (x1, ..., xk)
   ⇔  ∀ (xi = vi of type ti) ∈ x :
        pre(ti) ≥ pre(t)  ∧  pre(ti) ≤ pre(t) + size(t)

Type aggregation:
   ⇔  min { pre(ti) | (xi = vi of type ti) ∈ x } ≥ pre(t)
     ∧ max { pre(ti) | (xi = vi of type ti) ∈ x } ≤ pre(t) + size(t)

Find the minimum and maximum type ranks first, then compare once.
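The equivalence between the per-item test and the aggregated test can be illustrated with a small sketch. The function names `all_match_per_item` and `all_match_aggregated` are hypothetical; type ranks are given as plain integers, using pre(xs:decimal) = 5 and size(xs:decimal) = 3 from the example hierarchy. The empty sequence is treated as trivially satisfying the universal condition.

```python
def all_match_per_item(type_ranks, pre_t, size_t):
    """Per-item formulation: test every type rank against the range of t."""
    return all(pre_t <= rank <= pre_t + size_t for rank in type_ranks)


def all_match_aggregated(type_ranks, pre_t, size_t):
    """Aggregated formulation: find min and max first, then compare once."""
    if not type_ranks:      # the "for all" condition holds vacuously
        return True
    lowest, highest = min(type_ranks), max(type_ranks)
    return lowest >= pre_t and highest <= pre_t + size_t


# (my:shoesize, my:hatsize) against xs:decimal: min 6 >= 5 and max 8 <= 8
shoes_and_hats = all_match_aggregated([6, 8], 5, 3)
# (xs:string) against xs:decimal: max 9 > 8
just_a_string = all_match_aggregated([9], 5, 3)
```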
Type Aggregation
Aggregation is (once more) beneficial for efficient XML processing.
Aggregation implementations are highly tuned in today's DBMSs.

Likewise: use aggregation to test compatibility with the occurrence indicator ∘ (none, ?, +, or *):

2. the sequence length k is compatible with ∘
   ⇔  Count the sequence items, then compare according to ∘.
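The count-based test for the occurrence indicator can be sketched as below. The function name `length_compatible` and the use of an empty string for "no indicator" are assumptions made for illustration; the cardinalities themselves follow XQuery's sequence type rules (exactly one, zero or one, one or more, zero or more).

```python
def length_compatible(count, indicator):
    """Compare an item count against an XQuery occurrence indicator.

    indicator is "" (no indicator), "?", "+", or "*".
    """
    if indicator == "":
        return count == 1       # exactly one item
    if indicator == "?":
        return count <= 1       # zero or one
    if indicator == "+":
        return count >= 1       # one or more
    if indicator == "*":
        return True             # any length
    raise ValueError(f"unknown occurrence indicator: {indicator!r}")


two_items_star = length_compatible(2, "*")
two_items_optional = length_compatible(2, "?")
```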
Type Aggregation in Relational XQuery
Example: XQuery on purely relational database back-ends. ²

  iter  pos  item  type
  1     1    43    6
  1     2    56    8
  2     1    "XL"  9

All loops unrolled; iter: logical iteration.
pos: sequence order; item holds the payload.
New column type: preorder type ranks.

Type aggregation:
  SELECT iter, MIN(type), MAX(type), COUNT(*)
  FROM q
  GROUP BY iter

² http://www.pathfinder-xquery.org/
Type Aggregation in Relational XQuery
Example: e instance of xs:decimal*

1. Add type information to the loop-lifted sequence encoding.

  iter  pos  item  type
  1     1    43    6      (my:shoesize)
  1     2    56    8      (my:hatsize)
  2     1    "XL"  9      (xs:string)

2. Aggregate, then compare: min ≥ 5 ∧ max ≤ 5 + 3 ?

  iter  min  max  res
  1     6    8    true
  2     9    9    false

3. Projection re-establishes the loop-lifted encoding.

  iter  pos  item   type
  1     1    true   4     (xs:boolean)
  2     1    false  4     (xs:boolean)

→ Standard DBMS operators suffice.
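The aggregate-then-compare step can be tried out on the example table with any SQL engine. The sketch below uses Python's built-in sqlite3 module purely for convenience; it is not the DB2 setup used in the talk, and the table name `q` and the constants 5 and 3 (pre and size of xs:decimal) are taken from the slides. Because the indicator is *, no count check is needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Loop-lifted sequence encoding with the extra type-rank column.
conn.execute("CREATE TABLE q (iter INTEGER, pos INTEGER, item, type INTEGER)")
conn.executemany(
    "INSERT INTO q VALUES (?, ?, ?, ?)",
    [(1, 1, 43, 6), (1, 2, 56, 8), (2, 1, "XL", 9)],
)

PRE_T, SIZE_T = 5, 3    # type rank and size of xs:decimal in the example

# Aggregate per iteration, then compare once against the range of t.
rows = conn.execute(
    """
    SELECT iter,
           MIN(type) >= :pre AND MAX(type) <= :pre + :size AS res
    FROM q
    GROUP BY iter
    ORDER BY iter
    """,
    {"pre": PRE_T, "size": SIZE_T},
).fetchall()

# SQLite reports booleans as 1/0; convert for readability.
results = {iteration: bool(res) for iteration, res in rows}
conn.close()
```

Iteration 1 (type ranks 6 and 8) matches xs:decimal*, while iteration 2 (type rank 9) does not, in line with the res column on the slide.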
Type Aggregation in an RDBMS
Proof-of-concept implementation using SQL.
Setup: DB2 9, FpML schema (777 types), 10,000 for iterations.

[Figure: SQL execution time [sec] (axis ticks 0.1 to 1000) versus average sequence length per iteration (5, 20, 50 non-indexed; 50 indexed). Series: recursive, type ranks, type ranks + aggregation.]
Type Aggregation has Even Further Potential
Type aggregation yields new runtime guarantees.
typeswitch: match a sequence against a number of types in turn.

                                  Traditional    Type aggregation
  typeswitch (x1, x2, ..., xk)                   aggregate  O(k)
    case t1 return e1             match O(k)     compare    O(1)
    case t2 return e2             match O(k)     compare    O(1)
    ...                           ...            ...
    case tn return en             match O(k)     compare    O(1)
    default return e_def
  Total                           O(n · k)       O(n + k)

Recursion may further increase the complexity of the traditional (left-hand) approach.
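The O(n + k) behaviour follows from aggregating the sequence once and then doing constant work per case. Below is a minimal sketch under simplifying assumptions: items are represented only by their type ranks, each case is a hypothetical tuple (pre, size, occurrence indicator, result), and the names `typeswitch` and `LENGTH_OK` are illustrative rather than part of any real API.

```python
# Count tests per occurrence indicator ("" stands for no indicator).
LENGTH_OK = {
    "": lambda k: k == 1,
    "?": lambda k: k <= 1,
    "+": lambda k: k >= 1,
    "*": lambda k: True,
}


def typeswitch(type_ranks, cases, default):
    """Return the result of the first matching case, else the default.

    Aggregation costs O(k) once; each of the n cases is then an O(1) compare.
    """
    count = len(type_ranks)
    lowest = min(type_ranks, default=None)
    highest = max(type_ranks, default=None)

    for pre_t, size_t, indicator, result in cases:
        # An empty sequence satisfies the per-item condition vacuously.
        types_ok = count == 0 or (lowest >= pre_t and highest <= pre_t + size_t)
        if types_ok and LENGTH_OK[indicator](count):
            return result
    return default


# Cases over the example hierarchy: xs:integer*, xs:decimal*, xs:string*
CASES = [
    (7, 1, "*", "integer branch"),
    (5, 3, "*", "decimal branch"),
    (9, 0, "*", "string branch"),
]

chosen = typeswitch([6, 8], CASES, "default branch")
```

For the sequence with type ranks 6 and 8, the xs:integer* case fails (6 < 7) and the xs:decimal* case succeeds, so the second branch is chosen without rescanning the sequence.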
Summary
A scalable implementation for XQuery's dynamic type semantics.

Type ranks: constant time for singleton type matching.
→ Inspired by XPath Accelerator tree encoding.

Type aggregation: use aggregation to handle sequences.
→ Exploit efficient implementations in modern DBMSs.

New runtime guarantees: O(n · k) → O(n + k) for typeswitch expressions.

Faithful to XQuery semantics.
→ Paper also covers XML node matching, incl. substitution groups.