AMAχOS: Abstract Machine for Xcerpt
➊ Principles ➋ Architecture
PPSWR ‘06, Budva, Montenegro, June 11th, 2006
François Bry, Tim Furche, Benedikt Linse
Abstract Machine(s)
Definition and Variants …
- abstract machine := interpreter for low-level code according to a “machine” model (representation, instruction set)
- abstract machine ~ virtual machine
- why abstract machines?
  - thought models
  - hardware or OS-level virtualization
  - AMs for high-level (programming) languages
Virtualization everywhere …
- OS-level: Carbon, one of several application environments available on Mac OS X
- High-level languages: ECMA-335 (3rd Edition, June 2005), Common Language Infrastructure (CLI), Partitions I to VI
- Hardware: Intel Virtualization Technology, which provides hardware support for processor virtualization and simplifies virtual machine monitor (VMM) software
AMAχOS: AM for Web Querying
Operational Semantics for Xcerpt
- abstract machine ~ instruction set + machine model
- just like algebra ~ operators + data model
- both: precise query semantics; on a “logical” level vs. on the operational/physical level
- “optimizability”: different combinations of instructions with equivalent overall result but different performance characteristics
AMAχOS: AM for Web Querying
Vision …
- language neutral
  - starting with a bias towards Xcerpt
  - already: query core applicable for {Xcerpt, XQuery, XSLT, SPARQL}
- focus: in-memory processing of distributed data
  - no (guaranteed) control over storage and indexing of data
  - ad-hoc index creation (like XSLT key) and data model selection
- +: distributed evaluation (if additional query nodes are known)
  - distribute compiled code over nodes according to cost estimation
AMAχOS: AM for Web Querying
We have principles … Are we alone in this?
- not quite: XSLTVM
  - now part of Oracle DB
  - centralized query processing
  - very specialized instruction set for XSLT 1.0/Oracle
- algebra vs. abstract machine
  - very similar idea: operators and their semantics
  - but: usually tightly integrated (at least on physical layer)
  - algebra for XML querying is a hot research issue
Enough vision already… … on to the details …
Example …
var Y ← a[[ b { var X }, var Y → desc c { var X } ]]
How to evaluate this?
- one “eval” operator?
- splitting it into base relations → conjunctive queries
  root(v0) ∧ child(v0, v1) ∧ label(v1, “a”) ∧ child(v1, v2) ∧ label(v2, “b”) …
  - much better for optimization (move “tough” decisions to compile-time)
  - but: naively done, exponential
- compromise: path and twig operators
  root(v0) ∧ path(v0, “a.b”, v1) …
  - split at join and result variables
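The decomposition into base relations can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy document `DOC`, the helper names `root`, `child`, `label`, and the function `naive_eval` are hypothetical and not part of AMAχOS. The evaluator simply tries every assignment of nodes to variables, which is the naive exponential strategy the slide warns about.

```python
from itertools import product

# Hypothetical toy document: node id -> (label, list of child ids).
DOC = {
    0: ("doc", [1, 4]),
    1: ("a", [2, 3]),
    2: ("b", []),
    3: ("c", []),
    4: ("a", [5]),
    5: ("c", []),
}
ROOT = 0

# Base relations over the toy document.
def root(n):
    return n == ROOT

def child(parent, node):
    return node in DOC[parent][1]

def label(n, name):
    return DOC[n][0] == name

# root(v0) ∧ child(v0, v1) ∧ label(v1, "a") ∧ child(v1, v2) ∧ label(v2, "b")
ATOMS = [
    lambda b: root(b["v0"]),
    lambda b: child(b["v0"], b["v1"]),
    lambda b: label(b["v1"], "a"),
    lambda b: child(b["v1"], b["v2"]),
    lambda b: label(b["v2"], "b"),
]
VARIABLES = ["v0", "v1", "v2"]

def naive_eval(atoms, variables, nodes):
    """Try every assignment of nodes to variables (|nodes| ** |variables|)."""
    results = []
    for combo in product(nodes, repeat=len(variables)):
        binding = dict(zip(variables, combo))
        if all(atom(binding) for atom in atoms):
            results.append(binding)
    return results

ANSWERS = naive_eval(ATOMS, VARIABLES, sorted(DOC))
```

With six nodes and three variables the loop already inspects 216 candidate bindings, which is why the slide proposes path and twig operators as a compromise.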
AMAχOS: Core
Data Model …
- several variants but common principles:
  - basic type: node with properties
  - element (structured) vs. content (atomic) nodes
- semi-structured data model with node identity
  - differs from previous Xcerpt DM (infinite regular trees)
- memory model: memoization matrix
  - non-first-normal-form table of operator results
  - non-redundant (polynomial) store of query results
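To make the idea of a non-first-normal-form, non-redundant result store more concrete, here is a minimal sketch. The nested-dictionary layout (variable → node → sub-matrix) is an assumption inspired by the "Variable / Node / Sub-Matrix" tables shown in the slides; the names `MATRIX`, `expand`, and `size` are hypothetical and the actual AMAχOS representation may differ.

```python
from itertools import product

# Assumed layout: {variable: {node: sub_matrix}}, where a sub_matrix has the
# same shape and holds the bindings of variables that depend on this node.
MATRIX = {
    "v1": {
        "d1": {
            "v2": {"d2": {}, "d3": {}},
            "v3": {"d4": {}, "d5": {}},
        }
    }
}

def expand(matrix):
    """Enumerate the flat variable bindings encoded by a nested matrix."""
    per_variable = []
    for var, rows in matrix.items():
        alternatives = []
        for node, sub in rows.items():
            for rest in expand(sub):
                alternatives.append({var: node, **rest})
        per_variable.append(alternatives)
    # Independent variables on the same level combine as a cross product.
    return [
        {k: v for part in combo for k, v in part.items()}
        for combo in product(*per_variable)
    ]

def size(matrix):
    """Count the (variable, node) entries actually stored."""
    return sum(1 + size(sub) for rows in matrix.values() for sub in rows.values())

FLAT = expand(MATRIX)
```

In this toy case five stored entries encode four flat bindings of three variables each. The gap widens as more independent branches are added, which is the sense in which the store stays polynomial while the flat result may not.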
AMAχOS: Core
[Figure: example data graph with nodes d1 to d14 (a bib root with conference, paper, posters, pc, name, title, author, and member nodes; values ‘Storage Media’, ‘Wax Tablets’, ‘Cicero’, ‘Hirtius’), a query over variables v1 to v5 connected by Root, Child, and Child+ relations, and the resulting memoization matrix of nested Variable / Node / Sub-Matrix tables with entries v5 → d2, v4 → d3, v1 → d6, v1 → d7, v4 → d5, v3 → d11, v2 → d13.]
Operators …
Three-phase algorithm
1. matrix population
   - evaluates only a spanning tree T of operators from query Q
   - “directed” semi-joins → polynomial evaluation
2. expansion of non-tree joins (similar to OO DBS case)
   - worst-case exponential in time and space
3. matrix consumption
   - construction in the flavor of complex value algebra
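The population phase can be pictured with a generic semi-join reduction over a tree-shaped query. This is a textbook-style sketch under stated assumptions, not the AMAχOS implementation: the toy relations `CHILD` and `LABEL`, the query encoding, and the function `populate` are all hypothetical, and every query edge is taken to be a child relation.

```python
# Hypothetical toy data: child edges and node labels.
CHILD = {("d1", "d2"), ("d1", "d3"), ("d2", "d4"), ("d3", "d5")}
LABEL = {"d1": "bib", "d2": "conference", "d3": "conference",
         "d4": "paper", "d5": "name"}

# Tree query: variable -> child variables; every edge means "child".
QUERY_TREE = {"v0": ["v1"], "v1": ["v2"], "v2": []}
QUERY_LABEL = {"v0": "bib", "v1": "conference", "v2": "paper"}

def populate(query_tree, query_label, root_var):
    """Reduce candidate sets with semi-joins along the query tree."""
    # Unary relations (property filters) give the initial candidates.
    cand = {
        var: {n for n, lab in LABEL.items() if lab == query_label[var]}
        for var in query_tree
    }

    def bottom_up(var):
        # Keep a parent candidate only if it has a match for each child variable.
        for sub in query_tree[var]:
            bottom_up(sub)
            cand[var] = {p for p in cand[var]
                         if any((p, n) in CHILD for n in cand[sub])}

    def top_down(var):
        # Keep a child candidate only if a surviving parent reaches it.
        for sub in query_tree[var]:
            cand[sub] = {n for n in cand[sub]
                         if any((p, n) in CHILD for p in cand[var])}
            top_down(sub)

    bottom_up(root_var)
    top_down(root_var)
    return cand

CANDIDATES = populate(QUERY_TREE, QUERY_LABEL, "v0")
```

Each query edge is visited twice and each visit compares two candidate sets, so the work stays polynomial in query and data size. Joins that close a cycle in the query are not covered here; they belong to the expansion phase.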
Operators …
- Matrix population (spanning tree T of query Q)
  - unary relations (property filters)
  - binary & ternary relations (structural assembly)
  - basic relations (child, desc), (reg.) path operators, twig operators
- Non-tree join expansion
  - value, identity, and (direct) structural join
- Matrix consumption
  - basic constructors for each node type
  - grouping, aggregation, order, …
Optimizability …
Lots of freedom at compilation
- how to distribute operators between phases (1) and (2)
  - matrix population: semi-joins, but only acyclic CQ
  - join expansion: arbitrary shape, but exponential
  - “cover” areas for join variables to reduce exponent
  - hypertree/query decomposition
- choosing the “right” operator
  - conjunction of base relations vs. twig operator
- supportive indices and DM variants
  - e.g., set-based vs. streaming (time vs. space)
AMAχOS: Execution
The core of the core: the evaluation algorithm … Complexity

             tree query          graph query
tree data    O(q · v² + o)       O(v^q)
graph data   O(q · v · e + o)    O(v^q)

Table 1: Overview of combined time complexity (q: number of query variables; e, v: number of edges and vertices, respectively, in the data; o: size of output)
The core of the core: the evaluation algorithm …
[Figure: evaluation time (msec, logarithmic scale) against query size (number of variables, 0 to 40) with data size fixed; two series: without memoization, with memoization.]
The core of the core: the evaluation algorithm …
[Figure: evaluation time (msec, 0 to 900) against data size (MB, 0 to 25) with query size fixed (~20 nodes); series: top-down.]
Query Network
[Figure: an Xcerpt program is translated by the Query Compiler into AMAχOS code, which consists of a Hint Segment, a Dependency Segment, and a Code Segment (rule 1, rule 2, …, rule n). Applications reach the network through a control API (Java), a Web Service API, or a command-line interface. The query conjuncts (q1,1, q1,2, q2,1, q2,2) are distributed over several AMAχOS nodes and an Xcerpt node, which access local data sources (e.g. document, database) and remote data sources (e.g. Web service).]
Xcerpt Program
rule 1: c1 ← q1,1 ∧ q1,2 ∧ … ∧ q1,k1
rule 2: c2 ← q2,1 ∧ q2,2 ∧ … ∧ q2,k2
rule 3: c3 ← q3,1 ∨ q3,2 ∨ … ∧ q3,k3
…
AMAχOS: Architecture
And a way to realize them …
Control Plane
- Compilation API: simple observation and control API; compilation strategies
- Execution & Answer API: control, observation, parameterization; OO & Web Service API
Program Plane
- Parsing & Validation Layer: program parsing and validation; multi-parser, normalization, modules
- Compilation Layer: unsatisfiable, tautological parts; extensive query optimization
- Execution Layer (AMAχOS): pattern matching engine; rule dispatcher and engine
Data Plane
- Schema Access Layer: provides access to schema of data; type checking for compilation
- Data Access Layer: incremental data access; storage and indexing engine
- Serialization Layer: incremental answer creation; versatile Web format support
AMAχOS: Architecture
core layer: execution or AMAχOS proper …
[Figure: the abstract machine AMAχOS sits between Query Compilation and the Answer API. Its input is AM code made up of a Hint Segment (storage & index hints), a Dependency Segment, and a Code Segment (rule 1, rule 2, …). Components shown: Static Function Library (function call), Rule Engine (rule call with recursion, rule dispatch), Pattern Matching Engine, Construction Engine (answer construction), in-memory abstract machine code with a scheduler, Memoization Matrix (nested Variable / Node / Sub-Matrix tables), Substitution Sets, and Storage Manager, all on top of the Runtime Data Access Layer.]