Xcerpt and visXcerpt: Integrating Web Querying
Sacha Berger, François Bry, Tim Furche, Benedikt Linse, Andreas Schroeder
Institute for Informatics, University of Munich
http://www.pms.ifi.lmu.de/
1 Data: Semi-structured Trees & Graphs

Consistent Extension of XML
- children order may be irrelevant
- possible transparent resolution of non-hierarchical relations

Graph data model for Xcerpt and visXcerpt
- as in RDF and semi-structured DBs like Lore
- great attention to XML specificities such as attributes and namespaces

Overview Data Patterns Rules
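The point that children order may be irrelevant can be illustrated with a minimal Python sketch. The `Term` class and the `canonical` function are hypothetical names introduced here for illustration and are not part of Xcerpt; non-hierarchical references, attributes, and namespaces are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A data term: a label plus children that are Terms or plain strings."""
    label: str
    children: tuple = ()
    ordered: bool = True  # False: the order of the children carries no meaning


def canonical(t):
    """Return a nested tuple that is equal for terms differing only in the
    order of children below unordered nodes."""
    if isinstance(t, str):
        return ("text", t)
    kids = [canonical(c) for c in t.children]
    if not t.ordered:
        kids.sort()  # normalise away the irrelevant order
    return ("elem", t.label, t.ordered, tuple(kids))


cicero = Term("author", ("Cicero",))
brutus = Term("author", ("Brutus",))

unordered_a = Term("authors", (cicero, brutus), ordered=False)
unordered_b = Term("authors", (brutus, cicero), ordered=False)
ordered_a = Term("authors", (cicero, brutus), ordered=True)
ordered_b = Term("authors", (brutus, cicero), ordered=True)

same_unordered = canonical(unordered_a) == canonical(unordered_b)
same_ordered = canonical(ordered_a) == canonical(ordered_b)
```

Under these assumptions the two unordered terms compare equal, while the two ordered terms do not.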
Bibliography Entries: DBLP-style

Annotated features of the visual rendering:
- Identifier and label of elements
- Context menu: interactive features
- Folding elements for information focus
- Ordered vs. unordered children list
- Non-hierarchical relations as hyperlinks
- Element nesting (child relation) becomes box nesting and colors

Bibliography Entries
- rather regular schema with optionals
- several ordered lists, otherwise keyed attributes
Topics and Themes: SKOS Ontology

[Figure: SKOS graph of the ACM classification used as topic ontology. The scheme acm98:CCS (‘Computing Classification System’) has hasTopConcept edges to top concepts such as acm98:D (‘Software’), acm98:E (‘Data’), and acm98:H (‘Information Systems’). Concepts are connected by narrower edges down to leaves such as acm98:D_4_2_e_i (‘Wax Tablets’) and acm98:D_4_2_e_ii (‘Papyri’); the graph also contains related edges. The bibliography resources mybib:conf_dmc, mybib:journal_adm, mybib:article_66_scaurus_qumran, mybib:inproc_44_brutus, and mybib:article_66_wax_cicero are attached to concepts by subject and primarySubject edges.]
2 Patterns: Examples for Selected Data

Logical Variables in Patterns
- select relevant data (n-ary queries)
- group and aggregate data
- join different data items

Query-by-Example paradigm
- queries just like data plus variables, incompleteness, optionality, negation
- patterns plus variables instead of navigation
Basic Patterns: Variables and Incompleteness

- Accessing Web resources: arbitrary XML documents can be accessed using their URL
- Incomplete patterns in depth: descendant allows additional intermediary elements
- Incomplete patterns in breadth: partial patterns allow additional child elements
- Variables are used in lieu of data: express selection, joins, or arithmetic conditions
- Grouping collects alternative bindings for variables: essential for structural assembly

Basic Pattern: “return the titles of all top-level sections in articles by Marcus Tullius Cicero and published in ‘Applied Data Management’.”
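How variables, partial patterns, and descendant interact can be sketched with a small matcher in Python. This is only an illustration under simplifying assumptions, not Xcerpt syntax or its evaluation algorithm: `Term`, `Var`, `Pat`, `Desc`, and `match` are hypothetical names, all patterns are treated as unordered, `Desc` also matches at depth zero, and the sample document is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:          # data: label plus children (Terms or strings)
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class Var:           # logical variable, used in lieu of data
    name: str


@dataclass(frozen=True)
class Pat:           # pattern node; partial=True allows extra children
    label: str
    children: tuple = ()
    partial: bool = True


@dataclass(frozen=True)
class Desc:          # incompleteness in depth: match here or at any descendant
    inner: object


def match(pattern, data, env):
    """Yield every extension of env under which pattern matches data."""
    if isinstance(pattern, Var):
        if pattern.name not in env:
            yield {**env, pattern.name: data}
        elif env[pattern.name] == data:   # repeated variable: value join
            yield env
    elif isinstance(pattern, Desc):
        yield from match(pattern.inner, data, env)
        if isinstance(data, Term):
            for child in data.children:
                yield from match(pattern, child, env)
    elif isinstance(pattern, str):
        if pattern == data:
            yield env
    elif isinstance(data, Term) and pattern.label == data.label:
        if pattern.partial or len(pattern.children) == len(data.children):
            yield from match_children(pattern.children, data.children, env)


def match_children(pats, kids, env):
    """Map each sub-pattern to a distinct child, in any order."""
    if not pats:
        yield env
        return
    for i, kid in enumerate(kids):
        for env2 in match(pats[0], kid, env):
            yield from match_children(pats[1:], kids[:i] + kids[i + 1:], env2)


# Invented sample data (hypothetical, not taken from the slides).
doc = Term("bib", (
    Term("article", (
        Term("author", ("Marcus Tullius Cicero",)),
        Term("journal", ("Applied Data Management",)),
        Term("section", (Term("title", ("Wax",)),
                         Term("section", (Term("title", ("Nested",)),)))),
        Term("section", (Term("title", ("Records",)),)),
    )),
    Term("article", (
        Term("author", ("Marcus Junius Brutus",)),
        Term("journal", ("Applied Data Management",)),
        Term("section", (Term("title", ("Other",)),)),
    )),
))

query = Pat("bib", (Desc(Pat("article", (
    Pat("author", ("Marcus Tullius Cicero",)),
    Pat("journal", ("Applied Data Management",)),
    Pat("section", (Pat("title", (Var("T"),)),)),
))),))

# Grouping: collect all alternative bindings of T.
titles = sorted({env["T"] for env in match(query, doc, {})})
```

With this sample document, `titles` holds the titles of the two top-level sections of the Cicero article; the nested section is skipped because the section pattern is a direct child of the article pattern.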
Complex Patterns: Formulas, Join, Optionality

- Terms as formulas: terms may contain boolean connectives, variables, negation, etc.
- Subterm negation: some subterms may be required not to occur in matching data
- Optional subterms: local form of disjunction, essential for variable schema data
- Value joins: expressed through multiple variable occurrences
- Optional construction: limited form of conditional construction based on variable bindings

Complex Pattern: “return titles and optionally paragraphs of all top-level sections without figures in articles on the topic ‘Wax Tablets’.”
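The intended meaning of subterm negation and optional subterms in the complex pattern can be spelled out procedurally. The Python sketch below is only a reading aid: Xcerpt states this declaratively in a single pattern, whereas the sketch navigates step by step. The element names (`subject`, `section`, `figure`, `p`), the tuple encoding, and the sample data are assumptions made here, and "without figures" is read as "no direct figure child".

```python
def children(term, label):
    """Direct child elements of term with the given label.
    A term is encoded as (label, [children]); text children are strings."""
    return [c for c in term[1] if isinstance(c, tuple) and c[0] == label]


def text(term):
    """Concatenated text children of a term."""
    return "".join(c for c in term[1] if isinstance(c, str))


def query(bib, topic):
    results = []
    for article in children(bib, "article"):
        if topic not in [text(s) for s in children(article, "subject")]:
            continue                      # selection on the topic
        for section in children(article, "section"):
            if children(section, "figure"):
                continue                  # subterm negation: no figure allowed
            for title in children(section, "title"):
                row = {"title": text(title)}
                paragraphs = [text(p) for p in children(section, "p")]
                if paragraphs:            # optional subterm: used only if present
                    row["paragraphs"] = paragraphs
                results.append(row)       # optional construction of the result
    return results


# Invented sample data (hypothetical).
bib = ("bib", [
    ("article", [
        ("subject", ["Wax Tablets"]),
        ("section", [("title", ["Storage"]), ("p", ["First."]), ("p", ["Second."])]),
        ("section", [("title", ["Layout"]), ("figure", [])]),
        ("section", [("title", ["Outlook"])]),
    ]),
    ("article", [
        ("subject", ["Papyri"]),
        ("section", [("title", ["Scrolls"])]),
    ]),
])

rows = query(bib, "Wax Tablets")
```

The section containing a figure is dropped, and the section without paragraphs still yields a result, just without a `paragraphs` entry.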
3 Rules: Separation of Concern by Views

Separation of Concern by Views
- separate tasks of a query in rules
- efficient evaluation of chained queries
- memoization and unfolding

Separation of Query and Construction
- two separate parts in rules
- no mixing of construction and querying
- instead chaining where necessary
Rules: Inference, Views, and Chaining

- Terms as formulas: terms may contain boolean connectives, including disjunctions
- Rules separate construction from querying and allow for procedural abstraction in query programs

Rules and Chaining: “close the skos:related relation on the provided data by adding skos:subject and traversing the closure of skos:narrower”
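One possible reading of this task can be sketched as rule chaining up to a fixpoint in Python. The sketch is not the Xcerpt program from the slide. It assumes that a resource becomes related to its subject concept and to every concept from which that subject is reachable via skos:narrower; the function names and the sample triples are hypothetical and are not read off the ontology figure.

```python
def transitive_closure(edges):
    """Chain the rule  reach(a, c) <- reach(a, b), edge(b, c)  until no new
    pair is derived."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def close_related(triples):
    """Extend skos:related using skos:subject and the closure of skos:narrower."""
    narrower = {(s, o) for (s, p, o) in triples if p == "skos:narrower"}
    subject = {(s, o) for (s, p, o) in triples if p == "skos:subject"}
    related = {(s, o) for (s, p, o) in triples if p == "skos:related"}
    reach = transitive_closure(narrower)
    for resource, concept in subject:
        related.add((resource, concept))
        # every concept that has `concept` below it in the narrower hierarchy
        related |= {(resource, broader) for (broader, c) in reach if c == concept}
    return related


# Illustrative triples (assumed for this example).
triples = [
    ("acm98:D_4_2", "skos:narrower", "acm98:D_4_2_e"),
    ("acm98:D_4_2_e", "skos:narrower", "acm98:D_4_2_e_i"),
    ("mybib:inproc_44_brutus", "skos:subject", "acm98:D_4_2_e_i"),
]

related = close_related(triples)
```

Separating `transitive_closure` from `close_related` mirrors the idea of rules as views: the closure is computed once and the second step only queries its result.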