PatManQL: A language to manipulate patterns and data in hierarchical catalogs
Panagiotis Bouros, Theodore Dalamagas, Timos Sellis, Manolis Terrovitis
Knowledge and Database Systems Lab
School of Electrical and Computer Engineering
National Technical University of Athens
{pbour,dalamag,timos,mter}@dblab.ece.ntua.gr
Outline
o Introduction
o Contribution
o Structures
o Operators
o Prototype
o Related work
o Conclusion
PatManQL 2
Introduction
o Huge volumes of data on the Web
o Hierarchical structures and catalogs
o Paths → knowledge artifacts
  ▪ Represent groups of data
    → Conceptual clustering of raw data based on common properties
  ▪ Semantic guides
o Example: portal catalogs
Introduction
o Paths → alternative pattern versions for the same group of data
o Example: searching for lenses
  ▪ /cameras & lenses/lenses (adorama)
  ▪ /photo/35mm systems/lenses (B&H)
[Figure: (a) the adorama catalog and (b) the B&H catalog, each a hierarchy of categories (cameras, film, filters, printers, scanners, …) with data tables at the leaves, e.g. (brand, model, ppm) for printers, (brand, model, price) for cameras and (brand, focald, cam_model, price) for lenses.]
Introduction
o Paths → complex patterns
o Example: searching for integrated photo systems
  ▪ /cameras & lenses/35mm SLR (adorama)
  ▪ /photo/35mm systems/lenses (B&H)
[Figure: the (a) adorama and (b) B&H catalogs, as in the lenses example.]
Contribution
o A model to represent paths as knowledge artifacts
o The PatManQL language:
  ▪ Operators to manipulate path-like patterns
  ▪ Relational operators for data
o A prototype
Catalog Schema
o A tree with:
  ▪ a root (⊗)
  ▪ a set of non-leaf nodes
  ▪ a set of resource items as leaves (□)
o Data: instances (records) of resource items
  ▪ Resource item: a relation R(a1, a2, …, an), where a1, a2, …, an are attributes
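A minimal Python sketch of the catalog schema model, assuming plain dataclasses. The names `Node`, `ResourceItem` and `leaf_paths` are hypothetical, not part of PatManQL, and the placement of the two resource items in the tree is a hypothetical fragment; the rows are the SLR cameras and Digital printers data from the catalog schema example.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceItem:
    """Leaf of a catalog schema: relation R(a1, ..., an) plus its instances."""
    name: str
    attributes: tuple
    rows: list = field(default_factory=list)


@dataclass
class Node:
    """Non-leaf catalog node; children are Nodes or ResourceItems."""
    label: str
    children: list = field(default_factory=list)


def leaf_paths(node, prefix=""):
    """Yield (path, resource item) for every resource item below `node`.

    The path lists the labels of the non-leaf nodes under the root, e.g.
    "/cameras & lenses/35mm SLR"; the unlabeled root contributes nothing.
    """
    for child in node.children:
        if isinstance(child, ResourceItem):
            yield prefix or "/", child
        else:
            yield from leaf_paths(child, f"{prefix}/{child.label}")


slr = ResourceItem("SLR cameras", ("brand", "model", "price"),
                   [("Canon", "EOS-3", 990), ("Nikon", "N65", 205),
                    ("Pentax", "ZX-M", 148.50)])
printers = ResourceItem("Digital printers", ("brand", "model", "ppm"),
                        [("hp", "3820", 12), ("hp", "7350", 17), ("hp", "6122", 20)])

# Hypothetical placement of the two resource items in the hierarchy.
root = Node("⊗", [Node("cameras & lenses", [Node("35mm SLR", [slr])]),
                  Node("printers", [printers])])

for path, item in leaf_paths(root):
    print(f"{path} -> {item.name}{item.attributes}")
```

Each root-to-leaf path is exactly the kind of path that the rest of the talk treats as a pattern.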
Catalog Schema
[Figure: a catalog hierarchy (cameras, film, filters & lenses, printers, point & shoot, 35mm SLR, …), the catalog schema derived from it, its resource items, and their data.]
o Resource items and their data:
  ▪ SLR cameras(brand, model, price): (Canon, EOS-3, 990), (Nikon, N65, 205), (Pentax, ZX-M, 148.50)
  ▪ Digital printers(brand, model, ppm): (hp, 3820, 12), (hp, 7350, 17), (hp, 6122, 20)
Tree-Structure Relations (TSRs)
o Combine catalog schemas that share a common resource item
o Tree-Structure Relation (an AND/OR-like graph):
  ▪ One resource item
  ▪ Paths organized in OR components
    → OR component: a group of one or more paths (an AND group)
    → OR components are alternative ways to access the common resource item
  ▪ Paths = patterns
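One way to hold a TSR in memory, sketched in Python under two assumptions: an AND group means "all of these paths are followed together", and the resource item is reachable as soon as any one OR component is fully available. `TSR`, `all_paths` and `reachable_via` are hypothetical names; the two OR components are the ones stored for SLR systems in the XML storage example.

```python
from dataclasses import dataclass, field


@dataclass
class TSR:
    """Tree-Structure Relation sketch: one resource item reached through OR components.

    or_components: alternative ways to access the resource item; each is an
    AND group, i.e. a frozenset of paths (patterns) assumed to be followed together.
    """
    name: str
    attributes: tuple
    or_components: list
    rows: set = field(default_factory=set)

    def all_paths(self) -> set:
        """Every path (pattern) that appears in some OR component."""
        return set().union(*self.or_components)

    def reachable_via(self, known_paths: set) -> bool:
        """True if some OR component has all of its paths in known_paths."""
        return any(group <= known_paths for group in self.or_components)


slr_systems = TSR(
    "SLR systems",
    ("brand", "model", "price", "lens_id"),
    [frozenset({"/photo/35mm SLR/bodies", "/photo/lenses"}),  # OR #1: AND group of two paths
     frozenset({"/photo/35mm systems"})],                       # OR #2: a single path
)

print(sorted(slr_systems.all_paths()))
print(slr_systems.reachable_via({"/photo/35mm systems"}))  # OR #2 fully available
print(slr_systems.reachable_via({"/photo/lenses"}))        # only half of OR #1
```

Using frozensets makes each AND group order-independent and hashable, which the operator sketches below rely on when comparing or merging components.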
Tree-Structure Relations (TSRs)
[Figure: two TSRs with their OR components (OR #1, OR #2): (a) SLR cameras(brand, model, price); (b) SLR systems(brand, model, price, lens_id), reached through paths such as /photo/35mm SLR/bodies, /photo/lenses and /photo/35mm systems.]
Operators
o Select (σ)
  ▪ σ <attribute condition><path condition> (TSR)
    → attribute condition: {=, ≠, <}
    → path condition: {=, ≠, ⊂, ∠}
  ▪ Filters instances of resource items and OR components
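A Python sketch of σ, assuming that the attribute condition filters tuples, that an OR component survives when at least one of its paths satisfies the path condition, and that `pattern ⊂ path` means "the steps of pattern occur contiguously in path". The ∠ condition is not modeled, and `select`, `contained_in` and `steps` are hypothetical helpers. The usage reproduces the select example on SLR systems.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TSR:
    name: str
    attributes: tuple          # schema of the single resource item
    or_components: list        # each element: frozenset of paths (an AND group)
    rows: set = field(default_factory=set)


def steps(path: str) -> list:
    """Split "/photo/35mm systems" into ["photo", "35mm systems"]."""
    return [s for s in path.split("/") if s]


def contained_in(pattern: str, path: str) -> bool:
    """Assumed reading of `pattern ⊂ path`: pattern's steps occur contiguously in path."""
    p, q = steps(pattern), steps(path)
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def select(tsr: TSR, row_pred: Callable[[dict], bool],
           path_pred: Callable[[str], bool]) -> TSR:
    """σ sketch: keep rows satisfying row_pred and OR components
    that contain at least one path satisfying path_pred."""
    rows = {r for r in tsr.rows if row_pred(dict(zip(tsr.attributes, r)))}
    ors = [g for g in tsr.or_components if any(path_pred(p) for p in g)]
    return TSR(tsr.name, tsr.attributes, ors, rows)


slr_systems = TSR(
    "SLR systems", ("brand", "model", "price", "lens_id"),
    [frozenset({"/photo/35mm SLR/bodies", "/photo/lenses"}),  # OR #1
     frozenset({"/photo/35mm systems"})],                       # OR #2
    {("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 2), ("Pentax", "ZX-M", 148.5, 3)},
)

# σ <brand != "Pentax", price > 200><"/photo/35mm systems" ⊂ $_> (SLR systems)
# $_ ranges over every path, so it becomes the lambda's argument here.
result = select(slr_systems,
                lambda t: t["brand"] != "Pentax" and t["price"] > 200,
                lambda path: contained_in("/photo/35mm systems", path))
print(sorted(result.rows), result.or_components)
```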
Select example
o 'Select all non-Pentax cameras with price greater than 200 euros, having "/photo/35mm systems" in their paths':
  σ <brand != "Pentax", price > 200><"/photo/35mm systems" ⊂ $_> (SLR systems)
[Figure: the input TSR SLR systems(brand, model, price, lens_id) and the result of the selection. Only the OR component containing /photo/35mm systems is kept, together with the tuples (Canon, EOS-3, 990, 1) and (Nikon, N65, 205, 2); the tuple (Pentax, ZX-M, 148.5, 3) is filtered out.]
Operators
o Project (π)
  ▪ π <attribute list><variable list> (TSR)
    → attribute list: {attribute}
    → variable list: {$i (path variable), #i (OR variable)}
  ▪ Keeps the listed attributes of the resource item, and either individual paths of the OR components ($i) or whole OR components (#i)
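A Python sketch of π that handles the attribute list and OR variables (#i, assumed 1-based, in the order the components are stored); path variables ($i) are left out because their numbering is not spelled out here. `project` is a hypothetical helper, and the usage reproduces the project example with the rows shown on that slide.

```python
from dataclasses import dataclass, field


@dataclass
class TSR:
    name: str
    attributes: tuple
    or_components: list        # each element: frozenset of paths (an AND group)
    rows: set = field(default_factory=set)


def project(tsr: TSR, attrs: list, or_vars: list) -> TSR:
    """π sketch: keep the listed attributes and the OR components named by #i.

    Rows form a set, so tuples that become identical after projection merge.
    """
    idx = [tsr.attributes.index(a) for a in attrs]
    rows = {tuple(r[i] for i in idx) for r in tsr.rows}
    ors = [tsr.or_components[int(v.lstrip("#")) - 1] for v in or_vars]
    return TSR(tsr.name, tuple(attrs), ors, rows)


slr_systems = TSR(
    "SLR systems", ("brand", "model", "price", "lens_id"),
    [frozenset({"/photo/35mm SLR/bodies", "/photo/lenses"}),  # #1
     frozenset({"/photo/35mm systems"})],                       # #2
    {("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 2), ("Pentax", "ZX-M", 148.5, 2)},
)

# π <model, lens_id><#2> (SLR systems)
result = project(slr_systems, ["model", "lens_id"], ["#2"])
print(result.attributes, sorted(result.rows), result.or_components)
```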
Project example
o 'Cameras with only the model and lens_id attributes and the rightmost OR component':
  π <model, lens_id><#2> (SLR systems)
[Figure: the input TSR SLR systems(brand, model, price, lens_id) with tuples (Canon, EOS-3, 990, 1), (Nikon, N65, 205, 2), (Pentax, ZX-M, 148.5, 2), and the result, which keeps only OR component #2 and the (model, lens_id) pairs (EOS-3, 1), (N65, 2), (ZX-M, 2).]
Operators
o Cartesian product (X)
  ▪ (TSR1) X (TSR2)
  ▪ Combines the instances of the resource items and the OR components
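A Python sketch of the Cartesian product. Pairing the rows is standard; how the OR components are "combined" is an assumption here: each result component is the AND-merge of one component from each operand, on the reasoning that the combined item is reached only by reaching both original items. The Lenses path `/camera & photo/lenses` is hypothetical; the rows and the c/l attribute prefixes follow the product example.

```python
from dataclasses import dataclass, field


@dataclass
class TSR:
    name: str
    attributes: tuple
    or_components: list        # each element: frozenset of paths (an AND group)
    rows: set = field(default_factory=set)


def product(t1: TSR, t2: TSR) -> TSR:
    """X sketch: pair every row of t1 with every row of t2 and
    AND-merge every OR component of t1 with every OR component of t2."""
    clash = set(t1.attributes) & set(t2.attributes)
    if clash:
        raise ValueError(f"rename clashing attributes first: {sorted(clash)}")
    rows = {r1 + r2 for r1 in t1.rows for r2 in t2.rows}
    ors = [g1 | g2 for g1 in t1.or_components for g2 in t2.or_components]
    return TSR(f"({t1.name}) X ({t2.name})", t1.attributes + t2.attributes, ors, rows)


slr = TSR("SLR systems", ("cbrand", "cmodel", "cprice", "clensid"),
          [frozenset({"/photo/35mm SLR/bodies", "/photo/lenses"}),
           frozenset({"/photo/35mm systems"})],
          {("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 1), ("Pentax", "ZX-M", 148.5, 2)})
lenses = TSR("Lenses", ("lbrand", "lensid", "lprice"),
             [frozenset({"/camera & photo/lenses"})],  # hypothetical path
             {("Sigma", 1, 200), ("Tamron", 2, 100)})

result = product(slr, lenses)
print(result.attributes)
print(len(result.rows), result.or_components)
```

Renaming attributes before the product (cbrand vs. lbrand) avoids ambiguous column names, which is why the sketch refuses clashing schemas instead of renaming silently.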
Cartesian product example
o (SLR systems) X (Lenses)
[Figure: (a) SLR systems(cbrand, cmodel, cprice, clensid) with tuples (Canon, EOS-3, 990, 1), (Nikon, N65, 205, 1), (Pentax, ZX-M, 148.5, 2), …; (b) Lenses(lbrand, lensid, lprice) with tuples (Sigma, 1, 200), (Tamron, 2, 100), …; (c) the product, which combines the OR components of both TSRs and pairs every tuple of (a) with every tuple of (b).]
Operators
o Union (U)
  ▪ (TSR) U (TSR)
  ▪ Union of the instances, and all OR components
o Intersection (∩)
  ▪ (TSR) ∩ (TSR)
  ▪ Intersection of the instances, and all OR components
o Difference (–)
  ▪ (TSR) – (TSR)
  ▪ Instances of the first TSR not present in the second one, and all OR components of the first TSR
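A Python sketch of the three set operators, assuming both operands share the same resource item schema and reading "all OR components" for union and intersection as "the OR components of both operands, each identical component kept once". The helper names are hypothetical. The rows follow the union example; which OR component belongs to which operand is a hypothetical choice.

```python
from dataclasses import dataclass, field


@dataclass
class TSR:
    name: str
    attributes: tuple
    or_components: list        # each element: frozenset of paths (an AND group)
    rows: set = field(default_factory=set)


def _merge_ors(a: list, b: list) -> list:
    """All OR components of both operands, each identical component kept once."""
    out = list(a)
    for g in b:
        if g not in out:
            out.append(g)
    return out


def _check(t1: TSR, t2: TSR) -> None:
    if t1.attributes != t2.attributes:
        raise ValueError("set operators need TSRs over the same resource item schema")


def union(t1: TSR, t2: TSR) -> TSR:
    _check(t1, t2)
    return TSR(t1.name, t1.attributes,
               _merge_ors(t1.or_components, t2.or_components), t1.rows | t2.rows)


def intersection(t1: TSR, t2: TSR) -> TSR:
    _check(t1, t2)
    return TSR(t1.name, t1.attributes,
               _merge_ors(t1.or_components, t2.or_components), t1.rows & t2.rows)


def difference(t1: TSR, t2: TSR) -> TSR:
    """Rows of t1 missing from t2; only t1's OR components are kept."""
    _check(t1, t2)
    return TSR(t1.name, t1.attributes, list(t1.or_components), t1.rows - t2.rows)


ATTRS = ("cbrand", "cmodel", "cprice", "clensid")
a = TSR("SLR systems", ATTRS, [frozenset({"/photo/35mm SLR/bodies", "/photo/lenses"})],
        {("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 1), ("Pentax", "ZX-M", 148.5, 2)})
b = TSR("SLR systems", ATTRS, [frozenset({"/photo/35mm systems"})],
        {("Canon", "EOS-3", 990, 1), ("Nikon", "FM2", 800, 1), ("Pentax", "ZX-M", 148.5, 2)})

u = union(a, b)
print(sorted(u.rows), u.or_components)
```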
Union example
o (SLR systems) U (SLR systems)
[Figure: two SLR systems TSRs (a) and (b), each with its own OR components, and (c) their union, which contains the OR components of both. The tuples (Canon, EOS-3, 990, 1) and (Pentax, ZX-M, 148.5, 2) appear in both operands, (Nikon, N65, 205, 1) in one and (Nikon, FM2, 800, 1) in the other; the union contains each of these tuples once.]
Prototype
o Interpreter
o Query execution engine
o Storage mechanism
  ▪ XML files
  ▪ MySQL RDBMS
    → All-edges-in-one-table storage approach
o Graphical interface
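A sketch of the all-edges-in-one-table idea: every parent-to-child edge of a catalog hierarchy is one row of a single table, and root-to-leaf paths are rebuilt with a recursive query. The prototype used MySQL; this sketch substitutes Python's built-in `sqlite3` so it runs standalone, and the table and column names (`edge`, `source`, `target`, `ordinal`, `label`) are hypothetical.

```python
import sqlite3

# Hypothetical single "edge" table: one row per parent -> child edge.
SCHEMA = """CREATE TABLE edge (
    source  INTEGER NOT NULL,     -- parent node id (0 = catalog root)
    target  INTEGER PRIMARY KEY,  -- child node id
    ordinal INTEGER NOT NULL,     -- position among siblings
    label   TEXT NOT NULL         -- category name on the child node
)"""

# Rebuild root-to-leaf paths by walking the single table recursively.
LEAF_PATHS = """
WITH RECURSIVE path(node, p) AS (
    SELECT target, '/' || label FROM edge WHERE source = 0
    UNION ALL
    SELECT e.target, path.p || '/' || e.label
    FROM edge AS e JOIN path ON e.source = path.node
)
SELECT p FROM path
WHERE node NOT IN (SELECT source FROM edge)
ORDER BY p
"""


def build_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO edge (source, target, ordinal, label) VALUES (?, ?, ?, ?)",
        [(0, 1, 1, "photo"),
         (1, 2, 1, "35mm SLR"), (2, 3, 1, "bodies"),
         (1, 4, 2, "lenses"),
         (1, 5, 3, "35mm systems")],
    )
    return conn


def leaf_paths(conn: sqlite3.Connection) -> list:
    """Paths from the root to every node that has no children."""
    return [row[0] for row in conn.execute(LEAF_PATHS)]


conn = build_catalog()
print(leaf_paths(conn))
```

Keeping all edges in one table makes inserting a new catalog trivial; the cost is that reconstructing a path needs one join per level, which the recursive query hides.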
Related work
o Pattern management: PANDA project (S. Rizzi et al.)
o Inductive databases framework (Tomasz Imielinski et al.)
  ▪ DMQL (Jiawei Han et al.), MINE RULE (R. Meo et al.)
    → Descriptive rules
o Tree algebras
  ▪ TAX (H. V. Jagadish et al.)
    → Selecting and reconstructing bulk XML data
  ▪ YAT (V. Christophides et al.)
    → Tuple-based, not tree-based
Conclusion
o A model to represent paths as knowledge artifacts (patterns)
  ▪ Catalog schema
  ▪ Tree-Structure Relations (TSRs)
o The PatManQL language:
  ▪ Operators to manipulate paths as patterns and data
o A prototype system
Future Work
o Properties of the operators
o Restructuring operators
o Join operator
Questions?
Tree-Structure Relations (TSRs)
[Figure: the two TSRs, (a) SLR cameras(brand, model, price) and (b) SLR systems(brand, model, price, lens_id), with their paths labeled by path variables ($1, $2).]
Storage mechanism
o XML file
[Figure: the SLR systems TSR (brand, model, price, lens_id) next to its XML encoding, reproduced here:]
<tsr name="SLR systems">
  <or>
    <and>/photo/35mm SLR/bodies</and>
    <and>/photo/lenses</and>
  </or>
  <or>
    <and>/photo/35mm systems</and>
  </or>
  <item>
    <attribute name="brand" type="…"/>
    <attribute name="model" type="…"/>
    …
    <tuple>…</tuple>
    …
  </item>
</tsr>
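A short sketch of reading this XML encoding back into OR components with Python's standard `xml.etree.ElementTree`. It assumes each `<or>` element is one OR component whose `<and>` children are the paths of its AND group. The elided parts of the slide (`type="…"`, `<tuple>…</tuple>`) are kept verbatim as text, so tuples are not decoded; `parse_tsr` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The TSR from the storage slide; "…" marks content elided there and stays as plain text.
TSR_XML = """<tsr name="SLR systems">
  <or>
    <and>/photo/35mm SLR/bodies</and>
    <and>/photo/lenses</and>
  </or>
  <or>
    <and>/photo/35mm systems</and>
  </or>
  <item>
    <attribute name="brand" type="…"/>
    <attribute name="model" type="…"/>
    …
    <tuple>…</tuple>
    …
  </item>
</tsr>"""


def parse_tsr(xml_text: str):
    """Return (name, OR components, attribute names) from a <tsr> element.

    Each <or> becomes a frozenset of the paths in its <and> children.
    Tuples are skipped because their layout is elided in the source.
    """
    root = ET.fromstring(xml_text)
    ors = [frozenset(a.text.strip() for a in o.findall("and"))
           for o in root.findall("or")]
    attrs = [a.get("name") for a in root.find("item").findall("attribute")]
    return root.get("name"), ors, attrs


name, ors, attrs = parse_tsr(TSR_XML)
print(name, ors, attrs)
```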