Graph Analysis of Candidate GQL Features Graph Query Language Project Existing Languages Working Group Thomas Frisendal thomasf@tf-informatik.dk, @VizDataModeler 2019-02-26
The “Existing Languages Working Group” • In preparation for the start of planning for GQL, interested parties -- drawn from industry (Neo4j, Oracle, Redis Labs and TigerGraph), the community (a noted data modelling expert and published technical author), and academia (the University of Talca in Chile) -- formed an informal working group called the “Existing Languages Working Group”. • We have worked incrementally on systematically identifying, surveying, analysing and comparing graph query language features, drawn from the following existing query languages: • Cypher • PGQL • GSQL • SQL PGQ [Framework:2020, Foundation:2020, SQL/PGQ IWD, ERF-035] • G-CORE. • We hope to compile a catalogue of: • the groups of features • the extent to which (if at all) these are supported in each language • exemplar syntax • supplementary artifacts to aid in the understanding of the underlying semantics • grammar constructs • and any additional details of interest. • The idea is to map this landscape of existing query languages in order to inform the design and development of GQL through a well-informed work plan, leading to a more robust outcome; i.e. this would help us to have clear and meaningful discussions on scope and priorities, and will facilitate clear and unambiguous design choices. Moreover, this will help us to identify areas of consolidation, innovation and opportunities for language interoperation in GQL (for example, with SPARQL).
Combatting Complexity: The ELWG Graph Database • Establishing an analytical graph database for all 5 languages across all 212 features • Down to the keyword level for each feature of each language across 5 descriptive (text / syntax) dimensions • Now in its 3rd edition • Methodology: • Consolidate all sheets into one • Generate MERGE commands for the features tree and the 5 languages (by way of Excel formulas) • Some manual intervention (remove CRs and change each ; to §) • Load into Neo4j • Connect all components • Build tags for Descriptors, GrammarTags and SyntaxTags • Build a Keyword tag tree based on all 3 of the above • Do some reporting (this ppt and some Excel sheets) • Will be made available to phase 2 and to the GQL design work (for analysis) • Ambition: Pragmatic, analytical support tool, not a normative source • Errare humanum est – report errors and omissions, please (a few known issues already)
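The MERGE-generation step of the methodology can be sketched in Python rather than Excel formulas. This is a minimal illustration, not the working group's actual tooling: the node labels FeatureArea, FeatureGroup and Feature come from the statistics slide, while the `name` property, the relationship types HAS_GROUP and HAS_FEATURE, and the sample area and group names are assumptions made up for the example.

```python
def cypher_string(value: str) -> str:
    """Return value as a single-quoted Cypher string literal.

    Mirrors the manual clean-up on the slide: line breaks are removed
    and every ';' becomes '§'. Backslashes and quotes are escaped.
    """
    cleaned = " ".join(value.splitlines())   # drop CR / LF / CRLF
    cleaned = cleaned.replace(";", "§")      # ';' would end a statement
    cleaned = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + cleaned + "'"


def merge_feature(area: str, group: str, feature: str) -> str:
    """Build one Cypher statement for one row of the features tree.

    The property key and relationship types are hypothetical.
    """
    return (
        f"MERGE (a:FeatureArea {{name: {cypher_string(area)}}}) "
        f"MERGE (g:FeatureGroup {{name: {cypher_string(group)}}}) "
        f"MERGE (f:Feature {{name: {cypher_string(feature)}}}) "
        "MERGE (a)-[:HAS_GROUP]->(g) "
        "MERGE (g)-[:HAS_FEATURE]->(f)"
    )


if __name__ == "__main__":
    # Made-up area and group names; the feature name appears in the deck.
    print(merge_feature("Area A", "Group B", "Edge pattern with direction"))
```

Because MERGE only creates what does not already exist, running one such statement per spreadsheet row builds the tree without duplicating shared areas or groups.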
Current Meta Model
Statistics

Node types          Count   Min rels   Max rels
Feature               212          6         14
FeatureArea             6          1         17
FeatureGroup           30          2         27
InclDoc                 5         80        549
InclLang             1306          4          4
Language                5        208        311
GCOREFeature          212          2         18
GSQLFeature           212          2         30
OpenCypherFeature     212          2         29
PGQLFeature           212          1         25
SQLFeature            212          2         29
DescriptorTag         401          1         22
GrammarTag            299          1        424
KeywordTag            659          1        247
SyntaxTag             214          1        247
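A count with minimum and maximum relationships per node type can be derived from any labelled edge list. The sketch below assumes that "rels" means the number of relationships attached to a node in either direction, which the slide does not state. The toy graph and the function name `degree_stats` are made up for illustration.

```python
from collections import Counter, defaultdict


def degree_stats(labels, rels):
    """Return {label: (node count, min rels, max rels)}.

    labels maps node id -> node type; rels is a list of (source, target)
    pairs. Each relationship counts once for each of its end nodes.
    """
    degree = Counter()
    for source, target in rels:
        degree[source] += 1
        degree[target] += 1

    degrees_by_label = defaultdict(list)
    for node, label in labels.items():
        degrees_by_label[label].append(degree[node])  # 0 if unconnected

    return {
        label: (len(values), min(values), max(values))
        for label, values in degrees_by_label.items()
    }


if __name__ == "__main__":
    toy_labels = {"f1": "Feature", "f2": "Feature",
                  "g1": "FeatureGroup", "cypher": "Language"}
    toy_rels = [("g1", "f1"), ("g1", "f2"), ("f1", "f2"), ("cypher", "f1")]
    for label, stats in sorted(degree_stats(toy_labels, toy_rels).items()):
        print(label, *stats)
```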
The Features Tree
Comparison of Planned or Implemented Features (GCORE, PGQL, GSQL, SQL, OpenCypher)
Implementation Status (Not = ’X’) GCORE: 72, GSQL: 152, Cypher: 168, PGQL: 113, SQL: 140
Implementation Status Not Supported (’X’) GCORE: 118, GSQL: 54, Cypher: 43, PGQL: 99, SQL: 71
The Descriptor Tags
The Grammar Tags: Function Invocation (Cypher), Not Defined (SQL)
The Syntax Graph
Part of the Syntax Graph
Zooming in on a ”Word” in the Syntax Graph
Even More Tags in the Keyword Graph: Essentially the Syntax Tags enhanced with keywords extracted from the Descriptor and Grammar Tags
Collected Keywords per Feature and Language
Using a Graph Algorithm to Measure Similarity of Expression (Jaccard)

AvgSim   Feature Name
1,00     And
1,00     Comparing values (equality)
1,00     Equality
1,00     Greater than
1,00     Greater than or equal to
1,00     Inequality
1,00     Less than
1,00     Less than or equal to
1,00     Negation
1,00     Or
1,00     Type coercions (i.e. implicit type conversions)
1,00     approximate 32-bit binary decimal number
-        Dynamic property access (accessing a property of a node or edge by using a dynamically-computed string value as the key§ e.g. allowing for the key to be passed in as a parameter)
-        Escaping characters
-        Flattening a list (transform a list into a series of rows§ transpose)
-        Get all the elements of a list/collection/array excluding the first element
-        Get all the labels for a node
-        Get the identifier of a node or edge
-        Node pattern with label negation
-        interval
-        multidimensional array
…        …

[Chart: AvgSim per feature name, axis ticks from 0,20 to 1,20]

AvgSim   Feature Name
1,00     approximate 64-bit binary decimal number
0,87     Edge directions: l-to-r
0,87     Specifying a conditional value
0,83     date
0,83     local time
0,80     Check if a property exists on a node or an edge
0,79     Edge directions: r-to-l
0,79     Edge pattern with disjunction of labels
0,75     MATCH with more than one node/edge/path pattern (i.e. allowing for 'star'-shaped patterns etc). Essentially this can also be used to obtain a cross product
0,75     Edge pattern with direction
0,74     Subtraction
0,73     Edge directions: any direction

AvgSim   Feature Name
0,06     Obtain the current date/time
0,07     Get all the nodes in a path
0,07     List/collection/array concatenation
0,08     Get all the edges in a path
0,08     Determine whether or not a value is a member of a multiset
0,08     Input graph specification
0,08     List equality
0,09     Create an edge
0,09     Get the edge label as a string
0,11     Subtraction operator for temporal types and durations
0,11     Create a node
0,11     Get the first element in a list/collection/array
0,11     Replace
0,12     Checking if a pattern exists
0,13     Amalgamate multiple values into a single list
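As a rough illustration of the similarity measure, the Jaccard index of two keyword sets is the size of their intersection divided by the size of their union. The slides do not say how AvgSim is aggregated, so the sketch below assumes it is the mean of the pairwise Jaccard scores between the languages' keyword sets for one feature. The keyword sets and function names are invented for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def average_similarity(keywords_by_language):
    """Mean pairwise Jaccard index across languages for one feature.

    Assumption: this is one plausible reading of 'AvgSim', not a
    definition taken from the slides.
    """
    pairs = list(combinations(sorted(keywords_by_language), 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard(keywords_by_language[x], keywords_by_language[y])
        for x, y in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    # Invented keyword sets for a single feature in three languages.
    example = {
        "LangA": {"MATCH", "WHERE"},
        "LangB": {"MATCH", "WHERE"},
        "LangC": {"SELECT", "FROM", "WHERE"},
    }
    print(round(average_similarity(example), 2))
```

Under this reading, a feature that every language expresses with the same keywords scores 1,00, and a feature whose keywords rarely overlap scores close to 0.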
10 Data Extracts in Excel (ELWG_reports_20190228.zip) • CandidateFeatures_20190228 • DescriptorTags_20190228 • FeaturesNotSupported_20190228 • FeatureSyntaxSimilarity_20190228 • GrammarTags_20190228 • KeywordTagsAcrossLanguages_20190228 • KeyWordTagsCollections_20190228 • SyntaxSummary_20190228 • SyntaxTags_20190228 • SyntaxXref_20190228
Contact information: Thomas Frisendal (Copenhagen, Denmark) thomasf@tf-informatik.dk @VizDataModeler linkedin.com/in/thomas-frisendal-19a56a