tuning cypher
play

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 - PowerPoint PPT Presentation

Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019 (andrew)-[:FREQUENTS]-> 2 How to get good at Cypher tuning Curiosity, patience, flexibility Sometimes its not the query that must bend... Know your tools


  1. Tuning Cypher Andrew Bowman Customer Success Engineer NODES 2019

  2. (andrew)-[:FREQUENTS]-> 2

  3. How to get good at Cypher tuning Curiosity, patience, flexibility • Sometimes it’s not the query that must bend... • • Know your tools and resources And knowing is half the battle • Learn esoteric secrets key principles of Cypher execution • Row cardinality, cheap vs expensive operations • Practice! • 3

  4. Goal: Do less work Avoid redundant work and operations • Cut out work that’s going to be filtered out anyway • Recognize cheaper ways to do what you want • 4

  5. Curiosity, patience, and flexibility 5

  6. Patience in the fight against... Cypher! (or misunderstandings of its behavior) Sometimes, the planner itself 6

  7. Sometimes it is not the query that must bend... ...maybe you have to change your model… ...maybe you have to use APOC… ...maybe you need to write a stored procedure... 7

  8. Tuning Tools and Resources 8

  9. Reference documentation Neo4j Refcard • Official Cypher manual • Query tuning section • Execution plan descriptions • Cypher knowledge base • 9

  10. Query logs For finding queries already-being used that need tuning Enable in neo4j.conf, successful queries exceeding threshold are logged. Options for finer grain timing breakdowns: Time spent planning, executing, and waiting on locks 2017-11-22 12:38 ... INFO 3 ms: bolt-session bolt johndoe neo4j-javascript/1.4.1 ... 2017-11-22 22:38 ... INFO 61 ms: (planning: 0, cpu: 58, waiting: 0) - 6164496 B - 0 page hits, 1 page faults ... 2017-11-22 12:38 ... INFO 78 ms: (planning: 40, cpu: 74, waiting: 0) - 6347592 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 44 ms: (planning: 9, cpu: 25, waiting: 0) - 1311384 B - 0 page hits, 0 page faults ... 2017-11-22 12:38 ... INFO 6 ms: (planning: 2, cpu: 6, waiting: 0) - 420872 B - 0 page hits, 0 page faults - ... 10

  11. Query log analyzer Graph app in Neo4j Desktop, analyzes a query.log file 11

  12. Graph, schema, and counts overview Graph structure CALL db.schema() OR CALL apoc.meta.graph() Indexes and constraints CALL db.indexes OR :schema Counts store CALL db.stats.retrieve('GRAPH COUNTS') OR CALL apoc.meta.stats() 12

  13. EXPLAIN and PROFILE Prefix before the query, produces query plans EXPLAIN - does not execute the query No db hits, only rough estimate row counts PROFILE - executes the query Includes valuable db hit and row counts 13

  14. How to walk the tree Start node: from result 1. go up always turning left until leaf node Execute down until we reach a branch 2. We have the lhs, now need rhs 3. Go up one on rhs, then repeat from 1 4. 14

  15. Cypher-shell plans - inverted tree Read from bottom-up 1. Start node: from result go 2. down always turning left until leaf node Follow until we reach a 3. branch We have lhs, need rhs 4. Go down one on rhs, 5. repeat from 1 15

  16. DB hits Abstract unit of db work Not comparable between operations Not always something that can be optimized Smoke, let it draw your attention to... 16

  17. Rows Inputs to operations Multiplies out db hits The more rows, the more work Watch for spikes: 44 -> 200 -> 38 17

  18. Key principles of Cypher execution 18

  19. Row Cardinality Operators produce result records/rows • Operators execute for each input record/row • Implications: Reduce rows across the query, reduce total work • MATCHes are not independent • 19

  20. A simple demonstration PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(DISTINCT actor.name) as actors, collect(DISTINCT dir.name) as directors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms. 20

  21. MATCHes are not independent MATCH (m:Movie {title:’The Matrix’}) MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 2 rows 5 rows RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 21

  22. What actually happens (best case) MATCH (m:Movie {title:’The Matrix’}) executed 1 time executed 2 times 2 rows MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 10 rows (2 * 5) RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 22

  23. It could be worse MATCH (m:Movie {title:’The Matrix’}) executed 5 times executed 1 times 5 rows MATCH MATCH (p:Person)-[:DIRECTED]->(m) (p:Person)-[:ACTED_IN]->(m) 10 rows (5 * 2) RETURN m.title as title, collect (DISTINCT p.name) as actors, collect (DISTINCT dir.name) as directors 23

  24. Same exact plan PROFILE MATCH (dir:Person)-[:DIRECTED]-> (m:Movie {title:'The Matrix'}) <-[:ACTED_IN]-(actor:Person) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 141 total db hits in 1 ms. 24

  25. What increases cardinality? MATCHes and OPTIONAL MATCHes • (watch for back-to-back, even WITH-separated) UNWIND operations • Procedure results • (only if they YIELD something) 25

  26. How do we shrink cardinality? Aggregation • (the grouping key will become distinct) WITH DISTINCT … • (applies to the entire row, not just a single variable) LIMIT • (in special cases) (limits ALL rows, not results per row) (WITH on its own does not shrink cardinality) • 26

  27. How do we fix this? PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) MATCH (actor:Person)-[:ACTED_IN]->(m) RETURN m.title as title, collect(actor.name) as actors, collect(dir.name) as directors these are happening too late... 27

  28. Aggregate earlier PROFILE MATCH (m:Movie {title:'The Matrix'}) MATCH (dir:Person)-[:DIRECTED]->(m) WITH m, collect(dir.name) as directors MATCH (actor:Person)-[:ACTED_IN]->(m) WITH m, directors, collect(actor.name) as actors RETURN m.title as title, directors, actors Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 101 total db hits in 1 ms. (vs 134) 28

  29. Or use pattern comprehension PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m.title, [(dir:Person)-[:DIRECTED]->(m) | dir] as directors, [(actor:Person)-[:ACTED_IN]->(m) | actor] as actors It’s like an (optional) MATCH and collect Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 94 total db hits in 1 ms. 29

  30. The cost of things...starting node lookup How do you find your starting node in the graph? NodeByLabelScan - Visit every node of a label • If there are property predicates, filter A label must be present in the pattern PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m 30

  31. The most expensive, the one to avoid AllNodesScan - Visit every node in your graph • If there are property predicates, filter Used when no labels are present PROFILE MATCH (m {title:'The Matrix'}) RETURN m 31

  32. The fast, preferred lookup NodeIndexSeek - Lookup via index • Used when a label and at least 1 property present An index must have been created for the label/property combo: CREATE INDEX ON :Movie(title); Variations! WHERE m.title STARTS WITH ‘THE MATRIX’ WHERE m.title IN [‘The Matrix’, ‘Toy Story 4’] PROFILE MATCH (m:Movie {title:'The Matrix'}) RETURN m 32

  33. The fastest lookup (that you can rarely use) NodeByIdSeek - Lookup via graph id • You must know the graph id ahead of time! Not good to cache outside of Neo4j What if the node was deleted? Also ids of deleted things get reused, so it may be different Could be useful in a multi-query transaction, or within a query PROFILE MATCH (m) WHERE id(m) = 6 RETURN m 33

  34. Fulltext schema index Procedures for fulltext lookup • Named indexes Can index over multiple labels, multiple properties Can create on relationships too! Case insensitive searching, wildcard searching… Can even use custom lucene analyzers PROFILE CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score RETURN node.title, node.description, score 34

  35. The costs of access and filtering Properties (most expensive) • WHERE m.released = 1999 Node labels • WHERE actor:Person Relationship types • WHERE type(r) = ‘ACTED_IN’ Relationship type selection • -[:ACTED_IN | DIRECTED]-> 35

  36. Delay access until the node is distinct ... … WITH DISTINCT m WITH m, collect(actor) as actors WHERE m.released = 1999 WHERE m.released = 1999 … ... 36

  37. Wait until filter / LIMIT / fewer rows ... … WITH m, actors WITH m WHERE size(actors) > 5 LIMIT 10 WITH m.released as released WITH m.released as released, ... actors ... 37

  38. If need to use multiple times, project out … WITH m, m.title as title WHERE title IN $listOfTitles OR title CONTAINS ‘Matrix OR size(title) > 10 ... 38

  39. It is sometimes more efficient to project before an expansion … WITH m, m.title as title MATCH (m)-[:ACTED_IN]-(actor) RETURN title, actor.name as actor ... 39

Recommend


More recommend