larus-ba.it/neo4j @AgileLARUS Streaming Graph Data with Kafka Andrea Santurbano / @santand84 #NODES #2k19 Earth (Milky Road), 10/10/2019
Agenda
● Introduction
○ Partnership between Neo4j and LARUS
● What is Neo4j Streams?
○ What is Apache Kafka?
○ How we combined Neo4j and Kafka
● DEMO
○ Real-time Polyglot Persistence with Elastic, Kafka and Neo4j
● Hunger Games
LARUS Business Automation Srl Italy’s #1 Neo4j Partner
(LARUS)-[:LOVES]->(Neo4j)
WHO ARE WE? [Graph: (Andrea)-[:WORKS_AT]->(LARUS), plus [:INTEGRATOR_LEADER_FOR] and [:LOVES] relationships]
WHO’S LARUS? LARUS BUSINESS AUTOMATION
● Founded in 2004
● Headquartered in Venice, Italy
● Delivering services worldwide
● Mission: “Bridging the gap between Business and IT”
● #1 Neo4j Solution Partner in Italy since 2013
● Creator of the Neo4j JDBC Driver
● Creator of the Neo4j Apache Zeppelin Interpreter
● Creator of the Neo4j ETL Tool
● Developed 90+ APOC procedures
COLLABORATING FOR NEO4J USERS
[Timeline, 2011–2019: first spikes with Neo4j in retail for articles’ clustering; Neo4j JDBC Driver; APOC, ETL, Spark, Zeppelin, Kafka; Kafka commercial support, GraphQL]
Apache Kafka: a widely used, open-source, scalable streaming infrastructure
What is Apache Kafka? A DISTRIBUTED STREAMING PLATFORM
It has three key capabilities:
● Publish and subscribe to streams of records;
● Store streams of records in a fault-tolerant, durable way;
● Process streams of records as they occur.
What is Apache Kafka? HOW DOES IT WORK?
1. TOPICS: a topic is a category or feed name to which records are published.
2. PARTITIONS: for each topic, the Kafka cluster maintains a partitioned, distributed, persistent log.
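To make the topic/partition model concrete, here is a minimal toy sketch (not real Kafka, and not a real client API) of a partitioned, append-only log: each partition is an ordered list of records, and a record’s offset is simply its position in that partition.

```python
# Toy illustration of Kafka's storage model: a topic split into
# partitions, where each partition is an append-only, ordered log.
class ToyTopic:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always land in the same partition,
        # so per-key ordering is preserved (as in Kafka's default partitioner).
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

topic = ToyTopic()
p1, o1 = topic.publish("user-42", "created")
p2, o2 = topic.publish("user-42", "updated")
assert p1 == p2      # same key -> same partition
assert o2 == o1 + 1  # offsets grow monotonically within a partition
```

The same guarantees hold in Kafka itself: ordering is per partition, not per topic, which is why keys that must stay ordered should share a partition key.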
What is Apache Kafka? HOW IS IT USED?
Kafka is generally used for two classes of applications:
● Building real-time streaming data pipelines;
● Building real-time streaming applications.
What is Neo4j Streams? Enables Kafka Streaming on Neo4j!
What is Neo4j Streams? ENABLES DATA STREAMING ON NEO4J
(Andrea [:CREATOR_OF], Michael [:AUTHOR_OF])
The project is a Neo4j plugin composed of several parts:
● Neo4j Streams Change Data Capture (CDC);
● Neo4j Streams Sink;
● Neo4j Streams Procedures.
We also have a Kafka Connect plugin:
● Kafka Connect Sink plugin.
Benefits
● Avoid custom “hacky” solutions
● Deployed by Neo4j Field Engineering
● Used by many customers (battle-hardened)
● Continuous development
● Quick response to issues
● Officially (enterprise) supported by Confluent and Neo4j through LARUS
Neo4j - Kafka Integration - Use Cases HOW CAN IT BE USED?
● Write/read data directly from Neo4j operations to Kafka
● Change data capture: stream graph changes into larger architectures, e.g. to feed microservices or other databases
● Exchange data/updates between distinct Neo4j installations, e.g. from analytics
● Integrate with customers’ existing Kafka architectures
● Use other Kafka connectors to offer more Neo4j integrations
● Build just-in-time data warehouses with Spark & Hadoop
Neo4j Streams: Change Data Capture Stream database changes!
Neo4j Streams: Change Data Capture
Change data “what”? In databases, Change Data Capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed, so that an action can be taken using the changed data.
Well-suited use cases?
● CDC solutions occur most often in data-warehouse environments;
● CDC allows replicating a database without much performance impact on its operation.
Neo4j Streams: Change Data Capture HOW DOES IT WORK?
Each transaction communicates its changes to our event listener:
● exposing creation, update and deletion of Nodes, Relationships and Properties;
● providing before-and-after information;
● providing schema information;
● allowing property filtering to be configured per topic.
These events are sent to Kafka asynchronously, so the commit path is not affected.
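As an illustration of the before-and-after and schema information listed above, a node-creation CDC event published by the plugin is shaped roughly like the following (field names follow the neo4j-streams documentation; all values here are made up):

```json
{
  "meta": {
    "timestamp": 1570701600000,
    "username": "neo4j",
    "txId": 3,
    "txEventId": 0,
    "txEventsCount": 1,
    "operation": "created"
  },
  "payload": {
    "id": "1004",
    "type": "node",
    "before": null,
    "after": {
      "labels": ["Person"],
      "properties": { "name": "Andrea", "surname": "Santurbano" }
    }
  },
  "schema": {
    "properties": { "name": "String", "surname": "String" },
    "constraints": []
  }
}
```

For an update event, `before` would carry the previous labels/properties instead of `null`, which is what makes downstream replication and auditing possible.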
Neo4j Streams: Sink Ingest data into Neo4j directly from the Stream!
Neo4j Streams: Sink INGEST YOUR DATA, WITH YOUR RULES
The sink provides several ways to ingest data from Kafka:
● via Cypher template;
● via a CDC event published by another Neo4j instance through the CDC module;
● via projection of a JSON/AVRO event into a Node/Relationship by providing an extraction pattern;
● via the CUD file format.
(event)-[:TO]->(graph)
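To illustrate the CUD file format mentioned in the last bullet: each message describes a single graph operation declaratively. The field names below follow the neo4j-streams documentation; the labels and values are invented for the example:

```json
{
  "op": "merge",
  "type": "node",
  "labels": ["Person"],
  "ids": { "id": 1 },
  "properties": { "name": "Andrea", "surname": "Santurbano" }
}
```

The sink translates this into the corresponding Cypher (here, a `MERGE` on a `:Person` node keyed by `id`), so producers can drive graph writes without knowing Cypher at all.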
Neo4j Streams: Sink HOW WE MANAGE BAD DATA
The Neo4j Streams Sink module provides a Dead Letter Queue mechanism that, if activated, re-routes all “bad data” to a configured topic. What do we mean by “bad data”?
● De-serialization errors, e.g. badly formatted JSON: {id: 1, "name": "Andrea", "surname": "Santurbano"}
● Transient errors while ingesting data into the DB (e.g. MERGE on null values).
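The malformed payload on the slide really does fail de-serialization: the `id` key is not quoted, which is invalid JSON. A small sketch of the decision the sink makes (the DLQ routing itself is only mimicked here with a flag):

```python
# The "id" key is unquoted, so strict JSON parsing fails; a DLQ-enabled
# sink would forward the raw message to the dead-letter topic instead
# of failing the whole pipeline.
import json

bad_payload = '{id: 1, "name": "Andrea", "surname": "Santurbano"}'
try:
    json.loads(bad_payload)
    routed_to_dlq = False
except json.JSONDecodeError:
    routed_to_dlq = True  # stand-in for: producer.send(dlq_topic, raw_bytes)

assert routed_to_dlq
```

With the DLQ disabled, the same error would instead surface as a failed record, so activating it is the difference between losing one message to a side topic and stalling ingestion.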
Neo4j Streams: Procedures Interact with Apache Kafka directly from Cypher!
Neo4j Streams: Streams Procedures CONSUME/PRODUCE DATA DIRECTLY FROM CYPHER
The Neo4j Streams project ships with two procedures:
● streams.publish: allows custom message streaming from Neo4j to the configured environment by using the underlying configured Producer;
● streams.consume: allows consuming messages from a given topic.
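A quick sketch of how the two procedures are invoked from the Cypher shell (the topic name and the `timeout` option are illustrative; check the plugin documentation for the full option map):

```cypher
// Publish an arbitrary payload to a Kafka topic from Cypher
CALL streams.publish('my-topic', 'Hello from Cypher!')
```

```cypher
// Consume messages from a topic and inspect them as Cypher values
CALL streams.consume('my-topic', {timeout: 5000}) YIELD event
RETURN event
```

Because both run inside a transaction, `streams.consume` can feed a `MERGE`/`CREATE` directly, turning an ad-hoc query into a one-off ingestion job.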
Confluent Connect Neo4j Plugin Run Neo4j Integration in your Kafka Infrastructure
Kafka Connect WHAT IS KAFKA CONNECT?
An open-source component of Apache Kafka, Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.
Neo4j Streams: Kafka Connect Sink HOW DOES IT WORK?
It works exactly the same way as the Neo4j Sink plugin, so you can provide your own ingestion setup for each topic. You can download it from the Confluent Hub!
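As an illustration, a connector instance is registered by POSTing a JSON configuration to the Kafka Connect REST API. The property names below follow the Neo4j Kafka Connect sink documentation; hostnames, credentials and the Cypher statement are placeholders:

```json
{
  "name": "Neo4jSinkConnector",
  "config": {
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "topics": "sales",
    "neo4j.server.uri": "bolt://neo4j:7687",
    "neo4j.authentication.basic.username": "neo4j",
    "neo4j.authentication.basic.password": "password",
    "neo4j.topic.cypher.sales": "MERGE (p:Person {id: event.id}) SET p += event.properties"
  }
}
```

Note the per-topic Cypher key (`neo4j.topic.cypher.<TOPIC>`): it mirrors the `streams.sink.topic.cypher.<TOPIC>` setting of the Neo4j plugin, which is why the two deployments behave identically.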
DEMO Real-time Polyglot Persistence with Elastic, Kafka and Neo4j
RT Polyglot Persistence with Elastic, Kafka & Neo4j
Neo4j Streams: Lessons Learned THE POWER OF THE STREAM!
● We have seen how to use the CDC to stream transaction events from Neo4j to other systems;
● We have seen how to use the Sink to ingest data into Neo4j by providing our own business rules;
● We have seen how to use the Streams procedures to consume/produce data directly from Cypher;
● We demonstrated how to create a simple polyglot workflow with Kafka Connect.
Read More
CODE
NEO4J STREAMS REPOSITORY: github.com/neo4j-contrib/neo4j-streams
DEMO CODE: github.com/conker84/nodes-2k19
FEEDBACK Please use the integration in your organization and share your experience
Hunger Games Questions for "Streaming Graph Data with Kafka"
1. Easy: What is the behaviour of the streams procedures?
a. They consume/produce data from/to another Neo4j instance via Bolt
b. They consume/produce data from/to Apache Kafka within Neo4j
c. They consume/produce data via Amazon Kinesis within Neo4j
2. Medium: How many ingestion ways does the Sink support?
a. 4
b. 1
c. 3
3. Hard: What kind of information is exposed via the CDC module?
Answer here: r.neo4j.com/hunger-games
THANKS! @santand84 Questions?
Neo4j Streams: Sink INGESTION VIA CYPHER TEMPLATE
Configure an import statement for each Kafka topic:
streams.sink.topic.cypher.<TOPIC>=<CYPHER_STATEMENT>
For example:
streams.sink.topic.cypher.sales= \
  MATCH (c:Customer {id: event.start.id}) \
  MATCH (p:Product {id: event.end.id}) \
  MERGE (c)-[:PLACED]->(o:Order)-[:FOR]->(p) \
  SET o += event.properties
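For comparison with the Cypher template above, the pattern-extraction strategy needs no Cypher at all: a per-topic pattern tells the sink which event fields become the node key and which become plain properties. The property key and pattern syntax below follow the neo4j-streams documentation; the topic name and fields are invented:

```properties
# Project JSON/AVRO events from the "users" topic into (:Person) nodes.
# "!userId" marks the key field used for MERGE; the other listed fields
# are copied as node properties.
streams.sink.topic.pattern.node.users=Person{!userId,name,surname}
```

This trades the full flexibility of a Cypher statement for a declarative one-liner, which is often enough when events map one-to-one onto nodes.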