Redis for Fast Data Ingest


  1. Redis for Fast Data Ingest
     Home of Redis

  2. Agenda
     • Fast Data Ingest and its challenges
     • Redis for Fast Data Ingest
       • Pub/Sub
       • List
       • Sorted Sets as a Time Series Database
     • The Demo
     • Scaling with Redisᵉ Flash

  3. Fast Data Ingest Scenarios

  4. IoT

  5. Network Traffic Inspection

  6. Social Media Analysis

  7. More Scenarios
     • User Activity Tracking
     • Log Collection
     • Multi-player Gaming
     • Fintech
     • And more…

  8. Fast Data Ingest Challenges
     • Keeping up with the pace of data arrival
     • Data from multiple sources with no standard data format
     • Filter, analyze, and transform data in real time
     • Managing data arriving from geographically distributed sources

  9. Requirements for Fast Data Ingest
     • Physical infrastructure: network, computational resources, etc.
     • Software stack to filter, aggregate, transform, and distribute data in real time with sub-millisecond latency

  10. Fast Data Ingest with Redis

  11. About Redis
      • Open source. The leading in-memory database platform, supporting any high performance operational, analytics or hybrid use case.
      • The open source home and commercial provider of Redis Enterprise (Redisᵉ) technology, platform, products & services.

  12. Redis for Fast Data Ingest

  13. Redis for Fast Data Ingest
      • Redis Pub/Sub: Publisher -> Channel -> Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      • Redis Data Structures: Strings, Lists, Sets, Sorted Sets, Hashes, Bitmaps, Bit field, HyperLogLog, Geospatial Indexes

  14. Common Ingest Techniques in Redis

  15. Pub/Sub
      • Diagram: Publisher -> Channel -> Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      • Commands
        • Publisher: publish <channel name> <message>
        • Subscriber: subscribe <channel name>
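
The publish and subscribe commands on the Pub/Sub slide can be sketched in Python roughly as follows. This is a minimal illustration, not the demo's code: the function names and the JSON encoding are assumptions, and `client` is assumed to be any object exposing redis-py style `publish()` and `pubsub()` methods.

```python
import json


def publish_message(client, channel, message):
    """PUBLISH one JSON-encoded message; returns the receiver count reported by the client."""
    return client.publish(channel, json.dumps(message))


def consume(client, channel, handler, max_messages=None):
    """SUBSCRIBE to a channel and pass each decoded message to handler.

    Stops after max_messages (hypothetical parameter, useful for demos) or
    when the listen() stream ends. Returns the number of messages handled.
    """
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    handled = 0
    for item in pubsub.listen():
        if item["type"] != "message":
            continue  # skip subscribe/unsubscribe confirmations
        handler(json.loads(item["data"]))
        handled += 1
        if max_messages is not None and handled >= max_messages:
            break
    pubsub.unsubscribe(channel)
    return handled
```

With redis-py installed and a local server running, `client` would typically be `redis.Redis(decode_responses=True)`. Messages published while no subscriber is connected are not delivered later, which matches the "not resilient to connection loss" drawback listed for this technique.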

  16. List
      • Diagram: Publisher -> Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      • Commands
        • Publisher: lpush <list name> <message>
        • Subscriber: brpop <list name> <timeout>
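
The lpush/brpop pairing on the List slide behaves like a first-in, first-out queue: the producer pushes on the left and the consumer pops from the right. A minimal Python sketch, assuming a redis-py style `client` (the function names and JSON encoding are illustrative, not taken from the demo):

```python
import json


def enqueue(client, list_name, message):
    """LPUSH one JSON-encoded message; returns the list length after the push."""
    return client.lpush(list_name, json.dumps(message))


def drain(client, list_name, handler, timeout=1):
    """BRPOP messages oldest-first until the list stays empty for `timeout` seconds.

    Returns the number of messages handled.
    """
    handled = 0
    while True:
        popped = client.brpop(list_name, timeout=timeout)
        if popped is None:  # timed out: nothing left to read
            return handled
        _key, raw = popped  # redis-py returns a (key, value) pair
        handler(json.loads(raw))
        handled += 1
```

Because the messages wait in the list until a consumer pops them, a consumer that reconnects after an outage can pick up where it left off. Each popped message goes to exactly one consumer, so several independent consumers would each need their own list.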

  17. Sorted Set
      • Diagram: Publisher -> Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      • Commands
        • Publisher: zadd <timeseries name> <timestamp> <message>
        • Subscriber: zrangebyscore <timeseries name> <last timestamp> <current timestamp> WITHSCORES
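
The zadd/zrangebyscore pattern on the Sorted Set slide amounts to a polling loop in which each subscriber remembers the last timestamp it has read. A minimal Python sketch, assuming a redis-py style `client`; the function names are hypothetical, and the exclusive lower bound (the `(` prefix) is an addition of this sketch so that the last message is not read twice:

```python
import json
import time


def record(client, series, message, timestamp=None):
    """ZADD the JSON-encoded message with its timestamp as the score."""
    ts = time.time() if timestamp is None else timestamp
    return client.zadd(series, {json.dumps(message): ts})


def read_since(client, series, last_ts, now=None):
    """ZRANGEBYSCORE over (last_ts, now], returning messages and the new cursor.

    Returns (messages, new_last_ts), where messages is a list of
    (decoded_message, timestamp) pairs in timestamp order.
    """
    now = time.time() if now is None else now
    rows = client.zrangebyscore(series, f"({last_ts}", now, withscores=True)
    messages = [(json.loads(member), score) for member, score in rows]
    new_last = messages[-1][1] if messages else last_ts
    return messages, new_last
```

Each subscriber keeps its own `last_ts`, so producers and consumers stay loosely coupled and older data remains available for replay. Sorted set members are unique, so two byte-identical messages would collapse into one entry; including an id in the message avoids that.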

  18. The Demo

  19. Demo: Problem Description
      • All Tweets -> English Tweets -> Popular hashtags among English tweets
        • Filter tweets "lang":"en"
        • Match pattern "#(\\w+)"
        • Increment count for that pattern
      • All Tweets -> Influencer Tweets -> Influencer Catalog
        • Filter followers_count > 10000
        • Map influencer id to profile
        • Sorted Set: follower count -> id
      • Sample tweet message in JSON format:

        {
          "created_at":"Tue Jul 11 17:06:03 +0000 2017",
          "id":884821096440004600,
          "text":"USGS reports a M2 #earthquake 31km WSW of Enterprise, Utah on 7/11/17 @ 17:01:53 UTC https://t.co/xXQH2Mfy93 #quake",
          "user":{
            "id":1414684496,
            "name":"Every Earthquake",
            "screen_name":"everyEarthquake",
            "location":"Earth",
            "followers_count":18978,
            "friends_count":17,
            "lang":"en"
          }
        }
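
The filtering rules in the problem description can be sketched in plain Python. This is an illustration under assumptions, not the demo's implementation: the function names are hypothetical, the language is read from the top-level "lang" field with a fallback to the "user" object (as in the sample message), and hashtag counts are kept in an in-memory Counter rather than in Redis.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # same pattern as on the slide
INFLUENCER_THRESHOLD = 10000     # followers_count > 10000


def is_english(tweet):
    """True when the tweet (or, failing that, its user) is tagged "lang":"en"."""
    lang = tweet.get("lang") or tweet.get("user", {}).get("lang")
    return lang == "en"


def is_influencer(tweet, threshold=INFLUENCER_THRESHOLD):
    """True when the author's followers_count exceeds the threshold."""
    return tweet.get("user", {}).get("followers_count", 0) > threshold


def count_hashtags(tweets):
    """Increment a count for every hashtag found in English tweets."""
    counts = Counter()
    for tweet in tweets:
        if is_english(tweet):
            counts.update(HASHTAG.findall(tweet.get("text", "")))
    return counts


def influencer_entry(tweet):
    """(user id, follower count) pair, as would feed the follower-count sorted set."""
    user = tweet["user"]
    return user["id"], user["followers_count"]
```

Applied to the sample message, both filters pass (the user has 18978 followers and "lang":"en"), and the hashtags counted are "earthquake" and "quake".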

  20. Demo Setup
      • Service provider for messages
      • Programming language for the demo
      • IDE
      • Redis container on Docker

  21. The Three Data Ingest Techniques
      • Pub/Sub
        • Pros: Easy; Decoupled setup; Good for geographically distributed setup
        • Cons: Not resilient to connection loss; Requires many connections
      • Lists
        • Pros: Easy; Resilient to connection loss
        • Cons: Tightly coupled producers and consumers; Data duplication
      • Sorted Sets
        • Pros: Resilient to connection loss; Least chance of losing data; Access to historical data; Loosely coupled producers and consumers
        • Cons: Consumes space; Complex logic

  22. Technique 1: Fast Data Ingest with Pub/Sub

  23. Fast Data Ingest with Pub/Sub
      • Diagram components: Ingest, AllTweets (PubSub), EnglishTweetsFilter, English Tweets, HashTagCollector, InfluencerTweetsFilter, Influencer Tweets, InfluencerCollector
      • Advantages
        • Easy
        • Decoupled setup
        • Good for geographically distributed setup

  24. Class Diagrams and Sample Code
      https://github.com/redislabsdemo/IngestPubSub

  25. Technique 2: Fast Data Ingest with Lists

  26. Fast Data Ingest with Lists
      • Diagram components: Ingest Stream, alldata, AllTweets Listener, EnglishTweetsFilter, InfluencerFilter, englishtweets, EnglishTweets Listener, HashTagFilter
      • Advantages
        • Easy
        • Resilient to connection loss

  27. Class Diagrams and Sample Code
      https://github.com/redislabsdemo/IngestList

  28. Technique 3: Fast Data Ingest with Sorted Sets

  29. Fast Data Ingest with Sorted Sets
      • Diagram components: Ingest Stream, alltweets, EnglishTweetsFilter, InfluencerFilter, englishtweets, HashTagFilter
      • Advantages
        • Resilient to connection loss
        • Least chance of losing data
        • Access to historical data
        • Loosely coupled producers and consumers

  30. Class Diagrams and Sample Code
      https://github.com/redislabsdemo/IngestSortedSet

  31. Redisᵉ for Fast Data Ingest

  32. Redisᵉ Technology
      • Redis Database Instances

  33. Redisᵉ Technology
      • Enterprise Layer: Cluster Manager, Zero latency proxy, REST API
      • Open Source Layer

  34. Redisᵉ Technology: Redisᵉ Node
      • Enterprise Layer: Cluster Manager, Zero latency proxy, REST API
      • Open Source Layer

  35. Redisᵉ Technology: Redisᵉ Cluster
      • Shared nothing cluster architecture
      • Fully compatible with open source commands & data structures

  36. Redisᵉ - Shared Nothing Symmetric Architecture
      • Diagram components: Distributed Proxies; Single or Multiple Endpoints; Proxies; Cluster Management Path; Node Watchdog; Cluster Watchdog; Data Path; Redis Shards; Node 1, Node 2, … Node N (odd number)
      • Unique multi-tenant "Docker"-like architecture enables running hundreds of databases over a single, average cloud instance without performance degradation and with maximum security provisions

  37. Redisᵉ Benefits for Data Ingest
      • Always On Availability: Instant Failure Recovery, No Data Loss
      • Effortless Scaling: Simple, Seamless Clustering. Linear scaling
      • Substantially Lower Costs: Run on Flash as a RAM extension
      • Stable and Predictable High Performance
      • ACID Compliance in Cluster Architecture
      • Top notch 24x7 expert support

  38. Redisᵉ Flash
      • Near-RAM performance at 70%+ lower costs
      • Technology treats Flash as a RAM replacement (or extension)
      • RAM/Flash ratio can be easily configured
      • Pluggable storage engine
      • Available on SATA-based SSD, NVMe-based SSD, NVDIMM like 3D XPoint/SCM on x86 and P8 platforms
      • Diagram: 2048 GB in RAM compared with 204 GB RAM (10%, keys & hot values) plus 1844 GB Flash (90%, cold values)

  39. Redisᵉ Flash - 10TB Redis Deployment on EC2

                                         Redis on RAM    Redisᵉ Flash
      Dataset size                       10 TB           10 TB
      Database size with replication     30 TB           20 TB *
      AWS instance type                  x1.32xlarge     i3.16xlarge
      Actual instance size               1.46 TB         3.66 TB
        (RAM, and RAM+Flash)
      # of instances needed              21              6 + 1 (for quorum)
      Persistent Storage (EBS)           154 TB          110 TB
      1 year cost (reserved instances)   $1,595,643      $298,896
      Savings                            -               81.27%

      * Redis Enterprise only needs 1 copy of the data because quorum issues are solved at the node level
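
The instance counts and the savings figure on the EC2 slide can be checked with a little arithmetic. The sketch below assumes that the x1.32xlarge figures (30 TB replicated, 1.46 TB per instance, $1,595,643) describe the RAM-only deployment and the i3.16xlarge figures (20 TB, 3.66 TB, $298,896) describe the Flash deployment; the helper names are hypothetical.

```python
import math


def instances_needed(database_tb, usable_tb_per_instance):
    """Smallest whole number of instances that can hold the replicated database."""
    return math.ceil(database_tb / usable_tb_per_instance)


def savings_percent(baseline_cost, new_cost):
    """Cost reduction relative to the baseline, as a percentage rounded to 2 places."""
    return round((baseline_cost - new_cost) / baseline_cost * 100, 2)


ram_instances = instances_needed(30, 1.46)    # 30 / 1.46 is about 20.5, so 21
flash_instances = instances_needed(20, 3.66)  # 20 / 3.66 is about 5.5, so 6
savings = savings_percent(1_595_643, 298_896)  # 81.27
```

The Flash deployment then adds one further node for quorum, giving the "6 + 1" shown on the slide.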

  40. Questions?

  41. One more thing…
      redis.conf setting: client-output-buffer-limit pubsub 32mb 8mb 60
      With this setting, Redis will force the clients to disconnect in two situations:
      • If the output buffer grows beyond 32mb
      • If the output buffer holds 8mb of data consistently for 60 seconds
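
The two disconnect conditions for `client-output-buffer-limit pubsub 32mb 8mb 60` can be modeled with a small function. This is a simplified illustration of the rule as the slide states it, not Redis source code; the function name, the sample format, and the exact boundary handling are assumptions.

```python
MB = 1024 * 1024


def should_disconnect(samples, hard=32 * MB, soft=8 * MB, soft_seconds=60):
    """Apply the hard/soft limit rule to (timestamp_seconds, buffer_bytes) samples.

    Samples must be in time order. Returns True as soon as either
    condition from the slide is met.
    """
    over_soft_since = None
    for ts, size in samples:
        if size > hard:
            return True  # hard limit: disconnect immediately
        if size > soft:
            if over_soft_since is None:
                over_soft_since = ts  # start of the over-soft-limit period
            elif ts - over_soft_since >= soft_seconds:
                return True  # stayed over the soft limit long enough
        else:
            over_soft_since = None  # buffer drained, reset the timer
    return False
```

For example, a slow subscriber whose buffer sits at 10 MB for a full minute would be dropped, while one that spikes to 10 MB and drains again within a few seconds would not.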

  42. Thank You
      Roshan Kumar, Redis Labs
      roshan@redislabs.com, expert@redislabs.com
      @roshankumar, @redislabs
