Real-Time Analytics Meets Kubernetes


  1. Real-Time Analytics Meets Kubernetes. Tal Doron, Director, Technology Innovation

  2. About Me: Tal Doron, Director, Technology Innovation. @taldoron, taldoron84, tald@gigaspaces.com

  3. About GigaSpaces
      We provide one of the leading in-memory computing platforms for real-time insight to action and extreme transactional processing. With GigaSpaces, enterprises can operationalize machine learning and transactional processing to gain real-time insights on their data and act upon them in the moment.
      • 300+ direct customers
      • 50+ / 500+ Fortune / organizations
      • 5,000+ large installations in production (OEM)
      • 25+ ISVs
      • InsightEdge is an in-memory real-time analytics platform for instant insights to action: analyzing data as it's born, enriching it with historical context, for smarter, faster decisions
      • In-memory computing platform for microsecond-scale transactional processing, data scalability, and powerful event-driven workflows

  4. Why (intro pictures from Wikipedia)

  5. Dinosaurs

  6. Dinosaurs

  7. Dinosaurs

  8. We’ve looked up to the stars

  9. Not without first passing through the clouds

  10. It’s the smallest of opponents that are game changers

  11. We needed to find a way to ship man there… (The first flight of an airplane: the Wright Flyer, on December 17, 1903)

  12. How do we become cloud native?
      • Manage Large Deployments
          • Cloud-ready, ZooKeeper based for large-scale and federated deployments
      • REST API Management
          • Standards-based, utilizing
      • Containerization and Orchestration
          • Docker, Kubernetes, OpenShift etc.
      • Application-driven Deployment
          • Serverless-like user experience
      • Pluggable Elastic Resource Balancing
          • Scheduling for dynamic re-partitioning and resource allocation
      • Telemetry and Cluster Intelligence
          • Predictive maintenance / fault-tolerance over large-scale deployments

  13. Who’s using K8s?

  14. Overview
      • An overview of Kubernetes and the value it brings to automating deployment, scaling, and management of containerized applications
      • How organizations can simplify management and container deployment in cloud, hybrid, or on-premises environments with GigaSpaces InsightEdge
      • Three top open-source tools for production: Helm, Istio, and Prometheus
      • A comparison of managed Kubernetes services between cloud providers: AWS vs. Azure vs. GCP

  15. How Can You Gain the Most Value from Your Data?
      • Near real-time data is highly valuable if you act on it on time
      • Historical + near real-time data is more valuable if you have the means to combine them
      [Chart: value of data over time (real-time, seconds, minutes, hours, days, months), from time-critical decisions to traditional “batch” business intelligence; bands labeled preventive/predictive, actionable, reactive, historical]

  16. InsightEdge: Real-time Analytics for Instant Insights To Action
      • No ETL, reduced complexity
      • Built-in integration with external Hadoop/Data Lakes
      • Fast access to historical data
      • Automated life-cycle management
      [Diagram: various application data sources feed unified real-time analytics, AI and transactional processing, which drives real-time insight to action and dashboards. A distributed in-memory multi-model store spans RAM, storage-class memory, and SSD storage in the real-time layer, with S3-like storage in the batch layer; data is tiered as hot, warm, and cold. Deploy anywhere: cloud/on-premise.]

  17. Kubernetes
      • At least 54% of the Fortune 500 were hiring for Kubernetes skills in 2017
      • Around 51% growth for Kubernetes share in the market in 2018

  18. Kubernetes is the Winner • #1 discussed project on GitHub • Top 2 in number of contributors • ~400K users on Slack

  19. Business Landscape
      • The leading orchestration tool (vs. Docker Swarm, Mesos, OpenShift and Cloud Foundry) and the most used CNCF project
      • The major cloud vendors all offer a managed Kubernetes service (EKS, AKS and GKE)
      • Apache Spark 2.3 has native Kubernetes support

  20. Why Kubernetes? Key building blocks for a “cloud like” platform as a service
      • Auto deployment of data services, functions and frameworks (Spark ML, SQL, Zeppelin, etc.)
      • Orchestration automation with cloud native solutions (auto scale, self healing)
      [Diagram labels: Desired State, Scheduler, HA Architecture, Cooperative Multi-Tenancy, Service Account Authentication, RBAC Authorization]

  21. Kubernetes: Management POD
      • Lookup Service (LUS): provides a mechanism for services to discover each other. For example, querying the LUS to find active GSCs.
      • Apache ZooKeeper: a centralized service used for space leader election
      • REST Manager: RESTful API for managing the environment remotely from any platform
      [Diagram: a node hosting a Management POD that contains the Lookup Service, Apache ZooKeeper, REST Manager, and GSA]

  22. Kubernetes: Data POD
      • Data Grid Instance: the fundamental unit of deployment in the data grid. A Processing Unit instance is the actual runtime entity.
      • Each Data POD contains a single instance to provide cloud native support using Kubernetes built-in controllers (auto scale, self healing)
      [Diagram: a node hosting Data PODs, each containing one Data Grid Instance (#1 … #N) and a GSA]

  23. Kubernetes: Spark POD
      • Driver Pod: the Spark driver runs within a POD. The driver creates executors, connects to them, and executes the application code.
      • Executor Pod: when the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state
      [Diagram: a client runs spark-submit against the Driver POD (Spark driver, GSA), which launches four Executor PODs (Spark executors) across Node A and Node B]

  24. XAP High Level Overview
      [Diagram: a 3,1 topology. Clients connect via REST and SELECT; three Management PODs (#1 to #3) and six Data PODs (primaries A, B, C and backups A’, B’, C’) are spread across Nodes 1 to 3]

  25. InsightEdge High Level Overview
      [Diagram: a 3,1 topology. Clients connect via spark-submit and SELECT; three Management PODs (#1 to #3), a Zeppelin POD, a Spark Driver POD, three Spark Executor PODs, and six Data PODs (primaries A, B, C and backups A’, B’, C’) are spread across Nodes 1 to 3]

  26. Kubernetes Dashboard View

  27. “Under the Hood” Guidelines
      • Apply POD anti-affinity using label selectors for both Data and Management PODs
          • For example: spread the primary and backup data pods from this service across zones
      • Each POD has a persistent identifier that is maintained across any rescheduling using StatefulSets
          • For example: automated rolling updates, or scaling up data pods one by one
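
      The two guidelines above can be sketched together in one manifest. This is a minimal illustration, not the manifest the GigaSpaces charts generate: the name `demo-data`, the `app: demo-data` label, and the image are hypothetical placeholders, and the zone topology key assumes nodes carry the standard Kubernetes zone label.

      ```yaml
      # Sketch only: names, labels, and image are placeholders.
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: demo-data
      spec:
        serviceName: demo-data          # headless Service giving each pod a stable identity
        replicas: 2                     # e.g. one primary and one backup
        selector:
          matchLabels:
            app: demo-data
        updateStrategy:
          type: RollingUpdate           # pods are replaced one at a time
        template:
          metadata:
            labels:
              app: demo-data
          spec:
            affinity:
              podAntiAffinity:
                # Never schedule two pods with this label into the same zone
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchLabels:
                        app: demo-data
                    topologyKey: topology.kubernetes.io/zone
            containers:
              - name: data-grid
                image: example.com/data-grid:latest   # placeholder image
      ```

      With a StatefulSet the pods are named `demo-data-0`, `demo-data-1`, and so on, and keep those names when rescheduled. Using `preferredDuringSchedulingIgnoredDuringExecution` instead would make the zone spread a soft preference, which may suit clusters with fewer zones than replicas.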

  28. Installation
      • Helm: the package manager for Kubernetes
      • Helm charts help you define, install and upgrade both XAP and InsightEdge
      # helm install gigaspaces/insightedge --version=14.0 --name demo

  29. Installation: Define Capacity
      • The following Helm command deploys a cluster with 3 partitions, with 512MiB allocated for each partition:
      # helm install gigaspaces/insightedge --version=14.0 --name demo --set pu.partitions=3,pu.resources.limits.memory=512Mi

  30. Installation: Define High Availability
      • The following Helm command deploys a cluster in a high availability topology, with anti-affinity enabled:
      # helm install gigaspaces/insightedge --version=14.0 --name demo --set pu.ha=true,pu.antiAffinity.enabled=true
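
      The `--set` flags from the capacity and high-availability slides can also be kept in a values file, which is easier to review and version. A minimal sketch, assuming the chart reads exactly the keys shown in those `--set` flags; the file name `values.yaml` is arbitrary.

      ```yaml
      # values.yaml: the same keys as the --set flags on the installation slides
      pu:
        partitions: 3             # --set pu.partitions=3
        ha: true                  # --set pu.ha=true
        antiAffinity:
          enabled: true           # --set pu.antiAffinity.enabled=true
        resources:
          limits:
            memory: 512Mi         # --set pu.resources.limits.memory=512Mi
      ```

      It would then be passed with Helm's `-f` flag, for example `helm install gigaspaces/insightedge --version=14.0 --name demo -f values.yaml`.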

  31. Testing for Liveness
      • Use liveness probes to notify Kubernetes that your application’s processes are unhealthy and that it should restart them
      • The probe calls a bash script
      livenessProbe:
        exec:
          command:
          - sh
          - -c
          - "data-pod-liveness 3181"
        initialDelaySeconds: 15
        timeoutSeconds: 5

  32. Testing for Readiness
      • Use readiness probes to notify Kubernetes that your application’s processes are able to process input. For example, while data is loading, the pod is not yet ready.
      • The probe calls a bash script
      readinessProbe:
        exec:
          command:
          - sh
          - -c
          - "data-pod-ready 2251"
        initialDelaySeconds: 15
        timeoutSeconds: 5
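
      Both probes are declared on the container they check. The sketch below shows how the two stanzas from the slides might sit side by side in a pod spec; the pod name, container name, image, and `periodSeconds` values are assumptions added for illustration, while the commands, delays, and timeouts are the ones on the slides.

      ```yaml
      # Sketch only: pod name, container name, and image are placeholders.
      apiVersion: v1
      kind: Pod
      metadata:
        name: demo-data-0
      spec:
        containers:
          - name: data-grid
            image: example.com/data-grid:latest
            livenessProbe:              # on failure, the container is restarted
              exec:
                command: ["sh", "-c", "data-pod-liveness 3181"]
              initialDelaySeconds: 15
              timeoutSeconds: 5
              periodSeconds: 10         # Kubernetes default, written out explicitly
            readinessProbe:             # on failure, the pod stops receiving Service traffic
              exec:
                command: ["sh", "-c", "data-pod-ready 2251"]
              initialDelaySeconds: 15
              timeoutSeconds: 5
              periodSeconds: 10
      ```

      The practical difference is in the consequence of a failure: a failing liveness probe restarts the container, whereas a failing readiness probe only marks the pod as not ready until the check passes again.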

  33. WAN Gateway: Real-time IMDG Data Replication
      [Diagram: WAN gateways replicating data between sites; labels: Any Cloud, Lang, API]

  34. WAN Gateway POD
      [Diagram: Cluster A and Cluster B, each with a web UI, three Management PODs, and Data PODs (primaries A to D and backups A’ to D’) spread across Nodes 1 to 3. Each cluster has a WAN Gateway POD with a public IP; the gateways act as delegator and sink.]
