Operating Multi-Tenant Kafka Services for Developers


  1. Operating Multi-Tenant Kafka Services for Developers Data Council SF 2019 Ali Hamidi - Heroku Data

  2. Agenda • Intro • Motivation • Single Tenant Dedicated • Multi-tenancy • Configuration & Tuning • Testing • Automation • Limitations

  3. Intro • I am… Ali Hamidi, an engineer on the Heroku Data team at Salesforce. • Heroku is… a cloud platform that lets companies build, deliver, monitor and scale apps. • Heroku Data is… the team that provides secure, scalable data services on the Heroku Platform.

  4. Apache Kafka • Distributed Streaming Platform

  5. Apache Kafka • Distributed Streaming Platform • Publish/Subscribe (=> Produce/Consume)

  6. Apache Kafka • Distributed Streaming Platform • Publish/Subscribe (=> Produce/Consume) • Durable message store (commit log)

  7. Apache Kafka • Distributed Streaming Platform • Publish/Subscribe (=> Produce/Consume) • Durable message store (commit log) • Highly available

  8. Apache Kafka on Heroku • Fully Managed Service

  9. Apache Kafka on Heroku • Fully Managed Service • Opinionated

  10. Apache Kafka on Heroku • Fully Managed Service • Opinionated • Configured for best practices for most users*

  11. Use Cases • Decompose a monolithic app

  12. Use Cases • Decompose a monolithic app • Process high-volume, real-time data streams

  13. Use Cases • Decompose a monolithic app • Process high-volume, real-time data streams • Power a real-time, event-driven architecture

  14. SHIFT Commerce • Decompose a monolithic app

  15. Quoine • QUOINE is a leading global fintech company that provides trading, exchange, and next-generation financial services powered by blockchain technology • Consume real-time cryptocurrency pricing data from individual markets and exchanges

  16. Caesars Entertainment • Ingest, aggregate, and process customer data in real time to provide the best customer experience • Real-time, event-driven architecture

  17. The Motivation

  18. Why Multi-tenant Kafka? • More accessible • Additional use cases (Development, Testing, Low-volume production)

  20. Single Tenant Dedicated

  23. Multi-tenancy

  25. Multi-tenancy • Resource isolation (Security, Performance, Safety) • Parity (Feature, Behaviour) • Compatibility • Costs (Resources, Operational)

  27. Security

  28. A tenant should not be able to access another tenant’s data

  31. Security • Access Control Lists (ACLs) • Namespacing

  32. Security • Access Control Lists (ACLs): user A can carry out action B on resource C • Namespacing

  33. Security • Access Control Lists (ACLs): user A can carry out action B on resource C • Namespacing: wabash-58779.events
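The ACL and namespacing bullets above can be illustrated with a small sketch. It is not Heroku's implementation: the `Acl` struct, the `namespaced` and `allowed?` helpers, and the second tenant name `other-12345` are hypothetical, and the only detail taken from the slide is the `wabash-58779.events` naming pattern. The sketch assumes each tenant gets a name prefix and that an ACL entry grants one principal one operation on topics under one prefix.

```ruby
# Hypothetical model of "user A can carry out action B on resource C",
# where the resource is any topic whose name starts with a tenant prefix.
Acl = Struct.new(:principal, :operation, :prefix)

# Build the namespaced topic name, e.g. "wabash-58779.events".
def namespaced(tenant, topic)
  "#{tenant}.#{topic}"
end

# True when some ACL entry grants this principal this operation on the topic.
def allowed?(acls, principal, operation, topic)
  acls.any? do |acl|
    acl.principal == principal &&
      acl.operation == operation &&
      topic.start_with?(acl.prefix)
  end
end

acls = [
  Acl.new("wabash-58779", :read,  "wabash-58779."),
  Acl.new("wabash-58779", :write, "wabash-58779.")
]

topic = namespaced("wabash-58779", "events")
puts topic                                                     # wabash-58779.events
puts allowed?(acls, "wabash-58779", :write, topic)             # true
puts allowed?(acls, "wabash-58779", :read, "other-12345.events") # false
```

Keeping the trailing dot in the prefix means a tenant named `wabash-5877` could not match topics that belong to `wabash-58779`.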

  34. Performance

  35. A tenant should not adversely affect another tenant’s performance

  36. Performance • Quotas (Produce, Consume)

  37. Safety

  38. A tenant should not jeopardise the stability of the cluster

  39. Safety • Limits (Topics, Partitions, Consumer Groups, Storage, Throughput)

  40. Capacity = Message Throughput * Retention * Replication
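As a worked instance of the capacity formula above, here is a minimal sketch. The helper name and the input figures (64 KiB/s, 24 hours, replication factor 3) are illustrative assumptions, not numbers from the talk.

```ruby
# Capacity = Message Throughput * Retention * Replication
# All arguments are hypothetical example values; units are bytes and seconds.
def required_capacity_bytes(throughput_bytes_per_sec:, retention_secs:, replication_factor:)
  throughput_bytes_per_sec * retention_secs * replication_factor
end

bytes = required_capacity_bytes(
  throughput_bytes_per_sec: 64 * 1024,   # 64 KiB/s sustained produce rate
  retention_secs: 24 * 60 * 60,          # keep messages for 24 hours
  replication_factor: 3                  # every message stored three times
)

puts bytes                                    # 16986931200
puts format("%.1f GiB", bytes / (1024.0**3))  # 15.8 GiB
```

Even a modest sustained rate adds up once retention and replication are multiplied in, which is why storage capacity appears among the per-tenant limits.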

  41. Safety • Limits (Topics, Partitions, Consumer Groups, Storage Capacity, Throughput)

  42. Safety • Limits (Topics, Partitions, Consumer Groups, Storage Capacity, Throughput) • Monitoring

  43. Safety • Limits (Topics, Partitions, Consumer Groups, Storage Capacity, Throughput) • Monitoring • Limit enforcement!

  44. Multi-tenancy • Resource isolation (Security, Performance, Safety) • Parity (Feature, Behaviour) • Compatibility • Costs (Resources, Operational)

  45. Parity

  46. For the service to be useful, it needs to behave like a normal cluster

  47. Parity • Access to a standard cluster

  48. Parity • Access to a standard cluster • ...but with some limitations

  49. Multi-tenancy • Resource isolation (Security, Performance, Safety) • Parity (Feature, Behaviour) • Compatibility • Costs (Resources, Operational)

  50. Compatibility

  51. The service needs to support standard clients. No vendor lock-in.

  52. Compatibility • Open Source Apache Kafka • Not a fork • No custom code required • Use standard clients

  53. Multi-tenancy • Resource isolation (Security, Performance, Safety) • Parity (Feature, Behaviour) • Compatibility • Costs (Resources, Operational)

  54. Costs

  55. The service needs to be financially feasible

  56. Resource Costs • Packing Density • Utilization

  57. Resource Costs • Cluster size? • No over-provisioning • Seamless upgrading • Can’t move tenants (can’t migrate message offsets)

  58. Operational Costs • Minimal operational burden • Minimize impact/blast radius

  59. Operational Costs • Safe defaults • Clusters similar to our dedicated ones • Automation (kind of our thing) • Testing (lots)

  60. Configuration & Tuning

  61. Configuration & Tuning • Partitions • Quotas • Topics & Consumer Groups • Guard Rails

  62. Partitions • Lots of partitions: 48,000 • Max file descriptors: 500,000 • Max mmap count: 500,000

  63. Quotas • Per Broker! • Counterintuitive enforcement
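Because Kafka applies client quotas on each broker independently, a tenant's cluster-wide ceiling depends on how many brokers host its partition leaders. The sketch below shows that arithmetic only. The helper names, the 3 MiB/s target, and the three-broker cluster are hypothetical, and the slide does not say how Heroku derives its quota values.

```ruby
# Hypothetical helper: split a tenant-wide byte-rate target evenly across brokers,
# since each broker enforces its own quota without knowledge of the others.
def per_broker_quota(target_bytes_per_sec, broker_count)
  raise ArgumentError, "broker_count must be positive" unless broker_count.positive?
  target_bytes_per_sec / broker_count
end

# Aggregate rate a tenant can actually reach when its partition leaders
# live on only `brokers_with_leaders` brokers.
def effective_ceiling(per_broker_bytes_per_sec, brokers_with_leaders)
  per_broker_bytes_per_sec * brokers_with_leaders
end

quota = per_broker_quota(3 * 1024 * 1024, 3)  # 1 MiB/s on each of 3 brokers

puts effective_ceiling(quota, 3)  # 3145728: leaders spread over all brokers
puts effective_ceiling(quota, 1)  # 1048576: all leaders on a single broker
```

A tenant with a single-partition topic therefore sees only one broker's share of the nominal limit, which may be part of what makes the enforcement feel counterintuitive.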

  64. Topics & Consumer Groups • Explicit Topic creation • Explicit Consumer Group creation

  65. Guard Rails • Limit potential bad usage

  66. Guard Rails • Limit potential bad usage • “Customers don’t make mistakes, we make bad tools”

  67. # Heroku Data Control Plane
      min_retention_time = 24.hours

  68. # Heroku Data Control Plane
      min_retention_time = 24.hours
      max_retention_time = 7.days

  69. # Heroku Data Control Plane
      min_retention_time = 24.hours
      max_retention_time = 7.days
      default_replication_factor = 3
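A guard rail of this kind could be enforced by a check like the following sketch. Only the three values (24 hours, 7 days, replication factor 3) come from the slides. The `LIMITS` hash, the `validate_topic_request` function, and the choice to reject out-of-range values rather than clamp them are assumptions for illustration. Plain integer seconds stand in for the `24.hours` and `7.days` helpers so the block runs without ActiveSupport.

```ruby
HOUR = 60 * 60
DAY  = 24 * HOUR

# Values from the slides; the hash itself is a hypothetical container.
LIMITS = {
  min_retention_secs: 24 * HOUR,
  max_retention_secs: 7 * DAY,
  default_replication_factor: 3
}.freeze

# Returns the normalized request plus a list of guard-rail violations.
# An empty :errors list means the request is within limits.
def validate_topic_request(retention_secs:, replication_factor: LIMITS[:default_replication_factor])
  errors = []
  errors << "retention below minimum" if retention_secs < LIMITS[:min_retention_secs]
  errors << "retention above maximum" if retention_secs > LIMITS[:max_retention_secs]
  { retention_secs: retention_secs, replication_factor: replication_factor, errors: errors }
end

p validate_topic_request(retention_secs: 3 * DAY)[:errors]   # []
p validate_topic_request(retention_secs: HOUR)[:errors]      # ["retention below minimum"]
p validate_topic_request(retention_secs: 30 * DAY)[:errors]  # ["retention above maximum"]
```

Returning explicit errors instead of silently adjusting the value keeps the tenant informed about which limit was hit.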
