

  1. THE UNBUNDLED DATABASE Leveraging the unbundled database via distributed logs and stream processing

  2. Who Am I? Data infrastructure at Pluralsight. Software and data engineering at Rackspace Hosting. Software engineering at WDPRO.

  3. Pluralsight: a technology learning platform.

  4. Table of Contents: Microservices overview (page 4), Event Driven Services (page 17), The Distributed Log (page 22), Kafka Log Semantics (page 30), The Unbundled Database (page 39), Stream Processing (page 52).

  5. Microservices: background, challenges, the data dichotomy, streams.

  6. Why? SCALABILITY.

  7. Independence comes at a cost: complexity and boundaries. Independence is a double-edged sword.

  8. Services depend on each other. Most business services share the same notions of core facts. Over time, services may become unable to retain the same clear separation of concerns. Services are inherently part of a bigger, interconnected ecosystem, and this makes their futures inevitably connected.

  9. Data on the “inside” vs. data on the “outside”. Data on the inside: encapsulated, private data contained within a service. Data on the outside: information that flows between independent services.

  10. How do services share data? Three well-known approaches: service interfaces, messaging, and shared databases.

  11. Service Interfaces. Data and functionality are encapsulated in the service. The goal is to clearly separate concerns between services and define distinct bounded contexts. Synchronized changes are hard!

  12. Messaging Middleware. Data and functionality are scattered across the organization. Messaging architectures can scale well, but even though they can move massive amounts of data, they don’t provide any historical context, which can lead to data divergence over time.

  13. Shared Databases. Functionality is encapsulated within the service; data is not. Shared databases concentrate too much data in a single place, and for microservices they create an unusually strong coupling. This is due to the broad interface that databases expose to the outside world.

  14. The Data Dichotomy. Data systems are about exposing data; services are about hiding it. Service interfaces minimize the data they expose to the outside, while database interfaces tend to amplify the data they hold.

  15. Data Diverges Over Time. Different services make different interpretations of the data they consume, which leads to divergent information. Services also keep that data around; it is altered and fixed locally, and soon it no longer represents the original dataset.

  16. Looking ahead: sharing data with distributed logs. Events are broadcast to a log. [Diagram: a user service writes to a commit log, and the shopping cart, catalog, returns, and fulfillment services replicate from it.]

  17. Event Driven Services

  18. Ways services interact: commands, queries, and events. Commands are actions: side-effect-generating requests indicating some operation to be performed by another service; commands expect a response. Queries are requests to look up some data point; they are side-effect free and leave the state of the system unchanged. Events can be thought of as both a fact and a trigger: they express something that has happened, usually in the form of a notification.
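The three interaction styles above can be sketched as distinct message shapes. This is an illustrative sketch, not a standard API; all class and field names here are assumptions chosen for the example.

```python
# Illustrative message shapes for the three interaction styles.
# All names here are hypothetical, invented for this sketch.
from dataclasses import dataclass, field

@dataclass
class CreateOrderCommand:
    """A command: generates side effects and expects a response."""
    customer_id: str
    items: list = field(default_factory=list)

@dataclass
class GetOrderQuery:
    """A query: looks up a data point, leaves system state unchanged."""
    order_id: str

@dataclass
class OrderCreatedEvent:
    """An event: both a fact and a trigger; it already happened."""
    order_id: str
    customer_id: str

# An event is broadcast as a notification; nobody "replies" to it.
evt = OrderCreatedEvent(order_id="o-1", customer_id="c-9")
print(evt)  # OrderCreatedEvent(order_id='o-1', customer_id='c-9')
```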

  19. Event Driven Services: “I broadcast what I did!” Services broadcast events to a centralized, immutable stream of facts. This paradigm is a departure from request-driven services, where flow resides in commands and queries; downstream consumers are free to react, adapt, and change. Event driven services also offer other interesting gains, such as exactly-once processing.

  20. Advantages. State transfer: events are both triggers and facts that can be used to notify and propagate entity state transfers. Decoupling: data producers and consumers are completely decoupled; there is no API binding them together and no synchronized changes to be performed. Locality: queries and lookups are local to the bounded context and can be optimized in the way that best fits the current use case.

  21. The Single Writer Principle. A single service owns all events for a single type. Having a single code path helps with data quality, consistency, and other data-sharing concerns. This matters because these events represent durable, shared facts. [Diagram: the owning service writes to an event stream/log, which replicates to Hadoop via ETL and to materialized views/caches.]

  22. The Distributed Log: the log, Kafka, topics and partitions.

  23. What’s a log? An ordered, immutable sequence of records that is continuously appended to; new messages are added at the head. Reads and writes are sequential operations. They are therefore sympathetic to the underlying media, leveraging pre-fetch and the various layers of caching, and naturally batching similar operations together.
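The structure described above can be sketched in a few lines. This is an in-memory illustration of the abstract data structure, not Kafka itself; the class and method names are invented for the example.

```python
# Illustrative sketch of an append-only log (names are hypothetical).
class AppendOnlyLog:
    def __init__(self):
        self._records = []  # ordered; records are never modified in place

    def append(self, record):
        """Append a record at the head; return its sequential offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Sequentially scan records starting at the given offset."""
        return self._records[offset:offset + max_records]

log = AppendOnlyLog()
for event in ["user_created", "cart_updated", "order_placed"]:
    log.append(event)

print(log.read(1))  # ['cart_updated', 'order_placed']
```

Both operations touch memory sequentially, which is exactly what makes the real on-disk structure friendly to pre-fetch and caching.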

  24. Writing to the Log. Writes are append-only, always added to the head of the log. Data is stored in the log as a stream of bytes. Due to their structure, logs can be optimized: for instance, when writing data to Kafka, data is copied directly from the disk buffer to the network buffer, without any intermediate memory cache.

  25. Reading from the Log. Only sequential access: reads are performed by seeking to a specific offset and sequentially scanning, so, like writes, reads are sequential operations. Messages are read in the order they were written. Since the log is durable, messages can be replayed for as long as they exist in the log. Consumers are responsible for periodically recording their position in the log.
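The consumer-side mechanics above (seek, scan, commit, replay) can be sketched as follows. This is a simplified illustration under the assumption of a single consumer and a plain Python list standing in for a durable log; the names are invented for the example.

```python
# Sketch of consumer offset tracking against a durable log.
# The "log" is just a list here; names are hypothetical.
class Consumer:
    def __init__(self, log):
        self.log = log
        self.committed_offset = 0  # the periodically recorded position

    def poll(self, max_records=2):
        """Seek to the committed offset, scan forward, then commit."""
        start = self.committed_offset
        batch = self.log[start:start + max_records]
        self.committed_offset += len(batch)  # commit after processing
        return batch

log = ["e0", "e1", "e2", "e3"]
c = Consumer(log)
print(c.poll())  # ['e0', 'e1'] -- messages arrive in write order
print(c.poll())  # ['e2', 'e3']

c.committed_offset = 0       # replay is just seeking back
print(c.poll())  # ['e0', 'e1'] again -- the log still has them
```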

  26. Kafka: a distributed streaming platform.

  27. Key Capabilities. Publish/subscribe: Kafka is like a message queue or enterprise messaging system, but with some very distinct design concerns and side effects. Storage: Kafka stores streams of records using replicated, fault-tolerant, durable mechanisms, and persists all published records, whether or not they have been consumed, for a configurable retention period. Processing: Kafka can process and apply logic to streams of records as they occur.

  28. The Kafka Broker. Linearly scalable: scaling is a matter of adding more nodes to an existing cluster; rebalancing, leader election, and replication are adjusted automatically. Resilient: messages are replicated across different nodes. Fault tolerant: retries, message acknowledgement, and ack strategies are all baked into the platform.

  29. Topics and Partitions. Topics are categories or feed names to which records are published; data in a topic is retained for a configurable period of time. Topics are split into ordered commit logs called partitions, which allow a log to scale beyond the size that fits on a single broker and act as the unit of parallelism. Each message is assigned a sequential id called an offset.

  30. Kafka Log Semantics: ordering guarantees, message durability, load balancing, compaction, storage, topic types.

  31. Ordering guarantees. Most business systems need strong ordering guarantees; keys map to partitions. Relative ordering: messages that require relative ordering must be sent to the same partition; messages with the same key map to the same partition, and each consumer in a group is responsible for a single partition, so ordering is guaranteed. Global ordering: requires a single-partition topic; it tends to come up when migrating legacy systems where global ordering was an assumption, and throughput is limited to a single machine.
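The "keys map to partitions" rule above can be sketched as a hash of the key modulo the partition count. Note the hash function here is illustrative only; Kafka's default partitioner actually uses murmur2, not CRC32, and the function names are invented for this example.

```python
# Sketch of key-based partitioning (Kafka really uses murmur2;
# crc32 stands in here as an illustrative stable hash).
import zlib

NUM_PARTITIONS = 4

def partition_for(key: bytes) -> int:
    """Map a message key to a partition deterministically."""
    return zlib.crc32(key) % NUM_PARTITIONS

# Because the mapping is deterministic, every message with the same
# key lands in the same partition, preserving its relative order.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for key, value in [(b"user-42", "created"),
                   (b"user-7", "created"),
                   (b"user-42", "updated")]:
    partitions[partition_for(key)].append(value)

p = partition_for(b"user-42")
print(partitions[p])  # ['created', 'updated'] -- in send order
```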

  32. Message Durability. Kafka provides durability through replication: messages are written to a leader and then replicated to a user-defined number of brokers. Records can be configured to be persisted for a period of time or based on keys.

  33. Kafka can load balance services. Kafka assigns whole partitions to the different consumers in a group; in other words, a single partition can only ever be assigned to a single consumer. Since this is always true, ordering is guaranteed across failures and restarts. If a consumer leaves the group for any reason, Kafka detects the change and rebalances how messages are distributed across the remaining consumers; if the failed consumer comes back online, load is balanced again. Load balancing provides high availability.
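The whole-partition assignment described above can be sketched with a simple round-robin assignor. This is a simplification of what Kafka's group coordinator does; the function name and shapes are assumptions made for the example.

```python
# Sketch of consumer-group partition assignment (simplified round-robin;
# names are hypothetical). Each partition goes to exactly one consumer.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]
print(assign(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# If c2 leaves the group, a rebalance redistributes its partitions
# across the remaining consumers -- no partition is ever shared.
print(assign(partitions, ["c1", "c3"]))
# {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```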

  34. Compaction. Key-based datasets can be compacted: “compacted topics” retain only the most recent event for each key, with any older events for that key removed. Compacted topics reduce how quickly a dataset grows, reducing storage requirements while also increasing the performance of replication jobs. They also support deletes.
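The compaction rule above, including deletes, can be sketched in a few lines. This is an illustration of the semantics, not Kafka's actual log-cleaner implementation; as in Kafka, a record with a null value acts as a tombstone that deletes the key.

```python
# Sketch of log-compaction semantics: keep only the latest record
# per key; a None value is a tombstone that deletes the key.
def compact(log):
    latest = {}
    for key, value in log:   # later records overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("user-1", "alice"),
       ("user-2", "bob"),
       ("user-1", "alice-v2"),   # supersedes the first user-1 record
       ("user-2", None)]         # tombstone: user-2 is deleted

print(compact(log))  # [('user-1', 'alice-v2')]
```

Only one record per live key survives, which is why compacted topics bound the dataset's growth while still letting a new consumer rebuild the full current state.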
