How Zhaopin built its Event Center using Apache Pulsar


  1. How Zhaopin built its Event Center using Apache Pulsar
     Penghui Li, Sijie Guo

  2. Zhaopin.com
     • Zhaopin.com is the biggest online recruitment service provider in China.
     • Zhaopin.com provides job seekers with a comprehensive resume service, the latest employment and career development information, and in-depth online job search for positions throughout China.
     • Zhaopin.com provides professional HR services to over 2.2 million clients, and its average daily page views exceed 68 million.

  3. Who are we: Penghui Li
     - Tech lead of infrastructure team at zhaopin.com
     - 5+ years of experience developing message queues and microservices
     - Apache Pulsar Committer

  4. Who are we: Sijie Guo
     - Apache Pulsar Committer & PMC Member
     - Apache BookKeeper Committer & PMC Member
     - Interested in technologies around Event Streaming
     - Worked for Twitter and Yahoo before

  5. 1. Why building an Event Center
     2. Why Apache Pulsar
     3. Apache Pulsar at Zhaopin
     4. Streaming Platform
     5. Zhaopin’s contributions to Apache Pulsar

  6. Why building an Event Center: Data Silos -> Unified Platform

  7. Data Silos
     - To Enterprises: MSMQ
     - To End Users: RabbitMQ
     - Data Processing: Kafka
     Pain Points:
     • High Maintenance Cost
     • Extremely hard to share data across teams
     • Inconsistency between data silos
     • Doesn’t Scale
     • No consistent SLA

  9. Unification - MQService
     [Figure labels: Job Search, Resume Service, Submission Service; Thrift, MQTT, HTTP; MQService; RabbitMQ (x3)]
     Problems Solved:
     • High availability
     • Scale-out Service
     • Simplified Operations
     Problems Unsolved:
     • Order Guarantee
     • Data rewind
     • Keep messages for longer period

  10. Unification - MQService
      [Figure labels: Online Services, MQService; Data Processing, Kafka]

  11. Why Building an Event Center
      [Figure: two consumption models. "Better order guarantee": Partition-0, Partition-1 and Partition-2, each holding messages 0-3, read by Consumer-1, Consumer-2 and Consumer-3. "Better consumption parallelism": a single Queue holding messages 0-3, read by Consumer-1 and newly added consumers.]

  12. Why Building an Event Center
      • RabbitMQ is better for work queue use cases: adding more consumers increases consumption throughput.
      • Kafka needs more partitions to increase consumption throughput.
      • We used RabbitMQ a lot for work queue use cases.
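
The contrast on this slide can be illustrated with a small simulation. This is only a sketch in plain Python, not RabbitMQ or Kafka client code: `queue_dispatch`, `partition_dispatch`, and the consumer names are hypothetical, and round-robin is assumed as the dispatch policy for both models.

```python
def queue_dispatch(messages, consumers):
    """Work-queue model: any consumer may take the next message (round-robin here)."""
    assignment = {c: [] for c in consumers}
    for i, msg in enumerate(messages):
        assignment[consumers[i % len(consumers)]].append(msg)
    return assignment


def partition_dispatch(partitions, consumers):
    """Partition model: each partition is owned by exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for p_index, msgs in enumerate(partitions):
        owner = consumers[p_index % len(consumers)]
        assignment[owner].extend(msgs)
    return assignment


def busy_consumers(assignment):
    """Count consumers that received at least one message."""
    return sum(1 for msgs in assignment.values() if msgs)


messages = list(range(12))
partitions = [messages[0:4], messages[4:8], messages[8:12]]  # 3 partitions
consumers = ["c1", "c2", "c3", "c4"]                          # 4 consumers

queue_result = queue_dispatch(messages, consumers)
partition_result = partition_dispatch(partitions, consumers)

print(busy_consumers(queue_result))      # 4: every consumer gets work
print(busy_consumers(partition_result))  # 3: the fourth consumer stays idle
```

With three partitions, a fourth consumer adds no parallelism until a partition is added, whereas the queue spreads work across all four. The trade-off is that the partition model keeps each partition's messages in order on a single consumer.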

  13. Why Building an Event Center
      • Kafka integrates well with the data processing ecosystem (Flink, Spark), and provides high throughput.
      • We used Kafka a lot for data processing.

  14. Why Building an Event Center
      But:
      • The cost of operating two different messaging systems is high
      • Data sits in two different silos
      We need a unified platform to handle both scenarios.

  15. Why Apache Pulsar: Pulsar == Messaging + Storage

  16. What is Apache Pulsar: “Flexible Pub/Sub messaging backed by durable log/stream storage”

  17. Apache Pulsar - Multi Tenancy

  18. Apache Pulsar - Queue + Streaming

  19. Apache Pulsar - Cloud Native
      Layered Architecture:
      • Independent Scalability
      • Instant Failure Recovery
      • Balance-free on cluster expansions

  20. Why Apache Pulsar
      1. Pulsar provides a better abstraction of consumption patterns
      2. Pulsar provides better fault tolerance and consistency options
      3. Pulsar uses a scalable storage system (Apache BookKeeper)
      4. Hierarchical topic management and resource isolation
      A perfect match for our requirements.
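
Hierarchical topic management in Pulsar shows up in the topic name itself, which has the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below parses such a name in plain Python. The `TopicName` type, the `parse_topic` helper, and the example tenant, namespace, and topic are hypothetical illustrations, not names taken from Zhaopin's deployment or from the Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # top level of the hierarchy, e.g. a team or business unit
    namespace: str  # administrative unit within a tenant; policies attach here
    topic: str      # the topic itself

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(full_name: str) -> TopicName:
    """Split a fully qualified topic name into its hierarchy levels."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain in {full_name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)


# Hypothetical example name, for illustration only.
name = parse_topic("persistent://recruiting/resume-service/resume-submitted")
print(name.tenant, name.namespace, name.topic)
```

Because the tenant and namespace are part of every topic name, quotas, retention, and permissions can be set once at those levels rather than per topic.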

  21. Apache Pulsar at Zhaopin: 20+ core services, 6 billion msgs/day

  22. Unification - Apache Pulsar
      [Figure labels: Online Services, Data Processing, Queue, Streaming, Apache Pulsar]
      Problems Solved:
      • No Data Silos
      • Queue + Streaming
      • Disaster Recovery
      • Infinite Message Storage (via Tiered Storage)
      • Data rewinding
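
Data rewinding is possible when messages live in a retained log and each consumer only tracks a position in it. The following is a minimal sketch of that idea in plain Python, assuming an in-memory log; `TopicLog` and `Cursor` are hypothetical names for illustration and are not the Pulsar client API.

```python
class TopicLog:
    """Append-only log: reading a message does not remove it."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        self._entries.append(payload)
        return len(self._entries) - 1  # position of the new entry

    def read(self, position):
        return self._entries[position]

    def __len__(self):
        return len(self._entries)


class Cursor:
    """A consumer's read position in the log; rewinding just moves it back."""

    def __init__(self, log, position=0):
        self.log = log
        self.position = position

    def receive_all(self):
        received = []
        while self.position < len(self.log):
            received.append(self.log.read(self.position))
            self.position += 1
        return received

    def seek(self, position):
        if not 0 <= position <= len(self.log):
            raise ValueError("position outside the retained log")
        self.position = position


log = TopicLog()
for event in ["e0", "e1", "e2"]:
    log.append(event)

cursor = Cursor(log)
first_pass = cursor.receive_all()  # ["e0", "e1", "e2"]
cursor.seek(1)                     # rewind to the second event
replay = cursor.receive_all()      # ["e1", "e2"]
print(first_pass, replay)
```

In a classic work queue, an acknowledged message is deleted, so there is nothing to seek back to; retaining the log is what makes replay possible.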

  23. Milestones
      • 2018/07: POC
      • 2018/09: Pulsar on Production
      • 2018/10: Pulsar based Event Center, 1 billion msgs/day
      • 2018/11: Win the best innovative platform award at Zhaopin
      • 2018/12: 3 billion msgs/day
      • 2019/02: 6 billion msgs/day

  24. Core Metrics
      • 50+ Namespaces
      • 3000+ Topics
      • 6+ billion Messages per day
      • 3TB Storage per day
      • 20+ Core Services

  25. System Metrics
      • Latency: 99.5% < 5ms
      • Write: 100K+/s
      • Read: 200K+/s
      • Network In: 190MB+/s
      • Network Out: 550MB+/s
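
The daily totals on slide 24 can be converted into average per-second figures for comparison with slide 25. The arithmetic below is a rough sketch: it assumes exactly 6 billion messages and 3 decimal terabytes per day, and the slides do not say whether the 3TB figure includes replicas, so the per-message size is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 6_000_000_000     # "6+ billion Messages per day"
storage_bytes_per_day = 3 * 10**12   # "3TB Storage per day", assuming decimal TB

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
avg_bytes_per_msg = storage_bytes_per_day / messages_per_day
avg_storage_mb_per_sec = storage_bytes_per_day / SECONDS_PER_DAY / 10**6

print(round(avg_msgs_per_sec))          # 69444 messages per second on average
print(round(avg_bytes_per_msg))         # 500 bytes stored per message
print(f"{avg_storage_mb_per_sec:.1f}")  # 34.7 MB written to storage per second
```

An average of roughly 69K messages per second is consistent with the 100K+/s write rate on slide 25, since the daily total averages over quieter hours as well.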

  26. Pulsar at Zhaopin
      1. One copy of data, single source-of-truth.
      2. Don’t worry about data consistency between RabbitMQ and Kafka
      3. Multi-tenancy makes topic management easier
      4. Strong data durability allows us to stop worrying about message loss

  27. Streaming Platform: Beyond an Event Center

  28. Streaming Platform
      [Figure labels: Streaming Layer (Pulsar); Tiered Storage (S3, HDFS, OSS); Hive, Flink, Pulsar SQL]

  29. Stream to Stream
      [Figure labels: Stream -> Table, Table -> Stream, Stream -> Stream, Table -> Table]

  30. Unified Data Processing
      [Figure labels: Hive, Topic (x4), Stream Processing]

  31. Contribute to Apache Pulsar

  32. Zhaopin’s Contributions to Pulsar
      • Client interceptors: we use this feature to track messages between producers and consumers
      • Dead Letter Topic
      • Time partitioned message tracker
      • Service URL provider: we use this feature to dynamically switch traffic
      • Hive Pulsar integration
      • Multi-version Schema
      • and more…
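
The idea behind a dead letter topic is that a message which keeps failing is redelivered a bounded number of times and then moved aside so it stops blocking the rest. The sketch below simulates that policy in plain Python. It is not the Pulsar client API: `consume_with_dead_letter`, `handler`, and `max_redeliver_count` are hypothetical names, and a failure is modeled as an exception raised by the handler.

```python
from collections import deque


def consume_with_dead_letter(messages, handler, max_redeliver_count):
    """Process messages; route any message that still fails after
    max_redeliver_count redeliveries to a dead letter list."""
    pending = deque((msg, 0) for msg in messages)  # (message, redeliveries so far)
    processed, dead_letters = [], []
    while pending:
        msg, redeliveries = pending.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            if redeliveries >= max_redeliver_count:
                dead_letters.append(msg)                  # give up: dead letter
            else:
                pending.append((msg, redeliveries + 1))   # schedule a redelivery
    return processed, dead_letters


attempts = []


def handler(msg):
    """Hypothetical handler that can never process the message "bad"."""
    attempts.append(msg)
    if msg == "bad":
        raise ValueError("cannot process")


processed, dead = consume_with_dead_letter(
    ["a", "bad", "b"], handler, max_redeliver_count=2
)
print(processed)  # ['a', 'b']
print(dead)       # ['bad']
```

Here the failing message is attempted three times (one delivery plus two redeliveries) before it is dead-lettered, while the other messages are processed normally.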

  33. Thank you
