

  1. Women in Big Data x Pinterest

  2. Welcome! Regina Karson, WiBD Chapter Director; Tian-Ying Chang, Engineering Manager

  3. Goku: Pinterest's In-House Time-Series Database. Tian-Ying Chang, Sr. Staff Engineer Manager, Pinterest

  4. Pinterest
     ● Users discover new ideas and find inspiration to do the things they love
        ○ 300M+ MAU, billions of Pins
     ● Metrics for monitoring site health
        ○ Latency, QPS, CPU, memory
     ● Metrics about product quality
        ○ MAU, impressions, etc.
     ● The monitoring service needs to be fast, reliable, and scalable

  5. Monitoring at Pinterest
     ● Graphite
        ○ Easy to set up at small scale
        ○ Down-sampling supports long-range queries well
        ○ Hard to scale
        ○ Deprecated at Pinterest's current scale
     ● OpenTSDB
        ○ Rich query and tagging support
        ○ Easy to scale horizontally with the underlying HBase cluster
        ○ Long latency for high-cardinality data
        ○ Long latency for queries over longer time ranges
           ■ No down-sampling
        ○ Heavy GC, worsened by the combination of heavy write QPS and long-range scans

  6. Why OpenTSDB is not a good fit
     ● HBase schema
        ○ Row key: <metric><timestamp>[<tagk1><tagv1><tagk2><tagv2>...] (metric, tag keys, and tag values are each encoded in 3 bytes)
        ○ Column qualifier: <delta to row-key timestamp (up to 4 bytes)>
     ● Unnecessary scans
        ○ The query m1{rpc=delete} [t1 to t2] must scan every m1 row in the time range, e.g.:
           <m1><t1><host=h1><rpc=delete>
           <m1><t1><host=h1><rpc=get>
           <m1><t1><host=h1><rpc=put>
           <m1><t2><host=h2><rpc=delete>
     ● Data size
        ○ 20 bytes per data point
     ● Aggregation
        ○ Raw data is read onto one OpenTSDB instance and aggregated there
        ○ E.g., ostrich.gauges.singer.processor.stuck_processors {host=*}
     ● Serialization
        ○ JSON: very slow when there are many data points to return
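To make the unnecessary-scan problem concrete, here is a toy Python reconstruction of the row-key layout. It is only a sketch: real OpenTSDB encodes each component as fixed-width UIDs, while the helper below keeps everything as readable strings.

```python
# Toy model of OpenTSDB-style row keys, illustrating the unnecessary-scan
# problem. Real OpenTSDB encodes metric/tag components as 3-byte UIDs;
# strings are used here purely for readability.

def row_key(metric, timestamp, tags):
    """Build a row key: <metric><timestamp><tagk1=tagv1><tagk2=tagv2>..."""
    tag_part = "".join(f"<{k}={v}>" for k, v in sorted(tags.items()))
    return f"<{metric}><{timestamp}>{tag_part}"

rows = [
    row_key("m1", "t1", {"host": "h1", "rpc": "delete"}),
    row_key("m1", "t1", {"host": "h1", "rpc": "get"}),
    row_key("m1", "t1", {"host": "h1", "rpc": "put"}),
    row_key("m1", "t2", {"host": "h2", "rpc": "delete"}),
]

# The query m1{rpc=delete}[t1..t2] can only set a scan range on the
# <metric><timestamp> prefix. Every m1 row in the range is read off disk,
# and rows whose tags don't match are discarded only afterwards:
scanned = [r for r in rows if r.startswith("<m1>")]
matched = [r for r in scanned if "<rpc=delete>" in r]
print(f"scanned {len(scanned)} rows, kept {len(matched)}")  # scanned 4, kept 2
```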

  7. Goku is here to save

  8. OpenTSDB
     ● Read/write requests are sent to a randomly selected OpenTSDB box, then routed to the corresponding RegionServer based on row-key range
     ● Reads: raw data is read from the individual HBase RegionServers, sent to an OpenTSDB box, aggregated there, and the result is sent to the client
     [Diagram: Kafka feeds the Ingestor (write client); Statsboard is the read client; both talk to OpenTSDB, which sits on top of multiple HBase RegionServers]

  9. Goku cluster
     ● A Goku box is not only a storage engine, but also:
        ○ A proxy that routes requests
        ○ An aggregation engine
     ● A client can send requests to any Goku box, which will route them
        ○ Scatter and gather
     [Diagram: Kafka feeds the Ingestor (write client); Statsboard is the read client; both talk to a cluster of Goku boxes]

  10. Two-level sharding
      ● Group# is hashed from the metric name
         ○ E.g., tc.metrics.rpc_latency
      ● Shard# is hashed from the metric plus its set of tag keys and values
         ○ E.g., tc.metrics.rpc_latency{rpc=put,host=m1}
      ● Controls read fan-out while keeping individual groups easy to scale out
      ● Request flow: 1. requests are sent to a random Goku box; 2. it computes the sharding (e.g., G2: S1 and S2) and looks up the shard config; 3. it routes the requests; 4. each target shard retrieves data and aggregates locally; 5. the routing box performs another aggregation; 6. it returns the response
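A minimal sketch of the two-level hashing described above; the group/shard counts and function names are illustrative, not Goku's actual code.

```python
import hashlib

NUM_GROUPS = 4          # illustrative cluster layout, not Goku's real config
SHARDS_PER_GROUP = 3

def _hash(s):
    """Stable hash so routing is deterministic across boxes."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def locate(metric, tags):
    """Two-level sharding: group from the metric name alone, shard from
    the full time series (metric + sorted tag keys and values)."""
    group = _hash(metric) % NUM_GROUPS
    series = metric + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    shard = _hash(series) % SHARDS_PER_GROUP
    return group, shard

# All series of one metric live in one group, so a query such as
# tc.metrics.rpc_latency{rpc=put,host=m1} only fans out within that group.
print(locate("tc.metrics.rpc_latency", {"rpc": "put", "host": "m1"}))
```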

  11. Goku #1. Time-Series Database based on Beringei

  12. Beringei
      ● In-memory key-value store
         ○ Key: string
         ○ Value: list of <timestamp, value> pairs
      ● Gorilla compression
         ○ Delta-of-delta encoding on timestamps
         ○ Delta encoding on values
      ● Stores the most recent 24 hours of data (configurable)
      ● One level of sharding to distribute data
      ● Data-point size reduced from 20 bytes to 1.37 bytes
      [Diagram: write and read paths pass through Gorilla encode/decode into in-memory buckets of time series per shard, with disk behind them]
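The timestamp half of Gorilla compression is easy to sketch. The real encoder packs results into variable-width bit ranges; this simplified Python version only shows why delta-of-delta makes regularly spaced timestamps almost free to store.

```python
def delta_of_delta(timestamps):
    """Gorilla-style delta-of-delta on timestamps: for a metric reported
    at a fixed interval, almost every encoded value becomes 0 and can be
    stored in a single bit. (The real encoder packs variable-width bit
    ranges; this sketch just shows the transformed stream.)"""
    out = [timestamps[0]]               # first timestamp stored in full
    prev, prev_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - prev
        out.append(delta - prev_delta)  # usually 0 for a regular series
        prev, prev_delta = ts, delta
    return out

# A series reported every 60s, with one point arriving a second late:
print(delta_of_delta([1000, 1060, 1120, 1181, 1241]))
# -> [1000, 60, 0, 1, -1]: everything after the first delta is tiny
```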

  13. Goku #2. Query Engine -- Inverted Index

  14. Inverted index
      ● A map from each search term to its bitset of time-series IDs
      ● Built while processing incoming data points
      ● Fast lookup when serving queries
      ● Supported query filters:
         ○ ExactMatch: metricname{host=h1,api=get} => intersect the bitsets of metricname, host=h1, and api=get
         ○ Or: metricname{host=h1|h2} => union the bitsets of host=h1 and host=h2, then intersect with the bitset of metricname
         ○ Nor: metricname{host=not_literal_or(h1|h2)} => remove the bitsets of host=h1 and host=h2 from the bitset of metricname
         ○ Wildcard: (a) metricname{host=*} => intersect the bitsets of metricname and host=*; (b) metricname{host=h*} => convert to a regex filter
         ○ Regex: metricname{host=h[1|2].*, api=get, az=us-east-1} => apply the other filters first, then build a regex pattern from the filter values and iterate over the full metric names of all IDs that survived the other filters
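A minimal sketch of the term-to-bitset index, using Python sets in place of compressed bitsets; the class and method names are illustrative, not Goku's.

```python
class InvertedIndex:
    """Map each search term (metric name or tagk=tagv) to the set of
    time-series IDs containing it. Python sets stand in for bitsets."""

    def __init__(self):
        self.index = {}

    def add(self, ts_id, metric, tags):
        """Index a series under its metric name and every tag pair."""
        for term in [metric] + [f"{k}={v}" for k, v in tags.items()]:
            self.index.setdefault(term, set()).add(ts_id)

    def exact(self, metric, tags):
        """ExactMatch: intersect the bitsets of every term."""
        result = set(self.index.get(metric, set()))
        for k, v in tags.items():
            result &= self.index.get(f"{k}={v}", set())
        return result

    def or_filter(self, metric, key, values):
        """Or: union the tag-value bitsets, then intersect with metric."""
        union = set().union(*(self.index.get(f"{key}={v}", set()) for v in values))
        return self.index.get(metric, set()) & union

idx = InvertedIndex()
idx.add(1, "rpc_latency", {"host": "h1", "api": "get"})
idx.add(2, "rpc_latency", {"host": "h2", "api": "get"})
print(idx.exact("rpc_latency", {"host": "h1", "api": "get"}))  # {1}
print(idx.or_filter("rpc_latency", "host", ["h1", "h2"]))      # {1, 2}
```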

  15. Goku #3. Query Engine -- Aggregation

  16. Aggregation
      ● Post-processing after retrieving all relevant time series
      ● Mimics OpenTSDB's aggregation layer
      ● Supports basic aggregators, including SUM, AVG, MAX, MIN, COUNT, and DEV, plus down-sampling
      ● Versus OpenTSDB:
         ○ OpenTSDB aggregates on a single instance, since HBase RegionServers don't know how to aggregate
         ○ Goku aggregates in two phases: first on each leaf Goku node, then on the routing Goku node
         ○ This distributes the computation and saves data on the wire
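A sketch of the two-phase pattern for SUM/AVG, using partial (sum, count) pairs as the intermediate representation; the actual format Goku ships between nodes is an assumption here.

```python
# Two-phase aggregation sketch: each leaf reduces its local time series
# to partial (sum, count) pairs per timestamp; the routing node merges
# the partials, so raw points never cross the wire.
from collections import defaultdict

def leaf_aggregate(series_list):
    """Phase 1, on each leaf node: combine local series into partials."""
    partial = defaultdict(lambda: [0.0, 0])            # ts -> [sum, count]
    for series in series_list:
        for ts, value in series:
            partial[ts][0] += value
            partial[ts][1] += 1
    return dict(partial)

def route_aggregate(partials, agg="avg"):
    """Phase 2, on the routing node: merge partials from all leaves."""
    merged = defaultdict(lambda: [0.0, 0])
    for p in partials:
        for ts, (s, c) in p.items():
            merged[ts][0] += s
            merged[ts][1] += c
    if agg == "sum":
        return {ts: s for ts, (s, c) in merged.items()}
    return {ts: s / c for ts, (s, c) in merged.items()}  # avg

leaf1 = leaf_aggregate([[(0, 10.0), (60, 20.0)]])
leaf2 = leaf_aggregate([[(0, 30.0)], [(60, 40.0)]])
print(route_aggregate([leaf1, leaf2]))  # {0: 20.0, 60: 30.0}
```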

  17. AWS EFS

  18. AWS EFS
      ● Stores log and data files for recovery
      ● POSIX compliant
      ● Data durability
      ● Operated asynchronously, so latency isn't an issue
      ● Easy to move shards
      ● Easy to use on AWS

  19. Phase #2: Disk-based Goku

  20. Goku Phase #2 -- Disk-based
      ● A Hadoop job constantly runs to compact data onto disk with down-sampling
      ● Data is stored in S3 for better availability and lower cost
      ● RocksDB is used for serving data online
      [Diagram: the in-memory Goku stack from Phase #1 on AWS EFS, with a Hadoop job compacting data into S3 and a distributed KV store (RocksDB) for serving]
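A sketch of the kind of down-sampling such a compaction job might perform; the 5-minute bucket size and averaging policy are assumptions, since the slides don't specify them.

```python
# Down-sampling sketch: average raw points into fixed-width buckets
# before the compacted data is written to disk. Bucket width and the
# choice of averaging are illustrative assumptions.

def downsample(points, bucket_seconds=300):
    """Average raw (timestamp, value) points into fixed-width buckets."""
    buckets = {}
    for ts, value in sorted(points):
        key = ts - ts % bucket_seconds           # bucket start time
        total, count = buckets.get(key, (0.0, 0))
        buckets[key] = (total + value, count + 1)
    return [(ts, total / count) for ts, (total, count) in sorted(buckets.items())]

raw = [(0, 1.0), (60, 2.0), (120, 3.0), (300, 10.0)]
print(downsample(raw))  # [(0, 2.0), (300, 10.0)]
```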

  21. Next steps for Goku
      ● Replication
         ○ Currently dual-writing to two clusters for fault tolerance
         ○ Replication will improve availability and consistency
      ● More query possibilities
         ○ TopK
         ○ Percentile
      ● Analytics use cases
         ○ Another big consumer of time-series data

  22. Thanks!

  23. Scheduling Asynchronous Tasks at Pinterest. Isabel Tallam, Data (Core Services) Team, Pinterest

  24. Why asynchronous tasks? ● Asynchronous task processing service ● Design considerations

  25. Why asynchronous tasks? [Slide graphic: a feed overrun with spam]

  26. Why asynchronous tasks? ● Asynchronous task processing service ● Design considerations

  27. Pinlater: Asynchronous Task Processing Service
      Pinlater features
      - High throughput
      - Easily create new tasks
      - At-least-once guarantee
      - Strict ack mechanism
      - Metrics and debugging support
      - Different task priorities
      - Scheduling of future tasks
      - Python and Java support

  28. Pinlater components
      ● Pinlater clients send insert requests to Pinlater servers; Pinlater workers pull tasks and ack them
      ● Storage is sharded across multiple master/slave pairs
      [Diagram: many clients → many servers → many workers, with insert-request/ack flows over several Storage master/slave shards]

  29. Pinlater stats
      ~1000 different tasks defined
      ~8 billion task instances processed per day
      ~3000 Pinlater hosts

  30. Why asynchronous tasks? ● Asynchronous task processing service ● Design considerations

  31. Pinlater: storage layer
      [Diagram: Pinlater servers in front of a cache and several Storage master/slave shards]

  32. Pinlater: handling failures in the system
      ● A timeout monitor watches for tasks that were dequeued but never acked, so they can be retried
      [Diagram: clients → servers → workers with insert-request/ack flows; the timeout monitor sits on the servers in front of the Storage master/slave]
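A minimal sketch of the dequeue/ack cycle with a timeout monitor, showing how an unacked task gets retried. The data structures and timeout value are assumptions for illustration, not Pinlater's implementation.

```python
# Sketch of at-least-once delivery via strict acks and a timeout monitor.
import time

class TaskQueue:
    def __init__(self, ack_timeout=60):
        self.pending = []         # tasks waiting to run
        self.in_flight = {}       # task_id -> (task, dequeue_time)
        self.ack_timeout = ack_timeout
        self._next_id = 0

    def insert(self, body):
        self._next_id += 1
        self.pending.append((self._next_id, body))

    def dequeue(self):
        """Hand a task to a worker; it stays in_flight until acked."""
        if not self.pending:
            return None
        task = self.pending.pop(0)
        self.in_flight[task[0]] = (task, time.time())
        return task

    def ack(self, task_id):
        """Strict ack: only an explicit ack removes the task for good."""
        self.in_flight.pop(task_id, None)

    def run_timeout_monitor(self):
        """Re-enqueue tasks whose worker never acked (crashed or hung).
        This is what makes delivery at-least-once rather than at-most-once."""
        now = time.time()
        for task_id, (task, started) in list(self.in_flight.items()):
            if now - started > self.ack_timeout:
                del self.in_flight[task_id]
                self.pending.append(task)

q = TaskQueue(ack_timeout=0.1)
q.insert({"pin_id": 12345})
task = q.dequeue()            # a worker takes the task...
time.sleep(0.2)               # ...and dies before acking
q.run_timeout_monitor()       # the monitor puts it back in pending
print(len(q.pending))         # 1: the task will be retried
```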

  33. Pinlater: Asynchronous Task Processing Service. Thank you!

  34. Experimentation at Pinterest. Lu Yang, Data (Data Analytics - Core Product Data) Team, Pinterest

  35. Outline: 1. Background  2. Platform  3. Architecture

  36. What is an A/B experiment? It is a method of comparing two (or more) variations of something to determine which one performs better against your target metrics

  37. With an experiment mindset: Idea → Feature development → Release to a small % of users → Measure impact → Release to 100% of users based on the impact of the sample launch
      ● Existing code: CONTROL; changed code: ENABLED
      ● A randomized, controlled trial with measurement
      [Diagram: all users split between the experiment groups and users not in the experiment]
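Deterministic hash-based bucketing is a standard way to implement this kind of assignment; a minimal sketch follows, with an illustrative 1% ramp (not Pinterest's actual platform code).

```python
# Hash-based experiment bucketing sketch: the experiment name salts the
# hash so different experiments slice the user base independently.
import hashlib

def bucket(user_id, experiment, ramp_pct=1):
    """Deterministically assign a user to CONTROL/ENABLED or leave them out."""
    h = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
    slot = h % 100
    if slot >= 2 * ramp_pct:          # most users stay out at a small ramp
        return "NOT_IN_EXPERIMENT"
    return "ENABLED" if slot < ramp_pct else "CONTROL"

# For a given ramp percentage, the same user always gets the same
# assignment, so measurements stay consistent across sessions:
print(bucket(42, "new_home_feed", ramp_pct=1))
```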

  38. Number of Experiments Over Time
