timelines at scale

timelines at scale @raffi qcon sf 2012



  1. timelines at scale @raffi qcon sf 2012

  2. Targeted / Pull: twitter.com, home_timeline API. Targeted / Push: User / Site Streams, Mobile Push (SMS, etc.). Queried / Pull: Search API. Queried / Push: Track / Follow Streams.

  3. the challenge ⇢ > 150M world wide active users ⇢ > 300K QPS for timelines ⇢ naïve timeline “materialization” can be slow

  4. [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  5. [Architecture diagram: Write API, Ingester, Fanout, Social Graph Service, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  6. insert ⇢ keyed off “recipient” ⇢ pipelined 4k “destinations” at a time ⇢ replicated [Architecture diagram: Write API, Ingester, Fanout, Social Graph Service, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]
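
The insert step on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not the talk's implementation: the `batches` and `fanout` helpers and the dict standing in for the timeline cache are hypothetical, and replication is not modeled. Only the batch size of 4k and the keying by recipient come from the slide.

```python
from collections import defaultdict
from itertools import islice

BATCH_SIZE = 4000  # "pipelined 4k destinations at a time", from the slide


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fanout(tweet_id, author_id, follower_ids, timelines, batch_size=BATCH_SIZE):
    """Insert one tweet into every follower's timeline, one batch at a time.

    `timelines` is a stand-in for the timeline cache: a dict keyed off the
    recipient's user ID. Returns the number of batches processed.
    """
    sent = 0
    for chunk in batches(follower_ids, batch_size):
        # In a real deployment each chunk would be one pipelined round trip;
        # here it is just an inner loop over an in-memory dict.
        for recipient in chunk:
            timelines[recipient].append((tweet_id, author_id))
        sent += 1
    return sent


timelines = defaultdict(list)
n_batches = fanout(tweet_id=1, author_id=42,
                   follower_ids=range(10_000), timelines=timelines)
```

With 10,000 hypothetical followers the loop runs three batches (4,000 + 4,000 + 2,000), which is why delivery time grows with follower count.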

  7. using redis ⇢ native list structure ⇢ entry layout: Tweet ID (8 bytes), User ID (8 bytes), Bits (4 bytes) [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  8. using redis ⇢ native list structure ⇢ RPUSHX to only add to cached timelines [Diagram: a Redis list of Tweet ID / User ID / Bits entries, alongside the architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]
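
A small sketch of the entry layout and the RPUSHX behavior named on these slides. The field sizes (8 + 8 + 4 bytes) are from the slides; the byte order, the unsigned types, and the `TimelineCache` class (an in-memory stand-in, not a Redis client) are assumptions for illustration.

```python
import struct

# Tweet ID (8 bytes), User ID (8 bytes), Bits (4 bytes), as on the slide.
# Big-endian and unsigned are assumptions; the slides give only the sizes.
ENTRY = struct.Struct(">QQI")


def encode_entry(tweet_id, user_id, bits=0):
    """Pack one timeline entry into 20 bytes."""
    return ENTRY.pack(tweet_id, user_id, bits)


def decode_entry(raw):
    """Unpack 20 bytes back into (tweet_id, user_id, bits)."""
    return ENTRY.unpack(raw)


class TimelineCache:
    """In-memory stand-in for a Redis instance holding one list per user."""

    def __init__(self):
        self._lists = {}

    def load(self, user_id, entries):
        """Hypothetical helper: mark a timeline as cached with initial entries.

        Real Redis has no empty lists, so `entries` should be non-empty.
        """
        self._lists[user_id] = list(entries)

    def rpushx(self, user_id, raw):
        """Mimic RPUSHX: append only if the list already exists.

        Returns the new list length, or 0 when the key is absent.
        """
        entries = self._lists.get(user_id)
        if entries is None:
            return 0
        entries.append(raw)
        return len(entries)


cache = TimelineCache()
cache.load(7, [encode_entry(100, 1)])          # user 7 has a cached timeline
hit = cache.rpushx(7, encode_entry(101, 2))    # appended
miss = cache.rpushx(8, encode_entry(101, 2))   # user 8 is not cached: no-op
```

The effect is that fanout never materializes a timeline for a user whose timeline is not already in the cache.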

  9. [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  10. [Diagram: Write API, Fanout, Timeline Cache, Timeline Service, Gizmoduck, TweetyPie; storage label: Redis]

  11. Targeted / Pull: twitter.com, home_timeline API. Targeted / Push: User / Site Streams, Mobile Push (SMS, etc.). Queried / Pull: Search API. Queried / Push: Track / Follow Streams.

  12. [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  13. blender ⇢ queries one replica of all indexes ⇢ merges & ranks results [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]
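
The "merges & ranks results" step can be illustrated with a scatter-gather merge. This is a sketch under assumptions: the `blend` function, the `(score, tweet_id)` hit shape, and ranking by a single numeric score are hypothetical, since the slides do not say how results are ranked.

```python
import heapq
from itertools import islice


def blend(partition_results, limit=3):
    """Merge per-partition hits into one list ranked by descending score.

    `partition_results` holds one list per index partition (one replica of
    each is queried). Each list is already sorted by descending score and
    contains (score, tweet_id) pairs.
    """
    merged = heapq.merge(*partition_results,
                         key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


partitions = [
    [(0.9, 11), (0.4, 12)],
    [(0.8, 21), (0.1, 22)],
    [(0.7, 31)],
]
top = blend(partitions)
```

Because every partition has to answer before the merge can finish, the read cost grows with the number of partitions, which matches the O(n) read the deck attributes to the search path.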

  14. [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  15. Cache (Redis): ⇢ O(n) write ⇢ O(1) read. Earlybird: ⇢ O(1) write ⇢ O(n) read. [Diagram: a Write API and a Read API on either side of each store]
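
The two cost profiles on this slide can be contrasted with a toy model. Both classes below are hypothetical in-memory sketches: `PushModel` stands in for the fanout/cache path and `PullModel` for the search-index path, with n taken as the number of followers (write) or followed accounts (read).

```python
from collections import defaultdict


class PushModel:
    """Fanout on write: O(n) write in followers, one lookup on read."""

    def __init__(self, followers):
        self.followers = followers          # author -> list of followers
        self.home = defaultdict(list)       # reader -> materialized timeline

    def write(self, author, tweet):
        # One append per follower: the cost grows with the follower count.
        for reader in self.followers.get(author, []):
            self.home[reader].append(tweet)

    def read(self, reader):
        return list(self.home[reader])


class PullModel:
    """Single write, query at read time: O(1) write, O(n) read."""

    def __init__(self, following):
        self.following = following          # reader -> list of authors
        self.by_author = defaultdict(list)  # author -> [(seq, tweet)]
        self.seq = 0

    def write(self, author, tweet):
        self.seq += 1
        self.by_author[author].append((self.seq, tweet))

    def read(self, reader):
        # One lookup per followed author, then a merge by sequence number.
        hits = []
        for author in self.following.get(reader, []):
            hits.extend(self.by_author[author])
        return [tweet for _, tweet in sorted(hits)]


followers = {"alice": ["bob", "carol"], "dave": ["bob"]}
following = {"bob": ["alice", "dave"], "carol": ["alice"]}
push, pull = PushModel(followers), PullModel(following)
for model in (push, pull):
    model.write("alice", "a1")
    model.write("dave", "d1")
    model.write("alice", "a2")
```

Both models return the same timeline for a reader; they differ only in where the work is paid.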

  16. the challenge (part #2) ⇢ fanout can be really slow! ⇢ ...especially for high follower counts

  17. @ladygaga 31 million followers; @katyperry 28 million followers; @justinbieber 28 million followers; @barackobama 23 million followers; @raffi 0.019 million followers

  18. there are over 400 million tweets a day

  19. 4600 tweets a second ≈ a tweet every 0.2 ms
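
The figures on this slide follow from the 400 million tweets a day quoted on the previous one. A short check of the arithmetic, assuming tweets are spread evenly over the day:

```python
TWEETS_PER_DAY = 400_000_000    # "over 400 million tweets a day"
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
ms_between_tweets = 1000 / tweets_per_second

print(f"{tweets_per_second:.0f} tweets/s, "
      f"one every {ms_between_tweets:.3f} ms")
```

This gives roughly 4,630 tweets per second, or one tweet about every 0.216 ms, which the slide rounds to 4600 and 0.2 ms.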

  20. search index ⇢ [‘hello’,‘world’]; fanout index ⇢ [@danadanger, ...] [Diagram: Write API, Ingester, Fanout, Timeline Cache, Search Index; storage labels: Redis, Earlybird]
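
The slide treats both paths as indexes: one keyed by the terms in a tweet, the other keyed by the recipients of a tweet. A minimal sketch of that idea, where the tokenizer, the function names, and the dict-based indexes are all illustrative assumptions:

```python
import re
from collections import defaultdict

search_index = defaultdict(set)   # term -> tweet IDs containing it
fanout_index = defaultdict(list)  # recipient -> tweet IDs delivered to them


def terms(text):
    """Lowercase word tokenizer (an assumption; real tokenization differs)."""
    return re.findall(r"\w+", text.lower())


def ingest(tweet_id, text):
    """Search path: index the tweet under each of its terms."""
    for term in terms(text):
        search_index[term].add(tweet_id)


def fanout(tweet_id, followers):
    """Timeline path: index the tweet under each recipient."""
    for recipient in followers:
        fanout_index[recipient].append(tweet_id)


def search_and(*query_terms):
    """Return tweet IDs that contain every query term."""
    sets = [search_index.get(t.lower(), set()) for t in query_terms]
    return set.intersection(*sets) if sets else set()


ingest(1, "Hello, world")
ingest(2, "hello there")
fanout(1, ["danadanger"])
matches = search_and("Hello", "world")
```

Here `terms("Hello, world")` yields `['hello', 'world']`, and the AND query matches only tweet 1.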

  21. User Intent → Query Expansion. “Hello, world” → “Hello” AND “world”. @raffi’s home timeline → home_timeline:raffi

  22. User Intent → Query Expansion. “Hello, world” → “Hello” AND “world”. @raffi’s home timeline → user_timeline:nelson OR user_timeline:danadanger

  23. User Intent → Query Expansion. “Hello, world” → “Hello” AND “world”. @raffi’s home timeline → home_timeline:raffi

  24. User Intent → Query Expansion. “Hello, world” → “Hello” AND “world”. @raffi’s home timeline → home_timeline:raffi OR user_timeline:taylorswift13
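
One way to read the expansion `home_timeline:raffi OR user_timeline:taylorswift13` is as a read-time merge of a precomputed home timeline with the user timelines of accounts whose tweets were not fanned out. The sketch below illustrates that reading only; the function name, the newest-first ordering, and the use of tweet IDs as a proxy for recency are assumptions not stated in the slides.

```python
import heapq


def hybrid_home_timeline(home_timeline, extra_user_timelines, limit=5):
    """Merge a precomputed home timeline with extra user timelines.

    Every input is a list of tweet IDs sorted newest first (largest ID
    first); treating a larger ID as newer is an assumption of this sketch.
    """
    merged = heapq.merge(home_timeline, *extra_user_timelines, reverse=True)
    out = []
    for tweet_id in merged:
        if not out or out[-1] != tweet_id:   # drop adjacent duplicates
            out.append(tweet_id)
        if len(out) == limit:
            break
    return out


home = [90, 70, 40]    # hypothetical contents of home_timeline:raffi
celebrity = [95, 60]   # hypothetical contents of user_timeline:taylorswift13
timeline = hybrid_home_timeline(home, [celebrity])
```

The merge trades a little extra read work for skipping a multi-million-recipient fanout on write.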

  25. [Architecture diagram: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  26. [Architecture diagram with three labeled paths: Synchronous Path, Asynchronous Path, Query Path. Components: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  27. [Architecture diagram with three labeled paths: Synchronous Path, Asynchronous Path, Query Path. Components: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  28. [Architecture diagram with three labeled paths: Synchronous Path, Asynchronous Path, Query Path. Components: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Index, HTTP Push, Mobile Push, Timeline Service, Blender; storage labels: Redis, Hadoop, Earlybird]

  29. timeline query statistics ⇢ >150m active users worldwide ⇢ >300k qps poll-based timelines @ 1ms p50 / 4ms p99 ⇢ >30k qps search-based timelines

  30. tweet input ⇢ ~400m tweets per day ⇢ ~5K/sec daily average ⇢ ~7K/sec daily peak ⇢ >12K/sec during large events

  31. timeline delivery statistics ⇢ 30b deliveries / day (~21m / min) ⇢ 3.5 seconds @ p50 to deliver to 1m ⇢ ~300k deliveries / sec
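
The delivery figures on this slide can be cross-checked against each other and against the ~400m tweets per day quoted on the tweet input slide. The average fanout per tweet is a derived number, not one stated in the deck.

```python
DELIVERIES_PER_DAY = 30_000_000_000  # "30b deliveries / day"
TWEETS_PER_DAY = 400_000_000         # "~400m tweets per day"

per_minute = DELIVERIES_PER_DAY / (24 * 60)
per_second = DELIVERIES_PER_DAY / (24 * 60 * 60)
avg_fanout = DELIVERIES_PER_DAY / TWEETS_PER_DAY

# "3.5 seconds @ p50 to deliver to 1m" implies this rate for a single tweet.
single_tweet_rate = 1_000_000 / 3.5

print(f"{per_minute / 1e6:.1f}m deliveries/min")
print(f"{per_second / 1e3:.0f}k deliveries/s on average")
print(f"{avg_fanout:.0f} deliveries per tweet on average")
print(f"{single_tweet_rate / 1e3:.0f}k deliveries/s for one 1m-follower tweet")
```

The daily total works out to about 20.8m deliveries per minute and about 347k per second, an average of 75 deliveries per tweet; delivering to 1m followers in 3.5 seconds corresponds to about 286k deliveries per second, consistent with the slide's ~300k.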

  32. thanks!
