SVE: Distributed Video Processing at Facebook Scale. Qi Huang


  1. SVE: Distributed Video Processing at Facebook Scale. Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke, Wyatt Lloyd. Facebook, University of Southern California, Cornell, Princeton

  2. Video is growing across Facebook • FB: 500M users watch 100M hours of video daily (Mar. 16) • Instagram: 250M daily active users for Stories (Jun. 17) • All: many tens of millions of daily uploads, 3X NYE spike

  3. Processing is diverse and demanding [Diagram: input video, processing: re-encoding, thumbnail, video classification] [Outline, Pt. 1 and Pt. 2: Legacy System, SVE, Scaling Challenges, Impact of Design]

  4. Legacy: upload video file to web server [Diagram: Client, Web Server; example post caption: “She is having so much fun with #MSQRD”]

  5. Legacy: preserve original for reliability [Diagram: Client, Web Server, Original Storage]

  6. Legacy: process after upload completes [Diagram: Client, Web Server, Original Storage, Processing]

  7. Legacy: encode w/ varying bitrates [Diagram: Client, Web Server, Original Storage, Processing; encodings: 1080P at 16Mbps, 720P at 4Mbps, 480P at 1.5Mbps]

  8. Legacy: store encodings before sharing [Diagram: Client, Web Server, Original Storage, Processing, Final Storage; encodings: 1080P at 16Mbps, 720P at 4Mbps, 480P at 1.5Mbps]

  9. Sharing with adaptive streaming [Diagram: Client, Web Server, Final Storage, FBCDN; streams: 720p, 480p]

  10. Focus: pre-sharing pipeline. All steps from when a user starts an upload until a video is ready to be shared [Diagram: Client, Web Server, Original Storage, Processing, Final Storage]

  11. Serial pipeline leads to slow processing [Diagram: Client, Web Server, Original Storage, Processing, Final Storage]

  12. Monolithic script slows development [Diagram: Client, Web Server, Processing, Original Storage, Final Blob Storage] • “Let’s experiment speech recognition, add a logic to extract audio and analysis” • “Change color coding at different time” • “We need to change the thumbnail generation logic for videos > x minutes to create scene-based scrubber preview” • “We want to experiment AI-based encodings to spend 10x CPU for 30% compression improvement on popular videos” • “Pass-through for small and well-formatted videos”

  13. Challenges for video processing @ FB • Speedy: users can share videos quickly • Flexible: thousands of engineers can write pipelines for tens of apps • Robust: handle faults and overload that are inevitable at scale

  14. Our Streaming Video Engine (SVE) is speedy, flexible, and robust

  15. Speedy: harness parallelism. Users can share videos quickly • Overlap fault tolerance and processing • Overlap upload and processing • Parallel processing

  16. Architectural changes for parallelism [Diagram, legacy: Client, Web Server, Original Storage, Processing, Final Storage]

  17. Architectural changes for parallelism [Diagram, SVE: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage]

  18. Overlap fault tolerance and processing [Diagram: Client, Web Server, Preprocessor, Write-through Cache, Scheduler, Workers, Original Storage, Final Storage]

  19. Overlap upload and processing [Diagram: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage; the upload is split into segments]

  20. Overlap upload and processing [Diagram: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage; upload in progress]

  21. Parallel processing w/ many workers [Diagram: Client, Web Server, Preprocessor, Scheduler, Original Storage, Final Storage; Workers run 720P Encode, 480P Encode, and Thumbnail while the upload is in progress]

  22. Parallel processing w/ many workers [Diagram: Client, Web Server, Preprocessor, Scheduler, Original Storage, Final Storage; tasks 720P Encode, 480P Encode, Thumbnail; upload in progress]

  23. Parallel processing w/ many workers [Diagram: Client, Web Server, Preprocessor, Scheduler, Original Storage, Final Storage; tasks 720P Encode, 480P Encode, Thumbnail; upload in progress]

  24. Parallel processing w/ many workers [Diagram: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage]

  25. Three sources of parallelism • Overlap fault tolerance and processing • Overlap upload and processing • Parallel processing [Diagram: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage]
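
The overlap of upload and processing described on slides 19 to 25 can be illustrated with a small Python sketch. It is not SVE code: the function names, the fixed segment size, and the thread pool standing in for the worker fleet are all assumptions made for illustration, and the write-through cache and storage path are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(data: bytes, segment_size: int):
    """Yield fixed-size segments, standing in for chunks arriving mid-upload."""
    for start in range(0, len(data), segment_size):
        yield data[start:start + segment_size]


def process_segment(task: str, segment: bytes) -> str:
    """Placeholder task: a real worker would encode or thumbnail the segment."""
    return f"{task}[{len(segment)} bytes]"


def process_while_uploading(data: bytes, segment_size: int,
                            tasks=("720P Encode", "480P Encode", "Thumbnail")):
    """Dispatch every task for a segment as soon as that segment is available."""
    pending = {task: [] for task in tasks}
    # Hypothetical worker pool; the slides show separate worker machines.
    with ThreadPoolExecutor(max_workers=4) as workers:
        for segment in split_into_segments(data, segment_size):
            for task in tasks:
                pending[task].append(workers.submit(process_segment, task, segment))
        # Collect in submission order so each task's outputs stay in segment order.
        return {task: [f.result() for f in futures]
                for task, futures in pending.items()}
```

Calling `process_while_uploading(b"0123456789", 4)` yields three results per task, one per segment, without waiting for the full input before the first task is submitted.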

  26. Results: 2.3x ~ 9.3x speedup [Bar chart: relative speedup (y-axis, 0 to 10) by video size bucket (x-axis: < 3M, 3M ~ 10M, 10M ~ 100M, 100M ~ 1G, > 1G); bar labels: 9.3, 6.1, 3.7, 3, 2.3]

  27. Results: 2.3x ~ 9.3x speedup [Same bar chart, annotated “Overlap upload & processing”]

  28. Results: 2.3x ~ 9.3x speedup [Same bar chart, annotated “Parallel Processing”]

  29. Challenges for video processing @ FB • Speedy: users can share videos quickly (2.3x ~ 9.3x speedup) • Flexible: thousands of engineers can write pipelines for tens of apps • Robust: handle faults and overload that are inevitable at scale

  30. Flexible: build DAG framework. Thousands of engineers can write pipelines for tens of apps • DAG of computation on the stream-of-tracks abstraction • Engineers write only sequential tasks in a familiar language • Dynamic DAG generation per video

  31. DAG on stream-of-tracks abstraction [Diagram: input video split into tracks: Images, Sound, Metadata]
      $pipeline = Pipeline.build()
      $video_track = $pipeline->addTrack(IMG_TYPE)->addTask()
      $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask()
      $meta_track = $pipeline->addTrack(META_TYPE)->addTask()

  32. DAG on stream-of-tracks abstraction [Diagram: tracks and their tasks. Images: Encode(HD), Encode(SD), Thumbnail. Sound: Encode(AAC). Metadata: Analysis]
      $pipeline = Pipeline.build()
      $video_track = $pipeline->addTrack(IMG_TYPE)->addTask(Encode(HD), Encode(SD), Thumb)
      $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask(Encode(AAC))
      $meta_track = $pipeline->addTrack(META_TYPE)->addTask(Analysis)
      Segment-level variant for the video track: ->addTask(Encode(HD, 10s), Encode(SD, 10s), Thumb(10s))

  33. DAG on stream-of-tracks interface [Diagram columns: Track, Tasks, Sync Point, Tasks. Images track: Encode(HD, 10sec), Encode(SD, 10sec), Thumbnail(10sec), each followed by a Cnt Segments sync point, then Encode(HD), Encode(SD), Thumbnail. Sound track: Encode(AAC). Metadata track: Analysis. Other nodes: Video Classification, Combine Tracks, Notification]
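
The builder pattern on slides 31 to 33 can be approximated in Python as follows. This is a hedged sketch only: `Pipeline`, `Track`, `add_track`, `add_task`, `run`, and `named_task` are hypothetical stand-ins that mirror the slide pseudocode, not SVE's actual API, and the sync points and cross-track combine step are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Task = Callable[[str], str]


@dataclass
class Track:
    kind: str
    tasks: List[Task] = field(default_factory=list)

    def add_task(self, *tasks: Task) -> "Track":
        """Attach tasks to this track; returns self so calls can be chained."""
        self.tasks.extend(tasks)
        return self


@dataclass
class Pipeline:
    tracks: Dict[str, Track] = field(default_factory=dict)

    def add_track(self, kind: str) -> Track:
        self.tracks[kind] = Track(kind)
        return self.tracks[kind]

    def run(self, segment: str) -> Dict[str, List[str]]:
        """Run each track's tasks on one segment; tracks do not depend on each other."""
        return {kind: [task(segment) for task in track.tasks]
                for kind, track in self.tracks.items()}


def named_task(name: str) -> Task:
    """Hypothetical placeholder task that only records what it was applied to."""
    return lambda segment: f"{name}({segment})"


def build_example_pipeline() -> Pipeline:
    pipeline = Pipeline()
    pipeline.add_track("IMG").add_task(
        named_task("EncodeHD"), named_task("EncodeSD"), named_task("Thumbnail"))
    pipeline.add_track("AUD").add_task(named_task("EncodeAAC"))
    pipeline.add_track("META").add_task(named_task("Analysis"))
    return pipeline
```

Because each task is a plain sequential function, a scheduler is free to place tasks from different tracks on different workers, which is the property the slides emphasize.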

  34. Dynamic DAG Generation [Diagram: Web Server (DAG Generation Code), DAG Structure, Preprocessor, Scheduler, Workers, Cache]
      $pipeline = Pipeline.build()
      $video_track = $pipeline->addTrack(IMG_TYPE)->addTask()
      $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask()
      $meta_track = $pipeline->addTrack(META_TYPE)->addTask()

  35. Dynamic DAG Generation [Diagram: Web Server (DAG Generation Code), DAG Structure, Preprocessor, Scheduler, Cache; generated tasks: Encode(HD), Encode(SD), Encode(AAC), Analysis]

  36. Dynamic DAG Generation [Diagram: Web Server (DAG Generation Code), DAG Structure, Preprocessor, Scheduler, Workers, Cache]
      $pipeline = Pipeline.build()
      $video_track = $pipeline->addTrack(IMG_TYPE)->addTask()
      $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask()
      $meta_track = $pipeline->addTrack(META_TYPE)->addTask()
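
Per-video DAG generation can be pictured as a function from video properties to a set of tasks. The sketch below is an illustration under assumptions: the `generate_dag` name, the `height` and `has_audio` inputs, and the 720-line threshold are invented for the example; the slides only state that the DAG is generated dynamically for each video.

```python
from typing import Dict, List


def generate_dag(height: int, has_audio: bool) -> Dict[str, List[str]]:
    """Return the tasks to run per track for one uploaded video."""
    video_tasks = ["Encode(SD)", "Thumbnail"]
    # Hypothetical rule: only produce an HD encoding when the source is at least 720 lines tall.
    if height >= 720:
        video_tasks.insert(0, "Encode(HD)")
    dag = {"IMG_TYPE": video_tasks, "META_TYPE": ["Analysis"]}
    # Hypothetical rule: skip the audio track entirely for silent videos.
    if has_audio:
        dag["AUD_TYPE"] = ["Encode(AAC)"]
    return dag
```

With these assumed rules, a 1080-line video with audio gets all four task types from slide 35, while a silent 480-line video gets only Encode(SD), Thumbnail, and Analysis.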

  37. One system for 15+ applications • Generates billions of tasks per day • Varying DAG size: 360 video has thousands of tasks per upload; Newsfeed posts average 153 tasks per upload; Instagram averages 22 tasks per upload; Messenger averages 18 tasks per upload

  38. Challenges for video processing @ FB • Speedy: 2.3x ~ 9.3x speedup • Flexible: thousands of engineers can write pipelines for tens of apps (one system for 15+ applications) • Robust: handle faults and overload that are inevitable at scale

  39. Robust: tolerate overload. Handle faults and overload that are inevitable at scale • Rely on priority to degrade non-latency-sensitive tasks • Defer full video processing for some new uploads • Load-shedding across global deployments

  40. 3X peak load during New Year's Eve [Chart: upload volume (y-axis) by date (x-axis), with Xmas and NYE marked; the NYE peak is 3X]

  41. Prepare for overload [Diagram: Client, Web Server, Preprocessors, Scheduler, Workers, Original Storage, Final Blob Storage]

  42. Use priority for worker overload. Only assign hi-pri tasks under overload [Diagram: Scheduler with a hi-priority queue and a low-priority queue feeding a pool of Workers]
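
The rule on slide 42 (only assign hi-pri tasks under overload) can be sketched with two queues. The class and method names below are hypothetical, and how SVE detects overload is not stated on the slide, so the `overloaded` flag is simply passed in by the caller.

```python
from collections import deque
from typing import Deque, Optional


class TwoLevelScheduler:
    """Toy scheduler with a hi-priority and a low-priority queue."""

    def __init__(self) -> None:
        self.hi: Deque[str] = deque()
        self.lo: Deque[str] = deque()

    def submit(self, task: str, high_priority: bool) -> None:
        (self.hi if high_priority else self.lo).append(task)

    def next_task(self, overloaded: bool) -> Optional[str]:
        """Pick the next task for a free worker."""
        if self.hi:
            return self.hi.popleft()
        # Low-priority work is held back while the system is overloaded.
        if self.lo and not overloaded:
            return self.lo.popleft()
        return None
```

Held-back low-priority tasks stay queued rather than being dropped, so they are picked up again once `overloaded` is false.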

  43. Defer full video processing [Diagram: Web Server (DAG Generation Code), Preprocessor, Scheduler, Hi-priority queue, Original Storage, Cache]

  44. Regional redirection [Diagram: two deployments, each with Preprocessor, Scheduler, and Workers, plus the Web Server; traffic labels: “Local distribution → 100%” and “Local distribution → 70%, Remote distribution → 30%”]
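
A 70% local, 30% remote split like the one labeled on slide 44 could be implemented in many ways; the slides do not say how SVE does it. The sketch below assumes a deterministic hash of a hypothetical upload ID into 100 buckets, so the same upload always maps to the same region. The region names are placeholders.

```python
import hashlib


def choose_region(upload_id: str, local_region: str, remote_region: str,
                  local_percent: int) -> str:
    """Route an upload to the local or remote region by hashing its ID."""
    digest = hashlib.sha256(upload_id.encode("utf-8")).digest()
    # Map the hash to a stable bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return local_region if bucket < local_percent else remote_region
```

Setting `local_percent=100` keeps all traffic local, which matches the other label on the slide; lowering it to 70 sends roughly 30% of uploads to the remote region.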
