SVE: Distributed Video Processing at Facebook Scale
Qi Huang, Petchean Ang, Peter Knowles, Tomasz Nykiel, Iaroslav Tverdokhlib, Amit Yajurvedi, Paul Dapolito IV, Xifan Yan, Maxim Bykov, Chuen Liang, Mohit Talwar, Abhishek Mathur, Sachin Kulkarni, Matthew Burke, Wyatt Lloyd
Facebook, University of Southern California, Cornell, Princeton
Video is growing across Facebook
• FB: 500M users watch 100M hours of video daily (Mar. 16)
• Instagram: 250M daily active users for stories (Jun. 17)
• All: many tens of millions of daily uploads, 3X NYE spike
Processing is diverse and demanding
[Diagram: input video → processing: re-encoding, video classification, thumbnail]
[Outline labels: Pt. 1, Pt. 2, Legacy System, SVE, Scaling Challenges, Impact of Design]
Legacy: upload video file to web server
[Diagram: Client → Web Server. Example post caption: "She is having so much fun with #MSQRD"]
Legacy: preserve original for reliability
[Diagram: Client → Web Server → Original Storage]
Legacy: process after upload completes
[Diagram: Client → Web Server → Original Storage → Processing]
Legacy: encode w/ varying bitrates
[Diagram: Processing produces 1080P at 16Mbps, 720P at 4Mbps, and 480P at 1.5Mbps]
Legacy: store encodings before sharing
[Diagram: Client → Web Server → Original Storage → Processing → Final Storage]
Sharing with adaptive streaming
[Diagram: Client, Web Server, Final Storage, FBCDN; encodings shown: 720p, 480p]
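The idea behind adaptive streaming can be sketched as follows. This is a minimal illustration, not the player's actual logic: the bitrate ladder reuses the figures from the legacy encoding slide, while `pick_rendition` and its fall-back-to-lowest rule are assumptions made for the example.

```python
# Bitrate ladder taken from the encoding slide (resolution -> Mbps).
LADDER = {"1080P": 16.0, "720P": 4.0, "480P": 1.5}

def pick_rendition(bandwidth_mbps: float) -> str:
    """Return the highest-bitrate encoding that fits the measured bandwidth.

    Hypothetical helper: if nothing fits, fall back to the lowest rung.
    """
    fitting = [(rate, name) for name, rate in LADDER.items() if rate <= bandwidth_mbps]
    if fitting:
        return max(fitting)[1]
    return min((rate, name) for name, rate in LADDER.items())[1]
```

A client on a 5 Mbps connection would be served the 720P encoding under this rule, and would drop to 480P if bandwidth fell below 4 Mbps.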
Focus: pre-sharing pipeline
All steps from when a user starts an upload until a video is ready to be shared
[Diagram: Client → Web Server → Original Storage → Processing → Final Storage]
Serial pipeline leads to slow processing
[Diagram: Client → Web Server → Original Storage → Processing → Final Storage]
Monolithic script slows development
[Diagram: Client → Web Server → Original Storage → Processing → Final Blob Storage]
Engineer requests shown around the pipeline:
• “Let’s experiment speech recognition, add a logic to extract audio and analysis”
• “Change color coding at different time”
• “We need to change the thumbnail generation logic for videos > x minutes to create scene-based scrubber preview”
• “We want to experiment AI-based encodings to spend 10x CPU for 30% compression improvement on popular videos”
• “Pass-through for small and well-formatted videos”
Challenges for video processing @ FB
• Speedy: Users can share videos quickly
• Flexible: Thousands of engineers can write pipelines for tens of apps
• Robust: Handle faults and overload that are inevitable at scale
Our Streaming Video Engine (SVE) is speedy, flexible, and robust
Speedy: harness parallelism
Users can share videos quickly
• Overlap fault tolerance and processing
• Overlap upload and processing
• Parallel processing
Architectural changes for parallelism
[Diagram, before: Client → Web Server → Original Storage → Processing → Final Storage]
[Diagram, after: Client → Web Server → Preprocessor, with a Scheduler and multiple Workers; Original Storage and Final Storage remain]
Overlap fault tolerance and processing
[Diagram: the parallel architecture (Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage) with a write-through cache added]
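A write-through cache can be sketched as below. This is an illustrative toy, not SVE's implementation: the class name, the dict standing in for original storage, and the synchronous write are all assumptions. The point it shows is that every segment is persisted for reliability and is also immediately available to workers from the cache.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a durable store (assumed to be a dict)."""

    def __init__(self, backing_store: dict):
        self._cache = {}
        self._store = backing_store  # stand-in for original storage

    def put(self, key: str, segment: bytes) -> None:
        self._store[key] = segment   # durable copy, kept for fault tolerance
        self._cache[key] = segment   # fast copy that workers read from

    def get(self, key: str) -> bytes:
        if key in self._cache:
            return self._cache[key]
        segment = self._store[key]   # cache miss: fall back to durable storage
        self._cache[key] = segment
        return segment

    def evict(self, key: str) -> None:
        self._cache.pop(key, None)   # losing a cache entry never loses data
```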
Overlap upload and processing
[Diagram: same architecture; the video is split into segments, and processing starts while the upload is still in progress]
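Splitting an in-progress upload into segments can be sketched with a generator. This is a simplification under stated assumptions: `split_into_segments` is a hypothetical helper, and it cuts by byte count, whereas a real video pipeline would cut on boundaries that keep each segment independently decodable.

```python
from typing import Iterable, Iterator

def split_into_segments(chunks: Iterable[bytes], segment_size: int) -> Iterator[bytes]:
    """Yield fixed-size segments as soon as enough uploaded bytes have arrived."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while len(buffer) >= segment_size:
            yield buffer[:segment_size]
            buffer = buffer[segment_size:]
    if buffer:
        yield buffer  # final, possibly shorter segment
```

Because the generator yields each segment as soon as it is complete, downstream work on the first segment can begin before the last chunk of the upload arrives.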
Parallel processing w/ many workers
[Diagram: while the upload is in progress, tasks such as 720P Encode, 480P Encode, and Thumbnail run on separate Workers under the Scheduler]
Three sources of parallelism
• Overlap fault tolerance and processing
• Overlap upload and processing
• Parallel processing
[Diagram: Client, Web Server, Preprocessor, Scheduler, Workers, Original Storage, Final Storage]
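The third source, parallel processing, can be sketched with a worker pool. This is a minimal illustration and not SVE's scheduler: `encode` is a placeholder for a real encoder call, and a thread pool stands in for a fleet of worker machines.

```python
from concurrent.futures import ThreadPoolExecutor

def encode(segment: str, profile: str) -> str:
    """Placeholder for a real encoder call (assumption for illustration)."""
    return f"{segment}@{profile}"

def process_segments(segments, profiles, max_workers=4):
    """Run every (segment, profile) pair on a worker pool and collect the results."""
    jobs = [(s, p) for s in segments for p in profiles]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: encode(*job), jobs))
    return dict(zip(jobs, results))
```

Each segment and each encoding profile is an independent unit of work, which is why splitting the video also multiplies the available parallelism.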
Results: 2.3x ~ 9.3x speedup
[Chart: relative speedup (y-axis, 0 to 10) per video size bucket (x-axis: < 3M, 3M ~ 10M, 10M ~ 100M, 100M ~ 1G, > 1G); bar labels as extracted: 9.3, 6.1, 3.7, 3, 2.3; callouts: "Overlap upload & processing", "Parallel processing"]
Challenges for video processing @ FB
• Speedy: Users can share videos quickly (2.3x ~ 9.3x speedup)
• Flexible: Thousands of engineers can write pipelines for tens of apps
• Robust: Handle faults and overload that are inevitable at scale
Flexible: build DAG framework
Thousands of engineers can write pipelines for tens of apps
• DAG of computation on the stream-of-tracks abstraction
• Engineers write only sequential tasks in a familiar language
• Dynamic DAG generation per video
DAG on stream-of-tracks abstraction
[Diagram: input video is split into tracks: Images, Sound, Metadata]
    $pipeline = Pipeline.build()
    $video_track = $pipeline->addTrack(IMG_TYPE)->addTask()
    $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask()
    $meta_track = $pipeline->addTrack(META_TYPE)->addTask()
DAG on stream-of-tracks abstraction
[Diagram: tracks (Images, Sound, Metadata) and their tasks: Encode(HD), Encode(SD), Thumbnail; Encode(AAC); Analysis]
    $pipeline = Pipeline.build()
    $video_track = $pipeline->addTrack(IMG_TYPE)->addTask(Encode(HD), Encode(SD), Thumb)
    $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask(Encode(AAC))
    $meta_track = $pipeline->addTrack(META_TYPE)->addTask(Analysis)
Per-segment variant of the video track tasks shown on the slide:
    ->addTask(Encode(HD, 10s), Encode(SD, 10s), Thumb(10s))
DAG on stream-of-tracks interface
[Diagram: DAG with columns Track, Tasks, Sync Point Tasks]
• Images: Cnt Segments, Encode(HD, 10sec), Encode(HD); Cnt Segments, Encode(SD, 10sec), Encode(SD); Cnt Segments, Thumbnail(10sec), Thumbnail
• Sound: Encode(AAC)
• Metadata: Analysis
• Sync point tasks: Combine Tracks, Notification, Video Classification
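The stream-of-tracks idea can be sketched in Python as a small builder. This is a toy under stated assumptions, not the SVE API: the `Pipeline` and `Track` classes, the `add_sync_task` method, and the rule that every track's last stage feeds the first sync task are all hypothetical, loosely modeled on the slide's `addTrack`/`addTask` calls.

```python
class Track:
    def __init__(self, kind):
        self.kind = kind
        self.stages = []              # each stage is a list of task names

    def add_task(self, *tasks):
        self.stages.append(list(tasks))
        return self                   # allow chaining, as on the slide

class Pipeline:
    def __init__(self):
        self.tracks = []
        self.sync_tasks = []

    def add_track(self, kind):
        track = Track(kind)
        self.tracks.append(track)
        return track

    def add_sync_task(self, name):
        self.sync_tasks.append(name)
        return self

    def edges(self):
        """DAG edges: stage to stage within a track, then last stage to the sync tasks."""
        result = []
        for track in self.tracks:
            for prev, nxt in zip(track.stages, track.stages[1:]):
                result += [(a, b) for a in prev for b in nxt]
            if track.stages and self.sync_tasks:
                result += [(a, self.sync_tasks[0]) for a in track.stages[-1]]
        result += list(zip(self.sync_tasks, self.sync_tasks[1:]))
        return result

pipeline = Pipeline()
pipeline.add_track("IMG").add_task("Encode(HD)", "Encode(SD)", "Thumbnail")
pipeline.add_track("AUD").add_task("Encode(AAC)")
pipeline.add_track("META").add_task("Analysis")
pipeline.add_sync_task("Combine Tracks").add_sync_task("Notification")
```

Tasks on different tracks share no edges until the sync point, so they can be scheduled independently.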
Dynamic DAG Generation
[Diagram: Web Server, Preprocessor, Scheduler, Workers, and Cache; labels "DAG Generation Code" and "DAG Structure"; generated tasks shown: Encode(HD), Encode(SD), Encode(AAC), Analysis]
    $pipeline = Pipeline.build()
    $video_track = $pipeline->addTrack(IMG_TYPE)->addTask()
    $audio_track = $pipeline->addTrack(AUD_TYPE)->addTask()
    $meta_track = $pipeline->addTrack(META_TYPE)->addTask()
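Generating a DAG per video means the task list can depend on properties of the upload. The sketch below is hypothetical: `generate_tasks`, its parameters, and the 3 MB threshold are assumptions, and the pass-through branch merely echoes the "pass-through for small and well-formatted videos" request quoted earlier in the talk.

```python
SMALL_VIDEO_MB = 3.0  # hypothetical threshold, not from the source

def generate_tasks(size_mb: float, well_formatted: bool, has_audio: bool) -> dict:
    """Return the per-track task lists for one upload (illustrative only)."""
    if size_mb < SMALL_VIDEO_MB and well_formatted:
        image_tasks = ["PassThrough", "Thumbnail"]   # skip re-encoding
    else:
        image_tasks = ["Encode(HD)", "Encode(SD)", "Thumbnail"]
    dag = {"images": image_tasks, "metadata": ["Analysis"]}
    if has_audio:
        dag["sound"] = ["Encode(AAC)"]               # only add the track if present
    return dag
```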
One system for 15+ applications
• Generate billions of tasks per day
• Varying DAG size
  • 360 video has thousands of tasks per upload
  • Newsfeed post averages 153 tasks per upload
  • Instagram averages 22 tasks per upload
  • Messenger averages 18 tasks per upload
Challenges for video processing @ FB
• Speedy: 2.3x ~ 9.3x speedup
• Flexible: One system for 15+ applications; thousands of engineers can write pipelines for tens of apps
• Robust: Handle faults and overload that are inevitable at scale
Robust: tolerate overload
Handle faults and overload that are inevitable at scale
• Rely on priority to degrade non-latency-sensitive tasks
• Defer full video processing for some new uploads
• Load-shedding across global deployments
3X peak load during New Year's Eve
[Chart: upload volume by date, with Xmas and NYE marked; the NYE peak is labeled 3X]
Prepare for overload
[Diagram: Client, Web Server, multiple Preprocessors, Scheduler, many Workers, Original Storage, Final Blob Storage]
Use priority for worker overload
Only assign hi-pri tasks under overload
[Diagram: Scheduler with a hi-priority queue and a low-priority queue in front of a pool of Workers]
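The two-queue policy can be sketched as below. This is a minimal illustration rather than SVE's scheduler: the `Scheduler` class, its method names, and the boolean `overloaded` flag are assumptions; how overload is detected is not covered here.

```python
from collections import deque

class Scheduler:
    """Toy scheduler with a hi-priority and a low-priority queue."""

    def __init__(self):
        self.hi = deque()
        self.lo = deque()

    def submit(self, task, high_priority: bool) -> None:
        (self.hi if high_priority else self.lo).append(task)

    def next_task(self, overloaded: bool):
        """Hand a worker its next task; under overload only hi-pri tasks are assigned."""
        if self.hi:
            return self.hi.popleft()
        if self.lo and not overloaded:
            return self.lo.popleft()
        return None  # nothing eligible; low-priority work waits in its queue
```

Low-priority tasks are delayed rather than dropped: they stay queued and are picked up once the overload flag clears.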
Defer full video processing
[Diagram: Web Server, DAG Generation Code, Preprocessor, Scheduler with hi-priority queue, Cache, Original Storage]
Regional redirection
[Diagram: Web Server in front of two regions, each with Preprocessor, Scheduler, and Workers]
• Traffic without redirection: local distribution → 100%
• Traffic with redirection: local distribution → 70%, remote distribution → 30%
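A traffic split like the 70% / 30% example can be sketched as a weighted random choice. This is only an illustration: `choose_region` is a hypothetical helper, and how the remote fraction is decided or applied in production is not described here.

```python
import random

def choose_region(rng: random.Random, remote_fraction: float) -> str:
    """Send roughly remote_fraction of uploads to a remote region, the rest locally."""
    return "remote" if rng.random() < remote_fraction else "local"

def split_traffic(n_uploads: int, remote_fraction: float, seed: int = 0) -> dict:
    """Count how many of n_uploads land in each region for a given fraction."""
    rng = random.Random(seed)
    counts = {"local": 0, "remote": 0}
    for _ in range(n_uploads):
        counts[choose_region(rng, remote_fraction)] += 1
    return counts
```

With `remote_fraction=0.0` every upload stays local, matching the 100% local case; raising it to `0.3` sheds about 30% of the load to another region.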