Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol
Sadjad Fouladi, John Emmons, Emre Orbay, Catherine Wu+, Riad S. Wahby, Keith Winstein
Stanford University, +Saratoga High School
https://snr.stanford.edu/salsify
Outline
• Introduction
• Salsify's New Architecture
• Measurement Testbed
• Evaluation
• Conclusions
Remote Surgery • Cloud Video Gaming • Teleoperation of Robots and Vehicles
Video Conferencing
Video Conferencing (reality)
WebRTC (Chrome 65)
Current systems do not react fast enough to network variations and end up congesting the network, causing stalls and glitches.
Enter Salsify
• Salsify is a new architecture for real-time Internet video.
• Salsify tightly integrates a video-aware transport protocol with a functional video codec, allowing it to respond quickly to changing network conditions.
• Salsify achieves 4.6× lower p95 delay and 2.1 dB higher SSIM (visual quality) on average when compared with FaceTime, Hangouts, Skype, and WebRTC.
Today's systems combine two (loosely-coupled) components
[Figure: video codec; transport protocol]
Two distinct modules, two separate control loops
[Figure: the transport protocol (300 packets/s) passes a target bit rate to the video codec (24 frames/s), which passes compressed frames back]
Shortcomings of the conventional design
• The codec can only achieve the bit rate on average.
• Individual frames can still congest the network.
• The resulting system is slow to react to network variations.
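The first two shortcomings can be illustrated with a small numeric sketch. The frame sizes, the target bit rate, and the idea of a fixed per-frame byte budget are all assumptions made up for illustration; they are not measurements from the talk.

```python
# Hypothetical numbers for illustration only; not data from the talk.
FRAME_RATE = 24                      # frames per second
TARGET_BITRATE = 1_200_000           # bits per second requested by the transport

# One second of video: 23 small frames and one large frame (e.g. a scene change).
frame_sizes = [4000] * 23 + [58000]  # bytes

# Averaged over the whole second, the codec hits the target exactly.
average_bitrate = sum(frame_sizes) * 8 * FRAME_RATE / len(frame_sizes)

# Bytes the link can absorb during one frame interval at the target bit rate.
per_frame_budget = TARGET_BITRATE / 8 / FRAME_RATE

# Frames that exceed the budget on their own, despite the correct average.
oversized = [i for i, size in enumerate(frame_sizes) if size > per_frame_budget]

print(average_bitrate, per_frame_budget, oversized)
```

In this made-up trace the average matches the target, yet the last frame is several times larger than what the link can carry in one frame interval, so it would sit in a queue.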
Salsify explores a more tightly-integrated design
[Figure: transport protocol & video codec]
Brand-new architecture based on components we know and love!
• The individual components of Salsify are not exactly new:
  • The transport protocol is inspired by “packet pair” and “Sprout-EWMA”.
  • The video format, VP8, was finalized in 2008.
  • The functional video codec was described at NSDI’17.
• Salsify is a new architecture for real-time video that integrates these components in a way that responds quickly to network variations.
Salsify’s architecture: Video-aware transport protocol
[Figure: transport protocol & video codec]
Video-aware transport protocol
“What should be the size of the next frame?”* (*without causing excessive delay)
• There’s no notion of bit rate, only the next frame size!
• Transport uses packet inter-arrival time, reported by the receiver.
The sender does not transmit continuously
• Pauses between frames give the receiver a “pessimistic” view of the network.
• The receiver treats each frame of the video as a separate packet train.
[Figure: sender and receiver timelines for frame i and frame i+1, with packet arrival times t₁ through t₅ and a grace period]
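As a rough sketch of how a transport of this kind could turn reported inter-arrival times into a budget for the next frame: the receiver keeps a moving average of packet gaps, discounting the sender's pause (the grace period) on the first packet of each frame, and the sender converts that average into a number of bytes. The constants `MTU_BYTES`, `TARGET_DELAY_S`, and `ALPHA`, the class and function names, and the budget formula itself are assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Optional

MTU_BYTES = 1400          # assumed payload bytes per packet
TARGET_DELAY_S = 0.100    # assumed limit on queueing delay
ALPHA = 0.1               # assumed gain of the moving average


@dataclass
class InterArrivalEstimator:
    """Receiver-side exponentially weighted moving average of packet gaps."""
    mean_s: Optional[float] = None

    def on_packet(self, gap_s: float, grace_s: float = 0.0) -> None:
        # For the first packet of a frame, the sender was idle for grace_s;
        # subtracting it keeps the pause from looking like a slow network.
        sample = max(gap_s - grace_s, 0.0)
        if self.mean_s is None:
            self.mean_s = sample
        else:
            self.mean_s = ALPHA * sample + (1 - ALPHA) * self.mean_s


def target_frame_size(mean_gap_s: float, packets_in_flight: int) -> int:
    """Bytes that fit within TARGET_DELAY_S, given packets already queued."""
    budget_packets = TARGET_DELAY_S / mean_gap_s - packets_in_flight
    return max(int(budget_packets), 0) * MTU_BYTES


estimator = InterArrivalEstimator()
estimator.on_packet(0.002)                  # packet inside a frame
estimator.on_packet(0.030, grace_s=0.028)   # first packet of the next frame
print(target_frame_size(estimator.mean_s, packets_in_flight=10))
```

Note that the output of `target_frame_size` is a size in bytes for one frame, not a bit rate, which matches the "only the next frame size" framing above.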
Salsify’s architecture: Functional video codec
[Figure: transport protocol & video codec]
Transport tells us how big the next frame should be, but...
It’s challenging for any codec to choose the appropriate quality settings upfront to meet a target size; codecs tend to over- or undershoot the target.
How to get an accurate frame out of an inaccurate codec
• Trial and error: Encode with different quality settings, pick the one that fits.
• Not possible with existing codecs.
After encoding a frame, the encoder goes through a state transition that is impossible to undo
[Figure: a sequence of frames]
There’s no way to undo an encoded frame in current codecs
encode(🏟, 🏟, ...) → frames...
The state is internal to the encoder; there is no way to save or restore it.
Functional video codec to the rescue
encode(state, 🏟) → state′, frame
Salsify’s functional video codec exposes the state, which can be saved and restored.
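The signature above can be sketched as a pure function over an immutable state. This is a toy encoder, not VP8 and not Salsify's codec: `CodecState`, the `quality` parameter, and the quantization scheme are hypothetical and exist only to show that the caller holds the state and can encode from the same state more than once.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CodecState:
    """Hypothetical stand-in for codec state (here, one reference frame)."""
    reference: Tuple[int, ...]


def encode(state: CodecState, image: Tuple[int, ...], quality: int) -> Tuple[CodecState, bytes]:
    """Pure function: returns a new state and a frame; `state` is never mutated."""
    # Toy compression: quantize the difference from the reference frame.
    step = max(1, 64 // quality)
    residual = [(p - r) // step for p, r in zip(image, state.reference)]
    frame = bytes(v & 0xFF for v in residual)
    # The new state is what a decoder would reconstruct from this frame.
    reconstructed = tuple(r + v * step for r, v in zip(state.reference, residual))
    return CodecState(reconstructed), frame


start = CodecState((0, 0, 0, 0))
image = (10, 20, 30, 40)

# Two encodings from the same starting state; neither one disturbs the other.
state_high, frame_high = encode(start, image, quality=64)
state_low, frame_low = encode(start, image, quality=8)
print(state_high.reference, state_low.reference)
```

Because `start` is untouched, either result can be kept as the basis for the next frame and the other simply dropped.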
Order two, pick the one that fits!
• Salsify’s functional video codec can explore different execution paths without committing to them.
• For each frame, the codec presents the transport with three options:
  • A slightly-higher-quality version,
  • A slightly-lower-quality version,
  • Discarding the frame.
[Figure: a better version (50 KB) and a worse version (10 KB) of the same frame]
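The three-way choice can be sketched as a small selection function. The rule used here (send the largest version that fits the target, otherwise discard) is a simplifying assumption, and `pick_version` is a hypothetical name; the slides do not spell out the exact decision rule.

```python
from typing import Optional, Sequence


def pick_version(sizes_bytes: Sequence[int], target_bytes: int) -> Optional[int]:
    """Return the index of the largest version that fits, or None to discard."""
    fitting = [i for i, size in enumerate(sizes_bytes) if size <= target_bytes]
    if not fitting:
        return None
    return max(fitting, key=lambda i: sizes_bytes[i])


# Versions are listed as [better, worse]; indices are 0-based.
print(pick_version([50_000, 25_000], 30_000))  # only the worse version fits
print(pick_version([50_000, 25_000], 55_000))  # the better version fits
print(pick_version([50_000, 25_000], 5_000))   # nothing fits: discard
```

Whichever version is chosen, the codec continues from the state produced by that version, which is what the functional interface makes possible.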
Salsify’s architecture: Unified control loop
[Figure: transport protocol & video codec]
Codec → Transport
“Here’s two versions of the current frame.”
[Figure: better version 50 KB, worse version 25 KB; target frame size 30 KB]
Transport → Codec
“I picked option 2. Base the next frame on its exiting state.”
[Figure: chosen version 25 KB; target frame size 30 KB]
Codec → Transport
“Here’s two versions of the latest frame.”
[Figure: better version 50 KB, worse version 25 KB; target frame size 55 KB]
Transport → Codec
“I picked option 1. Base the next frame on its exiting state.”
[Figure: chosen version 50 KB; target frame size 55 KB]
Codec → Transport
“Here’s two versions of the latest frame.”
[Figure: a better version (70 KB) and a worse version; target frame size 5 KB]
Transport → Codec
“I cannot send any frames right now. Sorry, but discard them.”
[Figure: no version sent; target frame size 5 KB]
Codec → Transport
“Fine. Here’s two versions of the latest frame.”
[Figure: better version 45 KB, worse version 20 KB; target frame size 50 KB]
Transport → Codec
“I picked option 1. Base the next frame on its exiting state.”
[Figure: chosen version 45 KB; target frame size 50 KB]
There’s no notion of frame rate or bit rate in the system. Frames are sent when the network can accommodate them.
Goals for the measurement testbed
• A system with reproducible input video and reproducible network traces that runs an unmodified version of the system-under-test.
• Target QoE metrics: per-frame quality and delay.
[Figure: testbed diagram with labels: barcoded video, video in/out (HDMI), HDMI to USB camera, emulated network, receiver, HDMI output]
Sent image timestamp: T+0.000s
Received image timestamp: T+0.765s
Quality: 9.76 dB SSIM
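The two per-frame metrics in this example can be reproduced with a few lines. The slides do not state how SSIM is converted to decibels; the sketch below assumes the commonly used conversion −10·log10(1 − SSIM), and the function names are hypothetical.

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Convert an SSIM index in [0, 1) to decibels: -10 * log10(1 - SSIM)."""
    return -10.0 * math.log10(1.0 - ssim)


def db_to_ssim(db: float) -> float:
    """Inverse of ssim_to_db."""
    return 1.0 - 10.0 ** (-db / 10.0)


# Timestamps from the example frame, in seconds relative to T.
sent_s, received_s = 0.000, 0.765
frame_delay_s = received_s - sent_s

# Under the assumed conversion, 9.76 dB is an SSIM index of roughly 0.894.
print(frame_delay_s, round(db_to_ssim(9.76), 3))
```

Under this assumption, each additional 10 dB corresponds to a tenfold reduction in 1 − SSIM.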
Evaluation results: Verizon LTE Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right so that up and to the right is better. Systems plotted: Skype, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts.]
Evaluation results: Verizon LTE Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right. Systems plotted: Skype, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts, and Status Quo (conventional transport and codec).]
Evaluation results: Verizon LTE Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right. Systems plotted: Skype, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts, Status Quo (conventional transport and codec), and Salsify (conventional codec).]
Evaluation results: Verizon LTE Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right. Systems plotted: Skype, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts, Status Quo (conventional transport and codec), Salsify (conventional codec), and Salsify.]
Evaluation results: AT&T LTE Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right so that up and to the right is better. Systems plotted: Salsify, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts, Skype.]
Evaluation results: T-Mobile UMTS Trace
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right so that up and to the right is better. Systems plotted: Salsify, WebRTC (VP9-SVC), Skype, WebRTC, FaceTime, Hangouts.]
Evaluation results: Emulated Wi-Fi (no variations, only loss)
[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms), with the delay axis decreasing to the right so that up and to the right is better. Systems plotted: WebRTC (VP9-SVC), WebRTC, Salsify, FaceTime, Hangouts, Skype.]
Check out the demo videos at https://snr.stanford.edu/salsify
[Videos: Salsify; WebRTC (Google Chrome 65.0 dev)]
Codecs have been treated as black boxes in video systems for a long time.
New systems have emerged from this functional interface
• NSDI’17: ExCamera
  • Using the functional codec to do massively-parallel video compression on AWS Lambda.
• NSDI’18: Salsify
  • Using the functional codec to compress frames to the right size, at the right time.
• Same interface, two different applications.
We encourage codec designers and implementers to include save/restore of state in their codecs, even if the state is large or opaque.
Improvements to video codecs may have reached the point of diminishing returns, but changes to the architecture of video systems can still yield significant benefits.