Live Video Analytics at Scale with Approximation and Delay-Tolerance
Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, Michael J. Freedman
Video cameras are pervasive
Video analytics queries: AMBER Alert, Intelligent Traffic System, Electronic Toll Collection, Video Doorbell
Video query: a pipeline of transforms
• Vision algorithms chained together
• Example: traffic counter pipeline: decode → detect object → track object → count object
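To make the pipeline structure concrete, here is a minimal Python sketch of a query as a chain of transforms; the function names and stub transforms are illustrative, not VideoStorm's actual API.

```python
from typing import Any, Callable, Iterable, List

# A transform takes the output of the previous stage and produces its own output.
Transform = Callable[[Any], Any]

def run_pipeline(frames: Iterable[Any], transforms: List[Transform]) -> List[Any]:
    """Push each frame through the chained transforms in order."""
    results = []
    for frame in frames:
        data = frame
        for transform in transforms:
            data = transform(data)
        results.append(data)
    return results

# Stub transforms standing in for real vision algorithms (decode, detect, track, count).
decode = lambda raw: {"pixels": raw}
detect_objects = lambda f: {**f, "objects": ["car", "car"]}
track_objects = lambda f: {**f, "tracks": f["objects"]}
count_objects = lambda f: len(f["tracks"])

print(run_pipeline(range(3), [decode, detect_objects, track_objects, count_objects]))
# [2, 2, 2]
```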
Video queries are expensive in resource usage
• Best car tracker [1] — 1 fps on an 8-core CPU
• DNN for object classification [2] — 30 GFlops
• Example pipeline: decode → background subtract → track object → count object
• When processing thousands of video streams in multi-tenant clusters:
  • How to reduce the processing cost of a query?
  • How to manage resources efficiently across queries?
[1] VOT Challenge 2015 Results. [2] Simonyan et al., arXiv:1409.1556, 2014.
Vision algorithms are intrinsically approximate
• Knobs: parameters / implementation choices for transforms (e.g., frame rate, resolution, window size, mapping metric)
• License plate reader → window size
• Car tracker → mapping metric
• Object classifier → DNN model
• Query configuration: a combination of knob values
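As an illustration of how quickly the knob combinations multiply, the sketch below enumerates a small configuration space; the knob names and value ranges are assumptions for the example, not the exact knobs of any one query.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Configuration:
    frame_rate: int       # sampled frames per second
    resolution: str       # e.g., "480p", "720p", "1080p"
    window_size: int      # sliding-window size (e.g., for the license plate reader)
    mapping_metric: str   # e.g., "DIST", "HIST", "SURF", "SIFT"

# The configuration space is the cross product of all knob values.
space = [
    Configuration(fr, res, w, m)
    for fr, res, w, m in product([1, 2, 5, 10, 30],
                                 ["480p", "720p", "1080p"],
                                 [2, 4, 8],
                                 ["DIST", "HIST", "SURF", "SIFT"])
]
print(len(space))  # 180 configurations from just four knobs
```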
Knobs impact quality and resource usage
• Example configurations of the same query: (720p, frame rate 3) → Quality = 0.93, CPU = 0.54; (480p, frame rate 1) → Quality = 0.57, CPU = 0.09
Knobs impact quality and resource usage
[Figure: quality and CPU demand as each knob is varied — frame resolution (480p–1080p), object mapping metric (DIST, HIST, SURF, SIFT), frame rate, and window size step]
Knobs impact quality and resource usage
[Figure: license plate reader — quality of result vs. resource demand (CPU cores, log scale); configurations span roughly 0.01 to 1000 cores]
• Orders of magnitude cheaper resource demand for little quality drop
• No analytical models to predict the resource-quality tradeoff
• Different from approximate SQL queries
Diverse quality and lag requirements
• Lag: time difference between frame arrival and frame processing
• Toll Collection: high quality, lag of hours tolerable
• AMBER Alert: moderate quality, lag of a few seconds
• Intelligent Traffic: high quality, lag of a few seconds
Goal: decide configuration and resource allocation to maximize quality and minimize lag within the resource capacity
• Configuration → quality; resource allocation → lag
Video analytics framework: challenges
1. Many knobs → large configuration space
   • No known analytical models to predict quality and resource impact
2. Diverse requirements on quality and lag
   • Hard to configure and allocate resources jointly across queries
VideoStorm: solution overview
• Offline Profiler: builds the resource-quality model for each query and reduces the configuration space
• Online Scheduler: uses each query's resource-quality profile and utility function to trade off quality and lag across queries, running them on the workers
Offline: query profiling
• Profile: configuration ⟹ (resource, quality)
• Ground truth: labeled dataset, or results from a golden configuration
• Explore the configuration space; compute average resource usage and quality per configuration
[Figure: quality of result vs. resource demand (CPU cores, log scale); a configuration is strictly better than another if it is higher in quality and more resource-efficient]
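A minimal sketch of this profiling step, assuming a `measure` helper that runs a configuration on a labeled clip and reports average quality and CPU usage; `measure` is stubbed out here, since the real measurement depends on the query and the ground truth.

```python
import random

def measure(cfg, labeled_clip):
    """Stub for running a configuration on a labeled clip and comparing
    against ground truth; returns (average quality, average CPU cores)."""
    rng = random.Random(hash((cfg, labeled_clip)))
    return rng.uniform(0.3, 1.0), rng.uniform(0.05, 10.0)

def build_profile(configurations, labeled_clip="bellevue_clip.mp4"):
    """Profile: for each configuration, its average quality and resource usage."""
    return [(cfg, *measure(cfg, labeled_clip)) for cfg in configurations]

profile = build_profile(["cfg_a", "cfg_b", "cfg_c"])
```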
Offline: Pareto boundary of the configuration space
• Pareto boundary: configurations that are optimal in resource efficiency and quality
• Cannot further increase one without reducing the other
• Orders of magnitude reduction in the configuration search space for scheduling
[Figure: Pareto-optimal configurations highlighted on the quality vs. resource-demand plot]
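A minimal sketch of extracting the Pareto boundary from such a profile: keep a configuration only if no other one is at least as good on both axes and strictly better on one. The sample entries below are made up for illustration.

```python
def pareto_boundary(profile):
    """profile: list of (config, quality, resource_demand) tuples."""
    boundary = []
    for cfg, q, r in profile:
        dominated = any(
            (q2 >= q and r2 <= r) and (q2 > q or r2 < r)
            for _, q2, r2 in profile
        )
        if not dominated:
            boundary.append((cfg, q, r))
    # Sort by resource demand so the scheduler can walk up the curve.
    return sorted(boundary, key=lambda entry: entry[2])

sample_profile = [("A", 0.57, 0.09), ("B", 0.93, 0.54), ("C", 0.60, 0.80)]
print(pareto_boundary(sample_profile))  # "C" is dominated by "B" and drops out
```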
Online: utility function and scheduling
• Utility function: encodes the goals and sensitivities for quality and lag
  • Users set required quality and tolerable lag
  • Reward additional quality, penalize higher lag
• Schedule for two natural goals:
  • Maximize the minimum utility — (max-min) fairness
  • Maximize the total utility — overall performance
• Allow lag to accumulate during resource shortage, then catch up
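One plausible shape for such a per-query utility is sketched below: reward quality above the user's required quality and penalize lag beyond the tolerable lag. The linear terms and weights are assumptions for illustration, not the paper's exact formula.

```python
def utility(quality, lag_seconds, required_quality, tolerable_lag,
            quality_weight=1.0, lag_penalty_per_second=0.1):
    """Higher quality raises utility; lag beyond the tolerable bound lowers it."""
    quality_term = quality_weight * (quality - required_quality)
    lag_term = -lag_penalty_per_second * max(0.0, lag_seconds - tolerable_lag)
    return quality_term + lag_term

# AMBER Alert style query: moderate quality requirement, only a few seconds of lag.
print(utility(quality=0.8, lag_seconds=25, required_quality=0.7, tolerable_lag=20))
# 0.1 of quality reward minus 0.5 of lag penalty -> -0.4
```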
Online: scheduling approximate video queries
• Example: two queries (blue and orange); total cluster CPU drops from 4 cores to 2 and back to 4
• Fair scheduler: splits CPU evenly; each query runs the best configuration it can afford without lag (total quality ∑ = 1.0)
• Quality-aware scheduler (tolerates 8 s of lag): lets lag build up during the shortage and catches up afterward, keeping higher-quality configurations (total quality ∑ = 1.5)
[Figure: per-query CPU allocation, quality, and lag over time under the fair and quality-aware schedulers]
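The sketch below shows one way a max-min scheduler over Pareto-optimal configurations might work, repeatedly granting small CPU increments to the query whose achievable utility is currently lowest; it is a simplified stand-in for intuition, not the exact VideoStorm algorithm, and the curves are made up.

```python
def max_min_allocate(queries, capacity, step=0.1):
    """queries: {name: curve}, where curve is a list of (cpu_demand, utility)
    pairs sorted by cpu_demand (the query's Pareto boundary)."""
    alloc = {name: 0.0 for name in queries}

    def best_utility(name):
        # Highest utility reachable within the query's current allocation.
        feasible = [u for cpu, u in queries[name] if cpu <= alloc[name]]
        return max(feasible, default=float("-inf"))

    used = 0.0
    while used + step <= capacity + 1e-9:
        worst = min(queries, key=best_utility)   # query with lowest utility so far
        alloc[worst] = round(alloc[worst] + step, 6)
        used = round(used + step, 6)
    return alloc

curves = {
    "blue":   [(0.5, 0.4), (1.0, 0.7), (2.0, 0.9)],
    "orange": [(0.5, 0.5), (1.5, 0.8), (3.0, 1.0)],
}
print(max_min_allocate(curves, capacity=4.0))
```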
Additional enhancements
• Handle incorrect resource profiles
  • Profiled resource demand might not match the queries' actual usage
  • Robust to errors in query profiles
• Query placement and migration
  • Better utilization, load balancing, and lag spreading
• Hierarchical scheduling
  • Cluster-level and machine-level scheduling
  • Better efficiency and scalability
VideoStorm evaluation setup
• Platform: Microsoft Azure cluster with 100 worker machines; a VideoStorm manager runs the Profiler and Scheduler
• Each worker has 4 cores of a 2.4 GHz Intel Xeon processor and 14 GB RAM
• Four types of vision queries: license plate reader, car counter, DNN classifier, object tracker
Experiment video datasets
• Operational traffic cameras in Bellevue and Seattle
• 14 – 30 frames per second, 240p – 1080p resolution
Resource allocation during a burst of queries
• Start with 300 queries: ~60% with lag goal 300 s (low quality), ~40% with lag goal 20 s (low quality)
• Burst of 150 seconds (time 50 – 200): 200 license plate reader queries arrive (AMBER Alert) with high quality and a 20 s lag goal
• VideoStorm scheduler: lets the burst queries dominate the resource allocation, significantly delays the lag-tolerant (300 s) queries, and runs the 20 s-lag queries with lower quality
• All queries meet their quality and lag goals
[Figure: share of cluster CPUs, quality, and lag over time for the three query classes]
Resource allocation during a burst of queries (vs. a fair scheduler)
• Compared to a fair scheduler, with varying burst duration:
  • Quality improvement: up to 80%
  • Lag reduction: up to 7×
VideoStorm scalability
• Frequently reschedules and reconfigures in reaction to changes in queries
• Even with thousands of queries, VideoStorm makes rescheduling decisions in just a few seconds
[Figure: scheduling time (seconds) vs. number of queries (500 – 8,000), for clusters of 100 – 1,000 machines]
VideoStorm: accounting for errors in query profiles
• Errors in the profiled resource demands can over- or under-allocate resources → missed quality and lag goals
• Example: 3 copies of the same query, which should get the same allocation; their profiled resource demands are synthetically doubled, halved, and left unchanged
• VideoStorm tracks a mis-estimation factor — the multiplicative error between the profiled demand and the actual usage — and adapts allocations accordingly
[Figure: CPU allocation over time for the three copies, with and without adaptation]
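A hedged sketch of this adaptation: maintain a per-query multiplicative mis-estimation factor (actual usage over profiled demand), smoothed over time, and scale future allocations by it. The smoothing scheme and class interface here are illustrative, not the exact VideoStorm mechanism.

```python
class MisestimationTracker:
    def __init__(self, smoothing=0.2):
        self.smoothing = smoothing
        self.factor = 1.0  # start by trusting the profile

    def observe(self, profiled_demand, actual_usage):
        """Update the multiplicative error with an exponentially weighted average."""
        sample = actual_usage / profiled_demand
        self.factor = (1 - self.smoothing) * self.factor + self.smoothing * sample

    def corrected_demand(self, profiled_demand):
        """Scale the profiled demand by the learned mis-estimation factor."""
        return profiled_demand * self.factor

tracker = MisestimationTracker()
tracker.observe(profiled_demand=2.0, actual_usage=4.0)  # profile underestimates by 2x
print(tracker.corrected_demand(2.0))                     # allocation is scaled upward
```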