Spatula: Efficient cross-camera video analytics on large camera networks

Samvit Jain (UC Berkeley), Xun Zhang (Univ of Chicago), Yuhao Zhou (Univ of Chicago), Ganesh Ananthanarayanan (Microsoft Research), Junchen Jiang (Univ of Chicago), Yuanchao Shu (Microsoft Research), Victor Bahl (Microsoft Research), Joseph Gonzalez (UC Berkeley)
Computer vision is improving

Advances in computer vision:
- Image: classification, object detection
- Video: action recognition, object tracking

Rise of large video analytics operations:
- London: 12,000 cameras on rapid transit system
- Chicago: 30,000 cameras across city
- Paris: 1,500 cameras in public hospitals
CV is a powerful tool, BUT it is challenging to scale it to proliferating large camera deployments.

Current computer vision tasks carry a huge cost on large camera deployments. For Chicago Public Schools, with 7,000 security cameras installed as a counter to crime:
- $28 million in GPU hardware (at $4,000 / GPU)
- $1 million/month in GPU cloud time (at $0.9 / GPU hour)
Problem statement

- Given: an instance of query identity Q
- Return: all later frames in which Q appears

Application space: many applications rely crucially on cross-camera video analytics.
- Real-time search: track a threat (e.g., AMBER alert)
- Post-facto search: investigate a crime (e.g., terrorist attack)
- Trajectory analysis: learn customer behavior
When it comes to large camera deployments, the challenges are high compute cost and low inference accuracy. How do we proceed?
Prior work falls short of addressing this challenge.

Methods in recent systems to reduce cost:
- Frame sampling
- Cascade filters for discarding frames

However, these are just cost/accuracy tradeoffs: each video stream is optimized independently of the other streams, so compute/network cost grows with the number of cameras and with the duration of the identity's presence in the camera network.
Spatial correlations between cameras

Cam1 → Cam2 = 0.89 means 89% of all traffic leaving Camera 1 first appears at Camera 2.

Geographical proximity is not a good filter, e.g., Cam 5. Learning these patterns in a data-driven fashion is a more robust approach!

[Figure: camera graph with edge weights giving the fraction of traffic flowing between each camera pair]
Temporal correlations between cameras

The velocity of a tracked object lies within a certain range, so travel times between cameras cluster around a mean value. For objects which leave camera 1 and next appear at camera 2, the travel times are likely clustered around the mean.

In the DukeMTMC dataset, the average travel time between all camera pairs is 44.2s, and the standard deviation is only 10.3s (only 23% of the mean).
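The clustering claim above can be checked directly on a trace of observed travel times. This is a minimal sketch with hypothetical travel times (not the DukeMTMC data); the point is that a small stdev-to-mean ratio justifies searching only a narrow temporal window.

```python
import statistics

# Hypothetical travel times (seconds) for objects that leave camera 1
# and next appear at camera 2. Real values would come from historical traces.
travel_times = [38.0, 41.5, 44.0, 44.2, 45.1, 47.3, 49.8, 52.0]

mean = statistics.mean(travel_times)
stdev = statistics.pstdev(travel_times)

# A small stdev/mean ratio means arrivals cluster tightly around the mean,
# so a window like [mean - 2*stdev, mean + 2*stdev] prunes most frames.
print(f"mean={mean:.1f}s stdev={stdev:.1f}s ratio={stdev / mean:.0%}")
```

On this toy trace the ratio is under 15%, mirroring the ~23% ratio reported for DukeMTMC.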
Spatula overview

Challenges: high compute cost and low inference accuracy.
Method: use physical correlations, via a spatio-temporal model, to prune the search space.

Applications:
- Cross-camera identity tracking (§5.2, §5.3)
- Multi-camera identity detection (§5.4)
- Real-time inference (§5.5)
- Replay analysis

Shared functions:
- Spatio-temporal model (§5.1)
- Model profiling (§6)

[Figure: Spatula architecture, layering the applications over the shared functions over the cameras & underlying compute resources]
Spatio-temporal model

Definition of spatial correlation:

P(c_i, c_j) = n(c_i, c_j) / Σ_k n(c_i, c_k)

where n(c_i, c_j) is the number of individuals leaving the source camera c_i's stream who next appear at the destination camera c_j.

Definition of temporal correlation:

M(c_i, c_j, t_1, t_2) = n(c_i, c_j, t_1, t_2) / n(c_i, c_j)

where n(c_i, c_j, t_1, t_2) is the number of individuals reaching c_j from c_i within the duration window [t_1, t_2].

Spatio-temporal model:

Corr(c_i, c_j, f) = 1, if P(c_i, c_j) ≥ θ_spatial and M(c_i, c_j, f', f) ≤ 1 − θ_temporal
Corr(c_i, c_j, f) = 0, otherwise

where f' is the frame index at which the first historical arrival at c_j from c_i was recorded.
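The three definitions above can be estimated from historical transition records. This is a minimal sketch under assumed inputs: the trace format (one `(src, dst, travel_time)` tuple per individual), the camera names, and the threshold values are all illustrative, not from the paper.

```python
from collections import defaultdict

# Each record: (src_camera, dst_camera, travel_time) for one individual
# who left src and next appeared at dst. Hypothetical trace.
history = [
    ("C1", "C2", 12), ("C1", "C2", 15), ("C1", "C3", 40),
    ("C1", "C2", 14), ("C2", "C3", 30),
]

counts = defaultdict(int)      # n(ci, cj)
arrivals = defaultdict(list)   # travel times observed for (ci, cj)
out_totals = defaultdict(int)  # sum over k of n(ci, ck)
for src, dst, t in history:
    counts[(src, dst)] += 1
    arrivals[(src, dst)].append(t)
    out_totals[src] += 1

def P(ci, cj):
    """Spatial correlation: fraction of traffic leaving ci that next appears at cj."""
    return counts[(ci, cj)] / out_totals[ci] if out_totals[ci] else 0.0

def M(ci, cj, t1, t2):
    """Temporal correlation: fraction of ci -> cj arrivals within window [t1, t2]."""
    times = arrivals[(ci, cj)]
    return sum(t1 <= t <= t2 for t in times) / counts[(ci, cj)] if times else 0.0

def corr(ci, cj, f, theta_spatial=0.2, theta_temporal=0.1):
    """Binary model: search cj at time f only if cj is spatially correlated
    with ci and at least theta_temporal of the arrival mass is still to come."""
    if P(ci, cj) < theta_spatial:
        return 0
    f_first = min(arrivals[(ci, cj)], default=f)  # first historical arrival f'
    return 1 if M(ci, cj, f_first, f) <= 1 - theta_temporal else 0
```

For the toy trace, P("C1", "C2") = 3/4, so C2 passes the spatial filter when the query leaves C1, while C1 is never searched after C2 (no historical C2 → C1 transitions).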
Spatio-temporal model

[Figure (a): Spatio-temporal correlations. Frequency histograms of arrival times from the query camera Cq: M(Cq, C1, 0, 10sec) = 1; M(Cq, C2, 10, 20sec) = 1; M(Cq, C3, 0, f_curr) = 0.]
Spatio-temporal model

[Figure (b): Pruned search based on the spatio-temporal model. From the current camera Cq, only C1 is searched during [t_1, t_2] = [0, 10]sec and only C2 during [t_1, t_2] = [10, 20]sec; C3 is skipped by Spatula throughout.]
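The pruning in panel (b) amounts to a per-frame filter over the camera set: the expensive re-identification model runs only on cameras the model marks as correlated. This sketch hard-codes panel (b)'s toy arrival windows (C1 active in [0, 10]s, C2 in [10, 20]s); in the real system the filter would be the Corr function from the spatio-temporal model.

```python
# Assumed per-camera arrival windows (seconds), encoding panel (b).
WINDOWS = {"C1": (0, 10), "C2": (10, 20)}

def corr_fn(cq, cj, t):
    """Toy stand-in for Corr(cq, cj, t): 1 inside cj's arrival window."""
    lo, hi = WINDOWS.get(cj, (None, None))
    return lo is not None and lo <= t <= hi

def cameras_to_search(cq, all_cameras, t):
    """Run the (expensive) re-identification model only on correlated cameras."""
    return [cj for cj in all_cameras if cj != cq and corr_fn(cq, cj, t)]
```

At t = 5s only C1 is searched, at t = 15s only C2, and C3 is always skipped, matching the pruned search in the figure.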
Evaluation setup

Datasets: AnonCampus, DukeMTMC, Porto, Beijing
Metrics: compute cost, network cost, recall, precision, delay
Baselines:
- Baseline-all: searches for query identity q in all the cameras at every frame step.
- Baseline (GP): searches for query identity q only in the cameras that are in geographical proximity to the query camera, at every frame step.

For the AnonCampus dataset, we deployed 5 cameras at UChicago (JCL).
Evaluation results

Results for different versions of Spatula and the baselines. Each Spatula version is coded as Ss-Tt, where s is the spatial filtering threshold and t is the temporal filtering threshold.
Evaluation results

Cost savings and precision of Spatula with an increasing number of cameras.
Evaluation results

Highlighted results for Spatula on 4 datasets:

Dataset    | Comp. sav. | Netw. sav. | Precision | Recall
AnonCampus | 3.4x       | 3.0x       | 21.3% ↑   | 2.2% ↓
DukeMTMC   | 8.3x       | 5.5x       | 39.3% ↑   | 1.6% ↓
Porto      | 22.7x      | n/a        | 36.2% ↑   | 6.5% ↓
Beijing    | 85.5x      | n/a        | 45.5% ↑   | 7.3% ↓
Conclusion

Problem: cross-camera analytics is data- and compute-intensive.
Our approach: computation can be drastically reduced by exploiting spatio-temporal correlations.
Key results: Spatula reduces compute load by 8.3x on an 8-camera dataset, and by 23x-86x on two datasets with hundreds of cameras.
Spatula: Efficient cross-camera video analytics on large camera networks Thanks!