Talk @Munich, October 11, 2017
Beyond detection: GANs and LSTMs to pay attention at human presence
Rita Cucchiara, Imagelab, Dipartimento di Ingegneria «Enzo Ferrari», University of Modena e Reggio Emilia, Italy
Agenda
Beyond Human Detection:
1) See humans
2) See what humans see
Use of GANs, iterative and recurrent neural architectures in Vision
Beyond (People) Detection
✓ 10 years of pedestrian detection [S. Zhang, R. Benenson, M. Omran, J. Hosang, B. Schiele, CVPR 2016]: about 70% accuracy on Caltech
✓ Many deep networks for pedestrian detection, CNNs + handcrafted features, e.g. FAST CFM [Hu, Wang, Shen, van den Hengel, Porikli, IEEE TCSVT 2017]: 9% miss rate on the Caltech reasonable dataset
✓ Object detectors: SSD [W. Liu et al., "SSD: Single Shot MultiBox Detector", 2017], YOLO, YOLOv2 [Redmon, Farhadi, arXiv 2017]: YOLOv2 reaches 78.6% mAP on VOC2007-12 at 40 fps
Still a margin of improvement..
Standard networks are not enough: challenges in new environments.
Embedded vision solutions with background subtraction and CNNs: real-time detection of people and AGVs in working areas on embedded NVIDIA boards at Imagelab.
If People Detection Is Solved..
GANS FOR UNDERSTANDING HUMAN PRESENCE UNDER EXTREME CONDITIONS (thanks to Matteo Fabbri and Simone Calderara; thanks to PANASONIC)
Attribute Classification
Example attributes: Male, Jacket, Black hair, Backpack, Plastic bag, Long trousers
Now CNNs can classify more than 50 attributes.
Problems with:
• Low resolution
• Occlusions and self-occlusions
Generative Adversarial Networks
"..a generative model G captures the data distribution, ..a discriminative model D estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake" [I. Goodfellow .. Y. Bengio, 2014]
Generator (CNN): maps noise to samples. Discriminator (CNN): tells real from generated.
A conditional generative model p(x | c) can be obtained by adding c as input to both G and D, e.g. c being a low-resolution or incomplete image.
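The minimax game above can be made concrete with its two loss terms. A minimal NumPy sketch (function names and the toy probabilities are illustrative, not from the talk):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))): minimize the negative
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating form: G maximizes log D(G(z)), i.e. the
    # probability that D mistakes generated samples for real data
    return -np.mean(np.log(d_fake))

# d_* are discriminator output probabilities in (0, 1]
d_real = np.array([0.9, 0.8])   # D on training samples
d_fake = np.array([0.2, 0.3])   # D on generated samples
```

The better G fools D (d_fake closer to 1), the lower the generator loss; a perfect D on both batches drives the discriminator loss to 0.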
With a GAN from Noise..
RAP: A Richly Annotated Dataset for Pedestrian Attribute Recognition [http://rap.idealtest.org/]
Dataset dimension: 41,585 pedestrian samples (33,268 for training, 8,317 for testing)
Image resolution: from 36x92 to 344x554
Fabbri, Calderara, Cucchiara, "Generative Adversarial Models for People Attribute Recognition in Surveillance", IEEE AVSS 2017
Generative Adversarial Network for De-occlusion (or Super-Resolution)
Generator (encoder-decoder): occluded image → de-occluded (fake), compared with the original image via SSE
Discriminator: cross-entropy on fake vs. real
Datasets: occRAP and lowRAP, built from RAP by Imagelab
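The generator is thus trained on two signals at once: an SSE reconstruction term against the original RAP image and an adversarial cross-entropy term from the discriminator. A hedged NumPy sketch (the weight `lam` is a hypothetical hyper-parameter, not taken from the paper):

```python
import numpy as np

def sse(reconstructed, original):
    # pixel-wise sum of squared errors vs. the clean image
    return np.sum((reconstructed - original) ** 2)

def adversarial_term(d_fake):
    # cross-entropy pushing G to make D label its fakes as real
    return -np.mean(np.log(d_fake))

def generator_objective(reconstructed, original, d_fake, lam=0.01):
    # lam (hypothetical) balances reconstruction vs. fooling D
    return sse(reconstructed, original) + lam * adversarial_term(d_fake)
```

A perfect reconstruction that also fools the discriminator scores 0; either blurry output or an easily detected fake raises the objective.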
Selected Architecture
Generator (encoder-decoder):
• Encoder: input (3x160x64); SConv1..SConv4, 5x5 kernels, stride 2, with Batch-Norm and Leaky-ReLU: (256x80x32) → (256x40x16) → (512x20x8) → (1024x10x4)
• Decoder: TConv1..TConv4, 5x5 kernels, 2x upsampling, with Batch-Norm and ReLU, back up to the output (3x160x64)
Discriminator: input (3x160x64); SConv1..SConv4, 5x5 kernels, stride 2, Batch-Norm, Leaky-ReLU: (128x80x32) → (256x40x16) → (512x20x8) → (1024x10x4) → (1x1x1) classification (fake or real)
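As a sanity check on the tensor sizes above, the chain of stride-2 convolutions and x2 upsamplings can be traced with simple shape arithmetic (the decoder channel plan below is assumed to mirror the discriminator's and may differ in detail from the actual network):

```python
def down(c_out, shape):
    # 5x5 conv, stride 2: set channels to c_out, halve H and W
    _, h, w = shape
    return (c_out, h // 2, w // 2)

def up(c_out, shape):
    # 5x5 transposed conv with x2 upsampling: double H and W
    _, h, w = shape
    return (c_out, h * 2, w * 2)

shape = (3, 160, 64)                 # input crop (C, H, W)
for c in (128, 256, 512, 1024):      # encoder SConv1..SConv4
    shape = down(c, shape)
bottleneck = shape                   # (1024, 10, 4), as on the slide
for c in (512, 256, 128, 3):         # decoder TConv1..TConv4 (assumed plan)
    shape = up(c, shape)
output = shape                       # (3, 160, 64): reconstructed image
```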
RESULTS De-occlusion Super Resolution
The Complete Approach: De-occlusion and Super-resolution for Aspect Recognition
• Attribute classification network details — batch size: 8, GPU: 1080ti, training time: 24 hours
• Reconstruction GAN (for de-occlusion) details — batch size: 256, GPU: 1080ti, training time: 48 hours
• Super-resolution GAN (for image resolution) details — batch size: 128, GPU: 1080ti, training time: 72 hours
Attribute classification
✓ More than 75% precision and recall for 50 people attributes on RAP
✓ Acceptable results for occluded shapes and good results for low-resolution shapes
If People Detection Is Still Not Solved.. ..tracking without detection
TRACKING HUMANS IN THE WILD BY JUNCTIONS WITH CPM (thanks to Fabio Lanzi and Simone Calderara)
EU-ER-FESR 2015-2018
State-of-the-art: Recurrent Nets for object tracking
For long-term tracking [D. Zhang, H. Maei, X. Wang, Y.-F. Wang, Samsung/UCSB, arXiv 2017]:
- YOLO network for detection (fine-tuned on PascalVOC)
- NVIDIA GTX1080 GPU, 45 fps (Python, TensorFlow); ~70 fps with precomputed YOLO features
- Recurrence is provided by an LSTM
Very fast. Still very low accuracy..
Recurrence with CPM
✓ CPM (Convolutional Pose Machines)*: a sequence of convolutional nets that repeatedly produce 2D belief maps for the location of interesting parts (human junctions)
✓ A belief map is a non-parametric encoding of the spatial uncertainty of location.
✓ CPM learns implicit relationships between parts
✓ It is not recurrent but a multi-stage network, trained with backpropagation
*[S.-E. Wei, V. Ramakrishna, T. Kanade, Y. Sheikh, «Convolutional Pose Machines», CVPR 2016]
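The non-parametric belief map for one joint is just a 2D Gaussian centred on the (annotated or predicted) joint location; a minimal NumPy sketch (map size and sigma are illustrative choices):

```python
import numpy as np

def belief_map(h, w, joint, sigma=2.0):
    # 2D Gaussian peak encoding the spatial uncertainty of one joint
    ys, xs = np.mgrid[0:h, 0:w]
    jy, jx = joint
    return np.exp(-((ys - jy) ** 2 + (xs - jx) ** 2) / (2.0 * sigma ** 2))

# one such map per joint type ("nose", "neck", ...)
bmap = belief_map(46, 46, joint=(20, 30))
```

The map peaks at 1 exactly on the joint and decays smoothly around it, so later stages can reason about uncertain locations rather than hard coordinates.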
Without detection: Temporal CPM3
Imagelab: tracking multiple body parts with T-CPM (Temporal Convolutional Pose Machines). An iterative network (CPM) for predicting:
• the position of joints (H)
• their mutual association in space (P)
• their association in time (T)
Three Branches: Heatmaps, PAFs and TAFs
• Heatmaps model the part locations as Gaussian peaks in the map; one for each joint ("nose", "neck", "left-shoulder", ..)
• PAFs (Part Affinity Fields) assemble the detected joints: the score of a candidate limb is proportional to its alignment with the PAF associated with that type of limb (a PAF vector connects two nodes).
• TAFs (Temporal Affinity Fields) link the corresponding joints of the same person in consecutive frames, for an unknown number of people (a TAF vector connects the same node across time).
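The PAF limb score can be sketched as the average dot product between the candidate limb's unit direction and the field sampled along the segment, a simplified discretization of the line integral (the `(2, H, W)` field layout in (y, x) order is an assumption for this sketch):

```python
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    # paf: (2, H, W) field of unit vectors (dy, dx) for one limb type.
    # Score a candidate limb p1 -> p2 by the mean alignment between the
    # limb direction and the field sampled along the segment.
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = (p2 - p1) / np.linalg.norm(p2 - p1)
    total = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        y, x = np.rint(p1 + t * (p2 - p1)).astype(int)
        total += paf[:, y, x] @ d
    return total / n_samples

# toy field: all affinity vectors point along +x
field = np.zeros((2, 12, 12))
field[1] = 1.0
aligned = paf_score(field, (5, 0), (5, 11))   # limb along +x: high score
crossed = paf_score(field, (0, 5), (11, 5))   # limb along +y: no support
```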
Visual Example
How to provide initial annotation?
GTA with the ScriptHook library:
• Access to native GTA functions
• Photorealistic, with plausible dynamics
• Lifelike entity AI
• Customizable
• Extracts all the information available to the game engine
T-CPM3 In Action on Tracking People in the Wild
The deep architecture and the software are property of Imagelab UNIMORE. We thank the Jump project, funded within the EU ER-FESR 2015-2020 program.
For Tracking, Action, Behavior Recognition
T-CPMs do not use recurrence but work on sequences of frames, refining with iterations of long convolutional layers.
- Problem: vanishing gradients
- Long Short-Term Memory architectures can be a solution for time iterations, but not for long time sequences
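The vanishing-gradient problem mentioned above can be seen in a few lines: backpropagation through time multiplies the gradient by the recurrent Jacobian once per frame, so with spectral radius below 1 the signal from distant frames decays geometrically (toy linear RNN, activation derivatives omitted):

```python
import numpy as np

W = 0.5 * np.eye(4)           # recurrent weights, spectral radius 0.5
grad = np.ones(4)             # gradient arriving at the last time step

norms = []
for _ in range(30):           # backprop through 30 frames
    grad = W.T @ grad         # one step of backprop through time
    norms.append(float(np.linalg.norm(grad)))

# after 30 steps almost no gradient reaches the earliest frames;
# LSTM gating mitigates this for moderate sequence lengths
```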
If target detection is not required..
SALIENCY DETECTION WITH LSTMS: THE SAM ARCHITECTURE (thanks to Marcella Cornia, Giuseppe Serra and Lorenzo Baraldi)
SALIENCY DETECTION @Imagelab: SAM
Benchmarks: MIT300 (Itti, Torralba et al.), more than 70 competitors since 2014; SALICON (Jiang et al., 2015), 10,000 images.
Saliency Attentive Model (SAM): ML-NET + LSTMs. M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, "A Deep Multi-Level Network for Saliency Prediction", ICPR 2016
Number of images: 20,000 (10,000 training, 5,000 validation, 5,000 test)
GPU: NVIDIA K80 on the GALILEO supercomputer at CINECA
Training time: ~15 hours
Winner of the LSUN Challenge, CVPR 2017
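Saliency benchmarks such as MIT300, SALICON and the LSUN challenge score predictions with fixation-based metrics; one of the standard ones, Normalized Scanpath Saliency (NSS), is easy to sketch (the map and fixation points below are toy data):

```python
import numpy as np

def nss(saliency, fixations):
    # Normalized Scanpath Saliency: z-score the predicted map, then
    # average it at the human fixation points (higher is better)
    s = (saliency - saliency.mean()) / saliency.std()
    return float(np.mean([s[y, x] for y, x in fixations]))

pred = np.zeros((5, 5))
pred[2, 2] = 1.0               # toy prediction with one salient peak
```

A prediction whose peak coincides with the fixations scores high; fixations landing on flat, low-valued regions pull the score below zero.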
Ground truth vs. SAM on the Actions in the Eye (Hollywood2) dataset
Saliency in task-driven video
Bottom-up saliency, detected by ML-NET trained on SALICON, on the DR(EYE)VE dataset http://imagelab.ing.unimore.it/dreyeve
Saliency not driven by a task: as a passenger sees. Saliency trained on driving: as a driver sees.
SIFT-BASED REGISTRATION FRAME BY FRAME
Collected with SMI ETG 2w (frontal camera 720p/30fps + eye pupil cameras at 60fps) and GARMIN Virb X (1080p/25fps + GPS).
Some conclusions (if any)
✓ Computer vision is now a Deep Learning based discipline
✓ Computer vision systems cannot be built without GPUs (both in training and at run-time)
✓ Conv-Nets are fundamental bricks of new architectures
✓ Autoencoders: for image generation
✓ (Conditional) Generative Adversarial Networks: for low-resolution, occluded attribute recognition
✓ Multi-layer convolutional networks for emulating recurrence, as T-CPM3 for tracking
✓ Recurrent nets and Long Short-Term Memories for short-time analysis: saliency and video captioning
✓ … Computer Vision + Deep Architectures + GPUs
Thank you!
rita.cucchiara@unimore.it
http://imagelab.ing.unimore.it
Acknowledgements