EdgeL3: Compressing L3-Net for Mote-Scale Urban Noise Monitoring

Sangeeta Kumari (Ohio State University), Dhrubojyoti Roy (Ohio State University), Mark Cartwright (New York University), Juan Pablo Bello (New York University), Anish Arora (Ohio State University)

PAISE 2019 Workshop, May 24, 2019
Outline
1. Introduction
2. L3-Net
3. Approach
4. Results
5. Mote-scale Implementation
6. Python Package
7. Conclusion
Urban Noise Monitoring

- In 2014, 70 million people across the USA were exposed to noise levels beyond what the EPA considers harmful
- In 2016, NYC's 311 service line received an average of 48 noise complaints per hour
- Limitations of 311 reporting:
  - Inaccurate information on the sources of disruptive noise
  - Difficulty verifying authentic noise complaints

(Image credit: Getty Images)
SONYC

Sounds of New York City (SONYC) aims to continuously monitor, analyze, and mitigate urban noise pollution.

Figure 1: Acoustic sensing unit deployed on a New York City street
Machine Listening Goals

- Low-cost, battery/solar-powered sensing
- Real-time multi-label noise classification
  - Noise sources: traffic, sirens, construction, unnecessary honking, social noise, etc.
- Address the lack of annotated data
- Fit within the limited Flash (2 MB) and RAM (1 MB) of 'mote-scale' edge devices (ARM Cortex-M7)
Look, Listen, and Learn (L3-Net)

- L3-Net learns an audio embedding by training on associations between audio snippets and video frames: the Audio-Visual Correspondence (AVC) task [1]
- The audio embedding is then used to train a downstream task (a classifier, suited to settings with limited labeled data)
- Downstream datasets:
  - US8K: 8732 audio clips divided into 10 cross-validation folds
  - ESC-50: 2000 clips divided into 5 folds
- Downstream accuracy: 75.91% on US8K, 73.65% on ESC-50
- The L3-Net audio subnetwork has 4,688,066 parameters and occupies 18 MB

Figure 2: Architecture of the L3-Net embedding models. The audio and video subnetworks each stack four blocks of two 3×3 convolutions (64, 128, 256, and 512 filters, each with batch normalization and ReLU), with 2×2 max pooling between blocks; a final max pool ((32,24) audio, (28,28) video) collapses each branch before the fusion layers (concatenate, dense 128 + ReLU, dense 2 + softmax) predict correspondence. Inputs: a 1 s mel-spectrogram of size (256, 199, 1) and a single video frame of size (224, 224, 3).

[1] Arandjelovic, Relja and Zisserman, Andrew. "Look, Listen and Learn". IEEE ICCV. 2017.
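For concreteness, here is a minimal Keras sketch of the audio subnetwork in Figure 2, reconstructed from the slide rather than taken from the authors' released code; the use of tensorflow.keras and the exact pooling placement are assumptions.

```python
from tensorflow.keras import layers, models

def audio_subnetwork():
    """Illustrative sketch of the L3-Net audio branch (Arandjelovic & Zisserman, 2017)."""
    inp = layers.Input(shape=(256, 199, 1))      # 1 s mel-spectrogram
    x = layers.BatchNormalization()(inp)
    for filters in (64, 128, 256, 512):
        for _ in range(2):                       # two conv layers per block
            x = layers.Conv2D(filters, (3, 3), padding="same")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
        if filters < 512:                        # 2x2 pool between blocks only
            x = layers.MaxPooling2D((2, 2))(x)
    x = layers.MaxPooling2D((32, 24))(x)         # collapse 32x24x512 to 1x1x512
    return models.Model(inp, layers.Flatten()(x))

model = audio_subnetwork()
model.summary()   # ~4.69 M trainable parameters
```

In float32, the 4,688,066 parameters take about 4 bytes each, i.e. roughly 18 MB, far beyond the 2 MB flash budget of the target mote-scale hardware; this gap motivates the compression approach that follows.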
Non-sparse Audio Model

- Depth reduction: conv8 alone has 2,359,808 parameters, roughly 50% of the total [2]
- The embedding can instead be generated from the penultimate convolutional layer or an earlier one

[2] Li, Hao et al. "Pruning Filters for Efficient ConvNets." ICLR. 2017.
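These counts follow directly from the architecture in Figure 2; the short sketch below recomputes them (the helper name conv_params is our own, for illustration):

```python
def conv_params(k, c_in, c_out):
    # k x k kernel weights per in/out channel pair, plus one bias per filter
    return k * k * c_in * c_out + c_out

# Channel progression of the audio subnetwork: input -> conv1 .. conv8
channels = [1, 64, 64, 128, 128, 256, 256, 512, 512]
convs = [conv_params(3, c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]
bn = 2 * sum(channels)  # trainable gamma/beta for the input BN and each conv's BN

total = sum(convs) + bn
print(f"conv8:       {convs[-1]:,}")             # 2,359,808
print(f"total:       {total:,}")                 # 4,688,066
print(f"conv8 share: {convs[-1] / total:.0%}")   # 50%
```

Removing conv8 (and its batch normalization) therefore halves the model before any pruning or quantization is applied.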