Over-the-air Audio Identification, Arda Yalçıner, FOSDEM '16, Brussels

Over-the-air Audio Identification. Arda Yalçıner, FOSDEM '16, Brussels, Open Media Devroom

Speaker
● Software Architect @ Oto.net / Istanbul
● B.Sc. Astronautical Eng., M.Sc. Software Eng.
● arda.yalciner@gmail.com, wizardctp, ardayalciner
● Yes, a pizza lover!

OTA Audio Identification
Matching an audio sample against pre-recorded sound clips.
● Music track recognition
● Radio / TV station detection
● Licensing
● Second screen applications
  – "Previously on <insert TV Show here>"
  – Tracking watched movies / TV shows
  – Nearby concerts of the artist currently playing
  – Information on the movie / TV show character currently speaking

Reference Architecture

Digital Sound Signals
● In nature, sound propagates as pressure waves.
● We measure the sound pressure at fixed intervals. The number of measurements per second is called the sample rate; the interval itself (1 / sample rate) is the sampling period.
● A sample rate of 44.1 kHz means we measure the sound pressure 44,100 times per second.
● These discrete samples represent the sound in digital form.
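To make the sampling step concrete, here is a minimal Python sketch that "measures" a pure sine tone at a chosen sample rate. The function name `sample_sine` and the 440 Hz test tone are illustrative assumptions, not part of the talk.

```python
import math

def sample_sine(freq_hz: float, sample_rate: int, duration_s: float,
                amplitude: float = 1.0) -> list[float]:
    """Measure a continuous sine wave sample_rate times per second."""
    n_samples = int(round(sample_rate * duration_s))
    period = 1.0 / sample_rate  # sampling interval between two measurements
    return [amplitude * math.sin(2 * math.pi * freq_hz * k * period)
            for k in range(n_samples)]

samples = sample_sine(440.0, 44100, 1.0)  # one second of a 440 Hz tone
print(len(samples))  # 44100
```

One second at 44.1 kHz yields exactly 44,100 discrete values, which is all a digital recording stores.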

Digital Sound Signals

Digital Sound Signals
● Properties:
  – Bit depth: # of bits a sample occupies
  – Channels: # of simultaneous recordings (1: mono, 2: stereo, etc.)
  – Endianness: big-endian vs. little-endian byte order
● File formats:
  – Uncompressed: PCM, WAVE
  – Compressed:
    ● Lossless: FLAC
    ● Lossy: MP3, AAC, Ogg
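The three properties together determine how raw PCM bytes are read. The sketch below uses Python's standard `struct` module and assumes 16-bit signed samples with channels interleaved frame by frame (the usual layout in WAVE data). The name `decode_pcm16` is hypothetical.

```python
import struct

def decode_pcm16(data: bytes, channels: int, big_endian: bool = False) -> list[tuple[int, ...]]:
    """Split interleaved 16-bit signed PCM bytes into one tuple of samples per frame."""
    bytes_per_sample = 2                      # bit depth 16 -> 2 bytes per sample
    frame_size = bytes_per_sample * channels  # one sample for every channel
    if len(data) % frame_size:
        raise ValueError("data length is not a whole number of frames")
    order = ">" if big_endian else "<"        # endianness of each sample
    fmt = f"{order}{channels}h"               # 'h' = signed 16-bit integer
    return [struct.unpack_from(fmt, data, offset)
            for offset in range(0, len(data), frame_size)]

raw = struct.pack("<4h", 1000, -1000, 32767, -32768)  # little-endian bytes
print(decode_pcm16(raw, channels=2))  # [(1000, -1000), (32767, -32768)]
print(decode_pcm16(raw, channels=1))  # [(1000,), (-1000,), (32767,), (-32768,)]
```

The same bytes decode to different audio if the channel count or byte order is wrong. This is why these properties must travel with the raw data, for example in a WAVE header.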

Frequency Analysis
● Record or play audio signals in the time domain: SPL (sound pressure level) vs. time
● Analyze audio signals in the frequency domain: frequency vs. amplitude vs. time

Frequency Analysis: Spectrum
● Covers frequencies up to 0.5 * sample_rate [Hz]
● Divided into 0.5 * fft_points bins. Each bin holds the amplitude of a band of frequencies (0.5 * sample_rate) / (0.5 * fft_points) = sample_rate / fft_points [Hz] wide
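Below is a minimal, dependency-free sketch of computing a magnitude spectrum. It uses a textbook recursive radix-2 FFT (a stand-in for whatever FFT library the talk used) and the 11025 Hz / 1024-point configuration from the packing slide. Each of the fft_points / 2 bins spans sample_rate / fft_points Hz. The function names are hypothetical.

```python
import cmath
import math

def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(samples: list[float]) -> list[float]:
    """|X[k]| for the first fft_points / 2 bins (0 Hz up to just below SR / 2)."""
    spectrum = fft([complex(s) for s in samples])
    return [abs(c) for c in spectrum[: len(samples) // 2]]

SAMPLE_RATE = 11025
FFT_POINTS = 1024
BIN_WIDTH = SAMPLE_RATE / FFT_POINTS  # about 10.77 Hz per bin

# A tone placed exactly on bin 100 should peak at that bin.
tone_hz = 100 * BIN_WIDTH
frame = [math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE) for n in range(FFT_POINTS)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin, round(peak_bin * BIN_WIDTH, 1))  # 100 1076.7
```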

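Frequency Analysis: Spectrogram
● A spectrogram can be precise in either the time dimension or the frequency dimension, but not both. Longer FFT frames give finer frequency bins but coarser time steps, and vice versa.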
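The trade-off is plain arithmetic. One FFT frame lasts fft_points / sample_rate seconds, and one bin is sample_rate / fft_points Hz wide, so their product is always 1. The sketch assumes the 11025 Hz rate from the packing slide; the helper name `resolution` is hypothetical.

```python
def resolution(fft_points: int, sample_rate: int) -> tuple[float, float]:
    """Return (frame duration in seconds, frequency bin width in Hz) for one FFT frame."""
    frame_s = fft_points / sample_rate  # time covered by one frame (time resolution)
    bin_hz = sample_rate / fft_points   # width of one frequency bin (frequency resolution)
    return frame_s, bin_hz

for points in (256, 1024, 4096):
    frame_s, bin_hz = resolution(points, 11025)
    # frame_s * bin_hz == 1: 4x more points gives 4x finer bins but 4x longer frames.
    print(f"{points:5d} points: {frame_s * 1000:7.2f} ms per frame, {bin_hz:6.2f} Hz per bin")
```

At 11025 Hz, 1024 points gives frames of about 92.88 ms and bins of about 10.77 Hz.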

Fingerprinting
Problem: We need to summarize a part of an audio recording uniquely, despite various challenges.
Approach, using:
● Music information retrieval (MIR)
● Acoustic fingerprinting

Fingerprinting: MIR
"What can we retrieve?"
More specific:
  – Musical features (notes, chords, harmony, rhythm, …)
  – Speech
  – Instruments
  – Melody: query by humming
More abstract:
  – Time-frequency peaks

Fingerprinting: Challenges
● Noise
  – Duration: instantaneous / continuous
  – Frequency range: narrow / wide
  – Loudness: quiet / loud
● Echo
● Changes in tempo
● Changes in pitch
● Attenuation or boost of certain frequencies (e.g., equalization)

Fingerprinting: Time-Frequency Peaks
● Divide the spectrum into N equal areas (e.g., 16 parts)
● For each area, find the frequency bin with the peak amplitude
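The two steps above map directly onto a short function. This sketch takes a magnitude spectrum (for example, 512 bins from a 1024-point FFT). For each of N equal areas, it returns the offset of the strongest bin within that area. The name `area_peaks` is hypothetical; ties resolve to the lowest bin because Python's `max` returns the first maximum.

```python
def area_peaks(magnitudes: list[float], n_areas: int = 16) -> list[int]:
    """Split the spectrum into n_areas equal bands and return, for each band,
    the offset (within the band) of the bin with the largest magnitude."""
    if len(magnitudes) % n_areas:
        raise ValueError("spectrum length must be divisible by n_areas")
    width = len(magnitudes) // n_areas  # e.g., 512 bins / 16 areas = 32
    offsets = []
    for i in range(n_areas):
        area = magnitudes[i * width:(i + 1) * width]
        offsets.append(max(range(width), key=area.__getitem__))
    return offsets
```

Adding i * 32 back to an offset recovers the absolute bin. For example, area 1 with offset 14 is bin 46, about 495 Hz at 11025 Hz / 1024 points, which matches the first row of the packing slide's table.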

Fingerprinting: Packing
● FFT points: P = 1024
● Sample rate: SR = 11025 Hz
● Max. frequency: SR / 2 = 5513 Hz (5512.5, rounded)
● # of areas: N = 16
● # of bins / area: 0.5 * P / N = 32

Storing each area's peak frequency (up to 5513 Hz) takes a 16-bit integer, so 16 of them occupy 256 bits (32 bytes). However, we only need the bin offset within each area. With 32 bins per area, an offset (0–31) fits in 5 bits, so all 16 offsets can be stored in 80 bits (10 bytes).

i (area)          0    1    2     3     4     5  ...    14    15
F (peak, Hz)    269  495  753  1270  1431  2045  ...  4876  5285
b (bin offset)   25   14    6    22     5    30  ...     5    11
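A minimal sketch of the 80-bit packing, using Python integers as a bit buffer. The helper names are hypothetical. In the usage line, areas 6–13 are filled with placeholder zeros because the slide elides them.

```python
def pack_offsets(offsets: list[int], bits: int = 5) -> bytes:
    """Pack per-area bin offsets (each < 2**bits) into a compact byte string."""
    value = 0
    for b in offsets:
        if not 0 <= b < (1 << bits):
            raise ValueError(f"offset {b} does not fit in {bits} bits")
        value = (value << bits) | b  # first offset ends up in the most significant bits
    n_bytes = (len(offsets) * bits + 7) // 8  # 16 * 5 = 80 bits -> 10 bytes
    return value.to_bytes(n_bytes, "big")

def unpack_offsets(data: bytes, count: int, bits: int = 5) -> list[int]:
    """Inverse of pack_offsets."""
    value = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    return [(value >> (bits * (count - 1 - i))) & mask for i in range(count)]

# Areas 0-5, 14 and 15 from the table; areas 6-13 are placeholder zeros, not slide data.
offsets = [25, 14, 6, 22, 5, 30] + [0] * 8 + [5, 11]
packed = pack_offsets(offsets)
print(len(packed))  # 10
```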

Fingerprinting: Hashing
[Figure: (1) Select an area; (2) find its neighboring areas (1 vertical, 2 horizontal); (3) generate combination vectors of their frequency bin offsets. Annotated spacing: ~21.53 ms. Example hashes: 120607, 090607, 040607, 120603, 090603, 040603, 120632, 090632, 040632.]
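The figure's example hashes (120607, 090607, …, 040632) read like three zero-padded two-digit bin offsets concatenated, with the first and last positions each taking one of three values. Assuming that reading, the combinations can be generated with `itertools.product`. Which offset comes from which neighboring area cannot be recovered from the slide, so the candidate lists below are simply taken from the example hashes. The name `combination_hashes` is hypothetical.

```python
from itertools import product

def combination_hashes(candidates: list[list[int]]) -> list[str]:
    """Concatenate one zero-padded two-digit bin offset from each position
    into a decimal hash string, for every combination of candidates."""
    return ["".join(f"{b:02d}" for b in combo) for combo in product(*candidates)]

# Candidate offsets read off the figure's example hashes (assumption).
hashes = combination_hashes([[12, 9, 4], [6], [7, 3, 32]])
print(len(hashes))  # 9
```

Emitting several hashes per area means that if noise shifts one neighbor's peak, other combinations can still match exactly.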

Fingerprinting: Key Choices
● Selection of audio information
  – Should be robust
  – Should be as unique as possible
● The FFT algorithm
  – Managing losses due to the uncertainty principle
    ● Time resolution = 1 / frequency resolution
  – Discrete-time FT or short-time FT
  – # of FFT points

Static Database

Streaming Database

Streaming Database
● File name: <stream name>/<timestamp>.fingerprint, e.g., FOSDEM/201601301648.fingerprint
● Timestamp in YYYYMMDDHHAB format, where A and B are the two digits of the minute:
  – A: {0, 1, 2, 3, 4, 5} → high minute digit
  – B: {0, 2, 4, 6, 8} → low minute digit (files start on even minutes)
● Content: the file with timestamp T contains fingerprints from moment T to T + 4 minutes, so consecutive files overlap by 2 minutes.
● Reading: at moment t = YYYYMMDDHHAB, the file with timestamp T = t - 2 - (B & 1) minutes is opened.
● Writing: at moment t = YYYYMMDDHHAB, the files with timestamps T1 = t - 2 - (B & 1) and T2 = T1 + 2 minutes are written.
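The reading and writing rules can be checked with Python's `datetime`. This sketch treats "t - 2 - (B & 1)" as minutes and builds paths in the stream/timestamp.fingerprint layout shown on the slide. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H%M"  # YYYYMMDDHHAB: A = tens digit, B = units digit of the minute

def _window_start(t: datetime) -> datetime:
    """T = t - 2 - (B & 1) minutes; always lands on an even minute."""
    t = t.replace(second=0, microsecond=0)
    b = t.minute % 10  # the B digit
    return t - timedelta(minutes=2 + (b & 1))

def file_to_read(stream: str, t: datetime) -> str:
    return f"{stream}/{_window_start(t).strftime(FMT)}.fingerprint"

def files_to_write(stream: str, t: datetime) -> list[str]:
    t1 = _window_start(t)
    t2 = t1 + timedelta(minutes=2)  # each moment belongs to two overlapping files
    return [f"{stream}/{ts.strftime(FMT)}.fingerprint" for ts in (t1, t2)]

print(file_to_read("FOSDEM", datetime(2016, 1, 30, 16, 50)))
# FOSDEM/201601301648.fingerprint
```

At an even minute, the file being read started 2 minutes ago; at an odd minute, it started 3 minutes ago. Either way, it holds at least 2 minutes of past fingerprints. Using `timedelta` also handles hour and day rollovers.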

Identification
Find the best matching fingerprint, if there is one.
● Strategy
  – Reduce the search space by elimination
  – Rank the remaining candidates by detailed comparison
● Outcomes
  – True positive: we found the correct match
  – True negative: we correctly found no match
  – False negative: we couldn't find the correct match
  – False positive: we found an incorrect match
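The four outcomes can be pinned down as a small Python function, which is handy when scoring test runs. This sketch rests on two assumptions: `expected` is `None` when the sample's source is not in the database, and `predicted` is `None` when the identifier reports no match. A wrong match for a sample that is in the database is counted here only as a false positive, which is one convention among several.

```python
from typing import Optional

def classify(expected: Optional[str], predicted: Optional[str]) -> str:
    """Label one identification attempt with its outcome."""
    if predicted is None:
        return "true negative" if expected is None else "false negative"
    return "true positive" if predicted == expected else "false positive"
```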

Identification: Elimination
● For each hash in the sample, find exact matches in the database.
● For each matching hash, calculate the time difference between its position in the reference and its position in the sample.
● Create a histogram of time difference vs. match count for each candidate.
● Eliminate candidates whose best histogram score is less than a predefined threshold.
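A minimal sketch of the elimination step. It assumes an in-memory index mapping each hash to the (track, time offset) pairs where it occurs; the talk's actual storage layout is not shown, and the names are hypothetical. A true match produces many hits sharing the same time difference, so that histogram bin dominates.

```python
from collections import Counter, defaultdict

def eliminate(query: list[tuple[str, int]],
              index: dict[str, list[tuple[str, int]]],
              min_score: int) -> dict[str, int]:
    """query: (hash, time offset in sample); index: hash -> [(track_id, time offset in track)].
    Returns the surviving candidates with their best histogram score."""
    histograms: dict[str, Counter] = defaultdict(Counter)
    for h, q_time in query:
        for track_id, t_time in index.get(h, []):      # exact hash matches only
            histograms[track_id][t_time - q_time] += 1  # one vote per time difference
    best = {track: max(hist.values()) for track, hist in histograms.items()}
    return {track: score for track, score in best.items() if score >= min_score}

index = {"a": [("song1", 10), ("song2", 3)], "b": [("song1", 11)], "c": [("song1", 12), ("song2", 40)]}
query = [("a", 0), ("b", 1), ("c", 2)]
print(eliminate(query, index, min_score=2))  # {'song1': 3}
```

Here song2 shares two hashes with the sample but at inconsistent time differences, so its best bin holds only 1 vote and it is eliminated.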

Identification: Ranking
[Figure: the sample is compared against a candidate by shifting a window along it; example values shown: spectrum score 30, window score 106.]
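The slide's scoring details are not recoverable, so the following is only a hypothetical sketch of window-based ranking. Each spectrum is represented by its per-area peak offsets. A made-up `spectrum_score` counts areas whose offsets agree within a tolerance, a window score sums the spectrum scores, and the window is shifted along the candidate to find its best alignment. Candidates are then ranked by that best score. All names and the scoring rule are assumptions.

```python
def spectrum_score(a: list[int], b: list[int], tolerance: int = 1) -> int:
    """HYPOTHETICAL similarity of two spectra: number of areas whose
    peak bin offsets differ by at most `tolerance` bins."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)

def best_window_score(query: list[list[int]],
                      candidate: list[list[int]]) -> tuple[int, int]:
    """Shift the query along the candidate; a window score is the sum of the
    spectrum scores inside the window. Returns (best score, shift), or
    (-1, -1) if the candidate is shorter than the query."""
    best_score, best_shift = -1, -1
    for shift in range(len(candidate) - len(query) + 1):
        score = sum(spectrum_score(q, candidate[shift + i])
                    for i, q in enumerate(query))
        if score > best_score:
            best_score, best_shift = score, shift
    return best_score, best_shift

query = [[1, 2], [3, 4]]
candidates = {"a": [[9, 9], [1, 2], [3, 4], [0, 0]], "b": [[1, 9], [9, 9]]}
ranking = sorted(candidates, key=lambda k: best_window_score(query, candidates[k])[0], reverse=True)
print(ranking)  # ['a', 'b']
```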

Testing & Optimization
● Mix samples with:
  – White noise of varying volumes
  – Pre-recorded noise
● Record samples under different acoustic conditions
● Make the configuration dynamic and use a machine learning algorithm to select the best configuration
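Noise volume is easiest to control as a signal-to-noise ratio (SNR) in decibels. This sketch adds Gaussian white noise, rescaled so the measured SNR equals the requested value exactly. The function name and the SNR parameterization are assumptions, not necessarily the talk's test setup.

```python
import math
import random
from typing import Optional

def mix_white_noise(signal: list[float], snr_db: float,
                    seed: Optional[int] = None) -> list[float]:
    """Add Gaussian white noise scaled so the signal-to-noise ratio is snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    target_noise_power = signal_power / (10 ** (snr_db / 10))  # SNR = 10*log10(Ps/Pn)
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]
```

Sweeping `snr_db` from, say, 20 dB down to 0 dB gives a repeatable way to see where identification starts to fail.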

THANKS!
More will be at: github.com/wizard/fosdem2016
● Links to open-source software
● Source code for everything we talked about
● Markdown documentation for this presentation
● Dockerfile

References
● FOSDEM icon: https://fosdem.org/2016/
● Email icon: https://thenounproject.com/term/mail-with-at-sign/71812/
● FFmpeg: https://www.ffmpeg.org/
● SoX: http://sox.sourceforge.net/
● Sonic Visualiser: http://www.sonicvisualiser.org/
● Audacity: http://audacityteam.org/
● PostgreSQL: http://www.postgresql.org/
● Redis: http://redis.io/
● Solr: http://lucene.apache.org/solr/
