Hardware Model and Software Validation for AcoustiGLASS (Autonomous Wearable Alert Device Based on Sound Pattern Recognition)
Kei Kojima
March 2016
OBJECTIVE & DESIGN CRITERIA
The objective is to build an autonomous 'hearing glass' prototype that performs real-time sound object identification through a robust audio pattern recognition algorithm.
• The 'hearing glass' is wearable in physical size.
• The system is autonomous and performs the necessary tasks without human interaction.
• The time response of the system is fast enough, typically less than 0.5 seconds, to notify the user of an alarming sound in time.
• The system is capable of discerning an approximate orientation of the origin of the sound, i.e., whether it comes from the left-hand side or the right-hand side.
'HEARING' GLASS PROTOTYPE
• Two RGB LEDs are placed at the top right and left outer corners of the glasses to maintain clear vision for the user.
• Each RGB LED pin is connected to a specific GPIO pin of a micro-computer (a sketch of driving the LEDs follows below).
• Two stereo microphones (Sony) are mounted on the glasses and placed by the user's ears for sound localization.
• USB audio adapters allow the microphones to convey the audio signals to the Raspberry Pi.
• The system is powered by an external battery that feeds 5 V into the Raspberry Pi's power port, for more than 8 hours of operation.
[Photo: prototype showing the microphones, the Raspberry Pi 2, and the RGB LEDs]
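The poster does not give the exact wiring, so the following is only a minimal, hypothetical sketch of driving one alert LED from MATLAB via the Raspberry Pi support package; the connection settings and pin numbers are placeholders.

```matlab
% Hypothetical sketch: flash an alert LED from MATLAB on the Raspberry Pi.
% GPIO pin numbers are placeholders, not the prototype's actual wiring.
rpi = raspi();                       % connect using saved board settings

redPin   = 17;                       % placeholder GPIO number (red channel)
greenPin = 27;                       % placeholder GPIO number (green channel)

writeDigitalPin(rpi, redPin, 1);     % red flash for an alarming sound
pause(0.5);
writeDigitalPin(rpi, redPin, 0);
writeDigitalPin(rpi, greenPin, 1);   % green for a friendly sound
```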
METHODS
Audio Spectrogram Analysis
• A buffer block overlaps the raw sound waves.
• A periodogram block estimates the PSD (power spectral density) of the signal through a fast Fourier transform (FFT).
• A second buffer block arranges the sound data into multi-dimensional spectrogram arrays.
• The spectrogram is normalized by taking the mean value and dividing by the maximum value of the signal.
[Figure: police siren reference spectrogram]
Two-Dimensional Cross-Correlation
• Reference spectrograms are created from pre-recorded audio or from wave files taken from sound libraries.
After the spectrogram of the recorded audio has been constructed:
• Pre-recorded reference spectrograms are cross-correlated two-dimensionally with the incoming spectrogram through 2-D convolution.
• The vector mean of the cross-correlations is computed (a sketch of this pipeline follows below).
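The prototype implements these stages as Simulink blocks; as a rough offline equivalent, the MATLAB sketch below builds and normalizes the two spectrograms and cross-correlates them. The file names, window, overlap amount, and the exact form of the normalization are assumptions.

```matlab
% Offline MATLAB sketch of the spectrogram / cross-correlation stage.
% File names, window, overlap and normalization form are assumptions;
% the prototype realizes these steps with Simulink blocks.
[x, fs]  = audioread('police_input.wav');      % hypothetical 1-s, 22.05 kHz clip
[ref, ~] = audioread('police_reference.wav');  % hypothetical reference clip

nfft    = 128;          % FFT span used in the simulations
overlap = nfft/2;       % 50% overlapping frames (assumed)

% Short-time magnitude spectra, standing in for the periodogram + buffer blocks.
S_in  = abs(spectrogram(x(:,1),   hann(nfft), overlap, nfft, fs));
S_ref = abs(spectrogram(ref(:,1), hann(nfft), overlap, nfft, fs));

% Normalize: subtract the mean, then scale by the maximum (assumed form).
S_in  = (S_in  - mean(S_in(:)))  / max(S_in(:));
S_ref = (S_ref - mean(S_ref(:))) / max(S_ref(:));

% Two-dimensional cross-correlation of the reference with the incoming spectrogram.
C = xcorr2(S_in, S_ref);

% Vector mean of the cross-correlations (collapse across frequency lags).
c_mean = mean(C, 1);
```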
METHODS, CONTINUED
Peak Detection
• The absolute value of the cross-correlation is taken, flipping all negative numbers to their positive counterparts.
• Using the MATLAB function findpeaks, the peaks in the mean cross-correlations are detected and their prominence relative to the other peaks in the correlation is computed.
• The cross-correlation results are examined to determine the best thresholds for the heights, locations, and prominences (see the sketch below).
[Figure: Bluejay-to-Bluejay 2-D cross-correlation]
Sound Object Visualization (simulation)
When the cross-correlation result meets all the preset thresholds, the code sends a command to display an LED light pattern, or a text alert with a graphic icon, for the specific sound object recognized.
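Continuing the sketch above, the peak-detection stage can be approximated with MATLAB's findpeaks; the threshold values below are placeholders (the Peak Detection slide lists the values actually used for the police siren).

```matlab
% Sketch of the peak-detection stage on the mean cross-correlation curve.
c_abs = abs(c_mean);                       % flip negative values to positive

% Peaks sorted by descending height, with their prominences.
[pks, locs, ~, proms] = findpeaks(c_abs, 'SortStr', 'descend');

peakRatio = pks(1) / pks(2);               % max peak vs. second-highest peak
mainLoc   = locs(1);                       % location of the max peak
% proms(1) is what the simulations compare against the prominence thresholds.

% Accept the sound object only if every preset threshold is met
% (threshold numbers below are placeholders).
isMatch = mainLoc   >= 62  && mainLoc   <= 66 && ...
          peakRatio >= 2.4 && peakRatio <= 2.6;
```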
SOUND OBJECT RECOGNITION ALGORITHM (SIMULATION ON MAC)
SOUND LOCALIZATION ALGORITHM (DEPLOYED TO HARDWARE)
SOUND LOCALIZATION TEST
Steps to compute the signal envelope
• Two microphones record sound independently, approximately 7 inches apart.
• The audio signal is squared and then downsampled via FIR (finite impulse response) decimation.
• The downsampled signals undergo a low-pass filter to eliminate high-frequency components.
• A buffer block arranges the sound data into multi-dimensional arrays (1x64), and a mean function takes the average of each array to increase the stability and accuracy of the sound localization (a sketch follows below).
[Plots: right and left channel signals, vertical scales ×10⁻⁴ and ×10⁻³]
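A minimal MATLAB sketch of the envelope computation for one microphone channel, assuming a recorded wave file; the decimation factor and the FIR low-pass design are illustrative assumptions (the prototype uses Simulink decimation and filter blocks).

```matlab
% Sketch: envelope of one microphone channel.
[x, fs] = audioread('mic_right.wav');   % hypothetical recording from one mic

sq = x(:,1).^2;                         % square the signal
r  = 8;                                 % decimation factor (assumed)
d  = decimate(sq, r, 'fir');            % FIR decimation (downsampling)

b   = fir1(64, 0.05);                   % low-pass FIR; cutoff as a fraction of Nyquist (assumed)
env = filter(b, 1, d);                  % remove remaining high-frequency components

% Group into 1x64 frames and average each frame for a stable envelope value.
frames   = buffer(env, 64);             % 64-sample columns (zero-padded at the end)
envLevel = mean(frames, 1);             % one envelope value per frame
```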
SOUND LOCALIZATION DATA PROCESSING
• The signals are not processed unless their amplitude exceeds a preset threshold, in order to keep background noise from setting off the LEDs (i.e., a false alert).
• The magnitudes of the signal envelopes are compared to each other, and the signal with the higher magnitude indicates the orientation of the sound origin (see the sketch below).
• One limitation of the system is the lack of ability to discriminate the exact orientation. Just as the human brain uses the difference in incoming sound volume between the two ears, the present electronics detect the minute difference in sound volume between the two microphones for localization.
[Plots: envelopes of the right and left audio waves, peak values approximately 0.100 and 0.045]
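A sketch of the thresholding and left/right comparison, assuming envLevelL and envLevelR were computed per channel as in the previous sketch; the noise-floor value is a placeholder.

```matlab
% Sketch: compare left/right envelope levels and report the louder side.
noiseFloor = 0.02;                              % preset amplitude threshold (assumed)

for k = 1:min(numel(envLevelL), numel(envLevelR))
    L = envLevelL(k);  R = envLevelR(k);
    if max(L, R) < noiseFloor
        continue;                               % too quiet: ignore to avoid false alerts
    end
    if R > L
        fprintf('Frame %d: sound from the RIGHT\n', k);
    else
        fprintf('Frame %d: sound from the LEFT\n', k);
    end
end
```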
REFERENCE SPECTROGRAMS
BLUEJAY CALL: Has distinct and discrete frequency peaks at two distinct frequencies, approximately 10 and 20 kHz.
POLICE SIREN: Its mechanical sound makes the detection relatively easy; has three distinct frequencies between 1 and 3 kHz.
GUNSHOT: Has short, abrupt frequency content ranging from 0 to 64 kHz.
BASICS OF 2-D CROSS-CORRELATION
2-D cross-correlation computes element-by-element products and then sums them. For each alignment of the center element of the 3-by-3 matrix M2 over the 5-by-5 matrix M1, one output value is produced; the full result is a (5+3-1)-by-(5+3-1), or 7-by-7, matrix.

Example output value for one alignment of M2 over M1:
1*8 + 7*3 + 13*4 + 8*1 + 14*5 + 20*9 + 15*6 + 16*7 + 22*2 = 585

2-D cross-correlation result (7-by-7):
 34  201  286  121  106  167   60
165  470  329  244  334  299  109
271  359  405  570  585  479  256
186  229  550  615  730  409  206
116  309  595  760  575  349  221
137  263  504  434  339  222   51
 66  119  256  181  256   25   72
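The numbers on this slide are reproduced by MATLAB's xcorr2 applied to magic(5) and magic(3); that particular pairing is an inference from the shown products and result values, not stated on the poster.

```matlab
% The worked example is consistent with cross-correlating magic(5) (5x5)
% with magic(3) (3x3); this pairing is inferred, not stated on the poster.
M1 = magic(5);            % 5-by-5 input matrix
M2 = magic(3);            % 3-by-3 template

C = xcorr2(M1, M2);       % full output: (5+3-1)-by-(5+3-1) = 7-by-7

% The element where M2 fully overlaps M1 at rows 1:3, columns 3:5 equals 585,
% matching the hand-computed sum 1*8 + 7*3 + 13*4 + 8*1 + 14*5 + 20*9 + 15*6 + 16*7 + 22*2.
disp(C(3,5));             % 585
```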
2-D CROSS-CORRELATION IMAGE
Dog growl (input) vs. dog growl (reference) — strong correlation:
- Single, symmetric peak.
- Higher peak height.
- Peak location is at the center line.
☑ Sound category identified

Dog growl (input) vs. bird chirp (reference) — weak correlation:
- Multiple, asymmetric peaks.
- Lower peak height.
- Peak location(s) off center.
☐ Sound category rejected
PEAK DETECTION
Averaged Police-to-Police cross-correlation (absolute value): max peak / second peak = 2.582
• The height of the max peak is 159.4 and its location is 64 on the x-axis (64 is the exact middle of a graph that contains 127 columns), which is between the thresholds of 62 and 66.
• The ratio of the max peak to the second-highest peak is 2.582, which is between 2.4 and 2.6.
• The location of the third peak is 105.1, which is between the thresholds of >102 and <110.

Averaged Police-to-Dog cross-correlation (absolute value): max peak / second peak = 2.0605
• The height of the max peak is 172.1 and its location is 67 on the x-axis; this is not between the thresholds of 62 and 66.
• The ratio of the max peak to the second-highest peak is 2.0605; this is not between the thresholds of 2.4 and 2.6.
• The location of the third peak is 101; this is not between the thresholds of >102 and <110.
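A small sketch of the accept/reject rule using the police-siren thresholds quoted above; the function name and the input convention (peaks sorted by descending height) are illustrative, not taken from the poster.

```matlab
% Sketch of the accept/reject decision for the police siren:
% max-peak location 62-66, max/second-peak ratio 2.4-2.6,
% third-peak location between 102 and 110.
function ok = isPoliceSiren(locs, pks)
    % locs, pks: peak locations and heights, sorted by descending height
    ratio = pks(1) / pks(2);
    ok = locs(1) >= 62  && locs(1) <= 66  && ...
         ratio   >= 2.4 && ratio   <= 2.6 && ...
         locs(3) >  102 && locs(3) <  110;
end
```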
TRUE/FALSE TEST
Simulation Test 1:
- Audio input: 22.05 kHz, 1 sec
- FFT span: 128 samples
- Peak prominence: 2.1 to 2.7
- 20 runs per category

Simulation Test 2 (improved peak detection scheme/thresholds):
- Audio input: 22.05 kHz, 1 sec
- FFT span: 128 samples
- Peak prominence: 2.2 to 2.8
- 30 runs per category

Test 1 results (20 runs per category, correct detections): Police 20; Dog 13; Bird chirping 13; Smoke alarm 20; Gunshot 14 (remaining runs rejected or misclassified).
Test 2 results (30 runs per category, correct / rejected): Police 30 / 0; Dog 25 / 5; Bird chirping 28 / 2; Smoke alarm 29 / 1; Gunshot 22 / 8.

P: Police   D: Dog   B: Bird chirping   S: Smoke alarm   G: Gunshot
CONCLUSION
• The 'Hearing' Glass is a bridging technology that can fill gaps in the lives of deaf people.
• A computer algorithm was developed in a graphical programming environment, Simulink, that utilizes spectrogram and two-dimensional cross-correlation methods to identify sound objects of interest.
• A prototype built on a low-cost computer, i.e., a Raspberry Pi with two LEDs to relay information, was a success.
• Computer simulation of sound object recognition showed promising capability for 4 alarming sounds and 1 friendly sound.
• One of the major technical challenges is noise treatment. Presently, a noisy background is overcome through the various thresholds the cross-correlation result must meet.
• The similarity in the lower-frequency components of the dog growl and typical background sound can cause false detections as the noise floor increases.
• The sound localization algorithm succeeded in giving the orientation (right/left) of the sound object's origin.
OUTLOOK
Applications
A. Environmental sound alert system for deaf people and people with hearing loss
B. Machine failure detection based on changes in sounds
C. Sound-based object recognition or situation sensors for rescue and military robots (in addition to vision sensors)

Bag of Features
A bag-of-features technique, in which different features are extracted from each reference spectrogram and from the recorded spectrogram, could be explored.

Machine Learning
An endeavor into machine learning, which trains the computer or micro-controller to "learn" information directly from data without assuming a predetermined equation as a model, can be worthwhile in the longer outlook, when detection via 2-D cross-correlation between the input signal and hundreds of reference spectrograms may become computationally heavy.

Add-on Systems
The development of an algorithm compatible with the iPhone or Google Glass would be suitable. The computation rates would be much better, and the visualization (on Google Glass) would be much improved. Although acquiring smart glasses would be costly, insurance and state-owned funds could minimize the cost for the user.