BASELINE SPEECH RECOGNITION SYSTEM FOR WEATHER DOMAIN

DIALOG FLOW
[Sample Urdu dialog between the system and a user; the text is not recoverable from the extraction]

BASELINE ACCENT INDEPENDENT SPEECH RECOGNITION SYSTEM FOR WEATHER DOMAIN
Architecture Diagram
Offline word ASR Results
Accent                           Vocabulary Size   Training Utterances   Testing Utterances   Accuracy (%)
Punjabi, Urdu, Pashto, Balochi   139               31802                 10216                91.87

BASELINE ACCENT DEPENDENT SPEECH RECOGNITION SYSTEM FOR WEATHER DOMAIN
Architecture Diagram

OFFLINE RESULTS
Accuracy of Accent Identifier
Accent        Training Files   Testing Files   Correctly Identified   Accuracy
Balochi       3670             1995            1439                   72.13%
Pashto        3670             1771            839                    47.37%
Punjabi       3670             988             464                    46.96%
Urdu          3670             4341            3234                   74.49%
All Accents                    9095            5976                   65.71%

Accuracy of word ASR system
Accent    Vocabulary Size   Training Utterances   Testing Utterances   Accuracy (%)
Punjabi   139               3670                  988                  91.29
Urdu      139               17080                 4341                 95.09
Pashto    139               6781                  1771                 90.06
Balochi   139               4271                  1995                 90.82
Overall AD ASR System Accuracy: 92.76
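The overall figures in the offline results can be reproduced from the per-accent rows. The sketch below assumes that the overall AD ASR accuracy is an average of the per-accent accuracies weighted by the number of testing utterances; the slides do not state this, but the assumption reproduces the reported value. Variable names are illustrative.

```python
# Per-accent word ASR results from the slide: (testing utterances, accuracy %).
word_asr = {
    "Punjabi": (988, 91.29),
    "Urdu": (4341, 95.09),
    "Pashto": (1771, 90.06),
    "Balochi": (1995, 90.82),
}

# Assumption: overall accuracy is weighted by testing utterances per accent.
total_test = sum(n for n, _ in word_asr.values())
overall_asr = sum(n * acc for n, acc in word_asr.values()) / total_test

# Accent identifier: correctly identified files over all testing files.
correct_ids = 1439 + 839 + 464 + 3234
overall_id = 100.0 * correct_ids / total_test

print(total_test, round(overall_asr, 2), round(overall_id, 2))
# prints: 9095 92.76 65.71
```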

FIELD TESTING
The purpose of field-testing the ASR system is to evaluate its performance in the scenarios and places where it is intended to be used, and hence to gauge how it will perform in real-world conditions.

Offline Testing:
• Silence is precisely cut from speech manually.
• Noisy files are separated from the test files manually.
• Out-Of-Vocabulary (OOV) and mispronounced words are also removed from the testing data.

Field Testing:
• Silence is cut from speech automatically using the Voice Activity Detector outlined in (Rabiner & Sambur, February 1975).
• Noisy files are part of the test files.
• Out-Of-Vocabulary (OOV) and mispronounced words are removed using the methodology given in (Irtza, Anwar, & Hussain, 2014).
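To illustrate what automatic silence removal involves, here is a heavily simplified endpoint detector. It is not the system's implementation: the Rabiner & Sambur algorithm combines energy thresholds with zero-crossing rate, whereas this sketch uses only a single energy threshold estimated from the leading frames, which are assumed to contain background noise only. The frame length, noise window, and threshold factor are hypothetical values chosen for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Sum of absolute amplitudes for each non-overlapping frame."""
    return [
        sum(abs(s) for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_endpoints(samples, frame_len=80, noise_frames=10, factor=4.0):
    """Return (start, end) sample indices of the detected speech, or None.

    Assumes the first `noise_frames` frames contain only background noise.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    noise_mean = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise_mean
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len

# Synthetic signal: low-level noise, a tone standing in for speech, then noise.
silence = [1, -1] * 400
tone = [int(1000 * math.sin(2 * math.pi * k / 16)) for k in range(1600)]
samples = silence + tone + silence
endpoints = find_endpoints(samples)
print(endpoints)  # prints: (800, 2400)
```

A single threshold like this is easily fooled by bursts of ambient noise, which is consistent with voice activity detection and noise appearing as separate error sources in field conditions.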

SELECTED NOISE SCENARIOS AND DEMOGRAPHICS
Places were selected to cover the range of surrounding noise, from very quiet to very loud:
• Labs
• Offices, classrooms
• Campus parking space
• Open fields (campus lawns)
• Cafeteria
• Bus stand and roads within the campus

Demographics include:
• Technical people involved with the project
• Technical people not involved with the project
• Non-technical staff, students, car and rickshaw drivers, shopkeepers and waiters of the cafeteria

FIELD ACCURACY OF DIALOG SYSTEM WITH ACCENT INDEPENDENT ASR
The accuracy of the complete dialog system is measured in terms of the response it generates and how it handles the error cases.

Complete end-to-end dialog accuracy:
No. of Speakers: 67
Total Test Files: 537
Correct System Response
  In-vocabulary word correctly decoded: 272
  OOV or multiple words correctly identified: 60
Incorrect System Response
  In-vocabulary words misrecognized or marked as OOV: 205
Overall System Accuracy: 61.82%

The errors which lead to an incorrect system response can be broadly classified into ASR related and non-ASR related errors.
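The overall system accuracy follows directly from the counts above: a response counts as correct either when an in-vocabulary word is decoded correctly or when an OOV or multi-word input is correctly flagged. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def dialog_accuracy(in_vocab_correct, oov_correct, incorrect):
    """End-to-end accuracy (%) over all test files."""
    total = in_vocab_correct + oov_correct + incorrect
    return 100.0 * (in_vocab_correct + oov_correct) / total

accuracy = dialog_accuracy(in_vocab_correct=272, oov_correct=60, incorrect=205)
print(f"{accuracy:.2f}%")  # prints: 61.82%
```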

ERROR CONTRIBUTION FROM DIFFERENT SOURCES
• Ambient Noise: 32%
• Both Phone and Word ASR (results in misrecognition): 29%
• Only Phone ASR (results in false OOV alarm): 23%
• Voice Activity Detection: 16%

Performance of accent-independent word-based ASR
Test Files   Correctly Decoded   Incorrectly Decoded   Accuracy of Word ASR
379          320                 59                    84.43%

FIELD ACCURACY OF DIALOG SYSTEM WITH ACCENT DEPENDENT ASR
In the case of the dialog system with accent-dependent ASRs, the errors due to non-ASR issues (voice activity detection and background noise) remain the same, but errors due to the speech recognition system increase significantly, and the accuracy of the complete system drops overall.

No. of Speakers: 67
Total Test Files: 537
Correct System Response
  In-vocabulary word correctly decoded: 219
  OOV or multiple words correctly identified: 60
Incorrect System Response
  In-vocabulary words misrecognized or marked as OOV: 258
Overall System Accuracy: 51.95%
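Comparing these counts with those reported for the accent-independent system (272 in-vocabulary correct, 60 OOV correct, 205 incorrect, over the same 537 files) shows where the drop comes from. The sketch below is only a worked check on the reported numbers; the dictionary keys are illustrative.

```python
TOTAL_FILES = 537

# Field counts reported on the slides for the two dialog systems.
accent_independent = {"in_vocab_correct": 272, "oov_correct": 60, "incorrect": 205}
accent_dependent = {"in_vocab_correct": 219, "oov_correct": 60, "incorrect": 258}

# OOV handling is unchanged, so the whole drop comes from in-vocabulary words.
extra_errors = accent_dependent["incorrect"] - accent_independent["incorrect"]
lost_correct = (accent_independent["in_vocab_correct"]
                - accent_dependent["in_vocab_correct"])
drop_points = 100.0 * extra_errors / TOTAL_FILES

print(extra_errors, lost_correct, f"{drop_points:.2f}")
# prints: 53 53 9.87
```

The 9.87-point difference matches the gap between the two reported overall accuracies (61.82% and 51.95%).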

ERROR CONTRIBUTION FROM DIFFERENT SOURCES
• Both Phone and Word ASR (results in misrecognition): 52%
• Ambient Noise: 25%
• Voice Activity Detection: 13%
• Only Phone ASR (results in false OOV alarm): 10%

Performance of accent-dependent word-based ASR
Test Files   Correctly Decoded   Incorrectly Decoded   Accuracy of Word ASR
379          246                 133                   64.91%

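CONCLUSION
In the field, the accent-independent ASR outperforms the accent-dependent ASRs: word ASR accuracy is 84.43% versus 64.91%, and end-to-end dialog accuracy is 61.82% versus 51.95%.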

FUTURE WORK
In order to minimize the gap between ASR results in the lab and in the field, we will improve the accuracy of:
• Baseline ASR systems
• Out-of-vocabulary detector
• Accent identification system
• Voice activity detector
