Cyberon Voice Commander

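Cyberon Voice Commander: Experience in Developing a Multilingual Voice Command System
Liu Jinrong (劉進榮), Assistant Vice President, R&D Department, Cyberon Corporation
2007/08/16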

Cyberon Profile
• One of the leading embedded speech solution providers worldwide
• Established: January 2000
• Headquarters: Hsin-Tien City, Taipei, Taiwan
• China office: Xiamen
• Employees: 36 (R&D: 27)
• More than 15 million units shipped worldwide
• 2006 revenue: NTD 87 million; EPS: NTD 13.2

Powered by Cyberon - Greater China

Powered by Cyberon - Worldwide

Cyberon's Solutions
• Speaker-Dependent Voice Recognition
  - Speed Dial engine
  - Cyberon Voice Speed Dial (CVSD)
• Speaker-Independent Voice Recognition
  - CListener engine
  - Cyberon Voice Dialer (CVD)
  - Cyberon Voice Commander (CVC)
• Text-To-Speech
  - CReader engine
  - Cyberon Talking Tutor (CTT)
  - Cyberon Talking Dictionary (CTD)

Cyberon Voice Commander
• A Voice Dialing and Command & Control Application
  - Name Dial / Digit Dial
  - Phone Book Lookup
  - Program Launch
  - Media Player Control
  - E-mail / SMS / Calendar reader
  - Callback, Redial, Time, etc.
  - Voice Feedback
  - Bilingual Speech Recognition
• Technology
  - Speaker-Independent Command-Based SR
  - Text-To-Speech
  - Continuous Digit Recognition
  - Speaker-Dependent SR (Voice Tag)
  - Speaker Adaptation (for Digit Model)

Supported Languages
• European Region
  - UK English, German, French, Italian, Spanish, Portuguese, Russian, Dutch, Polish
  - Czech, Turkish, Danish, Swedish
  - Planned for 2007 Q3: Finnish, Norwegian, Greek
  - Planned for 2007 Q4: Slovak, Hungarian, Ukrainian
• American Region
  - Northern American English
  - Brazilian Portuguese
  - Southern American Spanish
• Asian Region
  - Traditional Chinese
  - Simplified Chinese
  - Chinese Accent English
  - Korean
  - Thai
  - Cantonese
  - Japanese

Speaker-Independent Speech Recognition

Architecture
[Diagram: Voice Signal -> Feature Extraction -> Feature Vectors -> Search Algorithm -> Result. The Search Algorithm draws on the Grammar, the Lexicon, and the Acoustic Database.]
• Vocabulary Size
  - Small: tens
  - Middle: hundreds
  - Large: thousands
  - Very Large: tens of thousands
• Search Algorithm
  - Isolated Word
  - Discrete Speech
  - Continuous Speech
  - Keyword Spotting
• Speaker Dependence
  - Speaker-Dependent (SD)
  - Speaker-Independent (SI)
  - Speaker Adaptation (SA)
• Approach
  - Neural Network
  - HMM
• Unit
  - Word based
  - Phoneme based

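Feature & Grammar
• Feature
  - Input signal: 8 kHz, 16-bit PCM
  - 8-dim MFCC and 8-dim delta MFCC
  - 100 frames per second
  - Cepstral mean subtraction
• Grammar
  [Diagram: two command grammars, each running from "start" to "end", that contain the same parallel paths in different word orders.]
  - Call (打電話) + person name (人名) + home / office / mobile (住家、公司、手機)
  - Open (開啟) + application name (應用程式名)
  - Play (播放) + song title (歌曲名)
  - Other single-word commands (其他單詞命令)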
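The Feature bullets on this slide can be illustrated with a minimal Python sketch. It assumes the static cepstral vectors have already been computed (the MFCC step itself is omitted), and it uses a simple two-frame difference for the delta. The function names and the delta window are illustrative assumptions; the slides do not say how the engine computes these.

```python
from typing import List

Frame = List[float]

# 8 kHz input at 100 frames per second implies an 80-sample frame shift.
SAMPLE_RATE_HZ = 8000
FRAMES_PER_SECOND = 100
FRAME_SHIFT_SAMPLES = SAMPLE_RATE_HZ // FRAMES_PER_SECOND


def cepstral_mean_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def delta(frames: List[Frame]) -> List[Frame]:
    """First-order delta: half the difference between the next and the
    previous frame, with the edges clamped (an assumed, simple window)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def build_features(frames: List[Frame]) -> List[Frame]:
    """Normalize the static vectors, then append their deltas, so an
    8-dim static input yields a 16-dim feature vector per frame."""
    normalized = cepstral_mean_subtraction(frames)
    return [s + d for s, d in zip(normalized, delta(normalized))]
```

With 8 static coefficients per frame, `build_features` returns 16 values per frame, matching the 8 + 8 layout listed on the slide.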

Lexicon, Model & Search
• Lexicon
  - Word-to-phone conversion
  - Several approaches for different languages
  - 30 KB ~ 250 KB per language
• Model
  - Phoneme-based HMM
  - 3 left-to-right states per phoneme model
  - Decision-tree triphone model
  - Forward-backward training
  - 180 KB ~ 220 KB per language
• Search Algorithm
  - Viterbi search
  - Word transitions governed by the grammar
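As a rough illustration of Viterbi search over one 3-state left-to-right phoneme model, here is a small Python sketch. It assumes per-frame log emission scores are already available and uses a single shared self-loop and forward transition probability. All names and the interface are hypothetical; grammar-level word transitions are not modeled here.

```python
import math
from typing import List, Tuple

NEG_INF = -math.inf


def viterbi_left_to_right(
    log_emit: List[List[float]], log_stay: float, log_next: float
) -> Tuple[float, List[int]]:
    """Best state path through a left-to-right HMM.

    log_emit[t][s] is the log-likelihood of frame t under state s.
    The path must start in state 0 and end in the last state; at each
    frame it either stays in the same state or advances by one.
    """
    n_frames = len(log_emit)
    n_states = len(log_emit[0])
    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_emit[0][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            score[t][s] = best + log_emit[t][s]

    # Trace back from the final state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][-1], path
```

In a full recognizer, the same recursion would run over phoneme models concatenated according to the lexicon, with the grammar deciding which word may follow which.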

Language Development
• Procedure
  - Define the phoneme set
    Sources: Wikipedia, SAMPA, language learning web sites, ...
  - Build the lexicon module
    Rules: academic papers, language learning web sites, ...
    Pronunciation dictionaries: LDC, ELRA, other research organizations, ...
  - Design recording scripts
    Sources: news web sites
  - Collect speech data
    Through local agents
  - Train the model and test
• 3 ~ 6 months to develop a language

Lexicon Module
• Basic Approaches
  - Rule
    Simple letter-to-phone rules
    Ex: Italian, Spanish, Portuguese, etc.
  - Hard-coded
    Ex: Chinese, Korean, etc.
  - Decision Tree
    Trained on a pronunciation dictionary
    Accuracy: 92% ~ 98% for words inside the dictionary, 60% ~ 75% for words outside it
    Ex: English, German, French, etc.
• Hybrid for most languages
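The rule-based approach above can be sketched as a longest-match lookup over a letter-to-phone table. The table below is a toy, Italian-like fragment with SAMPA-style symbols, chosen only for illustration. It is not a complete or verified rule set, it omits context-dependent rules, and the names are hypothetical.

```python
from typing import Dict, List

# Toy rules: multi-letter graphemes must win over single letters.
TOY_RULES: Dict[str, List[str]] = {
    "ch": ["k"],
    "gn": ["J"],
    "a": ["a"],
    "b": ["b"],
    "g": ["g"],
    "i": ["i"],
    "n": ["n"],
    "o": ["o"],
}


def letters_to_phones(word: str, rules: Dict[str, List[str]] = TOY_RULES) -> List[str]:
    """Convert a word to phones by trying the longest grapheme first."""
    max_len = max(len(grapheme) for grapheme in rules)
    word = word.lower()
    phones: List[str] = []
    i = 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += size
                break
        else:
            # No rule covers this letter; a hybrid system could fall
            # back to a dictionary or a trained model at this point.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones
```

For example, `letters_to_phones("bagno")` yields `["b", "a", "J", "o"]` because the two-letter grapheme "gn" is matched before the single letter "g".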

Data Collection
• Corpus
  - 100 ~ 800 informants per language
  - Per speaker:
    40 ~ 60 short words for bootstrapping the model
    200 ~ 300 sentences (25 ~ 30 min) for training
• Accent Issue
  - Collect data in big cities
  - Try to widen the coverage of accents
• Verification & Phoneme Transcription
  - Done by tools

Engine Simulation Test
• Vocabulary: 200 full names
• Testers: 4 ~ 6 native speakers
• Device: Dopod 900 (HTC Universal)
• AURORA car noise added to the source data at several levels
• Accuracy (%) by language and signal-to-noise ratio (S/N):

Language              Clean   15dB    10dB    5dB     0dB
Taiwan Mandarin       98.03   97.04   96.37   93.09   75.33
China Mandarin        96.62   96.21   95.21   90.33   71.67
Cantonese             95.36   94.01   93.97   88.01   71.62
US English            98.9    97.9    96.68   92.58   79.4
UK English            93.88   94.85   94.21   91.45   77.79
German                95.17   95.17   93.65   87.81   75.29
French                94.83   95.02   94.08   90.25   76.62
Italian               95.77   94.15   93.64   91.56   81.73
Spanish               96.18   95.37   92.83   89.28   78
Brazilian Portuguese  96.2    97.15   95.49   93.35   80.29
Dutch                 94.25   93.12   92.62   88.12   74.75
Japanese              96.55   96.1    92.4    90.4    81.1
Russian               97.15   95.6    93.62   87.07   75.47
Average               96.07   95.51   94.21   90.25   76.85
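The slide does not say how the car noise was mixed into the clean recordings. A common way to reach a target S/N, shown here as an assumption rather than the method actually used, is to scale the noise so that the ratio of mean speech power to mean noise power matches the target in dB. The function names are hypothetical.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the mix has the requested S/N in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise is silent")
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech: List[float], noisy: List[float]) -> float:
    """Recover the S/N of a mix, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Calling `mix_at_snr` once per target value (15, 10, 5, and 0 dB) would produce the four noisy conditions in the table from a single clean recording.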

CVC Field Test
• Vocabulary: 200 full names, 20 ~ 30 apps with grammar
• Testers: 4 ~ 6 native speakers
• Devices: several PocketPC phone models
• Environments: office, roadside, and highway
• Accuracy (%) by language and environment:

Language              Office  Roadside  Highway
Taiwan Mandarin       98.6    92.8      93.5
China Mandarin        96.2    90.4      92.3
Cantonese             94.8    89.7      91.5
US English            93.7    85.2      90.5
UK English            93.2    83.7      88.5
German                95.7    86.3      93.8
French                96.5    91.4      92.6
Italian               97.5    92.3      94
Spanish               97.1    89.4      91.2
Brazilian Portuguese  95.3    87.6      88.7
Dutch                 92.4    84        91.3
Japanese              96.2    88.3      91.2
Russian               96.3    88.4      92.8
Average               95.63   88.42     91.68

Text-To-Speech

TTS in CVC
• Mainly for voice feedback of the VR result
• 16 kHz, 16-bit PCM output
• Compact size: 300 KB ~ 600 KB per language
• Acceptable quality
• Lacks rich prosody (robotic)
• Good for pronouncing single words and short phrases after fine tuning
  - Cyberon Talking Dictionary

Architecture
[Diagram: Input Text -> Text Analysis -> Synthesizer -> Output Speech. Text Analysis passes word/phrase breaks, pronunciation, and POS tags to the Synthesizer. Resources shown: Pronunciation Lexicon, POS Lexicon, Prosody Model, Speech Unit Database.]

Text Analysis
• Word Boundary
  - For Chinese and Thai
  - Longest word first
• POS Tagging
  - By POS n-gram and Viterbi search
• Phrase Boundary
  - By boundary n-gram and Viterbi search
  - Simplified approach: by syllable length
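The "longest word first" rule for word boundaries is commonly implemented as forward maximum matching against a word list. The sketch below assumes a small in-memory lexicon and falls back to single characters for unknown text; the lexicon contents and function name are made up for illustration.

```python
from typing import Iterable, List


def segment_longest_first(text: str, lexicon: Iterable[str]) -> List[str]:
    """Split text into words by always taking the longest lexicon match.

    At each position the longest candidate is tried first; if nothing
    in the lexicon matches, a single character is emitted so that the
    scan always moves forward.
    """
    words_set = set(lexicon)
    max_len = max((len(w) for w in words_set), default=1)
    result: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words_set:
                result.append(candidate)
                i += size
                break
    return result


# Hypothetical lexicon: "打電話" (make a call) must win over "電話" (phone).
TOY_LEXICON = {"打電話", "電話", "給", "王小明", "小明"}
```

For example, `segment_longest_first("打電話給王小明", TOY_LEXICON)` returns `["打電話", "給", "王小明"]`. Greedy matching is simple and compact, which suits an embedded system, but it can pick the wrong split when a longer word overlaps the correct boundary.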

Prosody Model
• Mandarin & Cantonese (syllable unit)
  - Store the first-tone version of each syllable in the database
  - Pre-define the F0 contour of each tone
  - Adopt a fixed base F0 contour for the phrase
  - Compute duration from the syllable's position in the word and in the phrase
• Other Languages (diphone unit)
  - Predict accent position and type by CART (Classification And Regression Tree)
  - Generate the F0 contour by linear regression
  - Predict duration by CART

Synthesizer
• LPC (Linear Predictive Coding)-Based Approach
  - Store the LPC coefficients and pitch-period residuals of each speech unit in the database
  - Adjust the residual length to follow the F0 contour
  - Adjust the number of pitch periods to match the duration

Conclusion

Cyberon Voice Commander
• A successful commercial voice application on mobile devices
• Integrates several speech technologies, such as SI VR and TTS, into an embedded system
• Experience in developing many languages
• Shows that speech technologies are workable in daily life

Future Work
• Improve TTS quality
• Enhance recognition performance in heavily noisy conditions
• Find an accurate approach to verify and transcribe speech data
• Create a more effective procedure for developing a language
• Develop other advanced speech technologies and applications

The End and Thanks
Cyberon Corporation
TEL: +886-2-2910-9088
FAX: +886-2-2910-7986
Website: www.cyberon.com.tw
