
Offline Language Translation Tool Capability Collaboration Event



  1. Offline Language Translation Tool Capability Collaboration Event, 11 June 2019, sofwerx.org/translator

  2. Blue Sky Working Group Brief Outcomes
     Team Votes: Black 15; Blue 2; Purple 4; Red 1; Green 18; Orange 9

  3. {Black} One-Year Product Development Plan

     Surreptitious/Background (Future)
     1. Hardware
        • Current generation Android phone w/ large display
        • Memory: 2 GB RAM, 16 GB internal mem (min)
        • Microphone: Far-field and/or array
     2. Software
        • Identify candidate existing algorithm sources for…
          • ASR, MT
        • Build application, UI, integrate components
        • Target MOPs: ASR Error < 50%; MT BLEU > 15; Latency = 0.5s; CTR* = X
        • Integrate keyword spotting algorithm
     3. Language & Acoustics
        • General Data collection (leverage existing data) > Database population
        • Use-specific Data Collection > Database population
        • Build transcription and translation libraries
        • Build language and acoustics models
        • Target Languages: User’s single highest priority language
     4. Other
        • User interface research

     One-on-One, Two-Way Speech
     1. Hardware
        • Current generation Android phone
        • Memory: 2 GB RAM, 16 GB internal mem (min)
        • Microphone: Push-to-Talk directional microphone
        • Audio: Speaker/AUX output
     2. Software
        • Identify candidate existing algorithm sources for…
          • ASR, MT, TTS
        • Build application, UI, integrate components
        • Target MOPs: ILR = 1+; ASR Error < 50%; MT BLEU > 15; Latency = 0.5s; CTR* = X
     3. Language & Acoustics
        • General Data collection (leverage existing data) > Database population
        • Use-specific Data Collection > Database population
        • Build transcription and translation libraries
        • Build language and acoustics models
        • Target Languages: User’s single highest priority language
     4. Other
        • User interface research
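
The "ASR Error" target above is not defined on the slide. A minimal sketch, assuming it means word error rate (WER) and that the latency figure is an upper bound: the function names and the example sentences are illustrative, and CTR* is left out because the slide does not define it.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


def meets_year_one_mops(asr_error: float, bleu: float, latency_s: float) -> bool:
    """Check measured values against the year-one targets on the slide.

    Assumption: "Latency = 0.5s" is read as "no more than 0.5 seconds".
    """
    return asr_error < 0.50 and bleu > 15 and latency_s <= 0.5
```

For example, `word_error_rate("open the gate now", "open a gate")` counts one substitution and one deletion against four reference words.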

  4. {Black} Year 2/3 Product Improvement Plan

     Surreptitious/Background
     1. Hardware
        • Upgrade Android host phones as newer models come out (refinements to software will only continue to increase computing load)
     2. Software
        • Refine ASR, MT, TTS algorithms
        • Update UI based on testing/field data
        • Target MOPs (increase): ASR Error < 25%; MT BLEU > 25; Latency = 0.5s; CTR* = X
     3. Language & Acoustics
        • Update language and acoustics models based on testing/field data
        • Target Languages (user-driven): either (A) add more languages, or (B) increase sophistication on priority languages, or (C) pay a ton of $ and do both
     4. Other
        • User interface research

     One-on-One, Two-Way Speech
     1. Hardware
        • Upgrade Android host phones as newer models come out (refinements to software will only continue to increase computing load)
     2. Software
        • Refine ASR, MT algorithms
        • Update UI based on testing/field data
        • Target MOPs (increase): ILR = 1+; ASR Error < 25%; MT BLEU > 25; Latency = 0.5s; CTR* = X
     3. Language & Acoustics
        • Update language and acoustics models based on testing/field data
        • Target Languages (user-driven): either (A) add more languages, or (B) increase sophistication on priority languages, or (C) pay a ton of $ and do both
     4. Other
        • User interface research

  5. {Blue}
     • One-Year product development plan
       ▪ Select/Define Use Case
       ▪ Real time transcription/translation via audio
       ▪ Must address concerns for:
         a) Hardware Architecture – Form Fit Factor, Processing Ability
            a) Wireless ear piece: Near Field Magnetic Solution
            b) Handheld/Body - S7 ATAK/Galaxy Note 8
            c) Centralized Processing Unit – KLAS VOYAGER; AWS Snowball/Outpost, etc.
               a) Bandwidth = 80MB (Trellisware/MANET)
               b) Computing power = 50 Watts

  6. {Blue}
     • One-Year product development plan
       ▪ Select/Define Use Case
       ▪ Real time transcription/translation via audio
       ▪ Must address concerns for:
         a) Software Architecture – AI Software Selection/Configuration w/HW
            a) Engine Selection for Transcription/Translation w/ appropriate mix of AI engines
               a) Via, SageMaker, Veritone aiWARE for orchestration and algorithm management
               b) Identify key acoustic engines – open source & proprietary
               c) Identify key transcription/translation engines – open source & proprietary
         b) Language – Identify Set # of Languages to process for first year.
            a) Arabic, Spanish, Russian, Chinese
         c) UI/UX Definition – SW/HW
            a) Data Processing & Analysis, Training.
            b) In Theater/On Scene Interactions
     • How will you employ it?
       ▪ One on one, surreptitious/background/etc.?
       ▪ One on one – general collection/Ground Truth Classification/low noise
       ▪ Background Noise/Combination of Speakers – Preprocessing/Adv Classification/De-noise/Acoustic Eng

  7. {Blue}
     • Pre-Planned Product Improvements for year two to three:
       • Add Use Cases
       • Interactive Solution – More engines, recommender, Selective Speech ID
       • Add languages
       • Improve UI
       • Audio/video – Ear Piece/Helmet Cam/Eye Glass
       • Add additional cognitive categories
       • Speaker separation, Speech ID, Language ID, Voice Recognition, Sentiment
       • Object Detection, OCR, Clothing etc.
       • Behavioral Engines
       • Performance Improvements, Scale, Processing ability
       • Adv Architecture deployment Core/Edge Solution via Cloud/Stand Alone
       • On the fly Training Feedback Loop/Algorithm-Model Retraining Process
       • Transition capability to additional problem sets
       • Transition data to cloud-based system

  8. {Purple}
     • One-Year product development plan
       ▪ Must address concerns for:
         a) Hardware – microphones – far field experience. Beam forming tech on consumer smartphones. Mini cloud. Finite languages (2 or 3?). USB for terabyte extension
         b) Software – multiple OS (Android/iOS). Transcription tech to share. Data at rest encryption. Gather existing solutions.
         c) Acoustics – capture enough noise environments. Phonetics.
         d) Language – decide on metrics/product viability. Recording data for languages without existing data sources.
         e) Other – assess cognitive load of the Warfighter when using machine translation
     • How will you employ it?
       ▪ One on one at first

  9. {Purple} – Year 1 Plan
     • Stage 1:
       • Test and evaluate using off-the-shelf commercial hardware and software, including Android/iOS with Google, Microsoft, iTranslate etc. Modern capable phone
       • Using Warfighters, role players and interpreters, figure out how machine translation increases a warfighter’s operational capability (e.g. detecting when an interpreter is changing the message or has missed something important).
       • Use best language translation pairs
       • Evaluate key languages to begin field testing for the languages with less data, and begin improving them with acquisition of language data
     • Stage 2: collect data from louder environments for key languages
     • Stage 3: Measures of success – task completion %.

  10. {Purple} – Year 2/3 Plan
     • Pre-Planned Product Improvements for year two to three:
       • Hardware development – microphones for multi-person environments and background noise.
       • Increase storage capacity on phones (USB?) for recordings to improve the machine and assess accuracy of translations
       • Increase language availability through acquisition of language data

  11. {Red} One Year
     • Hardware
       • Android-based device
       • Best of breed devices
       • Focus on using earphones for translation
       • Hands free as much as possible
       • External microphones on user
       • Bottom line: device is based upon program office procurements today and future

  12. {Red} One Year
     • Software
       • Application based
       • Cloud Updates for new phrases and common terms
       • Language Identification to distinguish unexpected languages
       • Loose coupling: data can be processed and viewed the same way across different systems
       • Algorithms: Speech to text, text to text (translation), text to speech
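
The three algorithm stages and the loose-coupling goal above can be sketched as a pipeline of swappable callables. This is an illustration only: `TranslationPipeline`, the type aliases, and the `fake_*` stub engines are hypothetical names, not components named in the brief.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable, so any engine with the same input and
# output types can be swapped in without touching the other stages.
SpeechToText = Callable[[bytes, str], str]      # (audio, language) -> text
Translate = Callable[[str, str, str], str]      # (text, source, target) -> text
TextToSpeech = Callable[[str, str], bytes]      # (text, language) -> audio


@dataclass
class TranslationPipeline:
    speech_to_text: SpeechToText
    translate: Translate
    text_to_speech: TextToSpeech

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        text = self.speech_to_text(audio, source)
        translated = self.translate(text, source, target)
        return self.text_to_speech(translated, target)


# Stub engines standing in for real ASR, MT, and TTS components.
def fake_asr(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str, source: str, target: str) -> str:
    phrasebook = {("en", "es"): {"hello": "hola"}}
    # Unknown phrases pass through untranslated in this stub.
    return phrasebook.get((source, target), {}).get(text, text)


def fake_tts(text: str, language: str) -> bytes:
    return text.encode("utf-8")
```

Usage: `TranslationPipeline(fake_asr, fake_mt, fake_tts).run(b"hello", "en", "es")` passes the stub audio through all three stages; replacing one stub with a real engine leaves the other two untouched.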

  13. {Red} One Year
     • Acoustics
       • Assessment of current technology to find the most effective
       • Beamforming: multiple microphones to distinguish voices (Alexa)
       • Audio processing: noise cancelling to separate the speaker from the environment
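
The beamforming bullet above does not name a method. One common textbook approach is delay-and-sum, sketched here under the assumption that per-microphone delays are already known and are whole numbers of samples; `delay_and_sum` is an illustrative name, not something from the brief.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its delay (in samples) and average.

    delays[k] is how many samples later the target speaker's sound reaches
    microphone k compared with the earliest microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if not channels:
        raise ValueError("need at least one channel")
    if min(delays) < 0:
        raise ValueError("delays must be non-negative")
    # Only keep the samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    output = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output
```

Sound from the target direction lines up after the shift and adds coherently, while sound arriving with other delays is averaged down.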

  14. {Red} One Year
     • Language
       • Upload and focus on gathering information for mission critical languages and dialects (local translators, language database via government, social media, television closed captioning)
       • Understand common phrases or indicators of harm (someone saying “bomb”)
       • Languages will be selected on device before entering mission area (most common languages/dialects in region)
       • User discretion for switching language
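
The "indicators of harm" idea above could be approximated, once speech has been transcribed, by matching against a per-language watch list. A minimal sketch: `WATCH_LISTS`, its entries, and `spot_keywords` are hypothetical, and the approach works on text only, not directly on audio.

```python
import re
from typing import Dict, List, Set

# Hypothetical per-language watch lists; real lists would come from users
# and linguists, not from this sketch.
WATCH_LISTS: Dict[str, Set[str]] = {
    "en": {"bomb", "weapon"},
    "es": {"bomba", "arma"},
}


def spot_keywords(transcript: str, language: str) -> List[str]:
    """Return watch-list words found in the transcript, in order of first appearance."""
    watch_list = WATCH_LISTS.get(language, set())
    words = re.findall(r"\w+", transcript.lower())
    found: List[str] = []
    for word in words:
        if word in watch_list and word not in found:
            found.append(word)
    return found
```

Keying the lists by language matches the plan to select languages on the device before entering the mission area.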

  15. {Red}
     • Pre-Planned Product Improvements for year two to three:
       • User feedback process improvement
       • Upgrade size of devices
       • Enhance range of microphones
       • Continuously collect information for more languages and dialects
       • Bottom Line: stay consistent with program office and user needs

  16. End solution
     • Different modules that could be used based on the situation and available hardware.
       • Mobile phone only
       • Mobile phone + microphone
       • Mobile phone + mic + Jetson

  17. Microphone; Jetson AGX Xavier + Battery + storage
     • Standalone unit for situations where other hardware is not available
