  1. An Unseen Interface :D Creating Speech-driven UI For Your App That Makes Users Happy by Halle Winkler, @politepix http://www.politepix.com

  2. What is a speech-driven UI?

  3. A speech-driven UI uses either speech recognition as an input method, speech synthesis as an information source for the user, or both together. ...but it can also be multi-modal.

  4. How does speech recognition work? The elements of speech recognition are:
     1. An acoustic model
     2. A lexicon
     3. A language model (probability) or a grammar (ruleset for states)
     4. A decoder
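
As a rough illustration of how these four elements fit together (all of the type and function names below are hypothetical, not from OpenEars or any real SDK): the decoder searches for the word sequence that best explains the audio, consulting the other three elements along the way.

     // Hypothetical pipeline types: illustration only, not a real speech SDK.
     struct AcousticModel {}      // maps audio frames to phoneme probabilities
     struct Lexicon {}            // maps each word to its phoneme sequence(s)
     enum WordSource {            // constrains which word sequences are considered
         case languageModel(path: String)  // probability-based, e.g. an ARPA model
         case grammar(rules: String)       // rule-based: a fixed set of legal states
     }

     struct Decoder {
         let acousticModel: AcousticModel
         let lexicon: Lexicon
         let wordSource: WordSource

         // Searches for the word sequence that best explains the audio:
         // phonemes are scored by the acoustic model, pronunciations come
         // from the lexicon, and candidate word sequences are limited and
         // weighted by the language model or grammar.
         func decode(_ audioSamples: [Float]) -> String {
             return "" // a real decoder runs a Viterbi-style search here
         }
     }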

  5. What kind of apps benefit from speech UIs?
     Large vocabulary tasks: server-based, built-in vocabulary (UITextView, Android.speech, Nuance, AT&T, iSpeech)
     • Tasks in which free-form dictation is useful
     • Tasks which relate specifically to language
     Command and control tasks: offline, you generally define the vocabulary (OpenEars or other CMU Sphinx or Julius implementations, some Android.speech devices and OSes)
     • Interfaces where the user is looking somewhere else
     • Interfaces where speech provides a new input or output
     • Interfaces that are more fun with speech
     • Interfaces where it's easier to speak than type
     • Interfaces where it's easier to listen than read
     • Interfaces where a heavy obstacle is removed

  6. Why offline?
     • The interface is always available to your user
     • Speed is as fast as or faster than a network API – and it's quantifiable!
     • Interface design and implementation are simpler and more predictable without an asynchronous network dependency
     • The user is not giving away any of their data

  7. How is a speech UI different from a visual UI? What are the dimensions on which a visual UI is rendered? What are the dimensions on which a speech UI is rendered? A speech UI is rendered on the dimension of time. People value their time exquisitely.

  8. Do people understand each other perfectly all the time? Why not?
     • Accents
     • Lack of shared vocabulary/dialect
     • Noise
     • Distractions
     • Interruptions
     • Hearing difficulties
     • Distance
     • Language errors
     Human speech interactions have frequent comprehension faults. Emotional intelligence makes us incredibly fault-tolerant.

  9. Automated speech recognition is subject to all the same issues as human speech recognition, but without the emotional intelligence

  10. We have to stack the deck in our (users’) favor.

  11. Short is good.
     • Don't bite off more than you can chew – small (read "fast") steps forward mean small (read "fast") steps backward
     • Use keyword detection to launch events
     • Switch between small vocabularies that each relate to one domain (sketched below)
     This results in accuracy, speed, and a large vocabulary!
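
A minimal sketch of that keyword-plus-small-vocabularies pattern, independent of any particular recognizer (the domains, words, and callback shape here are all invented for illustration):

     enum Domain { case home, timer, shopping }

     // Each domain gets its own tiny vocabulary; "home" holds the keywords
     // that launch the others.
     let vocabularies: [Domain: Set<String>] = [
         .home:     ["timer", "shopping", "help"],
         .timer:    ["start", "stop", "reset", "back"],
         .shopping: ["add", "remove", "read list", "back"],
     ]

     var activeDomain = Domain.home

     // Call this with each hypothesis your recognizer produces.
     func handleHypothesis(_ text: String) {
         let word = text.lowercased()
         guard vocabularies[activeDomain]!.contains(word) else { return } // not for us
         switch (activeDomain, word) {
         case (.home, "timer"):    activeDomain = .timer    // keyword launches a domain
         case (.home, "shopping"): activeDomain = .shopping
         case (_, "back"):         activeDomain = .home     // small, fast step backward
         default:                  print("Executing: \(word)") // small in-domain command
         }
     }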

  12. Short is bad.
     • Phonemes are the smallest units of speech
     • Words with few of them have a lot of rhymes
     • Contextless rhyming is our enemy
     • Medium-sized, crunchy granola words are our friends (see the lexicon excerpt below)
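
For a sense of what the decoder sees, here are pronunciations in the style of the CMU pronouncing dictionary, the lexicon format used by the CMU Sphinx family (stress markers omitted). The two-phoneme words are nearly indistinguishable without context; the longer word is not:

     GO       G OW
     NO       N OW
     KNOW     N OW
     GRANOLA  G R AH N OW L AH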

  13. My app, my rules
     • Some apps need to recognize words or phrases in ways that can be expressed by rules
     • Other apps need to be flexible and do probability-based detection
     • There are probability-based language models for expressing this, such as ARPA models
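
As an example of the rule-based option, here is a small grammar in JSGF, one of the rule formats that CMU Sphinx decoders accept (the commands are invented). The probability-based alternative, an ARPA-format n-gram model, instead assigns weights to word sequences seen in training text rather than enumerating the legal ones:

     #JSGF V1.0;
     grammar kitchen;

     // Every legal utterance is fully enumerated by rules: no probabilities.
     public <command> = <action> [ please ];
     <action> = next step | previous step | repeat | read ingredients;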

  14. Out of vocabulary Your app also has to behave well when people aren't speaking to it!
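
One way to behave well here is to treat recognition results as candidates rather than commands: reject anything that scores below a confidence threshold or falls outside the active vocabulary. A minimal sketch (the hypothesis type, confidence scale, and threshold are assumptions to adapt to your recognizer):

     // Hypothetical hypothesis type: adapt to whatever your recognizer reports.
     struct Hypothesis { let text: String; let confidence: Double } // 0.0...1.0 assumed

     let activeVocabulary: Set<String> = ["next", "back", "repeat"]
     let minimumConfidence = 0.7 // tune against real test recordings

     func shouldAct(on hypothesis: Hypothesis) -> Bool {
         // Silently ignore speech that wasn't meant for the app: low-confidence
         // results and words that were never in the active vocabulary.
         return hypothesis.confidence >= minimumConfidence
             && activeVocabulary.contains(hypothesis.text.lowercased())
     }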

  15. Mic distance and vocabulary: the greater the distance between the user and the mic, the smaller the vocabulary your app can reliably recognize

  16. Test, test, test. And obtain appropriate test material.

  17. Case study 1: Recipe App – a natural implementation of offline speech recognition

  18. What are our interface considerations?
     • What are we buying with our time? Hands-free operation, moving locus
     • Hands-free doesn't mean eyes-free! We can provide visual info
     • Operational distance is pretty far
     • Instead of NLP, an offline grammar
     • Secret weapon: we know all the words in a recipe in advance
     • Fault tolerance: one level of complexity, don't confirm; return!
     • Challenges: noise, moving locus, reflection, competing speech
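
A sketch of the fault-tolerance bullet in code (the recipe data and command names are invented): every command is a single cheap, reversible step, so the app acts immediately and lets "previous step" serve as the undo instead of asking for confirmation.

     var stepIndex = 0
     let steps = ["Preheat oven to 200°C", "Mix flour and sugar", "Fold in berries"]

     func handleCommand(_ command: String) {
         switch command {
         case "next step":     stepIndex = min(stepIndex + 1, steps.count - 1)
         case "previous step": stepIndex = max(stepIndex - 1, 0) // the cheap way back
         case "repeat":        break                             // just show it again
         default:              return                            // out of vocabulary: do nothing
         }
         display(steps[stepIndex])
     }

     // Hands-free doesn't mean eyes-free: keep showing the current step.
     func display(_ step: String) { print(step) }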

  19. Case study 2: Marco Polo – a dialog-management tag game: one user checks in at a single location, and the other user receives volume-based speech feedback about their proximity to the target when they say "Marco"

  20. UX Considerations
     • What are we buying with our time: play!
     • For a single word, a language model is fast and sufficient
     • Acoustic environment and OOV are semi-important
     • This is a single-mode interface – an actual dialog manager
     • Extra development time should be put into increasing voice dynamic range
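
A minimal sketch of the volume-based feedback using iOS speech synthesis (AVSpeechSynthesizer and the volume property are real iOS API; the maximum-range constant and the distance input are invented). Mapping proximity onto utterance volume is exactly where extra voice dynamic range pays off:

     import AVFoundation

     let synthesizer = AVSpeechSynthesizer()
     let maximumRange = 500.0 // meters: invented for illustration

     // Closer to the target = louder "Polo".
     func respondToMarco(distanceToTarget: Double) {
         let proximity = 1.0 - min(distanceToTarget / maximumRange, 1.0) // 0 far, 1 on target
         let utterance = AVSpeechUtterance(string: "Polo")
         utterance.volume = Float(0.1 + 0.9 * proximity) // keep a floor so "far" stays audible
         synthesizer.speak(utterance)
     }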

  21. Case study 3: TalkCheater – an app to whisper sweet presentation notes in your ear

  22. UX Considerations
     • What are we buying? Eye contact, moving locus, enhanced human capabilities
     • Is this a speech recognition app?
     • Does this have a visual or a touch interface?
     • The body is the interface
     • Fault tolerance: always important, but most important in a high-value scenario
     • Volume
     • Speaking speed of synthesized speech
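
The last two bullets map directly onto iOS speech synthesis properties; a minimal sketch (the specific volume and rate values are guesses to tune by ear):

     import AVFoundation

     let noteSynthesizer = AVSpeechSynthesizer()

     // Whisper a presenter note: quiet, and a little slower than the default
     // so it doesn't compete with the presenter's own speech.
     func whisper(note: String) {
         let utterance = AVSpeechUtterance(string: note)
         utterance.volume = 0.4                                    // only one ear needs it
         utterance.rate = AVSpeechUtteranceDefaultSpeechRate * 0.8 // slow enough to track
         noteSynthesizer.speak(utterance)
     }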

  23. Talk to me @politepix and the OpenEars forums. I will tell you all the things.
