Using Speech Recognition

Developing an Application with Speech Recognition

Designing a speech recognition application involves a great deal of planning. Users may prefer voice input to touch-tone input, but an interface that is not natural, user-friendly, and reliable will frustrate them.

You must communicate to users that your application is speech-aware and give them commands they can understand. The command sets you provide should also be consistent and complete.

Feeling overwhelmed? Pronexus can help develop your voice recognition application! Contact our sales office for details.

Best Practices for Developing a Speech Interface

Communicating Speech Awareness and Managing User Expectation

Speech recognition is still uncommon enough that users may not realize that an application supports it. Indicate clearly that the application can hear the user or is waiting for the user to speak.

In telephony applications, if you want users to speak clearly and provide concise answers, it is particularly important that they be aware they are talking to a computer.

When users hear that they can speak to their computers, they often think of Star Trek and 2001: A Space Odyssey! They expect that the computer will correctly transcribe and understand every word they utter, and then act upon their input in an intelligent manner.

Clearly convey to users what the application can and cannot do and emphasize that they should speak clearly, using words the application understands.

Let users know that touch tone can still be used. If users assume that the system only handles speech, they may get frustrated and hang up.

EXAMPLE

Depending on the type of voice recognition engine, you may use a prompt such as: "Please press or say ..." or "This system has been designed to respond to specific words and phrases as well as touch tones. If communicating by voice, please listen carefully to the instructions and speak clearly."

The last example tells users that they are talking to a computer, indicates what the system is expecting, and instructs them about how to proceed.
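
The "press or say" pattern can be sketched in a few lines of Python. This is only an illustration, not part of any Pronexus API: the OPTIONS table and the build_prompt and resolve functions are hypothetical names, and the sketch assumes the telephony layer hands the application either a touch-tone digit or the recognized word.

```python
# Hypothetical menu for illustration: touch-tone key -> spoken keyword.
OPTIONS = {"1": "sales", "2": "support"}


def build_prompt(options):
    """Build a prompt that offers both touch tone and speech for every choice."""
    parts = [f"For {word}, press {key} or say {word}." for key, word in options.items()]
    return " ".join(parts)


def resolve(options, dtmf=None, spoken=None):
    """Return the selected keyword from either input mode, or None if unmatched."""
    if dtmf in options:
        return options[dtmf]
    if spoken is not None:
        word = spoken.strip().lower()
        if word in options.values():
            return word
    return None
```

Because both input modes resolve to the same keyword, the rest of the application does not need to know whether the caller pressed a key or spoke.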

Tips for Designing Voice Menus

Include alternative wordings	Determine common variations of a command and include those in the voice menu to allow the user some flexibility. EXAMPLE: Variations of "What time is it?" might include "Tell me the time" and "Give me the time, please". However, do not include too many alternatives or recognition accuracy will be affected negatively.
Use consistent wording and word order	Consistency makes voice commands easier to learn. EXAMPLE: For commands you might always use a verb + noun pattern such as "Say the time" or "Save the file." For questions you might always begin with "How do I" to form questions such as "How do I change the time?"
Avoid similar-sounding words	Most engines have a very high error rate for similar-sounding words in the same state. EXAMPLE: If "go" and "no" are in the same state, the engine may have only about a 50% chance of recognizing the word correctly. Some engines may confuse "cut" with "up" or "on" with "off."
Keep lists short	Most recognition systems break down when more than 100 words are active, so try to keep lists well below this number. EXAMPLE: Allowing the user to address electronic mail to any of 10,000 employees will not work. Recognition can work if you restrict the list to the 50, 75, or 100 employees to whom the user most frequently sends mail.
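
Two of the tips above, alternative wordings and short lists, can be sketched as a small lookup table in Python. The VOICE_MENU contents, the function names, and the MAX_ACTIVE_PHRASES limit are assumptions made for illustration. A real engine would normally take these phrases as a grammar or vocabulary rather than match text after the fact.

```python
# Hypothetical voice menu: each canonical command lists a few accepted wordings.
VOICE_MENU = {
    "tell_time": ["What time is it?", "Tell me the time", "Give me the time, please"],
    "transfer": ["Transfer me", "Transfer my call"],
}

# Assumed ceiling taken from the "keep lists short" tip; tune it for your engine.
MAX_ACTIVE_PHRASES = 100


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_phrase_index(menu, max_phrases=MAX_ACTIVE_PHRASES):
    """Map every accepted wording to its canonical command."""
    index = {}
    for command, phrases in menu.items():
        for phrase in phrases:
            key = normalize(phrase)
            if key in index and index[key] != command:
                raise ValueError(f"phrase {phrase!r} is assigned to two commands")
            index[key] = command
    if len(index) > max_phrases:
        raise ValueError(f"{len(index)} active phrases exceeds the limit of {max_phrases}")
    return index


def lookup(index, recognized_text):
    """Return the canonical command for recognized text, or None if unknown."""
    return index.get(normalize(recognized_text))
```

Checking the size of the index when the menu is built catches an oversized vocabulary during development instead of on a live call.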

Providing Feedback to the User

Whenever a voice command is spoken, give users feedback to verify their input, to indicate that their input was understood and acted upon, or both.

EXAMPLE	Computer:	Who would you like to speak with?
	User:	I would like to speak with John.
	Computer:	Okay, I will transfer you to John.
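
The choice between verifying and acknowledging can be sketched as follows. The sketch assumes the recognition engine reports a confidence score between 0 and 1, which not every engine does. The feedback_prompt function and the confirm_below threshold are hypothetical.

```python
def feedback_prompt(name, confidence, confirm_below=0.6):
    """Choose between verifying the input and acknowledging it.

    confidence is assumed to be a score from 0 to 1 reported by the engine;
    confirm_below is an illustrative threshold, not a recommended value.
    """
    if confidence < confirm_below:
        # Low confidence: verify the input before acting on it.
        return f"Did you say {name}? Please say yes or no."
    # High confidence: tell the user what was understood and what happens next.
    return f"Okay, I will transfer you to {name}."
```

Either way the user hears the recognized name repeated, so a misrecognition can be caught before the call is transferred.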

Breaking Up a Long Series of Numbers

Most engines have a high error rate for a long series of continuously spoken digits. For phone numbers, credit card numbers, or other long series of digits, break the number into groups of four or fewer digits, or ask the user to speak each digit as an isolated word.
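
A minimal Python sketch of the grouping step is shown below. The group_digits function is a hypothetical helper. In practice the application would prompt for, or read back, one group at a time.

```python
def group_digits(number, group_size=4):
    """Split a number into groups of at most group_size digits.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    digits = [ch for ch in number if ch.isdigit()]
    return [
        "".join(digits[i:i + group_size])
        for i in range(0, len(digits), group_size)
    ]
```

For example, group_digits("4111 1111 1111 1111") yields four groups of four digits, and each group can be collected and confirmed separately.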

Compensating for Background Noise

Background noise influences recognition results. If the user is holding another conversation in the room, or is calling on a cell phone from the street, the engine still attempts to recognize the user's speech, but with unpredictable results.

If your application's design makes it likely that the user will hold other conversations nearby, as is common in telephone applications, ensure that the user can shut off speech recognition during those conversations.