Speech Recognition Grammars

What is a grammar? A grammar consists of rules together with the words or phrases that a speech recognition application can recognize. This section describes the standard grammar formats used by context-free and dictation grammars.

Types of Grammars

There are three types of grammars used in speech recognition applications: context-free, dictation, and limited-domain. Each type uses a different strategy for narrowing the set of sentences it will recognize.

Context-Free Grammar

To improve recognition and to reduce processing demands on the CPU, current speech recognition technology requires a list of all valid sentences that can be spoken. Such a list could include thousands of entries. A major benefit of a context-free grammar is that it allows an engine to reduce those entries to a list of several hundred rules.

After a word is recognized, the context-free grammar uses rules to predict subsequent words. This prediction reduces the set of candidates against which the engine must evaluate the next word. This grammar type is normally used in telephony applications.
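
The prediction step described above can be sketched in a few lines of Python. This is an illustration only, not any engine's grammar format: the GRAMMAR table, its words, and the next_candidates function are hypothetical, and the table is a simplified finite-state stand-in for a real context-free grammar.

```python
# Hypothetical toy grammar (an assumption for illustration): each key is
# the word just recognized, or "<start>", and each value is the set of
# words the grammar allows next.
GRAMMAR = {
    "<start>": {"call", "send"},
    "call": {"fred", "anna"},
    "send": {"mail"},
    "mail": {"to"},
    "to": {"fred", "anna"},
    "fred": {"<end>"},
    "anna": {"<end>"},
}


def next_candidates(words_so_far):
    """Return the words the grammar allows after the given prefix.

    An empty set means the prefix has left every path in the grammar.
    """
    state = "<start>"
    for word in words_so_far:
        word = word.lower()
        if word not in GRAMMAR.get(state, set()):
            return set()  # incorrect word: the path is abandoned
        state = word
    return GRAMMAR.get(state, set())


if __name__ == "__main__":
    print(next_candidates([]))
    print(next_candidates(["send", "mail"]))
```

After "send mail" has been recognized, the engine in this sketch only has to compare the next word against a single candidate rather than the whole vocabulary.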

Dictation Grammar

A dictation grammar defines a context for the speaker by identifying the subject of the dictation, the expected style of language, and what has been dictated in the past. It does not define all the words that can be spoken, nor does it define all possible syntactic structures.

Limited-Domain Grammar

The limited-domain grammar does not provide strict syntactic structures, but does provide a set of words for recognition. This grammar is a hybrid of context-free and full dictation grammars.
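
As a rough contrast with a context-free grammar, a limited-domain grammar can be pictured as a word list with no ordering rules. The sketch below assumes a hypothetical VOCABULARY set and in_domain function; neither comes from a real engine.

```python
# Hypothetical word list (an assumption for illustration). The grammar
# supplies the words but places no constraint on their order.
VOCABULARY = {"send", "mail", "to", "fred", "anna", "call"}


def in_domain(words):
    """Return True if every word belongs to the vocabulary, in any order."""
    return all(word.lower() in VOCABULARY for word in words)


if __name__ == "__main__":
    print(in_domain(["fred", "mail", "send"]))
    print(in_domain(["send", "fax"]))
```

Word order is accepted as spoken; only words outside the list are rejected.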

Parsing Verbal Input

A speech recognition engine accepts verbal input using one of these technologies:

Discrete Speech	Each word must be isolated by a pause before and after the word.
Word Spotting	Only certain key words are recognized in an utterance.
Continuous Speech	All words from a continuous utterance with no discrete pauses are recognized.

The table below demonstrates how each of these technologies influences the way different grammars recognize a phrase.

Technology	Context-free grammar	Dictation or limited-domain grammar
Discrete Speech	A phrase is completed when one path of the grammar is traversed by the isolated words. A path is abandoned when an incorrect word is spoken or the delay between words is too long.	A phrase is sent to the application when the engine isolates the word.
Word Spotting	The grammar paths specify which words are to be spotted and in which sequence. EXAMPLE: If one of the grammar paths is "mail" followed by "Fred", then "Send all of my MAIL to FRED Smith" would complete the path.	Keywords are spotted in the utterance and a phrase is sent when the engine determines that the keyword was spoken.
Continuous Speech	Words can be spoken in a fluid manner and are allowed to blend into one another, although slight pauses are allowed. A path is abandoned when an incorrect word is spoken or the delay between words is too long.	The engine parses the continuous stream, and when it determines a sequence of words has been spoken, it collects them into a phrase and sends them to the application.
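
The word-spotting example in the table ("mail" followed by "Fred") can be sketched as an in-order keyword match over already-recognized text. The completes_path function is hypothetical and assumes the utterance is available as a string; a real engine works on audio, not text.

```python
def completes_path(utterance, path):
    """Return True if the keywords in `path` occur in `utterance` in order.

    Words between the keywords are ignored, as in word spotting.
    """
    # A single shared iterator: each `in` test consumes words up to and
    # including the match, so later keywords must appear after earlier ones.
    words = iter(w.strip(".,!?").lower() for w in utterance.split())
    return all(keyword.lower() in words for keyword in path)


if __name__ == "__main__":
    print(completes_path("Send all of my MAIL to FRED Smith", ["mail", "Fred"]))
    print(completes_path("Send Fred all of my mail", ["mail", "Fred"]))
```

The second call does not complete the path because "Fred" is spoken before "mail", which is the sequence constraint the grammar path imposes.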

Creating Grammars

Grammars are specific to the recognition engine and are created with a grammar compiler provided with the engine. See the documentation provided with the engine for details.