Overview
The VerifySpeaker control verifies the caller's identity using Nuance verification. Verification is typically a two-step process.
Step 1: A Training Session is needed where the caller is asked to utter some digits or a text string. The results of these utterances are stored in a Nuance-provided verification database. Each record in the database consists of an application-provided key (called the speaker model) and the speech parameters for a single caller. The speech parameters are generated by the Nuance verification engine during the training session.
Step 2: The caller's identity is established by performing a Verification Session. The application sets the SpeakerModel property and then asks the caller to utter some digits. The Nuance engine compares the speaker against the given SpeakerModel's speech parameters. The result is a positive match (Accept) or a negative match (Reject).
Like the VoiceRec control, the VerifySpeaker control uses a grammar file.
Please note that Nuance Verifier technology is not supported starting in VBVoice 5.4. Pronexus plans to support the next generation of Nuance verification technology once it becomes available.
Building a Verification Session
There are two methods for building the verification session.
In the first method, retrieving the SpeakerModel and recognizing the verification utterances are separate steps, so the verification session requires two steps. The application first establishes the SpeakerModel. Next, the control asks the caller to utter some digits, and then the verification is performed.
However, sometimes a one-step process offers a more user-friendly solution. Consider a voicemail application that asks callers to identify themselves by saying their name. With the two-step method, the application first recognizes the name and locates the SpeakerModel for this caller from the grammar. Next, the application verifies the identity of the caller by asking them to utter some digits.
In the second method, the one-step verification mode, the application still asks callers to identify themselves by saying their name. The VerifySpeaker control then performs recognition on this utterance and fires an event to the application. The application locates the SpeakerModel from the recognition result and allows the VerifySpeaker control to continue and perform verification using the same utterance. See EnhancedVerifySpkr for another example.
These two methods for building verification sessions are described in more detail below.
Two-Step Verification Session
A verification session is selected when the control is entered through the Verify Entry node. A two-step verification session proceeds as follows:
- The Entry event is fired and then the Entry greeting is played.
- The Nuance verification database is opened using the DBRoot, DBName and other database properties. The SpeakerModel is used to select a single record from the database.
- Speech recognition is performed.
- If the utterance is not valid, an InvalidUtterance event is fired. If the utterance is valid, a ValidUtterance event is fired. Retries occur for silence or unrecognized utterances.
- The recognition process is repeated until one of the following conditions occurs:
  - The maximum number of retries is exceeded (in this case, only the number of retries defined in the property pages applies).
  - The Nuance engine is satisfied with the number of valid utterances detected.
  - The caller presses any DTMF key.
- On finishing the verification session, the control exits via the corresponding node. The Nuance database is not modified during a verification session.
The verification database is closed on exit of both sessions.
One-Step Verification Session
To build a one-step verification session:
- Design your call flow by dragging and dropping the VerifySpeaker control onto the frame. Fill in all the required parameters.
- Empty the SpeakerModel property in the Setup tab.
- At runtime, the Entry event is fired and then the Entry greeting is played.
- The Nuance verification database is opened using the database properties. Because SpeakerModel is empty, the control enters the one-step mode.
- Speech recognition is performed.
- A VoiceError event is fired with ErrorType = 24. The Words property contains the recognition result, from which the SpeakerModel can be retrieved. The application must now set the SpeakerModel runtime property.
- The verification process begins, verifying the utterance acquired in the previous step.
- If the utterance is not valid, an InvalidUtterance event is fired. If the utterance is valid, a ValidUtterance event is fired. Retries occur for silence or unrecognized utterances.
- The recognition and verification process is repeated until one of the following conditions is met:
  - The Nuance engine is satisfied with the number of valid utterances detected.
  - The maximum number of retries, which is defined in the Setup property tab, is exceeded.
  - The caller presses any DTMF key.
- On finishing the verification session, the control exits via the node that corresponds to the verification result. The Nuance database is not modified during the session.
To see a demonstration of this method, refer to the VerifySpeakerEnhanced example.
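The application-side part of the one-step mode (reacting to the VoiceError event with ErrorType = 24 and deriving a speaker model from Words) can be sketched as follows. This is only an illustration in Python of the lookup logic; real event code would live in the VBVoice application and would set the SpeakerModel runtime property. The function name, the DIRECTORY table and its keys are hypothetical.

```python
import re

ONE_STEP_ERROR_TYPE = 24  # ErrorType reported by VoiceError in one-step mode (from the text)

# Hypothetical lookup table: recognized name -> application-defined speaker model key.
DIRECTORY = {"john smith": "5551234", "mary jones": "5559876"}


def speaker_model_from_words(error_type, words):
    """Return the speaker model key for a one-step session, or None if unavailable."""
    if error_type != ONE_STEP_ERROR_TYPE:
        return None  # some other voice error; not the one-step notification
    # Pull the quoted recognition text that follows the ***RECOGNIZED token.
    match = re.search(r'\*\*\*RECOGNIZED\s*\d*:\s*"([^"]*)"', words)
    if match is None:
        return None
    return DIRECTORY.get(match.group(1).strip().lower())
```

If the lookup returns None, the application has no speaker model to set, so it would need its own fallback (for example, re-prompting the caller).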
Grammars
See Grammars.
Use of Nuance Resources
This control works with the Nuance recognition engine only. Nuance has separate licenses for speaker verification. The speaker verification capability is controlled on per-package basis.
VBVoice allocates connections to the Nuance engine (the recognition sessions) at system startup time. Depending on the Nuance packages available, the sessions are recognition-only or recognition-and-verification. The sessions are grouped into two pools and then shared between all VBVoice channels. Since there may be more channels than sessions, the sessions are allowed to "float" between channels: they are "grabbed" and "released" by a channel as needed.
When processing a call, a recognition session may be grabbed when the call is first answered or when it is first required by a VBVoice control. Similarly, a session may be released by a VBVoice control or can be kept until the call goes on hook. However, once released, a session may not be allocated again for the same call.
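The grab/release behaviour described above can be modelled with a small toy class. This is a sketch for reasoning about capacity, not VBVoice code: it models a single pool (the text describes two), and the class and method names are invented for illustration.

```python
class SessionPool:
    """Toy model of recognition sessions that float between channels."""

    def __init__(self, size):
        self.free = list(range(size))  # idle session ids
        self.held = {}                 # call id -> session id currently held
        self.released_calls = set()    # calls that already gave their session back

    def grab(self, call_id):
        """Return a session id for the call, or None if none can be allocated."""
        if call_id in self.held:
            return self.held[call_id]
        # Once a call has released its session, it may not get another one.
        if call_id in self.released_calls or not self.free:
            return None
        session = self.free.pop()
        self.held[call_id] = session
        return session

    def release(self, call_id):
        """Return the call's session to the pool before the call ends."""
        session = self.held.pop(call_id, None)
        if session is not None:
            self.free.append(session)
            self.released_calls.add(call_id)

    def hang_up(self, call_id):
        """The call went on hook: free any session and forget the call."""
        self.release(call_id)
        self.released_calls.discard(call_id)
```

With one session and two calls, the second call can only grab the session after the first releases it, and the first call cannot grab again afterwards.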
In summary, the following VBVoice parameters are involved in speaker verification:
- Use Speaker Verification, in the Line Setup tab of the LineGroup property page: turns the special initialization for speaker verification ON/OFF. If this setting is OFF, VBVoice will allocate all sessions with Nuance as recognition-only.
- ASR, in the Startup tab of the vbvFrame property page: turns all speech recognition functionality ON/OFF. This is a convenience function used in debugging, which makes it possible to skip the lengthy initialization sequence of the ASR engines.
- ASR Engines and Nuance Packages, on the ASR tab of the vbvFrame property page: select the type of recognition engine (for speaker verification this has to be Nuance) and the packages to be used. VBVoice automatically checks which packages are capable of speaker verification and tries to initialize the appropriate number of sessions as defined by the parameters below.
- NumberOfEngines, in the [ASR] section of the VBVOICE.INI file (or the application-specific configuration file): defines the total number of sessions (verification-capable and recognition-only) required for the application.
- NumberOfVerifyEngines, in the [ASR] section of the VBVOICE.INI file (or the application-specific configuration file): defines how many of the sessions have to be capable of speaker verification. NumberOfVerifyEngines may be less than the total NumberOfEngines. For applications that do not require verification on all channels, this may mean substantial cost savings on verification licenses.
- Release Engine On Exit, in the VerifySpeaker control: releases a session before the call ends, which makes it immediately available for other channels. However, another session may not be grabbed for the same call.
- AllocEnginePerCall, in the [ASR] section of the VBVOICE.INI file: instructs VBVoice to grab a recognition session when a call is first answered. Otherwise, a session is grabbed when first needed by a VBVoice control.
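As an illustration of how the [ASR] settings relate to each other, the Python sketch below reads a sample configuration and derives how many sessions would be verification-capable and how many recognition-only. The values 8 and 2 are made up for the example, and the helper function is hypothetical; only the section and key names come from the text.

```python
import configparser

# Illustrative values only; the key names are the ones listed above.
SAMPLE_INI = """
[ASR]
NumberOfEngines = 8
NumberOfVerifyEngines = 2
AllocEnginePerCall = 1
"""


def session_plan(ini_text):
    """Summarize the session split implied by the [ASR] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    asr = parser["ASR"]
    total = asr.getint("NumberOfEngines")
    verify = asr.getint("NumberOfVerifyEngines")
    if verify > total:
        raise ValueError("NumberOfVerifyEngines cannot exceed NumberOfEngines")
    return {
        "verify": verify,
        "recognition_only": total - verify,
        # Assumption: a value of 1 switches per-call allocation on.
        "grab_on_answer": asr.getint("AllocEnginePerCall", fallback=0) == 1,
    }
```

For the sample above, two sessions need verification licenses and the remaining six are recognition-only.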
Input Nodes
This control has two input nodes:
Verify Entry | Starts a verification session.
Training Entry | Starts a training session.
Because these inputs must be hard-wired to other controls in the application, developers may choose between two design approaches: to use two controls, one to be used for training and one for verification, or to use one control with a condition control (e.g. INI switch, Get Digit) upon which a decision could be taken to enter the corresponding session.
Exit Nodes
VerifySpeaker exits via the corresponding node in one of the following three cases:
DTMF | When the caller presses any DTMF key, the current training or verification session is immediately terminated.
Accept | Training: The required number of valid training utterances has occurred. The session is saved in the verification database. Verification: The verification engine has heard enough utterances to positively match the caller to the current speaker model.
Reject | Training: The maximum number of retries has been exceeded without the required number of valid utterances occurring. Verification: The verification engine has heard enough utterances to reject a match between the caller and the current speaker model.
Greetings
This control plays three different greetings, which can be set at design time or altered at run time:
Entry Greeting | Played on entering the control and before recognition.
Silence Greeting | Played when silence is detected while recognition is occurring.
Invalid Greeting | Played to inform the caller that the system could not recognize the utterance. In this scenario a limited number of retries is offered before giving up and exiting via the Reject exit node.
VerifySpeaker Control Example
Use of this control is shown in the example VerifySpeaker.
Initial Setup Properties
BeepBeforeSpeech
(Boolean)
If set, the control will play a beep before each recognition attempt. The beep is generated from the BEEP.WAV file and can be modified as required.
ClearDigits
(Boolean)
Clears the digit buffer before the entry greeting is played.
DBAuth
(String)
This property contains the username and password needed to connect to a relational database. The string should be in the username:password format.
DBFormat
(String)
This property contains a string identifying the data types for the database provider. Your database provider should support variable length binary data that can be fetched and written piece by piece. For example, Microsoft SQL Server 7.0 supports IMAGE data types. If you do not specify this option, the data type LONG RAW is used by default, which may not be the right data type for your database provider.
DBName
(String)
This property contains the name of the database, either file system based or Oracle database.
DBProvider
(String)
This property contains a string identifying the database provider. The only supported values are fs for file system and oci for Oracle.
DBRoot
(String)
This property contains a string identifying the database root directory. Used for file system only.
DBServer
(String)
This property contains a string identifying the database alias used to connect to database via network. Used for Oracle only.
DisableHelp
(Boolean)
Set to TRUE to disable the help digit handler. If not set (default), then if a help digit is detected (as defined in the LineGroup control), the call transfers to either the control set in the Connections property page or the LineGroup help digit output. See Help Digit. This property can be set in the Terminations page.
GrammarName
(String)
See GrammarName.
GrammarFile
(String)
This property contains the name of the grammar text file. It is only used at design time to pull in and validate the voice commands. At runtime, this property is ignored.
IBargeIn
(Boolean)
Enables barge-in, which allows the caller to begin speaking while a greeting is being played.
IDoNBest
(Boolean)
(Supported by Nuance only)
When set to TRUE, it enables the N-Best processing, i.e. the recognition engine generates a set of possible recognition results instead of only the best single solution. This capability is offered by the Nuance Systems N-best recognition processing method, which provides a list of possible recognition results, ranked from highest to lowest likelihood.
IMaxKeys
(Integer)
Sets the maximum number of digits to be received before digit collection is terminated. You can override it for a call using the MaxKeys property, which is set in the Terminations page.
IMaxSil
(Integer)
The maximum amount of silence before recognition is terminated.
IMaxTime
(Integer)
The maximum time, in seconds, for an utterance.
INumNBest
(Integer)
(Supported by Nuance only)
Controls how many N-Best results can be generated.
IRecordDirectory
(String)
The directory where valid utterances are to be saved. This setting is used when the developer chooses to record the valid utterances (see IRecordUtterance).
IRecordFilename
(String)
The name of the file containing the valid utterances. This setting is used when IRecordUtterance is checked.
IRecordUtterance
(Boolean)
Enables/disables the recording of valid utterances. This setting allows the application developer to choose to record the valid utterances as they are heard from the caller. This could be used for legal issues as a proof that the caller made the call.
IReleaseEngineOnExit
(Boolean)
When set to TRUE, the Nuance engine is released on exit and can be used by another VoiceRec control on another channel or for a new call. If the engine is released, all subsequent voice recognition and speaker verification requests during the current call will fail.
IRequiredPhrase
(String)
The phrase that is accepted as a valid utterance from the speaker. If this property is the empty string (""), any speech that is not silence or noise is considered valid. Use the Nuance grammar format. For example, to require the caller to say the digits 123, set RequiredPhrase to one two three, or to <num 123>.
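A small helper can build the spoken-digit form of a required phrase from a digit string, as in the "one two three" example above. This is a Python sketch with a hypothetical function name. It assumes the grammar accepts "zero" for 0; a grammar that expects "oh" would need a different word list.

```python
# Assumption: the grammar uses these English digit words.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def digits_to_phrase(digits):
    """Convert a digit string such as "123" to "one two three"."""
    if not digits or any(d not in DIGIT_WORDS for d in digits):
        # An empty phrase has a different meaning (any speech is valid),
        # so refuse to produce one by accident.
        raise ValueError("expected a non-empty string of digits 0-9")
    return " ".join(DIGIT_WORDS[d] for d in digits)
```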
ISpeakerModel
(String)
A key to the caller's speech parameters in the verification database. This key is application-defined. Typically it is a phone number or PIN that is unique to the caller.
ITermDtmf
(Integer)
A digit mask using a combination of any vbvDigitMaskConstants ORed together. When any digit included in this property is received, speaker verification is terminated.
MaskLogDigit
(Boolean)
Hides the collected digits in the VBVLog; the text Protected is logged instead. This feature is intended to be used when collecting security-sensitive information such as passwords and bank account numbers.
MaxRetries
(Integer)
The maximum number of error retries before the error handler is invoked. See also RetryOnSilence. This property can be set in the Terminations property page.
NumRetries
(Integer)
Training: This value plus NumValidTrainingUttrRequired defines the maximum number of utterances asked for. For example, if NumValidTrainingUttrRequired is 5 and NumRetries is 3 then up to 8 utterances will be asked for. The control stops as soon as 5 valid utterances are heard.
Verification: The maximum number of utterances asked for. The control will stop sooner if a positive or negative match against the speaker model is determined by the verification engine.
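The training arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and the simulation only models the counting rule described in the text (stop as soon as the required number of valid utterances is heard, otherwise give up after the maximum number of utterances).

```python
def max_training_utterances(num_valid_required, num_retries):
    """Maximum number of utterances asked for in a training session."""
    return num_valid_required + num_retries


def training_outcome(results, num_valid_required, num_retries):
    """Simulate a training session.

    results is a sequence of booleans, one per utterance, True meaning
    the utterance was valid. Returns "Accept" or "Reject".
    """
    limit = max_training_utterances(num_valid_required, num_retries)
    valid = 0
    for ok in list(results)[:limit]:
        if ok:
            valid += 1
            if valid == num_valid_required:
                return "Accept"  # stop as soon as enough valid utterances are heard
    return "Reject"
```

With NumValidTrainingUttrRequired = 5 and NumRetries = 3, at most 8 utterances are requested, so four invalid utterances make a successful session impossible.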
NumValidTrainingUttrRequired
(Integer)
Training: The number of utterances that must be detected successfully in a Training Session in order for the training session to be considered successful.
Verification: Ignored.
RetryOnSilence
(Boolean)
Decides if silence is considered a speech error or not. When it is TRUE the control repeats on detection of silence up to MaxRetries. A FALSE value makes an error on silence detection a terminating condition.
UseDefaultError
(Boolean)
If UseDefaultError is set to False, two additional outputs are added to the control: Invalid and Silence.
These outputs can be used to override the normal error handling for these conditions. When one of these conditions occurs, the call will move to the control connected to that output. This performs an equivalent function to the Invalid Digit and Silence handlers set in the Connections property page (NoDigitErrorControl and InvalidErrorControl properties), but also provides visual representation on the form. This property can be set in the Terminations property page.
Runtime Properties
DoNBest
(Channel as Integer)
(Supported by Nuance only)
Enables or disables N-Best processing at runtime.
GotoNode
See GotoNode.
Grammar
See Grammar.
NuanceGrammar
(Channel as Integer)String
(Supported by Nuance only)
Set the grammar to be used for recognition. The default value is the GrammarName property.
NumNBest
(Channel as Integer)Integer
(Supported by Nuance only)
Specifies number of N-Best results generated.
NumValidUtterances
(Channel as Integer)Integer
Read-only. Returns the number of successfully detected utterances during the current session.
RequiredPhrase
(Channel as Integer)String
Sets and gets the required phrase. See IRequiredPhrase. An application can ask for a different phrase on each utterance by changing this property in the ValidUtterance and InvalidUtterance events.
RecordUtterance
(Channel as Integer)Integer
(Supported by Nuance only)
This property enables or disables the recording of recognized utterances.
SpeakerModel
(Channel as Integer)String
Sets and gets the speaker model that will be used in the current session. It must be set before the session begins.
TermDtmf
(Channel as Integer)String
The digit that will be used to allow exit of the control while maintaining position in the queue.
Words
(Channel as Integer) String
(Supported by Nuance only)
This property contains the words recognized by the control. The property contains each recognized word separated by spaces. The format of the recognition results is discussed below.
The recognition result follows the ***RECOGNIZED: token as a string delimited by quotes, followed by the confidence score in parentheses, for example (Conf=74 ). If there are several recognition results, the tokens will be ***RECOGNIZED00:, ***RECOGNIZED01: and so on.
The natural language recognition result follows the ***INTERPRETATION: token as a string. If there are several recognition results, the tokens will be ***INTERPRETATION00:, ***INTERPRETATION01: and so on.
EXAMPLE |
***RECOGNIZED: "two seven one eight nine" (Conf=74 )***INTERPRETATION: {<digits (2 7 1 8 9)>} |
If the control exits via the DTMF node, Words will contain the digit that terminated the recognition.
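The Words format described above lends itself to a small parser. The following Python sketch extracts the recognized text and confidence score for each result. It is written against the examples in this section only; the function name is hypothetical, and the pattern deliberately tolerates both the numbered forms (***RECOGNIZED00:) and the spaced forms (***RECOGNIZED 0:) that appear in the examples.

```python
import re

# Matches: ***RECOGNIZED[index]: "text" (Conf=NN )
_RESULT = re.compile(
    r'\*\*\*RECOGNIZED\s*(\d*)\s*:\s*"([^"]*)"\s*\(Conf=\s*(\d+)\s*\)'
)


def parse_recognized(words):
    """Return a list of dicts with index, text and confidence per result."""
    return [
        {
            "index": int(idx) if idx else 0,  # no index means a single result
            "text": text,
            "confidence": int(conf),
        }
        for idx, text, conf in _RESULT.findall(words)
    ]
```

A Words value that holds only a terminating DTMF digit contains no ***RECOGNIZED token, so the function returns an empty list in that case.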
(Slot-based confidence scoring)
The Nuance System is capable of generating confidence scores on a per-slot basis within a recognition result. This allows you to more closely analyze recognition results and handle any necessary error checking or re-prompting more naturally and efficiently. Slot-based confidence scoring lets you more closely identify the portions of a phrase that were likely to be accurately (or inaccurately) recognized.
To enable slot-based confidence scoring, you must set the rec.GenSlotConfidence=TRUE parameter in your RecServer initialization string.
EXAMPLE |
recserver -package d:\nuance\(...)\banking1 rec.GenSlotConfidence=TRUE rm.Addresses=localhost lm.Addresses=localhost |
Or you can add rec.GenSlotConfidence=TRUE to your Nuance-Resources.site file.
If you do not set rec.GenSlotConfidence=TRUE in either your Nuance-Resources.site file or the RecServer process, no slot-based confidence results will be generated.
You must also design your grammars so that each slot is filled by a single subgrammar. See banking1.grammar in the Nuance sample package for an example of this design.
Once slot-based-confidence is enabled, a typical recognition result for "transfer five hundred dollars from my checking to my savings" may look like this:
***RECOGNIZED: "transfer five hundred dollars from my checking to my savings" (Conf=77 )***INTERPRETATION:{<amount 500> <command-type transfer> <destination-account savings> <source-account checking>}***SLOTCONFIDENCE:{<amount 79> <command-type 81> <destination-account 65> <source-account 79>}
In this particular case, the overall confidence is 77.
The slot-based confidence scores are 79 for the amount slot ("500"), 81 for the command-type slot ("transfer"), 65 for the destination-account slot ("savings"), and 79 for the source-account slot ("checking").
If the DoNBest property is set, the recognition result for "withdraw five hundred dollars from my checking account" may look like this:
***RECOGNIZED 0: "withdraw five hundred dollars from my checking account" (Conf=64 )***INTERPRETATION 0:{<amount 500> <command-type withdraw> <source-account checking>}***SLOTCONFIDENCE 0:{<amount 66> <command-type 73> <source-account 63>}***RECOGNIZED 1: "withdraw five hundred dollars to my checking account" (Conf=64 )***INTERPRETATION 1:{<amount 500> <command-type withdraw> <destination-account checking>}***SLOTCONFIDENCE 1:{<amount 66> <command-type 73> <destination-account 56>}
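To act on slot-based confidence, an application needs the per-slot scores as data. The Python sketch below pulls them out of a Words string in the format shown above and lists the slots that fall under a threshold. The function names and the threshold are hypothetical; choosing a suitable cut-off is up to the application.

```python
import re

# Matches: ***SLOTCONFIDENCE[index]:{<slot score> <slot score> ...}
_SLOTS = re.compile(r'\*\*\*SLOTCONFIDENCE\s*(\d*)\s*:\s*\{([^}]*)\}')
_PAIR = re.compile(r'<([\w-]+)\s+(\d+)>')


def slot_confidences(words):
    """Return one {slot: score} dict per recognition result, in order."""
    return [
        {name: int(score) for name, score in _PAIR.findall(body)}
        for _idx, body in _SLOTS.findall(words)
    ]


def weak_slots(scores, threshold):
    """Return the names of slots scoring below threshold, sorted by name."""
    return sorted(name for name, score in scores.items() if score < threshold)
```

For the transfer example above, a threshold of 70 would single out destination-account (65) as the slot worth re-prompting for.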
Greetings
EntryGreeting
This greeting is played on entering the control and before each recognition.
SilenceGreeting
This greeting is played if silence is detected when the caller is expected to speak.
UnrecognizedGreeting
This greeting is played when the Nuance engine detects speech that is not recognized, such as noise.
Methods
DeleteSpeakerModel
(Integer, String)
Deletes the specified speaker model from the verification database. This method is not required for training or verification sessions. It is useful if the application wishes to un-enroll a caller from speaker verification.
SpeakerModelExists
(Integer, String)Boolean
Checks for the existence of the specified speaker model in the verification database. It returns TRUE if the speaker model exists.
TakeCall
This method allows the programmer to override the graphical connections and transfer a call to any other control. See TakeCall.
Events
Disconnect
See Disconnect Event.
Enter
This event is fired upon entering the control from the Verification entry only.
Exit
See Exit Event.
InvalidUtterance
(Integer, vbvSpkrVerAction)
Fired every time the Nuance engine fails to recognize a non-silence period. Action is a returned command with one of the following accepted values:
- vbvSpkrVerAbort: Tells VBVoice to abort Training or Verification.
- vbvSpkrVerContinue: (Default) Continues as usual.
- vbvSpkrVerStopAndSave: Stops the operation and saves.
NoLicenseAvailable
See NoLicenseAvailable.
PhraseError
See PhraseError.
PlayRequest
See PlayRequest.
ValidUtterance
(Integer, String, vbvSpkrVerAction, Integer)
Fired on each successfully detected utterance by the Nuance engine. Utterance is the current detected utterance. To alter the behaviour of the control, Action can be set by the event code to one of the following values:
- vbvSpkrVerAbort: Abort the current session (and exit out the Reject exit node).
- vbvSpkrVerContinue: (Default) Continue as usual.
- vbvSpkrVerStopAndSave:
  - Training: Stop the training session and consider it successful. The verification database is updated.
  - Verification: Stop the verification session and consider it a positive verification.
To have the verification engine ignore this utterance, set Ignore in the event code to TRUE. The utterance will be ignored and not used as training or verification data.
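The kind of decision a ValidUtterance handler makes can be sketched in Python as a function that returns an action and an ignore flag. This is not VBVoice event code: the enum values are placeholders that only mirror the constant names above, and the policy (stop after a fixed number of utterances, ignore certain words) is an invented example.

```python
from enum import Enum


class SpkrVerAction(Enum):
    # Names mirror the vbvSpkrVerAction constants; the values are placeholders.
    ABORT = "vbvSpkrVerAbort"
    CONTINUE = "vbvSpkrVerContinue"
    STOP_AND_SAVE = "vbvSpkrVerStopAndSave"


def on_valid_utterance(utterance, valid_count, enough=3, ignored=("operator",)):
    """Return (action, ignore) for one valid utterance.

    valid_count is the number of valid utterances heard so far, including
    this one. enough and ignored are hypothetical policy settings.
    """
    if utterance.strip().lower() in ignored:
        # Keep going, but do not use this utterance as training or verification data.
        return SpkrVerAction.CONTINUE, True
    if valid_count >= enough:
        return SpkrVerAction.STOP_AND_SAVE, False
    return SpkrVerAction.CONTINUE, False
```

In a real handler the two return values would correspond to setting the Action and Ignore arguments of the event.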
VoiceError
See VoiceError Event.
VerifySpeaker Setup Property Page
Beep before speech
(BeepBeforeSpeech property)
If this box is checked, a beep will be played prompting the caller to speak either for Verification or Training.
Grammar To Load
(GrammarName property)
This field provides the name of the grammar to load. If no name is set, the NuanceGrammar property must be set by code at runtime.
Speaker Model
(ISpeakerModel property)
This field contains the name of the property of a control that provides the speaker model to be verified in the current session.
Required Phrase
(IRequiredPhrase property)
This field contains the phrase that is expected to be vocalized (e.g. one two three). An empty string means that any word can be valid.
Required Number of Training Utterances
(NumValidTrainingUttrRequired property)
The number of valid utterances required for a Training session to be considered successful.
Maximum Number of Retries
(NumRetries property)
The number of retries on invalid utterances.
VerifySpeaker Terminations Property Page
Use default error handler
(UseDefaultError property)
This check box is set by default. When the maximum number of retries has been exceeded, the system checks for an error handling control or a connection on the LineGroup error output. If neither is found, the ERROR.WAV file is played and the system hangs up. If this check box is not set, two new outputs appear on the control: Invalid and Timeout. These outputs can be connected to other controls to override the default error handler.
See Global Events.
Retry on silence
(RetryOnSilence property)
If this box is checked, both silence timeout and unrecognized utterances are retried up to MaxRetries. If this box is unchecked, then a silence timeout will terminate the session.
Clear digits on entry
(ClearDigits property)
Check this box if you want to clear all previously collected digits from the VBVoice digit buffer. Because this control exits on any DTMF, if the control is entered with a DTMF digit in the buffer and this box is unchecked, it exits immediately.
Termination Conditions
Maximum silence
(IMaxSil property)
This field specifies the number of seconds that VerifySpeaker will wait for an utterance before it considers it as silence.
Number of retries on error
(MaxRetries property)
This field specifies the number of unrecognized utterances or silence errors that can occur before VerifySpeaker passes the call to the Reject node or invokes the default error handler.
Maximum time for speech
This field specifies the maximum time for an utterance. After this time the control will stop listening and attempt to analyze the speech heard up to this point. Zero means there is no maximum time. This setting matches the property IMaxTime.
VerifySpeaker Nuance Property Page
Barge-in
(IBargeIn property)
If checked, the prompts will be stopped when the speech starts. This checkbox sets the IBargeIn parameter.
Release engine on exit
(IReleaseEngineOnExit property)
If checked, the Nuance verification engine is released on exit from the control. Release the engine when leaving the last control that uses Nuance, once no more Nuance operations are needed during the call. For best performance, the engine should be allocated for the duration of the call (AllocEnginePerCall=1 in vbvoice.ini, Nuance section) and released when no more verification is needed.
IMPORTANT: If a recognition operation using Nuance follows the VerifySpeaker control, leave this box unchecked. Only the last control that uses a Nuance engine in a call should be set to release the engine.
Record recognized utterances
(IRecordUtterance property)
If checked, it will enable the recording of valid utterances.
DoNBest
(IDoNBest property)
If checked, N-Best processing is enabled. The Nuance N-Best recognition processing method provides a set of possible recognition results, ranked from highest to lowest likelihood.
Number of results
(INumNBest property)
If DoNBest property is checked, this field sets the number of N-Best results to be returned.
Recording directory
(IRecordDirectory property)
This field sets the directory where the valid utterances are to be saved.
Recording filename
(IRecordFilename property)
This field sets the name of the file containing the valid utterances.
VerifySpeaker DB Setup Property Page
DB Provider
(DBProvider property)
This property contains a string identifying the database provider. The supported values are:
- fs for file system
- oci for Oracle
- odbc for Microsoft SQL Server 7.0
DB Name
(DBName property)
This field contains the name of the database, either file system based or relational database used for the Nuance Dynamic Grammar.
DB Server
(DBServer property)
This field contains the name identifying the database alias used to connect to database via network. Used for relational database ODBC or Oracle only.
DB Auth
(DBAuth property)
This field contains the username and password needed to connect to a relational database. The string should be in username:password format.
DB Format
(DBFormat property)
This property contains a string identifying the data types for the database provider. Your database provider should support variable length binary data that can be fetched and written piece by piece.
For example, Microsoft SQL Server 7.0 supports IMAGE data types. If you do not specify this option, the data type LONG RAW is used by default, which may not be the right data type for your database provider.
DB Root
(DBRoot property)
This field contains the name identifying the database root directory. Used for File System only.