Speech Recognition
Setup - Windows Vista/7/8
Disable Global Speech Recognition
Speech Recognition is a feature of Microsoft Windows Vista, Windows 7 and Windows 8, and it can be used to control many software packages. Because the VV Simulator allows only specific instructions to be recognised, it is important that the global dictionary be turned off.
The Vista/7/8 Speech Interface Panel appears on the screen like this if the system is selected ON.
Speech Interface Panel
If it is not in view, open it in Vista from Start > Control Panel > Classic View > Speech Recognition Options > Start Speech Recognition.
Or, in Windows 7/8 from: Control Panel > Ease of Access > Speech Recognition > Start Speech Recognition.
Right-click on the Speech Interface Panel to open this window and select “Off: Do not listen to anything I say”:
Be sure you have turned the system off, as shown below:
The Speech Recognition will now be selectively turned on when the Press-to-talk button is pressed in the VV Simulator, but remain off at all other times. If speech is required for other purposes on this computer the above procedure will need to be reversed.
Set up a Speech Profile for a new User
Open Speech Properties:
Vista: Start > Control Panel > Classic View > Speech Recognition Options > Advanced Speech Options.
Windows 7/8: Control Panel > Ease of Access > Speech Recognition > Advanced speech options
Check that the language selected is as shown below: Microsoft Speech Recognizer 8.0 for Windows (English US or UK as appropriate). If that version is not available in the drop-down menu select one of the other English engines.
Click New… to add yourself as a New User of the Speech Recognition program. This window opens:
Type your name in the Add a profile window without spaces between names. This name should match your Visual Vectoring user name. Clicking OK will close the window.
Observe the Microphone dialog. Ensure the headset you will be using in simulation is plugged into the computer and carry out the Configure Microphone procedure; it is very brief. If no audio is reaching the system investigate the situation under the Audio Input dialog:
The Configure Microphone procedure can be performed now. You will be asked to read a short sentence:
The system uses this sentence to set the volume level it receives from the microphone, so you should have the microphone positioned as you will be using it for simulation. It should be below and to one side of your mouth and about 30 mm from it. Do not have it directly in front of your mouth where you would be breathing directly into it. Careful attention to microphone positioning can greatly assist recognition performance.
Your name should now appear in the Recognition Profiles list and be checked as the current user:
Clicking Apply makes you the user of the system, and you will be assigned the default recognition profile. It is a matter now of training that profile for your voice; that need not be done until you first need to use the simulator.
Note that the Text To Speech tab allows configuration of the Speech Synthesis voices, which are used in the VV Simulator to provide aircraft replies. You should not need to refer to it.
Train the Speech Engine
This procedure can be accessed from the Speech Properties window (shown above) in Windows 7.
If you have not yet carried out the Configure Microphone procedure you should do that now.
In training the speech recognition, the following dialog will appear:
Click Next to commence your training. This window (or similar) will open:
The training script is quite lengthy, so you may need to pause at times. It is advisable to repeat the training, particularly if English is not your native language. Your Speech Profile is refined with each re-read of the script, not replaced with a new one. So, the more the system is trained and used the better the recognition performance will be.
If the training has proceeded satisfactorily, you are ready to use the VV Simulator. Do not expect perfect results at first; it takes time to learn what to say and how to say it.
In simulation, be aware that the Speech Recognition is not programmed to understand general English. It will recognise only the groups of words it has been programmed to interpret as an instruction to an aircraft. So, you MUST SAY THE CORRECT WORDS or you will not be understood. You will be taught those words and phrases in the lessons.
Also, be aware that a reply of ‘Say again’ from the simulator is not necessarily an indication that you have not been understood. It is often a query of your instruction to the aircraft - the simulator has detected an irregularity in it. In this way, it acts like a human pilot. A failure to recognise an instruction is indicated by its failure to appear in the Phrase Recognition Display at the top of the radar screen.
Setup - Windows XP
Add a New User
Open the Speech Properties User Interface (UI) (see below). In Windows XP this is at Start > Control Panel > Speech.
Check that the language selected is as shown: Microsoft English Recognizer v5.1. If that version is not available in the drop-down menu select one of the other English engines.
Click New to add yourself as a new user of the Speech Recognition program. This window will open:
Type your name in the Profile window being careful not to put a space before it. This is very easy to do if the cursor is placed in the window prior to typing.
Commence typing direct into the window; the space for the first letter is highlighted. Please note that this name will be recorded as part of the scoring system. There is no necessity to describe your environment or microphone type.
You can click Next and carry out the speech training now if you wish. However, it is probably best to record your name into the system and first calibrate the microphone.
Clicking Finish will record your name as a user of the system. You will be assigned the default speech profile and you will be returned to the UI window:
Your name should be highlighted and checked as the current user. Clicking Apply completes the establishment of your User Profile, even though the Speech Engine has not yet been trained to recognise your voice. The default settings will be loaded as your profile until you complete the training.
The Settings button will open a window giving access to adjustments to the operation of the Speech Recognition program. The default settings should not need to be altered, but feel free to experiment if you continue to experience poor performance after reasonable usage.
The Text To Speech tab allows configuration of the Speech Synthesis options. You should not need to refer to it.
Before leaving this window, plug your headset microphone into the computer and see that there is activity in the Microphone Level window as you speak. This shows that the microphone is operating. If no activity is observed, click Audio Input to investigate the input source. Be sure it is the headset microphone that is providing the audio; laptop computers have a built-in microphone that will not be satisfactory for simulation. Training the Speech Engine will not be possible if it is not receiving suitable quality audio from the microphone. Now proceed to microphone calibration.
Calibrate the Microphone
Clicking Configure Microphone in the Speech Properties UI will open the dialog window below.
Click Next. You will be given a phrase to say that will allow the system to adjust the volume setting on the microphone.
Train the Speech Engine
Clicking Train Profile in the Speech Properties UI will open the dialog below:
You will be presented with a script that may be an excerpt from a novel, or some specialised text related to Air Traffic Control.
As you read the numbers they will be inverse-highlighted like this. As the end of each line or paragraph is reached the next will appear. Overall progress is shown in the Training Progress section of the window.
If you reach the end of the script and the words are not being recognised reliably, repeat the training. If you have difficulty with individual words, use the Skip Word facility and move on. Re-doing the training refines your speech profile; it does not create a new one. So, the more the system is trained and used the more reliable it becomes.
The training script is quite lengthy, so you may need to pause at times. It is advisable to repeat the training, particularly if English is not your native language. Your Speech Profile is refined with each re-read of the script, not replaced with a new one. So, the more the system is trained and used the better the recognition performance will be.
If the training has proceeded satisfactorily, you are ready to use the VV Simulator. Do not expect perfect results at first; it takes time to learn what to say and how to say it.
In simulation, be aware that the Speech Recognition is not programmed to understand general English. It will recognise only the groups of words it has been programmed to interpret as an instruction to an aircraft. So, you MUST SAY THE CORRECT WORDS or you will not be understood. You will be taught those words and phrases in the lessons.
Also, be aware that a reply of ‘Say again’ from the simulator is not necessarily an indication that you have not been understood. It is often a query of your instruction to the aircraft - the simulator has detected an irregularity in it. In this way, it acts like a human pilot. A failure to recognise an instruction is indicated by its failure to appear in the Phrase Recognition Display at the top of the radar screen.
General Tips for Speech Recognition
Delivery
Audio
Speech Settings
Speech Synthesis
The VV Simulator makes use of the Windows Speech Synthesis module for generation of aircraft and controller voices. The voice used for each aircraft is randomly generated from suitable voices available on the system.
Speech synthesis may be disabled in the Speech Tab of the Setup Window.
The rate and volume of synthesised speech is also adjustable there. Click DEF to return to default values.
PTT Clear Lockups
Some soundcards occasionally exhibit an issue whereby the speech synthesis module causes a lockup of the speech recognition module. When this occurs, the Radio Call Light will remain yellow, even when the aircraft has finished speaking. Subsequently the speech recognition will appear not to work.
Selecting this option in the Speech Tab of the Setup Window will mean that a press of the PTT (Press-to-talk) button will clear the lockup. For most systems, there is no need to have this selected.
PTT Mute Radio
Selecting PTT Mute Radio in the Speech Tab of the Setup Window will mean that any incoming radio call will be muted when the PTT (Press to Talk) button is depressed.
This mimics most operational environments where only one party can be speaking on the radio at any time.
Auto Radio Off
The Auto Radio Off function (located on the Speech Tab of the Setup Window) will automatically turn off speech recognition a short period after the Press-to-talk button is released. Enabling this function removes a processing overhead from the computer, sometimes resulting in better performance.
For most computers where speech recognition accuracy is high, this function will not be of importance. As a default, this function should be left enabled.
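The Auto Radio Off behaviour described above can be pictured as a small timer that runs from the moment the Press-to-talk button is released. The following is a minimal sketch only, not the simulator's own code: the class name `RadioGate`, the 2-second delay and the behaviour when the option is disabled (recognition simply stays on) are all assumptions made for illustration.

```python
from typing import Optional


class RadioGate:
    """Tracks whether speech recognition is listening (illustrative only)."""

    def __init__(self, off_delay_s: float = 2.0, auto_radio_off: bool = True):
        # off_delay_s is a hypothetical value; the manual says only "a short period".
        self.off_delay_s = off_delay_s
        self.auto_radio_off = auto_radio_off
        self.listening = False
        self._released_at: Optional[float] = None

    def press_ptt(self) -> None:
        # Pressing PTT turns recognition on and cancels any pending switch-off.
        self.listening = True
        self._released_at = None

    def release_ptt(self, now: float) -> None:
        # Remember when the button was released; recognition stays on for now.
        self._released_at = now

    def tick(self, now: float) -> None:
        # Called periodically; switches recognition off once the delay has elapsed.
        if not (self.auto_radio_off and self.listening):
            return
        if self._released_at is not None and now - self._released_at >= self.off_delay_s:
            self.listening = False
```

With the option enabled, recognition stops listening once `tick` is called at least `off_delay_s` seconds after the release; with it disabled, this sketch leaves recognition on.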
Radio Always On
When this feature is selected in the Speech Tab of the Setup Window, the computer is always listening for transmissions for speech recognition purposes.
This is not an ideal mode of operation, but may be necessary when speech performance is otherwise unacceptable, particularly when lag time is high.
Emulate SR
When the Speech Recognition module analyses a spoken phrase, it forms a series of hypotheses with associated probabilities of correctness for what was spoken. At completion of the process, no hypothesis may be acceptable, so a negative result is returned.
It has been found that, above a certain probability level, the hypothesis with the highest probability is actually the correct phrase. If Emulate SR is selected, this hypothesis is selected as the phrase and the aircraft will respond accordingly. This can improve speech recognition performance markedly.
For some soundcards, emulated recognitions cause a lockup in the computer. This is evidenced by the whole program stalling, sometimes for up to 30 seconds. If this event is experienced, Emulate SR should be disabled in the Speech Tab of the Setup Window.
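The selection rule behind Emulate SR can be illustrated with a short sketch. It is not the simulator's implementation: the function name `pick_phrase`, the representation of hypotheses as (phrase, probability) pairs and the threshold of 0.6 are assumptions, since the manual does not state the actual level used.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold; the manual says only "above a certain level".
EMULATE_THRESHOLD = 0.6


def pick_phrase(
    hypotheses: List[Tuple[str, float]],
    accepted: Optional[str],
    emulate_sr: bool,
    threshold: float = EMULATE_THRESHOLD,
) -> Optional[str]:
    """Return the phrase to act on, or None for a negative result."""
    # A phrase accepted by the recogniser is always used as-is.
    if accepted is not None:
        return accepted
    # Without Emulate SR, a rejected utterance stays rejected.
    if not emulate_sr or not hypotheses:
        return None
    # With Emulate SR, fall back to the most probable hypothesis,
    # but only if its probability reaches the threshold.
    best_phrase, best_prob = max(hypotheses, key=lambda h: h[1])
    return best_phrase if best_prob >= threshold else None
```

For example, with Emulate SR selected, a rejected utterance whose best hypothesis has a probability of 0.7 would still be passed to the aircraft, while one whose best hypothesis scores 0.3 would not.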
Audio Source
The Audio Source dropdown in the Speech Tab of the Setup Window selects the source of the speech recognition and synthesis streams. For some systems this selector will be disabled. See the section on setting up for information on how to do this.
Non-standard Utterances
Settable in the Speech Tab of the Setup Window, this will result in aircraft transmissions including random utterances such as "ok" and "roger".
Whilst this adds a degree of realism to the simulation, most training is generally completed with it disabled.
Troubleshooting Speech
Step 1. Is Speech Recognition correctly enabled?
In typical installations, the VV ATCX™ simulator achieves Speech Recognition accuracy in excess of 98%. If you are achieving much less than this rate, try the following:
If you are achieving no level of recognition at all, it is possible that the Speech Recognition has not been enabled, has not installed correctly, or that the microphone has not been correctly connected to the computer. That should have become evident in the set-up process. If there is a problem with Speech, and you are not able to train the speech engine as described in that section, it may be necessary to reinstall the VV Simulator. Contact us for assistance.
For Windows Vista and Windows 7/8, some special problems may occur. Because Speech Recognition is included as part of the operating system, some users may have the feature turned on for communication with other software. It is important that Speech Recognition be disabled prior to starting the VV Simulator. Review the section on disabling global speech recognition to ensure that has been done.
Step 2. Determine what phrases are not being understood
If the Speech Engine has been trained correctly and you are getting inconsistent results, consider the words you are actually saying. The high level of accuracy achievable by the VV Simulator is due to limiting the allowable phrases to those that are absolutely necessary. These phrases are listed in Appendix A. Any other phrase will not be understood – or, worse, may be misunderstood for something else.
Experienced controllers in particular often have problems because they lapse into phraseology that they use every day at work.
Be aware that the simulator is not programmed to recognise general English, but only the standard phrases taught by the course.
Step 3. Determine what the Speech Engine is hearing
It is important for good accuracy that the Speech Engine hears what you are saying without background noise and electrical interference.
A good way to determine the clarity of your transmissions is to record yourself and play back the transmission. The Sound Recorder program (usually under Start > Programs > Accessories > Entertainment) is good for this purpose.
Remember to place the microphone below and to the side of your mouth.
Ensure that there is no discernible background noise or buzz, and that your voice is clear and distinct. The elimination of electrical noise and the improvement of reception may be achieved by the use of a noise-reduction microphone. Soundcards in laptop computers are commonly affected by higher levels of interference. Often a USB microphone will help in these cases.
If you are using the correct phrases, and still experiencing poor recognition, consider slowing your rate of speech. Once again, this is something that involves some discipline, especially for rated controllers who are used to delivering instructions at high speed.
Step 4. Retrain the Speech Engine
If slowing your rate of speech makes no difference, and the recording of your voice is clear and distinct, try spending some time training the engine. Half an hour’s extra training should result in discernible improvement.
Step 5. Use Radio Always On
For some soundcards, there is a lag in activating the soundcard for speech recognition. This means that the start of the transmission is not heard, and recognition performance is degraded.
Information on this lag is available in the section on the Setup Window.
If the average lag exceeds about 150 milliseconds, performance should improve by selecting the Radio Always On feature in the Speech tab of the Setup Window.
The use of a USB microphone or USB adaptor for a standard audio microphone can result in significant reduction in lag time and improved recognition.
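The rule of thumb in this step can be written as a short check. This is a sketch under stated assumptions: the function name and the list of lag samples are hypothetical, and the 150 ms limit is taken from the "about 150 milliseconds" figure above rather than from any exact setting in the simulator.

```python
from statistics import mean
from typing import Sequence

# "About 150 milliseconds" per the text; treated here as a fixed limit.
LAG_LIMIT_MS = 150


def recommend_radio_always_on(lag_samples_ms: Sequence[float]) -> bool:
    """Suggest Radio Always On when the average lag exceeds the limit."""
    if not lag_samples_ms:
        # No measurements yet, so there is no basis for changing the setting.
        return False
    return mean(lag_samples_ms) > LAG_LIMIT_MS
```

For instance, lag readings of 120, 200 and 190 milliseconds average 170 milliseconds, so this check would suggest selecting Radio Always On.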
Understood Phrases
General Format
The Speech Engine does not understand ‘general English’. It is programmed to understand only the precise control phrases needed to perform the VV exercises contained within the course for which the simulator is being used. So, the correct instructions must be used at all times; these are taught in the instruction modules.
The phrase structures are in the following format:
Level Phrases
Phrase Structure | Examples
(callsign) CLIMB/DESCEND [TO] (level) | “Cathay One Hundred, descend to flight level three two zero”; “X-ray Golf Delta, climb (to) niner thousand (feet)”
(callsign) MAINTAIN (level) | “American one three zero, maintain six thousand”
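To show how a fixed phrase structure differs from general English, the two level phrase structures can be approximated with a regular expression. This is an illustrative sketch only and not the grammar used by the Speech Engine: the pattern, the function name `parse_level_phrase` and the returned dictionary keys are assumptions, and the callsign and level are kept as plain text rather than validated.

```python
import re
from typing import Dict, Optional

# (callsign) CLIMB/DESCEND [TO] (level)  or  (callsign) MAINTAIN (level)
LEVEL_PHRASE = re.compile(
    r"^(?P<callsign>.+?),?\s+"
    r"(?:(?P<change>climb|descend)\s+(?:to\s+)?|(?P<hold>maintain)\s+)"
    r"(?P<level>.+)$",
    re.IGNORECASE,
)


def parse_level_phrase(text: str) -> Optional[Dict[str, str]]:
    """Split a level instruction into its parts, or return None if it does not fit."""
    match = LEVEL_PHRASE.match(text.strip())
    if match is None:
        return None
    action = match.group("change") or match.group("hold")
    return {
        "callsign": match.group("callsign"),
        "action": action.lower(),
        "level": match.group("level"),
    }
```

A phrase that fits the structure is split into callsign, action and level; anything else, however sensible it sounds in everyday English, is rejected.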
Notes:
Turn Phrases
Phrase Structure | Examples
 | “Cathay One Hundred, turn right heading one seven zero”; “Eastern One Two Three, turn left heading two four zero”
 | “Etihad one seven one fly heading three zero zero”
 | “Speedbird One Six, stop turn heading one eight zero”
(callsign) TURN RIGHT/LEFT, I SAY AGAIN RIGHT/LEFT HEADING (heading) | “Korean Eight One Seven, turn right, I say again right, heading zero six zero”
Notes:
Compromised Separation Phrases
Phrase Structure | Examples
(callsign) AVOIDING ACTION TURN LEFT/RIGHT IMMEDIATELY | “Air Canada Six Seven Two, avoiding action, turn left immediately”
(callsign) TRAFFIC (distance) MILES (clockposition) O'CLOCK (traffic disposition) | “Mexicana seven eight two eight, traffic three miles, one o'clock, converging from the right”
Notes: