VV Simulator Manual: Speech Recognition

Speech Recognition

Setup - Windows Vista/7/8

Disable Global Speech Recognition

Speech Recognition is a feature of Microsoft Windows Vista, Windows 7 and Windows 8 that can be used to control many software packages. Because the VV Simulator allows only specific instructions to be recognised, it is important that the global dictionary be turned off.

The Vista/7/8 Speech Interface Panel appears on the screen like this when Speech Recognition is switched on.

Windows 7 Speech Interface Panel

If it is not in view, open it in Vista from Start > Control Panel > Classic View > Speech Recognition Options > Start Speech Recognition.

Or, in Windows 7/8 from: Control Panel > Ease of Access > Speech Recognition > Start Speech Recognition.

Windows 7 Speech Control Panel

Right-click on the Speech Interface Panel to open this window and select “Off: Do not listen to anything I say”:

Windows 7 Speech Selection

Be sure you have turned the system off, as shown below:

Windows 7 Speech Off

Speech Recognition will now be turned on selectively when the Press-to-talk button is pressed in the VV Simulator, and will remain off at all other times. If speech is required for other purposes on this computer, the above procedure will need to be reversed.


Set up a Speech Profile for a new User

Open Speech Properties:

Vista: Start > Control Panel > Classic View > Speech Recognition Options > Advanced Speech Options.

Windows 7/8: Control Panel > Ease of Access > Speech Recognition > Advanced speech options

Speech Properties

Check that the language selected is as shown below: Microsoft Speech Recognizer 8.0 for Windows (English US or UK as appropriate). If that version is not available in the drop-down menu, select one of the other English engines.

Language Selection

Click New… to add yourself as a New User of the Speech Recognition program. This window opens:

New Profile

Type your name in the Add a profile window without spaces between names. This name should match your Visual Vectoring user name. Clicking OK will close the window.

Observe the Microphone dialog. Ensure the headset you will be using in simulation is plugged into the computer and carry out the Configure Microphone procedure; it is very brief. If no audio is reaching the system, investigate the situation under the Audio Input dialog:

Audio Input

The Configure Microphone procedure can be performed now. You will be asked to read a short sentence:

Microphone Setup

The system uses this sentence to set the volume level it receives from the microphone, so you should have the microphone positioned as you will be using it for simulation. It should be below and to one side of your mouth and about 30 mm from it. Do not have it directly in front of your mouth, where you would be breathing directly into it. Careful attention to microphone positioning can greatly assist recognition performance.

Your name should now appear in the Recognition Profiles list and be checked as the current user:

Recognition Profile Current User

Clicking Apply makes you the user of the system, and you will be assigned the default recognition profile. It is a matter now of training that profile for your voice; that need not be done until you first need to use the simulator.

Note that the Text To Speech tab allows configuration of the Speech Synthesis voices, which are used in the VV Simulator to provide aircraft replies. You should not need to refer to it.


Train the Speech Engine

This procedure can be accessed from the Speech Properties window (shown above) in Windows 7.

Train Profile

If you have not yet carried out the Configure Microphone procedure you should do that now.

In training the speech recognition, the following dialog will appear:

Voice Profile Training Introduction Screen

Click Next to commence your training. This window (or similar) will open:

Voice Training Page 1

The training script is quite lengthy, so you may need to pause at times. It is advisable to repeat the training, particularly if English is not your native language. Your Speech Profile is refined with each re-read of the script, not replaced with a new one. So, the more the system is trained and used the better the recognition performance will be.

If the training has proceeded satisfactorily, you are ready to use the VV Simulator. Do not expect perfect results at first; it takes time to learn what to say and how to say it.

In simulation, be aware that the Speech Recognition is not programmed to understand general English. It will recognise only the groups of words it has been programmed to interpret as an instruction to an aircraft. So, you MUST SAY THE CORRECT WORDS or you will not be understood. You will be taught those words and phrases in the lessons.

Also, be aware that a reply of ‘Say again’ from the simulator is not necessarily an indication that you have not been understood. It is often a query of your instruction to the aircraft - the simulator has detected an irregularity in it. In this way, it acts like a human pilot. A failure to recognise an instruction is indicated by its failure to appear in the Phrase Recognition Display at the top of the radar screen.


Setup - Windows XP

Add a New User

Open the Speech Properties User Interface (UI) (see below). In Windows XP this is at Start > Control Panel > Speech.

XP Speech Properties UI

Check that the language selected is as shown: Microsoft English Recognizer v5.1. If that version is not available in the drop-down menu, select one of the other English engines.

Click New to add yourself as a new user of the Speech Recognition program. This window will open:

XP Profile Wizard

Type your name in the Profile window being careful not to put a space before it. This is very easy to do if the cursor is placed in the window prior to typing.

Commence typing directly into the window; the space for the first letter is highlighted. Please note that this name will be recorded as part of the scoring system. There is no need to describe your environment or microphone type.

You can click Next and carry out the speech training now if you wish. However, it is probably best to record your name into the system and first calibrate the microphone.

Clicking Finish will record your name as a user of the system. You will be assigned the default speech profile and you will be returned to the UI window:

XP Speech UI User Added

Your name should be highlighted and checked as the current user. Clicking Apply completes the establishment of your User Profile, even though the Speech Engine has not yet been trained to recognise your voice. The default settings will be loaded as your profile until you complete the training.

The Settings button will open a window giving access to adjustments to the operation of the Speech Recognition program. The default settings should not need to be altered, but feel free to experiment if you continue to experience poor performance after reasonable usage.

The Text To Speech tab allows configuration of the Speech Synthesis options. You should not need to refer to it.

Before leaving this window, plug your headset microphone into the computer and see that there is activity in the Microphone Level window as you speak. This shows that the microphone is operating. If no activity is observed, click Audio Input to investigate the input source. Be sure it is the headset microphone that is providing the audio; laptop computers have a built-in microphone that will not be satisfactory for simulation. Training the Speech Engine will not be possible if it is not receiving suitable quality audio from the microphone. Now proceed to microphone calibration.


Calibrate the Microphone

Clicking Configure Microphone in the Speech Properties UI will open the dialog window below.

XP Microphone Calibration

Click Next. You will be given a phrase to say that will allow the system to adjust the volume setting on the microphone.


Train the Speech Engine

Clicking Train Profile in the Speech Properties UI will open the dialog below:

XP Training UI

You will be presented with a script that may be an excerpt from a novel, or some specialised text related to Air Traffic Control.

As you read the numbers they will be inverse-highlighted. As the end of each line or paragraph is reached, the next will appear. Overall progress is shown in the Training Progress section of the window.

If you reach the end of the script and the words are not being recognised reliably, repeat the training. If you have difficulty with individual words, use the Skip Word facility and move on. The training script is quite lengthy, so you may need to pause at times. It is advisable to repeat the training, particularly if English is not your native language. Re-doing the training refines your speech profile; it does not create a new one. So, the more the system is trained and used, the more reliable recognition becomes.

If the training has proceeded satisfactorily, you are ready to use the VV Simulator. Do not expect perfect results at first; it takes time to learn what to say and how to say it.

In simulation, be aware that the Speech Recognition is not programmed to understand general English. It will recognise only the groups of words it has been programmed to interpret as an instruction to an aircraft. So, you MUST SAY THE CORRECT WORDS or you will not be understood. You will be taught those words and phrases in the lessons.

Also, be aware that a reply of ‘Say again’ from the simulator is not necessarily an indication that you have not been understood. It is often a query of your instruction to the aircraft - the simulator has detected an irregularity in it. In this way, it acts like a human pilot. A failure to recognise an instruction is indicated by its failure to appear in the Phrase Recognition Display at the top of the radar screen.


General Tips for Speech Recognition

Delivery

  • Speech Recognition Training is often more a process of training yourself than it is one of training the system. Learn to speak clearly and steadily. Be patient and develop discipline in the way you speak into the microphone. This is in fact very good training for real-life ATC where accuracy of instruction delivery to pilots is very important.
  • Learn to be aware of the Phrase Window (top right of Radar screen) in your peripheral vision during simulation. There is no need actually to read what has been recognised. If the aircraft has misunderstood an instruction it should become evident to you when it reads it back incorrectly (this is also very good training in monitoring pilot readbacks for correctness). The appearance of white text in the window is sufficient to tell you that you have been recognised; non-appearance of white text tells you you have not been understood and that you should repeat the instruction. This monitoring of the Speech Recognition program becomes subconscious after some practice.
  • Eliminate pauses and extraneous sounds such as ‘umm’ and ‘ahh’. The aim is to achieve steady, controlled delivery.
  • Corrections cannot be made mid-transmission. If you make a mistake or stumble in delivering an instruction release the press-to-talk button and start again.
  • Always think about what you are going to say before pressing the button.
  • Do not run words together; say each clearly. This is particularly important when a word with a ‘soft’ vowel sound at its end is followed by one that begins with a soft sound. For example: “Singapore two two zero Approach ….” It is very easy to run the ‘zero’ into ‘Approach’ to make ‘zeroapproach’, which the Speech Recognition will attempt to interpret as one word. Also watch phrases like ‘Descend to four thousand’; don’t say ‘Descen’t’fourthousand’.
  • Be very careful to say the aircraft callsign clearly at the start of each transmission because this is what the program is listening for when the press-to-talk is pressed.

    Audio

  • Position the microphone carefully – to the side of your mouth - so you are not breathing directly into it.
  • Eliminate ambient noise - find a quiet place to work. You will spend many hours using the simulator and the intensity of the training requires a favourable learning environment.
  • Some modern laptop computers have sophisticated sound cards for entertainment applications, and this can affect Speech Recognition performance in simulation. The problem causes a small delay between when the system is activated (Press-to-talk pressed) and when it becomes fully operational. The delay is typically less than one second, but in Radar Control that can be significant. Using a USB microphone (or a USB adaptor) can assist greatly. Desktop computers rarely display the problem. The simple solution is to pause very slightly between pressing the Press-to-talk button and beginning to speak.

    Speech Settings

    Speech Synthesis

    The VV Simulator makes use of the Windows Speech Synthesis module for generation of aircraft and controller voices. The voice used for each aircraft is randomly generated from suitable voices available on the system.

    Speech synthesis may be disabled in the Speech Tab of the Setup Window.

    Speech Synthesis Rate and Volume

    The rate and volume of synthesised speech are also adjustable there. Click DEF to return to default values.


    PTT Clear Lockups

    Some soundcards occasionally exhibit an issue whereby the speech synthesis module causes a lockup of the speech recognition module. When this occurs, the Radio Call Light will remain yellow, even when the aircraft has finished speaking. Subsequently the speech recognition will appear not to work.

    Selecting PTT Clear Lockups in the Speech Tab of the Setup Window will mean that a press of the PTT (Press-to-talk) button clears the lockup. For most systems, there is no need to have this selected.


    PTT Mute Radio

    Selecting PTT Mute Radio in the Speech Tab of the Setup Window will mean that any incoming radio call is muted while the PTT (Press-to-talk) button is pressed.

    This mimics most operational environments where only one party can be speaking on the radio at any time.


    Auto Radio Off

    The Auto Radio Off function (located on the Speech Tab of the Setup Window) will automatically turn off speech recognition a short period after the Press-to-talk button is released. Enabling this function removes a processing overhead from the computer, sometimes resulting in better performance.

    For most computers where speech recognition accuracy is high, this function will not be of importance. As a default, this function should be left enabled.
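The timing behaviour described above can be pictured with a minimal sketch. It is not the simulator's implementation: the class name, its methods and the one-second delay are all hypothetical, since the manual says only that recognition is turned off "a short period" after the button is released.

```python
class AutoRadioOff:
    """Illustrative model only: listening stops a short time after PTT release."""

    def __init__(self, off_delay_s: float = 1.0):
        # Hypothetical delay; the manual does not give the real value.
        self.off_delay_s = off_delay_s
        self.ptt_down = False
        self.released_at = None

    def press(self) -> None:
        self.ptt_down = True
        self.released_at = None

    def release(self, now: float) -> None:
        self.ptt_down = False
        self.released_at = now

    def listening(self, now: float) -> bool:
        """True while PTT is held, and for off_delay_s seconds after release."""
        if self.ptt_down:
            return True
        if self.released_at is None:
            return False
        return now - self.released_at < self.off_delay_s


radio = AutoRadioOff(off_delay_s=1.0)
radio.press()
radio.release(now=2.0)
print(radio.listening(now=2.5), radio.listening(now=3.5))
```

Times are passed in explicitly so the behaviour can be checked without waiting on a real clock.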


    Radio Always On

    When this feature is selected in the Speech Tab of the Setup Window, the computer is always listening for transmissions for speech recognition purposes.

    This is not an ideal mode of operation, but may be necessary when speech performance is otherwise unacceptable, particularly when lag time is high.


    Emulate SR

    When the Speech Recognition module analyses a spoken phrase, it forms a series of hypotheses with associated probabilities of correctness for what was spoken. At completion of the process, no hypothesis may be acceptable, so a negative result is returned.

    It has been found that, above a certain probability level, the hypothesis with the highest probability is actually the correct phrase. If Emulate SR is selected, this hypothesis is taken as the phrase and the aircraft will respond accordingly. This can improve speech recognition performance markedly.

    For some soundcards, emulated recognitions cause a lockup in the computer. This is evidenced by the whole program stalling, sometimes for up to 30 seconds. If this occurs, Emulate SR should be disabled in the Speech Tab of the Setup Window.
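The selection rule described for Emulate SR can be illustrated with a short sketch. This is an illustration under stated assumptions, not the simulator's code: the function name, the dictionary of hypotheses and the 0.6 threshold are hypothetical, because the manual does not state the actual probability level used.

```python
from typing import Optional

# Hypothetical threshold; the manual refers only to "a certain level".
ACCEPT_THRESHOLD = 0.6


def emulate_recognition(hypotheses: dict[str, float],
                        threshold: float = ACCEPT_THRESHOLD) -> Optional[str]:
    """Return the most probable phrase if it reaches the threshold, else None."""
    if not hypotheses:
        return None
    phrase, probability = max(hypotheses.items(), key=lambda item: item[1])
    return phrase if probability >= threshold else None


candidates = {
    "CATHAY ONE HUNDRED DESCEND TO SIX THOUSAND": 0.72,
    "CATHAY ONE HUNDRED DESCEND TO FIVE THOUSAND": 0.21,
}
print(emulate_recognition(candidates))
```

When no hypothesis reaches the threshold the function returns None, which corresponds to the negative result mentioned above.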


    Audio Source

    The Audio Source dropdown in the Speech Tab of the Setup Window enables selection of the source of the speech recognition and synthesis streams. For some systems this selector will be disabled. See the section on setting up for information on how to do this.


    Non-standard Utterances

    Settable in the Speech Tab of the Setup Window, this will result in aircraft transmissions including random utterances such as "ok" and "roger".

    Whilst this adds a degree of realism to the simulation, most training is generally completed with it disabled.


    Troubleshooting Speech

    Step 1. Is Speech Recognition correctly enabled?

    In typical installations, the VV ATCX™ simulator achieves in excess of 98% Speech Recognition accuracy. If you are achieving much less than this, try the following steps.

    If you are achieving no level of recognition at all, it is possible that the Speech Recognition has not been enabled, has not installed correctly, or that the microphone has not been correctly connected to the computer. That should have become evident in the set-up process. If there is a problem with Speech, and you are not able to train the speech engine as described in that section, it may be necessary to reinstall the VV Simulator. Contact us for assistance.

    For Microsoft Vista and Windows 7/8, some special problems may occur. Because Speech Recognition is included as part of the operating system, some users may have the feature turned on for communication with other software. It is important that Speech Recognition be disabled prior to starting the VV Simulator. Review the section on disabling global speech recognition to ensure that has been done.


    Step 2. Determine what phrases are not being understood

    If the Speech Engine has been trained correctly and you are getting inconsistent results, consider the words you are actually saying. The high level of accuracy achievable by the VV Simulator is due to limiting the allowable phrases to those that are absolutely necessary. These phrases are listed in Appendix A. Any other phrase will not be understood – or, worse, may be mistaken for something else.

    Experienced controllers in particular often have problems because they lapse into phraseology that they use every day at work.

    Be aware that the simulator is not programmed to recognise general English, but only the standard phrases taught by the course.


    Step 3. Determine what the Speech Engine is hearing

    It is important for good accuracy that the Speech Engine hears what you are saying without background noise and electrical interference.

    A good way to determine the clarity of your transmissions is to record yourself and play back the transmission. The Sound Recorder program (usually under Start > Programs > Accessories > Entertainment) is good for this purpose.

    Remember to place the microphone below and to the side of your mouth.

    Ensure that there is no discernible background noise or buzz, and that your voice is clear and distinct. The elimination of electrical noise and the improvement of reception may be achieved by the use of a noise-reduction microphone. Soundcards in laptop computers are commonly affected by higher levels of interference. Often a USB microphone will help in these cases.

    If you are using the correct phrases, and still experiencing poor recognition, consider slowing your rate of speech. Once again, this is something that involves some discipline, especially for rated controllers who are used to delivering instructions at high speed.


    Step 4. Retrain the Speech Engine

    If slowing your rate of speech makes no difference, and the recording of your voice is clear and distinct, try spending some time retraining the engine. Half an hour’s extra training should result in a discernible improvement.


    Step 5. Use Radio Always On

    For some soundcards, there is a lag in activating the soundcard for speech recognition. This means that the start of the transmission is not heard, and recognition performance is degraded.

    Information on this lag is available in the section on the Setup Window.

    If the average lag exceeds about 150 milliseconds, performance should improve by selecting the Radio Always On feature in the Speech tab of the Setup Window.

    The use of a USB microphone or USB adaptor for a standard audio microphone can result in significant reduction in lag time and improved recognition.
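The rule of thumb above amounts to a simple comparison, sketched here for illustration. The function name and the list of lag samples are hypothetical; only the figure of about 150 milliseconds comes from this manual, and the real lag values are those reported in the Setup Window.

```python
from statistics import mean

# "About 150 milliseconds", as stated in this manual.
LAG_LIMIT_MS = 150


def should_use_radio_always_on(lag_samples_ms: list[float]) -> bool:
    """True when the average measured lag exceeds the limit."""
    if not lag_samples_ms:
        return False  # no measurements, so no basis for changing the setting
    return mean(lag_samples_ms) > LAG_LIMIT_MS


# Three hypothetical lag readings averaging 170 ms.
print(should_use_radio_always_on([120, 180, 210]))
```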


    Understood Phrases

    General Format

    The Speech Engine does not understand ‘general English’. It is programmed to understand only the precise control phrases needed to perform the VV exercises contained within the course for which the simulator is being used. So, the correct instructions must be used at all times; these are taught in the instruction modules.

    The phrase structures are in the following format:

  • (callsign) should be replaced with the appropriate callsign, spoken phonetically (e.g. “CATHAY ONE HUNDRED”)
  • (level) should be replaced with the appropriate level, again spoken phonetically (e.g. “NINER THOUSAND”, “FLIGHT LEVEL ONE EIGHT ZERO”, “TWO THOUSAND THREE HUNDRED METRES”)
  • (heading) is the three-digit magnetic heading (e.g. “ZERO EIGHT ZERO”, “TWO SIX FIVE”)
  • Where two or more words are separated by a stroke, either one may be used. For example: RIGHT/LEFT
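As an illustration of the digit-by-digit format above, the following sketch converts a numeric heading or flight level into its spoken form. The function names are hypothetical, and the use of "NINER" for every 9 is an assumption extended from the "NINER THOUSAND" example; the sketch does not cover callsigns or levels in thousands.

```python
DIGIT_WORDS = {
    "0": "ZERO", "1": "ONE", "2": "TWO", "3": "THREE", "4": "FOUR",
    "5": "FIVE", "6": "SIX", "7": "SEVEN", "8": "EIGHT",
    "9": "NINER",  # assumption: "NINER" is used for every 9
}


def spoken_heading(heading: int) -> str:
    """Speak a magnetic heading as three separate digits."""
    if not 0 <= heading <= 360:
        raise ValueError("heading must be between 0 and 360 degrees")
    return " ".join(DIGIT_WORDS[digit] for digit in f"{heading:03d}")


def spoken_flight_level(level: int) -> str:
    """Speak a flight level digit by digit, e.g. 180 as ONE EIGHT ZERO."""
    if level < 0:
        raise ValueError("flight level must not be negative")
    return "FLIGHT LEVEL " + " ".join(DIGIT_WORDS[digit] for digit in str(level))


print(spoken_heading(80))
print(spoken_heading(265))
print(spoken_flight_level(180))
```

The three calls reproduce the examples given in the list: ZERO EIGHT ZERO, TWO SIX FIVE and FLIGHT LEVEL ONE EIGHT ZERO.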

    Level Phrases

    Phrase Structure: (callsign) CLIMB/DESCEND [TO] (level)
    Examples:
      “Cathay One Hundred, descend to flight level three two zero”
      “X-ray Golf Delta, climb (to) niner thousand (feet)”

    Phrase Structure: (callsign) MAINTAIN (level)
    Examples:
      “American one three zero, maintain six thousand”

    Notes:

  • Applicable products: VV Approach, VV Enroute, VV CompSep
  • The word "TO" in recognition is optional, and depends on the jurisdiction.
  • The use of metric levels is location-specific.
  • The use of "flight level" for metric level is location-specific.
  • The use of the word "altitude" for non-metric levels is location-specific.
  • The use of the word "feet" for non-metric levels is location-specific.
  • The use of the above phrases in readbacks is settable in the Speech Tab of the Setup Window.
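To make the level phrase structures concrete, here is a minimal pattern-matching sketch. It is not the simulator's grammar: the regular expression and function name are hypothetical, the level is captured as free text without validation, and the optional words "altitude" and "feet" are not handled.

```python
import re
from typing import Optional

# Hypothetical pattern for "(callsign) CLIMB/DESCEND [TO] (level)"
# and "(callsign) MAINTAIN (level)".
LEVEL_PHRASE = re.compile(
    r"^(?P<callsign>.+?)\s+"
    r"(?:(?P<change>CLIMB|DESCEND)(?:\s+TO\b)?|(?P<hold>MAINTAIN))\s+"
    r"(?P<level>.+)$"
)


def parse_level_phrase(transmission: str) -> Optional[tuple[str, str, str]]:
    """Split a transmission into (callsign, instruction, level), or return None."""
    # Normalise case, drop commas and collapse repeated spaces.
    text = " ".join(transmission.upper().replace(",", " ").split())
    match = LEVEL_PHRASE.match(text)
    if match is None:
        return None
    instruction = match.group("change") or match.group("hold")
    return match.group("callsign"), instruction, match.group("level")


print(parse_level_phrase("Cathay One Hundred, descend to flight level three two zero"))
print(parse_level_phrase("American one three zero, maintain six thousand"))
print(parse_level_phrase("Cathay One Hundred, go down a bit"))
```

The last call returns None because the wording does not fit either structure, which mirrors the point that only the programmed phrases are recognised.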

    Turn Phrases

    Phrase Structure: (callsign) TURN LEFT/RIGHT HEADING (heading)
    Examples:
      “Cathay One Hundred, turn right heading one seven zero”
      “Eastern One Two Three, turn left heading two four zero”

    Phrase Structure: (callsign) FLY HEADING (heading)
    Examples:
      “Etihad one seven one, fly heading three zero zero”

    Phrase Structure: (callsign) STOP/CONTINUE TURN HEADING (heading)
    Examples:
      “Speedbird One Six, stop turn heading one eight zero”

    Phrase Structure: (callsign) TURN RIGHT/LEFT, I SAY AGAIN RIGHT/LEFT HEADING (heading)
    Examples:
      “Korean Eight One Seven, turn right, I say again right, heading zero six zero”

    Notes:

  • Applicable products: VV Approach, VV Enroute, VV CompSep
  • Headings ending in zero and five are accepted. This may be limited further if required.
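The heading rule in the notes above can be checked with a small sketch that converts three spoken digits back into a number. The function name is hypothetical, and accepting both "NINE" and "NINER" and rejecting values above 360 are assumptions rather than documented simulator behaviour.

```python
from typing import Optional

WORD_DIGITS = {
    "ZERO": 0, "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4,
    "FIVE": 5, "SIX": 6, "SEVEN": 7, "EIGHT": 8,
    "NINE": 9, "NINER": 9,  # assumption: both spoken forms are accepted
}


def parse_heading(spoken: str) -> Optional[int]:
    """Return the heading in degrees, or None if the words are not acceptable."""
    words = spoken.upper().split()
    if len(words) != 3 or any(word not in WORD_DIGITS for word in words):
        return None
    heading = int("".join(str(WORD_DIGITS[word]) for word in words))
    # Only headings ending in zero or five are accepted.
    if heading > 360 or heading % 5 != 0:
        return None
    return heading


print(parse_heading("one seven zero"))
print(parse_heading("two six three"))
```

The second call returns None because 263 does not end in zero or five.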

    Compromised Separation Phrases

    Phrase Structure: (callsign) AVOIDING ACTION TURN LEFT/RIGHT IMMEDIATELY
    Examples:
      “Air Canada Six Seven Two, avoiding action, turn left immediately”

    Phrase Structure: (callsign) TRAFFIC (distance) MILES (clockposition) O'CLOCK (traffic disposition)
    Examples:
      “Mexicana seven eight two eight, traffic three miles, one o'clock, converging from the right”

    Notes:

  • Applicable products: VV CompSep
  • The phrasing of the avoidance instruction may be localized on request.
  • In the traffic statement, the distance and clock position may be reversed in order.
  • Many variants of the disposition element of the traffic statement will be accepted, including "SAME LEVEL", "OPPOSITE DIRECTION" and "CROSSING [(LEFT TO RIGHT)/(RIGHT TO LEFT)]".
  • A prefix of "LEFT" or "RIGHT" to the clock position, as used in some jurisdictions, is accepted.