Speech recognition technology requires practice

If you’re dictating directly into the computer and watching the words appear as you speak them, you may find the navigation frustrating

June 18, 2013 08:35 AM

The nice people at Olympus recently sent me one of their digital voice recorders to test and evaluate. While I was playing with it, it occurred to me that the use of speech recognition in computing hasn’t quite fulfilled its promise. It could be very good for police work, but it requires considerable acclimation.

Reasonably reliable speech recognition software has been around since at least the mid-90s, but it has never caught on the way many thought it would. Part of the reason is the “reasonably reliable” aspect. Out of the box, speech recognition software will be more than 90 percent accurate, which means that the word you meant will be the one that shows up on the screen. With practice, you can get that up to 98 percent or 99 percent, which is better than a lot of people can do with voice transcription.

One error in 100 sounds pretty good, but it means that a 91-word paragraph like the preceding one is likely to contain at least one error. The mangled word won’t come up in spell check, because anything coming out of the speech recognition engine will be from its dictionary of correctly spelled words. You have to proofread the copy carefully to find the outliers.
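The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough model rather than a measurement: it assumes each word is recognized independently with the same accuracy, which real dictation won't strictly follow, and the function names are made up for illustration. Under that assumption, a 91-word paragraph at 99 percent accuracy averages a little under one error and has roughly a 60 percent chance of containing at least one.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Average number of wrong words in a passage of the given length."""
    return word_count * (1.0 - accuracy)


def chance_of_any_error(word_count: int, accuracy: float) -> float:
    """Probability that at least one word is wrong.

    Assumption: every word is an independent trial with the same accuracy.
    """
    return 1.0 - accuracy ** word_count


if __name__ == "__main__":
    words = 91  # length of the paragraph cited in the article
    for accuracy in (0.90, 0.98, 0.99):
        print(
            f"accuracy {accuracy:.0%}: "
            f"{expected_errors(words, accuracy):.2f} errors expected, "
            f"{chance_of_any_error(words, accuracy):.0%} chance of at least one"
        )
```

The point of the sketch is that even the best-case figure leaves most paragraphs of this length needing a careful read.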

Alpha, Bravo, Charlie
If you’re dictating directly into the computer and watching the words appear as you speak them, you may find the navigation frustrating. Executing computer commands by voice requires using precise terms that aren’t always intuitive. If you need to spell out something like a proper name, you can say the letters, but using the phonetic alphabet is more accurate.

Speech recognition in Windows uses the NATO phonetic alphabet (Alpha, Bravo, Charlie), whereas many cops use a different set (Adam, Boy, Charles).

Can you shift from one to the other easily?
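To see how far apart the two alphabets are, here is a small Python sketch that spells a name in either set and converts police code words to their NATO equivalents. The article names only the first three words of each set; the rest of the police list below is the version commonly used by U.S. agencies and is an assumption, since individual departments vary. The function names are hypothetical.

```python
import string

# NATO set, with "Alpha" spelled as in the article (the official spelling is "Alfa").
NATO = [
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliett", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-ray", "Yankee", "Zulu",
]

# Assumed police set: a commonly used version; your agency's list may differ.
POLICE = [
    "Adam", "Boy", "Charles", "David", "Edward", "Frank", "George",
    "Henry", "Ida", "John", "King", "Lincoln", "Mary", "Nora",
    "Ocean", "Paul", "Queen", "Robert", "Sam", "Tom", "Union",
    "Victor", "William", "X-ray", "Young", "Zebra",
]


def spell(text: str, code_words: list[str]) -> list[str]:
    """Spell out the letters of text with the given alphabet, skipping non-letters."""
    table = dict(zip(string.ascii_uppercase, code_words))
    return [table[ch] for ch in text.upper() if ch in table]


def police_to_nato(spoken: list[str]) -> list[str]:
    """Translate police code words into NATO words at the same letter position."""
    lookup = {p.lower(): n for p, n in zip(POLICE, NATO)}
    return [lookup[word.lower()] for word in spoken]


if __name__ == "__main__":
    print(spell("Dees", POLICE))
    print(spell("Dees", NATO))
    print(police_to_nato(["Adam", "Boy", "Charles"]))
```

In the lists as written here, only Victor and X-ray are the same in both sets, so an officer switching to the Windows alphabet is relearning nearly every letter.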

There used to be several speech recognition software vendors in the marketplace, but the field has largely shaken out to one specialized vendor, Nuance.

The Dragon’s Lair
Nuance makes the “Dragon” software line and has specialized packages for many industries, including law and medicine. The aforementioned Olympus recorder interfaces with Dragon software, but the software isn’t included when you buy an Olympus device.

Speech recognition has made inroads in consumer devices, especially those used in cars. Many new cars come with speech recognition interfaces to run the entertainment and navigation systems, although road noise can make them difficult to use while the car is moving. iPhones include Siri, which moves processing of the speech off the device and onto an Apple server (Siri won’t work without an active data connection).

You can also try speech recognition in most versions of Windows, although it’s not activated by default. Look in the Control Panel, and you’ll see an icon for speech recognition. You generally need a headset microphone and have to train the software to recognize your voice, but it works almost as well as the dedicated Dragon software for most purposes.

A Learning Curve
In law enforcement, maybe the biggest obstacle to use of speech recognition is that we aren’t used to it. If you have always written with pen and paper or at a keyboard, it’s difficult to shift your method to voice only. It’s easy to forget where you are in the middle of a sentence, or whether you have already covered a topic in a previous paragraph. Having an outline or template to guide you helps, but there is still a learning curve.

Speech recognition is widely used in legal and medical settings, but a large part of the success there is that attorneys and physicians frequently learn to dictate documents in their professional training. Many hospitals all but require their physicians to dictate chart and surgical notes, but those recordings have mostly been transcribed by human typists, who are slowly being replaced by speech recognition engines as the technology becomes more reliable.

Using a handheld recorder like the Olympus DS-3500 eliminates the need for a computer alongside the person doing the dictating. It fits comfortably in one hand and can be operated with that hand alone. A color OLED display shows the time, date, file number and amount of recording time remaining in memory, which is more than 154 hours at full capacity.

Each recording is time- and date-stamped and can be stored in one of five virtual folders to keep recordings from becoming commingled. It’s very easy to run the recording back a few seconds or clear to the beginning to review what you’ve already said.

When it’s time to transcribe the recordings, a mini-USB cable connects the recorder to a PC running the appropriate Dragon software. Once the files are transferred, the memory is cleared for new recordings. There are no moving parts other than the control buttons. Power comes from a proprietary rechargeable battery pack.

It’s a great little device if you need to have your recorder with you all the time. My only misgiving is its high retail price of $399.99 (its big brother, the DS-7000, with a few more bells and whistles, goes for $499.99), which could be a steep investment for most law enforcement agencies.

Tim Dees

Tim Dees is a writer, editor, trainer and former law enforcement officer. After 15 years as a police officer with the Reno Police Department and elsewhere in northern Nevada, Tim taught criminal justice as a full-time professor and instructor at colleges in Wisconsin, West Virginia, Georgia and Oregon. He was also a regional training coordinator for the Oregon Dept. of Public Safety Standards & Training, providing in-service training to 65 criminal justice agencies in central and eastern Oregon.