At long last, slaying the dictation dragon
Speech-recognition software has finally overcome numerous technical and business hurdles and is poised to take off in a big way, particularly for report-dependent markets like pathology.
Yes, we’ve heard that before, but this time it may be true.
Proponents of speech recognition made similar predictions in the late 1990s, but those predictions did not come true. The technology was still too difficult to use and its results too unreliable. In addition, the vendor market was in turmoil as two pioneers in the field, Kurzweil and Lernout & Hauspie (which had bought another early player, Dragon Systems), closed their doors.
Today, however, one leading vendor, ScanSoft, is winning solid reviews with its most recent speech-recognition program, Dragon NaturallySpeaking version 7 (based on Dragon Systems’ technology), and is also distributing IBM’s ViaVoice. Meanwhile, Dictaphone plans to introduce a new, more flexible product for pathology to replace its popular Clinical Reporter, which depends on fill-in-the-blank templates.
Couple advances in technology with the cost and time pressures pathologists and pathologists’ assistants face, and you’re likely to see a sharp increase in the use of speech-recognition software. Moreover, transcriptionists, who handle the majority of pathology reporting, are in short supply in some markets. And that may be in part why laboratory information system vendors such as Cerner Corp. are preparing to incorporate speech recognition into their products.
Add it all up and the use of speech-recognition software by pathology departments could increase from less than 10 percent today to about 50 percent in the next several years, predicts Matt Revis, product marketing manager for dictation products at ScanSoft. “We’ve seen increasing interest among pathologists and among all of health care,” Revis says.
John E. Tomaszewski, MD, professor of pathology and director of surgical pathology at the University of Pennsylvania, Philadelphia, agrees. “There’s a real interest out there,” he says. “We’re coming out of a hiatus where there was a shakeout in the vendors.” Within the next five years, “you’re not going to be seeing much in regular tapes,” he forecasts, meaning that pathologists will switch from dictating to tape and sending it to transcriptionists to dictating into a computer program.
These optimistic projections can’t be realized, however, until the cultural barriers to adopting speech technology are overcome. Many pathologists consider transcribing to be a secretarial function, and they will have to be coaxed into altering their work habits. The transition will likely occur more quickly at hospitals with more young pathologists, who tend to be more accustomed to new technology.
“We need pathologists who are not computer nerds to be adopting this technology if it is going to have significant market penetration,” says Raymond Aller, MD, a pathology and speech-recognition enthusiast who is director of bioterrorism preparedness for Los Angeles County. “It can’t just be early adopters.”
Carl Teplitz, MD, emeritus chair and professor of pathology at Beth Israel Medical Center, New York, who instituted routine speech recognition at Beth Israel a decade ago, says pathologists often have a “phobia of technology that would change the M.O. on how they deliver a diagnosis.” Thus, “there’s a tremendous resistance factor” to speech technology, which he believes can be overcome by looking at the turnaround time and cost-saving benefits.
‘I’d never go back’
North Shore University Hospital at Glen Cove, NY, part of the North Shore Long Island Jewish Health System, has been using speech recognition for six years, starting with the old Kurzweil discrete speech version, “which certainly was not user friendly,” says Paul E. Kalish, MD, chairman of the Department of Pathology. Discrete speech recognition, as opposed to the continuous-speech technology available today, required users to speak with pauses between words.
In 1999, the hospital upgraded to a continuous-speech version of Kurzweil, interfaced with Cerner’s CoPath. “All our dictation, both gross and microscopic, went directly into CoPath” by means of speech recognition, Dr. Kalish says.
But a problem surfaced. In January of this year, the hospital converted to Cerner Millennium for its anatomic pathology. Millennium “didn’t have an interface for anybody’s voice-recognition package,” says Dr. Kalish, and with Kurzweil now defunct, Cerner wasn’t interested in supporting a “dead-end product.”
North Shore University Hospital refused to abandon speech recognition, so it engaged S&P Consultants Inc. to develop a customized interface between a newer speech-recognition product, Dragon NaturallySpeaking 6.10, and Millennium. (Cerner paid for the interface under terms of its contract with the hospital.) “We dictate into Dragon and paste the text into Cerner by voice command,” Dr. Kalish says.
S&P Consultants built the vocabulary, set up templates to allow for synoptic and gross reporting, and read hundreds of pathology reports into the new DNS 6.10 engine to accustom it to the vocabulary. The consultant also created a foot pedal that lets users enter complex commands and tabs, so the person dictating conserves speech and avoids touching the keyboard and mouse. A unidirectional freestanding USB microphone allows hands- and headphone-free dictation.
Dr. Kalish says he and other users are pleased with the results. DNS 6.10, which is loaded in the PCs of the hospital’s two pathologists and two pathologists’ assistants, “is absolutely super, 100 times better than Kurzweil.” With the PCs interfaced to the hospital’s internal network, Dr. Kalish and the others can download their voice profiles “anywhere in the hospital,” which means they’re not tied to a single PC for dictation.
“I don’t even have a transcriber anymore,” says Dr. Kalish. “We’ve been using this [new system] full-time for the past seven months.” The pathologists’ assistants do the gross dictation and the pathologists, the final report. “I’d never go back to transcription,” he adds. “I don’t think there are any more corrections with voice recognition, and I can make any changes immediately while I’m dictating, either by voice or keyboard. I don’t have to wait for the text to come back days later from a transcriptionist.”
Speech recognition is also a time saver, Dr. Kalish says. His pathologists’ assistants do the gross reports in the afternoon. “When I come in in the morning,” he says, “all the gross texts are in the computer.” Histologic processing occurs on the night shift, so first thing in the morning he dictates the final surgical report, corrects it (unless it’s an unusual case that he wants others to review), and sends it to the clinician. “By 10 AM,” says Dr. Kalish, “most of my surgicals are at the clinician’s office,” compared with waiting a day or two for transcription.
Despite North Shore University Hospital’s success with speech recognition, only one other pathology department in North Shore’s network—Southside Hospital—has adopted it. “Every radiology department in our network is using voice recognition,” Dr. Kalish reports, but pathology has been slower to move. In fact, he says, three of the pathology departments continue to use handwritten notes that are then passed to transcribers.
Dr. Kalish agrees with ScanSoft’s Revis that speech technology will be adopted more readily in the next few years. “It’s ideally suited to specialized applications like pathology reporting,” he says. If pathology departments don’t adopt it, “we won’t be teaching the next generation properly.” What’s more, “in most parts of this country good transcribers are impossible to find,” forcing hospitals to outsource the work and resulting in longer report turnaround times.
‘Out of necessity’
Myra Wilkerson, MD, director of diagnostic informatics in anatomic pathology at Geisinger Medical Center, Danville, Pa., knows exactly what Dr. Kalish is talking about. Over a six-month period about two years ago, the rural hospital lost all but one of its six transcriptionists in anatomic pathology and found them difficult to replace. “In central Pennsylvania, you don’t have a lot of people with transcription skills growing on trees,” Dr. Wilkerson says.
So, “we decided to let a couple of people try Dragon—me and another pathologist,” she says. “We weren’t intending to replace transcription but at least to lessen the pressure until we could get new people. It was basically out of necessity.”
The hospital bought DNS 6 for Dr. Wilkerson and her colleague; it has just been replaced with DNS 7. Even with version 6, she found voice training remarkably fast. “It takes about 15 minutes to train it to your voice,” Dr. Wilkerson says.
Her only complaint is with the microphone, which doesn’t allow for remote dictation. “You’re tethered to your PC,” she says. “If you’re doing gross dictation, that becomes a real problem,” especially if there’s background noise. Consequently, Geisinger is using DNS for microscopy only. Even so, “I had to change the whole design of my office to accommodate the placement of the microphone,” Dr. Wilkerson says.
Overall, she likes the flexibility of speech recognition and finds it reasonably easy to use. “It’s at least as accurate as some of our newer transcriptionists, most of whom were right out of school and had no training in pathology terminology.” And she can complete most of her cases by lunchtime—“my turnaround time is 24 hours.” She’s finding DNS 7 is even faster and the vocabulary more sophisticated than version 6.
Despite the advantages, some pathologists will not convert to speech recognition. “It’s a matter of how computer literate you are,” she says, since the technology requires users to adjust their work habits. A knowledge of computers comes in handy because “you can customize it and add macros to make it work better for you,” she adds.
Although Geisinger has hired new transcriptionists, Dr. Wilkerson says she will stick with speech recognition and hopes to persuade other pathologists to move to it. “We’re planning to do lunchtime seminars and give CME credit for people who are interested,” she says. “Then we’ll set up training.” Conversion has to be voluntary, she adds, because many pathologists “have a mindset that they should not have to do transcription.” Dr. Wilkerson predicts: “As we get more pathologists comfortable working with computers, they will also adopt speech technology. We already have one more using it and two more that are willing to give it a try.”
‘Not bound to someone’s schedule’
Another pathologist experimenting with speech recognition is Robert O. Rainer, MD, chair of the informatics committee at Spartanburg (SC) Regional Medical Center. He’s been using DNS 5 for about four years. “It was just to try it out and see what the technology is all about,” he says. “I use it mainly for bone marrows, routine surgical cases, and autopsy dictation. I don’t do it for the complicated cases because it’s not as efficient as the typist.”
Dr. Rainer achieves 90 to 95 percent accuracy with his dictation. “It’s still a little quirky, and sometimes gibberish comes out,” he says. It took him about an hour to train the system to his voice, although he concedes that DNS 5 is now several years out of date. (ScanSoft claims DNS 7 is more accurate and requires less training.) He agrees with Dr. Wilkerson that microphone positioning is essential to improving accuracy. “I put the microphone at a distance about the thickness of my palm away from my mouth,” he says. And reducing background noise also helps.
Convenience could be a powerful inducement for pathologists to convert. “I can dictate my cases and immediately release the report,” Dr. Rainer says, compared with a six- or seven-hour turnaround using a transcriptionist. “I’m not bound to someone else’s schedule. My partners are scrambling in the afternoon while I’ve got my stuff done.” Also, when he goes to different hospitals, “I can have my dictation system there on my laptop.”
Still, no other pathologist at Dr. Rainer’s hospital has followed his lead. “People are interested, but they don’t want to spend the time to learn it,” he says. “As long as a hospital provides transcriptionists, there’s not much incentive.” Dr. Rainer believes the technology will become more widespread as it gets easier to use and vendors begin offering continuous speech and synoptic reporting.
For the simpler cases
Rodney A. Schmidt, MD, associate professor of pathology and director of medical informatics for the pathology department at the University of Washington, Seattle, used DNS 6 for about nine months “to see whether to recommend it to the whole department.” He used it for pathology reports, memos, and letters. To boost accuracy, he experimented with “stripping down the base vocabulary, adding a custom vocabulary from old pathology reports, and using different microphones.” Now that he has upgraded to DNS 7, the “voice recognition accuracy is much greater, particularly with the custom vocabulary,” he says.
“I believe that with the right person and the right situation, speech recognition is perfectly fine,” he says. “It takes more active intervention than some people are willing to do, in terms of putting on the headset, using the program, and editing as you go.”
Dr. Schmidt uses DNS 7 with Tamtron’s PowerPath 2000 anatomic pathology system. “With PowerPath, Microsoft Word is tightly integrated into the application, and all of the features that DNS has built to interact with Word are fully available,” he says. “Even without any customization by Tamtron, it only costs me about one extra mouse click to use DNS to create my reports. The user also has complete freedom to create and use custom templates with voice-navigable fields.”
Like other users, Dr. Schmidt considers the time element a distinct advantage. “I can knock out the report and sign it out right away,” he says. “If somebody calls up at 4:59 and wants to know the answer, I can not only tell them but have the report out by 5:10.”
That works well for reports involving only a single pathologist. More complicated cases requiring additional opinions may call for a different workflow. “If you’ve got a variety of cases and long complex reports, it’s probably still better done by a transcriptionist,” Dr. Schmidt says.
Dr. Schmidt believes that about half of the 15 pathologists in his department are likely to use speech recognition. It should not be forced upon pathologists, he says. He knows of another hospital where it was “imposed on the pathology department, and it wasn’t a good match for all the pathologists. Some found it fairly frustrating to deal with Dragon [version 6] when they were under a lot of pressure to get the work done as quickly as possible.” With DNS 7, Dr. Schmidt says, “this technology is just about where it needs to be” in terms of accuracy and other features that will encourage wider use among pathologists.
Like Drs. Rainer and Schmidt, Anne Brenner, MD, a pathologist at St. Joseph Hospital, Denver, is experimenting with speech recognition on her own. Several years ago she bought DNS 4, which she uses to dictate microscopy, corrections to gross reports, and final diagnoses in routine cases.
“For straightforward cases it’s a time saver,” she says. “There was initially the thought that we would all do it, but it’s not very user friendly.” It takes time to “learn the way you say things.” She adds that DNS 4 gets the long, complex words correct but makes mistakes on small words because they’re easier to confuse. “I would not recommend it for every pathologist,” she says, “because many aren’t willing to do the needed proofreading.”
Dr. Brenner acknowledges that moving to the newest version of DNS might make it more user friendly, but St. Joseph’s computer system is old, and she’s hesitant to move to new software until the system is upgraded. Until then, she suspects she’ll remain the lone user in her department. “If the administration said you have to use it, we probably would. But people don’t like to change, and our transcriptionists are excellent,” she says.
At the University of Pittsburgh Medical Center, pathologists’ assistants are using Clinical Reporter for gross descriptions in a program just getting under way. “We have a high number of skin biopsies,” says Alena Sikorova, lead pathologists’ assistant, and Clinical Reporter, in which users can create numerous templates with routine language, allows for fill-in-the-blank responses for unique characteristics, such as size. “You say ‘shave biopsy,’ and it brings up the form; you just put in the measurement,” she says.
Rick Nestler, UPMC information systems manager for pathology, oversees the system. “All [pathologists’ assistants] do is dictate measures, color, and consistency,” Nestler says. Clinical Reporter allows for some “free speech,” but it’s limited, and the product is about to be replaced by its vendor, Dictaphone. For pathologists, who need continuous free speech for their reports, Nestler is looking at DNS 7. In nine hospitals affiliated with UPMC, “I have a half-dozen pathologists who are interested in trying it,” he says.
Nestler already has a champion in Dan Galvis, a pathologists’ assistant at Children’s Hospital of Pittsburgh who is experimenting with DNS 7. “We’re using it for gross descriptions and autopsies,” he says. Since DNS is “relatively user friendly,” he adds, the goal is to stimulate interest among pathologists. Children’s Hospital employs remote transcriptionists, Galvis notes, so “pathologists would benefit by not having to wait.”
The ‘right model’
Few pathology departments have moved wholesale to speech recognition, and Dr. Tomaszewski acknowledges that “it’s still being used very tentatively even though the technology has continued to improve.”
At his hospital, the University of Pennsylvania, pathologists’ assistants are using Clinical Reporter for gross reports. Because it is a template-based system, “it gives you tremendous control over training people and getting them to dictate in a consistent way,” he says. On the other hand, “it also has a huge learning curve because there’s thousands of templates.”
Dr. Tomaszewski expects his department “probably will upgrade to PowerScribe for surgical pathology” once Dictaphone releases it next year. (See “How to get it,” below, for more information.) PowerScribe will allow continuous speech and will use the Dragon engine, which both ScanSoft and Dictaphone employ. Dr. Tomaszewski will probably lead the way, since he already dictates his personal consults and reports with DNS 6.
But a drawback to widespread speech recognition is that if the program is installed on an individual computer, it prevents pathologists from doing their reports elsewhere. “As a server-based solution, it makes more sense,” Dr. Tomaszewski says, because that would allow a voice profile to be accessed from any computer on the system.
Ultimately, he forecasts, the major LIS vendors “will embed a voice engine in their systems.” The stand-alone versions most pathologists are using today “are a patch. If you want maximum adoption, you have to embed it into the system.”
Dr. Tomaszewski’s version of a “home run” voice-recognition system for pathology would include a server-based program interfaced with the LIS, providing continuous speech as well as templates for synoptic reporting. Even then, the product would have to overcome resistance. “A lot of pathologists say, ‘It’s quicker for me to let someone else do the secretarial work,’” Dr. Tomaszewski says. In that case, he adds, the solution would be for the pathologist to keep dictating, but into a speech-recognition program installed on the server. The transcriptionist could then correct the draft and return it to the pathologist. “That may be an interim model that wins,” says Dr. Tomaszewski, though “you’re not going to get the immediacy of doing it yourself.”
Once this type of product exists, he predicts, the majority of pathology departments will adopt voice recognition within about five years. “There’s going to be one product that’s the right model, and it will take off and move through the community fairly rapidly,” Dr. Tomaszewski says.
One study that points out the difficulties with speech recognition was conducted at St. Joseph’s Healthcare, Hamilton, Ontario, Canada, and published in the June 2003 issue of Archives of Pathology and Laboratory Medicine. Over several months in early 2002, 206 routine surgical pathology reports were each produced twice, once with speech recognition and once with conventional transcription. The speech-recognition product was IBM ViaVoice Pro version 8 with a pathology vocabulary from Voice Automated. Mean accuracy was 93.6 percent for the software versus 99.6 percent for human transcription. Editing the speech-recognition documents took, on average, twice as long as editing the transcribed ones; the extra editing time came to 67 minutes per week, or about 13 minutes per working day.
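The article does not say how the study scored “accuracy.” A common convention in speech recognition, assumed here rather than taken from the study, is word accuracy: one minus the number of substituted, deleted, and inserted words divided by the word count of a correct reference report. The minimal Python sketch below computes it with a word-level edit distance; the function names `word_errors` and `word_accuracy` are hypothetical, chosen only for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: the minimum number of substituted,
    deleted, and inserted words needed to turn `reference` into `hypothesis`."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            curr[j] = min(
                prev[j] + 1,                       # reference word deleted
                curr[j - 1] + 1,                   # extra word inserted
                prev[j - 1] + (0 if same else 1),  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy = 1 - errors / reference word count. This can fall below 0
    if the hypothesis contains many inserted words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n


if __name__ == "__main__":
    ref = "no evidence of malignancy"
    hyp = "no evidence of malignant"
    print(word_accuracy(ref, hyp))  # one substitution in four words -> 0.75
```

Under this definition, and for a hypothetical 200-word report, 93.6 percent accuracy means roughly 13 wrong words to find and fix, versus about 1 at 99.6 percent. That gap is why editing time, not just the headline accuracy figure, drives the study’s conclusion.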
The article, by Maamoun M. Al-Aynati, MD, a pathology resident, and Katherine A. Chorneyko, MD, associate professor of pathology and molecular medicine at McMaster University, says that speech recognition “can be successfully used in pathology practice, even during the handling of gross pathology specimens and with users who are non-native English speakers.”
The authors caution, however, that the lower accuracy rate, which increases the editing burden for pathologists, “may not encourage its application on a wide scale in pathology departments with sufficient human transcription power, despite significant financial savings.”
The authors predict that speech recognition will become more popular as accuracy improves, adding that it’s already “an attractive and relatively inexpensive alternative to human transcription” in areas where there is a shortage of transcriptionists. “This continuously evolving technology will no doubt become more commonly used in the future practice of technology,” they conclude.
Dr. Chorneyko tells CAP TODAY that by the end of the study, the software’s accuracy rate had risen to about 96 percent, closer to that of human transcriptionists, suggesting that the technology improves with use. However, the system had to adapt to only one voice during the study, that of Dr. Al-Aynati. “It would be nice to do a study where you could look at multiple users and see how they fared,” she says.
Dr. Chorneyko now uses the software for letters and personal memos, and one of the other pathologists in the department uses voice recognition for publications. But for a department to adopt it, she says, “you need a strong commitment from the administration and your LIS system, and all of the pathologists would have to buy in.”
How to get it
A pathologist could get speech recognition by installing an individual program on the desktop, by accessing it through an LIS vendor such as Cerner, or by dictating into a system that records a sound file, converts it to a draft, and passes the draft to a transcriptionist for correction. This last option, the so-called back-end system, makes the technology more transparent to the pathologist.
The advantages of the front-end systems (the first two mentioned) are that reports are self-corrected and instantly available, says ScanSoft’s Revis. ScanSoft primarily sells front-end systems. The problem, he says, is that “many doctors object to having to do self-correction.”
Numerous objections to speech technology are being overcome in the software’s latest versions, Revis says. For example, DNS 7 is about 15 percent more accurate than version 6. He says about 50 percent of users report 97 percent accuracy with DNS 7, “which is a reasonable goal for most doctors.”
Revis contends that the Al-Aynati-Chorneyko study is outdated because it used ViaVoice 8, which he calls “old software”; IBM is currently on ViaVoice 10. (ScanSoft distributes ViaVoice, and IBM sells it at the server level.) “It’s like driving a 1972 Chevy Impala,” he says.
Training time, too, has improved. With DNS 4, “you had to train for 45 minutes,” he says. “Now it’s down to four minutes with DNS 7.”
Add to that the fact that major LIS vendors, including Cerner, “are coming to us and asking how can we embed voice recognition,” Revis says. Although he would not discuss most of these talks because negotiations are under way, he says Cerner already builds DNS compatibility into CoPath, which it sells directly to customers.
“You’re going to see a steep incline in the adoption rate,” Revis predicts. He estimates pricing for speech recognition at the front end, including training and support, at $500 to $2,000 per physician in the first year. After that, “you have ongoing support costs, which are minimal.”
Dictaphone is planning to phase out Clinical Reporter and replace it next year with PowerScribe Workstation for pathology, according to product manager Robert Fleming. PowerScribe has been in the radiology market for about five years and “we’re going into beta cycle at the end of this year with a pathology product,” he says.
Fleming says that what differentiates PowerScribe from similar products is that it’s not only speech recognition but also a workflow product. It’s a complete system for dictating, with an HL7 interface to the LIS, plus routing, faxing, and scripting documents, he says. He emphasizes that while speech recognition software requires physicians to alter their work patterns, it offers many benefits as well.
In contrast to ScanSoft’s front-end systems, PowerScribe is a back-end system linked to the LIS. “It’s not for an individual,” Fleming says. “It’s a server-based system for a work group,” which includes a database for file storage and access. The pricing is on a per-user basis, starting around $20,000 for small groups of three to five users.
Like Revis, Fleming believes the pathology market for speech recognition is poised to take off. “The technology is here,” he says, “and people are beginning to realize that and accept it.”
Karen Southwick is a writer in San Francisco.