College of American Pathologists

Cancer registry project passes midterm exam


CAP TODAY

December 2005

Feature Story

Tony Sullivan

Three years after launching a project to automate the process of collecting cancer data from select pathology laboratories and reporting the data to cancer registries, the National Cancer Institute reports that it has just passed the halfway point in reaching its goal.

The project is part of the NCI’s Surveillance, Epidemiology, and End Results, or SEER, program, in which cancer data are collected routinely from designated cancer registries in the United States. Public and private entities then use those data to derive trends in cancer incidence, mortality, and patient survival; conduct cancer-related studies; and help state governments and other organizations predict their resource needs for managing the disease and conducting epidemiological studies.

The SEER program now collects and publishes cancer incidence and survival data from 14 population-based cancer registries and four supplemental registries in the U.S. that together cover about 26 percent of the U.S. population. In total, SEER has data on more than 3 million in situ and invasive cancer cases, and about 350,000 cases are added each year within the SEER coverage areas, says Carol Kosary, a statistician in the NCI’s Division of Cancer Control and Population Sciences. That’s more than double the annual volume of cases that were being added to the data bank just two years ago, she says.

The SEER registries routinely collect data on patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up, Kosary told CAP TODAY. SEER is the only comprehensive source of population-based cancer information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage, she says.

Steve Peace, a public health analyst with the SEER program, says it has been the "primary means of measuring the national burden of cancer" and that the epidemiologic benefits are powerful. "SEER is often where changes in cancer incidence and death rates are first detected, stimulating additional epidemiologic investigation to reveal the cause," he says.

Historically, most, if not all, of the laboratories participating in the SEER program have relied on manual processes to identify cancer cases and forward reportable data to the cancer registries involved in the SEER program, Kosary says. That changed about three years ago when the NCI signed a five-year, annually renewable contract with Artificial Intelligence in Medicine Inc., or AIM, a Toronto-based medical informatics engineering firm, to begin automating those manual processes with software called E-Path.

"We’re building the infrastructure to improve cancer data collection," Kosary explains. "It’s a five-year piece of work that we’re trying to get accomplished, and we’ve just entered year three. Within the five years we’re attempting to get electronic pathology reporting installed in 80 percent to 85 percent of our 207 labs. We’re probably a bit more than halfway there."

That plan will ultimately create a massive public-use database with real-time, complete, and accurate information on cancer incidence, mortality, and survival, something unattainable with manual data gathering and reporting. Most of the cancer cases the NCI registers in the SEER program—in the high 90 percent range—are found in pathology reports by hospital tumor registrars, Kosary says. Generally, registrars do this by hand, an inefficient process that often results in missed cases.

Programs like E-Path do away with the inefficiency, says Peter Brueckner, MD, AIM’s chief executive officer. "E-Path goes through the electronic pathology records and automatically identifies those cancer cases that are reportable through lexical analyses of the text of the reports," Dr. Brueckner told CAP TODAY. "Then it automatically sends these cases to the registries, where they need not be transcribed. They enter right into the system."

Automating cancer data identification, collection, and reporting frees up busy tumor registrars to do other things, an important benefit because of how hard it is to find and employ qualified tumor registrars, Kosary says. "It allows us to use them in other areas of cancer registration other than wading through massive volumes of pathology reports."

Dr. Brueckner says they also find that quality and timeliness of the reporting is far better—"quality in terms of accuracy and completeness of reporting and timeliness in that this takes place in a day or two rather than several months when it is done manually," he says.

E-Path, which AIM introduced about eight years ago, has evolved to the point where it accurately detects about 99 percent of all reportable cancer cases during its automated review of pathology reports, Dr. Brueckner says. That compares with a sensitivity that may be as low as 70 percent in a manual process, he says. And the software is about 98 percent specific, meaning it wrongly flags only about two percent of non-reportable cases as reportable (false-positives).
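For readers who want the arithmetic behind those two figures, here is a minimal sketch using the standard definitions of sensitivity and specificity. The batch of 1,000 reports and its counts are hypothetical, chosen only so the results match the roughly 99 percent and 98 percent quoted above; they are not SEER or AIM data.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly reportable cases that the screen catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-reportable cases that the screen correctly leaves alone."""
    return true_neg / (true_neg + false_pos)


# Hypothetical batch: 1,000 pathology reports, 100 of them reportable.
true_pos, false_neg = 99, 1      # reportable cases caught / missed
true_neg, false_pos = 882, 18    # non-reportable cases passed over / wrongly flagged

print(f"sensitivity: {sensitivity(true_pos, false_neg):.2f}")
print(f"specificity: {specificity(true_neg, false_pos):.2f}")
```

In this made-up batch the screen flags 117 reports in all, 18 of which are false-positives, so the share of flagged reports that turn out not to be reportable depends on how common cancer cases are in the batch, not on specificity alone.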

CAP member Murray Treloar, MD, chief of laboratory and genetic services for Lakeridge Health Corp. in Oshawa, Ontario, has experienced these high sensitivity and specificity rates firsthand. Though his laboratory is not participating in the SEER project, he has been using E-Path for about five years, beginning as a beta site for the software when it was introduced. Before that, his laboratory had been using an "error-prone" manual process to identify and report cancer data it was required to submit to the Ontario cancer registry. "We used a manual process of culling them on a monthly basis," Dr. Treloar explains. "Pathologists were reminded regularly that they were supposed to indicate cases that needed to be sent into the registry. The secretarial staff also were on alert to recognize cases. A senior secretary spent one to two days at the end of every month collating cases. Then we packaged and mailed [the data] to the registry."

After installing E-Path, the laboratory’s time commitment devoted to fulfilling its reporting requirements "went to zero," Dr. Treloar says. "And we had a much better capture rate [of cancer data]. We were missing about 30 percent of reportable cases."

The NCI’s Steve Peace says the lexicon that AIM created and cultivated for E-Path has matured into a highly sophisticated tool. "It started out with a basic word-search capability. It did an automatic search of the pathology reports for words and word-string combinations to match for cancer cases. It has evolved into a much more sophisticated lexicon that looks at not just combinations of words and word strings but a number of other different factors that more accurately identify what we need."
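The "basic word-search" starting point Peace describes can be pictured with a short sketch. This is an illustration only and not AIM's lexicon or method: the term list, the negation cues, and the function name `is_reportable` are all assumptions made for the example, and a simple negation check stands in for the "other different factors" a mature lexicon would weigh.

```python
import re

# Hypothetical term list; a real lexicon would be far larger and curated.
REPORTABLE_TERMS = ["carcinoma", "melanoma", "lymphoma", "sarcoma"]

# Hypothetical cues suggesting the term is being ruled out, not diagnosed.
NEGATION_CUES = ["no evidence of", "negative for", "without"]


def is_reportable(report_text):
    """Return True if any sentence mentions a listed term that is not negated."""
    for sentence in re.split(r"[.;\n]+", report_text.lower()):
        for term in REPORTABLE_TERMS:
            for match in re.finditer(re.escape(term), sentence):
                preceding = sentence[:match.start()]
                # Skip mentions preceded by a negation cue in the same sentence.
                if any(cue in preceding for cue in NEGATION_CUES):
                    continue
                return True
    return False


print(is_reportable("Invasive ductal carcinoma, grade 2."))
print(is_reportable("No evidence of carcinoma. Benign fibroadenoma."))
```

A plain keyword match would flag the second report because it contains the word "carcinoma"; the negation check is one small example of why word lists alone are not enough.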

"It does a lot of the ‘thinking’ to identify the cancer cases that need to be reported and incorporated in the [SEER] database," Peace says.

For example, E-Path can make sense of complex pathology reports that are produced in different ways by different pathologists, he explains. "The software works through all of those different variations to come to an accurate conclusion, whether the word is spoken in XYZ format or YZX format," he says.

Although they are not built into E-Path, the CAP’s cancer protocols and short-form checklists are playing an indirect but important role in limiting the variations the E-Path software has to accommodate as the laboratories participating in the SEER project report their data, Peace says. Many U.S. labs, including those participating in the SEER project, use these structured reports to ensure that their cancer data identification, collection, and reporting processes are standardized, Peace says. "AIM software and the electronic reporting system take advantage of those protocols to allow more sophisticated searching and identification of cases and the components of the pathology reports for the things we need to collect."

One of the main challenges the SEER program has had to contend with since implementing its plan to automate cancer data collection and reporting has been a lack of uniformity in collection and reporting methods among the laboratories and cancer registries. The NCI has met that challenge by laying the groundwork for standardized electronic pathology reporting for cancer surveillance, Peace says.

"As a leader in electronic reporting, we’ve had to try to set those standards and work through the problems and difficulties that come with doing that," he says. "CAP’s protocols have helped with that in terms of establishing more consistent structures and formats for reporting cancer data."

Dr. Brueckner and CAP representatives are discussing incorporating the cancer protocols and checklists into E-Path, which would entail expanding the software’s lexicon to include the language used in the protocols. "Having a standard makes our life easier, so we would welcome it. It would allow for more complete and more consistent data collection," says Dr. Brueckner.

The SEER program’s automated disease surveillance is one of many applications for the CAP’s terminology SNOMED CT, and talks are ongoing between AIM and the CAP about incorporating SNOMED CT into E-Path. AIM now uses the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), to automatically encode cancer cases in E-Path. ICD-O-3 is embedded within SNOMED CT either through a one-to-one integration of the morphology concepts or through a mapping of the ICD-O-3 topography concepts to appropriate codes in SNOMED CT.

Within the next 12 months, NCI expects to have all of the SEER regions in some phase of implementation, says Peace. "Our program is trying to make sure that E-Path reporting as a component of all of the electronic reporting processes is in place in the overall SEER program. It’s been a gradual process, but when you look at the larger scope of things, [the program] has actually moved quite quickly."

Tony Sullivan is a writer in Wheaton, Ill.
