College of American Pathologists

  Feature Story


CAP TODAY

Bringing order to data chaos

November 2003
Eric Skjei

Pressure to adopt standards for structuring and communicating pathology data is mounting, and the move to create the standards is gaining momentum.

"A few months ago the NIH came out with a requirement that all grants where the request is larger than $500,000 per year must include a section in the application that describes how the authors plan to share their data," says Jules J. Berman, MD, PhD, program director for pathology informatics at the National Cancer Institute. The good grant applications, he says, will do more than just say they will make their data publicly available. "They will show they have a realistic way of organizing their data so that it can be merged into other data sets," says Dr. Berman.

Those who write grants related to gene expression arrays, he notes, already are aware of the need to share data and typically follow the standards of the Microarray Gene Expression Data (MGED) Society, such as the Minimum Information About a Microarray Experiment (MIAME) specification, for their gene expression array data.

"I think this will probably be the case for any kind of research funded by the federal government," he says. "The data must be presented in a way that can be understood by humans and by computers. Research files should be self-describing and prepared in a way that meets the general rules for the standard exchange of data. Taking these steps will enhance the chances that a grant will be reviewed favorably."

Many journals now require that those who publish papers include the data that support the assertions made in the paper, he says, which means authors will submit supplemental data files that will appear on the journal’s Web site. And to be usable, these files will have to adopt standard formats. "Just about anything anyone does will have to be exchangeable, the exchanges are going to be done electronically, and they will need to be done in such a way that any given data set can be integrated with every other data set of the same type," Dr. Berman says.

Not easy, of course, but we’re moving in that direction. A modest milestone in the history of standards development in pathology was achieved May 23 with the publication of "The Tissue Microarray Data Exchange Specification: A Community-Based, Open Source Tool for Sharing Tissue Microarray Data." Authored by Dr. Berman and pathologists Mary E. Edgerton, MD, PhD, and Bruce A. Friedman, MD, the specification is the result of more than two years of effort and the first such work published by the Association for Pathology Informatics. (It can be found at

Tissue microarrays, or TMAs, are a relatively recent technology that allows researchers to include and study hundreds of samples on a single slide. To facilitate data exchange among researchers and data inclusion in scientific journals, leaders in the standards area realized that a TMA data-exchange specification was needed. Work began on one in 2001 under the auspices of the API with funding from the National Cancer Institute. The resulting specification describes an XML document with four required sections: header, block, slide, and core. Eighty common data elements constitute the universe of XML tags used in the TMA specification. In addition, six simple semantic rules (for example, "Every TMA file must consist of well-formed XML") guide the development of TMA data-exchange specifications. (A common data element is a generic type of data, for example, lab name, that is likely to show up in any TMA file. XML, or Extensible Markup Language, is a high-powered, more flexible cousin of HTML that allows users to create their own tags, among other things, and is rapidly becoming the de facto tool for creating medical informatics standards and specifications of a variety of types.)
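To make the four-section structure concrete, here is a minimal Python sketch that builds a skeleton document and applies two checks: the well-formed-XML rule quoted above, and a test that all four required sections are present. The root element name `tma` and child tags such as `lab_name` are assumptions chosen for illustration; the published specification defines its own element names and its other five rules, which are not reproduced here.

```python
import xml.etree.ElementTree as ET

# The four section names come from the article; the root name "tma" and the
# child tags below are hypothetical stand-ins for the specification's own tags.
REQUIRED_SECTIONS = ("header", "block", "slide", "core")

def build_skeleton() -> str:
    """Build a minimal TMA-style document containing the four required sections."""
    root = ET.Element("tma")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "lab_name").text = "Example Lab"  # a common data element
    block = ET.SubElement(root, "block")
    ET.SubElement(block, "block_identifier").text = "B-001"
    slide = ET.SubElement(root, "slide")
    ET.SubElement(slide, "slide_identifier").text = "S-001"
    core = ET.SubElement(root, "core")
    ET.SubElement(core, "core_identifier").text = "C-001"
    return ET.tostring(root, encoding="unicode")

def check_document(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    try:
        root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # find() looks only at direct children of the root element
    return [f"missing required section <{name}>"
            for name in REQUIRED_SECTIONS if root.find(name) is None]

print(check_document(build_skeleton()))  # []
print(check_document("<tma><header></tma>"))  # reports a well-formedness error
```

Separating "is it XML at all?" from "does it have the required parts?" mirrors the way the specification layers its semantic rules on top of plain XML syntax.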

Like an RTF file

An easy way to understand the significance of the TMA specification is to refer to a commonly used file exchange function of most commercial word-processing programs, the RTF (Rich Text Format) file. Take, for example, Microsoft Word and WordPerfect. A person writing in Word cannot send a Word file to someone working in WordPerfect (or vice versa) and have the recipient open it and read it. An intermediate file, functioning something like a lingua franca among all mainstream word-processor files, must be used. RTF serves this purpose well, preserving text, format instructions, and image placement as the original writer intended. In other words, RTF serves as a gateway through which data, in this case word-processing data, can be exchanged among heterogeneous or "foreign" applications.

The new TMA data-exchange specification serves precisely this purpose for those working with TMA data. In most cases, users of the specification will also need a script, similar to the "save as" option found in most word-processing programs, to convert their own TMA data files into the published TMA data-exchange format. That script is not part of the specification: like RTF, the specification defines the common format, but it does not supply the simple programming tools needed to convert proprietary TMA data formats into it.
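What such a "save as" script might look like can be sketched in a few lines of Python. The sketch assumes, hypothetically, a lab whose own system exports one CSV row per core; the column names, the mapping table, and the output tag names are all invented for illustration. For brevity it emits only `<core>` sections, whereas a complete file would also need the header, block, and slide sections.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical local export: one row per core, with column names chosen by
# this imaginary lab. These are NOT the specification's element names.
LOCAL_CSV = """core_id,row,col,diagnosis
C1,1,1,adenocarcinoma
C2,1,2,normal
"""

# Hypothetical mapping from local column names to exchange-format tag names.
COLUMN_TO_TAG = {
    "core_id": "core_identifier",
    "row": "core_row",
    "col": "core_column",
    "diagnosis": "core_diagnosis",
}

def save_as_exchange(csv_text: str) -> ET.Element:
    """Convert local CSV rows into <core> elements under a TMA-style root."""
    root = ET.Element("tma")
    for record in csv.DictReader(io.StringIO(csv_text)):
        core = ET.SubElement(root, "core")
        for column, value in record.items():
            ET.SubElement(core, COLUMN_TO_TAG[column]).text = value
    return root

print(ET.tostring(save_as_exchange(LOCAL_CSV), encoding="unicode"))
```

Keeping the column-to-tag mapping in a table, rather than in the conversion logic, means a lab whose local layout changes only has to edit the table.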

"The purpose of a data-exchange specification," Dr. Berman says, "is so that anybody who has their own way of making TMAs can exchange that data with others, regardless of what company they bought their arrayer from, what image-analysis software they’re using, or how they’ve set up their database." The assumption, he says, is that individuals are going to create their TMA files in their own way for their own purposes. "What we wanted to do was create a format that was so plastic that anyone could port their own data into a very general data-exchange envelope that would accept almost any kind of TMA data but that would be self-describing, so that a script could reinterpret it into someone else’s own system."

To date, Dr. Berman says, the specification has been well received and appears to be serving the intended need. The API plans to continue to develop the specification as needed, based on user input, over the next several years.
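The receiving side of the "self-describing" design Dr. Berman describes can be sketched the same way. In this hypothetical Python example, a lab reads an exchange file, keeps only the tags it recognizes, and renames them into its own field names. Because every value in the file carries its own label, the receiver never needs to know how the sender's database was laid out. All tag and field names are assumptions, not the specification's.

```python
import xml.etree.ElementTree as ET

# A self-describing exchange file: every value is labeled by its tag.
# Tag names are hypothetical, not taken from the published specification.
EXCHANGE_XML = """
<tma>
  <core><core_identifier>C1</core_identifier><core_diagnosis>adenocarcinoma</core_diagnosis></core>
  <core><core_identifier>C2</core_identifier><core_diagnosis>normal</core_diagnosis><stain>HE</stain></core>
</tma>
"""

# The receiving lab's own (hypothetical) field names for the tags it understands.
TAG_TO_LOCAL_FIELD = {"core_identifier": "spot_id", "core_diagnosis": "dx"}

def import_cores(xml_text: str) -> list[dict[str, str]]:
    """Reinterpret <core> elements as records in the receiver's own schema."""
    records = []
    for core in ET.fromstring(xml_text.strip()).iter("core"):
        record = {}
        for child in core:
            local_field = TAG_TO_LOCAL_FIELD.get(child.tag)
            if local_field is not None:  # tags the receiver doesn't use are skipped
                record[local_field] = (child.text or "").strip()
        records.append(record)
    return records

print(import_cores(EXCHANGE_XML))
# [{'spot_id': 'C1', 'dx': 'adenocarcinoma'}, {'spot_id': 'C2', 'dx': 'normal'}]
```

Note that the unrecognized `<stain>` element is ignored rather than causing an error, which is one way a receiver can accept "almost any kind of TMA data" while using only what it needs.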

Other end of the spectrum

The other end of the process by which a standard or a specification comes into being can be seen in the work of David L. Booker, MD, director of pathology at St. Joseph Hospital, and associate professor, Medical College of Georgia, both in Augusta. He is a proponent of what is referred to as "structured documentation." Noting that anatomic pathology reports are largely written in free text, Dr. Booker decries the fact that, because they are not machine-readable, the invaluable data they contain are lost to other pathologists, clinicians, researchers, and educators.

"The inadequacy of the unstructured report becomes even more apparent when one considers emerging technologies such as genomics, proteomics, and tissue microarrays," Dr. Booker says. "The massive data sets these technologies produce are highly structured and must be related to structured pathology data." Anatomic pathology will not have input into this process, he says, "unless pathology report data become structured in a standardized manner."

Synoptic reporting, he notes, takes a small step toward making freeform text narratives more consistently readable and searchable by humans, but it does not complete the process of automating them so they’re machine-readable, which is his ultimate goal. "Some computer-based patient record vendors are now using Web technologies and structured data entry to prevent errors through decision support and to capture clinical information in a structured, useful way," he notes. Pathologists have to adopt similar systems if they wish to remain central figures in medical practice, research, and education, he adds.

Structured data entry (which includes point-and-click navigation, speech recognition, touchscreen technology, and keyboarding) and structured documentation are parts of what Dr. Booker refers to as "structured reporting." In his vision of the future, pathologists will create their reports by using largely pre-formatted templates, picking and choosing from lists of terms and text that is largely predefined and standardized. They will also draw on a standardized computerized medical vocabulary, such as SNOMED, and use a data-exchange standard, probably HL7, to communicate their work to one another. Ideally, a sophisticated medical knowledge database will support their structured documentation and structured data-entry tools and facilitate the real-time creation of customized templates and lists. "These systems will not proscribe the inclusion of unstructured data," Dr. Booker says. "Qualifiers for ambiguity and uncertainty may be used and may be either structured or unstructured."
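The pick-list idea at the heart of structured data entry can be illustrated with a short Python sketch. The template, its fields, and the codes bound to each term are hypothetical placeholders (they are not real SNOMED codes). The point is only that a chosen term is stored together with a code a computer can match, while a free-text qualifier, which Dr. Booker says these systems will still allow, rides alongside it.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical template: each field offers a fixed pick list, and each term is
# bound to a code. The codes are placeholders, NOT real SNOMED identifiers.
TEMPLATE: dict[str, dict[str, str]] = {
    "histologic_type": {"adenocarcinoma": "X-1001", "squamous cell carcinoma": "X-1002"},
    "margin_status": {"involved": "X-2001", "uninvolved": "X-2002"},
}

@dataclass
class StructuredReport:
    """Coded pick-list entries plus optional free-text qualifiers."""
    coded: dict[str, tuple[str, str]] = field(default_factory=dict)  # field -> (term, code)
    qualifiers: dict[str, str] = field(default_factory=dict)         # field -> free text

    def pick(self, name: str, term: str, qualifier: Optional[str] = None) -> None:
        """Record a pick-list choice; reject terms the template does not offer."""
        choices = TEMPLATE[name]  # KeyError if the template has no such field
        if term not in choices:
            raise ValueError(f"{term!r} is not on the pick list for {name}")
        self.coded[name] = (term, choices[term])
        if qualifier:  # unstructured text is still permitted alongside the coded value
            self.qualifiers[name] = qualifier

report = StructuredReport()
report.pick("histologic_type", "adenocarcinoma")
report.pick("margin_status", "involved", qualifier="focal, at the deep margin")
print(report.coded["margin_status"])  # ('involved', 'X-2001')
```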

A working pathologist himself, Dr. Booker is well aware of the general resistance among his colleagues to anything that might slow their productivity or appear to compromise or constrain their diagnostic skills. The structured documentation systems now in development will be sufficiently flexible and comprehensive to overcome these objections, he says. Structured documentation will reduce errors, improve workflow, encourage standardization, enable automated analysis of AP reports, ensure compatibility with computer-based patient records, allow for more decision support functionality in the creation of AP reports, and more.

"Such systems won’t be widely adopted, even if their many benefits are recognized, unless the data-entry system is actually faster and easier than current methods," Dr. Booker adds. "My contention is that these systems will be exactly that."

One of the first tasks inherent in setting the stage for structured documentation in anatomic pathology is the preparation of common data elements, the general types of data that must be deployed to create an AP report. Dr. Booker has compiled such a list and has submitted it to a laboratory journal and to the API for review and publication.
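A list of common data elements is, in effect, a small data dictionary. The Python sketch below assumes three invented elements, each with a name, a data type, a required flag, and a definition, and shows how a report could be checked against such a list automatically. The entries are illustrative only and are not taken from Dr. Booker's list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """A hypothetical common data element: name, type, required flag, definition."""
    name: str
    datatype: type
    required: bool
    definition: str

# Illustrative entries only; not drawn from any published CDE list.
CDES = [
    DataElement("specimen_site", str, True, "Anatomic site the specimen came from"),
    DataElement("tumor_size_cm", float, False, "Greatest tumor dimension in centimeters"),
    DataElement("lymph_nodes_examined", int, False, "Number of lymph nodes examined"),
]

def validate(report: dict) -> list[str]:
    """Check a report against the element definitions; return a list of problems."""
    problems = []
    for cde in CDES:
        if cde.name not in report:
            if cde.required:
                problems.append(f"missing required element {cde.name}")
        elif not isinstance(report[cde.name], cde.datatype):
            problems.append(f"{cde.name} should be {cde.datatype.__name__}")
    known = {cde.name for cde in CDES}
    problems += [f"unknown element {key}" for key in report if key not in known]
    return problems

print(validate({"specimen_site": "colon", "tumor_size_cm": 2.5}))  # []
print(validate({"tumor_size_cm": "2.5"}))
# ['missing required element specimen_site', 'tumor_size_cm should be float']
```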

Slow to implement

Creating, writing, and refining the standard are in many ways the easy steps. The hard step is getting laboratories and commercial vendors to adopt and use it.

"It’s terribly slow because people typically have already built a system, and what they’ve built is working just fine as far as they’re concerned, so they wonder why they should rip it apart and incorporate standards after the fact. But it’s critical they do so," says Raymond Aller, MD, director of bioterrorism preparedness and response in the public health division of the Department of Health Services of Los Angeles County. And it is only in recent years that the need to communicate health care information outside the walls of one’s own institution has become significant and pressing, adds Steven Steindel, PhD, supervisory health scientist at the Centers for Disease Control and Prevention. "If the pathologist was just using information within the walls of his or her institution, then he or she had at best a mixed need for standardizing terminology," says Dr. Steindel. A standard like SNOMED might be used for classifying and looking up cases, and pathologists might choose to use it or not with little consequence as long as the work stayed within the pathologist’s own institution.

"Now they’re finding that there is much more of a need to communicate this information to peers outside their own facilities, and that raises the question of and need for standards," says Dr. Steindel, who is the CDC advisor for data standards and vocabulary.

Ulysses J. Balis, MD, assistant professor of pathology and computer engineering at Harvard University Medical School and chief of pathology at Shriners Hospitals for Children, Boston Burns Unit, predicts that some regional medical centers will soon move to a model of electronic interchange, where image and text will become available in real time, making it possible for consultants to provide a diagnosis quickly. "From an information standpoint, as institutions shift to larger, regional-based centers where the need arises for information sharing among pathologists in geographically distributed groups and for real-time exchange of information, having standards allows for seamless exchange of information in real time as opposed to the current model, which many people call FedEx pathology, where you ship the printed pathology report and the slides." He points to the advent of technologies allowing entire slides to be digitized as one of the factors that makes this shift likely.

"What I would say to pathologists who are looking to see what happens is that these are in fact going to be real productivity enhancers and doing two things will facilitate this," says Dr. Balis. "One is asking vendors to make sure they support the standards, and two is learning as much as they can about standards so that when the opportunity arises they can implement them to the betterment of their overall practice structure."

Pressure builds

Pathologists should be the first "to embrace various schemas for enabling greater utility to the information we produce," agrees Bruce A. Friedman, MD, professor of pathology at the University of Michigan Medical School and Health System, Ann Arbor.

One reason is that adopting standards makes it easier to create a longitudinal electronic medical record. Assuming the hospital administrator’s perspective, Dr. Friedman points out that the administrator (or payer) might very well say to the pathologist who questions the need for standards: "’We’re compensating you, Dr. Pathologist, for your services. What I’m trying to do is produce an electronic medical record that delivers the highest standard of patient care. And that is a longitudinal record across time, and it’s in my best interest, and I would say in your best interest, to spend more time to put that information in standardized form in order to create this larger electronic medical record. I want to be able to integrate the observations of my clinicians, I want to put images in the record, I want to include the reports of the radiologists, I want to put your clinical pathology data in there, and I want to put your anatomic pathology data in there also.’"

All of this information from diverse sources can’t be integrated without a variety of documentation, terminological, and communication standards.
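Why shared codes matter for a longitudinal record can be shown with a small Python sketch. It assumes, hypothetically, that each department supplies its observations already sorted by date and that all departments use the same codes. Merging then yields one chronological record, and a query for a single test code finds every result regardless of which feed it came from; if two laboratories coded the same test differently, that query would silently miss results. All codes and values are invented for illustration.

```python
from datetime import date
from heapq import merge
from typing import NamedTuple

class Observation(NamedTuple):
    when: date
    source: str
    code: str   # hypothetical shared code; the query below relies on sources agreeing on it
    value: str

# Illustrative feeds from different departments, each already sorted by date.
radiology = [Observation(date(2003, 1, 5), "radiology", "IMG-001", "2 cm nodule")]
clinical_path = [
    Observation(date(2003, 1, 3), "CP", "LAB-001", "8.1 ng/mL"),
    Observation(date(2003, 2, 1), "CP", "LAB-001", "3.0 ng/mL"),
]
anatomic_path = [Observation(date(2003, 1, 10), "AP", "DX-001", "adenocarcinoma")]

def longitudinal(*feeds: list[Observation]) -> list[Observation]:
    """Merge per-source, date-sorted feeds into one chronological record."""
    return list(merge(*feeds, key=lambda obs: obs.when))

record = longitudinal(radiology, clinical_path, anatomic_path)
for obs in record:
    print(obs.when, obs.source, obs.code, obs.value)

# A trend query works only because both results share the code "LAB-001".
print([obs.value for obs in record if obs.code == "LAB-001"])  # ['8.1 ng/mL', '3.0 ng/mL']
```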

Genomics and proteomics testing is the second reason to be aware of and involved in standards. "With genomics, proteomics, and other very sophisticated testing methods, there will be techniques and tools that will be brought to bear that are critical for the surgical pathologist to know in order to render his or her diagnosis," says Dr. Friedman. "Tissue microarrays, genetic mapping of tumors, and more are going to require more involvement with outside laboratories."

Dr. Friedman draws an analogy with hematopathology, specifically with antigen testing. "It’s in the immediate best interest of surgical pathologists to embrace data-exchange standards because some of this genomic and proteomic testing will not be done in their laboratories but in outside laboratories," says Dr. Friedman. "So in order for pathologists to imperceptibly and seamlessly merge these data into their own reports, which they could previously create at their desks, they’re going to need to embrace data-exchange standards."

Other external factors will press pathologists to become more at ease with integrating data from outside their own institutions.

"On the CP side, testing is becoming increasingly commoditized," says Dr. Friedman. As a result of aggressive bidding by reference labs on managed care contracts, profit margins are slim on most "bread and butter" testing. The same thing is happening on the AP side, he says. Now "pathologists are also being besieged by commercial labs, particularly in the specialty areas," he says. "So there’s now competition on both the CP side and the AP side." As a result, pathologists are moving away from doing many of those tests in-house and moving toward a new "product line" he calls clinical laboratory consulting.

Clinical laboratory consulting arises as the boundaries between CP and AP become fuzzier and the two begin to blend. And this is beginning to happen, says Dr. Friedman. For example, when pathologists look at tissue, they will have to look at it not only in morphologic terms, but also with a better understanding of the molecular basis for the diagnosis. "A lot of those molecular tests are what I would call numerical or CP tests," he says. As this happens, there will be a natural tendency for pathologists to start providing more diagnoses. Doing so will involve looking at CP and surgical pathology data, of course, but also clinical data, including imaging data from the radiologist, observations by the clinician, and data from other kinds of testing. So it’s in pathologists’ best interests, if they want to pursue and be compensated for a broader set of diagnoses and other professional advice, to take a higher-level view of what’s going on with any given patient and position themselves to add value in terms of rendering diagnoses and therapeutic recommendations.

"This trend is sometimes referred to as ’theranostics,’" Dr. Friedman explains. Theranostics merges diagnosis with the recommendation of appropriate therapy. "The classic model is Herceptin, where there is a laboratory test that assesses the overexpression of the HER2/neu antigen," says Dr. Friedman. "Patients who overexpress the antigen, as determined by the laboratory, become candidates for treatment by Herceptin, thus illustrating the gatekeeping role of the laboratory."

In an era of personalized medicine, a person’s molecular or genetic profile is going to position him or her for approval or denial of a particular therapy, and the pathologist will help make that determination. "So the potential is here for pathologists to break out of this very constrained box that has traditionally been limited primarily to diagnosis," says Dr. Friedman. "But that process has to be data-driven, and that means becoming more comfortable with standards."

Harvard’s Dr. Balis points to eclampsia and preeclampsia as an example of a condition in which pathologists can and should play a theranostic role.

"There’s currently no single laboratory test that can effectively predict eclampsia, and yet the overall fingerprint from a molecular-based diagnosis appears to be extremely promising," he says. "And, of course, ultimately if pathologists play their cards right, we will see the development of a set of technologies and tools in this area, leveraged through the practice of pathology and laboratory medicine."

Learning a lesson from radiology

Pathologists are no strangers to medical data-exchange standards. They’ve been "doing data standards for a very long time," says Dr. Aller, who notes that the earliest iterations of the International Classification of Diseases, or ICD, made use of input from pathology, particularly with respect to terms for diagnoses. The earliest standards effort of the 20th century, he says, began in the 1920s and in 1965 became the CAP’s Systematized Nomenclature of Pathology, or SNOP, under the direction of Roger Coté, MD. SNOP, of course, became SNOMED, now a mainstream medical informatics standard recognized worldwide.

HL7 got its start in the early 1980s under the direction of Don Simborg, MD, with input from several pathologists and others. And at the same time, a group at the American Society for Testing and Materials was writing a description of a laboratory information system, a process that required several different standards to be developed. One of the ASTM subcommittees, chaired by Clement McDonald, MD, focused on standardizing ways to transmit lab data between computer systems, an effort that became the order and result transmission component of HL7, Dr. Aller says.
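The result-transmission work described above is reflected in HL7 version 2's pipe-delimited messages, in which a single result travels in an OBX (observation) segment. The minimal Python parser below assumes the default delimiters and pulls out a few OBX fields by position. It ignores escape sequences, repeating fields, and message headers, so it is a sketch rather than a conforming HL7 implementation, and the sample segment is illustrative.

```python
def parse_obx(segment: str) -> dict[str, str]:
    """Split one HL7 v2 OBX segment into a few of its named fields.

    Assumes the default delimiters ('|' between fields, '^' between
    components) and ignores escapes, repetitions, and the MSH segment.
    """
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    fields += [""] * (12 - len(fields))  # pad so trailing optional fields can be indexed
    code, name, system = (fields[3].split("^") + ["", "", ""])[:3]
    return {
        "value_type": fields[2],       # OBX-2
        "code": code,                  # OBX-3, component 1 (identifier)
        "name": name,                  # OBX-3, component 2 (text)
        "coding_system": system,       # OBX-3, component 3
        "value": fields[5],            # OBX-5
        "units": fields[6],            # OBX-6
        "reference_range": fields[7],  # OBX-7
        "abnormal_flag": fields[8],    # OBX-8
        "status": fields[11],          # OBX-11 (result status)
    }

seg = "OBX|1|NM|2345-7^Glucose^LN||105|mg/dL|70-99|H|||F"
result = parse_obx(seg)
print(result["name"], result["value"], result["units"], result["abnormal_flag"])
# Glucose 105 mg/dL H
```

Because every sending system puts the same kind of information in the same numbered field, a receiving system can read results from any vendor without knowing how that vendor stores them internally.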

But in discussions of pathology informatics standards, it’s radiology that is often said to have conquered the issue of standards. "We have very good case studies from the field of radiology, going back 20 years ago, when they entered the realm of digital diagnoses with CT and MR, where interoperability across vendor platforms became paramount," says Dr. Balis. In anatomic pathology, he continues, where the main medium of exchange has until now been a text report, there has been much less demand for interoperability.

"But this is now changing for a number of reasons, including public-sector reporting, emergency reporting to the CDC for disaster preparedness and bioterrorism, along with opportunities in research for exchange of rare cases, interdisciplinary collaborative efforts between proteomics, genomics, and so forth," he says. "There is now a growing tide of urgency to allow for seamless interoperability of information."

In short, says Dr. Balis, pathology is in the position radiology was 20 years ago. And radiology answered the need resoundingly with three iterations of the imaging standard known as DICOM (Digital Imaging and Communications in Medicine). "I’m part of the original pathology DICOM implementation team, so I’ve had the privilege of working with these people," says Dr. Balis. One of the reasons for DICOM’s success is that radiologists took it upon themselves to become sophisticated users of the standard. "And we absolutely don’t have that now in pathology," he says. "Organized pathology, in terms of the CAP, the ASCP, the API, has to do a far more effective job than has been done in the past of making users aware of and at the same time excited about all the benefits that interoperability tools can provide."

If not, he says, pathology will remain trapped in the vicious circle in which vendors don’t make interoperability tools and standards available and users don’t demand them when new systems are implemented.

Eric Skjei is a writer in Stinson Beach, Calif.