College of American Pathologists

Lifting the quality of IHC analysis


CAP Today



January 2008
Feature Story

William Check, PhD

When the Hungarian obstetrician Ignaz Semmelweis demonstrated empirically in the mid-19th century that handwashing is an effective way to prevent the spread of infection, his colleagues in Vienna failed to appreciate the significance of his discovery. Their rejection forced Semmelweis to return to a small clinic in Hungary, where he died in anonymity. His work had to be rediscovered two decades after his death, when Lister and Pasteur had already independently demonstrated the germ theory of disease. Yet now, in the modern age of powerful antibiotics, we consider handwashing to be one of the most important steps in limiting nosocomial infections, even for such virulent organisms as methicillin-resistant Staphylococcus aureus and Clostridium difficile.

A transformation is taking place in anatomic pathology that has some similarity to the role of handwashing in infection control. It is a change not in therapeutics or technology but in attitude and technique. The “new” practice is to raise the quality of immunohistochemistry, or IHC, for prognostic and predictive markers to a much higher level. The steps for doing this are embodied in the guideline for HER2 testing, which “strongly recommends validation of laboratory assay or modifications, use of standardized operating procedures, and compliance with new testing criteria to be monitored with the use of stringent laboratory accreditation standards, proficiency testing, and competency assessment” (Wolff A, et al. Arch Pathol Lab Med. 2007;131:18). As with handwashing, it is not widely appreciated that this innovation has the potential to change practice well beyond its initial application. The goals, techniques, and criteria in the guideline for IHC testing for HER2 are already being extended to IHC for estrogen receptor and progesterone receptor proteins, and they may well become the standard for determination of all prognostic and predictive analytes. While more reproducible and accurate IHC will not save the dramatic numbers of lives that rigorous handwashing can, it can substantially improve the quality of patient care. It can also help pathologists maintain thriving practices.

One measure of the high value that knowledgeable pathologists place on improved IHC was the course presented at CAP ’07 titled “Evidence-Based Immunohistochemistry: Technical Validity and Diagnostic Relevance,” the stated purpose of which was “to bridge the gap between current practice and best practice.” Course organizer Paul E. Swanson, MD, director of anatomic pathology and professor of pathology at the University of Washington Medical Center, says promulgation of the guideline for HER2 testing was “a very big part” of why this course was introduced at this time. “Surveys for HER2 and ER/PR proteins have indicated deficiencies in laboratories performing and interpreting IHC for these markers,” Dr. Swanson says. “The implications are quite profound.

“It is not unreasonable to expect that we will be under the gun for all [predictive and prognostic] markers,” Dr. Swanson continues. “Why would HER2 be any more crucial than ER/PR, c-KIT [CD117], CD20, epidermal growth factor receptor [EGFR], or any other of the fairly long list of markers used to predict response to targeted therapies?” Dr. Swanson notes that, through the deeming authority granted to bodies such as the CAP, laboratories are essentially licensed to perform such “high-complexity” tests. Thus, there is a governmental interest in laboratories being able to validate markers globally. To publish standards for HER2 testing “sets a precedent for validating markers more broadly,” Dr. Swanson says. “I think it’s likely that in the next five years all predictive and prognostic markers will be held to the same standards, in part through additional joint guidelines from CAP and ASCO. Once published, such guidelines become standard of practice.”

“We have a legal responsibility to optimize IHC testing,” agrees Neal S. Goldstein, MD, one of the course faculty and director of IHC and molecular pathology, Department of Anatomic Pathology, William Beaumont Hospital, Royal Oak, Mich. “Based on FDA’s ruling on class reagents in IHC,” he says, “the legal responsibility for validation and knowing the relevant parameters is put squarely on the shoulders of the lab director.” Moreover, Dr. Goldstein contends that in certain situations diagnostic IHC can now require the same level of rigor as prognostic and predictive IHC. “Identifying the primary site of a neoplasm can rise to the level of a predictive test,” he says, “though it is not officially recognized as such.” As chemotherapeutic agents and regimens have been developed, neoplasms of a certain type and organ are often treated in a specific way. “So identifying the site and organ of a neoplasm de facto becomes a predictive result.”

M. Elizabeth H. Hammond, MD, who was a co-chair of the HER2 guideline expert panel, says the CAP ’07 course helped attendees apply that groundbreaking guideline more broadly, which is important because “the requirements for HER2 IHC will be applied to anything that is a predictive cancer test and in the same way.” “The elements that we put in place for HER2 were constructed to be generic,” she says. “So I expect they will be applied widely.” Dr. Hammond, professor of pathology at the University of Utah and a pathologist at Intermountain Healthcare, says the ASCO/CAP panel will meet soon to discuss applying the requirements to ER/PR determination.

A central message of the course was that “Pathologists should be validating every test they offer, not just HER2, but anything that is a critical predictive cancer test,” says Dr. Hammond. “It is more important to validate these tests because the result stands alone to guide patient treatment.” Pathologists have a huge stake in this testing, in Dr. Hammond’s view. “Ultimately all cancer treatment will be targeted to individual patients,” she says. “Whether any of those tests will be done by IHC depends on our being able to show that we can do these tests in an accurate way.” The session provided keys to doing that.

Allen M. Gown, MD, medical director and chief pathologist at PhenoPath Laboratories, Seattle, and clinical pathologist at the University of British Columbia, Vancouver, says Dr. Hammond is “absolutely right” and that most pathologists are probably not aware of the “broad sweep” of this issue. In IHC proficiency testing the mechanics have been left to individual laboratories. “That is changing with respect to HER2,” Dr. Gown notes. “And HER2 may be in the vanguard in this respect.” Explicit guidelines need to be extended to ER/PR assays. A recently published clinical therapy trial in Italy, Dr. Gown says, found a “shocking” false-negative rate for ER protein in referring labs relative to the central lab (Viale G, et al. J Clin Oncol. 2007;25:38–46). He notes that some of the most important papers that address this topic are in the oncology literature rather than the pathology literature. “We pathologists need to be monitoring this literature,” he urges.

Particularly worrisome, in Dr. Gown’s experience, is his impression that many pathologists are not aware of the difference between validating an antibody and titering it. “A pathologist buys an antibody that the vendor says is specific for a cancer, runs it on a few cases that are high expressors, and says that it works. That is not the same thing as validation, which means testing the antibody on a large number of positive and negative specimens. To do it properly is a lot of work,” Dr. Gown acknowledges. “At a meeting a pathologist recently approached one of my colleagues and asked him, ‘Do you really validate all those antibodies that you use?’ I have to say that we actually do,” Dr. Gown says. Often an antibody that the vendor and the literature say performs in a certain way proves to be unsatisfactory when Dr. Gown does the validation.

Dr. Hammond, too, has the impression that anatomic pathologists often fail to validate IHC assays. “One of the most common questions that I get about the guideline is about validation,” she says. The other critical element about which she gets questions is fixation. “These are key concepts that many pathologists are confused about and that are on everybody’s mind.” To help pathologists comply with the HER2 guidelines, Dr. Hammond says, the CAP is planning to develop a suite of products or tools, including educational sessions and online exercises. “We have to convince them to do it,” she says. “If pathologists don’t step up to the plate, they are going to lose it.”

Dr. Swanson was the opening speaker in the course on evidence-based IHC, and his first objective was to “present a detailed overview of technical validation for predictive and prognostic markers.” He used the HER2 guideline as his illustrative text, calling it “a fairly important step forward in our practice as pathologists.” The most important element in this document, he says, is the “need for methodological and technical validation in each lab performing IHC.” This requirement should—and will—be extended to other markers. “Clinicians are treating patients based on what we say, but what we say is meaningless unless our methodology is valid,” Dr. Swanson said.

He listed criteria for clinical validation published many years ago by oncologist William L. McGuire, MD, which include a definitive clinical study with a sample size adequate for statistical analysis, methodologic validation, and an optimized cutoff value (McGuire WL. J Natl Cancer Inst. 1991;83:154–155). While most “diagnostic” markers need only be validated rigorously against known positive and negative cases (as Dr. Gown noted), predictive and prognostic markers must also reliably predict outcomes or response to treatment in the patients whose samples are used for validation. “In other words,” Dr. Swanson said, “the latter markers must be both technically valid and clinically valid. This is the higher level of validation that Dr. Goldstein referred to.”

Ideally, a valid method would show substantial equivalence with protein expression (or a clinically relevant surrogate) or methodologic identity to the original clinical validation assay. In reality, Dr. Swanson said, since there is no gold standard for HER2 testing—“neither FISH nor IHC is accurate in predicting outcome in 100 percent of cases”—laboratories have to settle for substantial methodologic (analytic) equivalence to the original study using a validated sample set or cross-validation with an alternative valid method (IHC vs. FISH).

A method should be validated when first put into use and whenever testing parameters or interpretive criteria change, such as when adopting the 30 percent cutoff value for 3+ IHC scoring advocated in the HER2 guideline. An “optimal” interval for re-validating is every six months. Two questions about which people sometimes get confused are whether laboratories need to validate FDA-approved tests and whether it is necessary to validate if you use a validated commercial method (Dako, Ventana, and Abbott-Vysis for HER2) and adhere to the vendor’s protocol. Dr. Swanson answered both questions with an emphatic “yes.” Biannual proficiency testing will not suffice, as indicated by the new checklist item ANP.22997.

While the rules and objective are clear, how to get there is more complicated. For instance, what sample set is appropriate for methodologic validation? Again, there are ideal and real-world answers. Ideally, representative tissues from the original clinical validation would be used. An alternative is standard tissue samples from an approved source, such as the National Institute of Standards and Technology, but for HER2 and several other predictive markers, these are not yet available. One real-world approach is to use consensus positive and negative cases (Fitzgibbons PL, et al. Arch Pathol Lab Med. 2006;130:1440–1445). Another way is to use tissue samples from your own cases known to harbor the target by non-IHC means. One can also use concordance with a central laboratory, either sending local samples selected for consistent and reproducible stain results to a central laboratory or testing samples from a central laboratory in your setting. Here the general idiosyncrasy of IHC becomes a problem. “Since there is no processing or fixation standard, agreement between labs is subject to local preanalytic variables,” Dr. Swanson said. In the two CAP Surveys described in the article by Fitzgibbons, et al., which used tissue microarrays, 70 percent of the samples reached at least 90 percent concordance among laboratories, suggesting their potential utility as consensus specimens for methodologic validation and controls (though it should be remembered that the proposed benchmark in the HER2 guideline is 95 percent concordance).
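The Survey statistic quoted above, the share of samples on which participating laboratories agree at least 90 percent of the time, can be illustrated with a short sketch. It assumes each laboratory reports a simple positive/negative call per sample and defines a sample's concordance as the fraction of labs agreeing with the most common call; that definition, the function names, and the data are hypothetical illustrations, not the method used by Fitzgibbons, et al.

```python
from collections import Counter


def sample_concordance(calls: list[str]) -> float:
    """Fraction of laboratories whose call matches the most common (modal) call."""
    if not calls:
        raise ValueError("sample has no laboratory calls")
    modal_count = Counter(calls).most_common(1)[0][1]
    return modal_count / len(calls)


def share_meeting_threshold(results: dict[str, list[str]], threshold: float = 0.90):
    """Return (share of samples at or above threshold, sorted IDs of those samples)."""
    passing = sorted(s for s, calls in results.items() if sample_concordance(calls) >= threshold)
    return len(passing) / len(results), passing


# Hypothetical survey results: sample ID -> one call per participating laboratory.
survey = {
    "core_A": ["pos"] * 10,
    "core_B": ["pos"] * 9 + ["neg"],
    "core_C": ["pos"] * 6 + ["neg"] * 4,
}
share, passing = share_meeting_threshold(survey)
print(f"{share:.0%} of samples reached 90% concordance: {passing}")
```

Samples that clear the threshold are the natural candidates for consensus validation material; raising `threshold` to 0.95 shows how quickly that pool shrinks under the guideline's stricter benchmark.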

Another question with no firm answer is what sample number is sufficient for validation. In the guideline, 25 to 100 samples are recommended, based on statistical models that assume the method provides simply a “yes” or “no” answer for a given test. However, 25 samples may not be sufficient to achieve concordance between labs or methods in any given validation trial. “If your ability to do the stain is matched to an external concordance standard, it is increasingly unlikely [as the validation benchmark approaches 95 percent] that you will meet the concordance standard for the assay unless you use a much larger number of samples,” Dr. Swanson said. “With only 25 cases, the probability of meeting the 95 percent concordance standard is significantly less than 0.5.” (For a quantitative demonstration of why this is so, see Table A8 in Appendix F of the HER2 guideline.)
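A simplified binomial model shows why the number of validation cases matters. The sketch below assumes each case independently agrees with the reference method with a fixed "true" concordance and asks two questions: how likely the observed concordance is to reach 95 percent, and what exact one-sided lower confidence bound the observed result supports. It is an illustration under those assumptions, not a reproduction of Table A8 in the guideline; how much a larger sample helps depends on the assumed true concordance and on whether the benchmark is applied to the observed rate or to a confidence bound. Under the confidence-bound reading, even 25 of 25 concordant cases supports a one-sided 95 percent lower bound of only about 0.89.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k concordant cases."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_meets_benchmark(n: int, true_concordance: float, benchmark_pct: int = 95) -> float:
    """Chance that the observed concordance x/n reaches the benchmark (point-estimate rule)."""
    k = (benchmark_pct * n + 99) // 100  # smallest integer x with 100*x >= benchmark_pct*n
    return prob_at_least(k, n, true_concordance)


def lower_bound(x: int, n: int, alpha: float = 0.05) -> float:
    """Exact (Clopper-Pearson) one-sided lower confidence bound for x/n, by bisection."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        # P(X >= x | p) rises with p; the bound is the p at which it equals alpha.
        if prob_at_least(x, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (25, 50, 100):
        for p in (0.95, 0.97):
            print(f"n={n:3d} true={p:.2f} P(observed >= 95%) = {prob_meets_benchmark(n, p):.3f}")
    print(f"25/25 concordant: one-sided 95% lower bound = {lower_bound(25, 25):.3f}")
```

With 25 cases, 24 agreements (96 percent) pass the point-estimate rule but 23 (92 percent) do not, so a single discordant case is the entire margin for error.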

If a laboratory does decide to validate its assay by interlaboratory comparison, further conundrums arise. For instance, which facilities can provide appropriate comparisons? One possibility is a laboratory that has clinically validated the method. “Yes,” said Dr. Swanson, “but who is that?” Comparing to a large-volume central laboratory that has validated its method against a clinically validated assay is a good option. However, you still have to identify an appropriate facility. And, Dr. Swanson asked, “How do I know their method is valid?” These questions do not have simple answers.

It is no surprise that larger-volume laboratories report higher concordance values in the literature. “There is a concern from HER2 and ER/PR surveys and the literature that for smaller labs that do their own work but send samples to central labs [for checking], the local to central concordance rate is pretty low, around 75 percent,” Dr. Swanson noted. Larger laboratories, on the other hand, approach a higher internal and external concordance level, over 90 percent.

Dr. Gown agrees it will be far more difficult for smaller labs to do the kind of validation studies talked about, particularly for tests for which they don’t have a high volume. He says: “It may not be practical for them to do it. I know some small labs that do a good job of what they do and some big labs that are horrible, but in general big labs have more resources to do validation studies. They can be done by small labs, but it requires vigilance.”

Even larger laboratories’ attempts at validation can fall short of the standard. Dr. Swanson showed two examples of cross-validation for HER2 (validating IHC against a valid FISH assay), one at a single site (Vogel CL, et al. J Clin Oncol. 2002;20:719–726) and the other between two laboratories (Dybdal N, et al. Breast Cancer Res Treat. 2005;93:3–11). In both cases concordance between methods was 89 percent, below the 95 percent standard. However, cross-validation failure may not be inevitable, Dr. Swanson suggested, based on work in his own laboratory, where more than 94 percent of cases that were 3+ by IHC were also FISH-positive and nearly 100 percent of IHC-negative or 1+ cases were FISH-negative. “You can approach the required standard in your lab with careful attention to the development and optimization of lab methods and interpretative criteria,” Dr. Swanson concluded. His colleagues at the University of Washington are now preparing those data for publication.
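The cross-validation figures above compare IHC categories against FISH results. A minimal sketch of that tally follows, assuming paired results per case with IHC scored 0, 1+, 2+, or 3+ and FISH reported as positive or negative. How equivocal 2+ cases are handled is an assumption here (they are counted separately rather than scored as agreement), and the case list is hypothetical, not data from any study cited in this article.

```python
def cross_validation_summary(pairs):
    """Summarize IHC-vs-FISH agreement for HER2.

    pairs: iterable of (ihc_score, fish_positive), ihc_score in {"0", "1+", "2+", "3+"}.
    Assumption: 2+ (equivocal) cases are reported separately, not scored as agreement.
    """
    pairs = list(pairs)
    pos_ihc = [fish for ihc, fish in pairs if ihc == "3+"]
    neg_ihc = [fish for ihc, fish in pairs if ihc in ("0", "1+")]
    if not pos_ihc or not neg_ihc:
        raise ValueError("need both 3+ and 0/1+ cases")
    agree_pos = sum(pos_ihc)                 # 3+ by IHC and FISH-positive
    agree_neg = sum(not f for f in neg_ihc)  # 0/1+ by IHC and FISH-negative
    return {
        "3+ FISH-positive": agree_pos / len(pos_ihc),
        "0/1+ FISH-negative": agree_neg / len(neg_ihc),
        "overall (excluding 2+)": (agree_pos + agree_neg) / (len(pos_ihc) + len(neg_ihc)),
        "equivocal 2+ cases": sum(1 for ihc, _ in pairs if ihc == "2+"),
    }


# Hypothetical paired results for illustration only.
cases = [("3+", True)] * 19 + [("3+", False)] + [("1+", False)] * 30 + [("2+", True)] * 5
summary = cross_validation_summary(cases)
for label, value in summary.items():
    print(label, value)
meets_benchmark = summary["overall (excluding 2+)"] >= 0.95
```

Reporting the category-specific rates alongside the overall figure keeps a strong negative-category agreement from masking a weaker 3+ result.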

Setting rigorous performance standards will have consequences, though it is not yet clear what they will be. In principle, as proficiency is regulated, laboratories not meeting the 95 percent concordance benchmark for HER2, for example, may have to cease testing. “I don’t yet know the circumstances under which that could happen,” Dr. Swanson says. If a governing body were to set a numerical rule for concordance, some smaller labs might stop doing some of the tests and instead send them to a central laboratory using a validated method. Again, that brings up the question of finding a commercial laboratory that has a validated method. “Commercial laboratories will have to provide documentation that their method is valid,” Dr. Swanson speculates. “I don’t know how that will play out because there will be no objective arbiter of what is valid.”

Perhaps some pathologists and laboratorians would admit they need to work on their validation technique. But all would say they know how to fix and stain. Alas, even these most fundamental steps of IHC are not always performed optimally, says Dr. Goldstein, who spoke on standardizing and troubleshooting automated IHC. “There is science behind good and bad IHC,” Dr. Goldstein said. “There is evidence that fixation and antigen retrieval are the main components of a consistent, standardized, reliable IHC assay.” Attending to these factors will reduce preanalytic error.

One piece of evidence that is not always appreciated is that the conventional wisdom that formalin fixes tissue at about 1 mm per hour is a “myth,” Dr. Goldstein said. The truth is that formalin permeates tissue at about 1 mm per hour, but it fixes tissue much, much more slowly. It was almost 100 years after Blum first reported the efficacy of formalin as a fixative that the kinetics of this reaction were elucidated (Fox CH, et al. J Histochem Cytochem. 1985;33:845–853). These authors concluded that complete crosslinking takes 24 hours of exposure at room temperature or 18 hours at 37°C. This is because formaldehyde in aqueous solution is present mostly as methylene glycol (about 10,000:1). Free formaldehyde crosslinks proteins very fast, but methylene glycol converts to formaldehyde very slowly; this conversion is the rate-limiting step in fixation.

Microwave heating accelerates the process, but, Dr. Goldstein said, “it is not as short as people think.” More important, when adding microwaving to an IHC assay, Dr. Goldstein says, “you are completely changing the preanalytical component of the assay, so you must re-validate it. You can’t just put a sample into a microwave tissue processor and expect results to stay the same.”

Using needle core biopsies of 24 strongly ER-positive invasive breast carcinomas, Dr. Goldstein and his colleagues tested the impact of fixation time on ER staining by IHC (Goldstein N, et al. Am J Clin Pathol. 2003;120:86–92). They concluded that a minimum of about seven hours of fixation is required to “lock in” ER using a procedure optimized for formalin-fixed tissue. Less than seven hours creates a risk of false-negative results. Dr. Goldstein now uses eight hours of fixation for all ER IHC. Small blocks are treated the same as large blocks: Permeation is faster in small blocks, but the formalin fixation time is the same. Samples are loaded onto the processor based on when they came into the laboratory, not on their size.
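The scheduling rules in this paragraph (an eight-hour fixation standard, a roughly seven-hour floor below which ER false negatives become a risk, and loading by arrival time rather than block size) can be expressed as a small check. This is a hypothetical sketch: the class, field, and function names are illustrative and do not describe any real laboratory information system, and the thresholds simply restate the figures reported above for ER IHC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LAB_STANDARD = timedelta(hours=8)  # fixation Dr. Goldstein's lab uses for all ER IHC
RISK_BELOW = timedelta(hours=7)    # below about 7 h, false-negative ER risk in the cited study


@dataclass
class Specimen:
    specimen_id: str          # hypothetical accession label
    into_formalin: datetime   # when the specimen entered formalin
    block_size_mm: float      # recorded, but deliberately ignored when scheduling


def processing_order(specimens):
    """Queue specimens by time into formalin, not by block size."""
    return sorted(specimens, key=lambda s: s.into_formalin)


def earliest_load(spec):
    """Earliest time the specimen can go on the processor under the 8-hour standard."""
    return spec.into_formalin + LAB_STANDARD


def fixation_check(spec, load_time):
    """Classify the fixation interval that a proposed processor load time would give."""
    fixed = load_time - spec.into_formalin
    if fixed < RISK_BELOW:
        return "too short: false-negative risk"
    if fixed < LAB_STANDARD:
        return "below lab standard"
    return "ok"


# Hypothetical specimens: a small core received late, a large block received early.
a = Specimen("core-1", datetime(2008, 1, 7, 15, 30), 1.0)
b = Specimen("block-2", datetime(2008, 1, 7, 9, 0), 20.0)
print([s.specimen_id for s in processing_order([a, b])])
print(earliest_load(a))
print(fixation_check(a, datetime(2008, 1, 8, 7, 0)))
```

Keying the queue on `into_formalin` alone encodes the point that a small block permeates faster but still needs the same fixation time.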

Clinicians might be expected to dislike this practice, because any biopsy that comes in after 2 PM is held overnight and fixed the next day. “There is perceived pressure on pathologists to have rapid turnaround times for these resection specimens,” Dr. Goldstein acknowledges. But the clinicians, it turned out, are supportive. “When I showed clinicians these data and the consequences of fixing specimens for too little time, they became our advocates in explaining to patients why results come back the next day.”

Fixation time is not an issue limited to the ER assay; it affects many of the antibodies used in IHC. “CD20 gives results somewhat similar to ER,” Dr. Goldstein says. “It shows less intense staining with shorter formalin fixation time. Conversely, synaptophysin produces an extreme overstaining in specimens that are fixed for shorter amounts of time in formalin.” To be safe, he recommends against trying to shorten the fixation time for particular antibodies. “Length of fixation is not the issue,” he says. “There is no minimum magical fixation time. The issue is maintaining a consistent fixation time and modifying your IHC assays around that time. If you find you are getting incorrect results on proficiency testing samples,” Dr. Goldstein advises, “you may need to re-examine your consistent fixation time.”

His chief take-home message: “A relatively narrow range of specimen formalin fixation time is the key to quality IHC stains.”

“That is exactly true,” Dr. Hammond agrees. She takes Dr. Goldstein’s evidence about longer fixation times to heart. “We now recognize that it was a mistake in the HER2 guideline to say that a one-hour fixation time was sufficient for breast core biopsies,” Dr. Hammond says. “We will change the guideline to conform with this higher standard.”

As with fixation, antigen retrieval, or AR, can be made more consistent with an evidence-based approach. A good place to start is to recognize that fixation and AR are related: Predominantly formalin-fixed tissue requires high-energy AR, while predominantly alcohol-fixed tissue requires minimal or no AR. “Antigen retrieval at a level adequate for formalin-fixed tissue when applied to tissue with predominantly alcohol fixation leads to over-retrieval problems,” Dr. Goldstein cautions. These include tissue holes, section fall-off, loss of nuclear detail, and overstaining, including high background staining. Conversely, applying low-AR conditions to predominantly formalin-fixed tissue leads to false-negative results. “In my experience and in my laboratory, the majority of bad IHC stains are the result of over-retrieval relative to the amount of formalin fixation,” Dr. Goldstein says.

Dr. Gown reinforces the impact of antigen retrieval on overall IHC results. “Most antibodies require a tissue pretreatment,” he says. “The AR method published by others may not work in your lab. Often we modify and optimize an AR process. In some situations antibody performance changes so much we go back and re-validate the assay.”

Dr. Goldstein closed his talk with comments about commercial IHC instruments, all of which he says are based on similar concepts and apply similar methods. In his 16 years of directing laboratories he has used five commercial instruments, both closed and open, from different companies. “All are equally capable of producing bad IHC due to poorly controlled preanalytical variables. And when done right, they can all give good results,” he said.

There is no magic about autostainers. The same issues underlie suboptimal stains as with manual staining. In fact, Dr. Goldstein said, “The more automated the staining instrument, the greater the need for consistent fixation time. Vendors promote the idea that autostainers can fix all our problems. That is not true. No automated staining instrument can improve bad IHC stains due to relative insufficient or overfixation.” When an IHC assay is first put on an automated instrument, it must be re-validated. “You cannot assume that rapid processors give the same results as standard formalin fixation,” Dr. Goldstein cautioned.

Barry R. DeYoung, MD, the third speaker at the CAP ’07 course, addressed two topics from an evidentiary perspective: outdating of antibodies and routinely using both cytokeratin (CK) 7 and 20 to work up metastatic carcinoma of unknown origin.

“We are currently in a period when, because of the expansion of IHC into new areas, namely, predicting in vivo response to drug therapy, there has arisen an exceptional amount of regulation and stipulations,” Dr. DeYoung, who is professor-clinical, co-director of anatomic pathology, and director of surgical pathology at the University of Iowa Hospitals and Clinics, told CAP TODAY. It is important that regulations on the outdating of IHC antibodies be in line with what has been proved. Dr. DeYoung showed that the only two substantive publications on this topic found little or no evidence that antibodies lost efficacy up to two years after the manufacturer’s expiration date (Tubbs RR, et al. Arch Pathol Lab Med. 1998;122:1051–1052; Vigliani R, Babache N. Pathologica. 2002;94:121–129). Dr. DeYoung’s conclusion from these publications: “The real potential exists for an antibody to produce good results well beyond its expiration date, assuming that the laboratory utilizes good techniques and employs both positive and negative controls in an appropriate manner.”

In Dr. DeYoung’s view, application of the outdating rule to IHC reagents was an unintended consequence of the FDA’s granting of analyte-specific reagent status to antibodies used in automated analyzers. The rule covered IHC antibodies by extension. “Although the change to analyte-specific status was necessary, perhaps the consequences were not particularly well thought out,” Dr. DeYoung says.

The bottom line now is that the CAP, as an accrediting body granted deemed status by CMS, is obligated to follow this rule. Because Dr. DeYoung’s laboratory used antibodies beyond their expiration date, he was cited in his first unannounced inspection in June 2007, even though he uses “on-slide” positive and negative controls for all cases. Dr. DeYoung has written a reply. Meanwhile, he is complying with the rule at an estimated additional annual cost of $35,000, a 25 percent to 30 percent increase in the lab’s antibody budget.
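Taken together, the two figures imply the rough size of the laboratory's existing antibody budget. A back-of-the-envelope estimate, assuming the $35,000 is the entire added cost and that the 25 to 30 percent increase is measured against the pre-existing budget B:

```latex
B = \frac{\$35{,}000}{r}, \qquad r \in [0.25,\ 0.30]
\quad\Longrightarrow\quad
B \approx \frac{\$35{,}000}{0.30} \approx \$117{,}000
\ \text{ to } \
\frac{\$35{,}000}{0.25} = \$140{,}000
```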

The basis for routinely using both CK7 and CK20 in the evaluation of metastatic carcinomas of unknown origin is that CK7 is expressed in many neoplasms while CK20 expression is much more restricted; it is present in colorectal and urothelial cancer and in Merkel cell carcinoma. Dr. DeYoung found more than 200 publications using these two antibodies to differentiate between classes of neoplasms. He considers one review article in particular “great for correct interpretation” (Chu PG, Weiss LM. Histopathology. 2002;40:403–439). “They felt that CK7 could be helpful but that heavy reliance on a CK7 datapoint is probably not in everybody’s best interest.”

Dr. DeYoung showed six neoplasms of uncertain primary stained with CK7 and CK20. All were CK7-positive and CK20-negative, as expected. In two further cases (one of them a 58-year-old man with excruciating penile pain, elevated PSA, and hematuria), CK20 was positive. Dr. DeYoung’s conclusion was that “CK7 didn’t add much to history and H&E morphology.” In a small tissue specimen with no known primary lesion and a poorly differentiated malignant neoplasm, a panel of IHC antibodies is a reasonably effective and cost-efficient tool, Dr. DeYoung suggested. He has, in collaboration with Mark R. Wick, MD, devised an 11-antibody panel that uses CK20 but not CK7 (DeYoung BR, Wick MR. Semin Diagn Pathol. 2000;17:184–193). His rationale is that any discriminatory value lies in the CK20 result, since CK7 is expressed almost ubiquitously.

Dr. Gown agrees that reflexively using both CK7 and CK20 in all cases of metastatic carcinoma of unknown origin is not a good idea. “If you are dealing with a 65-year-old female with a lung tumor and a history of breast cancer, CK7/20 are not helpful,” he told CAP TODAY. “For a patient with a tumor in the abdomen from the GI or genitourinary tract, they could be useful.” But he is even more stringent. “I don’t believe in the concept of fixed panels,” he says. “And I would rarely use 11 antibodies in a panel.” In his first example, five to six markers would be definitive. Even when evaluating a liver tumor of unknown primary, he says, “I still don’t think I would use more than eight or nine.”

Addressing the bundle of issues around improving IHC, Dr. Hammond says, “It is highly desirable for us to be able to solve this problem and to continue to do testing in a distributive way.” She hopes that the HER2 guideline will show that pathologists can implement the recommendations and improve their performance. “The future of our role in predictive cancer testing is at stake,” she says.

William Check is a medical writer in Wilmette, Ill.