College of American Pathologists

  New attention settling on genomic thyroid nodule test


CAP Today




November 2012
Feature Story

Anne Paxton

Cancer diagnostics researchers sometimes labor in relative obscurity, says Bryan Haugen, MD, head of the Division of Endocrinology, Metabolism and Diabetes at the University of Colorado School of Medicine. “Many of us in the field do our marker sets, publish our papers, and have them on our CV, and never see a test come out of it.”

On the other hand, sometimes they develop a test that makes a difference. As a study reported Aug. 23 in the New England Journal of Medicine illustrates, that’s what the Afirma Gene Expression Classifier, a proprietary genomic test of the molecular diagnostics company Veracyte, is likely to do with thyroid nodule assessment. In theory, if results of the study are used to inform clinical decision making, the Afirma test could lead to far fewer surgeries each year on patients who have indeterminate thyroid specimen cytology.

In the study, a 19-month prospective multicenter validation of the Afirma test funded by Veracyte, researchers used the gene classifier to test 265 prospective fine-needle aspirations from thyroid nodules 1 cm or larger that were labeled cytologically indeterminate. With these specimens, the Afirma test demonstrated a 93 percent overall negative predictive value for ruling out cancer; the test was able to reclassify specimens as “benign” with greater than 94 percent accuracy in the major categories of indeterminate thyroid FNA cytology (atypia of undetermined significance [AUS] and follicular neoplasm [FN]) (Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med. 2012;367:705–715).

“Thyroid cytopathology is critical to evaluate thyroid nodules where the majority are accurately classified as benign or malignant,” says Zubair W. Baloch, MD, PhD, professor of pathology and laboratory medicine, University of Pennsylvania. “In nodules that are cytologically indeterminate, the Afirma Gene Expression Classifier (GEC) complements thyroid cytopathology by reclassifying AUS or FN nodules as GEC benign or GEC suspicious, with a negative predictive value of 94–95 percent and a positive predictive value of approximately 40 percent, respectively.”

“The performance of the Afirma GEC in this validation study suggests that in cytologically indeterminate thyroid nodules, unnecessary diagnostic thyroid surgery may be avoided in half of the patients whose thyroid nodules are actually benign, while lowering the risk of malignancy in those with GEC benign results to the same risk as a cytologically benign nodule,” Dr. Baloch adds.
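
For readers less familiar with the terms, the predictive values quoted above come from a two-by-two comparison of the test result against surgical histopathology. A minimal Python sketch of those definitions follows. The counts are hypothetical round numbers chosen only to land near the quoted figures; they are not data from the study, and the function name is illustrative.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (sensitivity, specificity, ppv, npv) from 2x2 counts.

    tp: malignant nodules the test calls suspicious
    fp: benign nodules the test calls suspicious
    tn: benign nodules the test calls benign
    fn: malignant nodules the test calls benign
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # chance a "suspicious" call is really cancer
    npv = tn / (tn + fn)  # chance a "benign" call is really benign
    return sensitivity, specificity, ppv, npv


# Hypothetical counts for 100 indeterminate nodules (NOT study data).
sens, spec, ppv, npv = predictive_values(tp=22, fp=38, tn=38, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f} npv={npv:.2f}")
```

With these made-up counts, 38 of the 40 "benign" calls are truly benign, giving a negative predictive value of 95 percent, while only 22 of the 60 "suspicious" calls are cancer, giving a positive predictive value a little under 40 percent.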

While the Afirma test has been available since late 2010, the New England Journal study gives the test new prominence by confirming that, for cytologically indeterminate specimens, a benign Afirma result has a level of accuracy comparable to that of a benign cytology diagnosis. “This test is a definite leap forward,” says Dr. Haugen, one of the co-principal investigators of the study and a professor of medicine and pathology. Until the Afirma test, the standard of care for indeterminate biopsies, which can be 20 to 30 percent of the total, was to say, “We don’t have another reliable preoperative test, so what we’re going to do based on the cytology and my clinical impression is to go to surgery, or we can wait and watch.”

Surgery will be recommended more sparingly if the test is employed, experts believe. “If we get an initial indeterminate result, we will be able to tell you in a substantial fraction of cases, with high reliability, that we don’t need to worry about that nodule based either on the Afirma test or on a repeat cytology examination,” says W. Stephen Black-Schaffer, MD, associate chief of pathology at Massachusetts General Hospital and associate professor of pathology at Harvard Medical School (who was not involved in the study). “As opposed to saying, ‘Well, if it was clinically worrisome enough for you to go see a doctor and get aspirated, we can’t reassure you that we don’t need to do surgery.’”

Co-principal investigator Erik K. Alexander, MD, an endocrinologist with Brigham and Women’s Hospital and associate professor of medicine at Harvard Medical School, says the assay has increased the precision of the standard diagnostic assessment. “This was the first study to have been done so broadly. It included 49 different sites, it was double-blinded, and there were 4,000 thyroid nodules prospectively enrolled to achieve this endpoint. There’s never been a validation trial of thyroid nodule diagnostics that has come close to this level of transparency and translatability.”

Findings from an additional trial of the Afirma test were published online Oct. 22 in the Journal of Clinical Endocrinology & Metabolism, and are scheduled to appear in the December print issue. In that study, the Afirma test also demonstrated strong accuracy, reliability, and reproducibility under a range of conditions and variables, says Dr. Baloch. “This includes maintenance of FNA sample stability during collection and shipment, analytic sensitivity as demonstrated by the test’s tolerance of low RNA quantities and small amounts of malignant material in patient samples, and analytical specificity.” The test performed well even when samples were highly diluted by blood and other genetic material, and results were reproducible across operators, processing runs, reagent lots, and laboratories.

Another notable finding, Dr. Baloch says: “In addition to the Afirma GEC generally extending the utility of everyday thyroid cytology, it also identifies rare thyroid neoplasms such as medullary thyroid cancer, metastases to the thyroid, and parathyroid tissue. The preoperative identification of these neoplasms helps to significantly improve patient care.” No other thyroid molecular test has been evaluated in two prospective multicenter studies, nor against blinded expert endocrine histopathology, he says.

Predictions of a dramatic impact on surgery rates have been borne out in another recent study published in Thyroid. This study reported a “striking reduction” in the rate of diagnostic thyroidectomy in patients with cytologically indeterminate nodules in a substantial group of medical practices. Approximately one surgery was avoided for every two Afirma Gene Expression Classifier tests run on thyroid FNAs with indeterminate cytology. While the historical rate of surgery for cytologically indeterminate nodules among the practices was 74 percent, the operative rate fell to 7.6 percent during the period the Afirma tests were obtained (Duick DS, et al. The impact of benign gene expression classifier test results on the endocrinologist-patient decision to operate on patients with thyroid nodules with indeterminate fine-needle aspiration cytopathology. Thyroid. 2012;22:996–1001).

In his own practice, Dr. Haugen has already witnessed impressive results from the use of Afirma. “Since early 2011, we’ve sent off about 140 of these indeterminate thyroid samples for testing with Afirma, and about 70—roughly half—came back benign. Maybe one or two of those did have surgery because the patient had symptoms or the physician thought, ‘I’m sending them to surgery anyway.’ But the vast majority of them didn’t have to have surgery and they are being monitored as if they had a benign cytology lab result without surgery.” Because the specificity and positive predictive value of the Afirma test are only about 50 percent, “We are still sending a large number of patients without thyroid cancer to surgery. But that being said, it used to be that all 140 of these people would go to surgery, so we’re saving half of them from surgery.”

Each year brings 56,000 new cases of thyroid cancer and about 2,000 deaths from the disease. “It’s serious,” Dr. Alexander says. “But we are encouraged by the fact that if it’s identified and treated early, patients can often be cured. What you want to do is identify the patients who need to have the thyroid removed, and avoid unnecessary intervention in those who have no disease.”

An FNA procedure is minimally invasive and extremely low risk, requiring no anesthesia and less than 30 minutes in the office. The FNA can be very helpful if the results, which typically take seven to 14 days, indicate a benign cytology or a cancerous cytology, Dr. Alexander notes, but there is a vexing middle ground. “Approximately 20 to 30 percent of samples will have an ‘indeterminate’ finding. There are an adequate number of cells, but the sample cannot be anatomically determined to be conclusively benign or malignant.”

Because there is a risk of malignancy, these patients are often referred for surgery for diagnostic reasons. “And what we know is that half or more of these cytologically indeterminate nodules are histologically benign after they are resected. So there is unnecessary surgery for up to tens of thousands of patients a year.” At an average cost of $10,000 to $15,000, those surgeries are a burden on the health care system, and they also expose people to the risks of invasive surgery, ongoing hormone therapy, and associated medical care.

In devising their NEJM study of the Afirma test, the researchers hypothesized that a test with a very high negative predictive value, applied to these cytologically indeterminate nodules, could establish a conclusively low risk of cancer when the result was benign, and therefore allow a more conservative clinical approach that would not require surgery.

Many different molecular markers for thyroid cancer have been assessed over the last decade and many are continuing in the development phase, as researchers look at immunostaining, TSH, serum markers, or other microRNA assessments, Dr. Alexander notes. The Afirma test was developed to maximize its negative predictive value. “This is what sets it apart from most other markers that are available. And what’s even more important is it is the only molecular marker that has been evaluated in a very large, prospective, multicenter validation.”

“This is probably one of the early studies showing the power of looking at the expression of the entire genome and seeing how helpful it is in day-to-day practice. I expect Veracyte and others will now begin looking in a similar fashion at other illnesses using this platform,” Dr. Alexander says.

The Afirma test was empirically developed, he explains. “The concept behind it is a novel one and I think a very important use of expanding technology. The idea was that one could take the expression of the entire human genome and look at all 22,000 expressed genes, and there would be a pattern that could be assessed and accurately predict a benign process among cytologically indeterminate thyroid nodules.” So the test developers followed a methodical, iterative process of repeated developmental testing of cytology and pathology specimens. “They would develop a cohort of expressed genes, then test blindly on a new sample, then go back to improve performance on the next-generation assay and again blindly test.”

Dr. Haugen refers to the process as a “discovery-based” rather than “candidate-based” method. “They basically used this bioinformatics algorithm to repetitively train a gene set and say which genes in it can come together across a broad spectrum of benign and malignant thyroid tumors. After repeated testing, re-tweaking, and retraining, the final test we did locked the classifier in and tested it on 265 unknowns, and that was the study reported in the NEJM article.”

It’s not the most common way to develop a molecular diagnostic test. “Most commonly, they say these genes are mutated, or these genes are over-expressed in thyroid cancer and people have already found that, so let’s put together a set of known markers and then test it. With this test we came about it the other way,” Dr. Haugen says. “We said we know what’s cancer and what’s not cancer, and we’ve got this very robust multi-gene chip. Can we train the chip to do a kind of fingerprint analysis and say, yes, this is a cancer, and, no, this is not?”

Despite Afirma’s high negative predictive value in the study, there were seven false-negatives among the 265 indeterminate samples tested. “We did show we could achieve a high negative predictive value, but it was not perfect,” Dr. Alexander notes. Of the seven false-negatives, six were papillary thyroid carcinomas. “But when we looked at what could be causing these results, we looked for independent expression molecularly of thyroid carcinomas or even thyroid follicular cells, and expression was extremely low in those six false-negative specimens compared to clinical operative carcinoma samples, which means that sampling error may have played a substantial role here.” In collecting a specimen, for example, “the physician may not have fully placed the needle directly into the nodule or could have obtained a sample of soft tissue instead of actually the nodule. We will likely never know for sure.”

The researchers considered other possible sources of error, Dr. Haugen adds. “There were a small number of instances where two different pathologists didn’t agree on a final histopathology interpretation, and we considered whether those were related to the false-negatives. But the answer to that is no; it wasn’t occurring when the pathologists were disagreeing. The biggest reason there were false-negatives is that there probably weren’t enough representative thyroid cells in the sample, so it was a specimen collection issue.” The way the test is conducted now, this is going to continue to be a problem, he believes. “I tell people if you have a nodule that is indeterminate, and it has very low cellularity, you have to be a bit more careful with the benign Afirma result, based on what we found. I would love to see some other sort of marker come along that would say what the cellularity is, but that’s not part of the test at this point.”

As use of Afirma takes hold among clinicians, Dr. Black-Schaffer says, its relatively high cost—$3,000 to $4,000 per test—is likely to lead to a different predictive value from the one found in the NEJM study. “Under the standard algorithm of care, the patient comes back for a second FNA and in a standard cytological exam, a very large proportion will be benign. Now, the logical thing will be to say that, given that this test is more expensive than a second FNA, why don’t we go through that whole algorithm again, do a second aspiration, and a certain number will be benign or malignant, and the ones that are indeterminate, those are the ones we want to use the test for.”

But because no test is perfect, this strategy has pitfalls. “If you take a population and you enrich it with the condition you’re looking for, the people who may actually have cancer, that means that for the same intrinsic ability of this test to give you a precise negative result, the biological significance of what it means you can tell a patient may be different.” As an extreme example, he says, since it’s an imperfect test, “if you ran it by a population that was entirely patients with cancer, somebody is going to test out as not having cancer, and then the test’s negative predictive value in that population is zip.”

“We’re not at anything like that extreme in terms of trying to use this test, which is expensive, judiciously, but as we move away from the population in which it was validated, the initial round of indeterminates, we have to proceed cautiously to make sure we are not over-relying on its ability to predict cancer in any patient population.” The researchers hope the Afirma test will continue to have the same rate of detection of cancer. “And if we do, it will have been a success, but we may discover the predictive value has changed to something less than anticipated due to the enriched population.”
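
Dr. Black-Schaffer’s point about enriched populations can be made concrete with Bayes’ rule. The Python sketch below holds a test’s sensitivity and specificity fixed and varies only the share of tested patients who truly have cancer. The sensitivity of 0.90 and the prevalence values are assumptions chosen for illustration, and the specificity of 0.50 simply echoes the “about 50 percent” figure Dr. Haugen cites. None of these should be read as the study’s reported operating characteristics.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value via Bayes' rule.

    Returns the probability that a patient with a negative (benign)
    result is truly free of cancer, given the cancer prevalence in
    the tested population.
    """
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    if true_neg + false_neg == 0:
        raise ValueError("no negative results are possible with these inputs")
    return true_neg / (true_neg + false_neg)


# Assumed, illustrative operating characteristics (not study values).
SENS, SPEC = 0.90, 0.50

for prevalence in (0.10, 0.25, 0.50, 0.75, 1.00):
    print(f"prevalence={prevalence:.2f}  NPV={npv(SENS, SPEC, prevalence):.3f}")
```

Under these assumed inputs the negative predictive value falls steadily as the tested population is enriched with cancer, and it reaches zero when every tested patient has cancer, which is the extreme case described above.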

J. Larry Jameson, MD, PhD, professor of medicine, University of Pennsylvania, in an editorial in the same issue of the New England Journal as the study (367:765–767), makes a related point in calling for caution in interpreting the validation study of Afirma. “Can this new gene-expression test reduce unnecessary surgery? The answer is seemingly ‘yes,’ but with important caveats.” Given the five to 10 percent risk of false-negatives, he says, an FNA or even a diagnostic hemithyroidectomy when the gene-expression classifier indicates a benign profile might be reasonable, especially in certain groups. Additional molecular tests may further refine diagnostic accuracy, especially when known mutations associated with 100 percent risk of malignancy are present, such as BRAF V600E, RET/PTC, and PAX8-PPARγ1.

Various other clinical factors could also influence the national impact of Afirma on surgery rates, Dr. Black-Schaffer notes. Based on the study findings, it may be a “plausible guesstimate” that a third of the 75,000 surgeries now performed based on indeterminate thyroid specimens won’t be necessary. But patients who are anxious may have surgery even if a test result is benign. “Other times the patient is at high risk for surgery or is worried about it, which is not an irrational concern, and they end up not getting an operation. So these national statistics don’t correspond to the precise methodology that a medical study would have.”
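
The guesstimate can be worked through with the round figures quoted in this story. The Python sketch below simply multiplies them out. It is back-of-the-envelope arithmetic only: it assumes the per-surgery cost range, the per-test price range, and the ratio of one avoided surgery per two tests all apply uniformly, which the caveats above suggest they may not. The variable names are illustrative.

```python
INDETERMINATE_SURGERIES = 75_000   # surgeries on indeterminate specimens
AVOIDABLE_FRACTION = 1 / 3         # the "plausible guesstimate"
SURGERY_COST = (10_000, 15_000)    # average cost range per surgery, USD
TEST_COST = (3_000, 4_000)         # quoted price range per test, USD
TESTS_PER_AVOIDED_SURGERY = 2      # about one surgery avoided per two tests

avoided = round(INDETERMINATE_SURGERIES * AVOIDABLE_FRACTION)

# Gross surgical spending avoided, at the low and high ends of the range.
gross_savings = tuple(avoided * cost for cost in SURGERY_COST)

# Testing spend needed to avoid one surgery, low and high ends.
test_spend_per_avoided = tuple(TESTS_PER_AVOIDED_SURGERY * c for c in TEST_COST)

print(f"surgeries avoided: {avoided:,}")
print(f"gross surgical cost avoided: ${gross_savings[0]:,} to ${gross_savings[1]:,}")
print(f"test spend per avoided surgery: "
      f"${test_spend_per_avoided[0]:,} to ${test_spend_per_avoided[1]:,}")
```

On these assumptions, roughly 25,000 surgeries would be avoided, and each avoided surgery costing $10,000 to $15,000 would be offset by about $6,000 to $8,000 in testing.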

Concern has also been raised about the fact that Afirma is proprietary to Veracyte. “Of course they’ve invested money in it and they own it and it’s nobody else’s test, but gene patents are one of the challenges,” Dr. Black-Schaffer says. “Many organizations, including the College, have argued you should not be allowed to patent genetic information because genes are intangible. You can patent what to do with them if you have a unique process, but not the gene itself.”

A possible downside of the test’s being proprietary, Dr. Black-Schaffer points out, is that its single-source availability can bar researchers from determining objectively whether the test is doing what the company says it’s doing. Another concern, Dr. Baloch says, is that cytopathologists could find themselves with considerably less business in the evaluation of thyroid cytology. “Although I do not have any business affiliations and no longer have research-related associations with the company, I am hopeful that Veracyte in the long run is going to work closely with the cytopathologists and not against them,” he says. “What I gather is that they are actively investigating options to extend access of the Afirma GEC to all thyroid cytopathologists.”

For now, Dr. Haugen is finding that his clinical colleagues are differentially using the Afirma and complementary genetic tests, and the field is adjusting to a new way of evaluating thyroid nodule biopsies. “This indeterminate category has been a difficult area for pathologists to work in, because endocrinologists and surgeons don’t like to hear that result, and pathologists would rather put it in a more definitive category such as benign or malignant. But we’ve been stuck with this for a number of years, and I think the Afirma and the other genetic tests are nice complements to help pathologists in this indeterminate category. The tests really enhance those areas where the cytopathologist—rightly so—just cannot tell.”

From the patient’s standpoint, the benefits are substantial. “No patients want unnecessary surgery,” Dr. Alexander says. “With a cytologically indeterminate, then benign Afirma result, clinicians and patients can now feel quite confident that the risk of cancer in a thyroid nodule is five percent or less. It’s very reassuring and is leading to use of a more conservative approach to management of many patients.”

Anne Paxton is a writer in Seattle.