April 2005
Feature Story
William Check, PhD
Analyzing data from the thousands of sites on a gene expression microarray
is still an evolving science. "Bioinformaticians do not agree on the best
way to do these analyses," said Lynne Abruzzo, MD, PhD, during her talk
on the clinical potential of microarrays, or "gene chips," at the meeting
of the Association for Molecular Pathology last November. To improve the
ability to handle these large datasets, Duke University sponsors a competition
each year called CAMDA—Critical Assessment of Microarray Data Analysis—in
which bioinformaticians are challenged to come up with new ways to evaluate
a publicly available dataset. To illustrate the pitfalls in handling data
from microarrays, Dr. Abruzzo, associate professor of hematopathology
at the University of Texas/MD Anderson Cancer Center, Houston, related
an anecdote from the 2002 CAMDA competition.
In that year, the dataset contained expression profiles of genes in mouse
liver, testis, and kidney. A group led by biostatistician Kevin Coombes,
PhD, of MD Anderson, set out to determine which genes are specifically
expressed in each of the organs. However, data from some of the samples
did not cluster in an interpretable way. Dr. Coombes and his colleagues
realized that one-third of the genes were labeled with the wrong gene
name. When they figured out what the correct labels were and reanalyzed
the data, they got a nice graph.
They reported this discrepancy to the scientists who had obtained and
published the original dataset, and these scientists re-examined their
data. They found that an extra line had been inserted into the Excel spreadsheet
that matched the gene name with its location. All of the data entered
after the insertion improperly matched the gene name with its location
on the array—a registration error. "It’s very surprising that it
was found," Dr. Abruzzo said.
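The kind of registration error described above can be illustrated with a minimal Python sketch. The gene names, array locations, and position of the stray row are assumptions chosen for illustration; the original spreadsheet is not reproduced here.

```python
# Illustrative gene names and array locations; not taken from the CAMDA dataset.
gene_names = ["Alb", "Apoa1", "Prm1", "Tnp1", "Umod", "Kap"]
locations = ["A1", "A2", "A3", "A4", "A5", "A6"]


def annotate(names, locs):
    """Pair each array location with the gene name on the same spreadsheet row."""
    return dict(zip(locs, names))


correct = annotate(gene_names, locations)

# Simulate a stray blank row inserted into the name column after the second entry.
shifted_names = gene_names[:2] + [""] + gene_names[2:]
corrupted = annotate(shifted_names, locations)

# Every location at or after the insertion point now carries the wrong name.
mislabeled = [loc for loc in locations if corrupted[loc] != correct[loc]]
print(mislabeled)
```

Rows before the insertion still match, which is one reason an error like this can go unnoticed: part of the dataset continues to behave as expected.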
Accurately handling massive amounts of data is only one of many challenges
of using microarrays that Dr. Abruzzo detailed in her presentation at
the AMP meeting. She concluded, "I don’t believe that microarrays are
ever going to be directly useful for diagnostic testing." Many in the
audience applauded her conclusion. But, Dr. Abruzzo told CAP TODAY, "What
I said is kind of controversial. Sometimes when I give talks, people come
up and insist, ‘No, we are going to be using these in the clinical laboratory.’"
Dr. Abruzzo stands by her assessment, however. "As a research tool, gene
expression microarrays are terrific," she says. "They allow you to look
for expression of a few thousand to tens of thousands of genes at the
same time. As a research tool, they have revolutionized how we look at
gene expression." As a clinical tool, she does not think that they will
be useful. What will eventually happen, Dr. Abruzzo believes, "is
that expression analysis of a tumor, for instance, will identify a dozen
or 50 genes that are important for diagnosis or prognosis. Instead of
using a microarray with thousands of genes, we will do semiquantitative
real-time PCR [QRT-PCR] for a few dozen genes, which is more reproducible
and has a wider dynamic range."
Karen Kaul, MD, PhD, director of molecular diagnostics at Evanston (Ill.)
Hospital and professor of pathology and urology at Northwestern University
Feinberg School of Medicine, Chicago, predicts gene chips will be useful,
with the qualification that utility will be confined to smaller chips.
"With genetic diseases, I could envision doing cystic fibrosis or some
other conditions by low- or moderate-density arrays," Dr. Kaul says. She
did a trial of Clinical Microsensors’ 36-target array for cystic fibrosis
and found that it worked well. She also sees microarrays being helpful
in solid tumors and hematopathology, though again these will be smaller
arrays, which she defines as a dozen to fewer than 100 sites, not
thousands or even the upper hundreds. "And there may be a way to multiplex
for different types of viruses or microbes that cause pneumonia or encephalitis,
to put them on a small array," Dr. Kaul says. "Maybe later, when we better
understand the markers, how to use them, and can better handle the data
analysis, larger arrays will be more prevalent."
Matt van de Rijn, MD, PhD, associate professor of pathology at Stanford
University School of Medicine, agrees that gene expression arrays are
excellent discovery tools. "Many groups have used microarrays to help
dissect the molecular phenotype of tumors," says Dr. van de Rijn, who
has done some of this work himself. He says it is still too early to point
to immediate results for patients. "In many cases I anticipate that is
just around the corner," he says. "We have already developed markers and
profiles that lead to better diagnosis and prognostication and identification
of new therapeutic targets."
In his view, some version of microarrays will enter clinical service.
"People complain they are so expensive," he says. "But compared to other
clinical tests that doesn’t hold water. There will always be a difference
between the discovery phase, where you want as many genes as possible,
and clinical use, where you may want to contain those numbers. But financially
a 70-gene array versus a 250-gene array or thousands of genes is not that
different." He believes that at some point this will be a clinically relevant
test, though he’s not willing to say that thousands of genes on microarrays
will be the format.
Microarray devices are beginning to become available for laboratory use
"slowly but surely," says Larry Kricka, DPhil, FRCPath, professor of pathology
and laboratory medicine and director of the general chemistry laboratory
at the Hospital of the University of Pennsylvania. "For the DNA type of
microarray, probably one of the most important recent developments was
FDA approval of the Roche/Affymetrix Amplichip," he says, which detects
polymorphisms in cytochrome P450 enzymes that affect drug metabolism.
"That was a real landmark."
Still, Dr. Kricka says, "I think most people would agree that very large-scale
DNA or protein microarrays are still research tools." Dr. Kricka sees
arrays containing small groups of markers being useful in diagnosing and
managing disease. Of quadruple testing for disease in the unborn, for
instance, he says, "I could imagine an array for that." Testing for the
25 mandated mutations for cystic fibrosis also falls in this area, as
might allergen testing. "Beyond that virtually everything we do is discrete
testing," Dr. Kricka notes. "Laboratories don’t do that many groups of
tests bundled together."
While there are many types of microarrays, the term is most often used
to describe a glass slide or silicon chip holding thousands of segments
of nucleic acid at specific sites to which RNA or cDNA from cells or tissue
is hybridized to create a gene expression profile. Another type of array
is represented by the Roche/Affymetrix Amplichip, which contains thousands
of single-nucleotide polymorphisms, or SNPs. With an SNP chip, test nucleic
acid is hybridized to see whether the subject’s genome contains any of
these SNPs, which is a digital (yes or no) question, as opposed to the
analogue (quantitative) question posed by gene expression profiling. SNP
chips are well suited to screening, such as to identify individuals who
have some specific characteristic, for instance, salt-sensitive hypertensives,
or to classify patients according to their ability to respond to specific
drugs. A separate type of array is a protein array, in which proteins
are affixed to a support. Tissue arrays are an entirely different category
(Related article: "What are tissue arrays?").
Two types of expression arrays are in common use. In the spotted array,
pioneered by Pat Brown, PhD, and colleagues at Stanford University, solutions
of nucleic acids containing fragments of genes are spotted onto a glass
slide. In the printed array, pioneered by Stephen Fodor, PhD, and colleagues
at Affymetrix, oligonucleotides (about 20 bases long) representing gene
segments are built one nucleotide at a time on a silicon wafer by photolithography.
For clinical utility, both types of expression microarrays pose similar
problems, which derive from the exact feature that is the array’s main
strength—its ability to gather thousands of data points from a single
run. One issue Dr. Abruzzo identifies is how you report a result if you
don’t know its significance. "Do you report results for every gene or
only the positive genes?" she asks. "How do you report 10,000 data points?"
Moreover, she asks, "Why would you get thousands of measurements if it
turns out you need only a dozen measurements to make a diagnosis? One
of the things we do as pathologists is to narrow down the tests we want.
We don’t order every stain at our disposal."
A second issue poses more of a statistical conundrum. "Many statisticians
are saying that nobody really knows how to build models for analyzing
these large datasets yet," Dr. Abruzzo says. To illustrate, she points
to a paper by biostatisticians at a French research institute (Michiels
S, et al. Lancet. 2005;365:488-492). These investigators re-analyzed data from the seven
largest published studies that attempted to predict prognosis of cancer
patients on the basis of DNA microarray analysis. Their finding: "The
list of genes identified as predictors of prognosis was highly unstable,"
with five studies publishing "overoptimistic results." As Dr. Abruzzo
puts it, "They couldn’t cross-validate the results of those studies."
She cites a statistical explanation: "When you do studies using at best
only a couple of hundred samples but taking thousands of measurements,
statisticians say you can find models that can classify tumors correctly
based on noise. So I think microarrays are not ready for prime time in
the clinical arena yet."
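The statistical point, that a model can appear to classify tumors using nothing but noise when measurements far outnumber samples, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample sizes, the number of "genes," and the deliberately crude one-gene sign rule. It is not the method used in the Lancet paper.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_TRAIN, N_TEST, N_GENES = 20, 1000, 2000  # assumed sizes: far more genes than samples


def accuracy(values, labels, direction):
    """Fraction of samples where the sign rule (direction * value > 0) matches the label."""
    hits = sum(1 for v, y in zip(values, labels) if int(direction * v > 0) == y)
    return hits / len(labels)


# Training set: random class labels and pure-noise "expression" values, stored gene by gene.
train_labels = [random.randint(0, 1) for _ in range(N_TRAIN)]
train = [[random.gauss(0.0, 1.0) for _ in range(N_TRAIN)] for _ in range(N_GENES)]

# Pick the single gene and sign that best "predict" the training labels.
best_acc, best_gene, best_dir = max(
    (accuracy(values, train_labels, d), g, d)
    for g, values in enumerate(train)
    for d in (1, -1)
)

# Independent samples: the chosen gene is still pure noise, so only its values are drawn.
test_labels = [random.randint(0, 1) for _ in range(N_TEST)]
test_values = [random.gauss(0.0, 1.0) for _ in range(N_TEST)]
test_acc = accuracy(test_values, test_labels, best_dir)

print(f"training accuracy {best_acc:.2f}, independent-set accuracy {test_acc:.2f}")
```

Because the best of 2,000 noise genes is chosen after looking at only 20 samples, the training accuracy tends to look impressive, while accuracy on independent samples stays near the 50 percent expected by chance.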
Dr. Kaul expresses a similar reservation. "One of the things that concerns
me about translating large microarrays into the clinical laboratory is
that I don’t believe we know what to do with that much information or
have systems to handle it." Even with the Amplichip, she worries, "I’m
not sure we will be able to deal with 1,000 or 2,000 data points."
Because SNP chips operate in a digital yes-or-no mode, software may be
able to handle their data output more easily than that of expression microarrays,
which are quantitative, speculates Theodore Mifflin, PhD, research associate
professor in the Department of Pathology and director of automation in
the Medical Automation Research Center at the University of Virginia Health
Sciences Center, Charlottesville. "We will see when people have experience
with the Amplichip," he says.
Dr. Kricka says handling results from thousands of sites is a research
issue now. "I cannot envision the clinical laboratory being involved with
a test that will be based on the results of a thousand-location array.
It is hard to see how we would do QC on that," he says.
Dr. Kricka sees quality control as a general issue with microarrays.
"In the clinical laboratory we are used to doing tests one at a time in
single-channel mode and controlling each individual assay separately,"
he says. "When you use a microarray you bundle all the tests together
on one analytical device. Issues of QC suddenly become more complicated.
If you have a 10-analyte device and you run a control to see if it works
properly, what if one location doesn’t work? Do you ignore that and use
data from the other nine locations? It’s not clear."
A British company, Randox, has a much-awaited analyzer called Evidence,
which will use a biochip array with up to 20 tests per chip and perform
multiple immunoassays on each patient sample—for example, thyroid
array, tumor monitoring array. However, Dr. Kricka poses this question:
Once you report the five or 10 analytes ordered on a given patient, what
do you do with the other 10 or 15 results? One option is to leave the
other results stored. Perhaps the clinician will look at the reported
results and decide results from one of the other tests would be helpful.
"Then you could report it immediately," Dr. Kricka says. However, a second
school of thought says all results that were not ordered should be completely
suppressed. Laboratories don’t face this question now. "We used to bundle
tests together into profiles," Dr. Kricka says. "We moved away from that
practice to discrete testing because of reimbursement rules. Microarray
devices may send us back to that profiling approach," and indeed the Evidence
has a retrospective reporting facility to enable retrieval of previously
unreported results.
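The first option Dr. Kricka describes, storing every result from a chip but releasing only what was ordered, can be sketched as a small data structure. The class, its method names, and the analyte values below are hypothetical and are not drawn from the Randox Evidence software.

```python
class ChipRun:
    """Holds every result from one multi-analyte chip, whether reported or not."""

    def __init__(self, results, ordered):
        self._results = dict(results)
        # Only tests that were both ordered and measured are released at first.
        self._released = set(ordered) & set(self._results)

    def report(self):
        """Return the results currently released to the clinician."""
        return {name: self._results[name] for name in sorted(self._released)}

    def release(self, name):
        """Retrospectively release a stored, previously unreported result."""
        if name not in self._results:
            raise KeyError(f"{name} was not measured on this chip")
        self._released.add(name)
        return self._results[name]


# Invented values for a hypothetical three-test thyroid chip; only TSH was ordered.
run = ChipRun({"TSH": 2.1, "FT4": 1.2, "FT3": 3.0}, ordered=["TSH"])
initial_report = run.report()
run.release("FT4")  # the clinician later asks for one more stored result
later_report = run.report()
print(initial_report, later_report)
```

Under the second school of thought, in which unordered results are completely suppressed, the constructor would discard the unordered values instead of storing them, and no `release` step would exist.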
Dr. Mifflin is concerned about a number of technical issues. Regarding
whether profiles derived from expression arrays are reliable, he says,
"The literature contains a number of contradictory reports about whether
these are useful at this time. Results may be conflicting or taken from
a very narrow perspective." An array asks whether any genes are over-
or underexpressed. But compared to what? And how large are the differences?
"All those answers depend on what the investigators choose as reference
material," he says. Even the definition of normal has come into question.
"Some early studies used immortalized cell lines as reference material,"
Dr. Mifflin says. But these are basically malignant cells, or at least
precursors to malignant cells, and may not be representative of normal
healthy cells. "Most people are waiting to see what comes out of ongoing
studies," he says.
Dr. Mifflin also cites several assumptions underlying gene expression
studies: The process of extracting, amplifying, and hybridizing will accurately
quantify all messages (mRNAs) with equal efficiency; message lifetimes
won’t have an impact on outcome of the analysis; and messages actually
translate into changes in phenotype. Regarding the second point (message
lifetimes affecting outcomes), it is known that some messages are quite
short-lived (seconds to minutes) and so are probably lost during sample
processing. As to whether a change in expression translates into altered
survival, Dr. Mifflin says, "What we see in the message analysis may be
modulated or diluted by the biological systems we are trying to evaluate."
With microarrays the "information hogs" they are, says Dr. Mifflin, archival
storage may also pose challenges. Microarray data are stored as TIFF files,
which can take up to several megabytes just for one image. "If you scan
a hundred slides, you are looking at gigabyte storage capability," he
says. "That’s an information requirement not so common in clinical laboratories,
not to mention the corresponding image and information-processing capabilities
needed."
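A rough back-of-the-envelope version of that storage estimate is shown below. The per-image size and the number of images per slide are assumptions chosen for illustration, not measurements from any particular scanner.

```python
# All figures are assumptions chosen for illustration, not measurements.
MB_PER_IMAGE = 10       # assumed size of one scanned TIFF image, in megabytes
IMAGES_PER_SLIDE = 2    # assumed: one image per fluorescence channel
SLIDES = 100

total_mb = MB_PER_IMAGE * IMAGES_PER_SLIDE * SLIDES
total_gb = total_mb / 1000  # decimal gigabytes
print(f"{SLIDES} slides -> about {total_gb:.1f} GB of raw images")
```

With these assumed figures the raw images alone come to about 2 GB, before any derived data files or backups are counted.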
In Dr. Mifflin’s view, controls and surveys will be crucial for clinical
laboratories that run microarrays such as the Amplichip. "People are starting
to use it and we should be figuring out ways to validate the results,"
he says. "Right now basically there aren’t reference materials out there
for expression or SNP microarrays. If you run controls it is because you
made them yourself. CAP or a similar organization such as AACC could serve
an excellent purpose here."
Dr. Abruzzo agrees that QC is difficult with microarrays, and it’s "because
they have many moving parts," she says. Variability can arise in at least
five places: array fabrication, target preparation, target hybridization,
imaging, and data analysis. For each source of variation, Dr. Abruzzo
has collected two or three real-life examples, which constitute her "Microarray
Hall of Shame." For instance, when the MD Anderson microarray laboratory
was being set up a few years ago, 2,300 clones were purchased from a reputable
supplier. When the clones were grown and sequenced, only 79 percent could
be verified. Another problem in array fabrication occurred in 2000 when
a commercial vendor printed arrays in which about one-third of the mouse
sequences were wrong because they were copied from the wrong strand. Target
preparation is also crucial. "Garbage amplification is exponential," Dr.
Abruzzo says. "You start with a couple of crappy RNA samples and you end
up with reams of crappy data."
Dr. Abruzzo found what she calls a "somewhat scary" result demonstrating
variability in data analysis when she did experiments in her primary area
of research—gene expression profiles in chronic lymphocytic leukemia,
or CLL. Cases of CLL with somatic hypermutation have a much better prognosis—a
median survival of 25 years compared to eight years for cases lacking
hypermutation. "I was trying to find genes differentially expressed between
those two groups," she says. When results were analyzed using a standard
statistical method, called dChip, the results were "not particularly biologically
interesting," Dr. Abruzzo says. She asked the statistician if there were
other statistical tests that could be used to analyze the data. He then
used two other common methods, the two-sided t test and Wilcoxon analysis.
To their surprise, they ended up with three different lists of genes with
little overlap.
"I thought that was kind of shocking considering that anyone doing microarray
work could be using any of these tests," Dr. Abruzzo says. To find out
which statistical test gave the "true" result, she took about one dozen
genes identified by each test but not the other two and subjected them
to QRT-PCR validation. "I was expecting to find that one test was giving
the true result and the others were picking up noise," she says. Instead,
she found it didn’t matter which statistical test she used: QRT-PCR confirmed
differential expression for about 85 percent of the gene candidates irrespective
of which statistical method identified them. "That means that even though
all three methods gave different profiles, they were all equally good
at finding differentially expressed genes," Dr. Abruzzo concludes. "So
if your goal is to get a complete list of differentially expressed genes,
you probably have to use more than one statistical test on your expression
array data."
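How two reasonable tests can produce different gene lists from the same data, and why combining them gives a fuller list, can be shown with a toy example in Python. The four genes and their values are invented, the t test is implemented in its Welch (unequal-variance) form as an assumption, and dChip is not reproduced; this is a sketch of the general idea, not of Dr. Abruzzo's analysis.

```python
import math
from statistics import mean, variance

# Invented expression values for four hypothetical genes in two patient groups.
expression = {
    "gene_A": ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]),
    "gene_B": ([1.0, 1.1, 0.9, 1.0], [1.2, 1.3, 1.25, 30.0]),  # one extreme value
    "gene_C": ([5.0, 5.1, 5.2, 5.9], [5.8, 6.4, 6.5, 6.6]),
    "gene_D": ([3.0, 5.0, 4.0, 6.0], [4.0, 6.0, 3.0, 5.0]),    # no real difference
}


def welch_t_score(a, b):
    """Absolute Welch t statistic (unequal-variance two-sample t test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def rank_sum_score(a, b):
    """Distance of the Mann-Whitney U statistic (Wilcoxon rank-sum) from its null mean."""
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return abs(u - len(a) * len(b) / 2)


def top_genes(score, k=2):
    """Return the k genes ranked highest by the given scoring function."""
    ranked = sorted(expression, key=lambda g: score(*expression[g]), reverse=True)
    return set(ranked[:k])


by_t = top_genes(welch_t_score)
by_rank = top_genes(rank_sum_score)
shared = by_t & by_rank      # genes both tests agree on
combined = by_t | by_rank    # fuller candidate list from using both tests

print(sorted(by_t), sorted(by_rank), sorted(shared), sorted(combined))
```

In this toy data the extreme value in gene_B inflates the variance and lowers its t statistic, while the rank-based test still sees complete separation between the groups. Each test therefore nominates a gene the other misses, and the union of the two lists is the more complete set of candidates for follow-up validation.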
Given the drawbacks of expression microarrays, it is not surprising that
many experts predict microarrays will be used in research mode to identify
diagnostic or prognostic genes, which will then be assayed clinically
by a simpler method. One example of this is Genomic Health’s Oncotype
Dx assay. From a set of 250 candidate genes, many identified on microarrays,
21 genes were selected that predicted response of breast cancer to tamoxifen
(Paik S, et al. New Engl J Med. 2004;351:2817-2826). These genes are assayed by multiplex
RT-PCR.
While it is a good approach, multiplex RT-PCR has limits, too, according
to Dr. Kaul, particularly on the detection end. "In our lab, we are big
fans of distinguishing sequences based on melt curve differences," she
says. "But there is a limit to what you can do with that." She has been
looking at how to identify and differentiate atypical mycobacteria. "We
have some PCR assays to do that," she says. "But it is very difficult
with melt curve analysis. As you put many targets in there it can become
difficult to discern them all."
A preferable approach might be to do multiplex PCR with a low-density
array as a detection system. "Say you have a tube of amplicons," Dr. Kaul
explains. "How do you tell what is in there rapidly? You could do gels.
You could set up fluorescent probes—there are systems that allow
you to identify and quantitate a couple dozen markers. But the least work-intense
and most attractive option might well be a low-density array."
Several companies are working on low-density arrays for clinical use.
For instance, Nanogen has several applications slated for release this
year as ASRs, including factor V/II, a 37-mutation cystic fibrosis chip,
a chip for identifying seven viral respiratory pathogens, one for CYP450,
one with three hyperhomocysteinemia markers, and an Ashkenazi Jewish
panel, according to a spokesperson.
Work by Dr. Brown and his coworkers at Stanford is an interesting example
of the reductionist approach—identifying a significant expression
profile, then finding a relatively small subset that is clinically useful.
A group of Dutch investigators reported in 2002 that a subset of 70 genes
identified from expression microarray analysis predicted survival in primary
breast carcinoma (van de Vijver MJ, et al. New Engl J Med. 2002;347:1999-2009).
Later, a postdoctoral fellow
in Dr. Brown’s laboratory, Howard Chang, MD, PhD, using a completely different
approach, identified a different profile or "signature" on microarray
analysis that was prognostically significant for several types of epithelial
tumors, including breast carcinoma (Chang HY, et al. PLoS Biol. 2004;2:E7).
(Dr. Chang proceeded from the premise that metastasis was
biologically akin to wound healing; he set out to find a gene expression
profile in metastatic tumors that was similar to the profile in fibroblasts
exposed to serum. Thus, he calls his profile a "wound-response gene expression
signature.")
Using the 295 primary breast carcinomas that the Dutch group tested initially,
Dr. Chang found that the wound-response signature improves risk stratification
"independently of known clinico-pathologic risk factors" and independently
of the Dutch 70-gene set (Chang HY, et al. Proc Natl Acad Sci USA. 2005;
E-pub ahead of print). Dr. van de Rijn
notes that the Dutch gene set is being offered in the Netherlands prospectively
for breast cancer patients. "Howard’s wound-healing gene set will probably
be validated in a number of other ways," he says, "and we hope it will
not only predict outcome but also perhaps identify tumors that respond
to certain therapies."
Research being done by Lawrence True, MD, professor of pathology at the
University of Washington Medical Center, Seattle, on prostate cancer exemplifies
several of the principles expressed in this discussion. Dr. True reported
at the USCAP meeting in March that he and his colleagues had identified
on expression microarrays several genes that appear to be specific for
each Gleason grade. He used laser capture microdissection, or LCM, on
prostate biopsies to isolate tumor cells (2,000-5,000 cells per array)
that he identified microscopically. LCM took about one hour per specimen,
which included specimen block selection, confirmation of cell composition,
and cutting the frozen sections. "With this step we greatly increased
our confidence that the genes expressed were made by cancer cells and
not by adjacent stromal cells," Dr. True says. Samples were hybridized
to two arrays, a generic array containing about 20,000 genes and a custom
18,000-gene array generated by Peter Nelson, MD, a project and core leader
of the research group.
No single gene was specific for any Gleason grade, but an RNA expression
profile was able to differentiate between grades. "In our initial test,
the overall profile distinguished all low-grade from high-grade cancers,"
Dr. True says. (Samples were dichotomized to have a Gleason score of ≤6
or >6.) In a validation experiment, a subset of 54 genes was tested against
an independent set of cancers. It was 85 percent accurate. "So we can
say that that set of 54 genes at the RNA level distinguishes the large
majority of high-grade from low-grade cancers," Dr. True concludes. He
then immunostained sections for proteins made by four of the genes. (Initially
he tried 10 proteins, but six antibodies didn’t work reliably on fixed
tissues. Commercial antibodies are not available for most of the proteins.)
"Those four antibodies confirmed the difference in expression," Dr. True
says. "So the four gene products can potentially be used as markers for
different grades."
Previous studies came up with different sets of prognostic genes, Dr.
True says, but they did not use laser microdissected preparations. "A
first explanation for us is that some genes could be differentially expressed
by stromal cells," he says. Dr. True and his colleagues have in fact found
differences in expression of stromal genes associated with cancerous versus
noncancerous prostate tissue.
Dr. True sees this work moving in two clinical directions. First, gene
products that characterize each Gleason grade can be used as more specific
diagnostic tools and assayed with immunohistochemistry. Second, in a process
similar to what is being done with breast cancer, a subset of genes will
be identified that more specifically characterizes each Gleason grade
of prostate cancer, and single genes from that set can be measured. "Initially
IHC will supplement standard methods at the biopsy stage," Dr. True says.
After assigning risk, the pathologist will decide whether RNA measurement
would be helpful. If so, a second, frozen sample would be taken and assayed.
For higher-risk patients, a radical prostatectomy sample would be used.
An enrichment step will probably be needed, Dr. True suspects, though
LCM, which is impractical for routine use, may not be necessary.
Summarizing her experience doing research on microarrays, Dr. Abruzzo
says, "I really thought a few years ago when I started that we would put
RNA on a chip and get answers. But now my viewpoint has changed. I don’t
think this is the way we are going to go diagnostically.
"And," she adds, "I’m not hanging up my microscope yet. When pathologists
look down a microscope, we are actually doing gene expression profiling.
We are looking in a morphological way at the end result of the gene expression
program."
William Check is a medical writer in Wilmette, Ill. The Roche/Affymetrix
Amplichip will be covered more fully in the June issue in an article on
clinical pharmacogenetics and laboratory medicine practice guidelines.