William Check, PhD
Perhaps pathologists should adopt as their motto the axiom of the ancient Greek philosopher Heraclitus, who said in the 5th century BCE, “Change is the only constant.” In the past few decades radical change has occurred in laboratory practice, much of it driven by the molecular revolution. An understanding of the working of the genetic code and its constituents has led to cytogenetics, FISH, nucleic acid amplification, comparative genomic hybridization, microarrays, and DNA sequencing.
In research settings as well as in clinical laboratories, DNA sequencing has been synonymous with the Sanger method. First described in 1977 (Sanger F, et al. Proc Natl Acad Sci USA. 1977;74:5463–5467) and joined to fluorescence-based electrophoresis separation technology in 1987 (Prober JM, et al. Science. 1987;238:336–341), the Sanger technique, which is based on chain-terminating dideoxynucleotides, has become the very definition of sequencing. As its speed and cost improved 100-fold over the past 10 or so years, it made the Human Genome Project possible.
However, true to the rule of constant change, just as this method has become a fixture, radically new approaches to sequencing have appeared. They are variously called next-generation sequencing, massively parallel sequencing, or high-throughput sequencing. They can be applied for deep or ultra-deep sequencing, in which very low-abundance variants (for example, in tumors or HIV) can be detected. Basic scientists often refer to the approach colloquially as just “454 sequencing,” after 454 Life Sciences, the company that developed the most widely used version of high-throughput sequencing and that marketed the first commercial high-throughput massively parallel sequencing instrument.
Next-generation sequencing has quickly found its way into basic research laboratories, and clinical research groups are adopting it as well. “Next-generation sequencing represents a revolution in DNA sequencing almost equivalent to when we got the first sequencers at the start of the Human Genome Project,” says Wayne W. Grody, MD, PhD, who is professor of pathology and laboratory medicine, pediatrics, and human genetics in the UCLA School of Medicine and who reviewed this technology recently (ten Bosch JR, Grody WW. J Mol Diagn. 2008;10:484–492). He says it is two orders of magnitude faster than standard DNA sequencers and puts laboratories in reach finally of analyzing many genes at a time or even a whole genome. “It is still too expensive to sequence whole genomes, but it brings us much closer to this goal,” he says.
Eventually next-generation sequencing will enter the clinical diagnostics arena. Dr. Grody sees whole-genome sequencing being done on a routine basis for diagnostic purposes “or perhaps in the form of a screening program that could be used to guide personalized medical treatments throughout an individual’s lifetime.”
Currently, next-generation sequencing is being applied at the level of basic or translational research, says Karl V. Voelkerding, MD, associate professor of pathology, University of Utah, and medical director for advanced technology at ARUP Laboratories (and whose review of this technology is in press at Clinical Chemistry). “That reflects the fact that the first commercial next-generation sequencing platform, 454 Life Sciences’ GS20, was launched only four years ago,” he says. “In the basic research community already there has been a tremendous amount of work published using next-generation platforms.” With this technology, he adds, investigators can do experiments that previously were technically impossible or impractical, from a labor or cost standpoint. Dr. Voelkerding estimates that more than 300 articles have already been published using next-generation sequencing.
“I have been involved in molecular diagnostics from the mid-1980s,” Dr. Voelkerding says, “and I have observed that adoption and publication rates are often fairly good indicators of whether a new technology will have legs. When I see a technology take off so rapidly and become so disseminated in the research community in such a short time, I have to say that it has very serious potential to have an impact downstream on clinical diagnostics.”
When, or whether, next-generation sequencing will enter clinical diagnostics is “right now ... a little hard to say,” says David A. Relman, MD, professor of microbiology and immunology and of medicine at Stanford University and chief of the infectious diseases section in the VA Palo Alto Health Care System. But “there is a distinct possibility,” he adds, “that some sizable number of clinical pathologists will either be using the technology or need to be familiar with it soon.
“This technology is evolving so quickly and becoming so much more user-friendly, as well as powerful, that you have to suspect there will be some applications that clinical pathologists will have to learn to embrace in some form,” he says.
To Victor E. Velculescu, MD, PhD, associate professor of oncology and director of cancer genetics in the Ludwig Center for Cancer Genetics and Therapeutics at the Johns Hopkins Kimmel Cancer Center, it’s safe to predict that in several years, “next-generation sequencing will fundamentally change the way we do genomic analysis, both for discovery and for clinical applications.” Dr. Velculescu foresees that next-generation sequencing will be “tremendously useful” for discovery research. “It will allow us to look at a variety of genetic and epigenetic changes in a much more facile way than has previously been possible. It will transform the way we do these studies,” he says.
Next-generation sequencing can do a lot of sequencing cheaply and relatively quickly, says Dr. Velculescu’s colleague at Johns Hopkins, Nickolas Papadopoulos, PhD, associate professor of oncology and director of translational genetics in the Ludwig Center. Their group has used the Illumina Solexa platform to do transcriptome expression profiles. To illustrate the power of next-generation sequencing for expression analysis, Dr. Papadopoulos says that, in one study using traditional sequencing, they sequenced 200,000 expression tags. “We would think that is a good number,” he says. “In contrast, in our latest study [using next-generation sequencing], we sequenced 3.6 million tags. So that is how deep we went. With the new technology, we are able to bring expression analysis to its full potential.”
Roger Klein, MD, JD, medical director of molecular oncology at the Blood Center of Wisconsin, describes as “very exciting” what is taking place now with next-generation sequencing, or NGS. “Genomic discovery work currently being done in cancer [with NGS] is informing us of somatic variants that will be used for classifying disease, establishing prognosis, and predicting responsiveness to therapy.” Dr. Klein also sees genomic information derived from NGS as important for inherited diseases. “At first this information will probably be utilized with current modalities, such as SNP detection or Sanger sequencing,” he says. “Eventually next-generation sequencing techniques will offer much to clinical practice.”
In particular, Dr. Klein foresees NGS being helpful for private mutations that are scattered along many exons of a large gene. “Next-generation sequencing techniques will make mutation detection for these diseases much more widespread,” he predicts. “To amplify and sequence 60 exons takes a lot of work. Even if you do it in multiplex, you still have to do individual sequencing reactions, which is very labor-intensive and dissuades labs from performing these tests.” On the other hand, he points out, “If you can rapidly sequence a large gene, these tests would be much more accessible.”
In addition to 454 Life Sciences (which Roche purchased) and Illumina, ABI also markets a next-generation sequencing instrument called SOLiD. It’s based on sequencing by ligation. Illumina’s platform uses sequencing by synthesis with reversible dye terminators. Roche 454’s is a pyrosequencing platform; the latest model is the GS-FLX.
“Pyrosequencing is a novel method that takes advantage of the release of pyrophosphate when a nucleotide is added to a growing single-stranded nucleic acid chain,” Dr. Klein says. On the Roche pyrosequencing platform, nucleotides are dispensed in a programmed manner. DNA polymerase integrates the nucleotides into the growing chain, releasing pyrophosphate, which is used to generate ATP, which in turn reacts with an enzyme to produce a bioluminescent signal.
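The flow-by-flow logic described above can be illustrated with a toy simulation. This is a minimal sketch, not the signal-processing software of any real instrument: the function names `simulate_flowgram` and `call_bases`, the T-A-C-G flow order, and the idealized assumption that light intensity is exactly proportional to the number of bases incorporated are all assumptions made for illustration.

```python
def simulate_flowgram(new_strand, flow_order="TACG", max_flows=400):
    """Idealized pyrosequencing run (hypothetical helper).

    `new_strand` is the sequence of the strand being synthesized.
    Nucleotides are dispensed one at a time in `flow_order`; each flow
    yields a signal equal to the number of bases incorporated (0 if the
    dispensed nucleotide is not the next base in the strand).
    """
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(new_strand) and flow_index < max_flows:
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        # A run of identical bases is incorporated within a single flow.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            incorporated += 1
            pos += 1
        flows.append((nucleotide, incorporated))
        flow_index += 1
    return flows


def call_bases(flows):
    """Convert idealized (nucleotide, signal) pairs back into a sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flowgram = simulate_flowgram("TTTAGC")
print(flowgram)
# [('T', 3), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(flowgram))  # TTTAGC
```

Note that the three consecutive Ts produce a single signal of height 3 rather than three separate signals. On a real instrument that height must be estimated from light intensity, so the length of a long run of identical bases is harder to call than a single base.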
Detailed descriptions of the methodologies are presented in the papers of Dr. Grody and Dr. Voelkerding, along with technical comparisons of the instruments. What underlies NGS’s advantages is the use of massively parallel reactions. A Roche 454 plate, for example, has 1.6 million picoliter-scale wells, each of which contains one DNA segment to be sequenced and each of which can carry out a reaction whenever a supply of nucleotides is dispensed.
Roche 454’s instruments can produce longer read lengths than the other platforms. With the GS-FLX, fragments of 250 to 400 bases can be sequenced, compared with 30 to 40 for the Illumina and ABI platforms. (Sanger sequencing reads up to 800 base pairs.) How much of an advantage this presents is not clear. “At this point all of these methods are huge overkill, beyond what we would ask in clinical situations,” Dr. Grody says.
Dr. Velculescu identifies some “hiccups” with the technology, particularly in terms of accuracy and quality. “Next-generation sequencing still has a higher error rate than Sanger sequencing,” he says, “although it is improving.” One group reported that, with stringent data selection, NGS with the Roche 454 GS20 “can surpass the accuracy of traditional capillary methods” (Huse SM, et al. Genome Biol. 2007;8:R143). Also, pyrosequencing still has trouble with homopolymers—repetitive strings of nucleotides such as As or Ts.
Dr. Voelkerding notes that the accuracy of NGS can be enhanced by doing multiple reads per sequence, called depth of coverage. “We can achieve a high number of reads for each region because the technology has such high throughput and massively parallel processing,” he says. Studies suggest a minimum of 15 reads per chromosome copy for each genomic region. “We try for higher coverage,” he says, “and because there are two alleles for each gene, we want at least 30 reads per region.”
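The coverage arithmetic in this passage can be written out in a few lines. This is a rough sketch under simplifying assumptions: reads are taken to be of uniform length and evenly spread over the target, and the function names and the 100,000-base target are hypothetical values chosen only to make the numbers concrete.

```python
import math


def min_reads_per_region(reads_per_copy=15, chromosome_copies=2):
    """Minimum reads per region: 15 per chromosome copy, two alleles."""
    return reads_per_copy * chromosome_copies


def mean_depth(num_reads, read_length, target_length):
    """Average number of reads covering each base of the target."""
    return num_reads * read_length / target_length


def reads_needed(target_depth, read_length, target_length):
    """Reads required to reach a given average depth over the target."""
    return math.ceil(target_depth * target_length / read_length)


depth = min_reads_per_region()
print(depth)  # 30

# Hypothetical example: a 100,000-base target sequenced with 250-base reads.
n = reads_needed(depth, read_length=250, target_length=100_000)
print(n)  # 12000
print(mean_depth(n, 250, 100_000))  # 30.0
```

In practice coverage is uneven across a target, so an average of 30 reads does not guarantee 30 reads at every position, which is one reason to aim above the minimum.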
In addition to the three “conventional” NGS instruments, two companies, Helicos and Pacific Biosciences, are developing even more radical NGS platforms: single-molecule, real-time sequencing instruments (Harris TD, et al. Science. 2008;320:106–110; Eid J, et al. Science. 2009;323:133–138). With the three conventional NGS platforms, a considerable amount of sample processing occurs before sequencing the templates. DNA in the sample is fragmented, an oligonucleotide adapter sequence is added, and each individual fragment is amplified to generate sufficient amounts of DNA fragments to do subsequent sequencing. In “single-molecule” technology, individual DNA strands are captured directly from the sample and sequenced without amplification. In the Helicos method, for instance, DNA fragments get a poly-A tail, are denatured and hybridized directly onto the flow cell slide with oligo-dT for capture, and sequenced. “The prospect of streamlining sample preparation is very attractive,” Dr. Voelkerding says.
He calls the Helicos Heliscope “the first true single-molecule sequencing platform to become commercially available,” and adds: “A fair amount of work still needs to be done to demonstrate its performance.” He says Helicos is starting now to place the instrument for evaluation at some larger genome centers.
Dr. Relman says of single-molecule sequencing, “I think it’s pretty clear that in a few years it will be a much more prominent feature of the genomic technology landscape.”
CAP TODAY spoke to several people who are using next-generation sequencing for clinical research. Dr. Voelkerding described two projects he is carrying out. First, he is using NGS to analyze genomes of several species of nontuberculous mycobacteria. “These organisms have been implicated in human illness,” he says. “We are trying to understand better the genetic relatedness of nontuberculous mycobacteria among themselves.”
Second, he is investigating whether NGS will provide a better approach to sequencing multiple genes which when mutated have an overlapping clinical phenotype. As a model system, he is studying hypertrophic cardiomyopathy, or HCM. At least 16 different genes have been implicated in the pathogenesis of HCM, Dr. Voelkerding says. Pathogenic mutations have been described in each of those 16 genes. One of the genes codes for beta-myosin heavy chain. “In that gene alone already 193 different mutations have been described and identified in patients,” Dr. Voelkerding says. In all, 455 different mutations have been identified. “How do we analyze all 16 genes, or at least a subset that accounts for 90-plus percent of the genetic mutations?” he asks. “By virtue of its high throughput, next-generation sequencing is an attractive technology to investigate for its potential use in diagnostics in these kinds of settings.” Similar considerations apply to the multiple genes involved in congenital hearing loss.
In addition to his own work, Dr. Voelkerding lists several other pertinent areas in which NGS is being investigated:
- Recent studies use NGS to analyze cell-free DNA of fetal origin present in the plasma of a pregnant woman. “With this method we can analyze total DNA [in a pregnant woman’s blood] to ascertain whether the fetus carries a chromosomal abnormality such as trisomy 21,” Dr. Voelkerding says. “This is a very sophisticated test that would simply require a single blood tube.” In the longer term, he sees NGS as a possible adjunct to amniocentesis, chorionic villus sampling, or protein metabolic biomarkers.
- NGS is setting a foundation for use of cell-free DNA generally, Dr. Voelkerding says. “I could envision using it to analyze for mutations in cell-free DNA derived from tumor cells.”
- With NGS, sensitive and relative quantitative information can be obtained about resistant subpopulations of microorganisms, such as in people infected with HIV (Eriksson N, et al. PLoS Comput Biol. 2008;4:e1000074; Wang C, et al. Genome Res. 2007;17:1195–1201) or hepatitis B virus (Solmone M, et al. J Virol. 2009;83:1718–1726). “We would like to know about the presence of resistant subpopulations earlier in the course of therapy,” Dr. Voelkerding says.
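The idea of detecting a low-abundance subpopulation by deep sequencing can be sketched as a simple count of the bases observed at one position across many reads. This is an illustrative toy, not a validated variant caller: the function name `minor_variants`, the 1 percent reporting threshold, and the example read counts are all hypothetical, and a real analysis would need to model platform error rates before trusting a rare call.

```python
from collections import Counter


def minor_variants(bases_at_position, min_fraction=0.01):
    """Report non-majority bases at one position and their frequencies.

    `bases_at_position` is an iterable with one base per read covering
    the position. Bases below `min_fraction` are treated as noise here,
    which is an assumed, arbitrary cutoff.
    """
    counts = Counter(bases_at_position)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    major_base, _ = counts.most_common(1)[0]
    return {
        base: count / depth
        for base, count in counts.items()
        if base != major_base and count / depth >= min_fraction
    }


# Hypothetical position covered by 1,000 reads.
column = "A" * 970 + "G" * 25 + "T" * 5
print(minor_variants(column))  # {'G': 0.025}
```

With 1,000 reads at a position, a variant carried by 2.5 percent of templates shows up in about 25 reads, which is why depth matters for this application.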
To Dr. Relman, detecting suspected pathogens in a complex microbiota where there is no clear idea what might be there is one of the easier areas in which to see clinical relevance for NGS. “Pyrosequencing has been used to uncover those rare members of the mix,” he says. In one instance, NGS detected the presence of Campylobacter jejuni in a patient with diarrheal illness (Nakamura S, et al. Emerg Infect Dis. 2008;14:1784–1786).
Dr. Relman and his coworkers are using NGS to analyze the diversity and richness of the human gut microbiota. “These technologies’ throughput and depth are particularly well suited to allow us to start looking beyond the most abundant members of the microbial community,” he says. They analyzed 18 samples of intestinal microbiota taken from three healthy individuals. Among all 18 samples, they estimated there were 9,000 to 10,000 operational taxonomic units (equivalent to strains or species), a much higher number than previously thought. After evaluating the normal microbiota, they gave each person ciprofloxacin for five days and re-sequenced their intestinal microbiota. The impact of the antibiotic on the human gut microbiota was “pervasive,” Dr. Relman says. “Many more taxa seem to have been affected by ciprofloxacin than we expected.” Everyone in the study did well clinically, but the analysis revealed that about one-third of taxa in the fecal samples showed evidence of disturbance. By four weeks after the five-day antibiotic course all individuals seemed to have restored their original microbiota. “That is good news,” Dr. Relman says. “It suggests that our microbial community has resilience, which is a feature that we should be trying to understand better.”
Dr. Relman’s group has also been studying the microbial composition of amniotic fluid of women in preterm labor. So far they have used standard methods. “Some continuing aspects of that project will involve deep sequencing,” he says.
Dr. Velculescu and his colleagues are using NGS to study the genomics of human cancers, such as pancreatic cancer and glioblastoma multiforme (Parsons DW, et al. Science. 2008;321:1807–1812; Jones S, et al. Science. 2008;321:1801–1806). “If you take a comprehensive and integrated view of gene alterations in cancer, you find several things,” he says. First, there is great complexity. Heterogeneity in the genes mutated is seen both between different tumor types and between individuals with the same tumor type. “This reflects what many clinicians have seen for years,” Dr. Velculescu says. Patients who are similar in age, tumor type and site, with perhaps the same tumor histology, respond very differently to therapy. “These gene differences may be the underlying cause of the clinical differences,” Dr. Velculescu hypothesizes.
Second, comprehensive genomewide studies reveal surprises. For example, the gene for isocitrate dehydrogenase 1 (IDH1), which was never previously thought to be involved in cancer of any kind, turns out to be mutated in more than 10 percent of glioblastoma multiforme cases. IDH1 mutations are found in the majority of such patients under age 40 years and in 80 percent of patients with secondary glioblastoma multiforme. More recent work shows that most cases of certain other types of gliomas (for example, astrocytomas) also have mutations in IDH1.
Dr. Papadopoulos raises a third point: Data from genomewide sequence analysis could help to devise a new approach to therapeutics that targets pathways rather than individual genes. In pancreatic cancer, hundreds of genes are mutated or upregulated. “But when we group genes into pathways, it seems like only a handful of pathways are important for tumor growth,” Dr. Papadopoulos says. According to Dr. Velculescu, 12 core pathways are altered in two-thirds or more of pancreatic cancers. Such an approach will present challenges. “Our first and biggest challenge,” Dr. Papadopoulos says, “will be to figure out which genetic changes are the ones that drive the oncogenic process.”
Dr. Velculescu sees future targeted therapies being determined by much more complex genetic patterns than the simple analyses now being done for Herceptin, Gleevec, or Iressa. “Next-generation sequencing will be one of the methods by which cancer will be diagnosed and treated in an individualized way,” he predicts.
For several years Charles W. Caldwell, MD, PhD, and his colleagues have been studying DNA methylation in human tumors. (The positions and density of methylation of a gene promoter frequently determine whether that gene’s expression will be down-regulated.) Recently they started using next-generation sequencing in this project. “Most of our earlier work used microarray techniques,” says Dr. Caldwell, professor of pathology and director of the Ellis Fischel Cancer Center at the University of Missouri-Columbia. However, they had to confirm some of the microarray data by an independent method because arrays give false-positive and false-negative results. Then two colleagues pointed out they could use high-throughput sequencing to confirm a large number of genes at the single nucleotide level. Dr. Caldwell calls their recent paper using Roche 454 sequencing to analyze DNA methylation patterns “proof-of-principle.” They simultaneously looked at methylation of 25 genes in 40 people with five different disease conditions, such as acute lymphoblastic leukemia and chronic lymphocytic leukemia (Taylor KH, et al. Cancer Res. 2007;67:8511–8518). “We found that we could see whether methylation was present or absent on each and every cytosine in the regions we were looking at,” Dr. Caldwell says.
As a pathologist, Dr. Caldwell is interested in whether patterns of gene methylation can be used to help discriminate between different tumors. His data say this is possible. “So this technology could be a diagnostic tool,” he says. Using NGS to monitor gene methylation patterns could also tell whether a tumor is responding to therapy. For this purpose Dr. Caldwell proposes using circulating DNA in plasma as a surrogate. “At the time of diagnosis, levels of methylated DNA in the tumor are pretty high,” he says. “With successful treatment they go down.” In non-Hodgkin lymphoma, one gene is methylated in more than 80 percent of cases. In those cases the gene can be used to monitor response to chemotherapy. Monitoring is now done by imaging. “Imaging costs a few thousand dollars, compared to a few hundred dollars for a biomarker,” Dr. Caldwell says.
He calls high-throughput sequencing “a great discovery tool,” but says that for clinical medicine in the future, “it will be reduced to much less complex techniques.” Clinical use of microarrays is the analogy. At first they were used to look at the expression of many thousands of genes. “Once we figured out we could get the same conclusions by looking at 10 or 20 genes, we began to fabricate much smaller arrays,” he says. In his view, the same will happen with NGS for epigenetic analysis.
Clonality is another characteristic of cancers that is amenable to analysis by deep sequencing. “Essentially all cancers are products of Darwinian evolution occurring at the cellular level,” says Michael R. Stratton, PhD, FRS, professor and deputy director of the United Kingdom’s Wellcome Trust Sanger Institute. An ancestral cell with an initiating oncogenic mutation undergoes further events that allow that clade to develop and become the dominant clone of the cancer. “Probably every cancer cell has a very slightly different set of mutations from every other cell in that tumor,” Dr. Stratton says. “If we know those mutations, we can make a genealogical tree of the cancer and see how it evolved from that initial abnormal cell. Previous technologies were not powerful enough to do this. Deep sequencing is.”
In a recent article, Dr. Stratton and his colleagues applied NGS to investigate intraclonal diversification in 22 patients with B-cell chronic lymphocytic leukemia (Campbell PJ, et al. Proc Natl Acad Sci USA. 2008;105:13081–13086). They found they could backtrack the dynamics of clonal evolution.
Clinically this approach could be valuable to detect resistance to therapy, which is most likely due to mutations. “Those mutations were probably present in the cancer as a small subclone before treatment,” Dr. Stratton says. Once treatment is administered, that subclone is selected and becomes dominant. “It is conceivable that we could do this sort of deep sequencing of selected regions of the genome to see if subclones preexist in the resected specimen,” he says. “We have drugs that can overcome those resistance mutations. And there is good reason to think it is beneficial to give those drugs early when resistant clones consist of only a few hundred cells.”
Before next-generation sequencing enters clinical use, several issues will have to be addressed, Dr. Voelkerding notes. First is cost. Capital equipment outlay for the three major platforms ranges from $500,000 to $600,000. Reagent cost per analytical run ranges from $3,000 to $5,000. For “single-molecule” sequencers the cost is much higher. “For the Helicos platform,” Dr. Voelkerding says, “capital equipment cost is approximately $1 million and reagent cost per run is $15,000 to $20,000.”
The high-throughput capacity allows more than one sample per run to be analyzed, and this can be leveraged to reduce costs per sample. “All instruments can make compartments, lanes, or channels for separate samples on the flow cell,” Dr. Voelkerding says. In addition, one can create “molecular compartments” by attaching a bar code—a molecular sequence—to each individual sample.
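The “molecular compartment” idea can be illustrated with a toy demultiplexing step, in which reads from a pooled run are sorted back to their samples by the bar code at the start of each read. This is a minimal sketch with assumptions throughout: the function name `demultiplex`, the four-base bar codes, the sample names, and the exact-match rule are hypothetical, and production software generally has to tolerate sequencing errors within the bar code itself.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign pooled reads to samples by an exact bar-code prefix match.

    `barcodes` maps a bar-code sequence to a sample name. The bar code is
    trimmed from each assigned read; reads with no matching prefix are
    collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, sample in barcodes.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            # No bar code matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical bar codes and pooled reads.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTGGGA", "TGCATTTC", "ACGTCCCA", "GGGGAAAA"]
print(demultiplex(pooled, barcodes))
# {'sample_1': ['GGGA', 'CCCA'], 'sample_2': ['TTTC'], 'unassigned': ['GGGGAAAA']}
```

Because each read carries its own label, samples tagged this way can share one physical lane or channel and still be separated afterward in software.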
Data analysis is itself a fundamental technical consideration for implementing next-generation sequencing. Extensive data analysis and bioinformatics are required. All platforms capture images and convert them into sequences. “Each run generates gigabyte- to terabyte-sized files,” Dr. Voelkerding says. “To store and analyze these images requires lots of memory and processing power.” Instruments have dedicated algorithms and pipeline servers either onboard or offboard that turn the image files into sequences. All of this computing demands expertise. “The instruments themselves are relatively walkaway,” Dr. Voelkerding says. “Once they are programmed, then you have a considerable data processing component that someone needs to oversee. In our setting we have a dedicated individual with a background in biomedical engineering and computer science.” Many undergraduate and graduate programs now offer a major in bioinformatics or computational biology. “Individuals trained in those programs are at the forefront of moving the data processing part of this technology forward,” Dr. Voelkerding says. It took a fair amount of time for Dr. Voelkerding to become familiar with data handling. “I was on a pretty steep learning curve,” he says.
Dr. Voelkerding’s colleague, Rong Mao, MD, FACMG, is doing a collaborative experiment with Roche 454 and Nimblegen that highlights data-handling issues. Dr. Mao, co-medical director of the molecular genetics section at ARUP Laboratories and assistant professor of pathology at the University of Utah School of Medicine, is attempting enrichment for the region of the human genome that contains NF1, the gene involved in neurofibromatosis type 1. A custom-designed high-density Nimblegen array was used to capture that gene to provide enriched material for sequencing. DNA was extracted from two cell lines isolated from individuals with neurofibromatosis type 1, captured by microarray, and sequenced. One sample had a known single-base deletion in the NF1 gene that causes a frameshift, and this procedure detected it successfully. However, the large Alu insertion in the other sample was not identified on sequencing. “Missing the detection of the large deletion and insertion with next-generation sequencing is due to the data alignment criteria setup,” Dr. Mao says. “This issue needs to be further investigated with the algorithm for next-generation sequencing data analysis software.”
Dr. Grody also worries about data interpretation. Even with current sequencing technology, the information generated from the CFTR or BRCA genes has been “informative and humbling,” he says. “Just concentrating on a single gene at a time or a pair of genes and doing full sequencing on them has been in a way more complex than we are ready to handle,” he says. “Even though those genes have been studied in so many people already, we are still finding new variants that we don’t know what to do with.” It can be difficult to know whether a new variant of BRCA causes breast cancer or is a benign polymorphism. Dr. Grody cites the experience of Myriad Genetics with the BRCA genes, which they have been sequencing for many years. “Every week they find new variants they have not seen before,” he says. “Often they can’t interpret what they mean.” Enter next-generation sequencing. “Now that we could do 100 genes at a time,” Dr. Grody says, “that level of complexity will be amplified 100 times. Even more, because the genes we will test with this new technology will be new and every mutation will have to be analyzed.” One major pitfall of large-scale sequencing is that for every real mutation picked up there will probably be 100 missense variants that might or might not have disease impact. “And we won’t know which is which,” Dr. Grody says.
Despite all these problems, NGS is rapidly becoming the sequencing modality du jour. Dr. Relman shares a recent conversation he had with the director of a newly established genome center that bought Sanger machines and several NGS instruments. “[The director] told me a few weeks ago that they haven’t even used the Sanger machines so far,” Dr. Relman says. Clinical laboratories operate on a different set of cost calculations, he says. “They don’t so much need many reads on one sample as few reads on many samples.” Still, he adds, the clinical future of NGS platforms is an open question. The bottom-line message, he says, is “stay tuned.”
William Check is a medical writer in Wilmette, Ill.