College of American Pathologists

  Sanger sequencing here for the long haul


CAP Today




October 2011
Feature Story

Karen Titus

While next-generation sequencing is grabbing headlines these days—with some suggesting it will replace Sanger sequencing, or even every molecular assay in the lab—for now, Sanger sequencing remains the proletarian of clinical diagnostics. It’s considered the gold standard. And it will continue to exist as an important companion to next-gen sequencing, said Elaine Lyon, PhD, associate professor of pathology, University of Utah, and medical director of molecular genetics and genomics, ARUP Laboratories, Salt Lake City.

In her nearly 15 years at ARUP, Dr. Lyon has amassed plenty of clinical sequencing experience, including setting up ARUP’s first sequencing-based clinical assay. She’s also set up data management and automated data reporting informatics as a routine practice. She offers no dire predictions about Sanger’s disappearance from the clinical laboratory; at ARUP, clinical sequencing is still a big player, Dr. Lyon said.

For all its prevalence, Sanger sequencing is not a gimme, as Dr. Lyon made clear in an Association for Molecular Pathology webinar, in August, on clinical sequencing. Since her background is in inherited diseases, she spoke about sequencing from that point of view, though many of the issues she discussed have oncology and infectious disease applications as well, she noted.

Sanger got under sail nearly 35 years ago (Sanger F, et al. Proc Natl Acad Sci USA. 1977;74:5463–5467). In its earliest incarnation, sequencing used four lanes per patient and radioactivity. About a decade later, fluorescent terminator dyes came into use, allowing users to place all four bases in a single reaction. “This has been the state of the art since 1986,” said Dr. Lyon.

Familiarity may breed contempt, but it’s also crucial for laboratory professionals to be familiar with the genes they want to sequence. Every gene has its own nuances and characteristics, she said.

To get on familiar footing with the gene in question, Dr. Lyon advised picking a reference gene at the outset. She gave PTEN as an example. She then picked the reference sequence, or GenBank (GBK) file, for the analysis (in this case, NC_000010.10), but she noted that the analysis file would differ from the reference sequence used for reporting (here, NM_000314.4). The former uses chromosomal positions (hence the NC prefix), while the latter uses NM, indicating transcript numbering.
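The distinction she draws is carried in the RefSeq accession prefixes themselves; a small sketch showing how the two record classes she mentions can be told apart (the lookup table is a minimal subset of the real prefix list):

```python
# RefSeq accession prefixes distinguish genomic records from transcripts.
# Only the two prefixes discussed in the text are included here.
PREFIXES = {
    "NC_": "genomic (chromosomal coordinates)",
    "NM_": "mRNA transcript (coding numbering)",
}

def describe(accession):
    """Classify a RefSeq accession by its prefix."""
    for prefix, meaning in PREFIXES.items():
        if accession.startswith(prefix):
            return meaning
    return "unknown RefSeq class"

print(describe("NC_000010.10"))  # genomic (chromosomal coordinates)
print(describe("NM_000314.4"))   # mRNA transcript (coding numbering)
```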

Be aware of any alternative transcripts and decide which one is most appropriate for your application. If it’s a gene that affects development, say, look to see what the transcript is in brain tissue, for example, and if that’s different from what’s found in blood.

Doing homology checks against pseudogenes is crucial. Figure out what they are and their possible impact on your analysis. “Are there gene conversions that happen with these pseudogenes?” she asked.

Seemingly basic, but also crucial, is knowing the inheritance pattern. If it’s an autosomal dominant disease, you’re looking for only one mutation to explain the cause of symptoms. If it’s recessive, one mutation delivers carrier status only. And know whether the inheritance is X-linked.

As part of the familiarization and planning stage, she said, you also need to look at databases. “We have some very good locus-specific databases,” she said, with improvements on the way. But beware: Databases may use different numbering systems. “So we need to be aware if the numbering system is different in the database than what the laboratory is going to be reporting.” That’s true with mutations as well as exons, she said, noting that some databases may give, for example, an exon 6A and 6B, while others use straight numbering one through 10.

Get a list of known benign variants. “Whenever we see them we can quickly pass by them, knowing that they will not be clinically significant,” she said.

Now is also the time to decide which regions to interrogate. Is it targeted exons, as in MEN2, where mutations have been described in only a half-dozen exons to date? “So do we do that, or do we do the whole gene?” she asked. “Most of the time, most of the genes are being done by full gene or full sequence analysis, which includes all coding exons.”

This typically includes intron/exon boundaries, usually extending 20 to 50 base pairs into the intron on either side of the exon. If there are known deep intronic mutations, Dr. Lyon said, they can be added to improve clinical sensitivity and mutation detection rate. Moreover, mutations that have been described in the 5’ or 3’ UTR or the promoter region can be included in the regions to be interrogated—though she warned that if little is known about those regions, “You may be getting yourself into more trouble than you want by looking at them, because you may be coming up with more variants and variations than you’re able to interpret.”

The PCR primer design portion of the assay is often done per exon—exon 1, exon 2, etc. Covering large exons might require several amplicons. Conversely, two small exons with a small intron in between could potentially be covered in one amplicon. Generally, however, Dr. Lyon recommends one exon per one amplicon.

“We are careful to design around pseudogenes,” she continued. This includes doing alignments to ensure that primers are placed in positions where they will not amplify the pseudogenes. You also need to avoid known variants. If there’s a variant at the 3’ position of the primer, for example, an allele dropout is likely. “And you won’t know it,” she said. If variants cannot be avoided, make sure they’re not at the 3’ end of the primers.
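A check like the one she describes can be scripted. This sketch uses hypothetical primer coordinates and variant positions; a real pipeline would pull known variants from dbSNP or a locus-specific database:

```python
# Flag primers whose 3' end sits on a known variant position, which
# risks allele dropout. All coordinates below are hypothetical.

def flag_primers(primers, known_variants, danger_zone=3):
    """primers: dict of name -> (start, end, strand), genomic coordinates.
    known_variants: set of genomic positions of known SNPs/indels.
    A primer is flagged if any of its last `danger_zone` 3' bases
    overlaps a known variant."""
    flagged = []
    for name, (start, end, strand) in primers.items():
        # The 3' end is the high coordinate for '+' primers, the low for '-'.
        if strand == "+":
            three_prime = range(end - danger_zone + 1, end + 1)
        else:
            three_prime = range(start, start + danger_zone)
        if any(pos in known_variants for pos in three_prime):
            flagged.append(name)
    return flagged

primers = {
    "EX7_F": (89685000, 89685020, "+"),  # hypothetical forward primer
    "EX7_R": (89685380, 89685400, "-"),  # hypothetical reverse primer
}
known_variants = {89685019, 89690000}    # hypothetical SNP positions

print(flag_primers(primers, known_variants))  # ['EX7_F']
```

Flagged primers would then be redesigned to shift the variant away from the 3’ end, as the text advises.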

Dr. Lyon said she and her colleagues try to do everything at the same PCR conditions—a laudable, if not always achievable, goal. A high GC content, for example, can require using different conditions.

Showing a typical sequencing alignment, Dr. Lyon noted that the reference sequences are on the top, read in forward direction, and on the bottom, representing the reverse. The two lines in between show the differences between the reference sequence and the sequence trace. Anything that is not clean—a small amount of background, say—will appear here.

Sequencing through difficult regions is sort of a laboratory version of bushwhacking, with rough spots including places with high GC content or regions that have a secondary structure. These need to be carefully optimized, she said, and sometimes differently. Sometimes, it’s just better to avoid them, the way one might a swamp, or a rockslide. “But at times we’re not going to be able to,” she conceded.

Benign insertions or deletions in the intron can cause headaches as well, as can pseudogenes and repeat motifs. She gave as an example CFTR, which has a GATT repeat at the start of one exon. If the patient is heterozygous for six versus seven GATT repeats, you’ll have clean sequence up to that point, but the sequence from there on will be very dirty. “Basically, you’re reading double sequences there.”

CFTR, whose intron 8 has a TG repeat followed by a T repeat, offers another example of a repeat motif. In other cases, you might see a string of A’s or T’s that may be difficult to get through. They may also be polymorphic.

Dr. Lyon and her colleagues use primer design to handle these difficult regions, designing both long and short amplicons. These are not nested assays, she said, because the product is not reamplified. One set of amplicons uses the longer primers; the short primers are placed within the difficult region, to keep it from being amplified, with a matching reverse primer. The forward primer of the long amplicon works through the early portion of the sequence, she said, providing a good, clean sequence 5’ of the gene. The long reverse takes them through the gene, though at a certain point they’ll see heterozygosity because of the polymorphism.

The internal primers are the next set. Once again, the forward will miss the repeat completely. “We’ll probably go about 35 bases into the exon before we can pick up clean sequence,” she said. But they can get a reverse sequence coming from the other direction. (See “Design short and long amplicons to cover all regions.”)

In this design, she said, it may be only in the repeat region itself that they’re unable to do double coverage. There will be single coverage at the 5’ end, from the forward primer of the long amplicon.

Another possible solution for evading polymorphic regions is a loop-out, or masking. For one gene with a poly A tract and two poly T tracts, one of them polymorphic and located fairly close to the end of the exon, they designed a reverse primer that “loops out” a portion of the T’s. If there are sufficient bases on either side of the loop-out, the primers can still work well.

Aim for accuracy. Dr. Lyon and her colleagues try to get true positive control samples from patients who have been clinically diagnosed with the condition. Some disorders are quite rare, though, so such samples are not a given; in that case, Dr. Lyon will include other samples that are not from that condition. “We wouldn’t expect to find true mutations in them,” she said, “but they do have polymorphisms,” which help them evaluate accuracy.

They use quality checks, such as trace scores, to double-check sequence quality. They look at signal intensity and signal-to-noise ratio, and they also use a parameter known as the %QV20+, which is the percentage of bases with quality values greater than or equal to 20.
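The %QV20+ metric itself is simple arithmetic; a minimal sketch with made-up quality values:

```python
# %QV20+ : percentage of called bases with a Phred-style quality value
# of 20 or higher. The quality values below are made-up example data.

def pct_qv20_plus(quality_values, threshold=20):
    """Return the percentage of bases whose QV meets the threshold."""
    if not quality_values:
        return 0.0
    passing = sum(1 for qv in quality_values if qv >= threshold)
    return 100.0 * passing / len(quality_values)

trace_qvs = [35, 40, 18, 22, 50, 12, 30, 45, 19, 38]  # example trace
print(f"%QV20+ = {pct_qv20_plus(trace_qvs):.1f}")      # %QV20+ = 70.0
```

A QV of 20 corresponds to a 1-in-100 estimated base-call error rate, which is why it is a common cutoff.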

Reproducibility is another must-have. “What we’ve realized is that if we have a good PCR product, we’re going to get good sequencing,” she said. So that’s where they put much of their effort: showing that their PCR is reproducible and free of residual primers or primer dimers. They’ll do intra-run variability testing—same day, same instrument, using the same reagents—as well as inter-run testing (different days, different thermal cyclers), redesigning the exon as needed to eliminate primer dimers.

ARUP creates a graph combining all the reactions—accuracy and reproducibility runs—to show how many times they got a good PCR product, then shares it with the clinical laboratory. The graph quickly highlights the problem exons. (See “Reproducibility—PCR products.”)

At this point, they’re ready to tackle workflow to make the process routine. The steps include:

  • sample receipt, extraction, PCR setup
  • amplification
  • PCR cleanup, sequencing setup, sequencing
  • sequencing cleanup, detection, analysis

Several of these steps—extraction, PCR setup and cleanup, and sequencing setup and cleanup—can be automated.

They typically use M13-tagged primers. If they’re doing a low-throughput test, it’s best to set up a per-sample workflow, Dr. Lyon said. If it’s high throughput—96 samples, say—it’s much easier to do a per-exon workflow, because you can change cycling conditions for the different exons, using different plates. She and her colleagues rarely have high throughputs for their sequencing samples, so they set up a plate to include a number of samples and different exons. “One of our efforts has gone into making PCR primer plates for our sequencing, so that our technologists do not need to manually add these PCR primers for every run.” At low throughputs, however, manual methods are faster, she said.

Her lab has automated liquid handlers. If they’re precise enough to dispense in the 1- to 2-µL range, she said, then you could have a PCR plate set up already containing the primers (either frozen or lyophilized), which again would mean less work for technologists.

Most of the time, she said, they do medium throughput, with one plate containing one to eight samples, with three to 48 reactions per sample. After the cleanup, every PCR plate is then taken to a forward and a reverse plate for the sequencing primers. And since it’s M13-tagged, “Our forward plate uses only the forward M13 primers; the reverse is only the reverse M13 primers. The point is that for every PCR plate you do, you end up doing two plates for sequencing.”
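The plate arithmetic she describes, with each PCR plate feeding one forward and one reverse sequencing plate, is easy to sketch; the sample and amplicon counts below are hypothetical:

```python
import math

# Plate arithmetic for an M13-tagged setup: every PCR plate is carried to
# one forward and one reverse sequencing plate. Counts are illustrative.

def plates_needed(samples, amplicons_per_sample, wells=96):
    """Return (PCR plates, sequencing plates) for a run."""
    reactions = samples * amplicons_per_sample
    pcr_plates = math.ceil(reactions / wells)
    return pcr_plates, 2 * pcr_plates

print(plates_needed(8, 12))  # (1, 2): one full PCR plate, two sequencing plates
```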

For sequencing in inherited diseases, the term “clinical sensitivity” refers to the percentage of affected individuals in whom the mutation can be found in the gene. “So our clinical sensitivity will go down if there are other genes that can cause the same condition.” It will also drop if deletions and duplications, which sequencing will not detect, account for a large percentage of the mutations.

Clinical specificity is the percentage of unaffected individuals in whom no mutation is found in the gene—in a genetic sense, she said, this reflects the penetrance of the mutation for the disease.

Labs must report a reference or reportable range to clinicians. “We use this as a description of the gene regions that we interrogate,” Dr. Lyon said, such as all coding regions and all intron/exon boundaries, as well as whether they do intronic mutations or regulatory regions. Specific mutations being tested for can also be considered a reportable range, as can zygosity.

To transition sequencing assays into the clinical lab, you need the summary of the analytic validation—including the reference sequence, known benign mutations and double mutations (that is, two mutations on the same chromosome), and database information. Other steps: writing a standard operating procedure, training the clinical laboratory, evaluating costs, and preparing test information for clients and clinicians. You’ll also need to figure out how to report the results and how to set up internal databases and proficiency testing.

Reporting, like diplomacy, has its own special language. The main component, of course, is the result itself. “There’s been discussions of whether we should go to all standard versus a traditional nomenclature,” Dr. Lyon said. The latter can have different numbering schemes, she noted. For example, beta globin amino acids are commonly numbered from the mature protein, after the initiator methionine is cleaved, which shifts every position by one. “So the mutation that causes sickle cell is traditionally known to be in position 6. But if you use the standard nomenclature, you’re going to be calling it position 7.

“Many of us, in our discussions, [have] decided that if the traditional nomenclature is very well established in the literature, we have to continue to use that,” she said. They will also add “commonly known as” or “also known as” language to the report.

The numbering for the nucleic acid can be different, too, depending on whether the numbering began from the first base of exon 1 or the ATG start site. “We are now shifting to standard nomenclature from the ATG start site,” she said.
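The offset she describes amounts to a one-line conversion. A minimal sketch using the sickle cell example from the text (beta globin’s traditional numbering drops the initiator methionine, so standard numbering runs one position higher):

```python
# Convert a traditional (mature-protein) residue number to standard
# numbering, which counts the initiator methionine as residue 1.
# Beta globin's initiator Met is cleaved, hence the shift of one.

def traditional_to_standard(position, met_cleaved=True):
    """Map a mature-protein residue number to standard numbering."""
    return position + 1 if met_cleaved else position

# The sickle cell mutation: traditionally position 6, standard position 7.
print(traditional_to_standard(6))  # 7
```

Reports that carry both systems (the “also known as” language Dr. Lyon mentions) need exactly this kind of documented offset.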

Some laboratories may be reporting out the amino acid. It’s easier to recognize the mutation and figure out what it means by looking at the amino acid, versus the nucleic acid nomenclature, she said; however, the lab is detecting the nucleic acid, not the amino acid. Her preference is to report the nucleic acid; labs can then add the amino acid, if desired.

There’s another reason to be a stickler on this point. If a lab is performing a family-specific mutation test for a mutation originally found at another lab, she said, listing only the amino acid won’t provide the precise base or the base change being targeted.

Include the reference sequence and version as well as the numbering scheme you’re using, along with interpretation and recommendations.

The American College of Medical Genetics recommends reporting clinical interpretation as well as the best estimate of clinical significance. An ACMG algorithm (Richards et al. Genet Med. 2008;10:294–300) says to check the literature/databases after finding a variant.

Simplifying the ACMG chart, Dr. Lyon broke the search down into several categories. Is the mutation:

  • previously reported? If so, is it pathogenic or benign? (Be sure to check the original reports. “There can be errors in databases. We certainly don’t want to be propagating a mistake.”)
  • previously unreported? Is it expected pathogenic, suspected pathogenic, uncertain, or suspected benign?
  • in need of further classification (severe, moderate, mild, very mild)?
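A toy version of this triage, with hypothetical variant names and a lookup table standing in for the curated (and independently verified) literature/database search:

```python
# Toy triage following the categories in the text: previously reported
# variants keep their classification (assumed already checked against
# the original reports); unreported ones get a provisional label
# pending evidence collection. All entries are hypothetical.

def triage(variant, reported_db):
    """reported_db maps variant -> 'pathogenic' or 'benign'."""
    if variant in reported_db:
        return reported_db[variant]
    # Unreported: expected/suspected pathogenic, uncertain, or
    # suspected benign once evidence is in; 'uncertain' until then.
    return "uncertain"

db = {"c.20A>T": "pathogenic", "c.9T>C": "benign"}  # hypothetical entries
print(triage("c.20A>T", db))   # pathogenic
print(triage("c.100G>A", db))  # uncertain
```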

If they find an exonic mutation and it’s a frameshift, they assume it’s pathogenic. Ditto for a nonsense mutation, with the possible exception of the 3’ end: depending on the function of the protein at that end, a variant there may not be pathogenic, she cautioned.

In-frame deletions or duplications often are pathogenic, but not always. “But those are not as common as the missense mutations,” she said, which likewise may or may not be pathogenic.

When they find a missense mutation, they start collecting evidence:

  • Has it been reported before? Seen in affected or control individuals?
  • Is it a conserved amino acid, across a gene family or species?
  • Is it in an active site of the protein?
  • Would it affect mRNA levels?
  • Does it occur in the general population? “If we see a variant that has a fairly high frequency—two to three percent of the population—but of a very rare disease, it’s less likely to be pathogenic.”
  • Has the variant occurred with other causative mutations?
  • Can you track it with disease in the family?
  • Are functional studies available?
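The population-frequency test in the list above can be made quantitative with Hardy-Weinberg arithmetic. A minimal sketch, assuming a recessive model (an assumption for illustration; the article does not specify one) and made-up numbers:

```python
# Under Hardy-Weinberg equilibrium, a recessive allele at frequency q
# produces affected homozygotes at frequency q**2. If that expected
# prevalence far exceeds the disease's known prevalence, the variant
# is unlikely to be the cause. All numbers below are illustrative.

def expected_prevalence_recessive(allele_freq):
    return allele_freq ** 2

q = 0.02                   # variant on 2 percent of alleles
prevalence = 1 / 100_000   # disease affects roughly 1 in 100,000

expected = expected_prevalence_recessive(q)
print(f"expected {expected:.6f} vs observed {prevalence:.6f}")
print("likely benign" if expected > 10 * prevalence else "plausibly pathogenic")
```

Here the variant would produce far more affected individuals than the disease actually shows, echoing Dr. Lyon’s two-to-three-percent example.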

She’ll also get help from a surprising quarter. “I’m not above Googling. It’s amazing what I can find” on Google.

The point isn’t to track down every answer for every missense mutation you find. Dr. Lyon is a realist. “We collect what we can.”

There are several amino acid prediction programs on the market, including PolyPhen 2 and SIFT, but Dr. Lyon is eyeing the future: predictions that use machine learning classification tools. Gene-specific algorithms work better than generalized tools aimed at any gene, and researchers are trying to develop a standardized metric for evaluating uncertain gene variants and a way to visualize this for clinicians. The advent of so-called clinically curated databases has helped with this. Right now, she says, she’s not confident in relying on predictions from just one program. “If I have five of them telling me the same thing, then I’m more confident.” If multiple programs told her a mutation was pathogenic or benign, she’d couch it as “suspected pathogenic” or “suspected benign” in her report, she said, and would add an explanation of her reasoning.

Intronic mutations require a different approach. Has the mutation been reported before? If it’s not clear, splice site prediction programs can make a prediction for both donor and acceptor regions of the gene.

Rare variants have their own cautions—similar to beginning birders, who often learn that their “rare” bird find may actually be a sparrow. When you find two variants that are rarely seen, do parental studies, Dr. Lyon said, to determine if they’re on the same or opposite chromosome. She gave an example of a child with F508del/I102T. When the clinician first saw this—the result was from another laboratory—he didn’t look at the report carefully, and didn’t realize the two mutations could be on the same chromosome. “He thought they had an explanation for the child’s symptoms” and asked Dr. Lyon and her colleagues to test the siblings. They instead asked to test the mother first, where they found both mutations. Being in cis, the mutations did not explain the child’s symptoms.

In another case, this one involving the alpha globin gene, they found an apparent homozygous rare mutation, known as Hb Seal Rock. But because it was homozygous, with no indication of consanguinity, they also did a deletion analysis, which showed a −3.7 kb deletion. “So it is a compound heterozygous between the Seal Rock and the 3.7,” she said, a combination that results in a mild Hb H disease.

Labs can go further with genetic evidence by doing family concordance studies, which “are very strong for autosomal dominant disease,” she said. “We can also do some with X-linked, and specifically if we’re thinking there may be a de novo mutation.” They test an affected individual from a family, then test additional affected family members (though unaffected individuals may also provide clues). A distant relative who is also affected gives even greater statistical power. They use a Bayesian analysis to evaluate the pedigree data for evidence of causality.
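In its simplest form, the Bayesian pedigree analysis she mentions reduces to a likelihood ratio per informative meiosis. This sketch assumes a fully penetrant autosomal dominant variant (so each cosegregation event is twice as likely under causality as under chance) and a hypothetical prior:

```python
# Each informative meiosis where the variant cosegregates with disease
# has probability 1 under the causal hypothesis and 1/2 under chance
# (fully penetrant autosomal dominant model), a likelihood ratio of 2
# per meiosis. The prior odds below are a hypothetical input.

def posterior_prob(prior_odds, informative_meioses):
    """Update prior odds of pathogenicity and convert to a probability."""
    posterior_odds = prior_odds * (2 ** informative_meioses)
    return posterior_odds / (1 + posterior_odds)

# Start at even odds; five relatives' segregation events observed.
print(round(posterior_prob(1.0, 5), 3))  # 0.97
```

A distant affected relative contributes more independent meioses than a first-degree one, which is why Dr. Lyon notes the greater statistical power.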

At ARUP, parental samples for de novo mutations can be brought in as so-called controls, she said. “In this case, we may not be giving the reports back to them, but we will be incorporating their results into their child’s report.” Nor would they charge for the test. Broader family concordance studies have been designated as research by ARUP’s institutional review board and must be set up through the IRB. Testing for family members is again free; likewise, the lab doesn’t report individual results. “We are talking with some of the genetic counselors to see if our IRB would change that,” Dr. Lyon said. For now, however, the report is limited to stating whether a variant has been reclassified based on the family study. If family members want the results in their medical record, they can pay for testing for the specific family mutation.

Even with the advent of next-generation sequencing, labs will still need to use Sanger sequencing to confirm that next-gen findings are clinically relevant. “And once we do find a mutation, we will then most likely go back to Sanger sequencing to do the familial testing for that specific mutation.”

It will continue to play a role in complex regions, which are difficult to align with next-generation software, Dr. Lyon said. Next-generation sequencing may also produce false positives; in these cases, Sanger sequencing can be used to confirm whether the variants are analytically “real.”

In short, she says, Sanger sequencing, long a workhorse in the clinical laboratory, isn’t ready to shrug off its saddle just yet.

Karen Titus is CAP TODAY contributing editor and co-managing editor.