Matthew W. Anderson, MD,
The human leukocyte antigen (HLA) genes represent the most diverse loci in the human genome with over 14,000 alleles identified as of December 2015 [1]. The high allelic diversity in the HLA genes reflects HLA protein function in binding and presenting a diverse array of peptide ligands derived from microbial pathogens. As a result of selection pressure from infectious organisms, the HLA system now constitutes one of the major genetic differences between individuals and among different ethnic populations. Therefore, the HLA genes are clinically relevant as key determinants of compatibility in organ and bone marrow transplantation and in the genetic susceptibility and pathogenesis of immune-mediated diseases. However, the highly polymorphic HLA genes present unique challenges for the development of molecular approaches to genotype HLA alleles. This brief review will summarize the utility of next-generation sequencing (NGS) for HLA genotyping, highlighting the advantages of this approach over other molecular methods for typing HLA alleles.
To accurately determine an HLA genotype, phase (cis/trans) must be established between polymorphic positions both within individual exons and between different exons in the HLA gene. Using standard Sanger sequencing methodology, both alleles of a particular HLA locus are amplified and sequenced together, resulting in multiple heterozygous positions in the electropherogram tracing (Figure 1A).
Figure 1. Phase Ambiguity. A) Standard Sanger methods for HLA genotyping result in multiple mixed heterozygous positions in the electropherogram. The number of potential cis/trans relationships is 2^n, where n is the number of heterozygous positions in the sequence (3 in this example). B) Next-generation sequencing methods utilize clonal amplification whereby each DNA fragment from each parental HLA allele is amplified and sequenced independently, thus establishing the phase of linked polymorphisms.
As the phase of the polymorphic positions cannot be visually determined, additional steps are required to assign an HLA genotype. These include the use of informatics software to query the IMGT/HLA sequence database and assign the most likely combination of alleles, PCR amplification of only one allele, or the use of sequencing primers which anneal to only one of the two potential HLA alleles. In some instances, alternative allele pairs cannot be excluded, leading to genotype ambiguity. The additional steps required to generate HLA genotypes using Sanger sequencing are laborious and time-consuming, thus increasing the costs associated with HLA genotyping. As a consequence, only a few select exons of an entire HLA gene are routinely sequenced in clinical laboratories to determine a patient's HLA genotype and ultimately the degree of HLA match between donor and recipient.
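The ambiguity described above can be illustrated with a minimal sketch. It assumes a toy allele table (`allele_db`, with made-up names and bases, not entries from the IMGT/HLA database) that stores only the bases at the heterozygous positions. With n heterozygous positions there are 2^n possible haplotype strings, which pair into 2^(n-1) distinct diploid configurations; the lookup keeps those configurations in which both haplotypes match a known allele. This is an illustration of the idea, not the algorithm used by any particular typing software.

```python
from itertools import product


def enumerate_haplotype_pairs(het_calls):
    """Return every unordered haplotype pair consistent with a mixed trace.

    het_calls: list of (base_a, base_b) tuples, one per heterozygous position.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(het_calls)):
        hap1 = "".join(bases[c] for bases, c in zip(het_calls, choice))
        hap2 = "".join(bases[1 - c] for bases, c in zip(het_calls, choice))
        # Sorting makes (hap1, hap2) and (hap2, hap1) the same entry.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


def consistent_genotypes(het_calls, allele_db):
    """List allele pairs from a (hypothetical) database that explain the trace."""
    names_by_seq = {}
    for name, seq in allele_db.items():
        names_by_seq.setdefault(seq, []).append(name)

    genotypes = []
    for hap1, hap2 in sorted(enumerate_haplotype_pairs(het_calls)):
        for a in names_by_seq.get(hap1, []):
            for b in names_by_seq.get(hap2, []):
                genotypes.append(tuple(sorted((a, b))))
    return sorted(genotypes)


# Three heterozygous positions, as in Figure 1A (bases are invented).
het_calls = [("A", "G"), ("C", "T"), ("G", "T")]

# Hypothetical allele table: bases at the three positions only.
allele_db = {
    "allele_1": "ACG",
    "allele_2": "GTT",
    "allele_3": "ATG",
    "allele_4": "GCT",
    "allele_5": "ACT",
}

print(len(enumerate_haplotype_pairs(het_calls)))  # 4 diploid configurations
print(consistent_genotypes(het_calls, allele_db))
# [('allele_1', 'allele_2'), ('allele_3', 'allele_4')]
```

In this toy case two allele pairs explain the same mixed trace equally well, which is the genotype ambiguity that additional experiments (allele-specific amplification or sequencing primers) are needed to resolve.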
A common feature of NGS technologies is that each fragment of DNA is amplified and sequenced independently, dramatically reducing the phase ambiguities encountered with Sanger sequencing (Figure 1B). Since 2009, many different approaches for using NGS for HLA genotyping have been reported using a variety of capture strategies and sequencing platforms [2]. The first applications of NGS for HLA genotyping utilized the 454 platform, as the read length of this technology (~250-500 bp) was sufficient to cover the average size of an HLA exon [3]. Similar to Sanger approaches, the 454 HLA typing strategies utilized exon-targeted amplification (Figure 2A), which led to challenges in primer design and required numerous PCR reactions during library preparation. Although automation and microfluidic PCR technology were able to mitigate some of these issues, the amplicon-based sequencing approach was gradually replaced by a shotgun sequencing strategy (Figure 2B) in which long-range PCR is used to amplify each HLA locus in a single reaction [4-7]. The resulting large PCR amplicons are fragmented to produce appropriately sized sequencing templates, and the short (100-250 bp) sequencing reads are aligned to re-create a full-length HLA sequence. The advantage of the long-range PCR and shotgun sequencing approach is that primers can be designed to anneal in less polymorphic regions of the HLA genes (5' and 3' UTRs, for example), and more of the HLA genetic sequence can be captured and sequenced. Paired-end sequencing can also be utilized to bioinformatically phase HLA sequence data over longer genetic distances (Figure 2C), including between exons. Multiple commercial assays that utilize the shotgun approach for HLA NGS are now available for the Ion Torrent (Thermo Fisher Scientific) and Illumina sequencing platforms.
Figure 2. Sequencing strategies for HLA genotyping by NGS. A) Standard Sanger and 454 methods for HLA genotyping utilize an amplicon-based approach in which select exons are amplified and sequenced bi-directionally. B) Whole-gene HLA genotyping strategies use long-range PCR to produce a large amplicon which is then fragmented to produce library fragments of suitable size for sequencing. C) Within an HLA gene, the distances between HLA exons (highlighted here in red, blue, and yellow) can vary widely (230 bp to 2.3 kb). To phase between exons, sequencing templates that vary in size (0.8 to 2 kb) are generated and paired-end sequencing is used to generate linked sequence reads that enable phasing between exons.
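The paired-end phasing idea in Figure 2C can be sketched in a few lines. The example assumes each sequenced fragment has already been aligned and reduced to a dictionary mapping a reference position to the base observed there (both mates merged); the positions, bases, and function name are hypothetical. Fragments that span both heterozygous sites vote for which bases sit on the same chromosome, and the two best-supported combinations are taken as the phased haplotypes. Real pipelines also handle base quality, mapping errors, and chimeric PCR products, none of which is modeled here.

```python
from collections import Counter


def phase_two_sites(fragments, site_a, site_b):
    """Infer the two haplotypes across a pair of heterozygous sites.

    fragments: iterable of dicts {position: base}, one per read pair.
    Returns the two best-supported (base_at_a, base_at_b) combinations.
    """
    support = Counter()
    for observed in fragments:
        # Only fragments covering both sites carry linkage information.
        if site_a in observed and site_b in observed:
            support[(observed[site_a], observed[site_b])] += 1
    return [haplotype for haplotype, _ in support.most_common(2)]


# Invented data: site 120 lies in one exon, site 1450 in another.
fragments = (
    [{120: "A", 1450: "C"}] * 3    # three fragments link A with C
    + [{120: "G", 1450: "T"}] * 2  # two fragments link G with T
    + [{120: "A"}]                 # covers one site only, uninformative
    + [{120: "A", 1450: "T"}]      # a single discordant fragment
)

print(phase_two_sites(fragments, 120, 1450))
# [('A', 'C'), ('G', 'T')]
```

Varying the template size, as the caption describes, increases the chance that at least some fragments span any given pair of sites.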
One of the inherent limitations of the shotgun sequencing approach is the bioinformatic challenge of clustering and aligning highly homologous yet polymorphic sequence data. For example, novel sequence variants (representing HLA alleles not yet characterized in the IMGT/HLA sequence database) may lead to errors in alignment, producing incorrect HLA genotyping calls due to mismatches with the reference. To address this problem, long-read sequencing technologies such as the PacBio RS II platform have been used to generate long (>3 kb) HLA sequence reads to create a consensus full-gene HLA sequence that can then be aligned to a reference database to generate an HLA genotype [8].
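As a rough illustration of the final step, the sketch below compares a consensus sequence against a small reference table and reports the closest entry together with the mismatching positions. It assumes the sequences are already aligned and of equal length, and the table (`reference_db`) and its entries are invented for the example rather than taken from IMGT/HLA. A non-empty mismatch list for the best match is the signal that the consensus may represent a novel variant instead of a known allele.

```python
def closest_allele(consensus, reference_db):
    """Return (name, mismatch_positions) for the best-matching reference.

    Assumes pre-aligned, equal-length sequences; references of a
    different length are skipped. Returns None if nothing is comparable.
    """
    best = None
    for name, ref in reference_db.items():
        if len(ref) != len(consensus):
            continue
        mismatches = [
            i for i, (c, r) in enumerate(zip(consensus, ref)) if c != r
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (name, mismatches)
    return best


# Hypothetical reference table with short, made-up sequences.
reference_db = {
    "ref_1": "ACGTTGCA",
    "ref_2": "ACCTTGGA",
}

consensus = "ACGTAGCA"  # differs from ref_1 at index 4 only
name, mismatches = closest_allele(consensus, reference_db)
print(name, mismatches)  # ref_1 [4]
if mismatches:
    print("possible novel variant relative to", name)
```

Because the comparison starts from a full-length consensus rather than from short reads forced onto a reference, an unexpected base shows up as an explicit difference instead of silently distorting the alignment.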
Finally, bioinformatic approaches have also been developed to produce HLA genotyping information from targeted capture (exome) and non-targeted whole-genome sequence data [9]. While these analysis pipelines generally perform well, two major hurdles remain: the relative lack of sequence coverage from the HLA region obtained through exome capture technologies, and sequence read misalignment caused by pseudogene sequences. Despite these challenges, improvements to capture strategies and HLA genotyping software may soon offer the ability to generate accurate HLA genotyping from exome and whole-genome datasets. With the development of NGS and companion bioinformatic approaches for HLA genotyping, we now have the ability to better define genes within the HLA region, identifying polymorphisms that may elucidate the evolutionary history of HLA alleles, contribute to gene expression, and control disease risk.