Sequencing and phasing the human leukocyte antigen region by either conventional or next-generation sequencing has proved challenging due to the complex nature of the genes. Now, however, researchers from the National Institute of Genetics in Japan have devised a method that uses next-generation sequencing and indexed multiplexing, and demonstrated that it can completely sequence and phase six of the HLA genes.
The method, published online in May in Genome Biology, relies on the clonal amplification inherent to next-gen sequencing, multiple libraries with a range of insert sizes, 2x250-base paired-end sequencing, and a "gene-tagging" protocol, explained Ituro Inoue, senior author of the paper and professor within the division of human genetics at the National Institute of Genetics in Japan.
The 3.6-megabase HLA region comprises six polymorphic genes and at least 132 protein coding genes that play a role in the regulation of the immune system. It is also associated with 100 different diseases, including type I diabetes, rheumatoid arthritis, psoriasis, and atopic asthma. Currently, there are two common methods of HLA genotyping — sequence specific oligonucleotide hybridization, or SSO, and Sanger sequencing.
Both of these methods have limitations, however. SSO requires that oligonucleotides corresponding to the various genotypes be prepared in advance, and new alleles can cause difficulties. Sanger sequencing reads both chromosomes simultaneously, so phasing is difficult, and allele determination can also be ambiguous because different alleles often share similar sequences.
As such, a number of groups have been working on developing next-gen approaches for HLA sequencing. Many have focused on using Roche's 454 instrument due to its longer reads, and a more recent approach used the HiSeq and MiSeq (IS 1/25/2011 and CSN 5/23/2012).
In the recent Genome Biology study, Inoue's team used a transposase library prep method on long-range PCR amplicons, followed by 2x250 paired-end sequencing on the Illumina MiSeq to sequence and phase the HLA genes HLA-A, -C, -B, -DRB1, -DQB1, and -DPB1 from 33 HLA homozygous samples, 11 HLA heterozygous samples, and three parent-child trios.
The long-range PCR was used to amplify the six highly polymorphic genes. As with other next-gen methods, Inoue said, designing primers to amplify the HLA genes is tricky due to the highly polymorphic nature of the genes. "This is not straightforward and was trial and error," he said.
Libraries were constructed using Nextera's transposase method, which also adds adaptors for multiplexing.
One of the main differences between this protocol and others is the variable-sized library that the team generated, with inserts ranging from 500 bases to 2,000 bases. Inoue said this helped them to get haplotype sequences at remote regions. "This is the most crucial part of our analyses," he said. "We need to prepare large and variable library size."
The team also tested two different tagging methods. In the individual tagging method, all of the amplicons from the six HLA genes of one sample were pooled before being subjected to transposase treatment. In the gene-tagging method, each PCR amplicon was subjected to transposase-based library construction separately, thus adding a gene-specific index.
The researchers found that the individual tagging method did not work for the heterozygous samples — they were unable to obtain phase-defined sequences, "probably due to mismapping of paired-end reads," the authors wrote.
"If we combine different sets of genes of heterozygous samples, mismapping disturbs the authentic sequencing," Inoue said.
He further explained that the team has since abandoned the individual tagging method completely because of "biased amplification during the library preparation."
Instead, the team is working on refining its gene-tagging approach. For this method, each PCR amplicon undergoes transposase-based library construction, creating a library with insert sizes between 500 bases and 2,000 bases.
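The difference between the two tagging schemes can be illustrated with a minimal sketch. It assumes one index per sample for individual tagging and one index per sample-gene amplicon for gene tagging. The index labels (`IDX001`, ...), sample names, and function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product

GENES = ["HLA-A", "HLA-B", "HLA-C", "HLA-DRB1", "HLA-DQB1", "HLA-DPB1"]


def build_index_table(samples, per_gene):
    """Map a (hypothetical) index label to the library it identifies.

    per_gene=True  -> gene tagging: one index per (sample, gene) amplicon.
    per_gene=False -> individual tagging: one index per sample, genes pooled.
    """
    if per_gene:
        keys = list(product(samples, GENES))
    else:
        keys = [(sample, None) for sample in samples]
    return {f"IDX{i:03d}": key for i, key in enumerate(keys, start=1)}


def demultiplex(reads, index_table):
    """Group (index, sequence) reads by the library their index points to."""
    bins = defaultdict(list)
    for index, seq in reads:
        bins[index_table[index]].append(seq)
    return dict(bins)


samples = [f"S{i}" for i in range(1, 12)]  # 11 heterozygous samples
print(len(build_index_table(samples, per_gene=True)))   # 66 libraries
print(len(build_index_table(samples, per_gene=False)))  # 11 libraries
```

With gene tagging, every read arrives already assigned to one gene, so it only has to be aligned against that gene's reference. With individual tagging, reads from all six genes share a bin and the aligner must decide which gene each read belongs to.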
The researchers successfully obtained sequences of six genes in the 11 HLA heterozygous individuals. Reads were aligned to the respective HLA genes of the reference genome, allowing at most 80 mismatches per read. On average, 73.1 percent of all reads were successfully mapped to the reference sequence across all 66 amplicons. Average depth of sequencing ranged from 146x to 6,678x, with a mean of 2,281x, they reported. In addition, HLA class I genes had higher average coverage than HLA class II genes, which the authors said may be due to the larger amplicon sizes for the class II genes.
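As a rough illustration of how a mapped fraction and an average depth relate to a mismatch cutoff, the sketch below filters alignments and divides the aligned bases by the amplicon length. The input format (tuples of aligned length and mismatch count) and the function name are assumptions made for illustration; the paper's actual pipeline is not described at this level of detail here.

```python
MAX_MISMATCHES = 80  # per-read limit reported in the article


def summarize_alignments(alignments, total_reads, ref_length):
    """Return (mapped_fraction, mean_depth) for one amplicon.

    alignments  : iterable of (aligned_length, mismatches), one per aligned read
    total_reads : number of reads sequenced for this amplicon
    ref_length  : length of the reference sequence in bases
    """
    # Keep only alignments within the mismatch limit.
    kept = [(length, mm) for length, mm in alignments if mm <= MAX_MISMATCHES]
    mapped_fraction = len(kept) / total_reads
    # Mean depth = total aligned bases spread over the reference length.
    mean_depth = sum(length for length, _ in kept) / ref_length
    return mapped_fraction, mean_depth
```

Under this definition, a longer amplicon sequenced with the same number of reads ends up with a lower mean depth, which is in line with the authors' suggested explanation for the lower class II coverage.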
To phase the genes, the researchers first looked at specific SNVs on both forward and reverse reads to try to separate the two chromosomes and determine the two phase-defined sequences. "Taking advantage of [the] highly polymorphic nature of the HLA genes, wide-ranged library size, and deep sequencing, it becomes possible to phase sequence reads on a chromosome and tile phased reads to generate HLA gene haplotype sequences," the authors wrote.
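The general idea of read-backed phasing can be sketched as follows. This is a simplified greedy illustration, not the authors' algorithm: each read pair is reduced to the bases it shows at heterozygous SNV sites, and pairs are assigned to one of two haplotypes according to the sites they share with already-phased pairs. The function name and input format are hypothetical.

```python
def phase_fragments(fragments):
    """Greedy read-backed phasing (toy version).

    fragments: list of {position: base} dicts, one per read pair, listing the
    bases seen at heterozygous SNV sites covered by either mate.
    Returns two haplotypes as {position: base} dicts.
    """
    hap_a, hap_b = {}, {}
    pending = [dict(f) for f in fragments if f]
    if not pending:
        return hap_a, hap_b
    hap_a.update(pending.pop(0))  # the first fragment seeds haplotype A
    progress = True
    while pending and progress:
        progress = False
        for frag in list(pending):
            vote_a = vote_b = 0
            for pos, base in frag.items():
                if pos in hap_a:  # agreeing with A, or clashing with A
                    if hap_a[pos] == base:
                        vote_a += 1
                    else:
                        vote_b += 1
                if pos in hap_b:  # agreeing with B, or clashing with B
                    if hap_b[pos] == base:
                        vote_b += 1
                    else:
                        vote_a += 1
            if vote_a == vote_b:
                continue  # no linking SNV yet, or tied evidence; retry later
            target = hap_a if vote_a > vote_b else hap_b
            for pos, base in frag.items():
                target.setdefault(pos, base)
            pending.remove(frag)
            progress = True
    return hap_a, hap_b
```

A fragment can only be placed if it shares at least one heterozygous site with a fragment that is already phased. Read pairs from larger inserts span more distant SNVs, which is why a wide range of insert sizes helps link sites that short fragments alone would leave unconnected.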
The team was able to obtain 132 phase-defined sequences. Of these, 103 were complete haploid sequences that covered the entire HLA gene, and another 26 covered more than 95 percent of the gene. The remaining three sequences had less than 95 percent coverage, which the authors said was due to unphased regions that may have included large gaps.
Next, they tried to designate HLA allele numbers by searching for known allele sequences in the IMGT/HLA database. If they did not get a hit from the complete HLA gene in the database, they used just the exons and looked for known cDNA sequences. The team was able to determine the closest HLA number for all of its gene haplotype sequences: 104 alleles were matched by searching the full gene sequences and 28 were matched from the cDNA sequences.
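The two-tier lookup described above can be sketched as a search against full-gene records followed by a fallback to cDNA records. The version below uses exact string matching against a small in-memory dictionary, which is a simplifying assumption: the article says the team determined the closest allele, and the record layout, function name, and allele labels here are hypothetical rather than the IMGT/HLA database's actual interface.

```python
def designate_allele(gene_seq, exon_seq, database):
    """Return (allele_name, evidence), trying full-gene records before cDNA.

    database: {allele_name: {"gene": full_sequence_or_None, "cdna": cdna_sequence}}
    """
    # First pass: look for a record whose full gene sequence matches.
    for name, record in database.items():
        if record.get("gene") == gene_seq:
            return name, "full gene"
    # Fallback: compare only the exonic sequence against known cDNA records.
    for name, record in database.items():
        if record.get("cdna") == exon_seq:
            return name, "cDNA"
    return None, "no match"


# Toy records with made-up names and sequences, for illustration only.
toy_db = {
    "toy*01": {"gene": "AAACCCGGG", "cdna": "AAAGGG"},
    "toy*02": {"gene": None, "cdna": "AAATTT"},
}
print(designate_allele("AAACCCGGG", "AAAGGG", toy_db))
print(designate_allele("AAACGCTTT", "AAATTT", toy_db))
```

The fallback matters because many database entries include only exon sequences, so a haplotype with novel intronic variation can still be assigned an allele number from its coding sequence.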
To confirm the method, they tested it on three parent-child trios and found that the sequencing results were consistent with the SSO genotyping data and with the inheritance pattern within the families.
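A trio check of this kind comes down to confirming that, for each gene, the child carries one allele found in the father and one found in the mother. The sketch below shows that test for a single gene; the function name and allele labels are hypothetical, and it ignores complications such as recombination or typing at different resolutions.

```python
def trio_consistent(child, father, mother):
    """Check Mendelian consistency for one gene.

    child, father, mother: 2-tuples of allele names.
    True if one child allele can come from the father and the other from the mother.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


print(trio_consistent(("x", "y"), ("x", "z"), ("y", "w")))  # True
print(trio_consistent(("x", "x"), ("x", "z"), ("y", "w")))  # False
```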
Inoue said the next steps are to apply the approach to the entire 3.6-megabase HLA region, to "completely determine the haplotype of the HLA region." Additionally, he said he wants to continue to improve the analytics to make them more user-friendly.
Inoue said he thinks HLA sequencing will become important for clinical applications in the future. Indeed, a number of groups are already designing protocols for such applications, and Life Technologies recently had its 3500 Dx Sanger instrument and HLA assay cleared for diagnostic use (CSN 2/13/2013).
In March, for instance, the Red Cross Blood Transfusion Service of Upper Austria received accreditation for an HLA typing method using Roche's 454 GS Junior (CSN 3/6/2013). Roche has also launched HLA kits for research purposes, but with the goal of making them suitable for clinical use (CSN 4/5/2011).