NEW YORK (GenomeWeb News) – A team of researchers from the Broad Institute has published a proof-of-principle study showing that it can fill in some of the remaining sequence gaps in the human genome using Roche 454 sequencing.
In a paper appearing online in Genome Biology today, the researchers showed that while some gaps in the human genome are a consequence of structural variation, others reflect low-complexity sequence that is difficult or impossible to clone into bacteria. Using 454 sequencing, which does not require this cloning step, the team was able to generate sequence spanning gaps on chromosome 15.
"We have demonstrated a simple and scalable method for finishing non-structural gaps in genome assemblies," lead author Manuel Garber, a researcher with the Broad Institute's Genome Sequencing and Analysis Program, and his colleagues wrote. "While clone-based methods remain an effective means of attacking structural gaps, they will not resolve gaps that arise from sequences recalcitrant to bacterial cloning."
Despite extensive efforts to finish the human genome, hundreds of gaps remain in its sequence. For instance, while finishing and analyzing chromosome 15 in earlier work, the team noticed that the chromosome's sequence was far from complete despite its small size.
"We noticed that there were many, many gaps," Garber told GenomeWeb Daily News. And while some appeared to be a consequence of structural variation, he explained, "For a few of them that excuse is not valid."
The current version of chromosome 15 contains nine sequence gaps, including three that don't seem to coincide with copy number variations. Garber and his colleagues started tackling these non-structural gaps in their spare time.
Based on their analysis of the Celera genome, the researchers estimated that these gaps were roughly 9,000, 10,000, and 12,000 base pairs in size. To begin evaluating these, the team designed six primer pairs spanning the chromosome 15 gaps and used them to amplify human genomic DNA. When they attempted to clone this DNA into bacteria and sequence it, though, they didn't get the products they were looking for.
Similarly, when they tried to sequence so-called "shatter" libraries, created by breaking the PCR products into fragments averaging 500 base pairs in length, the team was unable to obtain sequence of the right size to fill the gaps.
The researchers speculated that these attempts were thwarted because bacterial cells do not tolerate the DNA sequences in the gap regions, making those regions difficult or impossible to clone. So far, the team has not found a good explanation for this bias.
"Why exactly that sequence is toxic to bacteria, we don't know," Garber said.
To get around this problem, the researchers bypassed the bacterial cloning step and directly sequenced the sheared PCR products from the shatter library on the Roche 454 Life Sciences GS FLX platform. The reads were first assembled by hand and later with the ARACHNE assembler.
In general, the non-structural gaps that could be filled with this method tended to consist of low-complexity sequence enriched for G and T nucleotides on one strand and C and A on the other. "We conclude that sequence composition plays a significant role in what makes these regions difficult to clone, but there are likely to be other factors as well," the authors noted.
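Because G pairs with C and T pairs with A, "G+T-rich on one strand" and "C+A-rich on the other" describe the same property of a duplex. The minimal Python sketch below illustrates that equivalence and shows one simple way to flag G+T-rich stretches with a sliding window. It is not the authors' analysis code: the function names, the toy sequences, and the window size and threshold are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: hypothetical helpers, not the study's pipeline.

# Watson-Crick complement for the four standard bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gt_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GT") / len(seq)


def ca_fraction(seq: str) -> float:
    """Fraction of bases in seq that are C or A."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "CA") / len(seq)


def gt_rich_windows(seq: str, window: int, threshold: float) -> list[int]:
    """Start positions of every window whose G+T fraction is >= threshold.

    The window size and threshold are hypothetical parameters.
    """
    seq = seq.upper()
    return [
        start
        for start in range(len(seq) - window + 1)
        if gt_fraction(seq[start:start + window]) >= threshold
    ]


if __name__ == "__main__":
    strand = "GGTGTTGGTGAC"  # toy sequence, not from chromosome 15
    other = reverse_complement(strand)
    print(other, gt_fraction(strand), ca_fraction(other))
    print(gt_rich_windows("AAAAGTGTGT", 4, 1.0))
```

For any sequence made up only of A, C, G and T, the G+T fraction of one strand equals the C+A fraction of its reverse complement. A composition skew of this kind is therefore a property of the region, not of whichever strand happens to be read.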
When the researchers examined the 454 reads used to construct the Watson genome, they found some sequence corresponding to the chromosome 15 gap regions, though not enough to cover any of the gaps. These sequences are missing from the Watson genome, Garber explained, because that genome was not assembled from scratch but was built on the existing human reference sequence.
The new approach offers a potential avenue for filling "Type III" gaps in the human genome, which are gaps in unique euchromatic sequence. It is not aimed at "Type I" sub-telomeric gaps or "Type II" gaps in duplicated euchromatin. An estimated 127 Type III gaps remain in the human genome.
While the researchers aren't planning to tackle these gaps themselves, Garber said he and his co-workers decided to publish their chromosome 15 results in case the findings prove useful to those involved in such efforts. They noted that a similar approach could also help fill sequence gaps in other finished or nearly finished genomes.
"[G]iven the great effort already carried out to close them, we expect that flanking clones will be very close to the refractory region and, therefore, many gaps will be small ... and likely to be finishable by the method described here," the authors concluded. "The technique we present could also be applied to the targeted closure of gaps in other finished or near finished genomes such as mouse and dog."