SAN FRANCISCO (GenomeWeb News) – Researchers from the US and UK are using an admixture-based mapping strategy in their effort to properly place human genome sequences that are currently missing from the reference genome, attendees heard at the American Society of Human Genetics annual meeting.
By leveraging patterns in admixed genomes to assess these sequences — which consistently turn up in human genomes, but fail to map to the reference — researchers are able to glean long-range sequence data that can help find homes for these missing bits of the reference assembly, the Broad Institute's Giulio Genovese said during a presentation in a population genetics session on Friday.
At present, the reference genome is missing an estimated 1 percent of its "euchromatic" sequence, he noted, and around 6.5 percent of its tightly packed, less accessible "heterochromatic" sequence.
For the euchromatic portions of the reference genome alone, that adds up to nearly 30 million base pairs of missing DNA, Genovese explained, including some coding sequences. In addition, millions more bases within the existing version of the reference itself are believed to be affected by misalignments stemming from the absence of these sequences.
In an effort to begin stitching missing sequences into the human reference genome, Genovese and his colleagues from the Broad Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, and elsewhere turned to data for more than 240 admixed individuals sampled for phase I of the 1000 Genomes Project, looking for linkage disequilibrium patterns that could help map the new sequences.
Generally speaking, the mapping strategy they are using involves finding polymorphic markers in the unlocalized contigs of interest. Those informative markers are then genotyped in admixed individuals, Genovese explained, and the information gained through this admixture mapping is leveraged to determine the polymorphic markers' location in the genome.
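The general idea can be illustrated with a toy simulation. The Python sketch below is not the team's pipeline. It assumes a simplified setup in which each admixed individual's local ancestry (0, 1, or 2 copies from one of two source populations) is already known for a set of genomic windows, and it places an unlocalized marker in the window whose ancestry correlates best with the marker's genotypes. The function names, the window layout, and the 0.9 and 0.1 allele frequencies are hypothetical choices made for illustration, and the simulated windows are independent, whereas real local ancestry is correlated along a chromosome.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def place_marker(local_ancestry, genotypes):
    """Return the window whose ancestry dosage best tracks the marker.

    local_ancestry: dict mapping window id -> per-individual ancestry dosage (0/1/2)
    genotypes: per-individual allele counts (0/1/2) at the unplaced marker
    """
    scores = {w: pearson(anc, genotypes) ** 2 for w, anc in local_ancestry.items()}
    best = max(scores, key=scores.get)
    return best, scores


def simulate(n_individuals=240, n_windows=50, true_window=17, seed=0):
    """Toy data: independent windows, marker truly located in `true_window`."""
    rng = random.Random(seed)
    # Ancestry dosage = number of the two haplotypes drawn from population A.
    ancestry = {
        w: [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_individuals)]
        for w in range(n_windows)
    }
    genotypes = []
    for dosage in ancestry[true_window]:
        # Hypothetical allele frequencies: 0.9 on population-A haplotypes,
        # 0.1 on population-B haplotypes.
        from_a = sum(rng.random() < 0.9 for _ in range(dosage))
        from_b = sum(rng.random() < 0.1 for _ in range(2 - dosage))
        genotypes.append(from_a + from_b)
    return ancestry, genotypes


if __name__ == "__main__":
    ancestry, genotypes = simulate()
    best, scores = place_marker(ancestry, genotypes)
    print("best window:", best, "r^2 =", round(scores[best], 3))
```

In this toy version the signal comes entirely from the difference in allele frequency between the two source populations, which is why markers that differ strongly between ancestries are the informative ones.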
Based on the data they've generated using this approach so far, the team has mapped more than 13 million bases of sequence. The larger contigs have been somewhat easier to place, Genovese said, since there are usually more SNPs to use in the mapping step.
At first glance, the missing pieces appear to map across the human reference genome, though researchers have seen a propensity for previously unmapped sequences to turn up at sites near centromeres, creating what Genovese called "euchromatic islands in a heterochromatic ocean." Moreover, he noted that many of the sequences seem to represent segmental duplications that have become fixed in the human lineage.
"Our approach, based on mapping clones through population admixture, is complementary to conventional clone tiling path approaches based on overlapping sequence at the end of clones," Genovese and his co-authors explained in the abstract accompanying the ASHG talk, "and might play an important role in completing physical maps of the euchromatic part of the human genome, particularly in cases where euchromatic sequence is buried inside extensive repeat-rich sequence."