Sequences Outside of Reference Genome Characterized in Icelandic Population

NEW YORK (GenomeWeb) – A large collection of non-repetitive sequences that are missing from the reference genome appears to include ancestral sequences and sequences involved in human traits or disease, new research suggests.

Using variant calling software called PopIns, researchers from Iceland and Germany analyzed genome data for more than 15,000 sequenced Icelanders. Their search led to almost 3,800 non-repetitive, non-reference sequences, dubbed NRNRs, including many sequences present in chimps and other non-human primates.

Adding in existing genome-wide association data, the team uncovered more than 100 NRNRs in linkage disequilibrium with variants implicated in human conditions. Using information for more than 300,000 heart attack cases and controls, the researchers also identified an NRNR on chromosome 17 with ties to lower-than-usual myocardial infarction risk.

"Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease," Decode Genetics/Amgen researchers Bjarni Halldorsson and Kari Stefansson, the study's co-corresponding authors, and their colleagues wrote.

Several large population studies have started unraveling the human sequences and structural variants that are not yet represented in the human reference genome, the team noted, though the prevalence and precise positions of these sequences have not been fully defined.

As they explained in Nature Genetics today, Halldorsson, Stefansson, and their colleagues analyzed genome sequence reads for 15,219 Icelanders, using PopIns to track down reads that did not readily align to the human reference genome or to known bacterial or viral genomes. After weeding out private and fixed mutations, the team was left with candidate NRNRs that were subsequently tested by imputation in 151,677 genotyped individuals from Iceland, a search that led to more than 6,700 apparent NRNR markers.

When they took a closer look at stretches of sequence tagged by such markers, the researchers homed in on potential NRNRs spanning nearly 327,000 bases, which they attempted to map back to chimp and human genomes based on breakpoint sequences.

Overall, the team tracked down 3,791 breakpoint-resolved NRNRs, with most of the NRNRs larger than 200 bases turning up in the chimpanzee genome. Based on almost 18,000 markers from prior GWAS, coupled with Icelander genotypes, the group noted that at least 149 NRNRs overlapped with loci linked to human diseases or traits. And using data from case-control studies done by Decode Genetics, the researchers identified a myocardial infarction-related NRNR on chromosome 17 that also appeared to be associated with decreased heart attack risk in samples from a large heart disease consortium.

Though the researchers conceded that it will be daunting to incorporate all human sequences into a single reference genome, particularly when sequences are depicted in a linear manner, they called for efforts to update the reference genome in ways that reflect NRNRs present across human populations.

"While the reference genome in its linear form lacks some human sequence, the NRNR sequences discovered in this study are not truly novel. Most are either ancestral or translocations of human reference sequence," the authors wrote. "If the NRNR sequences are confirmed in other populations, we recommend that they be included in future releases of the reference."