NEW YORK (GenomeWeb News) – A large multi-institution consortium is identifying functional regions in mammalian genomes using low-coverage genome sequencing and comparative genomics.
The 2X Mammalian Sequencing and Analysis Consortium has already sequenced nearly two dozen mammalian genomes at low coverage. By comparing these genomes to one another, and incorporating sequence information from published genome sequences, the group is pinpointing constrained regions in mammalian genomes with increasingly fine resolution.
The consortium formed when researchers from the Broad Institute, the National Institutes of Health, and elsewhere got together and devised a plan to identify functional regions in the genome, Broad researcher Michele Clamp told GenomeWeb Daily News. Clamp, who has been involved with the consortium for several years, presented data from the 2X Mammalian Sequencing and Analysis project at the Biology of Genomes meeting last week in Cold Spring Harbor, NY.
The researchers reasoned that comparing many genomes would uncover areas resistant to mutation, places expected to house functionally important sequence. Aligning multiple genomes also offered an opportunity to see variation within constrained elements, identify increasingly small constrained sequences, and find clade-specific patterns, Clamp said.
So far, the team has sequenced 22 mammalian genomes to at least 2X coverage using Sanger sequencing. Combined with sequence data from published studies, the researchers now have genome sequence data for 29 mammals.
Whereas previous data from four mammalian genomes suggested just 60 million bases or so were constrained, comparisons between dozens of mammalian genomes indicate that roughly 100 million bases (about five to seven percent of mammal genomes) are constrained. And by comparing the dozens of genome sequences now available, more than half of these constrained bases can be pinpointed to within about a dozen bases, Clamp said.
Unraveling the function of these constrained regions is a bit trickier, though, since most of the bases in the newly found constrained regions seem to be non-coding. Researchers have been searching the regions against reference databases and assigning them to classes, if possible, Clamp explained. For instance, some correspond with known or potential transcription factors, insulators, non-coding RNA sequences, and so on.
Members of the team are also gearing up to do experiments aimed at detecting and characterizing protein binding to constrained regions of mammalian genomes, Clamp said.
Down the road, the researchers plan to top up the genome sequences they have so far with next-generation sequencing. The group also has funding to sequence additional species, Clamp noted, though they still need to decide where to look next in the mammalian tree.
Members of the consortium also are writing up data from the project for publication.