NEW YORK (GenomeWeb) – Members of the International Wheat Genome Sequencing Consortium have established a draft genome for bread wheat, Triticum aestivum — work they reported online today in Science.
The draft, which includes survey sequences for all 21 wheat chromosomes and involved allocating roughly half of the known wheat genes, is a stepping stone en route to a wheat reference genome, explained Kellye Eversole, director of the IWGSC.
"As the result of opportunities to do some shotgun sequencing of individual chromosome and chromosome arms, we launched an interim or short-term milestone to achieve survey sequences for all 21 chromosomes," Eversole told GenomeWeb Daily News.
To produce the type of reference sequence required for more detailed regulatory and genome-wide studies, she explained, the team first needs to complete physical maps for 20 of the 21 wheat chromosomes.
In an accompanying article in Science today, the team presented a reference sequence for the T. aestivum chromosome 3B, which was physically mapped for an earlier phase of the IWGSC project. That chromosome sequence established the feasibility of creating a broader wheat reference and is expected to serve as a template for these future efforts.
"The real breakthrough came in 2008, when we completed the physical map of 3B," Eversole said. "Once we did that, people thought, 'It can be done' and we had a lot of additional people who wanted to become involved."
Two more papers in the same issue of the journal outlined efforts to apply the draft sequence and transcriptome data to decipher wheat's evolutionary history and sub-genome dynamics during grain filling, respectively.
The wheat genome's complexity stems from its size and repeat sequence content as well as its ploidy — the presence of multiple genome copies that reflect polyploidization events over hundreds of thousands of years of evolutionary history.
In particular, researchers suspect T. aestivum resulted from an ancient hybridization between T. urartu-related species to form T. turgidum, which eventually hybridized with the Aegilops tauschii grass species to produce an allohexaploid plant with three chromosome sets, or sub-genomes, of seven chromosomes apiece.
The IWGSC originated in 2005 with 18 coordinating committee members and has since grown to more than 1,000 members in dozens of countries. Investigators involved in the project have done everything from generating wheat genetic maps and gene catalogs to investigating individual chromosomes and assigning genes to sub-genomes with the help of genetic studies on plants related to wheat's ancestors such as T. urartu and Ae. tauschii.
"We developed a strategy based on chromosome sorting to divide these up into individual chromosomes," Eversole said, noting that this approach required BAC libraries for each chromosome.
Using wheat lines developed from the Chinese Spring cultivar of bread wheat, the researchers generated sequences for each of the plant's chromosomes by Illumina paired-end sequencing and assembled them with the ABySS assembler.
With the exception of chromosome 3B, which was sequenced in its entirety, the sequence data represented individual chromosome arms that had been isolated and shotgun sequenced.
With the 17-billion-base ordered draft sequence in hand and RNA sequence data from several wheat tissues or developmental stages, the team then went on to scrutinize repeat distribution, gene content, microRNA patterns, and more in each of the three wheat sub-genomes, known as A, B, and D.
For example, the researchers tracked down more than 124,000 apparent gene loci, including more than 75,000 that could be mapped to a particular chromosome site. In addition, they identified almost 300 potential miRNAs and high-copy repeats comprising roughly one-quarter of the wheat sequence reads.
The distribution of genes assigned to chromosomes so far indicates that protein-coding sequences are spread fairly uniformly along the chromosomes and even into centromeric sequences, Eversole noted, rather than residing in discrete gene-rich regions.
The available sequences also point to a preponderance of duplicated genes in the wheat genomes, suggesting relatively few gene copies have been lost in the wake of past genome duplication events, noted co-principal investigator Klaus Mayer of the German Research Center for Environmental Health.
"Roughly 90 percent of genes are retained, or seemed to be retained and gene loss seems to be unexpectedly low," Mayer told GWDN. "So maybe there's a selective pressure on the retained gene copies."
The team saw somewhat more genes falling in wheat's B genome than in its other two sub-genomes, though there were examples of A or D sub-genome chromosomes with higher gene counts than their B genome counterparts.
At the moment, there does not seem to be clear dominance of one wheat sub-genome over the others, Mayer noted. Instead, the researchers saw autonomy for the sub-genomes, combined with examples of highly orchestrated transcriptional modules from a given sub-genome that exerted dominance over other transcriptional units.
The survey sequence data on hand at the moment is still insufficient for analyzing or annotating regulatory or structural elements in the wheat genome, according to Eversole, who also cautioned against applying it for most genome-wide analyses.
On the other hand, the newly available data is expected to help in annotating genes, searching for lineage-specific coding sequences, and linking particular genomic features to a particular sub-genome chromosome, she said.
Ultimately, the group is aiming for a bread wheat reference genome that's on par with the quality of the rice genome. IWGSC members are working to complete physical maps of the remaining 20 wheat chromosomes by the end of this year, with the goal of producing assemblies for each chromosome that are comparable in quality to the existing chromosome 3B reference.
"Our long-term project continues to be the reference sequence for all 21 chromosomes with quality comparable to 3B and comparable to rice," Eversole said, noting that the reference genome sequence could be available in as little as three years if sufficient funding is secured.
Data from the current IWGSC studies are publicly available through a central repository housed in France, and raw reads have been deposited in the European Bioinformatics Institute's short read archive. The team plans to continue to make information available to other researchers, crop breeders, and producers as it becomes available.
"With the survey, we've been able to get some tools and resources out for breeders and plant scientists in short order, which has been one of our strategies," Eversole said. "We never wanted to be in a situation where we made people wait until we had the whole sequence. We wanted to push resources out as they became available."