NEW YORK (GenomeWeb) – Researchers from the International Barley Genome Sequencing Consortium (IBSC) have used a combination of hierarchical shotgun sequencing and chromosome conformation capture mapping (Hi-C) to create the most complete reference of the barley genome to date. The genome they developed represents 4.79 Gb, approximately 95 percent, of the plant's genomic sequence content.
"Large plant genomes consist mainly of highly similar copies of repetitive elements such as long terminal repeat retrotransposons and DNA transposons," the researchers wrote in the study, published today in Nature. Consequently, these large complex genomes have been difficult to sequence. Last year, a group of European researchers used a technique, called MutChromSeq, to reduce the complexity of the genome by breaking it down into chromosome parts.
IBSC was founded by researchers from the Australian Centre for Plant Functional Genomics, the United States Department of Agriculture Agricultural Research Service at Iowa State University, the Leibniz Institute of Plant Genetics and Crop Plant Research, the National Institute of Agrobiological Sciences, Okayama University, the Scottish Crop Research Institute, the University of California, Riverside, MTT Agrifood Research, and the University of Helsinki. In using Hi-C to reduce the barley genome's complexity, the researchers took inspiration from the research team that recently produced a lettuce genome assembly.
"The exciting methodological advances in sequence assembly and genome mapping have enabled even large and repeat-rich genomes to be unlocked, and hold the promise of constructing reference-quality genome sequences, not only for a single cultivar, but also for representatives of major germplasm groups," the authors wrote.
They began by sequencing a total of 87,075 bacterial artificial chromosomes (BACs) on either the Roche 454 Titanium or an Illumina short-read system. Next, they used BioNano's Irys system to generate physical maps and identify non-redundant map clones. The team then constructed super-scaffolds composed of merged assemblies of individual BACs, which were assigned to chromosomes using a population sequencing (POPSEQ) genetic map. Finally, the team used Hi-C to order and orient the BAC-based super-scaffolds, producing a chromosome-scale assembly of 6,347 super-scaffolds.
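The final ordering step rests on a general property of Hi-C data: sequences that lie close together on a chromosome tend to share more contacts than distant ones. The Python sketch below is a toy illustration of that idea, not the consortium's pipeline. The function name `order_scaffolds`, the scaffold labels, and the synthetic contact counts are all hypothetical, and the sketch only orders scaffolds; it does not orient them.

```python
from itertools import combinations


def order_scaffolds(names, contacts):
    """Greedily order scaffolds along one chromosome (needs >= 2 names).

    contacts maps frozenset({a, b}) to a Hi-C link count between
    scaffolds a and b (hypothetical input format for this sketch).
    """
    def link(a, b):
        return contacts.get(frozenset((a, b)), 0)

    # Seed the path with the most strongly linked pair.
    first, second = max(combinations(names, 2), key=lambda p: link(*p))
    path = [first, second]
    unplaced = set(names) - {first, second}

    while unplaced:
        # Find the best unplaced scaffold for each end of the path,
        # then attach whichever has the stronger link to its end.
        left = max(unplaced, key=lambda s: link(s, path[0]))
        right = max(unplaced, key=lambda s: link(s, path[-1]))
        if link(left, path[0]) > link(right, path[-1]):
            path.insert(0, left)
            unplaced.remove(left)
        else:
            path.append(right)
            unplaced.remove(right)
    return path


# Synthetic data: contact counts fall off with distance in the true order.
true_order = ["s1", "s2", "s3", "s4", "s5"]
contacts = {
    frozenset((a, b)): 100 / (1 + abs(i - j))
    for (i, a), (j, b) in combinations(enumerate(true_order), 2)
}

shuffled = ["s3", "s5", "s1", "s4", "s2"]
result = order_scaffolds(shuffled, contacts)
print(result)  # the true order, or its reverse
```

Because contact counts alone cannot distinguish a chromosome from its mirror image, the sketch may return the scaffolds in reverse. Real Hi-C data are also noisy, so a greedy pass like this one would not be reliable on its own.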
After mapping the transcriptome data and reference protein sequences from other plant species to the assemblies, the researchers identified 83,105 putative gene loci, including 39,734 high-confidence genes and 41,949 low-confidence genes.
The researchers noted that "the composition of genes and repetitive elements differs between distal and proximal regions." They also wrote that "gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains."
In an interview, IBSC Chair and Leibniz Institute researcher Nils Stein noted that the method does have room for improvement. "We are still lacking a lot of information about the purpose of these repetitive elements," he said.
Going forward, Stein and his colleagues in the IBSC plan to use their shotgun sequencing/Hi-C method to gain a richer understanding of the genetic diversity present in the barley population and what implications that might have for creating better strains of the cereal. They are also using the same technique to assemble a reference wheat genome.