NEW YORK (GenomeWeb) — After more than a decade of collaborative work, a team led by researchers at the University of California in Riverside has published details of its successful efforts in sequencing large gene-rich portions of the barley genome.
The results, published last week in The Plant Journal, have greatly expanded the scientific knowledge of barley genetics, and could go a long way toward fully elucidating the genetic code of barley and other important plant food sources, such as wheat.
The research team, which comprises investigators from more than two dozen public and private organizations, began its work shortly after the first library of the barley genome was created in 2000.
Timothy Close, a plant geneticist at the University of California in Riverside and corresponding author on the newly published paper, told GenomeWeb this week that the team's goal was to sequence the gene-rich regions of the genome to discover more genes than previous research had been able to uncover. Although a draft of the barley genome was published in 2012, it included comparable sequence information for only a relatively small number of these gene-containing regions.
Over many years, Close and colleagues generated approximately 1.7 Gb of genomic sequence containing an estimated two-thirds of all barley genes. Sequencing the whole barley genome has been a challenge because the genome is large and highly repetitive: at 5.1 Gb, it is roughly 1.6 times the size of the human genome (3.2 Gb).
Close said one of the biggest challenges was finding a way to "shrink the barley genome" by focusing "only on a small section that held the information that people were interested in."
Close worked closely with his colleague, Stefano Lonardi, a computer scientist at UC Riverside, to develop new algorithms to sequence these smaller gene-rich areas of the barley genome. Their first hurdle was finding the best way to make optimal and cost-effective use of their Illumina HiSeq 2000 sequencer, which had seven channels for different sequencing runs. After much consideration, the researchers decided to use a previously described combinatorial pooling system to sequence these regions. This involved multiplexing pools of bacterial artificial chromosomes (BACs) that had been manually prepared.
They began by calculating a minimum tiling path, which they used to define sets of 2,197 BACs, each set distributed across 91 pools. Each BAC was placed in seven different pools, a redundancy that lets sequence reads be traced back to individual BACs from the combination of pools in which they appear.
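The figures above (2,197 = 13 × 13 × 13 BACs, 91 = 7 × 13 pools, seven pools per BAC) are consistent with a shifted transversal design, a combinatorial pooling scheme from the group-testing literature. The Python sketch below illustrates a layout under that assumption; it is not the team's actual pipeline, and the function names and the choice of 13 pools per layer are hypothetical.

```python
from itertools import combinations

Q = 13        # assumed pools per layer (a prime); Q**3 == 2197 BACs
LAYERS = 7    # one pool per layer, so each BAC lands in 7 pools


def pools_for_bac(bac_index, q=Q, layers=LAYERS):
    """Return the pool IDs (0..q*layers-1) that hold the given BAC.

    The BAC index is written as three base-q digits (d0, d1, d2).
    In layer j the BAC goes to pool (d0 + d1*j + d2*j^2) mod q.
    """
    d0 = bac_index % q
    d1 = (bac_index // q) % q
    d2 = (bac_index // (q * q)) % q
    return [j * q + (d0 + d1 * j + d2 * j * j) % q for j in range(layers)]


def build_design(n_bacs=Q ** 3, q=Q, layers=LAYERS):
    """Map every pool ID to the list of BAC indices it contains."""
    design = {pool: [] for pool in range(q * layers)}
    for bac in range(n_bacs):
        for pool in pools_for_bac(bac, q, layers):
            design[pool].append(bac)
    return design


def max_shared_pools(bac_indices):
    """Largest number of pools shared by any two BACs in the given subset."""
    signatures = {b: set(pools_for_bac(b)) for b in bac_indices}
    return max(
        len(signatures[a] & signatures[b])
        for a, b in combinations(signatures, 2)
    )
```

Calling `build_design()` yields 91 pools of 169 BACs each. In this layout, two distinct BACs differ by a nonzero polynomial of degree at most two modulo 13, so they share at most two of their seven pools; that limited overlap is what allows reads to be assigned to a BAC by its pool signature.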
After the sequencing was done, the researchers used the Velvet assembler to reassemble the BACs and then used CLARK, a supervised classification method, to assign the reassembled BACs to chromosome arms. They also used published sequence from goat grass (Aegilops tauschii) to develop a synteny viewer that allowed them to compare sequences from the barley genome with data from the ancestor of the wheat D genome.
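In broad terms, CLARK classifies a sequence by the k-mers it shares with labeled reference targets, relying on k-mers that are specific to a single target. The toy Python sketch below illustrates that idea on short strings. It is not CLARK's implementation or interface, and the function names, arm labels, and sequences are invented for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(references, k):
    """Map each k-mer found in exactly one target to that target's label.

    references: dict of label -> list of reference sequences.
    K-mers seen in more than one target are discarded.
    """
    owner = {}
    shared = set()
    for label, seqs in references.items():
        label_kmers = set()
        for seq in seqs:
            label_kmers |= kmers(seq, k)
        for km in label_kmers:
            if km in shared:
                continue
            if km in owner:          # already claimed by another target
                del owner[km]
                shared.add(km)
            else:
                owner[km] = label
    return owner


def classify(query, index, k):
    """Assign query to the target with the most discriminative k-mer hits.

    Returns None when no discriminative k-mer matches.
    """
    hits = Counter(index[km] for km in kmers(query, k) if km in index)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Hypothetical chromosome-arm references (toy data, not real barley sequence).
REFERENCES = {
    "arm_A": ["AAAACCCCGG"],
    "arm_B": ["GGTTTTAAAA"],
}
INDEX = build_discriminative_index(REFERENCES, k=4)
```

With this toy index, `classify("ACCCCG", INDEX, 4)` returns `"arm_A"`, while a query made up only of the shared k-mer `AAAA` returns `None` because that k-mer was discarded as non-discriminative.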
That viewer, along with all of the team's published data, is available in a downloadable database called HarvEST:Barley. The database was originally used to support gene function analyses and oligonucleotide design, but has since grown to handle additional activities including microarray content design, SNP identification, genotyping platform design, comparative genomics, and the coupling of physical and genetic maps. The database includes not only the research published by Close and his team, but also the rest of the available mapping and gene data.
The higher-resolution barley sequence also yielded new insights into plant genetics. For instance, the research team found that, contrary to previous assumptions, gene-rich areas are not found only in high-recombination regions. "It is true that high recombination regions that are out near the telomeres are gene rich," said Close. "But now we see that there are gene-rich regions that are in low recombination regions." This is critically important information for plant breeding, where it is often difficult to select specific genes in the middle of these gene-dense, low-recombination regions. In such regions, there is a much higher chance of also selecting neighboring genes that will have unknown and possibly unwanted effects, whether in barley, rice, or some other species.
The researchers contributing to and using the HarvEST:Barley database have helped fill in many information gaps for the barley genome, but the International Barley Sequencing Consortium (IBSC) is leading the effort to produce a fully sequenced genome. The IBSC aims to have more than 95 percent of the genome sequenced in the coming year. Its eventual hope is to create a complete reference genome for barley to help scientists better understand this important global food crop, how it might respond to global climate change, and whether anything can be done to keep barley a viable food source decades down the line. However, the consortium has its work cut out for it. "That remaining 30 percent or so is going to be the hardest part," said Close. "We kind of did the easy part."
Close indicated that, in addition to the IBSC, researchers in the International Wheat Genome Sequencing Consortium also hope that more reference data from the barley genome will become available. The wheat genome has a similarly long and repetitive structure that has proved difficult to unravel. However, barley is a much closer relative of wheat than other grains that have been used as comparative references in the past, such as rice, and scientists hope that the barley genome will offer useful information to help them get closer to a complete wheat genome. "[This new barley genome research is] not the missing link for every need," said Close, "but it'll help."