NEW YORK (GenomeWeb) – Chromosome-scale genome assemblies for crop plants are tantalizingly close, thanks to recent work by researchers from the University of California, Davis, BGI, and Dovetail Genomics.
Highly accurate and contiguous genomes, built by combining in vivo and in vitro proximity ligation technologies, could help accelerate more traditional, genetic marker-based crop breeding and unleash the potential of targeted genome editing to improve plants other than corn, wheat, and soybeans.
In a new publication in Nature Communications, the collaborators used Chicago libraries, Dovetail's in vitro proximity ligation technique for improving scaffolds, to assemble the genome of the Salinas cultivar of lettuce, Lactuca sativa. With the Chicago library preparation, the researchers increased the scaffold N50 of an Illumina short-read assembly from 476 kb to 1,769 kb.
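For readers unfamiliar with the metric, N50 is the scaffold length at which scaffolds of that size or larger account for at least half of the total assembly. A minimal Python sketch of the calculation follows; the function name and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    # Sort longest first, then walk down until half the total is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths in kb (not the lettuce assembly).
before_kb = [100, 200, 300, 400, 500]
# The same sequence after joining some scaffolds: total length is unchanged.
after_kb = [100 + 200, 300, 400 + 500]

print(n50(before_kb))  # N50 before joining
print(n50(after_kb))   # N50 after joining
```

In this toy case, joining scaffolds leaves the total assembly length at 1,500 kb but raises the N50 from 400 kb to 900 kb. That is the kind of contiguity gain the reported jump from 476 kb to 1,769 kb describes.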
Since then, Dovetail has launched in vivo Hi-C as a commercial service and has started combining it with Chicago. The firm has worked with Richard Michelmore, a UCD professor and the paper's lead author, on the L. sativa genome using the complementary techniques. "This has recently provided an even better assembly than that described in the paper," Michelmore said. "There are now real chromosome-scale assemblies coming out" for plant genomes in the next six to 12 months, he added.
Over the last year, several research groups have published papers demonstrating scaffolds that span entire chromosomes, telomere to telomere, in animal genomes.
Early last month, a team led by the US Department of Agriculture published a study in Nature Genetics reaching chromosome-length scaffolds for a de novo goat genome assembly, highlighting a combination of several technologies including short- and long-read sequencing, optical mapping, and Hi-C.
Later in March, researchers from Baylor University and Harvard University reported generating chromosome-length scaffolds in assemblies for the human genome and two mosquito genomes.
But plant genomes are a different kind of beast.
"It is difficult to order and orient contigs and scaffolds in assemblies of large plant genomes because of the unfavorable relationship between physical and genetic distances as well as the sparse distribution of informative low-copy sequences due to the high repeat content," Michelmore said.
That problem was evident during the first phase of the lettuce genome project. The assembly based on Illumina reads was good, Michelmore said, but many smaller scaffolds could not be assigned to linkage groups due to a lack of markers. "The physical resolution of the assembly was smaller than that of the genetic map; therefore scaffolds could not be ordered or oriented within genetic bins."
Having heard a presentation from Dovetail at the 2015 Plant & Animal Genome conference and having used Hi-C in plants and bacteria, Michelmore knew where he wanted to turn, though he also considered using optical mapping from Bionano Genomics. "If I can get someone else to do my work for me, I will do so," he said. The initial Chicago library-based assembly was done as a paying customer. "We fed them a good Illumina assembly and made it a lot better."
Dovetail's method improved the physical resolution and greatly increased contiguity, he said. "It provided a physical resolution at least as great as that of the genetic map, so that we could orient and order many of the scaffolds, particularly in complex regions such as clusters of disease resistance genes," he said. "It also identified infrequent erroneous joins in scaffolds and allowed the inclusion of some of the smaller scaffolds that had not been mapped previously." The resulting assembly contained 90 percent of the genome in 1,520 scaffolds.
Following the initial work, Dovetail and Michelmore launched an ad hoc, informal collaboration. While Dovetail had used Hi-C on the Xenopus laevis frog genome, it hadn't built experience with plant genomes. At the time, Hi-C was not available as a service yet. Dovetail created two more assemblies, one using another round of Chicago and the second using Hi-C, said Veronica Mankinen, Dovetail VP of commercial operations.
The data helped the firm improve its HiRise assembler software, she said. "He used genetic mapping data to validate the nine chromosomes [of the lettuce genome]. Utilizing mapping data, we compared to assembly and helped to identify where there were mismatches." The firm's developers then used those mismatches to go back and tweak the assembly algorithm slightly, she said. Michelmore will be presenting data on the Hi-C assembly at an upcoming Dovetail webinar.
For Michelmore, chromosome-scale assemblies mean getting in on the CRISPR/Cas9 genome engineering revolution. "With genome editing, the game has changed totally," he said. "You've got to know the gene to edit it."
Though commercially important, lettuce hasn't previously gotten the attention that, say, soy or corn has. But with the next level of accuracy, a lettuce genome assembly could serve as a reference for the Compositae, an enormous plant family of immense agricultural significance.
"They are the most successful family of flowering plants in terms of number of species and habitats inhabited," Michelmore said. That includes several dozen crop species, more than 200 domesticated species, and many noxious weeds. "These new genome assembly approaches allow the rapid and inexpensive development of highly informative genome resources for species within this and other plant families that were previously inaccessible," he said.