SAN DIEGO (GenomeWeb) – Researchers from Johns Hopkins University, Cold Spring Harbor Laboratory, and elsewhere are in the process of sequencing 100 tomato varieties in 100 days with the Oxford Nanopore PromethION platform, according to presentations here yesterday at the Plant and Animal Genome Conference.
Johns Hopkins investigator Michael Schatz outlined the rationale for the effort and progress made so far during a session on bioinformatics, while Baylor College of Medicine researcher Fritz Sedlazeck provided additional details on sequence-informed sample selection and structural variant detection from long-read sequence data at an Oxford Nanopore-sponsored workshop later in the day.
Tomato is among the most valuable crop plants, Schatz noted: more than 175 million tons of the fruit, worth around $85 billion, are produced globally each year.
An international team published a tomato reference genome based on the Heinz 1706 cultivar in Nature in 2012, using a combination of Sanger sequencing, Roche 454 sequencing, BAC-end sequencing, genetic mapping, and other approaches typically employed for early reference genomes.
The 950-megabase diploid genome has been a valuable resource, and its assembly quality is good "of its era," Schatz explained. But it is not well suited to analyses of structural variation, which is suspected of contributing to agriculturally and economically important traits, from fruit size to stem snappiness, in the more than 15,000 named tomato varieties.
"Previous studies have hinted at the role of structural variations in domestication of tomato and other important phenotypes," Sedlazeck wrote in his abstract, adding that these variants "are hard to capture and phase with short reads alone, challenging the feasibility of a population-wide approach."
For their National Science Foundation-funded effort, Schatz and his colleagues set out to combine long-read sequencing technologies, computational biology, and functional studies to find and characterize structural variants in tomato, laying the groundwork for future studies on everything from natural variation and domestication to crop improvement.
Starting with available short-read sequence data for hundreds of tomato varieties, which fall into 10 major phylogenetic clusters, the researchers tackled sample selection computationally: rather than choosing samples at random, they developed an open-source tool called SVCollector to optimize variant detection and validation.
As members of the team reported in a bioRxiv preprint last June, this approach helped them narrow the set to 100 samples that collectively capture as much genetic diversity as possible.
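The article does not describe SVCollector's internals. As a rough illustration of the kind of optimization involved, the sketch below uses a greedy maximum-coverage heuristic, a standard way to choose a fixed number of samples that together contain as many distinct variants as possible. The input format (a mapping from hypothetical sample names to sets of variant IDs) and the function name `greedy_select` are assumptions for illustration, not SVCollector's actual code or interface.

```python
from typing import Dict, List, Set


def greedy_select(sample_svs: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k samples that together contain the most distinct variants.

    Hypothetical illustration of greedy maximum coverage; not SVCollector's code.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(sample_svs)  # copy so the caller's dict is not modified
    while remaining and len(chosen) < k:
        # Sample that adds the most not-yet-covered variants; iterating in sorted
        # order makes ties resolve to the alphabetically first name.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # every remaining sample is redundant with those already chosen
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


samples = {
    "A": {"sv1", "sv2", "sv3"},
    "B": {"sv1", "sv2"},
    "C": {"sv4", "sv5"},
    "D": {"sv3", "sv4"},
}
print(greedy_select(samples, 2))
```

Each round adds the sample contributing the most not-yet-seen variants, so later picks are driven by what earlier picks missed rather than by each sample's raw variant count. That is the intuition behind selecting samples to cover diversity instead of at random.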
The initial proposal for the project called for profiling these samples with a combination of short and long reads, Schatz explained. But results from Oxford Nanopore PromethION test runs at CSHL last summer prompted the team to attempt the project with that instrument.
At CSHL, the nanopore instrument routinely produces between 60 and 80 gigabases of data per flow cell, he noted, and can cover the tomato genome to an average depth of 100-fold in two days. The team is currently sequencing 12 to 16 tomato samples per week, running six to eight flow cells in parallel twice a week, and has so far sequenced 62 tomato genomes to average depths of at least 40-fold.
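The throughput figures can be checked with simple arithmetic: average depth is total bases sequenced divided by genome size. The sketch below assumes the 950-megabase genome size cited for the Heinz 1706 reference and the quoted 60 to 80 gigabases per flow cell; the function names are illustrative, real runs vary, and read filtering would reduce usable yield.

```python
import math

# Assumption: tomato genome size of ~950 Mb, as cited for the Heinz 1706 reference.
GENOME_SIZE_GB = 0.95


def depth(total_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Average sequencing depth: total bases sequenced divided by genome size."""
    return total_gb / genome_gb


def gb_needed(target_depth: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Gigabases required to reach a target average depth."""
    return target_depth * genome_gb


def flow_cells_needed(target_depth: float, gb_per_flow_cell: float) -> int:
    """Whole flow cells needed, assuming every cell delivers gb_per_flow_cell."""
    return math.ceil(gb_needed(target_depth) / gb_per_flow_cell)


# Coverage delivered by one flow cell at the low and high ends of the quoted yield.
for per_cell in (60, 80):
    print(f"{per_cell} Gb per flow cell -> {depth(per_cell):.1f}x coverage")
```

By this estimate a single flow cell yields roughly 63- to 84-fold coverage, so reaching 100-fold (about 95 gigabases) takes about two flow cells, while 40-fold (about 38 gigabases) fits comfortably on one.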
This throughput has led to some data management challenges, Schatz pointed out, including a need for upgraded fiber connections and increased storage.
But the investment appears to be paying off in structural variant results: using a combination of de novo assembly and structural variant detection methods, including both alignment- and assembly-based approaches, the researchers have been identifying between 15,000 and 50,000 structural variants per sequenced tomato genome.
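The article does not say how the alignment- and assembly-based call sets were combined. One simple, commonly used rule is to treat two calls as the same event when they share a chromosome and SV type and their breakpoints fall within a distance threshold. The sketch below assumes that rule; the `SV` record, the 500 bp threshold, and the caller labels are hypothetical choices for illustration, not the team's pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal structural variant record."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP", "INV"


def same_event(a: SV, b: SV, max_dist: int) -> bool:
    """Same chromosome and type, with both breakpoints within max_dist bp."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def merge_calls(
    aln_calls: Iterable[SV], asm_calls: Iterable[SV], max_dist: int = 500
) -> List[Tuple[SV, Set[str]]]:
    """Union two call sets; each merged entry records which methods support it.

    The first call seen for an event is kept as its representative. Nearby calls
    from the same method are also collapsed into one entry.
    """
    merged: List[Tuple[SV, Set[str]]] = []
    for label, calls in (("alignment", aln_calls), ("assembly", asm_calls)):
        for sv in calls:
            for rep, sources in merged:
                if same_event(rep, sv, max_dist):
                    sources.add(label)
                    break
            else:
                # No existing event matched: start a new one.
                merged.append((sv, {label}))
    return merged
```

Calls supported by both methods are often treated as higher confidence, while method-specific calls can flag regions where one approach may have blind spots.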
Schatz noted that the researchers have already started tapping the tomato variant dataset to find targets for CRISPR-based gene editing in the plants. In particular, they identified a tandem duplication linked to two desirable stem traits, traits associated with alleles that normally cause negative epistasis and poor plant productivity when combined.
In the abstract for Schatz's presentation, he and his co-authors suggested that the same strategy used for the tomato structural variant pan-genome study "will become the new gold standard for [structural variant] analysis in all species."