Schloss is program director for technology development at the National Human Genome Research Institute, and in recent talks with sequencing centers he has learned that's what they want.
"These different technologies with different capabilities are going to be useful for different kinds of studies," Schloss told GenomeWeb News this week. "Having multiple different devices, which may or may not mean multiple different vendors of those devices ... most of the centers feel would be a healthy situation, as opposed to having a single vendor to which you need to go for all of your sequencing needs."
To be sure, most sequencing centers want sequencing instruments that are speedier, cheaper, and more accurate. But they also see a need for instruments with various read lengths and different abilities, such as collecting information other than base identity.
This is noteworthy given the varied playing field that is emerging.
According to Schloss, most centers say they need a machine that can generate read lengths of 500 to 700 base pairs for de novo sequencing of mammalian genomes, and most centers are achieving read lengths of 800 to 1,000 bases with traditional sequencing machines.
Opinions vary, however, on the read length needed to resequence the human genome. A recent study in the journal Nucleic Acids Research argues that read lengths of 25 bases are sufficient to probe 80 percent of the human genome while 43-base lengths can cover 90 percent. "It becomes a statistical argument -- what proportion of the genome can you uniquely assign at some read length," Schloss explained.
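The statistical argument Schloss describes can be illustrated with a toy calculation. The sketch below is not the method used in the Nucleic Acids Research study; it is a minimal illustration that assumes a single strand, exact matches, and an error-free reference, and the function name `unique_fraction` and the short example sequence are hypothetical. It counts how many read start positions yield a read that occurs exactly once in the sequence.

```python
from collections import Counter


def unique_fraction(sequence: str, read_length: int) -> float:
    """Return the fraction of read start positions whose read of the
    given length occurs exactly once in the sequence.

    Simplifying assumptions (not from the article): one strand only,
    exact matching, no sequencing errors.
    """
    n_positions = len(sequence) - read_length + 1
    if read_length <= 0 or n_positions <= 0:
        raise ValueError("read_length must be between 1 and len(sequence)")

    # Count every read of this length that can be drawn from the sequence.
    counts = Counter(
        sequence[i:i + read_length] for i in range(n_positions)
    )

    # A read seen exactly once corresponds to exactly one start position
    # that can be assigned unambiguously.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n_positions


if __name__ == "__main__":
    toy_sequence = "ACGTACGT"  # hypothetical repeat-containing sequence
    for k in (2, 4, 5):
        print(k, unique_fraction(toy_sequence, k))
```

On a repetitive sequence the fraction rises as the read length grows, which is the shape of the trade-off the centers are weighing. A real analysis would also have to handle both strands, sequencing errors, and a genome far too large for this in-memory approach.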
To Schloss, a market exists for sequencers that can generate varied reads. "If one can deliver very, very cheap 25- to 100-base reads that are of sufficient quality, people will figure out ways to use those for lots of different things," Schloss said. "And if people can figure out how to deliver really cheap reads that are 100,000 bases long, we'll figure out interesting ways to use those data in ways that we perhaps haven't thought about today."
For example, he said, if scientists want to look at single-base variations in 1,000 individuals, short reads could work very well. "In some cases, you are going to essentially be looking at models of what you are looking for in a genome and you'll be able to extract information with very short reads without assembly information," Schloss said. "For other kinds of studies you need assembly information, and that may mean you'll need ways to localize reads relative to each other, and it may require longer reads."
Some researchers working on next-generation sequencers, such as those focused on nanopore strategies, are devoting their efforts to achieving single-read lengths of 10,000 or 100,000 bases. Many in the field question whether these machines will be able to achieve high per-base read quality, but if they could, assembly problems would almost disappear.
Scientists could also contemplate different types of experiments, such as using the instruments to analyze soil, air, and water samples. "If you think about the problem of trying to assemble genome sequences when you have 100 or 1,000 organisms present in different ratios ... if you have short reads [then] that's really going to be a problem, particularly if some members of that population are organisms whose genome sequence is not known," Schloss said.
He also pointed out that considering only individual human-genome sequencing is too limited a view of the potential use of emerging sequencing technologies. "There is a lot of focus on sequencing individual human genomes to understand disease susceptibilities and potential adverse drug reactions, but other uses are also recognized, whether it is understanding the biota, or looking for infectious diseases that are spreading in a hospital setting or a shopping mall, or for biowarfare detection," Schloss said. "So, there are all different kinds of reasons for collecting genomic information, on top of the ones that we are most focused on, which are for human disease prevention in a medical setting."
The needs of sequencing centers are not confined to the way people have defined large-scale human genome sequencing to date. Perhaps researchers will also figure out a way to collect not only base identity, but also how a base is modified.
"If one can achieve sequencing with nanopores, then maybe it will also be possible to distinguish between the nucleotide and its methylated form," Schloss said. Methylation has been implicated in some cancers, so this information could be used to diagnose and monitor treatment effectiveness.
"We are sort of looking under lamp posts, looking at the things that we know to look for," Schloss said. "If you could look at the whole genome, maybe you would find some surprises. You'd find things that we are not looking for now, because we don't know to look for them. Part of the idea of developing these technologies is [that] it allows you to go after questions without having a specific hypothesis."