Back in the early days of next-gen sequencing, when the Solexa brand was still around, some serious challenges faced the field, MassGenomics' Dan Koboldt writes. Bioinformatics algorithms for capillary-based sequencing didn't scale, reads were shorter and error-prone, tools were expensive and hard to access, and few people had any experience with NGS.
Most of those problems have now been solved, he says. But hold the confetti and the victory parade, as newer, harder challenges have cropped up in place of the old ones.
The cost of sequencing has fallen far faster than the Moore's Law curve, and more machines, and faster ones, mean more data. Sequencing studies are churning out so much data that it is a serious challenge to find a place to put all of it, Koboldt says. Most researchers have to choose among deleting data, spending more money to store it, and holding up the flow of data production and analysis.
Another problem is scaling up NGS studies to achieve statistical significance. If 10,000 samples is a good number for a common disease study, the cost of sequencing all of them, even at the low price of $1,000 per genome, comes to $10 million, which is still probably out of reach for most research groups. Investigators are being forced to get by with fewer samples, to combine some sequencing with follow-up genotyping, or to collaborate with other labs and consortia whose sample populations, phenotypes, or study designs may vary.
Finding samples can also be a problem. The widespread availability of exome and genome sequencing has made samples a new commodity, Koboldt says. If you are using NIH or other public funding, there is another layer of difficulty, as all the data must be deposited in public repositories. That requires informed consent for data sharing from the volunteer and approval from IRBs, meaning that many samples that show up for sequencing don't pass muster and are rejected.
Privacy is another big problem, as more detailed information about more disease risks, ancestry and other traits is available and can be used to identify people.
Although NGS has proven to be immensely powerful at discovering distinct variants, and 50 million of them are housed in NCBI's database of human sequencing variation, validating them functionally is not so easy, Koboldt says:
"Our inability to predict the phenotypic impact of genetic variants lurks beneath the veneer of genetic discoveries like a shark following a deep-sea trawler."
Lastly, there is the clinical dimension. "We all know that NGS is destined for the clinic," he writes. However, there are "many hurdles" to jump before a new technology can be used in patient care, such as CLIA/CAP certification and the high level of confidence that is required before clinical decisions can be made based on genomic findings.