By Vivien Marx
Armed with a $110,000 Small Business Innovation Research grant from the National Human Genome Research Institute, Geospiza plans to collaborate with the Mayo Clinic to explore how to develop software systems that will push second-generation sequencing closer to use as a cancer diagnostics tool.
"The plan is to test the feasibility of developing new approaches to detecting sequence variants at low levels," Geospiza CEO Todd Smith told BioInform.
While the near-term goal for the project is to deliver improved methods for detecting and visualizing sequence variation, this capability is eventually expected to enable non-invasive cancer detection methods, he said.
The project aims to improve methods for detecting rare mutations that determine how cancer cells grow and respond to treatment. As Geospiza states in the grant abstract, "achieving this goal will require that we have streamlined procedures for sample preparation and laboratory processes, a complete understanding of [next-generation sequencing systems], error profiles and assay dynamics, and robust validatable software systems to support diagnostic tests in the clinical enterprise."
Deep sequencing provides "more sensitive ways" of detecting the germline and somatic mutations that signal tumor development and can also be indicators of tumor growth or resistance, according to the abstract. "Making this possibility a reality requires that we can detect a cancerous cell by genetic changes in its genome in a population of 1,000 normal cells," Smith said.
The Mayo team will be providing support and feedback "to ensure that we are aligned with the goal of moving toward diagnostics and the clinic," Smith said.
Geospiza will also work with some of the Mayo Clinic's datasets, he said, but is currently focused on analyzing data from the National Center for Biotechnology Information's short-read archive, which Smith described as an "amazingly rich source."
Geospiza is currently working on ways to efficiently harvest data from the database, which, as he said, is "affectionately" referred to in the field as a "write-only database."
The resulting web-based software will be "platform neutral" as the company seeks to support "all of the broadly available" second-generation sequencing platforms such as Illumina's Genome Analyzer and Life Technologies/ABI's SOLiD.
Geospiza notes in the grant abstract that its existing software already "addresses a large number of issues related to operating [next-generation sequencing] instruments and laboratory processes in clinical environments," but adds that "our understanding of NGS errors and how to completely characterize NGS datasets, with respect to their potential to deliver high quality information, is incomplete."
As a result, the first phase of the project will test the feasibility of developing clinical systems by analyzing "a limited number" of next-generation sequencing datasets for true variants, false positives, and false negatives "by cataloging discrepant bases relative to control sequences, with respect to sequence contexts, random noise, laboratory steps, and instrument artifacts."
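To make the idea of "cataloging discrepant bases relative to control sequences" concrete, the following is a minimal Python sketch, not Geospiza's implementation. It assumes ungapped reads already aligned to a known control sequence and counts each mismatch by its surrounding sequence context. The function name `catalog_discrepancies`, the `(start, read)` input format, and the toy data are all hypothetical. A real catalog would also have to account for laboratory steps and instrument artifacts, which this sketch ignores.

```python
from collections import Counter


def catalog_discrepancies(control, aligned_reads, flank=1):
    """Count mismatches against a control sequence.

    aligned_reads is a list of (start, read) pairs, where start is the
    0-based position of the read's first base on the control sequence.
    Keys are (sequence context, control base, observed base).
    Assumes ungapped alignments (an illustrative simplification).
    """
    catalog = Counter()
    for start, read in aligned_reads:
        for offset, observed in enumerate(read):
            pos = start + offset
            if pos >= len(control):
                break  # read runs past the end of the control sequence
            expected = control[pos]
            if observed != expected:
                # Context is the control base plus `flank` bases on each side.
                context = control[max(0, pos - flank): pos + flank + 1]
                catalog[(context, expected, observed)] += 1
    return catalog


# Toy data, invented for illustration only.
control = "ACGTACGTAC"
reads = [(0, "ACGTACGT"), (2, "GTATGTAC"), (4, "ACTTAC")]
catalog = catalog_discrepancies(control, reads)

for (context, expected, observed), count in sorted(catalog.items()):
    print(f"context={context} {expected}>{observed} count={count}")
```

Keying the counts by context is one way to separate systematic, context-dependent errors from random noise, since the former would accumulate under a few keys while the latter would spread across many.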
These catalogs will then be used to develop statistical algorithms that can operate on "large numbers" of aligned reads, Geospiza said. The plan is to "assign variant detection probabilities to individual bases, as well as calculate summary statistics that can be used to assign descriptive values to datasets from individual samples, and subsequently identify sample artifacts and issues related to sample processing."
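The article does not describe the statistical algorithms themselves. As one illustrative possibility only, a simple binomial noise model can express how likely it is that a given number of discrepant reads at a position arose from sequencing error alone. The sketch below assumes independent errors at a fixed per-base error rate, and every name and number in it is hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i successes (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def noise_tail_probability(k, n, error_rate):
    """Probability of seeing k or more discrepant reads out of n
    if every discrepancy were an independent sequencing error."""
    if k <= 0:
        return 1.0
    below = sum(exp(log_binom_pmf(i, n, error_rate)) for i in range(k))
    return max(0.0, 1.0 - below)


# Illustrative numbers only: 10,000 reads covering a position and an
# assumed per-base error rate of 0.1 percent (about 10 errors expected).
depth = 10_000
assumed_error_rate = 0.001
for variant_reads in (10, 20, 30):
    p = noise_tail_probability(variant_reads, depth, assumed_error_rate)
    print(f"{variant_reads} discrepant reads: P(noise alone) = {p:.3g}")
```

Under this model, a small tail probability suggests that a discrepancy is unlikely to be noise. The sketch also shows why the error rate matters so much for rare variants: when the expected number of error reads at a position is comparable to the number of true variant reads, the two cannot be told apart by counts alone.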
One goal of this part of the project is to characterize datasets "with respect to their ability to deliver high-quality information," Smith said.
Another goal is to further extend data quality models by applying base probabilities across a set of reads to see if the team can improve "the sensitivity of detecting variation without sacrificing specificity," he said.
"We will use data from control experiments to test our ideas and then expand the work into other datasets such as those being contributed by the 1,000 Genomes Project," Smith said.
GeneSifter, BioConductor to Benefit
Geospiza plans to use the knowledge gained from this project to update its existing GeneSifter product line. The hope is that this will give researchers better ways to work with second-generation sequencing data and more clear-cut methods for visualizing genetic assay results presented in web-based interfaces, Smith said.
Smith and his team expect that "integrating visualization with algorithm development will create an iterative test, view, design cycle that will accelerate this kind of development."
Laura Lucas, Geospiza's vice president of marketing, explained in an e-mail that after acquiring GeneSifter from VizX Labs last fall [BioInform 11/21/08], the firm decided to rename its FinchLab platform. The firm now has two product lines: GeneSifter Lab Edition, formerly FinchLab, and GeneSifter Analysis Edition, the acquired product.
Smith said that the similarity of the technological underpinnings of the two products helped with the integration.
GeneSifter AE "rounded out" the firm's analysis capabilities and was part of a long-term plan to integrate lab-management software with data analysis software, "because our belief is that it's one thing" and it allows better ways to discern true results from artifacts, Smith said.
Geospiza plans to make some of the algorithms developed through this project available through the open source BioConductor project. GeneSifter's statistical analysis functionality is based on the R statistical software package that underlies BioConductor, Smith said.
The problem with proprietary software is that "it is really hard to compete with the community" of computer scientists working in genome centers and elsewhere who are "connected to the data" and solving "frontline problems," Smith said.
"Instead, by participating in the community, we contribute to, and benefit from, ongoing work to improve systems rather than reinvent technology," he said.
Smith said he expects some of the firm's customers will want early access to technology being developed in this project, so the plan is to "recruit additional collaborators" as the venture moves into its second phase.