Life Technologies subsidiary Ion Torrent said this week that it will sponsor the first year of a project to develop a suite of computational tools that could make incorporating patients' genomic information into clinical decisions as easy as using Apple's Siri on the iPhone.
Ion Torrent is providing an undisclosed amount of money for the first year of the project, which is led by Carnegie Mellon University.
Funds for subsequent years are expected to come from a number of other sources, including government agencies and private foundations, Robert Murphy, director of the Lane Center for Computational Biology in Carnegie Mellon's School of Computer Science and head of the project, told BioInform.
Besides money, Ion Torrent will provide CMU with access to tools for variant calling and other secondary analysis algorithms that work with its data structure, Alan Williams, Ion Torrent's vice president of software and informatics, told BioInform. However, the company plans to leave the "interpretive algorithm development" portion in CMU's hands.
Williams said the company decided to fund the CMU study because of its interest "in helping to push ... the use of computers and machine learning ... to interpret and sift through the vast amount of genomic information."
The sequencing instrument vendor also plans to launch a cloud-based variant analysis tool, dubbed Ion Reporter, in the first half of this year. (See related story, this issue.)
The ultimate dream is to develop what Ion Torrent Founder and CEO Jonathan Rothberg dubbed "doctor in a box" software, which would use machine-learning methods to analyze patients' sequence data, diagnose disease, identify disease susceptibility, and predict effective therapies or treatments that would cause the fewest side effects.
Rothberg, a CMU alumnus, added in a statement that "the promise of 'doctor-in-a-box' is that by using artificial intelligence, like we've seen with IBM's 'Watson' computer, we will be able to associate the variations in the human genome with the vast amount of information we have about human health."
En route to that goal, Murphy and his colleagues plan to create a "framework that will allow us to tackle this problem one piece at a time and to do so at a scale that makes sense when all of those pieces are put together."
The team will make the initial fruits of its labor open source so that other groups can contribute their code to it, he said.
Murphy noted that open source software would also be more likely to be adopted in routine laboratories where investigators may not be able to afford expensive commercial offerings.
'A Never-Ending Learner'
During the first year of the study, the researchers will focus on identifying the genomic features associated with a single disease or patient population.
Specifically, CMU collaborators at Baylor College of Medicine's Human Genome Sequencing Center and Yale University's Center for Genome Analysis will sequence the whole genomes of the patients selected for the study. They will also provide de-identified medical records containing information such as disease treatments and outcomes and the results of clinical tests.
This information will be analyzed by CMU researchers, who will use machine-learning programs to look for relationships between the genomic data and the clinical outcomes for each of the anonymous patients. Additionally, they will incorporate information from the biomedical literature about gene and protein expression and disease pathways.
Ultimately, their analysis is expected to provide models based on these personal sequences that can be used to determine patients' disease risk, predict treatment responsiveness, and select preventive therapies.
"One of the important concepts here is to design the system so that it’s a never-ending learner," Murphy explained. That means designing a system that is "continually retraining itself" as new genomic input is entered.
This way, when users download the latest version of the software, they will also be downloading the latest model that has been built, he said.
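As a rough illustration of what "continually retraining itself" could look like, the sketch below updates a toy model one record at a time and bumps a version counter on each update. It is not the CMU team's method: the logistic-regression model, the class and field names, and the made-up records are all assumptions chosen only to show the idea of incremental learning.

```python
import math


class OnlineLogisticModel:
    """Toy logistic-regression model updated one record at a time (hypothetical)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.version = 0  # incremented on every update

    def predict_proba(self, features):
        """Return the predicted probability of a positive outcome."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, outcome):
        """Fold one new (features, outcome) record into the model via one SGD step."""
        error = self.predict_proba(features) - outcome
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        self.version += 1


# Made-up records: (variant-presence flags, 1 = responded to treatment).
records = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]

model = OnlineLogisticModel(n_features=2)
for _ in range(200):  # records keep arriving; the model never "finishes" training
    for features, outcome in records:
        model.update(features, outcome)
```

In this sketch, a user who fetches the model at any point gets whatever `model.version` is current, which mirrors the idea that downloading the latest software also delivers the latest trained model.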
While the researchers haven’t yet selected an initial disease focus or patient cohort, Murphy expects that those details will be ironed out over the next several weeks. The group also plans to explore whether providing open source software that makes predictions about patients' health would raise regulatory concerns.
He also said that the investigators plan to release the first version of the software within a year.
The software will be trained initially to analyze sequence data produced by Ion Torrent's Personal Genome Machine. Later it will also handle data from Ion Torrent's upcoming Proton sequencer, which is slated for launch by the middle of the year.
Williams told BioInform that the two sequencers target very different markets.
PGM is a mid-throughput system that is tailored to meet the needs of researchers who are "interested in drilling into a particular set of genes" and would like to do deep sequencing and rare variant detection within that particular gene set, he said.
The Proton sequencer, on the other hand, is geared toward researchers interested in sequencing whole exomes and genomes, he said. The company is promising that by the end of the year, the Proton system will be able to sequence a whole human genome in less than a day for under $1,000.
Although the software developed under the CMU project will initially handle data generated exclusively on the Ion Torrent platform, it could be extended to handle input from other sequencing technologies as well, as those instruments continue their push into the clinical sequencing space, CMU's Murphy said.
"The key is to have software that can provide reasonable re-coverage so that you can do an entire personal genome," he explained. "That’s pretty difficult to do by other methods right now, but of course that could change."
Currently, Ion Torrent isn't planning to commercialize the algorithms produced by the CMU study, Williams said.