Golden Helix said this week that it has tapped six academic partners to help it improve its new copy number-analysis software.
The company said it has signed six “inaugural” members for the effort so far: the University of California, Los Angeles; the Zucker Hillside Hospital; Montreal Heart Institute; Emory University; Case Western Reserve University; and University at Lübeck in Germany. It is in “negotiation and discussion” with 20 additional institutions, according to CEO Christophe Lambert.
Golden Helix formed the network to provide user feedback on CNAM, its Copy Number Analysis Module software. The company initially released CNAM in November 2007 as an application for its SNP & Variation Suite.
Asked whether the company is using the group to test-run its software, Lambert said no.
“Our software is addressing some of the particular challenges that we may not have accounted for [as we] help the customers with really large data sets,” Lambert said. “The goal was really to help them be successful and troubleshoot any issues; it’s not so we can beta test.”
He said that the company hopes to help people unfamiliar with copy number studies, adding that for some participants whole genome copy number association “wasn’t something they had done before but they needed to understand what are the steps I need to do to get from my data to get to some answer.”
“[A] lot of people have been very excited about the possibility of whole genome copy-number association but didn’t know exactly how to do it or ran into technical challenges,” he said. “Like the size of the data sets, batch effects, population stratification, signal extraction: finding the actual regions of copy number variation, for example.”
Lambert said the company’s algorithm can simultaneously find copy-number regions and look at “thousands of samples” to find regions that are shared across multiple samples. For the purposes of the collaborative project, the shop will analyze data from the Illumina and Affymetrix genotyping platforms.
The effort sprang from a webcast Golden Helix hosted last month to highlight the CNAM software. The company said that around 200 people attended the webcast and around 20 institutions submitted proposals to participate in the collaboration.
Berit Kerner, assistant research geneticist in the Department of Psychiatry and Biobehavioral Sciences at UCLA’s Center for Neurobehavioral Genetics, told BioInform that the university joined the project to help it study the link between copy-number variations and various psychiatric disorders.
“[I]n psychiatric or neurological disorders — and probably, in particular, bipolar disorder — there is some evidence that copy-number variations probably have some applications,” Kerner told BioInform.
She said that SNP-based studies are limited, which is why the group is looking at copy number-variation studies.
“Every investigator is looking for new ways to find cause-and-effect relationships between genetic information and disease,” said Kerner. “SNPs were one angle; copy number association is yet another sort of way to look at the data that might find additional cause-effect” relationships.
“I think for many people [for whom] we provided both our software and our support, there hasn’t been a way to handle these really large studies,” said Lambert. “A lot of these collaborations [study a] minimum [of] thousands of samples. … And there hasn’t been anyone who can handle this until us.”
“We are helping them catch the first fish together, and then [we say] here’s the fishing rod, and the instructions to use it for all the other ones to come.”
While a number of other software vendors, including Agilent, BioDiscovery, and SAS subsidiary JMP, sell software for copy-number analysis, Golden Helix claims that CNAM is the first software package to analyze copy-number variation for whole-genome-association studies.
CNAM is based on a proprietary optimal segmenting algorithm that the company said outperforms competing methods such as circular binary segmentation and hidden Markov models in terms of computational speed, false discovery rates, and sensitivity.
On its website, Golden Helix cites two studies, one by Weil Lai and colleagues at Harvard and another by Hanni Willenbrock and colleagues at the Technical University of Denmark, that found that HMMs were fast but “performed poorly” and had high false discovery rates and low sensitivity. CBS, on the other hand, is much more effective but “not computationally efficient for whole genome analysis,” the company said.
Golden Helix said that it has adapted the same optimal segmenting algorithm it uses in its HelixTree genetic-association analysis software to meet the demands of CNV analysis. The algorithm, based on the work of Douglas Hawkins at the University of Minnesota’s School of Statistics, uses dynamic programming to search through “all possible change-points in data to find the optimal segmentation without succumbing to the inherent combinatorial explosion.”
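As a rough illustration of the general dynamic-programming idea described above, and not of Golden Helix's proprietary implementation, the Python sketch below finds the split of a signal into a fixed number of contiguous segments that minimizes the total squared error around each segment's mean. The function name `optimal_segmentation`, the fixed segment count `k`, and the squared-error cost are assumptions made for this example; the article does not say how CNAM scores segments or chooses their number.

```python
from typing import List, Sequence, Tuple


def optimal_segmentation(values: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split `values` into exactly k contiguous segments with minimum total
    within-segment squared error. Returns (total_cost, segment_start_indices).

    Hypothetical helper for illustration only; not CNAM's actual algorithm.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of x and x^2 give any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a: int, b: int) -> float:
        """Squared error of values[a:b] around its own mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[j][i]: minimum cost of splitting values[:i] into j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[j][i]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):
                cost = best[j - 1][t] + sse(t, i)
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = t

    # Follow the back-pointers from the end to recover segment starts.
    starts: List[int] = []
    i = n
    for j in range(k, 0, -1):
        i = back[j][i]
        starts.append(i)
    starts.reverse()
    return best[k][n], starts


if __name__ == "__main__":
    # Toy signal: baseline, a raised region, then baseline again.
    signal = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.0, 0.1]
    cost, starts = optimal_segmentation(signal, 3)
    print(starts, round(cost, 3))
```

The table `best` is what avoids enumerating every combination of change-points: each entry reuses the best answers for shorter prefixes, so the search covers all possible segmentations in roughly k·n² steps. That is still too slow for hundreds of thousands of markers per sample as written, so a production tool would need further optimizations that this sketch does not attempt.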
In validation tests using the same simulated data used in the Willenbrock study, CNAM provided comparable results to the CBS approach, but in much less time, the company said: around five minutes per sample for Affymetrix 500K data and six minutes for Illumina 550K, as opposed to 20 minutes and 45 minutes per sample, respectively, using CBS.
The partnership program will move the software beyond simulated data and “help us refine our methods and software and ensure they are used optimally for real-world research, while furthering the studies of our collaborators and advancing the copy number analysis field,” Lambert said in a statement.
He said the timeline for the collaborative project depends on the schedules of the research groups involved.
“Each study is starting at different times, often based on availability of data – so we are staggering the project over the course of months,” Lambert said. He said that CNAM has already identified “statistically significant associations that were very interesting regions to the investigator, where they fell within known genes.”
In a statement, Lambert said the program will “combine expertise in informatics and statistics with our collaborators’ in-depth knowledge about the biology of their diseases under study.”
“By the end of these collaborations, what we … expect is investigators will be able to do this [type of copy number evaluation] on additional studies,” Lambert said. “Another [part of the process then] is a set of steps so these investigators can do additional studies on their own.”