By Aaron J. Sender
Willy Valdivia Granda’s latest project is bound to tick some people off. “A lot of patents might be invalidated,” he says. The algorithm he is pumping across patent databases, he says, will unmask hundreds, if not thousands, of filed gene sequences for what they really are: pseudogenes. Once the results are published, within several months, says the North Dakota State University grad student, “many companies are going to get nervous.”
He didn’t set out to pester IP attorneys to the point of paranoia — although he does harbor a soft spot for the idea of unleashing basic scientific research. Valdivia Granda and his colleagues in the plant pathology and computer science departments developed the algorithm as part of a microarray analysis tool, called Bison Array — a tribute to the school’s mascot.
At first blush, pseudogenes look much like genes. The important difference, though, is they don’t code for functional proteins. To microarray users, they amount to little more than noise. By identifying them, Valdivia Granda says, “we can minimize the effect of the noise they produce in the data analyses.”
The algorithm recognizes a complex nucleotide signature that differentiates pseudogenes from genes. The telltale signature pairs a string of five adenines with 12 different codon patterns that occur in a particular arrangement and frequency.
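As a rough illustration of what such a signature scan involves, the Python sketch below checks a sequence for a run of five adenines and tallies how often a set of codons appears. It is not the Bison Array algorithm: the article does not name the 12 codon patterns or the frequencies involved, so the three codons in `CODON_PATTERNS` and the function `signature_features` are hypothetical placeholders.

```python
import re
from collections import Counter

# A run of five consecutive adenines, as described in the article.
POLY_A = re.compile(r"A{5}")

# Hypothetical placeholders: the article does not list the 12 codon patterns.
CODON_PATTERNS = ("TAA", "TAG", "TGA")


def signature_features(seq, patterns=CODON_PATTERNS):
    """Return simple signature features for a nucleotide sequence.

    Codons are read in frame 0 only; a real classifier would likely
    consider other frames and combine the features with a trained model.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in patterns)
    total = len(codons) or 1  # avoid dividing by zero on very short input
    return {
        "has_poly_a": POLY_A.search(seq) is not None,
        "codon_freq": {p: counts[p] / total for p in patterns},
    }
```

Features like these would only be inputs to a classifier. Deciding which combination and frequency actually marks a pseudogene is the part the researchers have not yet published.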
The project began about a year and a half ago as a quest for microarray analysis software to investigate hypoxic stress in plants. “More than 95 percent of both the commercial and public microarray data analysis software used clustering, and they represented the gene expression in red and green,” Valdivia Granda says. With a strong mathematical background from his work in nuclear physics in his native Peru, he wanted something more statistically rigorous and more flexible.
Without Bison Array, Valdivia Granda says, researchers who run the same software on the same data over and over can get different results, because much microarray data is noise. So a correlation among genes, and its apparent connection to a particular disease, may sometimes be a statistical accident, a pattern carved out of chaos. In Bison Array, due out this month, all analyses are supervised.
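The statistical-accident problem is easy to reproduce. The Python sketch below, which is an illustration and not part of Bison Array, generates expression values that are pure random noise and counts the gene pairs whose profiles nonetheless correlate strongly. The gene count, array count, and 0.9 threshold are arbitrary assumptions chosen for the demonstration.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spurious_pairs(n_genes=100, n_arrays=6, threshold=0.9, seed=0):
    """Count gene pairs in pure noise with |r| at or above the threshold."""
    rng = random.Random(seed)
    # Each "gene" is just n_arrays independent Gaussian draws: no real signal.
    genes = [[rng.gauss(0, 1) for _ in range(n_arrays)] for _ in range(n_genes)]
    return sum(
        1 for g, h in combinations(genes, 2) if abs(pearson(g, h)) >= threshold
    )
```

With only a handful of arrays per gene and thousands of pairs to compare, some pairs clear the threshold by chance alone, which is why a cluster found without further checks can be misleading.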
“Biologists participate more in the analyses,” says Valdivia Granda. A parallelized version of Blast called Bison Blast scours databases and collects information related to a spot’s sequence. The researcher can then decide what information to include in the analyses.
“We have about 20 different parameters that the algorithms use to classify microarray data,” says Valdivia Granda, including the subcellular location of the gene and the protein family of its product. Along with a gene’s expression level, Bison Array also shows the gene’s position in the chromosome and its relative expression compared with the rest of the chromosome.
Unlikely as it may seem, it’s no accident that the Peruvian landed in Fargo, ND. Friends and professors at prestigious universities tell him, “‘Just come to my lab,’” Valdivia Granda says. “And I say, ‘No, in your lab I will be just another student.’” In North Dakota, he’s a big fish in a little pond. At 29, he heads his own group and has founded the Virtual Conference on Genomics and Bioinformatics, which this year attracted about 2,000 participants from nearly 50 countries.
With the Bison software package he is no less ambitious. “Since we got kind of late into the game, we had been very careful in designing our tools so when we release them, they become something like a new standard, where everybody will use them. And whoever wants to develop something else has to take into consideration our tools and improve on them.”