French plant biotechnology firm Biogemma has made recent investments in IT infrastructure in a bid to better manage and analyze large quantities of genomic data associated with agricultural research.
Like the rest of the genomics field, agricultural genomics struggles with data-related bottlenecks that are further complicated by the size and complexity of plant genomes.
In an interview with BioInform, Oliver Dugas, upstream genomics coordinator and bioinformatics manager at Biogemma, echoed the frustration that’s voiced quite often by animal and human genomics researchers: while there are plenty of systems available to churn out huge quantities of data, there aren’t sufficient analysis platforms to make sense of it.
However, he said, the challenge is in some ways more "strategic" than it is "technological," because rather than a shortage of software, the field lacks analysis workflows that provide researchers with a "relevant and precise view of plants' genomic content."
To handle its computational and data-mapping needs, Biogemma late last year tapped GenomeQuest and SGI to install a genome analysis platform at its headquarters in Clermont-Ferrand, France. The platform implements GenomeQuest's RNA-seq analysis, multi-genome analysis, de novo assembly, and polymorphism discovery workflows on SGI's Altix UV 1000 supercomputer (BI 12/24/2010). It also includes a genome read mapping tool called GASSST, which was developed by the French National Institute for Research in Computer Science and Control, or INRIA, and licensed by GenomeQuest in January (BI 01/21/2011).
More recently, after using Tibco Spotfire's DecisionSite platform for six years, Biogemma purchased version 3.1 of the Spotfire platform last month to "provide a broader view" of its data.
It's easy and relatively inexpensive for ag-bio companies to resequence 10 maize genomes, for example, but "the bottleneck is the analysis," Dugas said — "to be able to manage the data ... do the analysis ... and from the analysis set up an analysis strategy."
A second challenge for the firm is that among the species it studies — maize, wheat, sunflower, and oilseed rape — only maize has a complete reference genome. Dugas noted that since the reference was published about two years ago, it has "completely changed the way" Biogemma has studied the crop, allowing it, for example, to identify genes much more quickly.
However, while reference genomes are important, Dugas cautioned against placing too much faith in them since there are often genetic forces at work that can significantly alter plant behavior.
For example, Dugas said some recent publications have shown that certain disease-resistance genes found in genotypes of maize grown in China are absent in the genotypes of maize grown in Europe and the US.
Biogemma is using the Spotfire platform to process genetic maps and to analyze the expression of tens of thousands of genes at a time.
"After the computing and the mapping, we still have [a lot of] data," Dugas said. "Writing statistical tests to do the analyses ... takes too much time and doesn’t give us a graphical view, which is very important."
He continued, "When we [write] a script ... we ask a precise question ... and we [get an] answer, but we can't see if there are any other [phenomena]."
Biogemma will use the platform to centralize, integrate, and format data from outside projects, collaborators, and public resources.
Before purchasing the new platform, Biogemma used DecisionSite to perform gene expression analysis.
"The new version is more flexible and allows us to integrate more data and to be able to have this broad view of our data," Dugas said.
The new platform includes DecisionSite as well as a newly developed web player, which Ben McGraw, director of life sciences industry solutions at Spotfire, said lets users visualize data stored on the Spotfire server. Users can visualize data as tree maps, cross tables, and network graphs, for example.
Furthermore, version 3.1 incorporates tools that let users link and import data from both public and private sources and display them on a single screen.
The platform also provides several statistical modules, including the S-Plus statistical modeling engine and R statistical scripts, which let users manipulate data positioned on genetic maps.
A Research Focus
Biogemma was formed to merge the research activities of French ag-bio firms Limagrain, RAGT, and Euralis.
The firm's research and development efforts are focused on yield improvement, biotic and abiotic stress resistance, and specialty grain compounds.
Biogemma provides genetic information to its ag-bio partners, which in turn use the data to create genetically modified plant products.
Its research entails identifying genes that help plants grow under the constraints of different growing locations: climate constraints such as cold and drought; biological constraints such as the presence of insects, fungi, and parasites; and environmental constraints such as reduced fertilizer and water.
More precisely, Dugas said, Biogemma performs transgenic research for maize and wheat to provide its clients with disease-resistance genes, for example. For rapeseed and canola, the company identifies genetic markers that are linked to traits of interest.
Biogemma does some sequencing on its in-house Roche 454 sequencer and outsources the rest to an Illumina service provider that Dugas declined to name. Biogemma generates close to 1 terabyte of data every two months.
In addition to using commercial software packages, Biogemma has developed some software of its own, including a tool that uses expressed sequence tags to reconstruct the complete structure of genes that haven't been sequenced, as well as tools for biomarker discovery.