French bioinformatics firm Genostar has expanded an ongoing alliance with biopharmaceutical firm Merial by updating its software package Iogma to include its Metabolic Pathway Builder version 3.4.
Merial, a joint venture between pharmaceutical firms Sanofi-Aventis and Merck, is using the software to search for vaccines and treatments for bacterial infections in livestock and companion animals.
The modular Iogma software was developed to help researchers mine, analyze, and visualize prokaryotic genomic, proteomic, and metabolic information, according to Genostar, which was formed out of a public-private consortium of the same name.
The software, which lets scientists compare diverse data types such as genes, enzymes, and metabolic pathways, is the brainchild of François Rechenmann, senior researcher at Genostar partner Institut National de Recherche en Informatique et Automatique, or INRIA, the French National Institute for Research in Computer Science and Control.
The new software package updates three modules — GenoAnnot, ProteoAnnot, and PathwayExplorer — from the software’s total collection of seven modules that together help analyze high-throughput genomics, proteomics, and metabolomics data.
Iogma also comprises a database, microB, which contains information about more than 450 microbial organisms and is updated regularly.
According to Edmond Jolivet, Merial’s associate manager and co-director of genetic engineering and bacteriology, his department chose Iogma to help the drug maker better address classic, emerging, and re-emerging infectious diseases in animals.
“It is a nice way for us to make many comparisons between different bacterial genomes,” he said. The software also helps in large screening projects that require researchers to classify genes and proteins, and then to further investigate their potential for diagnostics, treatment, or vaccines, he added.
“We need to know exactly what the targets on the genome are,” he said, referring to projects in which he and his colleagues explore the causes of a given disease, the pathways involved, and the factors that determine virulence in pathogens. “If you want to make vaccines using antigens that can protect animals, [the] bioinformatics approach [is] faster than the in vivo approach,” he said.
The quest for virulence factors in pathogens is the focus of much research, said Timothy Read, microbial genomics and bioinformatics researcher at the Henry M. Jackson Foundation for the Advancement of Military Medicine, based in Rockville, Md. Finding those factors is “a combination of bioinformatics and wet-lab work,” he said.
‘Very Fast Approach’
“We can compare a virulent strain [genome] to a non-virulent strain, [and] what we really want to know is the specificity of what is putatively involved in virulence of the strain,” Jolivet said. He would not disclose what organisms or genes he studies.
Jolivet and his colleagues already use Iogma for genome annotation, gene prediction, and protein exploration, and he anticipates incorporating pathway analysis into his work with the new software.
He said he believes that data integration contributes to understanding disease causes and virulence determinants, and that Iogma “will help us make decisions. It is a very fast approach.”
The ability to integrate heterogeneous data has been an important part of Iogma’s development, said Genostar’s director of development Jean-François Mouret.
“We can import a lot of different types of data, including DNA, proteins, and metabolic compounds [such as] enzymes,” he said. In different market segments researchers can query and explore these data sets for various types of answers.
Buoyed by second-generation sequencing capabilities, many scientists in industry and academia are trying to use genomic data to better understand, detect, and beat bacterial virulence with new compounds or vaccines, said Mouret.
Another market for this software environment is not about vanquishing bacteria but rather harnessing them more efficiently in biofactories, “helping companies to produce or increase the yield of a compound of interest,” he said. “The way we can provide our customers with these kinds of topics is by helping them understand the regulation process of the bacteria,” said Mouret.
Genostar employs in-house programmers, developers, and bioinformaticists, said Mickey Farrance, Genostar’s communications director, who also highlighted the way Iogma’s single interface lets scientists study data from heterogeneous sources.
“Somebody who is accustomed to looking at KEGG data and accustomed to looking at Swiss-Prot data over the Internet through the browsers that these tools provide, they’re forced to change interfaces every time they change source,” said Farrance.
She said Genostar’s customers beyond Merial include Sanofi Toronto, bioMérieux, and a number of smaller companies whose names she did not disclose.
Looking Under the Hood
Genostar grew out of a public-private consortium comprising INRIA, the Pasteur Institute, Hybrigenics, and Genome Express. The Genostar venture was privatized in 2004.
“There is a law in France [that] allows a researcher to spend a quarter of his time to help a start-up company, which distributes the results of his research or collective research,” said Rechenmann, who initiated development of Iogma. Rechenmann was a member of the consortium as an INRIA scientist and remains Genostar’s scientific advisor.
Public research centers closely monitor this type of activity, he said, but he is permitted to wear the “two hats” of researcher and consultant. He describes his work on Iogma these days as “chef d’orchestre,” or orchestra conductor, which involves keeping the software current with scientific advances and interpreting client needs.
Despite the French origins of the project and the fact that geographic proximity facilitates collaboration, Rechenmann said the project is nation-agnostic, or, as he put it, there is “no préférence nationale,” meaning that collaborations and partnerships with researchers in any country, and not just in industry, are welcome.
The R&D team at INRIA working on Iogma is made up of 10 programmers and developers, some of whom are part-time employees. “It’s a huge piece of software,” Rechenmann said. A total of 30 to 40 people have contributed to the software, adding up to 60 to 80 man-years, he estimated. One of its strengths is that a scientist “doesn’t need to skip from one Web site to another or try to … glue some methods together in order to get the right pipeline of analysis,” he said.
Rechenmann headed the group that originally developed Iogma and then consulted with Genostar as software modules were integrated into a single framework.
“It was quite easy to do,” he said. “These software [modules] were developed separately but with the same philosophy.” Some modules, such as Pathway Explorer and Genetic Network Analyzer, were developed by his team at INRIA alone, while others, such as GenoAnnot and the overall architecture, were developed by Genostar consortium members.
The software is programmed in Java with a layer on top “for representing the biological entities and their relationships,” said Rechenmann. “For that we use AROM, which is an object-oriented model.” AROM, Allier Relations et Objets pour Modéliser, or Associate Relationships and Objects for Modeling, is a method of knowledge representation similar to Unified Modeling Language, or UML.
In Iogma, object-oriented programming is used to represent the data as objects. According to Rechenmann, there are “two different levels of abstraction: A lower level in which you speak of objects, that is data structures, and a higher level, which is the modeling of the biological entities and their relationships.”
Rechenmann developed AROM while working in object-oriented knowledge and data modeling before he became involved in bioinformatics. AROM “is a very powerful object-oriented modeling language and at that time, no equivalent was available so we decided to develop it and then later use it for developing with Genostar,” he said.
For example, he explained, when looking at genes located in a sequence, ‘located in’ is the connection between the object ‘gene’ and the object ‘sequence.’ “Everything in Iogma — all the data [that] we manage, manipulate, and analyze — are objects connected together through relationships,” he said.
The model, he explained, lends itself to scalability. “It is quite easy to add new classes of objects, new classes of relationships so that we can deal with new types of data, for example metabolic reactions [or] metabolic pathways,” he said. Genes, enzymes in a pathway, or metabolic reactions can all be linked and queried via their relationships. In Pathway Explorer, for example, there is a “set of viewers, of analysis methods, which work on the entities and relationships,” Rechenmann said.
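The idea of objects linked through named relationships can be illustrated with a minimal sketch. This is not AROM or Iogma code (Iogma itself is written in Java, and the article does not show its internals); the class names, relationship labels, and example entities below are all hypothetical and chosen only to mirror the 'gene located in sequence' example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A biological entity such as a gene, a sequence, or a reaction."""
    kind: str  # e.g. "gene", "sequence", "enzyme", "reaction"
    name: str


@dataclass
class KnowledgeBase:
    """Stores (subject, relationship, object) triples and answers simple queries."""
    triples: set = field(default_factory=set)

    def relate(self, subject: Entity, relationship: str, obj: Entity) -> None:
        """Record that `subject` is linked to `obj` through `relationship`."""
        self.triples.add((subject, relationship, obj))

    def objects_of(self, subject: Entity, relationship: str) -> set:
        """Everything that `subject` points to through `relationship`."""
        return {o for s, r, o in self.triples if s == subject and r == relationship}

    def subjects_of(self, relationship: str, obj: Entity) -> set:
        """Everything that points to `obj` through `relationship`."""
        return {s for s, r, o in self.triples if r == relationship and o == obj}


# Hypothetical example data; the names are illustrative only.
kb = KnowledgeBase()
gene = Entity("gene", "geneA")
chromosome = Entity("sequence", "chromosome1")
enzyme = Entity("enzyme", "enzymeA")
reaction = Entity("reaction", "reaction1")

kb.relate(gene, "located_in", chromosome)
kb.relate(gene, "encodes", enzyme)
kb.relate(enzyme, "catalyzes", reaction)

# Which genes are located in this sequence?
genes_on_chromosome = kb.subjects_of("located_in", chromosome)

# Follow the chain gene -> enzyme -> reaction.
reactions_for_gene = {
    rxn
    for enz in kb.objects_of(gene, "encodes")
    for rxn in kb.objects_of(enz, "catalyzes")
}
```

In this sketch, a new kind of entity or relationship is just a new label, so supporting metabolic reactions or pathways requires no change to the storage or query code. That is one plausible reading of why such a model is easy to extend.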
A Kind of Markov
One gene-prediction tool within the software has been developed specifically for prokaryotic genomes. The method, called Prokov and found in the software’s GenoAnnot module, allows researchers to look for coding sequences.
Based on Markov models, Prokov detects sequence variations indicative of genes. “When you run the Prokov method on a bacterial genome, it detects the four thousand or so coding regions,” Rechenmann said.
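The article does not describe Prokov's internals, but the general principle of Markov-model gene detection can be sketched as follows: train one model on known coding sequence and another on background sequence, then score a candidate region by the log-odds of the two. The function names, the first-order model, and the toy training strings below are assumptions made for illustration; production gene finders typically use higher-order, codon-aware models trained on far more data.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next base | current base)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, row in counts.items():
        total = sum(row.values())
        model[cur] = {nxt: c / total for nxt, c in row.items()}
    return model


def log_odds(seq, coding, background):
    """Sum of log(P_coding / P_background) over each transition in `seq`.

    A positive score means the coding model explains the sequence better.
    """
    return sum(
        math.log(coding[cur][nxt] / background[cur][nxt])
        for cur, nxt in zip(seq, seq[1:])
    )


# Toy training data (hypothetical): GC-rich "coding" versus AT-rich "background".
coding_model = train_markov(["ATGGCGGCGGCGTAA", "ATGGCCGCGGCCTGA"])
background_model = train_markov(["ATATATTTAATATTA", "TTTAAATATATAAAT"])

gc_rich_score = log_odds("GCGGCGGCC", coding_model, background_model)
at_rich_score = log_odds("ATATTTAAT", coding_model, background_model)
```

With these toy models, the GC-rich candidate scores above zero and the AT-rich candidate below zero. A real tool would slide such a score along the genome, within open reading frames, to flag likely coding regions.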
When comparing genomes, for example the harmless bacterium Listeria innocua and its close relative, the pathogenic Listeria monocytogenes, scientists can use Prokov to computationally home in on the regions that may account for the difference in virulence, he explained.
“Virulence is a wide variety of different traits,” said Read from the Henry M. Jackson Foundation. But while a fairly straightforward genomic comparison may yield results for some traits, “there are some traits that are much more subtle,” he noted.
“Like with the story of the human genome, there are traits that are going to be much harder to tease out and you are going to need an awful lot of data to pull them out,” Read said.
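The straightforward kind of comparison mentioned above can be sketched as a presence/absence check: list the gene families found in the virulent strain but not in its harmless relative. The sketch assumes the predicted genes have already been grouped into shared families, which is a nontrivial step in its own right; the family identifiers and annotations are hypothetical placeholders, not real Listeria data.

```python
def strain_specific(virulent: dict, non_virulent: dict) -> dict:
    """Return gene families present in the virulent strain only.

    Both arguments map a gene-family identifier to a free-text annotation.
    """
    return {
        family: annotation
        for family, annotation in virulent.items()
        if family not in non_virulent
    }


# Hypothetical gene-family tables for two closely related strains.
virulent_strain = {
    "fam001": "DNA polymerase subunit",
    "fam002": "ribosomal protein",
    "fam107": "putative surface protein",
    "fam211": "putative toxin",
}
non_virulent_strain = {
    "fam001": "DNA polymerase subunit",
    "fam002": "ribosomal protein",
    "fam305": "sugar transporter",
}

candidates = strain_specific(virulent_strain, non_virulent_strain)
```

Here `candidates` holds the two families unique to the virulent strain, which would then go to the wet lab for follow-up. As Read notes, traits that depend on subtler differences than gene presence would not surface in a comparison this simple.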
A Database Foundation
Iogma draws on the subscription-based microB database, which contains information about more than 450 microbial organisms.
The database integrates data on genes, proteins, and biochemical compounds from GenomeReviews, UniProt/SwissProt, KEGG, Gene Ontology, ENZYME, and NCBI Taxonomy. “MicroB is an integrated set of related data,” said Rechenmann. “For one genome we can go from the genome to the metabolic data and back if necessary.”
To allow for confidentiality in searches, this database is not web-based but stored locally. “Obviously, microB has to be updated because the data sources are themselves updated,” said Rechenmann. Genostar issues an updated database to its clients every three months, he said.
The software is also updated regularly, he explained. “It is a dynamic computer system, software that evolves constantly.” Adding better methods to handle new types of data and problem sets is what he and his colleagues seek to provide, he said.
The software package is well suited for studying virulence as Merial has chosen to do, Rechenmann said. Once researchers have identified genes that might contribute to a pathogen’s virulence, they look at genomic function to determine which genes lead to enzymes catalyzing a reaction in a pathway to pathogenicity.
“We have in Iogma, some bioinformatics methods, algorithms, [that] are able to predict that such and such protein has an enzymatic activity and moreover using data from microB we can connect them: the protein to the metabolic reactions they catalyze,” he said.
Usability features have also been engineered into the package. As Merial’s Jolivet explained, three Genostar training sessions got him up and running to apply this software to problems of interest.
“I am not a bioinformaticist; I come from genetics and microbiology,” he said. “The software is easy for me to use.”