Solutions: J&J and Novartis Research Groups Take Gene-Centric View of Integration

The Johnson & Johnson Pharmaceutical Research and Development group in San Diego faced the usual integration problem of linking several disparate public and private databases, with the additional burden of a cDNA chip database covering 20,000 microarray experiments. J&J scientists wanted not only to integrate this data so that they could easily find and retrieve new information, but also to add a new level of automation to the process.

With this goal in mind, J&J’s Heng Dai and a team of six other developers created an in-house system they call GeneView, which monitors public and private data sources nightly. When the system detects that a source is updated, it automatically downloads and processes it, so that the researcher’s local copy of the data is synchronized with public sources, including LocusLink, Unigene, RefSeq, HomoloGene, OMIM, the Gene Ontology, SwissProt, and InterPro, as well as proprietary and third-party sources, such as Incyte’s LifeSeq.
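The article does not say how GeneView decides that a source has changed. One common approach, shown here purely as an illustrative sketch, is to compare a checksum of each source's latest release against the checksum recorded on the previous run. The function names and the in-memory dictionaries are hypothetical; a real nightly job would fetch the payloads and persist the fingerprints.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(payload).hexdigest()


def sources_to_refresh(current_payloads, last_fingerprints):
    """Compare each source's current content with the last recorded digest.

    Returns the names of sources that are new or changed, plus an updated
    fingerprint table to store for the next nightly run.
    """
    stale = []
    updated = dict(last_fingerprints)  # leave the caller's table untouched
    for name, payload in sorted(current_payloads.items()):
        digest = fingerprint(payload)
        if updated.get(name) != digest:
            stale.append(name)       # this source needs download and processing
            updated[name] = digest
    return stale, updated
```

Only the sources returned in the first list would be downloaded and reprocessed, so the local copy stays synchronized without reloading everything each night.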

The team developed a gene mapping technique that Dai said overcomes a principal obstacle in data integration: discrepancies in gene identifiers between different systems. The approach cycles between three steps — a gene identifier match, a cluster-based match, and a Blast match — to map genes from any source to a central database of genes of interest with a unique J&J identifier. Users can then track a single gene across a set of linked databases with the single identifier via a web interface. GeneView “cards” provide a single page with relevant annotation information.
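The three-step cycle can be pictured with a small sketch. This is not J&J's patented method, whose details the article does not give; it is a minimal illustration that assumes each step is tried in order and that the cycle repeats because every newly mapped record adds identifiers and clusters that may let a later pass resolve records that failed earlier. All class and function names are hypothetical, and `similarity_search` is a stand-in for a Blast search that returns a central gene ID or `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CentralGene:
    central_id: str                                # hypothetical internal identifier
    known_ids: set = field(default_factory=set)    # accessions seen in any source
    clusters: set = field(default_factory=set)     # cluster IDs linked to this gene


@dataclass
class SourceRecord:
    accession: str
    cluster: Optional[str] = None
    sequence: str = ""


def map_record(record, genes, similarity_search):
    """Try the three matches in order; return (central_id, method)."""
    for gene in genes:                             # step 1: identifier match
        if record.accession in gene.known_ids:
            return gene.central_id, "identifier"
    if record.cluster is not None:                 # step 2: cluster-based match
        for gene in genes:
            if record.cluster in gene.clusters:
                return gene.central_id, "cluster"
    hit = similarity_search(record.sequence)       # step 3: sequence match
    if hit is not None:
        return hit, "similarity"
    return None, "unmapped"


def map_all(records, genes, similarity_search):
    """Repeat the three steps until a full pass maps nothing new."""
    by_id = {gene.central_id: gene for gene in genes}
    mapping, pending, progress = {}, list(records), True
    while pending and progress:
        progress, still_pending = False, []
        for record in pending:
            central_id, method = map_record(record, genes, similarity_search)
            if central_id is None:
                still_pending.append(record)
                continue
            mapping[record.accession] = (central_id, method)
            gene = by_id.get(central_id)
            if gene is not None:                   # enrich the gene for later passes
                gene.known_ids.add(record.accession)
                if record.cluster:
                    gene.clusters.add(record.cluster)
            progress = True
        pending = still_pending
    return mapping
```

In this sketch, a record that carries only a cluster ID can fail on the first pass and succeed on the second, once another record has tied that cluster to a central gene.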

A similar system is under development at the Genomics Institute of the Novartis Research Foundation, with several key differences, according to developer David Block. While Dai’s team has filed for a patent on its method, Block said his group is building its integration system with open source components such as BioSQL, BioPerl, GAME, and the Apollo genome browser. The complete system, called SymGene, will also be available under an open source license once it is completed, Block said. Novartis developers are permitted to contribute to open source projects, Block noted, adding that the company “understands that it’s developing drugs, not software.”

SymGene also preserves the structure of the original data source rather than “flattening” it in the integration process, Block said. However, the end result is the same: a non-redundant set of genes mapped to chromosomes, with annotations of interest combined in a single view.

The Novartis system is still in development, but the J&J system has already successfully annotated over 50,000 unique clones in its proprietary microarray database, according to Dai, and has successfully integrated data from Affymetrix and cDNA microarrays as well as several types of microarray analysis software. Future plans include adding integration with Lion’s SRS and an annotation alerting system.

Despite the companies’ different software development paths, their parallel solutions to the same problem suggest they have much more in common than their software distribution plans might imply.

— BT
