Solutions: J&J and Novartis Research Groups Take Gene-Centric View of Integration


The Johnson & Johnson Pharmaceutical Research and Development group in San Diego had the usual integration problem of how to link several disparate public and private databases, with the additional burden of a cDNA chip database for 20,000 microarray experiments. J&J scientists were looking for a way not only to integrate this data so that they could easily find and retrieve new information, but also to add a new level of automation to the process.

With this goal in mind, J&J’s Heng Dai and a team of six other developers created an in-house system they call GeneView, which monitors public and private data sources nightly. When the system detects that a source is updated, it automatically downloads and processes it, so that the researcher’s local copy of the data is synchronized with public sources, including LocusLink, Unigene, RefSeq, HomoloGene, OMIM, the Gene Ontology, SwissProt, and InterPro, as well as proprietary and third-party sources, such as Incyte’s LifeSeq.
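The article does not describe how GeneView detects an update, so the following is only a minimal sketch of one common approach: fingerprint each source and reprocess it when the fingerprint changes. The function names, the `fetch` and `process` callables, and the use of a SHA-256 digest are all assumptions made for illustration, not details of the J&J system. For simplicity the sketch fetches each source before comparing; a production system would more likely check a release number or timestamp first.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(payload: bytes) -> str:
    """Return a content digest used to decide whether a source changed."""
    return hashlib.sha256(payload).hexdigest()


def nightly_sync(
    sources: Dict[str, Callable[[], bytes]],   # hypothetical: source name -> fetch function
    last_seen: Dict[str, str],                 # hypothetical: source name -> last digest
    process: Callable[[str, bytes], None],     # hypothetical: loads data into the local copy
) -> List[str]:
    """Reprocess every source whose content differs from the last run.

    Returns the names of the sources that were updated, in iteration order.
    """
    updated = []
    for name, fetch in sources.items():
        payload = fetch()
        digest = fingerprint(payload)
        if last_seen.get(name) != digest:
            process(name, payload)      # refresh the local copy
            last_seen[name] = digest    # remember what was loaded
            updated.append(name)
    return updated
```

Run once per night, a routine of this shape leaves unchanged sources alone and touches only those whose content has moved since the previous run.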

The team developed a gene mapping technique that Dai said overcomes a principal obstacle in data integration: discrepancies in gene identifiers between different systems. The approach cycles between three steps — a gene identifier match, a cluster-based match, and a Blast match — to map genes from any source to a central database of genes of interest with a unique J&J identifier. Users can then track a single gene across a set of linked databases with the single identifier via a web interface. GeneView “cards” provide a single page with relevant annotation information.
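The three-step approach can be illustrated with a short sketch. This is not the J&J team's patented method, whose details the article does not give; it is one plausible reading in which each record is tried against identifiers first, then cluster membership, then a sequence search, and the passes repeat because each new mapping can unlock others. The `Record` fields, the `GeneIndex` class, and the `blast_best_hit` callable (a stand-in for a real Blast search) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Record:
    """A gene record from one source (field names are hypothetical)."""
    ids: Set[str]                     # accessions or other identifiers
    cluster: Optional[str] = None     # cluster label, if the source has one
    sequence: Optional[str] = None    # nucleotide sequence, if available


@dataclass
class GeneIndex:
    """Lookup tables from known identifiers and clusters to a central gene ID."""
    by_id: Dict[str, str] = field(default_factory=dict)
    by_cluster: Dict[str, str] = field(default_factory=dict)

    def register(self, rec: Record, gene_id: str) -> None:
        # Remember everything learned from this record for later lookups.
        for ident in rec.ids:
            self.by_id.setdefault(ident, gene_id)
        if rec.cluster:
            self.by_cluster.setdefault(rec.cluster, gene_id)


def map_record(rec: Record, index: GeneIndex,
               blast_best_hit: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try identifier match, then cluster match, then sequence match."""
    gene_id = next((index.by_id[i] for i in sorted(rec.ids) if i in index.by_id), None)
    if gene_id is None and rec.cluster:
        gene_id = index.by_cluster.get(rec.cluster)
    if gene_id is None and rec.sequence:
        gene_id = blast_best_hit(rec.sequence)
    if gene_id is not None:
        index.register(rec, gene_id)
    return gene_id


def map_all(records: List[Record], index: GeneIndex,
            blast_best_hit: Callable[[str], Optional[str]]) -> Dict[int, str]:
    """Cycle over unmapped records until a full pass maps nothing new."""
    mapping: Dict[int, str] = {}
    pending = list(enumerate(records))
    progress = True
    while pending and progress:
        progress = False
        unmapped = []
        for pos, rec in pending:
            gene_id = map_record(rec, index, blast_best_hit)
            if gene_id is None:
                unmapped.append((pos, rec))
            else:
                mapping[pos] = gene_id
                progress = True
        pending = unmapped
    return mapping
```

The repeated passes matter because a record matched by identifier may carry a cluster label that lets a previously unmatched record from another source resolve to the same central gene on the next pass.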

A similar system is under development at the Genomics Institute of the Novartis Research Foundation, with several key differences, according to developer David Block. While Dai’s team has filed for a patent on its method, Block said his group is building its integration system with open source components such as BioSQL, BioPerl, GAME, and the Apollo genome browser. The complete system, called SymGene, will also be available under an open source license once it is completed, Block said. Novartis developers are permitted to contribute to open source projects, Block noted, adding that the company “understands that it’s developing drugs, not software.”

SymGene also preserves the structure of the original data source rather than “flattening” it in the integration process, Block said. However, the end result is the same: a non-redundant set of genes mapped to chromosomes, with annotations of interest combined in a single view.

The Novartis system is still in development, but the J&J system has already annotated more than 50,000 unique clones in its proprietary microarray database, according to Dai, and has integrated data from Affymetrix and cDNA microarrays as well as several types of microarray analysis software. Future plans include adding integration with Lion’s SRS and an annotation alerting system.

Despite the companies’ different software development paths, their parallel solutions to the same problem indicate that they have much more in common than their software distribution plans may suggest.

— BT
