NCI Wants to Share its Core Infrastructure Tools with the Bioinformatics World


The National Cancer Institute Center for Bioinformatics wants you to use its data — along with its bioinformatics tools, middleware, ontologies, vocabularies, and other resources. With last month’s 1.0 release of its caBIO set of APIs, the NCICB effectively dropped a welcome mat in front of the already-open door to its bioinformatics infrastructure toolkit.

CaBIO (Cancer Bioinformatics Infrastructure Objects) serves as the primary programming interface to a broader bioinformatics platform that the NCICB has been developing for over four years, called caCORE. While a Java-based beta version of caBIO has been available since October of last year, the 1.0 release offers a robust set of three APIs that bioinformatics programmers of varying skill levels can use to suit their needs, said Peter Covitz, director of the NCI’s bioinformatics core infrastructure.

caBIO acts as an abstraction layer that developers can use to retrieve data “in a programmatic way” from the NCICB’s Cancer Genome Anatomy Project and Genetic Annotation Initiative, as well as 14 other sources including GenBank, UniGene, HomoloGene, LocusLink, Ensembl, RefSeq, BioCarta, GoldenPath, and DAS servers. This approach, which offers the choice of a J2EE, SOAP, or HTTP API, “makes bioinformatics developers extremely happy,” according to Covitz, because they can easily plug their own programs into the different data sources. The result is a degree of flexibility that far surpasses data resources from the NCBI and other providers that offer only a single web interface to their data, he said. The NCICB aggregates these different data sources into a single database hosted at the NCI, and supports public access through the three APIs.
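To give a feel for what retrieving data “in a programmatic way” over an HTTP-style interface might involve, here is a minimal sketch that only builds a query URL and does not contact any server. The host name, the `query` path, and the `type` and `symbol` parameter names are all assumptions made for illustration; the article does not describe the actual caBIO URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the article does not give the real caBIO address.
BASE_URL = "http://cabio.example.org/query"


def build_query_url(object_type, **criteria):
    """Return a URL asking the (hypothetical) HTTP API for one object type.

    object_type -- name of the kind of object wanted, e.g. "Gene"
    criteria    -- search fields as keyword arguments (assumed names)
    """
    params = {"type": object_type}
    params.update(criteria)
    # Sorting keeps the parameter order stable so the URL is reproducible.
    return BASE_URL + "?" + urlencode(sorted(params.items()))


url = build_query_url("Gene", symbol="TP53")
print(url)
```

A client program would pass a URL like this to whatever HTTP library it already uses, which is the sense in which developers can plug their own programs into the aggregated data.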

Covitz noted, however, that the caBIO data sources are not yet portable for in-house installation.

Over 40 caBIO objects in the 1.0 release represent key bioinformatics entities, such as genes, chromosomes, sequences, agents, trials, and ontologies. Developers can use the APIs to obtain information on specific objects, such as sequences affiliated with a specific gene, or related groups of objects, such as genes and proteins associated with a cellular pathway.
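The two kinds of lookup described above, the sequences tied to one gene and the genes grouped under a pathway, can be pictured with a toy object model. This is only an illustrative sketch: the class names, fields, and placeholder identifiers below are assumptions, not the actual caBIO classes, and the data is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    accession: str  # placeholder identifier, not a real accession


@dataclass
class Gene:
    symbol: str
    sequences: list = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    genes: list = field(default_factory=list)


def sequences_for_gene(gene):
    """Accessions of the sequences associated with a single gene."""
    return [seq.accession for seq in gene.sequences]


def genes_in_pathway(pathway):
    """Symbols of the genes grouped under a pathway."""
    return [gene.symbol for gene in pathway.genes]


# Invented placeholder data, for illustration only.
gene_a = Gene("GENE_A", [Sequence("SEQ_001"), Sequence("SEQ_002")])
gene_b = Gene("GENE_B", [Sequence("SEQ_003")])
pathway = Pathway("example pathway", [gene_a, gene_b])

print(sequences_for_gene(gene_a))
print(genes_in_pathway(pathway))
```

Modeling associations as object references is what lets a caller move from one entity to its related entities without writing a separate query for each source.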

In addition to caBIO, the caCORE infrastructure encompasses a set of controlled vocabularies for cancer research called Enterprise Vocabulary Services (EVS) and a set of common data elements for clinical cancer research stored in the Cancer Data Standards Repository (caDSR). Covitz noted that caDSR metadata does not describe clinical trials data itself, but rather the terms used in the forms patients must fill out when enrolling in the trials. The caDSR database was migrated to a new production server in July, and more sophisticated user interfaces and tools are planned for future releases.

The caBIO interfaces are available through the NCICB’s public servers, and the underlying software is available for use at local sites. CaBIO 1.0 is released under a “homebrew” open source license from the NCI and SAIC, Covitz said, which permits redistribution and incorporation into commercial products, but prohibits users from adding the software to third-party tools and reselling the package as a new product.

Covitz said the NCICB welcomes contributions to caBIO from the broader bioinformatics community, and would cooperate with commercial entities interested in releasing a commercial version of the software. He added that the NCICB is also developing a flexible data wrapper object for the object model that will allow users to bring their own data into the caBIO environment.

More information on caBIO, along with full technical documentation, is available at

— BT
