Could XML and Oracle’s 9i Help Enforce “Bioinformatics Business Rules?”

Richard Casey is a project manager in the IT division of Agilent Technologies in Fort Collins, Colo. He manages enterprise software and database development projects. His background is in life sciences and information technology. He can be reached at [email protected]

One of the more daunting challenges for genomics and proteomics researchers is integrating and sharing information among the hundreds of databases, applications, laboratory information systems, gene arrays, and myriad other sources of bioinformatic data. Any methods or systems that ease the burden of integrating data from these many sources are welcome.

For the past several years, the Extensible Markup Language (XML) has evolved to allow individuals and organizations to share XML-based data and documents over the Internet. Because it allows information to be shared in a standardized way, XML has gained wide acceptance in the bioinformatics community as a method of storing and exchanging gene expression, proteomic, and annotation data. Some well-known XML-based formats that researchers use to exchange such data include the Gene Expression Markup Language (GEML), Bioinformatic Sequence Markup Language (BSML), and Genome Annotation Markup Elements (GAME). In addition, a web of public and private databases and applications supports genomic and proteomic research, including the Protein Data Bank, SWISS-PROT, GenBank, BLAST, and FASTA. Many of these databases and applications support XML for importing, exporting, exchanging, and storing bioinformatic data.

Oracle 9i, the newest database from Oracle, supports XML in a way that could dramatically improve the exchange of bioinformatic data between individuals and organizations.

Technically, version 9i supports a new datatype called XMLType. This means that XML data can be treated like any other native datatype (e.g., character or numeric data) in the database. Entire XML documents, and sets of documents, can be stored directly in tables in 9i databases.

Table columns in turn can be defined such that they hold XMLType data, and each row or record in the table can hold an entire XML document. Once stored in 9i tables, a full set of standard, built-in SQL functions can be used to insert, update, delete, extract, and query XML data and documents, just like any other datatype.
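The one-document-per-row idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: Python's built-in sqlite3 module stands in for Oracle 9i, a plain TEXT column stands in for XMLType (SQLite has no native XML type), and the table name, element names, and sequence records are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sequence records; element and attribute names are invented.
DOCS = [
    "<sequence id='seq1'><organism>Homo sapiens</organism>"
    "<residues>ATGGCC</residues></sequence>",
    "<sequence id='seq2'><organism>Mus musculus</organism>"
    "<residues>ATGTTT</residues></sequence>",
]


def load_documents(docs):
    """Store each XML document whole, one per row, and return the connection."""
    conn = sqlite3.connect(":memory:")
    # Assumption: TEXT stands in for an XML-typed column.
    conn.execute(
        "CREATE TABLE sequence_docs ("
        "doc_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
    )
    for doc in docs:
        ET.fromstring(doc)  # raises ParseError on malformed XML, before the insert
        conn.execute("INSERT INTO sequence_docs (doc) VALUES (?)", (doc,))
    conn.commit()
    return conn


conn = load_documents(DOCS)
row_count = conn.execute("SELECT COUNT(*) FROM sequence_docs").fetchone()[0]
print(row_count)
```

In a database with a true XML datatype the well-formedness check would be done by the database itself rather than by the loading code.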

Because XML is treated as a native datatype, developers and software engineers can develop database queries using simple, standard SQL calls with which they are already familiar. They do not need to learn a new programming language to access the data. Furthermore, if a large amount of XML data is stored in the database, indexes and other standard performance-enhancing methods can be employed to speed up queries and perform data management functions.

This is an important factor in database design considering the large amount of genomic and proteomic data being created today. Also, queries can be run against XML documents such that only specific sections or subsets of the document are searched and retrieved, thus allowing for powerful data manipulation capabilities.
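To make the idea of retrieving only a subset of a document concrete, here is a minimal Python sketch. It uses the standard library's ElementTree path expressions as a stand-in for the path-based extraction a database would perform in SQL; it is not Oracle's API. The annotation layout and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document; the element names are invented.
DOC = (
    "<annotation>"
    "<gene name='BRCA1'><chromosome>17</chromosome></gene>"
    "<gene name='TP53'><chromosome>17</chromosome></gene>"
    "<gene name='CFTR'><chromosome>7</chromosome></gene>"
    "</annotation>"
)


def genes_on_chromosome(xml_text, chromosome):
    """Return the names of genes whose <chromosome> child equals the given value."""
    root = ET.fromstring(xml_text)
    # The predicate [chromosome='17'] keeps only <gene> elements that have
    # a <chromosome> child with exactly that text.
    matches = root.findall(f"./gene[chromosome='{chromosome}']")
    return [gene.get("name") for gene in matches]


print(genes_on_chromosome(DOC, "17"))  # ['BRCA1', 'TP53']
```

Only the matching gene elements are touched; the rest of the document is never returned to the caller.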

Operational data stores (ODS), sometimes called data integration hubs, are databases that collect, transform, and integrate data from a variety of sources and send it to data warehouses, decision support systems, and reporting tools. Bioinformatic data hubs could be built to integrate XML data derived from various source systems and deliver it to bioinformatic warehouses. In the ODS, developers could enforce “bioinformatic business rules” to ensure that only correctly integrated and properly transformed bioinformatic data winds up in the data warehouse. By acting as data integration hubs operating on standardized XML data, the data stores could perform an invaluable, integrative service for the bioinformatic community.
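What a "bioinformatic business rule" might look like in practice can be sketched as a small validation gate. The rules below (a record must be well-formed XML, carry an id, name an organism, and contain only the characters A, C, G, T, or N in its residues) are invented for illustration, as are the record layout and function names; a real hub would define its own rules.

```python
import xml.etree.ElementTree as ET

# Assumption: nucleotide records only, with N allowed for unknown bases.
VALID_BASES = set("ACGTN")


def check_record(xml_text):
    """Return a list of rule violations for one record (empty list means it passes)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if not root.get("id"):
        problems.append("missing id attribute")
    if not root.findtext("organism", default="").strip():
        problems.append("missing organism")
    residues = root.findtext("residues", default="").strip().upper()
    if not residues:
        problems.append("missing residues")
    elif set(residues) - VALID_BASES:
        problems.append("residues contain non-nucleotide characters")
    return problems


def route(records):
    """Split records into those cleared for the warehouse and those rejected."""
    accepted, rejected = [], []
    for rec in records:
        problems = check_record(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Calling `route` on a batch returns the clean records alongside each rejected record and the reasons it failed, so that only data passing every rule moves on to the warehouse.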

Opposite Strand is a forum for readers to express opinions and ideas about trends and issues in genomics. Submissions should be kept to 550 words and may be submitted to [email protected]
