Study Says XML Presents More Obstacles than Opportunities for Bioinformatics Vendors


The XML format may be gaining widespread acceptance within the pharmaceutical, biotechnology, and genomic sectors, but vendors thinking about using the technology face a number of challenges, according to Emmett Power, CEO of research and consulting firm Silico Research.

In a recent survey of pharmaceutical companies, biotechnology companies, genomics companies, and bioinformatics solutions vendors, Silico Research found that 88 percent of the user community has adopted some form of XML, while only 63 percent of the vendor community is using it, indicating that users are adopting the technology at a much faster rate than vendors.

Power told BioInform that there are a number of factors that contribute to this discrepancy. The cost of entry for users is very low and XML is very easy to learn, so many companies have found XML to be an easy, low-cost way to integrate diverse data sources within the enterprise.

“XML enables you to create a fairly simple central standard for labeling data and linking that to your databases and it allows you to integrate it all fairly cheaply. So there’s quite a lot of incentive for the user community to get on board the technology and not a lot of inhibitors in terms of cost of entry and skills hurdles,” he said.

However, the very advantages that XML offers the user community represent challenges to bioinformatics vendors, Power said. Because the language is so easy to learn, almost every lab has developed its own document type definition (DTD) for in-house use. Power estimates that there are at least 50 XML DTDs currently in use, and the number may be as high as 500.

Each DTD is semantically different, ultimately rendering XML incapable of integrating disparate data unless two users are using the same DTD.

Thus, vendors are faced with a tough choice, according to Power: “either map their DTD to the user’s DTD or convince the user to use their DTD.” LabBook’s recent release of its BSML (bioinformatics sequence markup language) DTD and viewer falls into the second category. “Obviously what LabBook wants to do is persuade everyone to use their DTD,” Power said.
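To make the first option concrete, here is a minimal Python sketch of what mapping one DTD's vocabulary onto another's might look like. The element names and the mapping table are hypothetical, invented for illustration; they are not taken from BSML or any other DTD named in this article, and the sketch assumes the two vocabularies differ only in their tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house record: made-up element names, not from any real DTD.
LAB_XML = "<seq><id>X1</id><residues>ATGC</residues></seq>"

# Hypothetical mapping from the lab's tag names to a vendor's tag names.
TAG_MAP = {"seq": "Sequence", "id": "Accession", "residues": "Data"}


def translate(xml_text, tag_map):
    """Rename every element whose tag appears in tag_map; leave others unchanged."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() visits the root and all descendants
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


vendor_xml = translate(LAB_XML, TAG_MAP)
print(vendor_xml)
# prints: <Sequence><Accession>X1</Accession><Data>ATGC</Data></Sequence>
```

A one-to-one rename like this is the easy case. When two DTDs structure or interpret the same data differently, which is the semantic difference Power describes, a mapping would also have to restructure or reinterpret content, and that is considerably more work.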

“On the other hand, all the other vendors are writing their own DTDs,” Power added. Other DTDs, such as BIOML (biopolymer markup language) and GAME (genome annotation markup elements), are also relatively popular (see chart).

Despite the ever-growing proliferation of XML DTDs, Power said that the best approach for vendors is to write their own DTDs rather than tailor their products to match an existing one “because there’s no guarantee that lots of people are going to be using the other DTDs.” Users can modify in-house DTDs to interpret disparate vendor-supplied DTDs, he said.

Power said that the biopharmaceutical sector would be resistant to setting standards in this area. Unlike other industries that have complex supply chains and a clear incentive to agree to data standards, large pharmaceuticals bring in very few supplies from outside. “The value is generated in house, so there’s not much of an advantage in sitting down and agreeing to common standards,” he said.

David States, associate professor in the department of genetics at Washington University in St. Louis and a proponent of XML for bioinformatics applications, agreed with Power’s assessment of the difficulty vendors face. “In a commercial setting where a ‘top down’ solution can be imposed, XML has been very useful and successful, but the biological tradition of multiple ‘bottom up’ solutions does not mesh well with XML,” States said.

“It’s not obvious that standards are going to emerge and, if there are standards, which of them will emerge,” Power said. He added that “the marketplace is rapidly evolving in a number of directions,” and that it is still too early to tell whether a single XML DTD will emerge as the clear winner among users.

The full report, XML in the Pharmaceutical Sector, is available from Silico Research.

