One of the world’s largest professional engineering societies has entered the often-contentious world of bioinformatics standards, and hopes it can succeed in an area where numerous others have failed.
The IEEE (Institute of Electrical and Electronics Engineers) recently kicked off an effort to formalize several grassroots standards projects in the bioinformatics community. While the initiative is still in its early days, a nascent working group has proposed a broad framework for the effort that it has dubbed P 1953.
P 1953, named in honor of the year in which the structure of DNA was elucidated by Watson and Crick, “will provide a consistent reference system or database framework for maintaining and representing dynamic biological information,” according to an initial draft of the framework (see the list at the end of this article for the proposed bioinformatics sub-domains under P 1953).
Vicky Markstein, technical chair of the IEEE Computer Society’s Bioinformatics Standards Committee, told BioInform that the group plans to exploit IEEE’s formidable infrastructure to codify existing community-driven bioinformatics standards. She said that the first of these projects to fall under P 1953 will likely be the Sequence Ontology (SO), an offshoot of the well-established Gene Ontology that describes annotation terms for DNA and protein sequences.
According to Markstein, SO has already gained the support of many in the bioinformatics community through its association with GO, but is still early enough in its development to benefit from the IEEE standard-setting processes. Suzanna Lewis of the Berkeley Drosophila Genome Project, who leads the SO development effort, agreed. She said that SO’s small size relative to GO — several hundred terms as opposed to tens of thousands — should “make it easier to reach consensus.”
Lewis acknowledged that she had some concerns about the “serious bureaucratic overhead and red tape” inherent in most standards-setting bodies, but said that the benefits of aligning with IEEE seem to outweigh the potential drawbacks.
“The alternative would be for us to just declare ourselves a standard, a de facto standard,” she said. “That’s all well and good, but by associating with [IEEE], it becomes an institution, so even if the original people who inspired it go away, hopefully there will be enough infrastructure left there that it can be maintained. Because that’s the big fear — that over time, if you lose the leaders, then the whole thing loses its vitality.”
One appealing aspect of the IEEE process for bioinformatics is its versioning methodology, Lewis added. “One of the things that did worry me is that, if it’s a standard, that means it’s frozen. But that’s not the case at all.” IEEE provides very clear guidelines for assuring the flexibility of standards, she said, “and that’s essential, because ... for the ontologies, they’re like language — they evolve. So they’ll be changing all the time. And sure, there’s going to be a core that’s stable, but if it were frozen, it would just be unworkable.”
Lewis described IEEE’s process as essentially formalizing the “common sense” aspects of establishing a standard, such as securing community acceptance. The organization requires that any proposed standard gain the approval of a “balanced” group of users, producers, and other interested parties prior to its being ratified. “If any one of those groups is more than 50 percent, we have to reject it at the standards board level,” said Bob Davis, chair of the IEEE’s Microprocessor Standards Committee, which currently oversees the Bioinformatics Standards Committee.
“The whole idea of the IEEE is that they are looking for the largest consensus,” Markstein said.
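The balance requirement Davis describes can be expressed as a simple check. The sketch below is illustrative only: the function name `is_balanced`, the category labels, and the treatment of an empty group as unbalanced are assumptions, not part of any IEEE procedure document.

```python
from collections import Counter


def is_balanced(ballot_group, threshold=0.5):
    """Return True if no single interest category exceeds `threshold`
    of the balloting group (hypothetical helper, not an IEEE API)."""
    if not ballot_group:
        # Assumption: an empty group cannot be called balanced.
        return False
    counts = Counter(ballot_group)
    total = len(ballot_group)
    # "More than 50 percent" fails, so exactly 50 percent still passes.
    return all(count / total <= threshold for count in counts.values())


# Each entry is the declared interest category of one group member.
mixed_group = ["user", "producer", "other", "user"]
producer_heavy = ["producer", "producer", "user"]

print(is_balanced(mixed_group))
print(is_balanced(producer_heavy))
```

Under this reading, a group in which one category holds exactly half the seats would still pass, since Davis's wording rejects only groups where a category is "more than 50 percent."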
Yet Another Standards Effort?
IEEE first dipped its toes into the bioinformatics field several years ago when it ranked bioinformatics at the top of its list of 50 emerging technologies that fell within the purview of the association. In 2002, IEEE hosted the first Computational Systems Bioinformatics Conference, which is now an annual event held each August at Stanford University.
Markstein said the standards initiative grew out of a panel discussion that was held at the most recent CSB conference. Representatives from a number of prominent bioinformatics projects were present at that discussion, and many of them — around 50, according to Markstein — have continued their involvement as the effort has progressed, including researchers involved with GO and SO, the Protein Data Bank, BioPAX, and the National Cancer Institute.
This level of participation from the research community may help set the IEEE effort apart from previous attempts to coordinate standards development in the bioinformatics community. Groups such as the OMG and I3C found it difficult to attract a broad spectrum of public research projects, which resulted in participating bodies heavily weighted towards industry.
One criticism commonly leveled against these previous efforts was that they were biased towards commercial interests — something that Markstein said IEEE should not encounter because it is a “neutral party.” In addition, because the IEEE initiative seeks to formalize existing standards, rather than develop new ones from scratch, it may have a better chance of success than previous attempts.
“I think what’s different is that they’re trying to use the resources of IEEE to provide support for standards that the bioinformatics community is already gathering around, so things that are already sort of standards — like GO, or the PDB — things that people are using and are pretty well known,” said Sherri De Coronado of the National Cancer Institute’s Center for Bioinformatics. De Coronado said that NCI is currently “monitoring” the IEEE project. “At this point, we’re not committing to actively participating,” she said. “But I think what they’re doing is great, and I hope they’re successful.”
The IEEE Bioinformatics Standards Committee is slated to hold a conference call on Jan. 10 to hash out some particulars around submitting SO as its first standard under P 1953. If the group does decide to go ahead with that proposal, Davis said the committee will then finalize it and present it to the Microprocessor Standards Committee, which currently serves as the “nurturing” committee for the bioinformatics group. It would then be passed out to the broader community for review and balloting, and from there to the IEEE Standards Board for formal ratification.
Davis said the entire process can take from a few months to several years, depending on the complexity of the standard and the degree of consensus-building required. The Bioinformatics Standards Committee does not have a timeline in place yet for SO.
Davis said that once the bioinformatics group releases its first standard, it will likely stand on its own as a parallel organization to the microprocessor group, which is responsible for a number of widely used standards, including the “754” floating-point standard used by all manufacturers of microprocessors worldwide.
Markstein is also spearheading a broader effort to create a new society under IEEE called the Life Sciences Society, which will oversee all computational and engineering aspects of life science research. IEEE currently supports 38 different societies, ranging from aerospace to vehicular technologies, including the Computer Society, which boasts 100,000 members and currently oversees the efforts of the bioinformatics technical committee. If approved, LSS will lead future bioinformatics standards efforts.
Researchers interested in supporting the creation of the Life Sciences Society under IEEE are requested to fill out a short form at http://lifesciencessociety.org/proposal_support.php.
IEEE’s P 1953 Framework and Classification for Bioinformatics Sub-Domains
- 1953.0000: Bioinformatics Structures
- 1953.1000: Nomenclature and Taxonomy (across several domains of biology)
- 1953.2000: Databases
  - 1953.2300: Structural Proteomics
  - 1953.2500: Gene Ontology
  - 1953.2600: Sequence Ontology
- 1953.3000: Biological Pathways
- 1953.4000: Pharmacogenomics Knowledge Base
  - 1953.4100: Clinical Outcome
  - 1953.4400: Molecular and Cellular Functional Assays
- 1953.5000: Drug Discovery
- 1953.6000: Medical Bioinformatics
  - 1953.6100: Foundational Model of Anatomy
- 1953.7000: Forensic Bioinformatics
- 1953.9000: Agriculture and Plant Bioinformatics
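The numbering in the list above suggests a two-level hierarchy, which the following sketch models as a lookup table. It assumes, based only on the numbering pattern, that each sub-domain belongs to the entry sharing its thousands digit (for example, 1953.2600 under 1953.2000); the draft framework may define the relationship differently. The names `SUBDOMAINS`, `parent_code`, and `children` are hypothetical.

```python
# Sub-domain codes and titles as listed in the proposed P 1953 framework.
SUBDOMAINS = {
    "1953.0000": "Bioinformatics Structures",
    "1953.1000": "Nomenclature and Taxonomy",
    "1953.2000": "Databases",
    "1953.2300": "Structural Proteomics",
    "1953.2500": "Gene Ontology",
    "1953.2600": "Sequence Ontology",
    "1953.3000": "Biological Pathways",
    "1953.4000": "Pharmacogenomics Knowledge Base",
    "1953.4100": "Clinical Outcome",
    "1953.4400": "Molecular and Cellular Functional Assays",
    "1953.5000": "Drug Discovery",
    "1953.6000": "Medical Bioinformatics",
    "1953.6100": "Foundational Model of Anatomy",
    "1953.7000": "Forensic Bioinformatics",
    "1953.9000": "Agriculture and Plant Bioinformatics",
}


def parent_code(code):
    """Map a code to its top-level entry by zeroing the last three digits.

    Assumption: the thousands digit identifies the parent sub-domain.
    """
    prefix, suffix = code.split(".")
    return f"{prefix}.{suffix[0]}000"


def children(code):
    """List the codes nested under a top-level code, in listed order."""
    return [c for c in SUBDOMAINS if c != code and parent_code(c) == code]


# Sequence Ontology, the likely first project, sits under Databases.
print(SUBDOMAINS[parent_code("1953.2600")])
for child in children("1953.2000"):
    print(child, SUBDOMAINS[child])
```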