Computer scientists and biologists tried to find common ground last week at the “Computational Challenges in the Post-Genomic Age II” meeting. Making it happen was a bit of a challenge in itself, since the meeting had been rescheduled from September 13-15 last year. But despite competition from the BioIT World conference and expo in Boston (see p. 4), about 160 participants from academia and industry found their way to Durham, NC, to exchange ideas.
The conference started with an afternoon meeting of Sun’s Computational Biology Special Interest Group, during which Stefan Unger, business development manager for computational biology, mentioned that another Sun “Center of Excellence” would shortly be announced. Though the company has not made the participants public, Molly Broad, president of the University of North Carolina, hinted in her talk that UNC would be among them.
One of the more persistent problems in the field, voiced by several speakers, is a lack of communication: life scientists and informaticists speak different languages, software applications do not use the same standards, and different disciplines produce diverse types of data that cannot be integrated easily. Timo Hannay from the Nature Publishing Group illustrated the lack of understanding between the two communities by citing a bug report that a Nobel Prize winner with a biology background had sent to a computer scientist. It simply stated, “It doesn’t work.”
Hannay’s company, in collaboration with the Alliance for Cellular Signaling, a consortium of US cell biology groups, is currently exploring more data-centric forms of scientific publishing by creating a web-based “signaling gateway” that will contain peer-reviewed, regularly updated, and machine-readable information about 1,000 major cell signaling proteins. The site, which will be free, should be fully operational by the end of the year.
Hannay’s gloomy view on the division of cultures gave way to some hope when Lincoln Stein from Cold Spring Harbor Laboratory asked for a show of hands on who in the audience considered themselves computer geeks (about one third) and who experimental lab scientists (very few). Many attendees, it seemed, did not think they belonged to either faction.
In his talk, Stein described the “self-stocking supermarket,” or the Distributed Sequence Annotation System (DAS), an XML-based protocol for exchanging genome annotations without the need for a “stocker” or curator. It is currently used by WormBase, Ensembl, the Institute for Genomic Research, and others. “It’s taking off,” he said, adding that DAS could be extended to other types of databases in the future.
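To make the “self-stocking” idea concrete, here is a minimal sketch of a client that pulls annotations for the same sequence segment from two independent servers and merges them itself, with no curator in between. It is an illustration under stated assumptions, not Stein’s implementation: the XML element names are loosely modeled on a DAS-style features response and should be treated as assumptions, the two documents are made-up sample data rather than output from any real server, and the function names `parse_features` and `merge_annotations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample responses from two annotation servers.
# Element names are assumed, loosely modeled on a DAS-style features document.
SERVER_A = """<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="I" start="1" stop="10000">
      <FEATURE id="gene-a" label="hypothetical gene A">
        <TYPE id="gene">gene</TYPE>
        <START>1200</START>
        <END>3400</END>
      </FEATURE>
      <FEATURE id="gene-b" label="hypothetical gene B">
        <TYPE id="gene">gene</TYPE>
        <START>7000</START>
        <END>8200</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

SERVER_B = """<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="I" start="1" stop="10000">
      <FEATURE id="snp-1" label="hypothetical variant">
        <TYPE id="snp">snp</TYPE>
        <START>2500</START>
        <END>2500</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text, source):
    """Extract one dict per FEATURE element, tagged with the server it came from."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "source": source,
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def merge_annotations(documents):
    """Combine features from several servers, ordered by segment and start position."""
    merged = []
    for source, xml_text in documents.items():
        merged.extend(parse_features(xml_text, source))
    return sorted(merged, key=lambda f: (f["segment"], f["start"]))


if __name__ == "__main__":
    docs = {"server_a": SERVER_A, "server_b": SERVER_B}
    for feat in merge_annotations(docs):
        print(feat["segment"], feat["start"], feat["end"], feat["type"], feat["id"], feat["source"])
```

The point of the design is that merging happens on the client: each server publishes only what it knows about a segment, and anyone can layer the sources together.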
Rather than addressing a single type of data, Scott Lett from Physiome Sciences talked about the difficult task of integrating complex data from different levels of biology — ranging from molecules to cells, tissues, organs, and organisms — and from various technologies, all in order to build models. Computers not only help to tame the data in this process, he said, but also provide several representations of the problems for different experts.
Besides the challenge of data integration, another recurrent theme was the familiar problem of collecting, storing, organizing, and analyzing growing amounts of data. Phil Andrews from the University of Michigan discussed the Michigan Proteome Consortium’s efforts to build a proteomics data infrastructure. Just four mass spectrometers in his lab churn out 7.5 terabytes of data every year, he said, noting that twenty percent of the consortium’s budget goes to the informatics group.
William Walster from Sun, who presented during the SIG session, attracted some attention with his persuasive presentation of interval arithmetic as a potential solution for some nonlinear problems in biology. Interval algorithms, long known but so far rarely used for genomics or proteomics problems, can solve some otherwise intractable problems by narrowing the solution down to an interval, he said.
Walster invited members of the audience to send in suitable biological problems, as long as they are described in mathematical terms, and pointed to his web site for further information (www.sun.com/forte/info/features/intervals.html).
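For readers unfamiliar with the approach, the idea of narrowing a solution down to an interval can be sketched in a few lines. The example below is not Walster’s code or Sun’s compiler support. It is a simplified illustration that assumes a toy `Interval` class, a made-up test equation (x^3 - 2x - 5 = 0), and ordinary floating-point arithmetic without the outward rounding a rigorous implementation would need. It evaluates the function over a whole interval at once, discards any subinterval whose range cannot contain zero, and bisects the rest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Toy closed interval [lo, hi]; no outward rounding, so not rigorous."""
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def contains_zero(self):
        return self.lo <= 0.0 <= self.hi

    def width(self):
        return self.hi - self.lo


def _as_interval(value):
    """Promote a plain number to a zero-width interval."""
    return value if isinstance(value, Interval) else Interval(float(value), float(value))


def f(x):
    """Hypothetical nonlinear test function x^3 - 2x - 5, written with interval operations."""
    return x * x * x - x * 2 - 5


def enclose_roots(func, box, tol=1e-6):
    """Return narrow intervals in which func may have a root.

    A subinterval is discarded as soon as the interval evaluation of func
    over it excludes zero; the remaining ones are bisected until their
    width is at most tol.
    """
    pending = [box]
    found = []
    while pending:
        current = pending.pop()
        if not func(current).contains_zero():
            continue  # the range of func over this piece excludes zero
        if current.width() <= tol:
            found.append(current)
            continue
        mid = 0.5 * (current.lo + current.hi)
        pending.append(Interval(current.lo, mid))
        pending.append(Interval(mid, current.hi))
    return sorted(found, key=lambda iv: iv.lo)


if __name__ == "__main__":
    for iv in enclose_roots(f, Interval(0.0, 3.0)):
        print(iv.lo, iv.hi)
```

Because interval evaluation tends to overestimate a function’s range, the search may return a few adjacent narrow intervals around a single root rather than exactly one. The useful property is the exclusion step: regions are thrown away only when the computed range rules out a solution there.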
One of the few speakers to point to solutions emerging from the sea of genomic data was Charles Perou from the University of North Carolina at Chapel Hill. He presented microarray gene expression studies of breast tumors that classified them into subtypes that correlated with disease outcome and could add to existing tumor classification markers.
Based on these results, Perou and his colleagues are currently developing a PCR-based clinical test.