By Meredith W. Salisbury
In the past five years, columns and authors have come and gone — but the constant throughout has been GT’s IT and informatics review written by a computational biologist from the front lines. How well have these ideas and predictions stood the test of time? In this retrospective, we take clips from the informatics column from our September issues over the last five years, and revisit the theme to see what progress has been made in the intervening time. Informatics Insider, now written by a rotating staff of experts including Fran Lewitter, George Bell, and Ron Beavis, is our updated version of the IT Guy column written for four years by Nat Goodman, now a scientist at the Institute for Systems Biology. The following excerpts are all from Nat’s columns.
2000: Sequence Analysis Sites
Then, Nat wrote:
This summer, while the world welcomed the human genome sequence, I paid visits to some of the new commercial bioinformatics websites that are meant to help you analyze it. The idea behind each of these online products is to offer sequence analysis services on the Web just as public sites have done for years.
The main weakness of all four sites is that they are firmly rooted in pre-genomic thinking. They are aimed at individual scientists who need to conduct de novo functional analyses of one or a few novel sequences.
Had these sites been established five or even two years ago, they would have been exciting. But as we enter the post-genomic era, their relevance is decaying by the day. None is likely to be a major player in the post-genomic era, unless they evolve rapidly.
Of the four sites Nat tested (BioNavigator from eBioinformatics, GeneScape from CuraGen, DoubleTwist, and LabOnWeb by Compugen), one is defunct, another was wrapped up into a different offering, and the others have significantly expanded their functionality over the years. The firm then known as eBioinformatics merged several months later with Empatheon to form Entigen, and the combined entity eventually went out of business. Biosift acquired the IP and rolled the BioNavigator tool into a product called Radia. DoubleTwist couldn’t make a go of it and finally closed its doors in 2002. CuraGen still includes GeneScape on its website, and the tools there include gene- and sequence-calling programs; a function to find SNPs has been added, and the company says its tools can also be used in pharmacogenomics and toxicogenomics partnerships.
2001: Biofx Graduate Programs
Then, Nat wrote:
I’m going to review the curricula of several bioinformatics graduate programs and then ask how well these programs train students for the real world.
A key issue is to allocate the course time between specific bioinformatics content and foundational material from outside. If we spend too much time on current bioinformatics techniques, we run the risk of turning out students who are experts on obsolete methods. But if we spend too little time, our students won’t be able to do anything useful without yet more training.
Nat evaluated several programs — including ones at the University of Pennsylvania; the Universities of California at Los Angeles, Santa Cruz, and San Diego; Georgia Tech; George Mason University; and the University of Washington — in 2001, and today those represent just a fraction of the degree offerings for bioinformatics. According to an annual degree-program survey published by GT’s sister publication BioInform, at least 74 US universities had a BS-, MS-, or PhD-level degree program in the field, graduating more than 300 students that year — up from 32 graduates of such programs in 2000. UC Santa Cruz and the University of Pennsylvania, two of the original schools Nat discussed, are among a handful offering all three levels of degree in bioinformatics. In its 2004 survey, BioInform reported that more than 1,900 students were currently enrolled in undergraduate and graduate bioinformatics programs.
2002: Online Courses
Then, Nat wrote:
In this article, I’ll flip the pages of the premier academic online course, the offerings of a leading commercial vendor, and online lecture notes from several flesh-and-blood courses. There are plenty of choices. The challenge is to find the courseware that fits your background, learning style, and budget.
Bioinformatics courseware is within everyone’s reach. The open-source S-Star is a great place to begin. For those with more money than time, GeneEd is a great next stop. For those with enough time for a full academic program, the Stanford Center for Professional Development and the Bioinformatics Institute of India sound intriguing, though I haven’t seen the actual courseware of either.
To choose among the other online material, you’ll need to spend an hour or two surfing the stacks.
Since Nat wrote in 2002, online course offerings for the bioinformatics field have proliferated. An informal survey indicates that at least 30 such programs are available to the curious Web surfer. Of the major programs Nat tested, both S-Star and GeneEd are still being offered and expanded upon.
2003: Bad Job Market
Then, Nat wrote:
The bioinformatics job market is bleak. I hear regularly from old friends who’ve lost their jobs and are having trouble finding new ones, and from new people struggling to break into the field.
One headhunter said flatly that he had seen no new openings in the past six months. Another said that the market was so bad, he’d essentially shut down his bioinformatics practice. The most optimistic words I heard were: yes, the market’s been bad, but it’s picking up slowly.
The model of hiring great software people and teaching them biology hasn’t worked very well. The opposite approach of teaching biologists to write software also hasn’t worked very well, but has emerged as the strategy of choice.
While reports in 2003 described a steady decline in bioinformatics jobs since the heady days of 2000, the market now appears to be picking up, according to recruiters and headhunters in the field. The number of job postings on sites like bioplanet.com has increased from the low hundreds to closer to 1,000, and the major genome centers say they currently have several openings for bioinformatics positions.
2004: H-Invitational Database
Then, Nat wrote:
I cringed when I heard the announcement of the H-Invitational Database: a new “human gene database” featuring “integrative annotation” of human genes … blah, blah, blah. Just what the world needs: another human gene database!
H-InvDB was produced by a large consortium (the paper has 158 authors from 68 institutions) who analyzed more than 40,000 full-length cDNA sequences from seven large sequencing projects.
While hardly a unique resource, H-InvDB is a good addition to the roster of integrated gene databases. Its incomplete coverage will not be a problem if you use it to supplement other, more comprehensive resources, such as NCBI Entrez Gene or the databases listed in the box.
H-InvDB still exists, and its contents have been wrapped into other libraries in the time since Nat wrote. Late last year, Gene-IT launched GenomeQuest GeneRef Max, a data source containing more than 266,000 transcripts from major public repositories, including H-InvDB. And early this year, scientists from the Japan Biological Information Research Center published a paper introducing the Human Anatomic Gene Expression Library, a resource for distribution and expression of gene transcripts that was built using H-InvDB and RefSeq data.