It would be hard to beat 2000 as a major milestone year for scientific achievement and market recognition of genomics as the new frontier. With the human genome sequenced and bioinformatics companies raising big bucks in the capital markets for the first time, the sector sure had a lot to celebrate.
As we head into 2001, the future doesn’t look quite so bright. A recession looms, and scientists must now work out loads of kinks in order to realize genomics’ promise. Nevertheless, industry experts are looking forward to another fun year ahead.
In conversations with BioInform, several leading bioinformaticists from business and academia gave their predictions for the upcoming year. They expect to see big headway in comparative genomics and proteomics. Plus, the cost of microarrays is likely to head south, which will make high-throughput research cheaper.
Anthony Kerlavage, Celera Genomics’ senior director, product strategy:
Having the complete human genome in hand, as well as mouse and potentially pufferfish, is going to fling the doors wide open for comparative genomics. The microbial genome research community has been doing great things in this area for years. Now we have to adapt what we have learned from that experience to much more complex systems. I anticipate a major push in nomenclature and semantics, which will be key to being able to manage the information from any sort of comparative work. The more data we generate, the more important improved solutions for data management become. There will be a flurry of activity in development of comparative genomics tools at the mammalian chromosome scale. Visualization of massive and complex data sets will be increasingly necessary. I expect to see lots of nifty new Java tools for this.
This next year will also witness the arrival of proteomics on an industrial scale. Bioinformatics support for proteomics is about where genomics support was eight to 10 years ago. Several groups around the world, academic and commercial, will be putting major resources into solving the computational and data management issues for this field.
Finally, real data mining capability is going to become a necessity rather than just a gimmick. This need will push companies to get powerful but usable tools into the hands of researchers.
Lincoln Stein, associate professor at Cold Spring Harbor Laboratory:
I have cynicism and optimism raging. I predict that there will be a new era of collaboration between the human annotation databases now that the human genome is finished. There will be a broad agreement on a common set of sequence annotations among the National Center for Biotechnology Information, the European Bioinformatics Institute, Oak Ridge National Laboratory, University of California, Santa Cruz, and Incyte Genomics and Proteome. If broad agreement on standards for annotation doesn’t come about, they’ll be lynched by the scientific community just because of the great need for it. Standards will allow people to easily find a set of genes that they are interested in and see what the different annotations have to say about that gene without having to struggle to find their gene in each of the databases. Right now it’s very difficult.
I think there will be gradual improvements in the ability to predict protein structure and function from primary DNA sequence but [the finish line] for that is still a long way off. We’ve had incremental improvements every year and we’ll continue to proceed at a slow and steady pace. As more genomes are completed, our function prediction will become increasingly accurate. And I think more and more we’ll see a move away from individual model organism databases to integrated databases of genomic sequence and function.
Larry Hunter, director of the Center for Computational Pharmacology at the University of Colorado:
Problems in bioinformatics often need to be solved quickly or they become obsolete. For example, although the completion of the human genome depended on shotgun assembly algorithms, it’s clear that innovation in that area is declining in importance.
2001 is likely to be the year of gene expression array algorithms and databases. There are a lot of interesting challenges in extracting the most value from this kind of data, and a lot of academics and commercial ventures are working on the problem.
As proteomic technologies make the transition to high-throughput levels, important opportunities will arise. One area that is ripe for new informatics approaches is the combination of expression array data and proteomics to help unravel the details of signal transduction, a key question for the pharmaceutical industry.
The great game of musical chairs continues in bioinformatics employment. The new twist is that some top minds from industry, like Michael Liebman [who left Roche to go to the University of Pennsylvania’s medical school] are deciding to leave and take academic posts.
On the other hand, the inability of the University of Michigan to fill any of 20 new faculty positions in bioinformatics — including an endowed chair — illustrates the continuing crisis we face in training the next generation.
Steven Salzberg, director of bioinformatics at the Institute for Genomic Research:
We’re going to see a lot of people jumping on the human genome data and getting to work on it. It’s been appearing in pieces for several years, but we’re going to have a huge influx when the human genome sequence is released by Celera. That may be a galvanizing event that will get a lot of people jumping in to see what they can discover, because there’s undoubtedly a wealth of discoveries to be made there. That’s enough data to keep us all busy for a long time. I’m certainly planning to spend some time looking at it.
When there’s something this new, you expect that there’s going to be a lot of discoveries waiting for the first person to stumble over them. If you wait too long then you have to do a lot more work to make an interesting discovery. So we’ll probably have a lot of people working fast and furiously starting in February or whenever Celera’s actual release date is.
I’ve been looking at some of the human data already and it makes you realize very quickly that there’s a real need for better tools for looking at human data in the context of mouse data, for example. You come to understand a lot of this data in the context of related genomes. And those same tools would be applicable once we have other human genomes to look at the differences between two humans. There may be many people working on new tools as I speak but there’s not much you can go and download right now.
Steve Gardner, Viaken’s vice president and chief technical officer:
There will be a big shift to proteome-wide investigation of protein structure and protein function and, more specifically, the transcriptional/post-translational and other variants that make the proteome much more complex and interesting than the genome. One of my pet loves, the protein structure database, will be making a big comeback in 2001.
Gene expression analysis will begin to go mainstream with a variety of lower cost, high quality products and technologies hitting the market in the coming year.
There is likely to be continued consolidation among bioinformatics companies. The funding marketplace for technology companies has dived after the highs of early 2000. The big spenders will find it very difficult to maintain their current burn rates without a matching revenue stream as venture capital gets more demanding. Some of the bigger players who took advantage of the biotech share price rises in 2000 may be able to pick up some of the casualties.
Michael Liebman, director of computational biology at the University of Pennsylvania Medical School’s cancer institute:
From the science side, they’ll probably finally figure out how many genes there really are in the genome. They’ll find out how involved the mechanisms are that actually make all the real functional proteins. They’ll start to appreciate how complex that is, which they don’t fully appreciate yet.
In doing that, they’ll start to learn that function and structure are related, as everyone expected, but there are a lot of subtleties that they never really fully appreciated.
They’re going to start to find out that managing clinical data is a bigger problem than expected. It’s one that they’re going to have to face up to in order to get the value out. That will require new tools, new interactions, and new ways of dealing with patient confidentiality issues.
The economic impact of the new administration on the pharmaceutical industry may force it to start looking at the bigger picture of how it’s doing things and how that could be changed, because right now pharma doesn’t do it on its own.
There will always be new bioinformatics companies. The question will be whether they are doing anything new. It depends on how they confront business pressure versus competitive pressure versus trying to solve the real problems. It’s the same story that the problems are hard problems and the products that get sold are not solutions, they’re tools.
The cost of chips is going to go down a lot. The real focus isn’t going to be on doing the arrays anymore but on interpreting the data coming out of the arrays.