AT A GLANCE: PhD in biology with minors in genetics and statistics from the University of Maryland, College Park. Prior to joining Bayer, headed the genomic informatics group at the University of Maryland, where he managed the informatics area of the USDA’s Plant Genome Program. Enjoys sailing, hiking, and sports of almost any variety.
Q: Where will bioinformatics be in two years? Five years?
A: Given the radical changes over the last five years, it would be foolish to make any precise predictions. That said, I think there will be a couple of trends.
First, there will be refinement of existing tools, but I don’t see any major breakthroughs in at least the next couple of years. Second, the focus will be less on sequence analysis and more on functional analysis. For the pharma world, the focus is shifting from finding new genes to validating genomic targets as drug targets.
Q: What are the biggest challenges bioinformatics must overcome?
A: Right now, it’s the ability to integrate a large amount of diverse data. Biological data is notoriously “messy” to deal with, but it is imperative that we squeeze out the most knowledge possible. It’s going to be difficult to completely automate the process of connecting information in a meaningful way, so we must do a better job of presenting the data to the experts so that they can make the scientific leaps. Furthermore, we must make sure that once these leaps are made, they are captured so that others can build upon them.
Q: What hardware do you use?
A: We use SGI equipment for servers. The general user community has Windows machines that act as clients. We also have a Paracel Genematcher and Compugen hardware for accelerating sequence searching.
Q: Which databases do you use?
A: We have a wide variety of data sources that we can tap into. Certainly, we have all the large public databases in house. We also generate data internally via sequencing and microarrays as well as other expression profiling technologies. We have access to large amounts of proprietary data through our collaborations with Millennium, Curagen, and Myriad. A number of smaller partnerships are also sources of proprietary data.
Q: What bioinformatics software do you use?
A: We licensed LifeTools from Incyte three years ago as a base platform. We also have all of the common tools such as GCG, Sequencher, etc., and we have access to Lion’s tools via our partnership with the Lion Bioinformatics Research Institute. We have a number of commercial tools for analyzing array data, including Spotfire and GeneSpring. However, there has been a substantial amount of in-house development. Generally these efforts have been focused on building custom analysis pipelines and data visualization. Most of our work has been done in Java and Perl.
Q: How large is your bioinformatics staff? Is the company hiring additional bioinformatics staff?
A: Our in-house staff of bioinformaticists is relatively small right now, but we are hiring additional staff. We rely on our partners to do a good portion of the work. I believe bioinformatics should be pushed as far down to the bench level as possible. It’s the scientists at the bench who are best equipped to understand the meaning of the data. Therefore, the number of people actually doing bioinformatics, at least in part, is quite large.
Q: What non-existing technology do you most wish you had?
A: I wish that we had the ability to place data from expression experiments into some kind of metabolic context. We are a long way from doing a good job with this. I would also like proteomics technology that allows us to get at rare proteins.
Q: What projects are you working on now?
A: Much of what we are doing is related to our collaborations with our research partners: Millennium is providing cancer targets, Curagen is providing obesity and diabetes targets, and LBRI is doing data mining with us. We are also focusing much more on target validation for in-house projects.