Q Where will bioinformatics be in two years? Five years?
A For the next two or three years, I see bioinformatics still coping with the data overload problem. After that, the exponential increase in data we now see should level off and the problems of data mining, visualization, and biological analysis will dominate our efforts.
Q What are the biggest challenges bioinformatics must overcome?
A We in bioinformatics have focused on the computational aspects of the field, while ignoring the basic problem at hand – biology. As a scientist at a pharmaceutical company, I am concerned that in years to come there will be a shortage of trained biologists who wish to tackle biological problems using informatics tools. In many ways I see it happening right now. We try to develop bioinformatics scientists through mentoring programs within the laboratories, fostering relationships with academic institutions, and training young scientists in our intern programs.
Q What hardware do you use?
A Merck supports five bioinformatics data centers worldwide (Rahway, NJ; West Point, Penn.; Montreal; Terlings Park, UK; and Tsukuba, Japan). Additional facilities will open soon at our new research laboratories in San Diego and Boston. Silicon Graphics Origin 2000 systems have been our major computational engines, although we are also investing heavily in Compaq Alpha systems and have installed several Compugen Bioccelerator and Paracel systems for advanced homology searching.
Q Which databases do you use?
A We currently mirror 125 different public-domain databases daily. In general, we do not allow scientists at Merck to query outside resources with Merck proprietary data. We have key associations with several companies, including Derwent and Proteome, which provide additional data. We also generate our own data at all of our research facilities in a variety of areas, including sequence, expression, and proteomics.
Q What bioinformatics software do you use? Do you use in-house developed or third party software?
A Very early on we developed an internal platform for bioinformatics application support and deployment within the laboratories. The system was initially X-Windows-based but has been re-engineered and is now fully Web-enabled. There are currently 12 production servers worldwide, and more than 1,000 scientists use our systems each day. We also license several third-party applications, including InforMax’s Vector NTI and products from Genetics Computer Group.
Q How do you integrate your data?
A We provide several different forms of the data since most commonly used applications require different input data types. For sequence data, native formats of the data are perfectly acceptable. However, to fully integrate all “informatics” data into drug discovery research, we ensure all data goes into our Oracle-based BFXdb relational database system. We have adopted an XML information-flow strategy within the group that allows us to separate information content from display and easily store data relationally.
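As a rough illustration of the idea of separating information content from display while storing data relationally, here is a minimal Python sketch. It is not Merck's BFXdb system: the XML element names, the table layout, and the helper functions (`make_db`, `store_record`, `render_text`) are all hypothetical, and an in-memory SQLite database stands in for Oracle.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML record: element and attribute names are illustrative only.
RECORD_XML = """
<sequence id="SEQ001" type="protein">
  <name>example kinase</name>
  <residues>MKTAYIAKQR</residues>
  <annotation source="in-house">putative ATP-binding site</annotation>
  <annotation source="public">kinase domain</annotation>
</sequence>
"""


def make_db():
    """Create an in-memory SQLite database (a stand-in for Oracle)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence ("
        "id TEXT PRIMARY KEY, type TEXT, name TEXT, residues TEXT)"
    )
    conn.execute(
        "CREATE TABLE annotation ("
        "sequence_id TEXT REFERENCES sequence(id), source TEXT, note TEXT)"
    )
    return conn


def store_record(conn, xml_text):
    """Content side: parse the XML and store it as relational rows."""
    root = ET.fromstring(xml_text)
    seq_id = root.get("id")
    conn.execute(
        "INSERT INTO sequence (id, type, name, residues) VALUES (?, ?, ?, ?)",
        (seq_id, root.get("type"), root.findtext("name"),
         root.findtext("residues")),
    )
    conn.executemany(
        "INSERT INTO annotation (sequence_id, source, note) VALUES (?, ?, ?)",
        [(seq_id, a.get("source"), a.text) for a in root.findall("annotation")],
    )
    return seq_id


def render_text(conn, seq_id):
    """Display side: one possible view, built only from the stored rows."""
    name, residues = conn.execute(
        "SELECT name, residues FROM sequence WHERE id = ?", (seq_id,)
    ).fetchone()
    lines = [f"{seq_id}: {name} ({len(residues)} residues)"]
    rows = conn.execute(
        "SELECT source, note FROM annotation "
        "WHERE sequence_id = ? ORDER BY rowid",
        (seq_id,),
    )
    for source, note in rows:
        lines.append(f"  [{source}] {note}")
    return "\n".join(lines)


if __name__ == "__main__":
    db = make_db()
    print(render_text(db, store_record(db, RECORD_XML)))
```

Because the display function reads only from the tables, a Web page or a different report format could be added later without touching the XML parsing or the stored data.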
Q How large is your bioinformatics staff? Is the organization hiring additional bioinformatics staff?
A We are always looking for new scientists to join our team here at Merck. Our group currently has 20 individuals worldwide, with a wide variety of skills, including computer science, mathematics, molecular biology, evolutionary biology, and statistics.
Q What non-existing technology do you most wish you had? What’s lacking in the bioinformatics toolbox?
A Much is yet to be done to further the integration of genome annotation, high-throughput screening, SNPs, and pathway information. We are looking into both data warehousing and database federation to see which will provide the best solution.