AT A GLANCE: PhD in computational molecular biology from Edinburgh University. Joined OGS in 1999 from Glaxo Wellcome, now GlaxoSmithKline, where he was head of the advanced technology and informatics department. Enjoys cooking and listening to music.
Q: Where will bioinformatics be in two years? Five years?
A: I think all the action in bioinformatics over the next couple of years will be in finding which genes produce which proteins. There are maybe between 30,000 and 60,000 genes, but there may be as many as a million proteins, so the informatics efforts around that sort of thing are going to be very exciting.
I think within two years we will know most of the important proteins as far as the pharmaceutical industry is concerned. I would say that genomics and DNA expression chips will essentially be over because those techniques are really pointless when you have direct access to the proteins. So my prediction for bioinformatics over the next two years is that in essence it’s going to be support for proteomics.
If you go out to five years, I think all the action is going to be around building larger structures out of the information we have about proteins.
Q: What are the biggest challenges bioinformatics must overcome?
A: There’s hardly any kinetic data about substrates binding to proteins or proteins binding to each other. What’s available in the literature is also generally unusable because the data was collected under different circumstances, so I think the real challenge is actually still a biology one, which is developing techniques that can generate sufficient data.
Mathematical modeling of pathways is still a purely academic dispute because of the lack of data. It’s a bit like trying to do sequence analysis 20 years ago, when there were only one million bases of DNA in the EMBL and GenBank databases and the techniques were way ahead of the data. In the case of sequence analysis, it’s the other way around now, of course.
Q: Which databases do you use?
A: We use public domain databases, we subscribe, and we also make our own. We supply proteomics databases to our collaborators that integrate genomics and genetics and cDNA databases into our proteomics databases.
Q: What bioinformatics software do you use?
A: We buy whenever possible, and develop when it’s not available. We have a program called PC Rosetta that we use with our customers in order to deploy proteomics data.
Q: How do you integrate your data?
A: We have very active software development to integrate data and we use data warehousing, data federation, and data marts. We have relational databases with almost a billion rows in tables.
Q: How large is your bioinformatics staff?
A: Almost 50. There are about 10 doing infrastructure support and about 30 doing software development. The others do a variety of things: statistics, project management, production support, database administration, things like that.
Q: How is your bioinformatics unit organized within the framework of the company?
A: I’m responsible for the entire IT department. Because we’re a proteomics company, IT and bioinformatics are the same thing. It’s very tightly integrated. We think that’s essential.
Q: What projects are you working on now?
A: We’ve just commissioned a first-generation high-throughput proteomics factory, and we’re establishing relationships to start setting up the second generation, based on ICAT and TOF-TOF. We’re working on the informatics for that.
Q: What made you decide to enter a career in bioinformatics?
A: I had always been interested in computing and did a degree in biochemistry in 1977. At about that time molecular biology was really taking off, and a lot of my friends had sequences that they couldn’t analyze, so I started to write some simple computer programs in BASIC to analyze the sequences. The rest, as they say, is history.