AT A GLANCE
Prior to joining Beyond Genomics, served as vice president of life science informatics at Third Millennium.
Founder and organizer of the international BioPathways Consortium and an advisory member of the I3C.
Q What is Beyond Genomics' strategy for tackling the challenges of systems biology?
A It is a very tight combination of two parts. The first is the analog form of systems biology: getting all these technology platforms in place, including the proteomics technology, the gene expression microarray facilities, the metabolomics, and future technologies as they appear. The second is plugging that seamlessly into the digital systems biology part, which takes all the data from these different pipes, mixes it together, finds intriguing evidence and patterns using sophisticated software, and mines those leads. These are not necessarily drug leads, but systematic leads to novel mechanisms and pathways. From that we do systems modeling. It is not a simulation approach; we actually work with the model at a particular level to look for additional experiments that mine the next piece, and the next piece, so the story unfolds with unique pieces of evidence.
Q Where does bioinformatics fit into this process?
A Bioinformatics can be interpreted different ways, but typically it's been applied to the analysis of sequence and structural information related to genes and proteins. In biosystematics one is not just identifying a structure or trying to infer a function from that structure; one is trying to understand the process.
Q What do you mean by biosystematics? How does it differ from bioinformatics?
A Biosystematics is applying bioinformatics plus a lot of other tools that typically are not associated with bioinformatics. Those include advanced correlative technologies and powerful statistical approaches, some that we're developing here and some that have moved over from other industries into our area. That combination of bioinformatics with statistical and numerical pieces, and then working at the knowledge level with the models associated with it, really goes outside the box of bioinformatics.
Q What approaches are you taking to deal with the complexity of the data in systems biology?
A We're not going out and trying to find every single peptide in each tissue and identify what it is just by sequence. We're trying to look at the mechanisms and functions of these entities. This occurs over several steps. Say you're focusing on metabolomics and you're looking at a thousand or so signatures, both normal and diseased. Basically, you do statistical analysis to ask, "Which of these things has changed?" Some of the same tools that have been applied to finance and large-scale economics are applied here. Out of those thousand things you can really reduce the dimension. The complexity can be squeezed down to something more manageable, so there's a big data reduction piece.
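The interview does not say which statistics are used for the "which of these things has changed" step. As one simple illustration, the sketch below scores each signature with Welch's t-statistic (normal vs. diseased samples) and keeps only those above a threshold, which is a basic form of dimension reduction by feature selection. The function names, the threshold of 3.0, and the toy data are hypothetical; this is not Beyond Genomics' method.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    diff = mean(b) - mean(a)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups constant: no change, or an infinitely sharp one.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def changed_features(normal, diseased, threshold=3.0):
    """Return indices of features whose |t| exceeds the (hypothetical) threshold.

    normal, diseased: lists of samples; each sample is a list of
    feature intensities measured in the same order.
    """
    kept = []
    for j in range(len(normal[0])):
        a = [sample[j] for sample in normal]
        b = [sample[j] for sample in diseased]
        if abs(welch_t(a, b)) > threshold:
            kept.append(j)
    return kept


# Toy data: 4 samples x 4 signatures; only signature 1 shifts (about 5 -> 9).
normal = [[1.0, 5.0, 2.0, 7.0], [1.2, 5.1, 2.1, 6.8],
          [0.9, 4.9, 1.9, 7.1], [1.1, 5.0, 2.0, 7.0]]
diseased = [[1.0, 9.0, 2.0, 7.1], [1.1, 9.2, 2.1, 6.9],
            [1.0, 8.9, 1.9, 7.0], [1.2, 9.1, 2.0, 7.0]]
print(changed_features(normal, diseased))  # [1]
```

With a thousand real signatures one would also correct for multiple testing before trusting the reduced set; that step is omitted here for brevity.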
Q How do you validate your results after this dimensional reduction?
A That's why, in addition to the analysis of all this data, we include a modeling component. It balances the analytics with synthetics. If we have it right, the model should be something we can test and validate. You can take the models, make predictions, and compare them to the evidence you have. Or you can analyze those models against each other at an informatics level and come up with the experiment you need to do to determine which model is the right one.
The entire analog side of systems biology needs to be coupled up front with the knowledge that's being derived from the digital side. It's a whole feedback loop.
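The two validation routes described above, scoring models against existing evidence and choosing an experiment that tells competing models apart, can be sketched as follows. This is a minimal illustration under assumptions not stated in the interview: each model is a hypothetical one-input function, fit is measured by sum of squared residuals, and the "best" next experiment is simply the candidate condition where the models' predictions spread the most.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]


def sse(model: Model, evidence: List[Tuple[float, float]]) -> float:
    """Sum of squared residuals between model predictions and observations."""
    return sum((model(x) - y) ** 2 for x, y in evidence)


def rank_models(models: Dict[str, Model], evidence) -> List[str]:
    """Model names ordered from best to worst fit to the current evidence."""
    return sorted(models, key=lambda name: sse(models[name], evidence))


def most_discriminating(models: Dict[str, Model], candidates: List[float]) -> float:
    """Candidate condition where the models' predictions disagree the most."""
    def spread(x: float) -> float:
        preds = [m(x) for m in models.values()]
        return max(preds) - min(preds)
    return max(candidates, key=spread)


# Hypothetical competing models of a response vs. input level.
models = {
    "linear": lambda x: 2.0 * x,
    "saturating": lambda x: 4.0 * x / (1.0 + x),
}
evidence = [(0.5, 1.3), (1.0, 2.0)]  # illustrative observations only
print(rank_models(models, evidence))                  # ['saturating', 'linear']
print(most_discriminating(models, [1.0, 2.0, 4.0]))   # 4.0
```

At input 1.0 both models predict the same value, so that experiment cannot separate them; at 4.0 they differ most, which is the kind of experiment the answer describes looking for.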
Q Do bioinformatics tools already exist to handle the variety of data you're dealing with, or are you developing most of it in house?
A We've done a fair amount on our own, but our tools are based on algorithms from mathematics that have been around for a while. We've optimized them to work with mass spec and NMR data. It's not as if we're doing this from nothing. There's a vast amount of knowledge in this space.
Some things will require new developments of software in conjunction with specialized hardware. We need to do something analogous to assembling shotgun sequence data with patterns and correlations. The key is that we began work on a method of going from correlations to causal mechanisms. It will eventually become a heavy compute issue.
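The interview does not describe how correlations are turned into mechanisms. A common first step, sketched here under that caveat, is to compute pairwise Pearson correlations between measured entities across samples and group strongly correlated ones into candidate modules (here with a union-find over edges where |r| meets a hypothetical threshold of 0.9). Such modules are only correlational leads; establishing causal mechanisms requires the further modeling and experiments described above. Entity names and data are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def correlation_modules(profiles, threshold=0.9):
    """Group entities whose profiles have |r| >= threshold (connected components).

    profiles: dict mapping entity name -> measurements across the same samples.
    """
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


profiles = {"m1": [1, 2, 3, 4], "m2": [2, 4, 6, 8],
            "m3": [4, 3, 2, 1], "m4": [1, 3, 1, 3]}
print(correlation_modules(profiles))  # [['m1', 'm2', 'm3'], ['m4']]
```

Absolute correlation is used so that anti-correlated entities (m3 against m1) land in the same module; whether that is appropriate depends on the biology being studied.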