AT A GLANCE: Holds a PhD in cell and molecular biology from the University of California, San Francisco. Did postdoctoral studies at the European Molecular Biology Laboratory and Genentech. Interests include volleyball, Baroque music, and skiing.
Q Where will bioinformatics be in two years? Five years?
A High-throughput experimental methods are evolving rapidly in many areas of biology, including differential gene expression, cell-based assays, and proteomics. Integrating these scientific disciplines will require bioinformatic processing, and this trend will continue beyond the next five years.
Bioinformatic scientists will compile and filter this information to elucidate networks of gene and protein interactions and cellular mechanisms of disease. Computer-simulated systems will become useful over the next 10 years: simulated metabolism, ADME (absorption, distribution, metabolism, and excretion), simulated cells and cell networks, and perhaps computer simulations of humans.
Q What are the biggest challenges bioinformatics must overcome?
A The biggest challenges at present are determining gene functions and interactions. Currently, sequence searching methods, including BLAST, motif matching, and various sequence-based algorithms, predict function for approximately half of the known genes.
At Hyseq, we are also assigning function to additional groups of genes using high-throughput structure searching methods, that is, three-dimensional structural modeling of amino acid sequences corresponding to known gene sequences. Evolutionarily, protein structure and function are more conserved than sequence, and therefore protein structure, especially of the active site, may be more useful than sequence for predicting gene function.
The Structural Genomics Initiative expects to have completed protein structures for 10,000 genes within five years, so that all others may be modeled from this initial set. Bioinformatic scientists will continue to assign functions to novel genes by integrating information from 3D protein structure, gene expression experiments, high-throughput cell-based assays, and new datasets, for example of proteins found in serum and spinal fluid. This integrated information is useful for developing assays for further wet laboratory experiments, so it is essential that bioinformatic scientists work closely with research and development biologists.
Q What hardware do you use?
A Linux, Sun Microsystems, and SGI. We use predominantly Linux boxes for rapid, high-performance computational solutions.
Q Which databases do you use?
A Currently at Hyseq, we utilize the public databases and our proprietary databases of sequences detected using our hybridization and clustering technologies.
Q What bioinformatics software do you use?
A Hyseq uses in-house software for data analysis and annotation of sequences. We also use software from some outside sources, including Molecular Simulations for high-throughput protein structure modeling.
Q How is your bioinformatics unit organized within the framework of the company?
A Currently, Hyseq’s bioinformatics division encompasses expertise in many areas, including analysis of gene expression data, development of novel predictive algorithms, protein structure/function and modeling, statistics, mathematics, molecular biology, software development, database development, and image analysis. We are always looking for exceptional computational biologists to join Hyseq’s bioinformatics division.
Q What non-existing technology do you most wish you had?
A Software for prediction and analysis of multiple protein-protein and gene interactions.
Q What projects are you working on now?
A We are working closely with the data analysis group to utilize Hyseq’s extensive gene expression data, and we are modeling 3D protein structures and assigning functions for novel genes from our database.