AT A GLANCE
Oversees a $400 million budget that supports research at universities, national laboratories, and private institutions.
Holds a PhD in Mechanical Engineering and Astronautical Science from Northwestern University.
Serves on the board of trustees of the Crohn’s and Colitis Foundation of America, Greater Washington, DC, chapter.
Q: Where will bioinformatics be in five years? Ten years?
A: In five years we’ll have robust tools for genome-scale analysis and comparisons, not just at the sequence level, but also for deriving information from gene expression, proteomic, and imaging technologies. Ten years from now we should be seeing dramatic increases in our ability to predict and design molecular and system-wide function.
Q: What are the biggest challenges bioinformatics must overcome?
A: There is a shortage of people with the mix of biological and computer science skills needed to be effective bioinformaticists. Training interdisciplinary scientists who combine knowledge of molecular biology and computational science is a major challenge, both in academia and elsewhere. In addition, both the Department of Energy laboratories and universities are having difficulty retaining qualified personnel, given the growing number of very high-paying industrial jobs. Finally, small academic groups do not have the size or range of expertise needed to build production-scale bioinformatics software and databases, so there is a risk that future advances will be limited to the private sector.
Q: What hardware do you use?
A: The DOE funds and administers a number of bioinformatics projects, including groups at Oak Ridge National Laboratory in Oak Ridge, Tenn., and the Joint Genome Institute in Walnut Creek, Calif. The JGI has a variety of Sun Microsystems computers, ranging from a pair of Enterprise compute servers (a 20-CPU E6500 and an 8-CPU E3500) to about a dozen smaller machines hosting various local databases. ORNL has similar in-house resources and is currently a major user of the laboratory’s 736-processor IBM SP3. In the future, the JGI is also planning to make use of massively parallel supercomputers at the nearby National Energy Research Scientific Computing Center.
Q: Which databases do you use? Public, proprietary, or third-party?
A: The DOE programs depend on a number of major public databases, including GenBank, dbEST, and other archives, as well as curated databases such as Swiss-Prot, KEGG for pathways, and Pfam for protein domain families. The DOE programs also develop and maintain public bioinformatics databases, such as ORNL’s Genome Channel, which provides views of human, mouse, and microbial genomes along with an analysis of their genes.
Q: What bioinformatics software do you use? Do you use in-house or third-party software?
A: At the sequencing level, we use a commercial base-caller from Cimarron and a parallel version of Phrap for assembly, as well as a variety of publicly available tools for viewing and editing sequence assemblies. For sequence assembly, we use a number of publicly developed methods and are developing advanced assembly software in-house. For gene finding, we use GrailEXP, developed at ORNL, as well as the third-party package Genscan for human sequences.
Q: How do you integrate your data?
A: Right now we do it mostly through relational databases, but we’re experimenting with a distributed approach along the lines of the XML-based system proposed by Lincoln Stein at Cold Spring Harbor Laboratory and Sean Eddy and Robin Dowell of Washington University in St. Louis. We’re also experimenting with a kind of federated database in which each data type is stored in its own relational system.
Q: What nonexistent technology do you wish you had? What’s lacking in the bioinformatics toolbox?
A: The most important thing on our wish list is a greater supply of experienced bioinformaticists, who will be the key to creating the bioinformatics technologies of the future. With the completion of the draft human genome sequence, an incredible resource now exists for biologists to begin asking questions about the organization, structures, functions, interactions, and mechanisms of living systems at a level of detail never before possible.