Green, at Work Recalibrating Phred, Tells Computer Scientists to Consider Biology


SEATTLE--Phil Green and his base-calling program Phred, both already well known throughout the genome sequencing community, got some general public exposure earlier this year when a New York Times headline asked, "Who'll Sequence the Human Genome First? It's Up to Phred."

Although some researchers who were quoted in the article criticized the program for producing short read lengths, Green is quick to defend it. Phred, he said, was developed to assign quality values to the base identifications deduced by sequencing machines. "People are using Phred. It needs a little bit of recalibration. The base calling is actually fine," he said, adding that he is at work fine-tuning the program now.
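For readers unfamiliar with the quality values Green mentions, the sketch below shows the widely used Phred-scale convention, in which a quality value Q corresponds to an estimated base-call error probability P through Q = -10 log10(P). The function names are hypothetical and chosen for illustration only; this is not code from Phred itself, and the article does not describe how the program estimates P.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert an estimated base-call error probability to a Phred-scale quality value.

    Uses the standard convention Q = -10 * log10(P).
    (Hypothetical helper for illustration; not part of Phred.)
    """
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in the interval (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Invert the conversion: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Print the quality value for a few example error probabilities.
    for p in (0.1, 0.01, 0.001):
        print(f"P(error) = {p} gives Q = {phred_quality(p):.1f}")
```

Under this convention, each increase of 10 in the quality value corresponds to a tenfold drop in the estimated chance that the base call is wrong.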

BioInform met recently with Green at his University of Washington office to discuss his work on Phred as well as his thoughts on the future of bioinformatics and the Human Genome Project.

Green holds a mathematics degree from Harvard, has conducted research and taught at Columbia University and the University of North Carolina at Chapel Hill, and spent three years as a senior scientist in the human genetics department of the Massachusetts-based company Collaborative Research before landing at the University of Washington in 1992.

BioInform: What are you working on at the moment?

Green: I'm most interested in biological content. Over the next few years or decades the focus is going to be on interpreting the sequence. There's a lot of interesting questions there because of the amount of sequencing data coming down the road. There's a lot of analysis to be done. I see that as where the future lies.

We've been focusing recently on making use of expressed sequence tag information--partial gene sequences. One of the things we're doing is trying to assemble data and get rid of the redundancy.

The next question that we're working on is determining how these molecules interact with each other. I call that figuring out the wiring diagram. I'm interested in determining how the proteins interact with the DNA and finding these regulatory sites. We'll be comparing sequences from different genomes to do this. We're also studying protein evolution to understand the process of evolution. It's going to take the combined efforts of a lot of computational researchers to do this.

BioInform: How can bioinformatics companies help you achieve those goals?

Green: The field gets best served by multiple investigators trying to come up with their own approaches to research questions. My experience with bioinformatics companies is that they tend to focus on things like developing databases, developing user interfaces that allow you to bring together the output from various analysis programs. But most of those analysis programs have been developed in academia. I would like to see companies putting more effort into extracting the biological information. It requires people to combine a solid background in biology with solid skills in developing models as well as software development. It's hard to find those people.

BioInform: Is this an area where you think bioinformatics companies are failing?

Green: The purpose of bioinformatics is to advance biology. I think a lot of people come into the field with quantitative skills, or thinking that their background in mathematics or computer sciences is enough. That's completely the wrong attitude. They need to be guided more by the biological questions.

BioInform: What do you see as some of the next major computational or bioinformatics challenges?

Green: For genomics research it's doing a reliable job of finding all the genes. One popular area to work in at the moment has to do with gene expression arrays. They need to be well integrated with work on sequence analysis.

BioInform: You opted to sell your own software through the university rather than launch your own company. Why?

Green: We do have companies that distribute our software, so it's not all distributed through the university. But as to why I didn't start my own company, well, it's a lot of work and I'm an academic at heart. I'm not sure I have the right business skills. I like being at a university and interacting with people from a variety of disciplines.

BioInform: If we could look at bioinformatics in the next century, what would we see?

Green: What I see next century, in terms of the goal, is to understand cells and living organisms as complex systems of interacting molecules. We're still at the early stages of getting a list of what those molecules are. Once we understand what those interactions are, we still have to quantitatively model them, and that's going to take a long time.

BioInform: Once we get to the point of understanding exactly how living organisms work, then what?

Green: Then we will be able to do all sorts of things, like manipulate systems to correct defects and improve on evolution, prevent things like cancer. It's not out of the question that we will be able to prolong our lifetime. We might be able to make ourselves more intelligent, to make the brain's complex system of nerves more reliable. It's a little scary, it's like playing God, but it's how we're going to progress. It's how we're going to take control of our bodies.

--Amy Nevala
