AT A GLANCE
BS in biochemistry from the University of Wisconsin. PhD from the University of Washington, where he studied tissue-specific gene regulation using transgenic mice.
From 1993 to April 1996, served as an NIH postdoctoral fellow at the Fred Hutchinson Cancer Center in Seattle, where he studied gene trapping and gene targeting technology.
Joined Lexicon as senior scientist in 1996; served as vice president of research until promotion to senior vice president of genomics in 2000.
Q: What role does bioinformatics play at Lexicon?
A: Bioinformatics has two big components at Lexicon. One is sequence-based bioinformatics and the other is function-based. We use sequence-based bioinformatics for our gene trapping process, for storing and analyzing the sequences of the genes we have mutated in mouse embryonic stem cells, and for collecting and analyzing the data we get from our gene discovery program using gene trapping in human cells. Sequence-based bioinformatics is focused on mining out what we see as the most valuable genes in the genome.
Our goal internally is to look at the in vivo function of 5,000 genes over the next five years using knockout mouse technology. We call it our Genome5000 program and we are choosing these 5,000 genes based on their potential as therapeutic proteins themselves, as targets for antibody-based therapeutics, or because they belong to the so-called “druggable” gene families. All companies now have the same gene sequence information. What will determine who brings value to that information is which companies can most rapidly and efficiently identify the best targets for drug discovery and move forward.
Once we choose the relevant genes we put them through our genetics pipeline to understand function and identify the best targets. We remove each gene one at a time from the mouse and study the resulting mice to define gene function. After making mutations in the potential targets we put the animals through a large number of pathophysiological screens. These include blood chemistries, CAT scans, MRI, and other medical tests that feed data directly into the LexVision database, which provides an interface for us and our partners to look at the data and identify the best targets.
A lot of companies are dealing with sequence-based bioinformatics. I think the function-based bioinformatics is fairly unique for us, especially since it’s in the context of mammalian physiology.
Q: What kind of bioinformatics software do you use at Lexicon?
A: We’ve developed all our bioinformatics software internally: our database and the web-based interactive software. We do use BLAST and some other publicly available software. Over the years we looked at a lot of software developed by third parties, but usually it would take a lot of work to adjust commercially available software to our exact needs.
Q: How large is the bioinformatics staff at Lexicon?
A: Approximately 40 people. About half are in sequence-based bioinformatics and the other half are in function-based bioinformatics.
Q: Aside from LexVision, do you have access to other public or private databases?
A: We use public databases for mining the genome and we also have access to Incyte Gold. In addition to that we have our OmniBank database of gene traps and sequences in mouse embryonic stem cells. Currently OmniBank contains over 200,000 clones and corresponding sequences. These sequences group into over 39,000 non-overlapping sequence clusters and bioinformatics tells us that we have mutations in a little over half of all genes, close to 53 percent.
We also did a human gene trapping project for gene discovery purposes. We took in over 550,000 human sequence tags clustered into over 51,000 non-overlapping sequence clusters. At the time we did that — around three years ago — about 45 percent of those sequence clusters were novel compared to the public databases and that’s driven our program for full-length sequencing of human genes and patenting those genes. So we have several hundred of what we think are the best human genes out of that.
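As a rough back-of-the-envelope reading of the OmniBank and human gene trapping figures quoted in this answer, the short Python sketch below works out average cluster sizes and the approximate number of novel human clusters. It treats the rounded lower bounds from the interview ("over 200,000", "over 39,000", and so on) as exact, so the results are only approximate. The variable names are illustrative and are not Lexicon's.

```python
# Rounded lower-bound figures quoted in the interview.
# Assumption: treated as exact values purely for illustration.
omnibank_clones = 200_000
omnibank_clusters = 39_000
human_tags = 550_000
human_clusters = 51_000
novel_fraction = 0.45  # share of human clusters reported as novel

# Average number of sequences that fall into each non-overlapping cluster.
clones_per_cluster = omnibank_clones / omnibank_clusters
tags_per_cluster = human_tags / human_clusters

# Approximate count of human clusters with no match in public databases.
novel_clusters = round(human_clusters * novel_fraction)

print(f"OmniBank: about {clones_per_cluster:.1f} clones per cluster")
print(f"Human tags: about {tags_per_cluster:.1f} tags per cluster")
print(f"Novel human clusters: about {novel_clusters:,}")
```

On these rounded inputs, each OmniBank cluster averages roughly five clones, each human cluster roughly eleven tags, and about 23,000 human clusters would have been novel at the time.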
Q: So you use the public sequence data for the mouse as well?
A: The public mouse data has 95 percent coverage and they now have the map of the mouse genome on Ensembl. The human and mouse genomes have been very useful both for our gene mining processes and for designing the mutations for studying gene function.
Q: What microarrays do you use?
A: We use Affymetrix arrays for two purposes. One is for prioritizing genes going into our genetics pipeline. However, we do not like to rule things out based on expression because we’ve found that expression is often a very poor indicator of function.
The problem is that a lot of people are trying to use expression alone as a target validation tool and it’s not very good for that. We can see hundreds or thousands of genes that change in expression in a particular disease model, for instance, and yet how do you choose which of those hundreds or thousands is really causative or most important? That data is just suggestive and it’s a long way from that to a truly validated target.
Also, once we have an interesting phenotype, we’ll look at those animals and see if there are significant changes in the levels of expression of any genes that could give us further indications from a molecular standpoint for understanding the biology of the targets.
Q: How do you integrate your data?
A: Our sequence-based bioinformatics is all linked to our function-based bioinformatics and all of that is also hyperlinked to everything publicly available that will bring additional value.
For instance, if we’re working on the function of a target then we’re linked in with our sequence-based bioinformatics that tells us everything there is to know about that target — nucleotide sequence, protein sequence, map position in the human genome, OMIM disease relevance for that region of the genome, PubMed articles related to that gene or gene family.
Data integration is critical. We try to tie in everything and have it right at the fingertips of our researchers.
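To make the idea of linking sequence-based and function-based data concrete, here is a minimal Python sketch of one way such a combined view could be assembled. It is not a description of LexVision or any Lexicon system. The class names, field names, the `integrate` helper, and all sample values are hypothetical placeholders chosen to mirror the kinds of data mentioned in the answer.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """Sequence-side information about a target (hypothetical layout)."""
    gene_id: str
    nucleotide_sequence: str
    protein_sequence: str
    map_position: str
    omim_ids: list[str] = field(default_factory=list)
    pubmed_ids: list[str] = field(default_factory=list)


@dataclass
class FunctionResult:
    """One function-side observation from a screen (hypothetical layout)."""
    gene_id: str
    screen: str
    finding: str


def integrate(gene_id, sequences, results):
    """Return one combined view of a target, or None if the gene is unknown."""
    seq = sequences.get(gene_id)
    if seq is None:
        return None
    linked = [r for r in results if r.gene_id == gene_id]
    return {"sequence": seq, "function": linked}


# Placeholder data only; none of these values come from a real database.
sequences = {
    "GENE_A": SequenceRecord("GENE_A", "ATG...", "M...", "placeholder position"),
}
results = [
    FunctionResult("GENE_A", "blood chemistry", "placeholder finding"),
    FunctionResult("GENE_A", "MRI", "placeholder finding"),
    FunctionResult("GENE_B", "CAT scan", "placeholder finding"),
]

view = integrate("GENE_A", sequences, results)
print(view["sequence"].map_position, len(view["function"]))
```

Keying both sides on a shared gene identifier is the simplest way to let a researcher start from either a sequence or a phenotype and reach the other; a production system would more likely do this join in a database.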
Q: What projects are you working on now?
A: We’re pretty busy mining the genome for the best gene families. We’re continually working with the function information and I think that’s really new. We’re trying to bring together information on mammalian physiology for each of the targets. We do all sorts of tests to determine function, but each test needs to be interpreted in the context of all the other tests we do.
One of the things we’ve been working hard on is being able to have user profiles to automatically pull up the different types of data that would be relevant to a given area of disease biology and allow one to make conclusions about the quality of a target. We’re working to not only show and analyze all the data on function but to be able to interpret it.
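The user-profile idea described above can be sketched in a few lines of Python. This is an illustrative assumption about how such a filter might work, not Lexicon's implementation: the `PROFILES` mapping, the disease-area names, the `select_for_profile` function, and the sample records are all hypothetical. Only the screen names (blood chemistry, CAT scan, MRI) are taken from the interview.

```python
# Hypothetical mapping from a disease-area profile to the screens
# considered relevant for it. The pairings are invented for illustration.
PROFILES = {
    "metabolism": {"blood chemistry"},
    "oncology": {"CAT scan", "MRI"},
}


def select_for_profile(profile, results):
    """Keep only the results whose screen is relevant to the given profile."""
    wanted = PROFILES.get(profile, set())
    return [r for r in results if r["screen"] in wanted]


# Placeholder records; not real data.
all_results = [
    {"gene": "GENE_A", "screen": "blood chemistry", "finding": "placeholder"},
    {"gene": "GENE_A", "screen": "MRI", "finding": "placeholder"},
    {"gene": "GENE_B", "screen": "CAT scan", "finding": "placeholder"},
]

oncology_view = select_for_profile("oncology", all_results)
for record in oncology_view:
    print(record["gene"], record["screen"])
```

An unknown profile name simply yields an empty list here; a real system would probably report that case to the user rather than silently showing nothing.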
We also recently acquired a chemistry company, formerly Coelacanth Corporation and now Lexicon Pharmaceuticals. This addition brings us compound libraries and medicinal chemistry, so cheminformatics is very important to us. We’re near the completion of integrating our technology with theirs and expect to have high-throughput screens running on some of our favorite targets within the next couple of months.