BioArray Q&A: Andres Metspalu on the Future of the Estonian Genome Project


By Justin Petrone

Name: Andres Metspalu

Titles: Director, Estonian Genome Project; head, department of biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Estonia; senior scientist, biology work group, Estonian Biocenter, Tartu

Education: 1982 — postdoc, Yale University; 1981-82 — postdoc, Columbia University; 1979 — PhD, molecular biology, Institute of Molecular Genetics, Ukrainian Academy of Sciences, Kiev; 1976 — MD, Tartu University, Estonia

TALLINN, Estonia — For seven years, the Estonian Genome Project has been collecting blood samples and phenotypic information on the population of this small northern European country.

Its goal is to use the resulting tens of thousands of samples in association studies, and to eventually use them in personalized medicine.

Originally a private-public endeavor, the Estonian Genome Project was reorganized in 2004 into a venture overseen by the University of Tartu and supported both by the state and the European Union.

Since then, the project has been led by Andres Metspalu, who also heads the department of biotechnology at the Institute of Molecular and Cell Biology at the university and chairs the scientific advisory board of Asper Biotech, a Tartu-based genotyping company.

In recent years, Metspalu has been involved in a research project to create a comprehensive genetic map of individuals in Europe that will allow scientists to take regional differences in populations into account when doing inter-population genetic studies.

He presented data on this work at the Human Genome Variation and Complex Genome Analysis meeting, held here over the weekend, which he also helped to organize.

Beyond studying European genetic structural variation, though, Metspalu envisions using the samples collected so far to better inform healthcare decisions in Estonia, and sees increasing opportunities to take part in drug-development activities.

BioArray News spoke with Metspalu during the conference to learn more about these future activities. Below is an edited transcript of that interview.

You wear many different hats. What do you spend most of your time doing?

Most of my time is spent at the Institute of Molecular and Cell Biology, where I am a professor. That's where all of my students are. My second position is as director of the Estonian Genome Project. In the future, I think I will have to fuse everything. My roles at the lab and the biobank will have to become one thing. Once we move everything to our new building on campus, then I will move my lab, and I will be able to transfer my research projects from the institute to the biobank. It will be just one position.

What is the status of the Estonian Genome Project?

It's going very well. We invested quite a bit in public relations and set up everything. Now, even when the government reduced its budget for this year because of economic problems, the people are still coming and the project is still popular. Of course, now we have more demand than we actually have money to pay for it. It used to be the opposite. Now we have enough gene donors and money is the limiting factor. By the end of the year we will have collected 40,000 samples, which is a good size. We plan to have 50,000 samples collected by the end of next year. Then we will stop the massive collection of data to really focus on the analysis. We also have to do a follow up, because the first samples were collected in 2002. It's time to see how their health has changed.

What is the point of the biobank? Is it just to have a ready database of samples for scientific projects?

There are three main aims. It is a huge and expensive project and it can't be that we are only satisfying a small group of scientists who are interested in genetics. Of course, one of those aims is research. And it's not only local people who are interested, and we have lots of collaborations. Because of the biobank, our visibility internationally has increased. We are collaborating with Decode [Genetics], we are collaborating with the Broad Institute, and we have collaborations all over the world now, with the best places. They never before recognized Tartu as a place, but now they are calling and asking, 'Do you have 2,000 samples with a certain heart rate?' or 'Do you have 2,000 samples with a certain eye color?' There are at least 20 different projects that are going on this way.

You have collected a lot of phenotypic data. What was the strategy you laid out when you decided which things to take into account during the sample-collection process?

The idea was to collect as much as we could with the limited amount of money we had. We had no particular idea or hypothesis in mind; we just wanted to cover most phenotypes and to be open to every type of collaboration. If we had just, for instance, concentrated on cardiovascular disease, then the eye doctors would like to know why we left them out. So everybody has an equal opportunity to use it.

But this is research. A second aim of the project is that I think we can transfer the information into one practical tool that GPs can use in their everyday work in disease prediction or health prediction or risk analysis. It used to be that you would just interview your patients and part of that interview would include questions about parents and grandparents to give the GP an idea of what diseases they had.

But there are issues associated with this. In many cases in real life, nobody knows about the father. Maybe the mother remembers his name, but the health information of the father's grandmother? Nobody knows. From the data we collect, all of this information will come out. The doctor will have more information on the screen when he or she has access to a whole-genome map.

I guess at some point in time, perhaps by 2016 or so, we should be able to provide a tool for GPs, and GPs should be ready to use it. We'll also have to start a program to reeducate the GPs.

But what functions would that tool have?

The tool has to be very simple. If you were my patient, and you were a gene donor, then it's simple: we'd just get your data from the biobank. If you were not, then we would just have to run a small test to identify where you are in this family structure. Of course, you are coming from the US, so you are not going to match. But let's say that we can put 80 percent of Estonians into some kind of framework. And then, we believe that once we have 10,000 people analyzed using arrays, we can impute most of the others from that existing data. So you don't really do a full-genome test; you are just using imputed data, because we know from which family you come, and among these 10,000 there will be so many representatives from your family that we can actually put you in the right place and say, 'OK, your variants are like this.' And then we'll calculate: your risk of glaucoma is maybe increased or your cancer risk is reduced. But it has to be really simple for the GP. The GP can then advise you to, say, measure your eye pressure once a year and, if it goes up, refer you to a specialist. This is the main thing I had in mind when we started this project: to do something for real life. Research is always on the screen but this is something that can be used practically.

The last aim of the project is that I see some opportunity for business, especially in drug development. Our cohort of 50,000 people already consented to be part of the biobank. If we need to test them for a particular new marker that has predictive value, or we need a new sample for RNA analysis, I would guess that these people would be more eager to participate than anybody off the street or from a hospital. And if we even had to do a drug trial, let's say to take a placebo or a certain pill, we could identify enough people from this cohort of 50,000 that would be willing to do this. This is really drug development and we are discussing how to do it. We have now in Estonia a critical mass of experts and small companies and [we have been wondering] how to put them to work together. It looks like some kind of drug development initiative could do it. I guess this is something that will probably happen within one or two years. We believe that the government initially has to put money into it, but when it is up and running, we hope that the government's share at some point can be bought out by a serious company.

What tools are you using to screen these samples?

The basic tool in our place is the Illumina genotyping platform. We will also get a sequencing unit next week. I got a huge grant from the EU for the biobank, about EEK 20 million ($1.9 million), which allows us to buy a sequencing instrument, as well as to hire more people, et cetera.

But will you continue to use genotyping arrays going forward?

Genotyping arrays will stay because there are lots of different applications. Of course, we'll now also have sequencing, but it depends what the task is. We are even still using TaqMan and [arrayed primer extension hybridization – a method commercialized by Asper Biotech – Ed.] and other techniques, because we are participating in lots of these replication experiments now. Let's say Decode found 10 SNPs that determine which hand you are using, and they want to replicate it in our population, and so we just run the TaqMan assay. Depending on the task, we choose the best platform. The important thing is to have a full array of technologies available to do the task in the most rapid and economical way.

You discussed a study at the conference where you used the Illumina array to investigate the difference between European populations. Why did you do that project?

There was always the question of, if Estonian samples were modeled, how relevant would they be. Lots of people think that we are so different, or that we have had so many foreign rulers – Germans, Swedes, and Russians – that you couldn't really get anything out of it because we are a mixed bag that is not relevant for anybody else. And I wanted to show how it really is, and this was the only way of doing it. We did it to show what populations are closest to us, so even if we looked at populations in Germany or Sweden, we would know what the difference between us and them is, and if there is a loss of power, we can take it into account. Finally, it shows that our population is quite related to Western Europe. I mean, we can leave southern Europe out and the Finns, but other than that, we're quite homogeneous.

Are you planning to continue that work?

I guess it's more or less done because there are more publications on the different populations we did study. If we combined all the data, I guess we have this type of data on every country, except some of the Balkan countries, but basically Europe is covered now. For this task, it's enough. We know with whom we should really collaborate, and what the loss of power is. So, we don't need to do any more work at the European level. What we would like to do, though, is look at the Estonian level. Getting this family structure, by going back 10 or 15 generations, is important to have in mind if we are designing studies here. There have been some studies done in Iceland where they have 40,000 scans out of a population of 300,000 people. They see the very fine structure of the population, which is not only by the county and birthplace, like we can see, but at a much finer scale. If we do it here, we can see family structures. Some people don't believe it, but it's not a question of whether you believe it or not. You just have to do it. Some of the families are probably so mixed and there has been a lot of internal migration, but perhaps we could see some of the fundamental, ancient family structure in Estonia. This is what we want to do more of. That means more people scanned. Right now, the tools cover variants with minor allele frequencies of 5 percent or more. The new tool, the Human Omni-1 Quad Beadchip, which has 1 million SNPs per sample on the array, goes down to 2.5 percent. But probably early next year, when we buy a new genotyping scanner, it will be 1 percent. I guess 1 percent is exactly what we need for population-based structure.

One question of this conference has been the future of GWA studies. Companies are predicting a second round. What do you think?

Next will be sequencing. The original hypothesis was if we just looked at SNPs, we would understand common diseases, but it was not the case. We discovered lots of new markers and genes and pathways, but all combined they still explain maybe 20 percent or 25 percent of genetic risk. So, where is the rest? The next thing is that we will look at rare SNPs. But what came out this year is that it is important from which side, mothers or fathers, you get these SNPs. And the impact is different. In one case, if you get the same SNP from your father, your risk is maybe 5 percent above population-based risk. On the other side, it's 50 percent. To find that out, though, you have to go through a process of imprinting, where you actually separate the chromosomes. So, I think we have lots of things that are still in the bag that have not been used. One is imprinting, one is epigenetics. I think that just looking at SNPs is not enough. We will never solve the genetic risks of common diseases, even with rare SNPs, because if it is rare, then it is rare, and it explains less. I mean, if 20 percent of people have elevated blood pressure, you can't explain it with rare SNPs. So, I guess, we still know quite little. We have to do lots of basic biology and genetics to understand it. The bottom line is that you would like to have lots of genetic information, and sequencing is what gives you lots of genetic information.

Finally, I think the companies are the ones driving the field. They produce the new tools, and everyone can go and purchase a new tool. In a way, this is kind of awful. I would like to see it the other way around — that we come up with an idea and the companies will produce the stuff, which is what we did with APEX. When I started, there was nothing to buy, and so we just came up with an idea and built an instrument. Now, the companies are producing new tools. They go to the Sanger Centre and the Broad and give them everything first. They publish the first papers in Nature and Science. Then everybody is rushing in the same direction. And this is why we are always behind. And so, in a way, it's a technology-driven field. But once that information is obtained, it is up to the individual investigator to determine how to analyze it, which brings it back to small science, in a way.

A recurring theme at this meeting seems to be that phenotypic information is extremely valuable. Has that always been the case?

When we started 10 years ago we discussed how to analyze SNPs. Five years ago it turned to informatics. Basically, the way we are putting this meeting together, there are four or five people who know what the front line is. It came up that biobanks, variation, phenotypes, and new methods were important topics. And then we just invite people who are addressing these questions. The program is different every year. It is trying to address the most interesting questions. So the topics keep changing.

Phenotype is probably the most important thing, but good phenotypes are expensive. You can imagine that you can't hospitalize 50,000 people for five days to take all the measurements that are needed. Another problem is that diagnostics is not evidence-based. It's all agreement. Who has Parkinson's disease? There may be 10 symptoms. If a person has six, it's agreed that they have Parkinson's. But another patient may also have six, but a different six. And it just comes down to the point that everyone has their own disease. The reasons you or I might have a certain disease might be different, even if the end point is the same. So if we can concentrate on a small group of people that is really well phenotyped, then we don't need 2,000 people in a GWAS. We might only need to look at a few hundred people to identify the sources of their disease, which is something that we can do quite easily here.
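
The arithmetic behind the Parkinson's example can be made concrete. The short Python sketch below takes the simplified rule from the answer above at face value (a checklist of 10 symptoms, with a diagnosis at exactly six), which is an illustrative assumption and not a real diagnostic criterion. It counts how many distinct symptom sets would receive the same diagnosis, and how few symptoms two diagnosed patients might share.

```python
from itertools import combinations
from math import comb

SYMPTOMS = 10   # size of the symptom checklist in the example (assumed)
THRESHOLD = 6   # symptoms needed for a diagnosis in the example (assumed)

# Every distinct set of exactly six symptoms that meets the threshold.
presentations = [frozenset(c) for c in combinations(range(SYMPTOMS), THRESHOLD)]
assert len(presentations) == comb(SYMPTOMS, THRESHOLD)

# Smallest number of symptoms that any two diagnosed patients have in common.
min_shared = min(len(a & b) for a, b in combinations(presentations, 2))

print(len(presentations))  # 210 distinct presentations
print(min_shared)          # 2 shared symptoms in the worst case
```

Under these assumptions, 210 different symptom combinations carry one label, and two patients with the same diagnosis may overlap on only two of their six symptoms, which is one way to read the remark that "everyone has their own disease."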