Name: Yongyong Shi
Title: Professor, Shanghai Jiao Tong University
Background: Principal investigator, Bio-X Institute, SJTU, Shanghai, 2006-present
Education: PhD, SJTU, 2006; BA, biotechnology, international economy and commerce, SJTU, 2001.
If you are looking to collaborate with a Chinese investigator in a genome-wide association study, Yongyong Shi just might be your man. So far this year, the 33-year-old investigator has co-authored 13 papers focused on genetic mutations associated with psychiatric disorders, cancers, and other diseases in the Han Chinese population.
For example, in Nature Genetics in July, Shi helped collaborators identify four variants associated with coronary artery disease, and he co-authored a paper, published last month in the American Journal of Human Genetics, that discussed two regions associated with multiple cancers that afflict Chinese patients.
Whole-genome genotyping arrays have been Shi's main tool for GWAS, though his choice of arrays has shifted over time. When Shi and colleagues learned that up to a third of the SNPs found on the widely used Affymetrix SNP 6.0 Array were not informative for the Han Chinese population, he and other collaborators from Shanghai Jiao Tong University's Bio-X Institute worked to promote and develop population-specific arrays that could be used in future GWAS in China.
BioArray News met with Shi at SJTU's campus in Shanghai last month. Below is an edited transcript of that interview.
What is Bio-X?
Bio-X is a research institute within the university focused on genetics [and] developmental biology, and we study a lot of kinds of diseases. Bio is for biology and X means cross disciplinary. So it is a cross of different subjects, including mathematics, physics, and others.
Are there any common diseases that you, in particular, focus on?
I think we run quite a lot of projects and we have published three manuscripts based on internally run GWAS, one on schizophrenia, a psychiatric disorder, one on gastric cancer, and then there was one on … polycystic ovary syndrome — we have found 11 regions that are associated with it. Some of the regions are also associated with type 2 diabetes. [PCOS] has more than 20 percent overlap with type 2 diabetes. Some genes are shared with type 1 diabetes. And some genes are shared with erectile dysfunction in men. So, very interesting findings, I think.
Have you always used genotyping arrays to do those studies?
In the past four or five years, the genotyping array has been the main tool that we use to genotype those mutations to get those sequences for the analysis of different diseases.
One issue we have discussed is the need for population-specific products. These only became available in the last year or two. Did you have problems earlier because you were using arrays that were optimized for European populations?
When we started our collaboration in 2008, we got our first genotyping product from Affymetrix, the SNP 6.0. In fact, we didn't think about population specificity at that time. We were very impressed by the large number of SNPs that could be detected on that chip. Yet very soon, we discovered that in the Chinese population, only two-thirds of those SNPs were informative. The number was still OK. There were about 600,000 informative SNPs on that chip. We finished our work and published some papers; we just thought about next-stage issues, what we should use.
Just at that time, a collaborator came to us, but they weren't sure of which platform they would use. So we, together with Affymetrix, thought of ways to help them, and we realized that population specificity [was] a problem for the chips at that time, all kinds of genotyping chips. If Affymetrix could provide a product that addressed this problem, and produce a Chinese population-specific chip, that might be more attractive to future Chinese researchers and for the customers of Affy. I think that's a good strategy.
At the same time, Affy told me that they had a new technology called Axiom, and that they could design a chip for the Chinese market. So I thought, fantastic, we also want that chip. And why not? It would be much more informative than the SNP 6.0, and I think this was a factor in enticing the collaborators to work with us and finish genotyping on the Axiom platform.
Now, I think it has become a common platform that is being sought out by all groups that want to carry out a genotyping study. There are new arrays, though. In China, Illumina provided a chip relatively soon after [Affy's] CHB array [for the Han Chinese population] hit the market, called the HumanOmniZhongHua BeadChip. But then CHB 2 came out and it had more SNPs.
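[Editor's illustration, not part of the interview. The interview does not say how "informative" was defined for the SNP 6.0 markers. A common working definition, assumed here, is that a SNP is informative in a population when its minor allele frequency (MAF) is above some cutoff. The sketch below applies that assumed definition to toy data; the SNP names, genotypes, and the 0.01 cutoff are all hypothetical.]

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one SNP.

    genotypes: per-individual counts of the alternate allele (0, 1, or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per person
    return min(alt_freq, 1.0 - alt_freq)


def informative_snps(panel, maf_threshold=0.01):
    """Keep SNPs whose MAF meets the (assumed) cutoff."""
    return [
        snp for snp, genotypes in panel.items()
        if minor_allele_frequency(genotypes) >= maf_threshold
    ]


# Hypothetical toy panel: one polymorphic SNP, two that do not vary.
toy_panel = {
    "snp_a": [0, 1, 2, 1],
    "snp_b": [0, 0, 0, 0],
    "snp_c": [2, 2, 2, 2],
}
kept = informative_snps(toy_panel)
fraction_informative = len(kept) / len(toy_panel)
```

[In this toy panel only snp_a varies among the individuals, so one SNP in three passes the filter. The same kind of count, run across a whole array, is what yields a figure like the "two-thirds" quoted above.]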
Did you play a role in the design of CHB 1 or CHB 2?
As for the CHB 1, we just promoted the design of it; we weren't involved in the selection of SNPs. But for the CHB 2, indeed, we made some decisions. And we persuaded Affy to put all the informative SNPs from the SNP 6.0 chip onto the new CHB chip, because we thought that if the new chip was used to generate a lot of data on the Chinese population, it should be compatible with the previously used chips, so that all the data should be useful. I think it's very important, especially for traditional customers.
Have you seen any benefit from using a population-specific approach toward your research goals?
We have not compared directly the results of the different chips in the same project. It doesn't make sense to us because we don't want to spend so much money just to compare the results. So I cannot say; I do not have any direct information on that. But, theoretically, the coverage of the genome is much better on the specific chip for our population. This is the target of the design, I think. Another thing is that all the SNPs on the specific chip provide information. If you use a non-focused chip, 10 percent or 30 percent of the SNPs may be uninformative.
How many SNPs are on the CHB 2?
1.3 million. The SNPs were selected at the end of last year. I think we are the first customers. It became available in February or March, I believe.
Are you planning on embarking on new association studies with the new CHB 2 array?
Oh yes. We are going on with using that chip, especially for some interesting traits we are going to study.
A lot of GWAS continue to be published. But how can people tell whether a GWAS is a good or successful one?
I cannot decide whether a GWAS is good or not. The top journals define whether a GWAS is good or not. So their criteria are the criteria for researchers, in most cases.
Let's consider the CAD study your team published in Nature Genetics. What were their criteria?
For Nature Genetics, the criteria are that you find new things, new loci, that were not identified in previous studies and that you validate them, and that you get a P value of less than 5 x 10^-8. If you cannot find new loci compared with previous studies, by their criteria, you can say that it is not a good GWAS because it didn't give you any new things. But the data generated by genotyping all those chips should not be wasted. Maybe later you could combine data with a collaborator and find some new things.
I think for GWAS there are a lot of challenges. You get some positive signals for a region, but you cannot tell whether the gene is turned on or not, or which one is the causal or functional mutation. You have no such information. It just gives an odds ratio of 1.05 or 1.1 or sometimes 1.2, which I think is very high. It doesn't tell me any form of the diagnosis of the disease. But I think that GWAS is just the first step, an important step — you get some information, but you have to continue to study; you have to search for those functional mutations and the risk genes in the region.
Another thing is that you need to enlarge your sample size. Enlarge your sample size and you get more regions. For example, [for] the GWAS of height, there were more than 300,000 samples analyzed as part of that study, and they found more than 180 genes associated with height.
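[Editor's illustration, not part of the interview. The P value cutoff Shi cites for Nature Genetics is the conventional genome-wide significance threshold. It is usually explained as a Bonferroni correction of a 0.05 significance level for roughly one million independent tests. The sketch below shows that arithmetic; the function names are hypothetical and the one-million figure is the conventional approximation, not a number from the interview.]

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error near alpha."""
    return alpha / n_tests


# Conventional assumption: about one million independent common-variant tests.
GENOME_WIDE_THRESHOLD = bonferroni_threshold(0.05, 1_000_000)


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True when an association P value clears the genome-wide cutoff."""
    return p_value < threshold


# Hypothetical P values for two loci.
strong_hit = is_genome_wide_significant(1e-9)
suggestive_only = is_genome_wide_significant(1e-6)
```

[Under these assumptions the threshold works out to about 5 x 10^-8. A locus with P = 1 x 10^-9 passes, while one with P = 1 x 10^-6 does not, which is why small odds ratios need large sample sizes to reach significance.]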
What's the next step in GWAS for you?
I think we have two directions. One is to enlarge the sample size for a specific disease and carry out the GWAS using arrays, to identify more loci of this disease and to get a map with a much better idea of pathways of these genes. And when you get a specific region, we can sequence the region for large samples in the population, and try to get all the possible causal or functional variants, and validate them in functional experiments, in cell studies, for example.
Meta-analyses are also increasingly common.
Indeed, meta-analysis is frequently seen in current GWAS. We need to carry out meta-analysis of our data that was generated at different stages and combine them together to see whether the final P value is small enough. And for different GWAS, even different groups, we also should carry out a meta-analysis to see where we can find some new things. It's also a way to enlarge one's sample size and to collaborate with different groups. The chip information has to be compatible too, otherwise it cannot be achieved. Imputation is a way to solve that problem, but imputation is complex and not so easy to carry out just based on chip data and the 1000 Genomes reference data.
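[Editor's illustration, not part of the interview. Shi does not say which meta-analysis method his group uses. One widely used option, assumed here, is the fixed-effect inverse-variance method, which pools per-study effect estimates (log odds ratios) weighted by their precision. The sketch below uses hypothetical numbers for two study stages to show how combining them can push the final P value below the genome-wide cutoff.]

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted pooling of per-study effect sizes.

    Returns (pooled_beta, pooled_se, pooled_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, two_sided_p(pooled_beta / pooled_se)


# Hypothetical two-stage study: the same SNP, the same log odds ratio
# of 0.1 (odds ratio about 1.1), and the same standard error per stage.
stage_betas = [0.1, 0.1]
stage_ses = [0.02, 0.02]

single_stage_p = two_sided_p(stage_betas[0] / stage_ses[0])
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(stage_betas, stage_ses)
```

[With these made-up inputs, neither stage alone reaches P < 5 x 10^-8, but the pooled estimate does, because the pooled standard error shrinks as studies are added. The method only works when the studies report the same SNPs, which is the compatibility issue raised in the answer above.]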
In the US and Europe, the heyday of GWAS has clearly passed. It's also become more difficult to obtain funding for such studies, given budgetary concerns. How is the situation in China?
In China, the scale of GWAS is much smaller when compared with the US. A lot of researchers will try to get money not only from the central government, but also from local governments. Our samples are mostly collected through the university, with the help of the local hospitals. I think there will be more and more resources available for such studies in China, though. I think the government has realized that it is important to keep the population healthy, and genetics studies are always a very important area of this. New technology will help us to discover pathology of different diseases and give us a chance to get new drugs and … pre-diagnosis of the diseases. So this area should be well-funded.
Have you considered using even more focused, lower-density arrays as a validation tool in future studies?
That might be a good strategy for chip-based studies. But another important thing is that we might want to carry out target resequencing analysis of those GWAS regions, because you cannot know which SNP will be functional and [whether it will be] available in a population. Those SNPs associated with the disease might be of very rare frequency and found only in a few patients. You cannot include those rare variants if the chip has been designed previously. So, I think target resequencing, haplotype analysis, might be a good strategy for us to uncover those causal or functional variants.