RIKEN Omics Science Center and Yokohama Institute
Name: Yoshihide Hayashizaki
Title: Director, RIKEN Omics Science Center, RIKEN Yokohama Institute, Japan
Experience and Education:
— Chief scientist, department director, and chairman, Genome Science Laboratory, RIKEN, since 1994
— Professor at Tsukuba University Medical School, since 1995
— Professor at Yokohama City University, graduate school of integrated science, since 2001
— Senior research scientist, RIKEN Tsukuba Life Science Center, 1994
— PhD, Osaka University, 1986
— MD, Osaka University, 1982
Yoshihide Hayashizaki directs the new Omics Science Center at the RIKEN Yokohama Institute in Japan, which opened last month. As part of a systems-biology approach to studying molecular networks and pathways in differentiating cells, scientists at the center use high-throughput sequencing technologies.
They have been studying genome-wide gene expression and promoter activity, for example, using a method known as Cap Analysis of Gene Expression, or CAGE.
The center currently houses a 454 Genome Sequencer FLX, an Illumina Genome Analyzer, and an Applied Biosystems SOLiD system, and is considering buying a Helicos Genetic Analysis system.
In Sequence spoke with Hayashizaki two weeks ago at the Cambridge Healthtech Institute Next-Generation Sequencing conference in San Diego.
Tell me about your past research at RIKEN, and how it led to the Omics Science Center.
In 1995, the Japanese government discussed how to implement a new type of large-scale life science research and appointed me as a chief scientist and project director of a genome project team at RIKEN. Originally, I worked in the Tsukuba Life Science Center, which is in the northern part of the Tokyo area. But RIKEN decided to build a new campus in Yokohama next door, which opened in 2000. My group was the first to move into the new RIKEN Yokohama Institute.
Back in 1995, the US National Institutes of Health announced that they would try to start sequencing the complete human genome within eight years. The Japanese government asked me to contribute to this community effort as a representative of Japan, but with or without us, the Human Genome Project would have obtained the human genome sequence. So I decided to start a full-length cDNA sequencing project instead.
To make full-length cDNA, a series of new technologies would be required, and we started to develop those. At the time, for example, we did not have a capillary sequencer; no one had one. So we decided to develop our own system, a 384-capillary sequencer, in collaboration with Shimadzu. We started using ours in 1997, one year before Celera started using the Applied Biosystems 96-capillary sequencer. We also developed, in collaboration with RIKEN’s engineering section, a fully automated plasmid preparator.
Then, we started to construct libraries and collect clones from 263 different tissues or developmental stages from mouse. Because we needed to have so many different tissues, we decided to focus on mouse first. We actually picked almost two million full-length clones and used end-sequencing to characterize them. We also sequenced more than 20,000 full-length cDNAs and deposited them in the public databases. But in order to do so, annotation was required, which we could not do by ourselves. So in the summer of 2000, we called an international meeting to annotate the clones. That was FANTOM, the Functional ANnoTation of Mouse conference.
Later, as we included data from the human transcriptome, FANTOM became Functional ANNotation of Mammalian cDNAs. In FANTOM-2, a consortium of researchers sequenced and annotated another 40,000 cDNAs, increasing the total number to almost 61,000. FANTOM-3 expanded this number to 103,000 clones. In addition, FANTOM-3 is a collection of promoters, and we identified transcriptional start and termination sites in this project.
In FANTOM-2 and 3, we found that many, many RNAs are transcribed from everywhere in the genome. Until that time, scientists thought that only 2 percent of the total genome was functional, the protein-coding genes. But we found that more than 70 percent of the entire genome is transcribed. The majority of that product is non-coding RNA. It’s the so-called RNA continent that we discovered. Initially, everyone criticized us very severely. But after our reports of FANTOM-2, which were published in 2002 and 2003, many other papers appeared to describe the functions of non-coding RNAs.
The results of FANTOM-3 were published in a special issue of Science in 2005 that focused on RNA. We found 180,000 independent promoters. We also found that one gene, on average, has several promoters.
FANTOM-4 is still ongoing. Now, we are trying to explore how genes relate to phenotype. This is part of systems biology, based on very large-scale data.
That’s what you are doing at the new Omics Science Center?
‘Omics’ is aimed at the entire system; it’s a very arrogant name. We are currently establishing a so-called ‘life science accelerator,’ a large-scale analysis system that rapidly analyzes molecular networks by different technologies. Using those, we can characterize the networks of genes active in cells that differentiate from one phenotype to another phenotype. If the cell keeps a certain phenotype, the concentration of the active form of the transcription factors and the non-coding RNAs, for example, must be the same, or at least oscillating. We are trying to find out what kind of regulated network keeps all of the transcription factor concentrations constant. That kind of network picture needs to be drawn.
The Omics Science Center just opened in April and has more than 170 employees. The center is divided into five groups: Three groups will focus on the establishment of the ‘life science accelerator.’ They are the LSA system development group, the LSA technology development group, which I direct, and the LSA bioinformatics team. We also have a functional genomics technology team and an RNA function research team.
How do you employ new high-throughput DNA sequencing systems in your research?
We have been using the new sequencing platforms in CAGE, which stands for cap analysis of gene expression. In CAGE, we sequence 20- to 27-nucleotide tags from the 5’ end of full-length mRNAs and map these tags onto the genome. This approach of sequencing the 5’ end of RNAs and counting their frequencies is the only way to measure the activity of the promoter region of genes; a microarray experiment cannot detect this.
In order to identify the promoter activity, we proposed a new concept, “motif activity,” that represents the actual concentration of active transcription factors. Our approach can draw all of the edges[, or relationships,] of the transcription factors, promoters, and ncRNAs.
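[Editor's illustration: The tag-counting step Hayashizaki describes, mapping 5' tags onto the genome and counting their frequencies, can be sketched in a few lines of Python. This is a minimal sketch and not RIKEN's pipeline. The function names, the input format of (chromosome, strand, position) tuples, and the tags-per-million normalization are all assumptions made for illustration; none of them come from the interview.]

```python
from collections import Counter


def count_tag_starts(mapped_tags):
    """Count tags per (chromosome, strand, 5' position).

    `mapped_tags` is a hypothetical input format: an iterable of
    (chromosome, strand, position) tuples, one per mapped tag.
    """
    return Counter((chrom, strand, pos) for chrom, strand, pos in mapped_tags)


def tags_per_million(counts):
    """Scale raw counts to tags per million mapped tags.

    This normalization is an assumption for illustration; it makes
    libraries of different sequencing depth comparable.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


# Made-up example data: four mapped tags, two sharing the same start site.
mapped_tags = [
    ("chr1", "+", 1000),
    ("chr1", "+", 1000),
    ("chr1", "+", 1002),
    ("chr2", "-", 5000),
]

counts = count_tag_starts(mapped_tags)
tpm = tags_per_million(counts)

for site, value in sorted(tpm.items()):
    print(site, counts[site], value)
```

[In this toy example, a start site supported by more tags gets a proportionally higher normalized value, which is the sense in which tag frequency serves as a readout of promoter activity.]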
What sequencing platforms do you currently have at the Omics Science Center?
We have the 454 GS FLX, Illumina Genome Analyzer, and ABI SOLiD now, and we are considering introducing additional instruments, [including the Helicos Genetic Analysis system], later this year. Each of them has its own characteristic features. Depending on the purpose, we have to select the most appropriate sequencer.
What other technologies do you use?
We also use qRT-PCR and microarrays. We have to take many time points in a cell differentiating from one state to another. For all of the points, we produce data using CAGE, qRT-PCR, and other technologies.
What experimental system do you study, and what is the goal?
We study differentiation of monoblasts to monocytes, which is well known. Our analysis, which is at the single-molecule level, is validated by a lot of known data. We can detect one RNA molecule in 10 cells. We have already produced such data, but we are now establishing a system to analyze a network of genes, not just discover RNA.
There is a known trigger, which starts differentiation in monoblasts. From that trigger to the last monocyte phenotype, we try to understand the phenotype at the molecular level. That is the goal.