Name: Philippe Hupé
Title: Research Engineer, 2nd Class, Institut Curie
In a paper published last week, bioinformaticists from the Institut Curie in Paris detailed the functions of a new database and software tool called ACTuDB, which they developed for the analysis of array-CGH and copy number data for tumors [Hupé, et al. ACTuDB, a new database for the integrated analysis of array-CGH and clinical data for tumors. Oncogene. 2007 Oct 11;26(46):6641-52].
ACTuDB includes array-CGH profiles and clinical data for tumors. Users can browse and compare these data with VAMP, a data analysis and visualization tool that was also developed by IC’s Service Bioinformatique. Specific functions include the ability to compare expression data with genomic profiles and to identify subgroups of tumors with clustering techniques.
The VAMP software also allows users to upload their datasets into ACTuDB, which accepts copy number data, transcriptome data, chromatin immunoprecipitation (ChIP)-on-chip data, and loss-of-heterozygosity data.
To learn more about ACTuDB, which is freely accessible online, BioArray News spoke last week with Philippe Hupé, lead author of the paper and a research engineer at IC.
Why is there a need for this kind of database for array CGH data and why did you at Curie decide to do this?
We had been asking the biologists to retrieve publicly available data to compare results with existing profiles. We started to integrate the different datasets we retrieved into this database and then we decided to publish this work.
There exists a need for biologists to compare their results with other publications and there is a need to do some meta-analysis work. In our database we have different datasets for bladder cancer or colon cancer, for example. If we find an alteration in our own research, it is then useful to look at the data sets to see if we can find the same alteration in different data sets.
What have been the challenges in creating this kind of database?
The main challenge is that when you want to use data made available by other scientists, you have to identify the location of each clone on the genome. The problem is that the published locations often do not refer to the same working draft version of the human genome. So to compare things we need to remap each clone to the same working draft version.
We had to retrieve data from all the public databases like UCSC Genome Browser, Cipher, and so on. We have now developed an automatic pipeline that allows us to remap each probe and each clone to the working draft version. And if the [Human Genome Project] releases a new version next month, then we are able to remap each clone to the latest working version.
So that was the main challenge and it is the main originality of our software. You can’t compare things if they are not located on the same working draft. We need to have the same location for each publication to compare the data.
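The remapping step described above can be illustrated with a minimal Python sketch. It is not the Institut Curie pipeline. It assumes that a lookup table of clone positions on the target genome build is already available, and the names `Position`, `remap_clones`, and the clone identifiers are hypothetical.

```python
from typing import NamedTuple


class Position(NamedTuple):
    chrom: str
    start: int
    end: int


def remap_clones(profile, build_positions):
    """Attach target-build coordinates to each clone in one profile.

    profile: dict mapping clone name -> log2 ratio (as published).
    build_positions: dict mapping clone name -> Position on the target
        build (assumed to come from a public annotation source).
    Returns (remapped, unmapped). remapped is a list of
    (Position, clone name, ratio) tuples sorted by chromosome name
    (lexicographically) and start coordinate; unmapped lists the clones
    that have no position on the target build.
    """
    remapped, unmapped = [], []
    for clone, ratio in profile.items():
        pos = build_positions.get(clone)
        if pos is None:
            unmapped.append(clone)
        else:
            remapped.append((pos, clone, ratio))
    remapped.sort(key=lambda item: (item[0].chrom, item[0].start))
    return remapped, unmapped


# Hypothetical clones and coordinates, for illustration only.
profile = {"cloneA": 0.8, "cloneB": -0.5, "cloneX": 0.1}
target_build = {
    "cloneA": Position("chr1", 5000, 5150),
    "cloneB": Position("chr1", 1000, 1150),
}
remapped, unmapped = remap_clones(profile, target_build)
```

Once every dataset carries coordinates from the same build, profiles from different publications can be laid side by side. Supporting a new genome release then amounts to swapping in a new position table.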
Why did you choose to build ACTuDB on the VAMP software?
The main advantage is that it is a very intuitive and interactive visualization tool that also has been developed at Institut Curie. You can easily compare many profiles simultaneously. Say you want to compare two different data sets — it is very easy to do it within the software. The user can play; there are many functions for visualization or analysis.
For example, clustering is very useful in microarray analysis and you can do it on two different data sets using the VAMP software to compare the two sets, which is very easy. You can also find a common alteration for a set of tumors. So if you [have] two sets for bladder cancer, you can search for alterations in one, then in the other, then compare to see if you find the same alteration. VAMP is very interactive and you can do many things with it. That is why we decided to base our database on this software.
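The search for a common alteration across two tumor sets can be sketched in a few lines of Python. This is only an illustration of the idea, not how VAMP implements it. It assumes each tumor has already been reduced to a per-clone status of +1 (gain), -1 (loss), or 0 (normal); the function name, the threshold, and the clone names are hypothetical.

```python
def recurrent_alterations(tumor_statuses, min_fraction=0.6):
    """Return the set of (clone, status) pairs seen in enough tumors.

    tumor_statuses: list of dicts, one per tumor, mapping
        clone name -> +1 (gain), -1 (loss), or 0 (normal).
    min_fraction: minimum fraction of tumors that must share the
        same alteration at a clone for it to count as recurrent.
    """
    counts = {}
    for statuses in tumor_statuses:
        for clone, status in statuses.items():
            if status != 0:
                key = (clone, status)
                counts[key] = counts.get(key, 0) + 1
    threshold = min_fraction * len(tumor_statuses)
    return {key for key, count in counts.items() if count >= threshold}


# Two hypothetical datasets for the same pathology.
set_one = [
    {"cloneA": 1, "cloneB": 0, "cloneC": -1},
    {"cloneA": 1, "cloneB": -1, "cloneC": -1},
    {"cloneA": 1, "cloneB": 0, "cloneC": 0},
]
set_two = [
    {"cloneA": 1, "cloneC": 0},
    {"cloneA": 1, "cloneC": 1},
]

# Alterations that are recurrent in both datasets.
shared = recurrent_alterations(set_one) & recurrent_alterations(set_two)
```

Here only the gain at the hypothetical `cloneA` is recurrent in both sets. The comparison is meaningful only if both datasets use clone positions from the same genome build.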
We plan, though, to add new functions and to have more sophisticated queries that the user can make. For example, let’s say that I want a dataset that deals with a particular pathology like breast cancer, where there is some clinical info available, like the status of p53. In our database, [it is not possible right now] to compare queries on the type of pathology, the clinical data and so on, … but we will improve it.
What kind of data is in ACTuDB right now and when will the new functionalities be added?
Right now we have data from 19 publications integrated, which corresponds to 2,000 genomic profiles, but there will be somebody at IC who will be in charge of integrating additional datasets.
I can say that by January 2008 the functionalities will be improved. We will also try to add as many publications as possible, but integration is very time consuming. We will mainly integrate cancer datasets that are useful for our research, such as data for breast cancer, cervical cancer, and all pediatric tumors. But if there are other datasets and [if] we have time we will integrate other pathologies.
How are data sets normalized before being entered into ACTuDB?
We use the data as provided by the authors. We do not perform normalization. It would be very useful to add a unified normalization, but it is a huge task to implement, so we rely on the normalization provided by the authors. What we do regarding statistical analysis is detect breakpoints using a segmentation algorithm developed at Institut Curie. We also use this algorithm to detect chromosomal alterations. The same algorithm is used for each dataset.
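To give a sense of what breakpoint detection means here, the following Python sketch finds the single split point that best divides a run of log ratios into two segments with different mean levels. It is a deliberately simple least-squares search, not the Institut Curie segmentation algorithm, and the function name and sample values are hypothetical.

```python
def best_breakpoint(ratios):
    """Find the split that minimizes within-segment squared error.

    ratios: list of log ratios ordered by genomic position.
    Returns (index, cost): the profile is split into ratios[:index]
    and ratios[index:], and cost is the summed squared deviation of
    each value from the mean of its own segment. Returns
    (None, inf) if there are fewer than two values.
    """
    def squared_error(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_index, best_cost = None, float("inf")
    for i in range(1, len(ratios)):
        cost = squared_error(ratios[:i]) + squared_error(ratios[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


# Hypothetical profile: four clones near zero, then three gained clones.
index, cost = best_breakpoint([0.0, 0.1, -0.1, 0.0, 0.9, 1.0, 1.1])
```

For this profile the split falls between the fourth and fifth clones (index 4). A real segmentation method must also handle multiple breakpoints, noise, and outliers, which this sketch ignores.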
What is the current availability of ACTuDB and will it remain free to all users?
It has been available in advance access since spring, and it was released last week. We have seen that many people have visited our website, but I can’t tell you if people have used all the functionalities of the database. It is open to everybody; there is no limitation on using this database. I think most [users] will be from academic institutions. All people working on copy number studies may also be interested in this database. But it is completely free.
The policy here at Institut Curie is to release this kind of thing for free. We have another pipeline devoted to array CGH analysis and other software that is free for academics and paid for by companies. But the policy is to release it free for all academic purposes.
Who has done the work in creating ACTuDB?
I have collected the data and I have integrated the data. The software has taken a huge amount of work to develop if you take into account the VAMP software, and many people have been involved in the project. In terms of the integration functionalities, it has been three people: myself and two of my colleagues.
Are there similar databases available at this time?
Yes … there are others. But the originality of our database is the remapping of each clone on each working draft version of the human genome. We are the only ones to do it. The functionalities of data comparison offered by VAMP are also unique. These are the two originalities of our database.