The National Institutes of Health will spend nearly $3.5 million over the next two years to finance the creation of an online database of copy number variation information related to abnormal phenotypes.
The project, which officially started Sept. 30, is being led by investigators at Emory University and will be carried out in partnership with the International Standard Cytogenetic Array Consortium, a group of clinical cytogenetics and molecular genetics labs that seeks to standardize the way arrays are used in clinical cytogenetics.
According to Christa Martin, senior director of the Emory Genetics Laboratory, the database will complement the widely used Database of Genomic Variants.
Hosted by the Hospital for Sick Children in Toronto, the DGV contains CNVs from normal populations that are considered to be benign. The Center for Applied Genomics at the Children's Hospital of Philadelphia also recently introduced a database of CNVs found in healthy individuals (see BAN 7/21/2009).
The ISCA database, in contrast, will "take a complementary approach and build an atlas of abnormal regions of the genome, places where we know deletions or duplications will cause an overt phenotype," Martin told BioArray News this week.
The award is an NIH Grand Opportunities grant under the American Recovery and Reinvestment Act of 2009 and is scheduled to end Aug. 31, 2011. The principal investigator on the project is David Ledbetter, director of the division of medical genetics at Emory. A co-investigator on the project is Ronald Wapner, director of maternal fetal medicine at Columbia University.
Martin said that investigators from the George Washington University Biostatistics Center, the University of California at Santa Cruz, the Mayo Clinic, GeneDx, and the National Center for Biotechnology Information will also participate in the construction of the database. She added that the group hopes to make preliminary data available to the public via the database by the end of next year.
Most of that data will come from the project's participants. Martin said that Emory has more than two years' worth of data from patient samples surveyed with its whole-genome, oligonucleotide-based EmArray Cyto, which is manufactured by Agilent Technologies. Like many cyto labs, Emory used bacterial artificial chromosome-based arrays in its services prior to moving to an oligo-based platform.
According to the grant abstract, the overall goal of the project is to "leverage this large clinical dataset generated in the course of clinical care to create a research resource for gene discovery related to human developmental disorders as well as to build an invaluable clinical resource for learning about the clinical and public health impact of CNVs."
The project has four specific aims. The first is to collect "very large standardized datasets from clinical array testing in pediatric and prenatal populations." According to the abstract, the Emory-led team will develop methods for a large consortium of clinical sites and clinical genetics testing laboratories to collect and submit CNV and clinical data to a central, public data repository.
The second aim of the project is to standardize array design and CNV data formats for clinical laboratories. Via the ISCA, the team is developing standards for array design, resolution, and format, as well as guidelines for interpreting benign versus pathogenic CNVs, the abstract states.
A third aim of the project is to develop standardized clinical data. A phenotype workgroup within the effort will be established to develop standard vocabularies and data dictionaries for phenotypic information using current international recommendations, according to the abstract.
The final aim of the project is the creation of a data collection repository, as well as curation and visualization tool development. For this purpose, a database workgroup will oversee development of software bridges and adaptors to automate data de-identification, reformatting and transfer to a central public repository.
Methods for automated and expert data curation will be developed before the data is released publicly to the research community and clinicians, according to the abstract. User-friendly tools for data visualization and analysis will also be developed in partnership with academic groups and commercial vendors, the abstract states.
"This project will support the development of bioinformatics tools for labs, so when they are done with their clinical array data it will be deposited in the database," Martin said. "The creation of annotation tools will help to resolve differences between lab calls so that, in a few years' time, people can look up variations and assess it with their patient data," she said.
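As a rough illustration of the de-identification and reformatting step the abstract describes, a submitting lab's adaptor might look something like the Python sketch below. Every field name, the salted-hash case identifier, and the tab-separated output are assumptions made for this example; neither the abstract nor Martin specified the consortium's actual schema or tools.

```python
import hashlib

# Hypothetical identifying fields; the real ISCA schema is not described in the article.
IDENTIFYING_FIELDS = {"patient_name", "date_of_birth", "medical_record_number"}

# Hypothetical column order for a flat submission line.
SUBMISSION_COLUMNS = ["case_id", "chrom", "start", "end", "cnv_type", "phenotype"]


def deidentify(record, lab_salt):
    """Return a copy of the record with direct identifiers removed.

    The record number is replaced by a truncated salted SHA-256 digest, so the
    same patient maps to the same case_id within a lab without exposing the
    original number.
    """
    raw = (lab_salt + record["medical_record_number"]).encode("utf-8")
    case_id = hashlib.sha256(raw).hexdigest()[:16]
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["case_id"] = case_id
    return cleaned


def to_submission(record):
    """Reformat a de-identified record as one tab-separated line."""
    return "\t".join(str(record[column]) for column in SUBMISSION_COLUMNS)
```

A lab would run each clinical record through deidentify() and then to_submission() before transferring anything to the central repository. A production system would also have to handle indirect identifiers and any applicable privacy regulations, which this sketch does not attempt.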
According to Martin, more than 70 international labs are taking part in the ISCA. If all of those participants begin contributing to the CNV atlas, it could grow by 50,000 to 100,000 cases per year. "Hopefully the data will not only accelerate quality of clinical testing, but also be used by researchers to look at areas of interest," she noted.