Sometimes a research project can be so large in scope that other initiatives must be set up just to make sure the goals of the first one are achieved. Such is the case with the Human Microbiome Project, an international effort launched by the National Institutes of Health Roadmap for Medical Research that aims to provide a comprehensive resource for the characterization of the human microbiota and its role in human health and disease. The folks behind the HMP knew that in order to manage all the data coming from sites spread across the globe, a centralized data repository would be needed. To this end, the HMP Data Analysis and Coordination Center was established for the purpose of managing and analyzing the flood of data being generated by the myriad sites participating in the HMP. The DACC, hosted by the Institute for Genome Sciences at the University of Maryland School of Medicine, kicked off in October of 2008, roughly a year after the HMP got its start, and is in essence a collaborative project within the context of a larger collaborative project.
The center's Web portal aims to provide researchers with web-based visualization and query applications and software, as well as robust computational analysis of HMP data. The Website also hosts information on standard operating procedures, a resource for community involvement in reference strain selection, and quality control measures. In order to meet all of the resource requirements of the DACC, the IGS is collaborating with the Joint Genome Institute, the Lawrence Berkeley National Laboratory, and the University of Colorado at Boulder. These sites work together to support the DACC by providing resources for several focus areas for the HMP, including a collection of reference sequences expected to eventually reach up to 1,000 genomes, the collection and analysis of 16S ribosomal RNA sequences used for characterizing microbial communities at individual body sites, and analysis of metagenomic whole genome shotgun sequencing data. "The DACC is functioning to help out with the analysis and other various aspects of what's going on in the sequencing centers where they're doing the sequencing and analysis of the human samples from the normal volunteers," says Susan Garges, program director for the Human Microbiome Project at the National Human Genome Research Institute. "DACC is also heavily involved in helping [HMP] demonstration projects (those are medical sequencing projects), and many of the people involved in that have not been involved in [a] large-scale sequencing effort before, so it is helping them get their data into the public databases."
The DACC is headed up by Owen White, director of bioinformatics at the University of Maryland School of Medicine, and Jennifer Wortman, the associate director of bioinformatics at the Institute for Genome Sciences. Wortman says that it was clear from the get-go that an informatics resource capable of supporting a project on the scale of the HMP would require more resources than IGS had at its immediate disposal. To get things rolling rapidly, the sequencing centers funded by NHGRI and the National Institute of Allergy and Infectious Diseases initially worked by leveraging existing contracts. But for the DACC, RFAs had to be written and submitted through the peer review process, so it took a bit longer to get up and running.
"When we were first conceiving of even writing a grant to fulfill this Data Analysis Coordination Center for the microbiome project, we realized that the best way to get these pieces together ... was to pull in people with the expertise in different areas that we need and people who have tool sets that we could make use of immediately rather than having to get off the ground by developing things ourselves that we didn't have in house," Wortman says.
To do this, they first had to identify who had the best Web portal for microbial gene annotation and who had the best system for tracking metadata for genome projects. "From the beginning, it was just defining the core needs of this ambitious project and being aware enough to realize that the most effective and efficient and rapid way to get something off the ground was going to be to reach out to folks at different institutes," she says.
But Wortman, who is also working in collaboration with Stanford University on the Aspergillus genome database project as well as various other collaborative genome projects, knows that that's only half the battle — the tough part is getting all of these sites on the same page. "They're at different time zones and it's a challenge to have that start-up communication," she says. "But it's important to have a number of conference calls and a face-to-face meeting as soon as possible at the beginning of the project, and really a collaborative effort to come up with an overall project plan, in writing, that reflected what we wanted to accomplish in our first year."
The question of whether or not a research project will morph into a large collaboration is usually quickly determined by an evaluation of local resources and skill sets. In the case of the HMP, all involved were aware that there was going to be a flood of data coming in very quickly. They knew they had to reach out to other institutes for informatics help or pare down the resources that the DACC needed for the HMP to really get the most bang for its buck. "The answer to that was we need to leverage existing systems, identify the best existing systems to fulfill these needs, and try to leverage them on startup. … They can evolve during the course of the project to meet project-specific needs, but the idea is for the start-up to be as rapid as possible and to have a network of collaborators that have complementary strengths and resources that can really be brought to bear if that happens," Wortman says.
It may sound like a no-brainer, but when it comes to successfully managing any collaboration, communication is key. If at all possible, there should be at least one individual involved in the project whose sole responsibility is making sure that wires do not get crossed. "You just can't underestimate the importance of constant [one-on-one] communication and communication using different methods," Wortman says. "Some people do better on the phone, some people with instant messenger, some do better with e-mail, so [it's important to combine] those communication methods and having Web conferences followed up by written minutes followed up by written project plans. You can't communicate too much."
NHGRI's Garges agrees that keeping an open line of communication going at all times is essential for a collaboration of this size. She is also in the position of being one of the only people associated with the HMP whose sole job is to help oversee the project. "There's a good communication system already going and the major way information is conveyed is through teleconferences that the individual working groups within the HMP have usually on a weekly basis," she says. "I get to hear the weekly update on what is happening, so communication is key to bringing the group together ... and the DACC really serves as a centering group within the HMP."