The emerging field of metagenomics got a boost from the Gordon and Betty Moore Foundation in January when it awarded $24.5 million to the University of California, San Diego, and the J. Craig Venter Institute to build a publicly available informatics infrastructure to help store, analyze, visualize, and disseminate the massive amounts of data gleaned from environmental sequencing.
The seven-year grant will support a project called the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis — known as CAMERA — which will include hardware, software, and data resources related to marine metagenomics.
The UCSD division of the California Institute for Telecommunications and Information Technology will lead the project, along with the Venter Institute and UCSD’s Center for Earth Observations and Applications at the Scripps Institution of Oceanography. Other partners include UCSD’s San Diego Supercomputer Center, the Scripps Genome Center, and the National Biomedical Computation Resource at UCSD.
The CAMERA project will present challenges in data integration, visualization, and communications infrastructure, says Peter Arzberger, director of the NBCR. Arzberger says that in addition to advancing the understanding of marine ecosystems — the primary goal of the effort — the collaborative aspects of the underlying IT infrastructure will “revolutionize how we work with data.”
The backbone of the system will be the so-called OptIPuter optical network, a project funded by the National Science Foundation that will eventually enable other scientists to plug their compute clusters into the CAMERA infrastructure. NSF kicked off the OptIPuter project in 2002 with a five-year, $13.5 million grant.
OptIPuter is expected to offer a hundred-fold increase over current connectivity standards, meaning that “distance is no longer a bottleneck” for collaborative projects involving large amounts of data, Arzberger says.
On the hardware side, CAMERA will have a dedicated cluster of approximately 1,000 processors and several hundred terabytes of storage, and will also be plugged into the NSF’s TeraGrid distributed computing infrastructure.
This patent covers a “segmentation method of a frame of image information including a plurality of spaced DNA spot images corresponding to a plurality of DNA spots,” according to the abstract. The technique uses image intensity-level information, grid-point data, and other data to analyze the image.
This invention “relates to methods and systems for quickly determining the statistical significance of a raw alignment score produced by aligning a first sequence to a second sequence,” the abstract says. “The claimed methods and systems determine multiple estimates of the p-value of an alignment score. Each p-value estimate is then compared to a pre-defined threshold p-value” to determine significance.
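The abstract does not spell out how the p-value estimates are produced, so the following is only an illustrative sketch of the general idea, not the patented method. It assumes a shuffle-based (Monte Carlo) estimate: the second sequence is shuffled repeatedly, each shuffle is re-aligned, and a running p-value estimate is compared with a threshold after every batch. The scoring scheme, the function names `local_alignment_score` and `estimate_significance`, and all parameter values are hypothetical.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the patent.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def estimate_significance(a, b, threshold=0.05, batch_size=50,
                          max_batches=10, seed=0):
    """Return (is_significant, list_of_p_value_estimates).

    Hypothetical procedure: after each batch of shuffles, record a new
    p-value estimate and stop early once the final estimate can no longer
    fall to or below the threshold.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    total_planned = batch_size * max_batches
    at_least_as_good = 0
    shuffles_done = 0
    estimates = []

    for _ in range(max_batches):
        for _ in range(batch_size):
            rng.shuffle(letters)
            if local_alignment_score(a, "".join(letters)) >= observed:
                at_least_as_good += 1
            shuffles_done += 1

        # Add-one correction keeps the estimate strictly above zero.
        estimates.append((at_least_as_good + 1) / (shuffles_done + 1))

        # Best case: no remaining shuffle matches the observed score.
        lowest_possible = (at_least_as_good + 1) / (total_planned + 1)
        if lowest_possible > threshold:
            return False, estimates

    return estimates[-1] <= threshold, estimates


if __name__ == "__main__":
    seq = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
    significant, estimates = estimate_significance(seq, seq)
    print(significant, estimates[-1])
```

Checking the estimate after every batch lets clearly insignificant alignments be rejected after only a few shuffles, which is one plausible way to make the determination "quickly" in the sense the abstract describes.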