NCBI to Remove 350-kb GenBank Sequence Length Limit
NCBI said last week that it would lift the current 350-kilobase limit on the sequence length of GenBank records as of June 2004.
The limit, which was originally put in place “as an aid to users of sequence analysis software, some of which might not be capable of processing megabase-scale sequences,” was deemed unnecessary at the May 2003 collaborative meeting among representatives of GenBank, EMBL, and DDBJ.
According to NCBI, significant exceptions to the 350-kb limit have existed for several years, including high-throughput genomic sequences generated by the Human Genome Project and assemblies of whole-genome shotgun data. “Given these exceptions, and the technological advances which have made large-scale sequencing practical for an increasing number of researchers, the collaboration has decided that the 350 kbp limit must be removed,” the NCBI said.
As of June 2004, the length of database sequences will be limited “only by the natural structures of an organism’s genome.” As an example, the NCBI noted that a single record might be used to represent all of human chromosome 1, which is around 245 Mb in length.
According to the NCBI, software developers for “some of the larger commercial sequence analysis packages” were asked what timeframe would be appropriate for this change. Answers ranged from “immediately” to “one year,” so the one-year timeframe was selected to give developers enough time to upgrade their software to megabase scale.
NCBI has made sample records with very large sequences available at ftp.ncbi.nih.gov/genbank/LargeSeqs so that developers can begin to test their software modifications.
Computational Biology to Play Major Role in New Broad Institute
Computational biology is destined to play a central role in research conducted at the Broad Institute, the biomedical research powerhouse that was announced last week as a collaboration between MIT, Harvard, and the Whitehead Institute.
The institute, which will focus on the development and application of genomics-based tools and technologies for the advancement of biomedical research, has identified computational biology as “increasingly central in converting the explosion in biological information into useful biomedical knowledge,” according to a statement released by the partners.
The Broad (rhymes with “code”) Institute received a founding gift of $100 million over 10 years from Los Angeles philanthropists Eli and Edythe Broad, and plans to raise an additional $200 million in private support, along with federal research grants, to support its work over the next decade.
The institute will begin operation later this year and plans to occupy a new facility in the Kendall Square area of Cambridge, but has not yet identified a site. Eric Lander, the director of the Whitehead Institute/MIT Center for Genome Research, will be the director of the Broad Institute.
The institute expects to employ 12 core faculty members and around 30 associated faculty members from MIT, Harvard, and the Whitehead. The initial core faculty will include Lander; Stuart Schreiber of Harvard University; David Altshuler of the Harvard Medical School and the Whitehead; and Todd Golub of the Dana-Farber Cancer Institute and the Whitehead.
It was not immediately clear who would head the institute’s computational biology activities.
The Broad Institute said it expects at least 15 associated faculty members to be appointed before it is launched later this year.