NEW YORK, April 20 (GenomeWeb News) - An international consortium of 152 researchers has wrapped up a two-year "annotation marathon" in which it validated 21,037 human genes in public databases using cDNA clones.
The consortium, called H-Invitational (short for Human Full-length cDNA Annotation Invitational) and led by Takashi Gojobori of the Japan Biological Information Research Center in Tokyo, last week released the fruits of its efforts via a public database called H-Inv DB.
A paper describing the project was published today in PLoS Biology.
The consortium used human full-length cDNAs produced by six high-throughput cDNA sequencing projects to annotate the genome. This dataset included 41,118 cDNAs derived from 184 cell types and tissues, which were aligned to the genome and clustered into 20,190 loci. The remaining cDNAs yielded an additional 847 unmapped clusters, for a total of 21,037.
In their analysis, the H-Invitational researchers found that up to 4 percent of the most recent assembly of the human genome (NCBI build 34) "may contain misassembled or missing regions," and that 6.5 percent of predicted human genes did not have a good protein-coding open reading frame.
In addition, the project identified 72,027 SNPs and indels and mapped them to unique positions on 16,861 loci. Of these, 13,215 nonsynonymous SNPs, 358 nonsense SNPs, and 452 indels were found in coding regions and "may alter protein sequences, cause phenotypic effects, or be associated with disease," the authors wrote.