Though he’s only 35 years old, Akhilesh Pandey says he has long been plagued by the question, “Just how many human proteins are there anyway?” Now the India-born biochemist, who studied under Harvey Lodish at MIT and Matthias Mann at the University of Southern Denmark, says if anyone is going to settle the protein-count question once and for all, it will be him and his team. They’re creating what they hope will be the definitive proteomics data resource, the Human Protein Reference Database — “a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks, and disease association for each protein in the human proteome.”
Pandey, an assistant professor at the Johns Hopkins University Institute for Genetic Medicine, exhausted his life savings to establish the nonprofit Institute of Bioinformatics in May 2002. In a Bangalore, India, office park, 35 biologists and software engineers work in tandem with Pandey’s 10-person lab at Hopkins to manually curate the Human Protein Reference Database. In a little over a year, they’ve extracted and classified protein references from more than 300,000 scientific articles. Current HPRD stats, according to the website: 2,750 proteins; 10,534 protein-protein interactions; 417 domains; 2,000 post-translational modifications; and 25,050 PubMed links.
Since Pandey quietly unveiled the database in March, the site has drawn 1.5 million hits through word of mouth alone, and Proteomics Standards Initiative coordinator Henning Hermjakob has invited him to join that effort.
Hermjakob says that while his review of HPRD reveals that most of the protein annotation data it contains are “more or less redundant with Swiss-Prot or InterPro, the interaction data is valuable … and there is a good, sizeable quantity of human interaction data.”
Hermjakob adds, “I don’t want to go into whether this money would be better spent on one of the existing projects.” Indeed, HPRD appears to be competing with established efforts, such as BIND, UniProt, and the BioKnowledge Library available from Incyte’s Proteome division.
Why create yet another protein database, anyway? Pandey explains on his website: “We believe that biological databases are still in their early stages and no protein database can be considered as an established standard… We want to offer biologists the possibility of choosing instead of imposing one database by default.” Plus, Pandey contends, HPRD is set apart by manual curation — the only reliable way to control quality, he says.
HPRD content will be freely available to academic researchers upon publication of a paper in Genome Research this month. Commercial entities will have to pay a fee for use of the data, under a licensing agreement similar to that of Swiss-Prot. The underlying software used to create the resource will also be freely available.