The Swiss Institute of Bioinformatics and bioinformatics firm GeneBio this week launched NextProt, an online platform for the study of human proteins.
The resource, which builds upon the contents of the SIB-developed UniProtKB/Swiss-Prot database, aims to be a comprehensive source for human protein data, integrating a number of proteomics and protein research repositories.
"We're only at the beginning, and there's a huge amount still to be done, but basically it's the first generation of a system that will be a one-stop shop for human protein [research]," Amos Bairoch, professor of bioinformatics at the University of Geneva and leader of the project, told ProteoMonitor.
Bairoch is also the leader behind the UniProtKB/Swiss-Prot database, which currently contains more than 530,000 protein sequences, including more than 20,000 human proteins. But while that resource aims for breadth – offering sequence and functional information on proteins from a number of different species – NextProt will aim for depth, he said, containing information on human proteins only, but a considerably larger amount of such information.
"It's an extension of UniProt in the sense that we're going deeper into human data, but it's not an extension in the fact that we're going narrower – we're not universal in coverage," Bairoch noted. In addition to the annotated human protein data available in UniProt, NextProt will contain "information on identified peptides, antibodies, on expression, much more extensive information on protein-protein interaction," he said
Additionally, the database will feature a number of tools aimed at the pharmaceutical industry that drug companies will be able to license through GeneBio, with revenues then going to support the development and upkeep of the NextProt database. According to Bairoch, the first of these tools will likely be ready by the end of 2012.
"We want to build software tools on top of the platform to go deeper into mining some of the information that we have," he said. "And some of these tools will be much more interesting to pharmaceutical companies than to academic users … tools that go more into toxicology or medical information or so on."
For example, he said, "if we were to put in data on protein-small molecule interactions, or protein-drug interactions, those are things that many academics wouldn't be interested in. The biggest target for that type of information is the pharma industry."
Bairoch maintained that it is too early to say whether interested academics would be able to access these fee-based tools for free or whether they would also have to pay a license fee. He said also that SIB hoped to collaborate with outside informatics groups and companies on tools for the database, noting that in these situations the organization might not have control over whether or not license fees are charged.
"If we collaborate with a group that is building a tool that is itself proprietary and sold to industry, if we integrate that tool we won't have any choice but to basically do whatever that company wants to do" regarding licensing, he said.
With its focus on human proteins and its plans to offer tools catering to pharma users, NextProt would appear more clinically focused than UniProt, an impression that Bairoch allowed has some truth to it.
"It's more applied that UniProt in that it's human proteins, and human proteins are the main targets for people working on clinical research," he said. "So, of course people who are interested in clinical research will like to go there."
However, he noted, "it's not only [for] clinical researchers. It's for anyone working on human proteins." Efforts to move into the clinic aside, "discovery proteomics is still only at the beginning," Barioch said.
"There are thoughts of using proteomics to discover biomarkers for clinical research and so on, but [for that researchers] need to have a stable set of data," he noted. "If you want to look at biomarkers it's useful, for instance, to know what is the quantity of different proteins in different tissues and organs and body fluids and so forth."
That said, NextProt "certainly helps in moving proteomics beyond the descriptive discovery phase to a more quantitative clinically focused system approach to proteomics," Institute for Systems Biology researcher Robert Moritz told ProteoMonitor in an email. "The major focus is to provide a comprehensive portal for understanding human proteins utilizing data provided by more high-throughput approaches, and [it] certainly focuses the effort towards the clinical understanding of protein behavior in systems analysis."
SIB is building the database in collaboration with a number of other groups that have established protein data repositories, including the ISB. Recently it added information on organ-specific and tissue-specific protein expression obtained from the Human Protein Atlas Project led by Royal Institute of Technology, Stockholm researcher Mathias Uhlen. The group is now working to add information on proteotypic peptides for selected-reaction monitoring mass spec assays from the SRMAtlas developed by the ISB and the Swiss Federal Institute of Technology Zurich.
The aim, Bairoch said, is not for NextProt to store these groups' data, but rather to provide a portal that allows researchers to better integrate it.
"We're not a proteomic repository," he said. "If someone, for example, does a proteomic survey of a tissue, they should submit that data to [the PRIDE Proteomics Identifications Database] or PeptideAtlas. But then we can link to this data or use the information they provide to show what peptide has been identified."
"It's a portal, but with enough data so that people can query," he said. "If you want to, say, identify proteins that are mitochondrial and expressed in liver, [NextProt] will take information on subcellular location and expression looking at mRNA and antibody [data] and send you any proteins that have an annotation that says they are localized in the mitochondria and expressed in the liver."
In practical terms, this means a lot less compiling of resources on the part of individual researchers, Moritz noted.
"Nothing is more frustrating than having to string together manually information from multiple data resources, and hopefully NextProt can provide this service to the community," he said. "Clearly the ability to interrogate multiple databases such as the Peptide- and SRM-Atlas databases alongside other databases such as the Pathway Interaction Database and other omics data is a major application."
All data in NextProt will be classified as either silver or gold, Bairoch noted, with gold being the highest confidence data and silver being of still-high, but lesser, confidence. Users can conduct their queries looking only at the gold data, or they can include silver data as well, which will return a larger but less reliable set of results.
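As a rough illustration of the kind of query Bairoch describes, the Python sketch below combines subcellular location and tissue expression annotations and applies the gold/silver confidence filter. It is a toy model only, not NextProt's actual schema or API: the `Annotation` record, the `find_proteins` function, the `include_silver` flag, and the protein names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; not NextProt's real data model."""
    protein: str
    kind: str     # "location" or "expression"
    value: str    # e.g. "mitochondrion" or "liver"
    quality: str  # "gold" or "silver"


def find_proteins(annotations, location, tissue, include_silver=False):
    """Return proteins annotated with both the given location and tissue.

    By default only gold annotations count; include_silver=True widens
    the search to silver annotations as well.
    """
    allowed = {"gold", "silver"} if include_silver else {"gold"}
    located = {
        a.protein for a in annotations
        if a.kind == "location" and a.value == location and a.quality in allowed
    }
    expressed = {
        a.protein for a in annotations
        if a.kind == "expression" and a.value == tissue and a.quality in allowed
    }
    # A protein must satisfy both criteria to be returned.
    return sorted(located & expressed)


# Made-up example data for illustration only.
ANNOTATIONS = [
    Annotation("PROT_A", "location", "mitochondrion", "gold"),
    Annotation("PROT_A", "expression", "liver", "gold"),
    Annotation("PROT_B", "location", "mitochondrion", "gold"),
    Annotation("PROT_B", "expression", "liver", "silver"),
    Annotation("PROT_C", "location", "nucleus", "gold"),
    Annotation("PROT_C", "expression", "liver", "gold"),
]

gold_only = find_proteins(ANNOTATIONS, "mitochondrion", "liver")
with_silver = find_proteins(ANNOTATIONS, "mitochondrion", "liver", include_silver=True)
```

In this made-up data set, the gold-only query returns a single protein, while including silver annotations adds a second whose liver expression is supported only at the lower confidence level. That mirrors the trade-off between a smaller, more reliable result set and a larger, less reliable one.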
SIB also hopes to obtain data from pharmaceutical firms for the platform and is currently "in discussion with a number of companies," Bairoch said. Big pharma, he noted, has made noises in recent years about being more open to data sharing; however, he said, thus far SIB has yet to receive any large datasets from the industry.
"They are more and more open to saying that they are going to provide data, but there's still a big lag time in terms of their higher-level researchers saying they want to do this and their legal people releasing the data," he said.