This story originally ran on Aug. 19.
Using a strategy combining four "major" proteomic technologies, Chinese researchers identified 6,788 proteins in the human liver, the largest reported proteome dataset for a human organ to date.
In addition, the researchers reported a human liver transcriptome dataset of 11,205 genes, and showed for the first time a direct association between a proteome and its transcriptome derived from the same sample.
Described in a study published Aug. 5 in the online edition of the Journal of Proteome Research, the work is the product of an initiative begun in 2004 by the Ministry of Science and Technology of China called the Chinese Human Liver Proteome Project, or CNHLPP. The CNHLPP, whose goal is to develop a proteomic atlas of the human liver, is part of a larger initiative called the Human Liver Proteome Project, launched in 2003 by the Human Proteome Organization, and is one of the first large-scale proteomics projects coordinated by the Chinese Human Proteome Organization.
CNHLPP created an expression profiling pilot subproject with three primary goals: to identify and establish comprehensive and complementary methods and technologies with a particular focus on identifying low-abundance proteins; construct the "primary proteome profile" of the adult human liver; and integrate and compare the proteome with its transcriptome and the human plasma proteome.
The results of the subproject are described in the JPR paper. In total, 11 laboratories were involved in the expression profiling subproject.
Because the liver is the second largest human organ after the skin, the proteomics community has been especially interested in mapping out its proteome. The liver serves a vital role in metabolism, is the primary source of plasma proteins, and plays a key role in the pharmacokinetics of drugs by filtering and eliminating drugs in the human body.
But "despite the importance of the liver in health, the variety, number, and abundance of liver proteins have not been extensively characterized," the authors of the JPR study wrote. "Because the liver is such a complex biological system, global analysis at the -omics level is necessary to fully elucidate its functions."
For their work, which is based on 10 samples from volunteers of Chinese Han ethnicity who had been screened to ensure they had no liver disease, the researchers ran the samples on different platforms, duplicated the experiments in different labs, and duplicated runs of the same sample within a single lab in order to generate a variety of replicates.
Four "major" proteomics technologies were used for analysis: 2DLC-ESI, in which strong cation exchange and reverse-phase liquid chromatography were used to separate digested peptides, coupled with electrospray ionization tandem mass spectrometry; 3DLC-ESI, in which proteins were pre-fractionated by LC, and the digested peptides were then separated by strong cation exchange and reverse-phase LC combined with ESI-MS/MS; 2DE-MALDI, in which proteins were separated by 2D gel electrophoresis and then digested, and the resulting peptides were analyzed by MALDI; and 1DI-LC-ESI, a workflow calling for protein pre-fractionation by SDS-PAGE and reverse-phase LC for separation of peptides, followed by ESI-MS/MS analysis.
Each approach was implemented at two centers, and samples were run in parallel at least three times at each center in order to reduce "the random aspect of peptide capture" and to increase low-abundance peptide capture, the authors wrote.
In addition, they constructed a bioinformatics system to process the large amount of data generated. The same reversed-shift database was used to evaluate all identified peptides and the false-positive rate of peptide identification. Also, key parameters used to assess data quality, such as the minimum number of amino acids in peptides identified from MS/MS spectra, and the minimum number of peptides and the lowest mass error in PMF results, "were determined by statistical analysis of real experimental data rather than by empirical judgments," the researchers said in the study. A protein was considered a true identification only when at least two peptides were matched to it.
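The article does not spell out how the false-positive rate was calculated. As a rough illustration only, the sketch below uses a common decoy-based estimate (decoy matches divided by target matches above a score cutoff). The PeptideMatch type, the score field, and both function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    peptide: str      # peptide sequence (illustrative)
    score: float      # search-engine score, higher is better (assumption)
    is_decoy: bool    # True if the match came from the reversed/decoy database


def estimated_fdr(matches, threshold):
    """Estimate the false-positive rate among matches scoring >= threshold."""
    accepted = [m for m in matches if m.score >= threshold]
    decoys = sum(m.is_decoy for m in accepted)
    targets = len(accepted) - decoys
    if targets == 0:
        return 0.0
    # Assumed estimator: decoy hits approximate the number of false target hits.
    return decoys / targets


def lowest_threshold_for_fdr(matches, max_fdr):
    """Return the lowest observed score cutoff whose estimated FDR is <= max_fdr."""
    for threshold in sorted({m.score for m in matches}):
        if estimated_fdr(matches, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    demo = [
        PeptideMatch("AAK", 9.0, False),
        PeptideMatch("CCR", 8.0, False),
        PeptideMatch("KAA", 7.0, True),
        PeptideMatch("DDK", 6.0, False),
        PeptideMatch("RCC", 5.0, True),
    ]
    print(lowest_threshold_for_fdr(demo, 0.05))
```

A cutoff chosen this way would correspond loosely to a stated confidence level, for example a 5 percent estimated false-positive rate for 95 percent confidence, although the study's actual procedure may differ.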
In total, the researchers identified 607,851 mass spectra corresponding to 62,117 peptides, including 45,781 peptides with tandem mass spectra. At a confidence level of 95 percent, 23,345 proteins were identified.
The number was whittled to 12,951 after duplicates were removed. The researchers then eliminated proteins with only one peptide match, bringing the final number of identified proteins for their Human Liver Proteome to 6,788, spanning six orders of magnitude in dynamic range.
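The two filtering steps described here, merging duplicate entries and then dropping proteins supported by a single peptide, can be sketched roughly as follows. This is a minimal illustration, not the consortium's pipeline; the function name and the (protein, peptide) input format are assumptions.

```python
from collections import defaultdict


def filter_proteins(peptide_hits, min_peptides=2):
    """Keep proteins supported by at least `min_peptides` distinct peptides.

    peptide_hits: iterable of (protein_id, peptide_sequence) pairs, possibly
    containing repeats from replicate runs or different labs.
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide in peptide_hits:
        # Sets collapse repeated protein entries and repeated peptides.
        peptides_by_protein[protein_id].add(peptide)
    return {
        protein_id
        for protein_id, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


if __name__ == "__main__":
    hits = [
        ("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"),  # two distinct peptides
        ("P2", "DDK"),                                 # single-peptide hit
        ("P3", "EEK"), ("P3", "FFR"),
    ]
    print(sorted(filter_proteins(hits)))
```

In this toy input, P2 is dropped because only one peptide supports it, mirroring the reduction from 12,951 to 6,788 proteins reported in the study.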
To determine the extent of coverage of the HLP data, the researchers determined the corresponding transcripts in parallel with the same 10 samples. For this, they employed massively parallel signature sequencing, or MPSS, and Affymetrix high-density oligonucleotide arrays. The result was 10,224 and 5,422 genes detected by microarray and MPSS, respectively. Integration of the two datasets resulted in a final dataset of 11,205 genes, which they defined as the Human Liver Transcriptome.
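If the integration is read as a simple union of the two gene lists (an assumption; the article does not describe the merging procedure), the reported totals imply how many genes both platforms detected. The sketch below shows that arithmetic; the function names and gene symbols are illustrative.

```python
def integrate(microarray_genes, mpss_genes):
    """Merge two gene lists into one non-redundant set (assumed plain union)."""
    return set(microarray_genes) | set(mpss_genes)


def implied_overlap(n_first, n_second, n_union):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_first + n_second - n_union


if __name__ == "__main__":
    # Counts reported in the article: microarray, MPSS, integrated dataset.
    print(implied_overlap(10_224, 5_422, 11_205))
    # Toy example of the union itself.
    print(sorted(integrate(["ALB", "APOA1"], ["APOA1", "TF"])))
```

Under that assumption, 4,441 genes would have been detected by both microarray and MPSS.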
Analysis of the HLT showed that 61.13 percent of the transcriptome was covered by the proteome database.
Next, the researchers compared their HLP to the Integrated Liver tissue Proteome (ILP) of mouse and human, consisting of 3,011 proteins; the Human normal Heart Proteome (HHP), made up of 619 proteins; the Human Plasma Proteome (HPP), with 3,885 proteins; and the Liver Disease-related Genes and Proteins (LDGP) database, with 228 proteins. Overlap with the ILP was 28.6 percent; with the HPP 23.1 percent; with the HHP 6.3 percent; and with the LDGP 46.1 percent, which suggests that a "significant" number of genes or proteins associated with diseased liver are also found in normal liver, according to the authors.
They said that their dataset also had a greater frequency of medium- and low-abundance proteins than other liver proteome datasets such as the ILP, Swiss-Prot, and the Human Protein Reference Database. The HLP contained 3,721 novel proteins, of which 82.5 percent were at relatively low abundance.
Proteins for biological functions in which the liver plays a major role were significantly enriched in the liver, the authors reported. For analysis of metabolism proteins, they extracted all 94 human metabolic pathways from the Kyoto Encyclopedia of Genes and Genomes and applied them to the HLP and HLT datasets. Within the metabolic pathways, 1,040 proteins were detected through HLP determinations, including 24 pathways that were completely covered by the HLP dataset. Another 24 pathways had coverage of 80 percent with the HLP dataset, the researchers reported.
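Pathway coverage of this kind can be thought of as the fraction of each pathway's member proteins that appear in the detected set. The sketch below is a minimal illustration under that assumption; the pathway names, protein identifiers, and the 0.8 cutoff used for the second group are placeholders, and the code does not query the Kyoto Encyclopedia of Genes and Genomes.

```python
def pathway_coverage(pathways, detected):
    """Return {pathway name: fraction of its member proteins that were detected}.

    pathways: dict mapping a pathway name to a set of protein identifiers.
    detected: set of protein identifiers found in the proteome dataset.
    """
    return {
        name: len(members & detected) / len(members)
        for name, members in pathways.items()
        if members  # skip empty pathways to avoid dividing by zero
    }


def summarize(coverage, partial_cutoff=0.8):
    """Split pathways into fully covered and those at or above the cutoff."""
    complete = sorted(n for n, c in coverage.items() if c == 1.0)
    partial = sorted(n for n, c in coverage.items() if partial_cutoff <= c < 1.0)
    return complete, partial


if __name__ == "__main__":
    toy_pathways = {
        "pathway_a": {"A", "B"},
        "pathway_b": {"C", "D", "E", "F", "G"},
        "pathway_c": {"H", "I"},
    }
    toy_detected = {"A", "B", "C", "D", "E", "F", "H"}
    print(summarize(pathway_coverage(toy_pathways, toy_detected)))
```

Applied to real pathway membership lists, the two returned groups would correspond to the completely covered pathways and the highly covered pathways the researchers reported.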
The HLP dataset also contained 938 transporters and associated proteins, including three ion channels that are reported in human liver for the first time — sodium channel type V alpha subunit, alpha 1A-voltage-dependent calcium channel, and voltage-gated potassium channel beta-3 subunit.
They additionally reported 800 proteins related to signal transduction, primarily involved with cellular recognition, localization, communication, and inflammation.
Finally, they compared their dataset with the HPP dataset, based on three pairwise datasets with three different levels of confidence of protein identification. The number of overlapping proteins in the three comparisons was 4,241, 1,214, and 216. The number of proteins that overlapped between HLP and HPP at all three confidence levels was 184. Gene ontology analysis indicated the 184 proteins are associated with immune response, transport, metabolism, cytosis, signal transduction, cell adhesion, and coagulation, the scientists said.
"Quantitative comparison of HLP and HPP exhibited a close correlation … [and] taken together, the quantitative comparison demonstrated that the expression levels of secreted proteins in liver were in agreement with the concentrations of corresponding plasma proteins, particularly for coagulation," the authors wrote.
The authors did not respond to questions e-mailed to them, but in their study they said that their work does not represent a comprehensive analysis of their "vast" proteome dataset. Rather, their analysis represents some "enticing highlights … that we think will encourage future in-depth analysis."
In addition, the proteome described should be viewed as a draft of the human liver proteome, "which may be considered a first step in the construction of a highly accurate and comprehensive human liver proteome," they said. Advances in technology and methods for protein separation and identification and improvements in data mining will be needed to build out the proteome. "Ultimately, the proteomes of the individual cell types that make up the liver, as well as their organelles, will be profiled," they added.
Their results are contained in two databases they created, the PROTEOMESKY 1.0 Human Liver Expression Profile, and Liverbase, which includes information about the human liver proteome, including the function, abundance, and subcellular localization of proteins, and associated disease information.