Multiple Techniques for Data Analysis Needed, Siena Presentations Assert

The sixth international proteomics meeting, held this week in Siena, Italy, opened with this message: Researchers can benefit from using two or more different techniques to analyze the same samples.

The message came from Marc Wilkins, credited with coining the term “proteomics” 10 years ago, who cautioned that some techniques may generate data that are of little use.

“There’s a pressing need to look deeply at the issues of data analysis so we better understand the quality of data being produced and ensure that it is of the highest possible biological value,” said Wilkins, executive vice president of discovery and bioinformatics at Sydney, Australia-based Proteome Systems.

Wilkins noted that when researchers study the same disease or problem using different techniques, their findings rarely overlap. For example, a paper published this year by George Miklos, vice president and chief scientific officer of Human Genetic Signatures, and Ryszard Maleszka of the Australian National University found that researchers using Affymetrix microarrays to study schizophrenia identified 89 genes associated with the disease, while researchers using spotted arrays identified 49 genes. Only one gene was common to both sets of results.

In the same vein, Wilkins noted that of the 12,524 interactions described in the BIND database, only 198 are common to two techniques, and that in a comparison of 2D gels with the amino acid-coded tag technique, only four proteins overlapped.
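To make the comparison concrete, here is a minimal Python sketch of the set arithmetic behind such overlap figures; the gene identifiers and list contents are placeholders, not the actual results from the studies Wilkins cited.

```python
# Minimal sketch: quantifying overlap between result lists from two
# techniques. IDs below are placeholders chosen so the counts match
# those Wilkins cited (89 vs. 49 genes, 1 in common).

def overlap_stats(list_a, list_b):
    """Return the shared items and the Jaccard index of two result lists."""
    set_a, set_b = set(list_a), set(list_b)
    common = set_a & set_b
    jaccard = len(common) / len(set_a | set_b) if (set_a or set_b) else 0.0
    return common, jaccard

affy_hits = {f"gene_{i}" for i in range(89)}          # 89 placeholder genes
spotted_hits = {f"gene_{i}" for i in range(88, 137)}  # 49 genes, 1 shared

common, jaccard = overlap_stats(affy_hits, spotted_hits)
print(f"{len(common)} gene(s) in common, Jaccard = {jaccard:.3f}")
# -> 1 gene(s) in common, Jaccard = 0.007
```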

To be sure, the meaning of these differential results is not clear, Wilkins said. It may be that researchers studying schizophrenia are looking at different types of disease or different tissues in different parts of the brain, but the striking results raise a cautionary question: How well is our technology serving us?

How to handle increasingly massive amounts of data was another issue addressed by numerous proteomics researchers who spoke throughout the four-day conference.

Peter James of Lund University in Sweden noted that, tedious as the work is, his group must run 2D gels on large numbers of patient samples because large cohorts are needed to compensate for biological variation among individuals.

“We need a dishwasher to deal with the 2D gel plates,” said James. “When you’re analyzing 240 patient samples, automation is key.”

James’ group has used robots and gel-cutting machines, but it still lacks a robot to change gels on the scanner, where each gel takes three hours to image. Once the gels have been scanned, researchers in James’ group use cluster trees to analyze the fluorescent spots.
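As an illustration of the cluster-tree step, here is a minimal sketch, assuming a matrix of matched spot intensities per patient; the data are simulated and the group count is arbitrary, not drawn from James’ actual pipeline.

```python
# Hierarchical clustering of 2D-gel spot intensities, in the spirit of
# the cluster-tree analysis described above. Placeholder data only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Rows = patient samples, columns = matched fluorescent spot intensities
intensities = rng.lognormal(mean=2.0, sigma=0.5, size=(240, 500))

# Log-transform, then build a correlation-distance cluster tree of samples
log_int = np.log2(intensities)
tree = linkage(log_int, method="average", metric="correlation")

# Cut the tree into, say, four candidate patient groups
groups = fcluster(tree, t=4, criterion="maxclust")
print(np.bincount(groups)[1:])  # number of samples per cluster
```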

Amos Bairoch of the University of Geneva, on the other hand, described the UniProt/Swiss-Prot database as a way of dealing with the volume of protein data being generated. Each UniProt entry contains the name of the protein; a selection of references; a description of what is known about the protein; the factors it reacts with; the products it is associated with; a description of important sequence features; cross-references; and a selection of keywords.

“The whole proteome can be annotated,” said Bairoch, after estimating that the human genome contains 25,000 protein-encoding genes. “So far, Swiss-Prot includes 11,400 human genes, and there’s a backlog of about 4,000 genes whose proteins remain to be annotated.”
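A minimal sketch of how the entry fields Bairoch listed might be held in a record; the field names here are illustrative, not the actual Swiss-Prot flat-file tags.

```python
# Illustrative record mirroring the UniProt entry fields described above.
from dataclasses import dataclass, field

@dataclass
class UniProtEntry:
    name: str                                                # protein name
    references: list[str] = field(default_factory=list)      # selected literature
    description: str = ""                                    # what is known about it
    interactions: list[str] = field(default_factory=list)    # factors it reacts with
    associations: list[str] = field(default_factory=list)    # associated products
    sequence_features: list[str] = field(default_factory=list)
    cross_references: dict[str, str] = field(default_factory=dict)  # e.g. other DBs
    keywords: list[str] = field(default_factory=list)

entry = UniProtEntry(name="Example protein", keywords=["hypothetical"])
print(entry.name, entry.keywords)
```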

Most researchers agreed with Wilkins that it is valuable to validate data by using different techniques to analyze the same samples, even though it means spending more time and resources.

“It’s OK because you are never sure when you start which technique will give you the best result,” said Raili Seppala-Lehlo of the National Public Health Institute in Helsinki, Finland. “I have the crystal structure of a protein and I’m going to solve it using NMR as well because I want to get a more detailed answer.”

Wilkins noted that some techniques may be perfect for a certain question, but stressed that no single technique will do everything. Researchers should use different techniques and examine the resulting data critically, he said.

“I’m hoping to see more work where different techniques are used for the same problem,” said Wilkins. “There’s an enormous value to doing that kind of differential display.”

Wilkins concluded his opening talk by suggesting that once data from different techniques are obtained, statistical techniques should be used to analyze how much of the variation is analytical, and how much is biological.
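One standard way to make that split, assuming each patient sample is measured in technical replicates (an assumption of this example, not something Wilkins specified), is a one-way random-effects decomposition separating within-sample (analytical) from between-sample (biological) variance; the sketch below runs on simulated data.

```python
# Partitioning total variation into analytical and biological components
# from technical replicates, using a one-way random-effects decomposition.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_reps = 30, 3
biological = rng.normal(10.0, 2.0, size=(n_samples, 1))             # true sample means
data = biological + rng.normal(0.0, 0.5, size=(n_samples, n_reps))  # + analytical noise

sample_means = data.mean(axis=1)
ms_within = ((data - sample_means[:, None]) ** 2).sum() / (n_samples * (n_reps - 1))
ms_between = n_reps * sample_means.var(ddof=1)

var_analytical = ms_within
var_biological = max((ms_between - ms_within) / n_reps, 0.0)
print(f"analytical variance ~ {var_analytical:.2f}, biological ~ {var_biological:.2f}")
# Expect roughly 0.25 (= 0.5**2) analytical and roughly 4 (= 2**2) biological
```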

— TSL
