By Monica Heger
Microarray technology and next-generation sequencing, when used in conjunction, could improve on using either method alone to evaluate gene expression, according to researchers from the University of Barcelona.
The team combined three different microarray platforms (from Agilent, Operon, and Illumina) with digital gene-expression profiling on the Illumina Genome Analyzer to study genes regulated by epidermal growth factor (EGF), a regulatory growth factor involved in cell proliferation and survival. They found that the combination of techniques was able to establish a validated gene set, including novel genes not previously linked to EGF, and to help piece together the network of how those genes interact.
The study, published in BMC Genomics this month, suggests that using the two different technologies could help generate more reliable data sets than when either method is used alone, and should help improve the confidence of functional analysis.
"Most studies are based on a single platform, maybe with replicates, but usually not validated on a global scale," said Lauro Sumoy, senior author of the study from the Institute of Predictive and Personalized Medicine of Cancer in Barcelona. "There will be a few PCR validations, but no one has looked extensively at thousands of genes through another technology."
The researchers wanted to look at the EGF-dependent transcriptome because of its importance in regulating cell survival and because studies that have tried to elucidate the set of genes regulated by EGF have been performed on different cell lines under different conditions and have not been validated on a broad scale.
The team used microarrays and a variation of RNA-seq known as digital gene expression, which counts tag sequences at restriction enzyme cut sites. The method is similar to microarrays in that both "take short nucleic acid target sequences to sample expression of longer RNA molecules containing them, and both are 3' biased because they rely on extension of cDNAs from the poly-A tail with an oligo-dT primer," according to the authors.
However, while microarrays are "closed," only showing "what you have on the array," said Sumoy, the sequencing method is an open analysis, enabling the discovery of new sequences.
The team analyzed the EGF-dependent transcriptome of HeLa cells using both approaches.
"The arrays are a lot more reproducible and less noisy," said Sumoy. However, "for low expressed transcripts, they are not as sensitive [as digital gene expression]. The other main advantage is the ease of analysis. Arrays are simple to analyze, compared to sequencing data," he said.
The sequencing technique yielded around 4.9 million high-quality tags, corresponding to 16,350 genes. The three microarray platforms together represented 17,070 genes, of which 16,220 were also detected by digital gene expression. That left 850 genes that were detected on at least one of the three microarray platforms but had no detectable measure by sequencing, and 130 tag-sequencing targets that were not detected by any of the microarray platforms.
"The overlap between [the technologies] was quite high, so we were able to validate over 12,000 transcripts," Sumoy said. Additionally, the transcripts that were unique to each platform "gave us a bigger picture of what's going on."
Comparing the data, the researchers found the concordance between the microarray and sequencing data was highest among the top 100 genes. When they increased the size of the gene list, they found that the proportion that is shared stabilized at around 30 percent.
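To make the idea of top-list concordance concrete, here is a minimal Python sketch of one way such a comparison could be computed. It is not the authors' method: the function name, the gene labels, and the rankings are all hypothetical, and it assumes each platform yields a list of unique genes ranked from most to least strongly regulated.

```python
def top_n_concordance(ranked_a, ranked_b, n):
    """Return the fraction of the top-n genes shared by two ranked gene lists.

    Assumes each list holds unique gene identifiers, strongest response first.
    """
    if not 0 < n <= min(len(ranked_a), len(ranked_b)):
        raise ValueError("n must be between 1 and the length of the shorter list")
    shared = set(ranked_a[:n]) & set(ranked_b[:n])
    return len(shared) / n


# Hypothetical rankings for illustration only; these are not data from the study.
array_rank = ["G1", "G2", "G3", "G4", "G5", "G6"]
seq_rank = ["G2", "G1", "G7", "G3", "G8", "G4"]

for n in (2, 4, 6):
    print(n, round(top_n_concordance(array_rank, seq_rank, n), 2))
```

In this toy data the shared fraction is highest for the shortest list and falls as the list grows, which is the general shape of the pattern the researchers describe.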
Next, the team used an algorithm to integrate the microarray and sequencing data, which allowed them to define a list of 638 up-regulated genes and 526 down-regulated genes in response to EGF. While including the sequencing data in the analysis lowered the total number of genes found to be significant compared with the microarray data alone, it also added 28 new genes that were not detected by microarrays to the significantly regulated gene list.
For 28 of the genes, the researchers used RT-qPCR to validate the microarray and sequencing results. The majority of the genes analyzed showed concordant results with both technologies, but sequencing "approximated best the fold change detected by RT-PCR," according to the authors. However, sequencing had more false positives than the microarrays, particularly among genes with a low number of sequence tags.
Sumoy said that current sequencing technology would enable more coverage than the team was able to obtain several years ago, when the study was conducted, so the technology would likely be able to detect more of the rarer transcripts with fewer false positives.
A functional analysis of all the data illustrated how combining data from multiple platforms could yield better results, as it enabled the researchers to identify novel genes in the EGF pathway. Metallothionein genes were found to be significant upon validation by RT-PCR, even though none of the individual platforms identified all of them on its own.
"No one [platform] has the full repertoire of that gene [family] represented," said Sumoy.
The finding could be an indication of a novel function of EGF to activate oxidative stress protection and metal ion homeostasis through up-regulation of metallothionein genes. The finding "highlights the risk of picking up results that are platform biased when relying on just a single platform," the authors wrote.
In addition, the researchers were able to identify pathways regulated by EGF. Most were related to cell growth and proliferation, cell death, and cell cycle, as expected. The top regulated disease-related pathways were mostly related to cancer.
Sumoy said that in the future he plans to use the combined approach to study small RNAs.
Merging the two techniques allowed for the validation of data and highlighted the strengths and weaknesses of each technology. Using both methods "may survey the transcriptome in a better way than each on its own, and therefore generate more reliable datasets and [uncover] additional new functions," the authors concluded.