Name: Andrew Cossins
Title: Director, Institute of Integrative Biology, University of Liverpool, UK; Co-director, Centre for Genome Research, University of Liverpool, UK
Professional background: 2010-present, director, Institute of Integrative Biology, University of Liverpool, UK; 2008-present, co-director, Centre for Genome Research, University of Liverpool; 1977-present, lecturer, University of Liverpool
Education: 1974, PhD, physiology, University of Durham, UK; 1971, BSc, zoology, University of Durham
A zoologist by training, Andrew Cossins adopted microarray technology early to study environmental stress in non-model organisms. For years he has run the University of Liverpool's Institute of Integrative Biology and Centre for Genome Research, where efforts have grown to encompass next-generation DNA sequencing.
Projects overseen by Cossins include expression profiling in so-called exotic or non-model organisms with the intent to better understand their responses to cold, hypoxia, hydrostatic pressure, and ecotoxicological exposure.
While his early microarray efforts involved overcoming the problems of working with these organisms, including carp and a species of sea snail called dogwhelk, more recently he has adopted sequencing to speed up progress in working with them.
Cossins spoke about studying exotics using arrays and sequencing at Select Biosciences' European Lab Automation conference, held in Hamburg this month.
Afterwards, he spoke with BioArray News. Below is an edited transcript of that interview.
How did you get involved in using microarray technology?
I work on stress and coping with stress, mainly with respect to animals living in challenging climatic or environmental circumstances, such as seasonal cold, or daily hypoxia, or toxicological challenge in polluted waterways, anywhere that the circumstances change. So I am interested in how animals cope via altered metabolism, altered membranes, or cellular response systems, and in how these responses are regulated. We got involved in microarrays back in 2000, when they were first deployed internationally, and with Andy Gracey [now at the University of Southern California] set up a lab to fabricate microarrays from a library of cDNA amplicon probes.
Some researchers are still spotting custom arrays. Others have moved to having them manufactured by Agilent and others.
Exactly, and, to be honest, it's a no-brainer in terms of cost and in terms of quality. The transition was made possible not only by Agilent providing a very flexible and high-quality array fabrication platform, but also by the ability now to sequence a transcriptome very inexpensively and to define oligoprobes. If you go back to when the first microarray papers were published in 1998, those scientists made their own robots and printed amplicons of thousands of cloned DNAs onto glass slides. DNA fragments from the target species were inserted into a suitable vector and plated on agar to grow into individual colonies in a collection of 96- or 384-well plates. Each well contained a different bacterial clone, which on PCR amplification provided a gene probe that you could print as a separate spot on the array. The excitement was that one could explore changes in expression of 10,000 or 20,000 transcripts simultaneously across a number of microarrays. The technical issues of fabrication were solved early, but there were a lot of data processing and statistical challenges that took several years to satisfactorily resolve.
I think people have forgotten the technical challenges in these early studies. For example, most people didn't have the experience of robotic printing or of laboriously making a collection of cDNA clones — that just took forever. Our first project was a £1 million ($1.6 million) effort that yielded an array of 30,000 gene probes, which was used in dozens of experiments over the next five years. It wasn't until 2005 that we started to look at the Agilent platform as the way to move.
So we moved onto the Agilent platform and simultaneously started to get much more sequence data back. Once we got a compendium of sequence data, we just put our sequence into the eArray platform and it predicts probes that can be synthesized on the chip. It made easy what was a tedious, three-year project.
The organism you originally worked with was the common carp, correct?
It was originally carp, but we have done it for a whole range of non-model species, such as ground squirrel, roach [a freshwater fish], and the intertidal dogwhelk. We did the dogwhelk project last year for colleagues. We generated a transcriptome by sequencing cDNA on a Roche 454 sequencer and then predicted oligoprobes for the assembled transcripts within the Agilent eArray platform. Agilent then provided a set of arrays, each containing 180,000 gene probes. We ran an experiment in which animals were exposed to an environmental toxicant, and the outcome was a big success. The whole project was done within three months for just $35,000, which was about one-thirtieth of the cost and the time of what we did 10 years previously. So as you can see, the consumable cost is now comparatively low.
And this was all accomplished in the Centre for Genome Research at Liverpool.
Yes. We are a medium-scale service provider that develops very close relationships with collaborators and clients in environmental and evolutionary biology as well as in the biomedical and clinical fields. We were known as the Liverpool Microarray Facility until 2008. Then, when Neil Hall joined us from the Institute for Genomic Research in the US, we broadened out into next-gen DNA sequencing and were able to hire a large number of technical specialists in the wet lab and for informatics. So now we support a whole range of genomic applications, from de novo sequencing to resequencing to expression profiling to metagenomics, as well as array-based activities for expression profiling and CGH. We interact with dozens of academic labs in the UK and elsewhere, and we have become the focus of a lot of activity on this campus.
What are you doing at the moment with regards to your main area of interest?
One project now in its final year is the development of a microarray-based approach to the regulatory ecotoxicological assessment of materials produced in industrial quantities. The new European-wide REACH [Registration, Evaluation, Authorization and Restriction of Chemical substances] directive mandates a whole series of tests to quantify the toxicological properties of any material produced in industrial quantities, meaning one tonne [around 2,200 pounds] upwards. Current procedures are rudimentary, based on whether the animal is dead or not, and there is a strong ethical demand for alternative methods that avoid pain and suffering. Based on about 40,000 gene-specific probes, our array-based approach can provide a huge amount of highly specific information on the cellular and molecular responses of our test system, zebrafish embryos. A database of responses to known toxicants can be used to predict the potential toxicity of an unknown compound or formulation, and suggest its potential mode of action.
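The prediction step Cossins describes — matching an unknown compound's expression response against a database of responses to known toxicants — can be sketched as a simple nearest-profile lookup. The toxicant names, gene counts, and expression values below are invented for illustration; the actual assay uses some 40,000 probes and more sophisticated statistics.

```python
import math

# Hypothetical database: log-ratio expression responses (four genes)
# of zebrafish embryos to known toxicants. Toy numbers, not real data.
known = {
    "cadmium":  [2.1, -0.5, 1.8, 0.1],
    "atrazine": [-1.2, 0.9, 0.3, 2.0],
    "phenol":   [0.4, 1.7, -0.8, -1.1],
}

def cosine(u, v):
    """Cosine similarity between two expression profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def predict_mode_of_action(profile, database):
    """Return the known toxicant whose response profile is most similar."""
    return max(database, key=lambda name: cosine(profile, database[name]))

# Profile measured for a new, unclassified compound (invented numbers).
unknown = [1.9, -0.3, 1.5, 0.2]
print(predict_mode_of_action(unknown, known))  # most similar to cadmium here
```

The design choice here — similarity to a reference compendium rather than a fitted classifier — mirrors the idea in the interview that the database itself suggests a potential mode of action.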
This project was funded by the National Centre for the Replacement, Refinement and Reduction of Animals in Research. This UK funding agency seeks to reduce the pain and suffering in animals used in experimental and regulatory protocols. We estimate that transferring tests [that are currently run] on adult fish onto the embryo platform will save the lives of many hundreds of thousands of fish every year, while giving much more definitive information on responses and hazards.
Coming back to newer technologies; there was some discussion in your talk about next-generation sequencing. RNA-seq is often being compared with array-based expression profiling. What do you think about this issue?
After about 10 or 11 years of intense development, and having gone through the transformation onto Agilent and Affymetrix, arrays have a very mature, well established capability. The machines and protocols work well and the statistical properties and methodologies are worked out, so you can get a very good result using microarrays. Prices have also decreased progressively and therefore people aren't as frightened by large-scale, high-content assay methods as they were in the early days. One of our larger experiments involved almost 1,000 arrays.
But next-generation DNA sequencing has now appeared on the scene. It's been four years since we got our first next-gen machine, and it is now becoming very widely deployed throughout academia and industry. People have expectations of using 'digital' approaches to expression profiling by random sampling of transcript fragments using the new generation of sequencers. This is based on determining the abundance distribution of transcripts from random DNA fragments, in much the same way that the fluorescence intensity of an analog spot would give you a measure of abundance.
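The 'digital' counting idea can be sketched in a few lines (a toy model with invented transcript names, not an actual RNA-seq pipeline): sequencing is treated as drawing random fragments from the transcript pool, and the tally of reads per transcript is itself the abundance estimate.

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the example

# Hypothetical transcript pool: relative copy numbers of three transcripts.
pool = ["actb"] * 800 + ["hsp70"] * 150 + ["cirbp"] * 50

# 'Sequencing' = randomly sampling fragments from the pool.
reads = [random.choice(pool) for _ in range(10_000)]

# The digital expression profile is just the read count per transcript.
counts = Counter(reads)
for transcript, n in counts.most_common():
    print(transcript, n, f"{n / len(reads):.1%}")
```

With enough reads the observed fractions converge on the true pool proportions (80/15/5 percent here), which is the sense in which counting replaces fluorescence intensity.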
RNA-seq is widely believed to have technical advantages over arrays. First, you can discover gene models on the fly by sequencing, rather than having to rely on a predetermined gene model [that] is represented on the array. Second, you get a 'digital' representation of abundance, which is seen to be less noisy than an analog, intensity-based measure of fluorescence. Third, you can be much more absolute about the identity of your digital entity, rather than worrying about cross-hybridization on the microarray between things that are dissimilar. The final advantage is the ability to determine splice forms of transcripts, which you can't really do by microarray. If you have a sequence read that crosses an exon boundary, you can actually determine which exon is linked to which. If you have sufficient data for a given gene, you can begin to make assertions about the profile of all the variant transcript forms expressed in the system. You can't do any of that with a microarray.
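The splice-junction point can be made concrete with a toy sketch (the exon sequences are invented and far shorter than real ones): a read that straddles an exon-exon boundary contains the end of one exon joined directly to the start of another, which is what tells you which exon is linked to which.

```python
# Hypothetical exon sequences for one gene (toy data, not a real aligner).
exons = {"exon1": "ATGGCC", "exon2": "TTTAAA", "exon3": "GGGCCC"}

def junctions_spanned(read, exons, k=3):
    """Return exon pairs whose junction sequence (last k bases of one exon
    followed by the first k bases of another) appears within the read."""
    hits = []
    for a, seq_a in exons.items():
        for b, seq_b in exons.items():
            if a == b:
                continue
            junction = seq_a[-k:] + seq_b[:k]
            if junction in read:
                hits.append((a, b))
    return hits

# A read spanning the exon1->exon3 boundary implies exon2 was spliced out.
print(junctions_spanned("GCCGGG", exons))  # [('exon1', 'exon3')]
```

Real junction callers align reads against a genome and tolerate mismatches, but the principle is the same: the read itself is evidence of which exons are joined in the expressed transcript.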
So I think a lot of people are saying, 'Oh, let's adopt the digital platform' because it is sexy, whereas microarray is passé. But there are some serious statistical issues concerning random sampling of a highly skewed transcriptome population. [A recent] paper [published in Bioinformatics] by David Kreil [chair of bioinformatics at the University of Natural Resources and Life Sciences in Vienna] is very important in identifying the problem, where 7 percent of the most abundant transcripts account for 75 percent of all sequence reads. That means that a lot of your analytical power is deployed in the assessment of relatively few transcripts, which often are not very interesting ones.
The ones you are most interested in have low abundance and they are very rarely represented in the sequence reads. What Kreil et al. show is that the precision of measurements for the majority of transcripts is poor, especially at the level of sequence analysis that is generally employed, which is 40 million reads per sample. David Kreil used 330 million reads per sample and he still struggled to get a comprehensive view of the transcriptome. So I think there are some really big issues that need addressing, and I think, having identified the issues, we can start to think about ways around them. And one of the ways is to analytically remove the most abundant transcripts, thereby releasing that analytical power onto the less abundant transcripts.
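The precision problem Cossins attributes to Kreil can be illustrated with a toy calculation (the proportions and read depth below are assumed for illustration, not the paper's data): under roughly Poisson sampling, the relative error of a count of n reads is about 1/sqrt(n), so precision collapses exactly where the rare, interesting transcripts live.

```python
import math

total_reads = 40_000  # scaled-down stand-in for reads per sample

# Toy skewed transcriptome: one dominant transcript plus 100 equally
# rare ones sharing the remaining 25 percent (assumed proportions).
abundance = {"dominant": 0.75, **{f"rare_{i}": 0.25 / 100 for i in range(100)}}

# Expected read count and Poisson coefficient of variation per transcript:
# the dominant transcript is measured to well under 1 percent relative
# error, while each rare transcript sits at roughly 10 percent.
for name in ("dominant", "rare_0"):
    expected = total_reads * abundance[name]
    cv = 1 / math.sqrt(expected)  # relative sampling error
    print(f"{name}: ~{expected:,.0f} reads, CV ~ {cv:.1%}")
```

This is also why the depletion idea mentioned above helps: removing the dominant transcripts reallocates those reads to the rare ones, shrinking their 1/sqrt(n) error.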
So are you advising scientists against using that approach because of these issues?
No. It's rather like a series of technological tsunami waves arriving in the lab, each wave a new technology. cDNA arrays came in as the first wave, then Agilent, and now next-gen sequencing, and it takes us several years to accommodate the core issues surrounding each of the different platforms.
I am confident that we will get around these problems and use the full power of RNA-seq. I am not saying, 'Don't use RNA-seq'; I am saying that it comes with some health warnings. It might take a year or two to figure out how to get around these issues, and to validate those new methods.
But there's a cost issue. If you went out and did a 40 million read experiment, you'd have a cost that's probably not dissimilar to running a microarray experiment, but the level of precision is poorer than what microarrays can provide. And if you do what David Kreil did, and sequence 10 times as many reads, then it is 10 times more costly. Give it a couple of years, and I think sequencing will have moved on, will be much cheaper, and will be more precise and cost effective.
I think people now are coming to us saying that they want to do next-gen because of its reputed advantages. And we have projects going on now in Drosophila and zebrafish and other organisms. But we've effectively stopped in-house fabrication of arrays; everything we do now is through Agilent and Roche NimbleGen. Microarrays are here and now, and you know what you get, and it's pretty good what you get, while RNA-seq still has a little way to go.