Name: Shrikant Mane
Position: Director, Yale Center for Genome Analysis, and co-director, W.M. Keck Foundation Biotechnology Resource Laboratory, Yale University School of Medicine, since 2009
Experience and Education:
Deputy director, W.M. Keck Foundation Biotechnology Resource Laboratory, 2006-2009
Director, Yale Microarray Resource, since 2001
Staff scientist, microarray core, Moffitt Cancer Center, 1999-2001
Vascular biologist, W. L. Gore & Associates, 1997-1999
Senior scientist, Cellco, 1993-1997
Instructor, Johns Hopkins University School of Medicine, 1990-1993
Research associate, University of Maryland, 1988-1990
Postdoctoral fellow, Johns Hopkins University School of Medicine, 1985-1988
PhD in applied biology, University of Bombay, India, 1985
MS in human physiology and biochemistry, University of Bombay, India, 1979
BS in zoology, Shivaji University, India, 1976
Shrikant Mane has been directing the Yale Center for Genome Analysis since it opened its doors three years ago on Yale's West Campus, located about seven miles southwest of downtown New Haven. Yale bought the 136-acre campus site, which is not fully occupied yet, from Bayer Pharmaceuticals in 2007.
The YCGA, established with funding from Yale, currently produces about 10,000 gigabases of sequence data per month. The center, which has a staff of 25, started out with seven Illumina Genome Analyzers and a 454 GS FLX. It currently houses 10 Illumina HiSeqs, two of them upgraded to 2500s; one Illumina MiSeq; one Pacific Biosciences RS; and one Ion Torrent PGM that is located in a different lab. The center also runs both Affymetrix and Illumina microarray platforms and expects to receive four Ion Protons in the near future.
In addition, the YCGA has 2 petabytes of storage and a 1,000-core CPU cluster, both located in a building next door, which are supported by Yale Information Technology Services.
In Sequence visited the center last week and spoke with Mane about its sequencing activities. Below is an edited version of the conversation.
Who are your users, and does the facility run at capacity?
This facility is mainly built for Yale investigators, both from the medical school complex and from the Faculty of Arts and Sciences — anybody associated with Yale has access to this facility. In addition to that, when capacity is available, we do take samples from non-Yale investigators.
Our sequencers run all the time, 24/7; we even run during the Christmas recess. Our backlog varies, from 15 days to one or two months, but we always have a backlog.
What are the most popular sequencing applications?
Exome analysis is the most popular application — approximately 75 percent of our work. One of the reasons is that we are one of three Centers for Mendelian Genomics [funded by the National Institutes of Health to apply next-gen sequencing to discover genes and variants underlying Mendelian conditions].
About 10 percent of our work is RNA-seq or transcriptome analysis. We also have other applications, like ChIP-seq and methylation analysis.
We are also doing targeted sequencing. We have used it in some follow-up studies to help large projects, like cardiovascular disorder and hypertension studies. I think targeted sequencing is going to get bigger as the number of exomes per disorder increases. I think people are going to use a targeted sequencing approach for validation studies.
How much data analysis do you provide, and how much do you leave up to investigators?
Data analysis is sort of a bottleneck. Big labs have their own bioinformatics staff to do that analysis. We also have PhD-level bioinformatics staff who help investigators with the analysis. In addition to that, there are other resources, like the Keck Biotechnology Resource Lab's bioinformatics, biostatistics, and high performance computing section that also provides that kind of service. But we could use more, and we just recently hired another PhD-level staff member to help out investigators with the analysis. And the university is trying to hire a tenure-track type of investigator who can develop collaborations and new algorithms.
How many samples have you analyzed for the Center for Mendelian Genomics?
We did approximately 2,000 samples last year, which was our first year. I think we will be doing, depending on the cost, 2,000 to 3,000 samples a year. We may end up doing whole-genome sequencing for some of the disorders, but the majority of samples are being analyzed by exome sequencing.
All three centers work together. We are a national center, so we get samples from all over. Also, every center has its own samples, and its own disorders it is interested in studying, because when we submitted the grant, we had to have some collection of samples. At Yale, we are focusing on abnormal brain development, Gaucher disease, hypertension, some cardiovascular disorders, migraine, and kidney diseases.
Do you follow up negative results by whole-genome sequencing?
That's exactly what is planned, but we have only been operating in full swing for less than a year. We will evaluate things periodically and then determine which disorders, where we haven't found the gene yet, to further pursue by analyzing the whole genome.
Do you do any cancer sequencing?
We do significant cancer sequencing, and we also work with FFPE samples. Yale has a big collaboration with Gilead Sciences, and we have already sequenced more than 1,000 exomes for various tumors through the Gilead partnership. I think approximately 40 percent of our work over the last year has been related to cancer.
Do you report any exome results back to patients?
We are CLIA certified for exome sequencing. Dr. Allen Bale, who is a professor of genetics [and director of the DNA Diagnostic Lab] at the medical school, runs CLIA operations for Sanger sequencing and other platforms, so we collaborate with him. He receives all the samples and analyzes the data, and our center just does the exome analysis — capture and sequencing — and gives the data back to him. We have already been inspected by the state and we are awaiting inspection by the federal agency to get final approval.
Not all exome sequencing will be conducted under CLIA standards; the majority of the work is for research purposes.
Have you automated the exome capture and library production?
We use Caliper [PerkinElmer] robotics to do most of our library preps. We still carry out the hybridizations manually rather than letting a robot do that, because it suits our workflow. We also find that doing it manually gives better results.
What do you mostly use each type of sequencing platform for?
Most of the exome analysis is done using the Illumina HiSeq because the throughput is very high and the cost is very low. The error rates are also low, as compared to PacBio, for example.
We use NimbleGen for exome capture, and we initially worked very closely with NimbleGen to develop this product collaboratively. We have compared it with Agilent but did not see any reason to switch. We also tried Illumina's exome capture product but did not feel there was a need for us to change, either, because our analysis pipeline was set up well for NimbleGen.
MiSeq is used mostly for targeted sequencing. We also use it for R&D purposes, validation or QC.
On the Pacific Biosciences RS, we mostly do metagenomic work, as well as some fungal and bacterial genomes. We are also trying to do some methylation analysis on that platform, both in humans and microbes, because it directly reads the modified base. But the platform is not used very heavily at Yale.
How have the Illumina platforms performed in your hands?
We used to have many breakdowns, but the machines have become much more reliable now, and breakdowns have been reduced significantly. I think Illumina has also worked very hard to make sure the machines really run well. They keep some parts on site so we don't have to wait for parts to arrive. Overall, I think both the support from Illumina and the reliability of the parts have improved tremendously.
How long does it usually take for you to receive upgrades after Illumina announces them?
It's very quick. The relationship is so good that within a very reasonable amount of time, all these new protocols are available to us.
Are you interested in using the Moleculo technology that was recently acquired by Illumina?
We would be very interested. There are some applications, for example to study fusion genes, cancer, and de novo sequencing, where I think that technology will be very useful.
What are your expectations for the Ion Proton?
We are really looking forward to that and are very excited. I am really hoping that it will be able to do a genome for $1,000. Our cost for Illumina whole-genome sequencing is around $4,000 to $5,000, so if the Ion Proton can bring that cost down to even $2,000, I think that's a significant improvement and will have a significant advantage.
Right now, the HiSeq 2500 is definitely a growing platform in our hands, because it's working very well, it gives consistent data, it uses the chemistry we are used to, and we have other Illumina platforms. The Proton would be very, very attractive if it brings the cost down and the data quality is high. Then, I think, the short turnaround time would also be a real advantage.
How important is turnaround time to you?
Most of the time, there is a continuous flow of data to the investigators. For the majority of situations, it doesn't matter that much. Obviously, everybody needs their data yesterday. But for clinical types of applications, it would be a huge plus.
What are your wishes for improvements in your current sequencing platforms? What should the vendors devote special attention to?
With the 2500 upgrade, we now have the ability to complete exome analysis in a short time. If they can bring the reagent cost down, that would definitely be a great plus. Equipment-wise, the machines are reliable, but we still have some challenges.
With Pacific Biosciences, the biggest challenge from our end is the throughput. Because of the low throughput, the cost is very high. We can't do exome capture because it's not cost-effective, and also, in order to have low error rates, we need to have high coverage. Transcriptome analysis is an application we would be very interested in, but due to the high cost, we are not currently looking into it. I am concerned about how long we can keep this platform, because we don't have sufficient demand to purchase the service contract and keep it operational. Hopefully, we will get sufficient numbers of samples in the near future.
Looking ahead, what is your take on new sequencing platforms that are currently in the works, like the Oxford Nanopore?
Oxford Nanopore would be a platform to watch. Library prep costs will be relatively low, it would give long sequence reads, and the equipment cost at launch is going to be very low. These are the pluses. In my opinion, that will be the future of sequencing: very low equipment cost, no need for fluorescent tags, and a fast turnaround time. But it's not here yet, which is kind of surprising, considering they did show some data last year.
At what point would you bring in a new platform like this?
Maybe within a few months. Especially considering that hopefully, it won't be very expensive, so we would be interested in adopting it sooner rather than later.
Do you have any interest in platforms that complement sequencing, like OpGen, BioNanogenomics, or Nabsys?
Right now, based on the type of applications we are running, we are not interested. But as we gear more towards cancer applications, especially structural rearrangements, I think we would be interested.