Name: Jane Carlton
Age: 46
Title: Director, Center for Genomics and Systems Biology, New York University
Experience and Education:
Faculty Director of Genomic Sequencing, Center for Genomics and Systems Biology, since 2011
Professor, Department of Biology, New York University, since 2011
Director of genomics, Genome Technology Center, New York University School of Medicine, 2009-2011
Associate professor, New York University School of Medicine, 2006-2011
Associate scientist, The Institute for Genomic Research (TIGR), 2001-2006
Visiting scientist, NCBI (GenBank), National Institutes of Health, 2000-2001
Assistant scientist, University of Florida, Gainesville, 1999-2000
Postdoctoral associate, University of Florida, Gainesville, 1997-1999
Postdoctoral fellow, University of Edinburgh, Scotland, 1995-1997
PhD in parasite genetics, University of Edinburgh, Scotland, 1995
BSc in genetics, University of Edinburgh, 1990
Jane Carlton has been heading genomic sequencing at New York University's Center for Genomics and Systems Biology since 2011, joining the center from NYU School of Medicine's Genome Technology Center, where she established next-generation sequencing.
Earlier this week, she also took on the role of director of the center, which moved into its current location, on NYU's campus near Washington Square Park in downtown Manhattan, about two and a half years ago.
In Sequence visited Carlton last week to find out how the center's GenCore sequencing facility is organized and what role sequencing plays in the research of its faculty. Below is an edited version of the conversation.
What areas does the Center for Genomics and Systems Biology focus on?
It's really taking a genomics and systems biology approach to studying the biology of a whole range of different organisms. We actually hit almost every single branch of the Tree of Life, from protists to bacteria to plants to animals, and hopefully to viruses with a new recruit in that area. We work a lot on model organisms here and look at gene interactions, protein interactions, networks, and how those all come together to form a composite understanding of the biology of these organisms.
There are 14 faculty, one more is coming on Tuesday, and we're in the process of recruiting one or two more, so we're really expanding quite rapidly.
So you don't study humans?
Actually, that's one thing we don't really do. We study microorganisms in humans, and developmental genetics in model organisms, which can then be extrapolated to what might happen in humans. We have one faculty member who does a little bit of human cell line analysis, and another who does some evolutionary biology of humans, but not a tremendous amount.
How important is sequencing for the research conducted here, and how are you equipped with instrumentation?
It's a real cornerstone. Next-generation sequencing underpins an awful lot of what the faculty do here. We have an Illumina HiSeq that we just upgraded to a 2500, we have two Ion Torrent PGMs (one here in New York and one in India at a collaborating institution), and we are currently in the process of getting a MiSeq as well.
In addition, we collaborate a lot with the other genome sequencing cores in New York, particularly with the Genome Technology Center at the NYU School of Medicine, but also with Memorial Sloan-Kettering, and then of course there is the New York Genome Center as well.
We do have to outsource sequencing to some of these other places, but I'm not entirely sure if we're going to buy more machines after the MiSeq, simply because there is so much capacity here in the City.
I should also mention that NYU has a campus in Abu Dhabi, where we have a sister institute, which also has a core facility that is run by Kourosh Salehi-Ashtiani. For example, they have a big project to sequence 100 varieties of date palm. They have a HiSeq 2500, a MiSeq, an Ion Torrent Proton, and a PGM.
How do you run the sequencing core facility?
We have a manager, Paul Scheid, and we have a technician, and I'm hiring another technician. We also have a bioinformatics lead, a programmer, and a high-performance computing specialist. I'm helped more on the bioinformatics and the computational side by two faculty directors of bioinformatics, Kris Gunsalus and Rich Bonneau. So altogether, we form this unit.
Whenever a faculty member wants to do some genome sequencing, they usually contact the core manager, and he'll set up a consultation to go through the whole experimental plan for that project, come up with a budget, and understand what kind of downstream processing of the data is required. Once that's done, a plan is made, and they move forward. Sometimes, we have faculty who want to run several [HiSeq] lanes, and other faculty will ask to do the same thing, so we mix and match them. Because we now have the HiSeq 2500, we can also do the rapid-run mode, where you run just a couple of lanes as opposed to full flow cells. And then several people are starting to use the PGM now, and we have the MiSeq coming online as well.
What are the applications most users tend to run?
We do quite a lot of RNA-seq; we have done that a lot for some C. elegans projects. My own lab does de novo sequencing of malaria genomes and other parasites, and then we do a lot of ChIP-seq as well.
We also just had a really big project for resequencing several genomes that was running tens of flow cells.
How is the core funded?
It's supported by investigator grants, NYU funds, and private donors. We charge our faculty the cost of the reagents, and everything else is paid for. They do the library construction but we very much advise them, and we can end up making some of the libraries. We may move to a model where we make the libraries.
How are you equipped on the computational side, and how do you analyze and store the data?
We have a custom-built LIMS that was developed by two of the computational people on the GenCore team. It captures a lot of the metadata associated with the sequencing libraries, all the way through to what flow cell they are run on, the date, the type of sequencing, and then QA/QC data at the other end.
Then we run the sequence data through a pipeline, developed in-house again, to do things like de-multiplexing, for example, and then we provide the processed reads as a weblink directly to the faculty member. That's really where it finishes. Any further downstream analysis they want has to be discussed with us at the consultation stage, and then we'll move forward with that if necessary. But the individual PIs usually have the capacity to do the analysis themselves.
We have several clusters: a couple of in-house clusters that the faculty directors of bioinformatics have in their labs, but other people can use as well, and we are part of the NYU high performance computing facility, where we have something like 100 terabytes of storage. That's where all of our Illumina runs go over for storage and processing.
On the clusters, we have a lot of the alignment programs. One set of software that we have is from CLC Bio, which my lab uses, so other people can have access to that as well.
What do you use each sequencing platform for?
It's microbial sequencing, and teaching, on the Ion Torrent; microbiome sequencing for the MiSeq that we'll be getting; and for de novo sequencing, RNA-seq, and ChIP-seq for the metazoan genomes that we work on, it's the Illumina HiSeq 2500. We've also started doing some metagenomic sequencing using the rapid run mode of the HiSeq.
What are your wishes for improvements in the technology? What should the vendors pay special attention to?
I still find there is a significant gap between what the salesperson says a next-generation sequencer can do and what happens when you actually get the machine, when there is an issue with every single run. I find there is a real kind of gap that way, and I would prefer the vendors to be a bit more honest.
What kinds of issues have you encountered?
For example, read two will fail on the Illumina, and we're not sure why that is. Or we had a problem with the temperature in the room for the PGM, because the optimal temperature for the Illumina is slightly different from that for the PGM, and we have both of them in the same room. The thing is, when we are having challenges with the PGM here in the US, where things are pretty perfect, and then I'm trying to do this in India, where things are far from perfect, that's when I get a little concerned.
But I think the technology is just improving so fast, and things are going so rapidly and getting so much better. I remember the very first Illumina machine we had, the GAII — we had such problems with that machine. That was really difficult. The data that's produced now, there are huge amounts of it, and the quality is so much better, and the machines are operating so much better, too. There have really been significant improvements. I can only see it get better.
The PGM is also getting better; they just brought out their 400 base pair technology with their 318 kit. We have been beta-testing that, which has gone quite well. Our mean is lower than 400 bases, something like 280 bases. But the curve goes out, so we are reaching 400 bases, but that's not the average.
What are you looking forward to in improvements?
It's read length. Size matters, it really does. I do a lot of de novo assembly [of parasite genomes], and if you have short reads, and really repetitive reads as well, you just can't assemble those genomes. So we're looking forward to Oxford Nanopore's technology. Every year, Paul goes to AGBT, and he went this year, and we were both really disappointed. There did not seem to be any really big thing that's coming through onto the market. And Oxford Nanopore, the year before, had generated a big buzz and then didn't seem to be doing anything.
We also work with other companies, like Kapa Biosystems. They generate great reagents, for example enzymes that can be used in library development. We have a nice relationship with them where we do some testing for their new products and send them the data.
How do you see your center's role in relation to the New York Genome Center?
We've had a lot of interactions with the New York Genome Center. We hosted them to give an afternoon symposium here last year, and several of my team are on the working groups of the New York Genome Center, so we have a really good relationship with them. Because Bob Darnell is new as the director, we want to start that dialogue again, because I have a feeling that their development and planning is shifting a little, so it will be good to see how we can interact going forward. Certainly, as one of the founding members of the New York Genome Center, we can take advantage of several of the initiatives that they have started, such as discounts on Illumina pricing, which is very nice. And we go to their symposia and things like that.
Our initiative at NYU, which is almost 10 years old now, is very much to support the faculty at NYU in their genomics and systems biology endeavors, whether that's in other departments, such as chemistry, or at the Courant Institute of Mathematical Sciences, or physics, or perhaps neuroscience. Whereas the New York Genome Center is pretty agnostic as to who they provide sequencing for.
Another major difference is that we use the sequencing core here to teach undergrads and graduate students. For example, the Ion Torrent machine that I have is just a fantastic benchtop sequencer that we now use in four different graduate and undergraduate courses, where the students come and they see the machine, they produce the DNA sometimes from swabs of their mouths, libraries are made, and then they see the sequence results and they use some basic bioinformatics to do some analysis.
We really are this kind of boutique facility. This is another difference between us and the New York Genome Center. I used to work at TIGR [The Institute for Genomic Research], and there we had a great sequencing center, but it was this big behemoth. It had pipelines, and it was a bit like a sausage factory: As long as you put in the right sausage meat, you get sausages out at the end, but you put something else in, or you want to tweak things a little bit, and it can't do that. We have inherent flexibility here, being a small institute where we can try out new protocols or do things slightly differently. We also have a number of different platforms; we are not this big behemoth-type structure. There are definitely advantages to having different platforms, because there isn't really one catch-all-type next-gen platform at the moment.
Can you mention some sequencing projects you're working on?
Just recently, I was awarded a 'Grand Challenges' grant. These grants were spearheaded by the White House Office of Science and Technology Policy for scientists to come up with big picture kind of projects that would really grab the public's attention. Mine was one of two projects that were chosen for two years of pilot funding. Our project is basically to sequence the metagenome of New York City. One project we have in mind is sequencing the microbial populations found on paper currency, money. So we've taken some one-dollar bills and used the Illumina to do some metagenomic sequencing and see what we can find. The other project, which has just started to gear up, is sequencing sewage from different places throughout the five boroughs, to see if we can identify particular bacteriophages that might be involved in the spread of antibiotic resistance, or if we can monitor before and after storms, for example superstorm Sandy, how that changed the ecology of microbial populations in New York City. And also, perhaps trying to see if we can track things like the cockroach population in New York City, or rats or things like that.
What do you use the PGM in India for?
It's for my big Center for the Study of Complex Malaria in India grant, which is an international center of excellence in malaria, one of 10 centers funded by NIH for seven years that are working on malaria across the world. Mine is based in Delhi, at the National Institute of Malaria Research, but it has field units throughout India. A big component of that grant is to use genomics to sequence one particular species of malaria parasite, Plasmodium vivax, which you can't culture. I can't grow it here, so I have to go where it infects people, and one of the biggest places for that is India. We can't take samples out of India, due to a lot of restrictions. Also, we really want to build capacity in scientific research in India with the Indian scientists. So part of the deal for this big NIH grant was to set up a small sequencing facility in India, where we would train the personnel to use the Ion Torrent to do the sequencing.
Why did you choose the Ion Torrent for that?
It took me about two years to get my head around this, because the setting that we are in is a resource-limited setting. It doesn't have field engineers or people like that on call 24 hours a day, like we do here. This was very much in the early days of the Illumina HiSeq, and I really thought that was too much of a precious thoroughbred-type [machine] that would have too many issues to use. And then the Ion Torrent PGM came online, where the data is actually smaller, and that's easier to handle. It's ideal for microbial genomes, and the read length is only getting longer. Also, the Ion Torrent server is kind of in-a-box software with built-in pipelines, so we would not have to build pipelines ourselves, for example, and it would be much easier for the scientists there to use. [Our two PGMs] can talk to each other, and we can see from here what's going on in India. We developed the SOPs and the protocols here that can now be used in India.
Do you have any advice for someone who is setting up a sequencing facility? What should they pay special attention to?
It involves so many different components. You have to put all these different pieces of the puzzle together, and I think people tend to think that that's a really simple and easy thing to do, and it's not. Because if your computational side is not working, then everything fails, or when your LIMS is not working, then it's a total failure. Everything has to work together. And I think having a strong and very cooperative team that can work together is also very important.
Where do you see sequencing technology going longer term?
What I'm seeing is the Oxford Nanopore, where you just have your sequencer that you plug into a USB port in your computer. I love that, that's great. Anything to do with handheld devices. Especially for us working in New York City, to be able to go out and take a sample of sewage and be able to process and sequence that on the spot, that I would love to be able to do. At the moment, we are collecting samples in the field and bringing them back. But if we could do it in real time, perhaps you could even have monitoring stations that would sequence there and then, and send the data back to us. Especially for bioterrorism threats, that could be a really cool and useful thing.