NEW YORK (GenomeWeb) – Researchers at the European Molecular Biology Laboratory's European Bioinformatics Institute in the UK are developing a new database to collect, curate, and convey data generated by CRISPR/Cas9 experiments.
The team aims to release the new resource, dubbed the Genome Editing Catalogue, sometime next year, with the hope that it will become as widely used as other institute databases.
"With all the data coming out, it became clear to us that it was difficult to search through it and that you have to spend a lot of time getting into the nitty gritty," said Sybilla Corbett, a variation annotation curator at EBI in Hinxton.
"The EBI has been working on these kinds of grand projects for a while," she said, noting the success of other EBI databases, such as Ensembl and the GWAS Catalog. "Providing big-scale data in a way that makes it accessible is key to what EBI does," noted Corbett. "When we realized that CRISPR was going to produce this amount of data, it became clear that it would be a good fit for EBI to step in and curate it."
Corbett joined EBI last December from the University of Leeds to work on the CRISPR/Cas9 data archive and is leading the effort. Other members of her team include Daniel Zerbino, who heads EBI's genome analysis team; Myrto Kostadima, the institute's Ensembl regulation project leader; Fiona Cunningham, EBI's variation annotation team leader; and senior scientist Paul Flicek, who oversees Ensembl.
While EBI has the resources and experience to make the Genome Editing Catalogue available, there are other databases that serve researchers undertaking CRISPR/Cas9 experiments. DKFZ, the German Cancer Research Center in Heidelberg, hosts the GenomeCRISPR database, for example, which to date has compiled data on 700,000 single-guide RNAs (sgRNAs) used in approximately 110 CRISPR/Cas9 experiments performed in 63 different human cell lines.
While Corbett praised GenomeCRISPR as a "great website," she said that given its focus on genome-wide screens in humans, its scope is less broad than that of the resource EBI aims to create. Instead, Corbett said, her team has been inspired by EBI's GWAS Catalog, which currently contains data from 3,079 publications, including 41,893 unique SNP-trait associations.
"There is a big focus here on curating data, from all kinds of experiments, not just from genome-wide screens, so that it is more discoverable for the research community, but also comparable in a way," said Kostadima, who is supervising the effort to develop the Genome Editing Catalogue at EBI. "It provides a good way for people to understand what has been done so far."
"One of the benefits of being at the EBI is that we can reuse the knowledge and know-how across the institute," noted Zerbino. "In many ways, we aspire to be the GWAS Catalog for genome editing."
Although the researchers described their project as the "EMBL-EBI CRISPR Archive" on a poster at Cold Spring Harbor Laboratory's Genome Engineering: the CRISPR/Cas Revolution meeting, held at CSHL in July, the team has since opted to call the database the Genome Editing Catalogue, anticipating a time when the terminology may change. "Imagine a few years go by, and it's not called CRISPR anymore," said Kostadima.
In the poster presentation, the team maintained that it expects CRISPR/Cas9 experiments will have an impact on the biomedical sciences "similar to that of PCR and high-throughput sequencing."
While the poster mentioned a prototype website, Corbett said that the database remains in the development stage, as the team consults with scientists to "talk about the kinds of data they would be interested in viewing, and how we can present that data in the best possible way."
Kostadima reiterated that the team is very much interested in engaging future users about what they need most from the new database. "Feedback from the community not only helps us to decide on what data to import into the database, but also the design, so that all of this information can be easily retrievable by users," she said.
In terms of data collection, EBI will manually curate data entered into the Genome Editing Catalogue, Corbett said, though it will also set up a process for users to submit their own data, similar to the GWAS Catalog's. On the poster, the team noted that all submitted datasets would be required to provide a common set of parameters, allowing for cross-comparison regardless of their origin. By including data from genome-wide pooled screens, as well as single-gene experiments, the team said it hoped to provide an "integrated resource to ... facilitate new discoveries."
Once it becomes available sometime in 2018, the catalog will primarily be a resource for people working in the lab with CRISPR/Cas9. "Imagine the next stage of their research is to create a mouse that has a knock-out of this gene, so that they can test for different compounds and how the mouse will react to them," Corbett said as an example. "So they look up their gene of interest in the Genome Editing Catalogue, and find out there are three other people who have done the same experiment that they want to do."
By using the new resource, scientists could therefore look up the sequences that other researchers used, as well as their laboratory protocols. Those details should prove useful given the amount of ongoing experimentation and modification related to the technique, "from gene editing through non-homologous end joining and homology-directed repair, to transcriptional activation and repression, as well as imaging experiments," as the team noted on the poster.
In this way, the EBI developers believe the new catalog will help users find appropriate references in the literature.
"It's our wish that it becomes something that is so useful that it becomes a standard tool that people use," said Corbett. "We'll have to see, and we'll take into account feedback we get, and will try to make it the best we can that way."
Erik Sontheimer, a professor at the University of Massachusetts Medical School's RNA Therapeutics Institute, said that while he was not familiar with the planned database, it could be "very useful" to the community.
"For example, if someone wants to use CRISPR to inactivate a given human gene, there are many possible sgRNAs that could target that gene, and settling on the best one still involves trial and error – some guides will prove to be inefficient, others will have too many potential off-target sites, et cetera," Sontheimer said. "Then, when we consider different delivery modes, cell lines, organisms, Cas9 orthologs, knockout phenotypes, repair templates for precise gene edits, and so on, the utility of a centralized information resource is clear."
One immediate benefit of having a genome editing resource, Sontheimer noted, would be to minimize duplicated effort.
"If someone has already tested, validated, and optimized sgRNAs for a given human gene, then others should be able to find that information and use it for their own knockouts, without having to re-screen an entire panel of potential guides," he said.
Additionally, if useful CRISPR-edited cell lines, or model organism strains, have already been generated by one group, it "would be great for other groups to be able to identify and hopefully obtain them for downstream studies, without having to re-make the edits themselves," he said.
Michael Boutros, a professor of signaling and functional genomics at DKFZ who administers GenomeCRISPR, similarly welcomed the establishment of the Genome Editing Catalogue by EBI.
"I think it's good if EBI gets into that because there are so many resources for sequencing, but far fewer for functional assays such as CRISPR, so it's important to have many databases for that," Boutros said. He noted that GenomeCRISPR is focused on large-scale genetic screens rather than on different organisms or types of experiments. As such, he sees EBI's database and GenomeCRISPR as complementary. "What EBI is doing is really worthwhile," Boutros said. "We are happy to collaborate on it."