NEW YORK (GenomeWeb) – Seeking to support a broader pool of users and applications, developers of the freely available Clinical Interpretations of Variants in Cancer (CIVIC) repository are working to expand its content and capabilities and partnering with developers of existing repositories to provide more integrated access to information on genomic alterations in cancer.
Developed by researchers at the McDonnell Genome Institute at Washington University School of Medicine in St. Louis (WUSTL), CIVIC, which was in beta until this week, provides a free forum for capturing and sharing written summaries about the clinical relevance of mutations that are found in sequenced tumor samples. The summaries, which are drawn from published scientific literature, include details about associations between molecular alterations, drugs, diseases, prognoses, and more. Recently, the developers published a paper on the bioRxiv site that describes the resource and highlights differences between CIVIC and some existing resources.
Last month, the project received roughly $900,000 in grant funding spread over three years from the National Cancer Institute's Informatics Technology for Cancer Research program to support the development of the CIVIC knowledgebase, web interface, and application programming interface. According to the abstract, the grant will, among other things, support efforts to implement well-designed standards and structured vocabularies for CIVIC data as well as to develop the interfaces to the repository.
Specifically, "we have funding to support the developers to keep improving the curation interface and the web interface," Obi Griffith, assistant professor of medicine, assistant director of the McDonnell Genome Institute, and one of CIVIC's developers, told GenomeWeb this week. Those funds will also support the WUSTL curators who actually read the scientific literature and write the summaries for the variants, he said. The funds will also support efforts to explore ways to use the information included in the knowledgebase to develop clinical testing applications such as targeted sequencing panels and assays as well as methods of incorporating CIVIC's information into clinical reports.
Both of those activities would require establishing community standards and best practices for curating clinically actionable variants in cancer, Malachi Griffith, assistant professor of genetics, assistant director of the McDonnell Genome Institute, and a CIVIC developer, said. There are quite a number of initiatives already underway including the ClinGen and ClinVar efforts "so we are trying to reach out to and interact with them as much as possible to figure out the landscape of standards," he said. "We definitely want to be in sync and not reproducing any effort that [others] are able to accomplish in this area."
One of the ways that they hope to encourage collaboration with other repository developers is through the Global Alliance for Genomics and Health (GA4GH), which provides a communal space to begin discussions and explore partnership opportunities. The CIVIC team is currently working with the GA4GH's genotype-to-phenotype working group to integrate with existing variant curation efforts and repositories such as the Precision Medicine Knowledgebase, which is developed and maintained by researchers at Weill Cornell Medical College. PMKB offers access to clinical-grade tumor mutations, annotations, and interpretations of variants identified in patient samples.
CIVIC's developers and others in the community are also proposing a pilot project that will focus on enabling data and knowledge sharing around cancer variants, the Griffiths told GenomeWeb. Specifically, participants involved in the project will work on harmonizing global efforts focused on clinical interpretation of cancer variants; establishing standards for describing genotype, phenotype, and evidence for clinically relevant cancer variants; and coordinating their curation efforts. The project will also focus on implementing infrastructure for querying variants across knowledgebases, a mechanism for running federated queries, and a web application for displaying clinically actionable recommendations and accepting user contributions.
In the year since the beta version of the database was released, CIVIC's developers have also made several updates to the system. Specifically, they have added a mechanism for notifying contributors to the site about edits made to their submitted summaries. "Before we had this problem where someone would submit a new evidence item that describes some relationship between a mutation and treatment, for example," Obi explained. "If we made a comment, [for instance] a question about the evidence that the user submitted, how would they know that something needed their attention unless they are coming back to the website and refreshing and checking?"
Now, when contributing users log into the CIVIC system, they get a list of notifications, in much the same way Facebook users are notified about messages or updates to their posts. The list includes personal messages from other users as well as comments or revisions made to previously submitted summaries. "That really makes the curation process more efficient," Obi said.
The developers have also formally designated a new sub-category of variant summary editors who are dubbed domain experts. These are contributing individuals who are identified as being experts in particular areas including specific cancer subtypes, genes, or pathways. "It's a new area so we are actively seeking engagement," Malachi said. "They will provide a hands-on review of things that are submitted just like standard editors do but also more high-level guidance on how to make sure CIVIC is moving in the right direction in terms of clinical relevance and that the systems and standards we set up are optimized to ensure quality and engagement with the community of cancer researchers and physicians." So far, the existing domain experts are all WashU researchers but "we are interested in engaging with other centers," he added.
Other updates to the repository include support for the Sequence Ontology, which adds to existing support for the Disease Ontology; improved browsing and searching functionality; and better documentation for the CIVIC API. The developers have also adopted a new licensing scheme for the CIVIC source code and curated content. Specifically, the source code and API are now available under the MIT open source license and all of the curated content is openly available under the Creative Commons Public Domain Dedication license.
Open access to the CIVIC content is one of the incentives that the developers hope will attract more users to the repository. "Many people operating in the space have had this experience where they contributed to a system as an academic project but then it was exclusively licensed to a company and then they had more limited access to it," Malachi said. "CIVIC has a very publicly stated open-access, open-source policy that basically means if you contribute your time and add your interpretation to CIVIC, you won't lose access to it later." Moreover, researchers seeking more visibility for their published research can contribute their findings as summaries to the repository or add information from their papers to existing summaries, he added. They are also exploring gamification approaches where community members can earn badges or points of some kind for contributing to the resource. "It is a challenge though," he noted.
Anyone can consume CIVIC's content without setting up an account. Users only have to create an account if they want to contribute information such as a new variant summary or a new relevant publication. This way, each user's contributions are trackable and CIVIC curators and editors can contact contributors if need be. Curation tasks that CIVIC users can become involved in include providing structured evidence statements, variant-level summaries and variant coordinates, and gene-level summaries. They can also make comments or suggest revisions to information submitted by other CIVIC contributors.
"The fundamental unit of information in CIVIC is an evidence item or record which is sort of a structured description of an assertion from a publication," Malachi explained. These assertions fall into one of three broad categories — predictive, prognostic, or diagnostic. A summary that falls into the predictive category might include information about the cancer subtype that the variant is associated with as well as whether it is predictive of response to a particular therapy. So the summary could, for example, describe the BRAF V600E mutation which predicts response to dabrafenib in melanoma patients, he said.
Those are the basic elements of an evidence record but curators can also add other bits of information extracted from scientific publications that support the summaries they provide. For example, they can add pre-clinical evidence such as results of mouse or cell line experiments as well as published case studies, Malachi said. Submissions can also include a rating of the quality of the information based on things like the size of patient cohort or whether or not the study used appropriate controls.
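To make the shape of an evidence record concrete, the elements described above can be sketched as a small data structure. This is only an illustration based on the description in this article, not CIVIC's actual data model or API: the class and field names, the 1-to-5 rating scale, and the placeholder citation are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EvidenceType(Enum):
    """The three broad assertion categories named in the article."""
    PREDICTIVE = "predictive"
    PROGNOSTIC = "prognostic"
    DIAGNOSTIC = "diagnostic"


@dataclass
class EvidenceRecord:
    """Hypothetical structured description of one assertion from a publication."""
    gene: str
    variant: str
    disease: str
    evidence_type: EvidenceType
    statement: str
    source: str                           # citation for the supporting publication
    drugs: List[str] = field(default_factory=list)
    preclinical: bool = False             # e.g. mouse or cell line experiments
    quality_rating: Optional[int] = None  # assumed 1-5 scale, not CIVIC's own

    def __post_init__(self) -> None:
        # A predictive assertion is about response to a therapy, so require one.
        if self.evidence_type is EvidenceType.PREDICTIVE and not self.drugs:
            raise ValueError("a predictive record should name at least one therapy")
        if self.quality_rating is not None and not 1 <= self.quality_rating <= 5:
            raise ValueError("quality_rating must be between 1 and 5")


# The example Malachi gives: BRAF V600E predicting response to dabrafenib.
record = EvidenceRecord(
    gene="BRAF",
    variant="V600E",
    disease="melanoma",
    evidence_type=EvidenceType.PREDICTIVE,
    statement="BRAF V600E predicts response to dabrafenib in melanoma patients.",
    source="placeholder-citation",        # deliberately not a real reference
    drugs=["dabrafenib"],
)
```

Keeping the category as an enumeration rather than free text reflects the article's point that records are structured, which is what lets them be searched and compared across variants.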
It takes roughly 30 to 45 minutes to create a high-quality evidence record, according to the developers. When contributors submit summaries, they are reviewed by editors who have the authority to approve them for inclusion in the database. These editors are responsible for evaluating the validity of the submissions as well as for making revisions, if necessary, and getting those approved, before including them in the CIVIC canon. Submissions may be reviewed and approved within a day but the process could take up to a week or more if the summary contains contentious evidence.
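The review step described above, in which a contributor submits, an editor optionally revises, and an editor approves, can be pictured as a small workflow. The sketch below is an assumption-laden illustration rather than CIVIC's implementation: the status names, the `EDITORS` set, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of accounts holding editor privileges.
EDITORS = {"editor_a"}


@dataclass
class Submission:
    summary: str
    contributor: str
    status: str = "submitted"                         # "submitted" or "accepted"
    history: List[str] = field(default_factory=list)  # earlier versions of the summary


def _require_editor(user: str) -> None:
    if user not in EDITORS:
        raise PermissionError(f"{user} is not an editor")


def revise(sub: Submission, editor: str, new_summary: str) -> None:
    """An editor rewrites a pending summary; the old text is kept in history."""
    _require_editor(editor)
    if sub.status != "submitted":
        raise ValueError("only pending submissions can be revised")
    sub.history.append(sub.summary)
    sub.summary = new_summary


def approve(sub: Submission, editor: str) -> None:
    """An editor accepts a pending submission for inclusion in the database."""
    _require_editor(editor)
    if sub.status != "submitted":
        raise ValueError("only pending submissions can be approved")
    sub.status = "accepted"


sub = Submission(summary="draft summary", contributor="user_1")
revise(sub, "editor_a", "revised summary")
approve(sub, "editor_a")
print(sub.status, sub.history)
```

Recording each earlier version in `history` mirrors the article's point that contributions stay trackable, so a contributor can see what an editor changed.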
"In some cases, people really identify areas where we actually need to refine the data model a little bit before we can properly accept the evidence record so those are the ones that tend to take a little bit longer," Malachi told GenomeWeb. "But it is getting smoother and smoother because we are covering all the major use cases. Cases where we haven't really encountered something at all [are] happening less and less."
According to the most recent numbers from the site, CIVIC contains a total of 1,472 curated clinical relevance interpretations on 585 variants in 245 genes. These interpretations are culled from 959 published studies. In total, it covers 149 cancer subtypes with some bias towards lung, breast, leukemia, colorectal, and skin cancer and treatments associated with these cancers. Current estimates, according to the Griffiths, show that 100 to 200 people visit the CIVIC site per day, including researchers from academic and government institutions, cancer centers, and companies. These numbers exclude users who access the resource via the CIVIC application programming interface. The list includes researchers involved in maintaining the UCSC Genome Browser. According to statistics reported in the paper, more than 10,000 users in total have accessed CIVIC interpretations.
Some of the NCI funds will be allocated to running hackathons and curation meetings focused on selecting and implementing appropriate standards as well as ways to collaborate with existing consortia and developers of complementary cancer variant resources, according to the grant abstract. The CIVIC developers are currently prepping for one such hackathon and variant curation meeting that will be held as part of the annual NGS in Molecular Pathology symposium hosted by the Netherlands Cancer Institute and Agilent's Cartagenia. The symposium starts on Nov. 30 and runs until Dec. 2. "[We've been] in discussion with [the conference organizers] about the kind of talks that we would give to bring them up to speed about what we've been doing here at the McDonnell Institute and also specifically with the CIVIC project in the area of variant curation," Malachi told GenomeWeb. "It became clear that this would be a good opportunity to engage in a very active way with some of the participants of their conference."
The developers have already begun reaching out to some members of the scientific community and asking them to participate in the variant curation and hackathon event; interested researchers can register now. Participants will have the option to contribute to CIVIC's development roadmap through specific coding activities, Malachi said. For example, a group that is interested in incorporating the CIVIC API into its generation system or a clinical workbench could work with the CIVIC developers to accomplish that task. CIVIC's developers also hope to collaborate with developers of other resources to try to integrate some existing resources into the database during the event. For example, he said, they hope to work with researchers from the group that makes MyVariantInfo, which provides web services for querying and retrieving variant annotation data, to integrate that resource with CIVIC.
For the data curation component of the planned event, the developers plan to hold discussions with participants about some of the issues surrounding the development of standards for variant curation as well as get attendees to try their hands at curating CIVIC data. This way, the developers can see what it is like for external curators to use the CIVIC interface and explore ways to make curation easier and clearer within the system. "[We want] to engage experts ... to basically make CIVIC the place that they store that information instead of their current system which is often something that's less flexible and scalable ... [such as] Excel spreadsheets or Google docs," Malachi said.