NEW YORK (GenomeWeb) – A team of Brazilian researchers has developed an open-source software tool to enable researchers and clinicians to home in on disease-causing variants for Mendelian disorders in genomic data.
Their tool, which they've dubbed Mendel,MD, analyzes and annotates exome or genome data, compares it against a number of databases, and filters it to identify a list of candidate disease-causing mutations. As Raony Cardenas, a graduate student at the Federal University of Minas Gerais, and his colleagues described in PLOS Computational Biology, they validated their tool using cohorts both in Brazil and Ireland.
Cardenas said that when they began to build Mendel,MD, there were no available open-source tools to annotate exome or genome data quickly to generate a list of candidate disease variants.
"That's the main reason I built this, because we couldn't find any other open-source tools to do that, only commercial tools," he added. The source code for the new tool is available on GitHub.
Mendel,MD can either be downloaded by the user or run through a web-based interface. Users upload their VCF files, which then undergo validation checks before being sent to tools that the developers integrated into the database to annotate the datasets.
The file is sent in parallel to tools like SnpEff, SnpSift, and Variant Effect Predictor for annotation. At the same time, the VCF file is annotated using two Python scripts the developers wrote. One uses VCF files from the 1000 Genomes Project, dbSNP, and ClinVar to annotate the user's file, while the other relies on data from dbNSFP to add functional annotation and prediction scores. Those results are then merged into an annotated VCF file for filtering.
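As a rough illustration of the annotate-in-parallel-then-merge pattern described above, the following Python sketch runs two annotators concurrently and combines their output per variant. It is not taken from the Mendel,MD source code: the function names, the lookup tables, and the `AF` and `SCORE` field names are all hypothetical, and real VCF parsing is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_frequency(variants, freq_table):
    """Hypothetical annotator: look up a population frequency per variant."""
    return {key: {"AF": freq_table.get(key)} for key in variants}


def annotate_scores(variants, score_table):
    """Hypothetical annotator: look up a functional prediction score per variant."""
    return {key: {"SCORE": score_table.get(key)} for key in variants}


def merge_annotations(variants, *annotation_sets):
    """Combine the fields from every annotator into one record per variant."""
    merged = {key: {} for key in variants}
    for annotations in annotation_sets:
        for key, fields in annotations.items():
            merged[key].update(fields)
    return merged


def annotate_in_parallel(variants, freq_table, score_table):
    """Run both annotators concurrently, then merge their results."""
    with ThreadPoolExecutor() as pool:
        freq_future = pool.submit(annotate_frequency, variants, freq_table)
        score_future = pool.submit(annotate_scores, variants, score_table)
        return merge_annotations(
            variants, freq_future.result(), score_future.result()
        )


if __name__ == "__main__":
    # Variants are keyed by (chromosome, position, reference, alternate).
    variants = [("1", 100, "A", "G"), ("2", 200, "C", "T")]
    freq_table = {("1", 100, "A", "G"): 0.01}    # made-up example data
    score_table = {("2", 200, "C", "T"): 0.9}    # made-up example data
    for key, fields in annotate_in_parallel(variants, freq_table, score_table).items():
        print(key, fields)
```

Keying every annotation by the same variant tuple is what keeps the merge step simple here: each annotator can run independently, and a variant missing from one source just gets an empty value for that field.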
Cardenas and his colleagues developed a "1-Click" automatic search with a set of pre-set filters and thresholds, though users can choose their own settings and filters. For instance, they can search for variants only linked with Mendelian disorders or order variants based on their frequency in the 1000 Genomes database.
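The kind of filtering described above, keeping only variants in Mendelian disease genes and ordering them by population frequency, can be sketched in a few lines of Python. This is an illustrative assumption rather than the tool's actual logic: the record fields (`frequency`, `mendelian_gene`), the function name, and the 1 percent default threshold are invented for the example and are not the "1-Click" presets.

```python
def filter_candidates(variants, max_frequency=0.01, mendelian_only=True):
    """Keep rare variants (optionally only in Mendelian disease genes),
    ordered from rarest to most common.

    The 0.01 default threshold is an assumption chosen for illustration.
    """
    kept = []
    for variant in variants:
        frequency = variant.get("frequency")
        # A variant absent from the population database is treated as
        # frequency 0.0, i.e. as rare.
        frequency = 0.0 if frequency is None else frequency
        if frequency > max_frequency:
            continue
        if mendelian_only and not variant.get("mendelian_gene", False):
            continue
        kept.append(variant)
    return sorted(kept, key=lambda v: v.get("frequency") or 0.0)


if __name__ == "__main__":
    # Made-up records standing in for annotated VCF rows.
    annotated = [
        {"id": "var1", "frequency": 0.20, "mendelian_gene": True},
        {"id": "var2", "frequency": 0.005, "mendelian_gene": True},
        {"id": "var3", "frequency": None, "mendelian_gene": True},
        {"id": "var4", "frequency": 0.001, "mendelian_gene": False},
    ]
    for variant in filter_candidates(annotated):
        print(variant["id"], variant["frequency"])
```

Exposing the threshold and the Mendelian-only switch as parameters mirrors the article's point that users can override the preset filters with their own settings.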
"You can't do very sophisticated statistical analyses, but you can annotate and filter, and depending on what you are trying to get accomplished, that can be sufficient," said Steven Hart, an assistant professor of biomedical informatics at the Mayo Clinic. Hart, who was not involved in the development of Mendel,MD, has worked on bioinformatics tools like VCF-Miner and on the GenomeGPS analysis toolkit, the DNA sequencing workflow used by Mayo Clinic researchers.
Cardenas estimated that it takes about 20 minutes to run the analysis on an exome sequencing file, and between seven and eight hours on a genome sequencing file, depending on the machine.
He added that the tool's user interface makes it easy to use. "It's very simple. It's a form that people fill, add their patients, and click a button and get their results," he said. "That's what people like, especially the doctors. They like to have their results very fast."
As he and his colleagues reported in their paper, they validated Mendel,MD using 19 exome VCF files they obtained from 11 different clinical cases that had been previously published. The tool was able to independently identify the correct gene and variant for all these cases.
They also applied the tool to a cohort of 57 patients from GENE–Núcleo de Genética Médica, a genetic testing laboratory in Belo Horizonte, Brazil, with suspected but undiagnosed genetic conditions. Using their tool and the patients' exome sequences, they reached a definitive diagnosis in 29 of the cases.
They similarly used Mendel,MD to analyze the exomes of 42 children with early-onset epileptic encephalopathy from Children's University Hospital in Dublin, Ireland. Here, Cardenas and his colleagues reported identifying disease-causing mutations in 26 percent of the patients, including one novel gene.
Cardenas added that to ensure that their tool was user-friendly, he and his colleagues also tasked graduate students with testing it. The students were given VCF files from one of the GENE–Núcleo de Genética Médica cases for analysis. Six of the seven students, they reported, were able to correctly identify the disease-causing variant.
Mayo's Hart said that he tried running the tool but had "some issues with it working." He said that this is not unusual for new tools when they encounter data to which they haven't previously been exposed.
He added that software tools like this are "not one-and-done solutions" but continue to be developed after publication. Since Cardenas has made the code available on GitHub, Hart said, other developers could also build upon and extend it.
Cardenas, who is now at the startup Genomics Medicine Ireland, said Mendel,MD continues to be used there to analyze data from patients with rare diseases as well as by his former advisor to examine clinical cases in Brazil. He noted that it can be downloaded by anyone, anywhere, and he envisions it as a tool for busy clinicians.
Hart, though, sees more of a role in research labs for Mendel,MD. He said that many people pursuing clinical sequencing already have workflows in place and that changing a clinical workflow is a much more involved process than it is in a research lab.
Hart also noted drawbacks of website-based tools like Mendel,MD. First, outside users can't be sure that when they upload large datasets, the server will be able to handle them. Also, especially for clinical samples, uploading to a server raises privacy issues. In addition, he said, users have to rely on the developer to keep the tool up to date and the developer may have moved on to their next project. Hart added that a tool like Mendel,MD is challenging to keep current, as its user interface also has to be updated.
The tool, though, can also be brought in house, he said.
Cardenas said that he has plans to keep supporting the tool and intends to add a few new features. Specifically, he said, he wants to add 1,000 new genomes to his database as well as add support for CNV and population analysis.
"My plan is to continue doing support for this tool, but also expand it and add some new methods, update the databases, and make sure this scales well to the genome level," Cardenas said.