CHICAGO – Researchers at the Mount Sinai Health System in New York have released a new informatics resource to assist scientists searching for drugs to treat COVID-19. The resource, called the COVID-19 Gene Set and Drug Library, was published by the lab of Avi Ma'ayan, director of Mount Sinai's Center for Bioinformatics.
It features analyses of thousands of published works on the SARS-CoV-2 coronavirus, COVID-19 and its molecular mechanisms, and existing drug compounds that could be repurposed to respond to the pandemic. As of Thursday, the database contained 69 drug sets covering 1,033 unique drugs and 361 gene sets comprising 13,347 unique genes.
Researchers worldwide can view, download, and analyze the compilation, as well as contribute their own data on the novel coronavirus in an effort to accelerate the race for a cure or vaccine. The COVID-19 Gene Set and Drug Library contains a simple visualization tool that produces Venn diagrams to assist investigators in identifying overlapping evidence.
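The overlap analysis that a Venn diagram visualizes can be sketched in a few lines of Python. This is only an illustration of the idea, not the library's own code: the set names, the gene lists, and the helper functions `pairwise_overlaps` and `shared_by_all` are all hypothetical.

```python
from itertools import combinations

# Hypothetical gene sets; the names and contents are illustrative only
# and are not taken from the COVID-19 Gene Set and Drug Library.
gene_sets = {
    "study_a": {"ACE2", "TMPRSS2", "IL6", "CTSL"},
    "study_b": {"ACE2", "IL6", "STAT1"},
    "study_c": {"ACE2", "TMPRSS2", "IFNB1"},
}


def pairwise_overlaps(sets):
    """Return the shared members for every pair of named sets."""
    return {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sorted(sets), 2)
    }


def shared_by_all(sets):
    """Return the members present in every set (the center of a Venn diagram)."""
    return set.intersection(*sets.values())


overlaps = pairwise_overlaps(gene_sets)
core = shared_by_all(gene_sets)

for pair, genes in overlaps.items():
    print(pair, sorted(genes))
print("shared by all:", sorted(core))
```

Genes that recur across independently published sets are the kind of overlapping evidence the visualization tool is meant to surface.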
The gene sets come from published articles — including preprint releases — looking at the mechanisms of COVID-19. Sources include genome-wide association studies, gene expression studies, and mass spectrometry-based proteomics, Ma'ayan said.
Besides the published literature, Mount Sinai's resource also scours Twitter and other social media for trends in the drugs people are discussing publicly. The Center for Bioinformatics searches Twitter daily against 14,000 different drug terms that have been mentioned in the context of COVID-19 and severe acute respiratory syndrome.
"We're counting how many times each drug is mentioned, and then we can trace all the time trends of which drugs are starting to be discussed or [ceasing] to be discussed," Ma'ayan said.
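The counting Ma'ayan describes might look roughly like the Python sketch below. It is an assumption-laden illustration rather than the center's pipeline: the posts, the short term list, and the function `daily_mention_counts` are hypothetical, and the sketch counts posts that mention a term at least once per day instead of every occurrence.

```python
import re
from collections import Counter, defaultdict

# Hypothetical inputs: a short term list and (date, text) pairs standing in
# for collected posts. The real resource searches about 14,000 drug terms.
drug_terms = ["remdesivir", "hydroxychloroquine", "ivermectin"]
posts = [
    ("2020-04-20", "Remdesivir trial results expected soon"),
    ("2020-04-20", "Is hydroxychloroquine effective? Remdesivir looks better"),
    ("2020-04-21", "More hydroxychloroquine discussion today"),
]


def daily_mention_counts(posts, terms):
    """Count, per day, how many posts mention each term (case-insensitive)."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = defaultdict(Counter)
    for day, text in posts:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[day][term] += 1
    return counts


counts = daily_mention_counts(posts, drug_terms)

# Print a simple per-day trend; no source or sentiment is considered.
for day in sorted(counts):
    print(day, dict(counts[day]))
```

Comparing the per-day tallies over time shows which drugs are starting to be discussed and which are fading.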
While social media can be a minefield of hysteria and misinformation — exhibit A might be the hype and apparently false hope around hydroxychloroquine — Ma'ayan and his team are simply collecting information and letting researchers form their own judgments. "We're not really looking at the source of the tweets or the sentiment of the tweets. We're just collecting those counts," he said.
Ma'ayan acknowledged that the tactic may be "a little bit dangerous" because of the potential for misuse of the information. But he said that Twitter is helpful in this case because the disease is so novel and because so many thousands of papers have been released in just the last two months, many without peer review.
"If the community is working together and you start converging all of the experimental evidence and computational evidence that is accumulating, then the right drugs will rise up," said Ma'ayan, whose faculty appointment is in the Department of Pharmacological Sciences.
Drugs discussed on social media are often not the most frequently cited ones in the scientific literature, either. "Let's say a researcher finds a drug that they think might work," Ma'ayan added. "They can find past tweets that talked about the drug and they can see whether it's in clinical trials, whether other people mentioned it."
Ma'ayan said that his lab is simply providing a public service for researchers.
"We wanted to put all of that data in one place and not just give you a laundry list of those things, but an ability to compare and count and look for consensus and do analysis on those publications," he explained. "There are all kinds of lists of either genes or proteins, and we just tried to put all of this together so people can start to see patterns."
The COVID-19 Gene Set and Drug Library started early this year as a solo analysis Ma'ayan was performing to predict potential therapies for the respiratory disease based on published data. Then he asked some collaborators to test his predictions.
"What I noticed is that a lot of other people have started just publishing predictions for drugs, using different methods," Ma'ayan said. He then realized that hundreds of other academic bioinformaticians had the same idea, to apply computational methods to drug repurposing for COVID-19, so he and his lab began collecting some of those published and preprint studies.
The Mount Sinai Center for Bioinformatics coordinates data science for several US National Institutes of Health Common Fund-backed projects, as well as for LymeMind, a platform for predictive modeling of Lyme disease that is funded by the Alexandra & Steven Cohen Foundation Lyme and Tickborne Disease Initiative.
Ma'ayan oversees a team of about a dozen people, including software developers, data scientists, and postdoctoral and graduate students in bioinformatics. The team supports research exclusively; the center is separate from any biomedical informatics operations at the health system that serve clinicians.
The people working on LymeMind pivoted to this COVID-19 effort when the pandemic hit the US. Indeed, in late February, even before New York became the US hotspot for coronavirus, Ma'ayan decided to require employees of his lab and the Center for Bioinformatics to work from home, two weeks before offices and other public spaces started to close nationwide.
Two months on, the COVID-19 website has close to 800 registered users. A glance at analytics indicated that 80 people logged in on one random day last week, from countries including the US, UK, Brazil, and India. Almost all were affiliated with academic institutions, according to Ma'ayan.
Given that COVID-19 is so new and that the pandemic is spreading so rapidly, Ma'ayan has not developed any long-term plan for the drug and gene resource.
The Mount Sinai team has learned in a short period of time that it is necessary to separate drugs tested in vitro, those predicted in silico, and those simply mentioned on Twitter because the usefulness and level of confidence differ for each type of information, Ma'ayan said.
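One way to keep those evidence types apart is to group drugs by where the evidence came from, as in the Python sketch below. The drug names, the evidence labels, and the function `group_by_evidence` are hypothetical placeholders, not the library's schema.

```python
from collections import defaultdict

# Hypothetical records of (drug, evidence type); placeholders only.
records = [
    ("drug_x", "in_vitro"),
    ("drug_x", "twitter"),
    ("drug_y", "in_silico"),
    ("drug_z", "twitter"),
]


def group_by_evidence(records):
    """Group drug names by the type of evidence supporting them."""
    grouped = defaultdict(set)
    for drug, kind in records:
        grouped[kind].add(drug)
    return dict(grouped)


grouped = group_by_evidence(records)

for kind in sorted(grouped):
    print(kind, sorted(grouped[kind]))
```

Keeping the groups separate lets a reader weigh a drug that was tested in the lab differently from one that was only predicted computationally or mentioned online.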