Stanford Researchers Turn to Crowdsourcing to Rank Severity of Adverse Drug Reactions


NEW YORK (GenomeWeb) – Researchers from Stanford University and King Abdullah University of Science and Technology have published a paper in the Journal of Medical Internet Research describing a study that used online crowdsourcing to rank adverse drug reactions (ADRs) by severity.

The method and results of the study were also described in a poster that was presented at the American Medical Informatics Association's Translational Bioinformatics (TBI) conference last week. In the study, the researchers assigned over 126,000 random pairs of ADRs to more than 2,500 workers on Amazon Mechanical Turk, a platform for creating and assigning small so-called human intelligence tasks that human workers perform for payment, and asked them to select the worse ADR in each pair. They used an optimization algorithm to rank over 2,900 ADRs by severity based on the workers' responses.

These rankings help "highlight drug classes based on the severity of their associated ADRs, triage predicted drug-associated ADRs for further investigation, and associate genes with a severity score based on their association with ADRs, with some implications for drug design," the researchers wrote in the Journal of Medical Internet Research. Also, the study provides a more comprehensive publicly available resource of ranked ADR severity information than has previously existed, and represents the first application of crowdsourcing for pharmacovigilance applications, according to the researchers.

Encouraged by the fruitfulness of this initial study, the researchers are looking at additional applications of crowdsourcing to pharmacovigilance projects, Assaf Gottlieb, one of the authors on the paper, told GenomeWeb. Gottlieb, who also presented the poster at TBI, is a postdoctoral research fellow in the laboratory of Russ Altman in Stanford University's department of genetics. He said that the researchers are already mulling a second study that, broadly speaking, will use questionnaires to collect information on individual treatment preferences.

They'll once again likely use Amazon Mechanical Turk for that study. Responses from the workers who participated in the first study were very positive, according to Gottlieb, with many respondents seeking additional opportunities to participate in clinically relevant projects.

A ranked list of ADRs ordered by level of severity is a useful tool for clinicians trying to make more personalized drug risk-benefit assessments and provide more patient-centered healthcare, the researchers argue in the paper. ADRs increase hospitalization times and pile on extra medical costs, among other problems, spurring increased interest in accounting for their impacts when making drug assessments.

Current methods of estimating risk focus mainly on ADR frequency, the researchers wrote, and either handle each ADR individually or assign equal weights to all drug ADRs even though there are variations in severity and associated impact — severe life-threatening ADRs such as liver failure, cardiac arrest, and others require more attention while other minor ADRs may not. "Of course, patients' subjective perception of the severity of an ADR varies widely .... Nonetheless, a ranking of ADRs based on perceived severity is a useful starting point," the paper states.

Using a list of over 2,900 ADRs gleaned from the Side Effect Resource, the researchers generated over 126,000 microtasks that they assigned to workers on the Mechanical Turk website. They used the site because "previous evaluations have shown that MTurk can be as reliable as traditional survey methods, and that the use of control validation questions can markedly improve reliability and reduce variability," the researchers explain in the paper. Other datasets used in the study include ADR-drug associations from the Off-label Side Effects database as well as gene-ADR associations and predicted gene-ADR associations from DrugRouter and other databases.

For the experiment, each worker was presented with up to 15 sets of 10 pairwise ADR comparisons and asked to select which ADR was more severe in their estimation for each pair. Each set of 10 pairs included three pre-defined pairs — used as a quality control check to filter out unreliable workers — and seven real randomly chosen pairs. The worker interface also included links to Google queries that provided workers with information about ADRs expressed in unfamiliar medical terminology.
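
The set design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the function names, the tiny ADR list, the three control pairs, and the rule that a worker must answer all three control pairs as expected are hypothetical choices made for the example.

```python
import random

# Hypothetical mini-list of ADR names (the study used over 2,900).
ADRS = ["cardiac arrest", "renal failure", "coma", "dry mouth",
        "elevated mood", "decreased appetite", "early morning awakening"]

# Hypothetical control pairs: (adr_a, adr_b, expected_more_severe).
CONTROLS = [
    ("cardiac arrest", "dry mouth", "cardiac arrest"),
    ("coma", "elevated mood", "coma"),
    ("renal failure", "decreased appetite", "renal failure"),
]

def build_task_set(adrs, controls, rng, n_random=7):
    """Return one shuffled set of (adr_a, adr_b, is_control) pairs.

    Simplification: a random pair may happen to repeat a control pair.
    """
    pairs = [(a, b, True) for a, b, _ in controls]
    for _ in range(n_random):
        a, b = rng.sample(adrs, 2)  # two distinct ADRs
        pairs.append((a, b, False))
    rng.shuffle(pairs)
    return pairs

def passes_controls(answers, controls, min_correct=3):
    """Check a worker's answers, given as (adr_a, adr_b, chosen) tuples.

    The min_correct threshold is an assumption for this sketch.
    """
    expected = {frozenset((a, b)): worse for a, b, worse in controls}
    correct = sum(
        1 for a, b, chosen in answers
        if expected.get(frozenset((a, b))) == chosen
    )
    return correct >= min_correct
```

Workers whose answers fail the control check would simply be dropped before the remaining comparisons are merged.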

Also, in an effort to maximize results and reduce potential biases, the researchers distributed tasks on different weekdays and used the ranked results generated from the first batch of assigned comparisons to randomly select ADR pairs for future tasks. The list of ADRs ranged from more severe reactions such as cardiac arrest and metastatic bone cancer to less severe reactions such as elevated mood. Each comparison took about five minutes to complete on average and workers were paid about $0.45 per task. After collecting and merging workers' responses, the researchers used a linear programming approach to generate a global ranking of ADR severities based on the results of the pair comparisons.
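
To give a feel for how pairwise choices can be merged into a single ordering, here is a minimal Python sketch. It does not reproduce the paper's linear programming approach; it swaps in a much simpler score, the fraction of comparisons in which each ADR was picked as the more severe one. The function name and the four example comparisons are made up for illustration.

```python
from collections import defaultdict

def rank_by_win_fraction(comparisons):
    """Rank ADRs from (adr_a, adr_b, chosen_as_more_severe) tuples.

    Score = times chosen as more severe / times shown.
    Ties are broken alphabetically. Most severe comes first.
    """
    wins = defaultdict(int)
    shown = defaultdict(int)
    for a, b, worse in comparisons:
        shown[a] += 1
        shown[b] += 1
        wins[worse] += 1
    score = {adr: wins[adr] / shown[adr] for adr in shown}
    return sorted(score, key=lambda adr: (-score[adr], adr))

# Made-up worker responses, for illustration only.
comparisons = [
    ("cardiac arrest", "dry mouth", "cardiac arrest"),
    ("renal failure", "dry mouth", "renal failure"),
    ("cardiac arrest", "renal failure", "cardiac arrest"),
    ("dry mouth", "elevated mood", "dry mouth"),
]
ranking = rank_by_win_fraction(comparisons)
```

A win-fraction score ignores which opponents an ADR was compared against, which is one reason a global optimization over all comparisons, such as the one the researchers describe, is likely a better fit when each ADR appears in only a small sample of the possible pairs.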

The final ranked list showed "good correlation between the mortality rates associated with ADRs and their rank," according to the paper. The rankings also showed "significant" correlation between the relative number of deaths in ADR reports and the computed severity rank for the ADR in question: more severe ADRs tend to have significantly higher death rates.
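
A check of this kind can be sketched with a rank correlation. The article does not say which statistic the researchers used, so Spearman's coefficient is an assumption here, and the helper names are hypothetical. The sketch takes two equal-length lists, for example a severity score and a death rate per ADR.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A value near 1 would mean that ADRs judged more severe also tend to have higher death rates; a value near 0 would mean the two orderings are unrelated.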

Not unexpectedly, top-ranked severe ADRs included cardiac arrest, metastatic bone cancer, left ventricular failure, HIV infection, and anal cancer. Equally unsurprising, bottom-ranked ADRs included elevated mood, euphoric mood, early morning awakening, dry mouth, and decreased appetite. The middle of the list was a little more variable, and the selections that workers made there were very subjective and individualistic, Gottlieb said. That led to some "discordance" between the estimated severity of some of the ADRs in the list and their associated mortality rates as recorded in the US FDA Adverse Events Reporting System (AERS), the researchers wrote. They posit some possible reasons for the disagreement, including "a misunderstanding by laymen of the true severity of an ADR (e.g., the word 'cancer' may get a high ranking, regardless of its survival statistics), and/or a bias in the associated death rates in the AERS system."

One initially interesting finding was that congenital malformations, which the researchers expected would rank higher, were actually ranked moderate to mild by the respondents, he said. A closer look at the data showed that these were relatively harmless malformations such as having an additional toe, which could account for why they received a lower ranking.

As part of the study, the researchers also looked at links between ADRs and some therapeutic drug classes including immunosuppressants, anti-inflammatory drugs, corticosteroids, and anti-Parkinson treatments. Immunosuppressants and anti-Parkinson drugs were among the treatments with high numbers of associated severe ADRs — a median of five or more severe ADRs. The severe ADRs associated with the highest number of immunosuppressants were necrosis, renal failure, and congestive cardiac failure. The most common severe ADRs for anti-Parkinson drugs were cardiac arrest, coma, renal failure, skin cancer, and cerebral ischemia. These classes also showed "large variability between their drug members in terms of occurrences of severe ADRs, suggesting staying vigilant in regard to the effect of drug choice on ADR occurrence in patients," the researchers wrote.

The ranking also "highlights severe drug-ADR predictions, such as cardiovascular ADRs for raloxifene and celecoxib," the researchers wrote. "It also triages genes associated with severe ADRs such as epidermal growth-factor receptor (EGFR) [which is] associated with glioblastoma multiforme, and SCN1A [which is] associated with epilepsy." The full list of ranked ADRs is included in the supplementary section of the paper.

For their next steps, the researchers plan to use the ranked list to prioritize ADR predictions from previous studies for further investigation, Gottlieb said. They also want to use the list to aid clinicians in the treatment selection process, he said. At present, the researchers are working on a patient similarity framework that would let clinicians weigh treatments based on associated ADRs as they occur in different individuals. With this information in hand, physicians can then check what their patients' preferences are before going ahead and prescribing treatment.