At GA4GH Meeting, Experts Discuss Challenges of Applying AI in Genomic Medicine

NEW YORK – The application of artificial intelligence in genomic medicine has the power to transform healthcare but is in its earliest stages and faces many challenges, according to experts who spoke at the annual plenary meeting of the Global Alliance for Genomics and Health (GA4GH), held in Barcelona and virtually last week.

Moreover, completely different mindsets are set to collide in AI-based genomic medicine as the medical and life science fields intersect with the information technology industry, resulting in even more complex questions about the right way forward.

"These are two different areas that have very different cultures, very different incentive structures, [and] different approaches to ethics, and operate under very different sets of regulatory constraints," Harry Farmer, a researcher at the Ada Lovelace Institute in London, said at the meeting.

"The big question [is] about whose culture becomes the dominant one when these two cultures merge, what happens, and how this is reconciled," said Farmer, whose talk covered the societal implications of AI and genomics.

Since March, the Ada Lovelace Institute and the Nuffield Council on Bioethics, a London-based organization, have undertaken AI and Genomic Futures, a research project that aims to explore how AI will transform genomics and what that will mean for individuals and society. According to Farmer, the project will delve into the ethical, political, and economic consequences of the merging of these two fields and, eventually, make recommendations based on its findings.

Farmer noted that AI and genomics are "hugely expansive terms," and that the researchers involved in AI and Genomic Futures decided to focus on genomic analysis, excluding genome editing as well as research into nonhuman subjects. He noted that AI and Genomic Futures is not only interested in genomics in medicine but also wants to better understand how genomics is used in other contexts.

The project is structured into multiple phases, involving a literature review, bibliometric analysis, and a "horizon scanning exercise" that will involve interviewing experts about trends in AI and genomics over the next decade. Farmer said the researchers are nearing the horizon scanning part of AI and Genomic Futures. There will also be a phase where they explore how to shape the application of AI and genomics to best address arising social concerns.

As Farmer underscored, both AI and genomics on their own pose "huge ethical questions" related to human agency, privacy, bias, and power. Put together, such questions only become more complex. Some challenges he described include privacy of data, accuracy, and built-in bias resulting from the use of skewed datasets to produce algorithms.

"Genomic data is hard to anonymize and sensitive," said Farmer. He added that given the breakneck pace of developments in the application of AI in genomics, it is difficult to provide informed consent, because it is unclear how a subject's data might be used in the future.

Minorities, in particular, are in what Farmer called a "double bind." Historically skeptical about contributing data, they are also less likely to benefit from AI-powered genomic medicine than others if they do not contribute their data, a situation that could result in biased outcomes.

There are also difficulties in testing for bias in AI-powered genomics, as "many AI systems operate as black boxes, whose decision-making processes are obscure." Intellectual property rights are also set to be contested, as some disagree about the ownership of data produced by AI systems. "The public thinks that they own their genomic data," said Farmer, "and that they have rights to it."

Gerardo Jimenez Sanchez, CEO of Genómica Médica, a Mexico City-based clinical and molecular laboratory, also drew attention to the challenges that swirl around the implementation of AI in genomics.

"There are serious questions of how this will work," Jimenez Sanchez said in his talk, noting questions about the trustworthiness of AI systems, and the danger of potentially reinforcing and codifying biases. To avoid creating such inherent flaws, Jimenez Sanchez said that researchers developing algorithms need to have diverse datasets.

"Clearly not having a well-represented database to start with, with regards to race, gender, and so on, would be a risk for getting skewed conclusions," said Jimenez Sanchez. "We need to have better representation not only because it is fair but because it is smart, and to have better results," he said.

Khalil Ouardini, a data scientist at AI firm Owkin, agreed. "Machine learning models should be benchmarked on as many external datasets as possible," he said at the meeting. "And some of the demographic and clinical values should be controlled for."

Ouardini also suggested that developers test their algorithms for failure modes. While he said there are no standard tools for doing so, developers should remain aware that such failures might exist and actively seek them out.

Cost effectiveness is also an issue, Jimenez Sanchez noted, as it is costly to manage such datasets, and developers need to be creative, "otherwise their whole budget will go there." Farmer echoed this point in his talk, adding that the environmental impact of managing such data resources should also be taken into account.

Furthermore, said Jimenez Sanchez, researchers need to validate their findings from AI. 

"Let's not think that by doing AI, we don't have to go back to the laboratory and do functional analysis or clinical trials," he said.

Still, he stressed, the need for AI in genomics is real, given the amount of data that has been generated. He also said there have been some success stories to date, citing the use of computer-assisted pattern recognition platforms, such as Face2Gene, as well as Genome-to-Treatment, an automated, virtual system for genetic disease diagnosis that relies on whole-genome sequencing data. A paper describing the latter system appeared in Nature in July.

During his talk, Owkin's Ouardini also highlighted HE2RNA, a deep learning model that the firm claims can predict RNA-seq expression of tumors from digitized histopathology images. Given that answering questions about cancer outcomes depends on many data types, such as sequencing and histopathology data, Ouardini said that it makes sense to use AI for multimodal data integration. "For cancer research, it is increasingly important to incorporate some sort of spatial information at the molecular level," he said.

While the increased availability of such new resources is "exciting," Jimenez Sanchez said that the field is still in its very early stages. "There are a few success stories, but let's not get ahead of ourselves," said Jimenez Sanchez. "Be conscious that the road ahead is long with challenges we need to meet. I am sure that we will get there at some point."