WASHINGTON (GenomeWeb) – At the annual meeting of the American Association for Cancer Research here on Sunday, researchers and engineers gathered to talk about the next progression in the analysis of cancer research data: artificial intelligence.
Lynda Chin, a researcher and physician at the University of Texas System, helmed a symposium on the subject, noting that the sheer amount of data being generated through sequencing and other technologies is up to 2.5 exabytes per day (1 exabyte is 1 million terabytes). "We are seeing a widening of the gap between what's possible and what's being practiced," she said, as doctors cannot keep up with the thousands of new papers that are produced every year.
That's where AI comes in. It doesn't do the thinking for the doctor, but rather serves as a "virtual expert" that supplements a doctor's knowledge and helps him or her apply the most up-to-date knowledge to caring for patients, Chin said. It mirrors the relationship of a generalist reaching out to a specialist for advice.
MD Anderson Cancer Center's OEA System, for example, is an engine that has been exposed to literature and patient cases. It was developed by engineers in collaboration with subject matter experts, who built a data ecosystem containing not just clinical data, but also environmental data, lifestyle data, and much more, to help clinicians build a complete picture for any given patient.
DARPA researcher Paul Cohen spoke about the agency's Big Mechanism project, an artificial intelligence that DARPA has used to read cancer literature and build models of cell signaling, which he said could help cancer researchers move from identifying correlations in cancer to identifying actual causative mutations.
Big Mechanism starts by reading journal articles, extracting information from the texts on interactions between genes and proteins. After it read 300,000 papers, Cohen and his team attempted to teach it to use that interaction information to create a model of cell signaling networks in a given cancer.
Though there are many challenges in even getting to this point, including teaching the machine to process natural language and to tell the difference between interactions researchers are sure of and interactions they're uncertain of, such a network model could help researchers determine which targets to drug, and what kinds of upstream or downstream consequences could arise, Cohen said.
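To make the idea of a network model concrete, here is a minimal Python sketch. It is not Big Mechanism's actual method, which the talk did not detail. It assumes that extracted interactions arrive as (source, target, confidence) triples; the gene names, confidence values, and the 0.5 cutoff are all hypothetical placeholders. Low-confidence interactions are dropped, and a breadth-first search lists what lies downstream of a candidate drug target.

```python
from collections import defaultdict, deque

# Hypothetical extracted interactions: (source, target, confidence).
# Names and confidence values are placeholders, not real findings.
INTERACTIONS = [
    ("GeneA", "GeneB", 0.90),
    ("GeneB", "GeneC", 0.95),
    ("GeneC", "GeneD", 0.90),
    ("GeneD", "GeneE", 0.85),
    ("GeneB", "GeneX", 0.40),  # an interaction the literature is unsure of
]

def build_network(interactions, min_confidence=0.5):
    """Build a directed graph, keeping only sufficiently certain interactions."""
    network = defaultdict(set)
    for source, target, confidence in interactions:
        if confidence >= min_confidence:
            network[source].add(target)
    return network

def downstream(network, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in network.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

network = build_network(INTERACTIONS)
affected = downstream(network, "GeneB")  # nodes that targeting GeneB could affect
```

Lowering `min_confidence` to 0.0 would pull the uncertain GeneX interaction into the model, which shows how much the resulting network depends on how certainty is handled.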
Google's Mark DePristo also spoke about his company's efforts to develop deep machine learning and apply those advances to medicine.
We are now at a point where machines are better at identifying images than humans are, he said. Such an advance could have uses in identifying, for example, when a diabetic person is likely to develop diabetic retinopathy by having the machine look at retinal photographs taken by ophthalmologists. In fact, DePristo noted, Google's deep neural network does better, on average, at identifying patients at high risk for diabetic retinopathy than human ophthalmologists.
Another example where the machine excels is digital pathology. Though slides of cancer samples are more complicated to read than retinal photos, the machine can read each pixel of the image to determine whether it has a cancer cell in it.
Deep learning in genomics is slightly different, as there aren't always images involved, DePristo said. But his team is trying to develop a deep-learning algorithm for germline variant calling by encoding sequencing data as images and training a deep-learning machine to determine the genotype from the image. DeepVariant, as it's called, can learn to call variants in data generated by many different sequencing technologies. And though it's still early in the research, the team is now trying to extend this technology to the study of cancer.
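As a rough illustration of what encoding sequencing data as an image can mean, the Python sketch below turns a reference sequence and a few aligned reads into a grid of pixels. This is not DeepVariant's actual encoding, which DePristo did not spell out. The pixel values, the two-channel layout (base identity and a mismatch flag), and the function name `encode_pileup` are assumptions made for this example.

```python
# Arbitrary pixel intensities for each base; chosen only for this sketch.
BASE_TO_PIXEL = {"A": 64, "C": 128, "G": 192, "T": 255}

def encode_pileup(reference, reads):
    """Encode a reference and aligned reads as rows of (base, mismatch) pixels.

    reads is a list of (start_offset, sequence) pairs. Row 0 is the reference;
    each later row is one read. Positions a read does not cover stay (0, 0).
    """
    width = len(reference)
    image = [[(BASE_TO_PIXEL[base], 0) for base in reference]]
    for start, sequence in reads:
        row = [(0, 0)] * width
        for i, base in enumerate(sequence):
            pos = start + i
            if 0 <= pos < width:
                mismatch = 255 if base != reference[pos] else 0
                row[pos] = (BASE_TO_PIXEL.get(base, 0), mismatch)
        image.append(row)
    return image

reference = "ACGTAC"
reads = [(0, "ACGT"), (2, "GTTC")]  # the second read differs from the reference at position 4
image = encode_pileup(reference, reads)
```

A classifier trained on many such grids, each labeled with a known genotype, could in principle learn to recognize the column of mismatch pixels that a true variant produces.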
Indeed, in a separate symposium on computational biology, Memorial Sloan Kettering's Alexander Penson spoke about his team's efforts to apply such a deep-learning machine algorithm to help determine what type of cancer a given patient has and whether a tumor is a new cancer or a recurrence of previous disease, all of which has implications for treatment.
The organ of origin has traditionally been used to classify tumors, but diagnosing the tumor type is challenging and proceeds by combining imaging, IHC, histopathology, and other information types, Penson said.
Genomic alteration data can also be used; however, most mutations are not limited to one tumor type, so a researcher needs to look for more subtle alterations. Fortunately, he added, this type of information has become available at the time of diagnosis in cases when clinical sequencing has been adopted at the point of care.
At MSKCC, clinical sequencing has been used with a growing number of patients, and 42 percent of the assembled dataset comes from metastatic samples. This data has served as an ideal training set in the development of an algorithm that could be used for the interpretation of future clinical samples.
The algorithm looks at a variety of tumoral features and makes probabilistic classifications of tumors using a decision tree approach. Broad copy-number alterations and a wide range of other features are just as important as mutations in classifying tumor types, he noted. Interactions between features are also key to determining tumor type. For example, if three diverse tumor types are driven by the same mutation, other features like total mutation counts, truncations, and so on can be combined to create a probability score for tumor type.
In a brain biopsy, for example, if sequencing reveals five mutations, including missense and hotspot mutations, the algorithm can predict a glioma with high confidence as these features are associated with that disease. Up to 65 percent of glioma samples can be identified out of 22 tumor types with a probability greater than 95 percent, Penson said.
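The decision tree approach and the 95 percent probability cutoff can be sketched in a few lines of Python. This is only an illustration, not the MSKCC algorithm, whose details had not been published. The three hand-written trees, the feature names (`mutation_count`, `truncations`, `broad_cna`), the thresholds, and the tumor labels are all hypothetical. Each tree votes for a tumor type, the share of votes becomes the probability score, and a call is reported only when that score exceeds the cutoff.

```python
from collections import Counter

# Three toy decision trees over hypothetical features of a sequenced sample.
def tree_1(sample):
    if sample["mutation_count"] > 50:
        return "type_A"
    return "type_B" if sample["truncations"] > 2 else "type_C"

def tree_2(sample):
    if sample["broad_cna"]:  # broad copy-number alteration present
        return "type_B"
    return "type_A" if sample["mutation_count"] > 30 else "type_C"

def tree_3(sample):
    if sample["truncations"] == 0:
        return "type_C"
    return "type_A" if sample["mutation_count"] > 50 else "type_B"

TREES = [tree_1, tree_2, tree_3]

def classify(sample, trees=TREES):
    """Return a probability score per tumor type as the share of tree votes."""
    votes = Counter(tree(sample) for tree in trees)
    return {label: count / len(trees) for label, count in votes.items()}

def confident_call(probabilities, threshold=0.95):
    """Return the top tumor type only if its score exceeds the threshold."""
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score > threshold else None

distinctive = {"mutation_count": 5, "truncations": 0, "broad_cna": False}
ambiguous = {"mutation_count": 40, "truncations": 3, "broad_cna": False}

print(confident_call(classify(distinctive)))  # all three trees agree
print(confident_call(classify(ambiguous)))    # trees disagree, so no confident call
```

In this toy setup the first sample gets a unanimous vote and a confident call, while the second splits the trees and falls below the cutoff, mirroring the contrast between distinctive and hard-to-classify tumor types.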
But some tumor types are much less distinctive. For example, esophagogastric tumors are much harder to classify, and the majority of them are classified with low probability. But even in challenging tumor types, Penson said, the algorithm's probability score has matched the accuracy of findings by clinicians.
The team is planning to make the algorithm public and will soon publish a paper on how it works, Penson added.