Google Deep Learning May Improve SNP Analysis, But Don't Call It AI Anytime Soon


CHICAGO (GenomeWeb) – To the world, Google may talk about artificial intelligence with the best of 'em, but internally, the internet giant shies away from that term, particularly in life sciences and medicine.

Nevertheless, the company continues to make progress in applying the technology to these markets, and DNA sequencing analysis is a particularly good fit, Allen Day, a science advocate at Google, said at the recent Intelligent Systems for Molecular Biology/European Conference on Computational Biology (ISMB/ECCB) meeting in Prague.

"There's a lot of hype and branding now about artificial intelligence right now, and we're certainly as guilty as the rest in terms of the marketing department," Day said at the meeting. "Our thinking about it is that it's actually pretty difficult to build an artificially intelligent machine. ... It's hard to describe exactly what intelligence is."

Instead, the Seattle-based Day suggested, call it machine learning — which is a term Google itself features prominently on its "A.I. Experiments" web page. AI itself is the longer-term goal, sort of a far-off utopia.

"Building machines that learn is a more practical problem [than making artificially intelligent computers] because you can set some kind of objective function and talk about classifiers," Day explained during his nearly hour-long presentation that held a rapt audience despite being the last thing standing between attendees and happy hour.

"If you have some sort of labeled data you're giving to this machine under supervision, and establish some kind of function for measuring error of the machine's prediction," Day said, "you can actually make ground, slowly improving the quality of approximating some unknown function through this learning process, eventually, asymptotically, approaching this artificial intelligence idea from further away."

He said this iterative strategy that life scientists like himself tend to follow is "easier than building some kind of thing that you have requirements to document for, which is how engineers think."

To date, machine learning has involved designing criteria for the computer to follow in an effort to reduce errors, explained Day, a bioinformatics expert with a PhD in human genetics.

"The new way of doing this is to not write these rules, but to just tell the machine some labeled data and to allow it to modify itself iteratively to reduce errors, and repeat that in a loop in order to figure [out] what the rules are internally without explicitly telling it what these rules are," he said. Day called this "learning by example."

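The loop Day describes can be sketched in a few lines of Python. This is an illustrative toy, not anything Google presented: the one-feature logistic classifier, the six labeled examples, the learning rate, and the names `train` and `predict` are all assumptions made for the example. It shows labeled data going in, an error function being measured, and the model adjusting itself repeatedly to shrink that error.

```python
import math

def train(examples, steps=200, lr=0.5):
    """Fit a one-feature logistic classifier by repeatedly reducing its error.

    examples: list of (x, label) pairs, where label is 0 or 1.
    Returns the learned weight, bias, and the mean error after each pass.
    """
    w, b = 0.0, 0.0  # the model starts with no built-in rules
    history = []
    n = len(examples)
    for _ in range(steps):
        grad_w = grad_b = loss = 0.0
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(label = 1)
            # Cross-entropy error of this prediction against the label
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            grad_w += (p - y) * x
            grad_b += p - y
        history.append(loss / n)
        # Nudge the parameters in the direction that lowers the error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, history

def predict(w, b, x):
    """Return the class (0 or 1) the trained model assigns to x."""
    return 1 if w * x + b > 0 else 0

# Hypothetical labeled data: negative inputs are class 0, positive are class 1
examples = [(-2.0, 0), (-1.5, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.5, 1)]
w, b, history = train(examples)
```

After training, the recorded error is lower than where it started and the classifier separates the two groups, even though no one wrote a rule stating where the boundary lies.
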
Rather than talking about artificially intelligent systems, he discussed "deep neural networks" at companies under the umbrella of Google parent Alphabet, including DeepMind Technologies, Verily Life Sciences, and Calico. Much of the work is being carried out by Brain, a research team inside Google focused on machine learning in life sciences and other industries.

Neural networks are not new, but there finally is enough data and computing power to make such networks more common and accurate. That allows designers to stack many layers on top of one another. "We can build arbitrarily deep networks," Day said.

"Other, classical approaches were performing better than neural networks," Day said. "By adding more computing power, we actually crossed this point where neural networks start to perform better than traditional hand-tuned models."

"The other interesting thing about these neural networks is the fact that they are composed of layers [which] gives us a nice paradigm for scaling up the algorithm," Day said. "As computing power goes up, we have the ability to add more layers."

(Google's deep-learning architecture, Inception, is described in a paper posted in 2014 and presented at the 2015 Conference on Computer Vision and Pattern Recognition.)

Earlier this year, Mark DePristo, Google's head of deep learning for genetics and genomics, spoke at the American Association for Cancer Research annual meeting about the company's efforts to apply machine learning to medicine.

In Prague, Day repeated some of what DePristo said, including that Google has found its deep neural network to be more accurate than human ophthalmologists at scanning retinal images to identify patients at high risk for diabetic retinopathy. "Consistently, the machines, in this kind of medical imaging domain, are outperforming humans," Day said.

"For some images, it's highly subjective and there's not good agreement between [human] graders," Day said. "They don't necessarily agree with themselves or with one another as to what is the correct classification of the image."

Inception got to the point where it could accurately assist physicians in detecting retinopathy only after classifying 130,000 images, Day said.

The Mountain View, California-based company is trying to do similar things for early detection of breast cancer. Day said that about one in 12 cases is misdiagnosed from biopsy results, either with false positives or false negatives. This could be due to biases held by individual oncologists, Day suggested. Computers have no such biases.

"All of this is working fairly well. It's now a matter of translating it into the clinical market via the regulatory authorities," Day said.

What Google's technology really is doing is collecting large amounts of data to measure variance, improving its computer systems' accuracy as the database grows.

"You need to have a lot of training data," Day noted. "You also want high-quality input data and labels."

That makes DNA sequencing a particularly good candidate for applying machine learning, even though there aren't always images involved.

As DePristo said in April, his team has been developing a deep-learning algorithm for germline variant calling by encoding sequencing data as images and training a computer to determine the genotype from the image. Verily DeepVariant, as it's called, can learn to call variants in data generated by many different sequencing technologies.
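
To make the idea of encoding sequencing data as an image concrete, here is a hypothetical Python sketch. It is not DeepVariant's actual encoding, which is not described here: the two channels, the base-to-intensity scaling, and the function name `encode_pileup` are all assumptions chosen for illustration. Each row is one read aligned against a reference, and each "pixel" records which base was seen and whether it disagrees with the reference.

```python
BASES = "ACGT"

def encode_pileup(reference, reads):
    """Encode aligned reads as a rows x columns x 2 nested list.

    reference: reference sequence for the window.
    reads: list of (offset, sequence) pairs; offset is the 0-based start
           position of the read within the window.
    Channel 0 holds the base identity scaled into (0, 1]; channel 1 is 1.0
    where the read base differs from the reference. Uncovered positions
    stay at [0.0, 0.0].
    """
    def base_value(base):
        return (BASES.index(base) + 1) / len(BASES) if base in BASES else 0.0

    # The first row of the image is the reference itself
    rows = [[[base_value(b), 0.0] for b in reference]]
    for offset, seq in reads:
        row = [[0.0, 0.0] for _ in reference]
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(reference):
                mismatch = 1.0 if base != reference[pos] else 0.0
                row[pos] = [base_value(base), mismatch]
        rows.append(row)
    return rows

# Hypothetical window: two of the three reads carry an A where the reference has T
reference = "ACGTAC"
reads = [(0, "ACGTAC"), (1, "CGAAC"), (2, "GAAC")]
image = encode_pileup(reference, reads)
```

In this toy window the mismatch channel lights up in one column for two of the three reads, which is the kind of visual pattern a classifier trained on labeled examples could learn to associate with a genotype.
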

"There are some well-characterized samples that are available," Day said at ISMB/ECCB. "Classifiers should be complex enough that it's not easily amenable to be solved with classical techniques, standard machine learning techniques."

Single-nucleotide polymorphisms from next-generation sequencing data fit this requirement, he said.

"The errors from this variant calling process can come from a lot of different places, and this is really the source of the problem," Day explained. Current models to estimate the presence or absence of a variant "are making assumptions that these error modes are all the same across sequencing technologies." They also assume that the error modes are independent of one another.

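The independence assumption is easiest to see in a simplified, textbook-style genotype likelihood, sketched here in Python. This is not the actual model used by GATK or any other caller: the function name `genotype_log_likelihoods`, the uniform error probability, and the example read counts are assumptions for illustration. Each observed base contributes its own term, and the terms are simply summed in log space, which is valid only if the errors in different reads do not influence one another.

```python
import math

def genotype_log_likelihoods(observations, ref, alt):
    """Score diploid genotypes with 0, 1, or 2 copies of the alt allele.

    observations: list of (base, error_probability) pairs, one per read.
    Assumes every read's error is independent of the others, so per-read
    log-probabilities are added together.
    """
    scores = {}
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2  # chance a read was drawn from the alt allele
        total = 0.0
        for base, err in observations:
            # Probability of seeing this base if the read came from ref or alt;
            # an error is assumed equally likely to yield any of the other 3 bases
            p_ref = 1 - err if base == ref else err / 3
            p_alt = 1 - err if base == alt else err / 3
            total += math.log((1 - frac_alt) * p_ref + frac_alt * p_alt)
        scores[alt_copies] = total
    return scores

# Hypothetical site: five reads support the reference A, four support G
observations = [("A", 0.01)] * 5 + [("G", 0.01)] * 4
scores = genotype_log_likelihoods(observations, ref="A", alt="G")
call = max(scores, key=scores.get)  # number of alt copies with the best score
```

With a roughly even split of reads, the heterozygous genotype scores best. If errors were in fact correlated across reads, for instance because of a systematic artifact of one sequencing technology, summing the terms this way would overstate the evidence.
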
"We thought this would be a good area to bring in the deep neural network machines and see if they can make an improvement over what the domain experts have established over the last couple of decades," Day said.

This is one reason why Google Genomics has partnered with the Broad Institute since 2015. One of the first things the two organizations did together was to make Broad's Genome Analysis Toolkit (GATK) software available as a managed service on the Google Cloud.

DeepVariant technology outperformed GATK on human genome data, Day reported, based on in-house evaluations now in prepublication.

Verily Life Sciences brought the algorithm into the PrecisionFDA Truth Challenge in 2016 and won for highest SNP performance.