NEW YORK – Nautilus Biotechnology is looking to capitalize on the growing trend in proteomics to use machine learning to develop lower-cost, higher-throughput, and more complete proteomic analyses than are possible with existing technologies.
The San Carlos, California-based company last week said that it raised $76 million in Series B funding, bringing its total funding to date to more than $100 million.
Nautilus was cofounded in 2016 by CEO Sujal Patel, former cofounder and CEO of computer firm Isilon Systems, and Chief Scientist Parag Mallick, an associate professor at Stanford University. The partners declined to provide much detail about the company's platform, but said that one of its distinguishing features is its use of machine learning to identify proteins based on measurements of multiple parameters describing the target molecules.
While the company's approach is not mass spectrometry-based, Mallick has a substantial background in proteomics-based mass spec and is one of the main developers of the ProteoWizard proteomics software platform. Jarrett Egertson, senior principal scientist at the company, also comes out of the mass spec space, having been a graduate student and postdoctoral fellow in the lab of University of Washington researcher Michael MacCoss, where he led development of proteomics techniques including the MSX data-independent acquisition workflow offered by Thermo Fisher Scientific for use on its Q Exactive instrument.
Nautilus's approach fits into a broader trend within proteomics toward moving machine learning and deep learning approaches into earlier parts of the experimental process. Such tools have most commonly been used to analyze the output of a proteomic experiment, for instance by building models linking expression levels of various proteins to a particular disease state.
In recent years, though, proteomics researchers have begun using machine learning to integrate the raw measurements used for making protein identifications in order to enable deeper and more accurate analysis of the proteome.
On the mass spectrometry side, tools like Percolator have for years used machine learning to improve the confidence of peptide identifications in mass spec experiments, thereby boosting the number of peptides and proteins researchers are able to identify in a given sample. More recently, multiple research groups have presented deep learning tools for predicting patterns of ion fragmentation, which could likewise help improve the confidence of peptide identifications and allow researchers doing data-independent acquisition mass spec to run such experiments without first generating sample-specific spectral libraries.
This week, Thermo Fisher announced a collaboration with German proteomics informatics company MSAID to develop and commercialize deep learning tools for proteomics research. Specifically, the deal will make MSAID's deep learning software, which is based on the Prosit software developed by researchers at the Technical University of Munich, available as part of Thermo Fisher's Proteome Discoverer 2.5 software package.
Emerging proteomic technologies like nanopore-based approaches are also using machine learning methods to identify proteins by integrating various parameters detected by the pores. Last year, for instance, researchers at the Israel Institute of Technology published a simulation of nanopore-based protein sensing that indicated that nanopore measurements combined with deep learning data analysis could enable proteome-scale studies.
This echoed 2017 work by researchers at the University of California, San Diego and the University of Notre Dame that likewise found that machine learning analysis of nanopore protein data could enable large-scale proteomic studies.
Last year, Lennart Martens, group leader of the computational omics and systems biology group in the VIB-UGent Center for Medical Biotechnology, suggested that such machine learning methods represented the future of proteomics.
He said that machine learning approaches would prove crucial both for mass spectrometry analyses and for emerging technologies like nanopore-based protein sequencing.
Nautilus's technology falls into this latter category of emerging, non-mass spec approaches to proteomic analysis. Mallick said, though, that it is distinguished from some of these newer methods by focusing its analysis at the protein rather than the peptide level. That, he noted, somewhat reduces the sample's complexity, which helps with the dynamic range challenges that have long been an issue in proteomics.
He declined to provide specifics on what kind of technology the company's platform uses to make its protein measurements. A patent assigned to the company suggests that it is using an antibody-based approach. Specifically, US Patent No. 10,473,654 covers the use of a panel of antibodies, "none of the which are specific for a single protein or family of proteins," for which the binding properties are known. Proteins are iteratively exposed to panels of antibodies and the identity of the proteins is established based on the patterns of binding between the antibody panel and the proteins.
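The general idea described in the patent, identifying a protein from its pattern of binding to a panel of non-specific reagents, can be illustrated with a toy example. The following Python sketch is not Nautilus's method, which the company has not disclosed. It assumes a simple naive Bayes-style scoring with a uniform prior over candidates, and the protein names, the four-reagent panel, and the binding probabilities are all invented for illustration.

```python
import math

# Hypothetical table: BINDING_PROB[protein][i] is the assumed probability
# that reagent i binds that protein. All names and values are invented.
BINDING_PROB = {
    "protein_A": [0.9, 0.1, 0.8, 0.2],
    "protein_B": [0.1, 0.9, 0.7, 0.1],
    "protein_C": [0.8, 0.8, 0.1, 0.9],
}


def log_likelihood(probs, observed):
    """Log-probability of the observed bind/no-bind pattern for one candidate,
    treating each reagent's outcome as independent (an assumption)."""
    total = 0.0
    for p, bound in zip(probs, observed):
        total += math.log(p if bound else 1.0 - p)
    return total


def identify(observed, table=BINDING_PROB):
    """Return the most likely candidate and the posterior over all candidates,
    assuming every candidate is equally likely beforehand."""
    scores = {name: log_likelihood(probs, observed) for name, probs in table.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    weights = {name: math.exp(score - top) for name, score in scores.items()}
    norm = sum(weights.values())
    posterior = {name: weight / norm for name, weight in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


if __name__ == "__main__":
    # One unknown molecule probed with four reagents in turn:
    # bound, not bound, bound, not bound.
    best, posterior = identify([True, False, True, False])
    print(best, posterior)
```

No single reagent in this toy panel is decisive on its own; the identification comes from combining several weakly informative measurements, which is the property the patent language emphasizes.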
Patel, Mallick, and Egertson are also listed as inventors on a pair of patent applications covering similar methods of protein identification assigned to Ignite Biosciences, Nautilus's previous name.
Mallick said that the system will have single-molecule levels of sensitivity. He added that it could potentially be used to identify different protein modifications like phosphorylation.
Patel said the company envisions applications for the technology in drug development, where it could be used for things like target discovery and mechanism-of-action studies, and in diagnostics development, where it could be used to identify protein markers for purposes like early detection of disease.
He said Nautilus would initially offer access to the platform as a service with the ultimate goal of packaging it as an instrument. He did not provide a timeline for when the system might become available to customers.
Nautilus currently has around 50 employees and plans to double that number in the next year and a half.