Drug discovery has never been easy. No one knows how many plants our ancestors tried — or were poisoned by — before figuring out that willow bark could ease a headache or that aloe could soothe a sunburn. The process has certainly been modernized since then, with advances including high-content screening and the advent of semantic Web tools, but failure rates remain exceedingly high and timelines remain long: it can take anywhere from three to seven years to go from discovery to the preclinical stage, and around 90 percent of candidates fail.
Due to the sky-high failure rate during discovery and development as well as drug withdrawals from the market, Duke University's Allen Roses says that drug companies are drawing back. "The companies essentially responded — if you look at it across the board — by reducing their discovery capacity. [This has] led to the dismantling of the machine," Roses says. Pharma, he says, is more interested in picking up drugs that are further along the path than in starting from scratch.
At the same time, universities have become more and more interested in translational medicine. "Universities are now very, very interested in what is being called translational medicine, but what is being meant is testing some of their drugs in phase I and II," Roses adds.
In order for academia to shoulder more of the drug discovery and development effort, the process had to be adapted to play to academia's strengths. Don't count pharma out, however; drug companies, too, have new methods and collaborations to feed their pipelines. Here, we offer a sampling of different efforts, all aimed at finding ways to ease the discovery process.
Doing the Impossible
As researchers uncover the molecular and genetic basis for disease, they realize there are some parts to the process that don't easily lend themselves to being targeted by drugs. It's incredibly difficult to use small molecules to modulate transcription factors, small RNAs, or oncogenes. "People often used the word 'undruggable,'" says the Broad Institute's Stuart Schreiber. "I think we all understand that 'undruggable' doesn't truly mean undruggable. It just means 'we haven't figured out how to do that yet.'"
Schreiber is taking on those "undruggable" aspects of the disease process. "So many diseases are fundamentally diseases of cellular deficiency. If you could make a small molecule drug that causes trans-differentiation of an abundant cell type into the cell type that's deficient, like dopaminergic neurons in Parkinson's, then you could develop a completely new approach to drug discovery," he says.
In tackling these toughest problems, Schreiber and his colleagues had to rethink their plan of attack. They've made a series of innovations to update basic chemistry, change how screening libraries are made, and understand the mechanisms behind how small molecules act.
The most important feature of any drug discovery effort is the screening library — it's what the targets are matched up against. Schreiber says that the molecules included have to enable research and give starting points from which those difficult targets and processes can be studied. Then, if a molecule does hit a target, it has to be optimized. To hasten that process, Schreiber begins with a collection that is easily adapted for medicinal chemistry. "You do that by using a short synthetic route that enables ideally every atom in the molecule to be modified without having to change the synthetic pathway," he says. He also keeps that part of the process as simple as possible by restricting the synthetic pathway to no more than five steps.
Instead of employing standard screens, Schreiber and his colleagues are using what he calls niche-based screening. "We need to learn how to bring complex biology to small molecule screens, meaning perform … high-throughput screening with primary cells and tissue that maintain their physiological properties, ideally human cells," he says. For example, in their diabetes-centered effort, they used human pancreatic endocrine cells from organ donors rather than cell lines.
In PNAS a few months ago, Schreiber and his colleagues published an article describing how they used a stable isotope labeling approach to determine, in an unbiased manner, which proteins bind to small molecule probes or drugs. "That gives us a comprehensive list of all proteins with which a biologically active small molecule binds, or interacts, in cells," Schreiber says. "Included in this list would be the so-called target, but also would be proteins that may be not relevant to the intended target but relevant to all the other things the compound might be doing."
Even though Schreiber and his group are still in the early stages of this work, it's begun to yield results. About a dozen clinical trials have been sparked by results of this method, though all of them are testing existing FDA-approved drugs against other targets that Schreiber's group linked them to. "Only now would I say that approach has yielded compounds that one can at least contemplate their further development for clinical investigation," he says.
Roadmaps and Rare Diseases
After the Human Genome Project ended in 2003, the question on a lot of minds was: What next? Later that year, NIH announced its Roadmap Initiative to highlight gaps in biomedical research. Among those initiatives was the Molecular Libraries program, which set up a network of screening centers to provide a means to research biological pathways and aid in drug development. "The molecular libraries effort was and still is very much a fundamental research, basic tool … and created a pre-competitive space for small molecule science," says Chris Austin, director of the Chemical Genomics Center at NIH. Another recently launched effort at NIH will focus more squarely on drug development, namely for therapeutics for rare diseases.
The Molecular Libraries Initiative was multi-faceted, Austin says. A large portion of it was setting up a network of screening centers, both at NIH and around the country. The NIH repository, housed at BioFocus DPI in California, contains 350,000 compounds, though Austin says it is continually expanding and should hold 400,000 by the year's end. The rest of the network is composed of four comprehensive screening centers and four specialty centers. "The comprehensive centers are bigger and they have to have in them assay optimization, screening, informatics, medicinal chemistry," Austin says, adding that they do have some differences based on what technologies they use and the scientific interests of the center's PI. "The way it was set up was to be enabling to all researchers," Austin says. Two of the specialized centers focus on chemistry and work with the other centers to optimize the medicinal chemistry step of the drug development process. The other two specialized centers focus on screening of ion channels and on flow cytometry.
All of the molecules uncovered by the screening centers are deposited into PubChem, which was also created as part of the Molecular Libraries Roadmap. Since the compounds are then freely available, Austin says it's hard to gauge whether drug companies pick up what the centers uncover for further development.
At the end of May, NIH announced a new initiative, the Therapeutics for Rare and Neglected Diseases program, or TRND, that will create a drug development pipeline for rare diseases. "In broad strokes, it's going to work, we think, in a similar way to the molecular libraries and the center that I run," Austin says, noting that it will likely be a year and a half until the program is fully up and running. The program will solicit proposals from the molecular libraries centers, other academic researchers, disease foundations, and even biotech and pharma. Those proposals will then be evaluated through a peer-review process to determine their drug development potential.
The program will not be easy; the failure rate at this step of the drug development process is extremely high, around 90 percent. "One of the very difficult questions — as in all of science, but perhaps in some ways even more difficult in this arena — is when to stop," Austin says. "How hard do you want to push it?" Currently, he says NIH is considering a baseball-style approach. If a drug doesn't appear to be moving forward, they'll move it back down to the minors and have the primary researcher do a bit more work on it before bringing it back up to the majors. "The TRND program will spend money — 20 to 25 percent of its effort — on the novel paradigms and technologies to make this process from lead to IND, which is the space that TRND works in, more efficient and the success rate higher," Austin says.
High-throughput screening burst onto the academic scene about six years ago, when it was included as part of the NIH Roadmap Initiative. In 2005, the University of Pennsylvania's Center for Molecular Discovery joined the Molecular Libraries Screening Centers Network to provide high-throughput molecular screening to identify small molecules for biological assays and synthetic chemistry. "We have 750,000 small molecules that we manage here," says center director Scott Diamond. "We'll oftentimes screen anywhere from 10,000 molecules, if it requires a zebrafish, to a quarter million molecules, if it's a robust assay."
The Penn center is also under the umbrella of a CTSA grant awarded to the university's Center for Translational Medicine and Therapeutics, which allows them to solicit proposals each year.
Diamond says the team weighs the merits of the proposals based on the tractability of the target and screen, as well as the importance of the target. He adds that they aren't looking to reinvent the wheel, so they avoid targets for which pharma already has clinical candidates. "If the diseases are neglected by pharma and they present a lot of very tractable targets for discovery, that makes them very amenable," Diamond says. Like Schreiber, Diamond doesn't shy away from challenging projects, such as disrupting protein-protein interactions, as long as they have the underlying biology down and the assays to validate the mechanism of action. One focus of the center is infectious diseases for which there isn't much in terms of business prospects, such as SARS and Ebola.
So Much Content
Over the years, even the go-to screens have become much improved and more targeted. "We've gotten a lot smarter about choosing the types of chemicals to screen," says the University of Pittsburgh's Billy Day. "Now, we look at these structures of chemicals in given libraries to ask the question, 'Should the structure have the activity that I desire?'"
Day is part of Pitt's Drug Discovery Institute. "We all bring our different areas of expertise together to try to build new, smarter assays for screening purposes and then focus library development after that. [We] come back and screen again, and, finally, take it into the more classical pharmacological evaluations," Day says.
The center's forte, Day adds, is in cell-based assays. "We use live cells and then use multiple fluorescent probes to get simultaneous information about the variety of different endpoints inside of the cell," he says. "The real beauty of that is you do that on [a] thousand cells and your statistics become extremely good. If you do it on more, it gets even better." Not only do these high-content screens generate great data and reliable stats, Day says they can also answer the questions that high-throughput screens can't, such as: Does the drug actually enter the cell? And if it does, is it working in the region where it ought to be?
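Day's point about the statistics scales in a predictable way: the standard error of a mean per-cell measurement shrinks roughly with the square root of the number of cells imaged. A minimal Python sketch (the fluorescence numbers below are invented for illustration, not taken from the article):

```python
import math
import random

random.seed(0)

def sem_of_mean(values):
    """Standard error of the mean across per-cell measurements."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)

def simulate_cells(n_cells, true_signal=100.0, noise_sd=20.0):
    """Fake per-cell fluorescence readout (arbitrary units)."""
    return [random.gauss(true_signal, noise_sd) for _ in range(n_cells)]

# SEM drops roughly as 1/sqrt(n): ~2.0 at 100 cells, ~0.2 at 10,000.
for n in (100, 1_000, 10_000):
    print(n, round(sem_of_mean(simulate_cells(n)), 2))
```

Imaging ten times as many cells buys roughly a threefold (square root of ten) reduction in the noise on each endpoint, which is why, as Day puts it, "if you do it on more, it gets even better."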
Day's colleague Andreas Vogt has been pushing multiple parameter fluorescence imaging technology for studying zebrafish. "He's now developed that to be useful in zebrafish embryos and [he's] utilizing some really spectacular rule-learning software that was originally built for looking at satellite images, for example," Day says.
Day and Vogt worked together on a study to come out this summer in Nature Chemical Biology that used this zebrafish-based screen for drug discovery. In it, they searched for a modulator of fibroblast growth factor (FGF) signaling, which is essential for cardiac and hindbrain development in zebrafish embryos. "There's some obvious clinical correlates to the development of the hindbrain and cardiac development," Day adds. Another colleague, Mike Zhang, made a zebrafish that produces GFP unless FGF signaling is off. Embryos lacking FGF don't develop properly, and the researchers screened for a chemical that would turn signaling back on and turn the zebrafish green. With that chemical in hand, Day's lab set to work on the medicinal chemistry to find an even better version of it.
Integrating Data and Semantic Web
With all the projects out there examining different aspects of potential targets and drugs, there is an incredible amount of data that could be mined to pick better compounds or targets. At Carleton University, Michel Dumontier is working to apply semantic Web technologies to that data so relationships between disparate data sources can be seen and applied to drug discovery. "Just as protein structure prediction is one holy grail of bioinformatics, another one is to incorporate really diverse knowledge all at once," Dumontier says. "The whole idea of the semantic Web is that people can publish their data in a format that makes it a lot easier to integrate."
Then it would be possible to ask questions across all the different types of data. "To answer questions you need the semantics. You need to develop ontologies, you need good reasoning techniques, and you need a coherent data model, some framework by which you can hang all the data," he says.
If you want an ontology about gene regulation, Dumontier says, you'll need to have a group of experts — biochemists, physiologists, and geneticists — who understand the general principles as well as the exceptions of those principles. "They know, 'Oh, that can happen but not in this circumstance.' Your ontology ultimately has to reflect nuance in terminology. It's flexible enough to accommodate 99 percent of the information and maybe that one percent is still debatable," he says. Another way that Dumontier designs ontologies is through use cases. "We'll say, 'I want to ask this question and by the end of the day I want an answer.' So when I integrate all this knowledge, I want to pose a query and get my answers and move on with my life," Dumontier says.
Then they map their rich ontologies with the available databases, of which Dumontier says there are thousands. "The big challenge then is: how do we map the terms and the relations that's in the ontology to what's in the database so that we can pull this legacy data out of its silos and integrate it with the Web and everything that we know about it?" he says. One of his students is attempting to automatically map the database knowledge into an ontology.
Once that's in place, they'll be able to create workflows to move the data along from one program to the next. In particular, Dumontier uses Taverna, a workflow program for life scientists that allows users to tap into Web services and chain them together. For example, Dumontier says it can get the FASTA sequence from the NCBI database and feed that sequence into the next program. "That's really the idea of semantic Web services, where output of some program is the input to some other program," he says. "We can actually have a question where there is no known knowledge and the knowledge is not in any database but requires a computation to generate the information."
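The chained-services idea can be sketched without Taverna itself. Below is a stdlib-only Python sketch in which each step's output becomes the next step's input; the sequence-retrieval step is stubbed with a canned FASTA record, since a real workflow would call a networked service such as NCBI's E-utilities:

```python
def fetch_fasta(accession):
    """Stub for a sequence-retrieval service (no network call here)."""
    return f">{accession} example record\nATGGCGTACGCTTGA\n"

def parse_fasta(text):
    """Strip header lines, keep the bare sequence."""
    return "".join(line for line in text.strip().splitlines()
                   if not line.startswith(">"))

def gc_content(seq):
    """Fraction of G and C bases, a simple downstream computation."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def run_pipeline(value, steps):
    # Taverna's core idea: the output of one service feeds the next.
    for step in steps:
        value = step(value)
    return value

result = run_pipeline("ACC0001", [fetch_fasta, parse_fasta, gc_content])
print(round(result, 3))  # 0.533
```

The same structure delivers Dumontier's point about computed knowledge: the GC content here lives in no database; it is generated on the fly by the chain.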
Pharma Flowing Data
Sometimes pharma can't generate all the data it needs on its own. Merck has teamed up with the H. Lee Moffitt Cancer Center in Tampa to gain access to valuable medical data that Merck researchers wouldn't otherwise see to aid their drug discovery efforts in oncology. Researchers at the Moffitt obtain tissue samples or biopsies from patients in Moffitt's Total Cancer Care Trial, and those samples and redacted patient information are sent along to Merck. "The value of the Moffitt is the access to all of those samples but the thing that is really of value is the medical history, the outcomes associated with treatment, et cetera," says Martin Leach, the executive director of information technology for Merck Research Labs.
That data, though, has to be well organized for researchers to make any sense out of it. Leach and colleagues developed an information pipeline between the Moffitt and Merck that is standards-based. In particular, they use the Clinical Data Interchange Standards Consortium's Study Data Tabulation Model. "We leverage the CDISC SDTM standard, so that the context and the relationship of the information — where it lived in at various repositories at Moffitt and the affiliate hospitals — that context is retained as well as mapping into how we represent oncology at Merck," Leach says. "It's a lot of work."
Each week, Leach says, data from the Moffitt, including medical histories, pathology, and outcomes, is folded into Merck's databases, where it is combined with expression profiles and other basic research information. Those databases are built on IBM Janus and are, Leach says, an industry standard. This mixing is now done through automated coding and streaming. After Merck has captured gene expression data, that information flows back to the Moffitt in a gene expression standard format. The goal is to "apply a standards-based approach so we really can effectively move that information in a systematic way," Leach says.
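The standards-based mapping Leach describes can be reduced to a toy example. The field names below are invented for illustration and are not actual CDISC SDTM variables; the point is that each source system gets its own explicit mapping into one shared schema before records are merged:

```python
# Records from two hypothetical source systems with different field names.
source_a = [{"patient": "P1", "dx": "NSCLC", "outcome": "responder"}]
source_b = [{"subj_id": "P2", "diagnosis": "CRC", "result": "non-responder"}]

# Per-source mapping: source field -> shared-schema field.
MAPPINGS = {
    "site_a": {"patient": "subject_id", "dx": "diagnosis", "outcome": "outcome"},
    "site_b": {"subj_id": "subject_id", "diagnosis": "diagnosis", "result": "outcome"},
}

def normalize(record, mapping):
    """Rename a record's fields into the shared schema."""
    return {shared: record[src] for src, shared in mapping.items()}

merged = ([normalize(r, MAPPINGS["site_a"]) for r in source_a]
          + [normalize(r, MAPPINGS["site_b"]) for r in source_b])
print([r["subject_id"] for r in merged])  # ['P1', 'P2']
```

Keeping the mapping explicit per source is what preserves the context Leach mentions: each merged record can still be traced back to where it lived in the originating repository.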
Internally, they are also working to make the user interface as friendly as possible for the researchers to query the vast amount of data that would normally be kept in separate silos. "We are working with [our] partners at the moment to essentially take information from multiple disparate databases and provide the scientists a very simplistic, graphical way of [searching]," Leach says. "Internally, we have this platform that we are licensing to essentially allow people to use a drag-and-drop interface. 'OK, this database here, this query here, we're going to combine these two data streams and we're going to say that this one is greater than that.' It's very graphical and very, very simplistic — but extremely powerful in how it can perform that data aggregation across silos."
With the data now accessible, researchers at Merck are getting a better look at the underlying biochemistry and 'omics of oncology. The data, Leach says, can be interrogated to find and validate biomarkers. By combining gene expression data and outcomes data, they can essentially do pharmacogenomics and find a patient population that will respond to a certain drug. Then Merck can "proactively engage patients prior to giving the drug in the future. That's just one of the outcomes that we're looking at, novel biomarkers," Leach says.
Virtual Institutes and Virtual Companies
After Allen Roses went back to Duke University in 2007 from GlaxoSmithKline, he launched the Duke Discovery Institute, a "virtual institute" to provide academic researchers with the know-how to further develop candidate drugs all the way to proof-of-concept trials. "What we are providing is expertise where people who are in the university who have an idea or data or have something that they think is going to move toward making a drug can get some consultative help in terms of what they need to do next and what they don't need to do," Roses says.
At the same time, Roses has set up three companies — not affiliated with Duke — that also aid in the drug development process. One of those, called Zinfandel Pharmaceutical, is gearing up to do what Roses calls "drug discovery in reverse." Ever since his lab uncovered the role of the ApoE gene in the early '90s, researchers have been searching for the other half of the Alzheimer's disease puzzle. Roses says he's found it. He plans to present these new genotypes at the International Conference on Alzheimer's Disease in July.
With this new information, Roses is beginning to study whether risk predictions for Alzheimer's can be made based on age, ApoE status, and these new genotypes. "If you look at a population between the ages of 62 and 87, you can make a determination based on their age, ApoE genotype, and these other genotypes that we're about to report, [and] basically say who is at high or low risk over the next five years of developing Alzheimer's disease," Roses says, noting that this risk differs from a person's lifetime risk. "This basically allows you to divide the population into manageable groups exposed to high risk for the next five years and those at low risk for the next five years. It takes an epidemiological approach and now narrows it down with pharmacogenetics into a manageable number of people who can be looked at for a prevention trial."
Roses' other companies, Shiraz and Cabernet, are diagnostic commercialization and project management companies, respectively. All of his firms, he says, are very small and created for specific aims. "Zinfandel is an example of a very specific company. It's been formed solely for the purpose of prosecuting this Alzheimer's disease age-of-onset delay or prevention study," Roses says. If the government decides to fund the Alzheimer's prevention study, Roses says, then Duke will carry it out and Zinfandel will be the project management group, but if the funds come from companies, Zinfandel will hold the reins. "It's a construct, an internal virtual company that gives us choices in how to move forward," he says.
Filling a Void
A good portion of the hold-up in drug discovery is the need for new tools and technologies to aid the search for new targets. However, it doesn't make much sense for pharma to spend its resources so far upstream where it is far from certain it will be able to commercialize the effort. A group of drug companies, including Eli Lilly, Merck, and Pfizer, teamed up with venture capital firm PureTech Venture to create a company that would focus on developing technologies for this early part of the drug discovery pipeline.
"It was a recognition by the pharma partners that there were opportunities to collaborate and share resources around what I would describe as enabling technologies — technologies that make drug discovery and development better or more efficient, but themselves are not competitive," says David Steinberg, the founding CEO of Enlight Biosciences and a senior principal at PureTech Venture.
Enlight Biosciences was founded just about a year ago to spin out companies to take on the challenge of creating technologies that would fulfill the needs of the R&D pipeline. According to Steinberg and Vijay Murthy, Enlight's associate director of technology and business development, these technologies will focus on everything from platforms for discovering novel targets to new chemistry for creating and screening drugs, new approaches to discovering and validating biomarkers, and new methods for delivering drugs. "Pharma's recognized that if they just waited around for other people to bring it to them, they could be waiting a long time," Steinberg adds. "And they also recognize that each of them trying to go it alone and develop all those capabilities internally was too expensive, would take too long, and requires too much of a shift of focus away from their core work."
Already, Enlight has launched a medical imaging company called Endra that's focused on photoacoustic tomography, a technology that combines optical imaging with ultrasound capabilities.