Drug Development for Dummies


As an engineer by training, I must admit that I stand in awe of the ingenious souls who design and build the pills I pop every morning. How do they pack so much life-saving succor into each tiny tablet?

The goal of drug R&D is to find or create chemicals that are effective at treating human disease, are safe, and can be manufactured at reasonable cost. This is a tall order. Like many engineering activities, it requires a combination of unfettered creativity, rigid process, dogged persistence, and tons of money.

For those of us who spend our days writing code to help make the drug discovery process a bit smoother, it’s easy to lose sight of the bigger operation that bioinformatics work is enabling. If you understand the basics of how drugs are developed, you’ll understand how you fit into the system. That’s why you need Drug Development for Dummies.



Drug R&D can be divided into three major tasks: Discovery comes up with compounds that show promise. Preclinical development refines them into compounds that might actually work. And clinical trials test the compounds on people to confirm that they really work. Then comes the final hurdle: regulatory approval.

Drug R&D is expensive (see the sidebar on costs) and time-consuming. Discovery is open-ended and can drain huge amounts of time and money. The rest of the process typically takes six to 10 years.

It’s also prone to failure. Only 10 percent of compounds that enter preclinical development make it all the way to the end. The industry has been trying to improve this number for decades with little success.



In discovery, biologists study the mechanisms of disease, and chemists devise compounds that can alter these mechanisms. The boundary between discovery and preclinical development is fuzzy. Some companies leave most of the chemistry work to preclinical.

Biologists can understand the mechanisms of disease at various levels. The ideal is to understand the complete biochemical pathways that underlie disease. But usually, one knows far less.

To find relevant compounds, chemists often screen a collection, or library, of compounds against a collection of assays. An alternative is structure-based drug design: chemists use software to design a compound to match the active region of the protein. This approach has never lived up to its hype. But the future may be brighter as companies and academics work aggressively to determine more protein structures.

There is enormous value in understanding disease at the level of proteins, because it provides concrete handles for learning more. Biologists, for example, can use this knowledge to discover other proteins that interact with the known proteins, or genes that are coregulated. And chemists can use it to develop high-throughput assays that measure binding affinity between compounds and proteins. Structure-based drug design depends totally on this knowledge.

In the jargon of the industry, proteins implicated in disease are called targets. There are entire conferences and reports about target identification and validation — a buzz phrase that means, “let’s understand the mechanism of disease in terms of proteins and interactions.”

Once the chemists find some promising compounds, they can feed them back to biology to learn more about mechanisms. For example, biologists can introduce a compound into a cellular assay and do expression studies to see which genes the compound affects.

The resulting process is, or at least should be, an iterative one in which biology identifies mechanisms, chemistry finds compounds that affect those mechanisms, biology then uses those compounds to learn more about mechanisms, and so on, until compounds emerge that are good enough to move forward.



Preclinical development is essentially an engineering task in which medicinal chemists and pharmacologists tinker with the compounds found in discovery so that they might actually work.

The term lead compound, or simply lead, means a compound under study. Lead optimization refers to the central problem of refining lead compounds so they work better.

The design problem is immense. The team has to maintain or improve the compound’s biological activity while optimizing a long list of other properties: potency, selectivity, solubility, permeability, stability, toxicity, mutagenicity, absorption, distribution, metabolism, and elimination. The designers must also ensure that the compound can be manufactured at reasonable cost.

Toxicity is the major killer of lead compounds, and its most common cause is lack of specificity. In other words, the compound does not selectively bind to the intended target, but binds to other proteins as well. Toxicity can also have a biological cause: the compound may block a target protein’s essential role in normal physiology. Another issue to consider is the toxicity of the chemical fragments produced when the body degrades the compound.

An important trend is the use of quick assays to predict toxicity. Microarrays, for instance, may help highlight toxic substances by showing that they induce predictable gene expression changes in the liver and other organs.

It is important that preclinical information and compounds shuttle back to discovery. Certainly, the discovery teams need to know about any biological toxicities that arise. And the improved compounds can serve as valuable reagents in discovery.

Preclinical is relatively inexpensive — the average cost of a successful preclinical project in 1995 and 1996 was about $6 million, according to data from the Pharmaceutical Education and Research Institute (PERI). It’s probably about $12 million now.



Clinical trials are where drugmakers try to confirm that their products work in the real world. Most compounds that enter this stage fail.

FDA regulations mandate the basic structure of clinical trials. Phase I examines safety in a small population. Phase II establishes that the drug has some beneficial effect, determines the dosage needed for that effect, and confirms safety in a larger population. Phase III proves that the beneficial effect is real and not a fluke by testing in a large enough population to rule out chance. It also verifies safety in a larger population over a longer period of time.

Phase I typically follows 20 to 100 people for a year. Phase II enrolls 100 to 300 for two years. The size and duration of Phase III depend on how small a benefit you want to detect. They also depend on the severity of the illness and the expected market size, since these influence how hard one has to look for side effects. Typically, Phase III requires many hundreds or thousands of patients over two years or more.
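To make the link between the size of the benefit and the size of the trial concrete, here is a minimal sketch using the textbook normal-approximation formula for comparing two group means. The formula choice, the function name `patients_per_arm`, the default significance level and power, and the effect sizes are all illustrative assumptions; none of them comes from FDA rules or from a real trial design.

```python
import math
from statistics import NormalDist


def patients_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate patients needed in each arm of a two-arm trial.

    Uses the standard normal-approximation formula for detecting a
    difference `effect` between two group means that share a standard
    deviation `sd`, with a two-sided test at level `alpha` and the
    requested statistical `power`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return math.ceil(n)


# Hypothetical effect sizes, expressed in units of one standard deviation.
for effect in (0.5, 0.25, 0.1):
    print(effect, patients_per_arm(effect, sd=1.0))
```

Because the effect appears squared in the denominator, halving the benefit you want to detect roughly quadruples the number of patients. That is one reason Phase III trials for modest benefits run into the thousands.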

Costs increase dramatically as you move through the phases. According to PERI, costs jump from about $7 million for Phase I to $19 million for Phase II and $43 million for Phase III. Again, this is for 1995 and 1996, so double it.

Industry experts suggest that about 15 percent of compounds fail in Phase I, 50 percent fail in Phase II, and two-thirds fail in Phase III.
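Chaining those three failure rates gives a feel for overall attrition in the clinic. The sketch below simply multiplies the per-phase survival rates. Treating the phases as independent is an assumption made for illustration, as is the helper name `cumulative_success`.

```python
# Per-phase failure rates as quoted in the text.
failure_rates = {"Phase I": 0.15, "Phase II": 0.50, "Phase III": 2 / 3}


def cumulative_success(rates):
    """Probability of surviving every phase, assuming independent phases."""
    p = 1.0
    for rate in rates.values():
        p *= 1.0 - rate
    return p


survival = cumulative_success(failure_rates)
print(f"{survival:.1%} of compounds entering Phase I survive Phase III")
```

On these numbers about one compound in seven that enters Phase I makes it through Phase III, which fits the observation that most compounds entering clinical trials fail.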

Drug R&D is a spectacular engineering challenge. Everyone seems to be hoping that omics will save the day. I join in this hope, but it’s by no means a sure thing. The central difficulty remains in extrapolating preclinical results to humans. And this is where better algorithms and predictive modeling may help all the talented drug engineers in pharma keep those pills popping out of the pipeline.



You’ve probably heard the joke about the accountant looking for a job. The CFO asks, “How much is 1+1?” The accountant glances around furtively to make sure no one is eavesdropping, furrows her brow, and replies, “How much do you want it to be?”

Well, that’s the way it is with drug development costs.

The industry trade organization, the Pharmaceutical Research and Manufacturers of America (PhRMA), claims that it costs $800 million to develop a new drug, citing an authoritative study by Charles DiMasi of the Tufts Center for the Study of Drug Development.

“Balderdash!” replies Public Citizen, a consumer advocacy group. “The true cost is $50 million to $100 million per drug.” Other studies, including one by the now-defunct US Congressional Office of Technology Assessment, chime in with numbers somewhere between these extremes.

Despite the wide range of answers, all of these studies approach the question the same way. They basically take the total cost of drug R&D and divide by the number of new drugs approved in a given year. This, of course, rolls the cost of failed projects and non-specific activities like discovery into the cost per drug. The only real subtlety is the long time lag between when money is spent and when a drug is approved.

The DiMasi study, unlike Public Citizen, includes the cost of capital. This is a big-ticket item, about 50 percent of the total cost because of the lengthy development time. Another big difference is that DiMasi looks at pre-tax costs, while Public Citizen considers post-tax costs. That accounts for a 34 percent difference at current tax rates. A third big difference is the time frame that was studied. The DiMasi study is current, while the Public Citizen numbers are older and should be increased by about 50 percent to compensate.

There are also a few noteworthy differences that don’t affect the bottom line very much. DiMasi had access to detailed, proprietary cost data from 10 pharmaceutical companies, while Public Citizen had to rely on summary data from PhRMA’s web site. And DiMasi only considered novel drugs, while Public Citizen’s low number was for all drugs, including me-toos; their high number represents novel drugs.
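To see how far the three big differences go toward reconciling the two headline numbers, here is a rough back-of-the-envelope sketch. It assumes the adjustments can be applied as simple, independent multipliers and reads the "34 percent difference" as post-tax cost equaling pre-tax cost times (1 - 0.34). Neither study states either assumption, and the constant and function names are made up for illustration.

```python
# Headline figures from the text, in millions of dollars.
DIMASI_ESTIMATE = 800.0        # pre-tax, includes cost of capital, current
PUBLIC_CITIZEN_NOVEL = 100.0   # post-tax, no cost of capital, older data

# Adjustment sizes quoted in the text.
CAPITAL_SHARE = 0.50   # cost of capital as a share of the DiMasi total
TAX_RATE = 0.34        # pre-tax versus post-tax difference
TIME_UPLIFT = 0.50     # increase needed to bring the older numbers up to date


def dimasi_on_public_citizen_basis(estimate):
    """Strip cost of capital, then convert to post-tax (assumed multiplicative)."""
    out_of_pocket = estimate * (1 - CAPITAL_SHARE)
    return out_of_pocket * (1 - TAX_RATE)


def public_citizen_updated(estimate):
    """Bring the older Public Citizen figure up to the current time frame."""
    return estimate * (1 + TIME_UPLIFT)


adjusted_dimasi = dimasi_on_public_citizen_basis(DIMASI_ESTIMATE)
adjusted_pc = public_citizen_updated(PUBLIC_CITIZEN_NOVEL)
print(f"DiMasi, post-tax without capital: ${adjusted_dimasi:.0f} million")
print(f"Public Citizen (novel drugs), updated: ${adjusted_pc:.0f} million")
```

Under these assumptions, the eightfold gap between $800 million and $100 million shrinks to roughly $264 million versus $150 million. The remainder presumably reflects differences in the underlying data rather than in the accounting.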

Stepping back from the details, a bigger question is, Why bother? What do these numbers mean? They’re not the cost of developing any particular drug, but averages that include failures and other non-specific activities. They indicate how much the company must make in order for this year’s drugs to be profitable from a theoretical perspective, but don’t even hint at how much the company should be spending to develop future profitable drugs. These numbers may pique the interest of investors, but they’re not the key numbers, since investors buy futures, not history.

The only good use for these numbers, beyond political sloganeering, is for comparative purposes. The details don’t matter much for this purpose so long as they are held constant.

More useful, I think, is a study by the Pharmaceutical Education and Research Institute that looked at the cost of successful projects. Their number for 1995 and 1996 is $75 million. I’d double it to get a current number.
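As a quick cross-check, the per-stage PERI figures quoted in the main text can be added up and doubled. The sketch below uses only numbers that appear in the article; the variable names and the use of a flat factor of two for every stage are illustrative choices.

```python
# PERI figures for 1995-1996 quoted in the article, in millions of dollars.
stage_costs = {"Preclinical": 6, "Phase I": 7, "Phase II": 19, "Phase III": 43}

# The article's rough rule for getting a current figure: double it.
UPDATE_FACTOR = 2

total_1995 = sum(stage_costs.values())
total_now = total_1995 * UPDATE_FACTOR

# Show how the cost accumulates as a project moves through the stages.
running = 0
for stage, cost in stage_costs.items():
    running += cost
    print(f"{stage:<12} {cost:>3}  cumulative {running:>3}")
print(f"Total (1995-96): ${total_1995} million; doubled: ${total_now} million")
```

The four stage figures sum to $75 million, the same as the PERI total, and Phase III alone accounts for more than half of it.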


— NG


Tufts Center for the Study of Drug Development

Public Citizen report on drug R&D costs

US Congressional Office of Technology Assessment
Study of Drug R&D Costs

Pharmaceutical Research and Manufacturers of America

Pharmaceutical Education and Research Institute

PricewaterhouseCoopers pharmaceutical practice

US Food and Drug Administration
Center for Drug Evaluation and Research Handbook

ArQule Drug Discovery 101

Milne, G. Accelerating R&D Productivity.
IBC Drug Discovery keynote webcasts

Walters, DE. The Rational Basis of Drug Design
Finch University of Health Sciences

