Tallying it Up

In the Pipeline's Derek Lowe wonders whether researchers really know the causes of 4,500 diseases, a figure he says has been cited in many places. The figure, Lowe adds, is usually used in reference to the National Institutes of Health's translational medicine program.

For example, a new Clinical Pharmacology & Therapeutics article references it, saying that "we now know the causes of more than 4,500 diseases, but it has been estimated that more than 90% of these still have no effective treatment."

Lowe writes that this figure seems rather high and tries to trace it back to its source, which appears to be the Online Mendelian Inheritance in Man (OMIM) Gene Map. The scoreboard there says that the molecular bases of 4,838 phenotypes are known.

"But read the fine print: 'Phenotypes include single-gene mendelian disorders, traits, some susceptibilities to complex disease … and some somatic cell genetic disease …'" he writes. "My guess is that a lot of what's under that banner does not rise to 'knowing the cause,' but I'd welcome being corrected on that point."


The number of 'solved'

The number of 'solved' diseases in OMIM is greater than the number of causal genes, because allele-specific phenotypic differences lead some genes to have multiple phenotypes. If one queries OMIM with search 0001 and restricts to entries with allelic variants, one currently gets 2,951 unique genes, a small proportion of which are actually weak SNP associations. If one queries OMIM for phenotypes with a known molecular basis (entries prefixed with a pound sign), one gets 3,751. I'm not sure exactly what query generates 4,838. Parenthetically, the HGMD database reports much higher numbers because it includes many more GWAS-defined SNP associations, which I myself would not call 'solved' diseases. If by 'diseases' one means phenotypes, I would put the true number between 2,800 and 3,500. Whatever specific number one chooses, it is clear that most protein-coding genes in the human genome still lack any phenotypic association, whether as a weak SNP or as a strong pathogenic mutation.