Tallying it Up

Derek Lowe asks whether researchers really know the causes of 4,500 diseases.

The number of 'solved' diseases in OMIM is greater than the number of causal genes, because allele-specific phenotypic differences mean that some genes have more than one phenotype. If one queries OMIM with search 0001 and restricts to entries with allelic variants, one currently gets 2,951 unique genes, a small proportion of which are actually weak SNP associations. If one queries OMIM for phenotypes with a known molecular basis (those with the pound-sign prefix), one gets 3,751. I am not sure exactly which query produces 4,838.

Parenthetically, the HGMD database reports much higher numbers, because it includes many more GWAS-defined SNP associations that I would not myself call 'solved' diseases. If by 'diseases' one means phenotypes, I would put the true number between 2,800 and 3,500. Whatever specific number one chooses, it is clear that most protein-coding genes in the human genome still lack any phenotypic association, whether as a weak SNP or as a strong pathogenic mutation.
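Why a phenotype count runs ahead of a gene count, and why it shrinks when weak associations are dropped, can be shown with a small Python sketch. The records below are hypothetical placeholders (GENE_A, "Phenotype 1" and so on), not real OMIM or HGMD entries, and the `weak_snp` flag is an assumed simplification of how one might separate weak SNP associations from strong pathogenic mutations.

```python
from typing import NamedTuple


class Association(NamedTuple):
    gene: str
    phenotype: str
    weak_snp: bool  # True = weak SNP association, False = strong pathogenic mutation


# Hypothetical toy records, not real OMIM entries.
RECORDS = [
    Association("GENE_A", "Phenotype 1", False),
    Association("GENE_A", "Phenotype 2", False),  # different alleles of one gene, different phenotype
    Association("GENE_B", "Phenotype 3", False),
    Association("GENE_C", "Phenotype 4", True),   # weak SNP association only
]


def tally(records, include_weak=True):
    """Return (unique genes, unique phenotypes), optionally excluding weak SNP hits."""
    kept = [r for r in records if include_weak or not r.weak_snp]
    genes = {r.gene for r in kept}
    phenotypes = {r.phenotype for r in kept}
    return len(genes), len(phenotypes)


print(tally(RECORDS))                      # (3, 4)
print(tally(RECORDS, include_weak=False))  # (2, 3)
```

In both tallies the phenotype count exceeds the gene count because GENE_A carries two phenotypes, and excluding the weak SNP record lowers both counts, which mirrors why a database that admits more GWAS hits reports higher numbers.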