NEW YORK — A new study has cast doubt on the assumption that the gene regions used in forensic analyses do not reveal medical information.
In the US, forensic identification analyses rely on the genotyping of 20 highly polymorphic short tandem repeats (STRs), known as the Combined DNA Index System (CODIS) core loci. Thirteen of these STRs were selected by the Federal Bureau of Investigation in 1998 for their ease of PCR analysis and lack of known ties to private medical information. A further seven STRs were added to the CODIS core loci in 2017.
"It is important from a legal standpoint that CODIS genotypes do not reveal medical information," researchers led by San Francisco State University's Rori Rohlfs wrote in their new paper appearing Tuesday in the Proceedings of the National Academy of Sciences. "Laws authorizing the compulsory collection of DNA from certain persons may come into conflict with state privacy statutes or the US Constitution if medical information is embedded."
In their analysis, Rohlfs and colleagues examined whether CODIS genotypes were associated with differences in the expression of neighboring genes.
For this, they examined STR length variation within a subset of the 1000 Genomes Project cohort and whether that variation was associated with differences in gene expression in lymphoblastoid cell lines. As the 1000 Genomes Project relied on short-read sequencing, the researchers had to impute STR genotypes based on the linkage disequilibrium between STRs and the surrounding SNPs. This approach, they cautioned, varied in accuracy and was notably less accurate for non-European ancestry cohorts and for STRs with more alleles.
Still, the researchers uncovered six CODIS STRs that were significantly associated with differences in gene expression levels.
The strongest signal was a significant negative correlation between the CODIS STR D3S1358 allele length and expression of the gene LARS2. Through additional analyses, the researchers found that D3S1358 is likely in linkage disequilibrium both with a variant that affects LARS2 expression and with DNase I hypersensitivity sites that are active in lymphoblasts, which could explain its association with LARS2 expression.
Likewise, expression of the gene CSF1R was negatively correlated with genotype at the CODIS locus CSF1PO, with additional analysis suggesting that the STR either affects CSF1R expression itself or is in linkage disequilibrium with a locus that does. Additionally, D18S51 alleles were correlated with KDSR expression, particularly among Yoruba individuals in the 1000 Genomes Project.
These findings suggest that the CODIS loci could provide trait information, contrary to previous assumptions, and possibly even medical information, the researchers said.
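To make the idea of a negative correlation between STR allele length and gene expression concrete, here is a minimal Python sketch. It is not the researchers' pipeline, which the paper describes in its own terms. The genotypes and expression values are synthetic toy numbers invented for illustration, and summing the two allele lengths per individual is an assumed simplification.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Synthetic toy data (NOT from the study): each tuple holds the repeat
# counts of one individual's two alleles at a hypothetical STR locus.
genotypes = [(14, 15), (15, 16), (16, 17), (15, 18), (17, 18), (16, 16)]

# Synthetic expression values for a hypothetical neighboring gene.
expression = [10.2, 9.8, 9.1, 9.0, 8.4, 9.5]

# Assumed simplification: summarize each genotype as the summed allele length.
summed_length = [a + b for a, b in genotypes]

r = pearson_r(summed_length, expression)
print(f"Pearson r = {r:.3f}")  # negative: longer alleles, lower expression
```

In a real analysis, a result like this would also need a significance test, correction for testing many STR-gene pairs, and adjustment for population structure, none of which this sketch attempts.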
CSF1PO also lies within an intron of CSF1R, which encodes a cytokine receptor involved in microglial regulation. Disruptive mutations in CSF1R have been linked to leukoencephalopathy, while inhibition of the receptor it encodes has appeared to be protective against some neurological conditions like Alzheimer's disease. Additionally, variations in CSF1R expression and splicing have been associated with psychiatric conditions such as depression and schizophrenia.
LARS2 and KDSR have likewise been associated with medical conditions: Perrault syndrome and certain skin and platelet conditions, respectively.
"These results join a growing body of work showing that CODIS genotypes may contain more information than purely identity," the researchers wrote, adding that "these findings raise concerns about the medical privacy of individuals whose CODIS profiles are seized, databased, and accessed, as well as the genetic relatives of those persons."