Principal component analysis (PCA), a multivariate analysis that reduces the complexity of data while preserving their covariance, is widely used in population genetics and related fields, but a new study published in Scientific Reports indicates that PCA-based findings often lack reliability and robustness, leading to incorrect conclusions. As a result, the use of PCA in population genetics should be reconsidered. PCA is extensively used as the first and primary analysis in many studies, and the outcomes of such analyses are used to shape study design, to identify and characterize individuals and populations, and to draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. In light of PCA's pervasiveness, and given that it has never been proven to yield correct results, Lund University's Eran Elhaik and colleagues undertook an extensive empirical evaluation of PCA through 12 test cases, each assessing a typical usage of PCA using color and human genomic data. "In all the cases, we applied PCA according to the standards in the literature but modulated the choice of populations, sample sizes, and, in one case, the selection of markers," they write. The investigators find that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes, and that PCA adjustment also yielded unfavorable outcomes in association studies. Based on the findings, Elhaik concludes that PCA may have a biasing role in genetic investigations, potentially affecting as many as 216,000 genetic studies, and that it should not be used for genetic investigations.
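To make the sample-size sensitivity described above concrete, here is a minimal Python sketch. It is not the study's code or data: the four color "populations" (points in RGB space), the noise level, the sample sizes, and all function names are assumptions chosen for illustration. It runs PCA via a singular value decomposition of the centered data and compares how far apart two populations appear in the first two components when one population is undersampled.

```python
import numpy as np

# Hypothetical toy "populations": colors as points in RGB space.
# In the full 3-D space, blue and red are both exactly 1 unit from black.
CENTERS = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "black": (0.0, 0.0, 0.0),
}


def simulate(sizes, noise=0.02, seed=0):
    """Draw `sizes[name]` noisy samples around each color center."""
    rng = np.random.default_rng(seed)
    rows, labels = [], []
    for name, n in sizes.items():
        rows.append(np.asarray(CENTERS[name]) + noise * rng.standard_normal((n, 3)))
        labels += [name] * n
    return np.vstack(rows), np.array(labels)


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def centroid_distance(scores, labels, a, b):
    """Distance between the mean PC coordinates of populations a and b."""
    diff = scores[labels == a].mean(axis=0) - scores[labels == b].mean(axis=0)
    return float(np.linalg.norm(diff))


def blue_to_red_ratio(sizes):
    """Ratio of the blue-black distance to the red-black distance in PC1/PC2."""
    X, labels = simulate(sizes)
    scores = pca_scores(X)
    return (centroid_distance(scores, labels, "blue", "black")
            / centroid_distance(scores, labels, "red", "black"))


balanced = blue_to_red_ratio({"red": 100, "green": 100, "blue": 100, "black": 100})
skewed = blue_to_red_ratio({"red": 100, "green": 100, "blue": 5, "black": 100})
print(f"blue/red distance to black, balanced sampling: {balanced:.2f}")
print(f"blue/red distance to black, blue undersampled: {skewed:.2f}")
```

With balanced sampling, blue and red sit about equally far from black in the two-component plot, so the ratio comes out close to 1. When blue is cut to five samples, the leading components are dominated by the other three groups and blue collapses toward black, so the ratio falls well below 1 even though the underlying colors have not changed. This toy setup only illustrates the general mechanism and says nothing about the size of the effect in real genomic data.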
Report Finds Principal Component Analysis Biased, Unreliable
Aug 30, 2022 | staff reporter