NEW YORK (GenomeWeb) – An international team of researchers uncovered a set of SNPs that distinguishes personal genome samples. The team further developed a platform to encode that information into quick response (QR) barcodes.
With the number of samples undergoing genetic analysis increasing, the odds of a sample being mislabeled may also be on the rise. To address this, researchers led by Harvard Medical School's Jie Huang used allele frequencies in five major population groups to select a set of 80 SNPs that can uniquely identify a personal genome. As they reported in PLOS One today, the researchers also developed a website to turn these SNP identifiers into QR codes that can be compared and tracked in databases.
"It is our goal to come up with a most parsimonious list of SNPs to uniquely identify any single person across the globe, through genetic data," Huang and his colleagues wrote in their paper.
About a billion people are expected to undergo whole-genome sequencing within the next 10 years, the researchers said, but they noted that the current sample mix-up rate is some 0.1 percent to 1 percent. That, they estimated, would amount to between 500 and 5,000 of the 500,000 samples in the China Kadoorie Study.
To cut down on such mix-ups, Huang and his colleagues generated a list of SNPs that are common to different genotyping platforms, including various Affymetrix and Illumina arrays. They whittled the list down by limiting it to SNPs with minor allele frequencies greater than 0.25 among five major population groups: Africans, Eastern Asians, South Asians, Europeans, and Native Americans. They also removed SNPs from the list that were deemed pathogenic or likely pathogenic by the ClinGen database. That way, they noted, the SNPs used don't reveal anything about the person's health.
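The selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' pipeline: the `Snp` record, the population labels, the platform names, and the example entries are all hypothetical, and only the 0.25 minor allele frequency cutoff comes from the paper as reported here.

```python
from dataclasses import dataclass

# Hypothetical labels for the five population groups named in the article.
POPULATIONS = ("AFR", "EAS", "SAS", "EUR", "AMR")
MAF_THRESHOLD = 0.25  # cutoff reported in the article


@dataclass
class Snp:
    rsid: str             # SNP identifier (made up for this example)
    maf: dict             # population label -> minor allele frequency
    platforms: frozenset  # genotyping arrays that carry this SNP
    pathogenic: bool      # flagged pathogenic or likely pathogenic


def select_snps(candidates, required_platforms):
    """Keep SNPs found on every platform, common in every population, and not pathogenic."""
    selected = []
    for snp in candidates:
        on_all_platforms = required_platforms <= snp.platforms
        common_everywhere = all(
            snp.maf.get(pop, 0.0) > MAF_THRESHOLD for pop in POPULATIONS
        )
        if on_all_platforms and common_everywhere and not snp.pathogenic:
            selected.append(snp.rsid)
    return selected


# Made-up candidates, one per filtering outcome.
both_arrays = frozenset({"array_A", "array_B"})
common = dict.fromkeys(POPULATIONS, 0.30)
candidates = [
    Snp("rs_example_1", common, both_arrays, False),                 # passes every filter
    Snp("rs_example_2", {**common, "EAS": 0.10}, both_arrays, False),  # too rare in one group
    Snp("rs_example_3", common, both_arrays, True),                  # flagged pathogenic
    Snp("rs_example_4", common, frozenset({"array_A"}), False),      # missing from one array
]
selected = select_snps(candidates, both_arrays)
print(selected)
```

Requiring the frequency cutoff in every population, rather than on average, is what keeps a marker informative regardless of a sample's ancestry.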
Seventy-four SNPs met the researchers' criteria. They also added four SNPs that can predict ABO blood type and two that can predict sex. This yielded a set of 80 SNPs, more than the 60 that have been predicted to be needed to distinguish people from within the global population.
They tested their SNP set using 150,000 samples from the UK Biobank, and found it generated a unique code for each sample.
Huang and his colleagues also devised a web-based application to extract that SNP information from raw genotyping data and translate it into a QR code. A QR code, a square grid of black-and-white dots, can be scanned and read by imaging devices, including smartphone cameras.
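How the application lays out genotypes inside the code is not described here, so the following is only a sketch of one plausible approach: each genotype call is stored in two bits and the result is written as a short hexadecimal string that any QR generator could then render. The `AA`/`AB`/`BB` call labels, the 2-bit mapping, and both function names are assumptions made for illustration.

```python
# Hypothetical 2-bit encoding of a genotype call; 3 marks a missing call.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2, None: 3}
CODE_TO_GENOTYPE = {code: call for call, code in GENOTYPE_CODES.items()}


def pack_genotypes(calls):
    """Pack genotype calls, two bits each, into a hexadecimal string."""
    value = 0
    for call in calls:
        value = (value << 2) | GENOTYPE_CODES[call]
    n_bytes = (2 * len(calls) + 7) // 8  # round up to whole bytes
    return value.to_bytes(n_bytes, "big").hex()


def unpack_genotypes(payload, n_calls):
    """Recover the list of genotype calls from a packed hexadecimal string."""
    value = int.from_bytes(bytes.fromhex(payload), "big")
    calls = []
    for _ in range(n_calls):
        calls.append(CODE_TO_GENOTYPE[value & 0b11])  # read the lowest two bits
        value >>= 2
    return calls[::-1]  # calls were read last-to-first


# Made-up calls for an 80-SNP panel.
example_calls = ["AA", "AB", "BB", None] * 20
payload = pack_genotypes(example_calls)
print(len(payload), payload)
```

Under this scheme an 80-SNP panel fits in 20 bytes, or 40 hexadecimal characters, well within what a small QR code can hold.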
The researchers said that such codes could be created and compared on their site. When two codes are compared, the SNP information underlying them is compared and its concordance analyzed, so the genotypes themselves can be checked against one another.
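A concordance check of this kind might look like the sketch below. It is an assumption about what such a comparison involves, not the site's actual method: the `concordance` function, the `AA`/`AB`/`BB` call labels, and the choice to skip SNPs with a missing call in either sample are all hypothetical.

```python
def concordance(calls_a, calls_b):
    """Return the fraction of matching calls among SNPs called in both samples.

    Returns None when no SNP has a call in both samples.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both samples must cover the same SNP panel")
    # Skip positions where either sample has a missing call (None).
    compared = [
        (a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None
    ]
    if not compared:
        return None
    matches = sum(1 for a, b in compared if a == b)
    return matches / len(compared)


# Made-up calls for two runs over a four-SNP panel.
run_1 = ["AA", "AB", "BB", "AB"]
run_2 = ["AA", "AB", "BB", "AA"]
score = concordance(run_1, run_2)
print(score)
```

A score near 1 would suggest the two codes came from the same person, while a much lower score would flag a possible mix-up; where to draw that line is a judgment call this sketch leaves open.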
Such QR codes could be used as a sample check, the researchers said. Huang and his colleagues noted that companies like Affymetrix use hundreds of markers for sample tracking, but that their SNP set would be faster and cheaper to use as it relies on a smaller number of markers.
"QR codes from different datasets can also be compared, leading to a check across commercial genotyping companies," the researchers added. "This feature has already been implemented in addition to coding and decoding QR codes."
They did note, however, that their approach allows for uncertainty and may not be suitable for forensic or paternity testing applications.