New York Team Crowdsources Genomic Data Through DNA.Land


NEW YORK (GenomeWeb) – Researchers from Columbia University and the New York Genome Center have launched a new research-focused website called DNA.Land, where participants can submit genomic data generated by several direct-to-consumer companies.

The platform is designed to provide users with ancestry, relationship, and other information beyond that available from any single one of the companies doing the genotyping.

"We think of people not as research subjects, but as research participants — it's a partnership," DNA.Land co-leader Yaniv Erlich, a computer science researcher with Columbia and NYGC, told GenomeWeb. "The idea is that we can recruit a large number of people and … if people are going to spend their time [participating], they want to get something back."

At the same time, the team behind DNA.Land hopes to harness the crowdsourced SNP data submitted by each individual — together with genome-wide variant patterns imputed from them — to tackle research problems that benefit from very large sample sizes.

They may eventually make it possible for DNA.Land participants to link to information from social media sites they use as a means of getting a glimpse at phenotypes and behavior in a natural, non-survey setting.

The site stemmed from the realization that millions of Americans have already opted to have their genomes analyzed by DTC genotyping companies, and many more are expected in the coming years.

"We are already in the era of ubiquitous genomic information," Erlich said. "This project is to see, 'Can we crowdsource genomes from the population for scientific studies?'"

Erlich introduced DNA.Land during a presentation at the ASHG meeting on Saturday, just 24 hours after the site went live. At that point, more than 1,250 participants had submitted genomic data to the site, Erlich said, jokingly calling the unexpected flood of interest in the site "a big headache to my programmer."

The number of participants has since climbed to more than 5,500, and Erlich said his team has been working nonstop to make sure the site can handle all of the new genomic data and to troubleshoot issues so that all current users can receive reports.

The site is not only reaching interested geneticists who heard about DNA.Land at the ASHG meeting and over Twitter. Erlich noted that his group has reached out to potential participants who belong to the International Society of Genetic Genealogy, who are keenly interested in the sort of ancestry and kinship data that DNA.Land will initially provide.

In the past, Erlich and his colleagues used crowdsourcing to put together trees of related individuals that encompassed up to 13 million people, using data provided in public profiles of 43 million users on a genealogy site. That work was presented at the ASHG meeting in 2013.

Participants are provided with a short consent document that is meant to be as easy-to-read and understandable as possible. The idea is to let them decide how much information they want to make available to others.

Individuals' data will be kept as secure as possible, though Erlich admitted that privacy breaches are a potential risk when dealing with genetic data. In a study published in Science in 2013, for example, he and his colleagues revealed the risk that men who have had their full genomes sequenced may be re-identifiable based on short tandem repeats found on the male sex chromosome.

Erlich noted that he, co-leader Joseph Pickrell, and other DNA.Land organizers have submitted their own genetic data to the site and are subject to the same risks as anyone else who signs up.

The site supports genotyping data generated by 23andMe, AncestryDNA, and FamilyTreeDNA and uses data from the 1000 Genomes Project to impute variant patterns genome-wide.

Erlich stressed the importance of giving information back to participants, particularly ancestry information inferred from genome-wide SNP profiles. Because the site draws from multiple DTC sources, DNA.Land also offers a kin-matching service to help participants find relatives who may have been genotyped by a different company.

"A critical concept of DNA.Land is reciprocation," Erlich and his co-authors wrote in the abstract for the ASHG presentation. "To serve participants' curiosity in their genomes and family histories, our platform is built to efficiently offer analyses unavailable through DTC companies."

At the moment, the site does not provide any health-related risk information, though Erlich expressed interest in providing that type of feedback at some point in the future if there are changes to the US Food and Drug Administration's guidelines governing the return of health risk information based on DTC consumer genomic data.

"There is nothing exceptional in DNA," he said. "DNA is just a predictor, like any other predictor."

Another feature that is meant to boost participant engagement with DNA.Land is the site's use of "gamification," Erlich explained. Participants will earn points and participation badges depending on the amount of data they submit and the number of survey questions they answer, similar to the reward features built into the social media traffic app Waze.

Once the team has worked out the logistics of accommodating thousands of genotyping profiles on the site, it intends to start collecting phenotype information, part of a broader effort to use DNA.Land as a research tool for delving into the genetic architecture of complex traits and conditions.

For example, Erlich noted that incorporating information on Twitter use might provide phenotypic information such as the sleep patterns of Tweet-happy users who spend many waking hours on the site.

"Perhaps we can actually extract fairly interesting phenotypes," he said. "This is something that we'd like to explore and we hope that some users would be interested in something like that."

DNA.Land does not support sex chromosome array data generated by DTC genotyping providers. Likewise, whole-genome sequences cannot currently be added, largely because of the complications associated with transferring the large sequence file from a local computer to the DNA.Land server, Erlich said. 

"Right now, it's not the top priority because there aren't as many people with whole-genome sequencing," he explained. "But that's something that, in the future, we might support."