Sequencing thousands of Icelanders takes "a lot of organization," Hakon Gudbjartsson, Decode's VP for informatics, tells IEEE Spectrum.
Each person they've sequenced — a total of some 3,600 people thus far — generates around 100 gigabytes of data, Gudbjartsson says. And, he adds, databases like Oracle or MySQL don't quite make sense for this purpose.
Instead, Decode, which was bought in 2012 by Amgen, is turning to a genomically ordered relations, or GOR, database.
"It's a database that organizes the downstream data according to the position in the genome," Gudbjartsson tells IEEE Spectrum. "Whether it's a SNP or… a copy number variation, anything. All the tables are basically ordered according to the genome."
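
To make the idea concrete, here is a minimal sketch of a table kept in genomic order, so that every record in a region can be found with a binary search instead of a full scan. It is not Decode's implementation: the row layout, the sample records, and the `query_region` helper are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (chromosome, position, kind, payload). Illustrative only.
rows = [
    ("chr1", 1500, "SNP", "A>G"),
    ("chr2", 300, "SNP", "C>T"),
    ("chr1", 900, "CNV", "gain"),
    ("chr1", 1200, "SNP", "G>T"),
]

# Keep the table sorted by genomic position: chromosome first, then coordinate.
# Assumption: plain string order on chromosome names is good enough here;
# a real system would need a proper chromosome ordering (chr2 before chr10).
rows.sort(key=lambda r: (r[0], r[1]))
keys = [(r[0], r[1]) for r in rows]


def query_region(chrom, start, end):
    """Return all rows on `chrom` with start <= position <= end."""
    lo = bisect_left(keys, (chrom, start))
    hi = bisect_right(keys, (chrom, end))
    return rows[lo:hi]


# Every variant type in the region comes back from one contiguous slice.
hits = query_region("chr1", 1000, 1600)
```

Because SNPs, copy number variations, and any other annotation share the same sort key, two such tables can also be joined by walking them in parallel, which is the kind of access pattern a general-purpose relational database does not optimize for by default.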
By combining this genetic data with clinical and genealogical information, researchers would be able to make connections to disease.
But as IEEE Spectrum points out, they could also deduce the phenotypes of people who aren't participating in the study because of Iceland's small, closely related population. Gudbjartsson notes that researchers there have signed an agreement to never make such data inferences.