Fujitsu Labs Launches Improved Query Technology for GWAS


NEW YORK (GenomeWeb) – Researchers at Fujitsu Laboratories, a wholly owned subsidiary of Fujitsu, have developed technology that they claim will make it quicker to correlate genomic variations with environmental and demographic information such as disease and lifestyle data in the context of genome-wide association studies.

Specifically, researchers from the Kawasaki, Japan-based company have developed a data structure and processing method for quickly aggregating genomic information that they claim speeds up disease-gene variant queries by a factor of up to 360 compared to existing methods. Some of the researchers presented the technology and published an accompanying short paper at the International Conference on Extending Database Technology in Bordeaux, France, this week.

Motoyuki Kawaba, a research manager with the Data Systems Project in the Computer Systems Laboratory at Fujitsu Laboratories, explained in an email to GenomeWeb that the team developed the method in response to a perceived need for faster ways of performing in-database genome analyses for GWAS. Current methods, he said, are "problematically slow."

GWAS combine multiple datasets from patients, including genetic, demographic, and lifestyle information. These datasets can be integrated using standard relational database management systems (RDBMS), but processing the data in an RDBMS is challenging because of the large number of variants associated with these studies.

According to numbers from Fujitsu Labs, aggregating data on a single variant across a population of 100,000 people takes about one second of processing time using existing database schemas. That means that for a single disease, aggregating variants at 10 million loci in a study population of 100,000 people would take roughly 120 days. GWAS projects require multiple iterations of this kind of analysis, making improvements in processing speed a pressing issue.
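As a rough check of that estimate, the short calculation below multiplies the quoted one second per variant by 10 million loci and converts the result to days. The variable names are illustrative, and the sketch assumes strictly sequential processing with no overhead. It comes to about 116 days, which is in line with the "roughly 120 days" cited.

```python
# Back-of-the-envelope check of the processing-time estimate.
# Assumption: variants are aggregated one after another, at the
# quoted rate of about one second per variant for 100,000 people.
SECONDS_PER_VARIANT = 1.0
NUM_LOCI = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = SECONDS_PER_VARIANT * NUM_LOCI
total_days = total_seconds / SECONDS_PER_DAY

print(f"{total_days:.1f} days")  # about 115.7 days
```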

The Fujitsu approach builds on a traditional RDBMS by adding a new data type and aggregation function, according to the paper. The method works by storing each individual's genomic information in a single column in the database and then encoding the information on each variant with a fixed bit length for storage. By storing them in single columns, variants can be aggregated and queried simultaneously, which improves the aggregation processing performance per variant, according to Fujitsu. In contrast, variants stored in a conventional relational database table structure require repeated queries. The more variants there are, the more queries are needed.
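The general idea of fixed bit-length encoding and single-pass aggregation can be sketched as follows. This is a minimal illustration and not Fujitsu's implementation: the two-bit width, the genotype codes, and the function names `pack_genotypes`, `genotype_at`, and `aggregate` are all assumptions made for the example. Each individual's genotypes are packed into one integer, standing in for the single column value, and one pass over the column tallies genotype counts for every variant at once.

```python
# Illustrative sketch only; names and the 2-bit width are assumptions.
BITS_PER_VARIANT = 2
MASK = (1 << BITS_PER_VARIANT) - 1  # 0b11

def pack_genotypes(codes):
    """Pack one individual's genotype codes (0..3) into a single integer."""
    packed = 0
    for index, code in enumerate(codes):
        if not 0 <= code <= MASK:
            raise ValueError(f"code {code} does not fit in {BITS_PER_VARIANT} bits")
        packed |= code << (index * BITS_PER_VARIANT)
    return packed

def genotype_at(packed, index):
    """Read the genotype code stored at a given variant position."""
    return (packed >> (index * BITS_PER_VARIANT)) & MASK

def aggregate(column, num_variants):
    """Count genotype codes for every variant in one pass over the column."""
    counts = [[0] * (MASK + 1) for _ in range(num_variants)]
    for packed in column:
        for variant in range(num_variants):
            counts[variant][genotype_at(packed, variant)] += 1
    return counts

# Three individuals, three variants each (hypothetical data).
column = [
    pack_genotypes([0, 1, 2]),
    pack_genotypes([0, 1, 1]),
    pack_genotypes([2, 1, 0]),
]
counts = aggregate(column, num_variants=3)
print(counts)  # [[2, 0, 1, 0], [0, 3, 0, 0], [1, 1, 1, 0]]
```

Because every variant occupies the same number of bits, the position of any variant can be computed directly from its index, which is what allows all variants to be tallied in one scan rather than one query per variant.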

The Fujitsu structure also incorporates a method of capturing variation in variant length without slowing down the aggregation process. Most variant types can be described using two-bit codes but some require codes of three or more bits. Fujitsu's method makes it possible to store data on variants of variable length without breaking the fixed bit-length structure, thus enabling high-speed aggregation processing.

In addition, the structure uses an encoding technique that compresses the genomic information to one-sixteenth the size it would be if the variants were stored as text strings. This means that data for even several hundred thousand people can be handled in memory, which also supports high-speed processing.

According to the conference paper, Fujitsu's technology speeds up GWAS data processing by about 50-fold to 360-fold, depending on which method it is being compared to. In the paper, the researchers compared the Fujitsu approach to three existing methods, including a naïve method that counts the number of occurrences of each genotype for all the gene variants of patients with the disease being studied, and an external count method that uses a SQL query to retrieve genotypes of the gene variants and then exports the results into a second application that counts the number of each genotype.

The full results of the comparison are described in one of the figures included in the paper. It compares the Fujitsu method's performance with the naïve and external count methods on data from 1,000 individuals and a total of 3,000,000 variants. For the naïve method, the execution time per variant was 13 milliseconds and the external method clocked in at 1.86 ms per variant. In comparison, the Fujitsu method started out at 0.280 ms per variant but the researchers were able to get that down to 0.035 ms.
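The per-variant timings quoted above can be turned into speedup ratios with a few lines of arithmetic. The figures are the rounded values reported in this article, so the ratios are approximate: they work out to roughly 53-fold against the external count method and roughly 371-fold against the naïve method, close to the 50-fold to 360-fold range cited from the paper.

```python
# Speedup ratios computed from the rounded timings quoted in the article
# (milliseconds per variant). Variable names are illustrative.
naive_ms = 13.0
external_ms = 1.86
fujitsu_initial_ms = 0.280
fujitsu_tuned_ms = 0.035

speedup_vs_naive = naive_ms / fujitsu_tuned_ms
speedup_vs_external = external_ms / fujitsu_tuned_ms
tuning_gain = fujitsu_initial_ms / fujitsu_tuned_ms

print(f"vs. naive:    {speedup_vs_naive:.0f}x")     # about 371x
print(f"vs. external: {speedup_vs_external:.0f}x")  # about 53x
print(f"tuning gain:  {tuning_gain:.0f}x")          # about 8x
```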

With this technology, queries of GWAS covering tens of millions of loci can be performed on a conventional computer in a short period of time, Fujitsu said. Faster query technology could also free researchers to identify disease-gene correlations that they might have overlooked in the past due to limits on the variants they could study because of time constraints.

The method is designed specifically to accelerate analysis processing for GWAS and, currently, it cannot be applied to other kinds of studies, Kawaba said. "But we are looking into the possibility." They are also working on improving the genome aggregation function using dictionary and vectorization techniques, according to the conference proceedings.

Fujitsu also plans to incorporate the technology into solutions from its Healthcare Systems Unit, which targets universities, research laboratories, and pharmaceutical companies. "Globally there has been increased activity in medicine that utilizes genomic information," Kawaba said. Japan's Ministry of Health, Labor and Welfare, for example, set up a taskforce in November 2015 that will be responsible for enabling genomics-based medical care. "A driving factor behind this push to utilize genomic information has been next-generation sequencing, which has accelerated and lowered the cost of genome sequencing," he said. "In this context, we expect that our technology will be used for correlation analyses using clinical data against SNP array data, whole-exome data, or whole-genome data."

Specifically, with the new technology, customers of Fujitsu's healthcare solutions will be able to perform disease-related gene searches and analyze correlations between gene polymorphisms and drug responses, according to Kawaba. Users will also be able to quickly search and aggregate accumulated genomics and clinical data, including prescriptions, examination results, and disease names, as well as lifestyle data. Fujitsu did not name the specific products that will use the data aggregation technology.

Kawaba also said that Fujitsu is considering creating other products for use in genomics but currently has no plan for when such products would be released.
