Seeking to pioneer a new business model of offering online, direct-to-consumer genomic analysis, Navigenics and 23andMe have each developed new IT tools and informatics workflow methods to analyze customer DNA, present the results, and keep clients updated on the evolving links between genetics and disease.
Officials from these firms told BioInform that data encryption, tracking, redundancy, and security are top priorities as they apply existing software and methods to assure quality control in their internal DNA analyses and securely transmit information to and from their customers via the web.
Customers also expect clear, reliable information delivered quickly and in a visually appealing format. Directing customers to scientific genome browsers like the UCSC Browser or Ensembl to peruse their personal SNP analysis will not work. “Scientists like their information detailed and technical, consumers do not,” said Navigenics COO Sean George.
These firms have also found that they need to create tools for people with all levels of experience, enabling them to directly see their SNP analyses and understand the possible health outcomes — for example, their risk of developing diabetes or Alzheimer’s.
“We spent a lot of time designing the user interface, the information hierarchy, [figuring out] the level of information you get with what page,” said George. “We are probably not even 10 percent of the way there.”
At 23andMe, the user interface has been two years in the making, “and we’re still not done,” company co-founder Linda Avey told BioInform this week. Avey said that the company’s interface was built “from the ground up” in-house. “I don’t think we have taken any off-the-shelf software.”
Avey noted that she and her colleagues come from the research world and are accustomed to sifting through genomic data. They tested several iterations of the user interface to see how non-scientists navigate the site and to determine if it is intuitive enough.
Some 23andMe engineers do the “mom test,” she said: They sit down with their mothers and ask them to go through the site. “We found very quickly that a lot of moms weren’t getting it, so we do a lot of customer experience testing,” said Avey.
At the other end of the customer spectrum are “nerd scientists” who seek much more information than the average user. “So we put [the information] in layers and if they want to keep diving for more information, it is all there, all the way down to the references, to the papers,” said Avey.
Navigenics, too, works intensively with customer feedback to explore user interfaces and navigation. While 23andMe relies entirely on in-house development, Navigenics uses a combination of in-house and outside vendors, said George. “We have human geneticists, bioinformaticists, Java developers, web developers, genetic counselors, physicians and product managers all working together,” he said.
The design grows and changes as the science and offerings expand, said George. The system includes the customer’s risk analysis, health conditions, interactions with the environment, and family history. “As you layer on input into a personalized wellness plan, it can get complex pretty quickly; that is where information design comes in,” he said.
Since the scientific literature delivers a torrent of new genomic information almost daily, these companies are also seeking ways to continuously update their customers about new findings on genetic links to conditions.
“There are various levels of automated curation tools available and we have built some of our own,” said George. There are numerous databases such as OMIM that catalog the available markers, and journal publications about genes and diseases that have been associated with them — each working as a “starting tool,” he said. “Ultimately a human curation team goes through and makes sure we can stand behind what we are telling our members.”
At 23andMe, scientific updates are accomplished more by hand curation. “You can hear a buzz around here when a really big paper comes out,” Avey said.
As evidence of the growth in the company's services, Avey said that as of November, 23andMe had 14 so-called Gene Journal articles, compilations of traits and conditions linked to SNPs that cover both heritability and environmental factors. Today it offers 60 of these internally vetted compilations. The company wants to be certain the results are “solid” so it “feels comfortable reporting it back to our customers,” Avey said.
“We came up with two buckets of information we include in the Gene Journals now,” she said. One is a “four-star” category of established research, such as a replicated study done in a large population with matched controls. The other is “one- to two-star” preliminary research. “That rating system really seemed to click with customers,” she said.
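The two-bucket scheme Avey describes can be read as a simple rule over a study's attributes. The sketch below is hypothetical: the `Study` fields and the 1,000-sample cutoff for a "large population" are assumptions for illustration, not 23andMe's actual criteria.

```python
from dataclasses import dataclass

# Hypothetical threshold: the article does not say what counts as "large".
LARGE_STUDY_MIN_SAMPLES = 1000


@dataclass
class Study:
    n_cases: int
    n_controls: int
    matched_controls: bool
    replicated: bool


def evidence_bucket(study: Study) -> str:
    """Assign a study to one of the two buckets described in the article."""
    is_large = study.n_cases + study.n_controls >= LARGE_STUDY_MIN_SAMPLES
    if is_large and study.matched_controls and study.replicated:
        return "established (four-star)"
    return "preliminary (one- to two-star)"


print(evidence_bucket(Study(3000, 3000, True, True)))   # established (four-star)
print(evidence_bucket(Study(3000, 3000, True, False)))  # preliminary (one- to two-star)
```

Requiring all three conditions keeps the "established" bucket conservative: a single missing replication drops a finding to "preliminary."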
23andMe’s first two hires were fresh grads from Stanford’s PhD program in biomedical informatics, Avey said. They searched the academic literature for databases that aggregated genome-wide association study results in a usable format. The company expected that it could then build on this resource by adding it to a proprietary database along with its own SNP information.
However, Avey said, “we found that [that] really, truly did not exist. We had to build up the database of information from the ground [up].”
After compiling a list of diseases and conditions, the company’s scientists then sought technologies that could provide the genetic association data and other information that they wanted to offer customers. “Not only were we looking at health-related associations, [but] we were looking at ancestry markers, being able to provide information about your haplogroup assignments both maternal and paternal … so we had a lot of requirements for the technology,” said Avey.
Once it developed the method to predict these associations, 23andMe created a proprietary data repository it dubbed Coregen, which serves as the foundation for its web-based software. Logging into one’s account allows access to personal data through these tools.
“Depending on what data you want to look at, you go in through those tools and it links you in a very proprietary way to your own dataset,” Avey said. “We built all of the security and privacy measures around that.” Consumers who log into an account can only look at their own data. “That whole process took us about a year and a half to develop,” she said.
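The rule that account holders can see only their own data amounts to an ownership check on every read. Below is a minimal sketch in Python (the article says 23andMe's platform used PHP); the class, method, and exception names are hypothetical, and a real system would layer authentication, encryption, and audit logging on top.

```python
from __future__ import annotations


class AccessDenied(Exception):
    """Raised when a session tries to read another account's data."""


class GenotypeStore:
    def __init__(self) -> None:
        # account_id -> {rsid: genotype call}
        self._by_account: dict[str, dict[str, str]] = {}

    def load(self, account_id: str, calls: dict[str, str]) -> None:
        self._by_account[account_id] = dict(calls)

    def read(self, session_account_id: str, account_id: str) -> dict[str, str]:
        # Ownership check: the logged-in account may only read its own record.
        if session_account_id != account_id:
            raise AccessDenied(f"{session_account_id} may not read {account_id}")
        # Return a copy so callers cannot mutate the stored record.
        return dict(self._by_account[account_id])


store = GenotypeStore()
store.load("alice", {"rs1": "AG"})
print(store.read("alice", "alice"))  # {'rs1': 'AG'}
```

Putting the check inside the data-access method, rather than in each page that displays results, means no view can forget to apply it.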
The web platform at 23andMe is based on a LAMP stack, which includes the open source components Linux, Apache, MySQL and PHP. “Mostly because it is very efficient and very inexpensive,” said Avey.
SNPs in Workflow
Consumer genomics firms have also needed to develop new analytical tools for genotyping arrays, because the software that comes with SNP chips was not developed for the types of services they offer.
23andMe’s service is based on a customized Illumina HumanHap550+ BeadChip that includes 30,000 additional SNPs.
“We are developing some of our own proprietary analytical tools because we feel there is a lot more we can do with the data that are generated from the Illumina platform,” Avey said.
The company is building analytical tools on top of the Illumina BeadStudio software. Although that work is mainly done in-house, 23andMe draws on some outside collaborators, including Jonathan Pritchard, a geneticist at the University of Chicago and one of the company’s scientific advisors. Avey said Pritchard has created open-source software tools that 23andMe uses, though she did not elaborate.
Although next-generation sequencing may one day have an influence in this space, Avey said she finds large-scale sequencing studies “still cost-prohibitive.”
Genotyping arrays, on the other hand, “have been quite successful and will be used for at least a couple of years out,” she said.
Navigenics, meanwhile, is using the Affymetrix Genome-Wide Human SNP Array 6.0, which tests for close to 2 million genetic markers, including more than 900,000 SNPs.
“We are more than happy with the genotyping that comes off the chips,” George said. The company runs extensive sample-batch testing and “follow[s] that all the way through to make sure the quality is there across the board.”
Navigenics has developed its own quality-control system to ensure the genotyping calls are correct. Results are transferred to a rules engine and are tracked with a barcode that corresponds to each member’s sample, said George.
“We use some genotyping calls to double-check [that] nothing got mixed up in the lab, so we have multiple, multiple redundancies to make sure we get the right calls for the right person,” he said.
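One way to read the genotype-based double check is as a concordance test: a small panel of SNPs typed independently is compared, barcode by barcode, with the array calls, and any sample that disagrees is held for review. The sketch assumes such a second call set exists; the 99% threshold, the barcodes, and all function names are hypothetical.

```python
def normalize(genotype: str) -> str:
    # Treat "AG" and "GA" as the same unordered diploid call.
    return "".join(sorted(genotype.upper()))


def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of SNPs called in both sets whose genotypes agree."""
    shared = [rsid for rsid in calls_a if rsid in calls_b]
    if not shared:
        return 0.0
    agree = sum(normalize(calls_a[r]) == normalize(calls_b[r]) for r in shared)
    return agree / len(shared)


def flag_possible_mixups(array_calls: dict, check_calls: dict,
                         min_concordance: float = 0.99) -> list:
    """Return barcodes whose independent check calls disagree with the array calls."""
    flagged = []
    for barcode, calls in check_calls.items():
        array = array_calls.get(barcode)
        if array is None or concordance(array, calls) < min_concordance:
            flagged.append(barcode)
    return sorted(flagged)


array_calls = {
    "BC001": {"rs1": "AG", "rs2": "CC", "rs3": "TT"},
    "BC002": {"rs1": "AA", "rs2": "CT", "rs3": "GG"},
}
# Simulate two tubes swapped in the lab.
swapped = {"BC001": array_calls["BC002"], "BC002": array_calls["BC001"]}
print(flag_possible_mixups(array_calls, swapped))  # ['BC001', 'BC002']
```

A barcode with no array result at all is also flagged, so a lost sample surfaces the same way as a swapped one.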
Genotyping data is piped into a proprietary rules engine developed by the firm’s bioinformatics team, and it, too, is regularly tested against quality metrics. “The rules engine works out — based on the genotype calls — what is the information we present to the customer,” George said.
To quantitatively assess how a customer's SNPs relate to a condition, the rules engine calculates a score called the genetic composite index. According to a Navigenics white paper, the rules engine was required because relative risks are usually reported only for epidemiological studies, while case-control studies report odds ratios, which makes it “often difficult to calibrate risk estimates.” So the company created quantitative methods to derive relative risks, as well as relative lifetime risks, from the odds ratios.
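An odds ratio from a case-control study does not directly give an individual's risk. One standard way to bridge that gap, shown here as a sketch and not as Navigenics' published algorithm, is to assume Hardy-Weinberg genotype frequencies and a known average lifetime risk for the population, then solve for the baseline risk of the reference genotype so that the genotype-specific risks average back to the population figure. The allele frequency, odds ratios, and average risk in the example are invented.

```python
def risk_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_risk(odds: float) -> float:
    return odds / (1.0 + odds)


def hwe_frequencies(q: float) -> tuple:
    """Frequencies of 0, 1 and 2 risk-allele copies under Hardy-Weinberg equilibrium."""
    return ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2)


def genotype_risks(p0: float, odds_ratios: tuple) -> tuple:
    """Absolute risk per genotype, given baseline risk p0 and ORs versus the reference genotype."""
    base_odds = risk_to_odds(p0)
    return tuple(odds_to_risk(orr * base_odds) for orr in odds_ratios)


def calibrate(avg_risk: float, q: float, odds_ratios: tuple, iterations: int = 200):
    """Bisect on p0 so the frequency-weighted genotype risks equal avg_risk.

    Returns (absolute risk per genotype, relative risk versus the population average).
    Assumes 0 < avg_risk < 1 and positive odds ratios.
    """
    freqs = hwe_frequencies(q)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        mean = sum(f * r for f, r in zip(freqs, genotype_risks(mid, odds_ratios)))
        # The population mean rises monotonically with p0.
        if mean < avg_risk:
            lo = mid
        else:
            hi = mid
    risks = genotype_risks((lo + hi) / 2.0, odds_ratios)
    return risks, tuple(r / avg_risk for r in risks)


risks, relative = calibrate(avg_risk=0.10, q=0.30, odds_ratios=(1.0, 1.3, 1.69))
print(risks, relative)
```

Working on the odds scale keeps every genotype risk between 0 and 1. Calibrating against the population average also makes the relative risks directly interpretable as "higher or lower than a typical person."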
“What we call the rules engine [are] our algorithms and criteria incarnate in the infrastructure we set up to run the rules,” George said. The algorithm itself is available on the company’s website. “It is proprietary but we are sharing it freely. It is part of our commitment to openness and establishing standards in this space,” he said.
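The "rules" can be pictured as a lookup table from genotype calls to per-SNP relative risks, combined into one score per condition. This sketch assumes SNP effects are independent and multiplicative, which is a simplification. The rsIDs, genotypes, relative risks, and function names are placeholders, not Navigenics' rules or its actual genetic composite index formula.

```python
import math

# Placeholder rules table: condition -> SNP -> genotype -> relative risk
# versus the population average. IDs and values are illustrative only.
RULES = {
    "example_condition": {
        "rs0000001": {"AA": 1.0, "AG": 1.2, "GG": 1.5},
        "rs0000002": {"CC": 0.9, "CT": 1.0, "TT": 1.1},
    },
}


def normalize(genotype: str) -> str:
    # Treat "AG" and "GA" as the same unordered diploid call.
    return "".join(sorted(genotype.upper()))


def evaluate(condition: str, calls: dict, avg_lifetime_risk: float):
    """Return (composite relative risk, estimated lifetime risk) for one condition."""
    factors = []
    for rsid, table in RULES[condition].items():
        lookup = {normalize(g): rr for g, rr in table.items()}
        call = calls.get(rsid)
        # A missing or unrecognized call contributes a neutral factor of 1.0.
        factors.append(lookup.get(normalize(call), 1.0) if call else 1.0)
    composite = math.prod(factors)
    # Multiplying can overshoot, so cap the absolute estimate at 100%.
    return composite, min(1.0, avg_lifetime_risk * composite)


composite, lifetime = evaluate(
    "example_condition", {"rs0000001": "GA", "rs0000002": "TT"}, 0.10
)
print(composite, lifetime)
```

Keeping the numbers in a data table, separate from the code that applies them, lets a curation team revise a rule without touching the engine itself.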
Looking Down the Road
The emerging area of electronic health records stands to greatly affect direct-to-consumer personal genomics companies, and officials from these firms said they are keeping an eye on trends in that space.
“There are standards that everyone seems to be closing in on for the security, the storage, the encryption, and the handshake authentication for the different providers to link up, download, or upload an individual’s health information at their request,” George said.
While this area is not a “top focus” for Navigenics — the standards are still evolving — “we are certainly going to make sure that this genotyping data can be portable to people’s personal health records,” he said.
Avey said she doesn’t think 23andMe’s offerings can mesh with current systems. “These different electronic medical record [systems] don’t talk to each other and it hasn’t been standardized,” she said.
“We are really excited about the efforts Microsoft and Google have underway,” she added, noting that it’s possible that these firms might help standardize personal health records. “That is probably a better point for us to integrate with health records on the PHR [personal health records] level.”
On the horizon, Avey said she envisions a role for 23andMe to perform bioinformatics analysis of its own. “We do plan on doing our own analysis when we get up to a critical mass for various phenotypes and diseases,” she said.
For example, the company last week announced a partnership with the Parkinson’s Institute and Clinical Center. With funding from the Michael J. Fox Foundation, 23andMe plans to develop web-based clinical-assessment tools for online communities, to expand patient involvement in clinical research, and to use the data in Parkinson’s disease studies.
That activity can include building Web 2.0 functionality around groups of people connected by common genetic information. “We have no idea what will be of extreme interest to our customers,” Avey said. “It might be for people who know how to roll their tongue.”
Sometimes customers are intrigued by analysis that shows them where they are unique but also where they are like others. “Then they want to find out who else is ‘like me,’” she said. “That very well fits in with people interested in various diseases.”
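At the genotype level, "like me" could be approximated with an identity-by-state score: the fraction of alleles two people share across the SNPs both were typed for. This is a hypothetical sketch of one simple similarity measure, not a description of any 23andMe feature; member names and genotypes are invented.

```python
from collections import Counter


def shared_alleles(g1: str, g2: str) -> int:
    """Number of alleles (0, 1 or 2) two diploid calls have in common."""
    # Counter intersection keeps the minimum count of each allele.
    return sum((Counter(g1.upper()) & Counter(g2.upper())).values())


def ibs_similarity(calls_a: dict, calls_b: dict) -> float:
    """Mean fraction of shared alleles over SNPs typed in both people."""
    shared = [rsid for rsid in calls_a if rsid in calls_b]
    if not shared:
        return 0.0
    total = sum(shared_alleles(calls_a[r], calls_b[r]) for r in shared)
    return total / (2 * len(shared))


def most_similar(me: dict, others: dict, top: int = 3) -> list:
    """Rank other members by similarity to `me`, highest first."""
    scored = [(name, ibs_similarity(me, calls)) for name, calls in others.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


me = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
others = {
    "member_a": {"rs1": "AG", "rs2": "CC", "rs3": "CT"},
    "member_b": {"rs1": "GG", "rs2": "TT", "rs3": "CC"},
}
# member_a ranks first (5/6 of alleles shared), member_b second (1/6).
print(most_similar(me, others))
```

The same scoring could be restricted to SNPs linked to a particular condition, grouping members by shared risk variants rather than overall similarity.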