NEW YORK (GenomeWeb) – BC Platforms and Microsoft have agreed to work together to provide cloud-based genomic data management solutions for Codigo46, a commercial laboratory established in Mexico that seeks to build and maintain what is expected to be the largest biobank of genotype information in Latin America.
Codigo, which means "code" in Spanish, is building the bank in collaboration with Mexican health and research authorities. Over the next three years, the company and its partners hope to genotype one million Mexican patients. They hope to eventually expand the biobank to include genotype and phenotype data from participants in other Latin American countries, including Colombia, Argentina, and Brazil, Codigo CEO Lorenza Haddad told GenomeWeb.
The ultimate goal is to build a database of Latin American genotypes, which are currently understudied, she said. Most tests done on Latin American populations today compare their data to European samples, but differences in genetic makeup between the two populations, as well as differences within populations of Mexican ancestry, could confound those results, she said.
Through this initiative "we want to translate the technologies that have been developed outside of Mexico ... and [use them to] start producing knowledge and technologies within Mexico and other Latin American countries," Haddad said.
Initially, Codigo and its partners will focus on recruiting patients with metabolic diseases such as diabetes and associated complications, but they hope to expand that to explore hereditary cancers and psychiatric disorders in the patient population. According to Haddad, diabetes is of immediate interest because it is the second-leading cause of death in Mexico and a major cause of disabilities in later life.
An estimated 15 percent of the Mexican population has diabetes, although only about 10 percent of those cases have actually been diagnosed. "There's a huge percentage of the population that is not treated and [whose] diabetes is not controlled," Haddad said. Those numbers are expected to grow as obesity rates in the country rise. About 15 percent of the country's federal health budget is already spent on diabetes, so "it's a huge issue," she said.
Codigo46 and its partners have begun recruiting and collecting saliva samples from the first set of patients at local hospitals and health centers; members of the public in Mexico can also contribute samples to the biobank. Codigo hopes to recruit and genotype as many as 150,000 participants, both adults and children, within the first year of the project, according to Haddad. Patients will be genotyped on a customized version of Illumina's Global Screening Array that will cover 50,000 SNPs. The array is currently being developed with several genetics researchers in Mexico who have expertise in the disease areas of interest to Codigo, she said.
In some cases Codigo will accept data from patients who have been genotyped in the past; however, it may re-genotype some patients if their previous testing focused only on specific SNPs and was not as comprehensive as the current project requires, Haddad said. The partners will also collect phenotype data from each patient, she said. In return for their contributions to the Codigo bank, patients who have the diseases of interest will receive personal pharmacogenomics profiles that they can share with their clinicians. Non-clinical contributors will receive ancestry profiles as well as some details about their risk for certain diseases.
All of the genotypes collected will be housed on the Microsoft Azure cloud and managed using software from BC Platforms. The repository will also hold some phenotype data on patients collected from electronic medical records and other health surveys. To provide the needed software for the project, Codigo and BC Platforms signed a partnership agreement in July this year. Under the terms of that agreement, BC Platforms agreed to develop a customized technology platform that would allow Codigo46's customers and collaborators to securely access and analyze biobank data as well as store large volumes of data. BC Platforms had already signed an agreement with Microsoft in March to provide its genomic data management solutions on the Azure cloud. It is now working with Microsoft's genomics arm on solutions for Codigo46.
"Since the announcement earlier this year of our collaboration with Microsoft, we have been optimizing our product offerings for Azure utilizing both advancements in software and cloud hardware," BC Platforms CEO Tero Silvola said in a statement. "Microsoft has built a scalable cloud-based service that enables us to easily and reliably process large volumes of genomic data, and we are leveraging this in our partnership with Codigo46."
Silvola told GenomeWeb that BC Platforms is deploying several of the proprietary solutions it has developed to support Codigo's efforts. He noted that the Codigo effort has much in common with an initiative launched by the University of Colorado Anschutz Medical Campus, which also uses BC Platforms' infrastructure. That program, which launched in 2015, aims to develop a proprietary databank that complements existing institutional tissue repositories and will facilitate efforts to develop diagnostic tests. Researchers involved in the project expect to genotype roughly half a million patients per year.
Under the terms of the agreement with BC Platforms, Codigo will use several components of the company's biobanking infrastructure. These include BC Data, a storage solution for genome and pedigree data as well as clinical data; BC Merge, which offers tools for integrating and harmonizing data from disparate sources and environments; and BC Genome, which provides tools for analyzing and managing genomic data. According to the company, BC Genome supports more than 30 peer-reviewed academic analysis packages and in-house scripts, and offers tools for integrating genomic and phenotypic data, among other features.
For its part, Codigo selected BC Platforms to provide infrastructure for the project because the company offered collaboration rather than an off-the-shelf solution, Haddad told GenomeWeb. "They were not just capable of doing what we needed them to do, but they also wanted to build a partnership like we wanted to," she said. "We wanted someone that would help us in this journey and not just sell us a product."
Following the initial agreement with BC Platforms in July, the partners shopped around for cloud providers. "Even though they had already partnered with Microsoft, they were open to [searching for] the best possible option for us [and] looking at different options," she said. Codigo was ultimately attracted to Microsoft because of the company's investments not just in providing hardware but also in making sense of the data. Microsoft is "actually focused on what this genetic data means, and how to use it, and how to keep it and maintain it and have it be useful," she said. "That's why we really like them."
"Microsoft has been investing in this domain quite significantly," Silvola added. As one of a number of companies tapped to participate in an incubator program for genomics companies established by Microsoft, "we have received really significant developer support to optimize the cloud for not only big data but also very complicated data," he said. Microsoft also has a number of customers in the healthcare and clinical space, added Timo Kanninen, BC Platforms' chief architect. As BC Platforms seeks to link clinical, healthcare, and genomic data, it makes sense to partner with a vendor that already has customers using its infrastructure for one or all of those data types, he said.
There are also some technical benefits to using the Azure cloud, according to Kanninen. One of these is a tool called Batch, which is used for scheduling and managing compute jobs on tens, hundreds, or thousands of virtual machines. Batch was previously available only for Windows, but Microsoft has now made it available on Linux, the platform on which most statistical algorithms used in genomics run, he said. This means that users can now launch hundreds or thousands of Linux servers and run jobs on them. Kanninen said that the partners plan to use the tool for the Codigo project.
Codigo plans to make the data it collects available for academic and commercial research use for yet-to-be-determined fees. All of the data will be de-identified before it is shared. The exact pricing details are still being discussed, but the idea is to have a pricing structure based on the disease being studied or the genes that researchers are looking for, as well as the amount of data they want to access, Haddad said. For example, a customer who wants access to both genotype data and phenotype data from the EMR would pay more than a customer who wants only genotype data.