CHICAGO – What started as a personal quest for a Microsoft data scientist who lost a child to sudden infant death syndrome has evolved into a full-fledged genetic research and analytics program at Seattle Children's Hospital.
The latest step is an effort to create a public database of whole-genome sequences of deceased children and their parents, in hopes of discovering SIDS risk factors and eventually finding ways to prevent the deaths of thousands of babies per year.
"The goal, broadly, is to try to sequence as many children or families who have unfortunately had SIDS," including sequencing brain tissue and other samples of the deceased in search of genetic risk factors, explained Ghayda Mirzaa, a pediatric neurogeneticist at the Seattle Children's Research Institute (SCRI) and the University of Washington.
This builds on work going back about two and a half years that resulted in the publication of a paper in the April issue of Pediatrics that looked at the role maternal smoking played in SIDS.
In that study, working with the Seattle Children's Research Institute and other academic centers, Microsoft data scientists applied machine learning and business intelligence technology to a Centers for Disease Control and Prevention database on 26 million births and deaths in the US and to other publicly available datasets. They found that 22 percent of deaths from SIDS and the broader category of sudden unexplained death in childhood (SUDIC) could be prevented by parents quitting smoking not just during pregnancy, but also in the three months before conception.
That work looked at environmental factors from population-level data. The SIDS genomic database extends the research to include genomics that could help identify genetic markers and risk identifiers.
"From there, it really opens up. Finding genetic risk factors is only the first step toward understanding more fundamental things like the mechanisms of SIDS and interventions that can be used," Mirzaa said. If the researchers do find genetic risk factors, that could lead to future research into mechanisms, dysregulated pathways, affected body systems, and potentially helpful interventions, she added.
"A goal that is related to that is to make any data we generate available to collaborators and researchers across the world," Mirzaa said. With that in mind, the research institute and Microsoft are looking to create an open database.
Seattle Children's has previously sequenced children with other brain disorders, but did not have a database of SIDS cases. The sequencing and research infrastructure is in place at SCRI to perform the molecular tests, generate data, set up a repository, and perform secondary analyses.
"More importantly, there's a network of collaborators here and investigators that are super experienced and can help guide this project," Mirzaa said. SCRI has archival tissue from more than 100 cases, and the immediate job is to sequence as many SIDS families as possible.
The impetus for this project came from Microsoft Chief Data Analytics Officer John Kahan and his wife, Heather, who lost their infant son to SIDS in October 2003. Kahan has since dedicated his work to preventing SIDS — which applies to children between one month and one year of age — and the broader category of SUDIC.
The SIDS project started on a volunteer basis with Kahan and other Microsoft data scientists joining Seattle Children's researchers. A year ago, they formalized it as the Aaron Matthew SIDS Research Guild at Seattle Children's Hospital, with participants from institutions including the University of Bristol in the UK, the University of Auckland in New Zealand, and the University of Virginia.
The Kahans personally donated $100,000 plus additional funds to cover all overhead for the research guild, and Microsoft made a contribution through its $115 million AI for Good initiative that provides data science, technical resources, and grants to apply AI to humanitarian efforts. Including resources contributed by Seattle Children's, the SIDS project has raised more than $1.5 million in cash, infrastructure, and support services.
Microsoft has been able to scale up its genomic analysis through its partnership with St. Jude Children's Research Hospital on the St. Jude Cloud.
"Because of St. Jude, we get to scale. The one cost factor here that you can't eliminate but we can do at scale is the wet-lab sequencing," Kahan said.
Since the Seattle Children's program started, it has cost about $4,000 to sequence each sample. Reaching every US child affected by SUDIC, along with their parents, would run about $5 million a year. With the scale afforded by the St. Jude Cloud and other partners, the sequencing cost is down to about $1,400 per person, Kahan said.
Microsoft, the Redmond, Washington-based software giant, donated cloud services, data science expertise, and genomics processing capabilities to the St. Jude work through its AI for Humanitarian Action effort. That work created a framework for fast-tracking institutional review board (IRB) approvals, a process that could take years when building a new genomics cloud database.
"Because we had already cut our teeth with St. Jude and had all the processing approach and the legal approach and the privacy approach and how to de-anonymize data and create the right security on the data, we replicated the exact same process for Seattle Children's," Kahan said. The IRB signed off within six months.
Some of the first batch of whole-genome data on SIDS victims has come back. Detailed analyses to look for risk factors will start over the next few months, once SCRI has all of the data. "From there, we are going to hopefully continue to sequence other families," Mirzaa said.
After that, they will look to refine research techniques and open the database up to outside researchers.
The ultimate goal, according to Kahan, is to build a database that supports prenatal screening for SIDS risk. He envisions SIDS risk assessment becoming part of amniocentesis. "You could do this theoretically because we are focused on trinomial sequencing, where you do the two parents and the child," he said.
On the technology side, Kahan hopes that the markers the research identifies eventually get factored into safety products and practices.
"Safe sleep is not a preventative nature. It's not a causal feature," Kahan said. "If you put a child on its back in a crib with nothing in it [as is currently recommended], it's nothing more than putting a seatbelt on that child. It will prevent death in certain scenarios, but it doesn't prevent death in every scenario."
But the knowledge that using a seatbelt or putting a baby on its back in an empty crib prevents death in certain scenarios could be built into machine-learning algorithms.
Kahan is involved in another project to build AI into video cameras to monitor sleeping children. Such smart cameras would be able to detect when a child is in an unsafe sleep position and send a warning.
"That's why we're working on things simultaneously: research, machine learning around genomics, which is critical, and then machine learning will lead to behavioral change in products," he said.