This article has been updated with additional information from the Chan Zuckerberg Initiative.
CHICAGO – Following the publication of its first-ever data release policy last month and additional funding from the Chan Zuckerberg Initiative in July, the Human Cell Atlas (HCA) continues to gather data and make it available for analysis through its data coordination platform.
The data release policy explicitly states that the HCA follows the principles of the Fort Lauderdale Agreement, which dates back to the Human Genome Project. That means it requires users to make their own data open to the research community as widely and as quickly as they can and to seek broad consent from tissue donors.
"[M]embers of the HCA should aim to finalize agreements that allow the broadest, least restricted data release and use, including across national borders, to the extent possible," reads the Human Cell Atlas policy, finalized in July and made public in August.
"This data release policy is an early step for helping the community rally around and understand and feel safe and confident in sharing," said Jonah Cool, science program officer at the Chan Zuckerberg Initiative (CZI), a key supporter of the Human Cell Atlas.
"It really sets community expectations and principles around supporting open access to the data and [sharing it] rapidly early on so as to help collective progress," Cool said. "It's a statement by the community that when data is shared early, it's free to be analyzed."
The Human Cell Atlas was launched in 2016 to create a reference atlas of all human cell types. The project is using single-cell genomics approaches to produce 3D maps of how different cells function together, and how changes in these networks can lead to disease. CZI has been providing funding to a number of groups contributing to the Cell Atlas, including $4 million in July to a University of California, Irvine-led research group to build a map of breast cells.
Underlying the HCA is the data coordination platform (DCP), an open-source effort to which the CZI technology team has been a central contributor. CZI Product Manager T.J. Chen called this platform a hub for storing, processing, and serving single-cell data to the research community.
Redwood City, California-based CZI primarily supports this data coordination platform through four institutions: Stanford University, the Broad Institute, the European Bioinformatics Institute, and the University of California, Santa Cruz. The organization is also supporting single-cell biology and data generation at laboratories in more than 20 countries, according to a CZI spokesperson.
"CZI was an early supporter of the Human Cell Atlas with a particular focus on challenges related to helping the distributed community store and analyze diverse data types," CZI Product Manager T.J. Chen added. "We support a community via both funding and collaborative contributions to the development of important technology."
To date, the DCP has handled or is analyzing about 25 single-cell transcriptomics datasets, according to Cool. He said that this collection, generated with different technologies at each participating institution, is "growing steadily."
Most of the data is considered early stage, providing preliminary analysis of tissue samples. However, "many of these datasets have been reused by groups developing or benchmarking new computational tools," Cool said.
This, he said, is setting the stage for meta-analyses of cross-institutional data in the future.
Cool said that CZI, founded by Facebook CEO Mark Zuckerberg and his wife, pediatrician Priscilla Chan, is helping the Human Cell Atlas build cloud infrastructure, scale the platform, and distribute data.
"We are becoming that vehicle in centralizing the data and metadata so that they can communicate and do harmonized analysis across these datasets as they're starting to come together and produce this data for the draft atlas," Chen said of the data coordination platform.
CZI is also working to improve data analysis and is developing open-source tools to that end. For example, the organization this year released CellXGene, an interactive application for exploring single-cell transcriptomic data, and highlighted it in a poster presented at the Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB) conference in Switzerland in July.
"The Human Cell Atlas is a platform to understand cellular mechanisms for many diseases," only some of which have known genetic risk factors, Cool said. "The DCP is a way for that community to centralize their efforts and their data and hopefully start to analyze and make progress toward those mechanisms."
He noted that the whole field of bioinformatics has been struggling with interoperability.
"In the early days of single-cell transcriptomics, there were many datasets generated from the cortex, for example," he said, adding that different laboratories might process similar data through their own pipelines.
"If you or I wanted to then go and reanalyze these data as a single large dataset and do some meta-analysis, bringing those datasets together because they've all been processed individually would be quite difficult," Cool said. "What the DCP has been focused on is that uniformity and helping the community progress and standardize," both in terms of file formats and in terms of workflow processes.
While the Human Cell Atlas is focused on single-cell transcriptomics and genomics today, Cool said that CZI envisions it being extended to imaging or epigenomic sequencing data in the future, if that is the direction the scientific community wants to go.
"We're really looking to see what the tissue and the HCA community really needs at the time. Right now, the need is much more focused on understanding what data people are collecting, the tissue types, as well as starting to help those people who are starting to integrate the data and evaluate the quality of it," Chen said.