NEW YORK (GenomeWeb) – The bioinformatics coordination program of the US Department of Agriculture's National Animal Genome Research Program is soliciting feedback from the livestock genomics community to help chart its future course.
At the recently concluded Plant and Animal Genome conference in San Diego, James Reecy, an animal science professor at Iowa State University and co-coordinator of the program, provided an update on his group's activities and asked livestock genomics players for their help in prioritizing future projects.
More specifically, Reecy and his colleagues are currently accepting feedback about perceived gaps in the current informatics pipeline and about how addressing those gaps should be prioritized, he told GenomeWeb this week. Based on the suggestions submitted, the team will put together a survey and hopes to send it out to the community by the end of the month, pending approval by the Institutional Review Board at Iowa State, he said.
The bioinformatics coordination program operates under the umbrella of the NAGRP's National Research Support Project, NRSP-8, which launched in 1993 initially to coordinate US genome mapping efforts in cattle, sheep, swine, and poultry, with horses and aquatic species added more recently. NAGRP, which is part of the USDA's National Institute of Food and Agriculture, is one of two programs established by the 1990 Farm Bill in recognition of the potential of agricultural genomics.
NRSP-8 was set up, among other reasons, to facilitate communication among interest groups, maintain genomic maps, and establish databases for sharing information among stakeholders. It supports the activities of several collaborative research projects, including one aimed at using genetic and functional genomic approaches to improve pork production and quality, and one that explores gene function as it pertains to immune response in poultry.
For its part, the bioinformatics coordination program seeks to help the animal genomics community make use of available informatics infrastructure as well as effectively share, manage, and analyze information gleaned from genomics studies. Its largest resource to date, according to Reecy, is the Animal Quantitative Trait Loci (QTL) database, which was set up to aggregate publicly available trait mapping data, candidate genes and association data, and copy number variations that have been mapped to livestock genomes. The database currently holds QTL information for cattle, chicken, horse, pig, rainbow trout, and sheep, and the group is working to crosslink this information with data in QTL resources for human, mouse, and rat.
The bioinformatics group also works to provide computational resources that members of the community can use to analyze their data. The list of applications covers tasks such as sequence searching, associating genes with their GO terms, genetic linkage analysis of diploid species, designing gene-specific primers based on known gene structure and EST sequence information, gene ontology enrichment analysis, and more. Also available is a genome browser with annotated tracks for multiple fish species, cattle, chicken, horse, pig, sheep, and oyster. Active projects for the group include the development of an ontology for animal phenotypes and a clinical measurement ontology that's intended to standardize morphological and physiological measurement records generated from clinical and model organism research.
Furthermore, the researchers are working with members of the iPlant Collaborative on a variant calling pipeline that will reside and run on the iPlant infrastructure, allowing members of the livestock community to take advantage of the compute resources and storage available on that system, Reecy said. They've also developed resources that allow members of the community to share data such as variant call files, he said.
As much as possible, the group tries to use existing bioinformatics solutions, such as BWA-MEM, Platypus, SAMtools, and the Genome Analysis Toolkit (GATK), in its pipelines, Reecy said, optimizing them to work more efficiently with livestock reference assemblies, many of which, he said, are not as "pristine" as the human reference.
Lower-quality assemblies make tasks such as sequence alignment and variant calling extremely time-consuming. For example, simply running GATK's integrative genotyper out of the box on some animal genomes takes some 48 hours to align sequences and call variants, or roughly 24 hours per task, Reecy said. To shorten the time to results, the NRSP-8 group worked with members of the iPlant Collaborative to parallelize the analyses on the iPlant infrastructure, cutting the combined alignment and variant-calling time to around six hours. Other groups within NAGRP are also working to improve existing livestock reference assemblies and to develop SNP chips for genotyping the different species, he added.
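The article does not say how the analyses were parallelized on iPlant. One common approach, shown here only as an illustrative sketch, is to scatter the work by chromosome or contig across compute nodes and gather the results afterward. The Python below assumes, hypothetically, that runtime scales linearly with the number of bases processed and ignores scheduling and I/O overhead; the function names and the contig names and lengths in the usage example are made up for illustration.

```python
import heapq


def partition_contigs(contig_lengths: dict[str, int], n_workers: int) -> list[list[str]]:
    """Assign contigs to workers, longest first, always giving the next contig
    to the currently least-loaded worker (longest-processing-time heuristic)."""
    # Min-heap of (bases assigned so far, worker index)
    heap = [(0, i) for i in range(n_workers)]
    heapq.heapify(heap)
    bins: list[list[str]] = [[] for _ in range(n_workers)]
    for name, length in sorted(contig_lengths.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (load + length, idx))
    return bins


def estimated_wall_clock(serial_hours: float, contig_lengths: dict[str, int],
                         bins: list[list[str]]) -> float:
    """Idealized wall-clock time: the most heavily loaded worker sets the pace,
    assuming runtime is proportional to bases processed (an assumption)."""
    total = sum(contig_lengths.values())
    max_load = max(sum(contig_lengths[c] for c in b) for b in bins)
    return serial_hours * max_load / total


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), for illustration only
    contigs = {"chr1": 100, "chr2": 80, "chr3": 60, "chr4": 40,
               "chrUn_1": 10, "chrUn_2": 10}
    bins = partition_contigs(contigs, 2)
    print(bins)  # [['chr1', 'chr4', 'chrUn_1'], ['chr2', 'chr3', 'chrUn_2']]
    print(estimated_wall_clock(48.0, contigs, bins))  # 24.0
```

Under this idealized model, the eightfold reduction Reecy described (from 48 hours to about six) is what perfectly balanced work spread over eight nodes would yield. In practice, a few very long chromosomes alongside many small unplaced scaffolds, which draft assemblies often contain, limit how evenly the work can be split.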
Through its survey, the NRSP-8 bioinformatics group hopes to get a clearer sense of what its immediate next steps should be, although it already has some idea of the most pressing needs. "One of the biggest gaps right now is in terms of raw data and the systems biology integration of that data ... [to] get from genotype to phenotype," Reecy said. Basically, what is needed is "the equivalent of the ENCODE data but for livestock species." To that end, NRSP-8 is involved in an international initiative called the Functional Annotation of Animal Genomes (FAANG) consortium, which has taken up the task of identifying all functional elements in multiple domesticated species.
Having access to this functional data and being able to combine it with existing genomic information will make it possible to run more kinds of evolutionary analyses than are currently possible with existing resources, Reecy said. It would also improve researchers' ability to predict early on which animals are, for example, genetically predisposed to be healthier than others or to produce more nutritious meat and milk products, he added.
"Our guess is that most of [the survey responses are] going to be headed towards the systems biology integration of the different omics platforms in order to better explain the variation of the phenotype that's there," he said, adding that the group still wants to do the survey "to make sure that we are not missing something obvious."