Ronald Taylor of Pacific Northwest National Laboratory (PNNL) has published an overview of Hadoop, the popular open-source software framework that supports data-intensive distributed applications. Taylor's paper in BMC Bioinformatics looks at how the bioinformatics community has adopted Hadoop, with a specific focus on next-generation sequencing.
Hadoop is an open-source implementation of MapReduce, the programming paradigm Google developed for processing huge datasets, and it offers a cost-effective way to analyze data on commodity Linux clusters and in the cloud. Taylor also discusses some of the major open-source projects built on top of Hadoop, including Hive, a framework for ad hoc querying with an SQL-like query language, and Pig, a high-level data-flow language for batch processing of data.
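To make the MapReduce model a little more concrete, the following is a minimal sketch in Python, written in the style of a Hadoop streaming job but run entirely in one local process. The task is a toy one chosen for illustration and is not taken from Taylor's paper: counting 3-mers in short sequence reads. The names `mapper`, `reducer`, and `run_local` are hypothetical, and Python's built-in `sorted` stands in for Hadoop's shuffle-and-sort phase.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple

K = 3  # hypothetical k-mer length, chosen only for illustration


def mapper(lines: Iterable[str], k: int = K) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (k-mer, 1) pair for every k-mer in each read."""
    for line in lines:
        seq = line.strip().upper()
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k], 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum counts per key. Input must already be sorted by key,
    which is what Hadoop's shuffle-and-sort phase provides to each reducer."""
    for kmer, group in groupby(pairs, key=lambda kv: kv[0]):
        yield kmer, sum(count for _, count in group)


def run_local(lines: Iterable[str], k: int = K) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines, k))  # stand-in for Hadoop's shuffle and sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # In a real Hadoop streaming job, mapper and reducer would be separate
    # scripts exchanging tab-separated "key\tvalue" lines over stdin/stdout.
    reads = ["ACGTAC", "GTACGT"]
    for kmer, count in sorted(run_local(reads).items()):
        print(f"{kmer}\t{count}")
```

Because the reducer only ever sees one key's values grouped together, the map and reduce steps can run on many machines independently; that independence is what lets the framework scale out across a commodity cluster.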
The Magellan project, a joint research effort of the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory and the Leadership Computing Facility at Argonne National Laboratory (ANL), runs Hadoop and HBase, a non-relational distributed database, on a cluster at NERSC, and has used Hadoop in streaming mode for BLAST computations. NERSC is also evaluating Hadoop in combination with solid-state storage, a low-energy storage technology that the HPC community is exploring.
Taylor concludes that "for much bioinformatics work not only is the scalability permitted by Hadoop and HBase important, but also of consequence is the ease of integrating and analyzing various large, disparate data sources into one data warehouse under Hadoop, in relatively few HBase tables."
Studying for a dual degree in mathematics and molecular biology, Nick Tatonetti became interested in using computational models to study biology and make sense of its massive datasets. As a bioinformatics PhD student at Stanford, he developed new statistical models and computational approaches for analyzing drug effects and drug-drug interactions.
At Columbia, Tatonetti is now focusing on molecular mechanisms of drugs. "We can actually think of each time a patient is being given a drug as an experiment," he says. "When the drug goes into the human system, it interacts molecularly, and then phenotypes come out of this system," which can be connected to molecular mechanisms in new ways.
In particular, he is developing techniques that use clinical data to build networks highlighting interactions between different systems in the human body, such as two organs.
In Science this week: breast cancer protein linked to DNA replication, lung cancer signature, and more.
May 24, 2013
Papers of Note
High-resolution transcriptome maps reveal strain-specific regulatory features of multiple Campylobacter jejuni isolates Dugar, Herbig, et al. PLOS Genetics
The University of Würzburg's Cynthia Sharma and colleagues undertook a transcriptomics-based analysis of the gastroenteritis-causing bacterial species Campylobacter jejuni. The team used its so-called differential RNA sequencing strategy to sequence and compare the transcriptomes of four C. jejuni isolates (three from humans and one from a chicken), applying a new method to automatically annotate transcription start sites in each. "Overall," they write, "our study provides new insights into strain-specific transcriptome organization and [small RNAs], and reveals genes that could modulate phenotypic variation among strains despite high conservation at the DNA level."
The genome of the barley powdery mildew pathogen (Blumeria graminis f. sp. hordei, or Bgh) is composed of blocks of sequence that are either particularly rich or particularly poor in polymorphisms, according to a study by researchers from the Max Planck Institute for Plant Breeding Research. The team sequenced the genomes of two Bgh isolates from Europe and compared each to the barley powdery mildew reference genome. Each newly sequenced isolate contained a distinct combination of sequence blocks with high or low SNP densities, forming isolate-specific mosaic genomes that point to "exceptionally large standing genetic variation in the Bgh population," the study authors say. Meanwhile, their transcriptome sequencing experiments offered a look at the genes Bgh uses during attempted infection of barley or of immunocompromised Arabidopsis.
People on the Move
Kevin Hrusovsky is resigning his post at PerkinElmer as senior VP and president of the Life Science and Technology division. Hrusovsky will serve as a consultant to the company for up to one year, beginning in June. He joined PerkinElmer through the company's acquisition of Caliper Life Sciences, where he was CEO and president.
Hologic has appointed former Beckman Coulter head Scott Garrett to its board of directors, where he will serve on the corporate development committee.
Garrett is currently an operating partner with Water Street Healthcare Partners, a private equity firm. He spent 10 years at Beckman Coulter, where he was chairman, president, and CEO.
Gina Costa is now senior director of genomic applications at Illumina. She joins Illumina from Life Technologies, where she was senior director of genetic analysis, working on development of the Ion Torrent and SOLiD sequencing technologies. She has also held positions at Agencourt Bioscience and Roche's 454 Life Sciences.
Bioinformatics firm Golden Helix has hired Andreas Scherer as its new president and CEO. Scherer has managed large global software services businesses and began his executive career at AOL/Netscape. He replaces former CEO Christophe Lambert, who will take on the new role of company chairman.
An international team has sequenced the genome of the carnivorous bladderwort plant, Utricularia gibba. Their findings suggest that the carnivorous plant has ditched virtually all its non-coding DNA, retaining a set of sequences that's almost exclusively genic. "What that says is that you can have a perfectly good multicellular plant with lots of different cells, organs, tissue types and flowers, and you can do it without the ['junk' DNA]," said co-corresponding author Victor Albert.
Agilent Technologies announced a restructuring program expected to reduce its headcount by about 450 employees and save the company $50 million annually in operating expenses. CEO Bill Sullivan said that the focus of the restructuring will be on Agilent's Electronic Measurement Group and that the company will explore opportunities "to streamline our organization around the world." The firm also announced that its Q2 revenues were flat year over year.
The US Department of Energy's Joint Genome Institute has funded six new initiatives to develop technologies that will support research by JGI and its users in microbiology, metagenomics, and plant genomics. The projects will receive a total of around $3.5 million over the next two years under the Emerging Technologies Opportunity Program. Among the researchers receiving funding are Stephen Quake and Jay Shendure.