A new computational method for quantifying copy-number aberrations (CNAs) and whole-genome duplications (WGDs) in bulk tumor sequencing data is presented in Nature Communications this week. Developed by scientists from Princeton University, the approach, called holistic allele-specific tumor copy number heterogeneity (HATCHet), is designed to jointly infer allele- and clone-specific CNAs and WGDs across multiple tumor samples from the same patient. It does so by globally clustering read-depth ratios and B-allele frequencies along the entire genome and across all samples, and by solving a matrix factorization problem to infer allele- and clone-specific copy numbers from all samples, the researchers write. HATCHet also addresses the two sources of ambiguity in copy-number deconvolution, the presence of subclonal CNAs and the occurrence of WGDs, using a model-selection criterion to distinguish between them. The team demonstrates HATCHet on both simulated and cancer data, showing that it outperforms six current state-of-the-art methods.
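Why WGDs create ambiguity can be illustrated with a simplified forward model of the kind such a factorization inverts: in each genomic bin, a sample's fractional copy number is the clone copy numbers weighted by the clone proportions. The Python sketch below is a minimal illustration, not HATCHet's code or exact model. The function name `expected_signals` is hypothetical, and it assumes equal-length bins, no noise, and a read-depth ratio taken as the fractional copy number divided by its genome-wide mean, a stand-in for the unknown sequencing-depth scale.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # (a, b): copies of the A and B alleles in one clone


def expected_signals(C: List[List[Clone]], u: List[float]) -> List[Tuple[float, float]]:
    """Expected (read-depth ratio, B-allele frequency) per bin for one sample.

    C[t][i] is the allele-specific copy number of clone i in bin t; u[i] is the
    proportion of clone i in the sample. Simplifying assumptions of this sketch:
    equal-length bins, no noise, and RDR = fractional copy number / genome-wide mean.
    """
    if abs(sum(u) - 1.0) > 1e-9:
        raise ValueError("clone proportions must sum to 1")
    # Fractional copy number per bin: copy-number matrix times proportion vector.
    frac = [sum(ui * (a + b) for (a, b), ui in zip(row, u)) for row in C]
    # Fractional copy number of the B allele per bin.
    b_frac = [sum(ui * b for (_, b), ui in zip(row, u)) for row in C]
    mean = sum(frac) / len(frac)
    return [(f / mean, bf / f if f else float("nan")) for f, bf in zip(frac, b_frac)]


# A pure tumor (one clone) before and after doubling every copy number, as a WGD would.
before = expected_signals([[(1, 1)], [(2, 1)], [(1, 0)]], [1.0])
after = expected_signals([[(2, 2)], [(4, 2)], [(2, 0)]], [1.0])
print(before == after)  # True
```

In this toy case the doubled genome yields identical read-depth ratios and B-allele frequencies, so the signals alone cannot separate the two explanations; choosing between such equally fitting solutions is the job the authors assign to their model-selection criterion.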
The same Princeton investigators also report in Nature Biotechnology this week an algorithm for inferring allele-specific and haplotype-specific copy numbers in single cells from low-coverage DNA sequencing data. Dubbed CHISEL, short for copy-number haplotype inference in single cells using evolutionary links, the method combines reference-based phasing methods with a new algorithm that phases short haplotype blocks across cells, the scientists write. This amplifies the weak signal from individual single-nucleotide polymorphisms (SNPs) into one strong enough to compute B-allele frequencies in genomic regions of modest size. CHISEL also phases allele-specific copy numbers across cells using an evolutionary model, deriving haplotype-specific copy numbers that give, for each cell, the number of copies of the alleles located on the same haplotype. CHISEL is demonstrated on 10 single-cell sequencing datasets of around 2,000 cells from two patients with breast cancer.
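The pooling idea can be sketched in Python. Assume each cell's reads have already been counted per reference-phased haplotype block within one genomic bin, so that the only unknown is how the blocks are oriented relative to one another. As a simple stand-in for CHISEL's cross-cell phasing algorithm, which this is not, the hypothetical `orient_blocks` pools counts over all cells and flips each block so that haplotype A carries the minority of reads. The hypothetical `cell_baf` then sums the oriented counts across blocks to estimate a per-cell B-allele frequency. Both function names and the heuristic are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# counts[cell][block] = (reads on haplotype A, reads on haplotype B), already summed
# over the SNPs of one reference-phased haplotype block inside a single genomic bin.
Counts = Dict[str, Dict[int, Tuple[int, int]]]


def orient_blocks(counts: Counts) -> Dict[int, bool]:
    """Hypothetical heuristic: flip[block] is True when A and B should be swapped
    so that, pooled over all cells, haplotype A is the minor haplotype."""
    pooled: Dict[int, List[int]] = {}
    for per_block in counts.values():
        for block, (a, b) in per_block.items():
            tot = pooled.setdefault(block, [0, 0])
            tot[0] += a
            tot[1] += b
    return {block: a > b for block, (a, b) in pooled.items()}


def cell_baf(per_block: Dict[int, Tuple[int, int]], flip: Dict[int, bool]) -> float:
    """Fraction of a cell's reads on the (oriented) haplotype A, pooled across blocks."""
    a_total = b_total = 0
    for block, (a, b) in per_block.items():
        if flip[block]:
            a, b = b, a
        a_total += a
        b_total += b
    n = a_total + b_total
    return a_total / n if n else float("nan")


# Two low-coverage cells: each block has only a handful of reads per cell.
counts = {"cell1": {0: (1, 0), 1: (0, 2)}, "cell2": {0: (2, 1), 1: (0, 1)}}
flip = orient_blocks(counts)
print(flip)  # {0: True, 1: False}
print([cell_baf(counts[c], flip) for c in counts])  # [0.0, 0.25]
```

Per block, each cell contributes only one to three reads, too few for a stable frequency; orienting blocks consistently is what allows their counts to be added into a usable per-bin estimate.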