NEW YORK – As one research team presents findings from a complete, newly assembled version of the repeat-rich Y chromosome, another group has started to tally the structural and genomic variation that exists across the sex chromosome from one individual to the next.
For the first of the studies, published in Nature on Wednesday, members of the Telomere-to-Telomere (T2T) consortium turned to a combination of Pacific Biosciences HiFi long reads, Oxford Nanopore ultralong reads, Illumina short reads, Strand-seq, advanced assembly approaches, manual assembly, and bioinformatics methods to put together a gap-free Y chromosome assembly spanning nearly 62.5 million base pairs.
"The Y chromosome plays critical roles in sexual development and fertility, but it is also one of the most repetitive and difficult-to-sequence chromosomes of the genome," senior and corresponding author Adam Phillippy, NHGRI researcher and T2T consortium leader, explained in an email. "For this reason, it was the last of the 24 human chromosomes to be completely read and was often overlooked in prior genomic studies."
When the team annotated the sequence with the help of RNA isoform sequence data from B lymphocyte cells and induced pluripotent stem cells, together with published gene expression data representing several data types, it found 30 million bases of sequence and 41 predicted protein-coding genes not reported in the past. Together, the newly added sequences made up more than half of the diminutive chromosome's total sequence.
Along the way, the investigators tracked down previously unappreciated regulatory circuits and rearrangements on the sex chromosome, while highlighting alterations linked to male infertility and other conditions.
"Mapping these previously unknown regions of the genome is only the first step on a long journey towards understanding how it works," Phillippy added, noting that newly completed Y chromosome sequences "will be of great interest to researchers studying sexual development, fertility, genealogy, certain cancers, evolution, [et cetera], all of which have fascinating connections to the Y."
The Y chromosome has traditionally been difficult to fully sequence and assemble because it is riddled with repeats, ranging from palindromic sequences that read the same forward and backward to highly repetitive satellite DNA sequences spanning large noncoding stretches.
Although it is best known for its contributions to male development and fertility, the chromosome has also been implicated in other traits and conditions, while prior studies suggest that sex development depends on genetic contributions from other parts of the genome.
The latest work complements past T2T efforts to close human chromosome gaps using data generated for a hydatidiform mole that lacked a Y chromosome, work that was reported in Science in 2022.
Among the insights already obtained from the completed Y chromosome, the researchers pointed to palindromic repeat-related deletions that appear to coincide with azoospermia, a form of male infertility marked by sperm-free ejaculate.
"The azoospermia factor region on the Y chromosome contains an unusual number of these palindromes," Phillippy said, noting that such structures "are typically unstable, but it seems that natural selection favors them on the sex chromosomes."
The team also tracked down a multi-copy "gene array" for the sperm production-related gene TSPY, which was present in around 10 to 40 copies in different Y chromosome carriers, potentially helping to boost expression of the gene during spermatogenesis.
The new Y chromosome reference sequence further pointed to person-to-person variation in the location of a gene known as TSPY2. Because of sequence swapping between repetitive regions on the Y chromosome, the gene turned up at one of two distinct locations, depending on the individual profiled.
In a related Nature paper, members of the Human Genome Structural Variation Consortium turned to long-read sequencing, Strand-seq, Bionano Genomics optical mapping, and other approaches to compare Y chromosome sequences from 43 diverse individuals, unearthing an overrepresentation of large inversions and distinct mutation rates in male-specific sequences, among other recurrent variations.
"With these new human Y chromosome studies, we (as a scientific community) now finally have the complete human genome reference sequence and we can incorporate the genetic variants that we found on the Y chromosome to … association studies," co-senior and corresponding author Charles Lee, director and professor at the Jackson Laboratory for Genomic Medicine, explained in an email, noting that the work fits with his team's longstanding interest in structural genomic variation.
Prior to the study, Lee added, he speculated that it would be possible to pick up a wide range of structural genomic variants by analyzing sequences from many unrelated individuals — a prediction that panned out in the new study.
In particular, the researchers were able to identify repetitive or otherwise complex parts of the Y chromosome that tend to be either variable or conserved, he explained, while using the sequence data to retrace roughly 183,000 years of human evolution.
"Whereas both the GRCh38 … and the T2T Y assemblies represent European Y lineages, half of our Y chromosomes constitute African lineages and include most of the deepest-rooted human Y lineages," the authors wrote, adding that the "newly [assembled] dataset of 43 Y chromosomes therefore provides a more comprehensive view of genetic variation, at the nucleotide level, across over 180,000 years of human Y chromosome evolution."
At sites surrounding a highly heterochromatic sequence block at the chromosome region Yq12, for example, the team saw inversions involving the DYZ1 and DYZ2 repeat units, which tended to have consistent one-to-one ratios despite their variable and complex organization.
Such findings point to an "as-of-yet unknown, functional significance for this chromosome region," Lee suggested.
Likewise, he noted that past research has linked Y chromosome loss to bladder cancer risk, prognosis, and response to checkpoint immunotherapy, hinting at the possibility of using the complete Y chromosome reference sequence and Y chromosome variation data to home in on the specific sequences contributing to such features.
"Ultimately, the ability to effectively assemble the complete human Y chromosome has been a long-awaited yet crucial milestone towards understanding the full extent of human genetic variation," authors of the study explained, "and also provides the starting point to associate Y-chromosomal sequences to specific human traits and more thoroughly study human evolution."