NEW YORK (GenomeWeb) – The timeline of when HIV first entered the US has long been contentious. Although there has been evidence suggesting that it entered the country in the 1970s, genome analyses of strains from before 1980 had only been done on samples from Africa.
Now, however, a group led by researchers from the University of Arizona and the University of Cambridge has developed a sample prep technique similar to those used for analyzing highly degraded ancient DNA, and used it to extract, amplify, and sequence viral genomes from archival patient samples dating from 1978 and 1979. The researchers then placed those genomes within a phylogenetic tree, which showed that the strain that caused the epidemic in the US likely entered New York City in 1971 from the Caribbean.
The study, published today in Nature, provides evidence that the virus had been circulating in the US for around 10 years prior to the formal recognition of HIV/AIDS in 1981. The newly sequenced viral genomes display "a telltale pattern of extensive diversity in New York City, suggesting that was a key hub, and restricted diversity in San Francisco, suggesting that it was a later dispersal out of the hub," lead author Michael Worobey said in a press briefing.
In addition, the researchers sequenced a sample from a patient described as "Patient Zero" in Randy Shilts' book And the Band Played On about the AIDS epidemic. The sequence data illustrated that there was "neither biological nor historical evidence that he was the primary case in the US or for subtype B as a whole," the authors wrote.
In the study, the researchers analyzed samples from a cohort of men who had been part of an AIDS study in 1984, 378 of whom had been part of a hepatitis B virus (HBV) study that began in 1978. Previous work had found that 6.6 percent of those men were HIV seropositive. The researchers chose 33 of those samples and attempted to sequence them. In addition, they tested 2,231 samples from a second cohort of patients enrolled in an HBV study in San Francisco, finding that 3.7 percent were HIV seropositive. Of that cohort, they selected 20 for whole-genome sequencing.
Underscoring the difficulty of analyzing genomic material from archival samples, the researchers were ultimately able to get enough RNA and sequence data from only eight of the 53 samples they had chosen to analyze. The samples were highly degraded from being stored long term in formalin-fixed paraffin-embedded wax blocks. In addition, when RNA was recovered, it was often below the limits of quantification. "Initial attempts at amplification of reverse-transcribed viral RNA failed consistently and indicated that viral RNA survived in the 1970s samples only in short fragments," the authors wrote.
So, the researchers developed an approach they described as "jackhammering." Essentially, "we cover the whole genome with tiny overlapping fragments of nucleotides" in order to boost the odds of capturing the viral genome, Worobey explained.
The approach improved the researchers' ability to detect viral RNA and recover complete viral genomes from samples. They used large panels of primers to amplify many short fragments in separate pools, generating amplicons that overlapped between pools but not within the same pool. The researchers then ran a preliminary multiplex amplification step to concentrate the target RNA. "We want to make sure that if there is only one strand of virus RNA in the tube … you don't lose it," Worobey said. After that, they used reverse transcriptase to make cDNA, enriched for the target, and performed a final amplification step before sequencing.
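The pooling idea described above, in which amplicons overlap between pools but not within a pool, can be illustrated with a short Python sketch. This is not the authors' protocol: the function names, the two-pool layout, and the numbers (a genome of roughly 9,700 nucleotides, 100-nucleotide amplicons, a 60-nucleotide step) are assumptions chosen only to show the geometry.

```python
def tile_amplicons(genome_length, amplicon_length, step):
    """Return (start, end) intervals that tile the genome with overlapping windows.

    Hypothetical helper. Requiring amplicon_length / 2 <= step < amplicon_length
    means neighbouring windows overlap, while windows two steps apart do not.
    """
    if not (amplicon_length / 2 <= step < amplicon_length):
        raise ValueError("step must be at least half the amplicon length and less than it")
    amplicons = []
    start = 0
    while True:
        end = min(start + amplicon_length, genome_length)
        amplicons.append((start, end))
        if end == genome_length:
            break
        start += step
    return amplicons


def assign_pools(amplicons, n_pools=2):
    """Deal amplicons into pools in turn so that neighbours land in different pools."""
    pools = [[] for _ in range(n_pools)]
    for index, amplicon in enumerate(amplicons):
        pools[index % n_pools].append(amplicon)
    return pools


def overlaps(a, b):
    """True if two half-open intervals (start, end) share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


# Illustrative values only; none of these numbers come from the study.
amplicons = tile_amplicons(genome_length=9700, amplicon_length=100, step=60)
pools = assign_pools(amplicons)
print(len(amplicons), [len(pool) for pool in pools])
```

Keeping overlapping amplicons in separate reactions is a common design choice in tiled multiplex PCR, because primers for neighbouring fragments would otherwise interfere with each other in the shared region.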
The researchers were able to sequence and assemble eight HIV genomes — five New York City samples and three San Francisco samples. A phylogenetic analysis showed that these HIV genomes were the oldest ones to be sequenced outside of Africa, but still did "not fall on the deepest branches" of the evolutionary history of HIV, even among the dominant US epidemic strain called subtype B.
Looking at the diversity of the genomes, the researchers were able to determine with a probability of around 99 percent that HIV first arrived in the US in New York City before spreading to San Francisco and other cities in California and the rest of the US.
The study "highlights the importance of complete viral genomes from early archival specimens, carefully contextualized through historical analysis, without which this detailed picture of these early landmarks in the HIV/AIDS pandemic would not have been possible," the authors wrote.