By Andrea Anderson
Australian researchers have demonstrated that high-throughput amplicon sequencing can be used to discern the plant and animal ingredients in sometimes-complex traditional Chinese medicine mixtures, pointing to the potential application of such technology for testing other natural remedies or wildlife samples in a regulatory setting.
As they reported online last week in PLoS Genetics, investigators from Murdoch University used the Roche 454 GS Junior to sequence barcoded amplicons generated from 15 traditional Chinese medicines, or TCMs, using primers targeting part of the so-called p-loop region of the plant plastid gene trnL and/or the animal 16S ribosomal RNA gene.
"In the past, in terms of genetics, people have gone in doing PCR using species-specific approaches to try and fish out a tiger or a rhino to say whether those particular species are within the DNA extracted from traditional Chinese medicines or other herbal remedies," senior author Michael Bunce, head of Murdoch University's ancient DNA research laboratory, told In Sequence.
"We took a different approach with this, with the advent of high-throughput platforms," he added, "and that was to look more holistically at the plants and animals within it, rather than going in with a specific targeted approach."
The method made it possible to ascertain animal and plant constituents in the TCMs at a consumables cost of around $35 per sample. That genetic audit uncovered examples of TCMs containing material from trade-restricted animals and potentially toxic or allergenic plants as well as, in some cases, ingredients not mentioned on available ingredient lists.
Although the patchiness of existing plant databases made it more difficult to classify plants from their trnL sequences than to determine the identity of vertebrate animals from 16S sequences, the group identified many plants in the TCM samples at the family and genus levels, even those found in complicated mixtures. And Bunce said he expects to get greater and greater resolution in the future as plant sequence databases continue to improve.
The researchers have plans to use the same sort of deep amplicon sequencing approach to define the constituents in a range of other natural products and environmental samples.
The work is expected to have implications in a broader wildlife forensic setting as well, Bunce explained, since the same approaches can be used to test any material suspected of containing material from legally protected species.
"One of the aims of this study was to determine the efficacy of [high-throughput sequencing-based] auditing approaches, specifically with the goal of screening additional samples whose constituents might need to be identified in cases involving illegal imports, food fraud, medicine fraud, and forensics," the study's authors wrote.
While others have applied amplicon sequencing to test for the presence of protected animal species in bush meat or canned meat samples, for instance, the authors of the new study take this product testing approach a step further with high-throughput sequencing, according to Seth Bybee, a post-doctoral researcher in Keith Crandall's laboratory at Brigham Young University. Bybee is using next-generation amplicon sequencing to explore questions related to insect phylogeny and gene family evolution.
"They make the case pretty soundly that this is a great, low-budget, cost-effective way to be able to analyze any of these types of samples that might contain endangered biota," Bybee told IS.
In his own lab, Bybee noted, researchers are working on ways to keep amplicon sequencing costs even lower by doing very small PCR reactions using primers generated in the lab rather than commercial fusion primers.
"It's something that we're hoping we can scale up quite a bit," he noted. "The plus side is that it can also be used in any small lab too."
For their proof-of-principle study, Bunce and his colleagues focused on a few dozen TCM samples seized by Australia's customs and border protection service at the country's air and seaports.
Such TCMs were of interest to the team not only because of concerns centered on the use of endangered or protected animals in some of the concoctions, but also because of product safety or labeling issues associated with any incompletely characterized supplement.
There is worry, for instance, over the possible presence of toxins, undeclared allergens, or drugs found at undisclosed concentrations in TCMs, researchers explained, with past chromatographic analyses turning up plant toxins or heavy metal contaminants in some TCM samples.
To look at the feasibility of genetically auditing the constituents of TCMs, the researchers first attempted to extract DNA from 28 TCM samples, including pills, powders, bile flakes, and herbal teas.
Using quantitative PCR assays with primers targeting the plant trnL gene or the animal mitochondrial 16S rRNA gene, they narrowed in on 15 of the samples that most readily yielded sufficient levels of high-quality DNA.
"We've been very careful in this study when extracting DNA to make sure it doesn't have co-purified inhibitors within it that can mess up our amplification and that it has enough template molecules to actually do something meaningful with," Bunce said.
DNA from these samples was then amplified using trnL or 16S primers designed to introduce a unique molecular identifier, or MID, tag to the end of each amplicon.
Rather than dropping the resulting amplicon into a vector and sequencing the insert, which was standard practice in the heyday of cloning-based sequencing, researchers multiplexed the barcoded samples together on the GS Junior, an approach that lets them sequence between 50 and 100 samples simultaneously.
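The multiplexing step depends on sorting pooled reads back into their samples by MID tag. A minimal Python sketch of that idea follows. It is not the team's pipeline: the tag sequences, sample names, tag length, and exact-match rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical MID tags and sample names, made up for illustration only.
MID_TO_SAMPLE = {
    "ACGAGTGCGT": "TCM-001",
    "ACGCTCGACA": "TCM-002",
}
TAG_LENGTH = 10  # assumed tag length; both tags above share it


def demultiplex(reads, mid_to_sample=MID_TO_SAMPLE, tag_length=TAG_LENGTH):
    """Group reads by the sample whose MID tag starts the read.

    Returns (bins, unassigned). bins maps each sample name to its reads
    with the tag trimmed off. unassigned collects reads whose leading
    bases match no known tag exactly.
    """
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = mid_to_sample.get(read[:tag_length])
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(read[tag_length:])
    return dict(bins), unassigned


# Arbitrary example reads: tag followed by a placeholder sequence.
reads = [
    "ACGAGTGCGTGGGCAATCCTGAGCCAA",
    "ACGCTCGACAGGGCAATCCTGAGCCAA",
    "ACGAGTGCGAGGGCAATCCTGAGCCAA",  # one mismatch in the tag, so it is set aside
]
bins, unassigned = demultiplex(reads)
```

Requiring an exact tag match is the strictest possible choice. It discards some usable reads but avoids assigning a read to the wrong sample.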
"Cloning is something that's exceptionally time consuming, expensive, and difficult to keep track of," BYU's Bybee noted. "There are a lot of challenges when you're doing a lot of cloning and this is a really nice way to get around that."
After read filtering and quality control steps, the team was left with nearly 50,000 reads of usable sequence each for 13 TCM samples tested for plant products and another 50,000 reads for nine TCMs tested for animal constituents.
"We've gone through and done quite a bit of editing on those reads in terms of quality control, in terms of throwing out reads that have got sequencing errors in the MID-tags or the primers, removing reads that are there at very low prevalence or aren't very well represented," Bunce said.
The team also tossed out human sequences that may have been introduced either in the lab or at some point during the manufacturing of the products.
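The filtering Bunce describes, discarding reads with errors in the primer and dropping poorly represented sequences, can be sketched in Python along the following lines. The primer string and the `min_count` threshold are placeholders, not values from the study, and the function assumes MID tags have already been removed.

```python
from collections import Counter


def filter_reads(reads, forward_primer, min_count=2):
    """Keep reads with an exact forward primer and a well-represented insert.

    reads: sequences with the MID tag already removed.
    Returns a Counter mapping each retained insert to its read count.
    """
    # Drop any read that does not begin with an error-free primer,
    # then trim the primer from the reads that remain.
    inserts = [
        read[len(forward_primer):]
        for read in reads
        if read.startswith(forward_primer)
    ]
    counts = Counter(inserts)
    # Drop inserts seen fewer than min_count times (low prevalence).
    return Counter({seq: n for seq, n in counts.items() if n >= min_count})


primer = "GGGCAATCC"  # placeholder primer, not taken from the study
reads = [
    "GGGCAATCCAAAATTT",
    "GGGCAATCCAAAATTT",
    "GGGCAATCCAAAATTT",
    "GGGCAATCCAAAATTA",  # singleton variant, dropped as low prevalence
    "GGGCTATCCAAAATTT",  # error inside the primer, dropped
]
kept = filter_reads(reads, primer)
```

Removing human sequences would be a separate step, carried out after reads have been matched against a reference database.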
From there, they compared plant and animal barcodes to data in GenBank, using the Metagenome Analyzer, or MEGAN, software (IS 1/30/2007) to analyze matching reads.
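MEGAN assigns reads with a lowest common ancestor approach: a read that matches several reference sequences equally well is placed at the deepest taxonomic rank those matches share. The Python sketch below is a simplified illustration of that principle, not MEGAN's implementation. The lineages are hand-written examples rather than database output.

```python
def lowest_common_ancestor(lineages):
    """Return the shared prefix of several root-to-tip lineages.

    Each lineage is a list of taxon names ordered from broad to specific.
    """
    if not lineages:
        return []
    shared = []
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# A plant read matching two species of the same genus equally well
# can only be assigned to the genus.
plant_hits = [
    ["Fabaceae", "Glycyrrhiza", "Glycyrrhiza glabra"],
    ["Fabaceae", "Glycyrrhiza", "Glycyrrhiza uralensis"],
]
plant_assignment = lowest_common_ancestor(plant_hits)

# An animal read with a single best match keeps its species-level label.
animal_hits = [["Bovidae", "Saiga", "Saiga tatarica"]]
animal_assignment = lowest_common_ancestor(animal_hits)
```

Under this scheme, sparse or ambiguous reference data pushes assignments up to the genus or family rank.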
In four of the TCMs, for example, the investigators saw reads corresponding to Asiatic black bear or Saiga antelope, which are classified as vulnerable or endangered by the Convention on International Trade in Endangered Species, or CITES.
The presence of material from these CITES-protected animals was not unexpected given their inclusion on the labels or ingredient lists for some of the TCMs. What was more surprising, though, was the identification of undeclared animal products.
For instance, one of the samples that was purported to be pure Saiga antelope horn powder also contained DNA from goats and sheep, while water buffalo, cow, and deer DNA turned up in products that did not include those animals in their ingredient lists.
In the 13 samples tested by trnL sequencing, meanwhile, researchers found representatives from 68 plant families, with several samples containing material from plants in the licorice root, mint, and/or Asarum genera.
That latter genus is a concern, researchers explained, because it contains plants that produce aristolochic acid, a carcinogen and renal toxin implicated in elevated urinary tract cancer risk in Taiwan in a recent study in the Proceedings of the National Academy of Sciences.
Another TCM tested contained plants from the Ephedra genus, which can be toxic outside of a narrow dosing range and have been banned by the US Food and Drug Administration for nearly a decade.
Despite their success in identifying plant families and genera from complex TCM mixtures, though, the researchers noted that there are limitations to confidently classifying these plant constituents at the species level, largely owing to gaps in the plant barcode databases.
"In this particular study, the animals were relatively straightforward [to identify]. We can tell what they are from about a mile off [by] looking at the DNA," Bunce explained. "The plants were more problematic."
A number of large DNA barcoding efforts and individual research studies are helping to flesh out such databases, he added, suggesting it should be possible to identify plants with increasing resolution and confidence in the future.
So, while he argued that such databases will need to improve before sequence-based product auditing moves forward wholesale, Bunce said this should not deter people from starting to do these sorts of experiments.
"Our ability to assign [the sequences we've generated] might be limited, but that will only improve over time," he explained. "Maybe if someone takes this dataset five years from now and reanalyzes it, they'll have a heap of better idea, at the species level, about what plants are in there."
The ability to identify individual plant species could also be improved by targeting longer sequences or a wider set of identifier genes, though the researchers are keen to keep the set as streamlined as possible to curb excess primer and sequencing costs.
"There's got to be an economy of scale in this somewhere," Bunce said. "Do you want to spend $10,000 on each medicinal product that you're auditing or do you want to spend $100 on it or $35 on it? Where is the cost-benefit in that and what level of resolution do we really need?"
He and his colleagues also cautioned against trying to quantify specific ingredients in TCMs based on DNA sequence abundance, since some genetic material may be degraded during product preparation or found at varying concentrations in the same amount of plant material.
There are hints from other studies that the approach can be applied quantitatively, Bunce noted, pointing to a 2011 study that he and other Murdoch University researchers published in PLoS One comparing GS Junior amplicon sequencing data with results generated by quantitative PCR as part of a study on little penguin diet.
That work suggested that the high-throughput sequencing approach could provide a fairly accurate quantitative picture of prey in the penguin's diet. Even so, that more quantitative approach requires extensive validation, according to Bunce.
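One reason such validation is needed is that small differences in how well a primer set amplifies each template compound over many PCR cycles. The Python sketch below uses a textbook exponential model, in which copies grow by a factor of (1 + efficiency) per cycle. The copy numbers, efficiencies, and cycle count are invented for illustration and do not come from either study.

```python
def expected_read_fractions(start_copies, efficiencies, cycles):
    """Expected share of amplicons per template after PCR.

    Assumes idealized exponential growth with no plateau:
    final copies = starting copies * (1 + efficiency) ** cycles.
    """
    final = {
        name: start_copies[name] * (1 + efficiencies[name]) ** cycles
        for name in start_copies
    }
    total = sum(final.values())
    return {name: amount / total for name, amount in final.items()}


# Two hypothetical species present in equal amounts before amplification.
fractions = expected_read_fractions(
    start_copies={"species_a": 1000, "species_b": 1000},
    efficiencies={"species_a": 0.95, "species_b": 0.85},
    cycles=30,
)
```

With these made-up numbers, the better-amplified template ends up with more than 80 percent of the amplicons despite the equal starting amounts.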
"Every primer set is subtly different; it has different biases," he said. "So it needs to be looked at on a case-by-case basis."
The study authors were also quick to point out that the amplicon-sequencing approach did not provide information on whether there were pharmaceutically active compounds in the TCMs tested, including the samples with Ephedra or Asarum DNA.
"The DNA tells us what the ingredient list is within there, what traces of DNA there might be," Bunce said. "It doesn't tell us anything about the activity of specific chemicals that might be of concern."
That type of information would require metabolomic studies to complement the information obtained by amplicon sequencing — something the team has now done for one of the Asarum-containing TCM samples described in the PLoS Genetics study.
"You kind of need a one-two punch to get a decent audit: one to look at the ingredient list and honesty in labeling and the other to look at the activity of certain compounds in there," Bunce said.
Although a similar approach can be done in conjunction with any of the high-throughput sequencing technologies available, Bunce noted that the read length offered by the GS Junior gives it a slight edge over some of the shorter read platforms such as Life Technologies' Ion Torrent, another platform that his lab is working with.
That's because read depth is not necessarily as informative as read length when sequencing the sorts of amplicons that he and his colleagues used to identify unknown plants and animals in the TCMs.
"For a plant product where we got 5,000 reads, for instance, getting 50,000 reads isn't going to tell us incrementally more information," Bunce explained. "So it's not just about coverage. Sometimes it's about the length of the read, sometimes it's about the quality of that read."
And because there is a significant investment in time and money associated with getting the primers needed to generate barcoded trnL or 16S amplicons on a given platform, he added, the team would be hesitant about jumping into wholesale studies using a different technology unless it offered some significant advantages over the GS Junior.
Long read lengths could be even more advantageous when dealing with samples with less DNA degradation such as fresh bush meat samples, Bybee noted. For their own high-throughput amplicon sequencing studies, he and his colleagues are relying on the Roche 454 Titanium platform.
For their part, Bunce and his group are now on the hunt for funding that would help expand their research beyond TCMs to look at additional traditional medicines and herbal remedies from other parts of the world.
The work is part of a broader set of studies that employ high-throughput amplicon sequencing for a number of environmental and quality control applications using samples ranging from ancient and modern sediments to bat and penguin feces.
Bunce predicted that it will likely take some time before next-generation amplicon sequencing catches on as a tool for law enforcement agencies or those regularly testing plant or animal products from a regulatory perspective. As the field matures, though, he said the approach appears promising for such applications.
"There are a huge number of applications for this sort of environmental auditing … in a number of industrial and enforcement areas," Bunce said. "It's early days yet, but it has tremendous potential and we do intend to try to help agencies or companies get going that can actually offer services in this area."