False positives complicate ancient pathogen identifications, but only if you are naive and arrogant
I came across this piece published in BMC Research Notes a few weeks ago, but have only just found time to comment on it: http://www.biomedcentral.com/1756-0500/7/111
- Michael G Campana, Nelly Robles García, Frank J Rühli, Noreen Tuross. False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing. BMC Research Notes 2014, 7:111. doi:10.1186/1756-0500-7-111
I cannot say that I am too happy with the tone of its comments on our recent publication on the metagenomic recovery of a TB genome from mummified remains (http://www.nejm.org/doi/full/10.1056/NEJMc1302295):
Additionally, a recent study by Chan and colleagues claiming the identification of multiple strains of pathogenic tuberculosis (Mycobacterium tuberculosis) through non-targeted metagenomic sequencing has demonstrated insufficient analytical rigor to support their conclusions. The authors aligned their sequences against a single strain of pathogenic tuberculosis, but did not account for misalignments or environmental contamination with ubiquitous soil mycobacteria. Chan and colleagues’ data merit reanalysis with appropriate environmental controls. We recommend that the authors of these three studies demonstrate the veracity of their findings using a targeted capture approach and further bioinformatic analysis.
I guess working at Harvard makes people prone to academic arrogance! Perhaps there is also a whiff of sour grapes: they couldn't find any pathogens in their samples by metagenomics, so we can't have done so either! But dealing with the substance of the comments is easy enough. And ironically, I agree entirely with their earlier comments that these two papers are highly suspect:
(NB they are misreferenced in this paper).
OK, so let's take their points on our study one by one...
- The authors aligned their sequences against a single strain of pathogenic tuberculosis, but did not account for misalignments or environmental contamination with ubiquitous soil mycobacteria.
- Mycobacterium tuberculosis is a genetically monomorphic species, so there is not much to be gained by aligning against multiple strain genomes. But we did also compare the SNP profiles of our genomes with the recent close relative 7199/99. In the standard filtering that we employed, SNPs with unusually low or high coverage or with low mapping scores were removed, thus avoiding problems with repetitive DNA. The fact that the majority of the mixed SNPs matched those of 7199/99 and H37Rv confirms that they are real. Plus, we did discuss the presence of environmental Actinobacteria in the metagenome in the Supplementary Material, where we report the presence of a Nocardia sp. at around 200X coverage and of a relative of Thermobifida fusca at around 10X coverage. We binned contigs according to Z score and coverage to avoid mixing up reads from different species. And we obtained deep and even coverage of the M. tuberculosis genome, which cannot be accounted for by misinterpretation of matches to environmental species. We have seen such spurious matches in some analyses, but they appear only when a low-stringency approach is applied to mapping and are obvious because they show spiky coverage limited to conserved regions (e.g. rRNA genes) rather than coverage across the whole genome.
- Chan and colleagues’ data merit reanalysis with appropriate environmental controls.
- And what might these controls be? We have analysed a piece of lung tissue from mummified remains that came from a casket rather than from the soil. As detailed in previous papers (http://www.ncbi.nlm.nih.gov/pubmed/?term=12576588+12541332+18399990), rigorous efforts were taken to avoid contamination during sampling and storage. We have never grown M. tuberculosis or sequenced TB genomes in the lab. Where else could the M. tuberculosis DNA have come from other than the sampled individual?
- We recommend that the authors of these three studies demonstrate the veracity of their findings using a targeted capture approach and further bioinformatic analysis.
- There is some faulty logic here. We are indeed contemplating using a capture-based approach to increase the sensitivity of our analyses, but this will do nothing for the specificity of the approach, since any contaminating sequences that map to the pathogenic reference strains in silico are likely to be captured in vitro anyway because of their similarity to the bait. The answer is instead to increase the stringency of mapping and look for a consilience of results from multiple sources of evidence (e.g. evenness of coverage, SNPs that allow an assignment within an established clade), which we have done.
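For readers who want to see what the evenness-of-coverage and SNP-filtering arguments look like in practice, here is a minimal Python sketch. It is an illustration only, not the pipeline used in Chan et al: the function names, the window size and every threshold are hypothetical, and the input is assumed to be a plain list of per-base read depths along the reference (the sort of thing a depth report from a mapping tool would give you).

```python
from statistics import mean, pstdev


def coverage_summary(depths, window=1000, min_depth=1):
    """Summarise per-base depths as breadth and windowed evenness.

    `window` and `min_depth` are illustrative defaults, not published values.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    # Fraction of reference positions covered at least min_depth times.
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    # Mean depth in consecutive, non-overlapping windows.
    window_means = [mean(depths[i:i + window])
                    for i in range(0, len(depths), window)]
    overall = mean(window_means)
    # Coefficient of variation: low when coverage is even, high when spiky.
    cv = pstdev(window_means) / overall if overall > 0 else float("inf")
    return {"breadth": breadth, "cv": cv}


def filter_snps(snps, min_depth, max_depth, min_mapq):
    """Keep SNP calls whose depth and mapping quality fall within bounds.

    Each SNP is assumed to be a dict with 'depth' and 'mapq' keys.
    """
    return [s for s in snps
            if min_depth <= s["depth"] <= max_depth and s["mapq"] >= min_mapq]


# Two toy 10 kb "genomes" with the same mean depth (20X) but very
# different profiles: one even, one with a single spike in a conserved region.
even = coverage_summary([20] * 10000)
spiky = coverage_summary([0] * 9000 + [200] * 1000)
```

In this toy example both profiles have the same mean depth, but the even one has full breadth and no variation between windows, whereas the spiky one covers a tenth of the reference and has a high coefficient of variation. Mean depth alone would not tell the two apart, which is why evenness is worth checking alongside SNP-level evidence.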
We are continuing to perform metagenomics on mummified material from Vác and on other historical samples and will be publishing additional studies in due course. This is an exciting area of research and one does have to be careful in interpretation, but our findings stand firm. Anyone who wants to repeat the analyses we reported in Chan et al is welcome to do so. The reads are available here: http://www.ncbi.nlm.nih.gov/sra?LinkName=pubmed_sra&from_uid=23863071
But I am afraid I agree with Campana et al when they criticise the other two papers: Thèves et al because, inter alia, you cannot tell Shigella from E. coli by 16S, and Khairat et al because no sequence data is available in the public domain. Caveat lector! But enjoy the excitement of progress in this field (see also http://www.ncbi.nlm.nih.gov/pubmed/?term=24708363+23765279).