November 18, 2015

Thermal age, cytosine deamination and the veracity of 8,000 year old wheat DNA from sediments

You may recall these two earlier blog posts:

Well, this story has taken an unfortunate turn recently, in that a group from the Max Planck Institutes in Tübingen has contested our finding of wheat in the British Isles 8kya, essentially arguing that the results are too good to be true!

At the heart of their argument is the assumption (almost dogma) that DNA ages in a certain predictable way (through cytosine deamination) and these changes can be used to determine the age of DNA. As they could not detect the signatures of DNA damage in our wheat sequences, they have jumped to the conclusion that the wheat sequences must represent modern contamination.
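
For readers unfamiliar with what a "signature of DNA damage" looks like in practice, here is a minimal sketch of the usual idea: count how often a cytosine in the reference is read as a thymine at each position from the 5-prime end of the reads. The function name, the toy sequences and the assumption of gap-free alignments are all hypothetical and purely illustrative; this is not the pipeline used by either group.

```python
from collections import defaultdict


def c_to_t_by_position(pairs, max_pos=10):
    """Fraction of reference C sites read as T, per position from the 5' end.

    `pairs` is an iterable of (reference, read) strings that are assumed to be
    aligned without gaps and written 5' end first (an illustrative assumption).
    """
    c_sites = defaultdict(int)   # reference C sites seen at each position
    c_to_t = defaultdict(int)    # of those, how many were read as T
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                c_sites[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    # Only positions with at least one reference C are reported.
    return {pos: c_to_t[pos] / c_sites[pos] for pos in sorted(c_sites)}


# Toy data, invented for illustration only.
toy_pairs = [
    ("CATGC", "TATGC"),
    ("CGTAC", "TGTAC"),
    ("CCTGA", "CCTGA"),
    ("ACGTC", "ACGTC"),
]
profile = c_to_t_by_position(toy_pairs)
for pos, freq in profile.items():
    print(f"position {pos}: C->T frequency {freq:.2f}")
```

In heavily damaged DNA the frequency is typically elevated at the first few positions and falls away further into the read; a flat profile is what the critics interpreted as evidence of modern DNA.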

However, this doesn't take into account the environment in which the DNA has been stored: the submerged sediments have effectively been kept in a refrigerator for the last 8000 years, because the ambient temperature for such sediments is only ~4 °C. The argument here is a bit like buying two loaves of bread on the same day, putting one in the bread bin at room temperature and the other in the fridge or freezer, coming back a couple of weeks later to find that only the loaf in the bread bin has gone mouldy, and then concluding that the two loaves couldn't possibly have been bought on the same day!
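
To put a rough number on the fridge analogy, here is a back-of-the-envelope sketch using the standard Arrhenius relation, k = A·exp(−Ea/RT), to compare reaction rates at two temperatures. The activation energy below is an illustrative assumption, not a value taken from the manuscript, and 20 °C simply stands in for "room temperature"; the function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def relative_rate(temp_c, ref_temp_c, activation_energy_j_mol):
    """Ratio k(temp) / k(ref_temp) under the Arrhenius equation.

    The pre-exponential factor cancels, so only the activation energy
    and the two temperatures (in degrees Celsius) are needed.
    """
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return math.exp(-activation_energy_j_mol / R * (1.0 / t - 1.0 / t_ref))


# ASSUMPTION: 120 kJ/mol is an illustrative activation energy only.
ASSUMED_EA = 120_000.0

ratio = relative_rate(4.0, 20.0, ASSUMED_EA)
print(f"Rate at 4 C relative to 20 C: {ratio:.3f}")
print(f"That is roughly {1.0 / ratio:.0f}-fold slower in the 'fridge'")
```

The exact figure depends entirely on the assumed activation energy, but the qualitative point holds for any thermally activated reaction: a modest drop in temperature can slow chemical damage many-fold.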

However, in addition to problems with the substance of the arguments, there have been problems in the way in which they have been made. They have been published in eLife, a fairly new open-access peer-reviewed journal, sponsored by the Max Planck Society:

The fact that the journal is sponsored by the Max Planck Society may or may not mean that authors from Max Planck Centres get an easier ride through peer review: judge for yourself as eLife publishes the reviews and decision letter.

But more problematic is that eLife, despite all its fanfare about being a revolutionary new open-access journal, has not given us any right to reply to this publication, even though it is clearly a polemical piece aimed at discrediting our work. Oddly, Science, the journal we published in, also declined to let us publish a response. Luckily, given the old Internet adage that "information wants to be free", we have alternatives!

So, I am pleased to announce the appearance of this manuscript on bioRxiv, the preprint server for biology, and would ask you to read it, comment on it, Tweet it and Like it!

The manuscript goes far beyond a simple rebuttal to encompass an analysis of 148 palaeogenomic data sets to show that the rate of cytosine deamination is a thermally correlated process and that organellar DNA generally shows higher rates of deamination than nuclear DNA in comparable environments. In addition, we argue that the PCR enzyme used in our sedaDNA study would not have had the capability to report 5-prime cytosine deamination, so absence of this feature is to be expected.

Robin Allaby has worked extremely hard to prepare this manuscript and get it up there on bioRxiv. However, I have suggested to him that the work merits eventual publication in a peer-reviewed journal. Who knows, eLife might even take it! Watch this space!! And read and Tweet the manuscript!

September 07, 2015

Background reading on BBC Horizon's First Britons

A couple of weeks ago, the BBC Horizon programme First Britons featured Vince Gaffney talking about the collaborative work that he, Robin Allaby and I published in Science earlier in the year. You can still pick up the programme for the next 12 days on the BBC website via iPlayer and it is likely to be accessible via YouTube for some time to come if you try searching with this link:

I have already written a blog entry providing the story behind the paper. However, I am a bit frustrated that Horizon hasn't provided any bibliography or links to the entire body of exciting work presented in the programme. It really wouldn't take much for them to bridge the gap between journalism and scholarship! So, I have quickly pulled together material here that people can use to follow up on the programme:

April 30, 2015

Mark Achtman elected as Fellow of Royal Society

Warwick Medical School’s Professor Mark Achtman has today been elected a Fellow of the Royal Society.

The Royal Society, founded in the 1660s, is a fellowship of the most eminent scientists in the United Kingdom and the Commonwealth countries. Fellows are elected for life through a peer-review process on the basis of excellence in science.

Born in Poland during WWII, Achtman emigrated several years later to Canada. He gained a BSc from McGill University and then an MSc from the University of Manitoba, before moving to Berkeley, California to complete a PhD. After a short spell in Edinburgh, Achtman moved in 1971 to Germany, where he worked for more than 35 years, before moving to University College Cork in 2007. Professor Achtman was recruited to Warwick in 2013 to a Chair in Bacterial Population Genetics within the newly formed Division (now Unit) of Microbiology and Infection in Warwick Medical School.

Professor Achtman has made important contributions to many fields within bacteriology—his long-time collaborator Paul Keim describes Achtman as a “force of nature” for his determined and insightful approach to scientific problems. In the 1960s and 1970s, Achtman investigated the molecular mechanisms of bacterial sex, characterising the Escherichia coli F sex factor. In the 1980s, his attention turned first to neonatal meningitis caused by E. coli—prompted by a sabbatical with John Robbins in Bethesda, MD—and then on to the molecular epidemiology of epidemic meningococcal meningitis, including field work in Africa. This work not only informed vaccine design, but also paved the way for Achtman’s next major contribution to bacteriology: the use of sequence-based approaches to unravelling the population structure, genealogical history and spread of pathogenic bacteria.

By the 1990s, bacteriologists had generated a plethora of molecular fingerprinting approaches to bacteria. In a classic paper from 1996, Achtman wittily berated these approaches as YATMs – an acronym for “Yet Another Typing Method”. Two years later, Achtman was corresponding author on a pioneering paper on multi-locus sequence typing (MLST), which changed the focus of bacterial typing from fingerprinting to sequencing DNA and which has become a citation classic. This approach was readily adopted across the globe, providing a laser-sharp tool for dissecting the evolution, population structure and spread of almost every bacterial infection.

Building upon the concepts of MLST and the increasing tractability of bacterial whole-genome sequencing, Achtman then went on to develop sequence-based phylogenomic analyses based on single-nucleotide polymorphisms (SNPs). With his usual relentless energy, he has used these approaches to drive forward our understanding of the evolution and spread of bacterial pathogens, most recently focussing on Salmonella.

Although his sequence-based approaches to the epidemiology of infection clearly fall within the remit of translational medical research, Professor Achtman has also used his remarkable research talents to attack some of the big questions about human existence—for example, using DNA sequences from a bacterium that lives in our stomach, Helicobacter pylori, to work out how humans spread out of Africa to people the world, or exploring the origins, evolution and spread of the bacterium that causes the plague, including the Black Death pandemic which killed around a third of the inhabitants of Europe in the fourteenth century.

Professor Mark Pallen, who recruited Achtman to Warwick, says:

“I am thrilled that Mark has been elected as a Fellow of the Royal Society. He clearly deserves this recognition for his pioneering and wide-ranging contributions to our discipline. This award amply vindicates our recent investments in microbiology and infection research at Warwick.”

Access Mark Achtman's astonishing publication output on Google Scholar:

April 20, 2015

Grand opening of our MRC CLIMB microbial bioinformatics facility at Warwick

On Friday 17th April, we had great fun holding an opening event for our £8.4m MRC-funded CLIMB project and facility at Warwick Medical School.

[For those of you who wish to know more about the CLIMB (cloud infrastructure for microbial bioinformatics) project, take a look at our web site: or watch Tom Connor's talk on YouTube:]

You can access a YouTube video of the first half of the event here:

The event kicked off with me providing a brief introduction to the project and stressing the achievements so far:

  • spending millions of pounds on bioinformatics infrastructure within a single financial year (£3.7m on computers across the four participating universities; £0.7m on building work at Warwick and >£1m at Swansea)
  • getting all the procurement, purchase orders, invoices, building etc through our university systems
  • recruiting excellent support staff and academic staff to the project
  • getting the building work finished (an end to all that noise!)

I then introduced our three guest speakers:

  • Stanley Falkow, godfather of the field of bacterial pathogenesis, skyping in from Stanford and immortalised in our glasswork with his quote "Never met a microbe I didn't like"
  • Randal Keynes, great-great-grandson of Charles Darwin
  • Jon Chase, aka OortKuiper, science rapper.

Stan appeared on our huge 95-inch screen like Big Brother in the iconic Apple ad!


Stan has kindly provided a summary of his words of support for the project and event:

I am glad to add to the chorus of those who celebrate the opening of the CLIMB, the Cloud Infrastructure for Microbial Bioinformatics at Warwick.

As you know, it’s an admirable enterprise that includes a consortium with Universities at Birmingham, Cardiff and Swansea, to permit a public as well as a private Internet resource, available to all in the UK.

How I envy you! You are the generation of scientists who have available to you software, data storage and bioinformatic expertise to be able to track and understand the epidemics of the past, as well as the contemporary realities of the dynamics of infection and transmission of infectious diseases. You have the ability to examine the genome of the offending microbe as well as the host - including us, the human.

I am reminded by the presence of Professor Mark Achtman in this gathering of a time in 1978 when he, as well as Gordon Dougan, who was until recently Group Leader in Pathogen Genomics at the Sanger Centre, were visitors in my laboratory in the University of Washington. We were at that time able to sequence 75 base pairs a day, and analyze the results on a Radio Shack computer with 16 K of RAM using a program I wrote in BASIC. I remember the thrill of my first ATG start codon! How far we have come in the past 37 years!

I can only dream about what you can accomplish in the next decade. Would that I will be here to learn of your accomplishments - of which, I may say, you can only dream and speculate about now! I always say, “The good old days are now!”

Good luck and God speed in the adventure that awaits you.

Randal then said a few words, linking our efforts to the legacy of Darwin:


I’m here today because when Mark Pallen told me about CLIMB, I was fascinated and impressed by the boldness of the plan - managing all these fresh kinds of information with Cloud power and flexibility for all the intriguing investigations you’re now working out how to tackle. Mark suggested I might say some things for Darwin at this gathering, and I felt at once that yes, if Darwin could be with us for this opening today, he’d be fascinated and impressed. No – excited, I realised. He’d see very clearly the opportunities CLIMB presents, and he’d sense the spirit of the whole venture. He’d be eager to hear about every plan and delighted to talk. I thought that if it might mean anything for you today to hear what I can say about this point, I’d be glad to tell you here and now, for Darwin’s sake as well as yours. And why that? Because this venture is one of so many continuations of what he started with The Origin of Species, and I feel that all who join in the great effort in ways like yours are joined with him and each other in it.

Hearing about CLIMB – the technical opportunity, the resources you have to seize it with, and all the lines of research you’re planning to use it for, I think of the chain of scientists whose ideas you are developing – Darwin as founding father and then Robert Koch, Ferdinand Cohn and Carl Woese among others since, each with his step forward. Each of them had some luck. He made it his opportunity and realised its potential. You and all your colleagues in the CLIMB collaboration have yours with CLIMB today.

Cloud Infrastructure for Microbial Bioinformatics. Wonderful words for an onlooker like me to understand, as they fit together so clearly and tightly for your purpose. Let me now touch on some points that I feel link Darwin with you.

The Evolutionary Factor

Mark Pallen took as his text for his Inaugural Lecture last year, Dobzhansky’s insistence that “Nothing in biology makes sense except in the light of evolution.” Looking then more closely at his Department’s special concerns, Mark showed how true that is for microbiology as a specialism. Looking yet closer today at CLIMB, he’s made it clear to me how central Dobzhansky’s point is for so much of microbial bioinformatics. One key point, it seems to me, is the speed of reproduction in microbiology with the extent of variation and selection in the process, and the significance of the changes for other organisms. It’s a truism how difficult we find it to observe evolution of macroorganisms; there are the few well-known examples, but for all other species it’s like ‘grandmother’s footsteps’ with the stillness whenever you look behind you for any movement. But microbial genomics can be a helter-skelter ride. In this area, excitingly, the often bucking process of change is a key concern, a central issue that we need to understand.

The Tree of Life

Darwin wrote in The Origin of Species that relationships between all species of the same kinds “have sometimes been represented by a great tree.” The ‘Tree of Life’ no less. He then wrote carefully, “I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during each former year may represent the long succession of extinct species. [In this way] the great Tree of Life … fills with its dead and broken branches the crust of the earth, and covers the surface [above] with its ever-branching and beautiful ramifications.” It’s important to recognize here that the essential point for Darwin when he wrote “I think” in his secret notebook and drew his first branching diagram, and for CLIMB now, was and is surely not the huge trunk of the great tree but quite simply those “ever-branching and beautiful ramifications”; Darwin’s “endless forms, most beautiful and most wonderful, [which] have been, and are being evolved”, as he ended his final sentence of The Origin of Species. “Are being evolved.” With those last three words Darwin placed his final emphasis on the ever-continuing process that is central to what CLIMB will be all about.

Big Data

Four petabytes of data storage and 78 terabytes of total RAM. The need for such remarkable capacity and power follows directly from the understanding of evolution that has stemmed from Darwin’s writings with its randomness and variation, its endless proliferations and the whole global diversity of life in which microorganisms indeed excel over macroorganisms. Hearing your talk of ‘big data’ I think at once of the number of data mountains Darwin had to climb through his working life to gain an adequate understanding of the topics he was having to tackle over the range of life on earth to make sense of the factors involved in their global dimension. Especially, he would have said, the eight years he had to spend dissecting barnacles in order to prove himself to be a competent taxonomist, so that he’d be able to write as he wanted to, on species around the world and through geological time, and gain any serious attention for his views on what he found to explain their relations. For the time he took, we should remember his poor son who when visiting a friend at the age of ten was shown around the house and asked ‘Where does your father do his barnacles?’ From Darwin’s understanding of the endless variety of natural life and the infinite complexity of all the interactions it involves, he would appreciate at once why CLIMB is focussing so sharply on the scale of data storage and processing capacity it’s able to provide.

Value for Medicine

When Mark Pallen first explained the project to me, I saw at once its great interest for pure science, but when he explained about some of the investigations it is to be used for and mentioned the MRC funding, only then did I really take in its significance for medicine. Picking up on its potential for work on hospital infections and Antimicrobial Resistance, I remembered at once a point about Darwin. Not an achievement of his but his reaction to one of another person, how quickly he recognised its value for medical treatment and how strongly he felt about that value.

I have to sketch in some background. When Darwin was working on The Origin of Species, his first daughter, Annie, then ten years old, fell ill, probably with TB. He was devoted to her; he did all he could to save her life, caring for her night and day in her last illness; he was devastated by her loss, and he was deeply shaken by the doctors’ inability to identify, understand and treat her illness. Twenty five years later, in 1877 as Louis Pasteur was making the case for the germ theory of infection, a close scientific friend of Darwin’s, Professor Ferdinand Cohn, then a plant physiologist but later to become the founding figure of microbial taxonomy, sent him a copy of his journal for plant science. The issue contained the first photographs ever published of bacteria. They had been taken by Robert Koch who was to identify the TB bacillus five years later. Dr Koch had come to Cohn with his photographs of his first microscopic preparations of the anthrax bacillus, and had shown him his paper arguing for the first time that these bacilli were the cause of the disease. Professor Cohn recognised at once the great importance of his findings for medicine and the saving of life, and wrote to Darwin that Koch’s photographs showed “the least but also perhaps the mightiest living beings”. Darwin replied to him, “I well remember saying to myself between twenty and thirty years ago, that if ever the origin of any infectious disease could be proved, it would be the greatest triumph to Science; now I rejoice to have seen the triumph.” That in those words was what Koch’s achievement meant to Darwin, the scientist and the father.

Information Management

I was fascinated to see Tom Connor’s explanation on YouTube of the CLIMB project, with his picture of the sequencing iceberg and his breakdown of the budget for different parts of the project. 75% for the expertise in the informatics, a critical need for the whole venture. It is fascinating to see how CLIMB users will be using together with their quantities of genomic data such quantities also of data of other very different kinds from very different sources, clinical, diagnostic, and then also population and epidemiological.

I have no scientific experience but have worked in the public sector on some matters needing careful and effective management of ranges of different kinds of information together, with gaps and inconsistencies in and between different datasets often compounding the difficulties of drawing any sound conclusions. So often, critical needs for information management just weren’t recognised by the managers and weren’t provided for. It seems to me that many people just don’t see these kinds of problems and their consequences, because they feel that information is just information and doesn’t need any managing to achieve completeness, accuracy, consistency, availability - and so meaning. People are often perfectly good with their own data simply because they know it well, but are then casual and careless about other people’s, and when they use others’ in combination with their own, they don’t see the potholes until they realise they are stuck in one. With the range of data you’ll be using for all your range of aims, all the care you’re taking with the discipline of informatics will be invaluable for success.

Suggestions from Darwin

With all that lies ahead for you all in your work with CLIMB on today’s scientific and medical challenges, I’d like to offer from Darwin’s experience two suggestions on how to move forward. The first is an early comment of his when as a young man he was first glimpsing the power of the ideas he was fitting together on ‘descent with variation’ and ‘natural selection’, and the second is the last comment he made on research like yours before he died.

For the first suggestion, shortly after Darwin drew his first iconic branching diagram in his secret notebook, he spotted the extraordinary implication for all humans and animals and went on, “If we choose to let conjecture run wild, then animals, our fellow brethren in pain, disease, death and suffering, our slaves in the most laborious works, our companions in our amusements – they may partake from our origin in one common ancestor, we may be all netted together.” Just notice how he started that comment. “If we choose to let conjecture run wild …” Yes, with the fresh information and ideas you’ll be developing with CLIMB, choose to do just that, dare to! Bold conjectures may succeed powerfully.

Darwin’s last comment on research of this kind appeared in a preface he wrote for a work by a brilliant young friend on plants’ remarkable adaptations for cross-fertilisation by their pollinators. He spun out a series of ideas he’d found in the book for further investigations he’d love to pursue, and then, knowing privately that he was dying and wouldn’t be able to take any of them up, he continued – “But it would be superfluous to make any further suggestions. These will occur in abundance to any young and ardent observer who will study this work and then observe for himself, giving full play to his imagination, but rigidly checking it by testing each notion experimentally. If he will act in this manner, he will, if I may judge by my own experience, receive … much pleasure from his work.”

CLIMB now offers a wealth of fresh opportunities for research just like those that Darwin could then see. Opportunities for “young and ardent observers”, if they will “observe for themselves, give full play to their imagination, but rigidly checking it by testing each notion experimentally”. And we here today can add “analytically” with all CLIMB’s processing powers.

Then Randal symbolically opened the champagne and we had a brief interlude before Jon Chase began his science rap session.


You can access the video of Jon's performance here:

A full set of photos of the event can be accessed here:

And to close this blog post, how about this classic pose of me and Jon! Cool, no?

MP and JC

And a big thank you to all who worked to make this such a special event!!

March 13, 2015

The story behind the paper: Sedimentary DNA from a submerged site reveals wheat in the British Isles

Late last month, I was proud to be joint last author on a paper in Science on the presence of wheat in the British Isles 8000 years ago. But how does a medical microbiologist come to be involved in a study on the intricacies of the Neolithic transition?

Well, like many of life’s greatest ventures, it all began in a bar…

I have to admit to a weakness for rounding the week off with a Friday evening trip to the bar. This started when I worked at Barts in the 1980s and 1990s, where the Robin Brook Centre bar hosted many a lively conversation (and acted as a link to various melodramas, including an alleged murder, hostage taking and a police shoot-out: but that’s another story).

When I arrived at the University of Birmingham in 2001, I was pleased to discover the delights of the Bratby Bar, nestled within the university’s Staff House. During more than a decade of visits, I had the chance to chat to all sorts of people from across the University, from Pro-Vice-Chancellors to post-docs. Fortuitously, John Heath (formerly Head of Biosciences, latterly Birmingham’s PVC for Estates) introduced me to Vince Gaffney, a garrulous landscape archaeologist from Geordieland (below).

Vince Gaffney

Having recently set up a next-generation sequencing service and also having picked up on the excitement of ancient DNA research, at intervals I suggested to Vince that he should let us have some archaeological material to play with, to see if we could get any sequences out of it. Imagining we could tread in the footsteps of Schliemann or Carter, I had in mind something glamorous like a mummified hand or a skeleton from a ritual burial. Instead, we ended up with some mud! But mud of a highly precious and productive sort.

Vince was interested in understanding how the Neolithic transition (the spread of farming after the domestication of plants and animals) arrived in northwest Europe. The arrival of farming in this part of the world coincided with rising sea levels following the end of the last Ice Age. Vince had a track record in studying the landscapes that were inundated during this time and he was convinced the earliest clues to the arrival of the Neolithic in this part of the world would be found in these now-submerged sites.

Vince pointed me in the direction of some pioneering studies on sedimentary ancient DNA, which had established that DNA from macroscopic plants and animals could be detected in sediments even in the absence of macrofossils and could be used to reconstruct past environments. Two studies in particular stood out: one on the Viking settlements in Greenland and the other on the detection of sheep and moa DNA from outside a cave in New Zealand. It struck me that this was an exciting emerging field, fertile with opportunity.

Vince suggested that we try to detect signs of Neolithisation by searching submerged sediments for DNA from domesticated species that had no natural relatives in North Western Europe. That ruled out cows (wild relative: the aurochs) and pigs (related to wild boar), but made sheep and goats an attractive target. I pointed out to Vince that although we had the wherewithal to do the high-throughput sequencing and bioinformatics, it would be a rather fraught process trying to devise and implement protocols for target-specific amplification of ancient DNA. Instead, buoyed up by recent success with metagenomics on human faecal samples, I suggested that we try simple shotgun metagenomics—in other words we just extract DNA from the sediment cores and sequence it directly without any attempt at target-specific amplification or capture.

And then a period of turbulence descended on our academic lives…

I was headhunted and recruited to a new position at the University of Warwick in April 2013, while Vince was preparing to leave the University of Birmingham and eventually ended up at the University of Bradford. This could have signalled the end of the proposed research, but Vince and I were determined to continue with the work.

In fact, as luck would have it, my move to Warwick breathed new life into the project, as I hooked up with Robin Allaby from Warwick’s School of Life Sciences. Robin, seen here in the guise of a modern-day Jesus of the barley field, not only had a track record in the evolution of domesticated species, particularly plants, but had also established a dedicated ancient DNA laboratory at Warwick, ideal for performing DNA extractions from sediment cores.

Robin Allaby

I quickly persuaded Robin of the merits of the project and, as I was preoccupied with establishing a new Division of Microbiology and Infection, passed over to him day-to-day supervision of the work. Fortunately, Robin was able to recruit his recently graduated PhD student, Oliver Smith, to the study. Oliver was an ideal candidate in having experience with ancient DNA studies, while also being between projects. Funding for the work came from my start-up package from Warwick Medical School, which paid for a sequencing instrument (an Illumina MiSeq), sequencing reagents and a salary for Oliver for nine to ten months.

By the middle of 2013, Vince had tracked down the perfect samples for the project: some 8000-year-old submerged sediment cores that had been collected from the Solent by a maritime archaeologist, Gary Momber. Oliver extracted DNA from four samples of sediment in the ancient DNA lab and then sequenced them on our MiSeq. He and Robin then analysed the metagenomic sequences. Robin soon recognised that naïve use of existing metagenomics analysis pipelines was likely to turn up spurious results because of biases in what was represented in the databases (see Ed Yong's recent blog post on the “discovery” of platypus DNA in Virginia and plague on the New York subway), so he devised an improved method that avoided the problem.

Contrary to our initial hopes, Robin and Oliver did not discover any sheep or goat DNA. Instead, they discovered sequences from wheat, a domesticated plant that originated in the Middle East, with no close wild relatives in Northern Europe. This represented a triumph for metagenomics in ancient DNA research, confirming two advantages of this approach over target-specific assays:

  1. It is open-ended, not just targeting what you expect to find, but also revealing the unsuspected.
  2. It is probably more sensitive than target-based amplification in garnering relevant information from billions of base pairs of unamplified DNA rather than amplified copies of just a few hundred base pairs of a sequence barcode.

After that, Robin played a key role in co-ordinating the writing and submission of a manuscript, carefully steering our paper through the reviewing and editorial process. And so, finally, we ended up with every academic’s dream-come-true—a paper in Science magazine!

Of course, my account of things here is heavily biased towards the role of sequencing and bioinformatics in this project. It is also important to recognise the key role played by our archaeological collaborators in framing the right questions, gathering the right samples, performing the palaeo-environmental analyses and providing the relevant contextual interpretation of the findings.

And this success brings a new challenge: what on earth is Warwick Medical School going to do with this high-impact paper in Science for REF2020, as I cannot see it flying with the clinical medicine Unit of Assessment! But we have five years to work on that problem!

Let me close by raising a figurative glass to toast the role of Birmingham’s Staff House Bar in all this! A note to all PVCs for Estates: shouldn’t all universities be investing in similar drinking establishments to catalyse new projects and facilitate collegiality? And a note to the relevant promotion panel in Warwick: shouldn’t it soon be Professor Robin Allaby? I’ll drink to both points!

Pallen and Allaby

Robin and I celebrating success at the top of the Shard.

The paper:

Commentary on the Paper in Science:

Press release:

September 23, 2014

Sequence the sputum: using metagenomics to diagnose tuberculosis

Laboratory diagnosis of tuberculosis (TB) using conventional approaches is a long drawn-out process, which takes weeks or months—plus, relying on laboratory culture means using techniques that date back to the 1880s!

In a report published today in the peer-reviewed journal PeerJ, we describe a new approach to the diagnosis of TB that relies on metagenomics—that is direct sequencing of DNA extracted from sputum—to detect and characterize the bacteria that cause TB without the need for time-consuming culture in the laboratory. Using the latest high-throughput sequencing technologies and some smart bioinformatics, we can now obtain sequences from the bacteria that cause TB in just a few days straight from clinical samples and gain insights into their genome sequences and the lineages they belong to, all without having to culture cells or capture or amplify DNA.

In this study, first-year PhD student Emma Doughty and bioinformatician Dr Martin Sergeant, both working at Warwick Medical School, have worked with African scientists Dr Martin Antonio and Dr Ifedayo Adetifa, working at the MRC Unit in The Gambia, to develop and exploit novel sequencing and analytic approaches. They detected sequences from the TB bacteria in all eight sputum samples they investigated and were able to assign the bacteria to a known lineage in seven of the samples. Two samples were found to contain sequences from Mycobacterium africanum, a variety of the TB bacterium that is particular to West Africa.

This is part of a connected programme of research in the Pallen group, where we have been using metagenomics to detect bacterial pathogens in contemporary and historical human material. Last year, we used metagenomics to obtain an outbreak strain genome from stool samples from an E. coli outbreak and to recover TB genomes from ~200-year-old Hungarian mummies. Earlier this year, we recovered the genome of Brucella melitensis, which causes an infection called brucellosis in livestock and humans, from a 700-year-old skeleton from Sardinia, Italy.

We now aim to work on a larger number of sputum samples, perhaps looking at a hundred consecutive samples in the fullness of time. But, before then, we need to spend a bit more time optimising our DNA extraction protocols. We were pleasantly surprised that the protocol we used worked “out of the box”, but we are confident that we can improve things so we get fewer human DNA sequences and more mycobacterial sequences from each sample. If we can increase coverage of the TB genomes, we may soon be able to detect mutations associated with drug-resistance directly from the sputum.

The final goal, shimmering on the horizon, is that we might one day be able to extract information from all the macromolecules in a sample (DNA, RNA, proteins) so that we get a read-out of what pathogens are there, what virulence or resistance genes are being expressed, what host responses are switched on and also maybe detect cancerous or pre-cancerous changes in the patient’s genome. This is probably going to rely on a new kind of approach: nanopore sequencing—to learn more about this, watch the recent Bioinformatics and Balti session on YouTube. The future is looking very exciting!

PS: we have been very impressed with the service offered by PeerJ, with just two weeks from submission to acceptance!

Professor Mark Pallen, Professor of Microbial Genomics and Head of the Microbiology and Infection Unit,

Warwick Medical School

July 16, 2014

Recovery of medieval Brucella genome by metagenomics

Writing about web page

Diagnosing a 700-Year-Old Infection

Last summer, Warwick Professor of microbial genomics Mark Pallen and colleagues described recovering tuberculosis genomes from the lung tissue of a 215-year-old mummy from Hungary in the New England Journal of Medicine. Soon afterwards, news of his interest in metagenomic analyses on historical samples spread, and materials started to flow in.

Italian anthropologist Raffaella Bianucci asked Pallen if he would look for pathogens in archaeological samples from Belgium and Sardinia, an island off the coast of Italy, and he agreed. The relationship led to recovering a genome of the bacterium Brucella melitensis from a 700-year-old skeleton found in the ruins of a Medieval Italian village.

Reporting this week in mBio®, the authors describe using a technique called shotgun metagenomics to sequence DNA from a calcified nodule in the pelvic region of a middle-aged male skeleton excavated from the Sardinian settlement of Geridu, thought to have been abandoned in the late 14th century. Shotgun metagenomics allows scientists to sequence DNA without looking for a specific target.

Brucella pics
Skeleton and calcium deposits -- courtesy Mark Pallen, Warwick Medical School

From this sample, the researchers recovered the genome of Brucella melitensis, which causes an infection called brucellosis in livestock and humans. In humans, brucellosis is usually acquired by ingesting unpasteurized dairy products or from direct contact with infected animals. Symptoms include fevers, arthritis and swelling of the heart and liver. The disease is still found in the Mediterranean region.

“Normally when you think of calcified material in human or animal remains you think about tuberculosis, because that’s the most common infection that leads to calcification,” Pallen says. “We were a bit surprised to get Brucella instead.”

The skeleton contained 32 hardened nodules the size of a penny in the pelvic area, though Pallen says it’s unclear if they originated in the pelvis, or higher up in the chest or other body part. The team took care to sample the interior of a nodule, to eliminate the risk of contamination from soil.

In additional experiments, the research team showed that the DNA fragments extracted had the appearance of aged DNA – they were shorter than contemporary strands, at only about 100 base pairs, and had characteristic G-A or C-T mutations at the ends. They also found that the medieval Brucella strain, which they called Geridu-1, was closely related to a recent Brucella strain called Ether, identified in Italy in 1961, and two other Italian strains identified in 2006 and 2007. They confirmed their findings by comparing the distribution of genetic insertions and deletions located in Geridu-1 with those found in other Brucella strains.
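
To make the "characteristic G-A or C-T mutations at the ends" concrete, here is a minimal Python sketch of how such a damage profile could be tallied. It is not the method used in the study: the function name terminal_damage, the input format (gap-free read/reference string pairs, already oriented on the read strand) and the convention that C-T changes are counted at the 5' end and G-A changes at the 3' end are all assumptions made for illustration. Real analyses would work from alignment files with a dedicated damage-profiling tool.

```python
from collections import Counter


def terminal_damage(pairs, n_pos=5):
    """Tally C->T rates at the 5' end and G->A rates at the 3' end.

    `pairs` is an iterable of (read, reference) strings of equal length,
    aligned without gaps (a simplifying assumption for this sketch).
    Returns two lists of length `n_pos`; index 0 is the terminal base.
    A rate is None when no reference C (or G) was seen at that position.
    """
    c_total, c_to_t = Counter(), Counter()
    g_total, g_to_a = Counter(), Counter()

    for read, ref in pairs:
        read, ref = read.upper(), ref.upper()
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        for i in range(min(n_pos, len(read))):
            # Position i counted from the 5' end of the read.
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
            # Position i counted from the 3' end of the read.
            j = len(read) - 1 - i
            if ref[j] == "G":
                g_total[i] += 1
                if read[j] == "A":
                    g_to_a[i] += 1

    five_prime = [c_to_t[i] / c_total[i] if c_total[i] else None
                  for i in range(n_pos)]
    three_prime = [g_to_a[i] / g_total[i] if g_total[i] else None
                   for i in range(n_pos)]
    return five_prime, three_prime


# Toy example with made-up sequences, purely to show the call.
toy_pairs = [
    ("TACGT", "CACGT"),  # C->T at the first 5' base
    ("CACGA", "CACGG"),  # G->A at the last 3' base
    ("CACGT", "CACGT"),  # no mismatches
]
five_prime, three_prime = terminal_damage(toy_pairs, n_pos=1)
```

In aged DNA one would expect these rates to be highest at the terminal positions and to fall away towards the interior of the fragment; a flat profile is what modern DNA tends to show.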

The study “confirms that whole-genome sequences from bacterial pathogens can be recovered from human remains by metagenomics hundreds or even thousands of years postmortem,” Pallen says.

Brucella melitensis -- credit: CDC

Pallen’s team is now testing shotgun metagenomics on a range of additional samples, including historical material from Hungarian mummies; Egyptian mummies; a Korean mummy from the 16th or 17th century; and lung tissue from a French queen from the Merovingian dynasty, which ruled France from the 5th to 8th centuries; as well as contemporary sputum samples from the Gambia in Africa.

“Metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution and spread of microbial pathogens,” Pallen says. “We’re cranking through all of these samples and we’re hopeful that we’re going to find new things.”

-- Karen Blum, science journalist writing for Mbiosphere

Original blog posting here:

April 15, 2014

Videos and photos from the Pallen Inaugural

Follow-up to Nothing in Microbiology makes Sense except in the Light of Evolution from The Microbial Underground

Live streamed version:

Slidecast version (better sound quality and no need to look at my ugly mug!)

Photos from the day here:

Nothing in Microbiology makes Sense except in the Light of Evolution

Writing about web page

Here is the online companion to my Inaugural Lecture.

April 14, 2014

False positives complicate ancient pathogen identifications, but only if you are naive and arrogant

Writing about web page

I came across this piece published in BMC Research Notes a few weeks ago, but have only just found time to comment on it:

  • False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing. BMC Research Notes 2014, 7:111. doi:10.1186/1756-0500-7-111. Michael G Campana, Nelly Robles García, Frank J Rühli, Noreen Tuross

I cannot say that I am too happy with the style of the comments therein on our recent publication of metagenomic recovery of a TB genome from mummified remains:

Additionally, a recent study by Chan and colleagues [54] claiming the identification of multiple strains of pathogenic tuberculosis (Mycobacterium tuberculosis) through non-targeted metagenomic sequencing has demonstrated insufficient analytical rigor to support their conclusions. The authors aligned their sequences against a single strain of pathogenic tuberculosis, but did not account for misalignments or environmental contamination with ubiquitous soil mycobacteria. Chan and colleagues’ data merit reanalysis with appropriate environmental controls. We recommend that the authors of these three studies demonstrate the veracity of their findings using a targeted capture approach and further bioinformatic analysis.

I guess working at Harvard makes people prone to academic arrogance! Perhaps there is also a whiff of sour grapes: they couldn't find any pathogens in their samples by metagenomics, so we can't have done so either! But dealing with the substance of the comments is easy enough. And ironically, I agree entirely with their earlier comments that these two papers are highly suspect:,21765907

(NB they are misreferenced in this paper).

OK, so let's take their points on our study one by one...

  • The authors aligned their sequences against a single strain of pathogenic tuberculosis, but did not account for misalignments or environmental contamination with ubiquitous soil mycobacteria.
  • Mycobacterium tuberculosis is a genetically monomorphic species, so there is not much to be gained by aligning against multiple strain genomes. But we did also compare the SNP profiles of our genomes with the recent close relative 7199/99. In the standard filtering that we employed, SNPs with low/high coverage and low mapping scores were removed, thus avoiding problems with repetitive DNA. The fact that the majority of the mixed SNPs matched those of 7199/99 and H37Rv confirms that they are real. Plus, we did discuss the presence of environmental Actinobacteria in the metagenome in the Supplementary Material, where we report the presence of a Nocardia sp. at around 200X coverage and of a relative of Thermobifida fusca at around 10X coverage. We binned contigs according to Z score and coverage to avoid mixing up reads from different species. And we obtained deep and even coverage of the M. tuberculosis genome, which cannot be accounted for by misinterpretation of matches to environmental species. We have seen such spurious matches in some analyses, but they appear only when a low-stringency approach is applied to mapping and are obvious because they show spiky coverage limited to conserved regions (e.g. rRNA genes) rather than across the whole genome.
  • Chan and colleagues’ data merit reanalysis with appropriate environmental controls.
  • And what might these controls be? We have analysed a piece of lung tissue from mummified remains from a casket rather than the soil. As detailed in previous papers, rigorous efforts were taken to avoid contamination during sampling and storage. We have never grown M. tuberculosis or sequenced TB genomes in the lab. Where else could the M. tuberculosis DNA have come from other than the sampled individual?
  • We recommend that the authors of these three studies demonstrate the veracity of their findings using a targeted capture approach and further bioinformatic analysis.
  • There is some faulty logic here. We are indeed contemplating using a capture-based approach to increase the sensitivity of our analyses, but this will do nothing for the specificity of the approach, since any contaminating sequences which map to the pathogenic reference strains in silico are likely to be captured in vitro anyway because of their similarity to the bait. The answer is instead to increase the stringency of mapping and look for a consilience of results from multiple sources of evidence (e.g. evenness of coverage, SNPs that allow an assignment within an established clade), which we have done.
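
For readers who want to see what "evenness of coverage" versus "spiky coverage" means in practice, here is a rough Python sketch. It is not the pipeline we used in Chan et al: the function name coverage_summary, the window size and the choice of metrics (breadth of coverage and the coefficient of variation of windowed mean depth) are assumptions picked for illustration, and the input is simply a list of per-base depths such as one might export from a read-mapping tool.

```python
from statistics import mean, pstdev


def coverage_summary(depths, window=1000, min_depth=1):
    """Summarise how evenly reads cover a reference genome.

    `depths` is a list of per-base read depths along the reference.
    Returns breadth (fraction of bases with depth >= min_depth),
    mean depth, and the coefficient of variation (CV) of the mean
    depth in consecutive windows. Low CV with high breadth suggests
    genuine genome-wide coverage; high CV with low breadth suggests
    reads piling up on a few conserved regions.
    """
    if not depths:
        raise ValueError("depths must not be empty")

    breadth = sum(d >= min_depth for d in depths) / len(depths)
    window_means = [mean(depths[i:i + window])
                    for i in range(0, len(depths), window)]
    overall = mean(window_means)
    cv = pstdev(window_means) / overall if overall else float("inf")

    return {"breadth": breadth, "mean_depth": mean(depths), "window_cv": cv}


# Two made-up profiles with the same mean depth (20X) but very
# different shapes: one even, one confined to a tenth of the genome.
even_profile = [20] * 10_000
spiky_profile = [0] * 9_000 + [200] * 1_000

even_stats = coverage_summary(even_profile)
spiky_stats = coverage_summary(spiky_profile)
```

The point of the toy profiles is that mean depth alone cannot separate the two cases; breadth and the windowed CV can, which is why evenness counts as a separate line of evidence alongside SNP-based clade assignment.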

We are continuing to perform metagenomics on mummified material from Vác and on other historical samples and will be publishing additional studies in due course. This is an exciting area of research and one does have to be careful in interpretation, but our findings stand firm. Anyone who wants to repeat the analyses we reported in Chan et al is welcome to do so. The reads are available here:

But I am afraid I agree with Campana et al when they criticise the other two papers: Thèves et al because, inter alia, you cannot tell Shigella from E. coli by 16S, and Khairat et al because no sequence data are available in the public domain. Caveat lector! But enjoy the excitement of progress in this field!
