With funding from an STFC Food Network+ (SFN) Scoping Grant, an exciting project has shown the potential for computational genetics to detect the most dangerous strains of a notorious food-borne bacterium, Shigatoxigenic E. coli.
No matter how inviting a plate of food may look, it could be harbouring an invisible threat. According to the Food Standards Agency, 2.4 million people in the UK become ill from food-borne pathogens every year, at a cost to society of around £9.1 billion. One of the most notorious of these is Shiga toxin-producing (Shigatoxigenic) Escherichia coli (STEC), which can cause symptoms ranging from gastroenteritis and kidney failure to meningitis and even death. A public health priority risk, STEC infections are most commonly associated with undercooked minced meat products, underwashed salad vegetables, unpasteurised dairy products, and handling foods with soil residues. The World Health Organization estimates that in 2010 food-borne STEC caused more than one million illnesses, 128 deaths, and nearly 13,000 Disability Adjusted Life Years.
Controlling these outbreaks depends on being able to rapidly identify contaminated food products. But this is complicated by the fact that not all STEC strains are pathogenic, as Nicola Holden, Professor in Food Safety at SRUC, explains. “For an individual STEC strain to cause disease, it needs the right combination of several different factors. These include the antigens on its surface, the toxins it produces, and the virulence factors that enable it to infect a cell. The exact combination of these will determine the ability to cause severe disease. It is similar to the different coronavirus variants, where small differences in, for instance, the spike protein, affect transmissibility and disease severity.”
To date, most cases of food poisoning from STEC have been caused by the O157 strain, so-called because it carries a distinct ‘O antigen’ that can be recognised in serology tests. But recent years have seen a worrying trend: a rise in STEC cases caused by non-O157 isolates.
“We currently don’t have any rapid means of identifying the STEC strains behind these new disease cases” says Nicola. “Diagnostic laboratories have to carry out additional tests to assess what combination of antigens, toxins and virulence factors a particular strain has, so they can work out the overall likelihood that it can cause severe disease. This takes time and introduces uncertainty.”
Unless we can detect these dangerous pathogens quickly, outbreaks can rapidly spiral out of control. Consequently, Nicola believes it is time to overhaul these “historic and cumbersome” diagnostic methods and instead adopt a new approach that uses the power of genetics. “We saw during the COVID-19 pandemic how the introduction of lateral flow devices completely changed the game when it came to controlling the spread of the disease” she says. “Ultimately, that is what we need for STEC: a DNA-based method to enable rapid diagnostics using miniaturised point-of-care devices. This would help identify contaminated products and likely transmission routes quickly enough to control potentially dangerous STEC outbreaks.”
In December 2021, Nicola was awarded an STFC Scoping Grant to explore how feasible this would be by searching for genetic signatures that could distinguish between pathogenic and non-pathogenic STEC strains.
Step one: Assembling a genomic library
The first stage of the project was to compile as many genomes as possible from a diverse range of STEC isolates. Together with her co-investigators Dr Martynn Winn, a Computational Biologist at STFC Harwell, and Dr Tim Dallman, an expert in food-borne pathogens at the University of Utrecht, Nicola convened an online stakeholder workshop in December 2021. This brought together a wide range of research- and policy-related organisations who work on STEC, including the Scottish E. coli reference lab, the Food Standards Agency and Food Standards Scotland, and the Animal and Plant Health Agency.
“Working with these partners, we were able to access a good range of different STEC genomes held in reference databases, over 200 in total” says Nicola. “Crucially, these included 104 samples from human patients that we knew had been responsible for causing clinical disease.” The remaining samples had been collected during surveys on Scottish deer, cheese, and mince.
Step two: Comparing genes and genomes
Having acquired a diverse library of genomes, the team then used comparative genetic approaches to categorise genes as either ‘disease-related’ or ‘non-disease-related’.
First, they took an ‘informed’ approach, by searching for specific genes known to code for virulence factors in clinically pathological isolates. “Because these genes are more likely to be associated with pathological isolates, their presence indicates a likely clinical disease outcome” says Nicola. Mapping these virulence factor genes across the wider set of STEC genomes identified their presence in certain food and wildlife isolates, indicating that these may also be pathogenic.
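As a rough illustration of this kind of ‘informed’ search, the sketch below checks a handful of toy genomes for a set of known marker sequences and flags any isolate that carries all of them. It is a minimal example under stated assumptions, not the team's actual pipeline: the marker names, sequences and isolate names are all hypothetical, and exact string matching stands in for the alignment-based searches that real genomic tools would use.

```python
# Hypothetical marker sequences standing in for known virulence factor genes.
MARKERS = {
    "marker_a": "ATGGCTAGCTAG",
    "marker_b": "TTGACCGGTAAC",
}

# Hypothetical toy "genomes", keyed by isolate name.
GENOMES = {
    "clinical_1": "CCCATGGCTAGCTAGGGGTTGACCGGTAACAAA",
    "deer_1": "CCCATGGCTAGCTAGGGGAAAA",
    "cheese_1": "GGGGGGCCCCCCAAAATTTT",
    "mince_1": "AAATGGCTAGCTAGCCGTTACCGGTCAATT",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_marker(genome, marker):
    """True if the marker occurs exactly on either strand of the genome."""
    return marker in genome or reverse_complement(marker) in genome


def presence_matrix(genomes, markers):
    """Map each isolate to a dict of marker name -> present (True/False)."""
    return {
        isolate: {name: has_marker(seq, marker) for name, marker in markers.items()}
        for isolate, seq in genomes.items()
    }


def flag_candidates(matrix):
    """List isolates that carry every marker in the panel."""
    return [isolate for isolate, row in matrix.items() if all(row.values())]


matrix = presence_matrix(GENOMES, MARKERS)
flagged = flag_candidates(matrix)
print(flagged)
```

In this toy data set the food isolate `mince_1` is flagged alongside the clinical isolate because it carries both markers (one of them on the opposite strand), which mirrors the idea of spotting potentially pathogenic strains among food and wildlife samples.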
In the second stage, the team used an ‘unguided’ approach that assumed no prior knowledge about the genes and their functions. Instead, the genomes as a whole were compared against each other to assess which regions were shared across the different samples. “This approach enables us to quickly assess which genetic regions are commonly seen across different pathological samples, and could therefore be associated with clinical disease” says Nicola.
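One simple way to picture an ‘unguided’ comparison is to break each genome into short overlapping fragments (k-mers) and look for fragments shared by every disease-associated genome but absent from the others. The sketch below does this on toy sequences. It is an assumption-laden illustration rather than the method used in the project: the sequences, the function names and the choice of k are hypothetical, only one strand is considered, and real comparisons would rely on dedicated alignment or pan-genome software.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_signature(pathogenic, non_pathogenic, k):
    """Return k-mers found in every pathogenic genome and in no other genome.

    Requires at least one pathogenic genome; non_pathogenic may be empty.
    """
    shared = set.intersection(*(kmers(seq, k) for seq in pathogenic))
    background = set().union(*(kmers(seq, k) for seq in non_pathogenic))
    return shared - background


# Hypothetical toy sequences; no gene annotations are used at any point.
pathogenic_genomes = ["AAACCCGGGTTT", "TTTCCCGGGAAA"]
other_genomes = ["AAACCCTTTGGG"]

signature = group_signature(pathogenic_genomes, other_genomes, k=6)
print(signature)
```

Here the only fragment common to both ‘pathogenic’ sequences and missing from the third is `CCCGGG`, so it is the candidate signature. With real genomes, many such fragments would need to be grouped into regions and checked against far larger reference sets before drawing any conclusion.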
Using these methods, the team successfully identified common genetic signatures that could distinguish different classes of STEC.
Step three: Exploring the potential of Big Data
Besides these relatively straightforward comparative techniques, the team were also keen to investigate the potential of Big Data approaches, including machine learning- and artificial intelligence-based methods. With the SFN funding, they provided a three-month internship to PhD student Eddie Martin (University of Edinburgh) to explore whether these could help distinguish the most harmful pathogens from closely related ones that don’t cause disease.
“AI tools that use deep-learning offer exciting potential to investigate beyond the identification of genomic sequences” says Nicola. “For instance, they could help to determine if a similarity in sequence between two genes ultimately translates into functional similarity.”
Going forward
In August 2022, Nicola and her colleagues convened a second stakeholder workshop at STFC Harwell to share their results so far, and to discuss the next stages for the project. The team are now seeking funding for a larger project to exploit the predictive power of Big Data to accurately classify pathogenic STEC.
“Further project development will fall into two main areas” says Nicola. “First, the basic bioscience – that is, refining the computational approaches so that these can sufficiently discriminate pathogenic bacteria. And secondly, applied bioscience to assess how to incorporate these methods into point-of-care devices for rapid diagnostics and surveillance.”
“Another challenge we hope to address is data accessibility” she adds. “There are currently many barriers to obtaining high-quality genomic data associated with clinical disease, and this is true for any human pathogen. This makes it crucial that this work is developed in partnership with our stakeholders. The SFN+ project provided an excellent opportunity to bring together a diverse team who would not have had the chance otherwise.”
According to Nicola, once DNA sequence-based diagnostic approaches have been refined, they would be an excellent route forward to discriminate between any set of pathogens and other organisms. We may never be able to eliminate food-borne bugs completely, but there is real promise that, in the near future, they will no longer be such a scourge on our food systems.
You can keep up to date with Nicola’s work by following her on Twitter: @NicolaJHolden
Antigen: A toxin or other foreign substance which provokes an immune response in the body, particularly the production of antibodies.
Serology test: A laboratory test that assesses the presence of antibodies and other substances in a blood sample.
September 2022 - Caroline Wood, Freelance Science Writer