How do scientists piece together genomic information from sequencing data? Play these two fun online puzzles to find out.
Characterising the makeup of the many microbes that live on and inside us helps us understand the possible roles those microbes play in human health and disease. Scientists use DNA sequencing technologies to analyse the genetic material of microbial samples to identify the microbial species present.
Rather than providing full genome sequences of the organism(s) in a microbial sample, DNA sequencing produces hundreds or thousands of short, linear pieces of microbial DNA. To identify the organism(s), scientists often use computational approaches to combine these fragments into longer, contiguous stretches of DNA (contigs).
This educational resource shows teachers how to use a simple “puzzle” metaphor to introduce students to the concept of genome reconstruction for single bacteria and complex microbial communities.
DNA sequencing determines the nucleotide sequence of an organism’s unique hereditary information. As output, it generates hundreds or thousands of short, linear pieces of microbial DNA, which are fragments of the full genome. The next step after DNA sequencing, therefore, involves combining (assembling) those fragments into contiguous stretches of DNA (contigs) using computational approaches.
The genome of a single bacterium is generally:
Commonly used DNA sequencing technologies (applying so-called 2nd generation sequencing) generate pieces of DNA that are:
Therefore, you can think of the task of genome reconstruction as a somewhat “hard” puzzle problem: we need to rebuild a whole image from its pieces.
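To make the puzzle metaphor concrete, the toy sketch below joins short fragments into contigs by repeatedly merging the two pieces that overlap the most. It is only an illustration under simplifying assumptions (error-free fragments, all from the same strand, no repeated regions), not how production assemblers work; the function names `overlap` and `greedy_assemble` and the example sequences are invented for this sketch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to join: these are the contigs
        # Join the two pieces, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up fragments of the made-up "genome" ATGCGTACGTTAGC
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(fragments))  # ['ATGCGTACGTTAGC']
```

A fragment that overlaps nothing else simply stays as its own contig, which mirrors a puzzle piece that cannot yet be connected to the rest of the picture.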
How exactly do we do that when aiming to reconstruct the genome sequence of a single bacterium? In the most straightforward case, our organism has already been sequenced and its genome sequence has been deposited in a public repository (such as the EMBL-EBI’s European Nucleotide Archive, ENA). In this case, we can use this sequence to help us rebuild the “puzzle”, much as you would by looking at the image on the cover of the puzzle box. This approach is called "mapping": identifying where a specific piece of DNA comes from by comparing it to a known reference.
Remember that this is a simplified picture: because bacteria mutate at a very high rate, the genome of a sequenced bacterium is hardly ever identical to the reference genome. We must therefore be ready to accept that the mapping will not be perfect, and that the mismatches themselves, if sufficiently supported by the data, might be the most interesting spots in the genome.
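The idea of mapping with imperfect matches can be sketched in a few lines. The example below slides a fragment along a reference and keeps the placement with the fewest mismatches; it assumes fragments differ from the reference only by substitutions (no insertions or deletions), and the name `map_read`, the `max_mismatches` threshold and the sequences are hypothetical choices made for illustration. Real mapping tools rely on indexed references and much more elaborate scoring.

```python
def map_read(read: str, reference: str, max_mismatches: int = 1):
    """Find the best placement of `read` on `reference`.

    Returns (start_position, mismatch_positions), or None if every
    placement has more than `max_mismatches` differing letters.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        # Reference coordinates at which the read and the reference disagree.
        mismatches = [start + k
                      for k, (r, w) in enumerate(zip(read, window))
                      if r != w]
        if len(mismatches) <= max_mismatches and (
            best is None or len(mismatches) < len(best[1])
        ):
            best = (start, mismatches)
    return best


reference = "ATGCGTACGTTAGC"               # made-up reference sequence
print(map_read("GTACGTTA", reference))     # (4, []): a perfect match
print(map_read("ATGCCTAC", reference))     # (0, [4]): one mismatch at position 4
print(map_read("GGGGGGGG", reference))     # None: the piece does not fit anywhere
```

The mismatch positions that are returned are exactly the "interesting spots": if many independent fragments disagree with the reference at the same position, the difference is more likely a real mutation than a sequencing error.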
What are the added challenges when reconstructing the genomes in a complex microbial community such as your gut microbiome?
To solve this more complex problem, there are several strategies that, once more, resemble what you would instinctively do with a puzzle:
As mentioned, however, we might not have all the pieces we need to fully reconstruct the image. This image is the starting point for investigating the bacterial composition of the sample (who is there) and, subsequently, the possible functions of those bacteria (what they might be doing), so take a second to think about the impact of missing data. Aside from hampering a complete understanding of the microbial community, missing data mean that we can describe what we see, but we cannot draw conclusions from what we don’t see. Simply put, if I grab a few socks from my drawer and none of them is red, I cannot conclude that I have no red socks. Why do data go missing? The overall complexity of the microbial community is too high for our sampling capacity, so some members of the community will inevitably go unsampled.
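The sock argument can be put into numbers with a back-of-the-envelope calculation. The sketch below assumes, as a deliberate simplification, that every sequenced fragment is an independent random draw from the community and that a microbe's share of the fragments equals its relative abundance; the function name `prob_never_sampled` and the example figures are invented for illustration.

```python
def prob_never_sampled(relative_abundance: float, n_fragments: int) -> float:
    """Chance that a microbe contributes zero fragments to the data set.

    Each fragment misses the microbe with probability (1 - abundance),
    so all `n_fragments` miss it with probability (1 - abundance) ** n.
    """
    return (1.0 - relative_abundance) ** n_fragments


# A microbe making up 0.1% of the community, sampled with 1000 fragments
p_missed = prob_never_sampled(0.001, 1000)
print(f"{p_missed:.2f}")  # 0.37
```

Under these assumptions, a microbe that makes up one thousandth of the community would be completely absent from roughly one in three data sets of this size, even though it is really there: seeing no red socks in a handful says little about the drawer.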
In this educational resource, the microbial genome puzzles are used as metaphors to illustrate how researchers move from raw genetic data produced by DNA sequencing to an overview of the genome of a single bacterium (Puzzle 1) or of a complex community such as a microbial sample of the gut (Puzzle 2).
The puzzle activity consists of two online puzzles. The first shows a cartoon version of an Escherichia coli bacterium and can be used as a metaphor for single genome reconstruction; the second is a cartoon representation of multiple microbes and can be used as a metaphor for reconstructing the genomes of a microbial community. You can access the puzzles below:
Below is an outline of one possible approach to embedding the microbial genome puzzles in your genomics lessons.
This article was adapted from the one originally published as EMBL-ELLS Teaching Material.