In the second of two articles, Jarek Bryk describes how scientists dig deep into our genes – to test the molecular basis of an evolutionary adaptation in humans.
The DNA of every organism holds information about its recent and ancient evolutionary history. By studying the patterns and changes in the DNA sequence – comparing the sequences between different individuals or species – we can uncover what has happened to them. We can find out which genes or fragments of the genome are likely to have provided an advantage to those individuals and species that carried them, allowing for their better survival and reproduction (see glossary for all terms in bold).
In a previous article (Bryk, 2010), I described a few examples of such beneficial genetic changes in humans and other organisms. Demonstrating which genetic changes might have been beneficial is difficult – especially in humans – but an even greater challenge is demonstrating the mechanism by which these changes could have improved the organisms’ survival and reproduction.
In this article, I present one of the approaches that scientists can use first to identify regions of our genome that could have helped us survive and reproduce, and then to test how these regions might have provided our ancestors with an advantage.
One of the ways potentially beneficial regions of our genome can be identified is simply by comparing DNA sequences of many individuals from different populations. In a very simple scenario, if one of these populations has been under selective pressure (for example, high UV radiation in a sunny region) that was absent in the other populations, the DNA sequence responsible for an appropriate adaptation (for example, a darker skin colour) should be different.
In the vast majority of cases, however, we do not know what selective pressures populations were exposed to in the past, or which genetic sequences are responsible for the adaptations. Let us begin, therefore, by comparing DNA sequences between human populations without any assumptions about what we may find. Figure 1 shows one such comparison, for a single nucleotide of the human genome.
When individuals have different nucleotides at a particular position in the DNA sequence, we call this a single nucleotide polymorphism (SNP, pronounced ‘snip’); three million such variants of the human genome are catalogued in the publicly available HapMap databasew1. The SNP in Figure 1, rs3827760, is found in two variants or alleles: at that point in the sequence, one of two bases may be found – either thymine (T) or cytosine (C).
Each circle in the figure represents a single population, and depicts the frequency in that population of the two possible alleles.
The thymine-containing allele is present in all African and most European samples, but is almost completely absent in East Asia and the Americas, where cytosine is most prevalent at that position in the sequence (Sabeti et al., 2007, 2006; Xue et al., 2009).
If we did this comparison for all the other three million SNPs from the HapMap we would see that the distribution of rs3827760 variants among human populations is very unusual. Thus rs3827760 definitely deserves a more detailed look, even though the distribution does not tell us anything about the potential benefit of the variants (their adaptive value), or even whether they are adaptive at all. All we know so far is that for some reason, the original thymine that was present at this position in ancestral human populations in Africa changed to cytosine, and that this change spread through East Asians and Americans. Even the estimates of when this change happened are very imprecise: somewhere between 1000 and 70 000 years ago, all individuals in East Asian populations had the cytosine variant.
recombination, parts of the
DNA are exchanged between
maternal (green) and
paternal (grey) chromosomes
and passed on to the next
generation in a new
configuration. The closer two
regions are together, the less
likely they are to be
recombination: A and B are
closer together than B and C,
thus are less likely to be
separated. Click to enlarge
Image courtesy of Nicola Graf
How, then, can we decide whether this pattern arose due to positive selection (cytosine conferred an advantage in East Asia and the Americas) or is simply due to chance? To see whether the DNA change (thymine to cytosine) was positively selected, we look at the DNA sequence around the SNP. If the DNA sequence surrounding rs3827760 were similar in all populations, we would have no evidence that the SNP had an effect on the organism’s fitness. If, however, one population (East Asian, for example) were exposed to a selective pressure and rs3827760 contributed to the development of adaptation to that selective pressure, DNA sequences around the SNP would differ between populations. To understand why that is, see Figure 2.
When the DNA sequences around rs3827760 are compared, it becomes obvious that the diversity around the cytosine variants in the East Asian populations is indeed much lower than the diversity around the thymine variants found in the African and European populations (the Americans were not tested). This suggests that positive selection was responsible for the cytosine variant spreading in the East Asian populations. But was this SNP really selected – does it in fact do anything?
Not all changes to the DNA sequence have an effect on protein sequences: most of the SNPs catalogued in the HapMap database either are located in the non-coding parts of the genome (e.g. between genes) or are synonymous – that is, they are located in the coding part of the genome but do not cause a change in the protein sequence encoded (see Figure 4).
In the case of rs3827760 we are lucky, because it is located in the coding part of a gene – towards the end of a gene called EDAR, which is involved in the development of hair follicles, sweat glands and teeth. Furthermore, the thymine-to-cytosine DNA sequence change results in a change in the protein sequence: Africans and Europeans (carrying the SNP variant with thymine) have the amino acid valine at position 370 of the protein, whereas East Asians and Americans (with the nucleotide cytosine) have the amino acid alanine. This part of the protein is involved in interactions with other proteins, and mutations there are known to cause ectodermal dysplasias – abnormal development of the teeth, hair and sweat glands – in humans and mice (see Figure 5). This fact strongly suggests that an amino acid change at position 370 may not only change the sequence of the protein but also how the protein behaves, affecting the physical characteristics of the organism itself.
To see whether the change in protein sequence really does affect its function, we turn to experiments on the biochemical pathway in which the EDAR protein takes part: a series of reactions which are involved in the development of hair follicles, sweat glands and teeth. When these reactions were carried out in the laboratory, the alanine variant of the protein (found in East Asians and Americans, encoded by the cytosine SNP variant) was found to make the pathway more active than the valine variant (found in Africans and Europeans, encoded by the thymine SNP variant) did. This ties in with comparisons of hair structure, which show that people with the alanine variant have thicker hair than people with the valine variant. For a more direct demonstration, mice were genetically modified to increase the activity of the EDAR pathway. These mice had visibly denser fur with thicker hair, as well as larger salivary glands, than mice with normal EDAR activity (Chunyan et al., 2008; Chang et al., 2009).
Taken together, these findings suggest that the two SNP variants (containing either thymine or cytosine) may affect both the structure and function of the EDAR protein, and may lead to physical differences in humans: differences in hair thickness and, potentially, the size of the salivary glands.
The differences in the DNA sequences that we observe now are historical records of natural experiments, and we can only speculate about the selective pressures that the Asian and American populations were exposed to, which encouraged the spread of the cytosine allele. But the combination of genomic studies, laboratory experiments and animal models makes it possible to test hypotheses about the functional roles of genetic differences between populations or species. Using these approaches, we may uncover the molecular basis of past adaptations in our ancestors and other organisms, highlighting how we adapt to a constantly changing environment.
Adaptive value: a trait has an adaptive value if it enables an individual to survive and reproduce better in a given environment than individuals that do not possess this trait. More formally, a trait is regarded as adaptive if it increases fitness.
Fitness: a hard-to-define formal term from evolutionary biology and population genetics; it describes the average number of offspring over one generation that is associated with one genotype compared to another genotype in a population. Thus genotypes that produce more offspring have greater fitness.
Genome: the total DNA of an organism. Usually understood in eukaryotes as the total nuclear DNA, as opposed to including mitochondrial or plastid DNA. For further information, see ‘What is a genome’ on the US National Library of Medicine websitew2.
Positive selection: natural selection is one of the mechanisms of evolution; it describes the different survival and reproduction of individuals in a given environment. Natural selection is called ‘positive’ when it promotes certain traits that help individuals to survive and reproduce better than others.
Selective pressure: a feature of the environment (e.g. temperature; presence of parasites; predation or aggression from members of the same species) that imposes differential survival and reproduction of individuals.