Coalescence at bottlenecks

This is Part 1 of my response to Dr Dennis Venema’s second Biologos Blog “Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 2)”. Dr Venema was responding to my blog at Nature Ecology and Evolution Community about his book Adam and the Genome. Since Dr Venema’s Part 1 blog responding to me, a vigorous debate has been ongoing on the Biologos Forum. That debate is now drawing to a conclusion, so I have a bit of time to respond to the Part 2 blog by Dennis, which branched out from the Forum debate.

In his blog “Adam, Eve and Population Genetics: A Reply to Dr. Richard Buggs (Part 2)”, Dr Dennis Venema raises the interesting issue of how much coalescence will occur at a bottleneck of two. This is a very relevant issue, as rates of coalescence can be used to detect a long-lasting bottleneck, and it is possible that they might be able to detect a short, sharp one too. Dr Venema makes the strong claim that, at a one-generation bottleneck of two, “on average about 25% of genes will coalesce,” and that this will leave “a detectable mark” on the genome.

Dr Venema does not back up this claim by any direct reference to the literature. He seems to base it on his own extrapolations from a calculation in the literature about heterozygosity after a bottleneck (a calculation that I pointed out to him in my original email). He expresses it like this in his blog: “even in the most favorable conditions after a bottleneck [of two], heterozygosity is preserved only about 75% of the time”.

This is correct. But from this he goes on to reason:

“This means that about 25% of the time, heterozygosity is lost, and that only one allele remains in the population for a given gene. If only one allele is present, then this is a coalescence point for that gene: going forward, we will have to wait for mutations to produce new alleles, and those new alleles will coalesce back to their single ancestral allele that survived the bottleneck. In the future, as new alleles are produced from the surviving allele through mutation, the new alleles will all coalesce within a few generations of the bottleneck. Their TMRCA values will thus be almost identical… Coalescent-based methods are thus an excellent way to detect bottlenecks—even really brief ones, if they are severe enough. Even a brief, severe bottleneck will still greatly increase the chances of alleles being lost, and the telltale signature of numerous genes that coalesce within a short time frame.”

Two mistakes

I think that Dr Venema is wrong in making this claim. Let me explain why. I think that he is making at least two mistakes here.

(1) In calculations that show that 75% of heterozygosity would be maintained after a bottleneck, the level of heterozygosity before the bottleneck is “known”. But coalescent models run backwards in time, and we can only “see” those lineages that survive the bottleneck. Thus we cannot directly know how many alleles were lost via sampling at the bottleneck. The loss of alleles via the sampling effect of the bottleneck will not show up as coalescence events in a coalescence model. These are two separate effects of a bottleneck.

(2) Dennis is assuming that if only one allele is present in a population, then that allele has coalesced. This is a misunderstanding of coalescent theory. In coalescent theory, two gene lineages only coalesce when they reach a single copy in a single genome within a population. This means that if only one allele is present at a particular locus in a bottleneck of two, we know for sure that this allele has NOT coalesced, as it is present in four genomes (two in each person). It must therefore coalesce before the bottleneck. If the ancestral population is large, that coalescence will be a long time before the bottleneck.
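The 75% heterozygosity figure referred to in point (1) can be reproduced with a short enumeration. This is only a minimal sketch under an idealised assumption that is mine, not Dr Venema's: the next generation draws its gene copies at random, with replacement, from the 2N gene copies of the bottleneck generation, ignoring separate sexes. The function name retained_heterozygosity is purely illustrative. Note that this counts how much variation is expected to survive sampling; it does not count coalescence events.

```python
from fractions import Fraction
from itertools import product


def retained_heterozygosity(n_individuals):
    """Expected fraction of heterozygosity surviving one generation of the
    given size, assuming random draws (with replacement) of gene copies."""
    copies = range(2 * n_individuals)          # 2N gene copies in the bottleneck
    pairs = list(product(copies, repeat=2))    # every ordered pair of draws
    # A pair can only remain heterozygous if the two draws are different copies.
    distinct = sum(1 for a, b in pairs if a != b)
    return Fraction(distinct, len(pairs))


print(retained_heterozygosity(2))  # 3/4
```

For a bottleneck of two individuals this gives 3/4, matching the usual 1 - 1/(2N) expectation.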

An illustration

It might help at this stage if I give an illustration of what a coalescence scenario can look like at a bottleneck of two followed by population doubling at each generation. Figure 1 (below) traces back one of many possible ancestries of a sample of eight copies of a gene found in four individuals. To read the figure, start at the bottom, where the four sampled individuals are, in generation g13 after the bottleneck. This is the present. Each individual contains two copies of the gene we are interested in. A male individual is depicted as a square and a female individual as an oval. Different alleles of the gene are shown as different colours. There are five alleles sampled in generation g13. Individuals that do not contain coloured circles are individuals that were not sampled, or that do not contain one of the lineages leading to the sampled individuals.

In the figure, the lines joining the coloured dots show the line of ancestry of that particular copy of the gene in this particular scenario. They run back through parents (g12) to grandparents (g11) and so on. Every so often, two lineages coalesce. For example, the pink lineage coalesces in g10. Every so often, a mutation occurs. For example, the pink lineage mutates to become the yellow lineage in g7. Note, however, that this does not mean that the formerly pink lineage has now coalesced with all the yellow lineages: they don’t all coalesce until g-4, before the bottleneck.

Figure 1. A coalescence scenario for gene copies in four people, sampled 13 generations after a bottleneck of two

In this figure, the only coalescence that occurs at the bottleneck of two in generation g0 is that of three blue lineages. No other lineages coalesce at the bottleneck.

Looking at this diagram, it is obvious that coalescence down to a single lineage is actually impossible in generation g0, as it has to contain four lineages. It is also clear that coalescence events are very unlikely in the generations immediately prior to the bottleneck due to the large population size in generations g-1 to g-7. Any coalescence of all the lineages to just one lineage is most likely to occur in the generations subsequent to the bottleneck, when the population size is small. (Contrary to what Dr Venema says, a new mutation does not have to occur before a coalescence can happen after the bottleneck).

How coalescence to one lineage can occur after a bottleneck

How would coalescence to one lineage occur after a bottleneck? Figure 2 shows a scenario in the minimum possible number of generations.

Figure 2. How coalescence to a single lineage could occur at a bottleneck of two in a minimum number of generations.

To get coalescence to one lineage at a bottleneck of two in the scenario of Figure 2, we need one parent to only pass on one of his/her two copies of the gene to all of the offspring. We then need all of the children to pass on only that copy to all of their offspring. How likely is this to happen? Would it occur 25% of the time?

Calculating the chances

We can calculate the chances, because normally we would expect there to be a 0.5 chance of a parent passing a particular gene copy to their children. So the chances of what we see in generation g1 above are: 0.5 × 0.5 × 0.5 × 0.5 = 0.5^4 = 0.0625. The chances of what we see in generation g2, given what we see in generation g1, are 0.5^16 = 0.0000153. The overall probability of this is 0.5^20 = 0.000000954.

As we have four starting lineages at the bottleneck, we need to multiply by four, to find the overall chance of having coalescence to one lineage at the bottleneck. This gives us 0.00000381. So all in all, we expect coalescence to a single lineage 0.000381% of the time. Not 25% of the time.
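As a check on the arithmetic above, here is a minimal Python sketch that repeats the same three steps with exact fractions. The variable names are my own, and the counts of 4 transmissions into g1 and 16 into g2 are taken from the doubling scenario of Figure 2.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # chance that a parent passes on one particular gene copy

# g1: one founder passes the same copy to each of four children.
p_g1 = HALF ** 4

# g2: all 16 gene copies in the eight grandchildren descend from that copy.
p_g2_given_g1 = HALF ** 16

# Both steps together, for one chosen founder gene copy.
p_one_founder_copy = p_g1 * p_g2_given_g1

# Any of the four gene copies present at the bottleneck could be the one.
p_any_founder_copy = 4 * p_one_founder_copy

print(float(p_g1))                          # 0.0625
print(f"{float(p_one_founder_copy):.3g}")   # 9.54e-07
print(f"{float(p_any_founder_copy):.3g}")   # 3.81e-06
```

Working with fractions rather than floats keeps every step exact, so the final value is 1/262144, or about 0.00038%.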

If the population grew faster than doubling each generation, the probability of coalescence to one lineage at the bottleneck would be even lower.

Other scenarios that take more generations are possible, but are much less likely if we are assuming rapid population growth at each generation. This is because the bigger the population, the less probable coalescence is. Including scenarios over more generations will only slightly raise the probability of coalescence to one lineage at the bottleneck. Similarly, if we only sample a subset of the lineages, the probability of all of our sample coalescing at the bottleneck is slightly higher than the probability of all lineages coalescing, but again, this will not make a huge difference. For example, if we sampled four individuals in generation g2, the probability of all sampled lineages coalescing in the bottleneck would be 0.000977.
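The effect of faster growth and of sampling can be explored with a small generalisation of the same calculation. This is a sketch under assumptions of my own: the function names are illustrative, the tripling growth factor is a hypothetical comparison, and the sampled case is read as 4 transmissions into g1 followed by 8 transmissions into the four sampled g2 individuals, which is my reconstruction of how the 0.000977 figure arises.

```python
from fractions import Fraction


def p_coalesce_to_one(transmissions_g1, transmissions_g2, founder_copies=4):
    """Chance that every counted gene copy traces back, within two
    generations, to the same one of the founder gene copies."""
    half = Fraction(1, 2)
    return founder_copies * half ** transmissions_g1 * half ** transmissions_g2


def whole_population(growth_factor):
    """Two founders, with the population multiplying by growth_factor each generation."""
    n_g1 = 2 * growth_factor        # individuals in g1
    n_g2 = n_g1 * growth_factor     # individuals in g2, two gene copies each
    return p_coalesce_to_one(n_g1, 2 * n_g2)


doubling = whole_population(2)       # the scenario of Figure 2
tripling = whole_population(3)       # hypothetical faster growth
sampled = p_coalesce_to_one(4, 8)    # four sampled g2 individuals, eight gene copies

print(f"{float(doubling):.3g}")
print(f"{float(tripling):.3g}")
print(f"{float(sampled):.3g}")
```

Under these assumptions, tripling gives a far smaller probability than doubling, and the sampled case gives 1/1024, which is still well below 1%.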

If my calculations are correct (and I stand ready to be corrected if they are not) then Dennis is quite wrong to think that 25% of genes would coalesce to one lineage at a bottleneck of two. Less than 1% would.


Dr Venema is correct to say that 25% of heterozygosity will be lost in a population as it passes through a bottleneck of two, and he is correct that the rate of coalescence will be fastest in the generations immediately after the bottleneck. However, he is mistaken in trying to link these two facts into a claim that “on average about 25% of genes will coalesce”. His line of reasoning is wrong because he has misunderstood coalescent theory. Whether or not rates of coalescence could be used to detect a one-generation bottleneck of two individuals remains an open question in my mind.