The expected magnitude of phenotypic differences between human populations under genetic drift is often underestimated. This commentary challenges recent claims of minimal differences by addressing statistical weaknesses in Lala & Feldman[1], Gusev[2][3], and Roseman & Bird[4], specifically the misinterpretation of polygenicity’s role in genetic drift, the failure to adjust for diploidy, and the use of non-standard effect size metrics. Using typically-reported FST values and heritabilities, medium to large phenotypic differences are expected under genetic drift across major human biogeographic ancestry groups. It is noted that specific phenotypic differences may also be shaped by other evolutionary forces, such as convergent or divergent selection, as well as environmental factors. By clarifying the mathematical basis for expected differences, this comment advances the discussion on genetic variance and its implications for human phenotypic diversity.
Under neutrality, the expected phenotypic variance between populations is approximately proportional to their average genetic variance. Given commonly reported narrow-sense heritabilities for traits within major human biogeographic ancestry groups (such as East Asians, Europeans, and Sub-Saharan Africans) and typically reported FST values among these groups[5], one would anticipate medium to large phenotypic differences arising under genetic drift alone. Biologists have frequently underplayed the expected magnitude of these differences, a tendency that some attribute to political considerations. David Reich, who runs a major genetics lab at Harvard University, elaborates on this issue (p. 254)[6]:
When asked about the possibility of biological differences among human populations, we have tended to obfuscate, making mathematical statements in the spirit of Richard Lewontin about the average difference between individuals from within any one population being around six times greater than the average difference between populations… But this carefully worded formulation is deliberately masking the possibility of substantial average differences in biological traits across populations.
Common assertions such as 'genetic differences among populations are small in comparison to variation within' are not technically incorrect, yet they often mislead. By similar reasoning, what are considered large effects in the social and biomedical sciences could also be dismissed as 'small in comparison to variation within'. Several researchers have moved beyond merely suggesting 'small' differences; they purport to demonstrate small expected differences either theoretically or quantitatively. In this context, recent statements by Lala & Feldman[1], Gusev[2][3], and Roseman and Bird[4] are critically reviewed.
In conservation biology, it is standard practice to compare phenotypic and genetic variance in order to detect signals of selection[7]. This practice involves what are known as QST-FST comparisons, where QST measures the additive genetic differentiation in quantitative traits among populations, and FST measures genetic differentiation based on allele frequencies at genetic loci. Whitlock & Guillaume[8] provide the formula for QST in diploids:
\[ Q_{ST} = \frac{\sigma^2_{AB}}{\sigma^2_{AB} + 2\sigma^2_{AW}} \tag{1} \]
where \(\sigma^2_{AB}\) and \(\sigma^2_{AW}\) are, respectively, the between-group and within-group phenotypic variances due to additive genetics.
By rearranging the terms, the formula can be expressed as:
\[ \sigma^2_{AB} = \frac{2\,Q_{ST}\,\sigma^2_{AW}}{1 - Q_{ST}} \tag{2} \]
In this formula, the factor of 2 adjusts for the distribution of variance among diploids, where approximately half of the variance occurs within individuals between homologous chromosomes. Given that FST and QST are equivalent under conditions of neutrality, this formula can be adapted to predict expected phenotypic variance attributable to additive genetic variation as follows:
\[ \sigma^2_{AB} = \frac{2\,F_{ST}\,\sigma^2_{AW}}{1 - F_{ST}} \tag{3} \]
Conceptually, FST values range from 0 (no genetic differentiation, all alleles shared) to 1 (complete genetic differentiation, no alleles shared). However, in practice, FST values are constrained by within-population heterozygosity[9][10]. As a result, highly variable markers like microsatellites, which typically exhibit high heterozygosity, often produce maximum FST values significantly below 1. For example, Meirmans & Hedrick[9] note:
To illustrate this relationship, Fig. 1 gives the joint values of FST and HS found in the past 4 years in Molecular Ecology…. Notice that the observed range of FST is always less than HS and that the range of FST becomes very small when HS is large. For example when HS = 0.9, a value that is commonly encountered for microsatellite markers, the maximum possible value of FST is 0.1.
For this reason, for QST-FST comparisons, it is commonly recommended to use markers with lower variability, such as SNPs[11], to align QST and FST more closely on the same scale. However, even with SNPs, the maximum FST value is usually well below 1, which may lead to an underestimation of genetic variance (on a scale from 0 to 1).
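As a rough illustration of the QST relations given earlier in this section, the following Python sketch computes QST from the two variance components and inverts it to recover the between-group variance. The function names and the input values are hypothetical, chosen only for this example.

```python
def qst_diploid(var_between, var_within):
    """Q_ST for diploids: between-group additive variance divided by
    (between-group + 2 * within-group additive variance)."""
    return var_between / (var_between + 2.0 * var_within)


def expected_between_variance(differentiation, var_within):
    """Rearranged form: between-group additive variance implied by a given
    Q_ST (or, under neutrality, F_ST) and within-group additive variance."""
    if not 0.0 <= differentiation < 1.0:
        raise ValueError("differentiation must lie in [0, 1)")
    return 2.0 * differentiation * var_within / (1.0 - differentiation)


if __name__ == "__main__":
    # Illustrative inputs only: F_ST = 0.10, within-group additive variance = 0.50
    var_b = expected_between_variance(0.10, 0.50)
    print(round(var_b, 4))                     # 0.1111
    print(round(qst_diploid(var_b, 0.50), 4))  # 0.1 (round trip)
```

The round trip simply confirms that the rearranged expression is the algebraic inverse of the QST definition.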
Lala and Feldman[1] argue against the possibility of ancestry group differences in IQ and scholastic attainment by asserting that "recent human evolution has been dominated by drift rather than selection," and that drift could not lead to large differences in highly polygenic traits. They contend:
However, if, as the data suggest, intelligence is affected by many genes of small effect, it becomes implausible that IQ differences between socially defined races arose through a process of random genetic drift; this is relevant because analyses of genetic variation show that recent human evolution has been dominated by drift rather than selection (89). The probability that a long sequence of random changes would all go in the same direction, leading to increases in the intelligence of one population and not others, approaches zero.
The argument that non-trivial differences in a trait are implausible under drift rests on a misunderstanding of well-established evolutionary theory. Yair & Coop[12] clarify:
Naively, as trait-increasing alleles underlying a neutral trait are equally likely to drift up or down, one might think that over many loci we expect only a small mean difference between populations. However, the polygenic score is a sum rather than a mean, and so each locus we add into the score is like an additional step in the random walk that two populations take away from each other [99]. We expect the variance among populations, i.e. the average squared difference between population means and the global mean, to be 2VA FST[1][13].
Edge and Rosenberg[14][15] note that the expected magnitude of differences under drift is independent of polygenicity. If highly polygenic traits cannot diverge under drift and divergent selection is rare, as Lala & Feldman[1] suggest, how do they explain the substantial anatomical differences across human ancestry groups documented by physical anthropologists?
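The independence from polygenicity can be checked with a small simulation. The sketch below is an assumption-laden illustration, not a method from any of the cited papers: allele frequencies drift away from an ancestral value under a beta (Balding-Nichols-style) model, FST is defined relative to that ancestral population, effect sizes are rescaled so that the ancestral additive variance VA equals 1, and all names and parameter values are hypothetical.

```python
import random
import statistics


def between_population_variance(n_loci, fst, n_pops=400, seed=1):
    """Simulate drift at n_loci loci and return the variance of the
    population mean additive value across n_pops independent populations."""
    rng = random.Random(seed)
    # Ancestral allele frequencies and per-allele effects (arbitrary choices).
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]
    # Rescale effects so ancestral additive variance V_A = sum 2 a^2 p (1 - p) = 1.
    v_a = sum(2.0 * a * a * p * (1.0 - p) for a, p in zip(effects, freqs))
    effects = [a / v_a ** 0.5 for a in effects]
    # Beta drift model: each population frequency has mean p and
    # variance fst * p * (1 - p) around the ancestral frequency p.
    scale = (1.0 - fst) / fst
    means = []
    for _ in range(n_pops):
        mean_value = sum(
            2.0 * a * rng.betavariate(p * scale, (1.0 - p) * scale)
            for a, p in zip(effects, freqs)
        )
        means.append(mean_value)
    return statistics.variance(means)


if __name__ == "__main__":
    fst = 0.10
    for n_loci in (10, 100, 1000):
        simulated = between_population_variance(n_loci, fst)
        print(n_loci, round(simulated, 3), "expected about", 2.0 * fst)
```

Under these assumptions the simulated variance among population means should stay near 2·VA·FST whether the trait is spread over ten loci or a thousand, because adding loci adds steps to the random walk while shrinking each step.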
Gusev[2][3] acknowledges that human populations could diverge substantially in highly polygenic traits under drift but argues that the magnitude of the divergence is VA × FST, that is, with no adjustment for ploidy. Gusev[2] states:
In the context of population differences — a focus of the piece in The Atlantic — direct/within-family heritability provides an upper bound on how much a trait can drift between populations under neutrality (see[15] and summary). For educational attainment, for example, we can already calculate that the expected variance between continental populations under neutrality is minuscule: heritability*Fst = 0.04*0.15 = 0.006
Further expanding, in Gusev[3]:
As a consequence, “the proportion of heritable variance in the trait attributable to genetic differences between the populations” (QST) is approximately equal to the cross-population (Hudson) FST (which[15] rederive as FST,l). This quantity also does not depend on polygenicity. Note: the relationship is sometimes reported as 2*FST*h2, but this is only an approximation for Nei’s FST: Nei’s FST is approximately equal to half of Hudson’s FST when the former is close to 0 or 1 (see [8.6]).
Gusev[3] thus attributes the factor of 2 to a property of the FST estimator rather than to ploidy, which is its primary rationale. He cites Edge and Rosenberg, who address the diploid case in their study[14]:
To keep our model simple, we considered haploids rather than diploids. For diploids, the analysis would proceed similarly, but because diploids have two alleles at each locus, comparable information for distinguishing populations is achieved in a diploid model with half as many loci as in a haploid model. A slightly modified expression for QST in diploids takes this difference into account[16][7].
This adjustment, the factor of 2, is explicitly incorporated in Formula 28 of Edge and Rosenberg[15]. The rationale for this correction is illustrated in Figure 1:
In diploids, the additive genetic variance within populations (\(V_{A,within}\)) reflects differences between individuals, as shown in Figure 1’s “Variance between individuals within populations.” The factor of 2 in the QST equation’s denominator (\(V_{G,among} / [V_{G,among} + 2V_{A,within}]\)) accounts for roughly half the genetic variance among diploids being within individuals, not between them, due to two homologous chromosomes[16]. \(V_{A,within}\) scales with the additive effects of two alleles per locus, doubling the within-population variance contribution relative to a haploid. Figure 1’s “Variance between chromosomes within individuals,” part of “Variance within individuals,” captures this within-individual variance, which is not part of \(V_{A,within}\). Thus, the factor of 2 in QST adjusts for this by ensuring proper scaling of \(V_{A,within}\) for diploids.
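To make the size of the ploidy adjustment concrete, the sketch below applies both expressions to the inputs in the calculation quoted from Gusev[2] (heritability 0.04, FST 0.15). It assumes within-group phenotypic variance is standardized to 1, so that within-group additive variance equals the heritability; the function names are hypothetical.

```python
def unadjusted_between_variance(h2, fst):
    """Between-group variance with no ploidy adjustment: h2 * F_ST."""
    return h2 * fst


def diploid_between_variance(h2, fst):
    """Diploid-adjusted form, 2 * F_ST * h2 / (1 - F_ST), with within-group
    phenotypic variance set to 1 (an assumption of this sketch)."""
    return 2.0 * fst * h2 / (1.0 - fst)


if __name__ == "__main__":
    h2, fst = 0.04, 0.15  # inputs taken from the quoted calculation
    print(round(unadjusted_between_variance(h2, fst), 4))  # 0.006
    print(round(diploid_between_variance(h2, fst), 4))     # 0.0141
```

For any given heritability the two expressions differ by a factor of 2 / (1 − FST), about 2.35 at FST = 0.15.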
Roseman & Bird[4] state: "We wish to lay particular emphasis on the following point: Under the neutral additive expectation … the expected difference between two lineages as sampled randomly after evolving under random genetic drift is 0." If this is intended to suggest that the expected difference in a specific trait under drift between two populations is 0, this interpretation is inconsistent with established theory, as discussed above in connection with Lala & Feldman[1].
Despite this statement, Roseman & Bird[4] compute \(\sigma^2_B\) correctly. Yet they conclude that "unless there is pronounced natural selection acting to differentiate pairs of groups, large pairwise differences between groups would rarely occur" and that "The only way that large amounts of evolutionary divergence among groups in IQ could be reconciled with the FST values estimated using neutrally evolving polymorphisms is if strong natural selection had acted to make the groups diverge from one another."
They base this conclusion on their computed "expected absolute difference between groups". For this statistic, they reference equation 4 in Bird[17] and provide a formula in their appendix 2:
\[ E\big[|\Delta_{i,j}|\big] = \frac{2\sigma_B}{\sqrt{\pi}} = \frac{2\sqrt{\sigma^2_B}}{1.772} \tag{4} \]
where \(\sigma^2_B\) represents the between-group additive genetic variance.
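An expression of this form is what one obtains if the two group means are treated as independent draws from a normal distribution with variance \(\sigma^2_B\). That normality is an assumption of this sketch, not something stated in the sources quoted here:

```latex
\[
\Delta_{i,j} = \mu_i - \mu_j \sim \mathcal{N}\!\left(0,\; 2\sigma^2_B\right)
\quad\Longrightarrow\quad
E\big[|\Delta_{i,j}|\big]
  = \sqrt{\frac{2}{\pi}}\,\sqrt{2\sigma^2_B}
  = \frac{2\sigma_B}{\sqrt{\pi}}
  \approx 1.128\,\sigma_B .
\]
```

The result is expressed in the units of \(\sigma_B\) alone; the within-group standard deviation does not appear anywhere in it.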
I refer to this statistic as Bird’s b. Interpretative claims about the magnitude of differences (e.g., “large pairwise differences”), the use of a related metric in Bird[17], and the stated goal of evaluating claims in the “hereditarian race science literature” where the focus “is on understanding the absolute number of, say, IQ points” strongly suggest that Bird’s b is treated as equivalent to Cohen’s d, a standard effect size with established benchmarks for classifying group differences as “large” (i.e., d ≥ 0.8). However, Bird’s b is not equivalent to Cohen’s d. The former includes a constant denominator, while the latter accounts for within-group variance, leading to a clear discrepancy. This difference can be demonstrated straightforwardly. The formula for Cohen’s d, assuming equal variances within groups, is:
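\[ d = \frac{M_1 - M_2}{\sigma_{W,pooled}} = \frac{M_1 - M_2}{\sqrt{\sigma^2_{W,pooled}}} \tag{5} \]
Under the law of total variance, the total variance \(\sigma^2_{total}\) is the sum of the within-group variance \(\sigma^2_W\) and the between-group variance \(\sigma^2_B\). The variance between groups can then be expressed as:
\[ \sigma^2_{between} = \frac{n_1 (M_1 - M)^2 + n_2 (M_2 - M)^2}{n_1 + n_2 - 1} \tag{6} \]
Noting that
\[ M_1 - M = M_1 - \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2} \quad \text{and} \quad M_2 - M = M_2 - \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2} \tag{7} \]
and assuming equal sample sizes (\(n_1 = n_2 = n\)), we can substitute and rearrange the terms as:
\[ M_1 - M_2 = \sqrt{4\,\sigma^2_B \left(\frac{2n - 1}{2n}\right)} \tag{8} \]
Under the condition of large sample sizes, \(\frac{2n - 1}{2n} \approx 1\), and we get:
\[ M_1 - M_2 = 2\sqrt{\sigma^2_B} \tag{9} \]
This result corresponds to the numerator, but not the denominator, in Bird’s equation for Bird’s b. When the within-group variances are the same, we obtain:
\[ d \approx \frac{2\sqrt{\sigma^2_B}}{\sqrt{\sigma^2_W}} = 2\sqrt{\frac{\eta^2}{1 - \eta^2}} \tag{10} \]
which is equivalent to converting eta-squared to Cohen’s d.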
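The large-sample step in this derivation can be checked numerically. The sketch below builds two equal-sized groups with arbitrary, hypothetical means, computes the between-group variance with the n1 + n2 − 1 denominator, and compares the mean difference with twice the between-group standard deviation.

```python
import math


def between_variance(m1, m2, n):
    """Between-group variance for two groups of equal size n, using the
    n1 + n2 - 1 denominator from the derivation in the text."""
    grand_mean = (n * m1 + n * m2) / (2 * n)
    return (n * (m1 - grand_mean) ** 2 + n * (m2 - grand_mean) ** 2) / (2 * n - 1)


def ratio_to_two_sd(m1, m2, n):
    """(M1 - M2) divided by 2 * sqrt(between-group variance).
    Algebraically this equals sqrt((2n - 1) / (2n))."""
    return abs(m1 - m2) / (2.0 * math.sqrt(between_variance(m1, m2, n)))


if __name__ == "__main__":
    for n in (5, 50, 5000):
        # Group means 1.0 and 0.2 are illustrative values only.
        print(n, round(ratio_to_two_sd(1.0, 0.2, n), 4))
```

The ratio moves toward 1 as n grows, which is the sense in which the mean difference equals twice the between-group standard deviation.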
It is evident that the denominator in Bird’s b differs from that in the usual Cohen’s d formula: Bird’s b divides by a constant (√π ≈ 1.772), not by the within-group standard deviation. As a result, Bird’s b understates standardized differences by a margin that grows with the between-group variance. For example, with total variance standardized to 1, at σ2B = 0.5 Bird’s b is 0.80 while Cohen’s d is 2; at σ2B = 0.9, Bird’s b is 1.07 while Cohen’s d is 6. Thus, at best, Bird’s b is an idiosyncratic metric, not equivalent to those commonly used in the social and biomedical sciences.
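The two worked values can be reproduced with a few lines of Python. This sketch assumes total variance is standardized to 1, so that the between-group variance doubles as eta-squared; the function names are hypothetical.

```python
import math


def birds_b(var_between):
    """Expected absolute difference: 2 * sqrt(var_between) / sqrt(pi)."""
    return 2.0 * math.sqrt(var_between) / math.sqrt(math.pi)


def cohens_d(var_between):
    """Cohen's d from eta-squared, assuming total variance = 1 so that
    var_between is the between-group share of variance."""
    return 2.0 * math.sqrt(var_between / (1.0 - var_between))


if __name__ == "__main__":
    for var_between in (0.5, 0.9):
        print(f"{var_between} {birds_b(var_between):.2f} {cohens_d(var_between):.2f}")
    # 0.5 0.80 2.00
    # 0.9 1.07 6.00
```

Bird’s b grows only with the square root of the between-group variance, whereas Cohen’s d also reflects the shrinking within-group share, which is why the gap widens.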
To illustrate, Table 1 presents the expected differences as expressed in Bird’s b and in Cohen’s d across various FST values and heritabilities. Notably, Roseman & Bird’s[4] “very small absolute differences” in Bird’s b turn out to be medium to large sized differences in Cohen’s d.
Given typically reported kinship-based heritabilities (e.g., Polderman et al.[18]) and typical SNP-based FST values between major human populations, we should anticipate medium to large differences in arbitrary traits under neutral divergence. For example, Roseman & Bird[4] adopt an FST of 0.12 for SNPs underlying educational and intelligence-related traits, based on the results of Bird[17]. With h2 = 0.35 or 0.50, Formula 3 yields σ2AB = 0.095 or 0.136, which, with Formula 10, yields d = 0.650 or 0.795: a predicted moderate or large effect owing exclusively to additive genetic differences. While factors discussed below might lead us not to expect such pronounced behavioral differences, this magnitude of divergence is consistent with many anthropometric traits.
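The arithmetic behind these figures can be written out as follows. The reported values are reproduced if within-group phenotypic variance is set to 1 (so that \(\sigma^2_{AW} = h^2\)) and \(\sigma^2_{AB}\) is then used as \(\eta^2\) in Formula 10; this is a reading of the numbers rather than a step stated explicitly in the text.

```latex
\[
h^2 = 0.35:\quad
\sigma^2_{AB} = \frac{2(0.12)(0.35)}{1 - 0.12} = \frac{0.084}{0.88} \approx 0.095,
\qquad
d \approx 2\sqrt{\frac{0.0955}{1 - 0.0955}} \approx 0.650
\]
\[
h^2 = 0.50:\quad
\sigma^2_{AB} = \frac{2(0.12)(0.50)}{1 - 0.12} = \frac{0.12}{0.88} \approx 0.136,
\qquad
d \approx 2\sqrt{\frac{0.1364}{1 - 0.1364}} \approx 0.795
\]
```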
FST | h² | σ²AB | Cohen’s d | Bird’s b | Interpretation |
---|---|---|---|---|---|
0.05 | 0.05 | 0.0053 | 0.15 | 0.08 | Small |
0.05 | 0.20 | 0.0211 | 0.29 | 0.16 | Small to Medium |
0.05 | 0.35 | 0.0368 | 0.39 | 0.22 | Small to Medium |
0.05 | 0.50 | 0.0526 | 0.47 | 0.26 | Medium |
0.05 | 0.65 | 0.0684 | 0.54 | 0.30 | Medium |
0.10 | 0.05 | 0.0111 | 0.21 | 0.12 | Small |
0.10 | 0.20 | 0.0444 | 0.43 | 0.24 | Small to Medium |
0.10 | 0.35 | 0.0778 | 0.58 | 0.31 | Medium |
0.10 | 0.50 | 0.1111 | 0.71 | 0.38 | Medium to Large |
0.10 | 0.65 | 0.1444 | 0.82 | 0.43 | Large |
0.15 | 0.05 | 0.0176 | 0.27 | 0.15 | Small to Medium |
0.15 | 0.20 | 0.0706 | 0.55 | 0.30 | Medium |
0.15 | 0.35 | 0.1235 | 0.75 | 0.40 | Medium to Large |
0.15 | 0.50 | 0.1765 | 0.93 | 0.47 | Large |
0.15 | 0.65 | 0.2294 | 1.09 | 0.54 | Large |
0.20 | 0.05 | 0.0250 | 0.32 | 0.18 | Small to Medium |
0.20 | 0.20 | 0.1000 | 0.67 | 0.36 | Medium to Large |
0.20 | 0.35 | 0.1750 | 0.92 | 0.47 | Large |
0.20 | 0.50 | 0.2500 | 1.15 | 0.56 | Large |
0.20 | 0.65 | 0.3250 | 1.39 | 0.64 | Large |
0.25 | 0.05 | 0.0333 | 0.37 | 0.21 | Small to Medium |
0.25 | 0.20 | 0.1333 | 0.78 | 0.41 | Medium to Large |
0.25 | 0.35 | 0.2333 | 1.10 | 0.55 | Large |
0.25 | 0.50 | 0.3333 | 1.41 | 0.65 | Large |
0.25 | 0.65 | 0.4333 | 1.75 | 0.74 | Large |
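The numeric columns of the table can be regenerated with the short Python sketch below. It assumes within-group phenotypic variance is standardized to 1 (so within-group additive variance equals h2) and that the resulting between-group variance is used as eta-squared when computing Cohen’s d; the function name is hypothetical.

```python
import math


def table_row(fst, h2):
    """Return (var_ab, cohens_d, birds_b) for one F_ST / h2 combination."""
    var_ab = 2.0 * fst * h2 / (1.0 - fst)              # Formula 3
    d = 2.0 * math.sqrt(var_ab / (1.0 - var_ab))       # Formula 10, eta^2 = var_ab
    b = 2.0 * math.sqrt(var_ab) / math.sqrt(math.pi)   # Bird's b
    return var_ab, d, b


if __name__ == "__main__":
    for fst in (0.05, 0.10, 0.15, 0.20, 0.25):
        for h2 in (0.05, 0.20, 0.35, 0.50, 0.65):
            var_ab, d, b = table_row(fst, h2)
            print(f"{fst:.2f} | {h2:.2f} | {var_ab:.4f} | {d:.2f} | {b:.2f}")
```

The verbal interpretation column is not reproduced here, since the cut-offs behind labels such as “Small to Medium” are not stated in the text.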
There is a tendency among some biologists to downplay the magnitude of genetic differences between human populations, which may reflect an effort to avoid conclusions similar to those the young Franz Boas drew:
It does not seem probable that the minds of races which show variations in their anatomical structure should act in exactly the same manner. Differences of structure must be accompanied by differences of function, physiological as well as psychological; and, as we found clear evidence of difference in structure between the races, so we must anticipate that differences in mental characteristics will be found.[19]
Lala & Feldman[1], Gusev[2][3], and Roseman and Bird[4] are notable in this context for their theoretical quantitative claims, which extend beyond carefully worded qualitative statements. Lala & Feldman[1] assert that differences in polygenic traits under genetic drift cannot be substantial. Gusev[2][3] acknowledges the widely accepted relationship between expected phenotypic variance and genetic variance but fails to account for the necessary adjustment for diploidy. While Roseman and Bird[4] do incorporate this adjustment, they introduce an effect-size-like statistic — denoted here as Bird’s b — that does not align with commonly used metrics in the social and biomedical sciences, where established interpretive guidelines are available.
While these authors focus on academic achievement and intelligence, the argument is more general: medium to large differences between human ancestry groups can arise without pronounced natural selection. The evolutionary default is not zero phenotypic difference but rather differences proportional to neutral genetic divergence, adjusted for ploidy.
Although this commentary emphasizes statistical expectations under neutral genetic drift, it is acknowledged that complex behavioral traits like intelligence may exhibit smaller-than-expected differences due to factors such as low narrow-sense heritability or stabilizing/convergent selection across populations. For example, personality traits—unlike many anthropometric differences—often show only small differences between ancestry groups within the same country (e.g., Foldes et al.[20]).
Understanding these expected differences can illuminate the evolution and genetic structure of such traits. For instance, the consistently small personality differences, even between the most genetically distant ancestry groups, pose an intriguing question for future research—an area currently obscured by misunderstandings about the expected magnitude of genetic differences under neutrality.