Field

Biochemistry, Genetics and Molecular Biology

Subfield

Genetics

Open Peer Review

Preprint

CC BY

Misestimation of Expected Genetic Differences – A Statistical Note on Some Recent Papers

John Fuerst1

Affiliation

  1. Department of Biotechnology, University of Maryland, United States

Abstract

The expected magnitude of phenotypic differences between human populations under genetic drift is often underestimated. This commentary challenges recent claims of minimal differences by addressing statistical weaknesses in Lala & Feldman[1], Gusev[2][3], and Roseman & Bird[4], specifically the misinterpretation of polygenicity’s role in genetic drift, the failure to adjust for diploidy, and the use of non-standard effect size metrics. Using typically-reported FST values and heritabilities, medium to large phenotypic differences are expected under genetic drift across major human biogeographic ancestry groups. It is noted that specific phenotypic differences may also be shaped by other evolutionary forces, such as convergent or divergent selection, as well as environmental factors. By clarifying the mathematical basis for expected differences, this comment advances the discussion on genetic variance and its implications for human phenotypic diversity.

Under conditions of neutrality, the expected phenotypic variance between populations is approximately proportional to their average genetic variance. Given commonly reported narrow-sense heritabilities for traits within major human biogeographic ancestry groups (such as East Asians, Europeans, and Sub-Saharan Africans) and typically reported FST values among these groups[5], one would anticipate medium to large phenotypic differences arising under genetic drift. Biologists have frequently underplayed the expected magnitude of these differences, a tendency that some attribute to political considerations. David Reich, who runs a major genetics lab at Harvard University, elaborates on this issue (p. 254)[6]:

When asked about the possibility of biological differences among human, we have tended to obfuscate, making mathematical statements in the spirit of Richard Lewontin about the average difference between individuals from within any one population being around six times greater than the average difference between populations… But this carefully worded formulation is deliberately masking the possibility of substantial average differences in biological traits across populations.

Common assertions such as 'genetic differences among populations are small in comparison to variation within' are not technically incorrect, yet they often mislead. By similar reasoning, what are considered large effects in the social and biomedical sciences could also be dismissed as 'small in comparison to variation within'. Several researchers have moved beyond merely suggesting 'small' differences; they purport to demonstrate small expected differences either theoretically or quantitatively. In this context, recent statements by Lala & Feldman[1], Gusev[2][3], and Roseman and Bird[4] are critically reviewed.

In conservation biology, it is standard practice to compare phenotypic and genetic variance in order to detect signals of selection[7]. This practice involves what are known as QST-FST comparisons, where QST measures the additive genetic differentiation in quantitative traits among populations, and FST measures genetic differentiation based on allele frequencies at genetic loci. Whitlock & Guillaume[8] provide the formula for QST for diploids:

\[ Q_{ST} = \frac{\sigma^2_{AB}}{\sigma^2_{AB} + 2\sigma^2_{AW}}, \tag{1} \]

where \(\sigma^2_{AB}\) and \(\sigma^2_{AW}\) are, respectively, the between-group and within-group phenotypic variances due to additive genetics.

By rearranging the terms, the formula can be expressed as:

\[ \sigma^2_{AB} = \frac{2\,Q_{ST}\,\sigma^2_{AW}}{1 - Q_{ST}} \tag{2} \]

In this formula, the factor of 2 adjusts for the distribution of variance among diploids, where approximately half of the variance occurs within individuals between homologous chromosomes. Given that FST and QST are equivalent under conditions of neutrality, this formula can be adapted to predict expected phenotypic variance attributable to additive genetic variation as follows:

\[ \sigma^2_{AB} = \frac{2\,F_{ST}\,\sigma^2_{AW}}{1 - F_{ST}} \tag{3} \]
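The QST definition and its rearrangement can be checked numerically. The following is a minimal sketch, not code from any of the cited papers; the function names are illustrative, and the within-group additive variance is taken to be the heritability on a scale where within-group phenotypic variance equals 1 (an assumption that matches how the formula is applied later in this note).

```python
def qst(var_between, var_within):
    """QST for diploids: additive between-group variance over
    (between-group + 2 * within-group) additive variance."""
    return var_between / (var_between + 2.0 * var_within)


def expected_between_variance(fst, var_within):
    """Expected between-group additive variance under neutrality,
    obtained by setting QST = FST and solving for the between-group term."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must lie in [0, 1)")
    return 2.0 * fst * var_within / (1.0 - fst)


# Example inputs: FST = 0.12 and a within-group additive variance of 0.35.
var_b = expected_between_variance(0.12, 0.35)
print(round(var_b, 4))             # 0.0955
print(round(qst(var_b, 0.35), 4))  # 0.12, i.e. the round trip recovers FST
```

Feeding the predicted between-group variance back into `qst` returns the FST value used as input, which confirms that the two expressions are rearrangements of one another.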

Conceptually, FST values range from 0 (no genetic differentiation, all alleles shared) to 1 (complete genetic differentiation, no alleles shared). However, in practice, FST values are constrained by within-population heterozygosity[9][10]. As a result, highly variable markers like microsatellites, which typically exhibit high heterozygosity, often produce maximum FST values significantly below 1. For example, Meirmans & Hedrick[9] note:

To illustrate this relationship, Fig. 1 gives the joint values of FST and HS found in the past 4 years in Molecular Ecology…. Notice that the observed range of FST is always less than HS and that the range of FST becomes very small when HS is large. For example when HS = 0.9, a value that is commonly encountered for microsatellite markers, the maximum possible value of FST is 0.1.

For this reason, for QST-FST comparisons, it is commonly recommended to use markers with lower variability, such as SNPs[11], to align QST and FST more closely on the same scale. However, even with SNPs, the maximum FST value is usually well below 1, which may lead to an underestimation of genetic variance (on a scale from 0 to 1).
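The ceiling on FST described by Meirmans & Hedrick follows from the standard heterozygosity-based definition FST = (HT − HS) / HT, where HT is total and HS is within-population expected heterozygosity. The sketch below assumes that definition; the function names are illustrative. Because HT cannot exceed 1, FST cannot exceed 1 − HS.

```python
def fst_from_heterozygosity(h_total, h_within):
    """FST computed as (HT - HS) / HT."""
    if not 0.0 < h_total <= 1.0:
        raise ValueError("h_total must lie in (0, 1]")
    return (h_total - h_within) / h_total


def max_fst_given_hs(h_within):
    """Upper bound on FST for a given HS, reached only as HT approaches 1."""
    return 1.0 - h_within


# With HS = 0.9 (typical of microsatellites), FST cannot exceed about 0.1.
print(round(max_fst_given_hs(0.9), 3))               # 0.1
print(round(fst_from_heterozygosity(0.95, 0.9), 3))  # 0.053
```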

1.1. Lala & Feldman[1]

Lala and Feldman[1] argue against the possibility of ancestry group differences in IQ and scholastic attainment by asserting that "recent human evolution has been dominated by drift rather than selection," and that drift could not lead to large differences in highly polygenic traits. They contend:

However, if, as the data suggest, intelligence is affected by many genes of small effect, it becomes implausible that IQ differences between socially defined races arose through a process of random genetic drift; this is relevant because analyses of genetic variation show that recent human evolution has been dominated by drift rather than selection (89). The probability that a long sequence of random changes would all go in the same direction, leading to increases in the intelligence of one population and not others, approaches zero.

The argument that non-trivial differences in a trait are implausible under drift is based on a misunderstanding, as noted by Yair & Coop[12]. This misconception contradicts well-established evolutionary theory. Yair and Coop[12] clarify:

Naively, as trait-increasing alleles underlying a neutral trait are equally likely to drift up or down, one might think that over many loci we expect only a small mean difference between populations. However, the polygenic score is a sum rather than a mean, and so each locus we add into the score is like an additional step in the random walk that two populations take away from each other [99]. We expect the variance among populations, i.e. the average squared difference between population means and the global mean, to be 2VA FST[1][13].

Edge and Rosenberg[14][15] note that the expected magnitude of differences under drift is independent of polygenicity. If highly polygenic traits cannot diverge under drift and divergent selection is rare, as Lala & Feldman[1] suggest, how do they explain the substantial anatomical differences across human ancestry groups documented by physical anthropologists?
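The point that divergence under drift does not shrink as the number of loci grows can be illustrated with a toy simulation. This is a rough sketch under strong simplifying assumptions that are not taken from the cited papers: loci are independent, all start at the same frequency `p`, all have equal effects scaled so that the additive variance is 1, and drift is approximated by a Gaussian perturbation of the allele frequency with variance `f * p * (1 - p)`. Under these assumptions the standard deviation of the difference between two population means works out to 2√f regardless of the number of loci.

```python
import random
import statistics


def simulate_mean_difference_sd(n_loci, f=0.1, p=0.5, n_reps=1000, seed=1):
    """SD (across replicates) of the difference in mean genetic value
    between two populations that drift independently from one ancestor."""
    rng = random.Random(seed)
    # Equal per-locus effect chosen so that VA = n_loci * 2 * a^2 * p * (1 - p) = 1.
    a = (1.0 / (2.0 * n_loci * p * (1.0 - p))) ** 0.5
    # Gaussian approximation to drift: frequency change with variance f * p * (1 - p).
    drift_sd = (f * p * (1.0 - p)) ** 0.5
    diffs = []
    for _ in range(n_reps):
        diff = 0.0
        for _ in range(n_loci):
            p1 = p + rng.gauss(0.0, drift_sd)
            p2 = p + rng.gauss(0.0, drift_sd)
            # Difference in mean genotypic value contributed by this locus.
            diff += 2.0 * a * (p1 - p2)
        diffs.append(diff)
    return statistics.pstdev(diffs)


for n_loci in (2, 20, 200):
    print(n_loci, round(simulate_mean_difference_sd(n_loci), 3))
```

Each additional locus adds a step to the random walk while the per-locus effect shrinks to keep the additive variance fixed, so the printed standard deviations stay close to one another rather than approaching zero.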

1.2. Gusev[2][3]

Gusev[2][3] acknowledges that human populations could diverge substantially in highly polygenic traits under drift but argues that the magnitude of the divergence equals VA × FST, that is, with no adjustment for ploidy. Gusev[2] states:

In the context of population differences — a focus of the piece in The Atlantic — direct/within-family heritability provides an upper bound on how much a trait can drift between populations under neutrality (see[15] and summary). For educational attainment, for example, we can already calculate that the expected variance between continental populations under neutrality is minuscule: heritability*Fst = 0.04*0.15 = 0.006

Further expanding, in Gusev[3]:

As a consequence, “the proportion of heritable variance in the trait attributable to genetic differences between the populations” (QST) is approximately equal to the cross-population (Hudson) FST (which[15] rederive as FST,l). This quantity also does not depend on polygenicity. Note: the relationship is sometimes reported as 2*FST*h2, but this is only an approximation for Nei’s FST: Nei’s FST is approximately equal to half of Hudson’s FST when the former is close to 0 or 1 (see [8.6]).

Gusev[3] attributes the adjustment to a correction for properties of the FST estimator, which differs from its primary rationale. He references Edge and Rosenberg, who clarify the context for diploids in their study[14]:

To keep our model simple, we considered haploids rather than diploids. For diploids, the analysis would proceed similarly, but because diploids have two alleles at each locus, comparable information for distinguishing populations is achieved in a diploid model with half as many loci as in a haploid model. A slightly modified expression for QST in diploids takes this difference into account[16][7].

This adjustment, the factor of 2, is explicitly incorporated in Formula 28 of Edge and Rosenberg[15]. The rationale for this correction is illustrated in Figure 1:

Figure 1. Variance decomposition between populations and between diploid individuals

In diploids, the additive genetic variance within populations (\(V_{A,\text{within}}\)) reflects differences between individuals, as shown in Figure 1’s “Variance between individuals within populations.” The factor of 2 in the QST equation’s denominator (\(V_{G,\text{among}} / [V_{G,\text{among}} + 2V_{A,\text{within}}]\)) accounts for roughly half the genetic variance among diploids being within individuals, not between them, due to two homologous chromosomes[16]. \(V_{A,\text{within}}\) scales with the additive effects of two alleles per locus, doubling the within-population variance contribution relative to a haploid. Figure 1’s “Variance between chromosomes within individuals,” part of “Variance within individuals,” captures this within-individual variance, which is not part of \(V_{A,\text{within}}\). Thus, the factor of 2 in QST adjusts for this by ensuring proper scaling of \(V_{A,\text{within}}\) for diploids.
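The practical size of the ploidy adjustment can be sketched with the inputs Gusev[2] uses (heritability 0.04, FST 0.15). The snippet below is an illustrative comparison only: the function names are made up for this example, and the heritability is treated as the within-group additive variance on a scale where within-group phenotypic variance equals 1.

```python
def between_variance_unadjusted(fst, h2):
    """Product heritability * FST, with no adjustment for diploidy."""
    return fst * h2


def between_variance_diploid(fst, h2):
    """Diploid expectation: 2 * FST * h2 / (1 - FST)."""
    return 2.0 * fst * h2 / (1.0 - fst)


h2, fst = 0.04, 0.15
unadjusted = between_variance_unadjusted(fst, h2)
adjusted = between_variance_diploid(fst, h2)
print(round(unadjusted, 4))             # 0.006
print(round(adjusted, 4))               # 0.0141
print(round(adjusted / unadjusted, 2))  # 2.35
```

The ratio of the two quantities is 2 / (1 − FST), so the gap widens slightly as FST increases.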

1.3. Roseman & Bird[4]

Roseman & Bird[4] state: "We wish to lay particular emphasis on the following point: Under the neutral additive expectation … the expected difference between two lineages as sampled randomly after evolving under random genetic drift is 0." If this is intended to suggest that the expected difference in a specific trait under drift between two populations is 0, this interpretation is inconsistent with established theory, as discussed above in connection with Lala & Feldman[1].

Despite this statement, Roseman & Bird[4] compute σ2B correctly. However, they nonetheless conclude that "unless there is pronounced natural selection acting to differentiate pairs of groups, large pairwise differences between groups would rarely occur" and that "The only way that large amounts of evolutionary divergence among groups in IQ could be reconciled with the FST values estimated using neutrally evolving polymorphisms is if strong natural selection had acted to make the groups diverge from one another."

They base this conclusion on their computed "expected absolute difference between groups". For this statistic, they reference equation 4 in Bird[17] and provide a formula in their appendix 2:

\[ E[|\Delta_{i,j}|] = \frac{2\sigma_B}{\sqrt{\pi}} = \frac{2\sqrt{\sigma^2_B}}{1.772}, \tag{4} \]

where \(\sigma^2_B\) represents the between-group additive genetic variance.

I refer to this statistic as Bird’s b. Interpretative claims about the magnitude of differences (e.g., “large pairwise differences”), the use of a related metric in Bird[17], and the stated goal of evaluating claims in the “hereditarian race science literature” where the focus “is on understanding the absolute number of, say, IQ points” strongly suggest that Bird’s b is treated as equivalent to Cohen’s d, a standard effect size with established benchmarks for classifying group differences as “large” (i.e., d ≥ 0.8). However, Bird’s b is not equivalent to Cohen’s d. The former includes a constant denominator, while the latter accounts for within-group variance, leading to a clear discrepancy. This difference can be demonstrated straightforwardly. The formula for Cohen’s d, assuming equal variances within groups, is:

\[ d = \frac{M_1 - M_2}{\sigma_{W,\text{pooled}}} = \frac{M_1 - M_2}{\sqrt{\sigma^2_{W,\text{pooled}}}} \tag{5} \]

Under the law of total variance, the total variance \(\sigma^2_{total}\) is the sum of within-group variance \(\sigma^2_W\) and between-group variance \(\sigma^2_B\). The variance between groups can then be expressed as:

\[ \sigma^2_{\text{between}} = \frac{n_1 (M_1 - M)^2 + n_2 (M_2 - M)^2}{n_1 + n_2 - 1} \tag{6} \]

Noting that

\[ M_1 - M = M_1 - \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2} \quad \text{and} \quad M_2 - M = M_2 - \frac{n_1 M_1 + n_2 M_2}{n_1 + n_2} \tag{7} \]

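Assuming equal sample sizes (\(n_1 = n_2 = n\)), we can substitute and rearrange the terms as:

\[ M_1 - M_2 = \sqrt{\sigma^2_B \cdot 4\left(\frac{2n - 1}{2n}\right)} \tag{8} \]

Under the condition of large sample sizes, \(\frac{2n - 1}{2n} \approx 1\), and we get:

\[ M_1 - M_2 = 2\sqrt{\sigma^2_B} \tag{9} \]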
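The step from the between-group variance to the mean difference can be verified numerically. The sketch below uses illustrative function names and assumes two groups of equal size n, with the between-group variance computed from the group means, the grand mean, and an n1 + n2 − 1 denominator as in the text.

```python
def between_group_variance(m1, m2, n1, n2):
    """Between-group variance from group means and sizes,
    using the weighted grand mean and an (n1 + n2 - 1) denominator."""
    grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    numerator = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    return numerator / (n1 + n2 - 1)


def mean_difference_exact(var_between, n):
    """Mean difference implied by the between-group variance
    for two groups of equal size n."""
    return (4.0 * var_between * (2 * n - 1) / (2 * n)) ** 0.5


def mean_difference_large_n(var_between):
    """Large-sample approximation: 2 * sqrt(between-group variance)."""
    return 2.0 * var_between ** 0.5


# Two groups of 1000 whose means differ by exactly 1 unit.
var_b = between_group_variance(1.0, 0.0, 1000, 1000)
print(round(mean_difference_exact(var_b, 1000), 6))  # 1.0
print(round(mean_difference_large_n(var_b), 4))      # 1.0003
```

With 1000 observations per group, the large-sample approximation differs from the exact difference only in the fourth decimal place.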

This result corresponds to the numerator, but not denominator, in Bird’s equation for Bird’s b. When variances are the same, we obtain:

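\[ d \approx \frac{2\sqrt{\sigma^2_B}}{\sqrt{\sigma^2_W}} = \frac{2\sqrt{\eta^2}}{\sqrt{1 - \eta^2}}, \tag{10} \]

where the second equality expresses the variances as proportions of the total variance; this is equivalent to converting eta-squared to Cohen’s d.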

It’s evident that the denominator in Bird’s b differs from that in the usual Cohen’s d formula. Bird’s b divides by a constant, not by the within-group standard deviation. As a result, Bird’s b increasingly understates standardized differences as between-group variance grows. For example, with total variance standardized to 1, at \(\sigma^2_B = 0.5\), Bird’s b is 0.8, while Cohen’s d is 2; at \(\sigma^2_B = 0.9\), Bird’s b is 1.07, while Cohen’s d is 6. Thus, at best, Bird’s b is an idiosyncratic metric, not equivalent to those commonly used in the social and biomedical sciences.
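The two examples above can be reproduced with a few lines of code. This is a minimal sketch with illustrative function names; it assumes total variance is standardized to 1, so that the within-group variance is 1 minus the between-group variance.

```python
import math


def birds_b(var_between):
    """Expected absolute difference: 2 * sigma_B / sqrt(pi)."""
    return 2.0 * math.sqrt(var_between) / math.sqrt(math.pi)


def cohens_d(var_between, var_total=1.0):
    """Standardized difference: 2 * sigma_B / within-group SD,
    with within-group variance taken as var_total - var_between."""
    return 2.0 * math.sqrt(var_between) / math.sqrt(var_total - var_between)


for var_b in (0.5, 0.9):
    print(var_b, round(birds_b(var_b), 2), round(cohens_d(var_b), 2))
```

The numerator is the same in both functions; only the denominator differs, which is why the two statistics diverge as the between-group share of variance rises.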

To illustrate, Table 1 presents the expected differences as expressed in Bird’s b and in Cohen’s d across various FST values and heritabilities. Notably, Roseman & Bird’s[4] “very small absolute differences” in Bird’s b turn out to be medium to large sized differences in Cohen’s d.

Given typically reported kinship-based heritabilities (e.g., Polderman et al.[18]) and typical SNP-based FST values between major human populations, we should anticipate medium to large-sized differences in arbitrary traits under neutral divergence. For example, Roseman & Bird[4] adopt an FST of 0.12 for SNPs underlying educational and intelligence-related traits based on the results of Bird[17]. When \(h^2\) = 0.35 / 0.50, Formula 3 yields \(\sigma^2_{AB}\) = 0.095 / 0.136, which, with Formula 10, yields d = 0.650 / 0.795, a predicted moderate / large-sized effect owing exclusively to additive genetic differences. While factors discussed below might lead us to not expect such pronounced behavioral differences, this magnitude of divergence is consistent with many anthropometric traits.

FST    h2     σ²AB     Cohen’s d   Bird’s b   Interpretation
0.05   0.05   0.0053   0.15        0.08       Small
0.05   0.20   0.0211   0.29        0.16       Small to Medium
0.05   0.35   0.0368   0.39        0.22       Small to Medium
0.05   0.50   0.0526   0.47        0.26       Medium
0.05   0.65   0.0684   0.54        0.30       Medium
0.10   0.05   0.0111   0.21        0.12       Small
0.10   0.20   0.0444   0.43        0.24       Small to Medium
0.10   0.35   0.0778   0.58        0.31       Medium
0.10   0.50   0.1111   0.71        0.38       Medium to Large
0.10   0.65   0.1444   0.82        0.43       Large
0.15   0.05   0.0176   0.27        0.15       Small to Medium
0.15   0.20   0.0706   0.55        0.30       Medium
0.15   0.35   0.1235   0.75        0.40       Medium to Large
0.15   0.50   0.1765   0.93        0.47       Large
0.15   0.65   0.2294   1.09        0.54       Large
0.20   0.05   0.0250   0.32        0.18       Small to Medium
0.20   0.20   0.1000   0.67        0.36       Medium to Large
0.20   0.35   0.1750   0.92        0.47       Large
0.20   0.50   0.2500   1.15        0.56       Large
0.20   0.65   0.3250   1.39        0.64       Large
0.25   0.05   0.0333   0.37        0.21       Small to Medium
0.25   0.20   0.1333   0.78        0.41       Medium to Large
0.25   0.35   0.2333   1.10        0.55       Large
0.25   0.50   0.3333   1.41        0.65       Large
0.25   0.65   0.4333   1.75        0.74       Large
Table 1. Relation between FST, h2, σ²AB, Cohen’s d, and Bird’s b, along with the typical interpretation of the effect sizes
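The numeric columns of Table 1 can be regenerated with a short script. This is a sketch rather than the author's own code: `table_row` is an illustrative name, the heritability is used as the within-group additive variance, and the within-group variance in the Cohen’s d denominator is taken as 1 minus the between-group variance, which is the convention that reproduces the tabulated values. The interpretation labels are not reproduced because the text gives no explicit cut-offs for them.

```python
import math


def table_row(fst, h2):
    """Return (between-group variance, Cohen's d, Bird's b) for one row."""
    var_ab = 2.0 * fst * h2 / (1.0 - fst)
    sd_between_x2 = 2.0 * math.sqrt(var_ab)
    d = sd_between_x2 / math.sqrt(1.0 - var_ab)
    b = sd_between_x2 / math.sqrt(math.pi)
    return var_ab, d, b


print("FST   h2    var_AB  d     b")
for fst in (0.05, 0.10, 0.15, 0.20, 0.25):
    for h2 in (0.05, 0.20, 0.35, 0.50, 0.65):
        var_ab, d, b = table_row(fst, h2)
        print(f"{fst:.2f}  {h2:.2f}  {var_ab:.4f}  {d:.2f}  {b:.2f}")
```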

2. Conclusion

There is a tendency among some biologists to downplay the magnitude of genetic differences between human populations, which may reflect an effort to avoid conclusions similar to those the young Franz Boas drew:

It does not seem probable that the minds of races which show variations in their anatomical structure should act in exactly the same manner. Differences of structure must be accompanied by differences of function, physiological as well as psychological; and, as we found clear evidence of difference in structure between the races, so we must anticipate that differences in mental characteristics will be found.[19]

Lala & Feldman[1], Gusev[2][3], and Roseman and Bird[4] are notable in this context for their theoretical quantitative claims, which extend beyond carefully worded qualitative statements. Lala & Feldman[1] assert that differences in polygenic traits under genetic drift cannot be substantial. Gusev[2][3] acknowledges the widely accepted relationship between expected phenotypic variance and genetic variance but fails to account for the necessary adjustment for diploidy. While Roseman and Bird[4] do incorporate this adjustment, they introduce an effect-size-like statistic — denoted here as Bird’s b — that does not align with commonly used metrics in the social and biomedical sciences, where established interpretive guidelines are available.

While these authors focus on academic achievement and intelligence, the argument is more general: medium to large differences between human ancestry groups can arise without pronounced natural selection. The evolutionary default is not zero phenotypic difference but rather differences proportional to neutral genetic divergence, adjusted for ploidy.

Although this commentary emphasizes statistical expectations under neutral genetic drift, it is acknowledged that complex behavioral traits like intelligence may exhibit smaller-than-expected differences due to factors such as low narrow-sense heritability or stabilizing/convergent selection across populations. For example, personality traits—unlike many anthropometric differences—often show only small differences between ancestry groups within the same country (e.g., Foldes et al.[20]).

Understanding these expected differences can illuminate the evolution and genetic structure of such traits. For instance, the consistently small personality differences, even between the most genetically distant ancestry groups, pose an intriguing question for future research—an area currently obscured by misunderstandings about the expected magnitude of genetic differences under neutrality.

References

  1. Lala KN, Feldman MW (2024). "Genes, culture, and scientific racism". Proceedings of the National Academy of Sciences. 121(48): e2322874121.
  2. Gusev A (2024a). "No, intelligence is not like height". [PDF file]. Retrieved from https://theinfinitesimal.substack.com/p/no-intelligence-is-not-like-height
  3. Gusev A (2024b). "A molecular genetics perspective on the heritability of human behavior and group differences". [PDF file]. Retrieved from http://gusevlab.org/projects/hsq/hsq.pdf
  4. Roseman CC, Bird KA (2023). "Between group heritability and the status of hereditarianism as an evolutionary science". BioRxiv. 2023-12.
  5. Bhatia G, Patterson N, Sankararaman S, Price AL (2013). "Estimating and interpreting FST: the impact of rare variants". Genome Research. 23(9): 1514-1521.
  6. Reich D (2018). Who we are and how we got here: Ancient DNA and the new science of the human past. New York: Pantheon Books.
  7. Leinonen T, McCairns RS, O'Hara RB, Merilä J (2013). "QST–FST comparisons: evolutionary and ecological insights from genomic heterogeneity". Nature Reviews Genetics. 14(3): 179-190.
  8. Whitlock MC, Guillaume F (2009). "Testing for spatially divergent selection: comparing QST to FST". Genetics. 183(3): 1055-1063.
  9. Meirmans PG, Hedrick PW (2011). "Assessing population structure: FST and related measures". Molecular Ecology Resources. 11(1): 5-18.
  10. Alcala N, Rosenberg NA (2017). "Mathematical constraints on FST: biallelic markers in arbitrarily many populations". Genetics. 206(3): 1581-1600.
  11. Edelaar P, Burraco P, Gomez-Mestre I (2011). "Comparisons between QST and FST—how wrong have we been?". Molecular Ecology. 20(23): 4830-4839.
  12. Yair S, Coop G (2022). "Population differentiation of polygenic score predictions under stabilizing selection". Philosophical Transactions of the Royal Society B. 377(1852): 20200416.
  13. Nelis M, Esko T, Mägi R, Zimprich F, Zimprich A, Toncheva D, et al. (2009). "Genetic structure of Europeans: a view from the North–East". PloS One. 4(5): e5472.
  14. Edge MD, Rosenberg NA (2015a). "Implications of the apportionment of human genetic diversity for the apportionment of human phenotypic diversity". Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences. 52: 32-45.
  15. Edge MD, Rosenberg NA (2015b). "A general model of the relationship between the apportionment of human genetic diversity and the apportionment of human phenotypic diversity". Human Biology. 87(4): 313.
  16. Whitlock MC (2008). "Evolutionary inference from QST". Molecular Ecology. 17(8): 1885-1896.
  17. Bird KA (2021). "No support for the hereditarian hypothesis of the Black–White achievement gap using polygenic scores and tests for divergent selection". American Journal of Physical Anthropology. 175(2): 465-476.
  18. Polderman TJ, Benyamin B, De Leeuw CA, Sullivan PF, Van Bochoven A, Visscher PM, Posthuma D (2015). "Meta-analysis of the heritability of human traits based on fifty years of twin studies". Nature Genetics. 47(7): 702-709.
  19. Boas F (2023). The mind of primitive man: A course of lectures delivered before the Lowell Institute, Boston, Mass., and the National University of Mexico, 1910-1911 [eBook]. Project Gutenberg. (Original work published 1911). https://www.gutenberg.org/ebooks/71630
  20. Foldes HJ, Duehr EE, Ones DS (2008). "Group differences in personality: Meta-analyses comparing five US racial groups". Personnel Psychology. 61(3): 579-616.