June 2, 2013

Not more than 2% of educational attainment can be attributed to genes

Correction: a reader points out that ~2% is the influence obtained by adding up many SNPs (often with a tiny estimated impact each), while ~0.02% (the R2 given in the abstract) is the influence estimated for each of the three most influential SNPs, not for the three together. I stand corrected, though the influence is still very small.

While the authors are hopeful that this proportion will increase in the future, so far only three SNPs have been found with a clear correlation to educational attainment, each explaining about 0.02% of the variance (R2 ≈ 0.02% per the abstract). When they consider a linear polygenic score from all measured SNPs, they still can't account for more than an elusive 2% of the variance through putative genetic causes.

Cornelius A. Rietveld et al., GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment. Science 2013. Pay per view. [doi:10.1126/science.1235488]


A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effects sizes are small (R2 ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.

Seriously, if they could find no more than an elusive ~2% in more than 100,000 individuals... how do they expect to ever find any more?

Let's be honest: there is only very limited genetic influence on intelligence and cognitive or educational attainment because, after all, what genes do with the brain is lay out the hardware, so to speak, with all the software (except, surely, a basic emotional-instinctual ROM) being the product of environmental interaction. It's possible that there is minor variance in the hardware (genetics) and maybe even more in its initial configuration (basic epigenetics) but, just as most desktop computers can do the same things with only minor variability, human brains can too (unless somehow damaged).


  1. The conclusion that missing heredity means no heredity isn't sound. Evidence from twin studies, etc. shows IQ having at least 50% inheritance (probably more) and education being closely linked to IQ. The fact that we can't find a genotype to match, which is true across the board for phenotypes known to be genetic, simply means that the genotype isn't a simple one.

    1. They are attempting GWAS on 126,000 individuals, what more can be done? A million, a billion? It won't make much of a difference because 126,000 people of diverse ancestry have nearly all human genetic variation already.

      Of course there can be complex heritability that GWAS misses but that MZ twins still share. That would not be any single allele, though: it'd be a complex combo of many, many alleles (not counting epigenetics, etc.)

      These things can be like complex mechanics: if you put the wrong pieces together, it doesn't work or works faultily, but if you have the rare luck of getting all the pieces that combine well, then it works extraordinarily well. There can be many different ideal rare combos, not just one.

      That is within heritability but is not any single gene or group of discrete genes. This may also explain why moderate inbreeding has some small favorable effects, as it is the overall genotype (or rather its phenotypic manifestation) that is selected for and not any particular allele (Nature does not cull/restrict genes but whole individuals).

      Anyhow the data I can find on IQ and twin studies says that:
      1. Unrelated children—Reared together .30
      2. Biological siblings—Reared together .47
      3. Fraternal twins—Reared together .55
      4. Identical twins—Reared together .86


      5. Unrelated children—Reared apart ~ .00
      6. Biological siblings—Reared apart .24
      7. Fraternal twins—Reared apart .35
      8. Identical twins—Reared apart .76

      We see that the environmental effect (according to these studies, which I have not reviewed in any depth) is:

      For unrelated children (1-5): 30%
      For biological siblings (2-6): 23%
      For DZ twins (3-7): 20%
      For MZ twins (4-8): 10%
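
The subtraction behind these four figures can be sketched in a few lines of Python. The correlations are the ones quoted in the list above (I have not checked them against the original studies); the dictionary and function names are only illustrative.

```python
# IQ correlations as quoted in the comment (not independently verified).
reared_together = {
    "unrelated children": 0.30,
    "biological siblings": 0.47,
    "DZ twins": 0.55,
    "MZ twins": 0.86,
}
reared_apart = {
    "unrelated children": 0.00,
    "biological siblings": 0.24,
    "DZ twins": 0.35,
    "MZ twins": 0.76,
}


def rearing_gap(together, apart):
    """Difference between the reared-together and reared-apart correlations."""
    return {group: round(together[group] - apart[group], 2) for group in together}


gaps = rearing_gap(reared_together, reared_apart)
for group, gap in gaps.items():
    print(f"{group}: {gap:.2f}")
```

Reading the gap as "the environmental effect" is the interpretation made in this comment; the code only reproduces the arithmetic.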

      So it seems that there is not just one absolute heritability: near 0% for unrelated children, near 50% for regular siblings, near 63% for DZ twins (who paradoxically are as related genetically as the previous category) and 86% for MZ twins.

      This does not make much sense because it actually suggests that IQ heritability is more intense in regular (non-twin) siblings than in MZ twins, as the former only share on average 50% of the genome, while the latter share 100%. In fact it seems to suggest that the highest heritability relative to shared genome corresponds to DZ twins (.63x2=126%), followed by regular siblings (.50x2=100%), and only then by MZ twins (86%, no correction needed).

      How do you explain that? What's wrong here? Is that because of shared genome between people regardless of known relatedness? If so, why do DZ twins seem to reach over the limit of 100% while MZ twins do not?

      I'm very confused by twin studies' results: the basic results seem to follow a logical heritability pattern, but when we correct for the genome actually shared, we get weird results.
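
For comparison, the textbook way of turning twin correlations into a heritability figure is Falconer's formula, which works from the difference between MZ and DZ correlations rather than from a multiplication by shared genome: h2 = 2(r_MZ - r_DZ). This is not the calculation made in the comment above. It is a minimal sketch applied to the reared-together correlations quoted there, and it assumes purely additive genetic effects and an equally shared environment for both kinds of twins; the function name is hypothetical.

```python
def falconer(r_mz, r_dz):
    """Falconer's decomposition from MZ and DZ twin correlations.

    Assumes additive genetic effects only and equal shared environments
    for MZ and DZ pairs (assumptions of the method, not of the comment).
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic share
    c2 = 2 * r_dz - r_mz     # shared-environment share
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return {"h2": round(h2, 2), "c2": round(c2, 2), "e2": round(e2, 2)}


# Reared-together correlations quoted in the comment: MZ .86, DZ .55
estimates = falconer(0.86, 0.55)
print(estimates)
```

With these inputs the sketch gives h2 of about 0.62, c2 of about 0.24 and e2 of about 0.14.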

  2. "They are attempting GWAS on 126,000 individuals, what more can be done? A million, a billion? It won't make much of a difference because 126,000 people of diverse ancestry have nearly all human genetic variation already."

    You misunderstand. Heritability is the proportion of the trait variance explained by genes. Kinship estimates typically give an adult intelligence/education heritability of 0.8/0.4. Genome-wide Complex Trait Analysis gives a somewhat smaller estimate -- about 0.6/0.2 -- indicating that not all of the genetic variance is captured by common genes. Both of these methods estimate total genetic influence based on genetic and phenotypic similarity. They don't estimate the influence of specific genes. The study cited, on the other hand, does. It found that a list of specific variants could explain 2% of educational attainment and 2.5% of cognitive ability. For comparison, only 10% of the variance in height can be explained specifically, while 90% is still attributable to genes based on both kinship and GCTA analyses. Why only 2-2.5%? Because each gene has a small effect and, unless the sample is large enough, the effect of the specific gene will not meet the significance threshold. So, for example, previous studies with n ~ 10,000 have been able to explain, in terms of aggregated specific genetic effects, precisely zero percent of the variance. To answer your question, based on the authors' discussion, for EA, 1 million sounds about right. As the authors note, only 20% of the variance in EA is captured by genes of common effect in aggregates. (Quote: "An asymptotic upper bound for the explanatory power of a linear polygenic score is the additive genetic variance across individuals captured by current SNP microarrays. Using combined data from STR and QIMR, we estimate that this upper bound is 22.4% (S.E. = 4.2%) in these samples (5) (table S12).") So, they were able to explain about 10% of that with specific loci.
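
    The "10% of that" figure is just the ratio of two numbers quoted above. A minimal sketch of the arithmetic, with variable names of my own choosing:

```python
# Both inputs are the figures quoted in the comment above.
polygenic_score_r2 = 0.02    # variance in EA explained by the polygenic score
snp_upper_bound = 0.224      # quoted upper bound for a linear polygenic score

share_of_bound = polygenic_score_r2 / snp_upper_bound
print(f"{share_of_bound:.1%} of the SNP-capturable variance is explained so far")
```

    So the polygenic score reaches roughly a tenth of the quoted ceiling, which is the point being made here.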

    1. Specific named variants in this study are three SNPs, accounting for just 0.2% of the phenotype measured as educational attainment. Statistical inference finds only 2%: "linear polygenic score from all measured SNPs accounts for ≈ 2%". The 'linear polygenic score' are not specific alleles but an statistical estimate of interactive whole genome effects, so with this study in hand only 2% of educational attainment can be attributed to genes as a whole.

      Another thing is that it might be an underestimate, as the authors and you suggest, or hope.

      I have to read more on twin studies of heritability but I'm feeling that there's something that is not working properly in them.

      "As the authors note, only 20% of the variance in EA is captured by genes of common effect in aggregates".

      Maybe, but this would have to be properly demonstrated. There's way too much appeal to rare genetic variants as of now, and an 80% attribution seems very much excessive without any clear evidence. After all we humans are way too similar among us genetically and even more at population levels, so blaming nearly all of the "missing heritability" on rare variants (i.e. variants found almost only in individuals or particular families) seems very far-fetched, honestly.

      There must be other explanations: either complex intra-genome interactions that the current methods are unable to grasp, or fundamental errors in the twin-study heritability estimates themselves.

    2. "The 'linear polygenic score' are not specific alleles but an statistical estimate of interactive whole genome effects"

      You're still wrong. Reread the paper. The above is for all alleles that had a statistically significant effect, not for genotype in general as indexed by genetic/phenotypic similarity in the sense of Genome-wide Complex Trait Analysis. So, three "major" identified SNPs = .2% x 3. A bunch of minor identified SNPs = 2% -.6%. A linear polygenic score is a score derived from adding the independent effects of all alleles that meet some p-value.
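
      To make that definition concrete, here is a minimal sketch of a linear polygenic score as described in this comment: a sum of allele counts times per-allele effects over the SNPs that pass a p-value cutoff. The SNP names, effect sizes and p-values are made up for illustration and are not taken from the paper.

```python
def polygenic_score(genotype, effects, p_values, p_threshold):
    """Sum allele count x per-allele effect over SNPs with p <= p_threshold."""
    return sum(
        genotype[snp] * beta
        for snp, beta in effects.items()
        if p_values[snp] <= p_threshold
    )


# Hypothetical toy data: per-allele effects and association p-values.
effects = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.02}
p_values = {"snp_a": 1e-9, "snp_b": 1e-4, "snp_c": 0.3}

# One person's genotype: copies (0, 1 or 2) of the effect allele at each SNP.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

strict = polygenic_score(person, effects, p_values, p_threshold=5e-8)
lenient = polygenic_score(person, effects, p_values, p_threshold=1e-3)
print(strict, lenient)
```

      Loosening the threshold brings more SNPs into the sum, which is how a score built from many weakly associated SNPs can explain more variance than the few genome-wide significant ones alone.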

      "After all we humans are way too similar among us genetically and even more at population levels,"

      This is just silly. See: "Genome-wide association studies establish that human intelligence is highly heritable and polygenic." In this study, inter-individual genetic variance correlated with inter-individual phenotypic variance at SQRT(.40) to SQRT(.51). So, the genetic variance is there.

    3. "Reread the paper."

      Send me a free copy to lialdamiz[at]gmail.com. Thanks in advance.

      I could never read the paper because of the greed of the researchers or their academic institution, who published in a pay-per-view publication. I live in a third world country called European Union and I have no money to pay for luxuries, or sometimes even for basic needs like a dental prosthesis.

      But I do appreciate your clarification. I was basing my reading only on the abstract and the press release.

  3. Intelligence shows a robust genetic component in twin studies. The problem is that the GWAS methodology misses a lot of genetic variation and doesn't capture gene-gene interactions. It focuses on SNPs but often misses important information like copy number variation.

    The same is true in obesity research. Body mass index variation is ~60% inherited, but GWAS studies can only account for ~2% of it (despite sample sizes over a million now). These studies are still useful because they can point us to the biological processes that are important determinants of the phenotype in question. For example, obesity GWAS loci are disproportionately in genes that are involved in the brain regulation of adiposity, which is consistent with a variety of other lines of investigation. The GWAS findings are a nice confirmation that these pathways are important, since it's a semi-unbiased methodology, as opposed to the hypothesis-driven research that predominates in the obesity field.

    1. A problem with twin studies is that we don't know how much epigenetic effects are at play. There can be shared inheritable epigenetics (yes, "epigenes" can be inherited) and there can be shared non-inherited epigenetic effects in the womb as well (probably very important). I guess that the very important heredity differences between DZ twins and regular siblings (whose genetic relatedness is the same) must be largely due to in-womb epigenetics. These differences (~10 percentage points, amounting to 40% of all heredity in siblings reared apart) strongly suggest that a large part of heredity is non-heritable epigenetics (i.e. environmental). Part of the rest of heredity may well also come from heritable epigenetics, but at this time we cannot really quantify it properly.

