October 14, 2015

Neolithic genomes from Northwestern Turkey

Or yet another ancient European DNA study, with some quirks and, critically, the first ancient farmer samples from the Eastern Mediterranean, specifically Northwestern Anatolia, near Yenişehir (Bursa province).

Iain Mathieson et al. Eight thousand years of natural selection in Europe. bioRxiv (pre-pub), 2015. Freely accessible. [doi: ]


The arrival of farming in Europe around 8,500 years ago necessitated adaptation to new environments, pathogens, diets, and social organizations. While indirect evidence of adaptation can be detected in patterns of genetic variation in present-day people, ancient DNA makes it possible to witness selection directly by analyzing samples from populations before, during and after adaptation events. Here we report the first genome-wide scan for selection using ancient DNA, capitalizing on the largest genome-wide dataset yet assembled: 230 West Eurasians dating to between 6500 and 1000 BCE, including 163 with newly reported data. The new samples include the first genome-wide data from the Anatolian Neolithic culture, who we show were members of the population that was the source of Europe's first farmers, and whose genetic material we extracted by focusing on the DNA-rich petrous bone. We identify genome-wide significant signatures of selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height.

As you can see from the title and the abstract, much of the study is focused on more or less debatable selection signatures. Interesting, of course, but not my greatest interest, even less so as I perceive that some data that may be crucial for understanding part of this selection is missing, notably for the mainstream European LCT 13910-T allele.

How can you do an analysis of selection on this allele while ignoring the first known carriers of it: Chalcolithic (proto-)Basques and Swedes (Gökhem particularly)?

Luckily there are other highlights...

Northwestern Anatolian ancient genetics

The new ancient Neolithic samples come from two sites: Menteşe Höyük (n=5) and Barcın Höyük (n=21), both located in the Yenişehir plain, southeast of Istanbul across the Marmara Sea. The archaeological context of these samples, as well as that of many other European ones resequenced for this study with new technology, is discussed in the Supplementary Information.

These ancient Northwest Anatolian farmers have been shown to be very similar to early European farmers. The authors estimate that the latter were only some 10% admixed with Paleoeuropeans relative to the Anatolian samples, although later individuals from the West did of course have further Paleoeuropean admixture.

I must emphasize the adjective "Northwestern" because the Anatolian Peninsula is a large territory where the Neolithic arrived at different times and took different cultural forms. Critically, we cannot be certain that there is any identity between these Western Anatolian first farmers and those from South-Central Anatolia, for example those of the world-famous Çatal Höyük village. This is because the Neolithic of South-Central Anatolia is much older and there are archaeological indications that the settlement of Western Anatolia and Greece took place via coastal migration. The origin of this coastal migration probably involved Cyprus, which in turn was more directly related to the Neolithic of the Levant (PPNB) than to that of South-Central Anatolia. Some genetic data also seem to suggest that the precursors of early European farmers came from the Levant, rather than from further North. But of course the full resolution of this mystery will have to await ancient DNA from the relevant regions, something that may be aided by the recent technological breakthroughs but that will also require peace, so that geneticists and archaeologists can do their field work (there are of course many other, much more pressing reasons to hope for peace and normalization in West Asia, don't get me wrong).

In any case, we finally have a reference genome for what can be termed the Aegean Neolithic, and it seems to be even closer to its European derivatives. Still, we cannot rule out some backflow from Greece or other parts of the Balkans to Western Anatolia, because there was indeed some interaction across the Aegean. However, a much clearer cultural divide has been argued to exist between the cultures of the Marmara Sea and those of inland Thrace, so, if there was any such backflow, it probably happened before the expansion of the Thessalian Neolithic northwards.

Principal Component Analysis

This is the Principal Component Analysis provided by this study (fig. 1B). The modern samples are in gray with no labels whatsoever, but I guess most readers will be able to identify them roughly, as the basic layout has been repeated in so many recent aDNA studies:

Figure 1: Population relationships of samples. (...) B: Principal component analysis of 777 modern West Eurasian samples (grey), with 221 ancient samples projected onto the first two principal component axes and labeled by culture. Abbreviations: [E/M/L]N Early/Middle/Late Neolithic, LBK Linearbandkeramik, [E/W]HG Eastern/Western hunter-gatherer, [E]BA [Early] Bronze Age, IA Iron Age.
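"Projected onto" has a precise meaning here: the axes are computed from the modern samples alone, and each ancient sample is then placed on those fixed axes by centring its genotypes with the modern means and taking dot products with the modern components, so the ancient samples do not shape the axes themselves. The Python sketch below illustrates this under loud assumptions: the genotypes (0, 1 or 2 copies of an allele) for two modern populations are made up, components come from simple power iteration rather than a full eigendecomposition, and missing data, which real tools such as EIGENSOFT's smartpca must handle for low-coverage ancient samples, is ignored.

```python
import random
from typing import List

Matrix = List[List[float]]

def column_means(rows: Matrix) -> List[float]:
    """Per-SNP mean genotype across the modern reference samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows: Matrix, means: List[float]) -> Matrix:
    """SNP-by-SNP sample covariance of the centred modern genotypes."""
    centred = [[x - mu for x, mu in zip(r, means)] for r in rows]
    m, n = len(means), len(rows)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(m)]
            for i in range(m)]

def top_components(cov: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors by power iteration, deflating after each one."""
    m = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigval = sum(v[i] * a[i][j] * v[j] for i in range(m) for j in range(m))
        # Remove the found component so the next iteration finds the next one.
        a = [[a[i][j] - eigval * v[i] * v[j] for j in range(m)] for i in range(m)]
        comps.append(v)
    return comps

def project(sample: List[float], means: List[float], comps: Matrix) -> List[float]:
    """Place one sample on the fixed axes: centre with modern means, then dot product."""
    centred = [x - mu for x, mu in zip(sample, means)]
    return [sum(c * x for c, x in zip(comp, centred)) for comp in comps]

def genotype(p: float, rng: random.Random) -> int:
    """Number of copies (0, 1 or 2) of an allele with frequency p."""
    return sum(rng.random() < p for _ in range(2))

# Hypothetical modern panel: 20 individuals from each of two populations
# with opposite allele frequencies at 8 SNPs.
rng = random.Random(42)
freqs_a = [0.9] * 4 + [0.1] * 4
freqs_b = [0.1] * 4 + [0.9] * 4
modern_a = [[genotype(p, rng) for p in freqs_a] for _ in range(20)]
modern_b = [[genotype(p, rng) for p in freqs_b] for _ in range(20)]
modern = modern_a + modern_b

means = column_means(modern)
pcs = top_components(covariance(modern, means), k=2)
ancient = [2, 2, 2, 2, 0, 0, 0, 0]  # hypothetical ancient sample resembling population A
print("ancient sample on PC1/PC2:", project(ancient, means, pcs))
```

Because the ancient genotypes never enter `covariance`, adding or removing ancient samples leaves the modern layout unchanged, which is why the gray background looks so similar across studies using comparable modern panels.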

As in Olalde 2015 or Haak 2015, or even Lazaridis 2014, WHGs appear to be located rather "towards the South", unlike in some other PCAs, particularly Europe-only ones. I find this interesting and potentially informative, at least while we await Atlantic European ancient nuclear DNA.

You'll therefore forgive me for the redundancy of reusing the above image a couple of times in order to make some points.

The thesis that most of these studies are pushing is a simplistic triangular scenario for the formation of modern Europeans, with a formula that can be described as {x.EEF+y.WHG+z.Kurgan}. I don't deny that this is a fair approximation, but I am also quite certain that it is missing important clues. In fact, the triangular thesis seems to fail to explain most of the Northern European genetic makeup, while the origins of Basques also remain somewhat unexplained by it. Let's see:

Annotations on the PCA: triangular thesis fails, extra HG (EHG?) is needed.

It seems quite apparent that the triangular thesis (drawn on the PCA as a dashed line) fails to explain most of the Western and Northern European genetic makeup, which is clearly shifted much further towards Paleoeuropean hunter-gatherers than the model allows.
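The logic of this argument can be written down. Since projection onto principal components is linear, the expected position of a mixed population is roughly the weighted average of its sources' positions, so any three-way mix of EEF, WHG and steppe sources should fall inside their triangle; solving for a target's mixture weights gives a negative weight whenever it falls outside. The Python sketch below uses invented PC1/PC2 coordinates, not values read off fig. 1B, and is only a geometric illustration, not the authors' admixture method.

```python
from typing import Tuple

Point = Tuple[float, float]

def barycentric(target: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Weights (wa, wb, wc), summing to 1, with target = wa*a + wb*b + wc*c."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = target, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if abs(det) < 1e-12:
        raise ValueError("the three sources are collinear on the plot")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return wa, wb, 1.0 - wa - wb

def within_triangle(weights: Tuple[float, ...], tol: float = 1e-9) -> bool:
    """A three-way mix can only reach the target if no weight is negative."""
    return all(w >= -tol for w in weights)

# Invented PC1/PC2 positions, for illustration only.
EEF, WHG, STEPPE = (0.00, 0.00), (-0.10, 0.04), (-0.02, 0.10)
northern_target = (-0.09, 0.09)

weights = barycentric(northern_target, EEF, WHG, STEPPE)
print(weights, within_triangle(weights))  # the EEF weight comes out negative (about -0.37)
```

A negative weight does not say which source is missing; it only shows that the three chosen sources cannot reach the target on this plot.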

Simply including Eastern European hunter-gatherers in the equation (dotted line) would be enough to solve most of the problem, although it does of course raise other questions about how and when this extra Paleoeuropean blood was incorporated.

This solution would still leave Basques out. Their case requires instead extra Western hunter-gatherer admixture on top of a simple Neolithic base:

Annotations on the PCA: Basques can be explained (?) as simple {Neo-European + WHG admixture}

Of course, the actual sources of Paleoeuropean admixture may be more complex, as suggested by some studies, like Günther & Valdiosera 2015, who claimed Scandinavian HG admixture not just in Gökhem farmers but also in Ötzi ("Iceman" in the above graph). They did not use EHG samples but in any case, if correct, this implies pre-Kurgan admixture from the Northeast of the subcontinent.

A key excerpt from the Supplementary Information 2 section, which someone (Simon, I think) used to argue for steppe ancestry in Basques in a discussion at the Eurogenes blog:
The Iberian Chalcolithic population lacks steppe ancestry, but Late Neolithic central and northern Europeans have substantial such ancestry (Extended Data Fig. 3E) suggesting that the spread of ANE/steppe ancestry did not occur simultaneously across Europe. All present-day Europeans have less steppe ancestry than the Corded Ware [5], suggesting that this ancestry was diluted as the earliest descendants of the steppe migrants admixed with local populations. However, the statistic f4(Basque, Iberia_Chalcolithic; Yamnaya_Samara, Chimp)=0.00168 is significantly positive (Z=8.1), as is the statistic f4(Spanish, Iberia_Chalcolithic; Yamnaya_Samara, Chimp)=0.00092 (Z=4.6). This indicates that steppe ancestry occurs in present-day southwestern European populations, and that even the Basques cannot be considered as mixtures of early farmers and hunter-gatherers without it [4].

What does this actually say? Nothing about Western hunter-gatherers, only that Basques appear more Yamna-like than the Iberian Chalcolithic sample. I see no reason why this could not be caused by simple extra WHG admixture, although it could also imply inflow of other Paleoeuropean ancestry such as SHG or EHG. What I do see in other studies (and, as far as I can discern, in this one as well) is that Basques lack any clear Yamna signature, and notably their Caucasus or Northern West Asian component (always present where Kurgan admixture is unmistakable, and therefore a clear indicator of it) is effectively zero (some individuals may have a tiny non-zero share of it, which is quite normal).
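For readers less familiar with these statistics: f4(A, B; C, D) averages, over many SNPs, the product of the allele-frequency differences (A - B) and (C - D). With chimp as D, a positive f4(Basque, Iberia_Chalcolithic; Yamnaya_Samara, Chimp) means Basques share more genetic drift with Yamnaya than the Iberian Chalcolithic sample does, and the quoted Z is the estimate divided by a standard error obtained by resampling blocks of the genome (here 0.00168/8.1, roughly 0.0002). The Python sketch below computes both on invented allele frequencies; the simplified equal-block jackknife and the toy data are assumptions of the example, not the paper's pipeline.

```python
import random
from statistics import mean
from typing import List, Tuple

Freqs = List[float]

def f4(a: Freqs, b: Freqs, c: Freqs, d: Freqs) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    return mean((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))

def f4_block_jackknife(a: Freqs, b: Freqs, c: Freqs, d: Freqs,
                       n_blocks: int) -> Tuple[float, float, float]:
    """Return (f4, standard error, Z) from a delete-one block jackknife.

    SNPs are assumed to be in genomic order so each block is a contiguous chunk;
    this simplified version requires equal-sized blocks.
    """
    n = len(a)
    if n % n_blocks:
        raise ValueError("the number of SNPs must be divisible by n_blocks")
    size = n // n_blocks
    estimate = f4(a, b, c, d)
    leave_one_out = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        # Recompute f4 with block k removed from all four populations.
        leave_one_out.append(f4(*(p[:lo] + p[hi:] for p in (a, b, c, d))))
    loo_mean = mean(leave_one_out)
    variance = (n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in leave_one_out)
    se = variance ** 0.5
    return estimate, se, estimate / se

# Toy data (hypothetical): B, C and D drift independently from a root frequency,
# and A is B plus 30% of C's drift, so A shares extra drift with C.
rng = random.Random(1)
n_snps = 2000
root = [rng.uniform(0.3, 0.7) for _ in range(n_snps)]
drift_b = [rng.uniform(-0.1, 0.1) for _ in range(n_snps)]
drift_c = [rng.uniform(-0.1, 0.1) for _ in range(n_snps)]
drift_d = [rng.uniform(-0.1, 0.1) for _ in range(n_snps)]
pop_b = [r + e for r, e in zip(root, drift_b)]
pop_c = [r + e for r, e in zip(root, drift_c)]
pop_d = [r + e for r, e in zip(root, drift_d)]
pop_a = [b + 0.3 * e for b, e in zip(pop_b, drift_c)]

est, se, z = f4_block_jackknife(pop_a, pop_b, pop_c, pop_d, n_blocks=20)
print(f"f4 = {est:.5f}, SE = {se:.5f}, Z = {z:.1f}")
```

Note that the statistic only registers extra shared drift between A and C relative to B; by itself it does not say through which ancestral population that drift arrived.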

Admixture analysis with two and three source populations

The authors find that, while many populations can be modeled as the product of simple two-way admixture, many others need a three-way model, notably from the late Chalcolithic onwards:

Extended Data Figure 2: Early isolation and later admixture between farmers and steppe populations. A [actually B]: Mainland European populations later than 3000 BCE are better modeled with steppe ancestry as a 3rd ancestral population. B [actually A]: Later (post-Poltavka) steppe populations are better modeled with Anatolian Neolithic as a 3rd ancestral population. C: Estimated mixture proportions of mainland European populations without steppe ancestry. D: Estimated mixture proportions of Eurasian steppe populations without Anatolian Neolithic ancestry. E: Estimated mixture proportions of later populations with both steppe and Anatolian Neolithic ancestry. [F is below]

However, this varies: notably, the Iberian Chalcolithic sample can still be modeled as a two-way admixture, which is consistent with the idea that the increase in complexity happened not in a single event but first in Central and Eastern Europe and only later further West. This is fully consistent with the Kurgan model of Indo-European expansion, although it may require some refinement here and there.

For example, it is becoming quite obvious that there was not only a westward movement of Eastern European populations but also a subsequent eastward backflow of the resulting admixed Central European ones. This is discussed in the supplementary materials, from page 43 onwards.

Extended Data Figure 2: (...) F: ADMIXTURE plot at k=17 showing population differences over time and space.
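The difference between two-way and three-way models can be illustrated with a plain least-squares fit of a target's allele frequencies as a weighted sum of source frequencies, with the weights constrained to sum to 1. This is a simplified stand-in, not the paper's own procedure, and the source names and frequencies in the Python sketch below are hypothetical. Because the models are nested, adding a third source can never make the fit worse, so the real question in such analyses is whether the improvement exceeds what noise alone would give.

```python
import random
from typing import List, Tuple

Freqs = List[float]

def fit_two_way(t: Freqs, s1: Freqs, s2: Freqs) -> Tuple[Tuple[float, float], float]:
    """Least-squares w with t ~ w*s1 + (1-w)*s2; returns the weights and the RSS."""
    y = [ti - bi for ti, bi in zip(t, s2)]
    u = [ai - bi for ai, bi in zip(s1, s2)]
    w = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * ui for ui in u)
    rss = sum((yi - w * ui) ** 2 for ui, yi in zip(u, y))
    return (w, 1.0 - w), rss

def fit_three_way(t: Freqs, s1: Freqs, s2: Freqs,
                  s3: Freqs) -> Tuple[Tuple[float, float, float], float]:
    """Least-squares (w1, w2, 1-w1-w2) for t ~ w1*s1 + w2*s2 + w3*s3, plus the RSS.

    Weights are not forced to be non-negative in this sketch.
    """
    y = [ti - ci for ti, ci in zip(t, s3)]
    u1 = [ai - ci for ai, ci in zip(s1, s3)]
    u2 = [bi - ci for bi, ci in zip(s2, s3)]
    # 2x2 normal equations, solved with Cramer's rule.
    a11 = sum(p * p for p in u1)
    a12 = sum(p * q for p, q in zip(u1, u2))
    a22 = sum(q * q for q in u2)
    b1 = sum(p * yi for p, yi in zip(u1, y))
    b2 = sum(q * yi for q, yi in zip(u2, y))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    rss = sum((yi - w1 * p - w2 * q) ** 2 for p, q, yi in zip(u1, u2, y))
    return (w1, w2, 1.0 - w1 - w2), rss

# Hypothetical source frequencies at 500 SNPs and a target that is an exact
# 50/20/30 mix of them (real data would add sampling noise).
rng = random.Random(7)
farmer = [rng.uniform(0.05, 0.95) for _ in range(500)]
hunter = [rng.uniform(0.05, 0.95) for _ in range(500)]
steppe = [rng.uniform(0.05, 0.95) for _ in range(500)]
target = [0.5 * f + 0.2 * h + 0.3 * s for f, h, s in zip(farmer, hunter, steppe)]

w2way, rss2 = fit_two_way(target, farmer, hunter)
w3way, rss3 = fit_three_way(target, farmer, hunter, steppe)
print("two-way:", w2way, rss2)
print("three-way:", w3way, rss3)
```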

To this I must add my conviction that the triangular model is not enough to explain modern European genetics and that greater Paleoeuropean genetic input in Northern and Western Europe is required as well. The great challenge in this regard is to sample Atlantic (and Baltic) Europe properly and draw whatever conclusions ancient genomes from these areas may allow.

Naturally, there is also research to be done in West Asia, where a good share of the ancestors of Europeans (and of West Asians, of course) once lived. That is the other major challenge. In this sense, this study must be commended for its breakthrough in sampling ancient Northwestern Anatolians, which is a step in the right direction.

There are other blank zones to be researched as well in Southern Europe (Italy, the Balkans) that may well provide complementary information.

Alleged selection

The authors claim to have found evidence for selection on twelve different alleles. I remain mildly skeptical because it is hard to judge whether this was all selection or whether founder effects were involved as well.
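Part of the difficulty is that a change in allele frequency does not by itself reveal its cause. Under the simplest deterministic model of genic selection, a favoured allele with selection coefficient s goes from frequency p to p(1 + s)/(1 + sp) each generation, which means its odds p/(1 - p) are multiplied by (1 + s). That makes it easy to ask what constant s would be needed to explain a given rise, as in the Python sketch below, where the frequencies, the time span and the 25-year generation length are all hypothetical. A founder effect, drift or migration can produce the same start and end frequencies, which is why a selection scan has to account for those alternatives rather than read selection off the trajectory.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def next_freq(p: float, s: float) -> float:
    """One generation of deterministic genic selection: p' = p(1+s) / (1 + s*p)."""
    return p * (1.0 + s) / (1.0 + s * p)

def required_s(p0: float, p1: float, generations: int) -> float:
    """Constant s taking an allele from p0 to p1 in the given number of generations."""
    return math.exp((logit(p1) - logit(p0)) / generations) - 1.0

# Hypothetical example: from 1% to 50% in 4,000 years at ~25 years per generation.
gens = 4000 // 25
s = required_s(0.01, 0.50, gens)
p = 0.01
for _ in range(gens):
    p = next_freq(p, s)
print(f"s = {s:.4f}; frequency after {gens} generations = {p:.3f}")
```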

Some of the alleged targets of selection are:

Lactase persistence: rs4988235, also known as 13910-T, already mentioned above. The authors state that "the allele's earliest appearance in our data is in a central European Bell Beaker sample (individual I0112) that lived between approximately 2300 and 2200 BCE". Older signals from the Chalcolithic Basque Country (fixed in a subpopulation) and Sweden are totally ignored. Of course, it is still a draft, but key, widely available information is clearly being ignored.

A light skin allele known as rs16891982 (in the gene SLC45A2). This allele was at low frequency in the studied ancient populations (but again might have been more common in the blank, under-researched areas, I can't say). In contrast, the derived allele of the gene SLC24A5 was fixed in Neolithic NW Anatolians, as well as in the derived European ancient populations, a clear case of founder effect (although it may also have helped with adaptation to the low vitamin D diet caused by the transition to agriculture). There are other pigmentation genes that may have been selected in complex interaction, as they seem to be partly correlated with latitude and are hard to explain based on ancient populations alone.

An important datum here is that, unlike closely related western hunter-gatherers, the Motala samples have predominantly derived pigmentation alleles at SLC45A2 and SLC24A5. So... is there another source of these light skin alleles (there are others, and much is unknown anyhow) that is not Neolithic farmers?

Another selection target is the TLR1-TLR6-TLR10 gene cluster, which seems related to resistance to mycobacteria such as those causing leprosy and tuberculosis. There are also immunity-related signals in the complex MHC region, about which I'd rather quote:
The strongest signal is at rs2269424 near the genes PPT2 and EGFL8 but there are at least six other apparently independent signals in the MHC (Extended Data Fig. 3); and the entire region is significantly more associated than the genome-wide average (residual inflation of 2.07 in the region on chromosome 6 between 29-34 Mb after genome-wide genomic control correction). This could be the result of multiple sweeps, balancing selection, or background selection in this gene-rich region.
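The "genomic control correction" in that quote is a standard device: the inflation factor lambda is the median of the observed test statistics divided by the median expected under the null (about 0.455 for a chi-square with one degree of freedom), and every statistic is then divided by lambda. A residual inflation of 2.07 in the MHC region presumably means that, even after that genome-wide correction, the region's median statistic is about twice what the null predicts. The Python sketch below assumes 1-degree-of-freedom chi-square statistics and uses simulated values, not the paper's.

```python
import random
from statistics import NormalDist, median
from typing import List

# Median of the chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def inflation_factor(chi2_stats: List[float]) -> float:
    """Genomic-control lambda: observed median over the null median."""
    return median(chi2_stats) / CHI2_1_MEDIAN

def gc_correct(chi2_stats: List[float], lam: float) -> List[float]:
    """Divide by lambda; by common convention, a lambda below 1 is not applied."""
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]

# Simulated statistics: a mildly inflated genome plus a 1,000-SNP region
# with twice as much inflation (all values hypothetical).
rng = random.Random(3)
genome = [1.3 * rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
region = [2.6 * rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]

lam = inflation_factor(genome + region)
region_corrected = gc_correct(region, lam)
print(f"genome-wide lambda = {lam:.2f}, "
      f"residual regional inflation = {inflation_factor(region_corrected):.2f}")
```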

The EDAR gene, which in East Asians is related to tooth morphology (remember Pippi?), hair thickness, and denser sweat and mammary glands, is also listed. Curiously enough, half of the Motala individuals (Epipaleolithic Sweden) carried the derived allele of rs3827760. Modern Scandinavians often have this derived allele, although the authors believe that this is due to more recent admixture:
The EDAR derived allele is largely absent in present-day Europe except in Scandinavia, plausibly due to Siberian movements into the region millennia after the date of the Motala samples.

Uh, really? How can you be so sure? I am very skeptical here again and would rather suspect a more complex pattern of partial Paleolithic (or at least Epipaleolithic) continuity, which may indeed have been brought from East Asia with the proto-Uralic migrations (or whatever).
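Part of the problem is that a carrier proportion based on a handful of ancient individuals is very imprecise, so any scenario has to live with wide error margins. The Wilson score interval is one standard way to bound such a proportion; in the Python sketch below the counts (3 carriers out of 6) are hypothetical, chosen only to show how wide the interval is for a "half of the individuals" result at that sample size.

```python
import math
from statistics import NormalDist
from typing import Tuple

def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical counts: 3 carriers among 6 individuals (not the study's data).
lo, hi = wilson_interval(3, 6)
print(f"95% interval: {lo:.2f} to {hi:.2f}")
```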

Another trait for which the authors claim selection is what they call "genetic height", i.e. height inferred not from measurements of the actual individuals but from alleles that are believed to influence it. They argue for selection for lower height in Neolithic and Chalcolithic Iberia and for greater height in the Steppe, both being corrected towards greater height in modern populations. Without objective measurements to check this assumed "genetic height" against, among other reasons, I find the whole story a bit hard to believe, but who knows?
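For readers wondering what "genetic height" involves computationally: it is essentially a polygenic score, the sum over height-associated SNPs of each allele's estimated effect (taken from association studies in present-day people) times the number of copies an individual carries. The Python sketch below is one simple version under stated assumptions: the SNP names, effect sizes and reference frequencies are hypothetical, and missing ancient genotypes are filled with the expected count 2p, which is only one of several possible choices and not necessarily what the paper did. The result is a prediction from alleles, not a measurement of anyone's stature.

```python
from typing import Dict, Optional

def polygenic_score(
    dosages: Dict[str, Optional[int]],
    effects: Dict[str, float],
    ref_freqs: Dict[str, float],
) -> float:
    """Sum of effect size times effect-allele count (0, 1 or 2) over all scored SNPs.

    A missing call (None or absent) is replaced by its expected count 2p,
    using a reference frequency p for the effect allele.
    """
    score = 0.0
    for snp, beta in effects.items():
        g = dosages.get(snp)
        if g is None:
            g = 2.0 * ref_freqs[snp]  # mean imputation for missing data
        score += beta * g
    return score

# Hypothetical effect sizes (cm per effect allele) and reference frequencies.
effects = {"snpA": 0.3, "snpB": -0.2, "snpC": 0.5}
ref_freqs = {"snpA": 0.4, "snpB": 0.6, "snpC": 0.1}
ancient = {"snpA": 2, "snpB": None, "snpC": 1}  # snpB not covered in this sample
print(polygenic_score(ancient, effects, ref_freqs))
```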

The production of this entry took a whole 8-hour working day, including a half-hour break but not counting preliminary reading and related discussions. If you liked it and have some coin to spare, consider donating. Thank you.


  1. It would be interesting to use Ibero-Maurusian samples one day and compare them with other groups.

