For what they were... we are: population genetics

October 11, 2018

Major Guanche genetic influence in Puerto Ricans (guest article by Thierno)

Guest article by Thierno

A discussion of a 2013 study on Caribbean autosomal ancestry by Andrés Moreno Estrada et al., "Reconstructing the Population Genetic History of the Caribbean," was posted on this blog:

http://forwhattheywereweare.blogspot.com/2013/06/caribbean-autosomal-ancestry.html

There were two important elements of information to consider from said post.

1) The ADMIXTURE graphs displayed a "black" component, largely found in Caribbean Admixed Latinos but only poorly represented in South Europe, which suggested a "recent" founder effect some 500 years ago. [Note: "black" here refers to color coding of an autosomal component in Moreno 2013, not to Tropical African ancestry, that was color-coded as "green", please follow the link above for more details].

2) An interesting and informative discussion between a Puerto Rican named Charles, in search of his ancestry, and another blogger named Maju shed light on the little-known historical contribution of Canarian aboriginal Guanches (Berber) to the colonization of America. It is often referred to as the "Tributo de Sangre" (Blood Tribute).

https://es.wikipedia.org/wiki/Tributo_de_sangre

They concluded that the "black" component which was displayed on the ADMIXTURE graphs of the study most likely had a North African origin, by way of Canarian aboriginal Guanche ancestry.

K=4

This graph represents the stacked bar-plot of an unsupervised ADMIXTURE exercise which is aimed at studying the complex and intricate ancestral components of Puerto Ricans from Puerto Rico, based on samples that were collected from the 1000 Genomes panel.

The populations represented in these ADMIXTURE graphs were chosen, firstly, to account for the major, historically known contributors to the Puerto Rican population: Iberians, indigenous Caribbeans, and enslaved Africans. The latter two are represented by the "Maya" and "Yoruba" samples, respectively.

Secondly, the presence of the merged North African samples in the dataset of these ADMIXTURE graphs serves as a formal test of comparison with the Iberian population in order to verify the aforementioned hypothesis.

The graph for K=4 clearly shows the "light-blue" component, represented in the Puerto Rican (PUR) samples, in addition to their Iberian (red), “Maya-like” (green), and “Yoruba-like” (purple) contributions.

The "light-blue" component is largely restricted to the North African population and also mostly found in the Saharawi samples, making it a "Saharawi-like" component. In other words, it is the identifiable North African component of this ADMIXTURE exercise.

This finding contrasts with the typically much lower North African scores of Hispanic Caribbeans that are reported in commercial autosomal DNA tests. I suspect that the use of Mozabite samples as proxies for North African may conflate their Berber ancestral component with the Iberian ancestral side of their complex genetic makeup.

I included Canarian samples because they still display a minor, distinct variation of North African admixture relative to Iberians, although it is important to keep in mind that individuals from those samples, as well as present-day Canarians, are more similar to Iberians from an autosomal genetic standpoint. Moreover, studies of Canarian autosomal DNA have shown disparities in the amount of Guanche (Berber) admixture among individuals from different islands of the archipelago. Canarians from La Gomera seem to have retained the most Guanche ancestry.

Maju had a blog post about a paper on the estimates of Guanche or Berber genetic influence of Canarians here:

https://forwhattheywereweare.blogspot.com/2011/04/canarians-nw-africans-iberians-etc-from.html

Hypotheses made in the recent past about a possible genetic link between Canarian aboriginal Guanches and Puerto Ricans, on the basis of the little-known role that the Canary Islands played in the colonization of the Americas, are supported by these unsupervised ADMIXTURE runs. Hypothetically, they could have similar implications for some Admixed Latinos and specific Caribbean communities, but most notably for Hispanic Caribbeans.

Reasons for investigating this issue

I am a person of Fula descent. I wasn't predisposed to experiment on this issue, in the sense that I have a different ethnic history. With the help of the software ADMIXTURE, I decided to use my autosomal data and compare it with publicly available datasets, which include populations that are compatible with my genetic history. In addition to my Fula-specific and West African ancestral components, which were similarly detected in the populations studied by Henn et al. in 2012, I also scored a North African percentage.

I had noticed that my data matched up considerably with "New World" Afro-descendants but also, very intriguingly, with a large number of Hispanic Caribbeans.

At first, I attributed it to the fact that West Africa was a region from which slaves were sent to the Americas.

However, when I tried to identify what specific ancestral components I share with some of those Hispanic Caribbean matches, a common restricted Northwest African ancestry seemed to emerge as a pattern with several of them. After reading the blog-post of Maju on Caribbean autosomal ancestry - several years after he posted it - and the possible Northwest African hypothesis of Hispanic Caribbeans, I figured I would try to verify it and maybe, at the same time, manage to elucidate some of my questions.

K=5

[Note (update Dec-31-2020): the sharing of this very drifted (PUR) component between the complex admixed Puerto Rican samples and my sample is difficult to interpret precisely, and from a historical standpoint, as the Lawson et al. paper makes very clear (please see the last update from 2 years ago). Comments below are clues for follow-up research.]

The graph for K=5 shows a specific homogenization (in green) of the Puerto Rican samples (PUR) in comparison to the other populations, which suggests a recent founder effect that most likely took place over the past few centuries.

Very intriguingly, my North African component is assigned to this PUR-specific component instead of the yellow North African one. This suggests that the "Guanche-Berber" side of the Puerto Ricans overlaps with my Northwest African component.

I would say that it indicates some complex genetic links between the Guanches and, possibly, other Northwest African populations.

I hope that these unsupervised ADMIXTURE exercises can be of help to those interested in the autosomal genetic links between Hispanic Caribbeans and Canarian aboriginal Guanches.

Thierno

Appendix

I used publicly available datasets to perform these ADMIXTURE exercises.

The first one contains a combined dataset of populations from both the 1000 Genomes Project and HGDP unrelated samples, for a total of 162,645 SNPs. It has been filtered and re-arranged by its contributors, Peter Carbonetto and Amir Kermany.

It belongs to the Ancestry DNA workshop on Github.com.

All the repositories can be accessed here: https://github.com/Ancestry

It was publicly available until a year ago and was utilized during the Computational, Evolutionary and Human Genomics (CEHG) Symposium.

The PUR (Puerto Ricans in Puerto Rico), IBS (Iberians from Iberia), Maya, and Yoruba samples were selected from this dataset.

The second dataset is from the Henn et al. study from 2012, “Genomic Ancestry of North Africans Supports Back-to-Africa Migrations.” It contains the North African samples that I used for the exercises. I merged them with the dataset that contains the PUR samples, and intersected 44,804 SNPs.

This is the link to access it: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/

The third dataset is from the Botigué et al. study from 2013, “Gene flow from North Africa contributes to differential human genetic diversity in Southern Europe.”

It has Spain_S (Andalusians), Spain_NW (Galicians), and Canary Islanders. I also intersected 44,804 SNPs with the first and main dataset.

The link to access it is here: http://biologiaevolutiva.org/dcomas/north-african-affy-6-0-data-henn-et-al-submitted/

I used the software PLINK to update the physical and genetic positions of the SNPs from the second and third datasets, in order to properly merge them with the ones from the first dataset. I also made sure to merge only SNPs that were already found in the selected dataset (1000 Genomes and HGDP).

Lastly, I intersected my personal data with the dataset (1000 Genomes and HGDP), for a total of 161,764 SNPs.
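
The intersection step described above can be sketched roughly as follows. This is only a minimal Python illustration, not the workflow actually used here (which relied on PLINK). It assumes PLINK-style .bim files in which the second whitespace-separated column holds the SNP identifier; the function names and the example rows are hypothetical.

```python
from typing import Iterable, List, Set


def snp_ids(bim_lines: Iterable[str]) -> Set[str]:
    """Collect SNP identifiers (assumed to be column 2) from .bim lines."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_snps(reference_bim: Iterable[str], other_bim: Iterable[str]) -> List[str]:
    """Return SNP IDs present in both datasets, keeping the reference order."""
    other = snp_ids(other_bim)
    shared: List[str] = []
    seen: Set[str] = set()
    for line in reference_bim:
        fields = line.split()
        if len(fields) < 2:
            continue
        snp = fields[1]
        if snp in other and snp not in seen:
            shared.append(snp)
            seen.add(snp)
    return shared


if __name__ == "__main__":
    # Invented rows: chromosome, SNP ID, genetic distance, position, alleles.
    reference = ["1 rs1 0 100 A G", "1 rs2 0 200 C T", "2 rs3 0 300 A C"]
    other = ["1 rs2 0 200 C T", "2 rs3 0 300 A C", "3 rs9 0 50 G T"]
    print(len(shared_snps(reference, other)), "shared SNPs")
```

A list produced this way could be written to a text file, one ID per line, and passed to PLINK's --extract option so that only the shared SNPs are kept before merging.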

The software ADMIXTURE was used to estimate ancestry.

R was used to plot the estimates.
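
For readers who want to inspect the estimates before plotting them, here is a small sketch of summarizing ADMIXTURE output by population. It is written in Python rather than R, and it assumes a .Q file with one row per individual and K whitespace-separated ancestry fractions, with population labels supplied separately in the same row order. The labels and numbers are made up for illustration and are not results from this analysis.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def parse_q(lines: Sequence[str]) -> List[List[float]]:
    """Parse .Q-style lines into a list of per-individual ancestry fractions."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def mean_ancestry(q: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average each of the K components within every population label."""
    if len(q) != len(labels):
        raise ValueError("need exactly one population label per individual")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, pop in zip(q, labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


if __name__ == "__main__":
    # Invented K=2 example with three individuals from two populations.
    q = parse_q(["0.5 0.5", "1.0 0.0", "0.0 1.0"])
    for pop, means in mean_ancestry(q, ["POP_A", "POP_A", "POP_B"]).items():
        print(pop, means)
```

The per-population means are what a stacked bar plot visually summarizes; as noted elsewhere in this post, they should be read as an evaluation of variation and not as literal ancestry percentages.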

Update (Oct 30th):

I would like to briefly elaborate on the sampling strategy. The first ADMIXTURE runs that I produced contained additional continental European populations, as well as other West Asian samples. The display showing the distinctive ADMIXTURE coded colors between North African and European samples of the dataset appeared at higher K values, with their respective higher standard errors of the cross-validation error estimate.

I had asked for Maju’s insight on admixture analyses in the past, as I was interested in how his posts on West African and Berber genetics related to my personal autosomal DNA. I did the same for this analysis.

I followed Maju’s recommendations to limit the selection of the reference population to be analyzed to just 4: Iberians, West Africans, Mayans, and Northwest Africans. This resulted in the clear and distinctive display of Berber and Iberian components, starting at K=4 which has a lower standard error. I later added Canary Islander samples.
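
Comparing K values by cross-validation can be sketched as below. The example assumes log lines of the form "CV error (K=4): 0.5123", which is, to my knowledge, how ADMIXTURE reports the cross-validation error when run with its --cv option. The numbers are invented, the function names are hypothetical, and the sketch only picks the lowest reported error; it does not reproduce the standard errors mentioned above.

```python
import re
from typing import Dict, Iterable

# Assumed log format: "CV error (K=4): 0.5123"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")


def cv_errors(log_lines: Iterable[str]) -> Dict[int, float]:
    """Map each K to its reported cross-validation error."""
    errors: Dict[int, float] = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors: Dict[int, float]) -> int:
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=lambda k: errors[k])


if __name__ == "__main__":
    # Invented log excerpts, one per run.
    logs = [
        "CV error (K=3): 0.5410",
        "CV error (K=4): 0.5302",
        "CV error (K=5): 0.5377",
    ]
    print("lowest CV error at K =", best_k(cv_errors(logs)))
```

A low cross-validation error only says that a given K fits the held-out genotypes comparatively well; it does not mean that the K clusters correspond to real ancestral populations.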

Note: I have also been asked to replace Yoruba with Senegambian Mandinka samples to check for potential differences. This is something that I had already checked, but I didn't notice any difference in either the Berber percentage in Puerto Ricans or in their homogenization, which indicated a recent founder effect.

Thierno

Update March 14th 2019:

After this article was posted last October, I received a lot of interesting feedback on the admixture analyses and suggestions for different ancestral contributions of Hispanic Caribbeans, both publicly in the comment section of this post and in private messages. In light of this, I would like to go over some aspects of the analysis again.

A note of caution in the interpretation of estimates

The estimates of the clusters from ADMIXTURE are not to be interpreted literally. The different ancestral k components are not “real” populations. They are designed to help identify differentiation between populations.

Both supervised and unsupervised analyses will produce FSTs between the designated populations or between the clusters. They serve to evaluate “approximately” possible genetic variations. In this type of analysis, as we can observe in the graphs contained in this post, moderate amounts of the components that are less divergent from each other overlap across populations which share lower FSTs. Considering that the FST between North Africans and West Eurasians is usually around 0.06, there will inevitably be a shared overlapping effect. As a result, it isn’t possible to obtain a very precise delineation between North African and Iberian samples. So, essentially, this is an evaluation of variation and not an accurate system of measurement.
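
To make the FST figures above a little more concrete, here is a rough sketch of Wright's FST for a single biallelic SNP, computed from the allele frequencies of two equally weighted populations as (H_T - H_S) / H_T. Real analyses average over many SNPs and use estimators that correct for sample size, so this is only an illustration; the function name is hypothetical and the frequencies are invented.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie between 0 and 1")
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the SNP is monomorphic in both populations
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Invented frequencies: a moderate difference yields a small FST.
    print(fst_two_pops(0.4, 0.6))
```

With these invented frequencies (0.4 versus 0.6) the result is about 0.04, which shows how a fairly visible difference in allele frequency still corresponds to a low FST, and why clusters separated by values of this size overlap in the bar plots.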

Intuitively, it seems that analyses which contain populations or clusters that are separated with higher FSTs will be more robust. It also seems that when FSTs fall below 0.05, the degree of differentiation in the displayed clusters is difficult to evaluate or make sense of. This may explain why analyses of intra-European/Mediterranean populations with FSTs that are around 0.01 are difficult to evaluate with ADMIXTURE. Other steps can be taken to mitigate the effects of linkage disequilibrium, as was the case for the dataset that was used for the analysis in this post.

ADMIXTURE works better for recently admixed groups who derive their ancestry from distinct populations. For obvious historical reasons, African Americans and Hispanic Americans have recent ancestries that most admixture analyses can detect fairly well.
Evidently, the full complexity and chaotic processes of ancient migrations, which are dynamic rather than static, cannot realistically be captured by ADMIXTURE. A complete reconstruction of such patterns on the basis of present-day populations would obviously be misleading.

Daniel J. Lawson, Lucy van Dorp and Daniel Falush wrote a paper called, “A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots” (2018) in which they warned against some of the pitfalls of admixture analyses.

While it’s not possible to make exact predictions from tools that are used in the field of population genetics, when interpreted correctly, some interesting information can still be extracted from various analyses.

Previous research on the possible Canarian legacy in America, including the examination of historical records, had been conducted prior to the publication of the Moreno et al. (2013) paper. With regard to the genetic affinity of the aboriginal inhabitants of the Canary Islands, a similar analysis was done more recently on the autosomal DNA of ancient Guanche individuals who may have lived between the 7th and the 11th century; it is discussed in a paper by Rodríguez-Varela et al. (2017). The authors conclude that a Northwest African-specific ancestry component makes up the majority of their autosomal ancestry, as it does in other Berber populations from North Africa. Additionally, Y-DNA E1b-M81, which is found at high frequencies in Northwest Africa, was also detected in these samples.

In the study from Arauna et al. (2016) on how “Recent Historical Migrations Have Shaped the Gene Pool of Arabs and Berbers in North Africa,” the authors expressed doubts about the use of Mozabite samples as the sole proxy for North African genetic diversity.

Considering that the paper from Moreno et al. didn’t have North African samples, the focus of this post was to explore potential variations by including Northwest African samples such as Moroccans and Saharawis.

Naturally, to exactly what extent inhabitants of the Canary Islands – whose gene pool could have already been affected by the DNA of Iberian settlers - may have impacted the genetic pool of Hispanic Caribbeans is a question which would require further and more diversified analyses.

mtDNA L(xM,N)

Several studies have reported mtDNA L(xM,N) among various Latin American communities. They strongly suggest recent African ancestry in the context of the recent colonization of the New World. It seems that L(xM,N) lineages forming specific subclades that are not native to Africa, but are instead found in other continents or regions, are extremely rare.

In 2012 Cerezo et al. published a paper on the subject titled “Reconstructing ancient mitochondrial DNA links between Africa and Europe.”

Another study, published by Ricardo Rodriguez-Varela and his colleagues, is called “Genomic Analyses of Pre-European Conquest Human Remains from the Canary Islands Reveal Close Affinity to Modern North Africans.”

More recently, a paper called “Mitogenomes illuminate the origin and migration patterns of the indigenous people of the Canary Islands” was published by Rosa Fregel with the mtDNA sequencing of 48 ancient individuals. Out of all of the L(xM,N) lineages that were analyzed, only the newly defined L3b1a12 was identified as a new Canarian-specific lineage.

It appears that European and Canarian autochthonous mtDNA L(xM,N) lineages form subclades which correspond to specific mutations that are less likely to be found in Africa.

In the case of Puerto Ricans, there was a project from National Geographic called “Genographic Project DNA Results Reveal Details of Puerto Rican History” (2014). After sampling 326 individuals from southeastern Puerto Rico and Vieques, they found that 80% of Puerto Rican men carry West Eurasian (or European) Y-DNA paternal lineages, while 60% of Puerto Ricans carry maternal lineages of Native American origin. This may shed some more light on the findings of Moreno et al. (2013), who wrote of the “Latin-European” component which seemed to indicate a founder effect.

In contrast, it would be interesting for future research to sample Hispanic Caribbean communities where African ancestry may have been retained in higher proportions and, in the process, collect more mtDNA and Y-DNA.

Thierno

January 1, 2017

Chad's Eurasian genetic input similar to that in Ethiopia

Quickies

Marc Haber et al. Chad Genetic Diversity Reveals an African History Marked by Multiple Holocene Eurasian Migrations. AJHG 2016. Open access → LINK [doi:10.1016/j.ajhg.2016.10.012]

Summary

Understanding human genetic diversity in Africa is important for interpreting the evolution of all humans, yet vast regions in Africa, such as Chad, remain genetically poorly investigated. Here, we use genotype data from 480 samples from Chad, the Near East, and southern Europe, as well as whole-genome sequencing from 19 of them, to show that many populations today derive their genomes from ancient African-Eurasian admixtures. We found evidence of early Eurasian backflow to Africa in people speaking the unclassified isolate Laal language in southern Chad and estimate from linkage-disequilibrium decay that this occurred 4,750–7,200 years ago. It brought to Africa a Y chromosome lineage (R1b-V88) whose closest relatives are widespread in present-day Eurasia; we estimate from sequence data that the Chad R1b-V88 Y chromosomes coalesced 5,700–7,300 years ago. This migration could thus have originated among Near Eastern farmers during the African Humid Period. We also found that the previously documented Eurasian backflow into Africa, which occurred ∼3,000 years ago and was thought to be mostly limited to East Africa, had a more westward impact affecting populations in northern Chad, such as the Toubou, who have 20%–30% Eurasian ancestry today. We observed a decline in heterozygosity in admixed Africans and found that the Eurasian admixture can bias inferences on their coalescent history and confound genetic signals from adaptation and archaic introgression.

Worth a read no doubt but careful, careful, careful with their chronological guesstimates. Their starting point is the assumption (once and again demonstrated all kinds of WRONG) of:

Eurasians and Africans diverged around 60,000–80,000 ya and subsequently had different patterns of population-size changes: in particular, compared with Africans, the Eurasian population experienced a sharp decrease in size ∼60,000 ya.

So add around 65-70% (x1.7) to all dates, else you are bound to fall into the pit of molecular-clock-o-logical self-complacent pseudoscience. So where it reads c. 6-7 Ka for the first migration (R1b-related), it should be 10,000 years ago (which is the actual dating of the Afroasiatic expansion by most accounts, with origin not exactly in "Eurasia" but rather in or near Sudan, where those Eurasian lineages, R1b and J1, had most likely been present since long before), and when they say 3 Ka ago, it's probably 5000 years ago, possibly within the context of Neolithic inflows. 3000 years ago was already well into Ancient Egypt, and peoples just did not cross it without proper paperwork anymore. Actually, 3000 years ago is the Bronze Age collapse, when Egypt, Lower Egypt specifically, fell to Africans: to Libyans and other Berbers known as Meshwesh (Amazigh, probably from modern Tunisia), to be precise.

Dr. Haber: time to update your clock, it just doesn't work, and you are confusing people to no avail.

September 8, 2016

Genetic structure in South-Eastern Africa

Quickies

Another quite interesting paper on Khoesan and Southern African genetics:

Caitlin Uren et al., Fine-Scale Human Population Structure in Southern Africa Reflects Ecogeographic Boundaries. Genetics 2016. Freely accessible → LINK [doi:10.1534/genetics.116.187369]

Abstract

Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until ∼2000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about their population history. We examine new genome-wide polymorphism data and whole mitochondrial genomes for >100 South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu speakers, ∼14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited genetic impact from eastern Africa.

Figure 2

Five spatially distinct ancestries indicate deep population structure in southern Africa. Using global ancestry proportions inferred from ADMIXTURE k = 10, we plot the mean ancestry for each population in southern Africa. The five most common ancestries in southern Africa, from the Affymetrix HumanOrigins data set, are shown separately in A–E. The x- and y-axes for each map correspond to latitude and longitude, respectively. Black dots represent the sampling location of populations in southern Africa. The third dimension in each map (depth of color) represents the mean ancestry proportion for each group for a given k ancestry, calculated from ADMIXTURE using unrelated individuals, and indicated in the color keys as 0–100% for five specific k ancestries. Surface plots of the ancestry proportions were interpolated across the African continent.

May 4, 2016

Large Paleoeuropean DNA survey

An unprecedented survey of ancient DNA from Paleolithic Europe has been just published:

Qiaomei Fu et al., The genetic history of Ice Age Europe. Nature 2016. Pay per view → LINK [doi:10.1038/nature17993]

The supplemental materials (PDF) are freely accessible, as are the figures and tables (HTML).

Quick highlights:

Oldest Y-DNA R1b1 (and therefore R1b and R1) ever documented (Villabruna, Veneto, 14 Ka ago, Epigravettian cultural context). Also more Japan and La Braña related C1!
Oldest mitochondrial DNA H (H7) may be in Gravettian Moravia, also oldest U6 may not be in Iberia or North Africa but in Gravettian Romania.
Very important insights in autosomal DNA: a distinct Paleoeuropean population since Gravettian, two different late UP/Epipaleolithic populations.
Still very important gaps, notably SW France (the core of Paleolithic Europe) and most of Iberia. Also still missing West Asian sequences altogether, except for the rather anomalous Caucasus population and whatever may be inferred from Early European Farmers, whose ancestry was mostly (approx. 3/4) West Asian.

A good synthesis of the scope and some of the findings of this study is in fig. 1:

Y-DNA

The survey confirms (supp. materials 4) that haplogroup I used to be the most common patrilineage in Paleolithic Europe. But it was not the only one:

The oldest ones (pre-Villabruna, c. 14 Ka BP) were largely C1:

Kostenki 14 (Russia, Gravettian): C1b
Goyet Q116-1 (Belgium, Aurignacian): C1a
Vestonice 16 (Moravia, Gravettian): C1a2

Also in this oldest group (arbitrarily defined as pre-Villabruna), there was some I* or maybe pre-I (some markers are missing in many individuals), including: Pavlov 1 (Gravettian, Moravia), Paglicci 133 (Gravettian, South Italy), Hohle Fels 49 (Magdalenian, Swabia), Goyet Q2 (Magdalenian, Belgium) and Burkhardtshöhle (Magdalenian, Swabia). Notice that its prevalence and clarity as "I proper" increases after the LGM; the Gravettian ones seem to be pre-I rather than true I.

Other oldest lineages are BT* (Vestonice 15), CT* (Cioclovina 1, Kostenki 12, Vestonice 13), F* (Vestonice 43). Notice that in most cases not all the ideal SNP testing was performed, so it is still possible and even probable, I'd think, that BT* and CT* are actually F*.

In the more recent "post-Villabruna" group:

The revelation of the group is of course Villabruna, which carried R1b1.

There are also two I* (Cuiry Les Chaudardes 1 and Berry Au Bac), one I2 (Rochedane) and one F* (Falkenstein).

I must also mention that previous studies found mostly I2 in Epipaleolithic samples, except La Braña, which carried C* (maybe some sort of C1 but unconfirmed). R1a1* was found in Karelia as well.

Synthesis: I and R1b1, the most common lineages of Europe west of the Elbe, only show up after the Last Glacial Maximum, at least as far as we know. I probably coalesced in the subcontinent. The issue of where R1b, the most common modern patrilineage of Western Europe, coalesced and how it expanded remains open, but the Villabruna data point defines a terminus ante quem for this haplogroup, which MUST necessarily be older than 14,000 years, discarding some of the most outrageous recentist chronologies altogether. The great initial diversity of CT-derived lineages suffered bottlenecks with the LGM and probably also later, pruning most of them (although rare instances of some of those lines such as F* or C1 are still found among modern Europeans).

Mitochondrial DNA

Lots of interesting stuff in this issue of the matrilineages, but also some strange issues in the data that do raise eyebrows quite a bit. The full dataset is in the supplemental materials section 2.

However they do not provide clear data on how the tests were performed, just a generic listing. This is very problematic, notably when they state that El Mirón is U5b, when Hervella (with more clear methodology) classified her as H just a year ago. Another similar issue is the apparent H7 (H7a1?) in Vestonice 14, which is first classified as "damaged" (based apparently on X-chr contamination, the CI for H7 is 0.9-1) and then listed as "U" in the extended table 1, with no reasoning whatsoever for the change.

Rumor is already around about a mysterious H-hater "black hand" being at play here. I can neither confirm nor reject it but I do think that the authors should explain themselves more clearly on this most important matter, which is beginning to be more than just annoying, fueling conspiracy theories and what-not.

Another interesting issue is a possible U6 in Muierii (Gravettian Romania, CI 0.88-0.97), labeled as "damaged" again and refurbished as mere amorphous "U". This is a very important issue and is directly related with the presence of mtDNA H in Paleolithic Europe and the origin of these lineages in North Africa.

Northwestern Africa (not counting Cyrenaica) did not experience any sort of Upper Paleolithic (UP) until c. 22 Ka BP, when a new culture of very likely Iberian Solutrean affinity, the Iberomaurusian or Oranian, expanded from Taforalt (Arif, North Morocco). In my understanding this is the most likely origin of mtDNA H (H*, H1, H3, H4 and H7) in North Africa and maybe also of mtDNA V, and also should be related to the bicontinental distribution of mtDNA U6 (in North Africa but also and quite diversely in Iberia) and the surely related distribution of Y-DNA E1b-M81.

While it's easy to imagine mtDNA H (and maybe also V) migrating from Europe to North Africa in this context, less clear has been so far the issue of U6 origins: as a U-derived lineage it must ultimately derive from the early UP populations of West Asia, but then again the first UP in the region must have arrived from SW Europe in the Last Glacial Maximum (LGM) period. So something I've been wondering all this time, particularly since the crucial, rare and basal U6c lineage was discovered to exist not just in Morocco but also in Andalusia, is whether U6 actually arrived in NW Africa from Europe and not, as is often assumed, vice-versa.

So you will understand how this issue of properly identifying ancient mtDNA H and U6 lineages is important not only for the understanding of the roots of Europeans but also for those of North Africans. There are interests at play here because many geneticists have made a personal issue of "molecular clock" age estimates (whose actual scientific, empirical, value is often close to zero but are "sold" as "scientific" instead) and also of exaggerating the West Asian Neolithic influence in Europe beyond reason, leading to true quasi-ideological "DNA wars" that are totally out of place.

Please, let's be serious: there is no room for childish games on these matters, you guys and gals are grown ups with a PhD!

Otherwise a lot of U (as usual: U*, U5, U2). Notable is U8c (CI 0.91-1 but declared "damaged" in spite of extremely low X-chr contamination), which, if confirmed, could offer clues about the origins of the rare Italo-Jordanian U8c (and indirectly about Basque U8a and the quite common but surely Neolithic haplogroup K). Also discarded are several samples that initially produced lineages under macro-haplogroup M; however, Goyet Q116-1 was labeled as "pass" with this lineage. So there is Paleoeuropean M, or at least there was once upon a time, this one beyond any doubt.

Autosomal DNA

This last part is most interesting as well. As you can see in figure 1 above, the authors described three Paleoeuropean clusters: blue (aka Vestonice), green (aka El Mirón, however El Mirón is actually green-red admixed) and red (aka Villabruna, equivalent to the WHG grouping seen in some recent studies). Black-marked samples are out of any group and the Siberian (Mal'ta) and Caucasus (Satsurblia) clusters are not too relevant here.

Annotated by me: in green approx. dates for reference, in gray approx. reconstruction of the ancestry of late Paleoeuropeans

First of all it is clear that all or most Paleoeuropeans form a unique macro-cluster (orange shaded) to the exclusion of the Mal'ta and Satsurblia clusters and also of Early Neolithic Stuttgart (~3/4 West Asian). This macro-cluster is comparable in affinity to that of Han-Dai-Karitiana, so even the word "race" can be used. Some people have argued that "there was no Europe" back then, because the Bosporus was an isthmus, but from the genetic data it seems clear that Europe was more distinctive then than it is now, after the Neolithic massive admixture event that spanned from Europe to India with West Asian centrality.

Then we see an older "Gravettian" or blue or Vestonice cluster, that is clearly pre-LGM and that does not include however peripheral Gravettians such as Mal'ta, Kostenki or Goyet Q53-1.

But the most interesting feature is that two different populations existed at the end of the Paleolithic period. The green one (El Mirón) is strictly Magdalenian and vanishes with the Epipaleolithic (at least for this sample, which has major gaps). The red one (Villabruna or WHG), instead, was initially less common in the Magdalenian and spans beyond its cultural borders into Epigravettian Italy too. However, it becomes the only thing around in the Epipaleolithic, suggesting the expansion of a single population in that late period, maybe with the geometric microlithism which in most areas precedes the arrival of the Neolithic and may well have expanded from France.

Looking at the orange range of less obvious affinities, I tried to pinpoint tentative origins for those two populations. The green one relates best to GoyetQ116-1 (Aurignacian), while the red one relates best to GoyetQ53-1 (Gravettian). This is also somewhat apparent in the PCA and I tried to indicate it with the annotated arrows.

Special thanks to Jean Lohizun for his insights.

March 16, 2016

South Asian autosomal structure

A recent study finds "five" components, although in practice they can be reduced to three.

Analabha Basu et al., Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. PNAS 2015. Freely accessible → LINK [doi: 10.1073/pnas.1513197113]

Abstract

India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.

One of the components, very distant from the rest, is the Andamanese one (Jarawa, Onge), but these isolated islands are not really in South Asia but rather in SE Asia (south of Myanmar, belonging to India only because of historical accident), which reduces the structure of South Asia to what we can see in the following graph:

Fig. 2.

(A) Scatterplot of 331 individuals from 18 mainland Indian populations by the first two PCs extracted from genome-wide genotype data. Four distinct clines and clusters were noted; these are encircled using four colors. (B) Estimates of ancestral components of 331 individuals from 18 mainland Indian populations. A model with four ancestral components (K = 4) was the most parsimonious to explain the variation and similarities of the genome-wide genotype data on the 331 individuals. Each individual is represented by a vertical line partitioned into colored segments whose lengths are proportional to the contributions of the ancestral components to the genome of the individual. Population labels were added only after each individual’s ancestry had been estimated. We have used green and red to represent ANI and ASI ancestries; and cyan and blue with the inferred AAA and ATB ancestries. These colors correspond to the colors used to encircle clusters of individuals in A. (Also see SI Appendix, Figs. S2 and S3.)

It is quite apparent that the AAA (Ancient Austroasiatic) component behaves like the ASI (Ancient South Indian) one but with a tendency towards the ATB (Ancient Tibeto-Burman) one, strongly suggesting it is basically a product of admixture and not a truly autonomous ancestral component.

This may be more apparent in the wider pan-Asian context:

Fig. 3.

Approximate “mirroring” of genes and geography. Genomic variation of individuals, represented by the first two PCs, sampled from 18 mainland Indians combined with the CS-Asians and E-Asians from HGDP, compared with the map of the Indian subcontinent showing the approximate locations from which the individuals and populations were sampled.

In this wider mapping (which would be even clearer if West Asian populations were included), we see that:

ANI (Ancient North Indian) strongly tends to the West. In other analyses it is very similar to the Caucasus modal component and therefore a logical conclusion is that we are looking at a Neolithic immigrant element, much as happens in Europe.
ATB (Ancient Tibeto-Burman) strongly tends to the East, more specifically SE Asia, and is therefore the reverse of ANI, although much less influential.
ASI (Ancient South Indian) is the true aboriginal (pre-Neolithic) component of India, better preserved in southern populations but more clinal than the sample choice allows us to perceive.
AAA (Ancient Austroasiatic) is very similar to ASI but has some SE Asian admixture, as is logical to expect, Austroasiatic being a SE Asian language family of likely Neolithic expansiveness.

So ASI and AAA are basically the same thing and that's why I say that the "five" components can be simplified to just three. That said, it is indeed possible that there is underlying complexity within the ASI+AAA component, but this study does not help us to clarify that.

It is true that K=4 (after exclusion of the Andamanese, K=5 with them) fits the parsimony criterion best, but K=3 is also a good fit and shows AAA exactly as I describe it: largely ASI ("aboriginal") with a significant ATB (Eastern) component. The AAA component can therefore be perceived as consolidated, homogenized, ancient admixture. Prove me wrong on this and I'll eat my words.

Caste apartheid stopped genetic flow

Quite interestingly, the authors also dwell on how the admixture process was stopped by the Gupta laws (Middle Ages), which imposed apartheid (the caste system), enforced endogamy and caused the now apparent genetic isolation of the multiple groups.

We have provided evidence that gene flow ended abruptly with the defining imposition of some social values and norms. The reign of the ardent Hindu Gupta rulers, known as the age of Vedic Brahminism, was marked by strictures laid down in Dharmaśāstra—the ancient compendium of moral laws and principles for religious duty and righteous conduct to be followed by a Hindu—and enforced through the powerful state machinery of a developing political economy (15). These strictures and enforcements resulted in a shift to endogamy. The evidence of more recent admixture among the Maratha (MRT) is in agreement with the known history of the post-Gupta Chalukya (543–753 CE) and the Rashtrakuta empires (753–982 CE) of western India, which established a clan of warriors (Kshatriyas) drawn from the local peasantry (15). In eastern and northeastern India, populations such as the West Bengal Brahmins (WBR) and the TB populations continued to admix until the emergence of the Buddhist Pala dynasty during the 8th to 12th centuries CE. The asymmetry of admixture, with ANI populations providing genomic inputs to tribal populations (AA, Dravidian tribe, and TB) but not vice versa, is consistent with elite dominance and patriarchy. Males from dominant populations, possibly upper castes, with high ANI component, mated outside of their caste, but their offspring were not allowed to be inducted into the caste. This phenomenon has been previously observed as asymmetry in homogeneity of mtDNA and heterogeneity of Y-chromosomal haplotypes in tribal populations of India (6) as well as the African Americans in United States (34). In this study, we noted that, although there are subtle sex-specific differences in admixture proportions, there are no major differences in inferences about population relationships and peopling whether X-chromosomal or autosomal data are used. We have also found our inferences to become more robust when our data are jointly analyzed with HGDP data.
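As a rough cross-check of the "about 70 generations ago" figure quoted in the abstract, here is a minimal sketch that converts generations into calendar years. The generation lengths tried below are my own assumptions (they are not taken from this post), and the reference year is simply the study's publication year; the point is only that the resulting date moves by several centuries depending on that choice, landing in the Gupta period (roughly 4th to 6th centuries CE) only for the shorter generation lengths.

```python
# Hedged sketch: turn "generations ago" into an approximate calendar year.
# STUDY_YEAR and the generation lengths are illustrative assumptions.
STUDY_YEAR = 2015
GENERATIONS = 70  # "about 70 generations ago", from the quoted abstract


def generations_to_calendar_year(generations, years_per_generation,
                                 reference_year=STUDY_YEAR):
    """Return the calendar year that many generations back (negative = BCE)."""
    return reference_year - generations * years_per_generation


for gen_len in (22.5, 25, 29):  # assumed years per generation
    year = generations_to_calendar_year(GENERATIONS, gen_len)
    era = "CE" if year > 0 else "BCE"
    print(f"{gen_len} y/generation -> c. {abs(year):.0f} {era}")
```

With these assumed inputs the estimate ranges from the turn of the era to the 5th century CE, so the match with any particular dynasty should be read as approximate.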

I can't help but find it quite curious how, once again, Indian and European histories behave so similarly: in Europe, a simpler but also "god-sanctioned" caste system (designed by Augustine of Hippo) was likewise imposed upon the collapse of the Roman Empire (very similar dates). However, popular revolutions gradually but systematically destroyed it. The same is happening in India now but on a delayed timeline. Muslim West Asia (and surroundings), by contrast, had no caste system and that's probably why it was so successful back in the day: because it allowed relatively more freedom and intellectual pursuit than other neighboring social systems. Of course, this stopped being the case after the Mongol conquests, roughly coincident with the European Renaissance, when Islam cocooned itself into reactionary mode, leading to stagnation and eventually to colonial subservience.

February 2, 2016

Most Africans do not have significant Eurasian admixture

This is major news: the authors of the study on the ancient East African genome of Mota have recanted their conclusions. In a correction note echoed by Nature they say:

Erratum to Gallego Llorente et al. 2015

The results presented in the Report “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent” were affected by a bioinformatics error. A script necessary to convert the input produced by samtools v0.1.19 to be compatible with PLINK was not run when merging the ancient genome, Mota, with the contemporary populations SNP panel, leading to homozygote positions to the human reference genome being dropped as missing data (the analysis of admixture with Neanderthals and Denisovans was not affected). When those positions were included, 255,922 SNP out of 256,540 from the contemporary reference panel could be called in Mota. The conclusion of a large migration into East Africa from Western Eurasia, and more precisely from a source genetically close to the early Neolithic farmers, is not affected. However, the geographic extent of the genetic impact of this migration was overestimated: the Western Eurasian backflow mostly affected East Africa and only a few Sub-Saharan populations; the Yoruba and Mbuti do not show higher levels of Western Eurasian ancestry compared to Mota.

We thank Pontus Skoglund and David Reich for letting us know about this problem.

This admittedly makes much better sense. I strongly appreciate the willingness of Gallego, Jones et al. to publicly amend their error as quickly as possible. It's said that to err is human, but to correct oneself is only for the wise.

January 3, 2016

Irish ancient DNA

This study was published just a few days ago but is already from the previous year, tricks of the calendar. It is a scheme-breaker in several aspects, so I hope to be able to reflect here the most important aspects of it.

Lara M. Cassidy, Rui Martiniano et al., Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. PNAS 2015. Freely accessible → LINK [doi: 10.1073/pnas.1518445113]

Abstract

The Neolithic and Bronze Age transitions were profound cultural shifts catalyzed in parts of Europe by migrations, first of early farmers from the Near East and then Bronze Age herders from the Pontic Steppe. However, a decades-long, unresolved controversy is whether population change or cultural adoption occurred at the Atlantic edge, within the British Isles. We address this issue by using the first whole genome data from prehistoric Irish individuals. A Neolithic woman (3343–3020 cal BC) from a megalithic burial (10.3× coverage) possessed a genome of predominantly Near Eastern origin. She had some hunter–gatherer ancestry but belonged to a population of large effective size, suggesting a substantial influx of early farmers to the island. Three Bronze Age individuals from Rathlin Island (2026–1534 cal BC), including one high coverage (10.5×) genome, showed substantial Steppe genetic heritage indicating that the European population upheavals of the third millennium manifested all of the way from southern Siberia to the western ocean. This turnover invites the possibility of accompanying introduction of Indo-European, perhaps early Celtic, language. Irish Bronze Age haplotypic similarity is strongest within modern Irish, Scottish, and Welsh populations, and several important genetic variants that today show maximal or very high frequencies in Ireland appear at this horizon. These include those coding for lactase persistence, blue eye color, Y chromosome R1b haplotypes, and the hemochromatosis C282Y allele; to our knowledge, the first detection of a known Mendelian disease variant in prehistory. These findings together suggest the establishment of central attributes of the Irish genome 4,000 y ago.

Both sample sites are in the north of Ireland: the so-called Neolithic one is from the interior (Co. Down, c. 3200 BCE) and the so-called Bronze Age ones are from a small island (Rathlin) north of the main island (Rathlin 1 and 2 from c. 1900 BCE, Rathlin 3 from c. 1600 BCE).

I say "so-called" because I'm not really confident that the terms "Neolithic" and "Bronze Age" apply in fact to most of them (I'd rather use Chalcolithic, shorthand for "advanced Neolithic with social complexity, regardless of metals", for all but Rathlin 3). I think in any case that the divider here is not metallurgy as such but actually the Bell Beaker divide: before and after Bell Beaker.

Bell Beaker is becoming a key element in our understanding of the demographic changes in Northern Europe, more than I would have expected, admittedly. In the case of Ireland (and to a lesser extent parts of Britain) the arrival of the Bell Beaker phenomenon is accompanied by striking demographic growth, which may (or may not) imply new settlement from outside. For Ireland, it now seems increasingly clear that it probably does, unless Rathlin is a very unusual site, which is not parsimonious, as we will see.

Enough with the intro, let's get to the substance.

Haploid genetics

Ballynahatty (Co. Down), a woman, carried the mitochondrial haplogroup (matrilineage) HV0. Rathlin 1 carried U5a1b1e, Rathlin 2 U5b2a2 and Rathlin 3 J2b1a. The only remarkable thing here is the lack of haplogroup H, the most common one in Europe today, detected since the Magdalenian era in Iberia but more common later on, within the Neolithic. It can be a fluke of course, but the superficial impression is that the mtDNA pool is "pre-modern". However, all the rest is very "modern" in Rathlin Island, so... let's assume it's a mere fluke.

The three Rathlin individuals are all men, and their Y-DNA haplogroup has been successfully sequenced: they all belong to R1b-M529, the most common patrilineage in Ireland (and much of Britain and also Brittany) to this day. There are some hints that some of them could belong to downstream subhaplogroups but, if you read the fine print (the supp. materials), this is quite unclear, so let's leave it at this.

R1b-S116 structure per Valverde 2015

The implications of this data point are important: it clearly defines a terminus ante quem for all possible R1b-M529 and upstream haplogroups' chronologies. Whoever defended a shorter chronology was clearly wrong. Together with a German Bell Beaker individual, these are the oldest R1b-S116 known so far, which is hardly surprising considering the huge blank in aDNA sampling in Western Europe, but it also suggests that, at least in some areas, Bell Beaker was implicated in the expansion of this most important European patrilineage and in general in the formation of modern-like Western European populations.

There are many open questions here yet because we lack ancient DNA data from France, West Germany, Belgium, Britain, much of Iberia, etc. But, with these new data points, I am beginning to believe that Bell Beaker was, if not a general cause, at least a key pivot around which these demographic changes leading to modern populations took place. It was probably a cause in Ireland but it's truly hard to extrapolate to other regions, where aDNA information is missing and the archaeological record suggests different patterns of change or continuity.

Autosomal DNA

The most striking implication of the autosomal DNA of these two Irish sites is that the Rathlin men are almost identical to modern Irish (also Scots, Welsh and Cornish), while the much older Ballynahatty woman is only slightly similar to modern Irish (and Scots), being much more like Sardinians and some South Iberians (which is congruent with what happens with all other Neolithic samples through much of Europe).

Selection from fig. 3

So we are looking at a clear-cut demographic change in Ireland (and maybe other regions) at some point in the third millennium BCE. The most plausible date for the beginning of this change is probably around 2500 BCE, when we see the start of significant demographic growth in Ireland, which is also the approx. date of the Bell Beaker arrival to the island and other parts of Northern Europe (several centuries earlier in the South, however).

Putting these samples in the wider context the authors get this:

Fig. 1. Genetic affinities of ancient Irish individuals. (A and B) Genotypes from 82 ancient samples are projected onto the first two principal components defined by a set of 354,212 SNPs from Eurasian populations in the Human Origins dataset (29) (SI Appendix, Section S9.1 and S10). (A) This PCA projects ancient Eurasian Hunter–Gatherers and Neolithic Farmers, where they separate clearly into Early Neolithic, MN (including the Irish Ballynahatty genome), and several hunter–gatherer groups. (B) PCA projection of Late Neolithic, Copper, and Bronze Age individuals where the three Rathlin genomes adopt a central position within a large clustering of European Bronze Age individuals. (C) A plot of ADMIXTURE ancestry components (K = 11) of these same ancient genomes. In West and Central Europe, ancient individuals are composed almost entirely of two dominant strands of ancestry, linked to hunter–gatherer (red) and early farmer (orange) populations, until the Late Neolithic. At this point, a third (green) Caucasus component features. Previously, this component was only seen in ancient Steppe and Siberian populations such as the Yamnaya. The three Rathlin genomes each display this Caucasus strand of ancestry whereas the Irish Neolithic does not.

Sure: a key element here is the "teal" Caucasus-related component, which is a tell-tale signature of the Indoeuropean or Kurgan expansion into Europe. As an exercise to get a rough estimate of how much Indoeuropean (Yamna-like) ancestry there is in each sample, I suggest you get a ruler and a calculator, measure it for each sample and find the resulting fraction. You can also do the same for the early Neolithic (EEF) ancestry, using the "orange" component. There is an interesting substantial leftover fraction that can only be extra "hunter-gatherer" (HG), wherever it comes from.

My own estimates are as follows:

Late Neolithic (LN) samples: 80% EEF + 20% extra HG.
German LN (early Kurgans) = 23% IE + 40% EEF + 37% HG → 27% extra HG relative to LN
Corded Ware = 64% IE + 21% EEF + 15% HG → 10% extra HG rel. to LN
Elbe Bell Beaker (avg.) = 13% IE + 44% EEF + 43% HG → 32% extra HG rel. to LN
Irish BA = 25% IE + 34% EEF + 41% HG → 32% extra HG rel. to LN

There is some data in the supp. materials (S12.2.2) which is roughly consistent with this, although their fraction of extra HG (using Loschbour as reference) is smaller than mine, while their Yamna or IE one is larger (I honestly have no idea why there is this minor discrepancy, although the estimates almost overlap once we include error margins).
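The "extra HG relative to LN" figures in the estimates listed above can be reproduced with simple arithmetic. The sketch below assumes one particular reading, which is my inference rather than something the post states: the farmer (EEF) fraction of each sample is taken to bring along hunter-gatherer ancestry in the Late Neolithic proportion (20% HG per 80% EEF), and whatever HG remains beyond that is counted as "extra".

```python
# Hedged sketch of the "extra HG relative to LN" arithmetic.
# Assumption (inferred, not stated in the post): EEF ancestry arrives with
# HG ancestry in the Late Neolithic proportion of 20% HG per 80% EEF.
LN_HG_PER_EEF = 20 / 80

# (IE, EEF, HG) percentages as given in the estimates above.
ESTIMATES = {
    "German LN (early Kurgans)": (23, 40, 37),
    "Corded Ware": (64, 21, 15),
    "Elbe Bell Beaker (avg.)": (13, 44, 43),
    "Irish BA": (25, 34, 41),
}


def extra_hg(eef, hg, ratio=LN_HG_PER_EEF):
    """HG percentage beyond what an LN-like farmer source would contribute."""
    return hg - eef * ratio


for name, (ie, eef, hg) in ESTIMATES.items():
    # Each estimate should account for the whole genome.
    assert ie + eef + hg == 100
    print(f"{name}: {extra_hg(eef, hg):.2f}% extra HG")
```

Under this reading the values come out at 27, 9.75, 32 and 32.5, which agree with the rounded figures in the list (the Irish BA value sits exactly between 32 and 33).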

Where does this extra HG fraction come from? It is quite apparent that the currently available samples do not include its source. As I have mentioned many times, there is a huge "Atlantic" blank in the autosomal samples, including nearly all of France and many areas around it: Switzerland, West Germany, the Low Countries, Britain and about 3/4 of the Iberian Peninsula.

In this study however we get a hint in the supp. materials: KO1, an Epi-Magdalenian sample from Hungary, stands out like a sore thumb in the f3 analyses of all three Rathlin samples:

Figure S12.1. Outgroup f3-Statistics for each ancient Irish Individual. Tests in the form f3(Mbuti; IA, X), where IA is an Irish ancient genome and X is any other ancient individual or population. Data points are coloured by archaeological context.
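For readers unfamiliar with the statistic in this figure, here is a minimal sketch of what an outgroup f3 of the form f3(Outgroup; A, B) measures: the average, over SNPs, of the product of allele-frequency differences between the outgroup and each of the two test populations. A larger value means A and B share more genetic drift after splitting from the outgroup. The allele frequencies below are invented toy numbers and the population labels are hypothetical; real tools (ADMIXTOOLS, for example) also apply sample-size corrections and estimate standard errors, which are omitted here.

```python
# Hedged sketch of the basic outgroup f3 estimator on toy data.
# All frequencies are made up for illustration only.

def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean of (o - a) * (o - b) across SNPs, given allele frequencies in [0, 1]."""
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("all populations need the same number of SNPs")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.5, 0.5, 0.5, 0.5]
ancient_irish = [0.9, 0.1, 0.8, 0.3]
close_relative = [0.8, 0.2, 0.9, 0.2]    # drifts in the same direction
distant_relative = [0.6, 0.5, 0.4, 0.6]  # little shared drift

f3_close = outgroup_f3(outgroup, ancient_irish, close_relative)
f3_distant = outgroup_f3(outgroup, ancient_irish, distant_relative)
print(f"close: {f3_close:.4f}, distant: {f3_distant:.4f}")
```

In a plot like the one described in the caption, each X is ranked by this value, so a sample that stands out at the top is the one sharing the most drift with the Irish genome being tested.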

Obviously the origin of the extra HG cannot be KO1 as such, but there must be one or several populations, as yet unsampled, in which this extra HG (most akin to KO1) was prominent. My best candidates as of now are the following cultures:

Michelsberg, which replaced LBK in most of Germany, North France, Belgium, Switzerland, etc. prior to the Corded Ware shockwave. It's part of the wider Funnelbeaker and Megalithic phenomena and one of the ancient cultures I really want to see sampled in some depth.
Artenac, which replaced previous layers in all West France and Belgium and is part of the wider Megalithic and Bell Beaker phenomena. It originated around Dordogne and is usually considered proto-Aquitanian, i.e. proto-Basque.
The major civilization of Zambujal or Vila Nova de Sao Pedro in Portuguese Estremadura, which was a key pivot in the Megalithic and particularly the Bell Beaker phenomenon.

And in general I'd complement these with samples from all the Atlantic facade of Europe, including Britain, the Basque Country (a lot was going on in the Chalcolithic here in spite of the small size), West France, Belgium, the Rhône valley and Switzerland, etc. If we had data points for all these areas in the Chalcolithic period, we'd surely have a much clearer picture of what was going on in Europe in this critical period of demographic change. It's definitely not just Corded Ware, and the Elbe basin can only give us so much information anyhow.

This is also important regarding the origins and spread of R1b-S116 and its "brother" haplogroup U106, no kidding. Let's sample the West, it's about time.

January 1, 2016

Caucasus and Swiss hunter-gatherer genomes

I know I'm late for the party but better late than never, right?

A recent study sequenced two hunter-gatherer genomes from Georgia and one from Switzerland, expanding our understanding of the pre-Neolithic genetic landscape of West Eurasia.

Eppie R. Jones et al., Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nature Communications 2015. Open access → LINK [doi:10.1038/ncomms9912]

Abstract

We extend the scope of European palaeogenomics by sequencing the genomes of Late Upper Palaeolithic (13,300 years old, 1.4-fold coverage) and Mesolithic (9,700 years old, 15.4-fold) males from western Georgia in the Caucasus and a Late Upper Palaeolithic (13,700 years old, 9.5-fold) male from Switzerland. While we detect Late Palaeolithic–Mesolithic genomic continuity in both regions, we find that Caucasus hunter-gatherers (CHG) belong to a distinct ancient clade that split from western hunter-gatherers ~45 kya, shortly after the expansion of anatomically modern humans into Europe and from the ancestors of Neolithic farmers ~25 kya, around the Last Glacial Maximum. CHG genomes significantly contributed to the Yamnaya steppe herders who migrated into Europe ~3,000 BC, supporting a formative Caucasus influence on this important Early Bronze age culture. CHG left their imprint on modern populations from the Caucasus and also central and south Asia possibly marking the arrival of Indo-Aryan languages.

Figure 1: Genetic structure of ancient Europe.

(a). Principal component analysis. Ancient data from Bichon, Kotias and Satsurblia genomes were projected¹¹ onto the first two principal components defined by selected Eurasians from the Human Origins data set¹. The percentage of variance explained by each component accompanies the titles of the axes. For context we included data from published Eurasian ancient genomes sampled from the Late Pleistocene and Holocene where at least 200 000 SNPs were called^{1, 2, 3, 4, 5, 6, 7, 9} (Supplementary Table 1). Among ancients, the early farmer and western hunter-gatherer (including Bichon) clusters are clearly identifiable, and the influence of ancient north Eurasians is discernible in the separation of eastern hunter-gatherers and the Upper Palaeolithic Siberian sample MA1. The two Caucasus hunter-gatherers occupy a distinct region of the plot suggesting a Eurasian lineage distinct from previously described ancestral components. The Yamnaya are located in an intermediate position between CHG and EHG. (b). ADMIXTURE ancestry components¹² for ancient genomes (K=17) showing a CHG component (Kotias, Satsurblia) which also segregates in the Yamnaya and later European populations.

The Swiss one (Bichon, Jura) is maybe less of a novelty, roughly falling within the already known parameters for Western European hunter-gatherers of Magdalenian tradition (WHG in the jargon), but the two samples from the Caucasus (CHG) are really a much needed new data point, different from everything else we knew and surprisingly close to modern Caucasus populations.

They are however very distant from all other known ancient West Eurasian samples. Fig. 2 shows an estimated divergence from early Neolithic Europeans (EEF, Stuttgart) dating from before the Last Glacial Maximum, to 24,000 years ago. The divergence of this composite West Asian macro-population (EEF's Paleoeuropean admixture is accounted for separately) from the pre-Neolithic Europeans seems to be of c. 46,000 years, which is consistent with the early Upper Paleolithic (the large error margin allows for a secondary Gravettian genesis contact anyhow). On the other hand, the divergence between Bichon and Loschbour seems to fit with the Magdalenian time-frame, as one would expect.

CHG are surprisingly close to modern Caucasus populations, particularly to Georgians. CHG also appear to be an excellent candidate population for the formation of the early Indoeuropean Yamna people, which fit best as a mix of CHG and EHG (Eastern European hunter-gatherers).

Figure 4: The relationship of Caucasus hunter-gatherers to modern populations.

(a). Genomic affinity of modern populations¹ to Kotias, quantified by the outgroup f₃-statistics of the form f₃(Kotias, modern population; Yoruba). Kotias shares the most genetic drift with populations from the Caucasus with high values also found for northern Europe and central Asia. (b). Sources of admixture into modern populations: semicircles indicate those that provide the most negative outgroup f₃ statistic for that population. Populations for which a significantly negative statistic could not be determined are marked in white. Populations for which the ancient Caucasus genomes are best ancestral approximations include those of the Southern Caucasus and interestingly, South and Central Asia. Western Europe tends to be a mix of early farmers and western/eastern hunter-gatherers while Middle Eastern genomes are described as a mix of early farmers and Africans.

I find it notable that the CHG component (do not confuse it with the African one of similar color) is still apparent in the Indian subcontinent, something that was already detected in other analyses. The CHG component seems to be the core of the so-called "ancient North Indian" (ANI) component, also known as "Gedrosian" or "Caucaso-Baloch". What they call "South Asian" in the above analysis would be approximately what is also known as the "ancient South Indian" (ASI) component, which is presumably pre-Neolithic.

"Farmer" means European Early Farmer (EEF) and already implies some Paleolithic European admixture. Until we have some genuine first Neolithic samples from the Levant and Mesopotamia, we should not assume that all the Fertile Crescent Neolithic people were just like that, although some may have been close. In fact, I tend to think that the CHG or a similar "highlander" component was probably important in the Zagros Neolithic and consequently in the Mesopotamian and Iranian one, eventually reaching South Asia. See here for more details on how the Neolithic expansions in Europe and India were largely parallel but not identical at all in source populations.

To illustrate this early Neolithic complexity, still apparent to some extent in West Asian genetics and, as I just said, in European and South Asian ones, the following archaeo-cultural map should help:

Source: Eleni Asouti 2006 (red color annotation is mine)

I strongly recommend reading the full source study, because it is very informative about what some of our ancestors¹ were doing when farming and herding were being developed in West Asia, but the map above alone gives a very good glimpse of the ethno-cultural complexity of these ancient West Asian populations.

My understanding is that the mainline (Thessalian or Aegean) European Neolithic founders must have originated within the PPNA/B complex, although I am uncertain about which specific culture within it (most likely not Harifian, because that one is surely at the origin of Semitic languages, but almost any other one would do, notably those close to the Mediterranean coast: Sultanian, Aswadian and Mureybetian). Instead, the populations affecting the Eastern European, Mesopotamian-Iranian (Sumer and Elam) and Indian Neolithic are most probably linked to what is here called M'lafatian or Zagros Neolithic, which in turn was most likely linked one way or another to Caucasus hunter-gatherers and in general to the "highlander" West Asian element apparent in other studies, in contrast to a more EEF-like "lowlander" one.

________________________________________

¹ Sure: I'm thinking mostly of Euro-Mediterranean and Central-South Asian peoples but even if you are East Asian or Tropical African it's still very probable that some random ancestor comes from this crucial paleo-historical knot (or almost from anywhere else: admixture never ends and we are all related, even if thinly, within the last millennium or so).
