For what they were... we are: Y-DNA R1a spread from Iran

March 29, 2014

Y-DNA R1a spread from Iran

While this conclusion was something more or less reachable with previous data (see HERE for example), a new study adds some fine detail for us to reconstruct the paleohistory of this major Eurasian lineage.

Peter A. Underhill et al., The phylogenetic and geographic structure of Y-chromosome haplogroup R1a. EJHG 2014. Pay per view → LINK [doi:10.1038/ejhg.2014.50]

Important: supplemental materials are freely available.

Abstract

R1a-M420 is one of the most widely spread Y-chromosome haplogroups; however, its substructure within Europe and Asia has remained poorly characterized. Using a panel of 16 244 male subjects from 126 populations sampled across Eurasia, we identified 2923 R1a-M420 Y-chromosomes and analyzed them to a highly granular phylogeographic resolution. Whole Y-chromosome sequence analysis of eight R1a and five R1b individuals suggests a divergence time of ~25 000 (95% CI: 21 300–29 000) years ago and a coalescence time within R1a-M417 of ~5800 (95% CI: 4800–6800) years. The spatial frequency distributions of R1a sub-haplogroups conclusively indicate two major groups, one found primarily in Europe and the other confined to Central and South Asia. Beyond the major European versus Asian dichotomy, we describe several younger sub-haplogroups. Based on spatial distributions and diversity patterns within the R1a-M420 clade, particularly rare basal branches detected primarily within Iran and eastern Turkey, we conclude that the initial episodes of haplogroup R1a diversification likely occurred in the vicinity of present-day Iran.

This case, as well as many others, including that of its close relatives R1b and Q, illustrates why frequency is not the same as origin, which can only be inferred (if at all) by studying the hierarchical diversity of the lineage. These three lineages, for example, must have spread from West Asia but they are relatively less important in numbers in that region today, overshadowed by other lineages, notably J. Instead their derived branches had major impacts in other regions (Europe, South and Central Asia, Siberia and America).

Frequencies of the main lineages

There are two main sub-lineages of R1a, which according to the current ISOGG tree version (maybe to be refitted after this study?) are known as R1a1a1b2 (Z93) and R1a1a1b1a (Z282). The first one is essentially Asian (with greatest frequencies in South and Central Asia, where it includes >98% of all R1a individuals) while the latter is almost exclusively European (notably Eastern European but with a distinct branch in Scandinavia, encompassing together >96% of R1a individuals in Europe).

These maps give us a quite decent glimpse of the main scatter patterns of R1a but alone they can't inform us of its origins. For that we have to look at the detailed tree and the relationship of its samples with geography.

Origins and distribution of R1a

As mentioned above, the authors conclude that R1a and R1a1 must come from Iran, where the greatest basal diversity is:

To infer the geographic origin of hg R1a-M420, we identified populations harboring at least one of the two most basal haplogroups and possessing high haplogroup diversity. Among the 120 populations with sample sizes of at least 50 individuals and with at least 10% occurrence of R1a, just 6 met these criteria, and 5 of these 6 populations reside in modern-day Iran. Haplogroup diversities among the six populations ranged from 0.78 to 0.86 (Supplementary Table 4). Of the 24 R1a-M420*(xSRY10831.2) chromosomes in our data set, 18 were sampled in Iran and 3 were from eastern Turkey. Similarly, five of the six observed R1a1-SRY10831.2*(xM417/Page7) chromosomes were also from Iran, with the sixth occurring in a Kabardin individual from the Caucasus. Owing to the prevalence of basal lineages and the high levels of haplogroup diversities in the region, we find a compelling case for the Middle East, possibly near present-day Iran, as the geographic origin of hg R1a.

Between these top tier nodes (R1a and R1a1) and the two most common sublineages described above, this study only found one paragroup represented: R1a1a1* (M417). This should be an important step in the analysis but the researchers prefer to remain silent on it. Why? I guess that the reason is that it is complicated to analyze and to reach sound conclusions.

I spent some time today looking at the haplotypes of this paragroup mentioned in the study and I could not reach a conclusion either: the majority of the sequences are from Europe and all of them (excepting a highly derived Norwegian line and including a low derived Iranian one) seem to derive from a North German haplotype. I call this group "branch A".

However there is at least one West Asian sequence (from Turkey) which seems independent ("branch B"), while an Indian and the already mentioned Norwegian sequence could derive from either one. So my impression is that there is a specifically North European "branch A" but also some other stuff with West Asian centrality ("branch B") within this key paragroup.

I guess that I could say a lot more about not being able to say much more on this key intermediate step but, in short, there are two options between which I can't decide:

Branch A went back to West Asia from where it spread again to Eastern Europe and Central South Asia.
Branch B is actually at the origin of the two derived and highly spread subhaplogroups.

Whatever the case I understand that there are good reasons to think that these spread first from West Asia, at the very least Z93 and very likely also Z282.

R1a1a1b2 (Z93)

There is nothing European in this lineage: only some lesser terminal branches at the Southern Urals, roughly where the Kurgan phenomenon began some 6000 years ago.

This detail is indeed remarkable because, if, as often argued, R1a or some of its subclades spread from there, we should expect at least some basal diversity being retained. Instead all we see are some highly derived branches. So the main conclusion must be that the expansion of R1a does not seem related to the Kurgan phenomenon, except maybe in some secondary instances.

As mentioned before, this lineage is Central and South Asian and comprises the vast majority of R1a in those two regions.

The detailed haplotype network can be seen in Supp. Info fig. 2.

In essence we can say that:

Z93* has three apparent distinct branches stemming from West Asia (incl. Caucasus) and another one from South Asia/Altai (1).
Z95* has two apparent distinct branches:

A small one with presence in West Asia and Southern Europe
Another one (pre-M780?) stemming from South or West Asia

M780 has clear origins in South Asia (incl. most Roma lineages)
Z2125 also appears to originate in South Asia, even if it has a greater spread outside it, notably to Central Asia
M580 and M582 appear related and surely originated in West Asia

Weighting them:

Z95:

West Asia: 2
South Asia: 2
West/South Asia: 1

Therefore the origin of Z95 should be thought of as West-South Asian but undecided between either region. Say Afghanistan for example.

Z93:

West Asia: 3
West/South Asia: 1 (Z95)
South Asia/Altai: 1

In this case I would say that West Asia is almost certainly the origin, although tending to Central/South Asia. For example: Iran again.

So, regardless of whether the previous stage (M417) represents a stay in West Asia or a back-migration from Europe into West Asia, West Asia is clearly at the origin of Z93. It does not represent any Kurgan migration but an Asian phenomenon with origins towards the West (around Iran).

R1a1a1b1a (Z282)

At first sight this European sublineage seemed much simpler: it is obvious that the bulk of it spread from Eastern Europe. However, when we look at the haplotype network, we cannot confirm this pattern for the Norwegian or Scandinavian haplogroup Z284, which is only linked to the rest via some South European and West Asian samples.

So my conclusion must be that Z282 experienced a main expansion from Eastern Europe but only into Eastern and Central Europe and that the Scandinavian variant almost certainly represents another flow within this haplogroup, with the knot being in West Asia.

Anyhow the main East and Central European expansion seems true. For some reason it is not centered in any obvious prehistorical locality, as could be the Volga or maybe Ukraine, but instead its center is further North around Smolensk.

Overall reconstruction of the spread of R1a

With all the previous analysis I made this map, which also shows in discrete gray color the general pattern of expansion of haplogroup R:

We have an expansion of R into South Asia and Western Eurasia (incl. Central Asia) and even into parts of Africa (R1b-V88) from apparent South Asian (R, R1 and R2) and West Asian (R1a, R1b) origins. Related lineages Q and P* could also be integrated into this pattern of expansion but I did not want to overload the map with too many details.

There is some uncertainty regarding the North European branches of R1a but otherwise the pattern seems quite clear.

On these North European branches, I must say that they remind me of other odd lineages with similar geography: R1b-U106, I1-M253 and I2a2-M223. With the likely exception of R1b-U106, none of them appears to have experienced any significant re-expansion since their arrival in that corner of the World; however they do seem to survive pretty well in it.

Time frame?

Finally we seem to be entering the age of full Y chromosome sequencing and a more serious molecular clock based on it. As I have explained on other occasions (for example), the human Y chromosome is large enough to experience mutations almost every single generation, which should provide a decent molecular clock, unlike the very rough approximations used in the past.

However the issue of correct calibration remains open. As you surely know, academia is slow to incorporate the most recent evidence, especially from fields outside its specialty. Hence I do not expect them to calibrate based on the obvious fact that age(CF) or at least age(F)=100,000 years. They are probably still stuck in old concepts of a "recent" out-of-Africa migration c. 60 or at most 80 Ka ago, as well as the usual Pan-Homo split underestimates.

I must admit in any case that I have not had enough time to study this matter in depth yet, so the previous observation is rather my idea of what to expect.

In any case in this study the authors resorted to full Y chromosome sequences to calculate their age estimates and I applaud them for doing so. As apparent in fig. 5, all R1 derived sequences have approximately the same number of accumulated SNPs, which in principle allows for a perfected molecular clock, assuming it is well calibrated.

Their estimate is as follows:

A consensus has not yet been reached on the rate at which Y-chromosome SNPs accumulate within this 9.99Mb sequence. Recent estimates include one SNP per: ~100 years,⁵⁸ 122 years,⁴ 151 years⁵ (deep sequencing reanalysis rate), and 162 years.⁵⁹ Using a rate of one SNP per 122 years, and based on an average branch length of 206 SNPs from the common ancestor of the 13 sequences, we estimate the bifurcation of R1 into R1a and R1b to have occurred ~25,100 ago (95% CI: 21,300–29,000). Using the 8 R1a lineages, with an average length of 48 SNPs accumulated since the common ancestor, we estimate the splintering of R1a-M417 to have occurred rather recently, ~5800 years ago (95% CI: 4800–6800). The slowest mutation rate estimate would inflate these time estimates by one third, and the fastest would deflate them by 17%.

The references correspond to (4) Poznik 2013, (5) Francalacci 2013, (58) Xue 2009 and (59) Méndez 2013. This last is the Anzick study, of which at the very least we can say that they had a real calibration point in the ancient Amerindian DNA. It is also the one which provides the slowest mutation rate.

Considering that Xue 2009 is "old" (for this avant-garde aspect of this pretty young science), I find their choice of the Poznik rate quite a bit conservative. The Francalacci rate is the intermediate one of the three "recent" papers referenced and it is also quite close to the calibrated Méndez rate.

Personally I would choose the latter without a second thought. As long as CF ends up being younger than 100 Ka, it is positively too conservative anyhow.

Using the Méndez (Anzick-calibrated) rate of 162 years per SNP, I get the following corrected estimates:

R1a/R1b split (R1 node): 33,000 years ago (CI: 26.0-42.5 Ka)
R1a-M417 node: 7,700 years ago (CI: 6.4-9.0 Ka)
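
As a rough check of the arithmetic, the point estimates can be reproduced by multiplying the average SNP counts quoted from the paper (206 and 48) by the number of years per SNP. The following is only a minimal sketch under that assumption: the function and variable names are hypothetical, the ~100-year rate is taken as exactly 100, and only point estimates are computed, not confidence intervals.

```python
# Average SNP counts quoted from Underhill et al. 2014.
SNPS_R1 = 206    # branch length to the R1a/R1b split
SNPS_M417 = 48   # branch length to the R1a-M417 node

# Years per SNP, as listed in the quoted passage (labels are shorthand).
RATES = {
    "Xue 2009": 100,          # given as "~100 years" in the quote
    "Poznik 2013": 122,       # the rate the authors used
    "Francalacci 2013": 151,
    "Mendez 2013": 162,       # the slowest rate listed
}


def node_age(snps, years_per_snp):
    """Point estimate of a node's age: SNP count times years per SNP."""
    return snps * years_per_snp


def estimates():
    """Return {rate label: (R1 split age, M417 age)} in years."""
    return {
        name: (node_age(SNPS_R1, rate), node_age(SNPS_M417, rate))
        for name, rate in RATES.items()
    }


if __name__ == "__main__":
    for name, (r1, m417) in estimates().items():
        print(f"{name}: R1 split ~{r1:,} years, M417 ~{m417:,} years")
```

With 122 years per SNP this gives 25,132 and 5,856 years, in line with the paper's ~25,100 and ~5800; with 162 years per SNP it gives 33,372 and 7,776 years, close to the rounded figures listed above.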

These seem fair enough to me, judging by the fact that the core R1a expansion seems to originate in West Asia (at the very least for the South/Central Asian branch), which fits much better with a Neolithic frame than with the Kurgan one.

It also fits better with my previous estimates after due re-calibration of Terry D. Robb's full sequence Y-DNA tree, although my estimates are even older, especially after a second recalibration to adjust to the recent discovery of widespread H. sapiens evidence in South and East Asia c. 100 Ka ago.

In my understanding the R1 node is actually c. 48 Ka old (R1b: c. 34 Ka), which, apportioning, yields a date of c. 11.2 Ka for the R1a-M417 node.
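
One way to reproduce that apportioning, assuming it is a simple proportion of the SNP branch lengths quoted from the paper (48 SNPs to the M417 node out of 206 SNPs to the R1 node), is sketched below. The function name is hypothetical and the proportional rule is an assumption about how the figure was obtained.

```python
def apportion(parent_age_ka, child_snps, parent_snps):
    """Scale the parent node's age by the child's share of the SNP branch length."""
    return parent_age_ka * child_snps / parent_snps


# R1 node taken as 48 Ka; 48 of the 206 SNPs accumulated since the M417 node.
m417_age_ka = apportion(48.0, 48, 206)
print(round(m417_age_ka, 1))  # prints 11.2
```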

Update (Mar 31): best possible molecular clock estimates for R1:

Below is fig. 5 of Underhill et al. 2014, annotated by me in red and purple colors:

Red: age estimates calibrated for age(CF)=~100Ka, which is what modern archaeological evidence overwhelmingly supports for the second phase of the migration Out of Africa (or rather out of Arabia into India and East Asia). Dots drawn to help identify the estimated nodes.
Purple: general references of European (plus) prehistorical cultures or periods for the key ages estimated.

If I'm correct, then the expansion of R1b in Europe still corresponds in rough terms to the Magdalenian period or, more generally, the late Upper Paleolithic. This does not mean that it remained that way forever (it may well have been reshuffled later on: in the Epipaleolithic, Neolithic and Chalcolithic) but it seems to be the time-frame of its main expansion when the main lineages got established, whatever happened to them later on.

I know well that so far ancient DNA for this lineage remains to be found and that the dominant haplogroup among known Epipaleolithic hunter-gatherers was (for all we know) I2a. However this is what the refined full Y chromosome sequence molecular clock, properly calibrated according to the archaeological evidence for the settling of Asia by H. sapiens, has to say. If you wish to dismiss this and use another estimate instead, that's always up to you. I just hope that you know what you're doing.

Anyhow, if I am correct, then the expansion of R1a is neither Chalcolithic nor Neolithic but clearly Epipaleolithic. Does it make any sense? I can't say for sure because this period is not so well understood. Whatever the case, is it possible to integrate the key pre-Neolithic Zarzian culture of the Zagros (map) in this scheme of things? What about all the other question marks that fill the gaps of our mediocre knowledge of the Mesolithic of West Asia? Or is it the Balkan Epigravettian to be blamed instead? Or both?

I really can't say with any certainty at this stage. But I am intrigued indeed.

Update (Mar 31): frequency pie charts of Underhill's data available at Kurdish DNA.

Update (Aug 2015): I must update the frequencies of the various upstream paragroups, in agreement with table S4, because I may have missed some details initially. However the overall tendency is the same.

R1a* (M420): Italy (1), Turkey East (1), Turkey Cappadocia (2), UAE (1), Oman (1), Iran (set 2) (2), Iran NE (1), Iran South (5), Iran North (5), Azeris-Iran (5).
R1a1* (SRY10831.2): Iran (set 2) (1), Iran NE (1), Iran South (2), Iran North (1), Kabardin (1). In addition it has more recently been found in two Epipaleolithic Eastern Europeans (EHG), from Karelia (Haak 2015) and Smolenskaya Oblast (Chekunova 2014).
R1a1a1* (M417): Ireland (1), Netherlands (3), Norway (1), South Sweden (1), Germany (1), Estonia (1), Hungary (1), Turkey East (Kurds) (1), Iran (set 3) (1), India South (1).

85 comments:

Davidski, March 30, 2014 at 7:48 AM
R1a mostly spread from here during the Copper Age, between the modern hotspots of Z282 and Z93, along with ANE.

http://i129.photobucket.com/albums/p217/dpwes/East_Euro_K15.png~original

That's the East Euro component from the Eurogenes K15. MA-1 carries 34.45% of it.

One of the problems with Underhill et al. 2014 is that the phylogeography of European R1a is a mess, with, for instance, Z280 being shown to be above Z282, M458 and Z284.

Another problem is that the M420* samples from the Near East appear to belong to a single young subclade, so they're not evidence of an Iranian origin of R1a, especially since M420* is also found across Europe, except it wasn't reported from there in this study.

So your map doesn't make any sense. At some point you'll realise that when you start thinking of R1a as an ANE marker.

Kristiina, March 30, 2014 at 10:31 AM
Maju, I would be very happy if you could check if they included Saami R1a in the study? Is it possible to know if Saami R1a belongs to Z282 or Z284? It seems that the Finnish R1a is split between Z282, Z284, M458 and M558.

Do they give an age estimation for M458 or for M558?

I would suppose that the Scandinavian Z284 does not have anything to do with the steppe IE languages but spread to the north during the Neolithic. Instead, M558 could be related to Steppe IE phenomenon and there are certain correspondences between the spread of M558 and the Corded Ware culture. Am I right that M458 has been proposed to be the Slavic marker (but not the only one). On the basis of the frequency map, it could be true.

Maju, you propose that R1a spread to Europe through Turkey. If R1a arose in Iran or in Afghanistan, I would prefer a route to Europe through Daghestan or even East of the Caspian Sea.

For Corded Ware maps:
http://en.wikipedia.org/wiki/Corded_Ware_culture
https://www.google.fi/search?hl=fi&site=imghp&tbm=isch&source=hp&biw=1165&bih=701&q=%22corded+ware%22&oq=%22corded+ware%22&gs_l=img.3...0.0.1.180761.0.0.0.0.0.0.0.0..0.0....0...1ac..38.img..6.7.730.wTxaK95RHh8

Anonymous, March 30, 2014 at 1:26 PM
So, you think Z93* in the Altai comes from south Asia/south central Asia?

I think it would make more sense to link this R1a subclade to the arrival of the Europoid population appearing during chalcolithic (origin of Afanasevo) in south Siberia, a population clearly coming from south Russia/south-east of the Urals, from the eastern part of the Yamnaya peoples (common morphology, light pigmentation, _west eurasian_ (with modern european matches) female lineages, kurgans, early Yamnaya potteries and cultual objects and even axes, copper metallurgy, pastoralism (linked w/ modern cattle DNA in Mongolia and beyond w/ a sizeable European component), typical dental characteristics, and so on).

If this R1a-Z93* arrived from south Asia, how come it is largely associated with typical _WESTERN_ female lineages (sometimes with modern matches as far as Iceland, in the case of a mtDNA H of bronze age Tarim)?
How do you explain it?
If you don't associate it with the beginning of Afanasevo, what do you associate it with?
Are you envisioning some kind of population replacement? Are you thinking of a wiping out when Afanasevo became part of the Andronovo horizon around 1700 BCE? It doesn't change much anyway, Y-DNA-wise, as the source of Andronovo is also ultimately the kurgan culture of Russia.

This europid Z93* in the Altai, associated with west eurasian female lineages (and not south Asian ones), seems to corroborate the Kurgan theory more than anything, since it links Z93's ancestor with south Russia's ancient eastern Kurgan cultures.

Unknown, March 30, 2014 at 5:40 PM
First let me say as someone who hasn't read the paper, this is an informative blog-post, except for:

“Hence I do not expect them to calibrate based on the obvious fact that age(CF) or at least age(F)=100,000 years.”

This is just unsupported nonsense.

“As long as CF ends up being younger than 100 Ka, it is positively too conservative anyhow. “

I don't think this type of unwarranted confidence is helpful at all Maju, it sounds as if ancient YDNA from greater than 100 KYA has been found with the CF-P143 mutation, has there? Am I missing something?

Don't forget that mtDNA from ancient sites has more or less vindicated the orthodox views on molecular 'clockology' as you like to refer to it, we are just waiting for YDNA to do the same.

Maju, March 30, 2014 at 6:47 PM
"This is just unsupported nonsense".

And that is an unsupported and very gratuitous disqualification.

There's a lot of recent archaeological and paleontological evidence piling up that clearly points to an arrival of H. sapiens in South and East Asia c. 100 Ka ago. Link in main entry.

Also it makes good sense if we consider the "pump" model for the OoA migration: when conditions were favorable in the Abbassia Pluvial, people moved to the "deserts" (then much more productive) of Sahara and Arabia. Lots of archaeological evidence confirms it since c. 125 Ka ago. When the Pluvial was declining, some of them may have been pushed in search of new opportunities, reaching Asia East of the Arabian Sea and rapidly expanding in that area.

"Am I missing something?"

Yes: you are totally missing the archaeological evidence, which is the only one informing us about the Out-of-Africa migration time frame in fact.

Kristiina, March 30, 2014 at 7:05 PM
It may be relevant in this respect that according to that Baraba Steppe paper, Andronovo and Iron Age mtDNA is not coming from Eastern Europe. For the most part, mtDNA is local, i.e. found in Bashkirs, Tatars and Volga-Uralic people, but the exotic haplotypes seem to have links to countries like Iran, Azerbaidjan etc.
Some Baraba haplotypes are similar or close to the haplotypes of the following West Asian groups:
Andronovo Tartas:
T - Gilaki Iran
Iron Age Chicha:
H - Shungan Tadjikistan,
U1a - Azerbaidjan,
U4 - Shungan Tadjikistan
U5a - Hunza
T - Iranian Kurds, Shungan Tadjikistan, Ti Azerbaidjan
T1 - Bronze Age Kazakhstan kurgans, Kumandins, Mazandarians
J - Turkmen
H - Shungan Tadjikistan
U3 - Iranian Kurds, Gujarat
W - Mazandarians, Tadjikistan Ti
H6a1 - Hunza

It seems that only T2b, found in Baraba Chicha burials, is typically European and found in LBK and one Baraba Chicha U5a1 haplotype is found in Italy.

Anonymous, March 31, 2014 at 12:10 AM
Maju: "This europid Z93* in the Altai"... "It is not "Europoid" in any way I can discern: just look at the haplotypes, for Chaos shake!"

Well, as mentioned, I was obviously referring to their morphology that the mainstream studies (references in the mainstream D. W. Anthony and J.P. Mallory's books as well) qualify as Europoid and more specifically as proto-europoid in east European studies.

The sudden appearance in south Siberia, around 3500 BCE, of Europoid population with early Yamnaya technology (objects, metallurgy), economy (pastoralism (also keep in mind the Mongolian cattle DNA being part "european")), culture (cultual object, kurgan) associated with west female lineages (with modern matches in Europe) and light pigmentation, really pleads for a population movement from the west - and by west I mean the eastern part of early Yamnaya territory.

I can't just ignore the material archeological evidence.

"Isn't Central Asia "Western" (not necessarily meaning "European") since the very beginnings of the Upper Paleolithic? I don't need any particular explanation for Altai or other Central Asia aligning with Western genetics all the time before the Turkic migrations of the Iron Age: it's what I would expect considering its cultural links (Aurignacoid, Gravettian, Western Neolithic, etc.)"

As said, the female lineages happen to have European modern matches (as far as Iceland for the mtDNA H Tarim sample formerly mentioned), and some have very European-centered presence (for instance H5a (clearly European, not near eastern or Caucasian) in Kayzer et al 2009 and H11a (east European) found in a study about Udegeys - if I made no mistakes, besides haplogroups such as U5a1 (typically European), U4 or even some U1a associated with southern Russia and the "Maykop" region), etc...

No south Asian/south central Asian female lineages are found in the south siberian's most ancient aDNA.
The pre-neolithic pool of central Asian female haplogroups would have been quite similar to the recent European one and with no presence of south Asian lineages at all despite your surmised migration of R1a from there? Weird.
South Russia and the Altai are not close. If the Afanasevo people (end of neolithic) had early east Yamnaya axes and potteries besides Kurgans, they didn't come from west or south Asia.

"Also prolific farmers would be much more likely to cause a demic impact against the hunter-gatherer precursors than a bunch of Bronze Age raiders versus one of the greatest civilizations of that time."

I agree, but this doesn't seem backed at all by the south Siberian case. Clearly in this case it doesn't seem to have anything to do with such a process.

"the modal Z93* haplotype is quite divergent from the M417* ones"

I see no impossibilities here, though.

@ Kristiina about local mtDNA

Iron age is not really interesting to me because the mobility of Saka-like tribes allows for central Asian/south Asian haplotype appearance. I'm only interested by the oldest lineages.

It doesn't inform us much if many of these haplotypes you mentioned actually have an ancient origin (or are derived) from ancient population movements from east Europe/Russia (as typically exemplified by the bronze and iron age Kazakhstan kurgan's samples with T1 (Lalueza-fox et al 2006) - T1 is also present in north-eastern Europe IIRC BTW, or for instance with your U5a or H6a1 Hunza samples that have obviously an ancestor from somewhere else (IIRC H6 is present both in east Europe and central Asia). As for the presence among the modern Bashkir/volga-uralic haplotypes, I fail to see how it discards an ancient chalcolithic/bronze age origin or presence among antique eastern Kurgan populations).

Maju, March 31, 2014 at 5:34 AM
Updated with my molecular clock estimates for all the fig. 5. All based on age(CF)=100 Ka ago, as archaeological evidence strongly implies. IMO R1a-M417 expansion in Europe as in Asia seems Epipaleolithic in essence (a quite fast one in any case), R1b instead looks older and in Europe likely "Magdalenian" (whatever reshufflings happened later on).

Ryan, April 1, 2014 at 3:26 AM
There are North American specific R1 lineages too aren't there? How would they fit into the phylogeny shown in your post?

Jaydeep, April 1, 2014 at 3:12 PM
Dear Maju,

Thank you for discussing this paper. Unfortunately I do not have access to this paper. Thankfully, I realised after coming to your blog that the Suppl. Material was free.

After going through it, I wish to put forward some issues. I hope you can shed some light on it.

1. I feel that there is a sampling bias in this study against South Asia which may have prejudiced the results towards the Middle East.

For example, 1765 samples were taken across Iran which has a population of 76 million. In contrast, only 176 samples were taken from Pakistan, a country with a population of 180 million. If we go by population to sample ratio, to get a fair estimate from Pakistan vis-a-vis Iran, a sample size of about 4000 should have been aimed at. So the no. of 176 samples is more than 20 times less than that. We ought to bear in mind that in previous studies, Pakistan has been a candidate of high diversity for R1a1.

For the rest of South Asia i.e. India & Nepal, a total of 863 samples were taken i.e. not even half of the no. of samples from Iran. The combined population of India & Nepal is 1.25 billion or more than 16 times that of Iran and yet the combined sample from this region is not even half of that from Iran.

Out of the 863 samples, 387 were taken from Nepal (195) and the rest from Eastern India & 126 from Peninsular India - regions not known for great diversity of R1a1.

This leaves only 350 samples out of which 40 have been designated as Mixed, while 36 samples were taken from Central India. Only 274 samples were taken from North & North West India out of which only 127 samples were taken from NW India - another region of great R1a1 diversity ( as per earlier studies).

In contrast, more than 6,600 samples were taken from Europe, a region with a population of 740 million & approx. 650 samples were taken from Turkey which has a population of 74 million.

In this scenario, how do we expect a fair assessment of R1a1 diversity within South Asia ? I would like to read what you have to say on this.

2. For a large no. of samples from South Asia and Central Asia, table S3 shows haplogroup M576. Yet this haplogroup/haplotype is not mentioned anywhere else. Can you throw some light on this ?

3. Finally a minor point I think which is nevertheless important. Afghanistan has been considered a part of Central Asia which in my opinion is not correct. Afghanistan has politically, since the very earliest times, been part of South Asian empires, especially the region South of the Hindu Kush. Even during the Harappan phase, it was connected with the South Asian cultural sphere. Genetically too, the y-dna of Pashtuns (the largest ethnic group of Afghanistan) is closest to Indians and Pakistanis. Pashtuns also have the most homogenous Y-dna profile among Afghan ethnic groups which suggests very little y-dna introgression in comparatively recent times, including from India/Pakistan. This indicates that at least the y-dna heritage of Pashtuns which is shared with the South Asians, goes back many millennia. It possibly goes back to pre-Harappan times and reflects a common origin of these people.

Ryan, April 1, 2014 at 9:13 PM
In any event, this paper's conclusions and yours seem very logical. This is exactly the time and place where goats and cows are believed to have been domesticated. I don't think it's a stretch to posit that R1a carrying pastoralists radiated out from the Zagros mountains starting around ~10Ka. Then a second wave begins with the domestication of the horse on the Pontic-Caspian Steppe, spreading to neighbouring pastoralists and beyond.

It rather elegantly marries the Anatolian hypothesis and the Kurgan hypothesis too. Both are correct, just at different time depths.

It almost fits too well.

Kristiina, April 4, 2014 at 5:01 PM
Wikipedia argues that the Baltic and Slavonic split occurred 1500-1000 BCE, and the coalescent time estimate for M458 in East/Central Europe is between 8384 and 2314 years and for M558 between 9819 and 2710 years. Ukrainians have the oldest coalescent time estimate for Z282, i.e. between 14795 and 4083 years. Thus, Z282 is quite old to be specifically linked with the proto-IE language.

According to Wikipedia, ”Baltic languages were spoken over a larger area: West to the mouth of the Vistula river in present-day Poland, at least as far East as the Dniepr river in present-day Belarus, perhaps even to Moscow, perhaps as far south as Kiev. Key evidence of Baltic language presence in these regions is found in hydronyms (names of bodies of water) in the regions that are characteristically Baltic. Historical expansion of the usage of Slavic languages in the South and East, and Germanic languages in the West reduced the geographic distribution of Baltic languages to a fraction of the area that they formerly covered”, and ” the range of the Eastern Balts once reached to the Ural mountains”. I find it also highly interesting that Wikipedia argues that ”more recent scholarship has suggested that there was no unified Proto-Baltic stage, but that Proto-Balto-Slavic split directly into three groups: Slavic, East Baltic and West Baltic. Under this view, the Baltic family is paraphyletic, and consists of all Balto-Slavic languages that are not Slavic. This would imply that Proto-Baltic, the last common ancestor of all Baltic languages, would be identical to Proto-Balto-Slavic itself, rather than distinct from it. Finally, there is a minority of scholars who argue that Baltic descended directly from Proto-Indo-European, without an intermediate common Balto-Slavic stage. They argue that the many similarities and shared innovations between Baltic and Slavic are due to several millennia of contact between the groups, rather than shared heritage”.

I find Table S4 very interesting. When you compare the various haplogroup frequencies, you see that the frequencies of M558 and M458 are low in Scandinavia and England (0-2.8%). Conversely, the frequency of Scandinavian Z284 in Eastern Europe and Russia is practically zero. It does not seem that the Vikings left much genetic legacy in Eastern Europe. It is a pity that Balts have been omitted from the comparison. Russians, Belarusians, Hmelnitsk Ukrainians, Hungarians and Estonians clearly carry much more M558 than M458, whereas Czechs and Croats carry much more M458 than M558. It seems that M458 and M558 are both typical of Balto-Slavic populations, but they may have arisen already at the stage of the proto-IE language or even before.

I must say that I feel tempted to argue that M458 is more related to the expansion of Slavic languages and M558 to the expansion of Proto-Balto-Slavic. It would be exciting to know if Xinjiang Tarim Basin R1a belongs to Z93. (http://upload.wikimedia.org/wikipedia/commons/4/4f/IndoEuropeanTree.svg)

Volgaic groups seem to have a high frequency of M558 and a low frequency of M458: Maris, Udmurts, Komi-Permyaks and Chuvashes, for example, carry 0% of M458. This seems to indicate that their R1a does not come from the Russians but precedes the Slavic period.
Kristiina April 5, 2014 at 9:14 AM
True, but there is Z284 in North England, Sweden and Denmark: North England 3.4%, Denmark 7.1%, South Sweden 3.5%. In Germany there seems to be only 0.9%.

The oldest time estimate for M458 is in Poles and the oldest time estimate for M558 is in Slovaks, which means that the areas are geographically very close. Z284 seems to be older in Ukraine than in Russia or Belarus. If the Dniepr-Don culture radiated northward during the Neolithic, it was probably not an IE culture in the strict sense of the word. I do not believe that the Pitted Ware (ca. 3200 BC to ca. 2300 BC) people spoke an IE language, unless we think that the IE languages are a wide areal phenomenon with various substrates, one that developed as a result of the post-Ice Age expansion of people from Eastern Europe. If the origin of IE languages is in the Sredny Stog culture (4000-3500 BC), we should postulate a migration of M458 and M558 from Ukraine to the Czech lands and Poland in the form of the Corded Ware. However, it is possible that Z284, M458 and M558 were in Central Europe already before the Corded Ware and preceded any proper IE language. Still, as they all may have their origin in the area of Ukraine, it is possible that these languages were similar to the proper IE languages and were some sort of para-IE languages.

The Wikipedia article on Pitted Ware even speculates that "as the [Pitted Ware] language left no records, its linguistic affiliations are uncertain. It has been suggested that its people spoke a language related to the Uralic languages and provided the unique linguistic features discussed in the Germanic substrate hypothesis."

Anyway, I think that the IE languages arrived in Scandinavia with the Corded Ware. According to Wikipedia, the Corded Ware culture flourished in Central Europe c. 2900 – 2450/2350 cal. BC. Around 2400 BC the people of the Corded Ware replaced their predecessors and expanded to Danubian and Nordic areas of western Germany. A related branch invaded Denmark and southern Sweden.

I do not believe in this replacement of yDNA, but think that at this stage the IE yDNA had already merged with the local yDNA in the area of Germany and Poland. Moreover, IMO, part of Z282 had spread to Northern Europe before the Corded Ware, and its carriers may have spoken whatever European paleo-languages. Since Germanic and Slavic areas are quite distinct in terms of yDNA, it can hardly be said that any Bronze Age Indo-European yDNA replacement occurred, particularly in Scandinavia.
Maju April 5, 2014 at 2:24 PM
"there is Z284 in North England, Sweden and Denmark: North England 3.4%, Denmark 7.1%, South Sweden 3.5%. In Germany there seems to be only 0.9%".

That's interesting. I guess that if you took it as a "Viking" marker (or Viking+Anglo-Saxon), it would imply that almost half of North English patrilineal ancestry is from that area (based on the Danish frequency). But it's also possible that it is something much older in the region, maybe from the time of the Hamburgian or Maglemosian cultures, which spanned the North Sea.
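
The "almost half" figure can be reproduced with a deliberately naive proportion estimate: divide the Z284 frequency in the recipient population by the frequency in the presumed source. This is only a sketch. It assumes that Z284 entered North England exclusively from a Danish-like source and was absent there beforehand, and the function name `naive_source_share` is hypothetical.

```python
def naive_source_share(freq_recipient, freq_source):
    """Estimate the share of recipient ancestry from the source population.

    Assumption (for illustration only): the marker is carried solely by the
    source population and was absent in the recipient before the migration.
    """
    if freq_source <= 0:
        raise ValueError("source frequency must be positive")
    return freq_recipient / freq_source


# Z284 frequencies (%) quoted in the comment above
north_england = 3.4
denmark = 7.1

share = naive_source_share(north_england, denmark)
print(f"Implied Danish-like patrilineal share: {share:.1%}")  # 47.9%
```

If the marker was already present around the North Sea before any Viking or Anglo-Saxon movement, as the second possibility above suggests, this ratio overstates the contribution.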

"The oldest time estimate for M458 is in Poles and the oldest time estimate for M558 is in Slovaks, which means that the areas are geographically very close".

I wouldn't put much trust in localized age estimates without considering the phylogeny (they can be an artifact of immigration from various sources). If we knew that the basal nodes and branches of these subhaplogroups are concentrated in those areas, I would accept the origin hypothesis, but the haplotype network is not so precise about location, so much extra work is needed to find out the exact geography of the expansion of these branches. The data is there anyhow, so I guess it's just a matter of building the haplotype network with the necessary country labels instead of just "Eastern European" for all.

"If the Dniepr-Don culture radiated northward during the Neolithic, it was probably not an IE culture in the strict sense of the word".

Surely not. But its radiation may have set some of the genetic basis on which the IE wave rode later on. After all, DD was the first "victim" of the Kurgan expansion, but in a very complex way: Sredny-Stog II was a patchy society that mixed both traditions in very irregular forms. Some cultures that the Kurgan expansion produced, like Ezero (proto-Thracians), actually retained many DD cultural traits (extended burial with ochre and such), while in the Baltic, Pitted Ware was in a way opening the route of later Kurgan flows. So it does not surprise me that DD peoples may have manned the Kurgan expansion even if they were originally distinct from IEs (how distinct?).

"I do not believe that the Pitted Ware (ca 3200 BC– ca 2300 BC) people spoke an IE language"...

Maybe they spoke a language distantly related to PIE. We do not know anything about the Neolithic genesis of the Samara culture but it seems apparent that they had at least some influence from DD.

IF the hypothesis that Vasconic is distantly related to PIE is correct, and my theory of Vasconic being spread by Neolithic Farmers stands, then it is indeed possible that there was once a linguistic family in SE-Eastern Europe that gave birth to Vasconic, PIE and surely other branches now lost, very possibly the language of Dniepr-Don and Pitted Ware peoples. Not demonstrated but quite plausible.

But I agree that it's not likely that PIE included the DD language as such. However, it is very possible that the DD language initially had a strong substrate influence on the Western branches of IE; after all, they used the Dniepr-Don area as the main platform for further intrusions westwards.

...
Unknown April 7, 2014 at 12:06 AM
@Maju

So where did R1a and R1b split? ... The same place.

According to David Adams, the area from the Black Sea and North Caucasus through the Caspian Sea to the Aral Sea was still underwater due to the melting of the northern Central Asian . According to Greek historians, the Greeks could still sail to the Aral Sea in the Bronze Age. With this in mind, your scenario would mean that most groups that came from HIJKLT would fit this Iranian scenario.
Sakiusa April 7, 2014 at 6:49 AM
Well, at least this will end the nonsense of some in South Asia who are advocating an Out of India Theory (OIT) for R1a1; now we know it may be Iran or someplace in West Asia that's the origin.
dan May 29, 2016 at 12:44 AM
So did R1a farmers from the Middle East bring wheat farming to India?
Jimmy Goodman March 18, 2021 at 7:19 PM
Here is a simple test. How old is the oldest R1a? How old is the oldest R1a found in Europe? How old is the oldest R1a found in Central Asia? How old is the oldest R1a in India? How old is the oldest R1a found in Iran?